Home → Translations → Programmers' theoretical minimum
Original material by Denis Ostapenko
Most apprentice software developers, especially those, who study in colleges, do not know how to develop their skills, and what they should know to work effectively. Surprisingly, everyday usage of products and technologies, invented by other developers and based on well-developed knowledge areas, does not give them an idea how they work, and how they are implemented.
Queueing theory and GSM-based mobile networks; PHP scripts, running at remote servers, transfering generated content through Ethernet using TCP/IP protocol to network cards with NDIS interface on user PCs; CPUs, reordering and speculatively executing instructions to compensate semiconductor electronics limitations on clock rate and light speed; computer-designed cars and planes, drugs and DNA sequenced by computers; computer games, where a tiny patch of reflected light need megabytes of scientific articles full of Fresnel integrals; electronic movies and books; NLP and TreeNet algorithms, retrieving search results from huge databases — these are the things we use every day, that were created by ingenious software developers, by virtue of their fundamental knowledge and talent, and, of course, software development and complexity management methodology, proven over the years.
My friends and I have taken the trouble of preparing a software developers' theoretical minimum based on remarkable IT areas, some parts of which are already included in CS university programmes, the others are taken from interviews and practical experience. Some items require only Wikipedia and take 5 minutes to learn, others would take up months, but this is exactly what you need to know and understand. You are welcome to suggest corrections and additions.
- C++, standard, Comeau, 1TBS, Stroustrup/D&E/Josuttis/Vandevoorde, Dewhurst/Mayers/Sutter, RAII, rule of three, exception-safety, Alexandrescu/Abrahams-Gurtovoy, type erasure, CRTP, NVI, SFINAE, Koenig lookup, Duff's device, Boost, Siek-Lumsdaine/Karlsson, TR1, TR on C++ performance, ABI, Stepanov test, forwarding problem, SPECS, C++0x
- Compilers, standard implementation differences, implementation limitations, intrinsics, standard library differences (containers, rand), ABI, implementation of virtual functions, virtual inheritance, exceptions, RTTI, switch, function and method pointers; optimizations, copy elision (RVO, NRVO), sizeof on different platforms, defines of compilers and environments, __declspec, compiler command line, empty-base optimization, static and dynamic linking, mangling, distributed compilation, precompiled header, single compilation unit, (strict) aliasing/restrict, inline/_forceinline, volatile
- Multithreading, dining philosophers problem, deadlock/race condition/starvation, atomicity, CPU lock instructions, CAS or LL/SC, wait/lock/obstruction-free, ABA problem, lock-free container implementation, spin-lock, TLS/per-thread data, OpenMP, MPI, map-reduce, critical section/mutex/semaphore/condition variable, WaitForSingleObject/WaitForMultipleObjects, green thread/coroutine, pthreads, actor model
- x86 assembly language, Zubkov/Hyde/Drepper/Kasperski/Fog/Abrash, AT&T and Intel syntax, masm32, macro instructions, stack, heap/heap manager, calling conventions, hex-codes, machine data representation, IEEE754, little/big endian, SIMD, hardware exceptions, interrupts, virtual memory, reversing, stack and heap overflow, return oriented programming, alphanumeric shellcode, L1/L2/RAM/page fault and their timings
- Hardware, Horowitz-Hill, semiconductor electronics/spintronics/photonics, transistor, circuitry, microcode, processor technology, VID/PID, FPGA, Verilog/VHDL/SystemC, SISAL, Arduino, memory (ROM → EEPROM, RAM, SSD, HDD, DVD), RISC/CISC, Flynn's taxonomy ([SM]I[SM]D), Harvard and Princeton computer architecture, CPU architectures, x86 architectures
- Processors, pipelining, hyper-threading, out-of-order execution, speculative execution, branch predict, prefetching, set-associative cache, cache-line/cache-miss, clock cycle, protection rings, multiprocessor systems memory models (SMP, NUMA), memory timings
- Discrete math, K2, Post theorem, circuits, finite automata, cellular automata, Kalashnikov rifle, DFA and NFA
- Computability, Turing machine, Markov algorithms, Post machine, Hilbert's tenth problem, Church lambda functions, Clini partial recursive functions, Schönfinkel combinatory logic, Brainfuck, Turing tarpit equivalence, halting and self-applicability problems, countability of computable functions set, RAM-machine, Tarsky algorithm, SAT/SMT-solvers, formal systems theory
- Programming languages, grammars, Chomsky hierarchy, Myhill-Nerode theorem, pumping lemma for regular languages and Ogden's lemma, Clini algebra, NFA → DFA, undecidable problems in formal languages, Dragonbook, Friedl, regular expressions and their complexity, PCRE/POSIX RE, BNF, Boost.Spirit + Karma + Qi/Ragel, LL, LR/SLR/LALR/GLR, PEG/packrat, yacc/bison/flex/antlr, static code analysis, compilation/decompilation/obfuscation/deobfuscation, Clang/LLVM/XMLVM, GCCXML, OpenC++, VM implementation, JiT/AoT/GC, DSL/DSEL
- Algorithms and combinatorial optimization, Cormen/Skiena/Sedgewick/Knuth/Aho-Hopcroft-Ullman/Papadimitriou/Shriver-Goldberg/Preparata-Shamos, data structures, algorithms, complexity and Landau symbols, complexity classes, NP-complete problems, graphs and trees, network flows, Kirchhoff matrix, search trees (especially RB-tree and B-tree), occlusion detection, binary heap, hash tables and perfect hash, Petri nets, Russian peasant multiplication algorithm, Karatsuba method and Winograd-Strassen matrix multiplication, sorting algorithms, greedy algorithms and matroids, dynamic programming, linear programming, diff algorithms, randomized and fuzzy search algorithms, pseudorandom numbers, fuzzy logic
- Numerical methods, dichotomy/Newton method, inter- and extrapolation, splines, Gauss/Jakobi/Seidel methods, QR and LU-decomposition, SVD, Least squares, Runge-Kutta methods, Adams method, Newton-Cotes formulas, Monte-Carlo method, Ritz method, Bubnov-Galerkin method, finite difference/element method, FFT/STFT, convergence and stability
- Machine learning, machine vision, OpenCV, image processing, OCR, Zobel filters, Haar-like features, introduction to vision psychophisiology, TreeNet, neural networks, Kohonen self-organizing map, genetic algorithms, ant colony algorithms, information retrieval/data mining/natural language processing, optimization algorithms, PCA, SVM, gradient boosting, simulated annealing, hill climbing, AI modeling methods
- Information theory, compression, Huffman, RLE, LZ, ECC, lossy compression (images, audio, video), information entropy, Shannon formula, Kolmogorov complexity
- Cryptography, Yaschenko, symmetric (DES, AES), asymmetric (RSA), Diffie-Hellman, elliptic curves, hashing (MD5, SHA, CRCn), DHT, cryptographically strong systems, cryptographic attacks, WEP/WPA/WPA2 and attacks, digital signatures and certificates, HTTPS/SSL, zero-knowledge proof
- Math, Knuth-Graham-Patashnik/Zorich/Winberg/Rudin (Real and complex analysis, not Principles)/Lang, calculus, linear algebra, complex analysis, functional analysis, differential geometry, number theory, PDE/ODE/integral equations/variations calculus/optimal control, generating functions, series, combinatorics, probability theory/mathematical statistics/random processes/queueing theory, Markov chains, integral transforms (Fourier, Laplace, wavelet), NZQRCHOS, computer algebra (Mathematica, Maple)
- Physics, Kirchoff rules, electrical impedance, velocity and frequency of light, Lagrangian
- Chemistry, stoichiometry, silicon chemistry :)
- Architecture and code style, McConnell/Fowler/Leblanc/Gamma/Alexandrescu-Sutter/Booch, defensive programming, patterns, GRASP, UML, OOP (Smalltalk), OOD/OOA, Liskov substitution principle, code metrics
- Software development methodologies, Waterfall/RUP/Agile/Scrum/Kanban/XP, TDD/BDD, CASE
- Testing, unit tests, functional, load, integration testing, UI testing
- IDE, IntelliSense, debuggers (VS/Olly/WinDbg/kdb/gdb) and tracers (strace/ltrace), DWARF2 debug information format, valgrind, vcs (SVN, GIT), merge/branch/trunk, file and branch naming systems, continuous integration, ant, code coverage, static analysis, software verification and validation (Frama-C, RAISE (RSL), Coq), profiling, lint, bug trackers, documentation generators, build systems (cmake)
- Frameworks, Qt, moc and metainformation, signals and slots construct, Summerfield-Blanchett/Schlee, PoCo, common libraries: GMP, i18n, lapack, fftw, pcre
- Operating systems, Silberschatz/Richter/Solomon-Russinovich/Robachevsky/Vahalia/Stevens/Linux Kernel Internals, memory manager, heap manager and its internals (LAL/LFH/slab), process manager, context switch, real and protected mode, executable formats (PE/ELF/Mach), kernel objects, debug hooks (strace/ptrace/dtrace/pydbg, Debug API) and minidumps, bash, network stack and high-load web servers, netgraph, CR0, IPC, window subsystem, security: ACE/ACL and access rights, virtualization technologies, RTOS (QNX), driver development, IRQL, IRP, file systems, BigTable, NDIS/miniport/FS drivers/filter driver, Mm-, Io-, Ldr-functions, DKOM and rootkits, GDT/IDT/SDT, Windows/Linux/BSD kernel, POSIX
- COM, OLE/ActiveX/COM+, ATL, Rogerson/Tavares, apartments, monikers, MIDL, DCOM RPC, CORBA, D-bus, TAO
- Networking, OSI model/Internet model, Ethernet, TCP/IP, TCP window, Nagle algorithm, sockets, Protocol buffers/Thrift/Avro/ASN.1, AMQP, ICMP, routing/BGP/OSPF, ARP, Mitnick attack, syn flood, HTTP/FTP, P2P, DHCP, SMB/NBNS, IRC/XMPP, POP3/SMTP/ESMTP/IMAP, DNS, WiFi/WiMax/GSM/CDMA/EDGE/Bluetooth, ACE, Wireshark
- Graphics, Bresenham algorithm, color models, ray tracing vs polygonal graphics, OpenGL/GLSL/Open Inventor, DirectX/DirectShow/DirectAudio/HLSL, stencil/depth/alpha-test, DirectX 11 graphics pipeline, shaders, lighting models (Phong), throughput, fillrate, OpenCL/CUDA, landscapes, LODs, shadows, textures and filtering, antialiasing, HDR, tone mapping
- Formats, XML/XSLT/XPath/XMLStarlet/DOM/SAX, RTF/ODF, JSON/BSON, torrent, YAML, JPEG/PNG/WebP, AVI/MPEG/RIFF/WAV/MP3/OGG/WebM, SVG, Unicode, single-byte encodings/UTF-8/UTF-16/UCS-2/UTF-32
- RDBMS, Gruber, ANSI SQL, T-SQL, ODBC, MySQL/PostgreSQL/MS SQL/BDB/SQLite/Sphinx, stored procedures, triggers, Codd/А algebras, Tutorial D, normal forms, query optimization and execution, index data structures, transactions and ACID, Brewer's CAP theorem, NoSQL, key-value storage, sharding, ORM (C++ ODB), ERD, OLAP
- Application programming, C#/F#/Nemerle, Schildt/Troelsen/Richter, generic programming, yield, linq/plinq, reflection, AST, WCF, WinForms/WPF/Silverlight, AOP, logging frameworks, .NET assembly
- Quantum computing, Shor's algorithm, quantum cryptography
- Functional programming, Haskell/Ocaml/Scheme/Alice or Oz, SICP/TaPL/YAHT/Purely Functional Data Structures/Harrison-Field, HOF (map/fold/filter), Hindley-Milner type system, monads, typeclasses, algebraic data types, dependent types, lazy/eager evaluation, logic programming (Prolog or Mercury), concurrent programming (Erlang or Oz)
- Web development and scripting languages, Flanagan/Zend PHP5 Certification Course + Study Guide, Apache/nginx, CGI/FastCGI, PHP/Zend Framework/phpDaemon/Zend Engine/Doctrine or Propel/CodeIgniter or Symphony or Yii, Python/Django/Twisted, Ruby/RoR, ASP.NET MVC, JavaScript/jQuery/ExtJS/node.js, JS OOP, HTML5/XHTML/doctype/table and div layouts/CSS3, RSS, canvas/WebGL, Ajax/Comet/WebSockets, security issues (XSS, SQL injection, CSRF), highload, SWIG
- GUI design, Raskin, usability, design and typography basics, Fitts's law, layout principles, LaTeX.
UPD: Some issues are very common, so it is a good idea to give answers in this post.
This list is rightly criticized for the lack of systematization and SUDDEN neighbourhood of highly diverse topics, different in their content and depth. This is a feature, not a bug! Writing a systematic programme for almost every item here would take up the same space as the contents of some great books, therefore, including book names/authors is better. So, how to use this list? You should read good books to gain understanding of the subjects mentioned. The authors did not even think that someone would decide that Duff's device is equal by complexity and size to the Holy Standard! However, our criteria is still applicable because reading a hundred beginner books in C++ will not give you an idea what Duff's device is, but if you know what to study (some parts of our list, for example, C++) you would meet every concept very quickly. The sense of the programme, caused by its size, is to assess how much knowledge you have gained by reading books.
We also receive lots of critical replies from those who consider themselves as programmers, but suppose that learning all these things is impossible, or there is no need to know it all because an ordinary developer would not use it in common practice. We believe these people do not see the difference between erudition/memory and knowledge. A valueable knowledge for a software developer is not exact NBNS packet format, but methods other developers used, in other words, his ability to reinvent or identify those methods in diverse domains. The ability to analyze and synthesize (which is achieved by hard work and study) distinguishes human from google, that will not be able to solve div2 250 even in the long term. This list is aimed at developing these abilities. Surely, you need to supplement abilities with domain-specific knowledge like game physics, java bullshit development or IC design.
The issue of those who do not consider themselves smart enough to master this list, or suppose that they will not be able to apply their knowledge and forget/lose it also needs a separate paragraph. In general the list is inferior to CS faculties' programmes in leading universities, so mastering it in 5 years is quite possible, even combining it with work. 1/3 to 2/3 of discussed issues are commonly used in game development, others can be used by answering questions at StackOverflow.
There is also a group of people who deny learning the specified subjects because they consider that programming is only for making money. These people do not really need this list, it does not address issues of stealing, cheating or forcing others to work instead of you.
Some people say «I get paid well without such education». They should note that after 45 brain degradation is easy to see, because most people find it troublesome to work with code even of ordinary cyclomatic complexity. Slow loss of ability to write code (accompanied by lack of analysis-synthesis activity) can lead to lack of professional work and other problems. Some people continue working actively in old age, but this is caused by their outstanding performance in the past. You can use TopCoder to assess your performance.
I am grateful to everyone who helped me fix all those annoying bugs, especially my colleagues, who have not only practically mastered this list, but also made very valuable comments.
Related links:
Books that are worth reading in IT
Programmer competency matrix
Batkin's list
MIT OpenCourseWare
Internet university courses