Goal
Understand how backend systems actually work — not just how to build them, but why they behave the way they do under load, failure, and scale.
Who this is for
Developers who can build a basic backend service but want to understand storage, data consistency, failure modes, and distributed system trade-offs at a deeper level.
Prerequisites
- Can build a REST API in at least one language
- Has used a relational database (queries, indexes, transactions)
- Is familiar with the basics of HTTP
Reading order
1. The Pragmatic Programmer — Hunt & Thomas
Read this first, but do not treat it as a deep technical book. It is a book about how to think about software. The habits it builds — incremental improvement, avoiding duplication, staying curious — make everything else on this list easier to absorb.
Focus on: the sections on estimation, debugging, and working with legacy code. Skip the career-advice passages if they do not resonate.
2. Designing Data-Intensive Applications — Martin Kleppmann
The most important book on this list. Explains storage engines, replication, partitioning, transactions, and stream processing in a way that connects directly to systems you will use at work.
Read Chapters 1–7 in order. They build on each other. Chapters 8–12 (distributed systems and batch/stream processing) can follow on a second pass if the first read feels dense.
Do not skip the references at the end of each chapter. Many of them point to original papers that are worth reading later.
3. Computer Systems: A Programmer’s Perspective — Bryant & O’Hallaron
Most backend performance problems trace back to memory, caching, or I/O. This book explains all three at the hardware level. Read it after Designing Data-Intensive Applications (DDIA) so you can connect the storage concepts to what actually happens on a machine.
Focus on: Chapters 5–6 (performance and memory hierarchy) and Chapter 11 (network programming). The assembly chapter (Chapter 3) is worth reading slowly, not so that you can write assembly, but so that you understand what the compiler produces.
4. Understanding Distributed Systems — Roberto Vitillo
A shorter and more focused follow-up to DDIA. Covers consensus, leader election, replication, and failure detection with enough detail to design systems that handle partial failures correctly.
Read it after DDIA. It consolidates and extends what you learned in Chapters 5–9.
Optional books
- Database Internals by Alex Petrov — goes deeper into B-trees, LSM trees, and distributed database internals. Read if you work on or near a database engine.
- A Philosophy of Software Design by John Ousterhout — a short and opinionated book on managing complexity in software. Read when you find yourself writing code that is hard to change.
Practice ideas
- After DDIA Chapter 7: open two PostgreSQL connections and reproduce a read skew anomaly under READ COMMITTED. Repeat the experiment under REPEATABLE READ and SERIALIZABLE and note which anomalies disappear.
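- After Chapter 6 of Computer Systems: A Programmer’s Perspective (CS:APP): write matrix multiplication with two different loop orders (for example, ijk and ikj) and benchmark both on matrices too large to fit in cache. The gap is usually much larger than the identical operation count suggests.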
- After the full list: design a simple URL shortener. Write down every decision — storage engine, replication strategy, what happens when a node fails. Revisit your design after six months.
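Before trying the read skew exercise against a real database, it can help to see the anomaly in isolation. The sketch below is a toy in-memory model, not PostgreSQL and not how any real engine is implemented. The names `VersionedStore` and `audit`, the two accounts, and the amounts are all made up for illustration. It assumes a single transfer commits between the two reads of an auditing transaction, and compares "each read sees the latest commit" with "every read sees the state as of transaction start".

```python
"""Toy model of read skew. Illustrative only; not PostgreSQL."""


class VersionedStore:
    """Keeps every committed version of each key, tagged with a commit number."""

    def __init__(self, initial):
        self.commit_no = 0
        # key -> list of (commit_no, value), oldest first
        self.versions = {key: [(0, value)] for key, value in initial.items()}

    def commit(self, writes):
        """Atomically commit a dict of writes as one transaction."""
        self.commit_no += 1
        for key, value in writes.items():
            self.versions[key].append((self.commit_no, value))

    def read(self, key, as_of=None):
        """Return the newest value whose commit number is <= as_of.

        as_of=None means "latest committed value".
        """
        limit = self.commit_no if as_of is None else as_of
        visible = [value for commit, value in self.versions[key] if commit <= limit]
        return visible[-1]


def audit(store, snapshot):
    """Sum both balances while a transfer commits between the two reads.

    snapshot=False: each read sees the latest commit (read-committed style).
    snapshot=True: both reads see the state as of the audit's start.
    """
    as_of = store.commit_no if snapshot else None
    a = store.read("a", as_of)
    # A concurrent transaction moves 100 from b to a and commits here.
    store.commit({"a": store.read("a") + 100, "b": store.read("b") - 100})
    b = store.read("b", as_of)
    return a + b


read_committed_total = audit(VersionedStore({"a": 500, "b": 500}), snapshot=False)
snapshot_total = audit(VersionedStore({"a": 500, "b": 500}), snapshot=True)

print(read_committed_total)  # 900: a was read before the transfer, b after it
print(snapshot_total)        # 1000: both reads come from one consistent snapshot
```

The total is 1000 before and after the transfer, so an audit that reports 900 has seen a state that never existed. When you run the real exercise, look for the same pattern: two reads in one transaction that straddle another transaction's commit.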
Expected outcome
After completing this path, you should be able to:
- Explain what happens when a write hits a database, from application layer to disk
- Choose between replication strategies with an understanding of their trade-offs
- Reason about transaction isolation without looking up the definitions
- Identify the class of problem (consistency, availability, or partition tolerance) when a distributed system misbehaves
- Read engineering blog posts from companies like Stripe, Cloudflare, or PlanetScale and follow the reasoning