Databases III — Scaling Up: Replication, Partitioning & NoSQL

By Pritesh Yadav 14 min read

So far you have used one database on one server. For most apps, that is fine. But as an app grows, one server eventually struggles. This section explains why one server isn't enough, and the two main tricks to grow beyond it: replication (keep copies of the same data) and partitioning (split different data across machines). We'll also meet the NoSQL database family and learn about connection pooling, a small idea that quietly saves many apps from falling over.

The four ceilings of one server

A "ceiling" here means a limit you eventually hit. A single database server has four separate limits, and the most important lesson is that each one has a different fix. Naming them separately helps you choose the right tool.

  • Read load — too many "read" queries per second. A read is a SELECT: fetching data without changing it (showing a product page). Too many reads overwhelm the server's CPU and disk. (Easiest to fix.)
  • Write load — too many "write" queries per second. A write is an INSERT, UPDATE, or DELETE: it changes the data. (Hardest to fix.)
  • Storage — the data grows bigger than one machine's disk can hold (or bigger than its memory can keep fast).
  • Availability — "available" means up and reachable. One server is a single point of failure: if it dies, the whole app is down, and you can't even take it offline for an upgrade without downtime.
Key takeaway: Replication mainly solves read load and availability. Partitioning (sharding) solves write load and storage. They fix different problems, so large systems use both together.

Two directions to grow: up vs out

There are two ways to give a database more power.

  • Vertical scaling (scale up) — buy a bigger machine: more CPU, more memory, a faster disk. Simple, no code changes, no new complexity. But there is a hard physical ceiling (the biggest machine that exists), the cost grows faster than the power (a "twice as powerful" box costs much more than twice as much), and it's still a single point of failure.
  • Horizontal scaling (scale out) — add more machines and spread the work. Effectively unlimited, and gives you spare copies for safety. But now machines must talk over a network and stay in agreement, which adds real complexity. Replication and sharding live here.
Best practice: Try vertical scaling first — it's underrated. Modern cloud servers reach hundreds of CPU cores and terabytes of memory, which is enough for the vast majority of apps. Reach for horizontal scaling only when a bigger box genuinely can't keep up.

Replication — many copies of the same data

Replication means keeping the same dataset on several servers. It gives you more machines to serve reads, a backup ready to take over if one dies, and copies that can sit physically near your users for speed.

The standard shape is leader/follower (also called primary/replica; the old term was master/slave). One server is the leader. All writes go to the leader. The leader keeps a change log — an ordered list of every change it made (Postgres calls this the WAL, "Write-Ahead Log"; MySQL calls it the binlog). It streams this log to the followers, which replay the exact same changes in the exact same order. Reads can be answered by any server.

Why one leader? Writes must happen in a definite order to keep the data consistent, and the single leader is the one place that decides that order.

        write
client --------> [ LEADER ]
                    |  streams change log (in order)
        +-----------+-----------+
        v           v           v
     [ F1 ]      [ F2 ]      [ F3 ]   (followers)
        ^           ^           ^
        +-----------+-----------+
                    |  reads
clients ------------+   (any follower can answer)
   One arrow IN (writes), many arrows OUT (reads).
Analogy: The leader is the library's master reference copy. Followers are photocopies that visitors read from, so the original isn't mobbed. If the photocopies update a moment after the master, that small delay is replication lag.

How fast must followers keep up? Three choices

ModeLeader replies "OK" when…Trade-off
Synchronousat least one follower confirms it saved the writeNo data loss, but slower; if that follower is slow or down, writes stall.
Asynchronous (common default)the leader saves it locally — followers catch up laterFast and resilient, but if the leader dies before a write reaches a follower, that write is lost.
Semi-synchronousexactly one follower confirms; the rest update asyncPractical middle ground: always a second durable copy, with low latency. Common in production.

Replication lag and the "read-your-own-writes" bug

Replication lag is how far a follower trails the leader — usually milliseconds, but it can stretch to seconds under heavy load or network trouble. This causes a classic, confusing bug.

Example: A user changes their display name. The write goes to the leader. The page reloads instantly and reads from a follower that is 200 milliseconds behind. The follower still has the old name, so the user sees the old value and thinks the save failed. The fix is called read-your-own-writes: for a few seconds after someone writes, send their reads to the leader (or pin them to one specific replica) so they always see their own latest change.

Read replicas are followers dedicated to answering reads. Adding read replicas is the cheapest, easiest scaling win there is — each one multiplies your read capacity. The catch: your app must split its traffic — writes go to the leader, reads go to the replicas.

Failover — when the leader dies

Failover means promoting a follower to become the new leader. Three steps: (1) detect the leader is dead (usually a missed heartbeat or timeout); (2) choose the most up-to-date follower; (3) reconfigure clients and the other followers to point at the new leader.

Common mistake: Treating failover as trivially safe. Real dangers: with async replication, writes that hadn't reached any follower are lost forever. Split-brain can occur — two servers both believe they're the leader and accept conflicting writes (the old leader must be "fenced off," i.e. blocked). And a too-twitchy timeout causes needless failovers ("flapping"). Tools like Patroni (Postgres) or managed services (Amazon RDS/Aurora, Google Cloud SQL) handle this carefully so you don't have to hand-roll it.
Common mistake: Thinking replication scales writes. It does not — every write still funnels through the single leader. Replication scales reads and adds safety; only partitioning scales writes.

Partitioning / sharding — splitting the data itself

Replication copies the same data everywhere. Partitioning splits different data across machines: shard A holds users 1–1,000,000, shard B holds 1,000,001–2,000,000, and so on. ("Partitioning" is the general term; "sharding" usually means partitioning across separate servers. A "shard" is one of those pieces.) This is the only way to scale total writes and total storage past one machine, because each shard has its own leader accepting writes in parallel.

You must pick a partition key (or shard key): the column that decides which shard a row lives on. Choosing it well is the single most important sharding decision.

Two ways to assign rows to shards

  • Range partitioning — split by ranges of the key (names A–M on shard 1, N–Z on shard 2; or by date). Good: efficient ordered scans, like "all orders in March." Bad: prone to hot spots — if you shard by timestamp, every one of today's writes hits the single "today" shard.
  • Hash partitioning — run the key through a hash function (a function that turns any input into a scrambled fixed-size number) and assign by that. Good: spreads load evenly. Bad: you lose range scans, because neighboring keys scatter to random shards.
Example: Shard users by first letter (A–M / N–Z). Listing users alphabetically is easy. But a marketing import adds 5 million users whose names start with "A" — one shard melts while the other naps. Hashing the user ID spreads everyone evenly, but now "list all A–M users" must visit every shard.

The hot-spot / hot-key problem

Even hashing can't save you from a single super-popular key. In a flash sale, every "add to cart" for one product ID hashes to the same shard — that shard drowns while the others idle. (Engineers nickname this the "Justin Bieber problem": one celebrity account overwhelms one machine.) Mitigations: add a small random suffix to spread one hot key across several partitions (e.g. product#0product#9), then gather them on read; or cache that hot key. Amazon DynamoDB does this automatically — it detects a "hot partition" and splits it.

The mod-N trap and consistent hashing

A tempting but broken scheme is hash(key) % N, where N is the number of servers. It spreads data fine — until you add a server.

Common mistake: Using hash(key) % N. With 4 servers you use % 4. Add a 5th and switch to % 5, and almost every key now maps to a different server — forcing a near-total data reshuffle and a cache-miss storm. Use consistent hashing instead.

Consistent hashing imagines a ring of numbers (0 up to a huge value, then wrapping back to 0). You hash each server onto a point on the ring, and each key onto a point too. A key belongs to the first server you meet going clockwise. Now adding or removing a server only reassigns the keys in one arc — about 1/N of the data — instead of all of it.

        0 / 2^32
            .
       S3 .     . S1
        .         .
   K --> (key snaps clockwise to next server)
        .         .
          .     .
            S2
  Add S4 between S2 and S1: it only "steals"
  the arc from S2 to S4 — not the whole ring.

Real systems improve this with virtual nodes (vnodes): each physical server is placed at many points on the ring (often ~100–200). This evens out the load and, when a server leaves, spreads its share across many neighbors instead of dumping it all on one. Cassandra, DynamoDB, and many caches use this.

Common mistake: Sharding too early, or on a bad key. Rebalancing (moving data when shards are added or removed) happens while the system is live and is never free — queries slow down during the move. And cross-shard work is painful: joins (combining rows from two tables) and transactions (all-or-nothing groups of writes) that span shards become hard or impossible; totals need a "scatter-gather" across shards. Exhaust vertical scaling, read replicas, and caching first. Pick a high-variety, evenly spread shard key, never a low-variety one like country or status.

Why reads are easy and writes are hard

This is the deep idea tying the section together. Reads are stateless and order-independent: ask 1,000 replicas the same question and they all give the same answer, so you just add replicas. Writes must be serialized — put in a definite order — to keep data consistent, and that ordering needs a single decision point (the leader), which is a chokepoint. Sharding scales writes only by giving up easy cross-shard joins and transactions.

Key takeaway: This is a CAP-theorem-flavored trade-off. The more you spread data across machines, the more you drift toward eventual consistency (copies agree eventually, not instantly) — unless you pay extra latency to keep things strongly consistent. There's no free lunch.

NoSQL — and why it exists

NoSQL means "Not Only SQL." These databases appeared in the 2000s when web-scale companies needed easy horizontal scaling, flexible data shapes, and very high write throughput that single-server relational databases struggled to deliver. The core trade: most NoSQL stores drop rich relational features (joins, fixed schema, multi-row transactions) in exchange for easy scale-out and speed. There are four families.

FamilyWhat it storesExamplesGood forWeakness
Key-Valuea giant dictionary: get/put by keyRedis, DynamoDB, Memcachedcaching, sessions, rate limiters, leaderboardscan't query by value; no relationships
Documentself-contained JSON-like documentsMongoDB, Couchbasecatalogs, user profiles, evolving datajoins are awkward; you denormalize
Wide-Columnrows by partition key; flexible columnsCassandra, HBase, Bigtabletime-series, IoT, event logs, write-heavy feedsmust design tables per query up front
Graphnodes + relationships as first-classNeo4jsocial graphs, recommendations, frauddoesn't shard easily; weak for bulk analytics

("Schema" = the fixed structure of your data; a "flexible schema" lets different records have different fields. "Denormalize" = store duplicate copies of data together so you don't need a join to read it.)

Example — pick the tool: shopping-cart session → Redis (key-value); product catalog with varying attributes → MongoDB (document); millions of sensor readings per second → Cassandra (wide-column); "people who bought this also bought" → Neo4j (graph); orders + payments + inventory → PostgreSQL (SQL, because you need transactions and joins).

SQL vs NoSQL — a balanced view

SQL databases (PostgreSQL, MySQL) give you a strict schema, strong ACID transactions (reliable all-or-nothing changes), powerful joins, and decades of maturity. They're the right default whenever relationships and correctness matter — money, orders, inventory. NoSQL gives flexible schemas, built-in horizontal scale, and high throughput, but weaker (often eventual) consistency and limited joins, so you model the data around your exact queries.

Best practice: The line has blurred. Postgres now has JSONB (store and query JSON like a document DB) plus partitioning, and "NewSQL" systems (CockroachDB, Google Spanner, Vitess) offer SQL and horizontal scale. Choose by your access pattern and consistency needs, not by hype. Most apps should start with one good relational database.

Connection pooling — a small fix that saves apps

Opening a database connection is surprisingly expensive. PostgreSQL forks a separate operating-system process for each connection — several megabytes of memory each — plus a security handshake. It was never built for thousands of simultaneous connections, and max_connections (often defaulting to ~100) is a hard ceiling. A busy web app where every request opens its own connection will exhaust the database and crawl.

A connection pool keeps a small set of already-open connections and lends them out to requests, taking them back when each request finishes. App-side pools include HikariCP (Java) and pgx (Go); a popular dedicated proxy is PgBouncer, which fronts thousands of clients onto a few real backend connections (a PgBouncer client costs ~2 KB versus megabytes for a real connection).

Analogy: A bank counter keeps 10 shared pens for 500 customers, instead of every customer registering their own pen. The pool is the box of pens; PgBouncer is the clerk handing them out and taking them back.

PgBouncer offers pool modes: session (a real connection is tied to a client for its whole session), transaction (the real connection is held only for one transaction — most efficient, but breaks session features like server-side prepared statements and LISTEN/NOTIFY), and statement (per query).

Common mistake: Thinking "more connections = faster" and over-sizing the pool. A pool far larger than the database's CPU/disk capacity causes contention and can crash it. Size the pool to the database's backend capacity (often a small multiple of CPU cores), not to your request count.
Key takeaways:
  • One server hits four separate ceilings — read load, write load, storage, availability — and each needs a different fix.
  • Replication (same data, many copies) scales reads and gives availability; only sharding (different data, split across machines) scales writes and storage.
  • Reads are easy to scale (fan out to replicas); writes are hard because they must be ordered through a single leader — a CAP/eventual-consistency trade-off.
  • Use consistent hashing with virtual nodes, never hash(key) % N; and watch for hot keys that overwhelm one shard.
  • NoSQL trades joins, fixed schema, and multi-row ACID for scale and flexibility — pick by access pattern, and a relational DB is the right default for most apps.
  • Always put a connection pool in front of Postgres, and size it to the database's capacity, not your traffic.

Continue reading