Networking II — The Web Stack: HTTP, TLS, Load Balancing & Caching
In the last networking section we covered the lower-level pipes of the internet: IP addresses, TCP, UDP, and DNS. Those are the roads and the addressing system. This section walks up one floor to the application layer — the part developers actually touch every day. We'll learn how the web really works: how a browser asks a server for a page (HTTP), how that conversation is kept private (TLS / HTTPS), and the infrastructure (load balancers, proxies, CDNs, caches) that makes the web fast, secure, and able to serve millions of people at once.
1. The HTTP request/response model
HTTP (HyperText Transfer Protocol) is a text-based, request/response protocol. "Request/response" means a client (a browser, a phone app, or a command-line tool like curl) sends a request, and the server sends back a response. Crucially, it is client-driven: the server never speaks first. Nothing happens until the client asks. (That one-way limitation is exactly why WebSockets and Server-Sent Events were later invented — to let servers push data.)
A request has four parts: (1) a request line — the method, path, and version; (2) headers — key:value metadata about the request; (3) a blank line; (4) an optional body (form data, JSON, an uploaded file). A response mirrors this with a status line, headers, a blank line, and a body.
REQUEST
GET /products/5 HTTP/1.1
Host: shop.com
Accept: application/json
<blank line>

RESPONSE
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 26
<blank line>
{ "id": 5, "name": "Mug" }
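To make the four-part layout concrete, here is a minimal Python sketch that splits a raw HTTP/1.1 message into its start line, headers, and body. The function name `parse_http_message` is made up for this illustration, and the parser deliberately ignores real-world details such as chunked bodies and repeated headers.

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP/1.1 message into (start_line, headers, body)."""
    # The blank line (CRLF CRLF) separates the head from the optional body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them to lowercase.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


request = (
    b"GET /products/5 HTTP/1.1\r\n"
    b"Host: shop.com\r\n"
    b"Accept: application/json\r\n"
    b"\r\n"
)
start_line, headers, body = parse_http_message(request)
method, path, version = start_line.split(" ")
```

The same function works on a response: the start line is then the status line (`HTTP/1.1 200 OK`) instead of the request line.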
Methods (also called verbs) tell the server your intent:
- GET — read/fetch a resource (no body).
- POST — create something or submit data.
- PUT — replace a whole resource with what you send.
- PATCH — partially update a resource.
- DELETE — remove a resource.
- HEAD — like GET but returns headers only, no body (handy for "did this change?").
- OPTIONS — ask what the server allows (used in CORS preflight checks).
Status codes — grouped by their first digit
The server's response always starts with a 3-digit code. The first digit tells you the family:
- 1xx informational — e.g. 103 Early Hints.
- 2xx success — 200 OK, 201 Created, 204 No Content.
- 3xx redirection — 301 (moved permanently), 302/307 (temporary), 304 Not Modified (used by caches, explained later).
- 4xx client error — you made a mistake: 400, 401, 403, 404, 409 Conflict, 422, 429 Too Many Requests.
- 5xx server error — the server failed: 500, 502, 503, 504.
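The "first digit is the family" rule is easy to express in code. The sketch below uses a hypothetical helper, `status_family`, alongside Python's standard `http.HTTPStatus` enum to look up the official reason phrase.

```python
from http import HTTPStatus

FAMILIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_family(code: int) -> str:
    """Return the family name for a 3-digit HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    # Integer division by 100 keeps only the first digit.
    return FAMILIES[code // 100]


def describe(code: int) -> str:
    """Combine the code, its standard phrase, and its family."""
    return f"{code} {HTTPStatus(code).phrase} ({status_family(code)})"
```

For example, `describe(429)` combines the code with "Too Many Requests" and the "client error" family.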
Two pairs are easy to confuse: 401 with 403, and 502 with 504. Memorize these: 401 Unauthorized = "we don't know who you are" (not logged in). 403 Forbidden = "we know who you are, but you're not allowed to do this." For the 5xx pair: 502 Bad Gateway = a proxy got a broken answer from the server behind it; 504 Gateway Timeout = the proxy waited and got no answer in time.
2. Statelessness, cookies, sessions & tokens
HTTP is stateless: the server keeps no memory of your previous requests. Each request must carry everything needed to handle it. This sounds like a limitation, but it's the secret to horizontal scaling — running many identical servers side by side. Because no server "remembers" you, any server can handle any request, so you can add more servers freely.
But real apps need to remember things like "I'm logged in." The trick: the client carries an identifier on every request. The most common way is a cookie. The server sends Set-Cookie: session=abc123; the browser stores it and automatically attaches Cookie: session=abc123 to every later request to that site.
| Term | What it is | Trade-off |
|---|---|---|
| Cookie | A small value the browser stores and auto-sends | Easy, but you must protect it |
| Session | Server-side data (in Redis/DB) keyed by the cookie's id | Needs shared storage if you have many servers |
| Token (JWT) | A signed token that carries the user's claims itself | No server lookup, but hard to revoke early |
Three cookie flags are essential for security: HttpOnly (JavaScript can't read it, which blocks theft via XSS attacks), Secure (only sent over HTTPS), and SameSite (limits cross-site sending, which blocks CSRF attacks). XSS = injecting malicious scripts into a page; CSRF = tricking your browser into sending a request you didn't mean to.
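As a small sketch, Python's standard `http.cookies` module can build a `Set-Cookie` value with all three flags. The helper name `session_cookie_header` and the choice of `SameSite=Lax` and `Path=/` are assumptions for illustration.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id: str) -> str:
    """Build the value of a Set-Cookie header for a hardened session cookie."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["httponly"] = True   # hidden from JavaScript
    cookie["session"]["secure"] = True     # only sent over HTTPS
    cookie["session"]["samesite"] = "Lax"  # limits cross-site sending
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()


header_value = session_cookie_header("abc123")
```

The resulting string starts with `session=abc123` and carries the `HttpOnly`, `Secure`, and `SameSite=Lax` attributes.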
Common mistake: setting a session cookie without HttpOnly, Secure, and SameSite. That cookie can then be stolen by a script or ridden by a forged request.
3. HTTP/1.1 vs HTTP/2 vs HTTP/3
HTTP has evolved to be faster. The first big enemy is head-of-line (HOL) blocking — when one slow item stuck at the front of a line holds up everything behind it (like one slow shopper jamming the only checkout lane).
HTTP/1.1 (1997): text-based; one connection handles one request at a time. Connection: keep-alive lets a connection be reused for several sequential requests (avoiding the cost of re-opening it), but they still go one after another. To get parallelism, browsers cheated by opening about 6 connections per host.
HTTP/2 (2015): same meaning, new wire format. Its wins: (a) binary framing (machine-friendly, not text); (b) multiplexing — many requests (called streams) interleaved over one TCP connection, killing the 6-connection hack; (c) HPACK header compression, which removes repeated header bytes. Server Push existed but is now deprecated (Chrome removed it in 2022) — use 103 Early Hints instead.
HTTP/3 (2022): runs over QUIC, a new transport built on UDP instead of TCP. QUIC does reliability and ordering per stream, so one lost packet only stalls its own stream. It also folds in the TLS handshake (faster setup) and supports connection migration — a phone switching Wi-Fi to cellular keeps the same connection via a Connection ID rather than its IP address.
HTTP/1.1: [Req A]--wait--[Req B]--wait--[Req C] one at a time
HTTP/2: one TCP pipe: A1 B1 A2 C1 B2 ... interleaved
BUT a dropped packet stalls the WHOLE pipe (TCP)
HTTP/3: lane A | lane B | lane C independent
a drop in lane B stalls ONLY lane B
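The difference between the last two diagrams can be simulated in a few lines. This is a toy model under stated assumptions: packets are labelled only by stream, a lost packet is modelled as one that arrives late, and `delivery_times` is a hypothetical helper rather than anything from a real network stack.

```python
def delivery_times(packets, arrival_times, per_stream_ordering):
    """Return the time each packet can be handed to the application.

    packets: stream labels in the order they were sent, e.g. ["A", "B", "A"].
    arrival_times: when each packet actually arrived (a retransmitted
        packet simply arrives late).
    per_stream_ordering: False models one ordered byte stream (TCP),
        True models independent ordered streams (QUIC-style).
    """
    delivered = []
    for i, stream in enumerate(packets):
        # A packet waits for every earlier packet it is ordered behind.
        blockers = [
            arrival_times[j]
            for j in range(i + 1)
            if not per_stream_ordering or packets[j] == stream
        ]
        delivered.append(max(blockers))
    return delivered


# Send order A1 B1 A2 C1 B2; B1 is lost and its retransmission arrives at t=9.
packets = ["A", "B", "A", "C", "B"]
arrivals = [1, 9, 3, 4, 5]
single_pipe = delivery_times(packets, arrivals, per_stream_ordering=False)
independent = delivery_times(packets, arrivals, per_stream_ordering=True)
```

With one ordered pipe, everything sent after B1 is held until t=9. With per-stream ordering, only stream B waits; A2 and C1 are delivered at t=3 and t=4.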
4. REST, APIs & idempotency
REST is a style for designing web APIs. Its rules: resources are named by URLs (/users/42), you act on them with HTTP methods, you transfer representations (usually JSON), and it's stateless. Use nouns in paths and verbs as methods — write GET /users/42, never GET /getUser?id=42.
Two properties drive safe API design:
- Safe = read-only, changes nothing on the server: GET, HEAD, OPTIONS.
- Idempotent = doing it many times has the same effect as doing it once: GET, HEAD, OPTIONS, PUT, DELETE. NOT idempotent: POST, PATCH.
Why care? Because networks fail. If a request times out, you don't know if it succeeded. You can safely retry an idempotent call. Retrying a POST, though, might run it twice.
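One common guard against that double execution is an idempotency key: the client attaches a unique id to the operation, and the server replays the stored result instead of running the operation again. The sketch below is an in-memory illustration only; `ChargeService` and its fields are hypothetical, and a real service would keep the keys in shared, durable storage.

```python
import uuid


class ChargeService:
    """Toy payment endpoint that deduplicates by idempotency key."""

    def __init__(self):
        self._results = {}     # idempotency key -> stored response
        self.charges_made = 0  # how many times money actually moved

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A repeated key means "this is a retry": replay the first answer.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        response = {
            "status": "charged",
            "amount_cents": amount_cents,
            "charge_no": self.charges_made,
        }
        self._results[idempotency_key] = response
        return response


service = ChargeService()
key = str(uuid.uuid4())             # generated once per logical operation
first = service.charge(key, 500)
retry = service.charge(key, 500)    # e.g. after a timed-out response
```

The important design choice is that the client generates the key once and reuses it for every retry of the same operation; a fresh key per attempt would defeat the purpose.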
Example: your code sends POST /charge to charge a credit card. The response times out. Your code retries, and the customer is charged twice. Fix: send an Idempotency-Key: 7f3a... header (a unique id for that operation). The server remembers the key and ignores the duplicate. This is the pattern Stripe popularized.
Common mistake: GET with side effects, like GET /delete?id=5. GET must be safe: search crawlers and browser prefetchers will happily call it and delete your data. Use the correct verb.
5. TLS / HTTPS — keeping it private and trustworthy
HTTPS is just HTTP carried inside a TLS-encrypted channel. TLS (Transport Layer Security) gives three guarantees:
- Confidentiality — eavesdroppers see only scrambled ciphertext.
- Integrity — if anyone tampers with the data, it's detected.
- Authentication — you're really talking to the right server, proven by its certificate.
A server presents an X.509 certificate — a document binding its domain name to a public key, signed by a Certificate Authority (CA) that browsers already trust. This forms a chain of trust up to a "root" CA stored in your operating system. Let's Encrypt made these certificates free and automatic.
TLS 1.3 handshake (1 round trip):
Client --ClientHello: ciphers + key share--> Server
Client <--ServerHello: cert + key share, encrypt now-- Server
both derive the SAME session key (ECDHE),
then application data flows. Done in 1 RTT.
The key idea: slow asymmetric crypto (public/private keys) is used only to agree on a shared secret key. Then fast symmetric crypto (AES-GCM, ChaCha20) encrypts the actual bulk data. TLS 1.3 finishes the handshake in one round trip (TLS 1.2 needed two). It also requires forward secrecy (via ephemeral ECDHE keys): even if someone later steals the server's long-term key, they can't decrypt traffic they captured in the past. TLS 1.3 also offers 0-RTT resumption for returning clients (even faster), but it's replay-vulnerable, so use it only for idempotent requests.
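On the client side, most of this is handled by the TLS library. As a minimal sketch, Python's standard `ssl` module can build a context that verifies the certificate chain and hostname and refuses anything older than TLS 1.3. The function names are hypothetical, and the second function needs network access and a host of your choosing, so it is defined here but not called.

```python
import socket
import ssl


def make_tls13_context() -> ssl.SSLContext:
    """Client context: verify certificate and hostname, require TLS 1.3."""
    # create_default_context() enables certificate and hostname checks
    # against the system's trusted root CAs.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


def negotiated_tls_version(host: str, port: int = 443, timeout_s: float = 5.0) -> str:
    """Connect to host:port and return the protocol version that was agreed."""
    ctx = make_tls13_context()
    with socket.create_connection((host, port), timeout=timeout_s) as raw:
        # The handshake happens inside wrap_socket().
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

If the server cannot speak TLS 1.3, or its certificate does not chain to a trusted root, the handshake raises an `ssl.SSLError` instead of silently downgrading.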
6. Latency vs throughput vs bandwidth
These three are constantly confused:
- Latency — time for one trip (delay), measured in milliseconds.
- Throughput — actual work done per second (e.g. requests/sec achieved).
- Bandwidth — the maximum capacity of the pipe (bits/sec).
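Latency has a hard floor set by the speed of light: in fiber, light covers roughly 200 km per millisecond, so every ~100 km of distance costs about 1 ms of round-trip time. Real routes are longer than a straight line and add equipment delay, so a cross-continent round trip lands at roughly ~150 ms. A rough order-of-magnitude "latency canon" every engineer should feel: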
| Operation | Rough time |
|---|---|
| Main memory read | ~100 nanoseconds |
| SSD random read | ~16–20 microseconds |
| Same-datacenter round trip | ~0.5 ms |
| Cross-continent round trip | ~150 ms |
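The arithmetic behind these numbers fits in a few lines. The sketch assumes light in fiber travels at about 200 km per millisecond (roughly two thirds of its speed in a vacuum); the constant and function names are made up for this example.

```python
FIBER_KM_PER_MS = 200.0  # assumption: light in fiber covers ~200 km per ms


def fiber_round_trip_ms(distance_km: float) -> float:
    """Best-case round-trip time over a straight fiber run of this length."""
    return 2 * distance_km / FIBER_KM_PER_MS


def slowdown(slow_seconds: float, fast_seconds: float) -> float:
    """How many times slower one operation is than another."""
    return slow_seconds / fast_seconds


memory_read_s = 100e-9      # ~100 nanoseconds
same_dc_rtt_s = 0.5e-3      # ~0.5 ms
cross_continent_s = 150e-3  # ~150 ms

dc_vs_memory = slowdown(same_dc_rtt_s, memory_read_s)
far_vs_memory = slowdown(cross_continent_s, memory_read_s)
```

Using the table's figures, a same-datacenter round trip is about 5,000 memory reads and a cross-continent round trip is about 1.5 million, which is why cutting round trips usually matters more than shaving CPU time.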
7. Caching everywhere
Caching means storing a copy of a result closer to whoever needs it, so you avoid redoing the work. It exists at every layer of the stack:
User
 -> [Browser cache]          governed by Cache-Control: max-age
 -> [CDN edge PoP]           governed by Cache-Control: s-maxage
 -> [Reverse proxy/Varnish]
 -> [App + Redis cache]
 -> [DB buffer pool]
HTTP gives you headers to control caching:
- Cache-Control: max-age=N — fresh for N seconds.
- s-maxage — overrides max-age, but only for shared caches like a CDN.
- no-cache — may store, but must revalidate before using.
- no-store — never store at all (for private/sensitive data).
- private — browser only, never a shared cache; public — anyone may cache.
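How a cache might combine these directives can be sketched as follows. This is a simplified reading of the rules listed above, not a full implementation of the HTTP caching specification; `parse_cache_control` and `freshness_seconds` are hypothetical helpers.

```python
from typing import Optional


def parse_cache_control(header: str) -> dict:
    """Turn 'public, max-age=60' into {'public': True, 'max-age': '60'}."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') or True
    return directives


def freshness_seconds(header: str, shared_cache: bool) -> Optional[int]:
    """Seconds a response may be reused without revalidation.

    Returns None when this cache must not store the response at all.
    """
    d = parse_cache_control(header)
    if "no-store" in d:
        return None
    if shared_cache and "private" in d:
        return None
    if "no-cache" in d:
        return 0  # storable, but must be revalidated before every reuse
    if shared_cache and "s-maxage" in d:
        return int(d["s-maxage"])  # shared caches prefer s-maxage
    return int(d.get("max-age", 0))
```

For instance, with `public, max-age=60, s-maxage=600` a browser treats the response as fresh for 60 seconds while a CDN may keep serving it for 600.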
When a cached copy might be stale, the cache revalidates using a validator. An ETag is a fingerprint of the content. The cache asks If-None-Match: "v7"; if nothing changed, the server replies 304 Not Modified with no body — saving bandwidth. The stale-while-revalidate directive serves the old copy instantly while quietly refreshing in the background, hiding the delay from the user.
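The revalidation exchange can be sketched from the server's side. The example assumes the ETag is a truncated SHA-256 of the body and that the client sends exactly one strong validator; `make_etag`, `respond`, and the `max-age=60` value are illustrative choices, and real servers also handle weak validators and lists of tags.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Fingerprint the content; ETag values are quoted strings."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match=None):
    """Return (status, headers, payload) for a conditional GET."""
    etag = make_etag(body)
    if if_none_match == etag:
        # The cached copy is still current: send headers only, no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Cache-Control": "max-age=60"}, body


# First fetch: full response. The cache stores the body and its ETag.
status1, headers1, payload1 = respond(b'{"id": 5, "name": "Mug"}')
# Later revalidation: the cache sends the ETag back in If-None-Match.
status2, headers2, payload2 = respond(b'{"id": 5, "name": "Mug"}', headers1["ETag"])
```

The second call returns 304 with an empty payload, so only a few header bytes cross the network.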
Rule of thumb: static assets → fingerprint the filename (app.a1b2c3.js) and cache forever with max-age=31536000, immutable; a new deploy is simply a new URL. HTML/API responses → short max-age + ETag revalidation. Never cache private, logged-in responses in a shared CDN.
Common mistakes: (1) Setting no-store everywhere out of fear, destroying performance. (2) Caching a logged-in user's private response in a shared CDN, leaking one user's data to everyone. Cache invalidation is famously hard; a stale cache is a very common cause of "I deployed but users still see the old version." Plan TTLs, ETags, and purge strategy up front.
8. Load balancers
A load balancer (LB) spreads incoming traffic across many backend servers. This gives you scale (more servers = more capacity) and high availability (route around a dead server).
The big distinction is L4 vs L7 (the OSI layer numbers):
| | L4 (transport) | L7 (application) |
|---|---|---|
| Routes by | IP + port only | URL path, host, header, cookie |
| Reads payload? | No (payload-blind) | Yes (content-aware) |
| Speed | Very fast, low CPU | Smarter, more CPU |
| Can it terminate TLS? | Traditionally no (passes encrypted traffic through) | Yes |
| Example | AWS NLB | AWS ALB, Nginx, Envoy |
Balancing algorithms: Round Robin (cycle through servers evenly, fine when requests are uniform); Weighted (bigger servers get more); Least Connections (send to the server with the fewest active requests, best when request durations vary); IP Hash / Consistent Hashing (the same client always lands on the same server; consistent hashing additionally keeps reshuffling minimal when servers are added or removed, which is vital for sharded caches).
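Three of these algorithms are small enough to sketch directly. The class names, the use of SHA-256, and the choice of 100 virtual nodes per server are assumptions for illustration; production load balancers add weights, health state, and concurrency control.

```python
import bisect
import hashlib
import itertools


class RoundRobin:
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}  # server -> open requests

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1  # call when the request finishes


class ConsistentHashRing:
    def __init__(self, servers, vnodes=100):
        # Each server owns many points ("virtual nodes") on the ring.
        self._ring = sorted(
            (self._hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(text):
        return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")

    def pick(self, client_key):
        # Walk clockwise to the first point at or after the key's hash.
        idx = bisect.bisect(self._points, self._hash(client_key)) % len(self._ring)
        return self._ring[idx][1]
```

The ring shows why consistent hashing suits sharded caches: removing one server only remaps the keys that lived on it, while every other key keeps its server.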
Health checks ensure the LB only sends traffic to healthy servers. Active checks probe on a schedule ("does GET /health return 200?"). Passive checks watch real traffic for errors. Use both. Sticky sessions pin a client to one server so its session state stays reachable — but this unbalances load and breaks when that server dies.
9. Reverse proxies, CDNs & the edge
A forward proxy sits in front of clients (e.g. a company's outbound gateway). A reverse proxy sits in front of servers (Nginx, HAProxy, Envoy). The reverse proxy is your single "front door": it terminates TLS, caches, compresses, routes requests, load-balances, and filters attacks. Clients never touch your app servers directly.
A CDN (Content Delivery Network — Cloudflare, Fastly, CloudFront) is a globally distributed network of reverse-proxy caches. Users connect to the nearest edge PoP ("point of presence" — a server cluster near them). A cache hit is served straight from the edge; a miss is fetched from your origin (your main server), cached, then served. Benefits: lower latency (content is physically near users), less load on your origin, DDoS absorption, and TLS handled at the edge. "Edge computing" pushes not just caching but actual code (edge functions) out to those PoPs so even dynamic logic runs close to users.
10. Rate limiting, timeouts, retries & backoff
Rate limiting caps how many requests a client may make in a time window. It protects you from abuse, buggy clients, and overload. Over-limit requests should return 429 Too Many Requests with a Retry-After header telling the client when to try again.
TOKEN BUCKET (allows bursts):
  +1 token / 100ms refills the bucket
  10 saved tokens -> a burst of 10 passes instantly,
  then throttles to the refill rate.
LEAKY BUCKET (smooths bursts):
  pour in fast -> drips out exactly 1 / 100ms,
  a steady constant stream no matter the input.
Other algorithms: Fixed Window (count per clock window — simple, but allows a 2x burst at the boundary), Sliding Window (smooths that boundary problem). Token bucket allows controlled bursts and is the most common API choice.
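A token bucket is short enough to sketch in full. The version below is a single-threaded illustration; the class name, parameters, and the injectable `clock` argument (used so the behaviour can be checked without real waiting) are assumptions made for this example.

```python
import time


class TokenBucket:
    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity                    # maximum burst size
        self.refill_per_second = refill_per_second  # sustained rate
        self._clock = clock
        self._tokens = float(capacity)              # start with a full bucket
        self._last = clock()

    def allow(self) -> bool:
        """Spend one token if available; False means 'reply 429'."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time that has passed, never above capacity.
        self._tokens = min(
            self.capacity, self._tokens + elapsed * self.refill_per_second
        )
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

With `TokenBucket(capacity=10, refill_per_second=10)`, a client can send a burst of 10 requests at once and is then held to about one request per 100 ms, matching the diagram above.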
Timeouts are mandatory. Never wait forever — every network call needs a deadline. Without one, a single hung dependency can use up all your threads/connections and take the whole system down.
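In code, a deadline is usually one extra argument. The sketch below uses Python's standard `socket` module; the function names are hypothetical, and note that the timeout here applies to each blocking operation (connect, send, receive) rather than to the call as a whole.

```python
import socket


def fetch_with_deadline(host, port, payload, timeout_s):
    """Send `payload` and read a reply, giving up after `timeout_s` seconds.

    The timeout covers the connect and each later send/recv call.
    """
    with socket.create_connection((host, port), timeout=timeout_s) as conn:
        conn.sendall(payload)
        return conn.recv(4096)  # raises socket.timeout if no reply in time


def fetch_or_none(host, port, payload, timeout_s):
    """Same call, but turn a timeout into None so the caller can move on."""
    try:
        return fetch_with_deadline(host, port, payload, timeout_s)
    except socket.timeout:
        return None
```

Without the `timeout` argument, `recv` would block indefinitely on a peer that accepts the connection but never answers, which is exactly the hung-dependency scenario described above.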
Retries must be careful. Only retry idempotent requests (or those with an Idempotency-Key), and only on transient errors (timeouts, 502/503/504, 429-with-Retry-After) — never on permanent client errors like 400/401/404. Use exponential backoff (wait 1s, 2s, 4s, 8s…) plus jitter (randomize the delay).
THUNDERING HERD (no jitter):
  1000 clients fail at t=0, ALL retry at t=1,2,4...
  -> each wave re-crashes the recovering server
WITH JITTER:
  each client picks a random delay in [0, backoff]
  -> retries spread out, the server recovers
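The retry rules above can be sketched as a small helper. It assumes the caller has already confirmed the request is idempotent (or carries an idempotency key), and that `send` is a hypothetical callable returning an HTTP status code; the names, the 30-second cap, and the "full jitter" variant are choices made for this example.

```python
import random
import time

TRANSIENT_STATUSES = {429, 502, 503, 504}


def jittered_delay(attempt, base_s=1.0, cap_s=30.0, rng=random.random):
    """Random delay between 0 and the exponential backoff for this attempt."""
    ceiling = min(cap_s, base_s * (2 ** attempt))  # 1s, 2s, 4s, 8s, ...
    return rng() * ceiling


def call_with_retries(send, max_attempts=4, sleep=time.sleep, rng=random.random):
    """Call `send` until it returns a non-transient status or attempts run out."""
    for attempt in range(max_attempts):
        status = send()
        if status not in TRANSIENT_STATUSES:
            return status  # success, or a permanent error not worth retrying
        if attempt < max_attempts - 1:
            sleep(jittered_delay(attempt, rng=rng))
    return status
```

A fuller client would also honour the `Retry-After` value on a 429 instead of its own computed delay; that is left out here to keep the sketch short.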
Tip: when you rate-limit a client, return 429 with Retry-After, not a generic 500 or a silent drop. Well-behaved clients then know exactly when to come back.
Key takeaways:
- HTTP is a stateless, client-driven request/response protocol; statelessness is exactly what lets you scale horizontally by adding identical servers.
- Know your status codes cold — 401 (who are you?) vs 403 (you can't do that), and 502 (bad upstream answer) vs 504 (upstream timed out).
- HTTP/2 fixed application-layer head-of-line blocking via multiplexing; HTTP/3 over QUIC (UDP) fixes the remaining transport-layer blocking and adds connection migration.
- TLS uses slow asymmetric crypto only to agree a key, then fast symmetric crypto for the data; TLS 1.3 is a 1-RTT handshake with mandatory forward secrecy.
- Latency, throughput, and bandwidth are different things — a cross-continent round trip is ~1,000,000× a memory read, so cache at every layer and minimize round trips.
- Only retry idempotent requests on transient errors, with exponential backoff + jitter and a circuit breaker, to avoid double-charges and thundering herds.