Networking II — The Web Stack: HTTP, TLS, Load Balancing & Caching

By Pritesh Yadav · June 21, 2026 · 15 min read

In the last networking section we covered the lower-level pipes of the internet: IP addresses, TCP, UDP, and DNS. Those are the roads and the addressing system. This section walks up one floor to the application layer — the part developers actually touch every day. We'll learn how the web really works: how a browser asks a server for a page (HTTP), how that conversation is kept private (TLS / HTTPS), and the infrastructure (load balancers, proxies, CDNs, caches) that makes the web fast, secure, and able to serve millions of people at once.

Analogy: Think of the whole web stack as a busy restaurant. The CDN/edge is a food truck on your street with popular dishes pre-made (no trip downtown). The reverse proxy is the host at the door who checks your ID and points you the right way. The load balancer seats you at whichever kitchen line is shortest. The kitchens are the backend servers. And statelessness means the kitchen forgets who you are between courses unless you hand back your table number (a cookie).

1. The HTTP request/response model

HTTP (HyperText Transfer Protocol) is a text-based, request/response protocol. "Request/response" means a client (a browser, a phone app, or a command-line tool like curl) sends a request, and the server sends back a response. Crucially, it is client-driven: the server never speaks first. Nothing happens until the client asks. (That one-way limitation is exactly why WebSockets and Server-Sent Events were later invented — to let servers push data.)

A request has four parts: (1) a request line — the method, path, and version; (2) headers — key:value metadata about the request; (3) a blank line; (4) an optional body (form data, JSON, an uploaded file). A response mirrors this with a status line, headers, a blank line, and a body.

REQUEST                          RESPONSE
GET /products/5 HTTP/1.1         HTTP/1.1 200 OK
Host: shop.com                   Content-Type: application/json
Accept: application/json         Content-Length: 26
<blank line>                     <blank line>
                                 { "id": 5, "name": "Mug" }
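To make the four parts concrete, here is a minimal Python sketch that assembles a request by hand and splits a response back into its pieces. The helper names build_request and parse_response are made up for illustration; real clients use an HTTP library rather than string handling, and this sketch ignores details such as repeated headers and chunked bodies.

```python
def build_request(method, path, host, headers=None, body=""):
    """Assemble a raw HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    # HTTP uses CRLF line endings; an empty line separates headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_response(raw):
    """Split a raw response into (status_code, reason, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


request = build_request("GET", "/products/5", "shop.com", {"Accept": "application/json"})
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/json\r\n"
    "\r\n"
    '{ "id": 5, "name": "Mug" }'
)
status, reason, headers, body = parse_response(raw_response)
```

The blank line matters: it is the only thing that tells the receiver where the headers stop and the body starts.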

Methods (also called verbs) tell the server your intent:

GET — read/fetch a resource (no body).
POST — create something or submit data.
PUT — replace a whole resource with what you send.
PATCH — partially update a resource.
DELETE — remove a resource.
HEAD — like GET but returns headers only, no body (handy for "did this change?").
OPTIONS — ask what the server allows (used in CORS preflight checks).

Status codes — grouped by their first digit

The server's response always starts with a 3-digit code. The first digit tells you the family:

1xx informational — e.g. 103 Early Hints.
2xx success — 200 OK, 201 Created, 204 No Content.
3xx redirection — 301 (moved permanently), 302/307 (temporary), 304 Not Modified (used by caches, explained later).
4xx client error — you made a mistake: 400, 401, 403, 404, 409 Conflict, 422, 429 Too Many Requests.
5xx server error — the server failed: 500, 502, 503, 504.

Common mistake: Confusing 401 with 403, and 502 with 504. Memorize these: 401 Unauthorized = "we don't know who you are" (not logged in). 403 Forbidden = "we know who you are, but you're not allowed to do this." For the 5xx pair: 502 Bad Gateway = a proxy got a broken answer from the server behind it; 504 Gateway Timeout = the proxy waited and got no answer in time.
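The "first digit is the family" rule is easy to encode. The sketch below is illustrative only; the names status_family, describe, and the two lookup tables are made up for this example.

```python
FAMILIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

# The two pairs that get mixed up most often.
CONFUSABLE = {
    401: "we don't know who you are",
    403: "we know who you are, but you're not allowed",
    502: "the proxy got a broken answer from upstream",
    504: "the proxy got no answer in time",
}


def status_family(code):
    """Map a status code to its family using the first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return FAMILIES[code // 100]


def describe(code):
    """Return the family, plus a reminder for the commonly confused codes."""
    hint = CONFUSABLE.get(code)
    family = status_family(code)
    return f"{code}: {family}" + (f" ({hint})" if hint else "")
```

For example, describe(504) yields "504: server error (the proxy got no answer in time)".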

2. Statelessness, cookies, sessions & tokens

HTTP is stateless: the server keeps no memory of your previous requests. Each request must carry everything needed to handle it. This sounds like a limitation, but it's the secret to horizontal scaling — running many identical servers side by side. Because no server "remembers" you, any server can handle any request, so you can add more servers freely.

But real apps need to remember things like "I'm logged in." The trick: the client carries an identifier on every request. The most common way is a cookie. The server sends Set-Cookie: session=abc123; the browser stores it and automatically attaches Cookie: session=abc123 to every later request to that site.

Term	What it is	Trade-off
Cookie	A small value the browser stores and auto-sends	Easy, but you must protect it
Session	Server-side data (in Redis/DB) keyed by the cookie's id	Needs shared storage if you have many servers
Token (JWT)	A signed token that carries the user's claims itself	No server lookup, but hard to revoke early

Three cookie flags are essential for security: HttpOnly (JavaScript can't read it, which blocks theft via XSS attacks), Secure (only sent over HTTPS), and SameSite (limits cross-site sending, which blocks CSRF attacks). XSS = injecting malicious scripts into a page; CSRF = tricking your browser into sending a request you didn't mean to.

Common mistake: Storing a session id in a cookie without HttpOnly, Secure, and SameSite. That cookie can then be stolen by a script or ridden by a forged request.
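As a rough illustration, Python's standard http.cookies module can produce a Set-Cookie value with all three flags set. The function name session_cookie_header and the choice of SameSite=Lax are assumptions for this sketch; web frameworks usually expose the same flags through their own cookie helpers.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id):
    """Build the value of a Set-Cookie header for a hardened session cookie."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    morsel = cookie["session"]
    morsel["httponly"] = True     # not readable from JavaScript
    morsel["secure"] = True       # only sent over HTTPS
    morsel["samesite"] = "Lax"    # limits cross-site sending
    morsel["path"] = "/"
    return morsel.OutputString()


header_value = session_cookie_header("abc123")
# The server would send this as:  Set-Cookie: <header_value>
```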

3. HTTP/1.1 vs HTTP/2 vs HTTP/3

HTTP has evolved to be faster. The first big enemy is head-of-line (HOL) blocking — when one slow item stuck at the front of a line holds up everything behind it (like one slow shopper jamming the only checkout lane).

HTTP/1.1 (1997): text-based; one connection handles one request at a time. Connection: keep-alive lets a connection be reused for several sequential requests (avoiding the cost of re-opening it), but they still go one after another. To get parallelism, browsers cheated by opening about 6 connections per host.

HTTP/2 (2015): same meaning, new wire format. Its wins: (a) binary framing (machine-friendly, not text); (b) multiplexing — many requests (called streams) interleaved over one TCP connection, killing the 6-connection hack; (c) HPACK header compression, which removes repeated header bytes. Server Push existed but is now deprecated (Chrome removed it in 2022) — use 103 Early Hints instead.

HTTP/3 (2022): runs over QUIC, a new transport built on UDP instead of TCP. QUIC does reliability and ordering per stream, so one lost packet only stalls its own stream. It also folds in the TLS handshake (faster setup) and supports connection migration — a phone switching Wi-Fi to cellular keeps the same connection via a Connection ID rather than its IP address.

HTTP/1.1:  [Req A]--wait--[Req B]--wait--[Req C]   one at a time

HTTP/2:    one TCP pipe:  A1 B1 A2 C1 B2 ...   interleaved
           BUT a dropped packet stalls the WHOLE pipe (TCP)

HTTP/3:    lane A | lane B | lane C   independent
           a drop in lane B stalls ONLY lane B

Common mistake: Thinking HTTP/2 fixed all head-of-line blocking. It fixed the application-layer kind. But because it still rides on TCP, one lost packet stalls every stream — that's transport-layer HOL blocking. Only HTTP/3 / QUIC fixes it.
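The difference can be seen in a toy model. The sketch below is not a real TCP or QUIC implementation; it only models the ordering rule. Each packet has an arrival time, and the function delivery_times (a made-up name) computes when the application is allowed to see it.

```python
def delivery_times(packets, per_stream_ordering):
    """packets: list of (stream, arrival_ms) in the order they were sent.

    Returns (stream, ready_ms): when each packet can be handed to the app.
    """
    latest_overall = 0
    latest_by_stream = {}
    delivered = []
    for stream, arrival in packets:
        if per_stream_ordering:
            # QUIC-like: wait only for earlier packets of the SAME stream.
            ready = max(arrival, latest_by_stream.get(stream, 0))
            latest_by_stream[stream] = ready
        else:
            # TCP-like: one ordered byte stream, wait for everything sent before.
            ready = max(arrival, latest_overall)
            latest_overall = ready
        delivered.append((stream, ready))
    return delivered


# Stream B's first packet is lost and only arrives at 300 ms after a retransmit.
packets = [("A", 10), ("B", 300), ("A", 12), ("C", 13), ("B", 14)]
tcp_like = delivery_times(packets, per_stream_ordering=False)
quic_like = delivery_times(packets, per_stream_ordering=True)
```

In the TCP-like run every packet sent after the lost one is held until 300 ms. In the QUIC-like run streams A and C are delivered at 12 ms and 13 ms, and only stream B waits.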

4. REST, APIs & idempotency

REST is a style for designing web APIs. Its rules: resources are named by URLs (/users/42), you act on them with HTTP methods, you transfer representations (usually JSON), and it's stateless. Use nouns in paths and verbs as methods — write GET /users/42, never GET /getUser?id=42.

Two properties drive safe API design:

Safe = read-only, changes nothing on the server: GET, HEAD, OPTIONS.
Idempotent = doing it many times has the same effect as doing it once: GET, HEAD, OPTIONS, PUT, DELETE. NOT idempotent: POST, PATCH.

Why care? Because networks fail. If a request times out, you don't know if it succeeded. You can safely retry an idempotent call. Retrying a POST, though, might run it twice.

Example: You POST /charge a credit card. The response times out. Your code retries, and the customer is charged twice. Fix: send an Idempotency-Key: 7f3a... header (a unique id for that operation) and reuse the same key on every retry. The server remembers the key and, instead of charging again, returns the result of the first attempt. This is the pattern Stripe popularized.
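A minimal server-side sketch of the idea, assuming an in-memory dictionary as the key store. ChargeService and its fields are hypothetical names, and this is not Stripe's implementation; a production version would need a shared store, key expiry, and an atomic "check then record" step.

```python
import uuid


class ChargeService:
    def __init__(self):
        self._results = {}       # idempotency key -> stored result
        self.charges_made = 0

    def charge(self, idempotency_key, amount_cents):
        # A repeated key means "this is a retry": replay the first result.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {"charge_id": f"ch_{self.charges_made}", "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result


service = ChargeService()
key = str(uuid.uuid4())              # generated once per logical operation
first = service.charge(key, 1999)
retry = service.charge(key, 1999)    # e.g. after a timeout on the client
```

The client must generate the key once per operation and reuse it for every retry; a fresh key per attempt would defeat the purpose.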

Common mistake: Designing a GET with side effects, like GET /delete?id=5. GET must be safe — search crawlers and browser prefetchers will happily call it and delete your data. Use the correct verb.

5. TLS / HTTPS — keeping it private and trustworthy

HTTPS is just HTTP carried inside a TLS-encrypted channel. TLS (Transport Layer Security) gives three guarantees:

Confidentiality — eavesdroppers see only scrambled ciphertext.
Integrity — if anyone tampers with the data, it's detected.
Authentication — you're really talking to the right server, proven by its certificate.

A server presents an X.509 certificate — a document binding its domain name to a public key, signed by a Certificate Authority (CA) that browsers already trust. This forms a chain of trust up to a "root" CA stored in your operating system. Let's Encrypt made these certificates free and automatic.

TLS 1.3 handshake (1 round trip):

Client --ClientHello: ciphers + key share-->        Server
Client <--ServerHello: cert + key share, encrypt now-- Server
        both derive the SAME session key (ECDHE),
        then application data flows. Done in 1 RTT.

The key idea: slow asymmetric crypto (public/private keys) is used only to agree on a shared secret key. Then fast symmetric crypto (AES-GCM, ChaCha20) encrypts the actual bulk data. TLS 1.3 finishes the handshake in one round trip (TLS 1.2 needed two). It also requires forward secrecy (via ephemeral ECDHE keys): even if someone later steals the server's long-term key, they can't decrypt traffic they captured in the past. TLS 1.3 also offers 0-RTT resumption for returning clients (even faster), but it's replay-vulnerable, so use it only for idempotent requests.
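In application code you rarely implement any of this yourself. As a hedged example, Python's standard ssl module builds a client context that verifies the certificate chain and the hostname by default. The function name make_client_context and the choice of TLS 1.2 as the floor are assumptions for this sketch.

```python
import ssl


def make_client_context():
    """Client-side TLS settings: verify certificates, refuse old protocol versions."""
    context = ssl.create_default_context()   # loads the system's trusted root CAs
    # Refuse anything older than TLS 1.2. Use ssl.TLSVersion.TLSv1_3 to require 1.3.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
# The context would then wrap a TCP socket, for example:
#   context.wrap_socket(sock, server_hostname="shop.com")
```

The handshake, key agreement, and symmetric encryption all happen inside the wrapped socket; the application just reads and writes bytes.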

Common mistake: Assuming HTTPS makes a request safe to retry. TLS is about encryption and authentication, not retry safety. Idempotency is a separate, method-level property.

6. Latency vs throughput vs bandwidth

These three are constantly confused:

Latency — time for one trip (delay), measured in milliseconds.
Throughput — actual work done per second (e.g. requests/sec achieved).
Bandwidth — the maximum capacity of the pipe (bits/sec).

Analogy: A highway's number of lanes = bandwidth. How long your single car takes end-to-end = latency. Cars arriving per minute = throughput. Adding lanes never makes your drive shorter. Likewise, a cargo ship full of hard drives has huge bandwidth but terrible latency (days to arrive); a tiny ping packet carries almost no data but arrives in milliseconds.

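Latency has a hard floor set by physics: light in fiber covers roughly 200 km per millisecond, which works out to about 1 ms of round-trip time per 100 km. Real routes are longer than a straight line and add switching and queuing delay, so a cross-continent round trip (for example, California to the Netherlands and back) typically lands around 150 ms. A rough order-of-magnitude "latency canon" every engineer should feel: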

Operation	Rough time
Main memory read	~100 nanoseconds
SSD random read	~16–20 microseconds
Same-datacenter round trip	~0.5 ms
Cross-continent round trip	~150 ms

Key takeaway: A cross-continent network call is roughly a million times slower than reading from memory (~150 ms vs ~100 ns), and even a same-datacenter round trip is thousands of times slower. So the winning strategy is always: minimize round trips, batch requests together, cache aggressively, and move data physically closer to users.

Common mistake: Trying to fix a slow, round-trip-heavy app by "buying more bandwidth." A fatter pipe doesn't shorten the trip. The fix is fewer round trips, caching, and proximity.
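A back-of-the-envelope model makes the point. The sketch below assumes total time is simply (round trips × RTT) plus (payload size ÷ bandwidth); the function name and the example numbers (20 round trips, 500 KB, 150 ms RTT) are made up for illustration.

```python
def fetch_time_ms(round_trips, rtt_ms, payload_bytes, bandwidth_mbps):
    """Very rough page-load model: waiting on round trips + pushing bytes through the pipe."""
    transfer_ms = payload_bytes * 8 * 1000 / (bandwidth_mbps * 1_000_000)
    return round_trips * rtt_ms + transfer_ms


baseline = fetch_time_ms(20, 150, 500_000, 100)       # 3040.0 ms
fatter_pipe = fetch_time_ms(20, 150, 500_000, 1000)   # 3004.0 ms: 10x bandwidth, ~1% faster
fewer_trips = fetch_time_ms(4, 150, 500_000, 100)     # 640.0 ms: same pipe, ~5x faster
```

Under these assumptions almost all of the time is spent waiting on round trips, so cutting them is what moves the number.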

7. Caching everywhere

Caching means storing a copy of a result closer to whoever needs it, so you avoid redoing the work. It exists at every layer of the stack:

User
  -> [Browser cache]        governed by Cache-Control: max-age
  -> [CDN edge PoP]         governed by Cache-Control: s-maxage
  -> [Reverse proxy/Varnish]
  -> [App + Redis cache]
  -> [DB buffer pool]

HTTP gives you headers to control caching:

Cache-Control: max-age=N — fresh for N seconds.
s-maxage — overrides max-age, but only for shared caches like a CDN.
no-cache — may store, but must revalidate before using.
no-store — never store at all (for private/sensitive data).
private — browser only, never a shared cache; public — anyone may cache.

When a cached copy might be stale, the cache revalidates using a validator. An ETag is a fingerprint of the content. The cache asks If-None-Match: "v7"; if nothing changed, the server replies 304 Not Modified with no body — saving bandwidth. The stale-while-revalidate directive serves the old copy instantly while quietly refreshing in the background, hiding the delay from the user.
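The revalidation exchange can be sketched in a few lines. This is a simplified, assumed handler: make_etag and respond are hypothetical names, the ETag is a truncated SHA-256 of the body, and only an exact single-value If-None-Match is handled (real servers also deal with lists of tags, "*", and weak validators).

```python
import hashlib


def make_etag(body):
    """Fingerprint the content; any change to the bytes changes the tag."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body, request_headers):
    """Return (status, headers, body), answering 304 when the client's copy is current."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""          # nothing changed: no body sent
    return 200, {"ETag": etag, "Cache-Control": "max-age=60"}, body


page = b"<h1>Mugs</h1>"
status1, headers1, body1 = respond(page, {})
status2, headers2, body2 = respond(page, {"If-None-Match": headers1["ETag"]})
status3, headers3, body3 = respond(b"<h1>Mugs on sale</h1>", {"If-None-Match": headers1["ETag"]})
```

The first request gets a full 200, the revalidation gets an empty 304, and once the content changes the old tag no longer matches, so the server sends a fresh 200.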

Best practice: Static files (JS/CSS) → fingerprint the filename (app.a1b2c3.js) and cache forever with max-age=31536000, immutable; a new deploy is simply a new URL. HTML/API responses → short max-age + ETag revalidation. Never cache private, logged-in responses in a shared CDN.

Common mistake: Two opposite errors. (1) Slapping no-store everywhere out of fear, destroying performance. (2) Caching a logged-in user's private response in a shared CDN, which leaks one user's data to everyone. Cache invalidation is famously hard, and a stale cache is a classic cause of "I deployed but users still see the old version." Plan TTLs, ETags, and purge strategy up front.

8. Load balancers

A load balancer (LB) spreads incoming traffic across many backend servers. This gives you scale (more servers = more capacity) and high availability (route around a dead server).

The big distinction is L4 vs L7 (the OSI layer numbers):

	L4 (transport)	L7 (application)
Routes by	IP + port only	URL path, host, header, cookie
Reads payload?	No (payload-blind)	Yes (content-aware)
Speed	Very fast, low CPU	Smarter, more CPU
Can it terminate TLS?	Typically no (passes it through)	Yes
Example	AWS NLB	AWS ALB, Nginx, Envoy

Balancing algorithms: Round Robin (cycle through servers evenly; fine when requests are uniform); Weighted (bigger servers get more); Least Connections (send to the server with the fewest active requests; best when request durations vary); IP Hash (hash the client address so the same client always lands on the same server); Consistent Hashing (the same idea, arranged so that adding or removing a server reshuffles only a small share of keys, which is vital for sharded caches).
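Consistent hashing is the least obvious of these, so here is a small sketch of a hash ring with virtual nodes. The class name HashRing, the server names, and the choice of 100 virtual nodes per server are assumptions for illustration, not a reference implementation.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self._points = []    # sorted positions on the ring
        self._owner = {}     # position -> server
        for server in servers:
            self.add(server)

    def add(self, server):
        # Each server owns many small arcs of the ring, which evens out the load.
        for i in range(self.vnodes):
            point = _hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def lookup(self, key):
        # Walk clockwise from the key's position to the next server point.
        index = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[index]]


ring = HashRing(["cache-1", "cache-2", "cache-3"])
keys = [f"user-{i}" for i in range(1000)]
before = {key: ring.lookup(key) for key in keys}
ring.add("cache-4")
after = {key: ring.lookup(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

Only the keys that now belong to cache-4 move; everything else stays where it was. With a plain "hash modulo number of servers" scheme, adding a server would remap most keys.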

Health checks ensure the LB only sends traffic to healthy servers. Active checks probe on a schedule ("does GET /health return 200?"). Passive checks watch real traffic for errors. Use both. Sticky sessions pin a client to one server so its session state stays reachable — but this unbalances load and breaks when that server dies.

Common mistake: Using round robin when request costs are wildly uneven (some take 50ms, some take 5s) — slow requests pile up unevenly. Use least connections instead. And don't lean on sticky sessions as your scaling plan; prefer stateless servers + shared session storage (Redis) or tokens, so any server can serve any request.
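The two simplest strategies differ only in what they look at when picking a server. In this sketch the class names, the pick method, and the active dictionary (server name to number of in-flight requests) are all made up for illustration.

```python
import itertools


class RoundRobin:
    """Ignores load: hands out servers in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self, active):
        return next(self._cycle)


class LeastConnections:
    """Looks at load: picks the server with the fewest in-flight requests."""

    def __init__(self, servers):
        self._servers = list(servers)

    def pick(self, active):
        return min(self._servers, key=lambda server: active[server])


servers = ["app-1", "app-2", "app-3"]
active = {"app-1": 12, "app-2": 3, "app-3": 7}   # app-1 is stuck on slow requests

rr_choices = [RoundRobin(servers).pick(active)]   # "app-1", despite being busiest
lc_choice = LeastConnections(servers).pick(active)  # "app-2"
```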

9. Reverse proxies, CDNs & the edge

A forward proxy sits in front of clients (e.g. a company's outbound gateway). A reverse proxy sits in front of servers (Nginx, HAProxy, Envoy). The reverse proxy is your single "front door": it terminates TLS, caches, compresses, routes requests, load-balances, and filters attacks. Clients never touch your app servers directly.

A CDN (Content Delivery Network — Cloudflare, Fastly, CloudFront) is a globally distributed network of reverse-proxy caches. Users connect to the nearest edge PoP ("point of presence" — a server cluster near them). A cache hit is served straight from the edge; a miss is fetched from your origin (your main server), cached, then served. Benefits: lower latency (content is physically near users), less load on your origin, DDoS absorption, and TLS handled at the edge. "Edge computing" pushes not just caching but actual code (edge functions) out to those PoPs so even dynamic logic runs close to users.

10. Rate limiting, timeouts, retries & backoff

Rate limiting caps how many requests a client may make in a time window. It protects you from abuse, buggy clients, and overload. Over-limit requests should return 429 Too Many Requests with a Retry-After header telling the client when to try again.

TOKEN BUCKET (allows bursts):
  +1 token / 100ms refills the bucket
  10 saved tokens -> a burst of 10 passes instantly,
  then throttles to the refill rate.

LEAKY BUCKET (smooths bursts):
  pour in fast -> drips out exactly 1 / 100ms,
  a steady constant stream no matter the input.

Other algorithms: Fixed Window (count per clock window — simple, but allows a 2x burst at the boundary), Sliding Window (smooths that boundary problem). Token bucket allows controlled bursts and is the most common API choice.
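A token bucket fits in a few lines. The sketch below assumes a single-process limiter with an injectable clock so it can be tested without sleeping; the class name TokenBucket and its parameters are illustrative, and a real deployment would typically keep the counters in shared storage.

```python
import time


class TokenBucket:
    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)   # start full, so an initial burst is allowed
        self._last = clock()

    def allow(self):
        """Return True and spend a token if the request may pass, else False."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time that has passed, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


fake_now = [0.0]
bucket = TokenBucket(capacity=10, refill_per_second=10, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(11)]      # 10 pass instantly, the 11th is rejected
fake_now[0] += 0.5                               # half a second later: 5 tokens refilled
after_wait = [bucket.allow() for _ in range(6)]  # 5 pass, the 6th is rejected
```

A rejected request is where the server would answer 429 Too Many Requests.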

Timeouts are mandatory. Never wait forever — every network call needs a deadline. Without one, a single hung dependency can use up all your threads/connections and take the whole system down.

Retries must be careful. Only retry idempotent requests (or those with an Idempotency-Key), and only on transient errors (timeouts, 502/503/504, 429-with-Retry-After) — never on permanent client errors like 400/401/404. Use exponential backoff (wait 1s, 2s, 4s, 8s…) plus jitter (randomize the delay).
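Here is one way to sketch those rules in Python, using the "full jitter" variant (each delay is drawn uniformly between 0 and the current backoff ceiling). The function names, the 30-second cap, and the is_transient callback are assumptions for this example; the operation passed in is assumed to be idempotent.

```python
import random
import time


def backoff_delays(max_retries, base=1.0, cap=30.0):
    """Yield one jittered delay per retry: uniform in [0, min(cap, base * 2**attempt)]."""
    for attempt in range(max_retries):
        ceiling = min(cap, base * 2 ** attempt)   # 1s, 2s, 4s, 8s... up to the cap
        yield random.uniform(0, ceiling)


def call_with_retries(operation, is_transient, max_retries=4, sleep=time.sleep):
    """Run operation(); retry transient failures with backoff + jitter, then give up."""
    delays = backoff_delays(max_retries)
    while True:
        try:
            return operation()
        except Exception as error:
            delay = next(delays, None)
            if delay is None or not is_transient(error):
                raise          # out of retries, or a permanent error: surface it
            sleep(delay)


# Usage: a flaky call that times out twice, then succeeds.
attempts = []
slept = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("upstream timed out")
    return "ok"


result = call_with_retries(flaky, lambda e: isinstance(e, TimeoutError), sleep=slept.append)
```

Passing sleep as a parameter is only there to keep the example testable; in real code the default time.sleep applies.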

THUNDERING HERD (no jitter):
  1000 clients fail at t=0, ALL retry after the same 1s, 2s, 4s... waits
  -> each wave re-crashes the recovering server

WITH JITTER:
  each client picks a random delay in [0, backoff]
  -> retries spread out, the server recovers

Common mistake: Retrying with a fixed interval and no jitter. Thousands of clients then retry in perfect lockstep — a thundering herd that re-crashes the very service it's waiting on. Always use exponential backoff + jitter, plus a retry cap and a circuit breaker (after N failures, stop calling for a cooling-off period).
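The circuit breaker mentioned above can be sketched as a small wrapper. This is a simplified, single-threaded illustration; CircuitBreaker, CircuitOpenError, and the threshold and cooldown values are hypothetical names and numbers, and production libraries add more states and metrics.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None   # None means the circuit is closed (calls allowed)

    def call(self, operation):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.cooldown_seconds:
                raise CircuitOpenError("circuit open; skipping call")
            # Cooldown is over: let one trial call through.
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()   # open (or re-open) the circuit
            raise
        # Success: reset and close the circuit.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers fail fast instead of piling more requests onto a service that is already struggling.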

Best practice: Return the right signal. When you reject for rate limiting, send 429 with Retry-After — not a generic 500 or a silent drop. Well-behaved clients then know exactly when to come back.

Key takeaways:

HTTP is a stateless, client-driven request/response protocol; statelessness is exactly what lets you scale horizontally by adding identical servers.
Know your status codes cold — 401 (who are you?) vs 403 (you can't do that), and 502 (bad upstream answer) vs 504 (upstream timed out).
HTTP/2 fixed application-layer head-of-line blocking via multiplexing; HTTP/3 over QUIC (UDP) fixes the remaining transport-layer blocking and adds connection migration.
TLS uses slow asymmetric crypto only to agree a key, then fast symmetric crypto for the data; TLS 1.3 is a 1-RTT handshake with mandatory forward secrecy.
Latency, throughput, and bandwidth are different things: a cross-continent call is ~1,000,000× a memory read, so cache at every layer and minimize round trips.
Only retry idempotent requests on transient errors, with exponential backoff + jitter and a circuit breaker, to avoid double-charges and thundering herds.
