Network, Cloud & Infrastructure Security
Application security protects the code. This section protects where the code runs — the network it talks over, the cloud account that hosts it, the operating system underneath, the container cluster, and the build pipeline that produced it. The hard lesson of 2025-26 is humbling: attackers almost never break the encryption. They walk in through a misconfigured setting, a leaked password, or a poisoned dependency. Gartner's much-quoted line still holds — through 2025-26, roughly 99% of cloud security failures are the customer's fault, not the provider's. And IBM's 2025 Cost of a Data Breach report puts the global average breach at USD 4.44 million, with credential-based breaches taking about 292 days to detect and contain.
6.1 Network security: defense in depth
Defense in depth means using many overlapping layers so that one failure does not expose everything — like the layers of an onion, or watertight bulkheads in a ship's hull where one flooded compartment doesn't sink the vessel.
- Firewall — a gatekeeper that decides what traffic may cross a boundary.
- It evolved through three generations. Packet-filtering firewalls look only at IP addresses and port numbers. Stateful firewalls remember the state of a connection (e.g. "this is a reply to a request we sent out"). A Next-Generation Firewall (NGFW) does deep packet inspection and is application-aware: it can tell "this is HTTPS traffic to Facebook," not merely "this is port 443." The golden rule everywhere is default-deny: block everything, then explicitly allow only what is needed.
- Network segmentation / VLANs — splitting one flat network into isolated zones.
- A VLAN (Virtual LAN) is a logical division of a network so that machines in one zone can't freely reach another. A DMZ (demilitarized zone) is a buffer area where internet-facing servers live, kept away from the internal "crown jewels" like the database. The canonical teaching case is the 2013 Target breach: attackers stole credentials from an HVAC (air-conditioning) vendor, got onto Target's network, and reached the point-of-sale systems because the vendor network was not segmented from the payment network. Microsegmentation pushes this idea down to a per-workload level — the foundation of zero trust.
- VPN vs ZTNA — remote access models.
- A VPN (Virtual Private Network) builds an encrypted tunnel so a remote worker appears "inside" the network — but once inside, they often get broad access. ZTNA (Zero Trust Network Access) instead grants access to one specific application at a time. VPN appliances themselves became prime targets — Ivanti, Fortinet, and Citrix VPN gateways had serious exploited CVEs through 2023-25.
- IDS vs IPS — intrusion detection vs prevention.
- An IDS (Intrusion Detection System) watches and alerts but does not block (passive). An IPS (Intrusion Prevention System) sits inline and can block the traffic. They can be network-level (NIDS) or host-level (HIDS). Detection is either signature-based (matches known bad patterns, like antivirus) or anomaly/behavior-based (flags unusual deviations — catches novel attacks but produces more false alarms).
- WAF — Web Application Firewall.
- A Layer-7 filter built specifically for web apps; it blocks SQL injection, cross-site scripting (XSS), path traversal, and bad bots. The OWASP Core Rule Set (CRS) is the standard ruleset. Crucially, a WAF is a compensating control, not a cure — it buys time, but you still must patch the vulnerable code.
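The default-deny rule from the firewall item above can be sketched in a few lines. This is a minimal illustration, not a real firewall: the `Rule` class, the `ALLOW_RULES` list, and the addresses are all hypothetical. The point is only that traffic is blocked unless an explicit rule matches.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    """One allow rule: source network, destination port, protocol."""
    source: str      # CIDR block, e.g. "10.0.0.0/8"
    port: int        # destination port
    protocol: str    # "tcp" or "udp"


# Hypothetical allow-list for illustration only.
ALLOW_RULES = [
    Rule(source="0.0.0.0/0", port=443, protocol="tcp"),    # public HTTPS
    Rule(source="10.0.5.0/24", port=22, protocol="tcp"),   # SSH from an admin subnet only
]


def is_allowed(src_ip: str, port: int, protocol: str, rules=ALLOW_RULES) -> bool:
    """Return True only if some rule explicitly allows the packet."""
    addr = ip_address(src_ip)
    for rule in rules:
        if (addr in ip_network(rule.source)
                and port == rule.port
                and protocol == rule.protocol):
            return True
    return False  # default-deny: nothing matched, so block


print(is_allowed("203.0.113.7", 443, "tcp"))  # True
print(is_allowed("203.0.113.7", 22, "tcp"))   # False
```

Note that there is no "deny" rule anywhere: the final `return False` is the policy, and every rule above it is an exception that someone had to justify.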
DDoS protection — the 2025 escalation
A DDoS (Distributed Denial of Service) attack floods a target with junk traffic from many machines to knock it offline. The 2025 numbers are a dramatic teaching hook. Cloudflare mitigated a record 31.4 Tbps attack in November 2025 (lasting only ~35 seconds) — over 700% larger than late-2024 records. The ladder climbed all year: 7.3 → 11.5 → 22.2 → 31.4 Tbps, driven largely by the Aisuru/Kimwolf IoT botnet. Q1 2025 alone saw 20.5 million attacks, up 358% year-over-year. You cannot absorb a hyper-volumetric flood from a single origin server — you need an always-on, distributed scrubbing network at the edge in front of you.
6.2 TLS everywhere and mTLS
TLS (Transport Layer Security) encrypts data in transit (as opposed to "at rest" on disk). TLS 1.3 (RFC 8446) is the current standard — a faster one-round-trip handshake, legacy weak ciphers removed, and more of the handshake itself encrypted. Disable TLS 1.0/1.1 and SSL entirely. PCI DSS v4.0 (mandatory since 31 March 2025) requires TLS 1.2 minimum, with 1.3 increasingly mandated; NIST SP 800-52 Rev 2 is the federal baseline.
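As a rough sketch of "disable the legacy versions", Python's standard `ssl` module lets a client set a protocol floor. The helper name `make_client_context` is an assumption made for this example; the `ssl` calls themselves are standard-library APIs.

```python
import ssl


def make_client_context(require_tls13: bool = False) -> ssl.SSLContext:
    """Build a client-side context that refuses legacy protocol versions."""
    # create_default_context() verifies certificates and hostnames by default.
    ctx = ssl.create_default_context()
    ctx.minimum_version = (
        ssl.TLSVersion.TLSv1_3 if require_tls13 else ssl.TLSVersion.TLSv1_2
    )
    return ctx


ctx = make_client_context()
print(ctx.minimum_version.name)  # TLSv1_2
```

Setting `require_tls13=True` raises the floor to TLS 1.3, which is worth trying first and relaxing only if a peer you depend on cannot negotiate it.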
Ordinary TLS only proves the server's identity (you verify your bank's certificate; the bank does not verify yours). mTLS (mutual TLS) makes both sides present certificates, so each end proves who it is. This is the backbone of zero-trust service-to-service communication inside microservices and service meshes (Istio, Linkerd): every service proves its identity cryptographically, with no implicit trust just because a request came from "inside the network." That is exactly the principle of NIST SP 800-207 (Zero Trust Architecture): "never trust, always verify."
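The server half of mTLS can be sketched with the same standard `ssl` module. The function name and its three file-path parameters are assumptions for illustration; a service mesh normally does this for you in a sidecar, so treat this as a picture of the idea rather than a deployment recipe.

```python
import ssl
from typing import Optional


def make_mtls_server_context(
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
    client_ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Server-side context that rejects clients without a valid certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # The "mutual" part: the handshake fails unless the client presents a cert.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        # This service's own identity.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        # The CA whose signatures on client certificates we accept.
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx


# No files are loaded here; pass real paths in practice.
ctx = make_mtls_server_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True
```

The only difference from an ordinary TLS server is `verify_mode = ssl.CERT_REQUIRED` plus a trusted client CA, which is why mTLS is cheap to adopt once certificate issuance is automated.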
6.3 The cloud shared-responsibility model
The single most misunderstood idea in cloud security. The provider secures the cloud ("security OF the cloud" — physical data centers, the hypervisor, managed-service infrastructure). The customer secures their use of it ("security IN the cloud" — their data, identity, network config, OS patching, and app code). Where the line falls depends on the service model.
| Model | Provider handles | You (customer) handle |
|---|---|---|
| IaaS (e.g. EC2 VM) | Physical, hypervisor, network fabric | OS patching, runtime, app, data, IAM, network config |
| PaaS (e.g. managed DB / app platform) | + OS, runtime, middleware | App code, data, access & config |
| SaaS (e.g. email, CRM) | Almost everything | Data classification, user access, config settings |
The trap: people assume "the cloud is secure" and skip their half. In 2025, misconfigurations caused about 23% of cloud incidents, and 82% of those were human error — not provider flaws. Roughly 15% of all breaches trace back to cloud misconfiguration, on par with phishing.
6.4 Cloud IAM, over-permissive policies & public buckets
In the cloud, identity is the new perimeter: IAM (Identity and Access Management) is the real control plane, not the network. The guiding rule is least privilege: grant the minimum access needed. Real-world rot includes wildcard policies ("Action": "*", "Resource": "*", meaning "whoever holds this policy can do anything to everything"), unused over-privileged roles, long-lived static access keys, and no MFA. Prefer roles with short-lived, automatically expiring credentials (AWS STS, OIDC federation, workload identity) over static keys.
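A wildcard policy is easy to detect mechanically. The sketch below is a deliberately small linter, assuming the policy is available as parsed JSON; the function names and the sample policy (including `example-bucket`) are hypothetical, and real tools check far more than this one pattern.

```python
import json


def _as_list(value):
    """IAM allows a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_wildcard_statements(policy: dict) -> list:
    """Return the Allow statements that grant every action on every resource."""
    findings = []
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        if "*" in _as_list(stmt.get("Action")) and "*" in _as_list(stmt.get("Resource")):
            findings.append(stmt)
    return findings


# Hypothetical policy document for illustration.
policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": "*", "Resource": "*"},
    {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": "arn:aws:s3:::example-bucket/*"}
  ]
}
""")

print(len(find_wildcard_statements(policy)))  # 1
```

The second statement is what least privilege looks like: one named action on one named resource.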
Public S3 buckets are the iconic cloud failure. An S3 bucket is a cloud storage folder; if it is set public-readable, anyone with the URL downloads everything — no exploit, no skill required. 2025 incidents: a public bucket exposed 273,000+ Indian bank-transfer/debit-mandate PDFs (names, addresses, phone numbers, account numbers, IFSC codes) in Aug-Sep 2025; the Codefinger ransomware campaign (Jan 2025) abused leaked AWS keys to encrypt victims' buckets with SSE-C and demand ransom; and one exposed server leaked over 158 million AWS secret-key records, 1,229 of them still active. The classic case is Capital One (2019): an SSRF flaw plus an over-permissive IAM role pulled 100M+ records out of S3. AWS now turns on "Block Public Access" by default precisely because of this history.
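A simple audit for the "Block Public Access" setting mentioned above is to confirm that all four of its flags are on. The sketch assumes you already have the bucket's public-access-block configuration as a dictionary (for example from the S3 API); the sample values and the function name are made up for illustration.

```python
REQUIRED_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def missing_public_access_blocks(config: dict) -> list:
    """Return the Block Public Access flags that are absent or not True."""
    return [flag for flag in REQUIRED_FLAGS if config.get(flag) is not True]


# Hypothetical configuration for one bucket.
config = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": False,
    "RestrictPublicBuckets": True,
}

print(missing_public_access_blocks(config))  # ['BlockPublicPolicy']
```

An empty result means the bucket cannot be made public by an ACL or a bucket policy; anything else is worth a ticket.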
                     Internet
                        |
          [ DDoS scrubbing / CDN edge ]  <- absorbs volumetric floods
                        |
                     [ WAF ]  (L7: blocks SQLi/XSS/bots)
                        |
             [ NGFW / default-deny ]
                        |
+-------------- DMZ (public zone) ---------------+
|     web servers --- mTLS --- app services      |
+------------------------------------------------+
                        |  (segmented; default-deny network policy)
+----------------- private zone -----------------+
|   database   secrets vault   IAM (least priv)  |
+------------------------------------------------+
6.5 Secrets management vs hardcoding
Never hardcode secrets (API keys, passwords, tokens) in source code or commit a .env file to git. Once committed, assume it is burned forever — git history persists even after you "delete" it. Instead use a secrets manager — HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, or GCP Secret Manager. These give you centralized, encrypted, access-controlled, audited storage plus dynamic short-lived secrets and automatic rotation.
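One way to stop a hardcoded secret before it reaches git history is a scan at commit time. The sketch below shows the idea with two illustrative regular expressions; the detector names and sample text are invented for this example, and dedicated scanners use many more detectors plus entropy checks.

```python
import re

# Illustrative patterns only; real scanners ship far more detectors.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "hardcoded_password": re.compile(
        r"(?i)(?:password|passwd|secret|api_key)\s*=\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_text(text: str) -> list:
    """Return (line_number, detector_name) for each suspicious line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


# Sample file contents; the key is the placeholder from AWS documentation.
sample = '''
import os
db_password = "hunter2-correct-horse"
aws_key = "AKIAIOSFODNN7EXAMPLE"
token = os.environ["API_TOKEN"]
'''

print(scan_text(sample))  # [(3, 'hardcoded_password'), (4, 'aws_access_key_id')]
```

The last line of the sample is the pattern to prefer: the code names the secret, and the runtime (or a secrets manager) supplies the value.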
The 2025-26 scale is alarming (GitGuardian State of Secrets Sprawl): 23.8M secrets leaked on public GitHub in 2024 (+25% YoY), and 28.65M new ones in 2025 (+34%). A remediation crisis: 70% of leaked secrets are still active two years later, and 35% of private repos contain plaintext secrets. AI-assisted commits leak secrets at ~3.2% — about double the baseline (a fresh "vibe-coding" era risk). Real impact: the December 2024 US Treasury breach traced to a single leaked BeyondTrust API key — attackers bypassed millions in security spend through one exposed credential.
6.6 Container & Kubernetes security
Container images (the packaged, runnable bundles of your app) should be scanned for known vulnerabilities (CVEs) — Trivy is the de-facto tool (it also scans IaC, secrets, and Kubernetes); Grype and Snyk are alternatives. Use minimal or distroless base images (less code = smaller attack surface), pin to a content digest rather than the moving :latest tag, never bake secrets into image layers, run as a non-root user, drop unneeded Linux capabilities, and use a read-only root filesystem. Sign images (cosign / Sigstore) and verify the signature at admission time.
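Several of these image rules can be checked statically. The following is a toy Dockerfile linter, written only to make the checklist concrete: it handles simple `FROM`, `USER`, `ENV`, and `ARG` lines, ignores `FROM` flags such as `--platform` and registry ports, and is no substitute for a real scanner. The function name and sample Dockerfile are hypothetical.

```python
def lint_dockerfile(text: str) -> list:
    """Flag a few of the image-hardening issues described above."""
    findings = []
    last_user = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        instruction, _, rest = line.partition(" ")
        instruction = instruction.upper()
        parts = rest.split()
        if instruction == "FROM" and parts:
            image = parts[0]
            if "@sha256:" not in image:
                findings.append(f"base image not pinned to a digest: {image}")
            if image.endswith(":latest") or (":" not in image and "@" not in image):
                findings.append(f"base image uses the moving latest tag: {image}")
        elif instruction == "USER":
            last_user = rest.strip()
        elif instruction in ("ENV", "ARG") and any(
            word in rest.upper() for word in ("PASSWORD", "SECRET", "TOKEN")
        ):
            name = rest.split("=")[0].strip()
            findings.append(f"possible secret baked into a layer: {instruction} {name}")
    if last_user in (None, "root", "0"):
        findings.append("container runs as root (no non-root USER set)")
    return findings


# Hypothetical Dockerfile that breaks most of the rules.
sample = """
FROM python:latest
ENV DB_PASSWORD=changeme
COPY . /app
CMD ["python", "/app/main.py"]
"""

for finding in lint_dockerfile(sample):
    print(finding)
```

Against this sample the linter reports four findings: an unpinned base image, the `latest` tag, a secret-looking `ENV`, and a missing non-root `USER`.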
Kubernetes (the cluster orchestrator) needs: RBAC least-privilege (avoid handing out cluster-admin), Pod Security Standards at the restricted level for production (enforced by the built-in Pod Security Admission controller, which replaced PodSecurityPolicy after its removal in Kubernetes 1.25), and NetworkPolicies with a default-deny stance so pods can't freely talk to each other. That is the cluster equivalent of network segmentation, and it limits an attacker's lateral movement. Protect the API server, encrypt etcd (the cluster's data store) at rest, and lock down the kubelet. The CIS Kubernetes Benchmark gives 100+ checks across Level 1 (baseline) and Level 2 (stricter). Admission controllers like OPA/Gatekeeper or Kyverno enforce these policies at deploy time, before bad config reaches the cluster.
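A default-deny NetworkPolicy is short enough to show in full. Manifests are usually written in YAML; to stay in one language this sketch builds the same object as a Python dictionary and prints it as JSON, which `kubectl apply -f -` also accepts. The policy name and the `payments` namespace are hypothetical.

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that selects every pod and allows no traffic."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty selector matches all pods in the namespace.
            "podSelector": {},
            # Both directions are listed with no allow rules, so both are denied.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


# "payments" is a hypothetical namespace name.
print(json.dumps(default_deny_policy("payments"), indent=2))
```

After applying a policy like this, each legitimate flow needs its own narrower NetworkPolicy. Enforcement also depends on the cluster's network plugin supporting NetworkPolicy; without that, the object is accepted but has no effect.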
6.7 Infrastructure as Code (IaC) security
IaC means your infrastructure is defined in code files (Terraform, CloudFormation, Helm, Pulumi, Ansible). That is powerful, because it lets you shift security left — scan for misconfigurations before deployment instead of discovering them in production. Tools: Checkov (1,000+ policies, graph-based cross-resource checks mapped to CIS/SOC2/HIPAA/PCI), Trivy (successor to tfsec), KICS, and Terrascan. They catch public buckets, wide-open security groups, unencrypted volumes, and missing logging. Watch two extra traps: secrets leak into the Terraform tfstate file in plaintext (store it remotely and encrypted), and configuration drift (live infrastructure diverging from the code). The payoff is huge — one fix in the template fixes every future deployment.
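To make "scan before deployment" concrete, here is a small check over the JSON form of a Terraform plan (the output of `terraform show -json`). It looks only at `aws_security_group` resources and only at inline `ingress` rules, which is a simplifying assumption; the sample plan, resource names, and the `allowed_public_ports` parameter are invented for the example.

```python
def find_open_ingress(plan: dict, allowed_public_ports=(443,)) -> list:
    """Return (address, from_port, to_port) for world-open ingress rules."""
    findings = []
    for change in plan.get("resource_changes", []):
        if change.get("type") != "aws_security_group":
            continue
        after = (change.get("change") or {}).get("after") or {}
        for rule in after.get("ingress") or []:
            if "0.0.0.0/0" not in (rule.get("cidr_blocks") or []):
                continue
            lo, hi = rule.get("from_port"), rule.get("to_port")
            # A single explicitly allowed port (e.g. public HTTPS) is acceptable.
            if lo == hi and lo in allowed_public_ports:
                continue
            findings.append((change.get("address"), lo, hi))
    return findings


# Hypothetical, heavily trimmed plan document.
plan = {
    "resource_changes": [
        {"address": "aws_security_group.web", "type": "aws_security_group",
         "change": {"after": {"ingress": [
             {"cidr_blocks": ["0.0.0.0/0"], "from_port": 443, "to_port": 443}]}}},
        {"address": "aws_security_group.admin", "type": "aws_security_group",
         "change": {"after": {"ingress": [
             {"cidr_blocks": ["0.0.0.0/0"], "from_port": 22, "to_port": 22}]}}},
    ]
}

print(find_open_ingress(plan))  # [('aws_security_group.admin', 22, 22)]
```

Run as a CI step that fails on any finding, a check like this stops an SSH port open to the internet at review time rather than after it is live.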
6.8 Supply-chain & dependency security — the marquee theme
Modern attackers compromise what you build on, not your code directly. Two incidents every engineer should know:
- SolarWinds / SUNBURST (2020) — the Russian group APT29 (Cozy Bear) compromised the build system and injected a backdoor into signed Orion software updates. About 18,000 organizations downloaded the trojanized update; fewer than 100 were actively exploited (including US DHS, State, Commerce, Treasury, and FireEye). The lesson: a signed artifact is only as trustworthy as the pipeline that built it — this birthed build provenance and SLSA.
- xz-utils backdoor / CVE-2024-3094 (March 2024, CVSS 10.0): a maintainer persona "Jia Tan" spent ~2.6 years earning trust on the tiny but ubiquitous open-source xz/liblzma compression project, then slipped in a backdoor (only ~8 malicious commits) targeting sshd for pre-auth remote code execution. It was caught by accident by Microsoft engineer Andres Freund, who noticed SSH logins burning ~0.5s extra CPU. The backdoored releases reached rolling and pre-release distros (Debian Sid, Fedora Rawhide, Kali, Arch) but were caught before any stable release shipped them. The lesson: human trust and maintainer burnout are an attack vector. It has been called the "biggest supply-chain near-miss since Log4j."
Scale in 2025: supply-chain attacks more than doubled, over 70% of organizations reported at least one, Sonatype found 454,600+ new malicious packages, and the global cost is around $60B. Recent classes include typosquatting/dependency-confusion on npm and PyPI, the self-propagating npm "Shai-Hulud" worm, and tj-actions/GitHub Actions compromises.
The modern defense triad: an SBOM (Software Bill of Materials: a full ingredient list of every dependency, version, license, and checksum, in SPDX or CycloneDX format; mandated by US EO 14028 and the EU Cyber Resilience Act); signed artifacts via Sigstore (keyless signing using short-lived Fulcio certs tied to OIDC identity, with the Rekor transparency log) plus cosign for images; and SLSA (Supply-chain Levels for Software Artifacts: graduated levels L1→L3+ proving tamper-resistant build provenance). Add: pin versions, verify checksums, harden and isolate the build, and minimize your trusted base.
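"Pin versions, verify checksums" reduces to one comparison: the digest of what you downloaded against the digest you recorded earlier. The sketch below uses only the standard library; the function names and the stand-in artifact bytes are illustrative. It checks integrity against a pinned value only and does not verify a signature or provenance.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, pinned_sha256: str) -> bool:
    """Compare the artifact's digest with the pinned value in constant time."""
    return hmac.compare_digest(sha256_hex(data), pinned_sha256.lower())


artifact = b"example release tarball contents"  # stand-in for a downloaded file
pinned = sha256_hex(artifact)                    # in practice, recorded in a lockfile

print(verify_artifact(artifact, pinned))                # True
print(verify_artifact(artifact + b"tampered", pinned))  # False
```

A checksum only proves the bytes match what you pinned. Whether the pinned artifact was built honestly is the question that signing and SLSA provenance try to answer.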
6.9 Hardening, patching & CIS benchmarks
Hardening means reducing attack surface: disable unused services and ports, remove default accounts and passwords, enforce least privilege, enable logging. CIS Benchmarks are consensus hardening guides for ~100 technologies (operating systems, cloud, Kubernetes, databases) with Level 1 (safe baseline) and Level 2 (high-security, may break things) profiles; CIS Hardened Images ship pre-configured. DISA STIGs are the US Department of Defense equivalent.
Patch management matters because known, already-fixed CVEs cause most breaches. The textbook case is Equifax (2017): an unpatched Apache Struts flaw leaked 147M records. Prioritize patches using the CVSS severity score, the CISA KEV catalog (Known Exploited Vulnerabilities — these are being attacked right now, patch first), and EPSS (a probability score for likely exploitation). Automate it and track mean-time-to-patch.
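The three signals can be combined into a simple ordering. The sketch below sorts a backlog by KEV membership first, then EPSS, then CVSS; that particular precedence is one reasonable reading of the paragraph above, not an official formula, and the identifiers and scores are placeholders rather than real vulnerability data.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    cve_id: str
    cvss: float     # severity, 0.0 to 10.0
    epss: float     # estimated exploitation probability, 0.0 to 1.0
    in_kev: bool    # listed in the CISA KEV catalog


def patch_order(findings):
    """KEV entries first, then by EPSS, then by CVSS (all descending)."""
    return sorted(findings, key=lambda f: (f.in_kev, f.epss, f.cvss), reverse=True)


# Placeholder identifiers and scores, not real vulnerability data.
backlog = [
    Finding("CVE-A", cvss=9.8, epss=0.02, in_kev=False),
    Finding("CVE-B", cvss=7.5, epss=0.91, in_kev=True),
    Finding("CVE-C", cvss=8.1, epss=0.40, in_kev=False),
]

print([f.cve_id for f in patch_order(backlog)])  # ['CVE-B', 'CVE-C', 'CVE-A']
```

Note how the highest CVSS score ends up last: severity says how bad exploitation would be, while KEV and EPSS say how likely it is to happen.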
Common mistakes
- Assuming "cloud = secure by default" and skipping your half of the shared-responsibility model.
- Leaving S3/storage buckets public, or using wildcard IAM with long-lived keys and no MFA.
- Committing secrets to git/.env/CI logs, then "deleting" them (history persists, so they're burned).
- Running containers as root, from :latest, with secrets baked into layers.
- Flat networks: no segmentation, no Kubernetes NetworkPolicies (lets attackers move sideways).
- Trusting a signed artifact without verifying the build chain that made it; ignoring transitive dependencies.
- Never rotating credentials; deploying IaC without scanning it first.
Best practices
- Least privilege everywhere; default-deny on networks and firewalls.
- Short-lived credentials over static keys; MFA on every privileged identity.
- Encrypt in transit (TLS 1.3 / mTLS) and at rest.
- Shift security left — IaC, image, and secret scanning in CI before deploy.
- SBOM + artifact signing (Sigstore/cosign) + build provenance (SLSA).
- Harden to CIS Benchmarks; patch by KEV/EPSS priority, automated.
- Assume breach and segment aggressively to limit blast radius.