Security Testing & Auditing
Security testing is the discipline of finding the holes in a system before attackers do. As a working engineer you already test code for correctness; security testing is the same instinct pointed at a different question: not "does this work as intended?" but "what can someone make this do that I never intended?" The tools and the specific bugs go stale fast — last year's hot exploit is this year's footnote. What stays valuable for your whole career is the methodical process: systematically cover the whole attack surface, follow a repeatable methodology, prioritise what you find by real risk, and document it so it can be fixed and re-verified.
Why it matters in money terms: the IBM Cost of a Data Breach 2025 report puts the global average breach at $4.44M (down 9% from $4.88M — the first decline in five years, driven by faster AI-assisted detection; mean time to identify and contain a breach fell to 241 days, the lowest in nine years). The US average hit a record $10.22M. A new 2025 angle: "shadow AI" (AI tools used inside a company without approval) added about $670K to the average breach; 97% of organisations that had an AI-related incident lacked proper AI access controls, and 63% had no AI governance policy at all.
8.1 Automated testing: SAST, DAST, IAST, SCA
Four automated test types each look at the system from a different angle. None is enough alone — they are complementary layers. First, the plain-word definitions:
- SAST — Static Application Security Testing
- Reads your source code without running it ("white-box," inside-out). Like proofreading a recipe for dangerous steps before you ever turn on the stove.
- DAST — Dynamic Application Security Testing
- Attacks the running app from the outside with no source code ("black-box," outside-in). Like a mystery shopper rattling every door of the finished store.
- IAST — Interactive Application Security Testing
- Puts an agent (a small monitor) inside the running app so it watches the code execute during normal tests ("grey-box," a blend of the two above).
- SCA — Software Composition Analysis
- Inventories the third-party / open-source libraries you depend on and matches them against known-vulnerability databases. Like checking every packaged ingredient against a product-recall list.
| Type | What it does | When | Strengths | Limits |
|---|---|---|---|---|
| SAST | Analyses source/bytecode statically | Earliest — IDE, commit, CI ("shift left") | Full code coverage; finds hardcoded secrets, SQLi/XSS sinks; exact file + line; pre-deploy | High false positives; blind to runtime, config & auth issues; language-dependent |
| DAST | Attacks the deployed, running app | Staging / QA / pre-prod | Finds what is actually exploitable at runtime; language-agnostic; lower false positives | Late in cycle; no line-of-code pinpoint; misses code paths it never triggers |
| IAST | Agent watches code run during tests | In CI/CD with functional/DAST tests | Very low false positives (the vuln actually ran); real-time; pinpoints code | Only covers code your tests exercise; runtime overhead; uneven language support |
| SCA | Scans dependencies vs CVE & license data | Every build | Catches known CVEs in libraries (Log4Shell-class); license risk; builds an SBOM | Only known vulns in known components; nothing in your own logic |
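To make the SAST row concrete, here is a toy static check, offered as a sketch rather than a real scanner. It parses Python source with the standard-library ast module and flags string literals assigned to secret-looking variable names, without ever running the code. The function name find_hardcoded_secrets and the name pattern are illustrative assumptions; production SAST tools add data-flow and taint analysis on top of this kind of pattern matching.

```python
import ast
import re

# Hypothetical heuristic: variable names that suggest a credential.
SECRET_NAME = re.compile(r"(password|passwd|secret|api_key|token)", re.IGNORECASE)


def find_hardcoded_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line, name) for each string literal assigned to a secret-looking name."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        # Only non-empty string literals count; lookups such as os.environ[...] are ignored.
        value = node.value
        if not (isinstance(value, ast.Constant)
                and isinstance(value.value, str)
                and value.value):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and SECRET_NAME.search(target.id):
                findings.append((node.lineno, target.id))
    return sorted(findings)


if __name__ == "__main__":
    sample = (
        "import os\n"
        "db_password = 'hunter2'\n"
        "api_key = os.environ['API_KEY']\n"
        "greeting = 'hello'\n"
    )
    for line, name in find_hardcoded_secrets(sample):
        print(f"line {line}: hardcoded value assigned to {name}")
```

Note what the sketch shares with real SAST: it reports an exact line, it never needs a deployed app, and it will happily flag a harmless test fixture, which is where the false positives come from.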
The clean mental model: SAST = your code, statically. SCA = other people's code. DAST = the running app from outside. IAST = the running app from inside. Modern apps are 70–90% open-source code, so SCA is not optional — you need an SBOM (Software Bill of Materials — a list of every component you ship) so that the day a Log4Shell-class bug drops, you can answer "are we affected by CVE-X?" in minutes, not weeks. The umbrella framework that recommends this layered SDLC testing is NIST's SSDF (Secure Software Development Framework, SP 800-218; with 800-218A covering generative AI).
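The "are we affected by CVE-X?" question can be sketched as a lookup over an SBOM. This is a minimal illustration under stated assumptions: the Component structure is a simplified stand-in for a real SBOM format such as CycloneDX or SPDX, the package names and the advisory are hypothetical, and versions are assumed to be purely numeric and dotted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    version: str


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '2.14.1' into (2, 14, 1). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def affected_components(
    sbom: list[Component], package: str, fixed_in: str
) -> list[Component]:
    """Return components named `package` whose version is older than `fixed_in`."""
    fixed = parse_version(fixed_in)
    return [
        c for c in sbom
        if c.name == package and parse_version(c.version) < fixed
    ]


if __name__ == "__main__":
    # Hypothetical inventory and advisory, for illustration only.
    sbom = [
        Component("example-logging", "2.14.1"),
        Component("example-http", "1.3.0"),
        Component("example-logging", "2.17.0"),
    ]
    hits = affected_components(sbom, package="example-logging", fixed_in="2.15.0")
    print("affected by CVE-X:", hits)
```

The point of the design is that the expensive part, building the inventory, happens on every build; answering the question on the day the advisory lands is then a cheap query.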
8.2 Manual code & security review — where humans beat tools
Automated scanners find patterns. Humans find intent and context. The biggest, costliest bugs are usually business-logic and authorization flaws that no scanner flags, because the code is syntactically perfect:
- A discount coupon that can stack on itself infinitely, dropping the price to zero.
- An "approve your own refund" path — the workflow lets the requester also be the approver.
- IDOR (Insecure Direct Object Reference): you load /order/123, change it to /order/124, and see someone else's order because the app checks that you are logged in but never checks that the order is yours.
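The IDOR case is easiest to see side by side. The sketch below is framework-free and entirely hypothetical (the Order type, the in-memory ORDERS store, and both function names are assumptions for illustration): the vulnerable handler only relies on the caller being logged in, while the fixed one also checks ownership.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: int
    owner_id: int
    total: float


# Hypothetical in-memory store standing in for a database.
ORDERS = {
    123: Order(order_id=123, owner_id=1, total=40.0),
    124: Order(order_id=124, owner_id=2, total=99.0),
}


class Forbidden(Exception):
    """Raised when a user requests an object they do not own."""


def get_order_vulnerable(current_user_id: int, order_id: int) -> Order:
    # Authentication only: the caller is logged in, but ownership is never checked.
    return ORDERS[order_id]


def get_order_fixed(current_user_id: int, order_id: int) -> Order:
    order = ORDERS.get(order_id)
    # Authorization: treat "missing" and "not yours" the same so IDs cannot be probed.
    if order is None or order.owner_id != current_user_id:
        raise Forbidden(f"order {order_id} is not available to this user")
    return order


if __name__ == "__main__":
    # User 1 changes /order/123 to /order/124.
    print(get_order_vulnerable(1, 124))  # returns user 2's order
    try:
        get_order_fixed(1, 124)
    except Forbidden as exc:
        print("blocked:", exc)
```

Both functions are syntactically clean, which is why a pattern-matching scanner has nothing to flag; only a reviewer who asks "whose order is this?" notices the missing check.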
8.3 Penetration testing, red/blue/purple, bug bounties
A penetration test goes beyond a vulnerability scan: it actually exploits flaws to prove real impact, and chains several low-severity issues into one critical breach. Pentests are classified by how much the tester is told up front:
| Knowledge level | Tester gets | Simulates | Trade-off |
|---|---|---|---|
| Black-box | Nothing (no source, no creds) | External attacker | Most realistic; may miss deep flaws in limited time |
| Grey-box | Partial (e.g. a normal user login) | Logged-in customer / phished employee | The common, cost-effective middle ground |
| White-box | Full source, architecture, creds | Malicious insider / worst case | Most thorough; best coverage per dollar |
The "team colours" describe roles, not just tests:
- Red team (offense)
- Emulates a real adversary's full kill chain — phishing, social engineering, even physical entry — usually stealthy and goal-oriented ("can we reach the crown-jewel data?"). Broader and longer than a pentest. A pentest finds as many vulns in a scope as possible; a red team tests whether your detection and response actually work against a realistic attacker.
- Blue team (defense)
- The defenders: the SOC (Security Operations Centre) that monitors, detects, responds, and hardens.
- Purple team
- Not a separate team — a collaboration mode where red and blue work together in real time, so every attack technique immediately teaches the defenders. It closes the feedback loop: knowledge transfer, not just a pass/fail scorecard.
Bug bounties / VDP (Vulnerability Disclosure Programs) are crowdsourced, pay-for-results testing via platforms like HackerOne and Bugcrowd. 2025 numbers: HackerOne paid $81M in bounties (+13% year-over-year); the average active program pays ~$42K/yr; the top 10 programs alone paid $21.6M. AI is exploding here — AI vulnerability reports up 210%+, prompt-injection reports up 540%, 1,121 programs now include AI in scope (+270% YoY), and 70% of researchers now use AI tools ("bionic hackers"). Bounties give continuous coverage from many eyes but with unpredictable scope and quality — they complement, they don't replace, scheduled testing.
8.4 Fuzzing
Fuzzing means feeding a program huge volumes of malformed, random, or unexpected input to trigger crashes, memory corruption, or hangs: automated edge-case discovery. Coverage-guided fuzzers (libFuzzer, AFL++, Honggfuzz, Centipede) mutate inputs to reach new code paths, and pair with sanitizers such as ASan (AddressSanitizer) that catch memory bugs the moment they happen.
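The core loop of a fuzzer fits in a few lines. The sketch below is plain mutation fuzzing, not the coverage-guided approach of libFuzzer or AFL++: it has no coverage feedback and simply mutates one seed input at random. The target parse_record is a hypothetical toy parser with a deliberately planted bug, and the names mutate and fuzz are assumptions for illustration.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Toy length-prefixed parser with a planted bug: it trusts the length byte."""
    if not data:
        raise ValueError("empty input")
    length = data[0]
    payload = data[1:]
    # Bug: no check that length <= len(payload), so this can index past the end.
    return bytes(payload[i] for i in range(length))


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random mutation: overwrite, insert, or delete a byte."""
    buf = bytearray(data)
    choice = rng.randrange(3)
    if choice == 0 and buf:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    elif choice == 1:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif buf:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def fuzz(target, seed_input: bytes, iterations: int = 1000, seed: int = 0):
    """Run `target` on mutated inputs and collect (input, exception) for crashes."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed_input, rng)
        try:
            target(candidate)
        except ValueError:
            pass  # expected rejection of bad input, not a bug
        except Exception as exc:  # anything else counts as a crash
            crashes.append((candidate, exc))
    return crashes


if __name__ == "__main__":
    crashes = fuzz(parse_record, seed_input=b"\x03abc")
    print(f"{len(crashes)} crashing inputs found")
    if crashes:
        sample, exc = crashes[0]
        print("example:", sample, "->", type(exc).__name__)
```

A coverage-guided fuzzer improves on this by keeping any mutated input that reaches new code and mutating that in turn, which is how it gets past the first few branches of a real parser.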
oss-fuzz-gen auto-writes fuzz targets and found 30+ bugs reachable only through those new targets. Honest limit: most OSS-Fuzz projects still hit only ~30% runtime coverage, so ~70% of code stays unfuzzed, proof that even great automation leaves gaps a human must reason about.
8.5 The vulnerability management lifecycle & prioritisation
Security testing is continuous: discover → prioritise → remediate → verify, then loop. Prioritisation is the crux — you can't fix everything, so combine three signals:
DISCOVER ── SAST/DAST/SCA/IAST/pentest/fuzz/bounty
    |
    v
PRIORITISE (combine 3 signals)
+------------+---------------+---------------+
| CVSS       | EPSS          | CISA KEV      |
| severity   | probability   | confirmed     |
| 0-10       | 0-1 (30d)     | exploitation  |
| "how bad?" | "how likely?" | "is it real?" |
+------------+---------------+---------------+
    |
    v
Layer 1: on KEV?         -> fix immediately
Layer 2: EPSS > 0.5?     -> out-of-cycle patch
Layer 3: everything else -> SSVC tree (schedule/defer)
    |
    v
REMEDIATE ──► VERIFY (retest/rescan) ──► loop
- CVSS (Common Vulnerability Scoring System)
- Theoretical severity, 0–10. Its trap: severity ≠ risk. "Fix all criticals first" drowns teams in high-CVSS vulns that nobody is actually exploiting.
- EPSS (Exploit Prediction Scoring System)
- A data-driven probability (0–1) that a vuln will be exploited in the next 30 days. Answers "how likely?"
- CISA KEV (Known Exploited Vulnerabilities catalog)
- Confirmed real-world exploitation — ground truth. As of 2026 it lists 1,200+ entries with US federal patch deadlines (BOD 22-01) and is the industry's de-facto "fix-now" list. In 2025, 884 KEVs were added and 28.96% were exploited on or before disclosure day (up from 23.6% in 2024) — near-instant weaponisation.
- SSVC (Stakeholder-Specific Vulnerability Categorization)
- A decision tree that turns the signals above into an action (Act / Track / etc.) for your specific context.
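The three-layer rule can be written down as a small triage function. This is a sketch under assumptions: the Vuln record, the CVE identifiers, and the scores are hypothetical, the 0.5 EPSS threshold is the one used in this section, and the sort order within a layer (EPSS first, CVSS as tie-breaker) is an illustrative choice rather than part of any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vuln:
    cve_id: str
    cvss: float   # theoretical severity, 0-10
    epss: float   # probability of exploitation in the next 30 days, 0-1
    on_kev: bool  # listed in the CISA KEV catalog


def triage(v: Vuln) -> str:
    """Map one vulnerability to an action using the layered rule."""
    if v.on_kev:
        return "fix immediately"
    if v.epss > 0.5:
        return "out-of-cycle patch"
    return "SSVC tree (schedule/defer)"


def prioritise(vulns: list[Vuln]) -> list[Vuln]:
    """Order by KEV membership, then EPSS, then CVSS as a tie-breaker."""
    return sorted(vulns, key=lambda v: (not v.on_kev, -v.epss, -v.cvss))


if __name__ == "__main__":
    # Hypothetical findings, for illustration only.
    backlog = [
        Vuln("CVE-A", cvss=9.8, epss=0.02, on_kev=False),
        Vuln("CVE-B", cvss=6.5, epss=0.71, on_kev=False),
        Vuln("CVE-C", cvss=7.2, epss=0.10, on_kev=True),
    ]
    for v in prioritise(backlog):
        print(f"{v.cve_id}: CVSS {v.cvss}, EPSS {v.epss} -> {triage(v)}")
```

In this made-up backlog the CVSS 9.8 finding lands last, because it is neither on KEV nor likely to be exploited soon. That is the difference between severity and risk in one run.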
Then verify: retest or rescan to confirm the fix worked and caused no regression. Why speed matters, in hard numbers: Veracode 2025 says the average flaw now takes 252 days to fix (+47% since 2020); Edgescan puts critical-app MTTR around 74 days; the Verizon 2025 DBIR reports vulnerability exploitation is now ~20% of breaches (the 2nd most common entry path after stolen credentials, +34% YoY), with edge/VPN device exploitation up ~8× (3%→22%) — and the median time from an edge-device CVE being published to mass exploitation is zero days.
8.6 Audits: security assessment vs compliance audit
Two different things people constantly conflate:
| | Technical security assessment | Compliance / security audit |
|---|---|---|
| Question | "Can this be broken, and how?" | "Do you have the right controls & processes, documented and operating?" |
| Style | Depth, adversarial, finds specific exploitable flaws | Breadth, governance, attests to a standard |
| Examples | Pentest, vuln assessment, code review | SOC 2, ISO 27001 |
- SOC 2
- An attestation report written by a CPA firm against the AICPA Trust Services Criteria (Security, Availability, Processing Integrity, Confidentiality, Privacy). Type 1 = controls at a single point in time (~45 days); Type 2 = controls operating over a period (~2–6 months). It doesn't literally mandate a pentest, but auditors expect technical testing to satisfy the monitoring criteria.
- ISO 27001
- An international certification of your ISMS (Information Security Management System) — a risk assessment plus the Annex A controls. Certification can take ~2 months to 2+ years. In 2025, both auditors and cyber insurers effectively expect regular pentests under the vuln-management / secure-development Annex A controls.
Frameworks worth knowing: OWASP Top 10 (the 2025 edition adds supply-chain and "Mishandling Exceptional Conditions" categories and elevates Security Misconfiguration); OWASP WSTG (Web Security Testing Guide, stable v4.2) is the technique reference; OWASP ASVS is the verification standard; CVSS is the shared severity language. On the regulatory side, the EU AI Act (in force since Aug 2024; prohibited-practice and AI-literacy rules live since Feb 2025; full enforcement and fines from Aug 2026) means security testing now carries an AI-governance dimension too.
8.7 What a good audit / report looks like
A report is the product of the whole discipline. A finding with no reproduction steps and no recommended fix is useless. A good report has:
- Clear scope & methodology — what was tested, how, with what access, against which standard, so it's repeatable.
- Executive summary in plain business language for non-technical leaders.
- Each finding with a severity/risk rating, concrete evidence (proof-of-concept), the business impact, and step-by-step remediation — actionable, never "be more secure."
- Prioritisation so the team knows what to fix first.
- Retest/verification of the fixes.
tests/Feature/Security/CrossTenantIsolationTest.php is a repeatable test that proves the cross-tenant flaw. A finding here isn't "done" until there's a test that reproduces it before the fix and passes after.
Common mistakes
- Calling a scanner run a "pentest" — scanning finds known patterns; a pentest exploits and chains them.
- "All criticals first" instead of risk-based KEV/EPSS prioritisation, so the real exploited bugs get buried.
- Checking the compliance box but never doing real adversarial testing.
- Ignoring business-logic and authorization flaws because the tools stay green.
- Testing once a year instead of continuously, while attackers test daily.
- Never verifying fixes — patches get re-introduced, reverted, or applied incompletely.
- Drowning developers in false positives from an un-tuned SAST until they ignore the tool entirely.
- No SBOM, so you can't answer "are we affected by CVE-X?" the day a Log4Shell-class bug drops.
Best practices
- Layer the automated tools (SAST + SCA in CI, DAST/IAST in staging) — none is sufficient alone.
- Shift left: run SAST and SCA on every commit; keep an SBOM continuously updated.
- Add human code review for logic and authorization — the bugs scanners can't reason about.
- Prioritise with the layered rule: on KEV → fix now; EPSS > 0.5 → out-of-cycle; rest → SSVC tree.
- Always close the loop with a verification retest, and capture it as a repeatable test.
- Do real adversarial testing and pursue compliance — they answer different questions.
- Write reports leaders can act on: scope, plain-language summary, evidence, impact, fix, priority.
- Govern AI explicitly — access controls and a policy — before "shadow AI" becomes your $670K line item.