Security Testing & Auditing

By Pritesh Yadav · June 21, 2026 · 12 min read

Security testing is the discipline of finding the holes in a system before attackers do. As a working engineer you already test code for correctness; security testing is the same instinct pointed at a different question: not "does this work as intended?" but "what can someone make this do that I never intended?" The tools and the specific bugs go stale fast — last year's hot exploit is this year's footnote. What stays valuable for your whole career is the methodical process: systematically cover the whole attack surface, follow a repeatable methodology, prioritise what you find by real risk, and document it so it can be fixed and re-verified.

Analogy: A burglar versus a building inspector. The burglar only needs to find one unlocked window. The inspector methodically checks every door, window, lock, and wire, then writes a report the owner can act on. Attackers think like burglars; your job is to think like the inspector — complete coverage and an actionable report beat one clever trick.

Why it matters in money terms: the IBM Cost of a Data Breach 2025 report puts the global average breach at $4.44M, down 9% from $4.88M and the first decline in five years. The report credits faster AI-assisted detection: mean time to identify and contain a breach fell to 241 days, the lowest in nine years. The US average, by contrast, hit a record $10.22M. A new 2025 angle is "shadow AI" (AI tools used inside a company without approval), which added about $670K to the average breach; 97% of organisations that had an AI-related incident lacked proper AI access controls, and 63% had no AI governance policy at all.

8.1 Automated testing: SAST, DAST, IAST, SCA

Four automated test types each look at the system from a different angle. None is enough alone — they are complementary layers. First, the plain-word definitions:

SAST — Static Application Security Testing: Reads your source code without running it ("white-box," inside-out). Like proofreading a recipe for dangerous steps before you ever turn on the stove.
DAST — Dynamic Application Security Testing: Attacks the running app from the outside with no source code ("black-box," outside-in). Like a mystery shopper rattling every door of the finished store.
IAST — Interactive Application Security Testing: Puts an agent (a small monitor) inside the running app so it watches the code execute during normal tests ("grey-box," a blend of the two above).
SCA — Software Composition Analysis: Inventories the third-party / open-source libraries you depend on and matches them against known-vulnerability databases. Like checking every packaged ingredient against a product-recall list.

Type	What it does	When	Strengths	Limits
SAST	Analyses source/bytecode statically	Earliest — IDE, commit, CI ("shift left")	Full code coverage; finds hardcoded secrets, SQLi/XSS sinks; exact file + line; pre-deploy	High false positives; blind to runtime, config & auth issues; language-dependent
DAST	Attacks the deployed, running app	Staging / QA / pre-prod	Finds what is actually exploitable at runtime; language-agnostic; lower false positives	Late in cycle; no line-of-code pinpoint; misses code paths it never triggers
IAST	Agent watches code run during tests	In CI/CD with functional/DAST tests	Very low false positives (the vuln actually ran); real-time; pinpoints code	Only covers code your tests exercise; runtime overhead; uneven language support
SCA	Scans dependencies vs CVE & license data	Every build	Catches known CVEs in libraries (Log4Shell-class); license risk; builds an SBOM	Only known vulns in known components; nothing in your own logic
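To make "reads your source code without running it" concrete, here is a toy static check written with Python's standard ast module. It is a minimal sketch, not how any commercial SAST product works: the function name find_sql_injection_sinks and the rule itself (flag any .execute() call whose first argument is an f-string or a string built with + or %) are assumptions made for illustration.

```python
import ast


def find_sql_injection_sinks(source: str) -> list[int]:
    """Return line numbers where .execute() receives a dynamically built string."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        is_execute_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "execute"
            and node.args
        )
        if not is_execute_call:
            continue
        first_arg = node.args[0]
        # f-strings parse as JoinedStr; "a" + b and "a" % b parse as BinOp.
        if isinstance(first_arg, (ast.JoinedStr, ast.BinOp)):
            findings.append(node.lineno)
    return sorted(findings)


# The sample is only parsed, never executed: that is the "static" part.
SAMPLE = '''
def load(cur, user_id):
    cur.execute("SELECT * FROM users WHERE id = " + user_id)
    cur.execute("SELECT * FROM users WHERE id = %s", (user_id,))
    cur.execute(f"DELETE FROM users WHERE id = {user_id}")
'''

print(find_sql_injection_sinks(SAMPLE))  # [3, 5]
```

The check reports an exact line for each finding, which is the SAST strength listed in the table. It also shows the weakness: it would flag a concatenation of two trusted constants just as readily, which is where the false positives come from.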

The clean mental model: SAST = your code, statically. SCA = other people's code. DAST = the running app from outside. IAST = the running app from inside. Modern apps are 70–90% open-source code, so SCA is not optional — you need an SBOM (Software Bill of Materials — a list of every component you ship) so that the day a Log4Shell-class bug drops, you can answer "are we affected by CVE-X?" in minutes, not weeks. The umbrella framework that recommends this layered SDLC testing is NIST's SSDF (Secure Software Development Framework, SP 800-218; with 800-218A covering generative AI).
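The "are we affected by CVE-X?" question reduces to a lookup once an SBOM exists. The sketch below assumes a heavily simplified SBOM (a list of name/version pairs) and plain dotted numeric versions; real formats such as CycloneDX or SPDX carry far more detail, and real version schemes need a proper parser. The package names, version range, and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    version: str  # plain dotted numeric versions only, e.g. "2.14.1"


def parse_version(version: str) -> tuple[int, ...]:
    """Turn "2.14.1" into (2, 14, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def affected_components(sbom, package, first_affected, first_fixed):
    """Return SBOM entries for `package` in the range [first_affected, first_fixed)."""
    low, high = parse_version(first_affected), parse_version(first_fixed)
    return [
        c for c in sbom
        if c.name == package and low <= parse_version(c.version) < high
    ]


# Hypothetical SBOM and advisory range, for illustration only.
SBOM = [
    Component("example-logger", "2.14.1"),
    Component("example-logger", "2.17.0"),
    Component("example-http", "1.2.0"),
]

hits = affected_components(SBOM, "example-logger", "2.0.0", "2.15.0")
print([f"{c.name}@{c.version}" for c in hits])  # ['example-logger@2.14.1']
```

Comparing versions as integer tuples matters: as strings, "2.9.0" would sort after "2.14.1" and the lookup would silently give wrong answers.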

Best practice: "RASP" (Runtime Application Self-Protection) is sometimes listed with these but it is a production defence, not a test — it blocks attacks live in production rather than finding bugs before release. Don't count it as part of your test coverage.

8.2 Manual code & security review — where humans beat tools

Automated scanners find patterns. Humans find intent and context. The biggest, costliest bugs are usually business-logic and authorization flaws that no scanner flags, because the code is syntactically perfect:

A discount coupon that can stack on itself infinitely, dropping the price to zero.
An "approve your own refund" path — the workflow lets the requester also be the approver.
IDOR (Insecure Direct Object Reference): you load /order/123, change it to /order/124, and see someone else's order because the app checks that you are logged in but never checks that the order is yours.

Example: This project's own multi-tenant audit found exactly this class of bug — order/job status endpoints that didn't verify the record belonged to the requesting tenant. A SAST tool stays green on it because nothing is syntactically wrong; only a human reasoning about "who should be allowed to do this?" catches it.
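The IDOR pattern described above is easiest to see side by side. This is a minimal sketch in Python, not the project's actual code: the Order and User types, the in-memory ORDERS store, and both function names are hypothetical stand-ins for a real endpoint and database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: int
    tenant_id: int


@dataclass(frozen=True)
class Order:
    id: int
    tenant_id: int
    total_cents: int


class NotFound(Exception):
    pass


# Hypothetical in-memory store standing in for the database.
ORDERS = {
    123: Order(123, tenant_id=1, total_cents=4_500),
    124: Order(124, tenant_id=2, total_cents=9_900),
}


def get_order_vulnerable(user, order_id):
    """Checks only that someone is logged in: the IDOR."""
    if user is None:
        raise PermissionError("login required")
    order = ORDERS.get(order_id)
    if order is None:
        raise NotFound(order_id)
    return order


def get_order_fixed(user, order_id):
    """Also checks that the order belongs to the caller's tenant."""
    if user is None:
        raise PermissionError("login required")
    order = ORDERS.get(order_id)
    # Same error for "missing" and "not yours", so ids cannot be enumerated.
    if order is None or order.tenant_id != user.tenant_id:
        raise NotFound(order_id)
    return order


alice = User(id=7, tenant_id=1)
print(get_order_vulnerable(alice, 124).tenant_id)  # 2: another tenant's order leaks
```

Both functions are syntactically clean, which is why a pattern-matching scanner has nothing to flag. The difference is one comparison that only exists if someone asked who is allowed to see the record.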

8.3 Penetration testing, red/blue/purple, bug bounties

A penetration test goes beyond a vulnerability scan: it actually exploits flaws to prove real impact, and chains several low-severity issues into one critical breach. Pentests are classified by how much the tester is told up front:

Knowledge level	Tester gets	Simulates	Trade-off
Black-box	Nothing (no source, no creds)	External attacker	Most realistic; may miss deep flaws in limited time
Grey-box	Partial (e.g. a normal user login)	Logged-in customer / phished employee	The common, cost-effective middle ground
White-box	Full source, architecture, creds	Malicious insider / worst case	Most thorough; best coverage per dollar

The "team colours" describe roles, not just tests:

Red team (offense): Emulates a real adversary's full kill chain — phishing, social engineering, even physical entry — usually stealthy and goal-oriented ("can we reach the crown-jewel data?"). Broader and longer than a pentest. A pentest finds as many vulns in a scope as possible; a red team tests whether your detection and response actually work against a realistic attacker.
Blue team (defense): The defenders: the SOC (Security Operations Centre) that monitors, detects, responds, and hardens.
Purple team: Not a separate team — a collaboration mode where red and blue work together in real time, so every attack technique immediately teaches the defenders. It closes the feedback loop: knowledge transfer, not just a pass/fail scorecard.

Bug bounties / VDP (Vulnerability Disclosure Programs) are crowdsourced, pay-for-results testing via platforms like HackerOne and Bugcrowd. 2025 numbers: HackerOne paid $81M in bounties (+13% year-over-year); the average active program pays ~$42K/yr; the top 10 programs alone paid $21.6M. AI is exploding here — AI vulnerability reports up 210%+, prompt-injection reports up 540%, 1,121 programs now include AI in scope (+270% YoY), and 70% of researchers now use AI tools ("bionic hackers"). Bounties give continuous coverage from many eyes but with unpredictable scope and quality — they complement, they don't replace, scheduled testing.

8.4 Fuzzing

Fuzzing means feeding a program huge volumes of malformed, random, or unexpected input to trigger crashes, memory corruption, or hangs: automated edge-case discovery. Coverage-guided fuzzers (libFuzzer, AFL++, Honggfuzz, Centipede) mutate inputs to reach new code paths, and pair with sanitizers such as AddressSanitizer (ASan) that catch memory bugs the moment they happen.

Example: Google's OSS-Fuzz has, as of 2025, helped find and fix 13,000+ vulnerabilities and 50,000+ bugs across 1,000+ open-source projects. The new frontier is LLM-assisted fuzzing — Google's oss-fuzz-gen auto-writes fuzz targets and found 30+ bugs reachable only through those new targets. Honest limit: most OSS-Fuzz projects still hit only ~30% runtime coverage, so ~70% of code stays unfuzzed — proof that even great automation leaves gaps a human must reason about.
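The core fuzzing loop is small enough to sketch in pure Python. Note the assumptions: this is a plain mutation fuzzer, not a coverage-guided one like libFuzzer or AFL++ (it gets no feedback about which code paths an input reached), and parse_record is a hypothetical toy parser with a bug planted in it.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Toy parser (hypothetical): the first byte declares the payload length."""
    if not data:
        raise ValueError("empty input")  # a clean, expected rejection
    length, payload = data[0], data[1:]
    # BUG: trusts the declared length and reads past the end of the payload.
    return bytes(payload[i] for i in range(length))


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(target, seed_input: bytes, iterations: int = 500, rng_seed: int = 0):
    """Run mutated inputs through `target` and collect unexpected exceptions."""
    rng = random.Random(rng_seed)  # fixed seed keeps every run reproducible
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed_input, rng)
        try:
            target(candidate)
        except ValueError:
            pass  # graceful rejection of bad input is correct behaviour
        except Exception as exc:  # anything else counts as a crash
            crashes.append((candidate, type(exc).__name__))
    return crashes


crashes = fuzz(parse_record, b"\x03abc")
print(len(crashes) > 0, {name for _, name in crashes})
```

Each crash is stored with the exact input that caused it, so it can be replayed and turned into a regression test. A coverage-guided fuzzer adds one idea on top of this loop: it keeps any mutated input that reached new code and mutates that next.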

8.5 The vulnerability management lifecycle & prioritisation

Security testing is continuous: discover → prioritise → remediate → verify, then loop. Prioritisation is the crux — you can't fix everything, so combine three signals:

            DISCOVER ── SAST/DAST/SCA/IAST/pentest/fuzz/bounty
               |
               v
            PRIORITISE  (combine 3 signals)
   +-------------+---------------+----------------+
   | CVSS        | EPSS          | CISA KEV       |
   | severity    | probability   | confirmed      |
   | 0-10        | 0-1 (30d)     | exploitation   |
   | "how bad?"  | "how likely?" | "is it real?"  |
   +-------------+---------------+----------------+
               |
               v
   Layer 1: on KEV?          -> fix immediately
   Layer 2: EPSS > 0.5?      -> out-of-cycle patch
   Layer 3: everything else  -> SSVC tree (schedule/defer)
               |
               v
            REMEDIATE ──► VERIFY (retest/rescan) ──► loop

CVSS (Common Vulnerability Scoring System): Theoretical severity, 0–10. Its trap: severity ≠ risk. "Fix all criticals first" drowns teams in high-CVSS vulns that nobody is actually exploiting.
EPSS (Exploit Prediction Scoring System): A data-driven probability (0–1) that a vuln will be exploited in the next 30 days. Answers "how likely?"
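CISA KEV (Known Exploited Vulnerabilities catalog): Confirmed real-world exploitation, the closest thing to ground truth. As of 2026 it lists 1,200+ entries with US federal patch deadlines (BOD 22-01) and is the industry's de-facto "fix-now" list. The 2025 exploitation figures often quoted alongside it (884 newly exploited vulnerabilities, 28.96% of them exploited on or before disclosure day, up from 23.6% in 2024) appear to come from VulnCheck's broader KEV tracking rather than the CISA catalog itself; either way they show near-instant weaponisation.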
SSVC (Stakeholder-Specific Vulnerability Categorization): A decision tree that turns the signals above into an action (Act / Track / etc.) for your specific context.
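The three-layer rule from the diagram can be sketched as a small triage function. The Vuln record, the action labels, and the sample backlog (with placeholder IDs rather than real CVEs) are assumptions for illustration; the 0.5 EPSS threshold is the one used in this section, and a real program would feed the last bucket into an actual SSVC decision tree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vuln:
    cve_id: str
    cvss: float   # severity, 0-10
    epss: float   # probability of exploitation in the next 30 days, 0-1
    on_kev: bool  # listed in the CISA KEV catalog


ACTION_RANK = {"fix-immediately": 0, "out-of-cycle-patch": 1, "ssvc-review": 2}


def triage(vuln: Vuln, epss_threshold: float = 0.5) -> str:
    """Apply the layers in order: KEV first, then EPSS, then everything else."""
    if vuln.on_kev:
        return "fix-immediately"
    if vuln.epss > epss_threshold:
        return "out-of-cycle-patch"
    return "ssvc-review"


def prioritise(backlog):
    """Sort by action layer, then by exploitation probability (highest first)."""
    return sorted(backlog, key=lambda v: (ACTION_RANK[triage(v)], -v.epss))


# Placeholder backlog, not real CVE data.
BACKLOG = [
    Vuln("CVE-A", cvss=9.8, epss=0.02, on_kev=False),
    Vuln("CVE-B", cvss=7.5, epss=0.10, on_kev=True),
    Vuln("CVE-C", cvss=6.1, epss=0.83, on_kev=False),
]

for v in prioritise(BACKLOG):
    print(v.cve_id, triage(v))
# CVE-B fix-immediately
# CVE-C out-of-cycle-patch
# CVE-A ssvc-review
```

CVSS never appears in the sort key, and the 9.8 "critical" lands last. That is the point of the layered rule: confirmed and likely exploitation outrank theoretical severity.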

Then verify: retest or rescan to confirm the fix worked and caused no regression.

Why speed matters, in hard numbers: Veracode 2025 says the average flaw now takes 252 days to fix (+47% since 2020), and Edgescan puts critical-app MTTR around 74 days. The Verizon 2025 DBIR reports that vulnerability exploitation is now ~20% of breaches (the 2nd most common entry path after stolen credentials, +34% YoY), with edge/VPN device exploitation up ~8× (3%→22%). The median time from an edge-device CVE being published to mass exploitation is zero days.

8.6 Audits: security assessment vs compliance audit

Two different things people constantly conflate:

Aspect	Technical security assessment	Compliance / security audit
Question	"Can this be broken, and how?"	"Do you have the right controls & processes, documented and operating?"
Style	Depth, adversarial, finds specific exploitable flaws	Breadth, governance, attests to a standard
Examples	Pentest, vuln assessment, code review	SOC 2, ISO 27001

SOC 2: An attestation report issued by a CPA firm against the AICPA Trust Services Criteria (Security, Availability, Processing Integrity, Confidentiality, Privacy). Type 1 = controls at a single point in time (roughly a 45-day engagement); Type 2 = controls operating over an observation period (roughly 2–6 months). It doesn't literally mandate a pentest, but auditors expect technical testing to satisfy the monitoring criteria.
ISO 27001: An international certification of your ISMS (Information Security Management System) — a risk assessment plus the Annex A controls. Certification can take ~2 months to 2+ years. In 2025, both auditors and cyber insurers effectively expect regular pentests under the vuln-management / secure-development Annex A controls.

Common mistake: Believing that passing a compliance audit means you are secure. A SOC 2 report proves that processes exist, not that your app resists a determined attacker. You need both — governance and real adversarial testing.

Frameworks worth knowing: OWASP Top 10 (the 2025 edition adds "Software Supply Chain Failures" and "Mishandling of Exceptional Conditions" categories and elevates Security Misconfiguration); OWASP WSTG (Web Security Testing Guide, stable v4.2) is the technique reference; OWASP ASVS is the verification standard; CVSS is the shared severity language. On the regulatory side, the EU AI Act (in force since Aug 2024; prohibited-practice and AI-literacy rules live since Feb 2025; most remaining obligations applying from Aug 2026) means security testing now carries an AI-governance dimension too.

8.7 What a good audit / report looks like

A report is the product of the whole discipline. A finding with no reproduction steps and no recommended fix is useless. A good report has:

Clear scope & methodology — what was tested, how, with what access, against which standard, so it's repeatable.
Executive summary in plain business language for non-technical leaders.
Each finding with a severity/risk rating, concrete evidence (proof-of-concept), the business impact, and step-by-step remediation — actionable, never "be more secure."
Prioritisation so the team knows what to fix first.
Retest/verification of the fixes.
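One way to enforce the per-finding checklist above is to make it a data structure that can reject incomplete findings. This is a sketch under assumptions: the Finding class, its field names, and the rule that a finding only closes after a retest are illustrative choices, not a standard report schema.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    title: str
    severity: str          # e.g. "critical", "high", "medium", "low"
    evidence: str          # proof-of-concept or reproduction steps
    business_impact: str
    remediation_steps: list[str] = field(default_factory=list)
    retested: bool = False

    def missing(self) -> list[str]:
        """Names of required fields that are empty or whitespace-only."""
        required = {
            "severity": self.severity,
            "evidence": self.evidence,
            "business_impact": self.business_impact,
            "remediation_steps": self.remediation_steps,
        }
        return [
            name for name, value in required.items()
            if not (value.strip() if isinstance(value, str) else value)
        ]

    def is_closed(self) -> bool:
        """A finding closes only when it is complete and the fix was retested."""
        return not self.missing() and self.retested


vague = Finding("Be more secure", severity="high", evidence="", business_impact="")
print(vague.missing())  # ['evidence', 'business_impact', 'remediation_steps']
```

A check like this can run before a report ships, so a finding without evidence or a fix is caught by the author rather than by the team that has to act on it.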

Example: This project mirrors that standard in code — tests/Feature/Security/CrossTenantIsolationTest.php is a repeatable test that proves the cross-tenant flaw. A finding here isn't "done" until there's a test that reproduces it before the fix and passes after.

Common mistakes

Calling a scanner run a "pentest" — scanning finds known patterns; a pentest exploits and chains them.
"All criticals first" instead of risk-based KEV/EPSS prioritisation, so the real exploited bugs get buried.
Checking the compliance box but never doing real adversarial testing.
Ignoring business-logic and authorization flaws because the tools stay green.
Testing once a year instead of continuously, while attackers test daily.
Never verifying fixes — patches get re-introduced, reverted, or applied incompletely.
Drowning developers in false positives from an un-tuned SAST until they ignore the tool entirely.
No SBOM, so you can't answer "are we affected by CVE-X?" the day a Log4Shell-class bug drops.

Best practices

Layer the automated tools (SAST + SCA in CI, DAST/IAST in staging) — none is sufficient alone.
Shift left: run SAST and SCA on every commit; keep an SBOM continuously updated.
Add human code review for logic and authorization — the bugs scanners can't reason about.
Prioritise with the layered rule: on KEV → fix now; EPSS > 0.5 → out-of-cycle; rest → SSVC tree.
Always close the loop with a verification retest, and capture it as a repeatable test.
Do real adversarial testing and pursue compliance — they answer different questions.
Write reports leaders can act on: scope, plain-language summary, evidence, impact, fix, priority.
Govern AI explicitly — access controls and a policy — before "shadow AI" becomes your $670K line item.

Key takeaway: Memorising exploits and CVEs is a depreciating asset; mastering the methodical process is the durable, valuable skill. Combine complementary automated layers (SAST, DAST, IAST, SCA) with human review and real penetration/red-team testing, prioritise findings by genuine risk (KEV → EPSS → SSVC, not raw CVSS), always verify the fix, and document everything so it's repeatable. Compliance proves processes exist; assessment proves the system resists attack — a mature program needs both, run continuously.
