diff --git a/src/content/blog/2026-04-11-security-audit-skill-in-the-llm-age.md b/src/content/blog/2026-04-11-security-audit-skill-in-the-llm-age.md
index 3b6fd4b..1f28447 100644
--- a/src/content/blog/2026-04-11-security-audit-skill-in-the-llm-age.md
+++ b/src/content/blog/2026-04-11-security-audit-skill-in-the-llm-age.md
@@ -1,6 +1,6 @@
 ---
-title: "Building a Security Audit Skill for the LLM Age: What We Learned About Making AI Actually Useful for Security"
-description: "How we built a production-grade security audit skill that fights false positives, severity inflation, and hallucination — and the design reasoning behind every decision."
+title: "Most AI Security Audits Are Broken. Here's How We Fixed It."
+description: "The current approach to LLM-based security auditing is fundamentally flawed — it produces noise instead of signal. We built something different, and we're publishing the full methodology."
 pubDate: 2026-04-11
 tags: ["security", "ai", "skills", "prompt-engineering", "false-positives"]
 categories: ["Engineering"]
@@ -17,9 +17,9 @@ We spent the last several weeks building a security audit skill for [Zaguán Bla
 
 This post is about the design reasoning, not just the artifact. The prompt itself is [published in full](https://github.com/ZaguanAI/security-audit-skill/blob/main/security-audit.md). What's interesting is *why* each piece exists.
 
-## What Most LLM Audit Prompts Get Wrong
+## How Most LLM Security Audits Fail
 
-The failure modes are remarkably consistent across models and frameworks:
+Most LLM-based security audits fail in predictable ways. The failure modes are remarkably consistent across models and frameworks:
 
 1. **Severity inflation.** Every dangerous API is Critical. Every user-controlled input is "attacker-controlled." The model doesn't distinguish between a web-facing SQL injection and a local CLI flag that passes user input to `exec()` — they're both RCE, right?
 
@@ -31,12 +31,14 @@ The failure modes are remarkably consistent across models and frameworks:
 
 5. **Missing the real bugs.** While the model is busy flagging every `eval()` in your test suite, the actual vulnerability — a subtle authorization gap in a multi-tenant API, or a deserialization path through a parser the model didn't investigate — goes unnoticed.
 
-Our goal was to build something that produces audits a security engineer would actually want to read and act on.
+Our goal was to build something that produces audits a security engineer would actually want to read and act on — and that meant fixing the approach, not just tuning the prompt.
 
 ## The Key Insight: From "Find Scary Things" to "Decide What Matters"
 
 The single most important design decision was this: **a dangerous sink is not a vulnerability.**
 
+Most tools stop at the sink. Real analysis starts at the boundary.
+
 This sounds obvious, but it's the root of most false positives. The model sees `os.system(user_input)` and flags it. But the question isn't whether the sink is dangerous — it's whether the attacker *crosses a meaningful trust boundary* to reach it, and whether they *gain capability they didn't already have*.
 
 A desktop application executing commands from its own config file, which only the user can edit, running as that same user? That's not a vulnerability. That's the application working as designed. The user already has all the authority the "exploit" would give them.
@@ -77,7 +79,7 @@ The skill now distinguishes five categories:
 
 The **Abuse Primitive** category is the one that surprises people. It captures things like "executes arbitrary shell from config" or "evaluates templates dynamically." These aren't vulnerabilities on their own — the config is trusted, the templates are trusted. But they're *perfect building blocks* for an attacker who finds a way to influence that config or those templates through a different path.
 
-This matters because LLMs and modern attackers are increasingly capable of combining multiple low-severity issues into critical impact. If you only track standalone vulnerabilities, you miss the chains.
+This matters because LLMs don't just find bugs — they combine them. That makes low-severity noise more dangerous, not less. A single Medium finding is a nuisance. Five Medium findings that chain into a sandbox escape are a catastrophe. If you only track standalone vulnerabilities, you miss the chains — and the chains are where the real damage is.
 
 ## The Same-User Exception
 
@@ -134,7 +136,9 @@ The skill is explicitly grounded in the current threat landscape, not a 2021-era
 - **Supply chain failures** are now a core appsec category (SHA pinning, OIDC token scope, artifact integrity, mutable action references)
 - **Mishandling of exceptional conditions** is a first-class security category (fail-open paths, partial transaction recovery, missing rollback)
 - **AI/Agent surfaces** are real attack surfaces (prompt injection, tool-output-to-tool-input leakage, excessive agency)
-- **AI-driven exploit chaining** — LLMs and modern attackers combine multiple low-severity issues to achieve critical impact
+- **AI-driven exploit chaining** — LLMs and modern attackers combine multiple low-severity issues to achieve critical impact. This is no longer theoretical.
+
+This isn't just a better audit methodology. It's becoming a necessary one. When attackers can chain five low-severity issues into a critical compromise, treating each finding in isolation isn't cautious — it's negligent.
 
 The threat model has a version and a cutoff date (April 2026), so you know when it starts getting stale.
 
@@ -164,7 +168,7 @@ The complete skill definition is [available on GitHub](https://github.com/Zaguan
 
 If you take nothing else from this post, take this:
 
-> Never declare a security finding unless you can trace attacker-controlled data across a trust boundary to a privileged sink with positive exploit value.
+> **Never call something a vulnerability unless attacker-controlled data crosses a trust boundary and produces positive exploit value.**
 
 That's the entire skill compressed into one sentence. Everything else is enforcement machinery.
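
The rule in the final hunk reads as a conjunction of three conditions: the data is attacker-controlled, it crosses a trust boundary, and the attacker gains capability they did not already have. Below is a minimal Python sketch of that reading. It is not taken from the published skill; the `Candidate` type, its field names, and the `is_vulnerability` helper are hypothetical and exist only to illustrate the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one suspicious code path (illustrative names only)."""
    sink: str                     # the dangerous call, e.g. "os.system"
    attacker_controlled: bool     # can an attacker influence the data reaching the sink?
    crosses_trust_boundary: bool  # does that data move into a more-trusted context?
    gains_capability: bool        # does the attacker end up with authority they lacked?


def is_vulnerability(c: Candidate) -> bool:
    """All three conditions must hold; a dangerous sink alone is not enough."""
    return c.attacker_controlled and c.crosses_trust_boundary and c.gains_capability


# Same-user case from the post: a desktop app runs commands from its own config,
# which only the user it runs as can edit.
desktop_config = Candidate(
    sink="os.system",
    attacker_controlled=False,
    crosses_trust_boundary=False,
    gains_capability=False,
)

# Contrasting assumption: a web request parameter reaches the same sink on a server.
web_request = Candidate(
    sink="os.system",
    attacker_controlled=True,
    crosses_trust_boundary=True,
    gains_capability=True,
)

print(is_vulnerability(desktop_config))  # False
print(is_vulnerability(web_request))     # True
```

Both candidates share the same sink, so the sink name never enters the decision. That matches the post's claim that the sink alone says nothing about whether a finding is real.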