Made blog post stronger

2026-04-11 16:45:38 +02:00
parent 258e89c385
commit c97c5e8135
1 changed file with 12 additions and 8 deletions
@@ -1,6 +1,6 @@
 ---
-title: "Building a Security Audit Skill for the LLM Age: What We Learned About Making AI Actually Useful for Security"
+title: "Most AI Security Audits Are Broken. Here's How We Fixed It."
-description: "How we built a production-grade security audit skill that fights false positives, severity inflation, and hallucination — and the design reasoning behind every decision."
+description: "The current approach to LLM-based security auditing is fundamentally flawed — it produces noise instead of signal. We built something different, and we're publishing the full methodology."
 pubDate: 2026-04-11
 tags: ["security", "ai", "skills", "prompt-engineering", "false-positives"]
 categories: ["Engineering"]
@@ -17,9 +17,9 @@ We spent the last several weeks building a security audit skill for [Zaguán Bla
 This post is about the design reasoning, not just the artifact. The prompt itself is [published in full](https://github.com/ZaguanAI/security-audit-skill/blob/main/security-audit.md). What's interesting is *why* each piece exists.
-## What Most LLM Audit Prompts Get Wrong
+## How Most LLM Security Audits Fail
-The failure modes are remarkably consistent across models and frameworks:
+Most LLM-based security audits fail in predictable ways. The failure modes are remarkably consistent across models and frameworks:
 1. **Severity inflation.** Every dangerous API is Critical. Every user-controlled input is "attacker-controlled." The model doesn't distinguish between a web-facing SQL injection and a local CLI flag that passes user input to `exec()` — they're both RCE, right?
@@ -31,12 +31,14 @@ The failure modes are remarkably consistent across models and frameworks:
 5. **Missing the real bugs.** While the model is busy flagging every `eval()` in your test suite, the actual vulnerability — a subtle authorization gap in a multi-tenant API, or a deserialization path through a parser the model didn't investigate — goes unnoticed.
-Our goal was to build something that produces audits a security engineer would actually want to read and act on.
+Our goal was to build something that produces audits a security engineer would actually want to read and act on — and that meant fixing the approach, not just tuning the prompt.
 ## The Key Insight: From "Find Scary Things" to "Decide What Matters"
 The single most important design decision was this: **a dangerous sink is not a vulnerability.**
 Most tools stop at the sink. Real analysis starts at the boundary.
 This sounds obvious, but it's the root of most false positives. The model sees `os.system(user_input)` and flags it. But the question isn't whether the sink is dangerous — it's whether the attacker *crosses a meaningful trust boundary* to reach it, and whether they *gain capability they didn't already have*.
 A desktop application executing commands from its own config file, which only the user can edit, running as that same user? That's not a vulnerability. That's the application working as designed. The user already has all the authority the "exploit" would give them.
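
As a rough sketch of the question above, the check can be modelled as a predicate over a record of how data reaches a sink. This is not the skill's implementation (the skill is a prompt). The `Flow` type, its field names, and the principal labels are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical record of data reaching a dangerous sink."""
    sink: str                # e.g. "os.system"
    source_principal: str    # who controls the input
    sink_principal: str      # whose authority the sink runs with
    capability_gained: bool  # does the source gain authority it lacked?


def crosses_trust_boundary(flow: Flow) -> bool:
    # No boundary is crossed when the party controlling the input is the
    # same party whose authority the sink already runs with.
    return flow.source_principal != flow.sink_principal


def is_vulnerability(flow: Flow) -> bool:
    # A dangerous sink alone is not enough: both conditions must hold.
    return crosses_trust_boundary(flow) and flow.capability_gained


# Illustrative cases (assumed values, not taken from a real audit).
desktop_config = Flow("os.system", "local_user", "local_user", False)
web_request = Flow("os.system", "remote_client", "service_account", True)
```

Under these assumptions, `is_vulnerability(desktop_config)` is false even though the sink is `os.system`, while `is_vulnerability(web_request)` is true. Both use the same sink, and only the boundary and the capability gained differ.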
@@ -77,7 +79,7 @@ The skill now distinguishes five categories:
 The **Abuse Primitive** category is the one that surprises people. It captures things like "executes arbitrary shell from config" or "evaluates templates dynamically." These aren't vulnerabilities on their own — the config is trusted, the templates are trusted. But they're *perfect building blocks* for an attacker who finds a way to influence that config or those templates through a different path.
-This matters because LLMs and modern attackers are increasingly capable of combining multiple low-severity issues into critical impact. If you only track standalone vulnerabilities, you miss the chains.
+This matters because LLMs don't just find bugs — they combine them. That makes low-severity noise more dangerous, not less. A single Medium finding is a nuisance. Five Medium findings that chain into a sandbox escape are a catastrophe. If you only track standalone vulnerabilities, you miss the chains — and the chains are where the real damage is.
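
One way to picture chaining is to describe each finding by what it requires and what it grants, then compute which capabilities become reachable. This is a minimal sketch under assumed names: `Finding`, `requires`, `grants`, and the example findings are hypothetical and not part of the published skill.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding: what an attacker needs, and what they get."""
    name: str
    requires: frozenset[str]
    grants: frozenset[str]


def reachable_capabilities(start: set[str], findings: list[Finding]) -> set[str]:
    # Fixed-point iteration: keep applying any finding whose preconditions
    # are already met until no new capability is gained.
    have = set(start)
    changed = True
    while changed:
        changed = False
        for finding in findings:
            if finding.requires <= have and not finding.grants <= have:
                have |= finding.grants
                changed = True
    return have


# Made-up example: a low-severity write path plus an abuse primitive.
findings = [
    Finding("config writable via upload",
            frozenset({"authenticated"}), frozenset({"write_config"})),
    Finding("executes shell from config",
            frozenset({"write_config"}), frozenset({"code_execution"})),
]
```

In this sketch, an authenticated attacker reaches `code_execution` only when both findings are present. Dropping the first finding leaves the abuse primitive unreachable, which is why it is tracked separately rather than reported as a standalone vulnerability.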
 ## The Same-User Exception
@@ -134,7 +136,9 @@ The skill is explicitly grounded in the current threat landscape, not a 2021-era
 - **Supply chain failures** are now a core appsec category (SHA pinning, OIDC token scope, artifact integrity, mutable action references)
 - **Mishandling of exceptional conditions** is a first-class security category (fail-open paths, partial transaction recovery, missing rollback)
 - **AI/Agent surfaces** are real attack surfaces (prompt injection, tool-output-to-tool-input leakage, excessive agency)
-- **AI-driven exploit chaining** — LLMs and modern attackers combine multiple low-severity issues to achieve critical impact
+- **AI-driven exploit chaining** — LLMs and modern attackers combine multiple low-severity issues to achieve critical impact. This is no longer theoretical.
 This isn't just a better audit methodology. It's becoming a necessary one. When attackers can chain five low-severity issues into a critical compromise, treating each finding in isolation isn't cautious — it's negligent.
 The threat model has a version and a cutoff date (April 2026), so you know when it starts getting stale.
@@ -164,7 +168,7 @@ The complete skill definition is [available on GitHub](https://github.com/Zaguan
 If you take nothing else from this post, take this:
-> Never declare a security finding unless you can trace attacker-controlled data across a trust boundary to a privileged sink with positive exploit value.
+> **Never call something a vulnerability unless attacker-controlled data crosses a trust boundary and produces positive exploit value.**
 That's the entire skill compressed into one sentence. Everything else is enforcement machinery.