Made blog post stronger
@@ -1,6 +1,6 @@
|
|||||||
---
|
---
|
||||||
title: "Building a Security Audit Skill for the LLM Age: What We Learned About Making AI Actually Useful for Security"
|
title: "Most AI Security Audits Are Broken. Here's How We Fixed Them."
|
||||||
description: "How we built a production-grade security audit skill that fights false positives, severity inflation, and hallucination — and the design reasoning behind every decision."
|
description: "The current approach to LLM-based security auditing is fundamentally flawed — it produces noise instead of signal. We built something different, and we're publishing the full methodology."
|
||||||
pubDate: 2026-04-11
|
pubDate: 2026-04-11
|
||||||
tags: ["security", "ai", "skills", "prompt-engineering", "false-positives"]
|
tags: ["security", "ai", "skills", "prompt-engineering", "false-positives"]
|
||||||
categories: ["Engineering"]
|
categories: ["Engineering"]
|
||||||
@@ -17,9 +17,9 @@ We spent the last several weeks building a security audit skill for [Zaguán Bla
|
|||||||
|
|
||||||
This post is about the design reasoning, not just the artifact. The prompt itself is [published in full](https://github.com/ZaguanAI/security-audit-skill/blob/main/security-audit.md). What's interesting is *why* each piece exists.
|
This post is about the design reasoning, not just the artifact. The prompt itself is [published in full](https://github.com/ZaguanAI/security-audit-skill/blob/main/security-audit.md). What's interesting is *why* each piece exists.
|
||||||
|
|
||||||
## What Most LLM Audit Prompts Get Wrong
|
## How Most LLM Security Audits Fail
|
||||||
|
|
||||||
The failure modes are remarkably consistent across models and frameworks:
|
Most LLM-based security audits fail in predictable ways. The failure modes are remarkably consistent across models and frameworks:
|
||||||
|
|
||||||
1. **Severity inflation.** Every dangerous API is Critical. Every user-controlled input is "attacker-controlled." The model doesn't distinguish between a web-facing SQL injection and a local CLI flag that passes user input to `exec()` — they're both RCE, right?
|
1. **Severity inflation.** Every dangerous API is Critical. Every user-controlled input is "attacker-controlled." The model doesn't distinguish between a web-facing SQL injection and a local CLI flag that passes user input to `exec()` — they're both RCE, right?
|
||||||
|
|
||||||
@@ -31,12 +31,14 @@ The failure modes are remarkably consistent across models and frameworks:
|
|||||||
|
|
||||||
5. **Missing the real bugs.** While the model is busy flagging every `eval()` in your test suite, the actual vulnerability — a subtle authorization gap in a multi-tenant API, or a deserialization path through a parser the model didn't investigate — goes unnoticed.
|
5. **Missing the real bugs.** While the model is busy flagging every `eval()` in your test suite, the actual vulnerability — a subtle authorization gap in a multi-tenant API, or a deserialization path through a parser the model didn't investigate — goes unnoticed.
|
||||||
|
|
||||||
Our goal was to build something that produces audits a security engineer would actually want to read and act on.
|
Our goal was to build something that produces audits a security engineer would actually want to read and act on — and that meant fixing the approach, not just tuning the prompt.
|
||||||
|
|
||||||
## The Key Insight: From "Find Scary Things" to "Decide What Matters"
|
## The Key Insight: From "Find Scary Things" to "Decide What Matters"
|
||||||
|
|
||||||
The single most important design decision was this: **a dangerous sink is not a vulnerability.**
|
The single most important design decision was this: **a dangerous sink is not a vulnerability.**
|
||||||
|
|
||||||
|
Most tools stop at the sink. Real analysis starts at the boundary.
|
||||||
|
|
||||||
This sounds obvious, but it's the root of most false positives. The model sees `os.system(user_input)` and flags it. But the question isn't whether the sink is dangerous — it's whether the attacker *crosses a meaningful trust boundary* to reach it, and whether they *gain capability they didn't already have*.
|
This sounds obvious, but it's the root of most false positives. The model sees `os.system(user_input)` and flags it. But the question isn't whether the sink is dangerous — it's whether the attacker *crosses a meaningful trust boundary* to reach it, and whether they *gain capability they didn't already have*.
|
||||||
|
|
||||||
A desktop application executing commands from its own config file, which only the user can edit, running as that same user? That's not a vulnerability. That's the application working as designed. The user already has all the authority the "exploit" would give them.
|
A desktop application executing commands from its own config file, which only the user can edit, running as that same user? That's not a vulnerability. That's the application working as designed. The user already has all the authority the "exploit" would give them.
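
A minimal sketch of that decision, assuming each candidate finding can be reduced to two yes/no answers. This is an illustration, not the published skill (which is a prompt, not code); `SinkReach`, `is_vulnerability`, and the two example paths are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SinkReach:
    """One path from a data source to a dangerous sink (hypothetical model)."""
    sink: str
    crosses_trust_boundary: bool  # does the data come from a less-trusted principal?
    gains_capability: bool        # does reaching the sink grant authority they lacked?


def is_vulnerability(reach: SinkReach) -> bool:
    # A dangerous sink alone is not enough: both conditions must hold.
    return reach.crosses_trust_boundary and reach.gains_capability


# The desktop app running commands from its own user-editable config:
# same user on both sides, no new authority gained.
desktop_config = SinkReach("os.system(config_command)", False, False)

# A web-facing handler passing a request parameter to the same sink:
# a remote caller crosses a boundary and gains code execution.
web_request = SinkReach("os.system(request_param)", True, True)

for reach in (desktop_config, web_request):
    print(reach.sink, "->", is_vulnerability(reach))
```

The sink is identical in both cases. Only the answers to the two boundary questions differ, and they alone decide the outcome.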
|
||||||
@@ -77,7 +79,7 @@ The skill now distinguishes five categories:
|
|||||||
|
|
||||||
The **Abuse Primitive** category is the one that surprises people. It captures things like "executes arbitrary shell from config" or "evaluates templates dynamically." These aren't vulnerabilities on their own — the config is trusted, the templates are trusted. But they're *perfect building blocks* for an attacker who finds a way to influence that config or those templates through a different path.
|
The **Abuse Primitive** category is the one that surprises people. It captures things like "executes arbitrary shell from config" or "evaluates templates dynamically." These aren't vulnerabilities on their own — the config is trusted, the templates are trusted. But they're *perfect building blocks* for an attacker who finds a way to influence that config or those templates through a different path.
|
||||||
|
|
||||||
This matters because LLMs and modern attackers are increasingly capable of combining multiple low-severity issues into critical impact. If you only track standalone vulnerabilities, you miss the chains.
|
This matters because LLMs don't just find bugs — they combine them. That makes low-severity noise more dangerous, not less. A single Medium finding is a nuisance. Five Medium findings that chain into a sandbox escape are a catastrophe. If you only track standalone vulnerabilities, you miss the chains — and the chains are where the real damage is.
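
One way to picture chaining is as reachability over capabilities: each finding needs some capabilities and grants others. The sketch below assumes that simplified model; the `Finding` type, the capability strings, and the three example findings are hypothetical and do not come from the skill itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A low-severity issue modeled as a capability step (hypothetical)."""
    name: str
    requires: frozenset  # capabilities the attacker must already hold
    grants: frozenset    # capabilities gained by exploiting this finding


def chain(start, findings):
    """Apply findings repeatedly until no new capability can be gained.

    Returns the final set of held capabilities and the names of the
    findings used, in the order they were applied.
    """
    held = set(start)
    used = []
    changed = True
    while changed:
        changed = False
        for f in findings:
            usable = f.requires <= held
            adds_something = not f.grants <= held
            if f.name not in used and usable and adds_something:
                held |= f.grants
                used.append(f.name)
                changed = True
    return held, used


# Three illustrative findings, none critical when viewed in isolation.
findings = [
    Finding("verbose-error", frozenset({"network"}), frozenset({"config-path"})),
    Finding("path-write", frozenset({"config-path"}), frozenset({"config-write"})),
    Finding("shell-from-config", frozenset({"config-write"}), frozenset({"shell"})),
]

held, used = chain({"network"}, findings)
print("shell" in held, used)
```

Reviewed one at a time, each step looks minor. The chain only shows up when the findings are evaluated together, which is the argument for tracking abuse primitives alongside standalone vulnerabilities.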
|
||||||
|
|
||||||
## The Same-User Exception
|
## The Same-User Exception
|
||||||
|
|
||||||
@@ -134,7 +136,9 @@ The skill is explicitly grounded in the current threat landscape, not a 2021-era
|
|||||||
- **Supply chain failures** are now a core appsec category (SHA pinning, OIDC token scope, artifact integrity, mutable action references)
|
- **Supply chain failures** are now a core appsec category (SHA pinning, OIDC token scope, artifact integrity, mutable action references)
|
||||||
- **Mishandling of exceptional conditions** is a first-class security category (fail-open paths, partial transaction recovery, missing rollback)
|
- **Mishandling of exceptional conditions** is a first-class security category (fail-open paths, partial transaction recovery, missing rollback)
|
||||||
- **AI/Agent surfaces** are real attack surfaces (prompt injection, tool-output-to-tool-input leakage, excessive agency)
|
- **AI/Agent surfaces** are real attack surfaces (prompt injection, tool-output-to-tool-input leakage, excessive agency)
|
||||||
- **AI-driven exploit chaining** — LLMs and modern attackers combine multiple low-severity issues to achieve critical impact
|
- **AI-driven exploit chaining** — LLMs and modern attackers combine multiple low-severity issues to achieve critical impact. This is no longer theoretical.
|
||||||
|
|
||||||
|
This isn't just a better audit methodology. It's becoming a necessary one. When attackers can chain five low-severity issues into a critical compromise, treating each finding in isolation isn't cautious — it's negligent.
|
||||||
|
|
||||||
The threat model has a version and a cutoff date (April 2026), so you know when it starts getting stale.
|
The threat model has a version and a cutoff date (April 2026), so you know when it starts getting stale.
|
||||||
|
|
||||||
@@ -164,7 +168,7 @@ The complete skill definition is [available on GitHub](https://github.com/Zaguan
|
|||||||
|
|
||||||
If you take nothing else from this post, take this:
|
If you take nothing else from this post, take this:
|
||||||
|
|
||||||
> Never declare a security finding unless you can trace attacker-controlled data across a trust boundary to a privileged sink with positive exploit value.
|
> **Never call something a vulnerability unless attacker-controlled data crosses a trust boundary and produces positive exploit value.**
|
||||||
|
|
||||||
That's the entire skill compressed into one sentence. Everything else is enforcement machinery.
|
That's the entire skill compressed into one sentence. Everything else is enforcement machinery.
|
||||||
|
|
||||||