From aac08bf6262f477bc9f4d73826a9580d5810504f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Stig-=C3=98rjan=20Smelror?=
Date: Sat, 11 Apr 2026 16:53:11 +0200
Subject: [PATCH] Made blog post stronger and more direct

---
 ...2026-04-11-security-audit-skill-in-the-llm-age.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/content/blog/2026-04-11-security-audit-skill-in-the-llm-age.md b/src/content/blog/2026-04-11-security-audit-skill-in-the-llm-age.md
index 1f28447..119a42d 100644
--- a/src/content/blog/2026-04-11-security-audit-skill-in-the-llm-age.md
+++ b/src/content/blog/2026-04-11-security-audit-skill-in-the-llm-age.md
@@ -33,6 +33,8 @@ Most LLM-based security audits fail in predictable ways. The failure modes are r

 Our goal was to build something that produces audits a security engineer would actually want to read and act on — and that meant fixing the approach, not just tuning the prompt.

+The uncomfortable truth: most LLM-based audit tools aren't doing security analysis. They're doing pattern matching with authority.
+
 ## The Key Insight: From "Find Scary Things" to "Decide What Matters"

 The single most important design decision was this: **a dangerous sink is not a vulnerability.**
@@ -41,11 +43,7 @@ Most tools stop at the sink. Real analysis starts at the boundary.

 This sounds obvious, but it's the root of most false positives. The model sees `os.system(user_input)` and flags it. But the question isn't whether the sink is dangerous — it's whether the attacker *crosses a meaningful trust boundary* to reach it, and whether they *gain capability they didn't already have*.

-A desktop application executing commands from its own config file, which only the user can edit, running as that same user? That's not a vulnerability. That's the application working as designed. The user already has all the authority the "exploit" would give them.
-
-A web server executing commands from an HTTP request body? That's a completely different trust model, and it *is* a vulnerability.
-
-The skill has to reason about trust boundaries and privilege deltas, not just dangerous function calls.
+A desktop app executing commands from its own config, editable only by the same user it runs as? Not a vulnerability — the user already has that authority. A web server executing commands from an HTTP request body? Completely different trust model, and it *is* a vulnerability.

 ## The Exploit Value Test

@@ -67,7 +65,7 @@ This is now the gating test before any severity assignment. If Exploit Value ≤

 ## The Classification Taxonomy

-Most audit frameworks have findings and... that's it. We found this forced the model to either inflate borderline issues into "findings" or drop them entirely. Neither is correct.
+Most audit frameworks have findings and... that's it. That binary — flag it or ignore it — is itself a design failure. It forces the model to either inflate borderline issues into "findings" or drop them entirely. Neither is correct.

 The skill now distinguishes five categories:

@@ -153,6 +151,8 @@ We tested the skill against the Openbox source code — a Linux/BSD desktop envi

 A naive audit flags all of these as Critical. Our early versions flagged most of them as Medium+. The final version correctly classified the majority as Design Properties, with specific reasoning about why trust holds (or doesn't) in each case.

+For example: Openbox's `autostart.sh` mechanism executes arbitrary shell commands from a config file. A generated audit calls this RCE. Our audit classifies it as a Design Property — the config is owned by the same user who runs the window manager, the user already has shell access, and no trust boundary is crossed. But if that config were loaded from a remote sync or a shared NFS mount, the classification would flip to Confirmed Finding, because now an untrusted party can influence trusted execution.
+
 The real vulnerabilities — the subtle authorization gaps, the parser edge cases, the incomplete fixes — actually became *more* visible once the noise was gone.

 ## The Full Skill
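
The Openbox `autostart.sh` reasoning in the last hunk (same sink, different classification depending on who controls the input) can be sketched in a few lines of Python. This is only an illustration of the two questions the post asks, whether a trust boundary is crossed and whether capability is gained. It is not the skill's actual logic: `SinkReach`, `classify`, and the principal and capability names are all hypothetical, and only the two categories named in the post are modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    # Only the two categories named in the post; the other three are not modeled.
    DESIGN_PROPERTY = "Design Property"
    CONFIRMED_FINDING = "Confirmed Finding"


@dataclass(frozen=True)
class SinkReach:
    """Hypothetical record of how input reaches a dangerous sink."""
    sink: str
    input_writer: str              # principal who can modify the input
    runs_as: str                   # principal whose authority the sink executes with
    writer_already_has: frozenset  # capabilities the writer holds without the sink
    sink_grants: frozenset         # capabilities that reaching the sink provides


def classify(reach: SinkReach) -> Classification:
    # Question 1: does the input come from a different principal than the one
    # whose authority is used to execute it?
    crosses_boundary = reach.input_writer != reach.runs_as
    # Question 2: does reaching the sink give the writer anything new?
    gained = reach.sink_grants - reach.writer_already_has
    if crosses_boundary and gained:
        return Classification.CONFIRMED_FINDING
    return Classification.DESIGN_PROPERTY


# Assumed scenario: config owned by the same user who runs the window manager.
local_config = SinkReach(
    "autostart.sh", "alice", "alice",
    frozenset({"shell"}), frozenset({"shell"}),
)
# Assumed scenario: the same config, but written by a remote sync service.
synced_config = SinkReach(
    "autostart.sh", "sync-server", "alice",
    frozenset(), frozenset({"shell"}),
)

print(classify(local_config).value)   # Design Property
print(classify(synced_config).value)  # Confirmed Finding
```

The sink is identical in both cases; only the writer of the input changes, which is the point the added Openbox paragraph makes.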