NVIDIA SkillSpector and the Runtime Trust Gap
NVIDIA open-sourced a static security scanner for AI agent skills. We pointed it at 30 production skills and got CRITICAL/100. Then we read the findings. Eight of eleven HIGHs were structural false positives. Static analysis can’t see trust context, and that’s not a bug.
Earlier this month, NVIDIA released SkillSpector to the public. It’s a static security scanner for AI agent skills: SKILL.md files, MCP tools, the manifests that AI coding assistants load to extend their behavior. 6.7k GitHub stars in three days. Apache 2.0. The methodology is solid. 64 vulnerability patterns across 16 categories, two-stage analysis (regex and AST first, optional LLM semantic pass second), and a research paper reporting that 26.1% of skills carry at least one vulnerability and 5.2% show likely malicious intent.
That research is the right paper at the right time. The agent skill ecosystem expanded faster than security tooling, and there hasn’t been a credible open static analyzer until now. NVIDIA stepped in. Good.
We pointed it at our 30 production skills.
The score: 100. The severity: CRITICAL. 52 findings. 11 HIGH.
Then we read the findings.
What the score saw
The 11 HIGH findings clustered into these patterns:
| Finding | Flagged as | Reality |
|---|---|---|
| Curl to agentlair.dev | Data Exfiltration | Our own API |
| Curl to cloudflare.com | Supply Chain | Our infra |
References to .env | Privilege Escalation | Documentation |
| HTML comments in markdown | Prompt Injection | Renderer metadata |
| String “send session” | Data Exfiltration | Email function |
curl ... | python in docs | Supply Chain | Real (doc reference) |
curl ... | bash in docs | Supply Chain | Real (doc reference) |
rm -rf dist/ | Tool Misuse | Real (destructive) |
| Autonomous decision logic | Excessive Agency | Designed autonomy |
Three of eleven are genuine. Eight are structural false positives. The score that emerged from all eleven was CRITICAL/100.
The information SkillSpector can’t see
This isn’t a bug in NVIDIA’s tool. Their own documentation lists the constraints honestly: runtime behavior requires dynamic execution. Static analysis sees code, not context. And almost every false positive above is a context problem.
A curl to a domain isn’t exfiltration or authorized infrastructure. It’s just a curl. The information that distinguishes them lives outside the code, in the operator’s head. We trust agentlair.dev because it’s ours. We trust cloudflare.com because we authorized them. The .env reference is documentation, not theft, because the file describes how the environment works. SkillSpector can’t know any of this. Nothing in the source says so.
The optional LLM pass helps. It can sometimes infer intent from surrounding text. But intent inference cannot override authorization. Even if the LLM correctly guesses that agentlair.dev is probably the operator’s own service, it has no proof. The right answer is to ask the operator. SkillSpector, by design, doesn’t.
So the score is structurally noisy. Not because the tool is bad. Because the question “is this risky?” cannot be answered from code alone.
What the trust context layer looks like
We wrote a small filter that takes SkillSpector’s JSON output plus a trust manifest, and re-scores. The manifest is short:
{
"trusted_domains": ["agentlair.dev", "cloudflare.com", "..."],
"documentation_patterns": ["*.md", "docs/**"],
"authorized_operations": [
{
"pattern": "send session",
"justification": "Email transport, operator-authorized"
}
],
"accepted_risk_categories": []
} The filter walks each issue from SkillSpector and applies four checks. Is the category in the operator’s accepted-risk list? Does the finding reference a trusted domain (for Data Exfiltration and Supply Chain)? Is the file a documentation file (for Privilege Escalation and Prompt Injection categories)? Does the finding match an authorized operation pattern? Anything matched is either suppressed or downgraded. Everything else stays confirmed and counted.
Run the filter on the same scan. The numbers move.
- Original: 52 issues, score 100, CRITICAL.
- Filtered: 19 confirmed plus 16 downgraded, 17 suppressed.
- False positive rate: 63%.
- Actionable HIGH findings: 11 down to 3.
The three actionable findings are the ones worth fixing. The pipe-to-interpreter patterns in our documentation, and a destructive command example. Nothing about our own API, our infrastructure, our environment documentation, or our agent’s authorized autonomy.
The filter is 380 lines. It isn’t a product. It’s a demonstration that the same scan produces wildly different signals depending on whether the layer above it has trust context.
Why this matters past our own workspace
Two things follow.
First, the static security layer is now a commodity. NVIDIA has it. It’s free, Apache-licensed, well-maintained, good enough to be the default. There is no commercial moat in writing another static scanner. The companies that try will get out-shipped by NVIDIA’s release cadence and absorbed into a forked nightly somewhere.
Second, the value moves up. A static scanner that produces CRITICAL/100 from 63% false positives is not actually telling an enterprise buyer whether to trust an agent. It’s telling them something is statistically unusual in the code. To get from “statistically unusual” to “trustworthy at runtime,” you need three things SkillSpector doesn’t have:
A trust context layer: the manifest above, or something richer that knows what an operator has authorized.
A runtime behavioral layer that observes what the agent actually does when it executes, not just what its code looks like at rest.
A continuous trust signal that compares observed behavior to declared scope on a rolling basis, so a skill that turns adversarial on day 31 is caught on day 31.
The first two haven’t shipped at the scale of NVIDIA’s static layer. The third is mostly research papers. That’s the open territory.
Where we sit
Commit’s behavioral trust scoring lives at a different layer than SkillSpector. Our proof-of-commitment runtime gives skills, MCP servers, and npm packages a behavioral score derived from how their authors actually commit code. Humans leave evidence in version control that AI-generated code cannot fake, and that evidence is independent of what the package’s manifest claims about itself. The static layer asks what the code says it does. The behavioral layer asks what the human did to produce it. Both matter. Neither alone is enough.
We’re not the only ones working at this layer. There will be other entrants. The L4 territory is not won. What’s clear: the L3 territory just closed.
What we’d suggest, if you run agent skills today
Run SkillSpector against your own workspace first. It’s free, it works, and a few of the findings will be real. Don’t trust the score; read the findings. About three out of every five HIGHs will be context-blind. If you want to filter by trust context, the manifest schema and four checks above are enough to roll your own in an afternoon. If you want a behavioral layer on top of either, that’s getcommit.dev/audit, and the first scan on any npm package or MCP server URL is free.
NVIDIA did the field a favor by shipping SkillSpector. It commoditized the layer below the one that is still missing. That’s how stacks build.
Source: NVIDIA/SkillSpector on GitHub, Apache 2.0, 6.7k stars at time of writing. Research stats (26.1% / 5.2%) cited from the project’s README; static-analysis limitations (“runtime behavior requires dynamic execution”) also from project documentation. Findings against 30 skills run locally on 2026-06-15.
Audit any npm package or MCP server: getcommit.dev/audit · 80% of agent skills lie about what they do · Behavioral trust scoring methodology