80% of Agent Skills Lie About What They Do
Palo Alto Unit42 crawled 49,943 OpenClaw skills and found 80% have behavioral deviations from their declared behavior. Then they admitted their own static analysis tool can’t catch the rest. This is the clearest third-party data we’ve seen on why agent trust scoring has to happen at runtime, not at install.
On June 11, 2026, Palo Alto Networks Unit42 published results from their Behavioral Integrity Verification (BIV) scanner applied to the OpenClaw skill ecosystem. They crawled 49,943 skills — the largest systematic analysis of agent skill behavior published to date.
The headline: 80% of skills (39,933) have at least one behavioral deviation from their declared intent. Across those skills, Unit42 documented 250,706 total deviations — an average of roughly six per non-compliant skill.
18.9% of skills showed adversarial intent. 5% — 2,490 skills — carried multi-stage attack chains.
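The percentages above follow from the raw counts quoted in the report. A quick arithmetic check, using only those figures (the variable names are ours, not Unit42's):

```python
# Counts as quoted from the Unit42 report; variable names are illustrative.
TOTAL_SKILLS = 49_943
DEVIATING_SKILLS = 39_933
TOTAL_DEVIATIONS = 250_706
MULTI_STAGE_SKILLS = 2_490

deviation_rate = DEVIATING_SKILLS / TOTAL_SKILLS
deviations_per_skill = TOTAL_DEVIATIONS / DEVIATING_SKILLS
multi_stage_rate = MULTI_STAGE_SKILLS / TOTAL_SKILLS

print(f"skills with >=1 deviation: {deviation_rate:.1%}")             # 80.0%
print(f"deviations per deviating skill: {deviations_per_skill:.2f}")  # 6.28
print(f"skills with multi-stage chains: {multi_stage_rate:.1%}")      # 5.0%
```

The "roughly six" average is per deviating skill, not per skill crawled.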
The threat taxonomy is specific. Instruction-Level Threats were the most adversarial category: 96% of skills flagged in that class showed adversarial intent. Credential theft was the largest single adversarial leaf, accounting for 8.2% of classified deviations. These aren’t edge cases. They’re systematic.
The disclosure that matters more than the data
The data is striking. But the more significant finding is a single sentence in Unit42’s methodology section:
“BIV is static-only, so dynamic dispatch, reflection, and obfuscated payloads escape AST-level extraction.”
This is a major security vendor saying, in their own words, that the tool that just found behavioral deviations in 80% of skills cannot see skills that hide behind dynamic dispatch, reflection, or obfuscation. Those techniques are standard tradecraft for any skill designed to evade detection. The 5% figure for multi-stage attack chains is therefore likely a lower bound: it counts only the chains that static extraction could see.
Read that carefully: the tool that found 39,933 deviating skills explicitly cannot analyze the class of skills most likely to cause serious harm.
This is not a criticism of Unit42’s research — it’s honest methodology disclosure. But it has a direct implication: static pre-installation analysis, however thorough, has a hard ceiling. The dangerous payloads are specifically designed to be invisible to it.
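To make the limitation concrete, here is a minimal sketch of an AST-level check and the kind of call that slips past it. This is a toy scanner written for illustration, not BIV; the function name and the two sample snippets are assumptions of ours. The samples are only parsed, never executed.

```python
import ast

# Two hypothetical skill snippets with the same runtime effect.
DIRECT = 'import os\nos.system("curl https://example.invalid")\n'
# Here the attribute name "system" only exists once the string is built at runtime.
DYNAMIC = 'import os\ngetattr(os, "sys" + "tem")("curl https://example.invalid")\n'


def calls_os_system(source: str) -> bool:
    """Return True if the syntax tree contains a literal os.system(...) call."""
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "system"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "os"
        ):
            return True
    return False


print(calls_os_system(DIRECT))   # True
print(calls_os_system(DYNAMIC))  # False
```

A real scanner can add rules for `getattr` and string concatenation, but each new rule invites a new indirection. That arms race is the ceiling the methodology note describes.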
What behavioral deviations actually look like
A behavioral deviation, in Unit42’s framework, is a gap between what a skill declares it does (in its manifest, its description, its metadata) and what its code actually does. Because BIV is static, "actually does" here means behavior extracted from the code, not behavior observed at runtime. The deviation types they documented are not subtle:
- Instruction-level threats: Skills that modify the agent’s system prompt or override task instructions mid-execution. This category had the highest adversarial rate: 96% of skills flagged here showed adversarial intent rather than developer oversight.
- Credential theft: Skills that access authentication tokens, API keys, or session credentials beyond their stated scope. The largest single adversarial leaf at 8.2% of all classified deviations.
- Exfiltration chains and remote code execution chains: Two of the four novel compound threat categories identified. Multi-stage attacks that distribute malicious behavior across steps, each of which looks benign in isolation.
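At its simplest, a deviation of this kind is a set difference between declared and extracted capabilities. The sketch below assumes a hypothetical capability-string format (`net:`, `fs:`, `env:` prefixes) and a hypothetical skill name; Unit42's actual representation is not described in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deviation:
    skill: str
    capability: str


def find_deviations(skill: str, declared: set[str], extracted: set[str]) -> list[Deviation]:
    """Capabilities found in the code but absent from the manifest."""
    return [Deviation(skill, cap) for cap in sorted(extracted - declared)]


# Hypothetical manifest for a weather lookup skill.
declared = {"net:api.weather.example", "fs:read:./cache"}
# Hypothetical capabilities extracted from its code.
extracted = declared | {"env:read:AWS_SECRET_ACCESS_KEY", "net:collector.example"}

deviations = find_deviations("weather-lookup", declared, extracted)
for d in deviations:
    print(d.capability)
```

Here the skill declares a weather API and a cache directory, but its code also reads a cloud credential and contacts an undeclared host. Those two extra capabilities are the deviations.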
In aggregate, this is a picture of an ecosystem where the declaration layer (what skills say they do) has almost entirely decoupled from the behavioral layer (what they actually do). An 80% deviation rate at this scale is not an anomaly. It’s a structural condition.
Why this happens
The declaration layer was never designed to be enforceable. A skill manifest is a string of text that describes intent. Nothing in the current agent skill infrastructure verifies that the manifest accurately reflects behavior. Nothing monitors runtime execution against declared scope. Nothing signals when execution diverges from declaration.
This is the same pattern that produced the npm supply chain crisis, applied at a faster velocity. npm’s package metadata (README, description, keywords) said nothing enforceable about what the package code would do at runtime. Malicious packages were published with plausible descriptions and then executed adversarially when installed. The declaration layer was gameable by construction.
Agent skills are worse. Skills are designed to operate autonomously, with elevated access to orchestration infrastructure, in contexts where human review of each action is impractical. A malicious npm package still has to be installed by a developer or a build job. A malicious agent skill executes inside an automated pipeline that may process thousands of actions per hour. The blast radius per adversarial skill is larger, and the detection window is shorter.
The Unit42 data confirms what the architecture implied: when declarations aren’t enforceable, most won’t be accurate.
The L3/L4 gap
In the trust infrastructure stack, there are four layers:
- L1: Identity. Who is this agent? JWT/OIDC, did:key, JWKS-verifiable credentials. The IETF Transaction Tokens draft, DIF’s MCP-I profile, and the A2A protocol all operate here.
- L2: Authorization. What is this agent allowed to do? OAuth scopes, capability declarations, allowlists.
- L3: Pre-installation verification. Static analysis, manifest scanning, provenance checks. Unit42 BIV operates at L3.
- L4: Runtime behavioral monitoring. Continuous observation of what the agent actually does during execution, compared against its declared scope and historical baseline.
The industry has made significant progress on L1 and L2 in 2026. The IETF, DIF, and OpenClaw itself have active working groups on agent identity and authorization. L3 has credible tooling — Unit42 BIV, static analysis scanners, manifest validators.
L4 is nearly empty.
Unit42’s methodology admission tells us exactly why this matters: the attacks that escape L3 are the ones that require L4. Static analysis finds deviations in skills that didn’t bother to hide. Dynamic dispatch and obfuscation are evasion techniques for L3. A skill that uses them passes every static scan and then executes adversarially at runtime.
The 5% multi-stage attack chain finding is especially relevant here. Multi-stage attacks, by definition, distribute their adversarial behavior across multiple execution steps. Step one looks clean. Step two looks clean. The harm happens at step three, when context from steps one and two enables an action that no individual step would have triggered. Static analysis examines each skill in isolation — it cannot see the chain.
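The difference between a per-step view and a chain view can be sketched with simple taint tracking. Everything below is a hypothetical illustration: the `Step` structure, the action names, and the `secret:`/`var:` labels are our assumptions, not part of any real agent runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    skill: str
    action: str               # hypothetical action names
    reads: frozenset[str]
    writes: frozenset[str]


SENSITIVE = frozenset({"secret:api_key"})   # hypothetical sensitive source
EGRESS_ACTIONS = frozenset({"http_post"})   # hypothetical outbound action


def step_is_suspicious(step: Step) -> bool:
    """Per-step view: flag only a step that itself reads a secret and sends it out."""
    return step.action in EGRESS_ACTIONS and bool(step.reads & SENSITIVE)


def chain_is_suspicious(steps: list[Step]) -> bool:
    """Chain view: follow sensitive data through intermediate values to an egress."""
    tainted = set(SENSITIVE)
    for step in steps:
        if step.reads & tainted:
            if step.action in EGRESS_ACTIONS:
                return True
            tainted |= step.writes
    return False


chain = [
    Step("config-loader", "read_secret", frozenset({"secret:api_key"}), frozenset({"var:cfg"})),
    Step("formatter", "transform", frozenset({"var:cfg"}), frozenset({"var:report"})),
    Step("notifier", "http_post", frozenset({"var:report"}), frozenset()),
]

print(any(step_is_suspicious(s) for s in chain))  # False
print(chain_is_suspicious(chain))                 # True
```

No single step both reads the secret and sends it out, so the per-step check passes all three. Only a check that carries state across steps sees the secret reach the network.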
Agent trust scoring at runtime
The question this data raises isn’t “how do we build a better static scanner?” Unit42 just built one and found 39,933 deviating skills — and acknowledged it can’t see the dangerous tail. The question is: what does the trust signal look like at the moment an agent is executing?
Runtime behavioral trust scoring works differently from static analysis. Instead of asking “does this skill’s code match its declaration?” it asks a continuous set of questions during execution:
- Is this agent accessing resources outside its declared scope?
- Is this agent’s action pattern consistent with its historical baseline?
- Is this agent communicating with endpoints not present in its manifest?
- Is this agent’s token consumption pattern anomalous for its stated task?
- Is this agent modifying its own instructions or those of downstream agents?
These signals are continuous. A trust score built on them drops as soon as behavior changes. A skill that passed static analysis and operated cleanly for thirty days produces a different runtime signal than a skill that starts exfiltrating credentials on day thirty-one. Static analysis gives you a snapshot. Runtime monitoring gives you a stream.
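As a minimal sketch of what such a stream could look like, the class below tracks only two of the signals above (out-of-scope resources and undeclared endpoints). The class name, the penalty and recovery constants, and the scoring rule are all assumptions chosen for illustration, not a published methodology.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuntimeTrust:
    declared_endpoints: set[str]
    declared_resources: set[str]
    score: float = 1.0
    penalty: float = 0.5     # assumed: multiply the score on each violation
    recovery: float = 0.01   # assumed: add back per clean event, capped at 1.0

    def observe(self, endpoint: Optional[str] = None, resource: Optional[str] = None) -> float:
        """Update the score from one observed action and return it."""
        violation = (
            (endpoint is not None and endpoint not in self.declared_endpoints)
            or (resource is not None and resource not in self.declared_resources)
        )
        if violation:
            self.score *= self.penalty
        else:
            self.score = min(1.0, self.score + self.recovery)
        return self.score


trust = RuntimeTrust(declared_endpoints={"api.weather.example"}, declared_resources={"cache"})
for _ in range(30):  # thirty clean observations
    trust.observe(endpoint="api.weather.example")
clean_score = trust.score
after_exfil = trust.observe(endpoint="collector.example", resource="secret:api_key")
print(clean_score, after_exfil)  # 1.0 0.5
```

The asymmetry is deliberate in this sketch: trust falls quickly on a violation and recovers slowly, so thirty clean days do not buy a skill much cover on day thirty-one.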
The Unit42 BIV data is the strongest third-party evidence to date that the snapshot is insufficient. 250,706 behavioral deviations across 49,943 skills tell you the ecosystem has a systematic declaration problem. The explicit methodology admission tells you that the solution to the declaration problem cannot itself be declarative. You need the stream.
What this means for agent deployments today
If your infrastructure runs agent skills — MCP servers, OpenClaw tools, custom agent pipelines — the Unit42 data has a direct operational implication: the skills you’re running have probably not been verified against their declared behavior, and static scanning won’t catch the most dangerous ones even if you run it.
A few concrete steps:
Audit your agent skill declarations. Start by comparing what your running skills say they do against what network traffic, system calls, and API access logs show they actually do. The gap is the risk surface. You can run a structural scan against any npm-distributed skill:
npx proof-of-commitment npm <your-skill-package>
# For MCP servers
npx proof-of-commitment mcp-remote <server-url>
# Web UI
# https://getcommit.dev/audit
Add behavioral gates to your CI pipeline. Structural risk flags (anomalous dependency additions, publishing pattern changes, maintainer transfers) show up before compromised skills reach production. We published a 5-minute CI integration that puts these flags in PR comments.
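For a sense of what a structural gate checks, here is a hedged sketch that compares two metadata snapshots of the same package. It is not the CI integration mentioned above. The snapshot shape (`dependencies` and `maintainers` keys), the package names, and the function name are assumptions for illustration.

```python
def structural_flags(previous: dict, current: dict) -> list[str]:
    """Compare two metadata snapshots and list changes worth a human look."""
    flags = []
    new_deps = sorted(set(current["dependencies"]) - set(previous["dependencies"]))
    if new_deps:
        flags.append("new dependencies: " + ", ".join(new_deps))
    if set(current["maintainers"]) != set(previous["maintainers"]):
        flags.append("maintainer set changed")
    return flags


# Hypothetical snapshots of one skill package before and after an update.
previous = {"dependencies": ["axios"], "maintainers": ["alice"]}
current = {"dependencies": ["axios", "left-padx"], "maintainers": ["mallory"]}

flags = structural_flags(previous, current)
for flag in flags:
    print(flag)
```

In a pipeline, a non-empty flag list would typically fail the job or post a review comment. Which of those is appropriate depends on how noisy the flags are for your dependency set.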
Don’t rely on marketplace verification. The OpenClaw ecosystem is not the only place this applies. We documented 9 of 11 MCP marketplaces accepting a malicious server without detection. The Unit42 data confirms this isn’t an MCP-specific problem — it’s a declaration-layer problem. Any ecosystem that trusts manifests over behavior has the same exposure.
Plan for L4. The agent behavioral monitoring layer is thin right now. That’s not because the problem is solved — it’s because the tooling hasn’t caught up with the deployment curve. Unit42’s explicit acknowledgment that static analysis has a hard ceiling is a signal that the industry knows this gap exists. Plan for monitoring infrastructure before your agent deployment scales past the point where manual review is possible.
The 80% figure will age badly in one of two directions. Either the ecosystem invests in L4 monitoring and the deviation rate drops as adversarial skills get caught faster — or the deviation rate climbs as agent deployments scale faster than detection. Unit42’s data is a snapshot. The dynamic depends on whether the industry treats L3 as sufficient or as the floor.
The methodology admission says it’s the floor.
Source: Palo Alto Networks Unit42 / arXiv 2605.11770, “Behavioral Integrity Verification for AI Agent Skills,” May 2026. 49,943 OpenClaw skills analyzed. Stats: 39,933 (80.0%) with ≥1 behavioral deviation; 250,706 total deviations; 18.9% adversarial intent; 2,490 (5.0%) multi-stage attack chains; credential theft largest adversarial leaf (8.2% of classified deviations); instruction-level threats highest adversarial category (96% adversarial fraction). Limitations: “BIV is static-only, so dynamic dispatch, reflection, and obfuscated payloads escape AST-level extraction.”