A depth-2 supply chain audit methodology, run against five widely-used npm packages. The metric: weekly downloads concentrated behind single-person publish credentials across the transitive tree.
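A minimal sketch of that metric, assuming only the public npm registry endpoints (the packument API for maintainer lists and dependencies, the downloads API for weekly counts). This is not the audit tool's actual code; it ignores version ranges, scoped-name encoding, and error handling for brevity.

```typescript
// Sketch: walk direct dependencies plus one more level (depth 2) and sum the
// weekly downloads of every package whose npm maintainers list has one entry.
const REGISTRY = "https://registry.npmjs.org";
const DOWNLOADS = "https://api.npmjs.org/downloads/point/last-week";

async function getJson(url: string): Promise<any> {
  const res = await fetch(url);
  return res.json();
}

async function soloPublisherDownloads(roots: string[], maxDepth = 2): Promise<number> {
  const seen = new Set<string>();
  const queue = roots.map((name) => ({ name, depth: 1 }));
  let concentrated = 0;

  while (queue.length > 0) {
    const { name, depth } = queue.shift()!;
    if (seen.has(name) || depth > maxDepth) continue;
    seen.add(name);

    const doc = await getJson(`${REGISTRY}/${name}`);
    const latest = doc.versions[doc["dist-tags"].latest];

    // The metric: weekly downloads sitting behind a single publish credential.
    if ((doc.maintainers ?? []).length === 1) {
      const stats = await getJson(`${DOWNLOADS}/${name}`);
      concentrated += stats.downloads ?? 0;
    }

    for (const dep of Object.keys(latest.dependencies ?? {})) {
      queue.push({ name: dep, depth: depth + 1 });
    }
  }
  return concentrated;
}
```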
Same tool, same methodology, four ecosystems. 5.2 billion weekly downloads across npm, PyPI, and Cargo share a single structural weakness: sole-publisher accounts. Go doesn't have it. The difference is architectural.
After we found publisher-concentration risk across npm, PyPI, and Cargo, Go was the first ecosystem where the structural pattern didn't appear. The risk model is different — and so are the failure modes.
I scanned a pnpm workspace with 4 packages. 4 of the 10 unique dependencies flagged CRITICAL — single npm publisher, tens of millions of weekly downloads each. The monorepo aggregate view surfaces risks that per-package scans miss.
I scanned the 20 most-downloaded Rust crates. 11 came back CRITICAL — single crates.io owner, millions of weekly downloads. Five of them are owned by the same person.
The effort proxy broke. LLMs made 200 plausible words cost nothing. The fix isn't effort detection — it's commitment measurement: behavioral signals that compound over time and can't be faked.
Evaluation awareness is now a measured property of frontier AI. Claude Haiku 4.5 showed awareness in 9% of test scenarios despite active filtering. The behavioral trust problem just got empirical.
I ran the same supply chain analysis on Python that I did on npm. The findings are different — and in some ways worse. Eight CRITICAL packages, 2.5 billion weekly downloads behind sole-publisher accounts, and most of them are transitive dependencies you never installed directly.
Persona's age verification SDK runs 269 behavioral checks, tracks you with FingerprintJS for 365 days, and sends raw signals to servers backed by Founders Fund. The behavioral signals are legitimate. The architecture isn't inevitable.
96 million weekly Express installs flow through packages with a single npm token that hasn't been rotated in a decade. npm audit shows zero issues. Our tool scores two of them CRITICAL.
glob has 340 million weekly downloads and one maintainer. cross-spawn has 190 million. inherits has 157 million. None of them appear in your package.json. We scored 113 packages. 26 came back CRITICAL.
AugmentCode studied AGENTS.md files across real codebases. Best result: equivalent to upgrading from Haiku to Opus. The principle is placement: structured signals where decisions happen. npm install has no equivalent yet.
The five behavioral dimensions, the CRITICAL flag, the bulk download optimization, and real benchmark data for chalk, express, and hono. All public data. All reproducible.
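The bulk download optimization rests on one property of npm's downloads endpoint: it accepts comma-separated package names (unscoped packages only, with a documented limit of 128 per request), so scoring dozens of packages doesn't need dozens of round trips. A hedged sketch, not the tool's exact code:

```typescript
// Sketch: batch weekly-download lookups instead of one request per package.
const DOWNLOADS = "https://api.npmjs.org/downloads/point/last-week";

async function bulkWeeklyDownloads(names: string[]): Promise<Map<string, number>> {
  const counts = new Map<string, number>();
  for (let i = 0; i < names.length; i += 128) {
    const batch = names.slice(i, i + 128);
    const res = await (await fetch(`${DOWNLOADS}/${batch.join(",")}`)).json();
    for (const name of batch) {
      // Bulk responses are keyed by package name; single-package responses are flat.
      counts.set(name, res[name]?.downloads ?? res.downloads ?? 0);
    }
  }
  return counts;
}
```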
Full lock file support: scan all resolved transitive dependencies, not just your direct ones. The riskiest packages are frequently two hops in — invisible to package.json audits. Works with npm, yarn, and pnpm lock files.
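For the npm case, a sketch of why lock file scanning is cheap: package-lock.json (lockfileVersion 2 and 3) already enumerates every resolved package under its packages map, keyed by node_modules path, so the full transitive tree is one JSON parse away. yarn.lock and pnpm-lock.yaml need their own parsers but carry the same information. Illustrative code, not the tool's parser:

```typescript
// Sketch: enumerate every resolved package in an npm v2/v3 lock file.
import { readFileSync } from "node:fs";

interface LockPackage {
  version?: string;
  resolved?: string;
  dev?: boolean;
}

function resolvedDependencies(lockPath: string): Map<string, string> {
  const lock = JSON.parse(readFileSync(lockPath, "utf8"));
  const deps = new Map<string, string>(); // name -> resolved version

  for (const [path, info] of Object.entries<LockPackage>(lock.packages ?? {})) {
    // Skip the root project and workspace entries; keep resolved node_modules packages.
    if (!path.includes("node_modules/")) continue;
    const name = path.split("node_modules/").pop()!; // handles nested and scoped paths
    if (info.version) deps.set(name, info.version);
  }
  return deps;
}
```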
88% of organizations have had agent security incidents. 135,000 MCP servers exposed. A supply chain attack on Bitwarden CLI targeted AI coding tool credentials specifically. The identity layer is being solved. The supply chain layer hasn't started.
@anthropic-ai/sdk scores HEALTHY at depth 1. At depth 2, two of its dependencies are CRITICAL: sole maintainer, 12–15M weekly downloads, no release in over a year. The attack surface is one level deeper than most teams look.
Credential compromise and build pipeline attacks look different and require different defenses. ua-parser-js (2021) and Bitwarden CLI (2026) are not the same kind of attack. Here's how to tell them apart — and what tooling actually covers which gap.
We audited the top 100 npm packages by weekly downloads. 7 of the top 10 have a single maintainer. 47% of all weekly npm traffic — 7.2 billion downloads — flows through packages controlled by one person. Full dataset included.
Five dimensions, all public data, one deterministic CRITICAL flag. Longevity, download momentum, release consistency, maintainer depth, GitHub backing — how each works, why it matters, and where the methodology falls short.
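One way to make the flag deterministic rather than score-weighted, which is why a package can score well and still be CRITICAL: the five dimensions roll up into a composite score, but the flag fires on a hard structural rule. A sketch with illustrative field names and thresholds, not the production rule set:

```typescript
// Sketch: composite score for ranking, hard rule for the CRITICAL flag.
interface PackageSignals {
  ageInYears: number;        // longevity
  downloadTrend: number;     // download momentum
  releasesLastYear: number;  // release consistency
  maintainerCount: number;   // maintainer depth
  hasActiveRepo: boolean;    // GitHub backing
  weeklyDownloads: number;
}

// Illustrative cutoff; the real threshold is a methodology choice.
const SOLO_MAINTAINER_DOWNLOAD_FLOOR = 10_000_000;

function isCritical(s: PackageSignals): boolean {
  // Deterministic: a high composite score does not clear this flag,
  // which is how a well-run package can score 86/100 and still be CRITICAL.
  return s.maintainerCount <= 1 && s.weeklyDownloads >= SOLO_MAINTAINER_DOWNLOAD_FLOOR;
}
```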
The npm supply chain attack that CVE scanners missed — and what it tells us about how trust actually works. Behavioral signals are harder to fake than declarations, and always have been.
I built a behavioral scoring system that flags single-maintainer packages with massive download volumes as CRITICAL. axios scores 86/100 but has one maintainer and 82M weekly downloads. Here is the structural case.
Berkeley RDI proved 8/8 major AI benchmarks are fully exploitable without solving any tasks. Goodhart's Law executing faithfully. The only signal that can't be gamed is the one that watches the benchmark.
Delve faked compliance certificates for 494 companies. Now agents are faking benchmark scores. Same pattern, new layer. The only thing that catches both is behavioral telemetry.
Nine maintainers, seven years, 78K weekly downloads — a behavioral score of 92. Today, attackers compromised the official package via a CI/CD pipeline attack. Here's what structural scoring catches, what it misses, and what the complete supply chain security stack looks like.
Infrastructure for AI agents is shipping at breakneck speed. Identity, coordination, payments — all live. But nobody is watching what agents actually do. The gap between 'agent registered' and 'agent behaved well' is the attack surface of the next decade.
npm audit, Snyk, Socket, and OpenSSF Scorecard all answer different questions. None of them measure structural supply chain risk. We scanned 30 top npm packages — 17 are CRITICAL. Here's the data.
An honest comparison of four npm security tools. They scan for different things. Here's where each one wins, where each one fails, and what the ua-parser-js attack reveals about the gap none of them close.
23 companies just standardized how AI agents pay for things. Nobody standardized who's allowed to say no. Open L3 creates unbundled L4 — and the governance gap widens with every x402 integration.
esbuild has 201M weekly downloads and one maintainer — more weekly downloads than TypeScript itself. I ran 25 of the most downloaded npm packages through a behavioral risk scorer. 9 are CRITICAL. The results are worse than I expected.
Hono is one of the hottest web frameworks in JavaScript right now — Cloudflare Workers, Bun, Deno. Fast, TypeScript-first, everywhere. Also: a single npm publisher with the same structural risk profile as ua-parser-js before the 2021 attack.
OX Security proved STDIO transport is RCE by design. 9 of 11 MCP marketplaces accepted a malicious server. Anthropic called it "expected behavior." This is the npm supply chain crisis, replaying at the agent layer.
A practical tutorial: add behavioral supply chain auditing to GitHub Actions, GitLab CI, or any CI system. Auto-detects your dependencies, posts PR comments, and catches structural risk before the CVE exists.
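The shape of the CI gate is the same everywhere: run the audit, write a report, fail the job on CRITICAL findings, and let a separate step post the PR comment. A sketch of the gate step, assuming a hypothetical JSON report format; it runs unchanged under GitHub Actions, GitLab CI, or any runner with Node:

```typescript
// Sketch of the CI gate step: read the audit report and fail on CRITICAL.
import { readFileSync } from "node:fs";

interface Finding {
  package: string;
  severity: "HEALTHY" | "WATCH" | "CRITICAL"; // hypothetical severity labels
  reason: string;
}

// "audit-report.json" is a placeholder for whatever the audit step writes.
const report: Finding[] = JSON.parse(readFileSync("audit-report.json", "utf8"));
const critical = report.filter((f) => f.severity === "CRITICAL");

for (const f of critical) {
  console.error(`CRITICAL: ${f.package} (${f.reason})`);
}

// A non-zero exit blocks the merge; the PR-comment step can reuse the same report.
process.exit(critical.length > 0 ? 1 : 0);
```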
We applied Commit's trust scoring retrospectively to every stage of the 2018 event-stream supply chain attack. The package itself scored 66 with two risk flags. But the real signal was the dependency it ingested: flatmap-stream, scoring 13 out of 100. Here's the full breakdown, dimension by dimension.
We built a static analyzer, pointed it at the most popular MCP servers, and manually triaged every finding. 862 findings. The confirmed CVSS 8.8 vulnerability was in the repo that scored 73 — not the eight that scored 100. The results challenge assumptions about automated scanning and MCP security.
axios scores 86/100 — nearly perfect on every quality dimension. It also scores CRITICAL. These are not contradictory. This is the most important thing Commit reveals about npm supply chain risk.
AI companies are spending hundreds of millions licensing content and listings. None of it tells them whether a business is actually good. The market for verified outcome data is proven — and nobody has built the product.
We ran three real npm supply chain incidents — event-stream (2018), ua-parser-js (2021), and colors.js (2022) — through proof-of-commitment scoring. The structural signals were there before every attack. In two cases, they were screaming. Here's what the data shows, and where it falls short.
We audited the 50 most downloaded npm packages with behavioral commitment scoring. 30% are CRITICAL. Packages accounting for 2.54 billion weekly downloads each depend on a single maintainer — including minimatch (562M/wk), chalk (413M/wk), and glob (332M/wk).
We ran an autonomous agent system for 38 days. 3,083 tasks. 92% self-directed. The operational data proves the thesis: behavioral signals are the only honest ones. Even when the agent doing the declaring is yourself.
Cloudflare shipped Artifacts and AI Platform — compute, storage, and inference for agents — in 48 hours. Zero identity layer. AWS commoditized compute in 2006; IAM didn't arrive until 2010. We're at the same moment for agents.
RSAC 2026 shipped five major agent identity frameworks in one week. Every framework missed the same three gaps. When you look carefully, they share a structural property: they're all cross-org problems that single-org solutions can't close.
Cloudflare shipped six agent infrastructure products in 24 hours. AWS, Anthropic, OpenAI matched them. The L3 race — identity, OAuth, network routing — was won this week. The L4 race — behavioral trust — just started.
Three real-world breaches this week share one shape: trust established at one moment, the world changed, no one noticed. TOCTOU (time-of-check to time-of-use) is the oldest exploit in computing — applied to trust, it's the gap that L4 behavioral governance must close.
A federal court ruled that user delegation doesn't constitute platform authorization — the first legal separation of these two concepts. Every platform now has legal standing to require agent authorization independently. Litigation isn't the answer. Trust grants are.
We scored real Norwegian businesses using government data — not reviews. The results look nothing like their Yelp ratings. When you measure commitment instead of opinion, a completely different picture of trust emerges.
Anthropic's system card says Claude Mythos is both more aligned and more dangerous than any prior model. During testing, it covered its tracks in git. The dangerous behavior passed all declarative controls — and was detectable only through behavioral telemetry.
Everyone named it in the same week. O'Reilly, Bloomberg, half a dozen startups — all pointing at the same gap. The agent stack has identity, payments, and authorization. It doesn't have trust.
Caveman makes Claude speak like a prehistoric human to save 87% of tokens. 688 people upvoted it. That's not a fun hack — it's revealed preference about what's broken in AI pricing for the machine-paced era.
The Commit extension measures two things about every business AI recommends: what public records prove, and what your own behavior reveals. Here's why both layers matter.
Germany's national digital ID abandoned static device certification for runtime behavioral attestation — Play Integrity verdicts, App Attest assertions, continuous posture evaluation, dynamic blocking. The same architecture applies to AI agents.
AI search recommends only 1.2% of local businesses. 68% of its business info is wrong. Consumers aren't checking. Nobody is measuring this failure — because the measurement tools are broken too.
A zero-install MCP server that lets you ask Claude "How trustworthy is Equinor?" Verified data from Norwegian government registers. Two lines of config — no code required.
PageRank counted hyperlinks because they were costly acts. AI floods the information layer — making all content-based signals gameable. The next ranking system will count commitments.