How Commit Scores npm Packages

The methodology behind getcommit.dev/audit — five behavioral dimensions, all public data, deterministic CRITICAL flag. Inspectable, debatable, reproducible.

In October 2021, ua-parser-js was compromised. At the time it had roughly 8 million downloads per week, and npm audit showed zero issues. The structural risk of a sole maintainer controlling a widely used package was visible in public registry data long before anyone filed a CVE (CVE-2021-4229). This article explains how behavioral commitment scoring identifies that risk.

When I published the supply chain risk analysis, the most common question was: "How does your scoring actually work? Show me the math."

Fair question. If you're going to trust a tool with your dependency decisions, you should be able to inspect, debate, and reject specific choices in the methodology. This is that article.

The Problem: npm audit Answers the Wrong Question

npm audit is a CVE scanner. It checks a package's version against a database of known vulnerabilities. When a CVE is filed, catalogued, and propagated, your tool will catch it.

That's useful. But it answers the wrong question for a specific class of supply chain risk.

The question that matters is: what is the structural likelihood that this package becomes a future attack vector?

Known CVEs are the output of an attack. What we can observe before the attack are the conditions that made it possible:

  • Single person controlling the publish credentials for a package with 100M weekly downloads
  • No corporate backing — one compromised GitHub account is a supply chain event
  • High download trend attracting attacker attention
  • Long project age with accumulated legacy access and inertia

Before the October 2021 ua-parser-js attack, running npm audit on a project that depended on ua-parser-js returned:

found 0 vulnerabilities

The behavioral commitment score would have returned:

ua-parser-js  sole maintainer  ~8M downloads/week  🔴 CRITICAL

The difference isn't that one tool was smarter. It's that they answer different questions.

The Five Scoring Dimensions

Every package gets scored on five behavioral dimensions. All inputs are public data from the npm registry and GitHub API — no scraping, no proprietary data sources.

1. Longevity (25 points)

What it measures: Project age, weighted by consistency.

Why it matters: Older projects have accumulated more dependents, more integration depth, and more attack interest. A 12-year-old package embedded in thousands of production systems is a different risk profile than a 6-month-old experimental library.

Scoring: Full marks (25/25) for packages with 10+ years of consistent maintenance. Scales down for younger projects or projects with significant inactive periods.

axios in practice: 11.6 years old → 25/25

Note: high longevity is not inherently risky. It's the combination of longevity + single maintainer + high downloads that creates the dangerous profile.
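As a rough sketch of how that rule could be implemented (the article fixes only the endpoint, 25/25 at ten-plus consistent years, so the linear ramp and the `consistencyFactor` parameter here are my assumptions, not the tool's actual code):

```javascript
// Longevity: 25 points max, full marks at 10+ years of consistent
// maintenance. The linear ramp below 10 years and the consistency
// multiplier are illustrative assumptions.
function scoreLongevity(ageYears, consistencyFactor = 1.0) {
  const base = Math.min(ageYears / 10, 1) * 25;
  return Math.round(base * consistencyFactor);
}

console.log(scoreLongevity(11.6)); // axios: 25
console.log(scoreLongevity(0.5));  // 6-month-old library: 1
```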

2. Download Momentum (25 points)

What it measures: Download trend direction, not raw count.

Why it matters: A package with 100M weekly downloads and a declining trend is a different risk than one with 100M and a growing trend. Growing packages are attracting more attention — from users and attackers both.

Scoring: Full marks for packages with growing or stable trends at high volume. The raw download count matters (it sets the "blast radius"), but trend direction matters more for predictive scoring.

axios in practice: 101M/week, growing → 25/25
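A sketch under stated assumptions — the article fixes only the shape of the rule (trend direction dominates; raw volume sets the blast radius), so the specific deductions for declining or unknown trends are illustrative:

```javascript
// Download momentum: 25 points max. Trend direction is the primary
// signal; point values for declining/unknown trends are assumptions.
function scoreDownloadMomentum(trend) {
  const points = { growing: 25, stable: 25, declining: 15 };
  return points[trend] ?? 10; // unknown trend: conservative score
}

console.log(scoreDownloadMomentum("growing")); // axios: 25
```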

3. Release Consistency (20 points)

What it measures: Regularity of releases over time, recency of last publish.

Why it matters: Packages with consistent release cadences signal active, engaged maintainers. Packages that haven't released in 12+ months while maintaining high traffic are "zombie" packages — still widely depended on, but potentially unmaintained, with old access still live.

Scoring: Full marks for packages releasing regularly (monthly or better). Scaled down for packages with 90+ day gaps. Heavy deductions for 12+ months of inactivity while the package still sees significant traffic.

axios in practice: Last published 6 days ago, consistent history → 20/20

Contrast — chalk: Last published 171 days ago → 13/20
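The thresholds above translate directly into a lookup on days since last publish. The cutoffs (monthly cadence, 90-day gap, 12-month zombie line) come from the text; the intermediate point values are my assumptions, chosen so the two published examples come out right:

```javascript
// Release consistency: 20 points max. Thresholds are from the article;
// intermediate point values are assumptions fitted to the examples.
function scoreReleaseConsistency(daysSinceLastPublish) {
  if (daysSinceLastPublish <= 30) return 20;  // monthly or better
  if (daysSinceLastPublish <= 90) return 17;
  if (daysSinceLastPublish <= 365) return 13; // e.g. chalk at 171 days
  return 5; // 12+ months inactive while still heavily downloaded
}

console.log(scoreReleaseConsistency(6));   // axios: 20
console.log(scoreReleaseConsistency(171)); // chalk: 13
```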

4. Maintainer Depth (15 points)

What it measures: The number of distinct maintainers with npm publish access.

Why it matters: This is the key signal for the CRITICAL risk flag.

A sole maintainer controlling a package with massive download volume creates a single point of failure. One compromised npm token, one phished account, one person's bad day — and millions of weekly downloads receive a malicious update. The LiteLLM attack (March 2026) and the ua-parser-js attack (October 2021) both followed this pattern exactly.

Scoring:

Maintainers   Score
1 (sole)       4/15
2              7/15
3–4           10/15
5–9           12/15
10–14         14/15
15+           15/15

Single maintainer scores 4/15 — the lowest non-zero score. It's intentionally low because the credential-compromise risk is structural, not speculative.

axios in practice: 1 maintainer → 4/15
express in practice: 5 maintainers → 12/15
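Because this dimension is a pure lookup, it reproduces directly (the function name is mine; the breakpoints are exactly the published table):

```javascript
// Maintainer depth: 15 points max, per the published table.
// A sole maintainer is deliberately the floor score.
function scoreMaintainerDepth(maintainers) {
  if (maintainers >= 15) return 15;
  if (maintainers >= 10) return 14;
  if (maintainers >= 5) return 12;
  if (maintainers >= 3) return 10;
  if (maintainers === 2) return 7;
  return 4; // sole maintainer: structural credential-compromise risk
}

console.log(scoreMaintainerDepth(1)); // axios: 4
console.log(scoreMaintainerDepth(5)); // express: 12
```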

5. GitHub Backing (15 points)

What it measures: Organizational backing, community engagement, repository health signals.

Why it matters: Packages maintained under a corporate GitHub organization have different risk profiles than personal repos. An organization means multiple people have access, there are usually internal security practices, and there's institutional continuity if the primary maintainer leaves.

Scoring: Organization-backed repos score higher. Personal repos with high engagement score mid-range. Personal repos with declining engagement score lower.
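The article describes this dimension only qualitatively, so the sketch below is the loosest of the five: the three buckets come from the text, while every point value is my guess.

```javascript
// GitHub backing: 15 points max. Org-backed repos score highest;
// engaged personal repos mid-range; declining personal repos lower.
// All point values are illustrative assumptions.
function scoreGithubBacking(orgBacked, engagement) {
  if (orgBacked) return 15;
  return engagement === "high" ? 10 : 6;
}

console.log(scoreGithubBacking(true, "high")); // org-backed repo: 15
```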

The CRITICAL Flag

A package is flagged CRITICAL when both conditions are true:

  1. Single maintainer (maintainerDepth = 4/15)
  2. >10M weekly downloads

Both conditions must hold. The threshold is explicit and deterministic — you can reproduce the flag yourself from npm registry data.

The reasoning: >10M weekly downloads is the point where a compromised package becomes a supply chain event. 16 of the 41 npm packages with >10M weekly downloads have a single maintainer. Together: 2.82 billion downloads per week.
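Since the flag is deterministic, it is a one-liner to reproduce (the function name is mine; the two conditions are exactly the published ones):

```javascript
// CRITICAL flag: sole maintainer AND >10M weekly downloads.
// Both conditions must hold; either alone is not enough.
function isCritical(maintainers, weeklyDownloads) {
  return maintainers === 1 && weeklyDownloads > 10_000_000;
}

console.log(isCritical(1, 100_837_905)); // axios: true
console.log(isCritical(2, 123_000_000)); // react: false
console.log(isCritical(1, 2_000_000));   // low-volume solo package: false
```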

Walking Through a Real Scoring: axios

curl -X POST https://poc-backend.amdal-dev.workers.dev/api/audit \
  -H "Content-Type: application/json" \
  -d '{"packages": ["axios"]}'

Response (April 2026):

{
  "name": "axios",
  "ecosystem": "npm",
  "score": 89,
  "maintainers": 1,
  "weeklyDownloads": 100837905,
  "ageYears": 11.6,
  "trend": "growing",
  "daysSinceLastPublish": 6,
  "riskFlags": ["CRITICAL"],
  "scoreBreakdown": {
    "longevity": 25,
    "downloadMomentum": 25,
    "releaseConsistency": 20,
    "maintainerDepth": 4,
    "githubBacking": 15
  }
}
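One property of the response worth noting: in this worked example the five dimension scores sum exactly to the overall score, which suggests the total is a plain sum of the breakdown:

```javascript
// The five dimension scores from the axios response sum to the
// reported overall score of 89.
const scoreBreakdown = {
  longevity: 25,
  downloadMomentum: 25,
  releaseConsistency: 20,
  maintainerDepth: 4,
  githubBacking: 15,
};
const total = Object.values(scoreBreakdown).reduce((a, b) => a + b, 0);
console.log(total); // 89
```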

Score interpretation: 89/100 looks healthy. Most "package health" tools would pass it with flying colors. The project is 11.6 years old (full longevity), actively downloaded with a growing trend (full momentum), consistently releasing (full consistency), and well-backed on GitHub (full backing).

The CRITICAL flag comes entirely from one number: maintainerDepth: 4/15.

Comparison Table

Package   Score  Maintainers  Weekly Downloads  Risk
axios      89    1            101M              🔴 CRITICAL
zod        83    1            159M              🔴 CRITICAL
chalk      75    1            411M              🔴 CRITICAL
react      91    2            123M              ✅ No flag
express    97    5             93M              ✅ No flag

Validation: How the Scores Performed Before the Attacks

The ua-parser-js Attack (October 2021)

Structural profile (before attack): CRITICAL — sole maintainer, ~8M downloads/week, no organizational backing.

npm audit (before attack): found 0 vulnerabilities

The attack followed the exact pattern the profile predicted: credential compromise (account phishing), malicious versions published with credential harvesting and crypto mining code.

The LiteLLM Attack (March 2026)

Same structural profile: sole maintainer, 10M+ weekly downloads on the PyPI side. CRITICAL by behavioral scoring, while CVE-based audit tooling reported the package clean.

What the Scores Don't Tell You

CRITICAL packages that never get attacked will always outnumber the ones that do. The score identifies exposure, not certainty. Most sole-maintained packages are run by talented, security-conscious people who never become targets.

A high overall score with a CRITICAL flag can be misleading. axios at 89/100 looks like a model of open source health. One person's credentials stand between 100M weekly installs and a supply chain event. This is precisely what makes the risk insidious.

The methodology weights are a first pass, not ground truth. I weighted maintainerDepth at 15 points total. A reasonable argument exists for weighting it differently. The weights are published, the logic is open, the API returns full breakdown. If you'd weight things differently, I want to know.

The score doesn't cover behavioral changes over time. A package that was maintained by a 5-person team for 10 years but just lost 4 of those maintainers gets the same maintainerDepth score as one that's always been sole-maintained. The current implementation is a snapshot, not a trajectory.

The Short Version

Longevity           25 pts  — project age + consistency
Download Momentum   25 pts  — trend direction at current volume
Release Consistency 20 pts  — release cadence + recency
Maintainer Depth    15 pts  — credential concentration risk
GitHub Backing      15 pts  — organizational support + engagement

CRITICAL = sole maintainer + >10M weekly downloads

Try it on your own stack:

npx proof-of-commitment --file package.json

Or in the browser: getcommit.dev/audit

Source: github.com/piiiico/proof-of-commitment
Web audit: getcommit.dev/audit
Live watchlist: getcommit.dev/watchlist