Commit Trust Scoring Specification v1.0.0

01

Design Principles

Structural over behavioral

We score the container, not the contents. Commit does not analyze source code, scan for malware, or check for known CVEs. It measures the structural conditions that make a package a high-value target — or a resilient dependency.

Predictive over reactive

The goal is to identify targets before compromise, not to detect active attacks. A sole maintainer controlling 100M weekly downloads is a structural risk regardless of whether an attack has occurred.

Complementary to CVE scanning

Commit operates on different data than Snyk, Socket, or npm audit. Those tools detect known vulnerabilities and malicious code. Commit maps the structural conditions that produce the next vulnerability. Use both.

Open methodology, reproducible scores

Every weight and threshold is published in this specification. All input data comes from public APIs (npm registry, PyPI, GitHub). Given the same input data, the same score must be produced.

No declarations, only behavior

Badges, certifications, and self-reported security practices are excluded. Only observable, verifiable behavioral signals are scored. A package that claims best practices but has one maintainer and no releases in two years is scored on the latter.

02

Package Scoring: npm & PyPI

Package scores range from 0 to 100. Five dimensions are evaluated independently, then summed; their maximums (25 + 25 + 20 + 15 + 15) total 100. The score reflects structural trust: the degree to which the package's maintenance structure is resilient to single points of failure.

25 pts

Longevity

Time is a cost signal. A package that has existed for six years has survived shifts in ecosystem preference, maintainer fatigue, and the natural decay of abandoned projects. Longevity is the strongest Lindy indicator available from registry metadata.

Package age	Points
≥ 6 years	25
4–6 years	20
2–4 years	14
1–2 years	8
0.5–1 year	4
< 6 months	1
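The Longevity table works as a descending step function. The Python sketch below is one way to express it. It assumes age is supplied in fractional years and that each range includes its lower bound, so exactly 4.0 years scores 20 and exactly 6.0 years scores 25. The table does not state this boundary rule explicitly, and `longevity_points` is an illustrative name, not part of the specification.

```python
# Package Longevity dimension (max 25 pts), sketched as a step function.
# Assumption: each age range includes its lower bound and excludes its
# upper bound; the specification's table does not state this explicitly.
LONGEVITY_STEPS = [  # (minimum age in years, points), checked from the top
    (6.0, 25),
    (4.0, 20),
    (2.0, 14),
    (1.0, 8),
    (0.5, 4),
]


def longevity_points(age_years: float) -> int:
    """Return Longevity points for a package that is `age_years` old."""
    for min_years, points in LONGEVITY_STEPS:
        if age_years >= min_years:
            return points
    return 1  # younger than six months


print(longevity_points(11.0), longevity_points(3.9), longevity_points(0.25))  # 25 14 1
```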

25 pts

Download Momentum

Download volume measures ecosystem adoption. But absolute count alone rewards incumbency. Download Momentum combines volume with a 90-day trend analysis — rewarding growing adoption and penalizing decline. The trajectory matters as much as the snapshot.

The base score is determined by weekly download volume (npm) or daily download volume (PyPI). A trend modifier of ±3 points is then applied based on the 90-day trajectory. The trend is the ratio of downloads in the most recent 45 days to downloads in the earliest 45 days of that window: growing if the ratio is > 1.15, declining if < 0.85, and stable otherwise. The final score is clamped to 0–25.

npm (weekly downloads)

Volume	Base
≥ 1,000,000	22
≥ 100,000	18
≥ 10,000	14
≥ 1,000	10
≥ 100	6
≥ 10	3
< 10	0

PyPI (daily downloads)

Volume	Base
≥ 5,000,000	22
≥ 500,000	18
≥ 50,000	14
≥ 5,000	10
≥ 500	6
≥ 50	3
< 50	0

Trend modifier

Trend	Modifier
Growing	+3
Stable	0
Declining	−3
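The momentum rules combine a volume lookup, a trend classification, and a clamp. A minimal Python sketch follows. It assumes the caller supplies the ecosystem's volume figure (weekly downloads for npm, daily downloads for PyPI) plus 90 daily download counts ordered oldest first. The handling of an all-zero earliest window is an assumption, since the specification does not define the ratio in that case. Function and table names are illustrative.

```python
from typing import Sequence

# (minimum volume, base points), checked from the top; below the last row scores 0.
BASE_TABLES = {
    "npm": [(1_000_000, 22), (100_000, 18), (10_000, 14), (1_000, 10), (100, 6), (10, 3)],
    "pypi": [(5_000_000, 22), (500_000, 18), (50_000, 14), (5_000, 10), (500, 6), (50, 3)],
}
TREND_MODIFIER = {"growing": 3, "stable": 0, "declining": -3}


def classify_trend(daily_counts: Sequence[int]) -> str:
    """Compare the most recent 45 days with the earliest 45 days of a 90-day series."""
    if len(daily_counts) != 90:
        raise ValueError("expected 90 daily download counts, oldest first")
    earlier, recent = sum(daily_counts[:45]), sum(daily_counts[45:])
    if earlier == 0:
        # Assumption: the ratio is undefined here, so any new downloads count as growth.
        return "growing" if recent > 0 else "stable"
    ratio = recent / earlier
    if ratio > 1.15:
        return "growing"
    if ratio < 0.85:
        return "declining"
    return "stable"


def momentum_points(ecosystem: str, volume: float, daily_counts: Sequence[int]) -> int:
    """Base points from volume, plus the trend modifier, clamped to 0-25."""
    base = next((pts for minimum, pts in BASE_TABLES[ecosystem] if volume >= minimum), 0)
    return max(0, min(25, base + TREND_MODIFIER[classify_trend(daily_counts)]))


# A growing npm package with ~100M weekly downloads: 22 + 3 = 25.
print(momentum_points("npm", 100_000_000, [1_000] * 45 + [1_300] * 45))  # 25
```

The clamp matters only at the bottom of the range: a package below the lowest volume row with a declining trend would otherwise score −3.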

20 pts

Release Consistency

A healthy package publishes regularly. Total version count demonstrates sustained investment over time. Recency of the last publish signals whether the project is actively maintained or dormant.

Version count (base)

Versions	Base
≥ 100	15
≥ 30	12
≥ 10	9
≥ 3	6
≥ 1	3

Recency bonus

Last publish	Bonus
< 30 days	+5
30–90 days	+3
90–365 days	+1
> 365 days	0

Final score is clamped to 0–20.
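Release Consistency adds a recency bonus to a version-count base. In the Python sketch below, the band edges are an assumption: the recency ranges share endpoints, so this version places day 30 in the 30–90 band, and day 90 and day 365 in the 90–365 band. A package with zero published versions is assumed to score 0, because the table starts at one version. Names are illustrative.

```python
# Release Consistency dimension (max 20 pts).
VERSION_BASE = [(100, 15), (30, 12), (10, 9), (3, 6), (1, 3)]  # (minimum versions, base)


def recency_bonus(days_since_last_publish: float) -> int:
    # Assumed edges: <30 -> +5, 30 to <90 -> +3, 90 to 365 -> +1, >365 -> 0.
    if days_since_last_publish < 30:
        return 5
    if days_since_last_publish < 90:
        return 3
    if days_since_last_publish <= 365:
        return 1
    return 0


def release_points(version_count: int, days_since_last_publish: float) -> int:
    base = next((pts for minimum, pts in VERSION_BASE if version_count >= minimum), 0)
    return max(0, min(20, base + recency_bonus(days_since_last_publish)))


print(release_points(120, 12), release_points(40, 200))  # 20 13
```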

15 pts

Maintainer Depth

Credential concentration is the single strongest structural risk signal in package ecosystems. A sole maintainer is not a quality judgment — it means one set of credentials controls all publish access. The more maintainers, the more organizational resilience. This dimension measures bus factor and credential distribution.

Maintainers	Points
≥ 5	15
3–4	11
2	7
1	4
0	0

15 pts

GitHub Backing

When a package links to a GitHub repository, that repository provides independent signals about project health: contributor count, commit activity, release cadence, and community engagement. This dimension maps the linked repository's commitment score (0–100) to a 0–15 range.

github_backing = (github_repo_score / 100) × 15

If no linked repository: 0 points.

Score Composition

Dimension	Max points
Longevity	25
Download Momentum	25
Release Consistency	20
Maintainer Depth	15
GitHub Backing	15
Total	100
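Because the five dimensions are summed, a package score can be assembled from per-dimension results. The sketch below adds the Maintainer Depth table and the GitHub Backing formula, and it assumes the other three dimension scores have already been computed. The specification does not say whether GitHub Backing is rounded, so the sketch leaves it unrounded, which means totals may be fractional. All names are hypothetical.

```python
from typing import Optional


def maintainer_points(maintainers: int) -> int:
    """Maintainer Depth (max 15 pts): 5+ -> 15, 3-4 -> 11, 2 -> 7, 1 -> 4, 0 -> 0."""
    if maintainers >= 5:
        return 15
    if maintainers >= 3:
        return 11
    if maintainers == 2:
        return 7
    if maintainers == 1:
        return 4
    return 0


def github_backing(repo_score: Optional[float]) -> float:
    """Map a 0-100 repository score onto 0-15; no linked repository scores 0."""
    if repo_score is None:
        return 0.0
    return repo_score * 15 / 100  # left unrounded (assumption)


def package_score(longevity: int, momentum: int, release: int,
                  maintainers: int, repo_score: Optional[float]) -> float:
    return (longevity + momentum + release
            + maintainer_points(maintainers) + github_backing(repo_score))


# axios-like inputs: 25 + 25 + 20 + 4 + 15.0
print(package_score(25, 25, 20, 1, 100))  # 89.0
```

Treating the axios GitHub Backing of ~15/15 as a repository score of 100 reproduces its ~89 total; its single maintainer contributes only 4 of those points.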

03

Risk Flags

Risk flags operate independently of the numerical score. A package can score 89/100 and still carry a CRITICAL flag. This is by design: a high score means the package is well-established and widely adopted. A CRITICAL flag means one credential controls that entire surface area. Both statements are true simultaneously.

CRITICAL Single maintainer AND >10M weekly downloads

Maximum credential concentration at catastrophic blast radius. One compromised account affects the entire downstream dependency tree at infrastructure scale.

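Historical validation: event-stream (2018), ua-parser-js (October 2021), and colors.js (January 2022). Each was a widely used package whose publish access rested with a single maintainer. In event-stream, the maintainer handed publish rights to a new contributor who added malicious code; in ua-parser-js, the maintainer's npm account was hijacked; in colors.js, the maintainer deliberately published a sabotaged release. The structural signal was visible for years before each incident.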

HIGH Either condition:

Package age < 1 year AND >1M weekly downloads
Single maintainer AND >1M weekly downloads

Elevated structural risk. Either rapid adoption without time-tested stability, or significant credential concentration at meaningful scale.

WARN No publish in >365 days

The package has not published a new version in over one year. This may indicate abandonment, or may indicate a mature, stable library. Context matters. The flag ensures the condition is visible.
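The three flag rules can be evaluated directly from registry facts, independently of the score. In this Python sketch, "single maintainer" is read as exactly one listed maintainer, and download thresholds are applied to weekly downloads (a PyPI caller would need to supply a weekly figure). A package that meets the CRITICAL rule is not also reported as HIGH. The specification does not say how overlapping flags are reported, so that precedence is an assumption. The `PackageFacts` type is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PackageFacts:
    maintainers: int
    weekly_downloads: int
    age_years: float
    days_since_last_publish: float


def risk_flags(p: PackageFacts) -> List[str]:
    """Return risk flags; these never change the numerical score."""
    flags = []
    sole = p.maintainers == 1
    if sole and p.weekly_downloads > 10_000_000:
        flags.append("CRITICAL")
    elif p.weekly_downloads > 1_000_000 and (sole or p.age_years < 1):
        # Assumption: HIGH is suppressed when CRITICAL already applies.
        flags.append("HIGH")
    if p.days_since_last_publish > 365:
        flags.append("WARN")
    return flags


print(risk_flags(PackageFacts(1, 100_000_000, 11, 10)))  # ['CRITICAL']
print(risk_flags(PackageFacts(5, 93_000_000, 15, 10)))   # []
```

With axios-like facts (one maintainer, ~100M weekly downloads) the function reports CRITICAL even though the same package scores about 89.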

04

Repository Scoring: GitHub

When a package links to a GitHub repository, or when a repository is assessed directly, a separate scoring model applies. The dimensions differ from package scoring because the available signals differ.

30 pts

Longevity

Repository age	Points
≥ 5 years	30
3–5 years	22
1–3 years	14
0.5–1 year	7
< 6 months	2

25 pts

Recent Activity

Commits in the last 30 days. A direct measure of active development.

Commits (30 days)	Points
≥ 50	25
≥ 20	20
≥ 6	15
≥ 1	8
0	0

20 pts

Community

Contributor count. More contributors mean distributed knowledge, more review capacity, and a higher bus factor.

Contributors	Points
≥ 20	20
≥ 6	15
≥ 2	10
1	5

15 pts

Release Cadence

The number of stable (non-prerelease) releases among the repository's 10 most recent releases. Regular releases indicate active maintenance and a commitment to shipping.

Stable releases	Points
≥ 10	15
≥ 3	10
≥ 1	5
0	0

10 pts

Social Proof

Stargazer count. An imperfect but broadly available signal of ecosystem recognition. Weighted lowest because stars are gameable and noisy.

Stars	Points
≥ 10,000	10
≥ 1,000	8
≥ 100	5
≥ 10	2
< 10	0

Critical Penalty

If a repository is archived or has had no push events in 730+ days, the final score is multiplied by 0.5. This reflects terminal maintenance risk.

Repository Score Composition

Dimension	Max points
Longevity	30
Recent Activity	25
Community	20
Release Cadence	15
Social Proof	10
Total	100
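The repository model follows the same pattern: five step tables summed, then the Critical Penalty. The Python sketch below assumes the same lower-bound-inclusive reading of age ranges as the package tables. It also treats a repository with zero contributors as scoring 0 on Community (the table starts at 1) and does not round the penalized result. The function signature is illustrative.

```python
from typing import Sequence, Tuple

Steps = Sequence[Tuple[float, int]]


def step(value: float, table: Steps, default: int) -> int:
    """Return the points for the first row whose minimum `value` meets."""
    for minimum, points in table:
        if value >= minimum:
            return points
    return default


def repo_score(age_years: float, commits_30d: int, contributors: int,
               stable_in_last_10: int, stars: int,
               archived: bool, days_since_push: int) -> float:
    score = (
        step(age_years, [(5, 30), (3, 22), (1, 14), (0.5, 7)], 2)        # Longevity
        + step(commits_30d, [(50, 25), (20, 20), (6, 15), (1, 8)], 0)    # Recent Activity
        + step(contributors, [(20, 20), (6, 15), (2, 10), (1, 5)], 0)    # Community
        + step(stable_in_last_10, [(10, 15), (3, 10), (1, 5)], 0)        # Release Cadence
        + step(stars, [(10_000, 10), (1_000, 8), (100, 5), (10, 2)], 0)  # Social Proof
    )
    if archived or days_since_push >= 730:  # Critical Penalty
        return score * 0.5
    return float(score)


print(repo_score(8, 60, 300, 10, 104_000, False, 1))  # 100.0
```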

05

Score Tiers

Numerical scores are mapped to four tiers for quick interpretation. These tiers are consistent across all ecosystem types (npm, PyPI, GitHub).

80–100

Strong structural trust

Well-established, actively maintained, multiple maintainers or strong organizational backing. Low structural risk.

60–79

Moderate

Adequate maintenance signals but with structural gaps. Review the specific dimension breakdown before depending on this package.

40–59

Weak

Multiple structural concerns. Consider alternatives or evaluate whether the risk is acceptable for your use case.

0–39

Minimal

High structural risk across multiple dimensions. The package may be abandoned, very new, or unmaintained.

Badge Colors

SVG badges served at /api/badge/:ecosystem/:package use a distinct color for each of these states:

CRITICAL
< 40
40–59
60–74
75+
Not found
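Score tiers and badge states use different cut points: the badge bands split at 75, the tiers at 80. The sketch below maps a score to both. It returns state labels rather than colors because the colors are not listed in this text. It assumes that a CRITICAL flag determines the badge state regardless of score, and that fractional scores fall into the band of their lower bound. Both are assumptions.

```python
from typing import Sequence


def tier(score: float) -> str:
    """Map a 0-100 score to one of the four interpretation tiers."""
    if score >= 80:
        return "Strong structural trust"
    if score >= 60:
        return "Moderate"
    if score >= 40:
        return "Weak"
    return "Minimal"


def badge_state(score: float, flags: Sequence[str], found: bool = True) -> str:
    """Pick the badge legend entry; CRITICAL taking precedence is an assumption."""
    if not found:
        return "Not found"
    if "CRITICAL" in flags:
        return "CRITICAL"
    if score >= 75:
        return "75+"
    if score >= 60:
        return "60-74"
    if score >= 40:
        return "40-59"
    return "< 40"


print(tier(77), badge_state(77, []))  # Moderate 75+
```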

06

Worked Examples

The three packages below illustrate why score and risk flags are independent signals. Data as of April 2026.

axios Score: ~89 CRITICAL

Dimension	Score	Basis
Longevity	25/25	11+ years old
Momentum	25/25	~100M weekly downloads, growing
Release	20/20	100+ versions, recently published
Maintainer	4/15	1 maintainer
GitHub	~15/15	104k stars, active repo

Why CRITICAL despite high score: The score says "well-established, widely adopted, actively maintained." The flag says "one set of credentials controls infrastructure used by millions of projects." The ua-parser-js attack (October 2021) showed exactly what happens when that one credential is compromised. The structural signal was visible for years.

chalk Score: ~75 CRITICAL

Dimension	Score	Basis
Longevity	25/25	12+ years old
Momentum	25/25	~411M weekly downloads
Release	~14/20	Moderate version count, less recent
Maintainer	4/15	1 maintainer
GitHub	~7/15	21k stars, lower recent activity

The asymmetry: chalk is downloaded roughly 411 million times per week, and one person controls the publish token. This is not a judgment on the maintainer; it is a statement about the ecosystem's exposure. As of April 2026, chalk is the highest-download single-maintainer package on npm.

express Score: ~97 No flags

Dimension	Score	Basis
Longevity	25/25	15+ years old
Momentum	25/25	~93M weekly downloads, growing
Release	20/20	300+ versions, recently published
Maintainer	15/15	5+ maintainers
GitHub	~12/15	65k stars, active multi-contributor repo

High score, no flags: express demonstrates what structural trust looks like. Long-lived, actively maintained by a team, with organizational backing from the OpenJS Foundation. No single credential controls the package.

07

Limitations

No scoring model is complete. These are the known limitations of the current methodology. Acknowledging them is a feature, not a caveat.

Weights are informed heuristics, not empirically derived

The dimension weights (25/25/20/15/15 for packages, 30/25/20/15/10 for repositories) reflect expert judgment about which structural signals correlate most with supply chain risk. They are not the output of a regression model trained on historical attack data. Such a model would require a comprehensive labeled dataset of supply chain compromises that does not exist at sufficient scale. The weights are defensible, not proven.

Code quality is not measured

Commit does not assess test coverage, code complexity, documentation quality, or architectural soundness. A well-structured package with zero tests and a chaotic package with comprehensive coverage receive the same score if their structural signals are identical.

Maintainer identity is not verified

Maintainer count is taken from registry metadata. Whether those accounts represent distinct individuals, or whether they have appropriate access controls (2FA, hardware keys), is not checked. A package with 5 maintainers where all 5 share one person's email alias would score the same as one with 5 independent maintainers.

Thresholds are step functions

A package at 999,999 weekly downloads receives a Download Momentum base score of 18; at 1,000,000 it receives 22. This discrete jump does not reflect a real-world discontinuity. Continuous scoring functions would be smoother but harder to audit and reproduce. We chose transparency over precision.

No security practice assessment

Whether a package uses signed releases, requires 2FA for publishers, or has a security policy is not measured. These are important signals, but they are declarations (gameable), not behaviors (observable). Future versions may incorporate signals that bridge this gap.

Ecosystem coverage

v1.0.0 covers npm, PyPI, and GitHub repositories. Cargo, Go modules, Maven, and other ecosystems are not yet supported. Each ecosystem requires calibrated download thresholds and registry-specific metadata handling.

08

Versioning & Changelog

This specification follows Semantic Versioning.

Version	Changes
Major	Breaking changes: weight redistribution, dimension removal or addition, flag threshold changes
Minor	New ecosystem support, additional flag types, new dimensions (additive only)
Patch	Clarifications, worked examples, editorial corrections

Changelog

v1.0.0 2026-04-19 Initial specification. npm, PyPI, and GitHub scoring. Risk flag definitions.
