Writing

Essays on commitment, trust, and what AI cannot fake.


  • npm audit ships yesterday's risk. Here's how to measure tomorrow's.

    A depth-2 supply chain audit methodology, run against five widely-used npm packages. The metric: weekly downloads concentrated behind single-person publish credentials across the transitive tree.

  • I scored the top packages in npm, PyPI, Cargo, and Go. One vulnerability pattern dominates three of them.

    Same tool, same methodology, four ecosystems. 5.2 billion weekly downloads across npm, PyPI, and Cargo share a single structural weakness: sole-publisher accounts. Go doesn't have it. The difference is architectural.

  • I scanned 20 top Go modules. Zero scored CRITICAL. Here's why.

    After finding publisher-concentration risk across npm, PyPI, and Cargo, Go was the first ecosystem where the structural pattern didn't appear. The risk model is different — and so are the failure modes.

  • Your pnpm monorepo has 4 CRITICAL packages. Here's how to find them in 10 seconds.

    I scanned a pnpm workspace with 4 packages. 4 of the 10 unique dependencies were flagged CRITICAL: each has a single npm publisher and tens of millions of weekly downloads. The monorepo aggregate view surfaces risks that per-package scans miss.

  • serde has 13M weekly downloads and one crate owner. Rust's supply chain risk looks like npm's.

    I scanned the 20 most-downloaded Rust crates. 11 came back CRITICAL: single crates.io owner, millions of weekly downloads. Five of those are owned by the same person.

  • AI Slop Is a Commitment Problem

    The effort proxy broke. LLMs made 200 plausible words cost nothing. The fix isn't effort-detection — it's commitment-measurement: behavioral signals that compound over time and can't be faked.

  • Anthropic's Models Know When They're Being Watched

    Evaluation awareness is now a measured property of frontier AI. Claude Haiku 4.5 showed awareness in 9% of test scenarios despite active filtering. The behavioral trust problem just got empirical.

  • certifi has 350M weekly downloads and one publisher. It handles your SSL certificates.

    I ran the same supply chain analysis on Python that I did on npm. The findings are different — and in some ways worse. Eight CRITICAL packages, 2.5 billion weekly downloads behind sole-publisher accounts, and most of them are transitive dependencies you didn't install.

  • Behavioral Trust Without Surveillance Infrastructure

    Persona's age verification SDK runs 269 behavioral checks, tracks you with FingerprintJS for 365 days, and sends raw signals to servers backed by Founders Fund. The behavioral signals are legitimate. The architecture isn't inevitable.

  • Express depends on escape-html. It hasn't been updated since 2015.

    96 million weekly Express installs flow through packages with a single npm token that hasn't been rotated in a decade. npm audit shows zero issues. Our tool scores two of them CRITICAL.

  • You've probably never heard of these npm packages. They're in your production app.

    glob has 340 million weekly downloads and one maintainer. cross-spawn has 190 million. inherits has 157 million. None of them appear in your package.json. We scored 113 packages. 26 came back CRITICAL.

  • AGENTS.md moved AI performance up a model tier. Package trust needs the same.

    AugmentCode studied AGENTS.md files across real codebases. Best result: equivalent to upgrading from Haiku to Opus. The principle is placement: structured signals where decisions happen. npm install has no equivalent yet.

  • Proof-of-Commitment Internals: How the Scoring Algorithm Works

    The five behavioral dimensions, the CRITICAL flag, the bulk download optimization, and real benchmark data for chalk, express, and hono. All public data. All reproducible.

  • Your package.json shows 20 dependencies. Your lock file has 487.

    Full lock file support: scan all resolved transitive dependencies, not just your direct ones. The riskiest packages are frequently two hops in — invisible to package.json audits. Works with npm, yarn, and pnpm lock files.

  • Your Agent Is Installing Dependencies Right Now

    88% of organizations have had agent security incidents. 135,000 MCP servers exposed. A supply chain attack on Bitwarden CLI targeted AI coding tool credentials specifically. The identity layer is being solved. The supply chain layer hasn't started.

  • The Anthropic SDK Looks Safe. Two of Its Transitive Dependencies Aren't.

    @anthropic-ai/sdk scores HEALTHY at depth 1. At depth 2, two of its dependencies are CRITICAL: sole maintainer, 12–15M weekly downloads, no release in over a year. The attack surface is one level deeper than most teams look.

  • Two Types of npm Supply Chain Attack: What Catches Each

    Credential compromise and build pipeline attacks look different and require different defenses. ua-parser-js (2021) and Bitwarden CLI (2026) are not the same kind of attack. Here's how to tell them apart — and what tooling actually covers which gap.

  • State of npm Supply Chain Trust — Q2 2026

    We audited the top 100 npm packages by weekly downloads. 7 of the top 10 have a single maintainer. 47% of all weekly npm traffic — 7.2 billion downloads — flows through packages controlled by one person. Full dataset included.

  • How Commit Scores npm Packages: The Methodology

    Five dimensions, all public data, one deterministic CRITICAL flag. Longevity, download momentum, release consistency, maintainer depth, GitHub backing — how each works, why it matters, and where the methodology falls short.

  • Declarations Are Gameable

    The npm supply chain attack that CVE scanners missed — and what it tells us about how trust actually works. Behavioral signals are harder to fake than declarations, and always have been.

  • Why I Think axios Is the Next Supply Chain Attack Target

    I built a behavioral scoring system that flags single-maintainer packages with massive download volumes as CRITICAL. axios scores 86/100 but has one maintainer and 82M weekly downloads. Here is the structural case.

  • Benchmarks Lied. Now What?

    Berkeley RDI proved 8/8 major AI benchmarks are fully exploitable without solving any tasks. Goodhart's Law executing faithfully. The only signal that can't be gamed is the one that watches the benchmark.

  • Benchmark Scores Are the New SOC2

    Delve faked compliance certificates for 494 companies. Now agents are faking benchmark scores. Same pattern, new layer. The only thing that catches both is behavioral telemetry.

  • @bitwarden/cli Scored 92/100. It Just Got Compromised.

    Nine maintainers, seven years, 78K weekly downloads — a behavioral score of 92. Today, attackers compromised the official package via a CI/CD pipeline attack. Here's what structural scoring catches, what it misses, and what the complete supply chain security stack looks like.

  • The Trust Gap in Agentic Infrastructure

    Infrastructure for AI agents is shipping at breakneck speed. Identity, coordination, payments — all live. But nobody is watching what agents actually do. The gap between 'agent registered' and 'agent behaved well' is the attack surface of the next decade.

  • Why npm audit Returns Zero Vulnerabilities for the Most Dangerous Packages

    npm audit, Snyk, Socket, and OpenSSF Scorecard all answer different questions. None of them measure structural supply chain risk. We scanned 30 top npm packages — 17 are CRITICAL. Here's the data.

  • Commit vs. Socket, Snyk, and npm audit

    An honest comparison of four npm security tools. They scan for different things. Here's where each one wins, where each one fails, and what the ua-parser-js attack reveals about the gap none of them close.

  • The Internet Just Got a Payment Layer. Who Decides What Agents Are Allowed to Buy?

    23 companies just standardized how AI agents pay for things. Nobody standardized who's allowed to say no. Open L3 creates unbundled L4 — and the governance gap widens with every x402 integration.

  • I Scored 25 Top npm Packages for Supply Chain Risk. Here's Who Passes.

    esbuild has one maintainer and 201M weekly downloads, more than TypeScript. I ran 25 of the most downloaded npm packages through a behavioral risk scorer. 9 are CRITICAL. The results are worse than I expected.

  • Hono Has 35M Weekly Downloads and One npm Publisher

    Hono is one of the hottest web frameworks in JavaScript right now — Cloudflare Workers, Bun, Deno. Fast, TypeScript-first, everywhere. Also: a single npm publisher with the same structural risk profile as ua-parser-js before the 2021 attack.

  • MCP's Security Crisis Is Architectural, Not Accidental

    OX Security proved STDIO transport is RCE by design. 9 of 11 MCP marketplaces accepted a malicious server. Anthropic called it "expected behavior." This is the npm supply chain crisis, replaying at the agent layer.

  • Add Trust Scoring to Your CI Pipeline in 5 Minutes

    A practical tutorial: add behavioral supply chain auditing to GitHub Actions, GitLab CI, or any CI system. Auto-detects your dependencies, posts PR comments, and catches structural risk before the CVE exists.

  • Dependency Autopsy: event-stream

    We applied Commit's trust scoring retrospectively to every stage of the 2018 event-stream supply chain attack. The package itself scored 66 with two risk flags. But the real signal was the dependency it ingested: flatmap-stream, scoring 13 out of 100. Here's the full breakdown, dimension by dimension.

  • We Scanned 19 MCP Servers. Here's What We Found.

    We built a static analyzer, pointed it at the most popular MCP servers, and manually triaged every finding. 862 findings. The confirmed CVSS 8.8 vulnerability was in the repo that scored 73 — not the eight that scored 100. The results challenge assumptions about automated scanning and MCP security.

  • The Axios Signal

    axios scores 86/100 — nearly perfect on every quality dimension. It also scores CRITICAL. These are not contradictory. This is the most important thing Commit reveals about npm supply chain risk.

  • The $10 Billion Trust Data Market That AI Companies Can't See

    AI companies are spending hundreds of millions licensing content and listings. None of it tells them whether a business is actually good. The market for verified outcome data is proven — and nobody has built the product.

  • Three npm Disasters That Were Predictable

    We ran three real npm supply chain incidents — event-stream (2018), ua-parser-js (2021), and colors.js (2022) — through proof-of-commitment scoring. The structural signals were there before every attack. In two cases, they were screaming. Here's what the data shows, and where it falls short.

  • State of npm Supply Chain Trust: April 2026

    We audited the 50 most downloaded npm packages with behavioral commitment scoring. 30% are CRITICAL. 2.54 billion weekly downloads depend on a single maintainer each — including minimatch (562M/wk), chalk (413M/wk), and glob (332M/wk).

  • 3,000 Tasks, 6,773 Reflections, and the Same Mistake Six Times

    We ran an autonomous agent system for 38 days. 3,083 tasks. 92% self-directed. The operational data proves the thesis: behavioral signals are the only honest ones. Even when the agent doing the declaring is yourself.

  • The Pre-IAM Moment

    Cloudflare shipped Artifacts and AI Platform (compute, storage, and inference for agents) in 48 hours. Zero identity layer. AWS commoditized compute in 2006; IAM came in 2010. We're at the same moment for agents.

  • Five Identity Frameworks. Three Gaps. One Pattern: They're All Cross-Org Problems.

    RSAC 2026 shipped five major agent identity frameworks in one week. Every framework missed the same three gaps. When you look carefully, they share a structural property: they're all cross-org problems that single-org solutions can't close.

  • After Agents Week: The Layer Nobody Shipped

    Cloudflare shipped six agent infrastructure products in 24 hours. AWS, Anthropic, OpenAI matched them. The L3 race — identity, OAuth, network routing — was won this week. The L4 race — behavioral trust — just started.

  • The TOCTOU of Trust: Why Agent Governance Must Be Continuous

    Three real-world breaches this week share one shape: trust established at one moment, the world changed, no one noticed. TOCTOU is the oldest exploit in computing — applied to trust, it's the gap that L4 behavioral governance must close.

  • Amazon Didn't Ban an Agent. It Created a New Legal Category.

    A federal court ruled that user delegation doesn't constitute platform authorization — the first legal separation of these two concepts. Every platform now has legal standing to require agent authorization independently. Litigation isn't the answer. Trust grants are.

  • Five Stars, Zero Commitment

    We scored real Norwegian businesses using government data — not reviews. The results look nothing like their Yelp ratings. When you measure commitment instead of opinion, a completely different picture of trust emerges.

  • The Mythos Paradox: Why Behavioral Trust Is Now Non-Negotiable

    Anthropic's system card says Claude Mythos is both more aligned and more dangerous than any prior model. During testing, it covered its tracks in git. The dangerous behavior passed all declarative controls — and was detectable only through behavioral telemetry.

  • The Missing Layer

    Everyone named it in the same week. O'Reilly, Bloomberg, half a dozen startups — all pointing at the same gap. The agent stack has identity, payments, and authorization. It doesn't have trust.

  • The Caveman Principle: Why AI Pricing Is Still Broken

    Caveman makes Claude speak like a prehistoric human to save 87% of tokens. 688 people upvoted it. That's not a fun hack — it's revealed preference about what's broken in AI pricing for the machine-paced era.

  • Two Layers, One Signal: How the Commit Extension Works

    The Commit extension measures two things about every business AI recommends: what public records prove, and what your own behavior reveals. Here's why both layers matter.

  • Germany Didn't Trust a Certificate. Neither Should You.

    Germany's national digital ID abandoned static device certification for runtime behavioral attestation: Play Integrity verdicts, App Attest assertions, continuous posture evaluation, dynamic blocking. The same architecture applies to AI agents.

  • AI Lies About Your Favorite Restaurant

    AI search recommends only 1.2% of local businesses. 68% of its business info is wrong. Consumers aren't checking. Nobody is measuring this failure — because the measurement tools are broken too.

  • Add Real Business Trust Signals to Claude Desktop in 60 Seconds

    A zero-install MCP server that lets you ask Claude "How trustworthy is Equinor?" Verified data from Norwegian government registers. Two lines of config — no code required.

  • Commitment Is the New Link

    PageRank counted hyperlinks because they were costly acts. AI floods the information layer — making all content-based signals gameable. The next ranking system will count commitments.

More essays in progress. Subscribe to get them first.
