An IETF Draft Specifies Trust Scoring for AI Agents. Five Dimensions, Five Tiers, One Implementation Gap.
A March 2026 internet-draft specifies behavioral trust scoring for AI agent payments. 0–100 score, L0–L4 spend tiers, public cross-org query API. The category just got a protocol document. The implementation is still the whole thing.
Someone outside this category wrote down what we’ve been building.
On March 25, 2026, Raza Sharif (CyberSecAI Ltd, UK) submitted draft-sharif-agent-payment-trust-00 to the IETF: “Trust Scoring and Identity Verification for Autonomous AI Agent Payment Transactions.” Individual submission. Version -00. Not adopted by any working group, not approved by anyone. Just one engineer writing down what the protocol should look like when AI agents start paying for things at scale.
Read what it specifies:
| Level | Score | Per-transaction | Daily |
|---|---|---|---|
| L0 | 0–19 | $0 | $0 |
| L1 | 20–39 | $10 | $50 |
| L2 | 40–59 | $100 | $500 |
| L3 | 60–79 | $1,000 | $5,000 |
| L4 | 80–100 | $50,000 | $200,000 |
Five dimensions, equal 20% weighting. Per-agent cryptographic identity via ECDSA P-256. Challenge-response identity verification. A public trust query API at GET /api/public/trust/{agentId}, batch support, no authentication required.
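To make the query shape concrete, here is a minimal sketch of a client for that endpoint. Only the path comes from the draft as quoted above. The host `trust.example`, the helper names `trust_query_url` and `fetch_trust`, and the idea that the response body is JSON are all assumptions for illustration; the batch form is not shown because its shape is not described here.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Path as quoted from the draft; the host it lives on is not specified here.
TRUST_PATH = "/api/public/trust/"


def trust_query_url(base_url: str, agent_id: str) -> str:
    """Build the URL for a single-agent trust lookup (hypothetical helper)."""
    # Percent-encode the whole agent ID so characters such as '/' or ':'
    # cannot change the path structure.
    return base_url.rstrip("/") + TRUST_PATH + quote(agent_id, safe="")


def fetch_trust(base_url: str, agent_id: str, timeout: float = 5.0):
    """Fetch a trust record with no credentials attached.

    Assumption: the endpoint answers with a JSON body. The schema is not
    given in this article, so the parsed value is returned untouched.
    """
    # No Authorization header is sent: the endpoint is described as public.
    with urlopen(trust_query_url(base_url, agent_id), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only builds the URL; no network call is made here.
    print(trust_query_url("https://trust.example", "agent:acme/checkout-bot"))
```

The point of the sketch is how little the caller needs: an agent ID and a host. Anyone, in any organization, can ask.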
This isn’t a competing system. It’s our system, written by someone we’ve never met.
What the draft validates
There’s a moment, when you’re building infrastructure for a category nobody has named yet, when the most useful thing that can happen is for someone unconnected, in a different country, working from different first principles, to converge on your protocol shape. That’s what this IETF submission is. It isn’t a customer or a partner. It’s a signal that the category exists.
The five dimensions in draft-sharif-agent-payment-trust-00 are the dimensions of behavioral trust:
- Code Attestation — does this agent’s code match what it claims to be running? L3 identity.
- Execution Success Rate — does it complete tasks without anomaly? Behavioral telemetry.
- Behavioural Consistency — does it behave the same way over time? Drift detection.
- Operational Tenure — how long has it been observed? Skin-in-the-game proxy.
- Anomaly History — has it ever misbehaved, and how? Negative signal weight.
We’ve written about each of these. Identity is what we call L3 (not to be confused with the draft’s L3 spend tier); the draft accepts that identity alone is insufficient and gives Code Attestation only 20% of the score. The remaining 80% is behavioral. “The TOCTOU of Trust” called this shape out two months ago: trust verified at handshake is not trust at use. The draft codifies the response in protocol form.
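The arithmetic behind that 20/80 split is worth seeing once. The sketch below assumes each dimension yields a sub-score from 0 to 100 and that the composite is the equally weighted sum; the article only states the five dimensions and the equal 20% weighting, so the sub-score scale and the names `Dimensions` and `composite_score` are illustrative.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Dimensions:
    """Per-dimension sub-scores, each assumed to lie in 0..100."""
    code_attestation: float
    execution_success_rate: float
    behavioural_consistency: float
    operational_tenure: float
    anomaly_history: float


def composite_score(d: Dimensions) -> float:
    """Combine the five sub-scores with equal 20% weights.

    With equal weights the weighted sum is simply the mean.
    """
    values = [getattr(d, f.name) for f in fields(d)]
    for v in values:
        if not 0 <= v <= 100:
            raise ValueError(f"sub-score out of range: {v}")
    return sum(values) / len(values)


# Perfect identity, no behavioral record at all.
identity_only = Dimensions(100, 0, 0, 0, 0)
print(composite_score(identity_only))  # 20.0
```

Under these assumptions, an agent with flawless code attestation and nothing else scores 20, which is the floor of L1 in the tier table: $10 per transaction. Everything above that has to be earned through behavior.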
The tiered spend limits also match how this should work. $0 at L0. $50,000 per transaction at L4. Skin-in-the-game is the only unfakeable signal. An agent with a clean anomaly history over a long operational tenure deserves higher autonomy than one that just registered, and the protocol enforces this with money, which is the right place to enforce it.
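Enforcement at the payment edge can be very small. The following is a sketch, not the draft's algorithm: the limits are copied from the tier table above, while the names `Tier`, `tier_for`, and `allowed_amount`, the treatment of each band's lower bound as inclusive for fractional scores, and the choice to clamp an oversized request rather than reject it are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    min_score: int    # inclusive lower bound of the score band
    per_tx_usd: int   # per-transaction limit
    daily_usd: int    # daily limit


# Values copied from the tier table, highest tier first.
TIERS = [
    Tier("L4", 80, 50_000, 200_000),
    Tier("L3", 60, 1_000, 5_000),
    Tier("L2", 40, 100, 500),
    Tier("L1", 20, 10, 50),
    Tier("L0", 0, 0, 0),
]


def tier_for(score: float) -> Tier:
    """Return the highest tier whose lower bound the score reaches."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return next(t for t in TIERS if score >= t.min_score)


def allowed_amount(score: float, requested_usd: float, spent_today_usd: float) -> float:
    """Clamp a requested payment to the tier's per-transaction and daily limits."""
    tier = tier_for(score)
    remaining_daily = max(0, tier.daily_usd - spent_today_usd)
    return max(0, min(requested_usd, tier.per_tx_usd, remaining_daily))


print(tier_for(65).name)                 # L3
print(allowed_amount(65, 2_500, 0))      # 1000
print(allowed_amount(65, 800, 4_500))    # 500
```

A merchant that prefers hard rejection over clamping would compare `requested_usd` against the same two limits and refuse instead. Either way, the policy is a table lookup; the hard part is the number going in.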
What the draft doesn’t specify
Read the draft carefully and one thing is absent: where the score comes from.
It says the score exists. It says how to query it. It says what to do with it: clamp the spend. It does not specify who computes the score, how the inputs are gathered, what cryptographic substrate the attestations sit on, or how cross-organization claims are deduplicated. That’s normal for an early IETF draft. It’s also where the implementation lives.
The same gap exists at the payment layer. x402 specified how agents pay. It did not specify who is allowed to say no. Visa TAP, Mastercard Verifiable Intent, and Microsoft Entra Agent ID all answer “who is this agent.” None of them compute “should this agent be allowed to spend $50,000.”
This is what an attestation substrate is for.
What Commit ships today
Commit’s Proof of Commitment provides cryptographic commitments showing that an actor (human or agent) has skin in the game. They aren’t declarations or promises. They’re receipts. The commitments accumulate over operational tenure, are verifiable cross-organization without authentication, and are tamper-evident.
Map this onto draft-sharif’s five dimensions:
| Draft dimension | Commit substrate signal |
|---|---|
| Operational Tenure | First commitment timestamp, attestation continuity |
| Behavioural Consistency | Recurring commitment cadence and shape |
| Anomaly History | Slashing events, withdrawn commitments, revocations |
| Execution Success Rate | Task-level outcome attestations |
| Code Attestation | Paired with L3 identity sources (Visa TAP, Entra Agent ID) |
If the draft moves toward adoption, Commit is one of the substrates that can supply the inputs the score formula assumes already exist. If the draft stalls, the category still exists and the substrate is still the bottleneck.
The substrate accrues value either way. That’s the bet.
The opportunity for whoever ships first
Standards work and implementation work compound differently. Drafts move on multi-year timelines. Implementations move on weekly timelines. A draft at -00 is a flag planted in the ground. The implementation that’s already running when the draft hits -02 is the one that gets cited.
If you’re building agent-aware payment infrastructure (an ACP merchant, an x402 facilitator, a Stripe Connect platform with agent traffic, an identity-as-a-service vendor extending into behavior), the trust score query is going to land in your spec sheet. The agents will come with credentials. The merchants will ask “should we let this one buy?” Someone has to answer.
We answer with commitments. The IETF draft says the answer should be a 0–100 number computed from five dimensions. Same shape. Different vocabulary.
The category has a protocol document now. The implementation is the whole thing.