The $10 Billion Trust Data Market That AI Companies Can't See

AI companies are spending hundreds of millions on content and listings. None of it tells them whether a business is actually good.

OpenAI agreed to pay News Corp more than $250 million over five years for articles. Google reportedly pays Reddit $60 million a year for comments. Yelp charges $25 per thousand API calls for reviews. Perplexity is on track for $656 million in revenue this year, largely by repackaging other people's content with citations.

The AI industry has an insatiable appetite for data about the world. And virtually all of it — every dollar, every deal, every API call — buys the same thing: what someone wrote about what happened.

Nobody is selling them what actually happened.

The Spending Spree

The numbers are staggering. In the past 18 months, AI companies have signed over $1 billion in content licensing deals. OpenAI alone has agreements with News Corp ($250M+), the Associated Press, Hearst, Condé Nast, and at least a dozen other publishers. Meta — entering content licensing for the first time — signed multi-year deals with CNN, Fox News, People, and USA Today.

Reddit's data licensing hit $130 million per year from Google and OpenAI combined. Yelp's "other revenue" category — which includes AI data licensing — grew 17% year-over-year, accelerating to 30-33% in Q4. Their 2026 revenue target is $1.475 billion.

Then there's the listing data play. DataLane raised $27 million in December 2025 to build an "identity graph of 20 million local businesses." Their product verifies that businesses exist — correct names, addresses, phone numbers, hours. They don't verify whether those businesses are any good.

All of this spending falls into two categories: content (what someone wrote) and listings (that entities exist). Neither category answers the question AI users actually ask: is this business worth my time and money?

The Trust Data Market Already Exists

Here's what makes this gap so striking: companies that sell trust-adjacent data are thriving. The market validates the demand. It just doesn't serve it.

Verisk generates $3 billion in annual revenue selling risk analytics to insurance companies. Their data comes from a cooperative pool of 1,850+ insurers contributing claims data. But it's actuarial — statistical risk models, not behavioral signals about individual businesses.

Dun & Bradstreet generates $2.4 billion selling the DUNS Number system and credit data on 600 million business records. Their Paydex score is "FICO for businesses." But it is built on trade credit data that suppliers voluntarily report about how their business customers pay them. Self-reported, not verified.

FICO generates about $2 billion a year, anchored by its credit scores. Embedded in regulation, it is essentially a monopoly. But its flagship score rates individuals, not businesses, and the data comes from three credit bureaus whose information is notoriously error-prone.

Trustpilot just posted a 320% increase in operating profit. Why? AI search. Trustpilot is the 5th most-cited domain globally on ChatGPT. AI click-throughs to Trustpilot surged 1,490% year-over-year. The company's revenue hit $261 million with 20% growth, and they're projecting EBITDA margin expansion in 2026.

The Trustpilot number is the one that should stop you. A 1,490% surge in AI platforms pulling review data tells you exactly one thing: AI desperately wants trust signals, and reviews are the only ones that exist in an accessible format. LLMs are gorging on Trustpilot not because reviews are good data. They're gorging because there's nothing better available.

Add these up. Verisk, D&B, FICO, and Trustpilot alone account for roughly $7.7 billion in annual revenue; add Yelp and the rest of the trust-adjacent sector, and the total clears $10 billion. The broader Business Information Market, everything from credit data to background checks, was valued at $191.6 billion in 2025 and is projected to reach $306.6 billion by 2033.

The market for trust data is enormous and proven. But every dollar of it sells one of two things: opinions (what people said) or static records (what was filed). Nobody sells what happened next.

The Carfax Gap

The clearest way to see the missing product is through Carfax.

Carfax generates an estimated $230 million per year selling verified vehicle history reports. When you buy a used car, you don't want someone's opinion about it. You want facts: Was it in an accident? Was the odometer rolled back? Was it serviced regularly?

Carfax built this by aggregating data from 151,000 sources — DMVs, insurance companies, repair shops, manufacturers, auction houses — into 35 billion records indexed by VIN. No opinions. No reviews. Just: here is what happened to this specific car, verified by institutional records.

Now ask: who does this for businesses?

Not "4.2 stars on Google." Rather: Did customers come back? Did the business pay its suppliers on time? Did revenue grow or shrink? Did it pass its last health inspection? How many years has it survived?

The answer is nobody. The business equivalent of a Carfax report doesn't exist, not because the data is missing, but because nobody has assembled it into a product.

The Data Exists. The Product Doesn't.

This is the part that should frustrate you if you work in data or AI.

In Norway, the Brønnøysund Register Centre publishes financial statements (revenue, profit, equity) for every company required to file annual accounts, alongside registry data such as employee count and founding date, via free, machine-readable APIs. The Food Safety Authority (Mattilsynet) publishes inspection results for every restaurant. BankID provides proof of personhood for 4.6 million people with zero fraud on the NFC path.

This is outcome data. Verified. Institutional. Public. And nobody has packaged it as trust infrastructure for AI.

The Norwegian case is the most accessible example, but the pattern is global. The UK's Companies House publishes financial data. France's Infogreffe does the same. Many food safety regulators publish inspection results; the UK's Food Standards Agency, for example, publishes hygiene ratings for food businesses. Card networks and payment processors (Visa, Mastercard, Stripe) sit on the most comprehensive behavioral data in existence: who paid whom, how often, and whether they came back.

A Finextra analysis from earlier this year put it bluntly: "Payment processors are sitting on the most valuable trust data in business — and nobody's using it." Longer processing history equals richer trust signal. The data is there. It's just locked inside institutions that don't think of themselves as trust infrastructure.

Three Reasons the Gap Persists

If the demand is proven and the data exists, why hasn't someone built this? Three structural reasons:

Content is easier to license. Publisher deals have familiar mechanics — contracts, IP law, payment schedules. OpenAI can wire $250 million to News Corp and get a clean data feed. Assembling outcome data requires aggregating from hundreds of fragmented sources: business registries, inspection agencies, payment processors, regulatory filings. The integration work is a moat, not a feature.

Privacy is genuinely hard. The most valuable outcome data — repeat visits, return rates, transaction frequency — involves individual behavior. Aggregating "did customers come back?" at scale required privacy infrastructure that barely existed until recently. Zero-knowledge proofs are now production-ready: zkTLS has 3 million+ verifications with zero fraud. Semaphore V4 generates proofs in 3 milliseconds. The technical bottleneck cleared in 2025-2026. The product bottleneck hasn't.

The cold start problem feels fatal. If you need behavioral data from millions of businesses, where do you start? The answer: you don't start with behavioral data. You start with public outcome data — registry filings, inspection results, financial statements — which has no cold start problem at all. Layer behavioral signals on top as the network grows. The cold start fear stops people from starting. It shouldn't.
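The layering described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a spec: the record types, field names, and the threshold of 50 contributions are all hypothetical. Public fields are populated from day one; the behavioral "did customers come back?" signal stays empty until enough independent contributions exist to report it without exposing any individual.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical privacy threshold: the behavioral signal stays hidden until at
# least this many independent contributions exist (an illustrative value).
MIN_CONTRIBUTIONS = 50


@dataclass
class PublicRecord:
    """Outcome data available from public sources on day one (hypothetical fields)."""
    years_operating: int                    # from the business registry
    revenue_growth: Optional[float]         # year-over-year, from filed accounts
    passed_last_inspection: Optional[bool]  # from the inspection agency


@dataclass
class TrustSignals:
    years_operating: int
    revenue_growth: Optional[float]
    passed_last_inspection: Optional[bool]
    return_rate: Optional[float]  # None while below MIN_CONTRIBUTIONS
    contributions: int


def build_signals(public: PublicRecord, returned: int, contributions: int) -> TrustSignals:
    """Start from public data; layer the behavioral signal on once it is large enough."""
    if not 0 <= returned <= contributions:
        raise ValueError("need 0 <= returned <= contributions")
    return_rate = returned / contributions if contributions >= MIN_CONTRIBUTIONS else None
    return TrustSignals(
        years_operating=public.years_operating,
        revenue_growth=public.revenue_growth,
        passed_last_inspection=public.passed_last_inspection,
        return_rate=return_rate,
        contributions=contributions,
    )


# Usage with made-up numbers for a fictitious bakery.
bakery = PublicRecord(years_operating=12, revenue_growth=0.08, passed_last_inspection=True)
print(build_signals(bakery, returned=3, contributions=4).return_rate)    # None (cold start)
print(build_signals(bakery, returned=36, contributions=60).return_rate)  # 0.6
```

The design choice is that a business with zero behavioral data still gets a useful record; the network only adds a field, it never gates the product.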

What AI Companies Actually Need

Here's the product that doesn't exist: a trust data API where an AI model can query a business identifier and get back structured, verified outcome signals.

Not sentiment. Not reviews. Not what someone wrote on Reddit. Verified answers to concrete questions:

  • How long has this business operated? (Registry data)
  • Is it financially healthy? (Filed financial statements)
  • Did it pass its last regulatory inspection? (Government records)
  • Do customers return? (Aggregated, anonymized behavioral data)
  • How does it compare to peers in its category and location? (Computed from all of the above)

The first three are available today from public sources. They just haven't been assembled into a product. The fourth requires privacy-preserving behavioral data infrastructure that now exists technically but hasn't been built as a product. The fifth is computation on top of the other four.

Every AI company that's licensing Yelp reviews at $25 per thousand calls would pay multiples of that for verified outcome data. Because outcome data does what reviews can't: it gives AI a ground truth to anchor its recommendations to.

Yelp's AI API proves the business model works. A business that sells AI companies something better than reviews — at a price that reflects the quality difference — enters a market that's already buying.

The Timing

Three things make this moment different from any previous attempt to build trust data infrastructure:

AI is the buyer that didn't exist before. Previous trust data products (D&B, FICO) sold to institutions making individual credit decisions. The new buyer is fundamentally different: AI platforms making millions of recommendations per day, each one requiring trust data that currently doesn't exist. ChatGPT has 883 million monthly active users. AI Overviews appear in 55% of Google searches. The volume of decisions that need trust data has increased by orders of magnitude — and the willingness to pay for it is proven by the billion-dollar content licensing spree.

Privacy technology matured. Contributing behavioral data (did I return to this restaurant?) without revealing individual behavior (which restaurant, when, how often) requires zero-knowledge proofs at consumer scale. That wasn't possible two years ago. It is now.

Regulation is creating the identity layer. eIDAS 2.0 requires every EU member state to offer its citizens a digital identity wallet by the end of 2026, a potential reach of roughly 450 million Europeans. BankID already covers nearly all adults in Norway and Sweden. World ID has verified 18 million unique humans. The sybil-resistance infrastructure, which ensures each data contribution comes from a real, unique person, is being built by others, at their expense.
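To make "one contribution per real person" concrete: Semaphore-style protocols attach a nullifier to each contribution, a tag derived from the contributor's secret and a scope (here, one business per period). The same person contributing twice to the same scope produces the same tag, so duplicates are rejected, while tags from different scopes cannot be linked by anyone who lacks the secret. The sketch below shows only that deduplication logic, under loud assumptions: SHA-256 stands in for the Poseidon hash Semaphore uses, the scope format and class names are hypothetical, and the zero-knowledge proof that the secret belongs to a verified identity (for instance, one anchored in BankID or an eIDAS wallet) is omitted entirely.

```python
import hashlib
import secrets
from typing import Set

SECRET_LEN = 32  # fixed length keeps secret || scope unambiguous


def nullifier(identity_secret: bytes, scope: str) -> str:
    """Same person + same scope -> same tag; different scope -> unrelated tag.
    SHA-256 stands in for the Poseidon hash Semaphore actually uses."""
    if len(identity_secret) != SECRET_LEN:
        raise ValueError("identity_secret must be 32 bytes")
    return hashlib.sha256(identity_secret + scope.encode("utf-8")).hexdigest()


class ReturnTally:
    """Counts at most one 'did you come back?' answer per nullifier.
    A real system would also verify a zero-knowledge proof for each tag."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()
        self.returned = 0
        self.total = 0

    def submit(self, tag: str, came_back: bool) -> bool:
        if tag in self.seen:
            return False  # duplicate: same person, same scope
        self.seen.add(tag)
        self.total += 1
        self.returned += int(came_back)
        return True


# Usage: a hypothetical scope of one business per quarter.
scope = "business:A1/period:2026-Q1"
alice, bob = secrets.token_bytes(SECRET_LEN), secrets.token_bytes(SECRET_LEN)
tally = ReturnTally()
print(tally.submit(nullifier(alice, scope), True))   # True
print(tally.submit(nullifier(alice, scope), True))   # False (rejected)
print(tally.submit(nullifier(bob, scope), False))    # True
print(tally.returned, tally.total)                   # 1 2
```

The tally never sees who Alice is or which other businesses she rated; it only sees one opaque tag per person per scope.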

The problem is proven ($10B+ market). The data exists (registries, inspections, transactions). The privacy technology works (zkTLS, Semaphore). The identity layer is being built (eIDAS, BankID, World ID). The buyer is desperate and spending (AI content licensing).

What's missing is the product.

The Race Nobody Has Entered

Here's what's strange about this moment. AI companies are spending furiously on data. Trust data companies are posting record profits. Privacy technology just crossed the production threshold. And nobody — not Yelp, not Trustpilot, not D&B, not any AI company — has built a verified outcome data product for AI.

Yelp is closest, but they sell opinions through an API, not outcomes. Trustpilot is the biggest beneficiary, but passively — AI models cite them because there's nothing better, not because Trustpilot built an AI product. DataLane verifies listings, not quality. D&B sells self-reported credit data.

The race to build the trust data layer for AI hasn't started. Everyone is selling yesterday's data type — opinions, listings, content — to tomorrow's buyer. The first company to sell verified outcome data enters the market with the product everyone needs and nobody has.

We know the market works. We know what the product looks like. We know the technology is ready. The only question left is who builds it first.


This is the fourth essay in a series on behavioral commitment as trust infrastructure. Commitment Is the New Link introduces the thesis. AI Lies About Your Favorite Restaurant maps the problem. Five Stars, Zero Commitment shows the first working prototype. This essay makes the market case. We're building Commit — starting with the one country where every company's outcome data is already public.
