
From pixels to protocols: The brave new world of publisher monetization for the open web

As AI reshapes content consumption, publishers need new tools to protect and monetize their work. Read about a protocol built for the machine-readable web.

April 18, 2025 | 6 min read
Daniel Landsman
Head of Global AdTech

As generative AI (genAI) platforms scale their presence across the open web, they are rapidly reshaping the economics of digital content consumption. Publishers, who were the architects of structured, monetizable content online, are increasingly being bypassed in the value chain. 

Large language models and retrieval-based systems now consume content in non-human formats, extracting value at scale without attribution, licensing, or any monetization pathway in return.

Here I propose a new machine-native monetization architecture designed to put publishers back in control. Specifically, I outline a framework based on vector embeddings, verifiable ownership metadata, and a proposed Agent Content Protocol (ACP) that enables licensing, tracking, and monetization for the AI-native web. I also explore the infrastructure required to make this vision operational, including embedding SDKs, vector databases, and real-time verification systems.

The problem: AI is consuming content without paying for it

GenAI models including GPT, Claude, Gemini, and open-source retrieval systems now crawl the internet to power training, summarization, inference, and query response. These systems do not browse the web like humans. They do not load pages, fire pixels, or trigger ad tags. Instead, they extract core meaning, convert it into high-dimensional vector representations, and move on.

Once this process occurs, the original content becomes disconnected from its economic model. Traffic disappears. Attribution breaks. Monetization collapses.

Evidence of this shift is already emerging: 

  • Search Engine Land reports that early implementations of Google’s Search Generative Experience (SGE) can reduce publisher traffic by up to 70 percent by answering user queries directly in the search results. 

  • The New York Times has filed a landmark lawsuit against OpenAI for training large language models on its copyrighted content without consent or compensation. 

  • Industry experts such as Shiv Gupta of U of Digital have documented how publishers are increasingly competing with derivative outputs of their own material.

Right now, publishers’ only line of defense is the robots.txt file and similar voluntary restrictions. These mechanisms are unenforceable, opaque, and incapable of supporting scalable monetization.


The architectural gap

The traditional publisher monetization model relies on human readers generating pageviews, ad impressions, and sessions. As AI agents replace these readers, they interact with content in fundamentally different ways: querying APIs, parsing vectors, and skipping the user interface layer entirely.

This evolution breaks the current monetization stack. In programmatic advertising, the value chain is visible, auction-based, and anchored in standards like OpenRTB, Prebid, and ads.txt. The emerging AI ecosystem lacks these structures.

What publishers now require is an equivalent layer of infrastructure: a monetization protocol that is: 

  • Machine-native

  • Interoperable

  • Enforceable at scale

Vector embeddings: The foundation of machine-native ownership

Vector embeddings are dense mathematical representations that encode the semantic meaning of content. Already widely used in search, personalization, and recommendation engines, they are quickly becoming the lingua franca of machine learning systems.
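To make "semantic meaning as math" concrete, here is a minimal sketch. The vectors are toy values with made-up dimensions; production embeddings come from a trained model and typically have hundreds or thousands of dimensions:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: values near 1.0 mean near-identical meaning."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional embeddings (illustrative values only).
article   = np.array([0.9, 0.1, 0.3, 0.0])
summary   = np.array([0.8, 0.2, 0.4, 0.1])  # a paraphrase of the article
unrelated = np.array([0.0, 0.9, 0.0, 0.8])

print(cosine_similarity(article, summary))    # close to 1.0
print(cosine_similarity(article, unrelated))  # much lower
```

This is exactly the comparison a retrieval system runs at scale: a derivative output of an article sits close to the original in embedding space, which is also why embeddings are a natural place to attach ownership claims.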

This same technology can be used to encode ownership and access rights.

When vector embeddings include traceable signatures and metadata, what I term semantic watermarking, publishers can make enforceable claims of authorship. These watermarks can then be checked by AI agents and retrieval systems before content is used in inference or training.

This approach moves ownership from a document-centric model to an embedding-centric model, aligning incentives with the way AI systems actually process and interpret information.
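A minimal sketch of what such a semantic watermark could look like, assuming an HMAC signature over the embedding hash and its metadata. The key, field names, and license labels are all hypothetical; a production scheme would more likely use public-key signatures (e.g. Ed25519) so agents can verify authorship without holding the publisher's secret:

```python
import hashlib
import hmac
import json

PUBLISHER_SECRET = b"publisher-signing-key"  # hypothetical signing key

def _payload(embedding: list[float], metadata: dict) -> bytes:
    """Canonical bytes covering both the embedding and its metadata."""
    return json.dumps(
        {"embedding_sha256": hashlib.sha256(
            json.dumps(embedding).encode()).hexdigest(),
         "metadata": metadata},
        sort_keys=True).encode()

def watermark_embedding(embedding: list[float], metadata: dict) -> dict:
    """Attach ownership metadata and a verifiable signature to an embedding."""
    sig = hmac.new(PUBLISHER_SECRET, _payload(embedding, metadata),
                   hashlib.sha256).hexdigest()
    return {"embedding": embedding, "metadata": metadata, "signature": sig}

def verify_watermark(record: dict) -> bool:
    """An AI agent re-derives the signature before using the embedding."""
    expected = hmac.new(PUBLISHER_SECRET,
                        _payload(record["embedding"], record["metadata"]),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])

record = watermark_embedding(
    [0.12, -0.48, 0.91],
    {"owner": "example-news.com", "license": "inference-only"})
assert verify_watermark(record)
```

The key property is that the signature binds the usage terms to the embedding itself: any attempt to strip or alter the `owner` or `license` fields invalidates the watermark.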

The machine-native monetization stack

I propose a four-layer publisher stack to operationalize vector-based content ownership and licensing:

  1. Embedding SDKs: Tools that automatically generate and update semantic vector embeddings for articles, videos, or datasets. These SDKs would include ownership metadata, usage restrictions, and cryptographic signatures embedded at the time of content creation.

  2. Embedding Registries: Databases (either local or federated) that store publishers’ content embeddings and surface them for retrieval systems. These registries function as lookup tables where AI agents can verify provenance, usage permissions, and licensing terms.

  3. Agent Content Protocol (ACP): A standardized, open protocol for AI agents to query, negotiate, and license embeddings from publishers. Analogous to OpenRTB in programmatic advertising, ACP would formalize the handshake between content owners and AI consumers.

  4. Vector Database Infrastructure: High-performance vector databases, such as Aerospike, capable of sub-millisecond similarity search across billions of embeddings. These systems enable real-time content retrieval, attribution, and licensing enforcement.
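As a thought experiment, the ACP handshake in layer 3 might look something like the sketch below. The message shapes, registry entries, prices, and token format are all hypothetical; the real protocol would be defined by an industry standards body, much as OpenRTB was:

```python
from dataclasses import dataclass

@dataclass
class ACPRequest:
    agent_id: str
    content_id: str
    use: str            # e.g. "inference" or "training"
    max_price_usd: float

@dataclass
class ACPResponse:
    granted: bool
    price_usd: float = 0.0
    license_token: str = ""

# Toy registry entry: per-use pricing set by the publisher.
REGISTRY = {
    "article-4821": {"inference": 0.002, "training": 0.50},
}

def negotiate(req: ACPRequest) -> ACPResponse:
    """Registry-side handshake: quote a price, grant if within budget."""
    terms = REGISTRY.get(req.content_id)
    if terms is None or req.use not in terms:
        return ACPResponse(granted=False)
    price = terms[req.use]
    if price > req.max_price_usd:
        return ACPResponse(granted=False, price_usd=price)  # counter-quote
    return ACPResponse(granted=True, price_usd=price,
                       license_token=f"lic-{req.content_id}-{req.use}")

resp = negotiate(ACPRequest("gpt-agent-7", "article-4821", "inference", 0.01))
assert resp.granted and resp.price_usd == 0.002
```

Note the deliberate asymmetry with pricing by use case: inference access is cheap and metered, while training rights carry a much higher one-time price, mirroring how the two uses extract very different amounts of value.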

Monetization pathways enabled by ACP

The implementation of this stack unlocks several viable monetization models for publishers:

  1. Crawl licensing: AI crawlers and inference agents can be metered and charged for access to embeddings or raw content. This mirrors how video CDNs bill per request or impression.

  2. Embedding rentals: Publishers can license their content embeddings for use in training datasets or retrieval systems. Contracts can define usage duration, scope, and renewal terms.

  3. Downstream attribution and linking: Embedding-level metadata can enforce source citation or backlinking in AI outputs, restoring publisher visibility and referral traffic even in generative interfaces, and potentially paving the way for revenue shares. 
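Crawl licensing, the first pathway above, amounts to CDN-style per-request metering. A minimal sketch, with a purely illustrative rate and made-up agent names:

```python
from collections import defaultdict

PRICE_PER_EMBEDDING_FETCH = 0.0005  # USD; illustrative rate, not a real price

class CrawlMeter:
    """Counts embedding fetches per AI agent and totals the invoice."""

    def __init__(self) -> None:
        self.fetches: dict[str, int] = defaultdict(int)

    def record_fetch(self, agent_id: str, n: int = 1) -> None:
        self.fetches[agent_id] += n

    def invoice(self, agent_id: str) -> float:
        return round(self.fetches[agent_id] * PRICE_PER_EMBEDDING_FETCH, 4)

meter = CrawlMeter()
meter.record_fetch("claude-crawler", 2000)
meter.record_fetch("gpt-crawler", 500)
assert meter.invoice("claude-crawler") == 1.0
assert meter.invoice("gpt-crawler") == 0.25
```

In practice the counter would live behind the embedding registry and be keyed to authenticated agent identities, so that billing and the ACP handshake draw on the same ledger.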

Industry precedents and momentum

The blueprint for this kind of transformation already exists. Header bidding was fragmented until Prebid introduced shared SDKs and governance. Programmatic ad exchanges only scaled once OpenRTB created a shared protocol. Supply chain transparency came with the adoption of standards like ads.txt and sellers.json.

In the AI content economy, early signals are emerging:

  • Spawning.ai is enabling opt-in licensing for model training datasets.

  • The C2PA initiative, backed by Adobe and Microsoft, is building cryptographic content provenance standards for digital media.

  • Truepic and the Content Authenticity Initiative are promoting tamper-proof attribution layers for image and video content.

I believe a similar industry-wide effort is needed for text and structured data, governed by a consortium of publishers, infrastructure providers, and standards organizations such as the IAB Tech Lab. 

Beyond monetization: Embeddings as anchors of truth

As AI-generated content saturates the web, hallucinations and misinformation will increase. Publishers can reclaim a critical role by anchoring models to real, verified, and attributable sources. Embeddings serve as the unit of both semantic meaning and verifiable authorship, providing a mechanism for truth validation at scale.

This creates new business models, but also establishes a path toward ethical AI deployments that respect intellectual property and align incentives between publishers and platforms.

Conclusion

The shift to AI-native content consumption is irreversible. If publishers do not adapt their infrastructure, they risk losing not just traffic, but their economic rights entirely. 

The solution is clear: Build a machine-readable layer of ownership, access, and monetization based on: 

  • Embeddings

  • Open protocols

  • Fast vector infrastructure

This can be your wedge to maintain content ownership and data provenance. 

Page views and impressions were designed for people.

Embeddings are designed for machines.

Publishers who act now can become content owners again, not just content hosts.

We’ve been here before. If we build it right this time, we keep the economics aligned and maybe even make the open web valuable again.

Publishers, I’m looking at you.
