AI Runs on Facts. You Own Them.

Infactory Team

Why Publishers Are Fighting Back—And How Infactory Gives Them the Tools to Win

By the Infactory Editorial Team

Adapted from reporting by Isabella Simonetti and Robert McMillan for The Wall Street Journal (July 9, 2025)

Artificial intelligence may be the future, but your past currently powers it.

Every article, post, opinion piece, and explainer published over the last two decades has become training fuel for AI systems. Increasingly, those systems give users direct answers without ever sending them to the publisher whose work made the answer possible.

In response, publishers worldwide are pushing back.

Lawsuits. Licensing deals. Technical barriers.

From Reddit to The New York Times, media companies are no longer just asking AI companies to stop scraping. They’re working to block the bots outright and, in some cases, taking the companies behind them to court.

But these reactive defenses won’t be enough.

To stay relevant—and profitable—in the AI economy, publishers need a proactive strategy.

That’s where Infactory comes in.

The Web Has Changed—and It’s Costing You

Scraping has existed since the early days of the internet. It once had mutual value: Google crawled your site, and in return, you got traffic. Those days are over.

Today’s generative AI tools, like OpenAI’s ChatGPT and Google’s Gemini, pull answers from your content but send few, if any, visitors back to your site. That is more than lost exposure: it is lost ad revenue, lost subscriptions, and lost influence.

“Search traffic has dropped precipitously for many publishers,” reports The Wall Street Journal.

And the next wave (AI-generated results that cite fewer links) is already arriving via features like Google's AI Mode.

Cloudflare estimates that scraping activity has surged 18% in just the past year. Publishers are essentially being looted in broad daylight, with no compensation and no transparency.

The CEO of a major publication puts it plainly:

“You want humans reading your site, not bots—especially bots that aren’t returning any value to you.”

The Legal and Technical Battles Have Begun

Some publishers are responding with force:

  • The New York Times is suing Microsoft and OpenAI for copyright infringement over the use of its articles, even as it licenses content to Amazon.
  • Reddit has sued Anthropic, alleging that Anthropic accessed its site more than 100,000 times even after being told to stop.
  • iFixit, the DIY repair site, claims Anthropic hit its servers over one million times in 24 hours, calling it both “theft” and a “resource drain.”
  • Wikimedia is changing its access policies because AI bots are overloading its infrastructure.

Others are deploying new technology.

Dotdash Meredith is working with Cloudflare, which now acts as a kind of “toll booth” for AI scrapers, allowing publishers to set the terms of engagement.

However, these methods still rely on controlling access, rather than monetizing value.

Infactory: From Blocked Access to Licensed Intelligence

Infactory flips the model.

Instead of simply locking down content and hoping AI companies will play fair, Infactory helps you turn your archives into queryable, AI-ready data products—without compromising your data.

With Infactory, You Can:

✅ License facts, not full feeds or articles

✅ Keep your content in place—no duplication or storage

✅ Set pricing, usage rules, and licensing terms

✅ Explore your data to discover valuable, monetizable insights

✅ Deploy structured APIs in minutes without heavy engineering

Infactory doesn't replace your paywalls—it adds a new layer of monetization on top of your existing archive.
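As a rough illustration of what machine-readable pricing and usage rules could look like, the sketch below models a license as a small data structure and checks a request against it. Every name here (LicenseTerms, quote_request, the use labels, the prices, and the cap) is a hypothetical assumption made for illustration. It does not describe Infactory's actual product or API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class LicenseTerms:
    """Hypothetical terms a publisher might attach to an archive."""
    licensee: str
    allowed_uses: FrozenSet[str]      # e.g. {"answer_grounding"}
    price_per_query_cents: int
    monthly_query_cap: int


def quote_request(terms: LicenseTerms, use: str, queries_this_month: int) -> Optional[int]:
    """Return the price in cents for one more query, or None if it is not permitted."""
    if use not in terms.allowed_uses:
        return None  # this use was never licensed
    if queries_this_month >= terms.monthly_query_cap:
        return None  # monthly cap already reached
    return terms.price_per_query_cents


# Illustrative values only.
terms = LicenseTerms(
    licensee="example-ai-co",
    allowed_uses=frozenset({"answer_grounding"}),
    price_per_query_cents=2,
    monthly_query_cap=10_000,
)

if __name__ == "__main__":
    print(quote_request(terms, "answer_grounding", 0))
    print(quote_request(terms, "model_training", 0))
```

The point of the sketch is that the publisher, not the crawler, decides which uses are allowed and what each query costs.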

“Robots.txt” Was a Do Not Trespass Sign. AI Companies Just Walked Right Through It.

For years, publishers relied on robots.txt, a simple file that tells crawlers which parts of a site they should not access. Compliance is voluntary, though, and many AI bots now ignore it outright.
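To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler consults robots.txt before fetching a page, using Python's standard urllib.robotparser module. The rules, the bot name ExampleAIBot, and the example.com URLs are illustrative assumptions, not taken from any real publisher.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: block one AI crawler everywhere,
# and keep all other bots out of /archive/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /archive/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "SomeOtherBot"):
        for path in ("/news/story.html", "/archive/2009/story.html"):
            print(agent, path, is_allowed(agent, "https://example.com" + path))
```

Note that the check runs entirely on the crawler's side. Nothing in the file can stop a bot that simply never calls it, which is why robots.txt only works when crawlers choose to honor it.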

As the WSJ reports, “AI companies added bots that override Robots.txt instructions,” based on data from TollBit, a firm helping publishers track and monetize bot activity.

Gannett’s Chief Product Officer compared it to “putting up a ‘Do Not Trespass’ sign”—only to be ignored.

Cloudflare, Fastly, and DataDome now offer bot-blocking services. These measures may slow down bad actors, but they don’t help publishers monetize the good ones, nor do they enable safe, structured access to high-value data.

Infactory does both.


Why This Fight Matters—for Everyone

Some worry that restricting bots could have unintended consequences, like limiting academic research or cybersecurity scans. But that’s exactly why structure and transparency matter.

Instead of hiding behind fences or filing lawsuits after the fact, publishers need a platform that allows them to:

  • Control who accesses their content and how
  • Understand what their data is used for
  • Monetize without cannibalizing existing revenue streams

Infactory’s Unique Query Methodology™ allows publishers to expose specific facts, insights, or signals without giving away the entire article. This protects subscriptions while enabling AI systems to pull the value they need—legally, transparently, and profitably.
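One way to picture fact-level access is an endpoint that answers a query with a single attributed statement instead of the article body. The sketch below is a generic illustration of that idea under stated assumptions: the Article and Fact types, the answer_query function, and the naive keyword matching are all hypothetical and are not Infactory's methodology.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Article:
    title: str
    url: str
    body: str  # full text stays with the publisher


@dataclass(frozen=True)
class Fact:
    statement: str
    source_title: str
    source_url: str


def answer_query(article: Article, keywords: List[str]) -> Optional[Fact]:
    """Return the first sentence containing every keyword, with attribution.

    Only that one sentence is returned; the rest of the body is never
    included in the response.
    """
    wanted = [k.lower() for k in keywords]
    for sentence in article.body.split(". "):
        text = sentence.strip().rstrip(".")
        if text and all(k in text.lower() for k in wanted):
            return Fact(text + ".", article.title, article.url)
    return None


# Made-up article used purely as sample input.
article = Article(
    title="City Council Approves Budget",
    url="https://example.com/news/budget",
    body="The council met on Tuesday. It approved a budget of 40 million dollars. "
         "Critics said the vote was rushed.",
)
fact = answer_query(article, ["budget", "approved"])

if __name__ == "__main__":
    print(fact)
```

A real system would need far better retrieval than keyword matching, but the shape of the response is the relevant part: one statement, a title, and a link back to the source.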

A More Equitable Web Starts With You

“The web is being partitioned to the highest bidder,” said Shayne Longpre of the Data Provenance Initiative. “That’s really bad for market concentration and openness.”

Infactory offers a better path forward—one where publishers remain in control, AI gets smarter with permission, and users benefit from reliable, attributable information.

Start Turning Your Archives into Revenue—Now

  • Explore real-time queries and test demand
  • Control how your data is accessed, used, and priced
  • Gain insights into what your archive can answer
  • Launch licensing-ready endpoints in minutes

Request a demo to get started.

AI Runs on Facts. You Own Them.

Infactory helps you turn that ownership into opportunity.