The Truth About AI Search Citations

How Infactory's Query-Level Control Creates a Better Model for Publishers & Enterprises in the AI Era

A recent Columbia Journalism Review study highlights a troubling trend in AI search: tools that confidently provide incorrect information while failing to properly cite sources. This research reveals critical flaws in how AI search engines handle information retrieval and attribution—particularly when it comes to news content.

The Citation Crisis in AI Search

The CJR study tested eight prominent generative AI search tools by feeding them direct excerpts from news articles and asking them to identify each article's headline, publisher, publication date, and URL. Every excerpt chosen appeared within the first three results of a Google search. The results were alarming:

  • Over 60% of responses across all platforms were incorrect

  • Premium chatbots provided more confidently incorrect answers than free versions

  • Multiple platforms appeared to bypass publisher blocking preferences and access content they were explicitly blocked from crawling

  • Many platforms fabricated links or cited syndicated content rather than original sources, diverting traffic away from content creators

  • Content licensing deals with publishers did not guarantee accurate citations in responses

These findings underscore a fundamental issue in today's AI landscape: even as these tools derive value from trusted content, they frequently fail to provide proper attribution while confidently presenting incorrect information. This is not just a ChatGPT problem; it spans all of the prominent LLM-based search tools.

Ignoring Publisher Boundaries

Perhaps most concerning, the study found that several AI search tools appeared to disregard the Robots Exclusion Protocol, the standard mechanism (robots.txt) publishers use to indicate which parts of their sites should not be crawled. For example:

  • Perplexity Pro correctly identified nearly one-third of articles from publishers that had explicitly blocked its crawler

  • Some tools correctly cited articles from paywalled sources they shouldn’t have been able to access

  • Multiple platforms retrieved and displayed content from publishers that had taken specific technical measures to prevent access

This suggests these platforms may be bypassing publishers’ stated preferences about how their content should be used, undermining a long-established web standard that provides content creators with control over their work. 
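For reference, the Robots Exclusion Protocol is machine-readable and trivial for a well-behaved crawler to honor. The sketch below (the crawler and publisher names are invented for illustration) uses Python's standard-library `urllib.robotparser` to show how a compliant agent checks a publisher's robots.txt before fetching anything:

```python
from urllib import robotparser

# A publisher's robots.txt that blocks one AI crawler while
# allowing everyone else (agent and domain names are illustrative).
robots_txt = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# A compliant crawler asks permission before fetching a URL.
print(rp.can_fetch("ExampleAIBot", "https://publisher.example/article"))  # False
print(rp.can_fetch("OtherBot", "https://publisher.example/article"))      # True
```

The check is one function call; a crawler that retrieves blocked content is choosing not to make it.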

Why This Matters: A Broken System for Both Publishers and Enterprises

The CJR study reveals a broken system that harms both content creators and enterprises looking to innovate in AI:

  1. Publishers & Content Creators: AI systems are using their valuable content while providing incorrect attribution, bypassing access controls, and diverting traffic away from original sources, essentially extracting value without proper compensation or credit

  2. Enterprises: AI systems built on this foundation deliver unreliable information with unwarranted confidence, creating liability risks and eroding user trust

  3. Trust and reliability issues: When AI confidently provides incorrect information, it erodes user trust and creates business liability

  4. Source attribution failures: Not properly crediting sources damages relationships with content partners, creates legal risks, and diverts valuable traffic from content creators while still benefitting from their credibility

  5. Lack of deterministic results: The same query often yields different results, making outputs unpredictable and unreliable

  6. The confidence illusion: A conversational interface that presents incorrect information with authority creates a dangerous illusion of reliability

The Infactory Solution: A Better Model for Publishers and Enterprises

At Infactory, we've built a fundamentally different approach that creates value for both publishers and enterprises through our Unique Query Methodology™ (UQM). Our platform represents a new paradigm where content creators can monetize through queries while enterprises gain access to accurate, reliable data. 

1. Query-Level Control 

For publishers and content creators, Infactory turns high-quality data into revenue with a novel approach:

  • Monetize per query: Instead of flat licensing fees, which few customers can afford, publishers capture value at the query level, getting paid each time their content provides value

  • Complete visibility & control: Publishers gain insight into how their content is being used, with control over permissions and pricing 

  • Direct attribution: Proper citations and links ensure original sources receive credit and traffic when appropriate

2. Accuracy & Reliability

For enterprises building AI applications, Infactory delivers accuracy you can count on for business-critical decisions. Unlike traditional AI search tools that rely on probabilistic models, Infactory's deterministic UQM ensures accurate, repeatable results. Ask the same question, get the same answer—every time. This eliminates the "hallucination" problem where AI fabricates information with unwarranted confidence.
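To illustrate the distinction (this is a toy sketch with invented data, not Infactory's implementation): a deterministic system answers a question by running a fixed query over structured data, so repeated runs cannot diverge:

```python
# Toy dataset standing in for licensed publisher data (values invented).
articles = [
    {"publisher": "Daily Example", "topic": "climate", "reads": 1200},
    {"publisher": "Daily Example", "topic": "markets", "reads": 800},
    {"publisher": "Example Times", "topic": "climate", "reads": 950},
]

def answer(query_topic: str) -> int:
    """A deterministic query: filter, then aggregate. No sampling involved."""
    matches = [a["reads"] for a in articles if a["topic"] == query_topic]
    return sum(matches)

# The same question always returns the same answer.
results = {answer("climate") for _ in range(100)}
print(results)  # {2150}
```

A probabilistic text generator, by contrast, samples from a distribution over tokens, so the same prompt can yield different answers on different runs.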

3. Full Data Provenance 

Infactory maintains complete lineage tracking that shows exactly where information comes from. Every insight is fully traceable to its source data—no black boxes, no fabricated citations.
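As a rough illustration of what lineage tracking means (the data model below is invented for this example, not Infactory's): each derived value carries pointers to its inputs, so any answer can be walked back to its original sources:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Traced:
    value: float
    sources: tuple = ()  # upstream Traced values or source URLs

def add(a: Traced, b: Traced) -> Traced:
    # Derived values keep references to their inputs instead of discarding them.
    return Traced(a.value + b.value, sources=(a, b))

def origins(t: Traced) -> set:
    """Walk the lineage graph down to the original source URLs."""
    out = set()
    for s in t.sources:
        out |= origins(s) if isinstance(s, Traced) else {s}
    return out

x = Traced(3.0, sources=("https://pub-a.example/report",))
y = Traced(4.0, sources=("https://pub-b.example/dataset",))
total = add(x, y)
print(total.value)            # 7.0
print(sorted(origins(total)))
```

Because provenance travels with the value, a citation can be generated from the data itself rather than fabricated after the fact.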

4. Respect for Data Ownership and Attribution

Unlike LLMs that ignore publisher preferences and the Robot Exclusion Protocol, Infactory is built with respect for data ownership as a core principle. Our approach enables enterprises to work with their own data and licensed third-party data with clear permissions and proper attribution. 

Building AI You Can Trust

The CJR study reinforces what we've known all along: users need AI they can trust to deliver accurate, verifiable information with proper attribution.  

With Infactory:

  • Publishers can: license and monetize their high-quality content at the query level to reach a wider market, maintain control over usage, and ensure proper attribution

  • Enterprises can: access verified data sources and only pay for the queries they use, build reliable AI applications in days not months, and avoid the risks of fabricated AI outputs

This win-win approach creates a better AI cycle wherein:

  1. Publishers are incentivized to make their best content available in AI-ready formats

  2. Enterprise developers gain access to high-quality data sources with clear usage rights

  3. Consumers receive accurate, properly attributed information they can trust

  4. The entire cycle becomes more trustworthy and sustainable

The Path Forward

As AI becomes increasingly integrated into business workflows, we need an approach that creates value for all stakeholders. 

The CJR study revealed that even premium AI search products with direct licensing deals failed to consistently provide proper attribution. This highlights that traditional licensing models aren't working; what's needed is a fundamentally different approach to how AI systems access, process, attribute, and monetize information.

Infactory is leading the way in creating this new paradigm, where publishers receive fair compensation for their content, enterprise developers get accurate data they can trust when building AI solutions, and everyone benefits from a more reliable information economy.

AI Without the Maybe: It’s not just about accuracy – it’s about building a better business model for AI.

Join Our Newsletter

Stay up to date on the latest industry insights and Infactory news

Ready to Talk to Your Data?

Request Access

©2025 Infactory Inc.
