
To unravel why even Silicon Valley’s giants can’t fix AI’s tendency to make stuff up, we sat down with Alon Yamin, co-founder and CEO of Copyleaks, an AI-based text analysis company whose offerings include AI content detection and plagiarism detection. He explains how generative AI’s reliance on pattern prediction perpetuates misinformation, making reliable, real-time fact-checking prohibitively complex. He also details how Copyleaks’ detection and verification tools can help authenticate text and flag inaccuracies before they reach users.
Big tech companies already have significant resources and top-tier, highly paid AI talent. Why, in your view, are they still struggling to ensure factual accuracy and alignment when summarizing or generating news content?

Alon Yamin
Yamin: Despite having significant resources and talent, big tech companies face real challenges in ensuring AI-generated news is factually accurate. This is largely due to the limitations of generative AI models, which are designed to predict text patterns rather than verify facts. Real-time fact-checking would require constant tweaks to the model’s knowledge base and data sources, which is both complicated and costly, making consistent accuracy difficult to achieve.
We have seen pundits argue that tech giants could simply “bolt on” an alignment or fact-checking module to ensure that AI-generated summaries stay faithful to the original story. What technical and practical barriers make this simpler-sounding solution so difficult in reality?
Yamin: While a fact-checking or alignment module may seem like a quick fix, it is much more complicated in practice. AI models need constant updates to their knowledge base and data sources, which is a substantial task in itself. Fact-checking also requires understanding context, something current AI models are still limited at. Some fact-checking tools are already in place, but they often don’t fully prevent AI hallucinations or account for biases in the underlying data.
False headlines, fictional legal citations, and incorrectly labeled historical events have all surfaced in AI outputs. What are the main drivers behind LLMs’ propensity to produce “confidently wrong” information?
Yamin: The main drivers behind LLMs’ tendency to produce “confidently wrong” information are their design and limitations. Many models are built to predict the most likely sequence of words, not to fact-check. LLMs also learn from datasets that may include biases or outdated information, and as a result, models can confidently output incorrect facts without any awareness that they are wrong.
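To make this point concrete, here is a deliberately simplified Python sketch; the prompts, candidate continuations, and probabilities are invented for illustration and do not come from any real model. A greedy decoder simply returns whichever continuation scored highest, and nothing in that step checks whether the continuation is true.

```python
# Toy illustration: greedy next-token decoding picks the most probable
# continuation, with no notion of whether that continuation is factual.
# The prompts and probabilities below are invented for illustration only.

next_token_probs = {
    "The first person on the Moon was": {
        "Neil Armstrong": 0.55,   # correct
        "Buzz Aldrin": 0.30,      # plausible but wrong
        "Yuri Gagarin": 0.15,     # confidently wrong
    },
    "The court cited the case of": {
        "Smith v. Jones (1998)": 0.62,   # may not exist at all
        "Brown v. Board (1954)": 0.38,
    },
}

def greedy_continue(prompt: str) -> str:
    """Return the highest-probability continuation; no fact-check happens."""
    candidates = next_token_probs[prompt]
    best = max(candidates, key=candidates.get)
    return f"{prompt} {best}"

for prompt in next_token_probs:
    # The model "confidently" emits whatever scored highest in its learned
    # statistics, regardless of whether it is accurate or even real.
    print(greedy_continue(prompt))
```

Running it prints the highest-probability continuation for each prompt, whether or not that continuation is accurate, which is exactly the failure mode behind fictional citations and false headlines.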
Some in the field, including at OpenAI, an Apple partner, have at various points said that the hallucination issue was close to being solved, only to note later that it is more of a feature than a bug. It also appears to be one of the factors complicating AI adoption. How do you think this Apple development will shape the situation?
Yamin: Apple’s integration of AI tools like ChatGPT into Siri makes the issue of AI hallucinations even more pressing, especially since these tools interact with millions of users every day. While things have improved, it’s clear that hallucinations are not just a technical bug but a fundamental challenge of AI models. By making AI more accessible to everyday tech users, Apple and OpenAI have placed themselves under greater public scrutiny and pressure to fix the hallucination issue. Apple will likely make efforts to improve reliability, but this also highlights the need for transparency in order to build public trust in these models.
Copyleaks specializes in AI-based text analysis, including plagiarism detection and verifying content authenticity. How could your tools, or similar platforms, be integrated into these AI workflows to reduce misinformation and “hallucinations”?
Yamin: AI detection tools like Copyleaks play a significant role in content creation by ensuring authenticity and originality. As generative AI becomes more advanced, identifying AI-written content is essential for maintaining credibility and confidence in both the content and the AI itself. By integrating our plagiarism and copyright infringement detection and content verification capabilities into AI workflows, platforms could cross-check generated text and flag errors in real time, helping catch false information or misrepresented facts before they reach users. We provide AI visibility and transparency so that you are always aware of when AI is being used, while mitigating the copyright and IP risks that are common with LLM output. Our mission is to allow anyone to leverage all of AI’s benefits while mitigating its risks.
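As one illustration of where such a verification step might sit, the sketch below shows a hypothetical publishing pipeline in Python. The helper functions generate_draft, detect_ai_content, and verify_claims are placeholders, not the actual Copyleaks API; they stand in for a generative model, an AI-content detector, and a claim-verification step, respectively.

```python
# Hypothetical pipeline: gate AI-generated text through detection and
# verification before it reaches readers. The helper functions are
# placeholders for illustration, not any vendor's real API.

from dataclasses import dataclass

@dataclass
class Verdict:
    ai_likelihood: float       # 0.0-1.0 score from an AI-content detector
    flagged_claims: list[str]  # claims that failed verification

def generate_draft(prompt: str) -> str:
    # Placeholder for a call to a generative model.
    return f"Draft summary for: {prompt}"

def detect_ai_content(text: str) -> float:
    # Placeholder for an AI-content detection service; returns a likelihood.
    return 0.97

def verify_claims(text: str) -> list[str]:
    # Placeholder for checking claims against trusted sources.
    return []

def publishable(prompt: str, ai_threshold: float = 0.9) -> tuple[str, Verdict]:
    draft = generate_draft(prompt)
    verdict = Verdict(
        ai_likelihood=detect_ai_content(draft),
        flagged_claims=verify_claims(draft),
    )
    if verdict.flagged_claims:
        # Route to human review instead of publishing as-is.
        raise ValueError(f"Flagged claims need review: {verdict.flagged_claims}")
    if verdict.ai_likelihood >= ai_threshold:
        # Label clearly so readers know AI was involved.
        draft = "[AI-assisted] " + draft
    return draft, verdict

if __name__ == "__main__":
    text, verdict = publishable("city council budget vote")
    print(text, verdict)
```

The design choice here is simply to gate text before display: drafts with flagged claims are routed to review rather than published, and drafts that score high on AI likelihood are labeled so readers know AI was involved.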