AI’s Hallucination Problem Isn’t Going Away
By: bitcoin ethereum news|2025/05/07 05:30:01
Topline

The most recent releases of cutting-edge AI tools from OpenAI and DeepSeek have produced even higher rates of hallucinations (fabricated information presented as fact) than earlier models, confounding the companies and presenting challenges as the industry evolves.

Key Facts

AI bots have always produced at least some hallucinations, which occur when a bot creates incorrect information based on the data it has access to. But OpenAI's newest o3 and o4-mini models have hallucinated 30-50% of the time, according to company tests, for reasons that aren't entirely clear.

OpenAI bills o3 as its most powerful model because it is a "reasoning" model, which takes more time to "think" by working out answers step by step; the company also claims the model can think visually and process images.

It's not just an OpenAI problem: the Chinese company DeepSeek's R1 reasoning model hallucinates much more than DeepSeek's traditional AI models, according to independent tests by the AI research firm Vectara.

Companies are not exactly sure why reasoning models hallucinate so much, but the New York Times reported these models can hallucinate at each step of their multi-step "thinking" process, creating even more chances for incorrect responses. Researchers at Vectara acknowledged reasoning models seem to hallucinate more, but suggested the training behind models like R1 is to blame rather than the advanced "thinking" process itself.

How Often Do AI Models Hallucinate?

In OpenAI's tests of its newest o3 and o4-mini reasoning models, the company found the o3 model hallucinated 33% of the time on its PersonQA test, in which the bot is asked questions about public figures.
On the company's SimpleQA test of short fact-based questions, OpenAI said o3 hallucinated 51% of the time. The o4-mini model fared even worse: it hallucinated 41% of the time on the PersonQA test and 79% of the time on the SimpleQA test, though OpenAI said the weaker performance was expected because o4-mini is a smaller model designed to be faster. OpenAI's latest update to ChatGPT, GPT-4.5, hallucinates less than its o3 and o4-mini models; when GPT-4.5 was released in February, the company said the model had a hallucination rate of 37.1% on the SimpleQA test.

Vectara's independent tests, which ask chatbots to summarize news articles, found some newer reasoning models performed markedly worse than other models. OpenAI's o3 scored a 6.8% hallucination rate on Vectara's test, while DeepSeek's R1 scored 14.3%, markedly worse than other DeepSeek chatbots such as the DeepSeek-V2.5 model, which hallucinated 2.4% of the time. IBM's Granite 3.2 model, which the company says comes with advanced reasoning capabilities, also scored worse than the company's other models on Vectara's test. Granite 3.2 comes in two editions, and both performed worse than earlier IBM models: the larger 8B version had a hallucination rate of 8.7%, according to Vectara, while the smaller 2B version hallucinated 16.5% of the time.

Why Do AI Chatbots Hallucinate?

AI models hallucinate because they are trained on a finite amount of data and prompted to respond to queries with the most statistically likely answer. Questions outside the data the model knows can lead the bot to respond with incorrect information, and the probability-based approach sometimes leads it to find faulty patterns and fabricate information. AI hallucinations can be grammatically fluent and presented as fact, despite being incorrect.
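The "most statistically likely answer" mechanism described above can be illustrated with a toy sketch: a language model scores possible continuations and emits the highest-probability one, regardless of whether it is factually grounded. The probabilities and the example question here are invented for illustration, not taken from any real model.

```python
# Toy illustration of greedy next-token selection: the model emits
# whichever continuation it scores highest, true or not.
# All probabilities below are made-up assumptions.

def pick_next_token(probs: dict[str, float]) -> str:
    """Return the highest-probability continuation (greedy decoding)."""
    return max(probs, key=probs.get)

# Hypothetical distribution for "The capital of Australia is ..."
# A model trained on skewed data may rank a plausible-but-wrong
# token highest, and it will state that token just as confidently.
next_token_probs = {
    "Sydney": 0.46,    # common association, but incorrect
    "Canberra": 0.41,  # the correct answer, ranked second
    "Melbourne": 0.13,
}

print(pick_next_token(next_token_probs))  # prints "Sydney"
```

The sketch shows why a hallucination is not a malfunction of the sampling step: the model did exactly what it was asked, returning the statistically favored answer.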
Incomplete or biased data sets, or flaws in a model's training, can also contribute to hallucinations. Transluce, a nonprofit AI research firm, analyzed OpenAI's o3 model and said another contributing factor may be that these models are designed to maximize the chance of giving an answer, making the bot more likely to give an incorrect response than to admit it doesn't know something.

What Have AI Companies Said About Hallucinations?

OpenAI acknowledged o3's hallucination rate in a research paper recapping internal tests on its models, stating that o3's tendency to make more definitive claims, rather than acknowledging it doesn't know an answer, means it produces both more correct answers and more incorrect answers. The company admitted more research is needed to understand and fix the model's hallucination issues. OpenAI CEO Sam Altman previously said hallucinating is more a feature of AI than a bug, adding that "a lot of value from these systems is heavily related to the fact that they do hallucinate." Companies that develop AI products, including Google, Microsoft and Anthropic, have all said they are working on fixes to hallucination issues. Microsoft and Google have both released products (Microsoft's Correction and Google's Vertex) that they say can flag potentially incorrect information in AI bot responses, though TechCrunch reported that experts doubt these will fully solve AI hallucinations.

How Are Researchers Trying To Stop Hallucinations?

Researchers largely say stopping AI bots from hallucinating entirely is impossible, but many are working on ways to reduce hallucination rates. Some researchers have proposed teaching AI models uncertainty, or the ability to say "I don't know," to avoid producing falsehoods, the Wall Street Journal reported.
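The "teach the model to say I don't know" idea above is often sketched as abstention: answer only when the model's confidence in its top choice clears a threshold, and otherwise decline. The threshold value and the distributions below are illustrative assumptions, not any vendor's actual method or API.

```python
# Minimal sketch of confidence-based abstention, under the assumption
# that the model exposes a probability for each candidate answer.

def answer_or_abstain(probs: dict[str, float], threshold: float = 0.75) -> str:
    """Answer only when the top candidate clears the confidence threshold."""
    best = max(probs, key=probs.get)
    if probs[best] >= threshold:
        return best
    return "I don't know"

# A confident case: the model commits to its answer.
confident = {"Paris": 0.97, "Lyon": 0.03}
# An uncertain case: probability mass is split, so the model abstains
# instead of asserting its shaky top choice.
uncertain = {"Sydney": 0.46, "Canberra": 0.41, "Melbourne": 0.13}

print(answer_or_abstain(confident))  # prints "Paris"
print(answer_or_abstain(uncertain))  # prints "I don't know"
```

The design trade-off is the one OpenAI's paper hints at: raising the threshold cuts hallucinations but also suppresses correct answers the model was merely unsure about.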
Other researchers are relying on "retrieval augmented generation," a technique in which the bot retrieves documents relevant to the question and uses them as a reference, rather than answering solely from the data stored in its memory.

Chief Critics

Some researchers have criticized the term "hallucination" because it may erroneously humanize AI models: a human hallucination, in which a person perceives something that is not real, is not the same as an AI bot making up false information. Usama Fayyad, executive director of Northeastern University's Institute for Experiential Artificial Intelligence, told Northeastern Global News the term "hallucination" attributes "too much to the model," including intent and consciousness, which AI bots do not have.

Further Reading

A.I. Is Getting More Powerful, but Its Hallucinations Are Getting Worse (New York Times)
Why Do AI Chatbots Have Such a Hard Time Admitting 'I Don't Know'? (Wall Street Journal)
What are AI chatbots actually doing when they 'hallucinate'? Here's why experts don't like the term (Northeastern Global News)

Source: https://www.forbes.com/sites/conormurray/2025/05/06/why-ai-hallucinations-are-worse-than-ever/