OpenAI's new chatbots are more capable but hallucinate up to 48% of the time
OpenAI’s recent releases have been a mixed bag. The o3 and o4-mini chatbots are more capable than ever, but they’re also more prone to hallucinations.
Hallucinations are instances where a model confidently presents wrong or made-up information. For a long time, each new generation of models has been less prone to this behaviour, so the uptick is concerning.
In internal tests, OpenAI’s o3 model hallucinated 33% of the time when assessed on the PersonQA benchmark. The older o1 model had a 16% hallucination rate, while the o3-mini model had a 14.8% rate. The o4-mini model fared worse than all of them, with a 48% hallucination rate.
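For context on what those percentages measure: a hallucination rate on a QA benchmark like PersonQA is simply the share of graded answers that a reviewer marks as fabricated. A toy illustration of that arithmetic, with invented questions and labels rather than OpenAI's data:

```python
# Toy illustration of how a benchmark hallucination rate is computed:
# the share of graded answers a reviewer marks as hallucinated.
# These questions and labels are invented for the example.
graded_answers = [
    {"question": "Where was the subject born?", "label": "correct"},
    {"question": "What did the subject study?", "label": "hallucinated"},
    {"question": "Where does the subject work?", "label": "correct"},
    {"question": "What awards has the subject won?", "label": "hallucinated"},
]

hallucinated = sum(1 for a in graded_answers if a["label"] == "hallucinated")
rate = hallucinated / len(graded_answers)
print(f"Hallucination rate: {rate:.0%}")  # 50% on this toy sample
```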
OpenAI’s technical report on the new models says “more research is needed” to understand why the newer models are hallucinating more. The company points out that o3 and o4-mini excel in areas like coding and maths, but that they also generate more responses overall, which produces both more accurate claims and more hallucinated ones.
The report says: “While these models perform better than previous models on tasks like coding and math, they also generate more overall responses, which leads to both more accurate and more hallucinated claims.”
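The report's explanation is essentially arithmetic: a model that makes more claims has more chances to be right and more chances to hallucinate. A minimal sketch, using hypothetical claim counts chosen to echo the reported 16% and 33% rates:

```python
# Hypothetical claim counts chosen to echo the reported rates:
# the older profile hallucinates ~16% of the time, the newer ~33%,
# yet the newer one still produces more accurate claims in total.
old_model = {"claims": 100, "accuracy": 0.84}  # roughly an o1-like profile
new_model = {"claims": 200, "accuracy": 0.67}  # roughly an o3-like profile

for name, m in (("old", old_model), ("new", new_model)):
    correct = m["claims"] * m["accuracy"]
    wrong = m["claims"] - correct
    print(f"{name}: {correct:.0f} accurate claims, {wrong:.0f} hallucinated")

# old: 84 accurate, 16 hallucinated
# new: 134 accurate, 66 hallucinated -- more of both at once
```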
In tests conducted by the nonprofit AI lab Transluce, OpenAI’s o3 model was found to fabricate actions it cannot actually perform, such as running code. In one case, it claimed to have run code on a 2021 MacBook Pro, which is impossible because the model has no access to a computer.
Neil Chowdhury, a researcher at Transluce, suggested that the reinforcement learning techniques used to train these models might be making hallucination issues worse.
In high-stakes contexts, such as legal work where people are using these tools to draft documents that could be submitted to courts, the consequences of hallucinations could be dire.
OpenAI is looking at ways to solve the issue. Pairing GPT-4o with web search, for instance, boosted its accuracy on the SimpleQA benchmark to 90%. Even so, models can grow more intelligent and less reliable at the same time, and that is a problem OpenAI is still working hard to address.
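Pairing a model with web search is a form of retrieval-augmented answering: fetch evidence first, then ask the model to answer only from that evidence. A minimal sketch of the pattern, where `search_web` and `ask_model` are hypothetical stand-ins rather than OpenAI's actual interfaces:

```python
# Sketch of retrieval-augmented answering, the general technique behind
# pairing a model with web search. `search_web` and `ask_model` are
# hypothetical stand-ins, not OpenAI's actual interfaces.

def search_web(query: str) -> list[str]:
    """Hypothetical search call; a real one would return web snippets."""
    return [f"Snippet 1 relevant to: {query}", f"Snippet 2 relevant to: {query}"]

def ask_model(prompt: str) -> str:
    """Hypothetical model call; a real one would query an LLM."""
    return "An answer grounded in the provided snippets."

def answer_with_search(question: str) -> str:
    # Retrieve evidence first, then instruct the model to answer only
    # from that evidence; grounding is what curbs fabricated claims.
    snippets = search_web(question)
    context = "\n".join(snippets)
    prompt = (
        "Using only the sources below, answer the question.\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )
    return ask_model(prompt)

print(answer_with_search("Who won the 2024 Nobel Prize in Physics?"))
```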
“We’re constantly researching how to reduce hallucinations and improve model reliability,” OpenAI spokesperson Niko Felix said. “We’re also exploring new techniques like incorporating web search into our systems.”