OpenAI launches advanced speech-to-text and text-to-speech models for developers

OpenAI has introduced new speech-to-text and text-to-speech models in its API, giving developers the tools to create more advanced voice agents.

The new speech-to-text models – gpt-4o-transcribe and gpt-4o-mini-transcribe – promise significantly lower word error rates and better language recognition than the existing Whisper models.

Those gains come from reinforcement learning and extensive training on a diverse range of audio datasets. OpenAI says: “Our latest speech-to-text models achieve lower word error rates across established benchmarks”, which means transcriptions are more likely to stay reliable in challenging conditions.

Those challenges include noisy backgrounds, strong accents, and fast speech. That’s good news for developers looking to build more reliable transcription services.
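For a sense of how that looks in code, here’s a minimal sketch using OpenAI’s official Python SDK; the file name is a placeholder and error handling is omitted:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Transcribe a local audio file with the new model.
# "meeting.mp3" is a placeholder file name.
with open("meeting.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
    )

print(transcription.text)
```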

The gpt-4o-mini-tts model, meanwhile, lets developers choose the speaking style of the voice agent. So, for instance, it could sound like a friendly customer service agent.
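As a rough sketch with the same Python SDK – the voice name, input line, and style instructions here are illustrative, not prescribed:

```python
from openai import OpenAI

client = OpenAI()

# Generate speech with a steerable delivery style. "coral" is one of the
# preset voices; the instructions string shapes tone and pacing.
response = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="coral",
    input="Thanks for calling! How can I help you today?",
    instructions="Speak like a friendly, upbeat customer service agent.",
)

response.write_to_file("greeting.mp3")  # save the generated audio
```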

Right now, though, it only supports synthetic preset voices. OpenAI is planning to improve the intelligence and accuracy of these models and explore custom voice options. The company is also engaging with policymakers and researchers about the implications of synthetic voices.

The new models are available to all developers through OpenAI’s API. They’re also integrated with the Agents SDK, making it easier to build voice agents on top of them.

For low-latency speech-to-speech applications, OpenAI recommends using the Realtime API instead. Pricing for the new models is as follows:

gpt-4o-transcribe: $6 per million audio input tokens (roughly 0.6 cents per minute)

gpt-4o-mini-transcribe: $3 per million audio input tokens (roughly 0.3 cents per minute)

gpt-4o-mini-tts: $0.60 per million text input tokens (roughly 1.5 cents per minute of audio output)
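As a back-of-the-envelope check, the per-minute figures for the transcription models follow from the token rates if a minute of audio comes to about 1,000 audio tokens – an assumption implied by the quoted numbers, not an official figure:

```python
# Convert a per-million-token rate into an approximate per-minute cost,
# assuming ~1,000 audio tokens per minute of audio (illustrative only).
def cost_per_minute(price_per_million: float, tokens_per_minute: int = 1_000) -> float:
    return price_per_million / 1_000_000 * tokens_per_minute

print(cost_per_minute(6.00))  # gpt-4o-transcribe: ~$0.006/min, i.e. 0.6 cents
print(cost_per_minute(3.00))  # gpt-4o-mini-transcribe: ~$0.003/min, i.e. 0.3 cents
```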
