
OpenAI Introduces New Audio Models in API, Can Be Used for Agentic Workflows

OpenAI introduced three new AI models: GPT-4o-transcribe, GPT-4o-mini-transcribe, and GPT-4o-mini-tts.

Highlights
  • These models can be customised to speak in a certain manner
  • The text-to-speech models can express emotions through voice
  • OpenAI’s new generation of audio models outperforms its existing models

OpenAI says its transcription models can pick up accented speech in noisy environments

Photo Credit: OpenAI

OpenAI on Thursday introduced new audio models in its application programming interface (API) that it says offer improved accuracy and reliability. The San Francisco-based AI firm released three new artificial intelligence (AI) models covering speech-to-text transcription and text-to-speech (TTS). The company claimed that these models will enable developers to build applications with agentic workflows, and that the API can help businesses automate operations such as customer support. Notably, the new models are based on the company's GPT-4o and GPT-4o mini AI models.

OpenAI Brings New Audio Models in API

In a blog post, the AI firm detailed the new API-specific AI models. The company highlighted that it has recently released several agentic products and tools, such as Operator, Deep Research, Computer-Using Agent, and the Responses API with built-in tools. However, it added that the true potential of agents can only be unlocked when they can perform intuitively and interact across mediums beyond text.

There are three new audio models. GPT-4o-transcribe and GPT-4o-mini-transcribe are the speech-to-text models, and GPT-4o-mini-tts is, as the name suggests, a TTS model. OpenAI claims that these models outperform its existing Whisper models, which were released in 2022. However, unlike the older models, the new ones are not open-source.

On GPT-4o-transcribe, the AI firm stated that it achieves a lower “word error rate” (WER), the share of words transcribed incorrectly, on the Few-shot Learning Evaluation of Universal Representations of Speech (FLEURS) benchmark, which tests AI models on multilingual speech across more than 100 languages. OpenAI said the improvements were a result of targeted training techniques such as reinforcement learning (RL) and extensive midtraining with high-quality audio datasets.
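For readers unfamiliar with the metric, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model's output, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only. The function name and the simple lowercase, whitespace-based tokenisation are assumptions made for illustration, and it is not OpenAI's or the FLEURS benchmark's actual evaluation code, which typically applies more careful text normalisation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercase + whitespace split stands in for real text normalisation.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Two of the four reference words are wrong, so WER is 2 / 4.
    print(word_error_rate("turn on the lights", "turn off the light"))  # 0.5
```

A lower value is better: 0.0 means the transcript matches the reference word for word, and values can exceed 1.0 when the model inserts many extra words.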

According to OpenAI, these speech-to-text models can transcribe audio more accurately even in challenging scenarios such as heavy accents, noisy environments, and varying speech speeds.

The GPT-4o-mini-tts model also comes with significant improvements. The AI firm claims that the model can speak with customisable inflections, intonations, and emotional expressiveness. This will enable developers to build applications that can be used for a wide range of tasks, including customer service and creative storytelling. Notably, the model only offers artificial, preset voices.
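As a rough illustration of how a developer might steer the speaking style, the Python sketch below builds a request for OpenAI's speech endpoint using only the standard library. The article does not describe the request format, so the endpoint path, the `instructions` field, and the `alloy` preset voice reflect our understanding of OpenAI's public API and should be checked against the official reference. The helper names `build_speech_request` and `synthesize_to_file` are hypothetical.

```python
import json
import urllib.request

# Assumption: endpoint and field names follow OpenAI's public API reference.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text: str, instructions: str, api_key: str,
                         voice: str = "alloy") -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request."""
    payload = {
        "model": "gpt-4o-mini-tts",
        "input": text,                 # the words to be spoken
        "voice": voice,                # one of the preset voices
        "instructions": instructions,  # how to speak: tone, pace, emotion
    }
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(request: urllib.request.Request, path: str) -> None:
    """Send the request and write the returned audio bytes to disk."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

A customer-support application might call `build_speech_request("Your refund has been processed.", "Speak in a calm, apologetic tone.", api_key)` and pass the result to `synthesize_to_file`. The second step needs a valid API key and network access and is billed by OpenAI.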

OpenAI's API pricing page highlights that the GPT-4o-based audio model will cost $40 (roughly Rs. 3,440) per million input tokens and $80 (roughly Rs. 6,880) per million output tokens. The GPT-4o mini-based audio models, on the other hand, are priced at $10 (roughly Rs. 860) per million input tokens and $20 (roughly Rs. 1,720) per million output tokens.
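To put the per-million-token rates in perspective, the Python sketch below estimates the bill for a given token count. The rates are the figures quoted above and may not match OpenAI's current pricing page. The `RATES` table, the family labels, and the `estimate_cost` helper are illustrative names, not part of any OpenAI tooling.

```python
from decimal import Decimal

# USD per million tokens, as quoted in this article (assumption: still current).
RATES = {
    "gpt-4o": {"input": Decimal("40"), "output": Decimal("80")},
    "gpt-4o-mini": {"input": Decimal("10"), "output": Decimal("20")},
}


def estimate_cost(family: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in USD for the given token counts."""
    rate = RATES[family]
    total = rate["input"] * input_tokens + rate["output"] * output_tokens
    return total / Decimal(1_000_000)


if __name__ == "__main__":
    # 250,000 input tokens and 50,000 output tokens on each model family.
    print(estimate_cost("gpt-4o", 250_000, 50_000))       # 14
    print(estimate_cost("gpt-4o-mini", 250_000, 50_000))  # 3.5
```

At these rates, the same workload costs four times as much on the GPT-4o-based model as on the GPT-4o mini-based ones.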

All of the audio models are now available to developers via the API. OpenAI is also releasing an integration with its Agents software development kit (SDK) to help developers build voice agents.
