I've been updating pricing information for real-time AI voice models (speech-to-speech interaction with the model, with low latency and no intermediate STT or TTS conversion) and comparing them so I have up-to-date data. Sample: a 1-hour call. The table was created with Grok 4 Fast. Here it is:
Model | Cheapest Tier | Cost per Hour (USD) | Notes |
---|---|---|---|
Google Gemini | 2.5 Flash Live | ~0.68 | Based on 45k audio tokens in and 45k out (25 tokens/sec); $3 per 1M input audio tokens, $12 per 1M output audio tokens. Source: https://cloud.google.com/vertex-ai/generative-ai/pricing |
OpenAI Realtime | gpt-realtime-mini | ~1.35 | Based on 45k audio tokens in and 45k out (25 tokens/sec); $10 per 1M input audio tokens, $20 per 1M output audio tokens. Source: https://openai.com/api/pricing/ |
Microsoft Azure Speech | Standard Real-time | ~1.92 | STT $1.20/h + TTS ~$0.72/h. Source: https://azure.microsoft.com/en-us/pricing/details/cognitive-services/speech-services/ |
Hume.ai EVI | Starter | ~4.40 | $3 for 40 min + $0.07/min additional. Source: https://www.hume.ai/pricing |
ElevenLabs | Starter Agents | ~6.00 | $5 for 50 min, equivalent $0.10/min. Source: https://elevenlabs.io/pricing |
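
If anyone wants to re-check the numbers or plug in new rates, here is a rough Python sketch of the arithmetic behind the table. The rates are copied from the Notes column. The function names (`token_cost`, `plan_cost`) are just mine for illustration, and I'm assuming the 45k tokens per direction means about 30 minutes of audio each way at 25 tokens/sec, which is what the figures imply.

```python
# Rough re-calculation of the table above for a 1-hour call.
# Assumption: 25 audio tokens/sec, with ~30 min of audio in each direction,
# which gives the 45k tokens in and 45k tokens out used in the table.
TOKENS_PER_DIRECTION = 25 * 30 * 60  # 45,000


def token_cost(input_usd_per_m, output_usd_per_m, tokens=TOKENS_PER_DIRECTION):
    """Cost in USD when input and output are billed per 1M audio tokens."""
    return tokens * (input_usd_per_m + output_usd_per_m) / 1_000_000


def plan_cost(base_usd, included_min, extra_usd_per_min, call_min=60):
    """Cost in USD for a plan with included minutes plus a per-minute overage."""
    extra_min = max(0, call_min - included_min)
    return base_usd + extra_min * extra_usd_per_min


costs = {
    "Google Gemini 2.5 Flash Live": token_cost(3, 12),
    "OpenAI gpt-realtime-mini": token_cost(10, 20),
    "Microsoft Azure Speech (STT + TTS)": 1.20 + 0.72,
    "Hume.ai EVI Starter": plan_cost(3, 40, 0.07),
    # Table uses the equivalent per-minute rate: 60 min at $0.10/min.
    "ElevenLabs Starter Agents": 60 * 0.10,
}

# Print cheapest first.
for name, usd in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name}: ~${usd:.3f}")
```

Changing `TOKENS_PER_DIRECTION` (for example, if one side talks much more than the other) only shifts the two token-billed rows, so the ranking can change with a different talk ratio.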
As of today (October 11, 2025), Google Gemini 2.5 Flash Live looks like the most affordable option for real-time voice agents. Good to know!
Regards!