STT Benchmark

Speech-to-Text

Accuracy, speed, and cost on clean English audio.

Provider      Model                    WER     p50 latency  Cost
ElevenLabs    Scribe v2 Realtime       3.4%    1.0 s        $0.0067/min
Alibaba       qwen3-asr-flash          3.5%    0.6 s        $0.0021/min
AssemblyAI    Universal-3 Pro          5.1%    4.2 s        $0.0067/min
Google Cloud  Chirp 2                  5.4%    4.5 s        $0.024/min
ElevenLabs    Scribe v1                5.4%    1.1 s        $0.0067/min
Google        Gemini 2.5 Flash (STT)   6.0%    2.5 s        $0.0002/min
AssemblyAI    Universal-2              6.2%    3.7 s        $0.0062/min
Deepgram      Nova-3                   7.8%    2.4 s        $0.0043/min
OpenAI        gpt-4o-mini-transcribe   8.2%    2.0 s        $0.0030/min
OpenAI        whisper-1                11.6%   2.6 s        $0.0060/min
xAI           Grok STT                 16.8%   0.9 s        $0.0017/min
OpenAI        gpt-4o-transcribe        19.4%   1.9 s        $0.0060/min

Clean English audio · word error rate, p50 latency, cost per minute · lower is better
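The caption names three metrics but does not say how they are computed. Below is a minimal Python sketch of one common way to compute them. It is not this benchmark's harness. Several details are assumptions made for illustration. The text normalization (lowercasing, stripping punctuation) is assumed. So is pooling errors across files into a corpus-level WER. All function names are hypothetical. The page also does not say what the latency clock measures (for example, request sent to final transcript received). The sketch therefore just takes a list of measured latencies.

```python
import re
import statistics


def normalize(text: str) -> list[str]:
    """Assumed normalization: lowercase, replace punctuation with spaces, split."""
    return re.sub(r"[^\w\s']", " ", text.lower()).split()


def word_edits(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (minimum word-level edits, reference word count).

    Edits are substitutions + deletions + insertions, found with the
    Levenshtein dynamic program over words rather than characters.
    """
    ref, hyp = normalize(reference), normalize(hypothesis)
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (row i-1 of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,        # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,    # insertion: extra word in hypothesis
                prev[j - 1] + sub,  # substitution (or exact match when sub == 0)
            )
        prev = curr
    return prev[-1], len(ref)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER for a single utterance: edits / reference words."""
    edits, n = word_edits(reference, hypothesis)
    if n == 0:
        raise ValueError("reference transcript has no words")
    return edits / n


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Pooled WER: total edits over total reference words across all files."""
    total_edits = total_words = 0
    for reference, hypothesis in pairs:
        edits, n = word_edits(reference, hypothesis)
        total_edits += edits
        total_words += n
    if total_words == 0:
        raise ValueError("corpus has no reference words")
    return total_edits / total_words


def p50_latency(latencies_s: list[float]) -> float:
    """Median (50th-percentile) latency in seconds."""
    return statistics.median(latencies_s)


def cost_per_minute(total_cost_usd: float, audio_seconds: float) -> float:
    """Spend normalized to US dollars per minute of input audio."""
    return total_cost_usd / (audio_seconds / 60)


if __name__ == "__main__":
    print(round(word_error_rate("the cat sat on the mat", "the cat sat on mat"), 4))
    print(corpus_wer([("a b c", "a b c"), ("a b", "a x")]))
    print(p50_latency([1.0, 0.8, 1.3]))
```

Running the script prints 0.1667, 0.2 and 1.0. The first case has one deletion against six reference words, so its WER is about 16.7%. Pooling errors across files, as `corpus_wer` does, gives long files more weight than averaging per-file WER would. The page does not say which convention produced the figures above.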