STT · Multilingual

Which providers actually speak which language?

The English leaderboard hides a coverage cliff. Some providers transcribe Thai, Indonesian, and Vietnamese about as well as they do English; others don't support them at all. We check support first and only benchmark a provider on a language it can actually transcribe — the rest read "does not support."

Per-language accuracy

Provider | English WER | Spanish WER | Thai CER | Indonesian WER | Vietnamese WER
OpenAI GPT-4o Transcribe | 2.4% | 1.4% | 8.1% | 2.4% | 2.5%
Alibaba Qwen3-ASR | 2.6% | 2.0% | 4.8% | 4.6% | 2.4%
ElevenLabs Scribe v2 | 2.9% | 1.8% | 4.1% | 3.0% | 1.9%
xAI Grok STT | 4.8% | 3.1% | 6.6% | 2.9% | 4.7%
Cartesia Ink-2 | 6.1% | does not support | does not support | does not support | does not support
Gradium | 13.2% | does not support | does not support | does not support | does not support

FLEURS · all via the Speko gateway · English/Thai/Indonesian/Vietnamese loudness-normalized to −16 LUFS, measured 2026-06-03 (English: 50-clip board; the rest: 30 clips each) · Spanish: FLEURS es_419, 250 clips, raw audio (not −16 LUFS), measured 2026-06-10, so the Spanish column is a real measurement but comparing it with the other columns is approximate · Thai scored by CER (written Thai has no spaces between words), the rest by WER
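
For readers unfamiliar with the two metrics in the table, here is a minimal sketch of how WER and CER are conventionally computed: edit distance between reference and hypothesis, divided by the reference length, over words for WER and over characters for CER. The function names are illustrative, and the sketch applies no text normalization (casing, punctuation); it is not necessarily the exact scoring pipeline behind the numbers above.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: substitutions, insertions and deletions cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; whitespace is dropped (an assumption of this sketch)."""
    ref_chars = [c for c in reference if not c.isspace()]
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

Because `wer` relies on whitespace to find words, it would treat an unspaced Thai sentence as a single token, which is why a character-level score is the usual choice for Thai.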

"Does not support" means the provider can't transcribe that language in its native script: it returns the wrong script or ~100% error. We don't benchmark it there, so we don't publish a misleading number. Four of the six gateway providers cover the wedge languages measured so far: OpenAI, ElevenLabs, xAI, and Alibaba transcribe Spanish, Thai, Indonesian, and Vietnamese; Cartesia and Gradium are English-only.

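The support check described above (wrong script, or roughly 100% error) could be sketched as a simple gate that runs before any benchmarking. Everything here is an assumption for illustration: the function names, the use of Unicode character names to detect script, and the 0.5 and 0.9 thresholds are hypothetical, not the actual rule used for this board.

```python
import unicodedata


def script_share(text: str, script: str) -> float:
    """Fraction of alphabetic characters whose Unicode name starts with `script`.

    `script` is a name prefix such as "THAI" or "LATIN" (hypothetical convention).
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    in_script = sum(
        1 for ch in letters if unicodedata.name(ch, "").startswith(script)
    )
    return in_script / len(letters)


def looks_supported(
    transcript: str,
    script: str,
    error_rate: float,
    min_share: float = 0.5,   # hypothetical threshold
    max_error: float = 0.9,   # hypothetical threshold for "~100% error"
) -> bool:
    """True if the output is mostly in the expected script and not near-total error."""
    return script_share(transcript, script) >= min_share and error_rate < max_error
```

For example, `looks_supported("sawasdee", "THAI", 1.0)` returns `False`, because a romanized transcript of Thai audio contains no Thai characters; a provider failing this gate on its probe clips would be listed as "does not support" rather than scored.
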
Wedge coverage is in progress: Malay and Filipino are not yet measured (corpus pending). Accent equality and code-switching are separate territories, below.