STT · Multilingual

Which providers actually speak which language?

The English leaderboard hides a coverage cliff. Some providers transcribe Thai, Indonesian, and Vietnamese about as well as they do English; others don't support them at all. We check support first and benchmark a provider only on languages it can actually transcribe; every other cell reads "does not support."

Per-language accuracy

Provider	English WER	Spanish WER	Thai CER	Indonesian WER	Vietnamese WER
OpenAI GPT-4o Transcribe	2.4%	1.4%	8.1%	2.4%	2.5%
Alibaba Qwen3-ASR	2.6%	2.0%	4.8%	4.6%	2.4%
ElevenLabs Scribe v2	2.9%	1.8%	4.1%	3.0%	1.9%
xAI Grok STT	4.8%	3.1%	6.6%	2.9%	4.7%
Cartesia Ink-2	6.1%	does not support	does not support	does not support	does not support
Gradium	13.2%	does not support	does not support	does not support	does not support

FLEURS · all via the Speko gateway · English, Thai, Indonesian, and Vietnamese: audio loudness-normalized to −16 LUFS, measured 2026-06-03 (English on the 50-clip board, the others 30 clips each) · Spanish: FLEURS es_419, 250 clips, raw audio (not normalized to −16 LUFS), measured 2026-06-10, so the Spanish column is a real measurement but comparisons with the other columns are approximate · Thai scored by CER (Thai script does not mark word boundaries with spaces), the rest by WER
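
For readers unfamiliar with the two metrics, here is a minimal sketch of how WER and CER are conventionally computed from an edit distance. It is an illustration, not the scoring code behind this board: the function names are hypothetical, and it assumes no text normalization (casing, punctuation, Unicode form), which real scoring pipelines usually apply first.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences.

    Substitutions, insertions, and deletions each cost 1.
    """
    prev = list(range(len(hyp) + 1))  # distance from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference has no words")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over code points, ignoring whitespace.

    Assumption: Thai vowel and tone marks count as separate characters
    here, because Python iterates strings by code point.
    """
    ref_chars = [c for c in reference if not c.isspace()]
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference has no characters")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

Splitting on whitespace works for English, Spanish, Indonesian, and Vietnamese, but a Thai sentence would come out as one or two giant "words", which is why the Thai column is scored per character instead.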

"Does not support" means the provider can't transcribe that language in its native script: it returns the wrong script or ~100% error, so we don't benchmark it there and don't publish a misleading number. Four of the six gateway providers cover the wedge languages measured so far: OpenAI, ElevenLabs, xAI, and Alibaba transcribe Spanish, Thai, Indonesian, and Vietnamese; Cartesia and Gradium are English-only.
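
The two failure signals named above (wrong script, or roughly 100% error) can be sketched as a simple pre-benchmark check. This is a hypothetical illustration, not the gateway's actual support test: the function names, the use of Unicode character names to guess the script, and the 0.9 error ceiling are all assumptions.

```python
import unicodedata
from collections import Counter


def dominant_script(text: str) -> str:
    """Guess the main script from Unicode character names.

    Uses the first word of each letter's Unicode name (for example
    "THAI" or "LATIN") as a rough script label. Returns "" when the
    text contains no letters.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else ""


def looks_unsupported(hypothesis: str, expected_script: str,
                      error_rate: float, error_ceiling: float = 0.9) -> bool:
    """Flag a transcript as a likely unsupported-language result.

    error_ceiling is an assumed stand-in for "~100% error"; the page
    does not state an exact cutoff.
    """
    wrong_script = dominant_script(hypothesis) != expected_script
    return wrong_script or error_rate >= error_ceiling


# A romanized answer to a Thai clip is flagged even if the error rate looks moderate.
flag_romanized = looks_unsupported("sawasdee khrap", "THAI", error_rate=0.3)
# A Thai-script answer with low error passes.
flag_native = looks_unsupported("สวัสดีครับ", "THAI", error_rate=0.05)
```

The script check matters mostly for Thai, because Spanish, Indonesian, and Vietnamese all use Latin script and a wrong-language transcript would only show up there through the error rate.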

Wedge coverage is in progress: Malay and Filipino are not yet measured (corpus pending). Accent equality and code-switching are separate topics, covered below.