Fast on average isn't fast enough.
Production SLAs live in the tail, not the median. End-to-end latency at p50/p95/p99 shows which providers stay tight and which spike — the difference between "feels instant" and "occasionally hangs."
End-to-end latency
| Provider | p50 | p95 | p99 |
|---|---|---|---|
| OpenAI GPT-4o Transcribe | 0.97 s | 1.65 s | 10.08 s |
| ElevenLabs Scribe v2 | 1.01 s | 1.57 s | 1.94 s |
| Deepgram Nova-3 | 2.22 s | 3.25 s | 4.62 s |
| AssemblyAI Universal-3 Pro | 3.33 s | 4.79 s | 5.21 s |
n=150/provider · single-location, includes network RTT · measured 2026-06-01
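OpenAI GPT-4o Transcribe has the fastest median, but its p99 of about 10 s is roughly ten times that median, so the median alone hides the tail. ElevenLabs Scribe v2 is both fast and tight, with p99 under 2 s. Deepgram Nova-3 and AssemblyAI Universal-3 Pro are slower but consistent: each keeps its p99 within about 2x its median.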
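
To make the median-versus-tail point concrete, here is a minimal sketch of how p50/p95/p99 and a tail ratio could be computed from raw latency samples. It uses the nearest-rank percentile definition, which is an assumption: the method behind the table above is not stated. The function names (`percentile`, `summarize`) and the sample data are hypothetical, not taken from the benchmark.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # 1-based rank into the sorted list; clamp to 1 so p=0 is still valid.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Return p50/p95/p99 plus the p99-to-p50 ratio as a tail indicator."""
    p50, p95, p99 = (percentile(samples, p) for p in (50, 95, 99))
    return {"p50": p50, "p95": p95, "p99": p99, "tail_ratio": p99 / p50}


# Hypothetical data, in seconds: 148 requests at 1 s plus two 10 s stalls (n=150).
latencies = [1.0] * 148 + [10.0] * 2
stats = summarize(latencies)
print(stats)
```

In this made-up run, two slow requests out of 150 leave p50 and p95 at 1.0 s while p99 jumps to 10.0 s. Under nearest-rank, p99 at n=150 is simply the second-slowest request, so a p99 at this sample size reflects only a couple of observations.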