Loud, clean, and natural — or too clean?
Three pure-DSP measurements on the same audio the stability probe collected: integrated loudness consistency, vocal micro-imperfections (the uncanny-zone scatter), and basic mastering hygiene. Repetition-loop detection ships once the analyzer threshold is tuned.
LUFS Variance
Integrated loudness (EBU R128) across 50 short utterances per provider. Tells you whether downstream playback needs per-provider normalization.
The loudest and quietest providers differ by 15.6 dB in integrated loudness, so every production app integrating these providers will need a per-provider gain stage.
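A per-provider gain stage could look roughly like the sketch below, assuming integrated loudness has already been measured for each provider. The provider names, the LUFS readings, and the −16 LUFS target are hypothetical placeholders, not measurements from this report.

```python
# Hypothetical integrated-loudness readings in LUFS (illustrative values only).
measured_lufs = {"provider_a": -14.0, "provider_b": -23.5, "provider_c": -29.6}

TARGET_LUFS = -16.0  # assumed playback target, not taken from the report


def loudness_spread(levels):
    """Difference in dB between the loudest and quietest entries."""
    return max(levels.values()) - min(levels.values())


def gain_db_to_target(levels, target):
    """Static gain in dB that brings each provider to the target loudness."""
    return {name: target - lufs for name, lufs in levels.items()}


def db_to_linear(gain_db):
    """Convert a dB gain to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)


spread = loudness_spread(measured_lufs)
gains = gain_db_to_target(measured_lufs, TARGET_LUFS)
for name, g in gains.items():
    print(f"{name}: {g:+.1f} dB (x{db_to_linear(g):.2f})")
```

A static gain like this only aligns average loudness. Positive gain also raises peaks, so a quiet provider boosted this way may still need a limiter or a lower target.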
Uncanny Zone — Jitter vs Shimmer
Vocal micro-perturbations on voiced segments. The reference box is the 5th–95th percentile of 50 real human audiobook recordings (LibriSpeech dev-clean-2) measured with this identical pipeline — jitter 1.37–3.81%, shimmer 0.71–1.27 dB.
8 of 12 providers sit inside the human reference band. The human mean (green diamond at jitter 2.29% × shimmer 0.96 dB) gives the visual anchor for "typical human" on this metric.
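The report does not say which jitter and shimmer variants the pipeline uses. As a minimal sketch, the block below assumes the common "local" definitions: cycle-to-cycle period variation relative to the mean period, and cycle-to-cycle amplitude variation in dB. It then tests a point against the human reference band quoted above. The function names are illustrative, and extracting periods and amplitudes from audio (pitch tracking) is assumed to happen elsewhere.

```python
import math

# Human reference band quoted in the text (5th-95th percentile).
JITTER_BAND_PCT = (1.37, 3.81)
SHIMMER_BAND_DB = (0.71, 1.27)


def local_jitter_pct(periods):
    """Mean absolute difference of consecutive periods / mean period, in %."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return 100 * (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))


def local_shimmer_db(amplitudes):
    """Mean absolute level difference between consecutive cycles, in dB."""
    steps = [abs(20 * math.log10(b / a)) for a, b in zip(amplitudes, amplitudes[1:])]
    return sum(steps) / len(steps)


def in_human_band(jitter_pct, shimmer_db):
    """True when the point falls inside the reference box on both axes."""
    j_lo, j_hi = JITTER_BAND_PCT
    s_lo, s_hi = SHIMMER_BAND_DB
    return j_lo <= jitter_pct <= j_hi and s_lo <= shimmer_db <= s_hi


# Synthetic glottal cycles: periods in seconds, peak amplitudes in linear units.
periods = [0.0100, 0.0102, 0.0100, 0.0102]
amplitudes = [1.0, 1.1, 1.0, 1.1]
jitter = local_jitter_pct(periods)
shimmer = local_shimmer_db(amplitudes)
print(f"jitter {jitter:.2f}%  shimmer {shimmer:.2f} dB  in band: {in_human_band(jitter, shimmer)}")
```

Both inputs need at least two voiced cycles, and amplitudes must be positive for the logarithm to be defined.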
Codec Hygiene
Basic mastering health on the long-form drift audio — clipping rate, DC offset, intersample true peak (dBTP), crest factor (peak/RMS in dB).
| Provider | Clipping % | DC offset | True peak (dBTP) | Crest (dB) |
|---|---|---|---|---|
| ElevenLabs v3 | 0.0098 | 0.00077 | -0.09 | 13.54 |
| Cartesia Sonic | 0.0000 | 0.00005 | -0.62 | 16.39 |
| MiniMax Speech 2.6 HD | 0.0000 | 0.00057 | -1.93 | 14.95 |
| Deepgram Aura 2 | 0.0000 | 0.00000 | -3.21 | 22.41 |
| Inworld | 0.0000 | 0.00003 | -3.73 | 17.29 |
| xAI TTS | 0.0000 | 0.00000 | -3.91 | 18.25 |
| AWS Polly Generative | 0.0000 | 0.00004 | -5.89 | 17.86 |
| Gradium | 0.0000 | 0.00011 | -7.24 | 18.47 |
| OpenAI 4o-mini TTS | 0.0000 | 0.00009 | -10.77 | 19.41 |
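2 of 9 providers ship audio above the −1 dBTP safety margin: ElevenLabs v3 (−0.09 dBTP) and Cartesia Sonic (−0.62 dBTP). Their output risks clipping once it passes through lossy encoding or platform loudness normalization of the kind Apple and Spotify apply.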
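The sample-domain hygiene metrics and the true-peak ceiling check can be sketched as follows. This is an illustration under stated assumptions, not the report's analyzer. The 0.999 clip threshold is assumed, and the function reports plain sample peak rather than intersample true peak, which needs oversampling and is not attempted here. Sample peak is therefore only a lower bound on dBTP. The true-peak values are copied from the table.

```python
import math

# True-peak readings copied from the table (dBTP).
TRUE_PEAK_DBTP = {
    "ElevenLabs v3": -0.09,
    "Cartesia Sonic": -0.62,
    "MiniMax Speech 2.6 HD": -1.93,
    "Deepgram Aura 2": -3.21,
    "Inworld": -3.73,
    "xAI TTS": -3.91,
    "AWS Polly Generative": -5.89,
    "Gradium": -7.24,
    "OpenAI 4o-mini TTS": -10.77,
}
CEILING_DBTP = -1.0  # the safety margin named in the text


def hygiene(samples, clip_threshold=0.999):
    """Clipping rate (%), DC offset, sample peak (dBFS) and crest factor (dB).

    `samples` are floats in [-1.0, 1.0]; the clip threshold is an assumption.
    Silent input (peak of 0) is not handled and would raise a math error.
    """
    n = len(samples)
    clipped = sum(1 for s in samples if abs(s) >= clip_threshold)
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    return {
        "clipping_pct": 100 * clipped / n,
        "dc_offset": sum(samples) / n,
        "peak_dbfs": 20 * math.log10(peak),
        "crest_db": 20 * math.log10(peak / rms),
    }


def over_ceiling(true_peaks, ceiling=CEILING_DBTP):
    """Map each provider above the ceiling to the trim (dB) that would meet it."""
    return {name: ceiling - tp for name, tp in true_peaks.items() if tp > ceiling}


# Usage: one second of a synthetic 440 Hz sine at half scale, 16 kHz sample rate.
sine = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
report = hygiene(sine)
trims = over_ceiling(TRUE_PEAK_DBTP)
print(report)
print(trims)
```

For a pure sine the crest factor comes out near 3 dB, which makes a quick sanity check. Speech typically measures much higher, in line with the 13 to 22 dB range in the table.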