TTS · Audio Quality

Loud, clean, and natural — or too clean?

Three pure-DSP measurements on the same audio the stability probe collected: integrated loudness consistency, vocal micro-imperfections (the uncanny-zone scatter), and basic mastering hygiene. Repetition-loop detection is not included yet; it will ship once the analyzer threshold is tuned.

LUFS Variance

Integrated loudness (EBU R128) across 50 short utterances per provider. Tells you whether downstream playback needs per-provider normalization.

A 15.6 dB (15.6 LU) spread separates the loudest and quietest providers, so any production app that plays more than one of them side by side will need a per-provider gain stage.
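A per-provider gain stage reduces to simple arithmetic once each provider's integrated loudness is known. The sketch below is a minimal illustration, not the benchmark's own pipeline: the function names, the −16 LUFS target, and the three readings are all assumptions chosen for the example, and the readings are placeholders rather than measured values.

```python
def loudness_spread(levels_lufs):
    """Difference in LU between the loudest and quietest entries."""
    return max(levels_lufs.values()) - min(levels_lufs.values())


def gain_stage(levels_lufs, target_lufs=-16.0):
    """Map each provider to (gain in dB, linear multiplier) for the target.

    target_lufs is an assumed playback target, not a value from the report.
    """
    gains = {}
    for provider, measured in levels_lufs.items():
        gain_db = target_lufs - measured          # positive means "turn up"
        gains[provider] = (gain_db, 10 ** (gain_db / 20))
    return gains


# Placeholder readings, NOT the benchmark's measurements.
example_levels = {"provider_a": -15.0, "provider_b": -21.5, "provider_c": -27.0}
example_gains = gain_stage(example_levels)

for name, (gain_db, multiplier) in example_gains.items():
    print(f"{name}: {gain_db:+.1f} dB (x{multiplier:.3f})")
print(f"spread: {loudness_spread(example_levels):.1f} LU")
```

Positive gain raises peaks by the same number of dB, so a quiet provider that is turned up should be re-checked for headroom after the gain is applied.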

Uncanny Zone — Jitter vs Shimmer

Vocal micro-perturbations on voiced segments. The reference box is the 5th–95th percentile of 50 real human audiobook recordings (LibriSpeech dev-clean-2) measured with this identical pipeline — jitter 1.37–3.81%, shimmer 0.71–1.27 dB.

8 of 12 providers sit inside the human reference band. The human mean (green diamond, at jitter 2.29% × shimmer 0.96 dB) is the visual anchor for "typical human" on this metric.
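For readers unfamiliar with the two axes, the sketch below computes jitter and shimmer using the common "local" definitions: jitter as the mean absolute difference between consecutive glottal periods divided by the mean period, and shimmer as the mean absolute dB ratio between consecutive cycle peak amplitudes. Whether the benchmark uses exactly these variants is an assumption, and the pitch-period extraction step is omitted. The function names and sample inputs are hypothetical; only the band limits come from the text above.

```python
import math

# Human reference band (5th-95th percentile) as stated in the report.
JITTER_BAND_PCT = (1.37, 3.81)
SHIMMER_BAND_DB = (0.71, 1.27)


def local_jitter_percent(periods):
    """Mean |T[i+1] - T[i]| over mean T, in percent. Periods in seconds."""
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))


def local_shimmer_db(amplitudes):
    """Mean |20*log10(A[i+1] / A[i])| over consecutive cycle peaks, in dB."""
    if len(amplitudes) < 2:
        raise ValueError("need at least two amplitudes")
    steps = [abs(20.0 * math.log10(b / a))
             for a, b in zip(amplitudes, amplitudes[1:])]
    return sum(steps) / len(steps)


def inside_human_band(jitter_pct, shimmer_db):
    """True when a point falls inside the reference box on both axes."""
    return (JITTER_BAND_PCT[0] <= jitter_pct <= JITTER_BAND_PCT[1]
            and SHIMMER_BAND_DB[0] <= shimmer_db <= SHIMMER_BAND_DB[1])


# Hypothetical voiced segment: periods near 10 ms, peaks alternating by 10%.
example_periods = [0.0100, 0.0102, 0.0100, 0.0102]
example_peaks = [1.0, 1.1, 1.0, 1.1]
example_jitter = local_jitter_percent(example_periods)
example_shimmer = local_shimmer_db(example_peaks)
print(inside_human_band(example_jitter, example_shimmer))
```

A provider below the band on both axes is the "too clean" case: its cycle-to-cycle variation is smaller than that of the human recordings.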

Codec Hygiene

Basic mastering health on the long-form drift audio — clipping rate, DC offset, intersample true peak (dBTP), crest factor (peak/RMS in dB).

Provider	Clipping %	DC offset	True peak (dBTP)	Crest (dB)
ElevenLabs v3	0.0098	0.00077	-0.09	13.54
Cartesia Sonic	0.0000	0.00005	-0.62	16.39
MiniMax Speech 2.6 HD	0.0000	0.00057	-1.93	14.95
Deepgram Aura 2	0.0000	0.00000	-3.21	22.41
Inworld	0.0000	0.00003	-3.73	17.29
xAI TTS	0.0000	0.00000	-3.91	18.25
AWS Polly Generative	0.0000	0.00004	-5.89	17.86
Gradium	0.0000	0.00011	-7.24	18.47
OpenAI 4o-mini TTS	0.0000	0.00009	-10.77	19.41
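Three of the table's columns can be approximated directly from the samples. The sketch below is a minimal version that assumes floating-point samples normalized to [-1, 1] and a clip threshold of 0.999 of full scale; both are assumptions, as is the function name. It reports sample peak rather than true peak, because an intersample (dBTP) reading needs an oversampling step such as the one described in ITU-R BS.1770, which is left out here.

```python
import math


def mastering_hygiene(samples, clip_level=0.999):
    """Clipping rate, DC offset, sample peak, and crest factor for one signal.

    samples: sequence of floats in [-1, 1] (assumed normalization).
    clip_level: |sample| at or above this counts as clipped (assumed value).
    """
    if not samples:
        raise ValueError("empty signal")
    n = len(samples)
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    if rms == 0.0:
        raise ValueError("silent signal: crest factor is undefined")
    clipped = sum(1 for s in samples if abs(s) >= clip_level)
    return {
        "clipping_pct": 100.0 * clipped / n,
        "dc_offset": sum(samples) / n,              # mean sample value
        "sample_peak_dbfs": 20.0 * math.log10(peak),
        "crest_db": 20.0 * math.log10(peak / rms),  # peak/RMS in dB
    }


# Synthetic check signal: ten full cycles of a half-scale sine.
tone = [0.5 * math.sin(2.0 * math.pi * k / 100) for k in range(1000)]
report = mastering_hygiene(tone)
for key, value in report.items():
    print(f"{key}: {value:.4f}")
```

A pure sine has a crest factor of about 3 dB, so the 13 to 22 dB values in the table reflect how much more peaky speech is than a steady tone.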

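2 of 9 providers (ElevenLabs v3 at −0.09 dBTP and Cartesia Sonic at −0.62 dBTP) ship audio above the −1 dBTP safety margin; those files risk clipping when platforms such as Apple or Spotify transcode or loudness-normalize them.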
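The margin check is easy to reproduce from the table. The sketch below copies the true-peak column as printed and lists each provider above −1 dBTP together with the attenuation that would bring its peak back to the margin. The function name and the rounding to two decimals are choices made for this example.

```python
# True peak (dBTP) per provider, copied from the Codec Hygiene table.
TRUE_PEAK_DBTP = {
    "ElevenLabs v3": -0.09,
    "Cartesia Sonic": -0.62,
    "MiniMax Speech 2.6 HD": -1.93,
    "Deepgram Aura 2": -3.21,
    "Inworld": -3.73,
    "xAI TTS": -3.91,
    "AWS Polly Generative": -5.89,
    "Gradium": -7.24,
    "OpenAI 4o-mini TTS": -10.77,
}


def over_margin(peaks_dbtp, margin_dbtp=-1.0):
    """Return {provider: dB of attenuation needed} for peaks above the margin."""
    return {
        name: round(peak - margin_dbtp, 2)
        for name, peak in peaks_dbtp.items()
        if peak > margin_dbtp
    }


flagged = over_margin(TRUE_PEAK_DBTP)
print(f"{len(flagged)} of {len(TRUE_PEAK_DBTP)} above -1 dBTP")
for name, cut_db in flagged.items():
    print(f"{name}: attenuate by at least {cut_db} dB")
```

Both flagged providers need less than 1 dB of attenuation, so a small fixed trim before encoding is enough to restore the margin.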