Does it loop, get stuck, or change its answer?
Quality on a good day is easy. Reliability is whether the bad days are rare and bounded. Two failure modes matter most in production: repetition / stuck output, where an autoregressive model falls into a loop and keeps emitting the same fragment, and determinism, how much the same input changes from one run to the next. Neither is on this page yet — the detection probes are in development, and we publish nothing until they run cleanly across enough repeats to support an honest claim.
Repetition / stuck output
will measure Whether the model falls into an autoregressive loop — repeating the same word, phrase, or spectral fragment until the turn ends or times out. This is a real production failure mode for streaming speech models, not a theoretical one: a single stuck turn can derail an entire call. We scan the generated audio and its transcript for looping n-grams and frozen frames, then report it as an occurrence rate across runs — did a loop happen, and how often.
Detection probe in development. We have an early single-run drift-by-duration signal, but it is noisy and not promotable to a headline number — so it stays off this page until the probe runs across enough repeats.
Determinism — run-to-run variance
will measure How much the same input changes from one run to the next. We run an identical prompt repeatedly and classify the spread into a class: cached (byte-identical output), deterministic (stable up to encoding noise), or stochastic (a materially different answer each time). Buyers building reproducible voice products need to know which class a model is in before they trust it in a pipeline.
Reported as a labelled class, not a single cherry-picked run. The class lands with a maturity pill once we have the repeats with a confidence interval to back it.
Reliability is the one dimension you cannot fake by showing your best output — it is defined by the worst case and the spread, not the highlight reel. So this page waits for the detection probes to run across enough repeats. When they land, repetition becomes an occurrence rate and determinism becomes a labelled class, each carrying a maturity pill. Until then there are no numbers here on purpose.