SYNTHETIC SPEECH ATLAS

Analysis Methodology & Statistical Framework

Document version: 2026-04-02 (Sign-Off) | Status: Methodology LOCKED

1. Project Objective

Build a physically-grounded, academically defensible framework for detecting synthetic speech using classical signal-processing features. The core thesis is that TTS/VC systems cannot replicate the coupled biomechanical constraints of the human vocal tract, creating detectable physics violations.

The SSA is a companion to the Human Speech Atlas. Together they provide a 1M+ synthetic / 600k+ bonafide feature corpus.

2. Pipeline Architecture

Two-Pass Extraction

Feature Tiers & Gating

3. Corpus State (as of 2026-04-02)

Table             Rows       Bonafide  Spoof      Brouhaha %
in_the_wild       31,779     19,963    11,816     99% bona
fake_or_real      10,684     5,286     5,398      100% bona
sonar             3,699      2,274     1,425      100% bona
librisevoc        92,407     13,201    79,206     100% bona
asvspoof5_dev     142,134    31,331    109,171    100% bona
asvspoof5_eval    681,872    138,688   542,086    100% bona
asvspoof5_train   182,357    18,797    163,560    0% (pending)
asvspoof_2021_df  193,072    5,791     37,640     50% bona
dfadd             207,955    44,454    163,501    0%
add2022           55,408     5,319     50,089     0%
mlaad             156,966    0         156,966    0%
wavefake          134,266    0         134,266    100%
TOTAL             1,904,318  285,104   1,455,124

* Usable bonafide (Brouhaha-graded): ~213,484 clips. Ground-truth gender metadata available for ~190,000 clips.

4. Statistical Peer Review & Revisions

Following a rigorous internal diagnostic review (April 2026), the methodology transitioned from a significance-based (p-value) model to an effect-first architecture. With 1.7M+ rows, almost any difference reaches statistical significance, so p-values mostly confirm sample size; the framework must instead interrogate physical magnitude (effect size).
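To illustrate why significance alone is uninformative at this scale, the minimal sketch below compares a two-sided p-value with Cohen's d for a deliberately tiny shift. The summary statistics are hypothetical illustration values, not Atlas measurements, and the normal-approximation test is an assumption made here for brevity.

```python
import math

def cohens_d(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Standardised mean difference using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

def two_sided_p(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Two-sided p-value from a large-sample z-test (normal approximation)."""
    standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    z = (mean_a - mean_b) / standard_error
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical feature: groups differ by 1% of a standard deviation,
# with 850,000 clips in each group.
args = (0.01, 0.0, 1.0, 1.0, 850_000, 850_000)
d_example = cohens_d(*args)      # negligible effect size
p_example = two_sided_p(*args)   # yet far below any usual alpha
```

In this toy case the effect is physically negligible while the p-value is extremely small, which is the situation an effect-first design is meant to avoid.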

The "Acoustic Bloodbath" (KS Invariance Failure)

ZERO of 41 features are biologically invariant across environments. Every feature undergoes a statistically significant distribution shift when moving from Tier 1 (Studio) to Tier 2 (Near-field). A pure biological baseline is therefore unattainable without tier stratification.

Action Taken: The single global baseline was demoted to exploratory use only. All formal detection claims must use tier-stratified baselines.
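As a rough sketch of the two ideas in this subsection, the code below computes a two-sample KS distance between tiers and builds one baseline per tier instead of a single global one. The function names, the (tier, value) row layout, and the sample values are hypothetical; the Atlas pipeline itself may be organised differently.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean, stdev

def ks_distance(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in set(a) | set(b)
    )

def stratified_baselines(rows):
    """rows: iterable of (tier, value). Returns {tier: (mean, sample SD)}."""
    by_tier = defaultdict(list)
    for tier, value in rows:
        by_tier[tier].append(value)
    return {tier: (mean(v), stdev(v)) for tier, v in by_tier.items()}

# Hypothetical values of one feature in two recording tiers.
tier1 = [1.0, 2.0, 3.0, 4.0]
tier2 = [3.0, 4.0, 5.0, 6.0]
shift = ks_distance(tier1, tier2)
baselines = stratified_baselines(
    [(1, v) for v in tier1] + [(2, v) for v in tier2]
)
```

A clip would then be compared against the baseline of its own tier, so that an environment shift is not mistaken for a biological one.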

Feature Quarantines & Corrections

TEO Artifact Quarantine: Teager Energy Operator features (teo_mean, teo_std) exhibit a coefficient of variation (CV) of up to 283%. They act as dataset-identification footprints rather than biological markers. They are strictly disqualified from the global baseline and relegated to conditional Gate 2 use.
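For reference, a coefficient of variation can be computed as below. The text does not say whether CV% is taken across dataset-level means or across individual clips; this sketch assumes the former, and the numbers are made up for illustration.

```python
from statistics import mean, stdev

def cv_percent(values):
    """Coefficient of variation in percent: 100 * sample SD / |mean|."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * stdev(values) / abs(m)

# Hypothetical per-dataset means of two features.
stable_feature = [0.98, 1.00, 1.02]    # similar across datasets
unstable_feature = [0.1, 0.2, 9.0]     # dominated by one dataset

cv_stable = cv_percent(stable_feature)
cv_unstable = cv_percent(unstable_feature)
```

A feature whose CV exceeds 100% varies more between datasets than its own average magnitude, which is consistent with it encoding the dataset rather than the speaker.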

Formant Sex-Confounding: Formant velocities (f1_velocity, f2_velocity) show significant sex-based biological gaps (Cohen's d = -0.34 for M vs F). A single global baseline is demographically naive. Action: Implemented a pooled within-sex standard deviation for detection normalisation, sourced from ground-truth datasets.
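A pooled within-sex standard deviation can be sketched as follows. Centring each clip on its own sex mean is an assumption of this sketch (the text specifies only the pooled SD), and the values are hypothetical.

```python
import math
from statistics import mean, stdev, variance

def pooled_within_group_sd(groups):
    """groups: {label: [values]}. Pools variance around each group's own mean."""
    numerator = sum((len(v) - 1) * variance(v) for v in groups.values())
    dof = sum(len(v) for v in groups.values()) - len(groups)
    return math.sqrt(numerator / dof)

def normalise(value, sex, groups, pooled_sd):
    """z-score against the speaker's own sex mean, scaled by the pooled SD."""
    return (value - mean(groups[sex])) / pooled_sd

# Hypothetical f1_velocity values: a clear sex gap, equal within-sex spread.
bonafide = {"M": [1.0, 2.0, 3.0], "F": [11.0, 12.0, 13.0]}
pooled_sd = pooled_within_group_sd(bonafide)
global_sd = stdev(bonafide["M"] + bonafide["F"])   # inflated by the sex gap
z_score = normalise(14.0, "F", bonafide, pooled_sd)
```

Here the naive global SD is several times larger than the pooled within-sex SD, so a global baseline would understate how unusual a given clip is for its speaker group.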

nPVI Dataset Blacklist: The 100% instant-kill rate observed on ASVspoof2021 DF was confirmed to be a codec/compression artifact, not a universal biological failure. npvi is formally blacklisted for this specific dataset to prevent metric contamination.
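One simple way to enforce a per-dataset blacklist is a lookup applied before scoring. The registry name and helper below are hypothetical; only the npvi / ASVspoof2021 DF pairing comes from the text, written here with the table name used in the corpus summary.

```python
# Hypothetical registry: feature name -> datasets where it must not be scored.
FEATURE_BLACKLIST = {"npvi": {"asvspoof_2021_df"}}

def usable_features(features, dataset):
    """Return the features that may be scored on this dataset."""
    return [
        f for f in features
        if dataset not in FEATURE_BLACKLIST.get(f, set())
    ]

on_df = usable_features(["npvi", "f1_velocity"], "asvspoof_2021_df")
on_itw = usable_features(["npvi", "f1_velocity"], "in_the_wild")
```

Keeping the blacklist keyed by feature and dataset leaves npvi available everywhere else, matching the statement that the failure is specific to one dataset's compression.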

5. Defensible Detection Framework

Dual-Gated Baseline Architecture

Tier A: Academically Defensible Detectors

Features passing the KS D < 0.1 / KL < 0.02 invariance thresholds, demonstrating consistent directional Cohen's d across both tiers with N ≥ 385:

Conditioned Features

Final Verdict: "Statistically Honest"

"The system has moved from a model overwhelmed by data to one that interrogates the physics. The baseline is now academically defensible." — Statistician Sign-Off, April 2026.

The framework correctly abandons p-value significance in favor of KS distance (D > 0.1 indicating a distribution shift) and Kullback-Leibler divergence. Feature blacklists act as the strongest defense against academic fluff, acknowledging when a dataset's internal compression destroys a metric rather than claiming biological failure.
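The Tier A criteria from Section 5 (KS D < 0.1, KL < 0.02, consistent direction of Cohen's d across both tiers, N ≥ 385) can be sketched as a simple gate. The histogram-based KL estimate, its smoothing constant, the use of natural logarithms, and the reading of N as a per-tier minimum are all assumptions of this sketch rather than details given in the text.

```python
import math

def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL(P || Q) in nats from two histograms over the same bins."""
    p_total = sum(p_counts) + eps * len(p_counts)
    q_total = sum(q_counts) + eps * len(q_counts)
    total = 0.0
    for p, q in zip(p_counts, q_counts):
        p_prob = (p + eps) / p_total   # eps avoids log(0) on empty bins
        q_prob = (q + eps) / q_total
        total += p_prob * math.log(p_prob / q_prob)
    return total

def passes_tier_a(ks_d, kl, d_tier1, d_tier2, n_tier1, n_tier2,
                  ks_max=0.1, kl_max=0.02, n_min=385):
    """True if a feature meets every Tier A criterion listed above."""
    same_direction = d_tier1 * d_tier2 > 0
    enough_data = min(n_tier1, n_tier2) >= n_min
    return ks_d < ks_max and kl < kl_max and same_direction and enough_data

# Hypothetical feature summaries.
accepted = passes_tier_a(0.05, 0.01, 0.4, 0.6, 500, 400)
rejected_shift = passes_tier_a(0.12, 0.01, 0.4, 0.6, 500, 400)
rejected_sign = passes_tier_a(0.05, 0.01, 0.4, -0.6, 500, 400)
```

Requiring all four conditions at once means a feature with a large effect size is still rejected if its distribution moves between recording tiers.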
