The tables below list the features extracted from each sample in our synthetic speech atlas, with a brief explanation of each feature. Features are split into four groups by class, one table per group.

| Column | Type | Description |
|---|---|---|
| anon_id | int | Anonymous sequential ID (post-shuffle) |
| label | string | bonafide or spoof |
| tier | string | Brouhaha quality tier (1=studio, 2=near-field, UNKNOWN=ungraded) |
| brouhaha_graded | int | 1 = Brouhaha grades actually computed; 0 = defaulted (not graded) |
| source_licence | string | SPDX licence identifier for this row's source |
| source_dataset | string | Originating dataset name |
| duration_ms | float32 | Clip duration (ms); bonafide values bucketed to 500 ms |
| duration_s | float32 | Clip duration (seconds) |
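
As a rough illustration of the duration bucketing, the sketch below rounds bonafide durations to the nearest 500 ms multiple and leaves spoof durations unchanged. The rounding mode (nearest rather than floor) and the helper name bucket_duration_ms are assumptions for illustration, not the atlas's confirmed pipeline.

```python
import numpy as np

BUCKET_MS = 500.0  # bucket width stated in the table


def bucket_duration_ms(duration_ms, label):
    """Hypothetical reconstruction of the bonafide duration bucketing.

    Assumption: bonafide durations are rounded to the nearest 500 ms multiple
    (np.round rounds exact halves to even); spoof durations pass through.
    """
    d = np.asarray(duration_ms, dtype=np.float32)
    is_bonafide = np.asarray(label) == "bonafide"
    bucketed = np.round(d / BUCKET_MS) * BUCKET_MS
    return np.where(is_bonafide, bucketed, d).astype(np.float32)
```

For example, bucket_duration_ms([1234.0, 1234.0], ["bonafide", "spoof"]) gives [1000.0, 1234.0].
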

| Column | Source dataset(s) | Description |
|---|---|---|
| vocoder | LibriSeVoc | Neural vocoder architecture name |
| tts_system | SONAR | TTS system name (xTTS, OpenAI, FlashSpeech, etc.) |
| attack_id | ASVspoof5 Dev, Eval | Attack system identifier from ASVspoof5 protocol |
| codec_env | ASVspoof5 Dev, Eval | Codec environment condition |
| codec_type | ASVspoof5 Dev, Eval | Codec type applied to clip |
| gender | ASVspoof5 Dev, Eval | Speaker gender (from ASVspoof5 protocol TSV) |

Note: Values are stored as float32. DSP-extracted values are rounded to two decimal places and then re-inflated to FP16 resolution. Statistical properties are preserved up to this quantisation.

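
A minimal sketch of the quantisation described in the note, assuming the order is: round to 2 dp, snap to the nearest float16 value, then widen back to float32 for storage. The function name quantise_like_atlas is hypothetical.

```python
import numpy as np


def quantise_like_atlas(values):
    """Hypothetical reconstruction of the atlas quantisation.

    Assumed order: round to 2 dp -> nearest float16 -> stored as float32.
    Values beyond the float16 range (about +/-65504) would become inf.
    """
    x = np.asarray(values, dtype=np.float64)
    rounded = np.round(x, 2)               # round(2dp)
    as_fp16 = rounded.astype(np.float16)   # FP16 resolution
    return as_fp16.astype(np.float32)      # stored column dtype
```

Under these assumptions a pitch value of 212.3456 Hz is stored as 212.375 Hz: between 128 and 256 the float16 step is 0.125, so for pitch-scale values the FP16 step, not the 2 dp rounding, sets the effective resolution.
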

PRISTINE-gated features are NaN unless brouhaha_graded = 1. Do not impute them with zero; treat them as structurally missing.

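
One way to honour the gating rule when loading the data is sketched below. It re-applies the PRISTINE gate defensively and, as an additional assumption not stated in the note, also converts the ungraded sentinels of snr_median (99.0) and c50_median (60.0) to NaN so they are never read as real measurements. It works on plain NumPy arrays keyed by column name; apply_missingness is a hypothetical helper.

```python
import numpy as np

# Columns marked (PRISTINE-gated) in the table below.
PRISTINE_GATED = {"jitter_local", "shimmer_local", "hnr_mean", "cpps",
                  "hnr_c50_ratio", "cpps_snr_ratio"}
# Columns that hold a sentinel (99.0 / 60.0) when the row is ungraded.
UNGRADED_SENTINEL = {"snr_median", "c50_median"}


def apply_missingness(columns, brouhaha_graded):
    """Return float32 copies with NaN wherever a value is structurally missing.

    Never fills with zero: downstream code should treat NaN as missing,
    e.g. via a model with native NaN handling or an explicit indicator column.
    """
    graded = np.asarray(brouhaha_graded) == 1
    out = {}
    for name, values in columns.items():
        v = np.array(values, dtype=np.float32)  # copy; input is left untouched
        if name in PRISTINE_GATED or name in UNGRADED_SENTINEL:
            v[~graded] = np.nan
        out[name] = v
    return out
```
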
| Column | Units | Description |
|---|---|---|
| snr_median | dB | Median SNR (Brouhaha); 99.0 if ungraded |
| snr_mean | dB | Mean SNR (Brouhaha) |
| c50_median | dB | Median room clarity C50; 60.0 if ungraded |
| speech_ratio | 0–1 | Active speech proportion |
| pitch_mean | Hz | Mean F0 (voiced frames) |
| pitch_std | Hz | F0 standard deviation |
| pitch_range | Hz | F0 max–min range |
| npvi | — | Normalised Pairwise Variability Index (rhythm) |
| intensity_mean | dB | Mean RMS intensity |
| intensity_max | dB | Peak intensity |
| intensity_range | dB | Dynamic range (peak – minimum) |
| intensity_velocity_max | dB/frame | Max rate of intensity change |
| jitter_local | % | Cycle-to-cycle period perturbation (PRISTINE-gated) |
| shimmer_local | % | Cycle-to-cycle amplitude perturbation (PRISTINE-gated) |
| hnr_mean | dB | Harmonics-to-noise ratio (PRISTINE-gated) |
| cpps | dB | Cepstral peak prominence, smoothed (PRISTINE-gated) |
| hnr_c50_ratio | — | HNR adjusted for room acoustics (PRISTINE-gated) |
| cpps_snr_ratio | — | CPPS normalised for noise floor (PRISTINE-gated) |
| spectral_centroid_mean | Hz | Mean spectral brightness |
| spectral_tilt | — | Slope of high-frequency vs low-frequency energy |
| mfcc_delta_mean | — | Mean first-order MFCC delta |
| mfcc_high_variance | — | Variance of the upper MFCC coefficients (12–20) |
| zcr_mean | — | Mean zero-crossing rate |
| teo_mean | — | Mean Teager–Kaiser energy operator (TEO) output |
| teo_std | — | TEO temporal standard deviation |
| f1_mean | Hz | Mean first formant |
| f2_mean | Hz | Mean second formant |
| f3_mean | Hz | Mean third formant |
| formant_dispersion | Hz | F3 − F1 spacing; vocal tract length proxy |
| articulation_rate | syl/s | Estimated syllables per second |
| phoneme_count | — | Estimated phoneme count |
| emotion_score | 0–1 | Affective charge heuristic |
| spectral_7k8k_entropy | bits | Spectral entropy in the 7–8 kHz band; NaN when the codec gate is triggered |
| fam_75hz_sharpness | — | Acoustic mode sharpness at 75 Hz |
| fam_86hz_sharpness | — | Acoustic mode sharpness at 86 Hz |
| drr_hf_lf_slope_ratio | — | Direct-to-reverberant HF/LF slope |
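
For reference, three of the features above have standard textbook definitions that can be sketched directly: the discrete Teager–Kaiser energy operator ψ[n] = x[n]² − x[n−1]·x[n+1] (the quantity summarised by teo_mean and teo_std), the nPVI over successive interval durations, and the zero-crossing rate. These are the common formulations, not necessarily the atlas's exact framing, band splits, or scaling.

```python
import numpy as np


def teager_kaiser(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=np.float64)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def npvi(durations):
    """nPVI = 100 * mean_k |d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2)."""
    d = np.asarray(durations, dtype=np.float64)
    pair_diff = np.abs(d[:-1] - d[1:])
    pair_mean = (d[:-1] + d[1:]) / 2.0
    return float(100.0 * np.mean(pair_diff / pair_mean))


def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose sign differs (0 counts as positive)."""
    x = np.asarray(x, dtype=np.float64)
    return float(np.mean(np.signbit(x[:-1]) != np.signbit(x[1:])))
```

A pure tone A·cos(Ωn) yields a constant TEO of A²·sin²(Ω), which is why TEO responds to both amplitude and frequency rather than amplitude alone.
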

Note: These features are released as z-scores only. Raw extracted values have been permanently dropped to keep the features model-agnostic.

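
The z-scoring itself is the standard z = (x − μ)/σ. The sketch below assumes μ and σ are frozen from a fixed reference set; which rows the atlas used as reference (for example, bonafide rows only) is not stated here, so that choice is an assumption. Once the z-scores are computed against frozen statistics, the raw column can be discarded.

```python
import numpy as np


def fit_reference(raw_reference):
    """Mean and population std of one feature over the reference rows, ignoring NaN."""
    r = np.asarray(raw_reference, dtype=np.float64)
    return float(np.nanmean(r)), float(np.nanstd(r))


def to_z(raw, mu, sigma):
    """Standard score relative to the frozen reference statistics."""
    return (np.asarray(raw, dtype=np.float64) - mu) / sigma
```
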
| Column | Description | Known Signature |
|---|---|---|
| bico_f0_f1_z | Bicoherence F0–F1 phase coupling | Universal deepfake marker — architecture-invariant |
| bico_f1_f2_z | Bicoherence F1–F2 phase coupling | Vocoder formant band independence |
| modgd_var_z | Modified group delay variance | TTS output collapses to low variance |
| pgv_magnitude_correlation_z | Phase group velocity correlation | Near-zero across all synthetic samples |
| pgv_total_z | Total phase group velocity energy | Architecture-dependent |
| f1_velocity_z | F1 transition rate | Physiologically impossible tongue acceleration; \|Z\| > 9 in production fakes |
| f2_velocity_z | F2 transition rate | Physiologically impossible lip acceleration |
| inertial_decay_residual_z | Biomechanical inertia decay residual | ~59% of SONAR spoofs caught by this feature alone ("instant-kill") |
| teo_std_high_z | TEO high-band std | Digital vacuum in neural vocoders |
| teo_std_low_z | TEO low-band std | Low-frequency TEO is too smooth in synthetic speech |
| pitch_velocity_max_z | Max F0 rate-of-change | [Placeholder: complete description...] |
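
As a worked example of using a Known Signature, the |Z| > 9 figure quoted for f1_velocity_z translates into a one-line screen. The threshold comes from the table; using it as a standalone flag, and the name flag_f1_velocity, are illustrative assumptions rather than a validated detector.

```python
import numpy as np

F1_VELOCITY_Z_LIMIT = 9.0  # |Z| > 9 threshold quoted in the table


def flag_f1_velocity(f1_velocity_z):
    """True where |f1_velocity_z| exceeds the limit; NaN rows are never flagged."""
    z = np.asarray(f1_velocity_z, dtype=np.float64)
    with np.errstate(invalid="ignore"):  # silence NaN-comparison warnings on older NumPy
        return np.abs(z) > F1_VELOCITY_Z_LIMIT
```
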