DATA DICTIONARY

Synthetic Speech Atlas: Feature Schema

The tables below list the features extracted from each sample in the Synthetic Speech Atlas, with a brief explanation of each feature. Features are split into four groups: global metadata, dataset-specific columns, Tier 1 (standard DSP) features and Tier 2 (biomechanical) features.

Global Metadata (All Datasets)

Column	Type	Description
anon_id	int	Anonymous sequential ID (post-shuffle)
label	string	bonafide or spoof
tier	string	Brouhaha quality tier (1=studio, 2=near-field, UNKNOWN=ungraded)
brouhaha_graded	int	1 = Brouhaha grades actually computed; 0 = default values used
source_licence	string	SPDX licence identifier for this row's source
source_dataset	string	Originating dataset name
duration_ms	float32	Clip duration (ms); bonafide clips bucketed to 500 ms
duration_s	float32	Clip duration (seconds)
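The duration_ms description says bonafide clips are bucketed to 500 ms. A minimal sketch of such bucketing follows. It assumes round-to-nearest buckets and unbucketed spoof durations; neither the rounding mode nor the spoof behaviour is stated in this schema, and `bucketed_duration_ms` is a hypothetical helper, not the atlas's own code.

```python
BUCKET_MS = 500.0  # bucket width taken from the duration_ms description


def bucketed_duration_ms(duration_s: float, label: str) -> float:
    """Hypothetical helper: derive duration_ms from duration_s.

    Assumes bonafide clips are rounded to the nearest 500 ms bucket and
    spoof clips keep their exact duration. Python's round() uses
    round-half-to-even, so exact midpoints go to the even bucket index.
    """
    duration_ms = duration_s * 1000.0
    if label == "bonafide":
        return round(duration_ms / BUCKET_MS) * BUCKET_MS
    return duration_ms
```

For example, a 3.26 s bonafide clip would be reported as 3500 ms under this assumption, while a spoof clip of the same length stays at 3260 ms.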

Dataset-Specific Columns

Column	Datasets	Description
vocoder	LibriSeVoc	Neural vocoder architecture name
tts_system	SONAR	TTS system name (xTTS, OpenAI, FlashSpeech, etc.)
attack_id	ASVspoof5 Dev, Eval	Attack system identifier from ASVspoof5 protocol
codec_env	ASVspoof5 Dev, Eval	Codec environment condition
codec_type	ASVspoof5 Dev, Eval	Codec type applied to clip
gender	ASVspoof5 Dev, Eval	Speaker gender (from ASVspoof5 protocol TSV)
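Each dataset-specific column applies only to the sources listed beside it. The sketch below encodes that table as a lookup so a loader can check which columns should be empty for a given row. The exact `source_dataset` strings are assumptions, and the schema does not say how inapplicable columns are filled (empty string, null or absent).

```python
# Dataset-specific columns per source, transcribed from the table above.
# The key strings are assumptions; check them against the real
# values in the source_dataset column.
DATASET_COLUMNS = {
    "LibriSeVoc": ("vocoder",),
    "SONAR": ("tts_system",),
    "ASVspoof5 Dev": ("attack_id", "codec_env", "codec_type", "gender"),
    "ASVspoof5 Eval": ("attack_id", "codec_env", "codec_type", "gender"),
}

# Every dataset-specific column, in a stable (sorted) order.
ALL_SPECIFIC = sorted({c for cols in DATASET_COLUMNS.values() for c in cols})


def inapplicable_columns(source_dataset: str) -> list[str]:
    """Dataset-specific columns that are not expected for this source.

    Unknown sources get every dataset-specific column back.
    """
    expected = set(DATASET_COLUMNS.get(source_dataset, ()))
    return [c for c in ALL_SPECIFIC if c not in expected]
```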

Tier 1 — Standard DSP Features

Note: Values are stored as float32. DSP-extracted values are rounded to 2 decimal places and then re-inflated to FP16 resolution; statistical properties are preserved.
PRISTINE-gated features are NaN unless brouhaha_graded = 1. Do not impute them with zero; treat them as structurally missing.
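The value conventions in this note can be sketched as follows. `quantize_dsp` is one plausible reading of "rounded to 2 dp, then re-inflated to FP16 resolution" (snap to the nearest IEEE half-precision value, then store as float32); the actual pipeline is not specified. `gate_pristine` applies the NaN rule, and `nan_mean` shows skipping rather than zero-filling. `mask_sentinels` is an optional, hypothetical step for readers who also want the ungraded snr_median (99.0) and c50_median (60.0) sentinels from the table treated as missing; it assumes "ungraded" means brouhaha_graded != 1. All function names are illustrative.

```python
import math
import struct

# Tier 1 columns marked PRISTINE-gated in the table.
PRISTINE_GATED = (
    "jitter_local", "shimmer_local", "hnr_mean",
    "cpps", "hnr_c50_ratio", "cpps_snr_ratio",
)
# Ungraded sentinels documented for snr_median and c50_median.
BROUHAHA_SENTINELS = {"snr_median": 99.0, "c50_median": 60.0}


def quantize_dsp(value: float) -> float:
    """Round to 2 dp, snap to the nearest FP16 value, return it as float32.

    This is an assumed reading of the note, not the atlas's code.
    """
    if math.isnan(value):
        return value
    rounded = round(value, 2)
    # struct's "e" format is IEEE 754 half precision; it raises
    # OverflowError for magnitudes beyond the FP16 range (about 65504).
    half = struct.unpack("<e", struct.pack("<e", rounded))[0]
    # Every FP16 value is exactly representable in float32.
    return struct.unpack("<f", struct.pack("<f", half))[0]


def gate_pristine(row: dict) -> dict:
    """Copy of row with PRISTINE-gated features set to NaN unless graded."""
    out = dict(row)
    if out.get("brouhaha_graded") != 1:
        for col in PRISTINE_GATED:
            out[col] = math.nan
    return out


def mask_sentinels(row: dict) -> dict:
    """Optional: turn ungraded Brouhaha sentinel values into NaN."""
    out = dict(row)
    if out.get("brouhaha_graded") != 1:
        for col, sentinel in BROUHAHA_SENTINELS.items():
            if out.get(col) == sentinel:
                out[col] = math.nan
    return out


def nan_mean(rows: list, col: str) -> float:
    """Mean that skips NaN (structurally missing) values instead of zero-filling."""
    vals = [r[col] for r in rows if not math.isnan(r[col])]
    return sum(vals) / len(vals) if vals else math.nan
```

Under this reading, 220.456 becomes 220.5: FP16 spacing between 128 and 256 is 0.125, so the rounded 220.46 snaps to the nearest representable step.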

Column	Units	Description
snr_median	dB	Median SNR (Brouhaha); 99.0 if ungraded
snr_mean	dB	Mean SNR (Brouhaha)
c50_median	dB	Median room clarity C50; 60.0 if ungraded
speech_ratio	0–1	Active speech proportion
pitch_mean	Hz	Mean F0 (voiced frames)
pitch_std	Hz	F0 standard deviation
pitch_range	Hz	F0 max–min range
npvi	—	Normalised Pairwise Variability Index (rhythm)
intensity_mean	dB	Mean RMS intensity
intensity_max	dB	Peak intensity
intensity_range	dB	Dynamic range (peak – minimum)
intensity_velocity_max	dB/frame	Max rate of intensity change
jitter_local	%	Cycle-to-cycle period perturbation (PRISTINE-gated)
shimmer_local	%	Cycle-to-cycle amplitude perturbation (PRISTINE-gated)
hnr_mean	dB	Harmonics-to-noise ratio (PRISTINE-gated)
cpps	dB	Cepstral peak prominence, smoothed (PRISTINE-gated)
hnr_c50_ratio	—	HNR adjusted for room acoustics (PRISTINE-gated)
cpps_snr_ratio	—	CPPS normalised for noise floor (PRISTINE-gated)
spectral_centroid_mean	Hz	Mean spectral brightness
spectral_tilt	—	HF vs LF energy slope
mfcc_delta_mean	—	Mean first-order MFCC delta
mfcc_high_variance	—	Variance of the upper MFCCs (coefficients 12–20)
zcr_mean	—	Mean zero-crossing rate
teo_mean	—	Mean Teager-Kaiser Energy Operator
teo_std	—	TEO temporal standard deviation
f1_mean	Hz	Mean first formant
f2_mean	Hz	Mean second formant
f3_mean	Hz	Mean third formant
formant_dispersion	Hz	F3–F1 vocal tract length proxy
articulation_rate	syl/s	Estimated syllables per second
phoneme_count	—	Estimated phoneme count
emotion_score	0–1	Affective charge heuristic
spectral_7k8k_entropy	bits	Spectral entropy of the 7–8 kHz band; NaN when the codec gate is triggered
fam_75hz_sharpness	—	Acoustic mode sharpness at 75Hz
fam_86hz_sharpness	—	Acoustic mode sharpness at 86Hz
drr_hf_lf_slope_ratio	—	Direct-to-reverberant HF/LF slope
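The table gives only one-line descriptions for npvi and the TEO columns. The sketch below uses the commonly used textbook forms, assuming the atlas follows them. For nPVI this is 100 times the mean of |d_k - d_(k+1)| / ((d_k + d_(k+1)) / 2) over successive interval durations. For TEO it is the discrete Teager-Kaiser operator psi[n] = x[n]^2 - x[n-1] * x[n+1]. The atlas does not state which intervals (vocalic or syllabic) feed npvi, or what framing, band limits and standard-deviation convention sit behind teo_mean and teo_std, so this is not the atlas extractor.

```python
import statistics


def npvi(durations: list[float]) -> float:
    """Normalised Pairwise Variability Index (common form, scaled by 100)."""
    if len(durations) < 2:
        raise ValueError("need at least two interval durations")
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100.0 * sum(terms) / len(terms)


def teager_kaiser(x: list[float]) -> list[float]:
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def teo_summary(x: list[float]) -> tuple[float, float]:
    """Mean and population std of psi over the whole signal (assumed analogue
    of teo_mean / teo_std; the atlas's framing is unspecified)."""
    psi = teager_kaiser(x)
    return statistics.fmean(psi), statistics.pstdev(psi)
```

As a sanity check, a pure tone A*cos(w*n) gives a constant psi = A^2 * sin^2(w), so its TEO standard deviation is zero. Perfectly even durations give an nPVI of 0.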

Tier 2 — Biomechanical Features

Note: Only z-scores are published. Raw extracted values have been permanently dropped to keep the features model-agnostic.
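Each Tier 2 column is a standard score, z = (x - mean) / std. The sketch below shows that transform. The reference population behind the mean and std (all clips, bonafide only, or per source_dataset) is not stated, and neither is whether std is the sample or population standard deviation. Both are assumptions here, and `zscore` is a hypothetical helper.

```python
import statistics


def zscore(values: list[float], reference: list[float]) -> list[float]:
    """Standardise values against a reference sample: z = (x - mean) / std.

    'reference' is a hypothetical stand-in for whatever population the
    atlas used; sample std (n - 1) is an assumption.
    """
    mu = statistics.fmean(reference)
    sigma = statistics.stdev(reference)
    if sigma == 0:
        raise ValueError("reference has zero variance")
    return [(v - mu) / sigma for v in values]
```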

Column	Description	Known Signature
bico_f0_f1_z	Bicoherence F0–F1 phase coupling	Universal deepfake marker — architecture-invariant
bico_f1_f2_z	Bicoherence F1–F2 phase coupling	Vocoder formant band independence
modgd_var_z	Modified group delay variance	TTS collapses to low variance
pgv_magnitude_correlation_z	Phase group velocity correlation	Near-zero across all synthetics
pgv_total_z	Total phase group velocity energy	Architecture-dependent
f1_velocity_z	F1 transition rate	Impossible tongue acceleration; |Z| > 9 in production fakes
f2_velocity_z	F2 transition rate	Impossible lip acceleration
inertial_decay_residual_z	Biomechanical inertia decay	~59% instant-kill on SONAR
teo_std_high_z	TEO high-band std	Digital vacuum in neural vocoders
teo_std_low_z	TEO low-band std	Synthetic LF TEO too smooth
pitch_velocity_max_z	Max F0 rate-of-change	[Placeholder: complete description...]
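The Known Signature column quotes |Z| > 9 for f1_velocity_z in production fakes. A minimal single-feature screen using that figure is sketched below. It is illustrative only, not the atlas's detector, and the threshold is taken from the table rather than validated here. `flag_extreme` is a hypothetical helper.

```python
import math

F1_VELOCITY_LIMIT = 9.0  # |Z| > 9 is the signature quoted for f1_velocity_z


def flag_extreme(rows: list[dict], column: str = "f1_velocity_z",
                 limit: float = F1_VELOCITY_LIMIT) -> list[int]:
    """Return anon_ids of rows whose |z| exceeds the limit; NaN never flags."""
    return [r["anon_id"] for r in rows
            if not math.isnan(r[column]) and abs(r[column]) > limit]
```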

Extraction Methodology Project Details