This repository contains the foundational physics documentation, acoustic theory, and architectural research papers that underpin our extraction methodologies and quality gating protocols.

| Article Title | Year | Author(s) | Reference / Link |
|---|---|---|---|
| Quantizer-Aware Hierarchical Neural Codec Modeling for Speech Deepfake Detection | 2026 | Unknown (arXiv) | arXiv:2603.16914 |
| VoxAnchor: Grounding Speech Authenticity in Throat Vibration | 2026 | Unknown (arXiv) | arXiv:2603.27562 |
| How to Label Resynthesized Audio: The Dual Role of Neural Audio Codecs in Audio Deepfake Detection | 2026 | Unknown (arXiv) | arXiv:2602.16343 |
| Measuring the Robustness of Audio Deepfake Detectors | 2025 | Unknown (arXiv) | arXiv:2503.17577 |
| Beyond Identity: Generalizable Deepfake Audio Detection | 2025 | Unknown (arXiv) | arXiv:2505.06766 |
| Forensic Deepfake Audio Detection Using Segmental Speech Features | 2025 | Unknown (arXiv) | arXiv:2505.13847 |
| Phoneme-Level Analysis for Person-of-Interest Speech Deepfake Detection | 2025 | Unknown (arXiv) | arXiv:2507.08626 |
| AUDETER: A Large-scale Dataset for Deepfake Audio Detection in Open Worlds | 2025 | Unknown (arXiv) | arXiv:2509.04345 |
| I Can Hear You: Selective Robust Training for Deepfake Audio Detection | 2024 | Unknown (arXiv) | arXiv:2411.00121 |
| CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning | 2024 | Unknown (arXiv) | arXiv:2404.15854 |
| Towards the Detection of Speech Deepfakes for Scam Prevention | 2024 | White, A., & Watson, C. | SST2024 |
| Linear Frequency Residual Cepstral Features for Replay Spoof Detection on ASVSpoof 2019 | 2022 | Singh, P., et al. | IEEE EUSIPCO |
| Detecting AI-Synthesized Speech Using Bispectral Analysis | 2019 | AlBadawy, E. A., Lyu, S., & Farid, H. | CVPR Workshops 2019 |
| Replay detection using CQT-based modified group delay feature and ResNeWt network in ASVspoof 2019 | 2019 | Cheng, X., Xu, M., & Zheng, T. F. | IEEE APSIPA ASC |
| Audio Deepfake Detection: What Has Been Achieved and What Lies Ahead | N/A | Unknown | PMC11991371 |

| Article Title | Year | Author(s) | Reference / Link |
|---|---|---|---|
| Speech Representation and Transformation using Adaptive Interpolation of Weighted Spectrum: Vocoder Revisited | 1997 | Kawahara, H., et al. | IEEE ICASSP |
| Algebraic Code-Excited Linear Prediction (ACELP) | 1995 | Salami, R., et al. | ITU-T G.729 |
| A Mixed Excitation LPC Vocoder Model for Low Bit Rate Speech Coding | 1995 | McCree, A., & Barnwell, T. P. | IEEE Trans. on Speech and Audio Processing |
| Linear Prediction: A Tutorial Review | 1975 | Makhoul, J. | Proceedings of the IEEE |
| Acoustic Theory of Speech Production | 1970 | Fant, G. | Mouton |
| Adaptive Predictive Coding of Speech Signals | 1970 | Atal, B. S., & Schroeder, M. R. | Bell System Technical Journal |
| Analysis Synthesis Telephony based on the Maximum Likelihood Method | 1968 | Itakura, F., & Saito, S. | Proc. 6th Int. Congress on Acoustics |

| Article Title | Year | Author(s) | Reference / Link |
|---|---|---|---|
| Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale | 2023 | Le, M., et al. (Meta) | arXiv:2306.15687 |
| Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers (VALL-E) | 2023 | Wang, C., et al. (Microsoft) | arXiv:2301.02111 |
| HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis | 2020 | Kong, J., et al. | arXiv:2010.05646 |
| FastSpeech: Fast, Robust and Controllable Text to Speech | 2019 | Ren, Y., et al. | arXiv:1905.09263 |
| Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions (Tacotron 2) | 2018 | Shen, J., et al. (Google) | arXiv:1712.05884 |
| WaveNet: A Generative Model for Raw Audio | 2016 | van den Oord, A., et al. (DeepMind) | arXiv:1609.03499 |