MOONSCAPE
PROVENANCE MANIFEST
Dataset Origins & Enterprise Licensing
Transparency of origin is a core MoonScape value. The tables below list the upstream source corpora that make up the Human Speech Atlas and Synthetic Speech Atlas feature databases, along with the license under which each corpus was originally distributed.
Human Speech Atlas (HSA) Source Corpora
| Dataset Name | Release Year | Organization / Origin | Upstream License |
|---|---|---|---|
| Mozilla Common Voice (CV24.0) | 2023 / 2024 | Mozilla Foundation | CC0-1.0 (Public Domain) |
| Mozilla Spontaneous Speech (SPS2.0) | 2024 | Mozilla Foundation | CC0-1.0 (Public Domain) |
| TidyVoice X / X2 | 2024 | Mozilla Foundation | CC0-1.0 (Restricted: SV tasks only) |
Synthetic Speech Atlas (SSA) Source Corpora
| Dataset Name | Release Year | Organization / Origin | Upstream License |
|---|---|---|---|
| ASVspoof 5 (Train, Dev, Eval) | 2024 | ASVspoof Consortium | ODC-BY |
| ASVspoof 2021 DF | 2021 | ASVspoof Consortium | ODC-BY |
| DFADD | 2024 | DFADD Team | MIT |
| MLAAD | 2023 | MLAAD Team | CC-BY 4.0 |
| In-The-Wild | 2022 | Various (Real-world Deepfakes) | Apache 2.0 |
| WaveFake v1.2 | 2021 | WaveFake Team | CC-BY-SA 4.0 |
| LibriSeVoc | 2022 | LibriSeVoc Team | MIT |
| FakeOrReal | 2024 | FakeOrReal Team | GNU LGPL v3.0 |
| SONAR | 2023 | Meta | CC-BY 4.0 |
| CodecFake | 2024 | CodecFake Team | CC-BY-NC-ND 4.0 |
| ADD 2022 | 2022 | ADD Challenge | CC-BY-NC-ND 4.0 |
| ADD 2023 (R1 & R2) | 2023 | ADD Challenge | CC-BY-NC-ND 4.0 |
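For teams that want to audit these tables programmatically, the manifest can be held as plain records and screened by license string. The following is a minimal sketch, not a MoonScape tool: the names `Corpus`, `MANIFEST`, and `restriction_flags` are hypothetical, and the function only reads markers visible in the license identifier itself (the Creative Commons `NC`, `ND`, and `SA` elements, plus the "Restricted" note in the table). It does not interpret the full terms of any license.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Corpus:
    name: str
    atlas: str      # "HSA" or "SSA", as in the tables above
    license: str    # upstream license string, copied from the tables


# Rows copied from the two tables above.
MANIFEST = [
    Corpus("Mozilla Common Voice (CV24.0)", "HSA", "CC0-1.0 (Public Domain)"),
    Corpus("Mozilla Spontaneous Speech (SPS2.0)", "HSA", "CC0-1.0 (Public Domain)"),
    Corpus("TidyVoice X / X2", "HSA", "CC0-1.0 (Restricted: SV tasks only)"),
    Corpus("ASVspoof 5 (Train, Dev, Eval)", "SSA", "ODC-BY"),
    Corpus("ASVspoof 2021 DF", "SSA", "ODC-BY"),
    Corpus("DFADD", "SSA", "MIT"),
    Corpus("MLAAD", "SSA", "CC-BY 4.0"),
    Corpus("In-The-Wild", "SSA", "Apache 2.0"),
    Corpus("WaveFake v1.2", "SSA", "CC-BY-SA 4.0"),
    Corpus("LibriSeVoc", "SSA", "MIT"),
    Corpus("FakeOrReal", "SSA", "GNU LGPL v3.0"),
    Corpus("SONAR", "SSA", "CC-BY 4.0"),
    Corpus("CodecFake", "SSA", "CC-BY-NC-ND 4.0"),
    Corpus("ADD 2022", "SSA", "CC-BY-NC-ND 4.0"),
    Corpus("ADD 2023 (R1 & R2)", "SSA", "CC-BY-NC-ND 4.0"),
]

# Markers looked for in the license string (an illustrative choice).
_MARKERS = {
    "NC": "non-commercial",
    "ND": "no-derivatives",
    "SA": "share-alike",
    "RESTRICTED": "task-restricted",
}


def restriction_flags(license_id: str) -> set[str]:
    """Return the restriction markers visible in a license string."""
    tokens = re.split(r"[^A-Z0-9.]+", license_id.upper())
    return {label for token, label in _MARKERS.items() if token in tokens}


if __name__ == "__main__":
    for corpus in MANIFEST:
        flags = restriction_flags(corpus.license)
        if flags:
            print(f"{corpus.atlas}  {corpus.name}: {', '.join(sorted(flags))}")
```

Running the script prints only the corpora whose license strings carry one of these markers, which gives a short list to review before relying on the manifest for a given use.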