PROVENANCE MANIFEST

Dataset Origins & Enterprise Licensing

Transparency of origin is a core MoonScape value. The following tables list the upstream source corpora that make up the Human Speech Atlas and Synthetic Speech Atlas feature databases, along with the license under which each corpus was originally distributed.

Human Speech Atlas (HSA) Source Corpora

Dataset Name | Release Year | Organization / Origin | Upstream License
Mozilla Common Voice (CV24.0) | 2023 / 2024 | Mozilla Foundation | CC0-1.0 (Public Domain)
Mozilla Spontaneous Speech (SPS2.0) | 2024 | Mozilla Foundation | CC0-1.0 (Public Domain)
TidyVoice X / X2 | 2024 | Mozilla Foundation | CC0-1.0 (Restricted: SV tasks only)

Synthetic Speech Atlas (SSA) Source Corpora

Dataset Name | Release Year | Organization / Origin | Upstream License
ASVspoof 5 (Train, Dev, Eval) | 2024 | ASVspoof Consortium | ODC-BY
ASVspoof 2021 DF | 2021 | ASVspoof Consortium | ODC-BY
DFADD | 2024 | DFADD Team | MIT
MLAAD | 2023 | MLAAD Team | CC-BY 4.0
In-The-Wild | 2022 | Various (Real-world Deepfakes) | Apache 2.0
WaveFake v1.2 | 2021 | WaveFake Team | CC-BY-SA 4.0
LibriSeVoc | 2022 | LibriSeVoc Team | MIT
FakeOrReal | 2024 | FakeOrReal Team | GNU LGPL v3.0
SONAR | 2023 | Meta | CC-BY 4.0
CodecFake | 2024 | CodecFake Team | CC-BY-NC-ND 4.0
ADD 2022 | 2022 | ADD Challenge | CC-BY-NC-ND 4.0
ADD 2023 (R1 & R2) | 2023 | ADD Challenge | CC-BY-NC-ND 4.0
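
The license column above lends itself to a mechanical check before a corpus is pulled into a commercial pipeline. The following is a minimal Python sketch, not MoonScape tooling: the `Corpus` record, the `SSA_CORPORA` list, and the `license_flags` helper are hypothetical names introduced here, and the rows are transcribed from the SSA table. The helper only looks for the Creative Commons clause codes NC, ND, and SA in the license string; it does not assess copyleft terms such as LGPL or any restriction stated in prose, and it is no substitute for legal review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Corpus:
    """One row of the provenance table (hypothetical record type)."""
    name: str
    year: str
    origin: str
    license: str


# Rows transcribed from the Synthetic Speech Atlas (SSA) table.
SSA_CORPORA = [
    Corpus("ASVspoof 5 (Train, Dev, Eval)", "2024", "ASVspoof Consortium", "ODC-BY"),
    Corpus("ASVspoof 2021 DF", "2021", "ASVspoof Consortium", "ODC-BY"),
    Corpus("DFADD", "2024", "DFADD Team", "MIT"),
    Corpus("MLAAD", "2023", "MLAAD Team", "CC-BY 4.0"),
    Corpus("In-The-Wild", "2022", "Various (Real-world Deepfakes)", "Apache 2.0"),
    Corpus("WaveFake v1.2", "2021", "WaveFake Team", "CC-BY-SA 4.0"),
    Corpus("LibriSeVoc", "2022", "LibriSeVoc Team", "MIT"),
    Corpus("FakeOrReal", "2024", "FakeOrReal Team", "GNU LGPL v3.0"),
    Corpus("SONAR", "2023", "Meta", "CC-BY 4.0"),
    Corpus("CodecFake", "2024", "CodecFake Team", "CC-BY-NC-ND 4.0"),
    Corpus("ADD 2022", "2022", "ADD Challenge", "CC-BY-NC-ND 4.0"),
    Corpus("ADD 2023 (R1 & R2)", "2023", "ADD Challenge", "CC-BY-NC-ND 4.0"),
]

# Creative Commons clause codes and the label reported for each.
# Assumption: only these three codes are checked; nothing else is inferred.
CC_CLAUSES = {
    "NC": "non-commercial",
    "ND": "no-derivatives",
    "SA": "share-alike",
}


def license_flags(license_id: str) -> set[str]:
    """Return the CC clause labels found in a license string."""
    # "CC-BY-NC-ND 4.0" -> ["CC", "BY", "NC", "ND", "4.0"]
    tokens = license_id.upper().replace(" ", "-").split("-")
    return {label for code, label in CC_CLAUSES.items() if code in tokens}


if __name__ == "__main__":
    for corpus in SSA_CORPORA:
        flags = license_flags(corpus.license)
        if flags:
            print(f"{corpus.name}: {corpus.license} ({', '.join(sorted(flags))})")
```

With the rows as transcribed, the check flags WaveFake v1.2 (share-alike) and CodecFake, ADD 2022, and ADD 2023 (R1 & R2) (non-commercial, no-derivatives). Matching on whole tokens rather than substrings avoids false hits on unrelated strings that happen to contain "NC" or "SA".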