We provide users with deterministic, programmatic audio telemetry and, on request, Canadian business analytics. Built on the traditional physics of DSP feature extraction, our audio datasets are designed to facilitate the advancement of linguistics, the refinement of TTS systems, and the creation of robust, defense-in-depth synthetic voice detection systems. Our business analytics are produced by distilling open-source data and statistical information into immediately actionable data that supports high-impact business decisions.
About Moonscape Software
Transparency and Compliance by Design: We reject the "black box" approach to machine learning. We believe the data and systems companies use to make decisions must be explainable, programmatic, and built from the ground up for regulatory compliance. We don't hide our sources, and we clearly identify upstream licensing requirements.
Data Quality Over Volume: We believe that better, structured data is fundamentally superior to force-feeding models massive, uncurated datasets. Precision and provenance drive our extraction processes: we assess data as we ingest it and avoid open web scraping. No garbage in, no garbage out.
Empirical Determinism: Our methodologies are anchored in traditional physics, Digital Signal Processing (DSP) and statistical analysis. We do not guess; we measure.
Defense in Depth: In an era of synthetic media, security requires foundational truth. We seek to provide the empirical baselines necessary to detect fraud and secure voice systems at the root level.
Accuracy: In business, high-impact decisions require accurate local or national information that lets decision makers form judgements based on real-world conditions, not assumptions.
Establishing the ground truth for human speech by mapping acoustic physics across 100+ languages.
Current Status: V3.0 release in production. ~1.2 million samples ingested across 200 unique human languages. Spontaneous speech assessed for 72 languages; scripted speech assessed for ~200 languages.
V2 Dataset (Hugging Face) Data Dictionary Methodology More Information
Establishing the ground truth for synthetic speech by mapping acoustic physics across hundreds of combinations of generations and styles of voice synthesis.
Current Status: ~3.5 million samples from 20 leading benchmark datasets have been audited and assessed for internal research.
Commercially available: ~2.2 million samples from 11 leading benchmark datasets. See what's in your spoof training data.
Dataset (Hugging Face) Data Dictionary Methodology
A proprietary study on the effects of telecommunications codecs on voice transmissions.
Current Status: Complete. We ran ~7000 samples from VCTK and AMI through 25 common codecs to measure the effects on our extracted features. We then ran a subset through common secondary transmission channels to examine the compound effect of multiple codecs.
Dataset (Hugging Face) Data Dictionary Methodology
A lightweight cascading classifier for synthetic voice detection.
The Moonscape MEG is our attempt at developing a non-commercial synthetic voice detection system, based on biophysics rather than artifact hunting, as a proof of concept.
System Access Architecture Map Methodology
A listing of our ingested raw datasets with sources and license information.
Dataset Provenance
Access the literature, physics documentation, and architectural papers that underpin our theories and methodologies.
Research Reference Library