MoonScape Logo

Mapping the physical reality of human and synthetic voices.

Our Mission:

We provide users with deterministic, programmatic audio telemetry and, on request, Canadian business analytics. Built on the traditional physics of DSP feature extraction, our audio datasets are designed to facilitate the advancement of linguistics, the refinement of TTS systems, and the creation of robust, defense-in-depth synthetic voice detection systems. Our business analytics are distilled from open-source data and statistical information to provide immediately actionable data for high-impact business decisions.

About Moonscape Software

Our Core Values

Transparency and Compliance by Design: We reject the "black box" approach to machine learning. We believe the data and systems companies use to make decisions must be explainable, programmatic, and built from the ground up for regulatory compliance. We don't hide our sources, and we clearly identify upstream licensing requirements.

Data Quality Over Volume: We believe that better, structured data is fundamentally superior to force-feeding models massive, uncurated datasets. Precision and provenance drive our extraction processes: we assess data as we ingest it and avoid open web scraping. No garbage in, no garbage out.

Empirical Determinism: Our methodologies are anchored in traditional physics, Digital Signal Processing (DSP), and statistical analysis. We do not guess; we measure.

Defense in Depth: In an era of synthetic media, security requires foundational truth. We seek to provide the empirical baselines necessary to detect fraud and secure voice systems at the root level.

Accuracy: In business, high-impact decisions require accurate local or national information so that decision makers can make informed judgements based on real-world conditions, not assumptions.

About Our Values

Our Datasets

Human Speech Atlas

Establishing the ground-truth for human speech by mapping acoustic physics across 100+ languages.

Current Status: V3.0 release in production. ~1.2 million samples across 200 unique human languages ingested. Spontaneous speech assessed for 72 languages; scripted speech assessed for ~200 languages.

V2 Dataset (Hugging Face) | Data Dictionary | Methodology | More Information

Synthetic Speech Atlas

Establishing the ground-truth for synthetic speech by mapping acoustic physics across hundreds of combinations of generations and styles of voice synthesis.

Current Status: ~3.5 million samples from 20 leading benchmark datasets have been audited and assessed for internal research.

Commercially available: ~2.2 million samples from 11 leading benchmark datasets. See what's in your spoof training data.

Dataset (Hugging Face) | Data Dictionary | Methodology

Telecom Channel Degradation Study

A proprietary study on the effects of telecommunications codecs on voice transmissions.

Current Status: Complete. We ran ~7,000 samples from VCTK and AMI through 25 common codecs to measure the effects on our extracted features. We then ran a subset through common secondary transmission channels to examine the compound effect of multiple codecs.

Dataset (Hugging Face) | Data Dictionary | Methodology

Project MEG

A lightweight cascading classifier for synthetic voice detection.

Moonscape MEG is our proof-of-concept, non-commercial synthetic voice detection system, based on biophysics rather than artifact hunting.

System Access | Architecture Map | Methodology

Provenance Guide

A listing of our ingested raw datasets with sources and license information.

Dataset Provenance

Research Resources

Access the literature, physics documentation, and architectural papers that underpin our theories and methodologies.

Research Reference Library