Many companies that work with data or AI treat provenance and legal or regulatory concerns as an afterthought, which leads to poisoned datasets, legally grey products, and considerable liabilities. We reject this as bad design and instead address these issues head-on.
Moonscape Software keeps detailed data provenance manifests and deliberately curates its datasets so that we can account for every row in every table, and only commercially cleared data reaches our commercially licensed datasets. We have also established a data anonymization system expressly designed to comply with current regulatory requirements under the GDPR, Quebec's Law 25, and Canada's PIPEDA. Through this transparency, we aim to earn our partners' trust while minimizing legal exposure.
A second problem in the modern AI and machine learning industry is its reliance on massive amounts of unstructured raw data to brute-force the training process. This often involves mass scrapes of YouTube, Wikipedia, GitHub, and other sources with little regard for content quality, a practice that produces 'Garbage In, Garbage Out' results.
We believe that well-structured, high-quality data is fundamentally superior to force-feeding models massive, uncurated datasets. Quality texts, curated data, and a focus on pedagogy are what create an 'education' that produces meaningful results, and at Moonscape we are doing our part by raising the quality of the data available.
We believe that objective reality, and the physical objects within it, can be weighed and measured. Human and synthetic voices are no different, so our approach is anchored in traditional physics, linguistics, and Digital Signal Processing (DSP), fields backed by decades of peer-reviewed research that externally validates their theories and outcomes. Rather than feeding raw audio into a state-of-the-art black-box model and accepting a statistical probability score, we do the hard work of measuring it ourselves, drawing back the curtain and establishing ground truth.
The misuse of synthetic voices by bad-faith actors is a persistent problem that has only grown as synthesis technology has improved. Detection has often struggled to keep pace: despite strong performance in the lab, most leading detection models struggle seriously in real-world applications. Moonscape aims to help address this by mapping the physical limits of human and synthetic speech, strengthening 'defence in depth' strategies that layer multiple forms of detection.