
Not all synthetic data is created equal. In fact, most synthetic data platforms rely on black boxes to facilitate parts of their synthetic data generation process. This means that they lose transparency, trust, and auditability along the way.
At Howso, we do things differently. We own our entire tech stack, end-to-end. Unlike most of our competitors, we don’t generate synthetic data using rented black-box AI platforms.
Each step of Synthesizer’s workflow is transparent and auditable because Synthesizer is the only solution built on Howso Engine, which means your synthetic data is generated directly from your data rather than from an abstract model.
Users can review and edit the data properties and parameters that are configured before the data is generated (steps 2 & 3) and audit each synthetic point for privacy (step 6). If a user is unhappy with the resulting synthetic dataset, Synthesizer lets them remove and add synthetic data points one at a time until their specifications are met.
We’re going to walk you through each of the six steps that allow us to generate great synthetic data. That way, you’ll have a clear understanding of how Synthesizer works and why synthetic data generated by Howso stands alone as the only synthetic data solution that you can fully audit and trust.
The 6 Steps to Great Synthetic Data
1. Load
Step 1 is to import your original, source data. You can load in your data as a file, or you can connect to a database. Just make sure your data is in a structured (or tabular) format, as Howso does not yet work with unstructured data (it’s on the long-term roadmap!).
2. Map
Mapping, or understanding, the original data is one of the most important steps in generating great synthetic data. To do this, Synthesizer will infer the feature (or column) attributes of your data. For example, does a column of your data contain sensitive nominal values, like names or social security numbers? If so, Howso assigns that property to the column, so that it will be appropriately anonymized during synthesis. Other attributes that Synthesizer learns include feature bounds (e.g., maximum and minimum values within a column, like age, in the data) and time series information. Once Synthesizer has inferred the feature attributes, it allows you to review its choices to make sure all information is correct.
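To make the idea of feature attributes concrete, here is a minimal sketch in plain Python. It is not Synthesizer's or Howso Engine's actual inference logic: the function name `sketch_feature_attributes`, the attribute layout, and the `SENSITIVE_HINTS` name list are all hypothetical, chosen only to illustrate the kind of record the Map step produces for each column.

```python
from numbers import Number

# Hypothetical name hints for sensitive columns; not Howso's actual rules.
SENSITIVE_HINTS = ("name", "ssn", "social_security")


def sketch_feature_attributes(rows):
    """Infer a simple attribute record for each column of tabular data."""
    attributes = {}
    columns = rows[0].keys() if rows else []
    for column in columns:
        values = [row[column] for row in rows if row[column] is not None]
        # Treat a column as continuous only if every value is a number.
        is_numeric = bool(values) and all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        if is_numeric:
            attributes[column] = {
                "type": "continuous",
                "bounds": {"min": min(values), "max": max(values)},
            }
        else:
            attributes[column] = {
                "type": "nominal",
                "sensitive": any(h in column.lower() for h in SENSITIVE_HINTS),
            }
    return attributes


rows = [
    {"name": "Ada", "age": 36, "state": "NC"},
    {"name": "Grace", "age": 45, "state": "NY"},
    {"name": "Alan", "age": 41, "state": "NC"},
]
attributes = sketch_feature_attributes(rows)
print(attributes["age"])
print(attributes["name"])
```

A name-based heuristic like this one will miss sensitive columns with unusual names, which is exactly why a review pass over the inferred attributes matters.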
3. Configure
By default, Howso Synthesizer generates private data with high utility that mimics your real data. However, there may be specific use cases in which you want to generate synthetic data that is more or less like your original data. In step 3, you can configure your synthesis to generate the data you want.
Howso provides a variety of configuration settings for you to adjust your synthetic data to your use case, including mitigating bias, adding anomalies, specifying business rules, generating more data than was contained in your original data, and adjusting distributions or trends within your synthetic data.
4. Train
In this step, Howso’s understandable AI Engine learns the properties and relationships of your source data.
5. Generate
Once the Engine has been trained on your source data, synthetic data generation begins. Synthesizer generates one synthetic data point at a time, using a variety of privacy mechanisms, including differential privacy, to ensure each point is unique (i.e., not included in the original data). At the end of the generation step, you will have a synthetic dataset that looks and feels like your original data but is provably distinct from it.
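The "one point at a time, never a copy of a source record" idea can be sketched in a few lines of Python. This is an illustration only, not how Synthesizer generates data: the candidate generator below is a stand-in that perturbs a randomly chosen original record with Laplace noise, and on its own it does not provide a differential privacy guarantee. The function name, the `scale` parameter, and the sample records are all assumptions.

```python
import random


def generate_unique_points(original, n_points, scale=2.0, seed=0):
    """Generate points one at a time, rejecting exact copies of originals.

    The candidate generator is a stand-in: it perturbs a randomly chosen
    original record with Laplace noise and rounds to whole numbers.
    """
    rng = random.Random(seed)
    original_set = set(original)
    synthetic = []
    while len(synthetic) < n_points:
        base = rng.choice(original)
        # The difference of two exponential draws is a Laplace(0, scale) draw.
        candidate = tuple(
            round(value + rng.expovariate(1 / scale) - rng.expovariate(1 / scale))
            for value in base
        )
        if candidate in original_set:
            continue  # reject: identical to a source record
        synthetic.append(candidate)
    return synthetic


# Hypothetical source records, e.g. (age, income in thousands).
original = [(34, 52), (45, 61), (29, 48)]
synthetic = generate_unique_points(original, n_points=5)
print(synthetic)
```

The rejection check is the part that maps onto the text: every accepted point is guaranteed to differ from every source record.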
6. Audit
Finally, the Howso Validator tools can be used to generate a report on the privacy and utility of the synthetic data. Validator’s privacy test evaluates the distance between the synthetic data and the original data and flags any synthetic data point that falls within a certain distance threshold of the original data as a privacy risk.
With Validator, users can audit each synthetic data point, understand its closest neighbors within the original data, and determine for themselves if the privacy is sufficient.
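A distance-threshold audit of this kind can be sketched as follows. This is a simplified illustration, not Validator's actual metric or API: it assumes every feature is numeric and already scaled to a comparable range, uses plain Euclidean distance, and the function name `audit_privacy` and the threshold value are hypothetical.

```python
import math


def audit_privacy(synthetic, original, threshold):
    """Return one audit record per synthetic point.

    Each record holds the index of the nearest original point, the distance
    to it, and whether that distance is below the threshold (a privacy risk).
    """
    report = []
    for point in synthetic:
        distances = [math.dist(point, candidate) for candidate in original]
        nearest = min(range(len(original)), key=distances.__getitem__)
        report.append({
            "point": point,
            "nearest_original_index": nearest,
            "distance": distances[nearest],
            "flagged": distances[nearest] < threshold,
        })
    return report


# Hypothetical records with features pre-scaled to the range [0, 1].
original = [(0.2, 0.3), (0.8, 0.7), (0.5, 0.9)]
synthetic = [(0.21, 0.3), (0.6, 0.6)]
report = audit_privacy(synthetic, original, threshold=0.05)
for record in report:
    print(record)
```

Here the first synthetic point sits almost on top of an original record and is flagged, while the second is far from all of its neighbors and passes. Because each record names its nearest original point, a reviewer can inspect the pair and judge the risk directly.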
Additionally, Validator performs a variety of utility tests on the synthetic data to measure its statistical similarity to the original data. For example, one of these tests checks whether the synthetic data is accurate enough to be a drop-in replacement for the original data in machine learning modeling or data analytics use cases. Validator prints a summary of all the tests it runs, so that users can easily view the results of their data synthesis.
The Howso Difference
In conclusion, other synthetic data tools often seem to create data out of thin air, with no insight into the assumptions and mechanisms by which the synthetic data was generated.
Howso Synthesizer is the only synthetic data solution built on top of Howso Engine, empowering you to audit, edit, and verify your synthetic data. No other synthetic data provider can say that.
Ready to see the Howso difference for yourself? Access Playground here.