Even the best-designed data pipelines are not immune to bad input data. Analytics and predictive models are often at the mercy of the quality of the incoming data, and bad data can skew results or lead to misinterpretation. As they say, garbage in, garbage out.
Trifacta knows this problem all too well, which is why we are introducing Data Quality Rules to help combat bad data quality.
Note: This feature is only available in Enterprise and Premium versions of Trifacta.
What is it?
Data Quality Rules allow the user to determine if current data is fit for use, and if not, what additional transformations are needed. Thanks to predictive suggestions, Data Quality Rules can assess the data set and provide a list of indicators to monitor and track the data cleanliness over time.
Data Quality Rules provide an automated way to identify data flaws and build quality indicators to monitor their remediation. The state of your data quality rules is updated automatically to reflect any changes to the data, so the rules can also guard against undesired transformations over time. If columns or other elements that a rule depends on are accidentally deleted, errors on the Transformer page notify users.
Ultimately, rules can monitor the accuracy, completeness, consistency, validity, and uniqueness of the data behind your analytics initiative and give you a comprehensive view of its cleanliness.
How does it work?
A new icon has been added to the Transformer Grid.
Clicking it opens the Data quality rules panel. There are two ways to add rules:
Have Trifacta suggest rules based on the data
Create a custom rule
Suggested Data Quality Rules
When you click the View suggestions button, Trifacta automatically suggests a series of Data Quality Rules that validate various aspects of data quality. For example: is the value unique or empty, does it fit a pattern, is it in an expected range, does it correlate to another column?
From there, you can accept, remove, edit or add the Data Quality Rules that are fit for your particular use case for this data.
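To make the kinds of checks listed above concrete, here is a minimal Python sketch of what uniqueness, emptiness, pattern, and range rules each measure. It is not Trifacta's implementation or rule syntax; the sample rows, column names, email pattern, and helper functions are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical sample data; the column names are made up for illustration.
rows = [
    {"order_id": "A-001", "email": "ann@example.com", "quantity": 3},
    {"order_id": "A-002", "email": "", "quantity": 12},
    {"order_id": "A-002", "email": "bob@example", "quantity": -1},
]

def is_not_empty(value):
    """True when the value is neither None nor blank."""
    return value is not None and str(value).strip() != ""

def matches_pattern(value, pattern):
    """True when the whole value matches the regular expression."""
    return re.fullmatch(pattern, str(value)) is not None

def in_range(value, low, high):
    """True when low <= value <= high."""
    return low <= value <= high

def pass_rate(flags):
    """Fraction of rows that satisfy a rule (1.0 for an empty input)."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 1.0

def check_rules(rows):
    """Evaluate four illustrative rules and return a pass rate per rule."""
    ids = [r["order_id"] for r in rows]
    email_pattern = r"[^@\s]+@[^@\s]+\.[^@\s]+"  # simplified, assumed pattern
    return {
        "order_id is unique": pass_rate(ids.count(i) == 1 for i in ids),
        "email is not empty": pass_rate(is_not_empty(r["email"]) for r in rows),
        "email fits pattern": pass_rate(
            matches_pattern(r["email"], email_pattern) for r in rows
        ),
        "quantity in 0..10": pass_rate(
            in_range(r["quantity"], 0, 10) for r in rows
        ),
    }

if __name__ == "__main__":
    for rule, rate in check_rules(rows).items():
        print(f"{rule}: {rate:.0%} of rows pass")
```

Each rule reduces to a per-row pass or fail, which is what makes it usable as an indicator you can track as the data changes.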
Add Custom Rules
You can add your own rule by using the Trifacta Wrangler language to build any validation rule you have in mind.
Data Quality in Job Details
When visual profiling is enabled for your job, the Rules tab in the Job Details page contains the results of the data quality rules for the job's recipes applied across the entire dataset.
When you run a job and generate results, you can review the quality of the data in the generated output.
In parallel with executing the job, you can generate a visual profile of the generated results. This visual profile provides graphical representations of the valid and mismatched values against each column's data type, as well as indications about missing values in the output.
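As a rough illustration of the valid, mismatched, and missing categories described above, the following Python sketch counts them for a single column. It is an assumption-laden sketch, not how Trifacta computes its visual profile: `profile_column` and `looks_like_integer` are hypothetical helpers, and treating blank strings as missing is a simplification.

```python
def looks_like_integer(value):
    """Hypothetical type check: True if the value parses as an integer."""
    try:
        int(str(value).strip())
        return True
    except ValueError:
        return False

def profile_column(values, is_valid):
    """Count valid, mismatched, and missing values in one column."""
    counts = {"valid": 0, "mismatched": 0, "missing": 0}
    for value in values:
        if value is None or str(value).strip() == "":
            counts["missing"] += 1      # nothing to validate
        elif is_valid(value):
            counts["valid"] += 1        # conforms to the expected type
        else:
            counts["mismatched"] += 1   # present, but not of the expected type
    return counts

if __name__ == "__main__":
    column = ["12", "", "abc", None, "7", "3.5"]
    print(profile_column(column, looks_like_integer))
```

Keeping missing values separate from mismatched ones matters because they usually call for different fixes: filling or dropping versus reformatting or retyping.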
Visual profiles can be downloaded in PDF or JSON format for offline analysis.
After job execution, the data quality rules are applied across the entire dataset, and their results are available when visual profiling is enabled.
See more details here
Additional data quality rules, including pattern match and in-sets, have been added in 7.8. Please see our documentation for more info.
Learn about Metrics-based Data Quality Rules here
Learn more about Data Quality Rules from our product documentation.
Also check out the details on Overview of Data Quality