
You can create a data quality rule that uses custom metrics to assess the quality of your data.

You can use a calculated metric (a derived metric) as the input type of a data quality rule, which creates a metric-based data quality rule. For example, you can create a constraint that the sales quantity must fall within a specific range.

To learn the basics of Data Quality Rules in Trifacta, read this article.

Metric input types are supported only for some rule types. For more information on which rule types support metrics, see Data Quality Rules Reference.

Metric input types are supported for the following rules:

  • In Range

  • Greater Than

  • Less Than

  • Equals

  • Not Equals

  • In Set

  • Not In Set
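As a rough illustration of how each of these rule types might evaluate a computed metric value, the following Python sketch expresses them as simple predicates. The function names are hypothetical and are not part of Trifacta. The sketch also assumes that In Range includes both bounds and that Greater Than and Less Than are strict comparisons; check the Data Quality Rules Reference for the actual semantics.

```python
from typing import Iterable

# Hypothetical predicates, one per rule type listed above.
# Each receives the computed metric value plus the rule's parameters
# and returns True when the metric passes the rule.


def in_range(value: float, low: float, high: float) -> bool:
    # Assumption: both bounds are inclusive.
    return low <= value <= high


def greater_than(value: float, threshold: float) -> bool:
    # Assumption: strict comparison.
    return value > threshold


def less_than(value: float, threshold: float) -> bool:
    # Assumption: strict comparison.
    return value < threshold


def equals(value: float, target: float) -> bool:
    return value == target


def not_equals(value: float, target: float) -> bool:
    return value != target


def in_set(value: float, allowed: Iterable[float]) -> bool:
    return value in set(allowed)


def not_in_set(value: float, disallowed: Iterable[float]) -> bool:
    return value not in set(disallowed)
```

For the sales quantity example, `in_range(average_quantity, 10, 500)` would pass only when the metric lies between the two (illustrative) bounds.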

The following metrics are available as input types:

  • Average: The average column value.

  • Count Distinct: The number of unique column values.

  • Maximum: The maximum column value.

  • Minimum: The minimum column value.

  • Sum: The sum of column values.

  • Standard Deviation: The sample standard deviation of column values.

  • Variance: The sample variance of column values.

  • Count: The number of rows.

  • Correlation: The Pearson correlation coefficient between two columns.

  • Z-Score: The distance from the mean, in units of standard deviations.
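To make these definitions concrete, the following Python sketch computes the same metrics for a numeric column with the standard library. It is a minimal approximation, not Trifacta's implementation. The function names are hypothetical, the sketch assumes the column has no missing values, and it assumes Z-Score uses the same sample standard deviation as the Standard Deviation metric.

```python
import math
from statistics import mean, stdev, variance
from typing import Dict, List, Sequence


def column_metrics(xs: Sequence[float]) -> Dict[str, float]:
    """Single-column metrics; assumes xs has at least two values and no nulls."""
    return {
        "Average": mean(xs),
        "Count Distinct": len(set(xs)),
        "Maximum": max(xs),
        "Minimum": min(xs),
        "Sum": sum(xs),
        "Standard Deviation": stdev(xs),  # sample definition (divides by n - 1)
        "Variance": variance(xs),  # sample definition (divides by n - 1)
        "Count": len(xs),
    }


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        # Correlation is undefined when either column is constant.
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sx * sy)


def z_scores(xs: Sequence[float]) -> List[float]:
    """Distance of each value from the mean, in sample standard deviations."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]
```

For example, `column_metrics([2.0, 4.0, 4.0, 6.0])` reports an Average of 4.0, a Count Distinct of 3, and a Count of 4.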

Steps to Create a Metric-Based Data Quality Rule

  1. Click the Data Quality Rules icon in the top-right corner of the Transformer.

  2. Click Add rule.

  3. In the list of available Data Quality Rule options, look for Column Values.

  4. Select a rule option.

     For example, to build a data quality rule requiring the average Price to be greater than 5, select Greater Than.

     In the Rule Builder, set the Input Type to Average, the column to Price, and the Minimum Value to 5.

  5. To evaluate the rule separately for each category, use Group By to group the data.

  6. Click Add to add the Data Quality Rule.
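The example rule from the steps above (average Price greater than 5, grouped by category) can be sketched in Python as follows. The rows, the category values, and the function name are hypothetical, and the sketch assumes the rule is evaluated once per group with a strict comparison; it is not how Trifacta evaluates rules internally.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical rows: (category, price). Column names are illustrative.
rows: List[Tuple[str, float]] = [
    ("books", 4.0), ("books", 8.0),
    ("toys", 3.0), ("toys", 5.0),
]


def average_greater_than_by_group(
    rows: List[Tuple[str, float]], threshold: float
) -> Dict[str, bool]:
    """Group prices by category, then test each group's average against threshold."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for category, price in rows:
        groups[category].append(price)
    return {cat: mean(prices) > threshold for cat, prices in groups.items()}


result = average_greater_than_by_group(rows, 5.0)
print(result)  # {'books': True, 'toys': False}
```

Here the books group averages 6.0 and passes, while the toys group averages 4.0 and fails.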

The metric-based data quality rule is added.

The new rule is displayed in the Data Quality Rules panel. In the data quality bar for the rule, the green bar indicates the rows that passed the rule, and the red bar indicates the rows that failed.

  • Hover over either color to see the row counts and percentage.

  • Select either color to highlight the indicated rows in the data grid.
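The counts and percentages shown when you hover over the bar can be approximated with the following Python sketch. It assumes each row has already been marked as passing or failing; the function name and output keys are hypothetical.

```python
from typing import Dict, Sequence


def bar_summary(passed: Sequence[bool]) -> Dict[str, float]:
    """Count passing and failing rows and express each as a percentage."""
    total = len(passed)
    n_pass = sum(1 for p in passed if p)
    n_fail = total - n_pass
    return {
        "passed": n_pass,
        "failed": n_fail,
        # Guard against an empty dataset to avoid division by zero.
        "passed_pct": 100.0 * n_pass / total if total else 0.0,
        "failed_pct": 100.0 * n_fail / total if total else 0.0,
    }
```

For example, four rows of which three pass yield 75.0 percent green and 25.0 percent red.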

Additional options are available in the context menu for the rule. For more information, see Data Quality Rules Panel.

More Info

To learn more, read the following detailed documentation guides:
