What is "Optimization"?
Optimization is the action of making the best or most effective use of a situation or resource. In the context of Trifacta, optimization means speeding up execution and reducing by:
Simplifying wrangle steps
Reducing ingestion to only what is used
Reducing data flowing through the recipe
Avoiding redundant steps in a recipe
Wrangling is both an art as a science. There are many manual best practices and wrangling tips outlined in other trainings (see Sensei and Admin trainings), but this module will focus on the new Optimizer service and how it can automatically streamline data flow through the application.
How does Optimizer-Service speed things up?
These optimizations are designed to improve performance by pre-filtering in the database the volume of data by reducing the columns and rows that are queried from the datastore to the ones that are actually used your recipes.
When these filters are enabled, the number of filters successfully applied to a job execution is listed in the Optimization summary in the Job Details page. See Job Details Page.
There are 2 main features currently provided by optimizer:
Column Pruning - introduced in 7.6, reduces the number of columns
Filter Pushdown - introduced in 8.0, reduces the number of row
When enabled, job execution performed is improved by removing any unused or redundant columns from the query.
Column pruning optimizations cannot be applied to imported datasets generated with custom SQL.
When this setting is enabled, the Trifacta application optimizes job performance on this flow by pushing data filters to the relational datasource, which limits the volume of data that must be transferred from the source.
Supported relational connections:
Optimizations for SQL filtering apply to the following types of relational connections:
Azure SQL data warehouse
These connection types may or may not be available in your product edition. For more information, see Connection Types.
Filter pushdown optimizations cannot be applied to imported datasets generated with custom SQL.
Pushdown filters cannot be applied to dates in your relational sources.
How to Enable Optimization for a Flow
In the Flow Optimization Settings dialog, you can configure the following settings, which provide finer-grained control and performance tuning over your flow and its job executions.
From the Flow View More Menu(...) , select Optimization settings.
When enabled, the Trifacta application attempts to apply any of the listed optimizations that are enabled to jobs that are executed for this flow.
When disabled, none of the listed optimizations is performed.
Note- This feature must be enabled at the workspace level. When enabled, the settings in this dialog are applied to the current flow.
The performance impact of optimizer-service on jobs will be variable, depending on how much can be removed upfront. You will see the biggest performance gains during ingest and transform stages due to less data being processed.
To learn more, read this detailed documentation guide