Snowflake provides cloud-based data storage and analytics as a service. Among other infrastructures, Snowflake runs on Amazon S3. If all of your source datasets and outputs are in Snowflake locations and other conditions are met, then the entire execution of the transformations can occur in Snowflake.
Transferring the execution steps from the Trifacta node to Snowflake yields the following benefits:
A minimum of data is transferred between systems (recipe steps and associated metadata). Everything else remains in Snowflake.
Recipe steps are converted into SQL that is understandable and native to Snowflake.
Execution times are much faster due to minimal data movement and optimization.
Depending on your environment, the total cost of executing the job may be lower in Snowflake.
During pushdown execution, the recipe steps in Trifacta are converted to SQL and sequentially executed against your source data in temporary tables. The results are written directly to the target tables.
Enabling the Feature
Snowflake as a running environment requires that pushdowns be enabled for the workspace and for the specific flow for which the job is executed. If the flow and the workspace are properly configured, the job is automatically executed in Snowflake.
The following setting must be enabled in the workspace. Toggle User menu > Admin console > Workspace settings > Logical and physical optimization of jobs.
When this is enabled, the Trifacta application attempts to optimize job execution through logical optimizations of your recipe and physical optimizations of your recipe's interactions with data.
For more information, see Workspace Settings Page
By default, all newly created flows will be optimized. For older flows, you must enable the Snowflake optimizations in your flow. In Flow View, select More menu > Optimization settings
All general optimizations must be enabled for your flow, as well as the following optimizations, which are specific to Snowflake:
Column pruning from source
Snowflake > Filter Pushdown
Snowflake > Full Pushdown
For more information, see Flow Optimization Settings
Trifacta application must be integrated with Snowflake. See Snowflake Connections.
The permission to execute jobs in Snowflake must be enabled.
All sources and outputs must reside in Snowflake.
Spark + Snowflakemust be selected as the running environment. See Run Job Page.
Jobs are executed in the virtual warehouse that is specified as part of the Snowflake connection.
To execute a job in Snowflake in the Trifacta application:
Your job must meet the requirements listed above.
Your job must not include the functions, transformations, or other unsupported elements that are listed below.
You must select
Snowflake + Sparkas your running environment in the Run Job page.
To verify execution in Snowflake, in the Overview tab of the Job summary, the value for Environment under the Execution summary should read
You can also see the Environment listed as Snowflake in the Execution stages.
For more information, see Job Details Page
All data sources and all outputs specified in a job must be located within Snowflake.
All recipe steps, including all Wrangle functions in the recipe, must be translatable to SQL.
Some transformations and functions are not currently supported for execution in Snowflake. See the link in More Info.
Sampling jobs are not supported for execution in Snowflake.
If your recipe includes data quality rules, the job cannot be fully executed in Snowflake.
Visual profiling is supported with the following conditions or requirements.
Visual profiles are unloaded to a stage in an S3 bucket.
If a stage is named in the connection, it is used. This stage must point to the default S3 bucket in use.
If no stage is named, a temporary stage is be created in the
PUBLICschema. The connecting user must have write-access to
For more information, see Snowflake Running Environment