When working with periodic datasets such as weekly sales data, each week's data may arrive in its own source folder or in a file whose name contains the date. To pick up the current week's folder or file automatically, you can parameterize the date values in the dataset path.
Similarly, you can parameterize your source dataset with variables whose default values can be overridden at run time. You can also use wildcards or patterns to parameterize your dataset. The steps below show how.
Steps to Parameterize your Dataset
1. Click Add Datasets in your flow.
2. In the Add Datasets to flow dialog, click Import Datasets.
3. In the Import dialog, select your data source.
Note: You cannot parameterize inputs from sources such as local uploads, BigQuery, or Google Sheets. You can parameterize datasets from sources such as GCS and S3.
4. In the Import Data page, navigate your environment to locate one of the files or tables that you wish to parameterize.
Click Create Dataset with Parameters.
Alternatively, you can click the Parameterize button that appears when you hover over a dataset row.
5. Under Define Parameterized Path, select a segment of the path text. Then select one of the following options:
a) Add Datetime Parameter
This is useful when your file path or filename includes a date or time that changes. In the Datetime Parameter dialog that appears, you can configure the parameter to match dates in a specified format, such as the current date, last week, or a specified range.
b) Add Variable
A variable parameter is a key-value pair that can be inserted into the path.
At execution time, the default value is applied, or you can choose to override the value.
A variable can have an empty default value.
Give the variable a meaningful name that reflects its purpose.
c) Add Pattern Parameter
You can use wildcards and patterns to generalize certain parts of the filename or path to pick up all relevant files.
For example, replacing 0000000 with the wildcard * matches any characters at that position within a path segment.
For more examples of patterns in dataset parameterization, read this article.
6. Click Create, and then click Import & Add to Flow to add the dataset to your flow.
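To make the datetime and variable options from step 5 more concrete, here is a minimal Python sketch of how a parameterized path could resolve to a concrete path at run time. It only illustrates the idea and is not how the product implements it. The bucket name, folder layout, `region` variable, date format, and the `resolve_path` helper are all hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def resolve_path(template: str, run_date: date, defaults: Dict[str, str],
                 overrides: Optional[Dict[str, str]] = None) -> str:
    """Fill a path template with a formatted date and variable values."""
    # Start from the default values, then apply any run-time overrides.
    values = {**defaults, **(overrides or {})}
    # In this sketch the datetime parameter is formatted as YYYY-MM-DD.
    values["date"] = run_date.strftime("%Y-%m-%d")
    return template.format(**values)


# Hypothetical bucket, folder layout, and variable name.
template = "gs://example-bucket/sales/{region}/week_{date}.csv"
defaults = {"region": "emea"}

run_date = date(2024, 3, 11)

# No override: the default value of the variable is applied.
print(resolve_path(template, run_date, defaults))
# gs://example-bucket/sales/emea/week_2024-03-11.csv

# Override the variable and shift the date back one week.
last_week = run_date - timedelta(weeks=1)
print(resolve_path(template, last_week, defaults, {"region": "apac"}))
# gs://example-bucket/sales/apac/week_2024-03-04.csv
```

The first call mirrors a run that uses the variable's default value, and the second mirrors a run where the value is overridden and the datetime parameter points at the previous week.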
For more details on how to parameterize your datasets, read this detailed documentation guide.
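As a rough illustration of the pattern parameter option described in step 5, the Python sketch below shows how a `*` wildcard that stays within one path segment could select files. It uses Python's standard `fnmatch` module as a stand-in and does not claim to reproduce the product's own matching rules. The file listing and the `match_in_segment` helper are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical file listing in the source location.
paths = [
    "sales/2024/part-0000000.csv",
    "sales/2024/part-0000001.csv",
    "sales/2024/summary.txt",
    "sales/2024/archive/part-0000002.csv",
]


def match_in_segment(pattern: str, path: str) -> bool:
    """Match segment by segment so that * never crosses a "/"."""
    pattern_parts = pattern.split("/")
    path_parts = path.split("/")
    # A different number of segments can never match in this sketch.
    if len(pattern_parts) != len(path_parts):
        return False
    return all(fnmatchcase(segment, part)
               for part, segment in zip(pattern_parts, path_parts))


# The fixed digits 0000000 are replaced with the wildcard *.
pattern = "sales/2024/part-*.csv"
matches = [p for p in paths if match_in_segment(pattern, p)]
print(matches)
# ['sales/2024/part-0000000.csv', 'sales/2024/part-0000001.csv']
```

The file in the `archive` subfolder is excluded because the wildcard in this sketch is confined to a single path segment.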