You can generate a new sample to get a more representative distribution of the values in your dataset. Several sampling methods are available so that you can tailor the sample to your needs: random samples, filter-based samples, stratified samples, anomaly-based samples, and cluster samples.
It is best practice to generate a new sample after any step that filters or restructures your dataset, because the existing sample may no longer represent the changed data. Such steps include keep and delete transforms, aggregates, joins, and unions.
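To illustrate why a sample can become stale after a filtering step, here is a minimal sketch in plain Python. It does not use the product's API; the dataset, the `keep_errors` function, and the sample size of 200 are hypothetical and chosen only for illustration.

```python
import random

random.seed(0)

# Hypothetical full dataset: 10,000 rows, 5% of which have status "error".
full = [{"id": i, "status": "error" if i % 20 == 0 else "ok"}
        for i in range(10_000)]

SAMPLE_SIZE = 200

# Initial random sample, drawn before any filtering step.
sample = random.sample(full, SAMPLE_SIZE)


def keep_errors(rows):
    """Stand-in for a keep step: retain only rows whose status is "error"."""
    return [r for r in rows if r["status"] == "error"]


# The old sample pushed through the filter keeps only a handful of rows.
stale_sample = keep_errors(sample)

# Resampling after the step draws a full-size sample from the filtered data.
filtered_full = keep_errors(full)
fresh_sample = random.sample(filtered_full,
                             min(SAMPLE_SIZE, len(filtered_full)))

print(len(stale_sample), len(fresh_sample))
```

In this sketch the old sample shrinks to the few rows that happen to pass the filter, while the new sample is drawn at full size from the filtered data.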
The output of some transforms also depends on the sample. If you are going to perform a valuestocols transform, first collect a stratified sample, because a stratified sample includes all of the distinct values in a column.
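The effect of the sample on a values-to-columns step can be sketched in plain Python. This is an illustration under stated assumptions, not the product's implementation: `stratified_sample` and `values_to_columns` are hypothetical helpers, and the `color` column with one rare value is invented data.

```python
import random
from collections import defaultdict

random.seed(0)

# Hypothetical dataset: the "color" column has one rare value ("violet").
rows = [{"id": i, "color": "red" if i % 2 else "blue"} for i in range(1000)]
rows.append({"id": 1000, "color": "violet"})


def stratified_sample(rows, key, per_group):
    """Take up to per_group rows from every distinct value of key."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r)
    out = []
    for members in groups.values():
        out.extend(random.sample(members, min(per_group, len(members))))
    return out


def values_to_columns(rows, key):
    """Return the column names that the distinct values of key would produce."""
    return sorted({r[key] for r in rows})


# A small random sample will usually miss the rare value.
random_cols = values_to_columns(random.sample(rows, 20), "color")

# A stratified sample contains every distinct value, so no column is lost.
stratified_cols = values_to_columns(
    stratified_sample(rows, "color", 10), "color")

print(random_cols)
print(stratified_cols)
```

Because the stratified sample draws from every distinct value, the rare value always yields its own column; a small random sample gives no such guarantee.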