Domo Knowledge Base

ETL Actions: DataSets

Version 11

 

Intro

Input and output DataSets are required for all ETL transformation flows. At least one input DataSet is required, though you can include as many as necessary. One output DataSet is required at the end of a transformation flow; this is the DataSet you can then use to power Domo cards and apps.  

For information about creating an ETL DataFlow, see Creating an ETL DataFlow.

For information about the Data Center, see Data Center Layout.

Important: Input DataSets in a DataFlow cannot be restricted by PDP policies; all available rows must pass through the DataFlow. Because of this, you must apply PDP policies to the output DataSets generated by a DataFlow.
 

When you build a DataFlow using an input DataSet with PDP policies in place, the DataFlow breaks unless at least one of the following criteria applies:

  • You have an "Admin" security profile or a custom role with "Manage DataFlows" enabled.

  • You are the DataSet owner.

  • You are part of the "All Rows" policy. This gives you access to all of the rows in the DataSet.

For more information about using PDP with DataFlows, see PDP and DataFusions/DataFlows.
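To see why the policy must live on the output rather than the input, consider the minimal Python/pandas sketch below. It is only a conceptual illustration, not Domo's implementation: the sample DataFrame, the apply_policy helper, and the "West only" policy are all hypothetical.

    import pandas as pd

    # Hypothetical input DataSet: the DataFlow always sees every row.
    input_ds = pd.DataFrame({
        "region": ["East", "West", "East", "West"],
        "sales":  [100,    250,    75,     300],
    })

    # The transforms run over the full, unfiltered input (no PDP here).
    output_ds = input_ds.groupby("region", as_index=False)["sales"].sum()

    # A PDP-style policy is modeled as a row filter applied to the OUTPUT,
    # per viewer; "West" stands in for the regions assigned to a user.
    def apply_policy(df, allowed_regions):
        return df[df["region"].isin(allowed_regions)]

    print(apply_policy(output_ds, ["West"]))  # this viewer sees only the West row

In Domo itself you define the equivalent policy on the output DataSet through the Data Center; the point here is only that row-level filtering happens downstream of the transforms.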

Input DataSet

You can use the Input DataSet action to add a DataSet to the transformation flow.

There must be at least one Input DataSet in a transformation flow.

Note: The maximum number of columns allowed in Magic ETL is 1,500.
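If you prepare input DataSets programmatically, a quick width check before uploading can prevent a failed run. The guard below is a generic, hypothetical Python sketch; only the 1,500-column limit comes from the note above.

    import pandas as pd

    MAGIC_ETL_MAX_COLUMNS = 1500  # limit stated in the note above

    # Hypothetical DataSet that is intentionally too wide, for illustration.
    df = pd.DataFrame({f"col_{i}": [0] for i in range(1600)})

    if len(df.columns) > MAGIC_ETL_MAX_COLUMNS:
        raise ValueError(
            f"DataSet has {len(df.columns)} columns; "
            f"Magic ETL supports at most {MAGIC_ETL_MAX_COLUMNS}."
        )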

Configuration

To configure the Input DataSet action:

  1. Ensure that the data you want to transform already exists in Domo as a DataSet.

  2. Click the Input DataSet action, then select the DataSet you want to transform.

  3. Select Additional Options and choose whether the DataFlow should process the entire input DataSet or only new rows appended since the last DataFlow run.

  • Choosing Entire DataSet processes all data from your input DataSet and runs it through the transforms.
  • Selecting Only new rows appended since this DataFlow last run processes only new data appended to your DataSet. This is a great option if your input DataSet has many rows, because only the new rows run through the transforms rather than the entire DataSet (the sketch after the tip below illustrates the difference).
     
Important: You can only use the DataFlow append processing method if that input DataSet is configured to update with an append method. For more information on scheduling your DataSet with an append-type update, see Basic Scheduling of a DataSet.

 

Tip: If your input DataSet has a large number of rows, try the append processing method. Your DataFlow runs much faster when it only has to process the new rows.
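The following Python sketch contrasts the two processing methods. It is a hypothetical illustration, not Domo's engine: transform stands in for the DataFlow's tiles, and last_processed_rows stands in for the bookkeeping Domo keeps between runs.

    import pandas as pd

    def transform(df):
        # Stand-in for the DataFlow's transform tiles.
        return df.assign(sales_doubled=df["sales"] * 2)

    full_input = pd.DataFrame({"order_id": range(1, 11),
                               "sales": range(10, 110, 10)})

    # "Entire DataSet": every run reprocesses all rows.
    entire_result = transform(full_input)

    # "Only new rows appended since this DataFlow last run": only rows added
    # after the previous run are fed through the transforms.
    last_processed_rows = 7                     # assumed state from the prior run
    append_result = transform(full_input.iloc[last_processed_rows:])

    print(len(entire_result), len(append_result))  # 10 3

Either way the transforms themselves are identical; the append method simply feeds them a much smaller slice of data on each run.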

Details

Under the Details tab of the input tile, you can view the DataSet's:

  • Name
  • Owner
  • Number of rows
  • Last Updated
  • Column names and types

Data

Select the Data tab to preview a table of the input data.

Output DataSet

You can use the Output DataSet action to output the transformed data as a DataSet. You can then use this new DataSet to power cards (or other DataFlows).

There must be an Output DataSet in a transformation flow.

Note: Based on scheduled run settings, whenever the specified Input DataSet changes, the ETL DataFlow performs the transforms and updates the Output DataSet. For information about scheduling an ETL DataFlow, see Scheduling an ETL DataFlow.

Configuration

To configure the Output DataSet action:

  1. Ensure that an action is connected to the Output DataSet tile and that all actions are connected and configured in the transformation flow.

  2. Click the Output DataSet tile, then specify the DataSet you want to output by entering a name and a description.

  3. Select Additional Options and choose either replace or append as your output DataSet update method. For more information on the difference between append and replace, see Update Method.

Note: Updating the output DataSet using the append method may create duplicate data entries, as the sketch below illustrates.
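This hypothetical pandas sketch is not Domo's implementation; it only shows the difference in stored results: replace keeps the latest run's output, while append stacks each run's rows onto whatever is already stored.

    import pandas as pd

    run_1 = pd.DataFrame({"region": ["East", "West"], "sales": [175, 550]})
    run_2 = pd.DataFrame({"region": ["East", "West"], "sales": [175, 550]})  # same data again

    # Replace: the output DataSet always equals the latest run's result.
    stored_replace = run_2

    # Append: each run's rows are added to the existing output DataSet, so
    # re-running over unchanged data duplicates the rows.
    stored_append = pd.concat([run_1, run_2], ignore_index=True)

    print(len(stored_replace))  # 2 rows
    print(len(stored_append))   # 4 rows: the East/West totals now appear twice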

 

Tip: You can preview the data in the output DataSet by running a preview, clicking the Output DataSet action, then clicking the Preview tab.

Details 

If the DataFlow has not yet run, the available details are the owner, column names, and column types. Once the DataFlow has run successfully, you can view all of the same details listed above for the input tile.