
Creating an ETL DataFlow

Intro

You create ETL DataFlows using an intuitive drag-and-drop interface available in the Data Center. You drag your DataSets onto a canvas, then add actions to indicate how the DataSets should be joined and transformed. A wide range of actions is available: you can combine columns, filter rows, replace text, and so on. For detailed information, see the topics for the individual actions.
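To make the flavor of these actions concrete, here is a minimal sketch, not Domo code, of what a few common actions do when expressed in pandas. The DataFrame and column names (orders, first_name, status, and so on) are hypothetical examples.

```python
# Conceptual sketch only: common Magic ETL actions expressed in pandas.
# All names here are hypothetical examples, not Domo APIs.
import pandas as pd

orders = pd.DataFrame({
    "first_name": ["Ada", "Grace"],
    "last_name": ["Lovelace", "Hopper"],
    "status": ["open", "CLOSED"],
    "amount": [120.0, 75.5],
})

# "Combine Columns": concatenate two text columns into one.
orders["full_name"] = orders["first_name"] + " " + orders["last_name"]

# "Replace Text": normalize the values in a text column.
orders["status"] = orders["status"].str.lower()

# "Filter Rows": keep only the rows that match a condition.
open_orders = orders[orders["status"] == "open"]

print(open_orders)
```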

Important: Input DataSets in a DataFlow cannot be restricted by PDP policies—all available rows must pass through the DataFlow. Because of this, you must apply PDP policies to the output DataSets generated by a DataFlow.
 

When you build a DataFlow using an input DataSet with PDP policies in place, the DataFlow breaks unless at least one of the following criteria applies:

  • You have an "Admin" security role or a custom role with "Manage DataFlows" enabled.

  • You are the DataSet owner.

  • You are part of the "All Rows" policy. This gives you access to all of the rows in the DataSet.

For more information about using PDP with DataFlows, see PDP and DataFusions/DataFlows.
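As a conceptual illustration of why the policy belongs on the output rather than the input, the pandas sketch below mimics row-level filtering under the assumption that a policy restricts which rows each viewer sees after the DataFlow has processed every row. All names are hypothetical; this is not how Domo implements PDP internally.

```python
# Conceptual sketch only: a PDP-style row filter on an *output* DataSet.
# The DataFlow itself must process every row; the policy restricts what
# each viewer sees afterward. Names are hypothetical, not Domo APIs.
import pandas as pd

output = pd.DataFrame({
    "region": ["East", "West", "East"],
    "revenue": [100, 200, 300],
})

def rows_visible_to(user_region: str, df: pd.DataFrame) -> pd.DataFrame:
    """Mimic a policy granting each viewer only their region's rows."""
    return df[df["region"] == user_region]

# The DataFlow produced all three rows; a viewer in "East" sees two.
print(rows_visible_to("East", output))
```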

Training Videos - Introducing ETL DataFlows

Learn the basics of using ETL DataFlows.

Part 1 of 3

Part 2 of 3

Part 3 of 3

Note: The product training videos are for Domo customers and clients only.

Creating ETL DataFlows

Use the steps in this section to help you create ETL DataFlows.

To create an ETL DataFlow:

  1. In Domo, open the Data Center.

  2. Click ETL in the Magic Transform toolbar at the top of the window.

    Tip: You can also open the ETL editor from anywhere in Domo by selecting Data > ETL in the app toolbar.
  3. Add and configure an Input DataSet by doing the following:

    1. In the Actions panel, expand DataSets, then drag Input DataSet to the canvas.

    2. Click the Input DataSet action, then select the DataSet to transform.

  4. Add an Output DataSet by doing the following:

    1. In the Actions panel, in DataSets, drag Output DataSet to the canvas.
      You can configure the Output DataSet action after you connect an action to it.

  5. Drag other actions from the Actions panel to the canvas to transform (clean, aggregate, join, etc.) the input DataSets.
    For more information, see the topics for the individual actions. A conceptual sketch of a complete flow appears after these steps.

  6. Draw connections between the transform actions to sequence operations in the transformation flow.

  7. Configure each action by clicking it, then specifying the options.

    Tips: To get help on an action in the canvas, click the action, then click its help icon. To select several actions at once, click the canvas and drag the mouse pointer over them. Once multiple actions are selected, you can drag them as a group to where you want them, or delete them by clicking Delete in the panel on the left side of the screen.
  8. Configure the Output DataSet action by doing the following:

    1. Connect an action to the Output DataSet action.

    2. Click the Output DataSet action, then specify the name of the new DataSet to output.

  9. (Optional) Configure settings for when the transformation flow runs.
    By default, the transformation flow runs only when you run it manually. You can schedule the ETL DataFlow to run whenever the specified input DataSets change or at a set time.

  10. Specify the name and description of the ETL DataFlow.

  11. Click Save to save the ETL DataFlow, enter a version description if desired, then click Save to confirm.
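To summarize the shape of these steps outside the Domo interface, here is a minimal sketch of the same flow (input DataSets, transform actions connected in sequence, then an output DataSet) written in pandas. The CSV files and column names are hypothetical stand-ins for Domo DataSets, not part of the product.

```python
# Conceptual sketch only: the shape of an ETL DataFlow in plain pandas.
# The CSV files stand in for Domo DataSets; all names are hypothetical.
import pandas as pd

# Step 3: add the input DataSets.
sales = pd.read_csv("sales.csv")
regions = pd.read_csv("regions.csv")

# Steps 5-7: transform actions, connected in sequence.
joined = sales.merge(regions, on="region_id", how="left")   # Join action
cleaned = joined.dropna(subset=["amount"])                  # Filter action
summary = (
    cleaned.groupby("region_name", as_index=False)
           .agg(total_amount=("amount", "sum"))             # Group By action
)

# Step 8: write the output DataSet.
summary.to_csv("sales_by_region.csv", index=False)
```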

When you save a DataFlow, an entry for this version is added to the Versions tab in the Details view for the DataFlow. If you entered a description when saving, that description is shown in the entry for the DataFlow. For more information about versions, see Viewing the Version History for a DataFlow.

Note: Output DataSets for a DataFlow are not always marked as "Updated" when the DataFlow runs successfully. This usually means the data has not actually changed, so no update has occurred and the DataSets do not show as updated.

Best Practices for Creating DataFlows

Each DataFlow should...

  • include descriptive names for each step of the transformation.

  • include a description of the input DataSets being merged or manipulated and the DataSet being created, as well as the owner of the data.

  • be named the same as its output DataSet. Because the outputs of a DataFlow become their own DataSets in the Data Center, this makes it easy to identify which DataSets are produced by which DataFlows.