Domo Knowledge Base

Creating an ETL DataFlow

Version 35



You create ETL DataFlows using an intuitive drag-and-drop interface available in the Data Center. You simply drag your DataSets onto a canvas, then add actions to indicate how the DataSets should be joined and transformed. A wide range of actions is available: you can combine columns, filter rows, replace text, and so on. For detailed information about actions, see the following topics:

Important: Input DataSets in a DataFlow cannot be restricted by PDP policies—all available rows must pass through the DataFlow. Because of this, you must apply PDP policies to the output DataSets generated by a DataFlow.

When you build a DataFlow using an input DataSet with PDP policies in place, the DataFlow breaks unless at least one of the following criteria applies:

  • You have an "Admin" security role or a custom role with "Manage DataFlows" enabled.

  • You are the DataSet owner.

  • You are part of the "All Rows" policy. This gives you access to all of the rows in the DataSet.

For more information about using PDP with DataFlows, see PDP and DataFusions/DataFlows.


Creating ETL DataFlows

Use the steps in this section to help you create ETL DataFlows.

(Image: example Magic ETL DataFlow)

To create an ETL DataFlow:

  1. In Domo, click Data in the toolbar at the top of the screen.

  2. Click ETL in the Magic Transform toolbar at the top of the window.

    Tip: You can also open the ETL editor from anywhere in Domo by selecting in the app toolbar and selecting Data > ETL.
  3. Add and configure an Input DataSet by doing the following:

    1. In the Actions panel, expand DataSets, then drag Input DataSet to the canvas.

    2. Click the Input DataSet action, then select the DataSet to transform.

  4. Add an Output DataSet by doing the following:

    1. In the Actions panel, in DataSets, drag Output DataSet to the canvas.
      You can configure the Output DataSet action after you connect an action to it.

  5. Drag other actions from the Actions panel to the canvas to transform (clean, aggregate, join, etc.) the input DataSets.
    For more information, see the following:

  6. Draw connections between the transform actions to sequence operations in the transformation flow.

  7. Configure each action by clicking it, then specifying the desired options.

    Tips:

      • You can get help on an action in the canvas by clicking the action, then clicking .

      • You can select several actions at once by clicking the canvas, then dragging the mouse pointer over them. Once multiple actions are selected, you can drag them as a group to where you want them, or delete them by clicking Delete in the panel on the left side of the screen.
  8. Configure the Output DataSet action by doing the following:

    1. Connect an action to the Output DataSet action.

    2. Click the Output DataSet action, then specify the name of the new DataSet to output.

  9. (Optional) Configure settings for when the transformation flow runs.
    By default, the transformation flow runs only when you run it manually. You can schedule the ETL DataFlow to run whenever the specified input DataSets change or at a set time.

  10. Specify the name and description of the ETL DataFlow.

  11. Click Save to save the ETL DataFlow, enter a version description if desired, then click Save to confirm.

When you save a DataFlow, an entry for this version is added to the Versions tab in the Details view for the DataFlow. If you entered a description when saving, that description is shown in the entry for the DataFlow. For more information about versions, see Viewing the Version History for a DataFlow.
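The steps above build a visual pipeline, but conceptually the result is an ordinary input → transform → output flow. As a rough sketch only (the DataSet and column names here are invented, and Magic ETL does not run pandas), a flow with two Input DataSets, a filter, a join, and one Output DataSet behaves like:

```python
import pandas as pd

# Two hypothetical Input DataSets standing in for Domo DataSets
orders = pd.DataFrame({
    "customer_id": [1, 2, 1],
    "order_date": ["2022-12-30", "2023-02-01", "2023-03-15"],
    "amount": [50.0, 75.0, 20.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2],
    "name": ["Ada", "Grace"],
})

# Transform actions: filter rows, then join the two inputs
recent = orders[orders["order_date"] >= "2023-01-01"]

# Output DataSet: the joined, filtered result
output = recent.merge(customers, on="customer_id", how="left")
```

Each Magic ETL tile corresponds to one such operation, and the connections you draw on the canvas determine the order in which they run.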

Note: Many users ask why output DataSets for a DataFlow are not marked as "Updated" when the DataFlow runs successfully. This is usually because the data has not actually changed; since no update has occurred, the DataSets do not show as updated.
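One generic way to reason about whether "the data has actually changed" between runs (illustrative only; the source does not describe how Domo detects this internally) is to compare a fingerprint of the new output against the previous run:

```python
import hashlib

def fingerprint(rows):
    """Hash a sequence of row tuples so two runs can be compared cheaply."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()

# Two runs producing identical rows yield identical fingerprints,
# so there is no new "update" to record.
previous = fingerprint([("east", 40), ("west", 40)])
current = fingerprint([("east", 40), ("west", 40)])
unchanged = previous == current
```

If the fingerprints match, the run succeeded but produced no new data, which is consistent with the DataSet not being flagged as updated.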

Best Practices for Creating DataFlows

Each DataFlow should...

  • only include DataSets that are necessary for the output DataSet.

  • filter out rows you don't need at the beginning of the DataFlow.

  • reduce the number of columns to only those you need.

  • include descriptive names for each tile in the DataFlow.

  • include a description of the DataFlow that lists:

    • the input DataSets being merged or manipulated

    • the DataSet being created

    • the owner of the DataSets

  • be named the same as the output DataSet. Because the outputs of a DataFlow become their own DataSets in the Data Center, matching names make it easy to identify which DataSets are produced by which DataFlows.

  • limit the use of tiles that take longer to run than others, including:

    • Group By

    • Join Data

    • Remove Duplicates

    • Pivot

    • Rank & Window

    • Scripting

    • Data Science
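The "filter rows early" and "reduce columns" practices above apply to any pipeline, not just Magic ETL. As an illustrative pandas sketch (invented column names; Domo does not execute pandas), trimming rows and columns before a comparatively expensive step such as Group By shrinks the data every later tile must process:

```python
import pandas as pd

# Hypothetical input with a column the output never needs
events = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "status": ["active", "inactive", "active", "active"],
    "value": [10, 20, 30, 40],
    "debug_notes": ["a", "b", "c", "d"],
})

# Filter unneeded rows first, then keep only the needed columns,
# so the (slower) group-by aggregation touches less data.
active = events[events["status"] == "active"]
trimmed = active[["region", "value"]]
summary = trimmed.groupby("region", as_index=False)["value"].sum()
```

Doing the cheap filtering first and the expensive aggregation last is the same ordering the best practices above recommend for tiles on the canvas.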