Skip to main content
waffle.svg
Domo Knowledge Base

Creating an SQL DataFlow (with Run to Here)

Version 2

 

Intro

You can create DataFlows using SQL queries. This is more difficult than using ETL but may provide a greater breadth of options. For information about creating an ETL DataFlow, see ETL DataFlows.

Important: Input DataSets in a DataFlow cannot be restricted by PDP policies—all available rows must pass through the DataFlow. Because of this, you must apply PDP policies to the output DataSets generated by a DataFlow.
 

When you build a DataFlow using an input DataSet with PDP policies in place, the DataFlow breaks unless at least one of the following criteria applies:

  • You have an "Admin" security profile or a custom role with "Manage DataFlows" enabled.

  • You are the DataSet owner.

  • You are part of the "All Rows" policy. This gives you access to all of the rows in the DataSet.

For more information about using PDP with DataFlows, see PDP and DataFusions/DataFlows

You access the interface for creating an SQL DataFlow from the Data Center. For more information about the Data Center, see Data Center Layout.

4-Part Video - MySQL DataFlow Tutorial

Part 1 of 4

 Part 2 of 4

Part 3 of 4

Part 4 of 4


Video - Combining DataSets Using DataFlows

 

 

Creating an SQL DataFlow

Note: Redshift does not support stored procedures, as documented here: http://docs.aws.amazon.com/redshift/latest/dg/c_unsupported-postgresql-features.html. The alternative is to use a MySQL DataFlow, as MySQL does support stored procedures.  

To create an SQL DataFlow,

  1. In Domo, click Data in the toolbar at the top of the screen.

  2. Click SQL in the Magic Transform toolbar at the top of the window.

    Tip: You can also open the SQL DataFlow editor from anywhere in Domo by selecting in the app toolbar and selecting Data > SQL.
  3. Select the type of DataFlow you want to create.
    This opens the Create DataFlow view.

  4. Enter a name and description for the DataFlow.

  5. Select the input DataSets you want in the DataFlow by doing the following:

    1. Click Select a DataSet.

      Note: Input DataSets in a DataFlow must already exist in Domo; you cannot upload new DataSets in the Create DataFlow view. For information about uploading new DataSets to Domo, see Connecting to Data with Connectors.
    2. Select the desired DataSet.
      Information appears for the DataSet you selected, including the number of cards powered by the DataSet, last update time, column names, etc.

      sql_dataflow_info.png

    3. Click Select DataSet.
      A tile for the selected DataSet appears on the screen. You can click the DataSet tile to view the preview window.

      1. (Optional) Under the Select Columns tab, you can remove columns you don't want to include in the DataSet by clicking the "X" to the right of the column name, or remove all columns by clicking None in the upper right corner of the pane.

        You can also add individual columns back into the DataSet by clicking the Add Column button and selecting the columns, or add all columns back into the DataSet by clicking All in the upper right corner of the pane.

      2. (Optional) Under the Configure tab, add indices if desired. Indices can help to increase the speed of operations in the table. For more information, see Easy indexing later in this article. 

      3. In the Update Method menu at the bottom left, select how the DataSet will be processed. Options include processing the entire DataSet or only new rows appended since the last DataFlow run.

        sql_dataflow_preview2_new.png

        Important: You can only use the DataFlow append processing method if that input DataSet is configured to update with an append method. For more information on scheduling your DataSet with an append type update, see Basic Scheduling of a DataSet.

        Tip: If your input DataSet has a large number of rows, try using the append processing method. This will allow your DataFlow to run much quicker if it's only processing the new rows.
    4. Select as many additional input DataSets as you want by repeating step 5.

  6. (Optional) Add transforms by doing the following:

    1. Click Add Transform.
      A Transform dialog appears with various options. For more detailed information about these options, see Understanding Transform and Output DataSet options.

    2. Enter SQL code to make the desired transformations to the input DataSet(s).

    3. Click Apply.

  7. Click Add Output DataSet.
    An Output DataSet dialog appears with various options. For more detailed information about these options, see Understanding Transform and Output DataSet options.

    1. Enter SQL code to indicate how you want the input DataSets to be combined.
      If you have transformed the input DataSets in the Transform dialog, those transformations are applied here.

    2. (Optional) Add additional output DataSets by repeating the previous two steps.

    3. Click Done.

  8. Schedule your DataFlow in the Settings pane. For more information on the different scheduling options, see Scheduling DataFlows.

    Note: If a schedule hasn't been specified, your DataFlow will default to a manual schedule.
  9. (Conditional) If you want to run the DataFlow in Strict Mode, click the Settings button at the top of the screen then toggle the Strict Mode option to on.
    For more information about this option, see Understanding Strict Mode

    sql_dataflow_strict_mode.png

  10. (Conditional) Do one of the following to save your DataFlow:

    • If you want to run the script that outputs this DataFlow to a usable DataSet in Domo, click the orange down arrow in the upper right corner of the screen, select Save and Run, enter a version description if desired, then click Save to confirm.

      This starts the process of generating DataSets from the DataFlow. This generation process may take from a minute to an hour or more, depending on the size of the input DataSets. In addition to generating DataSets, a card for the DataFlow is added to the DataFlows listing in the Data Center.

    • If you want to save this DataFlow without outputting it to DataSets at this time, click Save, enter a version description if desired, then click Save again to confirm. A card for the DataFlow is added to the DataFlows listing in the Data Center, but no DataSets are generated. You can run the DataFlow to output DataSets at any time by mousing over the card for the DataFlow in the DataFlows listing, clicking , and selecting Run. This option and the other options available in this menu are discussed later in this topic.

When you save a DataFlow, an entry for this version is added to the Versions tab in the Details view for the DataFlow. If you entered a description when saving, that description is shown in the entry for the DataFlow. For more information about versions, see Viewing the Version History for a DataFlow

Note: Many users ask why output DataSets for a DataFlow are not marked as "Updated" when the DataFlow runs successfully. This is usually because the data has not actually changed—no update has occurred. Therefore, the DataSets do not show as updated.  

Understanding Transforms and Output DataSet options

The Transform and Output dialogs have mostly similar options. The only difference between the two is that when you choose to create a table transform, an Indexing tab appears in which you can add indices. This tab does not appear for output DataSets. 

The following screenshot shows you the main components of these dialogs.
 

 

dataflow_transform_ui_new.png

You can use the following table to learn more about these components:

Component

Description

Inputs/Transforms List

Shows the names of all input DataSets you have selected for this DataFlow for your convenience in writing SQL statements. In addition, any previously created transforms appear here. These names appear in both the Transform and Output DataSet dialogs. If you transform one or more input DataSets in the Transform dialog, those transforms are applied to the input DataSets when you refer to them in the Output DataSet dialog.

Clicking on an input DataSet opens a list of all columns in the DataSet and their data types (decimal, text, date/time, etc.).

Run SQL menu

Provides access to several options:

  • Run SQL. Lets you run a test to determine whether your SQL is valid.

  • Run to here. Runs all steps in your SQL up to this point. In the case of very large DataFlows with numerous transforms or outputs, this may help you save time when testing your DataFlow's code. 

  • Explain SQL. Shows the explain plan for the SQL—an ordered list of the steps the database will make when executing the query. This can be used to help you optimize your queries. However, because these steps are highly complex and technical, this feature is recommended for advanced users only.  
     

    Note: Because MySQL determines which indexes are needed to improve run-time efficiency, not all of your index transforms may show up in the Explain SQL if MySQL determined that running the index would cause inefficiencies.

SQL field

Lets you create SQL code to transform and/or combine your input DataSets. The field contains an autocomplete feature for your convenience in writing code.

Preview button Opens the preview area at the bottom of the pane after you have chosen Run SQL or Run to here.

When creating a new transform, you can choose to create either a table- or SQL-type transform.

  • A table transform creates a new table using a SELECT statement and will always generate an output table. Due to an output table being generated, you can create easy indexes based on these tables. 

  • An SQL transform creates a table that typically doesn't include a SELECT statement such as a stored procedure. This type of transform does not generate an output table.

Both transforms and output DataSets support the "Run to here" option. With "Run to here," you can choose to run the DataFlow only up to the selected point. In the case of very large DataFlows with numerous transforms or outputs, this may help you save time when testing your DataFlow's code. To use this option, select Run to here in the wrench menu for the transform or output DataSet, or open the editor for the transform or output DataSet, click the arrow next to the Run SQL menu, and select Run to here

You can also add transforms either above or below a given transform in your DataFlow. To do this, select Add transform above or Add transform below in the wrench menu for the transform, then select whether you want to create a table- or SQL-type transform. 

Easy indexing

An index is a data structure that improves the speed of operations in a table. With Easy Indexing, you can quickly add an index on one or more columns right to your input DataSet or table transform.

To create an index in an input DataSet or transform,

  1. Open the editor for the input DataSet or transform.

  2. Select the Indexing tab.

  3. Choose your Index Type.

  4. Select which column to apply the index.

  5. Click Done


Video - Explain Plans

 

Understanding Strict Mode

Strict Mode controls how MySQL handles invalid or missing values in data-change statements and also affects the handling of division by zero, zero dates, and zeroes in dates. You can find more info at https://dev.mysql.com/doc/refman/5.6/en/sql-mode.html#sql-mode-strict.

Best Practices for Creating DataFlows

Each DataFlow should...

  • include descriptive names for each step of the transformation.

  • include a description of the input DataSets being merged or manipulated and the DataSet being created, and should also indicate the owner of the data.

  • be named the same as the output DataFlow—Because the outputs of a Dataflow become their own DataSet in the Data Center, this allows for easy identification of which DataSets are produced by which DataFlows.