ETL Actions: Combine Data
Intro
This topic discusses two methods for combing data in an ETL transformation flow: the Append Rows action, which lets you append rows from multiple DataSets into one DataSet; and the Join Data action, which lets you combine columns from two DataSets.
For information about creating an ETL DataFlow, see Creating an ETL DataFlow.
For information about the Data Center, see Data Center Layout.
Append Rows
The Append Rows action lets you append rows from multiple DataSets (with similar columns and data structure) into one DataSet, returning occurrences of all matches, including duplicate rows.
Video - Appending Rows in ETL
Example
For example, using this transform flow...
... the Append Rows action transforms this data...
and this data...
... to this...
... using this configuration:
To configure the Append Rows action,
-
Click the Append Rows action in the canvas.
-
(Optional) Rename the action by clicking
, then entering the name you want.
-
Select the option for including columns.
Option
Description
Include columns from DataSet
Includes the columns from the DataSet you select.
Include all columns
Includes unique columns from all of the input DataSets.
Only include shared columns
Includes only columns that are used in every input DataSet.
-
Review changes to be made to each DataSet.
Join Data
The Join Data action lets you combine columns from two DataSets or data "streams" into one DataSet using common values in a specific column (or set of columns). The columns must exist in both input DataSets and have the same data structure.
Video - Comparing Types of Joins in ETL
Video - Multiple Joins in ETL
Example
For example, using this transform flow...
... the Join Data action transforms this data...
and this data...
... to this...
... using this configuration:
Notes:
- In the transformed data, Domo adds "_1" to the name of specified matching columns.
- When previewing, the transformed data might not sort rows in the order you expect. You can control the row order when displaying data in table cards.
To configure the Join Data action,
-
Click the Join Data action in the canvas.
-
(Optional) Rename the action by clicking
, then entering the name you want.
-
Determine the column to use in combining data.
Notes: 1) The specific column used to combine rows must exist in both DataSets and have the same data structure. 2) Domo refers to the column in the first input DataSet as the identifying column and refers to the column in the second input DataSet as the matching column. 3) For Inner, Left Outer, and Full Outer joins, specify the primary or "lookup table" with the unique values as the first input DataSet (on the left). (The identifying column cannot have more than 10,000 duplicates of any value.) -
Select the first input DataSet (on the left), then select the identifying column you want.
The identifying column must be paired with a matching column in the second input DataSet. -
Select the second input DataSet (on the right), then select the matching column you want.
The matching column must be paired with a matching identifying column in the first input DataSet. -
Click the join type icon, then select the type of join to perform.
Join Type
Description
InnerIncludes only matching rows in both input DataSets.
Left OuterIncludes all rows from the first input DataSet (on the left) and matching rows from the second input DataSet (on the right).
Right OuterIncludes all rows from the second input DataSet (on the right) and matching rows from the first input DataSet (on the left).
Full OuterIncludes all rows from both input DataSets.
-
(Conditional) If you want to define another pair of columns, click Match Another Column, then configure the paired columns.