In addition to enabling the orchestration of the ingestion and transforms of an entire data pipeline, Logic Rivers have a few ‘tricks up their sleeve’ that make for the user-friendly development of dynamic and generic data processes.
Logic Rivers provide the ability to store data values into variables and use them throughout the workflow.
Containers can be used as a mechanism for looping or adding conditional logic to your river. Since the variable values themselves are recalculated every time the river runs, it is a truly dynamic way to develop a data pipeline.
Let’s say you want to loop through a list of values and create a corresponding database table for each value. To get a bit more specific, we’ll take some sample output from Google Adwords where you can retrieve a geographical dimension table. For simplicity, we’ll say the table only has three fields: ID, Name, and Geo_Level.
As you can see from above, the format of the data in this table may not be conducive to join on to a fact table, as its identifiers are all at different geographical levels. Let’s imagine that our ideal scenario is to have a separate dimension table for each distinct Geo_Level. How can Rivery enable us to do this in a dynamic way?
By looping through a variable list!
First, you have to create the variable list. You can add a SQL/Script logic step with a source query that returns a list of distinct geography levels, and saves these values into a variable. Note - the REPLACE() function is used to remove whitespace.
Once the variable is created, it will appear in the variables list. Make sure ‘Is Multiple Values’ is checked.
Next you create your loop. First, add a new logic step. Then wrap it in a container and set the container to ‘Loop.’
From the “for each value in _” dropdown, select the variable just created. Now, type in a different name to use as your iterator (this can be any string).
Next enter a source query that calls your iterator with curly brackets. Here we will use this twice - once in the WHERE clause and once in the table name. Thus, each iteration in the loop will create a new table and will only contain the necessary records (‘dim_city’ table will only have cities, etc.).
Now run it and watch all of your tables appear! Your results will be one table per Geo_Level, each containing corresponding data to that Geo_Level.
What’s amazing is that the loop you just created is completely dynamic - it’s based on a variable list that you created from the data itself! Thus, if tomorrow a new GeoLevel is added to the dim_geo table, it will be captured and an additional dim_* table will be created without any manual intervention.
For a more introductory post on on Logic Rivers, check this out.