Handling Hard Deletes from MySQL in Snowflake

Background: With CDC, hard deletes in your MySQL source tables will be flagged as soft deletes (__deleted = TRUE) in your target table. If you only need to know where the hard deletes records are in order to filter them out in your Snowflake queries, you can just follow the Part 1 instructions below.

If you need to make sure the records are deleted from your target as well, you can run a logic river post-CDC extraction and select out these soft deletes and republish the table without the deleted records (Part 2 below).

Prerequisite: Configure CDC on your MySQL database using these directions. Once configured, follow the steps below:

Part 1 - Source to Target river

  1. Create a new Source to Target river.
  2. Under the Source tab, select MySQL and set the Extraction Mode to CDC.
  3. Under the Target tab, select your target location and, if loading to a database, select your Default Loading Method. Upsert-Merge is the most common Loading Method used when extracting via CDC. The default will be applied to each selected table.
  4. Under the Schema tab, select the schema and tables.
  5. Within each Schema, click the ellipses on the right side of the table and select “Set initiate migration for all tables”.
  • This will ensure that the current state of the table is accurately captured.
  1. Set the appropriate schedule for your river.
  2. Click Run to save and run the river migration.

Part 2 - Logic river

  1. Create a new Logic River.
  2. On the right side of the first step, select “Add Container” → “Run Once”.
  3. Within the Container, add steps with “SQL / DB Transformations” selected. You will need one step per table where you would like to remove the deleted rows.
  • Note that you can turn the “Parallel Steps” toggle on if you would like to run the SQL transformations at the same time.
  1. Within each SQL / DB Transformation step, setup your Snowflake configuration and write a query to remove the soft deleted records from the table.

  2. Select “Table” as your target and configure where you would like the output of your query saved.

  3. Then, under the Settings tab, configure the frequency at which you would like this Logic river to run.

  4. Click Save.

If using Part 2 configuration described above, the SQL transformations will run on a schedule that is independent from your CDC load. This will cut down on river costs since you can decide to run your CDC river more frequently and then run the SQL transformations in the logic river less frequently. If you prefer to perform the SQL transformations after each CDC run, you would omit Part 1 Step 6. Then in Part 2, your first step in the Logic river would be to choose “River” and select the MySQL river you created in Part 1, then you would create a second Logic step and proceed with Part 2 Step 2 above.

1 Like