Prohibiting schema changes

Currently Rivers can change table schemas in database - like adding new fields, for example. I would very much like it would not happen and any River with a typo in a column name would simply fail (preferably with a message saying the column does not exists in the target table), rather than create a new column right away.

Is it possible to change configurations anywhere to restrict schema changes? And if not, are there any plans to add the feature?

Hi there, SchemaKeeper :slight_smile:

Which source are you referencing? It is true that schema changes (in any multi-tables rivers) are auto-detected and handled currently.

There are a couple workarounds for this as of now to accomplish what you want.

  1. Use a legacy river instead (there is a limitation here to one table per river however). This will lock in the schema and throw an error if changes detected.
  2. What is your destination warehouse? If it is Snowflake, we offer a more ‘strict’ type of upsert-merge that will fail if schema of the source doesn’t match the target. You can read more about these options here: Snowflake Upsert-Merge Loading Mode Options - Snowflake

I do agree there should be a built-in option to auto-detect or not for multi-tables rivers, and we will raise this with our Product team. Thanks for your feedback and let me know if you have any further questions!

-Taylor

Thanks for your reply, Taylor. My destination is Big Query using upsert-merge and river is source to target.
What is a legacy river? Apart from source to target, the only available options I see are a Logic River, an Action River, and Kits. Don’t seem to see a “Legacy River” anywhere for my case.

Apologies for not clarifying there, I thought you were asking about schema changes from a database source. For pulling from databases, there is the choice of river mode in the source tab:

As for loading data from a file (i.e. from email, SFTP, etc), the way we intake currently defined in the river, by name or position. If the name of a field changes, it will incur an error message as desired.
image

If this is not working as expected, please let me know!

Thanks,
Taylor

Thanks, @taylor.mcgrath

This is not working as expected. Since I was loading from a csv file from an email, I had the Name option chosen as you described.

Still, it created a new column in the Big Query’s table schema because I had a typo in one of the names in Fields Mapping in the Source tab. And let me emphasize again the severity of it: it created a new column in our database’ table Schema, not River’s Schema tab. I did not anticipate Rivery has such permissions, and I suspect most users would not want their Rivers changing anything in their table schemas.

Hi SchemaKeeper,

You are saying it created a new field in your destination DWH table but not the river schema tab itself?

Please create a support ticket by clicking ‘Help’ > ‘Open Support Ticket’ in the console and our support engineers can check out your individual case.

Apologies for the inconvenience here.

1 Like

Hello, wondering if there is any news on this? We would prefer to be notified of any source table schema changes and to have a choice in terms of actions to take. I am currently investigating ways around this, it would be great if it could be handled inside Rivery though!

1 Like

Hi @jhde thanks for your feedback! We have additional schema change handling currently planned for the second half of this year.

Which “actions” would you have as options to take when there is a source schema change?

Hi Taylor, thanks for your reply, that’s good to know. Options would include notification and choice of where (if at all) to stop execution based on the schema change. e.g.

  • Notify, stop all executions in this scheduled run (where the scheduled river is a river of sequenced rivers)
  • Notify, stop current river execution (any other sequenced rivers on this schedule not impacted)
  • Notify, skip this table and continue with the next table
  • Notify and continue (allowing Rivery to rebuild the table)