Prohibiting schema changes

SchemaKeeper · July 21, 2022, 2:38pm

Currently Rivers can change table schemas in database - like adding new fields, for example. I would very much like it would not happen and any River with a typo in a column name would simply fail (preferably with a message saying the column does not exists in the target table), rather than create a new column right away.

Is it possible to change configurations anywhere to restrict schema changes? And if not, are there any plans to add the feature?

taylor.mcgrath · July 21, 2022, 2:43pm

Hi there, SchemaKeeper

Which source are you referencing? It is true that schema changes (in any multi-tables rivers) are auto-detected and handled currently.

There are a couple workarounds for this as of now to accomplish what you want.

Use a legacy river instead (there is a limitation here to one table per river however). This will lock in the schema and throw an error if changes detected.
What is your destination warehouse? If it is Snowflake, we offer a more ‘strict’ type of upsert-merge that will fail if schema of the source doesn’t match the target. You can read more about these options here: Snowflake Upsert-Merge Loading Mode Options - Snowflake

I do agree there should be a built-in option to auto-detect or not for multi-tables rivers, and we will raise this with our Product team. Thanks for your feedback and let me know if you have any further questions!

-Taylor

SchemaKeeper · July 21, 2022, 2:56pm

Thanks for your reply, Taylor. My destination is Big Query using upsert-merge and river is source to target.
What is a legacy river? Apart from source to target, the only available options I see are a Logic River, an Action River, and Kits. Don’t seem to see a “Legacy River” anywhere for my case.

taylor.mcgrath · July 21, 2022, 4:03pm

Apologies for not clarifying there, I thought you were asking about schema changes from a database source. For pulling from databases, there is the choice of river mode in the source tab:

As for loading data from a file (i.e. from email, SFTP, etc), the way we intake currently defined in the river, by name or position. If the name of a field changes, it will incur an error message as desired.

If this is not working as expected, please let me know!

Thanks,
Taylor

SchemaKeeper · July 24, 2022, 6:12am

Thanks, @taylor.mcgrath

This is not working as expected. Since I was loading from a csv file from an email, I had the Name option chosen as you described.

Still, it created a new column in the Big Query’s table schema because I had a typo in one of the names in Fields Mapping in the Source tab. And let me emphasize again the severity of it: it created a new column in our database’ table Schema, not River’s Schema tab. I did not anticipate Rivery has such permissions, and I suspect most users would not want their Rivers changing anything in their table schemas.

taylor.mcgrath · July 24, 2022, 8:58pm

Hi SchemaKeeper,

You are saying it created a new field in your destination DWH table but not the river schema tab itself?

Please create a support ticket by clicking ‘Help’ > ‘Open Support Ticket’ in the console and our support engineers can check out your individual case.

Apologies for the inconvenience here.

jhde · June 12, 2023, 11:51am

Hello, wondering if there is any news on this? We would prefer to be notified of any source table schema changes and to have a choice in terms of actions to take. I am currently investigating ways around this, it would be great if it could be handled inside Rivery though!

taylor.mcgrath · June 12, 2023, 1:38pm

Hi @jhde thanks for your feedback! We have additional schema change handling currently planned for the second half of this year.

Which “actions” would you have as options to take when there is a source schema change?

jhde · June 12, 2023, 2:28pm

Hi Taylor, thanks for your reply, that’s good to know. Options would include notification and choice of where (if at all) to stop execution based on the schema change. e.g.

Notify, stop all executions in this scheduled run (where the scheduled river is a river of sequenced rivers)
Notify, stop current river execution (any other sequenced rivers on this schedule not impacted)
Notify, skip this table and continue with the next table
Notify and continue (allowing Rivery to rebuild the table)

Topic		Replies	Views
Upsert Merge Explained Tutorial	0	2095	July 29, 2020
New Feature - Customize Target Table Names in Multi-Table New Features	0	916	July 21, 2020
Best Practices, Considerations, and Requirements for Using Redshift and Rivery Best Practices	0	1094	February 3, 2020
Run a single table in a Multi-table SQL River Source to Target	3	766	January 11, 2022
[Post Mortem] Insufficient support on dropped columns in multi-table rivers Post Mortem	1	734	June 24, 2020

Prohibiting schema changes

Related topics