Best Practices, Considerations, and Requirements for Using Redshift and Rivery

In order to make sure the data you’re loading to your Redshift cluster works seamlessly, and your table structures or views won’t be harm during the process, please note the considerations and requirements when loading data to Redshift, using Rivery.

In this section:

  • Working with Views
  • Loading Arrays
  • Case Sensitivity in Redshift
  • Using SSH Tunneling in Order to Connect Your Redshift Cluster

Working with Views

TL;DR - Use WITH NO SCHEMA BINDING clause in the CREATE VIEW command,
when creating views, based on the target table created by Rivery.

Rivery loading method, in most cases, uses a switch table mechanism in order to make sure that the current table and all its dependencies aren’t harmed. In the process, Rivery creates a temporary clone, perform the necessary altering and data updating above it, and then switches between the target table and the temp table.

Due to old limitations in Redshift, and in purpose to save all the dependencies above the target table, Rivery recreates the entire direct tree that is based on the table. The tree is created only for views and views-of-views that exist in the same schema as the target table. Other views, that are based on these views or tables, in other schemas, may be dropped.

Therefore, Rivery requires to use WITH NO SCHEMA BINDING clause when creating a view, using the CREATE VIEW command.

For example:

CREATE OR REPLACE VIEW public.my_events_view AS
SELECT name as event_name FROM public.events
WITH NO SCHEMA BINDING;

That will make sure the views won’t be connected to the target table, that may be switched in the process. This feature will keep your views as they are, and won’t consider that the table actually exists in the database before the view is created.

More info about creating views in Redshift can be found here .

Loading Arrays

TL;DR - Loading Arrays into Redshift are represented as VARCHAR(max) in the table. Use the built-in JSON functions in Redshift in order to parse it.`

Redshift doesn’t support yet complex structure schema, using JSON type or Arrays in the table. Therefore, Rivery splits objects in the JSONs to columns, using pattern.

However, the array type isn’t so flexible in Redshift and may vary between the loads. In order to make sure the array is loaded, Rivery loading arrays as VARCHAR(max) type. If the array is longer than the max varchar length in Redshift, using the Truncate Columns option under Advanced Options in the Target tab will truncate the data to the max size allowed.

Case Sensitivity in Redshift

TL;DR - Rivery relies on the lower-case sensitivity of Redshift

Rivery mechanism is based upon the assumption of the default case sensitivity in Redshift, which is lower-case. This case sensitivity is the base of all creation, updating and managing schemas, tables, columns, views, etc. using rivery. Please make sure your metadata doesn’t contain the same name with two different cases.

Using SSH Tunneling in Order to Connect Your Redshift Cluster and Rivery

TL;DR - Connect a private Redshift cluster and Rivery using SSH Tunnel instance

In order to use the SSH Tunneling option via Rivery, please configure your SSH Tunnel server in your network. This server will have the ability to get SSH connections from Rivery IPs and will
be able to connect to your Redshift cluster via the internal IP or address. Rivery connects
to that SSH server using a private key.

After you’ve configured SSH tunneling, you can set it up in the Rivery Connection:

Under the SSH Options section, in the MySQL Connection Pop-up, you can define the following:

  • SSH Hostname
  • SSH Port (in most of the connection, that should be 22)
  • The Username Rivery should connect via the SSH to the server
  • Password (optional)
  • SSH Pem Key (Recommended)
  • And if you have also a Pem Password for that key file.
3 Likes