Introducing Rivery DAGs for Apache Airflow!

Apache Airflow is one of the most popular open-source platforms for authoring, scheduling, and monitoring workflows. Python-based and fully community-supported, it has become the de facto utility in a data engineer’s toolkit for building pipelines.

Although Rivery’s DataOps platform has some considerable advantages over vanilla Apache Airflow (most prominently its no-code GUI and SaaS-based architecture), there are cases where you may want to integrate pipelines built in the Rivery platform into an existing organizational data infrastructure or stack.

Rivery DAGs for Apache Airflow aims to bridge these use cases through a public repository that fosters a community-based environment, where developers can leverage the endpoints of the Rivery API or the Rivery CLI to contribute DAGs for anyone to download and use.

Integrating the Rivery platform with an open-source tool like Airflow opens up a whole new range of possible workflows. Some examples include:

  1. Trigger a Rivery data source to target river, followed by a dashboard refresh via an Airflow Bash or Python operator.
  2. Trigger an external machine learning process to operate on a recently updated dataset.
  3. Orchestrate a data archival pipeline that copies the data updated by a triggered Rivery pipeline and stores it on an on-premise or cloud data server.
  4. Create dynamic notifications via webhooks called from an Airflow DAG that notify various channels (Slack, MS Teams, email, phone, text message) upon completion of a Rivery execution (logic/action/ingestion).

Two key API endpoints essential to enabling these and other use cases are the `run_river` and `check_run` endpoints.
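As a rough sketch of how these two endpoints could be called together (trigger a river, then poll until it finishes), the following uses only Python’s standard library. The base URL, endpoint paths, response fields, and status names here are illustrative assumptions, not the documented Rivery API contract — consult the official API reference for the actual values.

```python
import json
import time
import urllib.request

# Assumed base URL -- a placeholder, not the documented Rivery API host.
BASE_URL = "https://console.rivery.io/api"


def build_request(path, token, payload=None):
    """Build an authenticated request for a (hypothetical) Rivery endpoint."""
    data = json.dumps(payload).encode() if payload is not None else None
    return urllib.request.Request(
        f"{BASE_URL}/{path}",
        data=data,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST" if data is not None else "GET",
    )


def run_river(river_id, token):
    """Trigger a river run; the 'run_id' response field is an assumption."""
    with urllib.request.urlopen(build_request(f"run_river/{river_id}", token, {})) as resp:
        return json.load(resp)["run_id"]


def wait_for_run(run_id, token, interval=30):
    """Poll check_run until the run reaches a terminal status (names assumed)."""
    while True:
        with urllib.request.urlopen(build_request(f"check_run/{run_id}", token)) as resp:
            status = json.load(resp)["status"]
        if status in ("success", "failed"):
            return status
        time.sleep(interval)
```

Wrapped in a PythonOperator (or chained with downstream tasks), this trigger-then-poll pattern is what lets an Airflow DAG block on a Rivery execution before kicking off the next step.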

The DAGs that enable these endpoints have already been pre-configured by Rivery solutions engineers in this public repository to help you integrate them into your workflow. All you have to do is add your account’s API token, enter values for any default arguments/parameters, place the .py file into your Airflow server’s DAG folder, and trigger the DAGs with a config file (the structure of which is detailed in the README).
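Triggering with a config looks like the standard Airflow CLI invocation below; the DAG id and config keys are illustrative placeholders, so substitute the actual names and schema from the repository README.

```shell
# Trigger the pre-built DAG, passing run-time parameters as JSON.
# "rivery_run_river" and the "river_id" key are hypothetical examples.
airflow dags trigger rivery_run_river \
    --conf '{"river_id": "YOUR_RIVER_ID"}'
```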

If you wish to contribute additional use cases and DAGs, please fork the repo and open a pull request. Someone on our team will either approve the request or reach out to you with questions or requests for clarification.
