Integrating Google Cloud Composer

Setup instructions for Google Cloud Composer integration with Databand.

🚧

Before you start

If you need to create or configure a Google Cloud Composer environment, we recommend reading the Cloud Composer Getting Started documentation.

Installation Guide

Databand integrates with Google Cloud Composer to provide observability over your Composer DAGs. This guide covers the platform-specific steps for tracking Composer in Databand.

Collect Your Cloud Composer Details

Before integrating Cloud Composer with Databand, you will need the following two pieces of Google Cloud Composer metadata:

  • Cloud Composer URL
  • Cloud Storage location

Both can be located in the GCloud Console.

Cloud Composer URL: GCloud Console > Composer > {composer_env_name} > Environment Configuration > Airflow web UI.
Format: https://<guid>.appspot.com

Cloud Storage location: GCloud Console > Composer > {composer_env_name} > Environment Configuration > DAGs folder.
Format: gs://<bucket_name>-bucket/dags
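Once collected, the two values can be sanity-checked before you wire them into Databand. A minimal sketch (the patterns simply mirror the formats shown above; the function name is illustrative):

```python
import re

# Expected shapes of the two Cloud Composer values, per the formats above.
AIRFLOW_URL_RE = re.compile(r"^https://[\w.-]+\.appspot\.com/?$")
DAGS_FOLDER_RE = re.compile(r"^gs://[\w.-]+/dags/?$")

def check_composer_details(airflow_url: str, dags_folder: str) -> bool:
    """Return True if both values match the documented formats."""
    return bool(AIRFLOW_URL_RE.match(airflow_url)) and bool(DAGS_FOLDER_RE.match(dags_folder))
```

For example, `check_composer_details("https://a1b2c3.appspot.com", "gs://my-env-bucket/dags")` returns True, while a value copied from the wrong console field will fail the check.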

Establishing Communication between Airflow and Databand

Two objects must exist for Databand to establish communication with your Airflow environment:

  • Databand Airflow Syncer
  • Airflow HTTP connection object

For step-by-step instructions on how to create these objects, refer to our documentation on Tracking Apache Airflow.

πŸ“˜

Configuring Airflow 2.0+ for Tracking

For Databand tracking to work properly with Airflow 2.0+, you need to disable lazy loading of plugins. Set the configuration option core.lazy_load_plugins=False, or set the environment variable AIRFLOW__CORE__LAZY_LOAD_PLUGINS=False.
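The two forms above are equivalent: Airflow derives environment-variable overrides from the config section and option names as AIRFLOW__&lt;SECTION&gt;__&lt;OPTION&gt;. A small illustrative helper showing how the pair is built (the helper itself is not part of Airflow):

```python
def airflow_env_override(section: str, option: str, value: str) -> tuple:
    """Build the environment-variable override pair for an Airflow config option."""
    return (f"AIRFLOW__{section.upper()}__{option.upper()}", value)

# The override required for Databand tracking on Airflow 2.0+:
name, value = airflow_env_override("core", "lazy_load_plugins", "False")
# name == "AIRFLOW__CORE__LAZY_LOAD_PLUGINS", value == "False"
```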

You can read more about Lazy Load plugins in the Plugins β€” Airflow Documentation.

The screenshot below provides an example of setting this property in your Composer environment.

Update Cloud Composer PyPI packages

🚧

Completing this step will trigger a restart of the Airflow scheduler

If restarting your scheduler is a challenge, let the Databand team know, and we can provide best practices as well as integration options for data monitoring that do not require a restart.

Update your environment's PyPI packages with the following entries. Use the most recent Databand version in place of the 0.4X.X placeholder (for example, if you are running v0.45.4, use ==0.45.4):

Package Name               | Extras and versions
-------------------------- | -------------------
dbnd-airflow               | ==0.4X.X
dbnd-airflow-auto-tracking | ==0.4X.X
dbnd-airflow-monitor       | [direct_db]==0.4X.X
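Each entry combines a package name, optional pip extras in brackets, and a pinned version. An illustrative parser for that pinned-requirement syntax, in case you generate the package list programmatically (the function and regex are hypothetical helpers, not part of any Databand tooling):

```python
import re

# name, optional [extras], and a pinned ==version,
# e.g. "dbnd-airflow-monitor[direct_db]==0.45.4"
REQ_RE = re.compile(r"^([A-Za-z0-9._-]+)(?:\[([\w,]+)\])?==([\w.]+)$")

def parse_pinned_requirement(spec: str):
    """Split 'name[extras]==version' into (name, extras_list, version)."""
    m = REQ_RE.match(spec)
    if not m:
        raise ValueError(f"not a pinned requirement: {spec!r}")
    name, extras, version = m.groups()
    return name, extras.split(",") if extras else [], version
```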

🚧

Reminder

Please note that saving this change to your Cloud Composer environment configuration will trigger a restart of your Airflow scheduler!

Your settings should look similar to this screenshot:

For more information on installing packages in Google Cloud Composer, please see Installing Python dependencies | Cloud Composer | Google Cloud.

Installing Monitor DAG

To sync DAG execution metadata and dbnd metrics to Databand, you will need the databand_airflow_monitor DAG running in your Cloud Composer environment.

  • Find the DAG in the following Databand GitHub repository. Specifically in:
    /Airflow_Monitor/dags/databand_airflow_monitor.py
  • Copy the databand_airflow_monitor DAG into the dags/ folder of your Cloud Storage bucket (to locate it, see the Collect Your Cloud Composer Details step at the top of this page)
  • Enable the databand_airflow_monitor DAG in the Cloud Composer Airflow UI
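The copy step resolves to a single object upload: the file lands under the dags/ prefix of your environment's bucket. A sketch of computing that destination from the DAGs folder value you collected earlier (illustrative only; the actual copy can be done with gsutil or the Cloud Console):

```python
from urllib.parse import urlparse

def monitor_dag_destination(dags_folder: str, filename: str = "databand_airflow_monitor.py"):
    """Split a gs://<bucket>/dags URI into (bucket, object_path) for the monitor DAG."""
    parsed = urlparse(dags_folder)  # scheme="gs", netloc=bucket, path="/dags"
    if parsed.scheme != "gs":
        raise ValueError(f"expected a gs:// URI, got {dags_folder!r}")
    prefix = parsed.path.strip("/")
    return parsed.netloc, f"{prefix}/{filename}"
```

For example, with a DAGs folder of gs://my-env-bucket/dags, the file should end up at object path dags/databand_airflow_monitor.py in bucket my-env-bucket.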
