Integrating Amazon Managed Workflows for Apache Airflow (MWAA)

Follow the instructions on this page to configure the standard Airflow integration.

🚧

Before you start

If you need to create or configure an Amazon Managed Workflows environment, we recommend reading the Amazon Managed Workflows for Apache Airflow Getting Started documentation.

Installation Guide

Databand integrates with Amazon Managed Workflows for Apache Airflow (MWAA) to give you observability over your MWAA Airflow DAGs. This guide covers the platform-specific steps for tracking MWAA in Databand.

Collect Your MWAA Details

Before integrating MWAA with Databand, collect the following two pieces of MWAA metadata:

  • MWAA Airflow UI URL
  • MWAA S3 Storage location

MWAA URL

The Airflow URL can be found in the AWS Console:
Go to AWS MWAA>Environments>{mwaa_env_name}>Details>Airflow web UI.

Format to use: https://<guid>.<aws_region>.airflow.amazonaws.com
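As a quick sanity check on the value you copy from the console, the sketch below parses a URL and confirms that it follows the format shown above. The function name `parse_mwaa_url` is hypothetical and not part of Databand or AWS tooling. The check is deliberately lenient: it only assumes that the region is the last label before `airflow.amazonaws.com`.

```python
from urllib.parse import urlparse

# Host suffix taken from the format above:
# https://<guid>.<aws_region>.airflow.amazonaws.com
MWAA_HOST_SUFFIX = ".airflow.amazonaws.com"


def parse_mwaa_url(url: str) -> dict:
    """Split an MWAA web UI URL into guid, region, and a normalized base URL.

    Hypothetical helper for illustration only. Assumes the region is the
    last label before "airflow.amazonaws.com".
    """
    parsed = urlparse(url.strip())
    host = parsed.hostname or ""
    if parsed.scheme != "https" or not host.endswith(MWAA_HOST_SUFFIX):
        raise ValueError(f"Not an MWAA Airflow UI URL: {url!r}")

    # Everything before the suffix should look like "<guid>.<aws_region>".
    prefix = host[: -len(MWAA_HOST_SUFFIX)]
    guid, separator, region = prefix.rpartition(".")
    if not separator or not guid or not region:
        raise ValueError(f"Expected <guid>.<aws_region> in host: {host!r}")

    # Drop any path such as /home so only the base URL remains.
    return {"guid": guid, "region": region, "base_url": f"https://{host}"}
```

Passing the returned `base_url` rather than the raw console link avoids accidentally including a trailing path such as `/home`.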

MWAA S3 Bucket Location

Go to AWS MWAA>Environments>{mwaa_env_name}>DAG code in Amazon S3>S3 Bucket
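If you prefer to collect both values programmatically instead of through the console, the following sketch reads them from the response of the boto3 MWAA `get_environment` call. The helper names are hypothetical, and the sketch assumes that `WebserverUrl` may be returned without the `https://` scheme and that `SourceBucketArn` has the usual `arn:aws:s3:::<bucket>` shape. It also assumes boto3 is installed and AWS credentials are configured.

```python
def collect_mwaa_details(environment: dict) -> dict:
    """Extract the Airflow UI URL and S3 location from an MWAA environment.

    `environment` is the "Environment" dict of a get_environment response.
    Hypothetical helper for illustration only.
    """
    host = environment["WebserverUrl"]
    # Assumption: the API may return only the hostname, so add the scheme.
    airflow_url = host if host.startswith("https://") else f"https://{host}"

    # An S3 bucket ARN looks like arn:aws:s3:::<bucket-name>.
    bucket = environment["SourceBucketArn"].split(":::", 1)[-1]

    return {
        "airflow_url": airflow_url,
        "s3_bucket": bucket,
        "dag_s3_path": environment.get("DagS3Path", ""),
    }


def fetch_mwaa_details(env_name: str, region: str) -> dict:
    """Look up the environment with boto3 (requires AWS credentials)."""
    import boto3  # imported lazily so the parsing helper works without it

    client = boto3.client("mwaa", region_name=region)
    response = client.get_environment(Name=env_name)
    return collect_mwaa_details(response["Environment"])
```

For example, `fetch_mwaa_details("my-mwaa-env", "us-east-1")` would return the URL and bucket for an environment named `my-mwaa-env`; both arguments here are placeholders.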

Establishing Communication between Airflow and Databand

Two objects must exist for Databand to establish communication to your Airflow environment:

  • Databand Airflow Syncer
  • Airflow HTTP connection object

For step-by-step instructions on how to create these objects, refer to our documentation on Tracking Apache Airflow.

📘

Configuring Airflow 2.0+ for Tracking

For Databand tracking to work properly with Airflow 2.0+, you need to disable lazy loading of plugins. To do this, either set the configuration option core.lazy_load_plugins=False or set the environment variable AIRFLOW__CORE__LAZY_LOAD_PLUGINS=False.

Changing this setting requires a restart of the Airflow webserver.

You can read more about lazy loading of plugins in the Plugins section of the Airflow documentation.
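The configuration option and the environment variable above are two spellings of the same setting: Airflow maps `section.key` to `AIRFLOW__{SECTION}__{KEY}`. The sketch below builds that name and checks whether the variable is set to a false value. Both function names are hypothetical, and the check only inspects environment variables. It does not read airflow.cfg or any MWAA console configuration.

```python
import os
from typing import Mapping, Optional


def airflow_env_var_name(section: str, key: str) -> str:
    """Return the environment variable Airflow reads for section.key."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def lazy_load_plugins_disabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Report whether lazy plugin loading is disabled via the environment.

    Hypothetical helper: returns False when the variable is unset, because
    this sketch cannot see values that come from other config sources.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(airflow_env_var_name("core", "lazy_load_plugins"))
    # Values treated as false here; case-insensitive.
    return value is not None and value.strip().lower() in {"false", "f", "0"}
```

Calling `lazy_load_plugins_disabled()` with no arguments inspects the current process environment, which is useful inside a test DAG or a startup script.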

Installing DBND in MWAA

Update MWAA requirements.txt File

🚧

Completing this step will trigger a restart of the Airflow scheduler!

If restarting your scheduler is a challenge, let the Databand team know, and we can provide best practices as well as integration options for data monitoring that do not require a restart.

In your MWAA S3 bucket, add the following packages to your requirements.txt file:

  • dbnd-airflow
  • dbnd-airflow-auto-tracking
  • dbnd-airflow-monitor[direct_db]

Update the requirements.txt version in the MWAA environment configuration.

🚧

Reminder

Please note that saving this change to your MWAA environment configuration will trigger a restart of your Airflow Scheduler!

For more information on installing 'extras' in MWAA: Installing Python dependencies - Amazon Managed Workflows for Apache Airflow.
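The requirements.txt update can be scripted so that it is safe to run more than once. The sketch below appends the three packages listed above to existing file content only when they are missing. The function name is hypothetical, and matching is by exact line, so a pinned entry such as `dbnd-airflow==<version>` would not be recognized as already present.

```python
# Package names taken from the list above.
DBND_PACKAGES = [
    "dbnd-airflow",
    "dbnd-airflow-auto-tracking",
    "dbnd-airflow-monitor[direct_db]",
]


def add_dbnd_requirements(existing_text: str) -> str:
    """Return requirements.txt content with the DBND packages appended.

    Hypothetical helper: existing lines are kept in order, and a package is
    added only if no line matches it exactly.
    """
    lines = existing_text.splitlines()
    present = {line.strip() for line in lines}
    for package in DBND_PACKAGES:
        if package not in present:
            lines.append(package)
    # End the file with a single trailing newline.
    return "\n".join(lines) + "\n"
```

After writing the result back to S3, you still need to select the new requirements.txt version in the MWAA environment configuration, which triggers the scheduler restart described above.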

Installing Monitor DAG

To report DAG executions and DBND metrics to Databand, you need the databand_airflow_monitor DAG running in your MWAA environment.

  • Find the DAG in the Databand GitHub repository, under:
    /Airflow_Monitor/dags/databand_airflow_monitor.py
  • Clone the repository locally.
  • Copy the databand_airflow_monitor DAG to the dags/ folder in your S3 bucket (to locate it, see the MWAA S3 Bucket Location step at the top of this page).
  • Enable the databand_airflow_monitor DAG in the MWAA Airflow UI.
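The copy step in the list above can be sketched with boto3's S3 `upload_file` call. The helper names are hypothetical, the default DAG folder `dags` is an assumption that should match your environment's DAG path, and the upload assumes boto3 is installed with credentials that can write to the bucket.

```python
import posixpath

# File name from the repository path listed above.
MONITOR_DAG_FILE = "databand_airflow_monitor.py"


def monitor_dag_s3_key(dag_s3_path: str = "dags") -> str:
    """Build the S3 object key for the monitor DAG inside the DAG folder."""
    return posixpath.join(dag_s3_path.strip("/"), MONITOR_DAG_FILE)


def upload_monitor_dag(local_path: str, bucket: str, dag_s3_path: str = "dags") -> str:
    """Upload the locally cloned DAG file and return its S3 URI.

    Hypothetical helper for illustration only.
    """
    import boto3  # imported lazily so the key helper works without it

    key = monitor_dag_s3_key(dag_s3_path)
    boto3.client("s3").upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```

Once the file is in place, the DAG still has to be enabled manually in the MWAA Airflow UI.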
