[deprecated] Amazon Managed Workflows Airflow

Follow the instructions on this page to configure the standard Airflow integration.

Amazon Managed Workflows is a managed Apache Airflow service that makes it easier to set up and operate end-to-end data pipelines in the AWS cloud at scale.

Installation Guide

Databand integrates with Amazon Managed Workflows (MWAA) to provide you with observability over your MWAA Airflow DAGs. This guide will cover platform-specific steps for tracking MWAA in Databand. If you need to create or configure an Amazon Managed Workflows environment, we recommend reading the Getting Started for Amazon Managed Workflows documentation.

Collect Your MWAA Details

Before integrating MWAA with Databand, collect the following two pieces of MWAA metadata:

  • MWAA Airflow UI URL
  • MWAA S3 Storage location

MWAA URL

The Airflow URL can be found in the AWS Console:
Go to AWS MWAA>Environments>{mwaa_env_name}>Details>Airflow web UI.

Format to use: https://<guid>.<aws_region>.airflow.amazonaws.com
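As a minimal sketch, a value copied from the console can be normalized to the format above before it is pasted into Databand. The helper name normalize_mwaa_url is hypothetical, and the check only assumes that the host ends in .airflow.amazonaws.com, since the exact shape of the host prefix may vary between environments.

```python
from urllib.parse import urlsplit

# Assumption: every MWAA web UI host ends with this suffix.
MWAA_HOST_SUFFIX = ".airflow.amazonaws.com"


def normalize_mwaa_url(raw: str) -> str:
    """Return the URL as https://<host>, dropping any path or query string."""
    raw = raw.strip()
    if "://" not in raw:
        # The console value may have been copied without a scheme.
        raw = "https://" + raw
    parts = urlsplit(raw)
    host = parts.hostname or ""
    if parts.scheme != "https" or not host.endswith(MWAA_HOST_SUFFIX):
        raise ValueError(f"not an MWAA Airflow UI URL: {raw!r}")
    return f"https://{host}"
```

For example, passing a URL that ends in /home returns the same URL with the path removed, and a non-MWAA host raises ValueError.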

MWAA S3 Bucket Location

Go to AWS MWAA>Environments>{mwaa_env_name}>DAG code in Amazon S3>S3 Bucket
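Both values can probably also be read programmatically instead of through the console. The sketch below assumes boto3 is installed and AWS credentials are configured; it uses the MWAA GetEnvironment call and its WebserverUrl and SourceBucketArn fields. The function names are hypothetical, and the code assumes WebserverUrl may be returned without a scheme.

```python
def extract_mwaa_details(environment: dict) -> dict:
    """Pick the two values Databand needs from a GetEnvironment 'Environment' dict."""
    url = environment["WebserverUrl"]
    if "://" not in url:
        # Assumption: the API may return a bare hostname.
        url = "https://" + url
    # A bucket ARN looks like arn:aws:s3:::<bucket-name>.
    bucket = environment["SourceBucketArn"].split(":::", 1)[-1]
    return {"airflow_url": url, "s3_bucket": bucket}


def fetch_mwaa_details(env_name: str) -> dict:
    """Look up the environment by name and return its URL and bucket."""
    import boto3  # imported lazily so extract_mwaa_details works without boto3

    response = boto3.client("mwaa").get_environment(Name=env_name)
    return extract_mwaa_details(response["Environment"])
```

Calling fetch_mwaa_details with your environment name (the {mwaa_env_name} value used in the console paths above) returns a dict with the keys airflow_url and s3_bucket.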

Installing DBND in MWAA

In your MWAA S3 bucket, add the following line to your requirements.txt file:

dbnd-airflow-auto-tracking==REPLACE_WITH_DATABAND_VERSION

Then select the new requirements.txt version in the MWAA environment configuration. Note that saving this change to your MWAA environment configuration triggers a restart of your Airflow Scheduler.
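The two steps above can be sketched in Python. pin_requirement is a hypothetical helper that adds or replaces the dbnd-airflow-auto-tracking pin in the text of an existing requirements.txt. publish_requirements is an untested outline that assumes boto3, configured AWS credentials, and a versioned bucket; it uploads the file and points the environment at the new object version, which triggers the restart described above.

```python
import re

PACKAGE = "dbnd-airflow-auto-tracking"


def pin_requirement(requirements_text: str, version: str, package: str = PACKAGE) -> str:
    """Return requirements_text with exactly one '<package>==<version>' line."""
    pinned = f"{package}=={version}"
    out, replaced = [], False
    for line in requirements_text.splitlines():
        # The package name is everything before the first specifier character.
        name = re.split(r"[=<>!~\[; ]", line.strip(), maxsplit=1)[0]
        if name.lower() == package.lower():
            if not replaced:
                out.append(pinned)
                replaced = True
            continue  # drop duplicate entries for the same package
        out.append(line)
    if not replaced:
        out.append(pinned)
    return "\n".join(out) + "\n"


def publish_requirements(env_name: str, bucket: str, key: str, new_text: str) -> str:
    """Upload requirements.txt and select its new version in the environment."""
    import boto3  # imported lazily so pin_requirement works without boto3

    s3 = boto3.client("s3")
    # VersionId is only present when bucket versioning is enabled.
    version_id = s3.put_object(
        Bucket=bucket, Key=key, Body=new_text.encode("utf-8")
    )["VersionId"]
    boto3.client("mwaa").update_environment(
        Name=env_name, RequirementsS3ObjectVersion=version_id
    )
    return version_id
```

Pass the Databand version you are installing as the version argument, in place of REPLACE_WITH_DATABAND_VERSION.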

For more information on installing 'extras' in MWAA, see Installing Python dependencies - Amazon Managed Workflows for Apache Airflow. For Databand installation details, see Installing DBND.

Installing Monitor DAG

To sync and report DAG execution and DBND metrics to Databand, you need the databand_airflow_monitor DAG running in your MWAA environment.

  1. Create the databand_airflow_monitor DAG in Airflow. Create a new file, databand_airflow_monitor.py, with the following DAG definition and add it to your project DAGs:

     from airflow_monitor.monitor_as_dag import get_monitor_dag

     # This DAG is used by Databand to monitor your Airflow installation.
     dag = get_monitor_dag()

  2. Deploy your new DAG and enable it in the Airflow UI.

Airflow Syncer

To complete the configuration, define an Airflow Syncer in Databand and create an Airflow Connection with the Databand URL and configuration parameters. See Apache Airflow Syncer for detailed instructions.
