Databand's Syncer is responsible for bringing information about the current state of execution from the Airflow DB into the Databand Service. It can run as a DAG that performs "sync" operations periodically.
- Click your user profile picture in the Databand web application (https://yourcompanyname.databand.ai/) and open the 'Settings' page.
- In 'Settings', go to the 'Airflow Syncers' page. This page lists your active Airflow integrations.
- Click the 'Add' button to create an integration with a new Airflow instance.
- Configure the Syncer with the information about your Airflow environment.
- Airflow URL - Airflow Webserver URL
- Syncer Name - user-provided label to identify Airflow instance
- External URL - An external URL of your Airflow
- DAG IDs to Sync - Specific DAG IDs to sync from your Airflow environment. Leaving this field empty will sync all DAGs into Databand.
- Include source code - Specifies whether or not to send source code of the DAGs to Databand.
- Include logs collection - Specifies whether or not to send logs from DAGs to Databand.
- Bytes to collect from the head of log - The number of KB to collect from the head of each DAG's log. This field appears only when 'Include logs collection' is enabled. The default value is 8 KB, and 8096 KB is the maximum allowed.
- Bytes to collect from the end of log - The number of KB to collect from the end of each DAG's log. This field appears only when 'Include logs collection' is enabled. The default value is 8 KB, and 8096 KB is the maximum allowed.
- Environment - Internal name of your Airflow environment
- DagRun Page Size - Number of DAG runs to fetch in each 10-second sync cycle
- DagRun Start Time Window - Number of days of historical data to pull from your Airflow DB
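To make the fields above concrete, here is an illustrative sketch of a syncer configuration as a plain Python dict. The key names and example values are assumptions chosen to mirror the UI labels; they are not an official Databand API payload.

```python
# Hypothetical configuration sketch mirroring the UI fields above.
# Key names and values are illustrative, not an official Databand schema.
syncer_config = {
    "airflow_url": "http://airflow-webserver:8080",  # Airflow Webserver URL (example)
    "syncer_name": "prod-airflow",                   # user-provided label
    "external_url": "https://airflow.example.com",   # externally reachable Airflow URL
    "dag_ids": [],                                   # empty -> sync all DAGs
    "include_source_code": True,
    "include_logs": True,
    "log_head_kb": 8,                                # default 8 KB, max 8096 KB
    "log_tail_kb": 8,                                # default 8 KB, max 8096 KB
    "environment": "production",                     # internal environment name
    "dagrun_page_size": 100,                         # DAG runs fetched per 10-second cycle
    "dagrun_start_time_window": 14,                  # days of history to pull
}

def validate(config):
    """Basic sanity checks matching the constraints described above."""
    assert 0 < config["log_head_kb"] <= 8096, "head log size out of range"
    assert 0 < config["log_tail_kb"] <= 8096, "tail log size out of range"
    assert config["dagrun_start_time_window"] >= 0, "window must be non-negative"
    return True
```

In practice these values are entered in the Databand UI form; the dict here only shows how they relate to each other and to the limits stated above.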
At this step, you can configure automatic alerts for your pipelines. A standalone alert definition will be created for each pipeline in the Airflow instance, and you can edit it later in the process (this behavior is subject to change in a future release).
Automatic alerts are created with severity HIGH.
The following automatic alerts are supported:
- State alerts - an alert is fired when a pipeline run fails.
- Run duration alerts - an alert is fired when the run duration is too long or too short. By default, this determination is made by looking at the previous 10 runs and applying an acceptable range for what is considered normal. You can edit these alerts to change both the number of previous runs to consider in the calculation and the sensitivity of what is considered an anomaly.
- Schema change alerts - an alert is fired when the schema of a dataset has changed from the previous run, either through the addition or removal of one or more columns.
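The run duration alert above can be sketched as a simple anomaly check: compare the current run's duration against a band derived from the previous runs. The function below is a hypothetical illustration using a mean-plus-standard-deviation band; Databand's actual anomaly model is not documented here and may differ.

```python
from statistics import mean, stdev

def duration_out_of_range(history, current, lookback=10, sensitivity=3.0):
    """Hypothetical sketch of a run-duration anomaly check.

    Looks at the last `lookback` run durations (matching the default of
    10 previous runs described above) and flags the current duration if
    it falls outside `sensitivity` standard deviations of their mean.
    """
    window = history[-lookback:]
    if len(window) < 2:
        return False  # not enough data to establish a baseline
    mu, sigma = mean(window), stdev(window)
    return abs(current - mu) > sensitivity * sigma

# Example: durations (in minutes) of the previous ten runs.
previous_runs = [10, 11, 9, 10, 12, 10, 11, 10, 9, 11]
```

Editing the alert's lookback and sensitivity, as described above, corresponds to changing the `lookback` and `sensitivity` parameters in this sketch.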
You will be prompted to configure alert receivers if none are configured yet. Alert receivers are the destinations where alerts are sent. This step is optional; if no receiver is configured, alerts will only appear in the Databand UI.
To learn more about how the Slack receiver is configured, check this guide.
After clicking 'Continue', you will see a dialog window with a ready-to-use JSON file and instructions on how to use it to enable communication from Airflow to Databand. Complete the steps in Setting Up Airflow Integration, and then click "Test Connection" to confirm the connection was set up properly.
You can also use the CLI for Managing Airflow Syncers.