dbt jobs can be tracked with Databand in two ways:
- Using Databand's SDK to track dbt jobs that are triggered by Airflow (or any other Python orchestration)
- Directly monitoring dbt Cloud using Databand's dbt monitor
Track dbt runs triggered by Airflow:
You can use Databand to track data from dbt jobs that are triggered by Airflow by calling the collect_data_from_dbt_cloud function, as shown below.
Prerequisites:
- The dbnd SDK is installed in the Airflow environment
- Airflow is successfully integrated with Databand (see the Airflow integration instructions)
- A dbt Cloud account ID and a dbt Cloud API token
Creating a dbt Cloud API token - Follow the instructions in dbt Cloud's API documentation to create a dbt Cloud API token. This token is needed when creating the integration with Databand.
Obtaining your dbt Cloud account ID - Sign in to your dbt Cloud account in your browser. Your dbt Cloud account ID is the number directly following the accounts path component of the URL.
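As a small illustration of the account ID rule above, the following sketch pulls the number that follows the accounts path component out of a URL. The helper name and the example URL are hypothetical and only mirror the description in this section; the exact URL layout in your dbt Cloud account may differ.

```python
import re
from typing import Optional


def extract_dbt_cloud_account_id(url: str) -> Optional[int]:
    """Return the number directly following the 'accounts' path component, or None."""
    match = re.search(r"/accounts/(\d+)", url)
    return int(match.group(1)) if match else None


# Hypothetical URL, for illustration only.
example_url = "https://cloud.getdbt.com/#/accounts/4433/projects/1/dashboard/"
account_id = extract_dbt_cloud_account_id(example_url)
print(account_id)
```

The helper returns None when the URL has no accounts component, so a missing ID can be detected before it is passed to Databand.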
Tracking dbt cloud runs triggered by Airflow DAGs:
A common integration of Airflow and dbt cloud is as follows:
- Airflow runs a DAG
- An Airflow task triggers a single run of a dbt job in the dbt Cloud account
- A subsequent task polls the dbt Cloud API for the run's status, using a run_id, to determine how to proceed
To track the dbt job with Databand, call Databand's collect_data_from_dbt_cloud function once the job is complete.
```python
from dbnd import collect_data_from_dbt_cloud

dbt_cloud_run_id = 1234
account_id = 4433
dbt_cloud_api_token = "5a42af03214326778999ccfdbf000044448888bb"

# ... code that waits for the dbt run to finish ...

collect_data_from_dbt_cloud(
    dbt_cloud_account_id=account_id,
    dbt_cloud_api_token=dbt_cloud_api_token,
    dbt_job_run_id=dbt_cloud_run_id,
)
```
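The waiting step is left open in the example above. One way it might be filled in is sketched below, under several assumptions: the helper names (fetch_run_status, wait_for_run, track_finished_run) are hypothetical, the host cloud.getdbt.com is assumed, and the status codes (10 success, 20 error, 30 cancelled) and the "Token" authorization header reflect the dbt Cloud v2 API as commonly documented and should be checked against dbt Cloud's API documentation. In an Airflow DAG this logic would typically live in the polling task.

```python
import json
import time
import urllib.request
from typing import Callable

# Status codes assumed to mark a finished run in the dbt Cloud API
# (10 = success, 20 = error, 30 = cancelled).
TERMINAL_STATUSES = {10, 20, 30}


def fetch_run_status(account_id: int, run_id: int, api_token: str) -> int:
    """Fetch the numeric status of one run from the dbt Cloud v2 API."""
    # Assumed host; accounts hosted elsewhere use a different base URL.
    url = f"https://cloud.getdbt.com/api/v2/accounts/{account_id}/runs/{run_id}/"
    request = urllib.request.Request(
        url, headers={"Authorization": f"Token {api_token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data"]["status"]


def wait_for_run(
    get_status: Callable[[], int],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the run reaches a terminal status, then return that status."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("dbt run did not finish within the polling budget")


def track_finished_run(account_id: int, run_id: int, api_token: str) -> int:
    """Wait for the run to finish, then report it to Databand."""
    # Imported here so the helpers above can be used without dbnd installed.
    from dbnd import collect_data_from_dbt_cloud

    status = wait_for_run(lambda: fetch_run_status(account_id, run_id, api_token))
    collect_data_from_dbt_cloud(
        dbt_cloud_account_id=account_id,
        dbt_cloud_api_token=api_token,
        dbt_job_run_id=run_id,
    )
    return status
```

Passing the status lookup and the sleep function in as arguments keeps the polling loop easy to exercise without network access. In practice the API token would be read from a secret store or an Airflow connection rather than written into the DAG file.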
Tracking dbt Jobs in dbt Cloud
You can use Databand's dbt monitor to track dbt jobs by directly monitoring your dbt Cloud environment. This allows Databand to track your dbt jobs regardless of how they are triggered (scheduled run, Airflow trigger, manual trigger, etc.).
To fully integrate Databand with your dbt Cloud environment:
- Configure a new dbt syncer in the Databand application.
- After the dbt syncer is configured, you will be able to see your dbt job runs as pipelines in Databand.
Tracking Airflow-triggered jobs
If you are tracking dbt data from Airflow using the collect_data_from_dbt_cloud function, as explained above, you should NOT configure a dbt syncer to sync dbt job runs from the same account. Tracking the same dbt job runs twice is currently not supported in Databand.
[dbt_monitor] Configuration Section Parameter Reference
- prometheus_port - Sets the port used for Prometheus.
- interval - Sets the sleep time, in seconds, between fetches when the monitor is not busy.
- number_of_iterations - Sets a cap on the number of monitor iterations. This is optional.
- stop_after - Sets a cap on the number of seconds the monitor runs. This is optional.
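To make the parameters above concrete, here is a minimal sketch that reads a [dbt_monitor] section with Python's standard configparser. It assumes the section is written in an INI-style configuration file, which this page does not state explicitly, and every value shown is illustrative rather than a documented default.

```python
import configparser

# Illustrative values only; they are not documented defaults.
SAMPLE_CONFIG = """
[dbt_monitor]
prometheus_port = 8000
interval = 10
number_of_iterations = 100
stop_after = 3600
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONFIG)
section = parser["dbt_monitor"]

prometheus_port = section.getint("prometheus_port")
interval = section.getint("interval")
# The two caps are optional, so fall back to None when they are absent.
number_of_iterations = section.getint("number_of_iterations", fallback=None)
stop_after = section.getint("stop_after", fallback=None)

print(prometheus_port, interval, number_of_iterations, stop_after)
```

Leaving out number_of_iterations and stop_after corresponds to running the monitor without those caps.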