You can set alerts on all metadata that Databand tracks - everything from run durations to data quality metrics.
You can define alerts individually for every pipeline and dataset. Additionally, auto-alerts let you create alerts for all pipelines originating from a specific source. To learn more, see Automatic Bulk Alerts Creation.
- In the Alerts tab in the sidebar, click the 'Add Alert' button in the top right corner of the screen.
- In the 'Add Alert Settings For Pipeline' drop-down list, select a pipeline.
- Select a metric from the pre-populated values to create an alert based on that metric. Alerts can be created at the Run level or the Task level.
Run alerts cover metadata from the pipeline/DAG execution, including overall duration and state (running/succeeded/failed).
If a run is restarted, its state and duration alerts are updated accordingly. For example, if a pipeline run triggers a Run State - Failed alert and the run is subsequently restarted, the alert auto-resolves and reappears only if the restarted run fails again.
You can define an alert on any user metrics created within your pipeline tasks.
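As a sketch of how a custom user metric might be reported, the example below assumes the `dbnd` SDK's `log_metric` helper (with a stub fallback so the snippet runs standalone); the task name and metric key are hypothetical:

```python
try:
    from dbnd import log_metric  # dbnd SDK helper for reporting user metrics
except ImportError:
    # Fallback stub so this sketch runs without the dbnd SDK installed.
    def log_metric(key, value):
        print(f"metric {key}={value}")

def validate_orders(rows):
    """Hypothetical task step: count null amounts and report the count
    as a user metric that an alert could later watch."""
    null_count = sum(1 for r in rows if r.get("amount") is None)
    log_metric("null_amount_count", null_count)
    return null_count

count = validate_orders([{"amount": 10}, {"amount": None}])
```

Once a metric like `null_amount_count` is tracked, it appears in the pre-populated metric list when you create an alert.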
Data health alerts cover metadata coming from tasks, operators, or functions from a pipeline. Examples include data profiling metrics like standard deviation or mean from a column, missing dataset operations, or dataset schema changes.
Range alerts (“% Range” in the UI) can be used to detect when a usually stable metric reports an unexpectedly low or high value. You can set such alerts in the Alerts tab on numeric metrics. Some examples include:
- Run Duration (seconds)
- Delay Between Subsequent Runs (seconds)
- Compute Time (seconds)
- Custom user metrics that are numeric in nature
To set up a Range Alert, select a metric and choose the "% Range" operator. Set a target value and a percentage range around it; values above or below the target but inside that range are considered acceptable. By default, the target value is the metric's value from the most recent run. Boundaries can be set between 0% and 100% in increments of 10%.
For example, you may have a pipeline that typically completes in 10 minutes or 600 seconds; however, you know that deviations of a few minutes in either direction are not uncommon. In this case, you might create an alert on the run duration with a target value of 600 and boundaries of 20%. This means that an alert would only be triggered if your pipeline completes more quickly than 8 minutes (600 seconds x 80%) or more slowly than 12 minutes (600 seconds x 120%).
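The boundary arithmetic above can be sketched as a small helper (illustrative only; these function names are not part of the Databand API):

```python
def range_alert_bounds(target, pct):
    """Return the (lower, upper) bounds for a % Range alert."""
    lower = target * (1 - pct / 100)
    upper = target * (1 + pct / 100)
    return lower, upper

def breaches(value, target, pct):
    """True if the metric value falls outside the acceptable range."""
    lower, upper = range_alert_bounds(target, pct)
    return value < lower or value > upper

# A 600-second target with 20% boundaries accepts 480..720 seconds.
print(range_alert_bounds(600, 20))  # (480.0, 720.0)
print(breaches(450, 600, 20))       # True: faster than 8 minutes
print(breaches(650, 600, 20))       # False: within range
```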
The chart below displays range alert parameters and recent values of the selected metric. Values that lie outside the specified boundaries are marked in red in the chart - they will trigger alerts.
You can define an SLA alert on dataset creation. Databand triggers an alert when a dataset isn't updated within the expected window. You can choose whether the check covers the past 24 hours or the past 1 hour, and set the latest hour by which data must arrive.
The alert is triggered a few seconds after the chosen time. For example, if the alert is set for 15:00, it triggers at approximately 15:01. Depending on your selection, the dataset is checked over the last 24 hours or the last 1 hour.
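The check described above can be sketched as follows (an illustrative model, not Databand's implementation; the function name and timestamps are hypothetical):

```python
from datetime import datetime, timedelta

def sla_breached(last_update, deadline, lookback_hours=24):
    """Sketch of the SLA check: the dataset breaches the SLA if no
    update occurred inside the lookback window ending at the deadline."""
    window_start = deadline - timedelta(hours=lookback_hours)
    return not (window_start <= last_update <= deadline)

deadline = datetime(2023, 5, 2, 15, 0)   # data must arrive by 15:00
on_time  = datetime(2023, 5, 2, 9, 30)   # updated that morning
stale    = datetime(2023, 4, 30, 18, 0)  # last updated two days ago

print(sla_breached(on_time, deadline))  # False: updated within 24 hours
print(sla_breached(stale, deadline))    # True: alert fires
```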
You can define specialized alerts that involve multiple conditions for the alert to fire. See the Advanced Alerts page for more info.
You can define your own alerts based on acceptable ranges or values, or you can enable automated alerting using Databand's anomaly detection algorithms, which alert based on trends in metadata from previous runs.
Alert Operator List:
- Equal to
- Greater than
- Less than
- Greater than or equal to
- Less than or equal to
- Not equal to
- Value missing
- Anomaly detection - automated trend-based alerting
- Range alert (define a target value and a percentage range around that value that is considered acceptable for values of your metric)
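Conceptually, most of the operators above map onto simple predicates. A minimal illustrative sketch (these names are not the Databand API; anomaly detection and % Range are omitted since they involve more than a single comparison):

```python
# Map each comparison operator to a predicate over (value, threshold).
OPERATORS = {
    "equal_to":              lambda v, t: v == t,
    "greater_than":          lambda v, t: v > t,
    "less_than":             lambda v, t: v < t,
    "greater_than_or_equal": lambda v, t: v >= t,
    "less_than_or_equal":    lambda v, t: v <= t,
    "not_equal_to":          lambda v, t: v != t,
    "value_missing":         lambda v, t: v is None,
}

def should_alert(op, value, threshold=None):
    """Evaluate a single alert condition against a metric value."""
    return OPERATORS[op](value, threshold)

print(should_alert("greater_than", 750, 600))  # True
print(should_alert("value_missing", None))     # True
```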
By default, when an alert is defined or updated, it is applied to any run that happened in the last 48 hours. Unselect the "Create alerts on historical runs for last 2 days" checkbox if you want an alert definition to apply to future runs only.
You can view all alerts from the Alerts page.
It is possible to send alerts to Slack, email, PagerDuty, Opsgenie, and custom webhooks. This functionality is controlled through alert receivers. Slack and email receiver configuration is available in the UI under Settings -> Receivers. If you want to configure another receiver type, contact our Cloud Support Team over Slack or email.
Multiple receivers can be configured. By default, all alerts are sent to all receivers.
Email alerting prerequisites
You must whitelist the [email protected] email address. Email alerts from Databand will be sent from this address.
We also support routing alerts to specific receivers based on the alert's properties, and the same alert can be sent to more than one receiver.
Alert properties that can be used to route alerts are:
- Pipeline Name - For example, route all pipelines matching mysql_loader to one channel while routing all pipelines matching s3_loader to another channel. Regular expressions can be used in the pipeline name.
- Alert Severity and Alert Definition ID - For example, route High and Critical severity alerts to Slack and PagerDuty while routing Low and Medium severity alerts to Slack only.
- Source and Project Names - For example, route all alerts from your production Airflow instance to PagerDuty.
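To make the routing behavior concrete, here is a hypothetical sketch of rule evaluation (the rule structure and receiver names are invented for illustration; the real configuration is done by the Databand team, not through code):

```python
import re

# Hypothetical routing rules matching the examples above.
RULES = [
    {"pipeline": r"mysql_loader", "receivers": ["#db-alerts"]},
    {"pipeline": r"s3_loader",    "receivers": ["#storage-alerts"]},
    {"severity": {"HIGH", "CRITICAL"}, "receivers": ["pagerduty", "#alerts"]},
    {"severity": {"LOW", "MEDIUM"},    "receivers": ["#alerts"]},
]

def route(alert):
    """Return every receiver whose rule matches the alert; a single
    alert can fan out to more than one receiver."""
    receivers = set()
    for rule in RULES:
        if "pipeline" in rule and not re.search(rule["pipeline"], alert["pipeline"]):
            continue
        if "severity" in rule and alert["severity"] not in rule["severity"]:
            continue
        receivers.update(rule["receivers"])
    return sorted(receivers)

print(route({"pipeline": "mysql_loader_daily", "severity": "HIGH"}))
# ['#alerts', '#db-alerts', 'pagerduty']
```

Note that pipeline-name rules use regular-expression matching, mirroring the regex support described above.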
Alert routing is not exposed in the UI yet. Please contact the Databand Cloud Integration Team, and they will be happy to configure this for you.
If you defined an alert on a certain task or run state and the alert was triggered, restarting that task or run automatically resolves the previous alert; a new alert is triggered only if the new state matches the alert condition.
Updated 7 months ago