
Anomaly Detection In Alerting

Anomaly detection simplifies alert configuration and management. Instead of setting fixed conditions on metrics whose values may not be known in advance or may change as the data changes, an anomaly alert tracks the historical values of the metric and is triggered when the metric takes an abnormal value compared to the past.
A typical example is an anomaly alert on run duration: when a run's duration is extremely high, the alert is triggered.

An anomaly detection alert can be defined on any numerical metric, including run and task durations.

How it Works

Anomaly detection is based on the moving average of the metric's historical values. Detection is controlled by two parameters that are set when the alert is defined:

  • lookback parameter - sets the number of runs that the algorithm takes into account
  • sensitivity parameter - defines the range around the moving average that is considered safe
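
The interaction of the two parameters can be illustrated with a minimal sketch. It is not the product's actual algorithm: the function names are hypothetical, and it assumes that sensitivity is a multiplier on the standard deviation of the lookback window, which this page does not specify.

```python
from statistics import mean, pstdev


def normal_range(history, lookback, sensitivity):
    """Return (low, high) around the moving average of the last `lookback` values.

    Assumption: the safe range is the moving average plus or minus
    `sensitivity` standard deviations of the same window.
    """
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    window = history[-lookback:]          # only the most recent runs count
    center = mean(window)                 # the moving average
    spread = sensitivity * pstdev(window)
    return center - spread, center + spread


def is_anomalous(value, history, lookback, sensitivity):
    """True when `value` falls outside the range considered safe."""
    low, high = normal_range(history, lookback, sensitivity)
    return value < low or value > high


# Hypothetical run durations in seconds.
durations = [100, 104, 98, 102, 96]
low, high = normal_range(durations, lookback=5, sensitivity=3)
print(f"safe range: {low:.1f} to {high:.1f}")
print(is_anomalous(300, durations, lookback=5, sensitivity=3))
```

Under this assumption, a larger sensitivity value widens the safe range and produces fewer alerts, while a larger lookback makes the moving average react more slowly to recent runs.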

An alert based on anomaly detection needs historical data to run. The alert is not fired until the number of historical runs is greater than the lookback parameter.
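
The warm-up behavior can be sketched as follows. This is an illustration only: `evaluate_runs` and the status labels are hypothetical, and the range formula (moving average plus or minus sensitivity times the window's standard deviation) is an assumption rather than the documented algorithm.

```python
from statistics import mean, pstdev


def evaluate_runs(values, lookback, sensitivity):
    """Return one status per run: 'warming_up', 'ok', or 'alert'."""
    history = []
    statuses = []
    for value in values:
        if len(history) > lookback:
            # Enough history: compare against the range from recent runs.
            window = history[-lookback:]
            center = mean(window)
            spread = sensitivity * pstdev(window)  # assumed range formula
            inside = center - spread <= value <= center + spread
            statuses.append("ok" if inside else "alert")
        else:
            # Not enough historical runs yet, so no alert can fire.
            statuses.append("warming_up")
        history.append(value)  # every run becomes history for the next one
    return statuses


# Hypothetical run durations; the last run is far above the rest.
runs = [100, 104, 98, 102, 96, 100, 300]
print(evaluate_runs(runs, lookback=3, sensitivity=3))
```

With a lookback of 3, the first four runs are never evaluated, because each of them has at most three historical runs behind it.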

When an anomaly alert is triggered, its description contains the range that was considered normal at the time the alert fired. This range changes from run to run as the alert definition adapts to each new run's data.