Every organization's data is different. You can define custom metrics for your datasets through Databand's API, in addition to or instead of the automatically extracted metadata. Examples of user-defined metrics include specialized outlier checks and data completeness scores.
Databand helps users collect metrics from datasets and pipelines. Collected metrics can be analyzed in the UI or used in alert definitions.
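As a rough illustration of the two example metrics mentioned above, the sketch below computes a completeness score and an outlier count for a column of values. The function names, metric keys, and the `report` callback are hypothetical and not part of Databand's API. In a tracked task you would presumably pass the SDK's metric-logging function (assumed here to be `log_metric` from the `dbnd` package) as the reporter. A plain dictionary stands in for it so the example runs on its own.

```python
import math
from typing import Callable, Dict, List, Optional


def is_present(value: Optional[float]) -> bool:
    """A value counts as present when it is neither None nor NaN."""
    return value is not None and not math.isnan(value)


def completeness_score(values: List[Optional[float]]) -> float:
    """Fraction of entries that are present; 0.0 for an empty column."""
    if not values:
        return 0.0
    return sum(1 for v in values if is_present(v)) / len(values)


def count_outliers(values: List[Optional[float]], low: float, high: float) -> int:
    """Number of present values that fall outside the [low, high] range."""
    return sum(1 for v in values if is_present(v) and not (low <= v <= high))


def report_custom_metrics(
    values: List[Optional[float]],
    low: float,
    high: float,
    report: Callable[[str, float], None],
) -> None:
    """Send both custom metrics to the given reporter (hypothetical helper)."""
    report("order_amount.completeness", completeness_score(values))
    report("order_amount.outliers", count_outliers(values, low, high))


# Stand-in reporter: collect metrics in a dict instead of sending them anywhere.
# Assumption: with the Databand SDK you would pass its logging function instead.
collected: Dict[str, float] = {}
order_amounts = [12.5, None, 40.0, 9999.0, 18.0]
report_custom_metrics(order_amounts, low=0.0, high=1000.0, report=collected.__setitem__)
print(collected)
```

Keeping the metric computation separate from the reporting call makes the checks easy to unit test without a tracking backend.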
You can view the following metrics in the Databand UI:
- User - User-defined metrics specified in your task definition code.
- System - Metrics coming from your pipeline executions, such as duration and resource utilization, recorded automatically.
- Histograms - Data profiling metrics recorded automatically using the Databand SDK or Deequ.
- Spark - Spark system metrics. For more information, see Tracking Spark Applications.
You can access metrics on the Dashboard, the Metric Page, or the Run Details Page.
The Metric Page is where metrics from any pipeline or dataset can be searched and viewed. For relevant metrics, an alert can be defined directly from this screen.
Metrics are grouped by pipeline, and the value from the last run is shown. For numerical metrics, a trend over time is displayed automatically.
All system metrics except Run Duration and Run State are hidden by default on the Metric Page. To change this, click Customize Table and toggle "Show all metrics in the table".
The Dashboard shows the business-level status of a system, project, or source. Some metrics, e.g. total ingested volume, are indicative of this status and can be added to the dashboard to be viewed alongside pipeline stats.
Anomalies in metric values can be highlighted on a dashboard. For each metric, you can also show an aggregation, e.g. sum, count, max, or min, over a defined time period.
You can select up to 9 metrics to add to the dashboard. All metric values from the time span defined by the dashboard filter are shown.
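To make the dashboard aggregations concrete, here is a minimal sketch of computing sum, count, max, and min for one metric over a time window. It illustrates the arithmetic only and is not Databand's implementation. The `aggregate_metric` function, the sample data, and the half-open window convention are assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# One recorded metric value: (timestamp of the run, metric value).
MetricPoint = Tuple[datetime, float]


def aggregate_metric(
    points: List[MetricPoint], start: datetime, end: datetime
) -> Dict[str, float]:
    """Aggregate values whose timestamp falls in the half-open window [start, end)."""
    window = [value for ts, value in points if start <= ts < end]
    if not window:
        # Nothing recorded in the window: only the count is meaningful.
        return {"count": 0}
    return {
        "sum": sum(window),
        "count": len(window),
        "max": max(window),
        "min": min(window),
    }


# Hypothetical "total ingested volume" metric, one value per daily run.
base = datetime(2024, 1, 1)
ingested_rows = [
    (base + timedelta(days=day), value)
    for day, value in enumerate([100.0, 250.0, 175.0, 300.0])
]

# The window plays the role of the dashboard time filter: days 1 through 3.
summary = aggregate_metric(
    ingested_rows,
    start=base + timedelta(days=1),
    end=base + timedelta(days=4),
)
print(summary)
```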
While reviewing or debugging a run, you can inspect an individual metric value on the Metric Tab of the Run Details Page. This tab shows metric values for the specified run.
From this view, a user can create an alert on a metric, use a trend view to review and compare metrics over time, or favorite a metric to add it to the run table.
After a metric is tracked by Databand, you can compare it across the run history in the context of the relevant pipeline. Databand shows all favorited metrics in the context of the pipeline and run, making it easier to correlate metadata and trace the root cause of issues. For example, you can quickly identify how data volume correlates with run duration, as shown in the screenshot below.
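If you want to quantify such a relationship outside the UI, a Pearson correlation coefficient is one simple option. The sketch below is a generic calculation on made-up numbers, not something Databand is documented to compute. The function name and the sample values for rows processed and run duration are hypothetical.

```python
import math
from typing import Sequence


def pearson_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(var_x * var_y)


# Made-up values for five runs of the same pipeline.
rows_processed = [1000, 2000, 3000, 4000, 5000]
run_duration_sec = [62, 118, 185, 240, 301]

r = pearson_correlation(rows_processed, run_duration_sec)
print(f"correlation between data volume and run duration: {r:.3f}")
```

A coefficient close to 1 suggests that longer runs are explained largely by larger inputs, while a low value points to another cause worth investigating.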