DBND tracking allows you to track pipelines and associated data, both holistically and atomically. With seamless integrations, it is easy to track and create alerts for metadata, pipeline performance, query resource usage, and custom metrics.
After installing DBND, enable metadata tracking by exporting the environment variable:
Once tracking is enabled, you can use functions like log_dataframe to track Pandas or Spark dataframes and log_metric to track custom metrics or key performance indicators in your various tasks. With these tracking methods, histograms, statistics, previews, and more can be observed directly in your CLI or, if you are using Apache Airflow, in your
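As a minimal sketch of how log_dataframe and log_metric might be called from inside a task: the function name clean_orders, the "amount" column, and the metric keys are hypothetical and chosen only for illustration. The try/except fallback is not part of DBND; it is there only so the sketch runs where dbnd is not installed.

```python
try:
    from dbnd import log_dataframe, log_metric
except ImportError:
    # Stand-ins (not part of DBND) so the sketch still runs without dbnd.
    def log_dataframe(key, value, **kwargs):
        print(f"[dataframe] {key}: {len(value)} rows")

    def log_metric(key, value):
        print(f"[metric] {key} = {value}")


def clean_orders(raw):
    """Drop orders without an amount and report what happened.

    `raw` is assumed to be a pandas DataFrame with an "amount" column
    (a hypothetical schema, used only for illustration).
    """
    cleaned = raw.dropna(subset=["amount"])

    # Report the resulting frame; with_histograms requests value distributions.
    log_dataframe("cleaned_orders", cleaned, with_histograms=True)

    # Report custom metrics as plain key/value pairs.
    log_metric("rows_dropped", len(raw) - len(cleaned))
    log_metric("rows_kept", len(cleaned))
    return cleaned
```

Calling clean_orders on a three-row frame where one amount is missing would keep two rows and report rows_dropped as 1.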
Tracking plugins extend the native tracking features of orchestrators such as Airflow and Azkaban, data lake providers such as Snowflake and Redshift, and more. For a full list of plugins, visit Installing DBND Plugins.
While these logging methods provide a more atomic approach to tracking pipelines, metadata, and data, you can also integrate tracking holistically to minimize code overhead.
DBND also provides methods for tracking tasks and functions in detail, such as track_functions and track_module_functions. By using these methods, you can track the inputs and outputs of your functions, modules, and scripts.
For example, in a pipeline with three tasks (Extract, Transform, and Load), you can use track_functions to select the specific functions you wish to track.
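The three-task example could be sketched roughly as follows. The task bodies and sample data are hypothetical, and the try/except fallback is not part of DBND; it only keeps the sketch runnable where dbnd is not installed. Here only transform and load are selected for tracking, while extract is left alone.

```python
try:
    from dbnd import track_functions
except ImportError:
    # Stand-in (not part of DBND) so the sketch still runs without dbnd.
    def track_functions(*functions):
        pass


def extract():
    # Hypothetical source data; a real task would read from a file or database.
    return [
        {"id": 1, "amount": 10.0},
        {"id": 2, "amount": None},
        {"id": 3, "amount": 5.5},
    ]


def transform(rows):
    # Keep rows that have an amount and add the amount in cents.
    return [
        {**row, "amount_cents": round(row["amount"] * 100)}
        for row in rows
        if row["amount"] is not None
    ]


def load(rows):
    # Stand-in for writing to a destination; returns the number of rows "loaded".
    return len(rows)


# Select only transform and load for tracking; extract is left untracked.
track_functions(transform, load)


def run_pipeline():
    return load(transform(extract()))
```

Passing functions explicitly keeps the tracked surface small, which suits cases where only a few steps matter. For tracking everything defined in a module, track_module_functions is the method named above; its usage is not shown in this sketch.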
Pipeline metadata covers the range of metadata used for tracking and monitoring that is specific to active data processes. In other words, it is any system, application, graph process, or data-level information that is broadly relevant to the normal functioning of your data pipelines. This includes:
- Job Runtime Information
- Application Logs
- Task Function Statuses
- Performance Metrics
- Data Quality Metrics
- Intermediate Results
- Data Lineage
- System Resources
DBND tracks this information and places it in the context of your pipeline and task definitions. For any given pipeline, you can see where issues are coming from, and from a system-wide perspective, you can see which pipelines are the source of problems.