
Getting Started with DBND Tracking

Use DBND to track a wide range of pipeline metadata, without ever changing the way you run your pipelines.

DBND tracking allows you to track pipelines and associated data, both holistically and atomically. With seamless integrations, it is easy to track and create alerts for metadata, pipeline performance, query resource usage, and custom metrics.

This Python Quickstart walks you through some basic Databand capabilities, using a Python script as an example.

Pipeline metadata is the metadata used to track and monitor active data processes. In other words, it includes any system, application, graph process, or data-level information that is relevant to the normal functioning of your data pipelines. This includes:

  • Job Runtime Information
  • Application Logs
  • Task Function Statuses
  • Data Quality Metrics
  • Input/Output
  • Data Lineage

DBND tracks this information and places it in the context of your pipeline and task definitions. For any given pipeline, you can see where issues are coming from; from a system-wide perspective, you can see which pipelines are the source of problems.

Methods of Use

Once tracking is enabled, you can use functions like log_dataset_op to track Pandas or Spark dataframes, and log_metric to track custom metrics or key performance indicators in your tasks. With these tracking methods, you can observe histograms, statistics, previews, and more directly in your CLI or, if you are using Apache Airflow, in your Airflow logs.
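As a minimal sketch of custom-metric tracking, the snippet below computes a few data quality metrics and reports each one through a log_metric(key, value) call. The function track_order_metrics, the metric names, and the sample orders are hypothetical and exist only for illustration. The sketch assumes log_metric can be imported from the dbnd package and accepts a key and a value; if the package is not installed, it falls back to printing so the example still runs.

```python
from statistics import mean


def track_order_metrics(orders, log_metric):
    """Compute a few data quality metrics and report each one.

    `log_metric` is any callable taking (key, value). The function,
    metric names, and record layout here are hypothetical.
    """
    # Keep only the rows that actually carry an amount.
    amounts = [o["amount"] for o in orders if o["amount"] is not None]
    metrics = {
        "row_count": len(orders),
        "null_amounts": len(orders) - len(amounts),
        "mean_amount": mean(amounts) if amounts else 0.0,
    }
    # Report every metric through the supplied logging callable.
    for key, value in metrics.items():
        log_metric(key, value)
    return metrics


try:
    # Assumption: the dbnd package is installed and exposes log_metric.
    from dbnd import log_metric
except ImportError:
    # Fallback so the sketch runs without DBND: print instead of tracking.
    def log_metric(key, value):
        print(f"metric {key}={value}")

# Hypothetical sample data standing in for a real task input.
orders = [{"amount": 10.0}, {"amount": None}, {"amount": 30.0}]
track_order_metrics(orders, log_metric)
```

Passing the logging function in as an argument keeps the metric logic independent of the tracker, which makes it easy to test. log_dataset_op plays the analogous role for dataframes; its arguments are not shown here, so check the DBND reference before using it.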

Tracking plugins extend the native tracking features of orchestrators such as Apache Airflow and Azkaban, data lake providers such as Snowflake and Redshift, and more. For a full list of plugins, visit Installing DBND Plugins.

While these logging methods provide a more atomic approach to tracking pipelines, metadata, and data, you can also integrate tracking holistically to minimize code overhead.

Current Integrations

The current list of integrations available for tracking with Databand includes:

  • Supported Languages
  • Tracking Databases
  • Tracking Other Orchestrators
  • Tracking Other Trackers
