
Tracking Python

Using DBND to track metadata from Python tasks.

If you are running Python, Databand can provide visibility into your data operations, code errors, metrics, and logging information, in the context of your broader pipelines or orchestration system.

These are the available tracking options for Python:

Tracking Context

To enable tracking, call your code within the dbnd_tracking() context.

from dbnd import dbnd_tracking

if __name__ == "__main__":
    with dbnd_tracking():
        # Code executed inside this block is tracked by Databand.
        pass

Any Python code executed inside the dbnd_tracking() context will be tracked by Databand.

dbnd_tracking(name="<pipeline_name>") accepts a name parameter that will be used to identify the pipeline in the Pipelines screen of your Databand application.
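As a minimal sketch of a named tracking context wrapping real work, the script below runs a small function inside dbnd_tracking(name=...). The function count_valid_rows and the pipeline name daily_row_check are hypothetical and only serve as illustration. The except branch defines a no-op stand-in so the sketch can still run where dbnd is not installed; it tracks nothing and is not part of DBND.

```python
from contextlib import contextmanager

try:
    from dbnd import dbnd_tracking
except ImportError:
    # Assumption: dbnd may be missing in the environment running this sketch.
    # This stand-in tracks nothing; it only lets the example execute.
    @contextmanager
    def dbnd_tracking(name=None, conf=None):
        yield


def count_valid_rows(rows):
    """Count rows that carry a truthy 'id' field (hypothetical example logic)."""
    return sum(1 for row in rows if row.get("id"))


if __name__ == "__main__":
    # The name identifies this pipeline in the Pipelines screen.
    with dbnd_tracking(name="daily_row_check"):
        print(count_valid_rows([{"id": 1}, {"id": None}, {"id": 7}]))
```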

If you are already using Tracking Airflow DAGs, you don't need to enable tracking for Python code executed as part of an Airflow operator. This is done automatically.

Make sure DBND is connected to your Databand service (see Connecting DBND to Databand (Access Tokens)).

Tracking Functions with Decorators

For better visibility, you can also annotate your functions with a decorator.
The example below uses a Python function, though decorators for Java and Scala functions are supported as well.

from dbnd import task
import pandas as pd

# define a function with a decorator

@task
def user_function(pandas_df: pd.DataFrame, counter: int, random: int):
    return "OK"

For certain objects passed to your functions such as Pandas DataFrames and Spark DataFrames, DBND automatically collects data set previews and schema info. This makes it easier to track data lineage and report on data quality issues.

You can also enable tracking implicitly by setting the environment variable DBND__TRACKING to True. The first @task call then starts tracking your script, with or without the dbnd_tracking() context applied.

export DBND__TRACKING=True
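If you prefer not to export the variable for the whole shell session, one option is to set it only for a single child process. The sketch below uses the standard library to launch a Python process with DBND__TRACKING set to True. The helper name run_with_tracking is hypothetical, and the child command here only echoes the variable to show that it arrives; in practice you would pass the path of your own script.

```python
import os
import subprocess
import sys


def run_with_tracking(script_args):
    """Run a Python child process with DBND__TRACKING=True in its environment."""
    # Copy the current environment and add the variable for the child only.
    env = dict(os.environ, DBND__TRACKING="True")
    return subprocess.run(
        [sys.executable, *script_args],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )


# Stand-in for a real script: print the variable as seen by the child process.
result = run_with_tracking(["-c", "import os; print(os.environ['DBND__TRACKING'])"])
child_value = result.stdout.strip()
print(child_value)
```

The parent process environment is left unchanged, so other scripts started from the same shell are not affected.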

Tracking Specific Functions Without Changing Module Code

Suppose you want to track one or more functions from a module. Instead of decorating each function with @task, you can use the track_functions function.

Review the following example, where module1 contains f1 and f2 functions:

from module1 import f1, f2
from dbnd import track_functions

track_functions(f1, f2)

The track_functions function takes functions as arguments and automatically decorates them, so you can track any function without changing your existing function code or manually adding decorators.
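To make the idea of decorating functions after the fact more concrete, here is a conceptual sketch in plain Python. It is not DBND's implementation: record_call, wrap_functions, and the calls list are hypothetical names, and the recording decorator merely stands in for @task. The sketch assumes that rebinding the name in the defining module is enough for callers inside that module to pick up the wrapped version.

```python
import functools
import sys
import types

calls = []  # hypothetical in-memory log standing in for a tracking backend


def record_call(fn):
    """Wrap fn so each call is recorded before the original function runs."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        calls.append(fn.__name__)
        return fn(*args, **kwargs)
    return wrapper


def wrap_functions(*functions):
    """Rebind each function in its defining module to a wrapped version."""
    for fn in functions:
        module = sys.modules[fn.__module__]
        setattr(module, fn.__name__, record_call(fn))


# Build a throwaway module1 so the sketch is self-contained.
module1 = types.ModuleType("module1")
exec(
    "def f1(x):\n    return x + 1\n\n"
    "def f2(x):\n    return f1(x) * 2\n",
    module1.__dict__,
)
sys.modules["module1"] = module1

wrap_functions(module1.f1, module1.f2)
result = module1.f2(3)  # f2 calls f1 through the module, so both calls are recorded
```

The source of f1 and f2 is never edited; only the names in module1 are rebound, which is the property the paragraph above describes.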

Tracking Modules

For an easier and faster approach, you can use the track_module_functions function to track all functions inside a given module. The calling script from the above example (module2.py) would then look like this:

import module1
from dbnd import track_module_functions

track_module_functions(module1)

To track all functions from multiple modules, there is also track_modules, which takes modules as arguments and tracks all functions contained within them:

from dbnd import track_modules

import module1
import module2

track_modules(module1, module2)
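Module-level tracking can be pictured as discovering the functions of a module and wrapping each one. The following is a conceptual sketch using the standard inspect module, not DBND's implementation. The names traced, wrap_module_functions, and wrap_modules are hypothetical, and skipping functions that were merely imported into the module is an assumption of this sketch rather than documented DBND behavior.

```python
import functools
import inspect
import os.path
import types


def traced(fn, log):
    """Return a wrapper that appends fn's name to log on every call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        log.append(fn.__name__)
        return fn(*args, **kwargs)
    return wrapper


def wrap_module_functions(module, log):
    """Wrap every function defined in module, skipping imported ones."""
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        if obj.__module__ == module.__name__:
            setattr(module, name, traced(obj, log))


def wrap_modules(*modules, log):
    """Apply wrap_module_functions to several modules."""
    for module in modules:
        wrap_module_functions(module, log)


# Two throwaway modules so the sketch is self-contained.
mod_a = types.ModuleType("mod_a")
exec("def load():\n    return [3, 1, 2]\n", mod_a.__dict__)
mod_b = types.ModuleType("mod_b")
exec(
    "from os.path import join\n\n"
    "def clean(values):\n    return sorted(values)\n",
    mod_b.__dict__,
)

log = []
wrap_modules(mod_a, mod_b, log=log)
result = mod_b.clean(mod_a.load())
```

Here mod_b.join is left untouched because it was defined in another module, while load and clean are both recorded.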

Implicit Configuration of Tracking Context

You can pass SDK Configuration parameters to the tracking context through the conf parameter of the dbnd_tracking function.

from dbnd import dbnd_tracking

with dbnd_tracking(
    conf={
        "core": {
            "databand_url": "<databand_url>",
            "databand_access_token": "<access_token>",
        }
    }
):
    pass
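To keep the access token out of source code, you might assemble the conf dictionary from environment variables before passing it to dbnd_tracking. The sketch below shows one way to do that. The helper build_tracking_conf and the variable names MY_DATABAND_URL and MY_DATABAND_TOKEN are hypothetical; only the nested "core" keys come from the example above.

```python
import os


def build_tracking_conf(environ=os.environ):
    """Build the nested conf dict for dbnd_tracking from environment values."""
    # MY_DATABAND_URL and MY_DATABAND_TOKEN are hypothetical variable names.
    url = environ.get("MY_DATABAND_URL")
    token = environ.get("MY_DATABAND_TOKEN")
    if not url or not token:
        raise ValueError("MY_DATABAND_URL and MY_DATABAND_TOKEN must both be set")
    return {
        "core": {
            "databand_url": url,
            "databand_access_token": token,
        }
    }


# A plain dict stands in for the real environment in this demonstration.
example_conf = build_tracking_conf(
    {"MY_DATABAND_URL": "<databand_url>", "MY_DATABAND_TOKEN": "<access_token>"}
)
print(example_conf)
```

The resulting dictionary can then be passed as dbnd_tracking(conf=example_conf). Failing early when a value is missing avoids starting a run with an incomplete configuration.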
