Functions

Tracking input/output, data lineage, and schema metadata from your pipelines using DBND.

Tracking Functions with Decorators

To track a function, annotate it with the @task decorator.
The example below uses a Python function; decorators for Java and Scala functions are supported as well.

# module1.py
import pandas as pd
from dbnd import task

# define a function with a decorator
@task
def user_function(pandas_df: pd.DataFrame, counter: int, random: int):
    return "OK"

For certain objects passed to your functions such as Pandas DataFrames and Spark DataFrames, DBND automatically collects data set previews and schema info. This makes it easier to track data lineage and report on data quality issues.
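To make the idea concrete, here is a minimal standard-library sketch of what a tracking decorator does conceptually. It is not DBND's implementation: track_sketch and TRACKED_CALLS are hypothetical names, and the sketch records only argument and result types, whereas a real tracker would also capture richer metadata such as DataFrame previews and schemas.

```python
import functools
import inspect

# Hypothetical in-memory log; DBND reports to its own tracking backend instead.
TRACKED_CALLS = []


def track_sketch(func):
    """Record the name and type of every input, plus the type of the result."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Map positional and keyword arguments onto parameter names.
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        result = func(*args, **kwargs)
        TRACKED_CALLS.append(
            {
                "function": func.__name__,
                "inputs": {
                    name: type(value).__name__
                    for name, value in bound.arguments.items()
                },
                "output": type(result).__name__,
            }
        )
        return result

    return wrapper


@track_sketch
def user_function(rows: list, counter: int, random: int = 0):
    return "OK"


user_function([{"a": 1}], counter=3)
```

After the call, TRACKED_CALLS holds one record describing the function name, its three inputs, and the type of its return value.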

Tracking Named Functions

Suppose you want to track one or more functions from a module. Instead of decorating each function with @task, you can call the track_functions function.

Consider the following example, where module1 contains the functions f1, f2, and f3:

# module1.py

def f1():
    pass

def f2():
    pass

def f3():
    pass

In module2, we have the following functions:

# module2.py

from dbnd import track_functions
from module1 import f1, f2, f3

track_functions(f1, f2, f3)

def f4():
    f1()
    f2()
    f3()

The track_functions function takes functions as arguments and decorates them automatically, so you can track any function without changing its code or adding decorators by hand.
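One way a helper can decorate functions that other modules have already imported is to rebind every module-level name that points at the original function. The sketch below illustrates that general technique with the standard library only. It should not be read as DBND's actual code: track_functions_sketch, wrap, CALLS, and the two demo modules are hypothetical stand-ins.

```python
import functools
import sys
import types

CALLS = []  # hypothetical in-memory call log


def wrap(func):
    """Return a wrapper that logs the call before delegating to func."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        CALLS.append(func.__name__)
        return func(*args, **kwargs)

    return wrapper


def track_functions_sketch(*functions):
    """Rebind every module-level name that refers to one of the functions."""
    for func in functions:
        decorated = wrap(func)
        for module in list(sys.modules.values()):
            if not isinstance(module, types.ModuleType):
                continue
            for name, value in list(vars(module).items()):
                if value is func:
                    setattr(module, name, decorated)


# Two synthetic modules stand in for module1.py and module2.py.
module1 = types.ModuleType("module1_demo")
exec("def f1():\n    return 'f1 ran'\n", module1.__dict__)
sys.modules["module1_demo"] = module1

module2 = types.ModuleType("module2_demo")
sys.modules["module2_demo"] = module2
exec("from module1_demo import f1\ndef f4():\n    return f1()\n", module2.__dict__)

track_functions_sketch(module1.f1)
result = module2.f4()  # f4 now reaches the decorated f1
```

Because the name f1 is rebound in both demo modules, the call made inside f4 is logged even though neither module's source was edited.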

Tracking Modules

To track every function in a module at once, pass the module to the track_module_functions function. With this approach, module2.py from the example above looks like this:

# module2.py

from dbnd import track_module_functions
from module1 import f1, f2, f3
import module1

track_module_functions(module1)

def f4():
    f1()
    f2()
    f3()

To track all functions from multiple modules, use track_modules, which takes modules as arguments and tracks every function they contain:

# module3.py

from dbnd import track_modules
import module1
import module2

track_modules(module1, module2)

def f5():
    module2.f4()

In this example, f1, f2, and f3 are tracked even though this module never calls them directly.
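Module-level tracking can be pictured as a loop over each module's own functions. The following standard-library sketch shows that idea. All names in it (track_module_functions_sketch, track_modules_sketch, wrap, CALLS, the demo modules) are hypothetical, and the rule of wrapping only functions defined in the module itself is an assumption of this sketch rather than documented DBND behavior.

```python
import functools
import inspect
import types

CALLS = []  # hypothetical in-memory call log


def wrap(func):
    """Return a wrapper that logs the call before delegating to func."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        CALLS.append(func.__name__)
        return func(*args, **kwargs)

    return wrapper


def track_module_functions_sketch(module):
    """Wrap every function that is defined in the given module."""
    for name, value in inspect.getmembers(module, inspect.isfunction):
        if value.__module__ == module.__name__:
            setattr(module, name, wrap(value))


def track_modules_sketch(*modules):
    """Apply module-level tracking to each module in turn."""
    for module in modules:
        track_module_functions_sketch(module)


# Synthetic stand-ins for module1.py and module2.py.
module1 = types.ModuleType("module1_demo")
exec("def f1():\n    return 1\n\ndef f2():\n    return 2\n", module1.__dict__)

module2 = types.ModuleType("module2_demo")
module2.module1 = module1  # stands in for `import module1`
exec("def f4():\n    return module1.f1() + module1.f2()\n", module2.__dict__)

track_modules_sketch(module1, module2)
total = module2.f4()
```

Only f4 is called here, yet the log also contains f1 and f2, because they were wrapped when their module was tracked.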

