
Python Scripts

How to track standalone data workflows (Python or Spark scripts).

In addition to tracking scheduled pipelines, you can track standalone data workflows, such as Python or Spark scripts. This is particularly useful during research or debugging, when your code is not yet running on a production schedule.

To enable tracking, implement the context manager in your script(s).

from dbnd import dbnd_tracking

if __name__ == "__main__":
    with dbnd_tracking(name="<pipeline_name>"):
        ...  # <user code here>

dbnd_tracking() instantiates the Databand context during execution. Any processes executed inside the dbnd_tracking() context will be displayed in Databand for monitoring. Call your core control flow logic within the dbnd_tracking() context.

dbnd_tracking() accepts a name parameter that will be used to identify the pipeline in the Pipelines screen of your Databand application.
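To illustrate why the core control flow should be called inside the context, here is a minimal sketch of the context-manager pattern using only the standard library. It is not Databand's implementation: `fake_tracking`, `report`, and the run dictionary are hypothetical stand-ins, assumed here only to show that work done inside the `with` block is recorded while work done outside it is not.

```python
from contextlib import contextmanager

# The run currently being tracked, if any (hypothetical stand-in state).
_active_run = None


@contextmanager
def fake_tracking(name):
    """Hypothetical stand-in for a tracking context; not part of dbnd."""
    global _active_run
    run = {"name": name, "events": [], "state": "running"}
    _active_run = run
    try:
        yield run
        run["state"] = "success"
    except Exception:
        # An error raised inside the with-block marks the run as failed.
        run["state"] = "failed"
        raise
    finally:
        # Leaving the context always stops recording.
        _active_run = None


def report(event):
    """Record an event only while a tracking context is active."""
    if _active_run is not None:
        _active_run["events"].append(event)
        return True
    return False


def main():
    with fake_tracking("demo_pipeline") as run:
        report("load data")
        report("transform data")
    report("cleanup")  # outside the context, so it is not recorded
    return run


if __name__ == "__main__":
    finished = main()
    print(finished["state"], finished["events"])
```

In this sketch, the `cleanup` step is silently dropped because it runs after the context has exited, which is the behavior to keep in mind when deciding where to place the `with` block in a real script.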

Make sure DBND is connected to your Databand service (see Connecting DBND to Databand (Access Tokens)).

The following example shows Databand decorators and logging APIs within a Python script. When this code runs, metadata and function input/output will be reported to Databand.

You can also enable tracking implicitly: when the environment variable DBND__TRACKING is set to True, the first @task that runs starts tracking your script.

export DBND__TRACKING=True

import logging
import os

from typing import Tuple

from dbnd import task, dbnd_tracking

@task
def say_hello(text="sdfsd"):
    greeting = "Hey, {}!".format(text)
    return greeting

@task
def join_greeting(base_greeting, extra_name):
    return "{} and {}".format(base_greeting, extra_name)

@task
def say_hello_pipe(users_num=3):
    # Chain greetings: each call extends the previous result.
    v = say_hello("some_user")
    for i in range(users_num):
        v = join_greeting(v, "user {}".format(i))
    return v

@task
def say_hello_to_everybody(users_num=3) -> Tuple[str, str]:
    # v keeps only the greeting for the last user.
    v = ""
    for i in range(users_num):
        v = say_hello("user {}".format(i))

    hello_pipe = say_hello_pipe()
    return v, hello_pipe

if __name__ == "__main__":
    with dbnd_tracking(
        name="<pipeline_name>",
        conf={"core": {"databand_url": "<databand_url>"}},
    ):
        say_hello_to_everybody()

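When relying on the DBND__TRACKING environment variable rather than an explicit context, it can help to check that the variable is actually visible to the Python process before the script runs. The following is a small pre-flight sketch using only the standard library. The helper `tracking_flag_enabled` and the set of accepted spellings are assumptions made for illustration; the values Databand itself accepts may differ.

```python
import os

# Assumption: these spellings count as "enabled" for this check only;
# the values Databand actually accepts may differ.
TRUTHY = {"true", "1", "yes"}


def tracking_flag_enabled(environ=None):
    """Return True if DBND__TRACKING looks enabled in the given mapping."""
    env = os.environ if environ is None else environ
    return env.get("DBND__TRACKING", "").strip().lower() in TRUTHY


if __name__ == "__main__":
    if not tracking_flag_enabled():
        print("DBND__TRACKING is not set; @task calls may run untracked.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without changing the real environment.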