Python Scripts

How to track standalone data workflows (Python or Spark scripts).

In addition to tracking scheduled pipelines, you can track standalone data workflows, such as Python or Spark scripts. This is particularly useful during research or debugging, when your code is not yet running on a production schedule.

To enable tracking, wrap the code in your script(s) with the dbnd_tracking() context manager.

from dbnd import dbnd_tracking

if __name__ == "__main__":
    with dbnd_tracking():
        # <user code here>
        pass
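
During research it can help to keep a script runnable even on machines where dbnd is not installed. The following is a minimal sketch of that idea, not part of the Databand API: tracking_context and run_analysis are hypothetical names, and the fallback to an untracked run is an assumption of this example.

```python
from contextlib import nullcontext

try:
    from dbnd import dbnd_tracking
except ImportError:
    # Assumption of this sketch: if dbnd is missing, run the script untracked.
    dbnd_tracking = None


def tracking_context(enabled=True):
    """Return dbnd_tracking() when available and enabled, else a no-op context."""
    if enabled and dbnd_tracking is not None:
        return dbnd_tracking()
    return nullcontext()


def run_analysis(values):
    """Hypothetical user code: compute the mean of a list of numbers."""
    return sum(values) / len(values)


if __name__ == "__main__":
    with tracking_context():
        print(run_analysis([1, 2, 3]))
```

Passing enabled=False skips tracking entirely, which can be convenient for quick local experiments.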

Configure the Databand URL either by setting the DBND__CORE__DATABAND_URL environment variable or by adding it to the project.cfg configuration file.

In addition to the Databand URL, you must configure an access token to report data to a Databand cloud environment. Set it either with the DBND__CORE__DATABAND_ACCESS_TOKEN environment variable or in the project.cfg configuration file.
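
The two configuration routes can be related with a small sketch. It assumes that an environment variable named DBND__<SECTION>__<KEY> corresponds to the entry <key> in the [<section>] block of project.cfg; check this against the configuration documentation before relying on it. The helper names and the example URL and token values are hypothetical.

```python
import configparser
import io
from typing import Dict, Tuple


def env_var_to_cfg_entry(env_name: str) -> Tuple[str, str]:
    """Split DBND__<SECTION>__<KEY> into a lowercase (section, key) pair."""
    parts = env_name.split("__", 2)
    if len(parts) != 3 or parts[0] != "DBND":
        raise ValueError("expected DBND__<SECTION>__<KEY>, got %r" % env_name)
    return parts[1].lower(), parts[2].lower()


def render_project_cfg(settings: Dict[str, str]) -> str:
    """Render env-var style settings as INI text (assumed project.cfg layout)."""
    # Interpolation is disabled so values containing '%' are stored as-is.
    parser = configparser.ConfigParser(interpolation=None)
    for env_name, value in settings.items():
        section, key = env_var_to_cfg_entry(env_name)
        if not parser.has_section(section):
            parser.add_section(section)
        parser.set(section, key, value)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values; substitute your own URL and token.
    print(render_project_cfg({
        "DBND__CORE__DATABAND_URL": "https://databand.example.com",
        "DBND__CORE__DATABAND_ACCESS_TOKEN": "example-token",
    }))
```

Under the stated assumption, both settings end up in a single [core] section, so either route configures the same two values.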


Databand Configuration Files

For more details on configuration files, visit Setting Up Configuration with Files.

The following example shows Databand decorators and logging APIs within a Python script. When this code runs, metadata and function input/output will be reported to Databand.

import logging
import os

from typing import Tuple

from dbnd import task, dbnd_tracking

# Each @task-decorated function reports its inputs and outputs to Databand.
@task
def say_hello(text="sdfsd"):
    greeting = "Hey, {}!".format(text)
    return greeting

@task
def join_greeting(base_greeting, extra_name):
    return "{} and {}".format(base_greeting, extra_name)

@task
def say_hello_pipe(users_num=3):
    v = say_hello("some_user")
    for i in range(users_num):
        v = join_greeting(v, "user {}".format(i))
    return v

@task
def say_hello_to_everybody(users_num=3) -> Tuple[str, str]:
    v = ""
    for i in range(users_num):
        v = say_hello("user {}".format(i))

    hello_pipe = say_hello_pipe()
    return v, hello_pipe

if __name__ == "__main__":
    # Replace the placeholders with your Databand URL and access token.
    os.environ["DBND__CORE__DATABAND_URL"] = "<url>"
    os.environ["DBND__CORE__DATABAND_ACCESS_TOKEN"] = "<access_token>"
    with dbnd_tracking():
        say_hello_to_everybody()
