Reusing Spark Context in the Same Process

Debugging inline Spark tasks locally and configuring Databand to run in an existing Spark context.

You can debug inline Spark tasks locally by providing an existing Spark context and configuring DBND to run inside it.

Enable Inplace Spark Context

Using a configuration file:

```ini
enable_spark_context_inplace = True
```
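In context, this flag belongs under the local Spark engine section of the DBND configuration file. The `[spark_local]` section name below is an assumption inferred from `SparkLocalEngineConfig`, not confirmed by this page:

```ini
# sketch of a DBND configuration file;
# the [spark_local] section name is an assumption based on SparkLocalEngineConfig
[spark_local]
enable_spark_context_inplace = True
```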

Alternatively, enable it in code with a configuration context: `config({SparkLocalEngineConfig.enable_spark_context_inplace: True})`.

Running Spark Job with Debugger

```python
from dbnd import config, parameter
from dbnd_spark import spark_task
from dbnd_spark.local.local_spark_config import SparkLocalEngineConfig
from pyspark.sql import DataFrame, SparkSession


@spark_task
def word_count_inline(text=parameter.csv[DataFrame]):
    # spark business logic goes here
    # set a breakpoint here with a debugger of your choice
    ...


# invoke the spark task this way
if __name__ == "__main__":
    # create a spark session and run the spark task inside this context
    with SparkSession.builder.getOrCreate():
        with config({SparkLocalEngineConfig.enable_spark_context_inplace: True}):
            word_count_inline(text=__file__)
```

Testing Spark Job in the Same Process

This technique helps you debug your script and can also shorten test runtime.

```python
from dbnd import config
from dbnd_spark.local.local_spark_config import SparkLocalEngineConfig
from dbnd.testing.helpers_pytest import assert_run_task


def test_spark_inline_same_context():
    from pyspark.sql import SparkSession
    from dbnd_examples.orchestration.dbnd_spark.word_count import word_count_inline

    # reuse the already-created session instead of spawning a new Spark context
    with SparkSession.builder.getOrCreate():
        with config({SparkLocalEngineConfig.enable_spark_context_inplace: True}):
            task_instance = word_count_inline.t(text=__file__)
            assert_run_task(task_instance)
```

You can wrap SparkSession creation in a pytest fixture.
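For example, a session-scoped fixture can create the SparkSession once and share it across all tests in the run. This is a minimal sketch; the fixture name and the `local[1]` master are illustrative choices, not part of the DBND API:

```python
import pytest


@pytest.fixture(scope="session")
def spark_session():
    # assumes pyspark is installed; one local session is created for the
    # whole test run and stopped when the run finishes
    from pyspark.sql import SparkSession

    session = SparkSession.builder.master("local[1]").getOrCreate()
    yield session
    session.stop()
```

Tests that accept `spark_session` as an argument then reuse the same context, which pairs naturally with `enable_spark_context_inplace`.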
