GuidesAPI ReferenceDiscussions

Reusing Spark Context in the Same Process

Debugging the inline Spark tasks locally and configuring Databand to run in such context.

You can debug inline Spark tasks locally by providing an existing context to it and configuring DBND to run in this context.

Enable Inplace Spark Context

Using configuration file:

enable_spark_context_inplace = True

or use context with config({SparkLocalEngineConfig.enable_spark_context_inplace: True})
Running Spark Job with Debugger

def word_count_inline(data=parameter.csv[spark.DataFrame]):
    # spark business logic goes here
    # set a breakpoint here with a debuger of your choice

# invoke spark task  this way
if __name__ == "__main__":
   # create spark context and run spark task inside this context
    with spark.SparkSession.builder.getOrCreate() as sc:

Testing Spark Job in the same process
This technique can help you debug your script as well as improve test runtime duration

    def test_spark_inline_same_context(self):
        from pyspark.sql import SparkSession
        from dbnd_examples.orchestration.dbnd_spark.word_count import word_count_inline

        with SparkSession.builder.getOrCreate() as sc:
            with config({SparkLocalEngineConfig.enable_spark_context_inplace: True}):
                task_instance = word_count_inline.t(text=__file__)

You can wrap SparkSession creation using pytest.fixtures.

Did this page help you?