Defaults for Engines and Nested Tasks

How to define defaults for task configuration parameters in Databand.

In some cases, defining defaults for parameters in the configuration or in the constructor is not sufficient. For example:

  • Setting a different default for a configuration parameter (e.g., main_jar in the [spark] section)
  • Setting a different default for all child tasks of a pipeline

task_config

task_config is a dictionary of arbitrary configuration parameters that is applied on top of the current configuration when the task is created and run. It can be set in code or from the command line:

my_task(task_config={SparkConfig.num_of_executors: 3})
$ dbnd run my_task --my_task.task_config="{'spark': {'num_of_executors': 3}}"
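
A common use is pinning engine settings on the task class itself, so that the task and every task it creates run with the same values. The following is a minimal sketch assuming the open-source dbnd API; the import path for SparkConfig and the task body are illustrative assumptions:

from dbnd import PythonTask
from dbnd_spark import SparkConfig  # assumed import path for the Spark plugin

class PrepareData(PythonTask):
    # Applied on top of the active configuration whenever this task
    # (or any task it creates) is instantiated and run
    task_config = {SparkConfig.num_of_executors: 3}

    def run(self):
        ...  # illustrative body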

To set a default value for a task

Use the defaults class attribute, as shown in the following examples:

class WordCountTask2(WordCountTask):
    # Defaults applied to this task and to any tasks it creates
    defaults = {
        SparkConfig.main_jar: "jar2.jar",
        DataTask.task_env: DataTaskEnv.prod,
        WordCountTask.text: "/etc/hosts",
    }

class AllWordCounts2(AllWordCounts):
    # Defaults set on a pipeline propagate to all of its child tasks
    defaults = {
        SparkConfig.main_jar: "jar2.jar",
        DataTask.task_env: DataTaskEnv.prod,
        WordCountTask.text: "/etc/hosts",
    }

In the example above, WordCountTask2 overrides defaults for itself, while AllWordCounts2 sets the same defaults for every task in the pipeline, such as WordCountTask and WordCountPySparkTask.
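
To show how a pipeline's defaults reach its children at run time, here is a self-contained sketch; the imports, parameter types, and task bodies are assumptions based on the open-source dbnd API rather than the exact classes from this page:

from dbnd import PipelineTask, PythonTask, output, parameter

class WordCountTask(PythonTask):
    text = parameter[str]
    counters = output

    def run(self):
        # Illustrative body: count the words in the input file
        with open(self.text) as f:
            self.counters.write(str(len(f.read().split())))

class AllWordCounts(PipelineTask):
    counters = output

    # Every WordCountTask created inside this pipeline receives
    # text="/etc/hosts" unless the value is overridden explicitly
    defaults = {WordCountTask.text: "/etc/hosts"}

    def band(self):
        self.counters = WordCountTask().counters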

