System Configuration

Post-installation configuration steps that you need to perform before you can start using DBND.

Environments Overview

DBND provides out-of-the-box environments that you need to configure before you can start running your pipelines.
DBND supports the following environment types:

  • Persistent storage - Local file system, AWS S3, Google Cloud Storage, Azure Blob Storage, and HDFS
  • Spark - Local Spark, Amazon EMR, Google Dataproc, Databricks, Qubole, and Livy
  • Docker Engine - Local Docker, AWS Batch, and Kubernetes

You can also create custom environments and engines.

Main Configuration

• The environments parameter in the [core] section specifies the list of environments enabled and available for the project. Possible values are local, gcp, aws, and azure; local is enabled by default.

[core]
environments = ['local', 'gcp']
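
When more than one environment is enabled, you can choose which environment a given run uses. As a quick sketch, assuming your DBND version supports the --env option of dbnd run, and using prepare_data as a placeholder pipeline name:

# run the pipeline against the GCP environment instead of the default local one
dbnd run prepare_data --env gcp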

The following describes the environment types supported in DBND:

Local

In the default local environment setup, the configuration works as follows:

  • A persistent metadata store for task inputs/outputs, metrics, and runtime execution information is set to the local file system under $DBND_HOME/data.
  • Python tasks run as processes on the local machine.
  • Spark tasks run locally by using the spark_submit command if you have local Spark in place.
  • Docker tasks run on your local Docker container.
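
These defaults correspond roughly to a configuration like the sketch below; the root key for the metadata store location and the ${DBND_HOME} interpolation are assumptions based on the description above, not necessarily the exact shipped defaults:

[local]
# local metadata store for task inputs/outputs, metrics, and runtime information
root = ${DBND_HOME}/data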

Google Cloud Platform (GCP), out-of-the-box

The Spark engine is preset for Google Dataproc.
To set up a GCP environment, you need to provide a GS bucket as the root path for the metadata store and the Airflow connection ID with cloud authentication information.

Reference: Setting up GCP Environment
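
For illustration, a minimal GCP environment section might look like the sketch below; the root and conn_id keys follow the description above, and the bucket name and connection ID are placeholders to replace with your own values:

[gcp]
# GS bucket used as the root path for the metadata store
root = gs://my-dbnd-bucket/dbnd
# Airflow connection ID that holds the GCP credentials
conn_id = google_cloud_default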

Amazon Web Services (AWS), out-of-the-box

The Spark engine is preset for Amazon EMR.
To set up an AWS environment, you need to provide an S3 bucket as the root path for the metadata store and the Airflow connection ID with cloud authentication information.

Reference: Setting Up an AWS Environment
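
Similarly, a minimal AWS environment section might look like this sketch; root and conn_id are assumed key names, and the bucket and connection ID are placeholders:

[aws]
# S3 bucket used as the root path for the metadata store
root = s3://my-dbnd-bucket/dbnd
# Airflow connection ID that holds the AWS credentials
conn_id = aws_default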

Microsoft Azure, out-of-the-box

The Spark engine is preset for Databricks.
To set up an Azure environment, you need to provide an Azure Blob Storage container as the root path for the metadata store and the Airflow connection ID with cloud authentication information.

Reference: Setting Up an Azure Environment
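
A minimal Azure environment section might look like this sketch; root and conn_id are assumed key names, and the storage account, container, and connection ID are placeholders:

[azure]
# Azure Blob Storage container used as the root path for the metadata store
root = https://myaccount.blob.core.windows.net/dbnd
# Airflow connection ID that holds the Azure credentials
conn_id = azure_blob_storage_default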

Custom environment

You can define a custom environment from scratch or inherit settings from an existing environment.
You can create custom environments for managing dev/staging/production lifecycles, which normally involves switching data and execution locations.

Reference: Setting up a Custom Environment (Extending Configurations)
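
As a sketch of inheriting from an existing environment, assuming the _from mechanism described in Extending Configurations (the prod name and bucket are placeholders):

[prod]
# start from the AWS environment settings and override only what differs
_from = aws
root = s3://my-prod-bucket/dbnd

[core]
# the custom environment must also be listed as enabled
environments = ['local', 'aws', 'prod']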

To use out-of-the-box environments, set up one or more of the environments described in the referenced topics.

