Runtime Environment Configuration
Post-installation configuration steps that you need to perform before you can start using DBND.
Environments Overview
DBND provides out-of-the-box environments that you need to configure before you can start running your pipelines. It supports the following environment types:
- Persistence - Local file system, AWS S3, Google Storage, Azure Blob Store, and HDFS
- Spark - Local Spark, Amazon EMR, Google DataProc, Databricks, Qubole, and Livy
- Docker Engine - Local Docker, AWS Batch, and Kubernetes
You can also create custom environments and engines.
Main Configuration
The environments parameter in the core section specifies the list of environments enabled and available for the project. Possible values include local, gcp, aws, and azure; local is set by default.
```ini
[core]
environments = ['local', 'gcp']
```
The following sections describe the environment types supported in DBND:
Local
In the default local environment setup, the configuration works as follows:
- A persistent metadata store for task inputs/outputs, metrics, and runtime execution information is kept on the local file system under $DBND_HOME/data.
- Python tasks run as processes on the local machine.
- Spark tasks run locally by using the spark_submit command, if you have local Spark in place.
- Docker tasks run in containers on your local Docker engine.
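For example, a minimal sketch of pointing the local environment's metadata store at a different directory is shown below. The [local] section and the root key follow the default configuration layout, and the path is a placeholder; verify the exact parameter names against your project configuration.

```ini
[core]
environments = ['local']

[local]
# Root path for the local metadata store (task inputs/outputs, metrics,
# runtime execution information). Placeholder path - point it at any
# directory the run can write to; the default is $DBND_HOME/data.
root = /data/dbnd
```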
Google Cloud Platform (GCP), out-of-the-box
The Spark engine is preset for Google DataProc. To set up a GCP environment, you need to provide a GS bucket as the root path for the metadata store and an Airflow connection ID with cloud authentication information.
See Setting Up a GCP Environment.
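A minimal sketch of a GCP environment in the project configuration could look like the following. The bucket name and connection ID are placeholders, and the root and conn_id keys are assumed from the default configuration layout; confirm the exact names in the GCP setup guide.

```ini
[core]
environments = ['gcp']

[gcp]
# GS bucket used as the root path for the metadata store (placeholder name)
root = gs://my-dbnd-bucket/dbnd
# Airflow connection ID that holds the GCP credentials (assumed default name)
conn_id = google_cloud_default
```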
Amazon Web Service (AWS), out-of-the-box
The Spark engine is preset for Amazon EMR. To set up an AWS environment, you need to provide an S3 bucket as the root path for the metadata store and an Airflow connection ID with cloud authentication information.
See Setting Up an AWS Environment.
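The AWS counterpart follows the same pattern; as before, the bucket name and connection ID below are placeholders and the key names are assumptions, so check them against the AWS setup guide.

```ini
[core]
environments = ['aws']

[aws]
# S3 bucket used as the root path for the metadata store (placeholder name)
root = s3://my-dbnd-bucket/dbnd
# Airflow connection ID that holds the AWS credentials (assumed default name)
conn_id = aws_default
```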
Microsoft Azure (Azure), out-of-the-box
The Spark engine is preset for Databricks. To set up an Azure environment, you need to provide an Azure Blob Store container as the root path for the metadata store and an Airflow connection ID with cloud authentication information.
See Setting Up an Azure Environment.
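A comparable sketch for Azure is shown below. The storage URL, container name, and connection ID are placeholders, and the section and key names are assumptions modeled on the other environments; the exact URI scheme and keys depend on your storage setup, so follow the Azure setup guide.

```ini
[core]
environments = ['azure']

[azure]
# Azure Blob Store container used as the root path for the metadata store.
# Placeholder URL - the exact scheme depends on your storage setup.
root = https://<storage-account>.blob.core.windows.net/dbnd
# Airflow connection ID that holds the Azure credentials (placeholder name)
conn_id = azure_blob_storage_default
```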
Custom environment
You can define a custom environment from scratch or inherit settings from an existing environment. You can create custom environments for managing dev/staging/production lifecycles, which normally involves switching data and execution locations.
See Setting up a Custom Environment (Extending Configurations).
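As a sketch of the inheritance approach: a custom environment can extend an existing one and override only what changes between lifecycle stages. The example below assumes the _from key described in Extending Configurations; the prod section name and the bucket paths are made up for illustration.

```ini
[core]
environments = ['aws', 'prod']

[aws]
root = s3://my-dbnd-bucket/dev
conn_id = aws_default

[prod]
# Hypothetical production environment that inherits the AWS settings
# and overrides only the data root.
_from = aws
root = s3://my-dbnd-bucket/prod
```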
To use out-of-the-box environments, set up one or more of the environments described in the referenced topics.