Kubernetes Engine Configuration

How to configure DBND to use your Kubernetes Cluster.

Configuring Kubernetes Engine Guide

To direct DBND to interact with your cluster, you need to update the databand-system.cfg configuration file with the Kubernetes Cluster connection details.

Step 1. In the environments configuration, add a new environment for the Kubernetes cluster. You can also reuse an existing environment; this example assumes a new one named 'kubernetes_cluster_env':

    environments = ['local', 'local_minikube', 'kubernetes_cluster_env']

Step 2. In the configuration file, add the specification for your new or existing environment (kubernetes_cluster_env):

    [kubernetes_cluster_env]
    _from = local
    remote_engine = your_kubernetes_engine

The `_from` parameter names the section from which previously defined settings are inherited. In this example, the `[local]` section is used (see [Extending Configurations](doc:custom-environment)).
The `remote_engine` setting defines which engine runs your remote (submitted) tasks.
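
To make the effect of `_from` concrete, the following minimal sketch resolves an environment section by merging it over the section it inherits from. It is an illustration only and not DBND's actual implementation: the `resolve_section` helper is hypothetical, the section names are assumed to match the environment names, and the `[local]` values are placeholders invented for the example.

```python
import configparser

# Sample configuration mirroring the environment described above.
# The [local] values are placeholders invented for this illustration.
SAMPLE_CFG = """
[local]
root = /tmp/dbnd_data
remote_engine = local_engine

[kubernetes_cluster_env]
_from = local
remote_engine = your_kubernetes_engine
"""


def resolve_section(parser, name):
    """Merge a section over the section named in its `_from` key (child values win)."""
    section = dict(parser[name])
    parent = section.pop("_from", None)
    if parent is None:
        return section
    merged = resolve_section(parser, parent)
    merged.update(section)
    return merged


parser = configparser.ConfigParser(interpolation=None)
parser.read_string(SAMPLE_CFG)
env = resolve_section(parser, "kubernetes_cluster_env")
print(env["remote_engine"])  # your_kubernetes_engine (overrides the inherited value)
print(env["root"])           # /tmp/dbnd_data (inherited from [local])
```

Under these assumptions, the new environment keeps everything defined in `[local]` and only replaces the engine used for remote tasks.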

Step 3. In the configuration file, add the engine configuration.
The following example describes the engine configuration:

    [your_kubernetes_engine]
    _type = kubernetes
    container_repository = databand_examples
    container_tag =
    docker_build = True
    docker_build_push = True
    cluster_context = databand_context
    namespace = databand
    service_account_name = databand
    debug = False
    secrets = [
       { "type":"env", "target": "AIRFLOW__CORE__SQL_ALCHEMY_CONN", "secret" : "databand-secret", "key": "airflow_sql_alchemy_conn"},
       { "type":"env", "target": "AIRFLOW__CORE__FERNET_KEY", "secret" : "databand-secret", "key": "airflow_fernet_key"},
       { "type":"env", "target": "DBND__CORE__DATABAND_URL", "secret" : "databand-secret", "key": "databand_url"}
       ]
    pod_error_cfg_source_dict = {
       "255": {"retry_count": 3, "retry_delay": "3m"},
       "err_image_pull": {"retry_count": 0, "retry_delay": "3m"},
       "err_pod_deleted": {"retry_count": 3, "retry_delay": "3s"}
       }

Databand Kubernetes Config Reference

Docker Image configuration

  • container_repository - The Docker image repository to pull the pod images from. If you are running user code, this is where you need to supply your repository and tag settings.
  • container_tag - If defined, the Docker image will not be built and the specified tag will be used.
  • image_pull_secrets - The secret with the connection information for the container_repository.
  • docker_build - Whether the Kubernetes executor should build the Docker image on the fly. Useful if you want a different image every time.
  • docker_build_push - Whether the built Docker image should be pushed to the repository. Useful for specific cases.

Cluster related variables

  • in_cluster - Defines which Kubernetes configuration is used for the kube client. Use false to enforce local credentials; use true to enforce the in_cluster mode. Default: None (Databand automatically decides which mode to use).
  • cluster_context - The Kubernetes context; you can check which context you are on by using kubectl config get-contexts.
  • namespace - The namespace in which Databand is installed inside the cluster (databand in this case).
  • service_account_name - The service account used to create pods for tasks. It must have the permissions required to create pods.

Pod Scheduling Configuration

  • labels - Set a list of pods' labels (see Labels)
  • node_selectors and affinity - Assign nodeSelector or affinity to the pods (see Assigning Pods to Nodes)
  • annotations - Assign annotations to the pod (see Annotations)
  • tolerations - Assign tolerations to the pod (see Taints and Tolerations)
  • requests and limits - Set the resource requests and limits for the pod. You can provide a standard Kubernetes dict, or use explicit keys such as request_memory, request_cpu, limit_memory, or limit_cpu.
    For more information, see Manage Container Resources, and make sure you are aware of Quality of Service for Pods.
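
The relationship between the explicit keys and the standard Kubernetes dict can be sketched as follows. This is only an illustration of how the two forms correspond, not DBND's internal conversion; the `to_resources` helper is hypothetical, while the `requests`/`limits` layout with `memory` and `cpu` entries is the standard Kubernetes container resources structure.

```python
def to_resources(request_memory=None, request_cpu=None,
                 limit_memory=None, limit_cpu=None):
    """Build a Kubernetes-style resources dict from the explicit keys."""
    resources = {
        "requests": {"memory": request_memory, "cpu": request_cpu},
        "limits": {"memory": limit_memory, "cpu": limit_cpu},
    }
    # Keep only the values that were actually provided, and drop empty groups.
    cleaned = {}
    for group, values in resources.items():
        provided = {name: value for name, value in values.items() if value is not None}
        if provided:
            cleaned[group] = provided
    return cleaned


print(to_resources(request_memory="64Mi", limit_memory="128Mi",
                   request_cpu="250m", limit_cpu="500m"))
```

For the values above, the result has a `requests` group with `64Mi` of memory and `250m` of CPU, and a `limits` group with `128Mi` and `500m`.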

Pod Runtime Configuration

  • secrets - Assign secrets to the pod.
  • env_vars - Assign environment variables to the pod.
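
Because a `secrets` value is a multi-line list, a missing bracket or field is easy to overlook. The sketch below is a hypothetical pre-flight check, not part of DBND: it assumes the value can be parsed as JSON, and the required fields are inferred from the examples on this page ("env" entries carry a `key`, the "volume" entry does not).

```python
import json

# Two entries copied from the examples on this page: one "env" and one "volume" secret.
SECRETS_VALUE = """
[
  {"type": "env", "target": "DBND__CORE__DATABAND_URL", "secret": "databand-secret", "key": "databand_url"},
  {"type": "volume", "target": "/var/secrets/google", "secret": "gcp-secrets"}
]
"""


def check_secrets(raw):
    """Return a list of problems found; an empty list means the value looks fine."""
    try:
        entries = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(entries, list):
        return ["expected a list of secret entries"]
    problems = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: expected an object")
            continue
        for field in ("type", "target", "secret"):
            if field not in entry:
                problems.append(f"entry {index}: missing '{field}'")
        # Inferred from the examples: "env" secrets also name a key inside the secret.
        if entry.get("type") == "env" and "key" not in entry:
            problems.append(f"entry {index}: 'env' secret is missing 'key'")
    return problems


print(check_secrets(SECRETS_VALUE))  # []
```

Running a check like this before submitting a task surfaces an unbalanced bracket or a missing field locally instead of at pod creation time.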

Pod Error Handling

  • pod_error_cfg_source_dict (optional) - Configures retries for pods that have failed with specific exit codes or errors. Each key is either a process exit code (for example, 137) or a Kubernetes error string. For example:
pod_error_cfg_source_dict = {
    "255": {"retry_count": 3, "retry_delay": "3m"},
    "err_image_pull": {"retry_count": 0, "retry_delay": "3m"},
    "err_pod_deleted": {"retry_count": 3, "retry_delay": "3s"}
    }

Supported Kubernetes errors:

  • "err_image_pull" happens on Image pull error.
  • "err_config_error" happens on Pod configuration error (you should not retry on this one).
  • "err_pod_deleted" happens on Pod deletion (a rare case related to Kubernetes autoscaling).
  • "err_pod_evicted" happens on Pod relocation to a different Node.
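
The way such a mapping drives retries can be sketched as a simple lookup. This is a hedged illustration, not DBND's implementation: the `retry_policy` and `delay_seconds` helpers are hypothetical, the reading of "3m" and "3s" as minutes and seconds is an assumption (as is the "h" unit), and returning no retries for unlisted errors is also an assumption.

```python
import re

# The example mapping from this page: keys are exit codes or Kubernetes error strings.
POD_ERROR_CFG = {
    "255": {"retry_count": 3, "retry_delay": "3m"},
    "err_image_pull": {"retry_count": 0, "retry_delay": "3m"},
    "err_pod_deleted": {"retry_count": 3, "retry_delay": "3s"},
}

# Assumed meaning of the delay suffixes.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def delay_seconds(text):
    """Convert a delay such as "3m" into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise ValueError(f"unsupported delay format: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def retry_policy(error, cfg=POD_ERROR_CFG):
    """Return (retry_count, delay_in_seconds) for an exit code or error string."""
    entry = cfg.get(str(error))
    if entry is None:
        return (0, 0)  # assumption: errors that are not listed are not retried
    return (entry["retry_count"], delay_seconds(entry["retry_delay"]))


print(retry_policy(255))                # (3, 180)
print(retry_policy("err_pod_deleted"))  # (3, 3)
```

Note that exit codes are looked up as strings, matching the quoted keys in the configuration.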

Databand System

  • debug - When true, displays all pod requests sent to Kubernetes and more useful debugging data.
  • keep_finished_pods - When true, finished pods are not deleted (default: False).
  • keep_failed_pods - When true, failed pods are not deleted (default: False). Useful when you need to debug the system.
  • _type - Implies that this is a Kubernetes Engine Config (see Extending Configurations). You can use it to create your own version of the Kubernetes Engine config.


Custom Configuration per Task

You can adjust configuration settings of a specific task:

@task(task_config=dict(kubernetes=dict(limits={"": 1})))
def prepare_data_gpu(data):
    pass

Using this configuration, you'll add an extra limit to the pod definition of this specific task.

You can adjust requested resources and set the limits of memory and CPU:

@task(task_config={
        "kubernetes": {"limit_memory": "128Mi",
                       "request_memory": "64Mi",
                       "limit_cpu": "500m",
                       "request_cpu": "250m"}})
def prepare_data_gpu(data):
    pass

Providing Access to AWS (using environment variables)

If you want to provide access to AWS Services explicitly, you can do it by using secrets:

secrets = [ { "type":"env", "target": "AWS_ACCESS_KEY_ID", "secret" : "aws-secrets", "key": "aws_access_key_id"},
            { "type":"env", "target": "AWS_SECRET_ACCESS_KEY", "secret" : "aws-secrets", "key": "aws_secret_access_key"}]
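
Inside the task, these secrets appear as ordinary environment variables. The following minimal sketch checks that both are present before any AWS call is made; the `missing_aws_vars` helper is hypothetical and only illustrates the idea.

```python
import os

# The two environment variables targeted by the secrets above.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(environ=None):
    """Return the names of required AWS variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_AWS_VARS if not environ.get(name)]


missing = missing_aws_vars()
if missing:
    print("Missing AWS credentials:", ", ".join(missing))
else:
    print("AWS credentials are available in the environment.")
```

A check like this at the start of a task gives a clearer failure than an authentication error raised deep inside an AWS client.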

Providing Access to Google (using file)

If you want to provide access to GCP Services explicitly, you can do it by using secrets:

secrets = [ { "type":"volume", "target": "/var/secrets/google", "secret" : "gcp-secrets" }]

