Spark on Databricks
Configurations for using Databricks with Databand.
Configuration for submission (dbnd driver) machine
- Add a `[databricks]` section to your databand configuration (a sketch of this section follows the example below).
- Set `cluster_id` to your Databricks cluster id.
- Set `cloud_type = aws` or `cloud_type = azure`.
- Configure your environment to use Databricks as the `spark_engine`.
Example
```ini
[aws_databricks]
_type = aws
spark_engine = databricks
# .... additional configurations related to your environment
```
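For reference, a minimal `[databricks]` section matching the steps above might look like the following; the cluster id value is a placeholder:

```ini
[databricks]
# id of the existing Databricks cluster to submit jobs to (placeholder value)
cluster_id = 0523-192345-brick123
# aws or azure, depending on where your workspace runs
cloud_type = aws
```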
- Create the `databricks_default` Airflow connection by running the following commands:

```shell
airflow connections --delete --conn_id databricks_default
airflow connections --add \
    --conn_id databricks_default \
    --conn_type databricks \
    --conn_host <YOUR DATABRICKS CLUSTER URI> \
    --conn_extra "{\"token\": \"<YOUR ACCESS TOKEN>\", \"host\": \"<YOUR DATABRICKS CLUSTER URI>\"}"
```
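To confirm the connection was registered, you can list the Airflow connections and check that `databricks_default` appears; this assumes the same Airflow 1.10-style CLI used above:

```shell
# list all connections; databricks_default should be present
airflow connections --list
```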
Getting Databricks Cluster Id
- API: `https://<CLUSTER_IP>/api/2.0/clusters/list` (see the curl sketch below)
- UI: under Clusters -> Advanced Options -> Tags -> ClusterId
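As an example, assuming a personal access token, the clusters list endpoint can be queried with curl; each entry in the JSON response includes a `cluster_id` field:

```shell
# list clusters in the workspace and look up cluster_id in the response
curl -H "Authorization: Bearer <YOUR ACCESS TOKEN>" \
    https://<CLUSTER_IP>/api/2.0/clusters/list
```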
Configuring Databricks Cluster
You can configure DBND to spin up a new cluster for every job or to use an existing cluster (the default behavior). In either case, you need to install and configure the DBND package on the cluster. See Installing DBND on Databricks Spark Cluster for more information.
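As an illustration, Databand configuration values can typically also be overridden through `DBND__<SECTION>__<KEY>` environment variables, so pointing a single run at a different existing cluster might look like the sketch below; the override convention and the cluster id are assumptions to verify against your DBND version:

```shell
# override [databricks] cluster_id for this run only (hypothetical cluster id)
export DBND__DATABRICKS__CLUSTER_ID=0523-192345-brick123
```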