
JVM SDK Configuration

The Databand JVM SDK uses the same properties as the Python SDK. However, not all of them are supported, and the ways of configuring them are slightly different.

In general, the JVM SDK is configured by passing environment variables to the executable. In the case of Spark, the variables can be set via `spark.env` properties.
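For a plain JVM executable, a minimal sketch is to export the variables before launching the process; the jar name and token below are placeholders:

# Enable tracking and point the SDK at your Databand tracker
export DBND__TRACKING=True
export DBND__CORE__DATABAND_URL=https://tracker.databand.ai
export DBND__CORE__DATABAND_ACCESS_TOKEN=<your-access-token>

# The JVM process inherits the variables exported above
java -jar my-application.jar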

The following configuration properties are supported in the JVM SDK:

| Variable | Default Value | Description |
| --- | --- | --- |
| `DBND__TRACKING` | `False` | This property is mandatory. Explicitly enables tracking; possible values: `True`/`False`. When not set or set to `False`, tracking is not enabled, so it should be explicitly set to `True`. Note: when the job runs inside Airflow, you can omit this property. |
| `DBND__CORE__DATABAND_URL` | `http://localhost:8080` | This property is mandatory. Tracker URL. |
| `DBND__CORE__DATABAND_ACCESS_TOKEN` | Not set | This property is mandatory. Tracker access token. |
| `DBND__TRACKING__VERBOSE` | `False` | When set to `True`, enables verbose logging, which can help with debugging agent instrumentation. |
| `DBND__TRACKING__LOG_VALUE_PREVIEW` | `False` | When set to `True`, previews for Spark datasets are calculated. This can affect performance, so it must be enabled explicitly. |
| `DBND__LOG__PREVIEW_HEAD_BYTES` | `32768` | Size of the task log head in bytes. When the log size exceeds head + tail, the middle of the log is truncated. |
| `DBND__LOG__PREVIEW_TAIL_BYTES` | `32768` | Size of the task log tail in bytes. When the log size exceeds head + tail, the middle of the log is truncated. |
| `DBND__SPARK__LISTENER_INJECT_ENABLED` | `False` | If set to `True`, Databand injects a Spark listener into the Spark context and reports Spark execution metrics. |
| `DBND__SPARK__QUERY_LISTENER_INJECT_ENABLED` | `False` | If set to `True`, Databand injects a Spark query listener into the Spark context and reports Spark queries as dataset operations. |
| `DBND__SPARK__IO_TRACKING_ENABLED` | `False` | If set to `True`, Databand instruments Spark and reports advanced I/O metrics. |
| `DBND__RUN__JOB_NAME` | Spark application name, main method name, or the `@Task` annotation value if set | Allows overriding the job name. |
| `DBND__RUN__NAME` | A randomly generated string from a predefined list | Allows overriding the run name. |

The following configuration properties are supported as part of the Airflow integration. They should be set to properly connect the JVM task run to the parent Airflow task that triggered the execution.

| Variable | Description |
| --- | --- |
| `AIRFLOW_CTX_DAG_ID` | Airflow DAG ID |
| `AIRFLOW_CTX_TASK_ID` | Airflow task ID |
| `AIRFLOW_CTX_EXECUTION_DATE` | DAG execution date |
| `AIRFLOW_CTX_TRY_NUMBER` | Task try number |
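For example, when Airflow itself exports these variables into the task's environment (recent Airflow versions set `AIRFLOW_CTX_*` variables for each running task), a sketch of forwarding them to Spark via `spark.env` could look like this; the jar name is a placeholder:

spark-submit \
    --conf "spark.env.AIRFLOW_CTX_DAG_ID=$AIRFLOW_CTX_DAG_ID" \
    --conf "spark.env.AIRFLOW_CTX_TASK_ID=$AIRFLOW_CTX_TASK_ID" \
    --conf "spark.env.AIRFLOW_CTX_EXECUTION_DATE=$AIRFLOW_CTX_EXECUTION_DATE" \
    --conf "spark.env.AIRFLOW_CTX_TRY_NUMBER=$AIRFLOW_CTX_TRY_NUMBER" \
    my-application.jar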

The following configuration properties are supported as part of the Azkaban integration. They should be set on the Azkaban instance and cannot be passed as part of a Spark job.

| Variable | Description |
| --- | --- |
| `DBND__AZKABAN__SYNC_PROJECTS` | List of Azkaban projects to sync. If not specified, all projects are synced. |
| `DBND__AZKABAN__SYNC_FLOWS` | List of Azkaban flows to sync. If not specified, all flows are synced. |
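A minimal sketch, assuming the lists are comma-separated and the variables are set in the environment of the Azkaban instance itself; the project and flow names below are hypothetical:

# Set on the Azkaban instance, not on the Spark job
export DBND__AZKABAN__SYNC_PROJECTS=project_a,project_b
export DBND__AZKABAN__SYNC_FLOWS=daily_etl,hourly_ingest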

To enable tracking, the minimum required variables are: `DBND__TRACKING`, `DBND__CORE__DATABAND_URL`, and `DBND__CORE__DATABAND_ACCESS_TOKEN`.

Configuring Spark Jobs

Spark jobs can be configured either by passing environment variables or by setting `spark.env` properties (the jar name below is a placeholder for your own application):

spark-submit \
    --conf "spark.env.DBND__TRACKING=True" \
    --conf "spark.env.DBND__CORE__DATABAND_URL=https://tracker.databand.ai" \
    --conf "spark.env.DBND__CORE__DATABAND_ACCESS_TOKEN=375ce800-6b37-4115-ae9a-037023879fa1" \
    my-application.jar  # replace with your application jar and arguments
