Add Python Project to Spark Submission

How to add Python packages to a Spark submission automatically.

Automatically add Python Packages to Spark Submission

You can upload external packages to Spark by setting spark.include_user_project=True in the Spark configuration.
DBND lets you configure a package directory (which contains its setup.py) and an optional requirements text file for third-party dependencies.

Example:

[ProjectWheelFile]
package_dir=${DBND_HOME}/dbnd-core/plugins/dbnd-test-scenarios/scenarios/dbnd-test-package
requirements_file=${DBND_HOME}/dbnd-core/plugins/dbnd-test-scenarios/scenarios/dbnd-test-package/requirements.txt
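Before submitting, it can help to confirm that the configured paths exist. The following is a minimal sketch, not part of DBND: the helper name check_project_wheel_config and its env argument are hypothetical, and it assumes the configuration is available as INI-style text with ${NAME} placeholders as in the example above. It checks only what this page states: the package directory contains setup.py, and the requirements file is optional.

```python
import configparser
from pathlib import Path
from string import Template


def check_project_wheel_config(config_text, env):
    """Return a list of problems found in the [ProjectWheelFile] section.

    Hypothetical helper: `env` maps placeholder names such as DBND_HOME
    to their values.
    """
    # Disable configparser interpolation so ${DBND_HOME} is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    if not parser.has_section("ProjectWheelFile"):
        return ["missing [ProjectWheelFile] section"]
    section = parser["ProjectWheelFile"]

    def expand(value):
        # Replace ${NAME} placeholders; unknown names are left untouched.
        return Path(Template(value).safe_substitute(env))

    problems = []
    package_dir = section.get("package_dir")
    if package_dir is None:
        problems.append("package_dir is not set")
    elif not (expand(package_dir) / "setup.py").is_file():
        problems.append("package_dir does not contain setup.py")

    # The requirements file is optional, so only check it when configured.
    requirements_file = section.get("requirements_file")
    if requirements_file is not None and not expand(requirements_file).is_file():
        problems.append("requirements_file does not exist")

    return problems
```

Passing the configuration text together with os.environ as env returns an empty list when both paths resolve, and a list of readable messages otherwise.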

Important

The packages will be built and uploaded every time a Spark task needs to be rerun.

For example, if you run a pipeline a second time and all of its tasks are reused, nothing is built or uploaded. But if the signature of at least one Spark task changes, that task reruns, and the packages are rebuilt and re-uploaded.
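The rule can be illustrated with a toy model. This is only a sketch of the behavior described here, not DBND's implementation: the function name, the task names, and the signature strings are all hypothetical, and signatures are modeled as plain strings keyed by task name.

```python
def should_rebuild_packages(previous_signatures, current_signatures):
    """Return True if at least one Spark task has to rerun.

    A task reruns when it is new or when its signature differs from the
    one recorded for the previous run (toy model, hypothetical names).
    """
    return any(
        previous_signatures.get(task) != signature
        for task, signature in current_signatures.items()
    )


# Second run with identical signatures: every task is reused, no rebuild.
first_run = {"prepare_data": "sig-a1", "train_model": "sig-b1"}
print(should_rebuild_packages(first_run, dict(first_run)))

# One signature changed: that task reruns, so packages are rebuilt.
changed_run = {"prepare_data": "sig-a1", "train_model": "sig-b2"}
print(should_rebuild_packages(first_run, changed_run))
```

The first call prints False and the second prints True, matching the two cases in the paragraph above.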

