
Create Databricks Environment

KNIME Extension for Apache Spark core infrastructure version 4.2.0.v202007072005 by KNIME AG, Zurich, Switzerland

Creates a Databricks Environment connected to an existing Databricks cluster. See the AWS or Azure Databricks documentation for more information.

Note: To avoid an accidental cluster startup, this node creates dummy DB and Spark ports if loaded in an executed state from a stored workflow. Reset and execute the node to start the cluster and create a Spark execution context.

Cluster access control: KNIME uploads additional libraries to the cluster. If your cluster is secured with access control, this requires cluster-level manage permissions. See the Databricks documentation on how to set up these permissions.

Options

General

Spark version
The Spark version used by Databricks. If this is set incorrectly, creating the Spark context will fail.
Databricks URL
Full URL of the Databricks deployment, e.g. https://<account>.cloud.databricks.com on AWS or https://<region>.azuredatabricks.net on Azure.
Cluster ID
Unique identifier of a cluster in the Databricks workspace. See the AWS or Azure Databricks documentation for more information.
Workspace ID
Workspace ID for Databricks on Azure; leave blank on AWS. See the Azure Databricks documentation for more information.
Authentication
Workflow credentials, username and password, or tokens can be used for authentication. Databricks strongly recommends tokens. See the authentication section of the Databricks AWS or Azure documentation for more information about personal access tokens.
To use a token in workflow credentials, enter token as the username and the token itself as the password.
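Outside of KNIME, the same personal access token authenticates requests against the Databricks REST API as a Bearer header. A minimal sketch (the deployment URL, token, and cluster ID below are placeholders, and the request is only constructed, not sent):

```python
from urllib.request import Request

# Hypothetical placeholders -- substitute your own deployment values.
DATABRICKS_URL = "https://<account>.cloud.databricks.com"
TOKEN = "dapi-your-personal-access-token"
CLUSTER_ID = "0123-456789-abcde"

# The Databricks REST API expects the token in an Authorization: Bearer header.
req = Request(
    f"{DATABRICKS_URL}/api/2.0/clusters/get?cluster_id={CLUSTER_ID}",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
print(req.get_header("Authorization"))
```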

Advanced

Create Spark context
If enabled, an execution context is started on Databricks to run KNIME Spark jobs. If disabled, the Spark context port is disabled. Disabling the context can save resources in the driver process and is required if the cluster runs with Table Access Control.
Set staging area for Spark jobs
If enabled, you can specify a directory in the connected Databricks file system that will be used to transfer temporary files between KNIME and the Spark context. If no directory is set, a default directory in /tmp is chosen.
Terminate cluster on context destroy
If selected, the cluster is terminated when the node is reset, when a Destroy Spark Context node is executed on the context, or when the workflow or KNIME is closed. This releases resources, but all data cached inside the cluster is lost unless it has been saved to persistent storage such as DBFS.
Databricks connection and receive timeout
Timeouts for the REST client in seconds.
Job status polling interval
The frequency with which KNIME polls the status of a job in seconds.
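The polling interval can be pictured as a simple loop that checks the job status at a fixed interval until a terminal state is reached. A hypothetical sketch of that logic (get_job_status, the state names, and the helper are illustrative stand-ins, not KNIME APIs):

```python
import time

POLL_INTERVAL = 1.0  # seconds, mirrors the "Job status polling interval" option


def wait_for_job(get_job_status, interval=POLL_INTERVAL, timeout=60.0):
    """Poll get_job_status() every `interval` seconds until it returns a
    terminal state, or raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_job_status()
        if status in ("SUCCEEDED", "FAILED", "CANCELED"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish within the timeout")
        time.sleep(interval)


# Example with a fake status source that succeeds on the third poll.
statuses = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
print(wait_for_job(lambda: next(statuses), interval=0.01))  # SUCCEEDED
```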

DB Port: Connection settings

Database Dialect
Choose the registered database dialect here.
Driver Name
Choose the registered database driver here. The node includes the Apache Hive driver. Proprietary drivers are also supported, but need to be downloaded and registered in the KNIME preferences under "KNIME -> Databases" with database type Databricks.
The node uses the proprietary driver by default if one is registered, and the Apache Hive driver otherwise.
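For orientation, JDBC URLs for Databricks with the Hive driver commonly follow the pattern sketched below. This is an illustrative assumption, not a definitive recipe: the exact httpPath and parameters depend on your deployment, and KNIME assembles the URL for you from the connection settings.

```python
# Hypothetical values -- substitute your own deployment details.
host = "<account>.cloud.databricks.com"
workspace_id = "0"          # "0" on AWS; the numeric workspace ID on Azure
cluster_id = "0123-456789-abcde"

# Typical Hive-driver URL shape for Databricks (illustrative only):
jdbc_url = (
    f"jdbc:hive2://{host}:443/default"
    f";transportMode=http;ssl=true"
    f";httpPath=sql/protocolv1/o/{workspace_id}/{cluster_id}"
)
print(jdbc_url)
```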

DB Port: JDBC Parameters

This tab allows you to define JDBC driver connection parameters. The value of a parameter can be a constant, a variable, a credential user, a credential password, or a KNIME URL.

DB Port: Advanced

This tab allows you to define KNIME framework properties such as connection handling, advanced SQL dialect settings or logging options. The available properties depend on the selected database type and driver.

DB Port: Input Type Mapping

This tab allows you to define rules to map from database types to KNIME types.

Mapping by Name
Columns that match the given name (or regular expression) and database type will be mapped to the specified KNIME type.
Mapping by Type
Columns that match the given database type will be mapped to the specified KNIME type.
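The name-based rules behave like regular-expression matches against column names, with the type-based rules as a fallback. A toy sketch of that matching logic (the rule table and type names below are made up for illustration and are not KNIME's actual types):

```python
import re

# Hypothetical name-based rules: (name pattern, database type, KNIME type).
rules = [
    (r"price_.*", "DECIMAL", "Number (double)"),
    (r".*_id",    "BIGINT",  "Number (long)"),
]


def map_column(name, db_type):
    """Return the KNIME type of the first rule whose name pattern and
    database type both match, else None (fall through to type rules)."""
    for pattern, rule_db_type, knime_type in rules:
        if rule_db_type == db_type and re.fullmatch(pattern, name):
            return knime_type
    return None


print(map_column("price_usd", "DECIMAL"))   # Number (double)
print(map_column("customer_id", "BIGINT"))  # Number (long)
```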

DB Port: Output Type Mapping

This tab allows you to define rules to map from KNIME types to database types.

Mapping by Name
Columns that match the given name (or regular expression) and KNIME type will be mapped to the specified database type.
Mapping by Type
Columns that match the given KNIME type will be mapped to the specified database type.

Output Ports

JDBC connection that can be connected to the KNIME database nodes.
DBFS connection that can be connected to the Spark nodes to read and write files.
Spark context that can be connected to all Spark nodes.

Installation

To use this node in KNIME, install KNIME Extension for Apache Spark from the following update site:

KNIME 4.2

A zipped version of the software site is available for download. Read our FAQs for instructions on how to install nodes from a zipped update site.

