Databricks Spark Connector

Creates a connection to an existing Databricks cluster that can be used to run Spark nodes. See AWS or Azure Databricks documentation for more information.

Note: To avoid an accidental cluster startup, this node creates a dummy Spark port if loaded in executed state from a stored workflow. Reset and execute the node to start the cluster and create a Spark execution context.

Cluster access control: KNIME uploads additional libraries to the cluster. If your cluster is secured with access control, this requires the cluster-level Can Manage permission. See the Databricks documentation on how to set up this permission.
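As a rough illustration of what granting that permission involves, the sketch below builds a request for the Databricks Permissions REST API. The workspace URL, cluster ID, and user name are hypothetical placeholders; consult the Databricks documentation for the authoritative endpoint and payload shape.

```python
import json

# Hypothetical values -- substitute your own workspace URL, cluster ID, and user.
WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"
CLUSTER_ID = "0123-456789-abcde123"

def build_permission_request(cluster_id: str, user_name: str) -> tuple[str, str]:
    """Build the endpoint and JSON body for granting Can Manage on a cluster
    via the Databricks Permissions REST API (PATCH adds to the existing ACL)."""
    endpoint = f"{WORKSPACE_URL}/api/2.0/permissions/clusters/{cluster_id}"
    body = json.dumps({
        "access_control_list": [
            {"user_name": user_name, "permission_level": "CAN_MANAGE"}
        ]
    })
    return endpoint, body

endpoint, body = build_permission_request(CLUSTER_ID, "knime-user@example.com")
```

Sending this body with an authenticated PATCH request would add the user to the cluster's access control list without replacing existing entries.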

Options

Cluster ID
Unique identifier of a cluster in the Databricks workspace.
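The cluster ID can typically be read from the cluster's URL in the Databricks workspace UI. The sketch below extracts it with a regular expression; the assumed ID pattern (digits-digits-alphanumerics after a `clusters/` path segment) and the example URL are illustrative only, since the URL layout varies between Databricks UI versions.

```python
import re

def extract_cluster_id(url: str):
    """Pull the cluster ID out of a Databricks cluster URL.
    Assumes the ID follows a 'clusters/' path segment in the form
    NNNN-NNNNNN-xxxxxxxx; returns None if no such segment is found."""
    match = re.search(r"clusters/([0-9]{4}-[0-9]{6}-[0-9a-z]+)", url)
    return match.group(1) if match else None

# Hypothetical example URL:
print(extract_cluster_id(
    "https://adb-12345.6.azuredatabricks.net/#setting/clusters/0123-456789-abcde123/configuration"
))  # -> 0123-456789-abcde123
```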
Spark Version
Version of Spark running on the cluster.
Staging area for Spark jobs
Specify a directory in the Unity File System that will be used to transfer temporary files between KNIME and the Spark context.
Terminate cluster on context destroy
If selected, the cluster is terminated when this node is reset, when a Destroy Spark Context node is executed on the context, or when the workflow or KNIME is closed. This releases resources, but all data cached inside the cluster is lost unless it has been saved to persistent storage such as DBFS.
Job status polling interval (seconds)
The interval, in seconds, at which KNIME polls the status of a running job.
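Conceptually, this option controls a loop like the following sketch: KNIME repeatedly asks for the job status and sleeps for the configured interval in between. The `get_status` callable and the status names are hypothetical stand-ins, not KNIME's or Databricks' actual API.

```python
import time

def wait_for_job(get_status, poll_interval_s=10.0, timeout_s=600.0):
    """Poll a job until it reaches a terminal state.
    `get_status` is a hypothetical callable returning one of
    'PENDING', 'RUNNING', 'SUCCESS', 'FAILED'."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status not in ("PENDING", "RUNNING"):
            return status            # terminal state reached
        time.sleep(poll_interval_s)  # job status polling interval
    raise TimeoutError("job did not finish within the timeout")

# Simulated job that finishes on the third poll:
states = iter(["PENDING", "RUNNING", "SUCCESS"])
print(wait_for_job(lambda: next(states), poll_interval_s=0.01))  # -> SUCCESS
```

A shorter interval makes status updates appear sooner in KNIME at the cost of more frequent requests to the cluster.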

Input Ports

Databricks Workspace connection

Output Ports

Spark context that can be connected to all Spark nodes.
Databricks Unity File System connection that can be connected to Spark nodes to read/write files.

Views

This node has no views
