Create Databricks Environment

Creates a Databricks Environment connected to an existing Databricks cluster. See the AWS or Azure Databricks documentation for more information.

Note: To avoid an accidental cluster startup, this node creates a dummy DB and Spark port if it is loaded in an executed state from a stored workflow. Reset and execute the node to start the cluster and create a Spark execution context.

Cluster access control: KNIME uploads additional libraries to the cluster. If your cluster is secured with access control, this requires cluster-level manage permissions. See the Databricks documentation on how to set up these permissions.

Options

Settings

Spark version
The Spark version used by Databricks. If this is set incorrectly, creating the Spark context will fail.
Databricks URL
Full URL of the Databricks deployment, e.g. https://<account>.cloud.databricks.com on AWS or https://<region>.azuredatabricks.net on Azure.
Cluster ID
Unique identifier of a cluster in the Databricks workspace. See the AWS or Azure Databricks documentation for more information.
Workspace ID
Workspace ID for Databricks on Azure; leave blank on AWS. See the Azure Databricks documentation for more information.
Authentication
Username and password or a personal access token can be used for authentication. Databricks strongly recommends tokens. See the authentication section of the Databricks AWS or Azure documentation for more information about personal access tokens. A minimal connection sketch is shown after this settings list.
  • Username & password: Authenticate with a username and password. Either enter a username and password, in which case the password will be persistently stored (in encrypted form) with the workflow, or check Use credentials and select a credentials flow variable to supply the username and password.
  • Token: Authenticate with the provided personal access token. If entered here, the token will be persistently stored (in encrypted form) with the workflow. Alternatively, if Use credentials is selected, the password of the selected credentials flow variable will be used as the token for authentication (the username of the flow variable will be ignored).
Working directory
Specify the working directory of the resulting file system connection. The working directory must be specified as an absolute path. A working directory allows downstream nodes to access files/folders using relative paths, i.e. paths that do not have a leading slash. The default working directory is the root "/" of the Databricks file system.
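
The node assembles the JDBC connection from the settings above. Purely as an illustration of how URL, workspace ID, cluster ID and token fit together, the following is a minimal Java sketch of opening such a connection with the Databricks (Simba-based) JDBC driver outside of KNIME. Host, workspace ID, cluster ID and token are placeholders, and the exact URL prefix and property names depend on the driver version.

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.util.Properties;

  public class DatabricksConnectionSketch {
      public static void main(String[] args) throws Exception {
          // Placeholder values -- replace with the settings of your deployment.
          String host = "<account>.cloud.databricks.com"; // or <region>.azuredatabricks.net
          String workspaceId = "0";                       // Azure workspace ID (see the Workspace ID setting)
          String clusterId = "1234-567890-abcdef12";      // hypothetical cluster ID
          String token = "<personal-access-token>";

          // Token authentication uses the literal user name "token" and the
          // personal access token as the password (AuthMech=3).
          String url = "jdbc:databricks://" + host + ":443/default"
                  + ";transportMode=http;ssl=1"
                  + ";httpPath=sql/protocolv1/o/" + workspaceId + "/" + clusterId
                  + ";AuthMech=3";

          Properties props = new Properties();
          props.put("UID", "token");
          props.put("PWD", token);

          try (Connection con = DriverManager.getConnection(url, props)) {
              System.out.println("Connected to " + con.getMetaData().getDatabaseProductName());
          }
      }
  }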

Advanced

Create Spark context and enable Spark context port
If enabled, an execution context will be started on Databricks to run KNIME Spark jobs. If disabled, the Spark context port will be disabled. Disabling the context can be useful to save resources in the driver process, and is required if the cluster runs with Table Access Control.
Set staging area for Spark jobs
If enabled, you can specify a directory in the connected Databricks file system that will be used to transfer temporary files between KNIME and the Spark context. If no directory is set, a default directory under /tmp will be chosen.
Terminate cluster on context destroy
If selected, the cluster will be terminated when the node is reset, when a Destroy Spark Context node is executed on the context, or when the workflow or KNIME is closed (see the sketch after this section). This releases resources, but all data cached inside the cluster is lost unless it has been saved to persistent storage such as DBFS.
Connection timeout
Timeout in seconds to establish a connection, or 0 for an infinite timeout.
Read timeout
Timeout in seconds to read data from an established connection, or 0 for an infinite timeout.
Job status polling interval
The interval, in seconds, at which KNIME polls the status of a job.
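
The node performs cluster termination itself via the Databricks REST API. For illustration only, a corresponding terminate request against the Clusters API 2.0 (POST /api/2.0/clusters/delete, which stops but does not permanently remove the cluster) might look like the following Java 11+ sketch; URL, token and cluster ID are placeholders.

  import java.net.URI;
  import java.net.http.HttpClient;
  import java.net.http.HttpRequest;
  import java.net.http.HttpResponse;

  public class TerminateClusterSketch {
      public static void main(String[] args) throws Exception {
          // Placeholder values -- replace with your deployment URL, token and cluster ID.
          String databricksUrl = "https://<account>.cloud.databricks.com";
          String token = "<personal-access-token>";
          String clusterId = "1234-567890-abcdef12"; // hypothetical cluster ID

          // POST /api/2.0/clusters/delete terminates the cluster; it can be
          // restarted later, but all data cached inside the cluster is lost.
          HttpRequest request = HttpRequest.newBuilder()
                  .uri(URI.create(databricksUrl + "/api/2.0/clusters/delete"))
                  .header("Authorization", "Bearer " + token)
                  .header("Content-Type", "application/json")
                  .POST(HttpRequest.BodyPublishers.ofString(
                          "{\"cluster_id\": \"" + clusterId + "\"}"))
                  .build();

          HttpResponse<String> response = HttpClient.newHttpClient()
                  .send(request, HttpResponse.BodyHandlers.ofString());
          System.out.println("Status: " + response.statusCode());
      }
  }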

DB Port: Connection settings

Database Dialect
Choose the registered database dialect here.
Database Driver

Choose the JDBC driver to connect to the database here. If you select "Use latest driver version available", the node will, upon execution, automatically use the driver with the latest (highest) version available for the current database type. This has the advantage that you do not need to touch the workflow after a driver update. However, the workflow might break in the rare case that the driver's behavior (e.g. type mapping) changes with the newer version.

If this option is not enabled, you can select a specific version of the registered drivers via the drop-down list. Additional drivers can be downloaded and registered via KNIME's preference page "KNIME -> Databases". For more details on how to register a new driver, see the database documentation.

DB Port: JDBC Parameters

This tab allows you to define JDBC driver connection parameters. The value of a parameter can be a constant, variable, credential user, credential password or KNIME URL.
The UserAgentEntry parameter is added by default to all Databricks connections to track the usage of KNIME Analytics Platform as a Databricks client. If you are not comfortable sharing this information with Databricks, you can remove the parameter. However, if you want to promote KNIME as a client with Databricks, leave the parameter as is.
For more information about the JDBC driver and the UserAgentEntry, refer to the installation and configuration guide, which you can find in the docs directory of the driver package. A sketch of how such parameters map to JDBC connection properties is shown below.

Parameter table with name, type and value column
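
Each row of the parameter table corresponds to a driver connection parameter. Outside of KNIME, such parameters would typically be passed as JDBC connection properties. The following Java sketch is illustrative only: the placeholder URL and the shown UserAgentEntry value are assumptions, and the recognized property names depend on the driver version.

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.util.Properties;

  public class JdbcParameterSketch {
      public static void main(String[] args) throws Exception {
          // Placeholder URL -- see the connection sketch in the Settings section.
          String url = "jdbc:databricks://<account>.cloud.databricks.com:443/default"
                  + ";transportMode=http;ssl=1"
                  + ";httpPath=sql/protocolv1/o/<workspace-id>/<cluster-id>";

          // Each property corresponds to a row in the JDBC Parameters table.
          Properties props = new Properties();
          props.put("UID", "token");
          props.put("PWD", "<personal-access-token>");
          props.put("UserAgentEntry", "KNIME"); // identifies KNIME as the client to Databricks

          try (Connection con = DriverManager.getConnection(url, props)) {
              System.out.println("Connected with custom UserAgentEntry.");
          }
      }
  }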

DB Port: Advanced

This tab allows you to define KNIME framework properties such as connection handling, advanced SQL dialect settings or logging options. The available properties depend on the selected database type and driver.

Database type and driver specific properties

DB Port: Input Type Mapping

This tab allows you to define rules to map from database types to KNIME types.

Mapping by Name
Columns that match the given name (or regular expression) and database type will be mapped to the specified KNIME type.
Mapping by Type
Columns that match the given database type will be mapped to the specified KNIME type.

DB Port: Output Type Mapping

This tab allows you to define rules to map from KNIME types to database types.

Mapping by Name
Columns that match the given name (or regular expression) and KNIME type will be mapped to the specified database type.
Mapping by Type
Columns that match the given KNIME type will be mapped to the specified database type.

Input Ports

Databricks Workspace Connection, e.g. provided by the Databricks Workspace Connector node.

Output Ports

JDBC connection that can be connected to the KNIME database nodes.
DBFS connection that can be connected to the Spark nodes to read/write files.
Spark context that can be connected to all Spark nodes.

Views

This node has no views
