
03_DatabricksExample

Working with Databricks

This workflow demonstrates the usage of the Create Databricks Environment node, which allows you to connect to a Databricks cluster from within KNIME Analytics Platform.
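
Outside of KNIME, the same connection can be sanity-checked with Databricks' Python SQL connector (the databricks-sql-connector package). A minimal sketch; the hostname, HTTP path, and access token below are placeholders for your own workspace:

```python
# Minimal connectivity check against a Databricks workspace.
# All connection values are placeholders; substitute your own.
from databricks import sql

connection = sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # placeholder
    http_path="/sql/1.0/warehouses/abc123",                        # placeholder
    access_token="dapi-your-token",                                # placeholder
)
cursor = connection.cursor()
cursor.execute("SELECT 1")   # trivial query to confirm the connection works
print(cursor.fetchall())
cursor.close()
connection.close()
```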

The node provides three output ports: use the existing DB nodes to interact with the Databricks database, the file handling nodes to work with the Databricks File System (DBFS), and the Spark nodes to visually assemble Spark analytics flows. All of these nodes push the data processing down into the Databricks cluster.
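
For comparison, the DBFS access that the file handling nodes provide corresponds to the DBFS endpoints of the Databricks REST API. A minimal sketch of listing a DBFS folder, again with placeholder host and token:

```python
# List the contents of a DBFS folder via the Databricks REST API (DBFS 2.0).
# Host and token are placeholders for your own workspace.
import requests

host = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
token = "dapi-your-token"                                    # placeholder

resp = requests.get(
    f"{host}/api/2.0/dbfs/list",
    headers={"Authorization": f"Bearer {token}"},
    params={"path": "/"},
)
resp.raise_for_status()
for entry in resp.json().get("files", []):
    kind = "dir" if entry["is_dir"] else f'{entry["file_size"]} bytes'
    print(entry["path"], f"({kind})")
```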



The workflow canvas carries the following annotations:

- Point the Create Databricks Environment node to your Databricks environment on Amazon or Azure. If the cluster is terminated, execution can take some minutes until the cluster is up and running again.
- Use the second Create Databricks Environment node if your Databricks cluster runs with table access control (table ACLs); for details see the node description.
- Use the DB nodes to work with the Databricks database: upload data into it with DB Table Creator and DB Loader, and read it back with DB Reader.
- Use the file handling nodes (here, List Files/Folders) to work with the Databricks File System.
- Use the Spark nodes to run Spark analytics on the cluster: save Spark results in DBFS with Spark to Parquet and read data from DBFS into Spark with Parquet to Spark. Note that Parquet does not support spaces in column names, so rename columns first (see the PySpark sketches after this list).
- Execute Destroy Spark Context last to shut down the Spark context and, if enabled in the Create Databricks Environment node, also the cluster.

For more information see the workflow metadata (View -> Description).
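
The Parquet round trip (Spark Column Rename, Spark to Parquet, Parquet to Spark) can be sketched in PySpark as follows; the DBFS path, column names, and data are illustrative placeholders, and on Databricks a `spark` session already exists:

```python
# Sketch of the Spark Column Rename -> Spark to Parquet -> Parquet to Spark steps.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # on Databricks, `spark` already exists

df = spark.createDataFrame([(1, 5.1), (2, 4.9)], ["row id", "sepal length"])

# Parquet does not allow spaces in column names, so replace them first.
for name in df.columns:
    df = df.withColumnRenamed(name, name.replace(" ", "_"))

df.write.mode("overwrite").parquet("dbfs:/tmp/example.parquet")  # save in DBFS
df_back = spark.read.parquet("dbfs:/tmp/example.parquet")        # read back into Spark
df_back.show()
```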
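
Likewise, the Spark Decision Tree Learner, Spark Predictor (Classification), and Spark Scorer steps correspond roughly to this PySpark MLlib sketch; the toy data and column names are illustrative:

```python
# Sketch of the Spark Decision Tree Learner / Spark Predictor / Spark Scorer steps.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

spark = SparkSession.builder.getOrCreate()

data = spark.createDataFrame(
    [(5.1, 3.5, 0.0), (6.2, 2.9, 1.0), (4.7, 3.2, 0.0), (6.9, 3.1, 1.0)],
    ["f1", "f2", "label"],
)
# MLlib classifiers expect a single vector column of features.
features = VectorAssembler(inputCols=["f1", "f2"], outputCol="features").transform(data)

model = DecisionTreeClassifier(labelCol="label", featuresCol="features").fit(features)
predictions = model.transform(features)  # adds a "prediction" column

accuracy = MulticlassClassificationEvaluator(
    labelCol="label", metricName="accuracy"
).evaluate(predictions)
print(f"accuracy: {accuracy:.3f}")

# Counterpart of the Destroy Spark Context node: release resources when done.
spark.stop()
```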

Nodes

Create Databricks Environment (2x), Data Generator (2x), DB Table Creator, DB Loader, DB Reader, List Files/Folders, Table to Spark, Spark Column Rename, Spark to Parquet, Parquet to Spark, Spark Decision Tree Learner, Spark Predictor (Classification), Spark Scorer, Merge Variables, Destroy Spark Context
