Spark DataFrame Java Snippet

KNIME Extension for Apache Spark core infrastructure version 2.3.0.v201807052104 by KNIME AG, Zurich, Switzerland

This node allows you to execute arbitrary Java code to manipulate or create Spark DataFrames. Simply enter the Java code in the text area.

This node also supports flow variables as input to your Spark job. To use a flow variable, double-click the variable in the "Flow Variable List".

You can also use external Java libraries. To include external jar or zip files, add their location in the "Additional Libraries" tab using the control buttons. For details, see the "Additional Libraries" tab description below.
The libraries you use must already be present on your cluster and added to the class path of your Spark job server. They are not uploaded automatically!
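Because missing libraries are not uploaded for you, it can help to fail early with a clear message when a required class is absent. The following is a minimal sketch in plain Java, not part of the node itself: the class name `ClassPathCheck`, the helper `isOnClassPath`, and the library class name `com.example.missing.Helper` are hypothetical. It only checks the class path of the JVM it runs in, so it says nothing about other machines in the cluster.

```java
public class ClassPathCheck {

    // Hypothetical helper: returns true if the named class can be located
    // by the class loader that loaded this class, without initializing it.
    static boolean isOnClassPath(String className) {
        try {
            Class.forName(className, false, ClassPathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class (or one of its dependencies) is not available.
            return false;
        }
    }

    public static void main(String[] args) {
        // A JDK class is always available.
        System.out.println(isOnClassPath("java.util.ArrayList"));
        // Placeholder name standing in for a class from an external jar.
        System.out.println(isOnClassPath("com.example.missing.Helper"));
    }
}
```

A check like this could be called at the start of the snippet code, throwing an exception with the missing class name so that the cause is easier to spot than a later `NoClassDefFoundError`.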

You can define reusable templates with the "Create templates..." button. Templates are stored in the user's workspace by default and can be accessed via the "Templates" tab. For details, see the "Templates" tab description below.

Options

Java Snippet

Flow Variable List
The list contains the flow variables that are currently available at the node input. Double clicking any of the entries will insert the respective identifier at the current cursor position (replacing the selection, if any).
Snippet text area

Enter your Java code here.

The SparkSession can be accessed via the method input parameter spark. The input Dataset<Row> objects can be accessed via the method input parameters dataFrame1 and dataFrame2; dataFrame2 is null if the second input port is not connected.
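The parameters described above can be illustrated with a small, self-contained sketch. It assumes Spark 2.x SQL libraries on the class path. The class `SnippetSketch`, the method name `apply`, and the helper `sampleInput` are hypothetical stand-ins for illustration; only the parameter names spark, dataFrame1, and dataFrame2 come from this description. In the node itself you would enter only the body of the method.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class SnippetSketch {

    // Hypothetical stand-in for the method whose body the snippet text area provides.
    static Dataset<Row> apply(SparkSession spark, Dataset<Row> dataFrame1, Dataset<Row> dataFrame2) {
        // Remove duplicate rows from the first input.
        Dataset<Row> result = dataFrame1.dropDuplicates();
        // The second input is optional, so guard against null before using it.
        if (dataFrame2 != null) {
            // union keeps duplicates across the two inputs and requires matching schemas.
            result = result.union(dataFrame2.dropDuplicates());
        }
        return result;
    }

    // Hypothetical helper that builds a three-row input with one duplicate row.
    static Dataset<Row> sampleInput(SparkSession spark) {
        StructType schema = new StructType()
                .add("id", DataTypes.IntegerType)
                .add("name", DataTypes.StringType);
        List<Row> rows = Arrays.asList(
                RowFactory.create(1, "a"),
                RowFactory.create(1, "a"),
                RowFactory.create(2, "b"));
        return spark.createDataFrame(rows, schema);
    }

    public static void main(String[] args) {
        // A local session is used here only so the sketch can run outside KNIME.
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("snippet-sketch")
                .getOrCreate();
        Dataset<Row> out = apply(spark, sampleInput(spark), null);
        out.show();
        spark.stop();
    }
}
```

The null check matters because the second input port is optional: calling any method on dataFrame2 without it would throw a NullPointerException whenever only one DataFrame is connected.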

Flow variables:
You can access input flow variables by defining them in the Input table. To define a flow variable, double-click the variable in the "Flow Variable List".

Press Ctrl+Space to open an auto-completion box with all available classes, methods, and fields. When you select a class and press Enter, an import statement is generated if it is missing.

The snippet also lets you define custom global variables and custom imports. To view the hidden editor parts, click the plus symbols in the editor.

Input
Define system input fields for the snippet text area. Every field will be populated with the data of the defined input during execution.

Additional Libraries

Add File(s)
Allows you to include local jar files.
Add KNIME URL...
Allows you to add workflow-relative jar files.

Templates

Category
Groups templates into different categories.
Apply
Overwrites the current node settings with the template settings.
Java Snippet
Preview of the template code.
Additional Libraries
Preview of the additional jars.

Input Ports

First input Spark DataFrame.
Optional second input Spark DataFrame.

Output Ports

Result Spark DataFrame.

Update Site

To use this node in KNIME, install KNIME Extension for Apache Spark core infrastructure from the following update site:
