Performs a pivot on the given Spark DataFrame/RDD, using a selected set of columns for grouping and one column for pivoting. Each combination of values in the grouping columns results in an output row; each combination of pivot value and aggregation becomes a new output column.
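The row/column layout described above can be sketched in plain Python (not the node's actual implementation; column names and data are invented for illustration):

```python
from collections import defaultdict

# Hypothetical input: grouping column "city", pivot column "year",
# and one aggregation, sum("sales").
rows = [
    {"city": "Berlin", "year": 2020, "sales": 10},
    {"city": "Berlin", "year": 2021, "sales": 20},
    {"city": "Zurich", "year": 2020, "sales": 5},
    {"city": "Berlin", "year": 2020, "sales": 7},
]

# One output row per distinct grouping value; one output column per
# combination of pivot value and aggregation.
pivoted = defaultdict(lambda: defaultdict(int))
for r in rows:
    pivoted[r["city"]][f"{r['year']}+sum(sales)"] += r["sales"]

print(dict(pivoted["Berlin"]))  # {'2020+sum(sales)': 17, '2021+sum(sales)': 20}
```

Here the grouping value "Berlin" yields a single output row, while the two distinct pivot values 2020 and 2021 each produce their own output column.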
The aggregations to perform can be specified (a) by selecting columns directly on the "Manual Aggregation" tab, (b) by a column name search pattern or regular expression on the "Pattern Based Aggregation" tab, or (c) by column type on the "Type Based Aggregation" tab. Each input column is considered only once: columns added directly on the "Manual Aggregation" tab are ignored even if their name matches a search pattern on the "Pattern Based Aggregation" tab or their type matches a type on the "Type Based Aggregation" tab. The same holds for columns added based on a search pattern; they are ignored even if they match a criterion defined on the "Type Based Aggregation" tab.
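The precedence rule between the three tabs (manual before pattern before type, each column assigned at most once) can be sketched as follows; the column names, types, and pattern are assumptions chosen for illustration:

```python
import re

# Hypothetical input columns with their types.
columns = {"price": "double", "price_min": "double", "count": "long", "name": "string"}

manual = {"price"}                    # Manual Aggregation tab
pattern = re.compile(r"price.*")      # Pattern Based Aggregation tab
type_based = {"double", "long"}       # Type Based Aggregation tab

assignment = {}
for col, ctype in columns.items():
    if col in manual:
        assignment[col] = "manual"
    elif pattern.fullmatch(col):
        assignment[col] = "pattern"   # manual columns never reach this branch
    elif ctype in type_based:
        assignment[col] = "type"
    # columns matching no tab are simply not aggregated

print(assignment)  # {'price': 'manual', 'price_min': 'pattern', 'count': 'type'}
```

Note that "price" matches the pattern and the type rule as well, but is handled by the manual tab only.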
A detailed description of the available aggregation methods can be found on the "Description" tab of the node dialog. Further information is available in the Spark documentation and the Spark API documentation.
This node requires at least Apache Spark 2.0.
The "Pivot" tab allows you to transpose the values of one input column into individual output columns. To pivot over multiple columns, use Spark SQL's concat() function to combine them into a single column before pivoting.
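The concat() workaround amounts to building one combined pivot key from several columns, roughly like this plain-Python sketch (column names and separator are invented; in Spark SQL this would be something like `SELECT *, CONCAT(country, '_', year) AS country_year FROM t`):

```python
# Combine two columns into a single pivot column before pivoting.
rows = [
    {"country": "DE", "year": 2020, "sales": 10},
    {"country": "CH", "year": 2021, "sales": 5},
]
for r in rows:
    r["country_year"] = f"{r['country']}_{r['year']}"

print([r["country_year"] for r in rows])  # ['DE_2020', 'CH_2021']
```

The node can then pivot on the single combined column "country_year" instead of two separate columns.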
In the "Manual Aggregation" tab you can select one or more columns for aggregation.
In the "Pattern Based Aggregation" tab you can assign aggregation methods to columns based on a search pattern. The pattern can be either a string with wildcards or a regular expression. Columns whose name matches the pattern but whose data type is not compatible with the selected aggregation method are ignored. Only columns that have not been selected as a group column or as an aggregation column on the "Manual Aggregation" tab are considered.
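The two pattern flavors, wildcard string and regular expression, can select the same columns; the following sketch (column names and patterns are invented) shows the equivalence using Python's fnmatch and re modules:

```python
import fnmatch
import re

columns = ["sales_q1", "sales_q2", "revenue", "name"]

# Wildcard pattern, as typed on the "Pattern Based Aggregation" tab.
wildcard_hits = [c for c in columns if fnmatch.fnmatch(c, "sales_*")]

# The equivalent regular expression.
regex_hits = [c for c in columns if re.fullmatch(r"sales_.*", c)]

print(wildcard_hits)  # ['sales_q1', 'sales_q2']
```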
The "Type Based Aggregation" tab allows you to select an aggregation method for all columns of a certain data type, e.g. to compute the mean of all numerical columns (DoubleCell). Only columns that have not been handled by the other tabs (group columns, manual aggregation, and pattern based aggregation) are considered. The data type list to choose from contains basic types, e.g. String, Double, etc., as well as all data types present in the current input table.
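A type-based rule such as "mean of all numerical columns" boils down to filtering columns by type and applying one aggregation to each match; a minimal sketch with invented column names and data:

```python
from statistics import mean

# Hypothetical table: column name -> (type, values).
table = {
    "price": ("double", [1.0, 3.0]),
    "count": ("long", [2, 4]),
    "name": ("string", ["a", "b"]),
}

# Apply mean() to every column whose type is numeric, as a
# type-based aggregation rule would; string columns are skipped.
numeric_types = {"double", "long"}
means = {col: mean(vals) for col, (t, vals) in table.items() if t in numeric_types}

print(means)  # {'price': 2.0, 'count': 3}
```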
To use this node in KNIME, install the KNIME Extension for Apache Spark (legacy) extension from the corresponding update site; a zipped version of the update site is also available for download.