Spark Lag Column

This component copies column values from preceding rows into the current row in a Spark DataFrame/RDD. The component can be used to

- make a copy of the selected column and shift its cells by I rows, so that each row receives the value from I rows earlier (I = lag interval)
- make L copies of the selected column and shift the cells of each copy by I, 2I, 3I, ... L*I rows (L = lag)
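
The two uses above can be illustrated with a minimal plain-Python sketch. It is not the component's implementation, which runs on Spark; the function name lag_columns is hypothetical, and representing a missing value as None is an assumption made for illustration.

```python
def lag_columns(values, lag=1, lag_interval=1):
    """Return `lag` shifted copies of `values`.

    Copy k (k = 1..lag) is shifted by k * lag_interval rows, so row i
    receives the value that was in row i - k * lag_interval.
    """
    n = len(values)
    copies = []
    for k in range(1, lag + 1):
        offset = k * lag_interval
        # The first `offset` rows have no preceding row to copy from.
        # Assumption: such cells are shown here as None (missing).
        padding = [None] * min(offset, n)
        copies.append(padding + values[: max(n - offset, 0)])
    return copies


values = [10, 20, 30, 40, 50]

# One copy shifted by I = 2 rows.
single = lag_columns(values, lag=1, lag_interval=2)

# L = 2 copies shifted by I = 2 and 2I = 4 rows.
multi = lag_columns(values, lag=2, lag_interval=2)
```

With these inputs, single holds one list, [None, None, 10, 20, 30], and multi adds a second list, [None, None, None, None, 10].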

Required extensions:
- KNIME Data Generation (https://hub.knime.com/knime/extensions/org.knime.features.datageneration/latest)
- KNIME Extension for Apache Spark (https://hub.knime.com/knime/extensions/org.knime.features.bigdata.spark/latest)
- KNIME Quick Forms (https://hub.knime.com/knime/extensions/org.knime.features.js.quickforms/latest)

Options

Skip incomplete rows
If checked, rows containing missing values will be removed.
Value Column
The column to create the lag for. This column has to be a numeric column.
Sort Column
The column to sort by. This column has to be a numeric or timestamp column.
Lag
The number of lagged columns to generate (L).
Lag Interval
The number of rows between consecutive lagged columns (I).
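
How the options interact can be sketched in plain Python, assuming rows are represented as dictionaries. This is only an illustration, not the component's Spark implementation: the function name spark_lag_sketch, the output column naming pattern "x(-1)", and the use of None for missing values are all assumptions.

```python
def spark_lag_sketch(rows, value_column, sort_column,
                     lag=1, lag_interval=1, skip_incomplete=False):
    """Add `lag` lagged copies of `value_column` to rows ordered by `sort_column`."""
    # Sort Column: establish the row order that "preceding rows" refers to.
    ordered = sorted(rows, key=lambda r: r[sort_column])

    result = []
    for i, row in enumerate(ordered):
        out = dict(row)
        for k in range(1, lag + 1):
            offset = k * lag_interval
            # Hypothetical column name; the real component may name it differently.
            name = f"{value_column}(-{offset})"
            # Copy the value from `offset` rows earlier, or mark it missing.
            out[name] = ordered[i - offset][value_column] if i >= offset else None
        result.append(out)

    # Skip incomplete rows: drop every row that contains a missing value.
    if skip_incomplete:
        result = [r for r in result if all(v is not None for v in r.values())]
    return result


rows = [{"t": 3, "x": 30}, {"t": 1, "x": 10}, {"t": 2, "x": 20}]
complete = spark_lag_sketch(rows, "x", "t", lag=2, lag_interval=1,
                            skip_incomplete=True)
everything = spark_lag_sketch(rows, "x", "t", lag=2, lag_interval=1)
```

In this example only the row with t = 3 has predecessors for both lagged columns, so complete contains that single row, while everything keeps all three rows with None in the cells that have no preceding value.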

Input Ports

Spark DataFrame containing at least the value column and a numeric or timestamp column to sort by.

Output Ports

Spark DataFrame with the input data and additional columns containing the values copied from preceding rows.
