Spark Normalizer

This node normalizes the values of all selected (numeric) columns.

Options

Min-max normalization
Linear transformation of all values such that the minimum and maximum in each column match the specified minimum and maximum.
Z-score normalization (Gaussian)
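Linear transformation such that the values in each column have a mean of 0.0 and a standard deviation of 1.0. The transformation does not change the shape of the distribution, so the result is Gaussian-(0,1)-distributed only if the original values were Gaussian.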
Normalization by decimal scaling
The value with the largest absolute value in a column (positive or negative) is divided by 10 repeatedly, j times in total, until its absolute value is less than or equal to 1. All values in the column are then divided by 10 to the power of j.
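
The three options can be sketched in plain Python on a single column of numbers. This is an illustration of the arithmetic only, not the node's Spark implementation: the function names and the new_min/new_max parameters are hypothetical, and the z-score sketch assumes the population standard deviation (the node may use the sample estimate instead).

```python
import math


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map values so the column minimum/maximum become new_min/new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: no spread to rescale (assumed behaviour for this sketch).
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]


def z_score(values):
    """Shift and scale values to mean 0.0 and standard deviation 1.0."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (an assumption, see the note above).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def decimal_scaling(values):
    """Divide all values by 10**j, where j is the smallest count of divisions
    by 10 that brings the largest absolute value to 1 or below."""
    largest = max(abs(v) for v in values)
    j = 0
    while largest > 1:
        largest /= 10
        j += 1
    return [v / 10 ** j for v in values]
```

For a column holding -250, 40 and 7, the largest absolute value is 250. It takes three divisions by 10 to reach 0.25, so j is 3 and every value in the column is divided by 1000.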

Input Ports

Spark DataFrame/RDD requiring normalization of some or all columns.

Output Ports

Spark DataFrame/RDD with normalized columns.
PMML document containing the normalization parameters. It can be used in the "Spark Compiled Transformations Applier" node to normalize test data in the same way as the training data.
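
The idea behind the second output port, computing the parameters once on the training data and reusing them unchanged on test data, can be sketched as follows for min-max normalization. The dictionary is only a stand-in for the PMML document, and fit_min_max and apply_min_max are hypothetical names rather than part of the node or of any PMML library.

```python
def fit_min_max(train_values, new_min=0.0, new_max=1.0):
    """Collect the parameters that describe the normalization of the training column."""
    return {
        "lo": min(train_values),
        "hi": max(train_values),
        "new_min": new_min,
        "new_max": new_max,
    }


def apply_min_max(values, params):
    """Normalize values with previously fitted parameters, without refitting."""
    span = params["hi"] - params["lo"]
    if span == 0:
        # Constant training column (assumed behaviour for this sketch).
        return [params["new_min"] for _ in values]
    scale = (params["new_max"] - params["new_min"]) / span
    return [(v - params["lo"]) * scale + params["new_min"] for v in values]


# Fit on training data, then reuse the same parameters for test data.
params = fit_min_max([10.0, 20.0, 30.0])
test_normalized = apply_min_max([15.0, 40.0], params)
```

Because the test data is scaled with the training minimum and maximum, test values outside the training range (40.0 here) end up outside the target range. That is the expected consequence of applying the same transformation to both data sets.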

Views

This node has no views.
