SMOTE

KNIME Base Nodes version 4.0.1.v201908131444 by KNIME AG, Zurich, Switzerland

This node oversamples the input data (i.e. adds artificial rows) to enrich the training data. The technique applied is SMOTE (Synthetic Minority Over-sampling Technique), introduced by Chawla et al.

Some supervised learning algorithms (such as decision trees and neural nets) tend to generalize better, i.e. achieve better classification performance, when the classes are roughly balanced. If the input data is unbalanced, for instance with only a few objects of the "active" class but many of the "inactive" class, this node adjusts the class distribution by adding artificial rows (in this example, rows for the "active" class).

The algorithm works roughly as follows: it creates synthetic rows by interpolating between a real object of a given class (in the above example "active") and one of its nearest neighbors of the same class. It picks a random point on the line segment between these two objects and uses that point's coordinates as the attributes (cell values) of the new object.
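
The interpolation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the node's actual implementation: the function name `smote_sample`, the use of Euclidean distance, and the sample data are all hypothetical, and only numeric attributes are handled.

```python
import math
import random


def smote_sample(minority_rows, k, rng):
    """Create one synthetic row from a list of numeric minority-class rows.

    Hypothetical helper for illustration; Euclidean distance is an assumption.
    """
    # Pick a real row of the minority class at random.
    base = rng.choice(minority_rows)
    # Rank the other rows by distance to the base row and keep the k nearest.
    others = [row for row in minority_rows if row is not base]
    neighbors = sorted(others, key=lambda row: math.dist(base, row))[:k]
    neighbor = rng.choice(neighbors)
    # Choose a random position on the segment between the two rows.
    gap = rng.random()
    return [b + gap * (n - b) for b, n in zip(base, neighbor)]


# Made-up "active" rows with two numeric attributes each.
active = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
rng = random.Random(42)
synthetic = [smote_sample(active, k=2, rng=rng) for _ in range(3)]
```

Because every synthetic row lies between two existing rows of the same class, it never leaves the range already spanned by that class. Passing a seeded `random.Random` makes repeated runs return the same rows, which is the same idea as a fixed seed in the node's options.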

Options

Class Column
Pick the column that contains the class information.
Nearest neighbor
The number of nearest neighbors to consider. The algorithm picks an object from the target class, randomly selects one of its nearest neighbors, and places the new synthetic example on the line between the object and that neighbor.
Oversample by
Select this option to oversample every class by the same factor. The value specifies how much synthetic data is added relative to each class's original size: a value of 2 adds two more portions per class, so if the input table contains 50 rows labeled "A", the output contains 150 rows belonging to "A" (50 original plus 100 synthetic).
Oversample minority classes
This option adds synthetic examples to all classes that are not the majority class. The output contains the same number of rows for each of the possible classes.
Enable static seed
Check this option if you want to use a seed for the random number generator. This will cause consecutive runs of the node to produce the same output data. If unchecked, each run of the node generates a new seed. Use "Draw new seed" to randomly draw a new seed.
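
To make the difference between "Oversample by" and "Oversample minority classes" concrete, the following sketch computes how many synthetic rows each class would receive under either setting. The function name `synthetic_rows_needed`, the mode strings, and the class sizes are hypothetical, and the factor is assumed to be a whole number; the node's own handling of other values is not described here.

```python
from collections import Counter


def synthetic_rows_needed(labels, mode, factor=1):
    """Return the number of synthetic rows to add per class (illustrative only)."""
    counts = Counter(labels)
    if mode == "oversample_by":
        # Every class gains `factor` times its original row count.
        return {cls: factor * n for cls, n in counts.items()}
    if mode == "minority":
        # Every class is filled up to the size of the largest class.
        largest = max(counts.values())
        return {cls: largest - n for cls, n in counts.items()}
    raise ValueError(f"unknown mode: {mode}")


# Made-up input: 50 rows of class "A" and 200 rows of class "B".
labels = ["A"] * 50 + ["B"] * 200
by_factor = synthetic_rows_needed(labels, "oversample_by", factor=2)
balanced = synthetic_rows_needed(labels, "minority")
```

With a factor of 2, class "A" gains 100 rows and ends up with 150, matching the example in the option description, while the class ratio stays unchanged. In the minority mode only "A" gains rows (150 of them), so both classes end up with 200 rows.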

Input Ports

Table containing labeled data for oversampling.

Output Ports

Oversampled data (input table with appended rows).

Installation

To use this node in KNIME, install KNIME Core from the following update site:

KNIME 4.0