Synthetic Data Generator (Classification)

This component generates example data for classification tasks based on the make_classification() function in the Python scikit-learn library.

The predictor features are drawn at random from the standard normal distribution. If desired, some of the features can be redundant or duplicated. The samples of each class are assigned to one or more clusters. It is also possible to control how far apart the classes lie in the feature space.
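
As a rough illustration of what the underlying function does, the following minimal sketch calls make_classification() directly. The parameter values are arbitrary examples chosen for this sketch, not the component's defaults.

```python
from sklearn.datasets import make_classification

# Example values only; the component's own defaults may differ.
X, y = make_classification(
    n_samples=200,           # rows to generate
    n_features=10,           # total number of feature columns
    n_informative=4,         # features that carry class information
    n_redundant=2,           # linear combinations of the informative features
    n_repeated=1,            # exact copies of informative/redundant features
    n_classes=2,
    n_clusters_per_class=2,
    class_sep=1.0,           # larger values make the classes easier to separate
    random_state=42,         # fixed seed for a reproducible result
)

print(X.shape)  # (200, 10)
print(y.shape)  # (200,)
```

X holds the feature matrix and y the integer class labels, one per sample.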

For more information, see the scikit-learn documentation:

scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html

Note: This component requires a Python environment. This blog post explains how to set up the KNIME Python extension:

knime.com/blog/setting-up-the-knime-python-extension-revisited-for-python-30-and-20

Options

Class Separation
The factor multiplying the hypercube size. Larger values spread out the clusters/classes and make the classification task easier.
Number of Redundant Features
The number of features that add no new information about the classes. These features are generated as random linear combinations of the informative features.
Number of Informative Features
The number of features that are related to the classes. For each cluster, the informative features are drawn independently from N(0, 1) and then randomly linearly combined in order to add covariance within the cluster.
Number of Duplicated Features
The number of features that are represented twice in the data. These features are drawn randomly from among the informative and redundant features.
Clusters per Class
Number of clusters assigned to each class.
Number of Classes
The number of classes to generate.
Number of Samples
The number of samples to generate.
Number of Features
The total number of features to generate. This total comprises the informative, redundant and duplicated features; any remaining features are filled with random noise.
Random Seed
The seed for dataset creation to make the output reproducible.
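
The options above appear to correspond closely to the parameters of make_classification(). The sketch below shows one plausible mapping. The helper name generate_classification_data and its argument names are hypothetical, introduced only for this illustration, and the component's actual implementation may differ.

```python
import numpy as np
from sklearn.datasets import make_classification


def generate_classification_data(
    n_samples=100,
    n_features=20,
    n_informative=2,
    n_redundant=2,
    n_duplicated=0,
    n_classes=2,
    clusters_per_class=2,
    class_separation=1.0,
    random_seed=None,
):
    """Hypothetical helper mapping the component options to scikit-learn."""
    # Informative, redundant and duplicated features must fit into the total;
    # whatever is left over is filled with random noise by scikit-learn.
    structured = n_informative + n_redundant + n_duplicated
    if structured > n_features:
        raise ValueError(
            f"{structured} informative/redundant/duplicated features "
            f"do not fit into {n_features} total features"
        )
    return make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=n_informative,
        n_redundant=n_redundant,
        n_repeated=n_duplicated,  # scikit-learn calls duplicates "repeated"
        n_classes=n_classes,
        n_clusters_per_class=clusters_per_class,
        class_sep=class_separation,
        random_state=random_seed,
    )


# The same seed should give the same data on every call.
settings = dict(
    n_samples=50, n_features=8, n_informative=3,
    n_redundant=2, n_duplicated=1, random_seed=7,
)
X1, y1 = generate_classification_data(**settings)
X2, y2 = generate_classification_data(**settings)
print(np.array_equal(X1, X2) and np.array_equal(y1, y2))
```

With these example settings, 6 of the 8 features are informative, redundant or duplicated, so the remaining 2 are noise features.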

Input Ports

This node has no input ports

Output Ports

Classification Data
