Synthetic Data Generator (Classification)

This component generates example data for classification tasks based on the make_classification() function in the Python scikit-learn library.
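The call the component is built around can be sketched directly in Python. The one-to-one mapping from dialog options to `make_classification` keyword arguments shown in the comments is an assumption about how the component passes its settings through, and the values are illustrative, not the component's defaults.

```python
# Minimal sketch: calling scikit-learn's make_classification directly.
# Assumption: each dialog option maps one-to-one onto the keyword named here.
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=100,           # Number of Samples
    n_features=5,            # Number of Features
    n_informative=3,         # Number of Informative Features
    n_redundant=1,           # Number of Redundant Features
    n_repeated=1,            # Number of Duplicated Features
    n_classes=2,             # Number of Classes
    n_clusters_per_class=1,  # Clusters per Class
    class_sep=1.0,           # Class Separation
    random_state=42,         # Random Seed
)

print(X.shape, y.shape)  # (100, 5) (100,)
```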

The informative features are drawn at random from the standard normal distribution, and the samples of each class are assigned to one or more clusters, each centred on a vertex of a hypercube. If desired, some of the features can be redundant or duplicated; any features beyond these are random noise. The class separation within the feature space can also be adjusted.
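The feature layout described above can be checked on the generated data. This sketch calls `make_classification` with `shuffle=False` only so that the columns stay grouped (informative, redundant, duplicated, noise); scikit-learn's default `shuffle=True` permutes samples and columns, and whether the component shuffles is not stated here. It relies on the default `shift=0.0` and `scale=1.0`, under which the relationships hold exactly.

```python
import numpy as np
from sklearn.datasets import make_classification

n_informative, n_redundant, n_repeated = 3, 2, 1

# shuffle=False keeps the columns in the order:
# informative | redundant | duplicated | noise
X, y = make_classification(
    n_samples=200,
    n_features=8,
    n_informative=n_informative,
    n_redundant=n_redundant,
    n_repeated=n_repeated,
    shuffle=False,
    random_state=0,
)

informative = X[:, :n_informative]
redundant = X[:, n_informative:n_informative + n_redundant]
start = n_informative + n_redundant
duplicated = X[:, start:start + n_repeated]
noise = X[:, start + n_repeated:]

# Redundant columns are linear combinations of the informative ones,
# so a least-squares fit leaves a residual at floating-point level.
coef, *_ = np.linalg.lstsq(informative, redundant, rcond=None)
max_residual = np.abs(informative @ coef - redundant).max()

# Each duplicated column copies one of the informative or redundant columns.
source = X[:, :start]
dup_matches = [
    any(np.allclose(duplicated[:, j], source[:, i]) for i in range(start))
    for j in range(n_repeated)
]

print(f"max residual: {max_residual:.2e}")
print("duplicates found:", dup_matches)
print("noise columns:", noise.shape[1])
```

The two noise columns are the `8 - 3 - 2 - 1` features left over after the informative, redundant and duplicated ones.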

For more information see the sklearn documentation:

scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html

Note: This component requires a Python environment. In this blog post we explain how to set up the KNIME Python extension:

knime.com/blog/setting-up-the-knime-python-extension-revisited-for-python-30-and-20

Options

Class Separation: The factor multiplying the hypercube size. Larger values spread out the clusters/classes and make the classification task easier.
Number of Redundant Features: The number of features that carry no information beyond the informative features. These features are generated as random linear combinations of the informative features.
Number of Informative Features: The number of features that are related to the classes. For each cluster, the informative features are drawn independently from N(0, 1) and then randomly linearly combined within the cluster to add covariance.
Number of Duplicated Features: The number of features that are represented twice in the data. These features are drawn randomly from among the informative and redundant features.
Clusters per Class: Number of clusters assigned to each class.
Number of Classes: The number of classes to generate.
Number of Samples: The number of samples to generate.
Number of Features: The total number of features to generate. These comprise the informative, redundant and duplicated features; any remaining features are filled with random noise.
Random Seed: The seed for dataset creation to make the output reproducible.
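Not every combination of these options is valid. `make_classification` raises a `ValueError` when the informative, redundant and duplicated features together exceed the total number of features, or when there are more clusters (classes times clusters per class) than the 2^n_informative hypercube vertices they are placed on. The helper `validate_options` below is hypothetical, a sketch that mirrors those two checks before generation; the last lines show that a fixed Random Seed reproduces the same data.

```python
import numpy as np
from sklearn.datasets import make_classification


def validate_options(n_features, n_informative, n_redundant, n_repeated,
                     n_classes, n_clusters_per_class):
    """Hypothetical pre-check mirroring two constraints of make_classification."""
    if n_informative + n_redundant + n_repeated > n_features:
        raise ValueError(
            "informative + redundant + duplicated features must not exceed "
            "the total number of features"
        )
    # Each cluster sits on its own hypercube vertex; there are 2**n_informative.
    if n_classes * n_clusters_per_class > 2 ** n_informative:
        raise ValueError(
            "classes * clusters per class must not exceed 2**informative features"
        )


# A valid configuration passes silently...
validate_options(n_features=10, n_informative=3, n_redundant=2, n_repeated=1,
                 n_classes=3, n_clusters_per_class=2)

# ...but 3 classes x 2 clusters do not fit on the 4 vertices of a 2-D hypercube.
try:
    validate_options(n_features=10, n_informative=2, n_redundant=2, n_repeated=1,
                     n_classes=3, n_clusters_per_class=2)
except ValueError as err:
    print("rejected:", err)

# The same seed (random_state) reproduces the same data set.
X1, y1 = make_classification(n_samples=50, random_state=7)
X2, y2 = make_classification(n_samples=50, random_state=7)
print(np.array_equal(X1, X2) and np.array_equal(y1, y2))  # True
```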

Input Ports

This node has no input ports

Output Ports

Classification Data
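The generated arrays have to become a single table before they leave the component. This is a sketch assuming the data are handed to KNIME as a pandas DataFrame; the column names `Feature 0`, `Feature 1`, ... and `Class` and the `class_` label prefix are hypothetical, and the component's actual headers may differ. Storing the label as a string is a design choice: many KNIME learner nodes expect a nominal (string) target column.

```python
import pandas as pd
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=20, n_features=4, n_informative=2,
                           n_redundant=0, random_state=1)

# Hypothetical column names for the feature columns.
table = pd.DataFrame(X, columns=[f"Feature {i}" for i in range(X.shape[1])])

# Append the class label as a string column (hypothetical name "Class").
table["Class"] = [f"class_{label}" for label in y]

print(table.shape)  # (20, 5)
print(table.columns.tolist())
```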
