Synthetic Data Generator (Multilabel Classification)

This component generates example data for a multilabel classification task based on the make_multilabel_classification() function in the Python scikit-learn library.
It generates class columns with 0s/1s that indicate the absence/presence of the respective label. The average number of labels assigned to each row can be regulated.

For more information see the sklearn documentation:

scikit-learn.org/stable/modules/generated/sklearn.datasets.make_multilabel_classification.html

Note: This component requires a Python environment. The following blog post explains how to set up the KNIME Python extension:

knime.com/blog/setting-up-the-knime-python-extension-revisited-for-python-30-and-20

Options

Number of Labels: The average number of labels per sample.
Number of Classes: The number of class columns to generate.
Number of Samples: The number of samples to generate.
Number of Features: The number of features to generate.
Random Seed: The seed for dataset creation to make the output reproducible.
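
The options above correspond closely to parameters of make_multilabel_classification(). The component's internal Python script is not shown on this page, so the following is only a minimal sketch of how such a call might look. The mapping of options to parameters, the helper name generate_multilabel_data, and the default values are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification


def generate_multilabel_data(n_labels=2, n_classes=5, n_samples=100,
                             n_features=20, seed=42):
    """Hypothetical helper mirroring the component's five options.

    Assumed mapping: Number of Labels -> n_labels, Number of Classes -> n_classes,
    Number of Samples -> n_samples, Number of Features -> n_features,
    Random Seed -> random_state.
    """
    X, Y = make_multilabel_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_classes=n_classes,
        n_labels=n_labels,
        random_state=seed,
    )
    # X: feature matrix of shape (n_samples, n_features)
    # Y: 0/1 indicator matrix of shape (n_samples, n_classes)
    return X, Y


X, Y = generate_multilabel_data()
print("features:", X.shape, "class columns:", Y.shape)
# Empirical average number of labels per row; n_labels is only the expected
# value of the underlying Poisson draw, so this will generally differ from it.
print("mean labels per row:", Y.sum(axis=1).mean())
```

Calling the helper twice with the same seed should return identical arrays, which is what the Random Seed option is for. Because scikit-learn draws the label count per sample from a Poisson distribution bounded by the number of classes, the observed average is usually close to, but not exactly, the configured Number of Labels.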

Input Ports

This node has no input ports

Output Ports

Multi-class Classification Data

Nodes

Integer Configuration (5 ×)
Column Rename (Regex) (2 ×)
Component Input (1 ×)
Component Output (1 ×)
Conda Environment Propagation (1 ×)