t-SNE

t-SNE (t-distributed stochastic neighbor embedding) is a manifold learning technique that learns low-dimensional embeddings for high-dimensional data. It is most often used for visualization because it exploits the local relationships between data points and can therefore capture non-linear structures in the data. Unlike other dimensionality reduction techniques such as PCA, a learned t-SNE model cannot be applied to new data. The t-SNE algorithm can be roughly summarized in two steps:

  1. Create a probability distribution capturing the relationships between points in the high-dimensional space
  2. Find a low-dimensional embedding whose probability distribution resembles the high-dimensional one as closely as possible
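
The two steps above can be sketched in a few lines of Python. This is a simplified illustration, not the code used by this node: it assumes a single fixed Gaussian bandwidth (sigma) for all points, whereas t-SNE tunes one bandwidth per point from the perplexity, and it only evaluates the objective for two hand-written candidate embeddings instead of optimizing it. All function names and data are hypothetical.

```python
import math

def squared_distances(points):
    """Pairwise squared Euclidean distances as a nested list."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]

def normalize(weights):
    """Scale a weight matrix so that all entries sum to 1."""
    total = sum(map(sum, weights))
    return [[w / total for w in row] for row in weights]

def gaussian_affinities(points, sigma=1.0):
    """Step 1 (simplified): Gaussian similarities with one fixed sigma."""
    d2 = squared_distances(points)
    n = len(points)
    return normalize([[0.0 if i == j else math.exp(-d2[i][j] / (2 * sigma ** 2))
                       for j in range(n)] for i in range(n)])

def student_t_affinities(embedding):
    """Low-dimensional similarities from a heavy-tailed Student-t kernel."""
    d2 = squared_distances(embedding)
    n = len(embedding)
    return normalize([[0.0 if i == j else 1.0 / (1.0 + d2[i][j])
                       for j in range(n)] for i in range(n)])

def kl_divergence(p, q):
    """Step 2 objective: KL(P || Q) over all pairs with p > 0."""
    return sum(pij * math.log(pij / qij)
               for prow, qrow in zip(p, q)
               for pij, qij in zip(prow, qrow) if pij > 0)

# Two tight clusters in 3-D (hypothetical data).
points = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (5.0, 5.0, 5.0), (5.5, 5.0, 5.0)]
p = gaussian_affinities(points)

# Candidate 2-D embeddings: one keeps the clusters together, one mixes them.
good = [(0.0, 0.0), (0.5, 0.0), (10.0, 0.0), (10.5, 0.0)]
bad = [(0.0, 0.0), (10.0, 0.0), (0.5, 0.0), (10.5, 0.0)]

kl_good = kl_divergence(p, student_t_affinities(good))
kl_bad = kl_divergence(p, student_t_affinities(bad))
print(kl_good < kl_bad)
```

The embedding that preserves the neighborhoods yields the lower divergence, which is the quantity the optimization in step 2 tries to minimize.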

Because t-SNE works directly on the data points, it is sensitive to the scale of the input features. For best results, normalize the features with the Normalizer node first. For further details, check out this great blog post or the original paper. The implementation of this node is based on Smile (Statistical Machine Intelligence and Learning Engine).
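
The sensitivity to feature scale can be seen in a small sketch. It assumes min-max scaling as the normalization method, which is only one possible choice, and uses made-up rows; the helper name is hypothetical.

```python
import math

def min_max_normalize(rows):
    """Rescale every column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [tuple((v - lo) / sp if sp else 0.0
                  for v, lo, sp in zip(row, lows, spans))
            for row in rows]

# Hypothetical rows: (height in metres, income in dollars).
rows = [(1.60, 30000.0), (1.95, 30500.0), (1.61, 60000.0)]

# Raw distances: the income column dominates because of its larger scale.
raw_height_diff = math.dist(rows[0], rows[1])  # rows differ mainly in height
raw_income_diff = math.dist(rows[0], rows[2])  # rows differ mainly in income

# After normalization both features contribute on a comparable scale.
scaled = min_max_normalize(rows)
scaled_height_diff = math.dist(scaled[0], scaled[1])
scaled_income_diff = math.dist(scaled[0], scaled[2])

print(raw_income_diff / raw_height_diff)
print(scaled_income_diff / scaled_height_diff)
```

On the raw data, a large difference in height barely registers next to a difference in income. After scaling, both differences carry similar weight in the distances that t-SNE works with.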

Options

Columns
Select the columns that t-SNE should use, i.e. the original features. Currently, only numerical columns are supported.
Dimension(s) to reduce to
The number of dimensions of the target embedding (for visualization, typically 2 or 3).
Iterations
The number of learning iterations to perform. Too few iterations might result in a poor embedding, while too many iterations make training take a long time.
Learning rate
The learning rate to use, i.e. how much the embedding changes in one iteration. A learning rate that is too small requires more iterations to reach a good embedding, while a learning rate that is too large can result in unstable embeddings that change strongly between iterations.
Perplexity
Informally, the perplexity is the number of neighbors considered for each data point. Small perplexities focus more on local structure, while larger perplexities take more global relationships into account. Typical values for the perplexity lie between 5 and 50.
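
The "number of neighbors" reading can be made concrete. In the original t-SNE paper, the perplexity of a point's neighbor distribution is defined as 2 raised to its Shannon entropy (in bits), and the algorithm picks each point's kernel bandwidth so that this value matches the setting. The sketch below only evaluates that definition on two hand-written distributions; the function name is hypothetical.

```python
import math

def perplexity(probs):
    """Perplexity 2**H of a discrete distribution, H = Shannon entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy

# Five equally likely neighbors behave like exactly five neighbors.
uniform_over_5 = [0.2] * 5

# One dominant neighbor behaves like fewer than two effective neighbors.
peaked = [0.9, 0.025, 0.025, 0.025, 0.025]

print(perplexity(uniform_over_5))
print(perplexity(peaked))
```

A distribution spread evenly over k neighbors has perplexity k, which is why the setting can be read as an effective neighborhood size.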
Remove original data columns
Check this box if you want to remove the columns used to learn the embedding.
Fail if missing values are encountered
If this box is checked, the node fails if it encounters a missing value in one of the columns used for learning. Otherwise, rows containing missing values in the learning columns will be ignored during learning and the corresponding embedding consists of missing values.
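
The two behaviors described above can be sketched as follows. This is an illustration of the described logic only, not the node's code: missing values are modeled as None, and the learner passed in is a hypothetical stand-in rather than t-SNE.

```python
def embed_with_missing(rows, learn_embedding, dims=2, fail_on_missing=False):
    """Learn on complete rows only; rows with missing values get a missing embedding."""
    complete = [i for i, row in enumerate(rows) if None not in row]
    if fail_on_missing and len(complete) < len(rows):
        raise ValueError("missing value encountered in a learning column")
    learned = learn_embedding([rows[i] for i in complete], dims)
    result = [[None] * dims for _ in rows]
    for i, embedded in zip(complete, learned):
        result[i] = list(embedded)
    return result

def first_columns(rows, dims):
    """Hypothetical stand-in for the learner: keeps the first `dims` values per row."""
    return [list(row[:dims]) for row in rows]

rows = [(1.0, 2.0, 3.0), (4.0, None, 6.0), (7.0, 8.0, 9.0)]
print(embed_with_missing(rows, first_columns))
```

With the option unchecked, the second row is skipped during learning and its embedding consists of missing values. With the option checked, the same input raises an error instead.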
Seed
Specifies a static seed for reproducible results. NOTE: The Smile library is reproducible if and only if the VM argument smile.threads is 1. The node sets this property during startup if it is not already set, but it does not overwrite existing VM arguments. If you set smile.threads to anything other than 1 in your knime.ini, that value is kept, and results will not be reproducible even if a static seed is provided.
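
If you want to set the property explicitly, the entry in knime.ini might look like the fragment below. This assumes smile.threads is passed as a standard Java system property (the -D syntax) and placed on its own line below the -vmargs line; check your own knime.ini for an existing entry before adding one.

```
-Dsmile.threads=1
```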

Input Ports

Input port for the data for which a low-dimensional embedding should be learned

Output Ports

The low-dimensional embedding

Views

This node has no views
