t-SNE is a manifold learning technique that learns low-dimensional embeddings for high-dimensional data. It is most often used for visualization because it exploits the local relationships between datapoints and can therefore capture nonlinear structures in the data. Unlike other dimension reduction techniques such as PCA, a learned t-SNE model can't be applied to new data. The t-SNE algorithm can be roughly summarized in two steps:

- Create a probability distribution capturing the relationships between points in the high dimensional space
- Find a low dimensional space that resembles the probability distribution as well as possible
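
The two steps above can be illustrated with a minimal Python sketch. It is not the node's implementation: it assumes a fixed Gaussian bandwidth `sigma` instead of the perplexity-driven search used by t-SNE, it skips the gradient-descent optimization entirely, and all function names are hypothetical. It only shows how the high dimensional and low dimensional similarity distributions are built and compared via the Kullback-Leibler divergence.

```python
import math


def pairwise_sq_dists(points):
    """Squared Euclidean distance between every pair of points."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]


def high_dim_affinities(points, sigma=1.0):
    """Step 1: joint probabilities P from a Gaussian kernel.

    Assumption: one fixed sigma for all points. Real t-SNE tunes a
    separate sigma per point so that it matches the chosen perplexity.
    """
    n = len(points)
    d = pairwise_sq_dists(points)
    cond = []
    for i in range(n):
        weights = [0.0 if i == j else math.exp(-d[i][j] / (2 * sigma ** 2))
                   for j in range(n)]
        total = sum(weights)
        cond.append([w / total for w in weights])
    # Symmetrize the conditional probabilities into a joint distribution.
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)]
            for i in range(n)]


def low_dim_affinities(embedding):
    """Step 2 (part): joint probabilities Q from a Student-t kernel."""
    n = len(embedding)
    d = pairwise_sq_dists(embedding)
    weights = [[0.0 if i == j else 1.0 / (1.0 + d[i][j]) for j in range(n)]
               for i in range(n)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def kl_divergence(p, q):
    """Mismatch between P and Q; t-SNE searches for an embedding minimizing it."""
    return sum(pij * math.log(pij / qij)
               for p_row, q_row in zip(p, q)
               for pij, qij in zip(p_row, q_row)
               if pij > 0)


# Two tight clusters in 3D and two candidate 2D layouts (made-up data).
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0), (5.1, 5.0, 5.0)]
clustered = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
scrambled = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.1, 5.0)]

P = high_dim_affinities(points)
kl_clustered = kl_divergence(P, low_dim_affinities(clustered))
kl_scrambled = kl_divergence(P, low_dim_affinities(scrambled))
print(kl_clustered, kl_scrambled)
```

The layout that keeps neighboring points together yields the lower divergence, which is the quantity the iterative optimization tries to reduce.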

- Columns
- Select the columns to be used by t-SNE, i.e. the original features. Note that currently only numerical columns are supported.
- Dimension(s) to reduce to
- The number of dimensions of the target embedding (for visualization typically 2 or 3).
- Iterations
- The number of learning iterations to be performed. Too few iterations might result in a bad embedding, while too many iterations take a long time to train.
- Theta
- Controls the tradeoff between runtime and accuracy of the Barnes-Hut approximation algorithm for t-SNE. Lower values result in a more accurate approximation at the cost of higher runtimes and memory demands. A theta of zero results in the originally proposed t-SNE algorithm. However, for most datasets a theta of 0.5 does not result in a perceivable loss of quality.
- Perplexity
- Informally, the perplexity is the number of neighbors for each datapoint. Small perplexities focus more on local structure, while larger perplexities take more global relationships into account. In most cases, values in the range [5, 50] are sufficient. *Note:* The perplexity must be less than or equal to *(Number of rows - 1) / 3*.
- Number of threads
- Number of threads used for parallel computation. The default is set to the number of cores your computer has and usually doesn't require tuning. Note that no parallelization is used if theta is zero because the exact t-SNE algorithm isn't parallelizable.
- Remove original data columns
- Check this box if you want to remove the columns used to learn the embedding.
- Fail if missing values are encountered
- If this box is checked, the node fails if it encounters a missing value in one of the columns used for learning. Otherwise, rows containing missing values in the learning columns will be ignored during learning and the corresponding embedding consists of missing values.
- Seed
- Allows you to specify a static seed to enable reproducible results.
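
To make the Perplexity option more concrete, the following Python sketch computes the perplexity of a discrete distribution as 2 raised to its Shannon entropy (the definition used in the original t-SNE formulation) and checks a chosen value against the *(Number of rows - 1) / 3* bound from the note above. The helper names are hypothetical and are not part of the node or of KNIME.

```python
import math


def perplexity(probabilities):
    """Perplexity of a discrete distribution: 2 ** (Shannon entropy in bits)."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return 2 ** entropy


def max_perplexity(n_rows):
    """Upper bound from the note above: (number of rows - 1) / 3."""
    return (n_rows - 1) / 3


def check_perplexity(value, n_rows):
    """Return the value if it respects the bound, otherwise raise ValueError."""
    limit = max_perplexity(n_rows)
    if value > limit:
        raise ValueError(
            f"perplexity {value} exceeds the limit {limit:.2f} for {n_rows} rows"
        )
    return value


# A distribution spread evenly over 4 neighbors has a perplexity of 4,
# which is why perplexity reads as an "effective number of neighbors".
uniform_over_four = perplexity([0.25, 0.25, 0.25, 0.25])

# With 100 rows the bound is (100 - 1) / 3 = 33, so 30 is accepted.
accepted = check_perplexity(30, n_rows=100)
```

A distribution concentrated on fewer neighbors gives a smaller perplexity, which matches the informal reading that small values emphasize local structure.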

- This node has no views

- 01_BERT_Sentiment_Analysis (KNIME Hub)
- 02_PCA_t-SNE (KNIME Hub)
- 02_Techniques_for_Dimensionality_Reduction (KNIME Hub)
- Analyzing Breaking Bad subtitles with Redfield NLP nodes (KNIME Hub)
- Digits8x8Example (KNIME Hub)

To use this node in KNIME, install the extension KNIME Statistics Nodes (Labs) from the update site below, following the NodePit Product and Node Installation Guide:

v4.5

A zipped version of the software site can be downloaded here.
