Example 1 - Word Embeddings (PubMed Docs)

Distances on Word Embeddings

Here we use word embeddings instead of one-hot encoding, trained with a Word2Vec Learner node. The hidden layer size is set to 10, which produces embeddings of very low dimensionality.

The output of the Word2Vec Learner node is a model. The Vocabulary Extractor node extracts the words from the model vocabulary and provides their embeddings in the form of collections. The collection items are split into separate columns with a Split Collection Column node, and the distances between the word embedding vectors are then calculated.
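The distance step can be illustrated outside the workflow with a minimal Python sketch. It assumes Euclidean distance (the distance function configured in the workflow is not stated here), and the words and 3-dimensional vectors are hypothetical toy values rather than output of the trained model, which uses 10 dimensions.

```python
import math
from itertools import combinations

# Hypothetical toy embeddings (assumption: 3 dimensions for brevity,
# the workflow itself produces 10-dimensional vectors).
embeddings = {
    "tumor": [0.9, 0.1, 0.0],
    "cancer": [0.8, 0.2, 0.1],
    "virus": [0.0, 0.9, 0.7],
}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_distances(emb):
    """Return {(word_a, word_b): distance} for every unordered word pair."""
    return {
        (w1, w2): euclidean(emb[w1], emb[w2])
        for w1, w2 in combinations(sorted(emb), 2)
    }


pairs = pairwise_distances(embeddings)

# List the pairs from closest to farthest.
for (w1, w2), d in sorted(pairs.items(), key=lambda kv: kv[1]):
    print(f"{w1:>8} - {w2:<8} {d:.3f}")
```

The dictionary keyed by word pair plays roughly the role of the pair list that follows the distance matrix in the workflow: one row per unordered pair of words.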

At the end, n selected words are visualized on a scatter plot to show how semantically related words lie close together across the embedding coordinates. The String Input node lets you enter one selected word and retrieve the distances from that word to all other words. Smaller distances should correspond to words that are closer in context or in meaning.
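The lookup for a single selected word can be sketched in Python as follows. This is an illustration under assumptions, not the workflow's implementation: the function name `distances_from`, the toy vectors, and the choice of Euclidean distance are all hypothetical.

```python
import math

# Hypothetical toy embeddings (assumption: not taken from the trained model).
embeddings = {
    "tumor": [0.9, 0.1, 0.0],
    "cancer": [0.8, 0.2, 0.1],
    "mouse": [0.5, 0.4, 0.2],
    "virus": [0.0, 0.9, 0.7],
}


def distances_from(word, emb):
    """Return (other_word, distance) pairs for `word`, closest first."""
    if word not in emb:
        raise KeyError(f"'{word}' is not in the vocabulary")
    ref = emb[word]
    result = [
        (other, math.dist(ref, vec))  # math.dist: Euclidean, Python 3.8+
        for other, vec in emb.items()
        if other != word
    ]
    return sorted(result, key=lambda pair: pair[1])


ranked = distances_from("tumor", embeddings)
for other, d in ranked:
    print(f"{other:<8} {d:.3f}")
```

Sorting by distance puts the words expected to be closest in context or meaning at the top of the list.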

In this workflow we train a Word2Vec model on 300 scientific articles from PubMed. One set of articles has been extracted using the query “mouse cancer” and one set using the query “human AIDS”. The workflow trains a vector representation for each term using Word2Vec, calculates the distances between the word embeddings (upper branch), and visualizes the word embedding vectors (lower branch).

[Workflow diagram. Input: Pubmed_Articles.csv. Main chain: Reading Data, Pre-processing (cleaning, stemming, tag filtering), Word2Vec Learner (10 hidden units), Vocabulary Extractor, Split Collection Column. Upper branch: Distance Matrix Calculate and Distance Matrix Pair Extractor, used both for the n selected words and for the distances from a specific word. Lower branch: Table Creator and Reference Row Filter (only the n selected words), Bag Of Words Creator, TF (relative term frequency in each document), DF and GroupBy (number of documents that contain each term), Term To String, Joiner, Color Manager, RowID, Bubble Chart (Plotly).]
