01_Topic Detection Analysis_Training

Topic Detection Analysis Training

This workflow applies the Topic Extractor (Parallel LDA) node to detect 10 topics and describe each of them with 5 keywords. Latent Dirichlet Allocation (LDA) is a generative probabilistic model, used here as an unsupervised algorithm, that finds the top n topics in a document collection and describes each one by its m most relevant keywords. LDA represents documents as random mixtures over latent topics, where each topic is characterized by a distribution over words (Blei, Ng and Jordan, 2003). In KNIME Analytics Platform, LDA is available through the Topic Extractor (Parallel LDA) node, which is part of the Text Processing extension.
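To make the idea of "n topics, each described by m keywords" concrete, here is a minimal pure-Python sketch of LDA fitted by collapsed Gibbs sampling. It is an illustration only, not the implementation behind the Topic Extractor (Parallel LDA) node: the function names (`fit_lda`, `top_keywords`, `doc_mixtures`), the toy documents, and the hyperparameter values `alpha` and `beta` are all assumptions made for this example.

```python
import random
from collections import Counter


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA on tokenized documents with collapsed Gibbs sampling (toy scale)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # tokens of doc d assigned to topic k
    topic_word = [Counter() for _ in range(n_topics)]  # times word w is assigned to topic k
    topic_total = [0] * n_topics                       # tokens assigned to topic k
    assign = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic: (topic share in doc) * (word share in topic).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_keywords(topic_word, m):
    """Return up to m most frequent words per topic."""
    return [[w for w, c in counts.most_common(m) if c > 0] for counts in topic_word]


def doc_mixtures(doc_topic, alpha=0.1):
    """Smoothed topic proportions per document (the 'mixture over topics')."""
    mixtures = []
    for counts in doc_topic:
        total = sum(counts) + alpha * len(counts)
        mixtures.append([(c + alpha) / total for c in counts])
    return mixtures


# Toy corpus (hypothetical); the workflow itself uses 10 topics and 5 keywords.
docs = [
    "budget vote senate budget bill".split(),
    "senate bill vote committee".split(),
    "flight schedule hotel travel".split(),
    "travel hotel flight meeting schedule".split(),
]
doc_topic, topic_word = fit_lda(docs, n_topics=2)
keywords = top_keywords(topic_word, m=3)
mixtures = doc_mixtures(doc_topic)
for k, words in enumerate(keywords):
    print(f"topic {k}: {words}")
```

Each row of `mixtures` sums to 1 and plays the role of a document's mixture over latent topics, while each entry of `keywords` summarizes a topic's distribution over words by its most frequent terms.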

The workflow as a whole constitutes the model training stage. In addition to the Topic Extractor (Parallel LDA) node, it includes steps for importing, cleaning, and transforming the data.
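The cleaning and transformation steps are not spelled out in the text, so the following is only a sketch of what such a stage commonly looks like before topic extraction: lowercasing, removing punctuation and digits, and dropping stop words and very short tokens. The `preprocess` function, the stop-word list, and the sample emails are hypothetical and do not reproduce the workflow's actual nodes. No stemming is applied in this sketch.

```python
import re

# Illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def preprocess(text, min_length=3):
    """Turn raw text into a list of cleaned tokens."""
    text = text.lower()
    # Replace everything except lowercase letters and whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()
    # Drop stop words and tokens shorter than min_length characters.
    return [t for t in tokens if len(t) >= min_length and t not in STOP_WORDS]


# Hypothetical sample emails standing in for the imported data.
emails = [
    "The meeting is on Monday, 10am!",
    "Budget vote scheduled for the Senate in March.",
]
documents = [preprocess(e) for e in emails]
print(documents)
```

The resulting token lists are the kind of input a topic model expects: one cleaned bag of words per document.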

Workflow annotation title: Digging Up Hillary Clinton's Past: An Interactive Tour Of Her Email Dataset

Workflow canvas labels: Data collection and manipulation, Data Collection, Data Manipulation, Strings to Document, Get only > 3 emails, Preprocessing, Preparing Documents for Topic Extraction without stemming, Further Data Processing, Topic Extraction, Topic Extractor (Parallel LDA), 5 words for 10 topics, Deployment, Joiner, Saving Data for Deployment
