Text clustering of Wikipedia articles. Twelve Wikipedia articles, three each on Philosophy, Religion, Law and Quantum Mechanics, were randomly selected, copied manually from the Internet and saved as twelve text files (*.txt) in one folder. The workflow reads these files, preprocesses the text and then performs hierarchical clustering. The clustering is perfect, although the sample is only twelve files: at the lowest level of the dendrogram, the articles on each subject cluster together first. Any distance measure other than 'cosine' reduces accuracy drastically.
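For readers who want to see the idea outside KNIME, here is a minimal Python sketch of hierarchical clustering with cosine distance. It is not the workflow itself. The four toy sentences are invented stand-ins for the articles, and the TF-IDF weighting, the average linkage and all function names are assumptions, since the description above does not say which the workflow uses.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    counts_per_doc = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for counts in counts_per_doc for term in counts)
    vectors = []
    for counts in counts_per_doc:
        total = sum(counts.values())
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse vectors; 1.0 if either is all zeros."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of document indices."""
    dist = [[cosine_distance(a, b) for b in vectors] for a in vectors]
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two candidate clusters.
                pair_sum = sum(dist[p][q] for p in clusters[i] for q in clusters[j])
                d = pair_sum / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy stand-ins for the articles (invented for illustration only).
docs = [
    "the court ruled that the contract was void under the statute",
    "the statute gives the court power to enforce the contract",
    "the electron wave function collapses when the photon is measured",
    "the photon and the electron are described by a wave function",
]

clusters = agglomerative(tfidf_vectors(docs), n_clusters=2)
print(clusters)
```

In this toy example the two legal sentences and the two physics sentences share no informative terms across topics, so their cosine distance is at its maximum and each topic merges internally first. That mirrors the behaviour described above, where articles on the same subject join at the lowest level of the dendrogram.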
To use this workflow in KNIME, download it from the URL below and open it in KNIME:
Download Workflow