Text clustering of Wikipedia articles
Last amended: 1st Nov, 2019

Text clustering of Wikipedia articles. Twelve Wikipedia articles, three each on Philosophy, Religion, Law and Quantum Mechanics, were manually copied from the Internet and saved as twelve text files in one folder. These twelve text files were then read, text-processed and finally clustered with hierarchical clustering. The clustering is perfect: at the lowest level of the dendrogram, the articles on each subject cluster together first. Any distance measure other than 'cosine' reduces accuracy drastically.

Workflow annotations: Read files & convert to Documents; Text cleaning; Modeling.

Node labels: Read 12 Wikipedia articles saved to .txt files; use filenames as categories; lowercase; rel tf; calculate tf * idf; vector value = tfidf; average linkage; cosine distance; doc category (filenames) vs clusters; extract doc category values as another column; replace RowIDs by another column.

Nodes used: Read multiple text files from a folder, Strings To Document, Case Converter, Punctuation Erasure, Number Filter, Stop Word Filter, Bag Of Words Creator, TF, IDF, Column Expressions, Document Vector, Document Viewer, Numeric Distances, Hierarchical Clustering (DistMatrix), Hierarchical Cluster View, Category To Class, RowID.
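For readers who want to see the same idea outside KNIME, the steps described above (lowercasing, punctuation, number and stop-word removal, relative tf * idf, cosine distance, average linkage) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reproduction of the workflow: the four one-sentence documents, the tiny stop-word list, the smoothed idf formula log(1 + N/df) and all function names are made up for the example, and the workflow's actual node settings are not given in the description.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not the one used by the workflow.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "by", "has"}

def clean(text):
    """Lowercase, keep alphabetic runs only (drops punctuation and digits), remove stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def tfidf_vectors(texts):
    """Return one sparse {term: relative tf * idf} vector per document."""
    bags = [Counter(clean(t)) for t in texts]
    n_docs = len(bags)
    doc_freq = Counter(term for bag in bags for term in bag)
    vectors = []
    for bag in bags:
        total = sum(bag.values())
        # Assumption: idf = log(1 + N / df), one common smoothed variant.
        vectors.append({term: (count / total) * math.log(1 + n_docs / doc_freq[term])
                        for term, count in bag.items()})
    return vectors

def cosine_distance(u, v):
    """1 - cosine similarity of two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return 1.0 - dot / norm if norm else 1.0

def average_linkage(vectors, k):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest average pairwise cosine distance until k clusters remain."""
    dist = [[cosine_distance(u, v) for v in vectors] for u in vectors]
    clusters = [[i] for i in range(len(vectors))]

    def avg(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: avg(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Toy stand-ins for the article files; the labels play the role of filenames.
labels = ["law_1", "law_2", "quantum_1", "quantum_2"]
texts = [
    "The court applies the statute and the judge rules on the contract.",
    "A judge in court interprets the statute of contract law.",
    "The electron wave function gives the quantum state of a particle.",
    "Quantum particle states: the electron has a wave function.",
]

vectors = tfidf_vectors(texts)
clusters = average_linkage(vectors, k=2)
print([[labels[i] for i in c] for c in clusters])
# → [['law_1', 'law_2'], ['quantum_1', 'quantum_2']]
```

Cosine distance depends only on the angle between the tf-idf vectors, so documents of very different lengths can still be close when they use the same vocabulary in similar proportions. That property is one plausible reason the description reports that other distance measures reduce accuracy.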
