
textMining_topicsRanks_RF_v3

Input:
a) a table containing the BOW feature vector representation
b) a table containing the detected LDA topics

Process:
- This workflow generates sub-datasets derived from the Bag-of-Words vector representation of the training dataset. Each sub-dataset corresponds to a specific topic and consists mainly of terms/words belonging to that topic.
- The workflow then applies a machine learning algorithm (Random Forest) with an internal stratified k-fold cross-validation to each sub-dataset to assign a score to each topic.
- Finally, it builds a Random Forest model for the top-ranked topics in cumulative order and reports the cumulative performance of the model.

Output:
Excel files plotting the performance results over the different constructed feature sets (i.e., the top 1 ranked topic, the top 2 ranked topics, and so on up to the top 10 ranked topics).

Workflow nodes: Table Reader (deprecated), Partitioning, Counting Loop Start (deprecated), Integer Input, Clean Files, OutPutFiles, ClassiciationBasedTopicsRanks, Loop End (3 ports) (deprecated), Normalizer, Parmaters andFucntion Perfroamnce, ReadTopics, Value Counter, Save_results, Clean Files, SampleByRatio, Column Rename. Custom node label: #itr.
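The three process steps can be illustrated outside KNIME with a small Python sketch. This is not the workflow's implementation. All function names, the toy data, and the use of accuracy as the topic score are assumptions made for illustration. To stay within the standard library, a nearest-centroid classifier stands in for Random Forest. The learner is passed in as a function, so a wrapper around a real Random Forest (for example scikit-learn's RandomForestClassifier) could be substituted.

```python
from collections import defaultdict

def sub_dataset(bow_rows, topic_terms):
    """Keep only the BOW columns whose term belongs to the given topic(s)."""
    keep = set(topic_terms)
    return [{t: v for t, v in row.items() if t in keep} for row in bow_rows]

def stratified_folds(labels, k):
    """Split row indices into k folds, round-robin within each class."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds

def cv_score(rows, labels, k, fit_predict):
    """Mean accuracy over k stratified folds (accuracy is an assumed metric)."""
    accs = []
    for test_idx in stratified_folds(labels, k):
        if not test_idx:
            continue
        held_out = set(test_idx)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        preds = fit_predict([rows[i] for i in train_idx],
                            [labels[i] for i in train_idx],
                            [rows[i] for i in test_idx])
        hits = sum(p == labels[i] for p, i in zip(preds, test_idx))
        accs.append(hits / len(test_idx))
    return sum(accs) / len(accs) if accs else 0.0

def rank_topics(bow_rows, labels, topics, k, fit_predict):
    """Score each topic on its own sub-dataset; return (topic, score), best first."""
    scored = [(name, cv_score(sub_dataset(bow_rows, terms), labels, k, fit_predict))
              for name, terms in topics.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def cumulative_performance(bow_rows, labels, topics, ranked_names, k, fit_predict):
    """Evaluate the top 1, top 2, ... ranked topics as growing feature sets."""
    terms, results = [], []
    for n, name in enumerate(ranked_names, start=1):
        terms.extend(topics[name])
        results.append((n, cv_score(sub_dataset(bow_rows, terms), labels, k, fit_predict)))
    return results

def centroid_fit_predict(train_rows, train_labels, test_rows):
    """Stand-in learner (NOT Random Forest): nearest class centroid."""
    sums, counts = defaultdict(lambda: defaultdict(float)), defaultdict(int)
    for row, y in zip(train_rows, train_labels):
        counts[y] += 1
        for t, v in row.items():
            sums[y][t] += v
    centroids = {y: {t: s / counts[y] for t, s in sums[y].items()} for y in counts}

    def dist(row, c):
        return sum((row.get(t, 0.0) - c.get(t, 0.0)) ** 2 for t in set(row) | set(c))

    return [min(sorted(centroids), key=lambda y: dist(r, centroids[y])) for r in test_rows]

# Hypothetical toy data: four documents, three "topics" given as term lists.
topics = {"sports": ["goal", "team"], "politics": ["vote", "law"], "filler": ["the", "and"]}
bow = [
    {"goal": 2, "team": 1, "vote": 0, "law": 0, "the": 1, "and": 1},
    {"goal": 1, "team": 2, "vote": 0, "law": 0, "the": 1, "and": 1},
    {"goal": 0, "team": 0, "vote": 2, "law": 1, "the": 1, "and": 1},
    {"goal": 0, "team": 0, "vote": 1, "law": 2, "the": 1, "and": 1},
]
labels = ["sport", "sport", "politics", "politics"]

ranking = rank_topics(bow, labels, topics, k=2, fit_predict=centroid_fit_predict)
ranked_names = [name for name, _ in ranking]
curve = cumulative_performance(bow, labels, topics, ranked_names, k=2,
                               fit_predict=centroid_fit_predict)
print(ranking)
print(curve)
```

In this toy setup the uninformative "filler" topic scores lowest, and the cumulative curve has one entry per prefix of the ranking. The workflow itself reports results for the top 1 through top 10 ranked topics.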
