
Thesis 3

Text mining

Db preparation

Final Db exportation

Scopus

IEEE Explorer

Google Scholar

Google Scholar automated from citation to full data

Data preparation for topic models. Preprocessing, n-grams, and the exclusion of reviews with a small number of terms can be adjusted as desired.
Obtain topic solution. Users can test more than one topic solution and choose based on interpretability.
Find optimal k. Other methods can be implemented in KNIME (https://hub.knime.com/angusveitch/spaces/Public/latest/TopicKR~HRMp6v9Ip_ODMIob). Other topic model algorithms that can be used in R or Python include structural topic models (STM) and correlated topic models (CTM).
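Reading the elbow off a perplexity curve can also be approximated numerically. The sketch below is an assumption about how one might do this, not the method used in this workflow: it picks the k with the largest discrete second difference of perplexity. The function name and the perplexity values are hypothetical.

```python
def elbow_k(perplexity_by_k):
    """Return the k where the perplexity curve bends most sharply.

    perplexity_by_k: dict mapping number of topics k -> perplexity.
    Assumes evenly spaced k values (an assumption of this sketch).
    """
    ks = sorted(perplexity_by_k)
    if len(ks) < 3:
        raise ValueError("need at least three k values to locate an elbow")
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        # Discrete second difference: large and positive where a steep
        # drop in perplexity flattens out.
        bend = (perplexity_by_k[prev_k]
                - 2 * perplexity_by_k[k]
                + perplexity_by_k[next_k])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Hypothetical perplexity values, for illustration only.
example = {2: 950.0, 3: 820.0, 4: 700.0, 5: 690.0, 6: 684.0, 7: 680.0}
print(elbow_k(example))
```

A numeric elbow is only a starting point; the final choice of k should still rest on how interpretable the topics are.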

Python LDA visualization

Word cloud visualization

Data preparation

Theta preparation
Doc length preparation
Phi preparation
Vocabulary preparation

Visualization

Checker box

If the Python environment is working, use this section

Bibliography

Import the data if it is not in Excel format

Replacing missing values with 0
Missing Value
Converting strings to documents
1. Doc Creation
If there are only a few papers, or the premium API is used, connect the first output of this Joiner to the second input of the Concatenate node (and exclude the Google Scholar box output)
Joiner
Keeping only the articles
Row Filter
Renaming the columns to have uniform names across the Dbs
Column Renamer
Column Filter
Last filtering: in this case, 0 papers were removed
5. Filter Reviews with Less than 10 words
Putting the Dbs in the same table
Concatenate
Bar Chart
Eliminating the duplicates
Duplicate Row Filter
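The duplicate removal step can be sketched in plain Python. This is an illustration under assumptions, not the node's implementation: the Duplicate Row Filter node compares the selected columns as they are, whereas this sketch additionally normalizes titles (case and punctuation) so that the same paper exported from two databases is caught. The `Title` and `Db` column names and the sample records are hypothetical.

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def drop_duplicates(records, key="Title"):
    """Keep the first record for each normalized title.

    `records` is a list of dicts; the "Title" key is a hypothetical
    column name.
    """
    seen = set()
    unique = []
    for record in records:
        fingerprint = normalize_title(record[key])
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(record)
    return unique


# Made-up records, for illustration only.
papers = [
    {"Title": "Text Mining for Reviews", "Db": "Scopus"},
    {"Title": "text mining for reviews.", "Db": "IEEE"},
    {"Title": "Topic Models in Practice", "Db": "Google Scholar"},
]
unique_papers = drop_duplicates(papers)
print(len(unique_papers))
```

Keeping the first occurrence means the order of the concatenated tables decides which database's record survives.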
Counting papers per topic
GroupBy
Keeping only the useful columns
Column Filter
Calculating the relative count (%)
Math Formula
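The relative count is each topic's paper count divided by the total, times 100. A minimal sketch of that arithmetic follows, assuming each paper has already been assigned one topic label; the label values and function name are hypothetical.

```python
from collections import Counter


def relative_counts(topic_labels):
    """Return the percentage share of papers per topic."""
    counts = Counter(topic_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {topic: 100.0 * n / total for topic, n in counts.items()}


# Made-up topic assignments, one label per paper.
labels = ["topic_0", "topic_1", "topic_0", "topic_2"]
shares = relative_counts(labels)
print(shares)
```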
In a range of topics, identify the elbow range in the image
CHI-Square
Appending the previous info
Column Appender
Importing the excel file from Google Scholar
Excel Reader
bi-grams (tri-grams can be added as well)
4. N-grams
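For readers unfamiliar with n-grams, the idea is to slide a window of n tokens over each document. The sketch below shows this in plain Python; it is not the code inside the metanode, and the sample sentence is made up.

```python
def ngrams(tokens, n=2):
    """Return the list of n-grams (joined by spaces) in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "topic models for text mining".split()
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(bigrams)
print(trigrams)
```

Tri-grams only require changing `n`, which mirrors the note that they can be added as well.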
After running this node, connect its output to the input of the Python Script node in the Visualization box (the one that creates the visualization)
Python Script
Excluding papers that are not relevant for the research
Row Filter
Importing the review framework overview
Excel Reader
Calculating the count for each word
GroupBy
Excluding papers published before 2020
Row Filter
Removing Topic id
Column Filter
Full string with line breaks, ready to be copied into Word
GroupBy
Making the Source column lowercase for later comparison
String Manipulation
CSV Reader
CSV Reader
Removing the papers that are not Q1 or Q2
Row Filter
Preparing the data for the bibliography
Bibliography
Matching the sources that have to be retained
Reference Row Filter
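The source matching logic (lowercase the Source column, then compare it against a retain list and an exclude list) can be sketched as follows. This is an assumed reconstruction in plain Python, not the workflow's nodes; the `Source` key and the journal names are hypothetical.

```python
def classify_sources(papers, retain, exclude, key="Source"):
    """Split papers into kept, dropped, and unclassified by source name.

    Matching is case-insensitive, mirroring the lowercase step applied
    to the Source column before comparison.
    """
    retain = {s.strip().lower() for s in retain}
    exclude = {s.strip().lower() for s in exclude}
    kept, dropped, unclassified = [], [], []
    for paper in papers:
        source = paper[key].strip().lower()
        if source in retain:
            kept.append(paper)
        elif source in exclude:
            dropped.append(paper)
        else:
            # Sources in neither list need a manual decision.
            unclassified.append(paper)
    return kept, dropped, unclassified


# Made-up sources, for illustration only.
papers = [
    {"Title": "Paper 1", "Source": "Journal A"},
    {"Title": "Paper 2", "Source": "JOURNAL B"},
    {"Title": "Paper 3", "Source": "Journal C"},
]
kept, dropped, unclassified = classify_sources(
    papers, retain=["journal a"], exclude=["journal b"]
)
print(len(kept), len(dropped), len(unclassified))
```

A non-empty `unclassified` list plays the same role as the check table in the workflow: it flags sources missing from both lists.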
CSV Reader
Importing the excel file from IEEE Explorer
Excel Reader
Importing the excel file from scopus
Excel Reader
Keeping only the articles
Row Filter
Keeping only the articles
Row Filter
Keeping only the useful columns
Column Filter
Keeping only the useful columns
Column Filter
Making the Source column lowercase for later comparison
String Manipulation
Joiner
Sorting terms from A to Z
Sorter
Eliminating useless columns
Column Filter
Making the Source column lowercase for later comparison
String Manipulation
Exporting the HTML. Change the file path accordingly (useful for publishing on GitHub and sharing the graph)
Python Script (1⇒1) (deprecated)
Replacing missing values with 0
Missing Value
Source to retain
Table Creator
Visualizing in KNIME
Generic JavaScript View (JavaScript) (legacy)
Transforming row into variable
Table Row to Variable (deprecated)
Creating the visualization as HTML
Python Script (1⇒1) (deprecated)
Keeping only the number of terms per abstract
Column Filter
Change the file path accordingly
Excel Writer
Concatenate
If this table is not empty, some papers are in neither the list of sources to retain nor the list of sources to exclude
Reference Row Filter
Pivoting by word
Pivot
Data preparation for the word cloud
Metanode
Matching the source that have to be excluded
Reference Row Filter
Change the file path accordingly
Excel Writer
Replace with topic names
Table Creator
Replacing the topic with their name
String Replacer (Dictionary)
Source check count. The number of rows in this node has to equal the sum of the rows in the two Table Creator nodes
GroupBy
Importing the merged dataset composed of: 1) phi 2) theta 3) doc length 4) vocab 5) term frequency
JSON Reader
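Before building an LDAvis-style view, it can help to confirm that the five parts of the merged dataset agree in shape. The sketch below assumes the usual LDAvis convention (phi is topics × terms, theta is documents × topics, both row-normalized). The JSON key names and sample values are hypothetical and may differ from those in the workflow's file.

```python
import json
import math


def check_ldavis_inputs(data):
    """Validate shapes of phi, theta, doc lengths, vocab, term frequency.

    Returns (n_topics, n_terms, n_docs) or raises ValueError.
    Key names are assumptions of this sketch.
    """
    phi, theta = data["phi"], data["theta"]
    n_topics, n_terms, n_docs = len(phi), len(data["vocab"]), len(theta)
    if any(len(row) != n_terms for row in phi):
        raise ValueError("phi needs one column per vocabulary term")
    if any(len(row) != n_topics for row in theta):
        raise ValueError("theta needs one column per topic")
    if len(data["doc_length"]) != n_docs:
        raise ValueError("doc_length needs one entry per document")
    if len(data["term_frequency"]) != n_terms:
        raise ValueError("term_frequency needs one entry per term")
    for row in phi + theta:
        # Each row is a probability distribution and should sum to 1.
        if not math.isclose(sum(row), 1.0, abs_tol=1e-6):
            raise ValueError("rows of phi and theta must sum to 1")
    return n_topics, n_terms, n_docs


# Tiny made-up dataset: 2 topics, 3 terms, 2 documents.
raw = """{
  "phi": [[0.5, 0.3, 0.2], [0.1, 0.2, 0.7]],
  "theta": [[0.9, 0.1], [0.4, 0.6]],
  "doc_length": [12, 20],
  "vocab": ["topic", "model", "review"],
  "term_frequency": [10, 8, 14]
}"""
shape = check_ldavis_inputs(json.loads(raw))
print(shape)
```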
Source to exclude
Table Creator
Standardizing the names of the publication types
Rule Engine
Change the file path accordingly
CSV Writer
Keeping only the articles
Row Filter
Change the file path accordingly
CSV Writer
Putting the columns in the same order in all the Dbs
Column Resorter
Keeping, for each topic, all the weights that compose a single abstract
Column Filter
Renaming the columns to have uniform names across the Dbs
Column Renamer
Bar Chart
Calculating the percentage weight of each word
Math Formula
Renaming the columns to have uniform names across the Dbs
Column Renamer
Putting the Dbs in the same table
Concatenate
In a range of topics, identify the elbow range in the image
Perplexity index
Column Resorter
Putting the columns in the same order in all the Dbs
Column Resorter
Step 3: Select the number of topics (example: K=4)
Topic Extractor (Parallel LDA)
Preparing the db
2. Preprocessing
Importing the excel file from Google Scholar
Excel Reader
Adding overview info
Joiner
Word cloud
Tag Cloud
Change the file path accordingly
CSV Writer
Combining the info
Column Appender
Change the file path accordingly
Excel Writer
Change the file path accordingly
CSV Writer
Eliminating the duplicates
Duplicate Row Filter
Preparing the db
3. Token filter
Column Renamer
The script queries the Semantic Scholar API for the info
Python Script
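The script inside this node is not shown here, so the following is only a sketch of how such a lookup might be written. It uses the public Semantic Scholar Graph API paper search endpoint; the helper names, the chosen fields, and the example title are assumptions, and unauthenticated requests are rate-limited.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_search_url(title, fields=("title", "abstract", "year", "venue"),
                     limit=1):
    """Build the search URL for a paper title (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"query": title, "fields": ",".join(fields), "limit": limit}
    )
    return f"{API_URL}?{query}"


def fetch_paper(title, timeout=30):
    """Return the best match for `title`, or None if nothing is found.

    Performs a network request; not called in this example.
    """
    url = build_search_url(title)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    matches = payload.get("data", [])
    return matches[0] if matches else None


# Only the URL is built here, so the example runs offline.
url = build_search_url("Latent Dirichlet allocation")
print(url)
```

Calling `fetch_paper` once per citation title, with a pause between calls, would turn a list of citations into full records.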
Summary words per topic
GroupBy
Changing the name
Column Renamer