PhD_Articles

First version shared.

INSTRUCTIONS

1. Point the PDF parser to a folder of PDFs. If this folder is not called "Articles", you will need to adjust a later part of the workflow accordingly.
2. Adjust the nodes in 'Prep & metadata' to match your filename conventions, or choose to ignore filename data by skipping the relevant nodes.
3. Run the 'Boilerplate removal' node to remove unwanted front- and back-matter. You may wish to add your own search terms to these nodes.
4. Review the Preprocessing node to ensure that nothing too important gets filtered out.
5. Derive a topic model that satisfactorily describes your collection.
6. Build a network of the articles based on the topic model outputs.

If you don't have any PDFs prepared, you can explore the workflow using this sample of 100 pre-parsed PDFs. To explore the topic modelling steps, you can load the pre-processed text from my full collection (in the Preprocessing node).

HELLO! This workflow takes a folder of journal articles in PDF format and creates a similarity network based on the topics within the articles, as per my blog post at http://seenanotherway.com/a-thesis-relived-using-text-analytics-to-map-a-phd-journey/. You are free to use and modify the workflow, but if you publish the results or share a derivative workflow, please credit me (Angus Veitch) by citing the GitHub page: https://github.com/angusveitch/article-network/

Labels on the workflow canvas: Load articles, Sample data, PDF Parser, Prep & metadata, Boilerplate removal, Preprocessing, Text prep & conversion, Topic modelling, Analysis, Table Reader.
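
For readers who want to see the idea behind the final step (building an article network from topic model outputs) outside of KNIME, here is a minimal Python sketch. It is not the workflow's actual implementation: the use of cosine similarity, the `threshold` parameter, the function names, and the sample topic weights are all assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length topic-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_edges(doc_topics, threshold=0.5):
    """Return (article_a, article_b, similarity) for each pair of articles
    whose topic vectors are at least `threshold` similar.

    `threshold` is a hypothetical cut-off chosen for illustration.
    """
    edges = []
    for a, b in combinations(sorted(doc_topics), 2):
        sim = cosine(doc_topics[a], doc_topics[b])
        if sim >= threshold:
            edges.append((a, b, sim))
    return edges


# Made-up topic weights for three articles over three topics.
doc_topics = {
    "article_a.pdf": [0.8, 0.1, 0.1],
    "article_b.pdf": [0.7, 0.2, 0.1],
    "article_c.pdf": [0.1, 0.1, 0.8],
}

edges = build_edges(doc_topics, threshold=0.9)
for a, b, sim in edges:
    print(f"{a} -- {b} (similarity {sim:.2f})")
```

With these made-up weights, only the first two articles share a dominant topic, so they are the only pair joined by an edge. Lowering the threshold yields a denser network, and raising it keeps only the strongest links.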