
Document_Tagging_Using_Ontology_Terms

Tagging Genes in Disease Related Publications

This example workflow shows how ontology terms can be used to tag biomedical literature.

In the first step, the Triple File Reader node reads an ontology in RDF format (extracted from UniProt), and the Autocomplete Text Widget node lets the user select a disease from it.
Then, abstracts from PubMed for the specified disease are automatically extracted.
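
For readers who want to see what a PubMed search for the selected disease looks like outside KNIME, the following is a minimal sketch that only builds a search URL for the NCBI E-utilities `esearch` service. It is an assumption that this resembles what the workflow does; the description does not say how the extraction is implemented. The function name and the result limit are hypothetical, and no request is sent.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint (not taken from the workflow itself).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(disease, max_results=100):
    """Build an esearch URL that looks up PubMed IDs for a disease name.

    Hypothetical helper: max_results is an illustrative default, not a
    setting documented for the workflow.
    """
    params = {
        "db": "pubmed",          # search the PubMed database
        "term": disease,         # free-text query, here the disease name
        "retmax": max_results,   # upper bound on returned IDs
        "retmode": "json",       # ask for a JSON response
    }
    return ESEARCH_URL + "?" + urlencode(params)


# "Aplastic anemia" is the default disease named in the workflow annotations.
url = build_pubmed_search_url("Aplastic anemia")
print(url)
```

The returned IDs would still have to be resolved to abstracts in a second request, which this sketch leaves out.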

Additionally, the workflow connects to the UniProt SPARQL Endpoint and executes a SPARQL query that extracts the preferred gene names and disease annotations of all human UniProt entries known to be involved in a disease. The gene names, together with the documents extracted from PubMed, serve as input for the Dictionary Tagger.
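
The query and the tagging step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the configuration of the KNIME nodes: the endpoint URL and the query text are assumed to match what the workflow uses, the helper names are hypothetical, and the tagger is a simple exact, case-sensitive whole-word matcher rather than the Dictionary Tagger's actual algorithm. No network request is sent.

```python
import re
from urllib.parse import urlencode

# Assumed public endpoint; the SPARQL Endpoint node may be configured differently.
UNIPROT_SPARQL_ENDPOINT = "https://sparql.uniprot.org/sparql"

# Preferred gene name and disease annotation text of human (taxon 9606)
# UniProt entries that carry a disease annotation. Assumed query text.
GENE_DISEASE_QUERY = """
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?name ?text
WHERE {
    ?protein a up:Protein ;
             up:organism taxon:9606 ;
             up:encodedBy ?gene ;
             up:annotation ?annotation .
    ?gene skos:prefLabel ?name .
    ?annotation a up:Disease_Annotation ;
                rdfs:comment ?text .
}
"""


def build_query_url(endpoint, query):
    """Encode the query as a GET request URL (SPARQL protocol 'query' parameter)."""
    return endpoint + "?" + urlencode({"query": query})


def tag_genes(text, gene_names):
    """Return (start, end, gene) for each whole-word dictionary match in text.

    Hypothetical stand-in for a dictionary tagger: exact and case-sensitive.
    """
    # Longest names first, so a longer name wins over a shorter prefix.
    names = sorted({n for n in gene_names if n}, key=len, reverse=True)
    if not names:
        return []
    alternatives = "|".join(re.escape(n) for n in names)
    # The lookarounds reject matches embedded in a longer alphanumeric token.
    pattern = re.compile(
        r"(?<![A-Za-z0-9])(?:" + alternatives + r")(?![A-Za-z0-9])"
    )
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


# Illustrative sentence and dictionary, not taken from PubMed or UniProt results.
sentence = "Mutations in TERT and TERC cause aplastic anemia."
tags = tag_genes(sentence, ["TERT", "TERC", "TER"])
print(tags)
```

Matching on whole words keeps a short dictionary entry from tagging the inside of a longer token, which matters for gene names because many of them are short.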

In the last step, a component lets the user inspect the tagged data.

Note: To open the interactive view of the "View" component, right-click it and select "Interactive View".

Workflow annotations:
Reading ontology in RDF format containing disease names and definitions (exported from UniProt)
Connecting to UniProt SPARQL Endpoint and querying for all gene names that are associated with diseases
Document Tagging Using Ontologies and Dictionaries
Query for the preferred gene name and disease annotation of all human UniProt entries that are known to be involved in a disease
UniProt SPARQL Endpoint
Execute and open Interactive View of the Component to see results
diseases-all.rdf
Action needed! Open interactive view and type/select a disease name. By default: Aplastic anemia

Nodes in the workflow: PubMed Document Extractor, SPARQL Query, SPARQL Endpoint, Punctuation Erasure, Dictionary Tagger, View, Data Processing, Joiner, String Manipulation, Column Rename, Triple File Reader, Row Filter, Column Rename, Autocomplete Text Widget, Duplicate Row Filter
