FAIR data with KNIME

This workflow exemplifies how KNIME can be used to make data FAIR. The use case is a low-throughput screen conducted by academic partners. The results are published in this paper: https://doi.org/10.14573/altex.1712182.

The workflow combines data and information from 50 individual Excel data files into one large data table. To enhance interoperability, the CAS numbers used as chemical identifiers are supplemented with SMILES, InChI and InChI keys, using the RDKit KNIME community nodes and a REST API. The metadata is extended with controlled vocabulary through REST API calls and programmatic access to the ChEMBL and ChEBI databases. To comply with the FAIR principles, details about the databases and ontologies used are extracted as well. To give as much provenance as possible, user-defined metadata is added using the interactive Table Editor. Depending on the repository in which the (meta)data should be deposited, the workflow can be extended with an automatic upload using a PUT request.
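The combining step described above can be sketched outside KNIME as follows. This is only a minimal illustration, not the workflow's actual implementation: the function name `combine_files`, the column names `source_file` and `data_point_id`, and the identifier format are hypothetical, and the rows are placeholders rather than values from the published screen.

```python
from typing import Dict, List

Row = Dict[str, object]


def combine_files(tables: Dict[str, List[Row]]) -> List[Row]:
    """Concatenate rows from several same-layout files into one table.

    Each output row keeps its original columns and gains two provenance
    columns: the file it came from and a data point identifier.
    Both column names are assumptions made for this sketch.
    """
    combined: List[Row] = []
    for file_name in sorted(tables):  # sorted for a reproducible row order
        for index, row in enumerate(tables[file_name], start=1):
            enriched = dict(row)  # copy so the input rows stay untouched
            enriched["source_file"] = file_name
            # Hypothetical identifier format: <file name>#<row number>
            enriched["data_point_id"] = f"{file_name}#{index}"
            combined.append(enriched)
    return combined


if __name__ == "__main__":
    # Placeholder rows standing in for the contents of two Excel files.
    example = {
        "plate_B.xlsx": [{"cas": "64-17-5", "replicate": 1}],
        "plate_A.xlsx": [
            {"cas": "50-00-0", "replicate": 1},
            {"cas": "50-00-0", "replicate": 2},
        ],
    }
    for row in combine_files(example):
        print(row)
```

Recording the source file name on every row mirrors the workflow's step of adding the original Excel file name, so that each data point can be traced back to its file.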

Workflow annotations

This workflow reads in data and metadata from different Excel files and compiles them into a machine-friendly table. The metadata is extended by controlled vocabulary using several GET requests. The original screen data is derived from testing a 75-substance library that was screened for neurotoxic potential in a cell-based assay (NeuriTox assay). More information and the published results can be found at https://doi.org/10.14573/altex.1712182. The accompanying blog post is available at https://www.knime.com/blog/fair-guiding-principles-and-how-to-fairify-your-data

Combine data from different Excel files

Reads the data from various Excel files (which all have the same layout).
One file contains information about which data file contains results from which substance.
Another file contains additional information about the substances, such as structural formula, molecular weight, supplier, lot number, etc.
Step annotations: add file location; loop over all the files; concatenate data from the 3 technical replicates (technical replicate 1, technical replicate 2, technical replicate 3); add biological replicate info from the compound file; add the column of the original cell assay plate (cell plate column); add the file name of the original Excel files (file name); read compound information from Excel file; read compound concentrations from Excel file; rearrange the list of substances and their corresponding CAS numbers (Rearrange substance list); rearrange CAS No. and concentrations from input table; extract constant columns like supplier, etc.

Enrich data with additional information and controlled vocabulary

Convert CAS No. to SMILES using the Chemical Identifier Resolver (https://cactus.nci.nih.gov/chemical/structure).
Convert SMILES to canonical SMILES, InChI and InChI key using RDKit nodes.
Get ChEMBL IDs from InChI keys using https://www.ebi.ac.uk/unichem/rest/inchikey/
Get molecule information from ChEMBL, including properties, structural representations and synonyms, using https://www.ebi.ac.uk/chembl/api/data/molecule
Get molecule roles from ChEBI using https://www.ebi.ac.uk/ols/api/ontologies/chebi/terms/
Add information on cell line.

Write the data

Use the interactive Table Editor to create column descriptions (Column headers).
Add data point identifier (Create data point identifier).
Write the data file.

Node types used in the workflow: List Files/Folders, Excel Reader, Path to String (Variable), Table Row To Variable Loop Start, Loop End, Constant Value Column, Concatenate, Cell Replacer, Column Resorter, Column Rename, Joiner, Rule Engine, Collection to String, Table Editor, CSV Writer, Flow variables.
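The chain of GET requests named in the annotations above can be illustrated with a small sketch that only builds the request URLs. The base URLs are the ones given in the workflow; everything appended to them is an assumption about how these public services are commonly addressed and may have changed since the workflow was published: the `/smiles` suffix for the Chemical Identifier Resolver, the `.json` suffix for ChEMBL, and the double URL-encoded term IRI for the ontology lookup. All function names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base URLs as given in the workflow annotations.
CACTUS_BASE = "https://cactus.nci.nih.gov/chemical/structure"
UNICHEM_BASE = "https://www.ebi.ac.uk/unichem/rest/inchikey"
CHEMBL_BASE = "https://www.ebi.ac.uk/chembl/api/data/molecule"
OLS_CHEBI_BASE = "https://www.ebi.ac.uk/ols/api/ontologies/chebi/terms"


def cas_to_smiles_url(cas_number: str) -> str:
    """URL asking the resolver for a SMILES string (suffix is an assumption)."""
    return f"{CACTUS_BASE}/{quote(cas_number, safe='')}/smiles"


def inchikey_lookup_url(inchikey: str) -> str:
    """URL for looking up database cross-references of an InChI key."""
    return f"{UNICHEM_BASE}/{quote(inchikey, safe='')}"


def chembl_molecule_url(chembl_id: str) -> str:
    """URL for a ChEMBL molecule record (JSON suffix is an assumption)."""
    return f"{CHEMBL_BASE}/{quote(chembl_id, safe='')}.json"


def chebi_term_url(chebi_id: str) -> str:
    """URL for a ChEBI term; assumes the term IRI must be URL-encoded twice."""
    iri = "http://purl.obolibrary.org/obo/" + chebi_id.replace(":", "_")
    return f"{OLS_CHEBI_BASE}/{quote(quote(iri, safe=''), safe='')}"


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Plain GET request; needs network access and is not called on import."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Illustrative identifiers only; nothing is requested here.
    print(cas_to_smiles_url("50-00-0"))
    print(inchikey_lookup_url("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
    print(chembl_molecule_url("CHEMBL25"))
    print(chebi_term_url("CHEBI:15365"))
```

Keeping URL construction separate from the actual request makes each step easy to check by hand before any call is sent, which roughly corresponds to configuring the URL column in KNIME before executing the request node.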
