01_Chemistry_basics

This workflow demonstrates basic cheminformatics functionality within KNIME Analytics Platform: reading and writing various chemistry data formats, canonicalization of chemical structures, duplicate filtering, descriptor calculation, and interactive filtering on multiple properties.
Data sets were collected from ChEMBLdb. Each set corresponds to a publication in which lipophilicity was determined experimentally.




01_Chemistry_Basics

1. Read data from multiple files using the corresponding Reader nodes. Find them in Node repository >> IO >> Read.
2. Generate canonical SMILES and remove duplicates.
3. Customize the data by renaming columns and removing redundant columns.
4. Compute descriptors and use the Parallel Coordinates Plot to filter data interactively on multiple properties. (Make sure to keep the selection from the view.)
5. Finally, save the data to TABLE and SDF files.

Required extensions: RDKit KNIME Integration

Step 1.
1. Read data from the different sources by dragging the files from the "data" folder: Lipophilicity_CHEMBL3096849.csv; Lipophilicity_CHEMBL633737.xlsx; Lipophilicity_CHEMBL636806.sdf. Remember to extract all properties from the SD file.
2. Unite/merge them into one data stream using the Concatenate node.

Step 2.
1. Convert the column with the SMILES strings to the SMILES data format. Use the Molecule Type Cast node.
2. Generate Canonical_SMILES using the RDKit Canon SMILES node. Use the smiles column as input.
3. Remove duplicates with the Duplicate Row Filter node.

Step 3.
1. Use the Column Filter node to remove redundant columns (keep IDs, structures and values).
2. Use Column Rename to customize the column names (e.g. assay__chemblid to assay_id, value to LogP_EXP and parent_cmpd_chemblid to mol_chemblid).

Step 4.
1. Compute physchem properties using the RDKit Descriptor Calculation node.
2. Add colors to the data table based on the "assay_id" column by using the Color Manager node.
3. Configure the Parallel Coordinates Plot: include lipophilicity and the computed physchem properties to be displayed.
4. Adjust the tile view to display the image of the molecule and its LogP_EXP.
5. Configure the GroupBy node to display the mean and standard deviation of LogP_EXP as well as the number of unique mol_chemblid values for each assay. Display the results in a table view.
6. Select the nodes after the "Add molecule images" metanode. Right-click on them and select "Create Component". Call it "Visualize molecules with lipophilicity data". Ctrl/Cmd + double-click on the component to open its content. Adjust the layout by clicking on the corresponding icon in the toolbar.
7. Execute the component and explore the interactive view. In the view, select the compounds you are interested in, then click Apply (apply settings temporarily) and Close.
Hint: to enable interactivity in views following the GroupBy node, you need to enable hiliting in the GroupBy node.

Step 5.
1. Filter out unwanted columns such as Mol_image and the selection information. Write the resulting table to an SDF file and to a table file.
Hint: use the corresponding writer nodes.

Nodes shown in the workflow: CSV Reader, Excel Reader, SDF Reader, Concatenate, Molecule Type Cast, RDKit Canon SMILES, Duplicate Row Filter, Column Filter, Column Rename, RDKit Descriptor Calculation, Color Manager, Add molecule images.
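For readers who want to check the table logic outside KNIME, the concatenate, duplicate-filter and GroupBy steps described above can be sketched in plain Python. This is only an illustration, not how the KNIME nodes are implemented. The rows, IDs and LogP_EXP values below are made-up placeholders, not ChEMBL data. Using Canonical_SMILES as the only duplicate key is an assumption, since the workflow text does not say which columns the Duplicate Row Filter compares. The standard deviation here is Python's sample standard deviation, which may or may not match the GroupBy node's definition.

```python
from collections import defaultdict
from statistics import mean, stdev


def concatenate(*tables):
    """Stack several row lists into one, in the order given."""
    return [row for table in tables for row in table]


def drop_duplicates(rows, key_columns):
    """Keep the first row seen for each distinct key; drop later repeats."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def summarize_by_assay(rows):
    """Per assay_id: mean and sample stdev of LogP_EXP, unique molecule count."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["assay_id"]].append(row)
    summary = {}
    for assay_id, members in groups.items():
        values = [member["LogP_EXP"] for member in members]
        summary[assay_id] = {
            "mean": mean(values),
            # stdev needs at least two values; report None for singletons
            "stdev": stdev(values) if len(values) > 1 else None,
            "unique_mols": len({member["mol_chemblid"] for member in members}),
        }
    return summary


# Hypothetical placeholder rows standing in for two of the input files.
csv_rows = [
    {"assay_id": "A1", "mol_chemblid": "M1", "Canonical_SMILES": "CCO", "LogP_EXP": 1.0},
    {"assay_id": "A1", "mol_chemblid": "M2", "Canonical_SMILES": "CCN", "LogP_EXP": 3.0},
]
xlsx_rows = [
    {"assay_id": "A2", "mol_chemblid": "M3", "Canonical_SMILES": "c1ccccc1", "LogP_EXP": 2.0},
    # Repeats the first CSV row, so the duplicate filter should drop it.
    {"assay_id": "A1", "mol_chemblid": "M1", "Canonical_SMILES": "CCO", "LogP_EXP": 1.0},
]

all_rows = concatenate(csv_rows, xlsx_rows)
unique_rows = drop_duplicates(all_rows, key_columns=("Canonical_SMILES",))
summary = summarize_by_assay(unique_rows)

for assay_id, stats in summary.items():
    print(assay_id, stats)
```

Passing the key columns as a parameter keeps the duplicate rule explicit: adding "assay_id" to the key would keep the same molecule when it was measured in more than one assay.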
