
01_Chemistry_basics

This workflow demonstrates basic cheminformatics functionality within KNIME Analytics Platform:
Reading and writing various chemistry data formats; canonicalization of chemical structures; duplicate filtering; descriptor calculation; interactive filtering on multiple properties.
Data sets were collected from ChEMBLdb. Each set corresponds to a publication in which lipophilicity was determined experimentally.




01_Chemistry_Basics
1. Read data from multiple files using the corresponding Reader nodes. Find them in Node repository >> IO >> Read.
2. Generate canonical SMILES and remove duplicates.
3. Customize the data by renaming columns and removing redundant columns.
4. Compute descriptors and use the Parallel Coordinates Plot to filter data interactively on multiple properties. (Make sure to keep the selection from the view.)
5. Finally, save the data to TABLE and SDF files.
Required extensions: RDKit KNIME Integration

Step 1.
1. Read data from different sources by dragging them from the "data" folder: Lipophilicity_CHEMBL3096849.csv; Lipophilicity_CHEMBL633737.xlsx; Lipophilicity_CHEMBL636806.sdf. Remember to extract all properties from the SD file.
2. Unite/merge them into one data stream using the Concatenate node.

Step 2.
1. Convert the column with SMILES strings to the SMILES data format. Use the Molecule Type Cast node.
2. Generate Canonical_SMILES using the RDKit Canon SMILES node. Use the smiles column as input.
3. Remove duplicates with the Duplicate Row Filter node.

Step 3.
1. Use the Column Filter node to remove redundant columns (keep IDs, structures and values).
2. Use Column Rename to customize the column names (e.g. assay__chemblid to assay_id, value to LogP_EXP and parent_cmpd_chemblid to mol_chemblid).

Step 4.
1. Compute physchem properties using the RDKit Descriptor Calculation node.
2. Add colors to the data table based on the "assay_id" column by using the Color Manager node.
3. Configure the Parallel Coordinates Plot: include lipophilicity and the computed physchem properties to be displayed.
4. Adjust the Tile View to display the image of the molecule and its LogP_EXP.
5. Configure the GroupBy node to display the mean and standard deviation of LogP_EXP as well as the number of unique mol_chemblid values for each assay. Display the results in a table view.
6. Select the nodes after the "Add molecule images" metanode. Right-click on them and select "Create Component". Call it "Visualize molecules with lipophilicity data". Ctrl/Cmd + double-click on the component to open its content. Adjust the layout by clicking on the corresponding icon in the toolbar.
7. Execute the component and explore the interactive view. In the view, select the compounds you are interested in, click "Apply" (apply settings temporarily) and then "Close".
Hint: to enable interactivity in views following the GroupBy node, you need to enable hiliting in the GroupBy node.

Step 5.
1. Filter out unwanted columns such as Mol_image and the selection information. Write the resulting table to an SDF file and to a table file.
Hint: use the corresponding writer nodes.

Node labels in the workflow: Keep Selected; adjust names; change types; assay_id; Molecule Type Cast; RDKit Canon SMILES; Parallel Coordinates Plot; Row Filter; Column Filter; Column Rename; Tile View; Column Resorter; Color Manager; GroupBy; Add molecule images
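For readers who want to see the logic of Steps 1 and 2 outside KNIME, the following is a minimal Python sketch of row-wise concatenation followed by duplicate removal on a canonical SMILES column. It is not how the KNIME nodes are implemented. The helper names (`concatenate`, `add_canonical`, `drop_duplicates`) and the toy rows are made up for illustration, and the canonicalizer is only a whitespace-trimming placeholder; a real canonicalization would need a cheminformatics toolkit such as RDKit.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]


def concatenate(*tables: Iterable[Row]) -> List[Row]:
    """Stack tables row-wise; columns missing from a table are filled with ''."""
    rows = [dict(row) for table in tables for row in table]
    columns: List[str] = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    return [{name: row.get(name, "") for name in columns} for row in rows]


def add_canonical(rows: List[Row], canonicalize: Callable[[str], str]) -> List[Row]:
    """Add a Canonical_SMILES column computed from the smiles column."""
    return [{**row, "Canonical_SMILES": canonicalize(row["smiles"])} for row in rows]


def drop_duplicates(rows: List[Row], key: str = "Canonical_SMILES") -> List[Row]:
    """Keep the first row for each distinct value of `key`."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


# Toy rows standing in for two reader outputs (not the real ChEMBL data).
csv_rows = [{"smiles": "CCO", "value": "1.0"}]
xlsx_rows = [{"smiles": " CCO ", "value": "1.1"}, {"smiles": "c1ccccc1", "value": "2.0"}]

merged = concatenate(csv_rows, xlsx_rows)
# Placeholder canonicalizer (assumption): only trims whitespace. With RDKit
# installed, one could pass lambda s: Chem.MolToSmiles(Chem.MolFromSmiles(s)).
unique = drop_duplicates(add_canonical(merged, str.strip))
print(len(merged), len(unique))
```

Keeping the first occurrence of each structure is one possible policy; which duplicate is retained in the workflow depends on how the Duplicate Row Filter node is configured.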
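The per-assay summary described for the GroupBy node in Step 4 (mean and standard deviation of LogP_EXP, plus the number of unique mol_chemblid values) can be sketched in plain Python as follows. This is an illustration under assumptions: the function name `summarize` and the toy rows are hypothetical, and the sample standard deviation is used here, which may or may not match the aggregation option chosen in the workflow.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, List


def summarize(rows: List[Dict[str, str]]) -> Dict[str, dict]:
    """Group rows by assay_id and aggregate LogP_EXP and mol_chemblid."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["assay_id"]].append(row)

    summary = {}
    for assay, members in groups.items():
        values = [float(m["LogP_EXP"]) for m in members]
        summary[assay] = {
            "mean": mean(values),
            # Sample standard deviation needs at least two values.
            "std": stdev(values) if len(values) > 1 else None,
            "unique_mols": len({m["mol_chemblid"] for m in members}),
        }
    return summary


# Toy rows using the column names from Step 3 (values are made up).
rows = [
    {"assay_id": "A1", "mol_chemblid": "M1", "LogP_EXP": "1.0"},
    {"assay_id": "A1", "mol_chemblid": "M2", "LogP_EXP": "3.0"},
    {"assay_id": "A1", "mol_chemblid": "M2", "LogP_EXP": "2.0"},
    {"assay_id": "A2", "mol_chemblid": "M3", "LogP_EXP": "0.5"},
]

summary = summarize(rows)
for assay, stats in summary.items():
    print(assay, stats)
```

Counting distinct mol_chemblid values rather than rows matters when the same compound was measured more than once in an assay, as with M2 in the toy data.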
