
Exercise Reaction Enumeration and Prioritization

Follow the instructions in the yellow annotation boxes to complete the exercise. Add and configure nodes and connect them to make a single workflow.
Required extensions: Chemistry Add-Ons, RDKit KNIME integration

Read building blocks in
1. Find the node "File Reader (Complex Format)" in the Node Repository and drag and drop it into the workflow editor.
2. Double click on the node to open its configuration dialog and configure it: use the workflow relative path, click "Browse" and navigate to the zinc-substances.smi.gz file in the your-knime-workspace/2022_06_12_ICCS/data folder.
3. Click OK, execute the node (e.g. press F7), and explore the output.
4. Connect the node to the given Column Rename node.

Standardize the chemical structures and filter duplicates
1. Cast the column with SMILES to type "SMILES" with the Molecule Type Cast node.
   Tip: Use the Node Description to help you configure the node. You can access it by clicking on the question mark in the configuration dialog of the node, or by opening the menu View >> Description and selecting the node.
2. Create a column containing the RDKit molecule data type with the RDKit From Molecule node.
3. Strip the salts with the RDKit Salt Stripper node.
4. Remove the molecules with missing values (missing chemical structures) with the Row Filter node.
5. Generate canonical SMILES (RDKit Canon SMILES node) and filter duplicates with the Duplicate Filter node.
6. Connect the output of the Duplicate Filter node to the input of the RDKit Functional Group Filter node.
7. Create a metanode to collapse all the standardization and filtering nodes. Select all of these nodes (i.e. not the RDKit Functional Group Filter node) >> right click on the selection >> Create Metanode.

Filter building blocks for the reaction
1. Explore the settings of the RDKit Functional Group Filter node that filters for aromatic carboxylic acids. Execute the node and explore its output.
   Optional: Play with the settings of the node and see how they affect the output.
2. Use another RDKit Functional Group Filter node to filter the catalog of building blocks for molecules with one primary aliphatic amine (a parallel branch to the first RDKit Functional Group Filter node). Make sure to configure the filter so that the molecules contain a single amine group and no acids.
3. Filter the building blocks by molecular weight lower than 200 with two nodes: compute the average molecular weight with the RDKit Descriptor Calculation node and then filter with the Row Filter node.
4. Connect the filtered building blocks to the RDKit Two Component Reaction node. Carboxylic acids go to the top port, amines to the second port.

Specify the reaction
Configure the RDKit Two Component Reaction node to randomize the reactants, make the products unique and perform a matrix expansion. Also define a maximum number of reactants to be used (e.g. 200, 500 or 1000).

Generate images of the molecules
Generate images of the chemical structures of the products with the RDKit Molecule to SVG node.

Identify specific substructures in the products
1. Use the Molecular Sketcher component to draw a specific scaffold. You can find this component in the EXAMPLES server >> 00_Components >> Life Sciences. If you don't have access to the internet, this component is also available in the data folder of this exercise. Drag and drop it into the workflow. Configure the node to define the output formats of the molecules (e.g. SDF and SMARTS). Execute the node, open the view, draw a scaffold (e.g. pyridine or imidazole). Click OK.
2. Use the RDKit Molecule Substructure Filter node to determine whether the products contain the drawn scaffold or not. Remember to connect this node to the RDKit Generate Coords node and then to configure it. Use the Node Description to help you define the settings.
3. Label the data.
   3.1. Connect the first output of the RDKit Molecule Substructure Filter node to the Constant Value Column node. Execute the last node and explore the output. How did the data change?
   3.2. To the data from the second output port of the RDKit Molecule Substructure Filter node, append a new column "scaffold" and fill it with the string value "new".
4. Merge the data (outputs from the Constant Value Column nodes) with the Concatenate node.

Create a component with an interactive view to explore the products
1. Add colors to the table based on the scaffold of the molecule, i.e. whether it is a new or an old one. To do so, use the Color Manager node.
2. Generate a plot to depict all computed physchem properties at once with the Parallel Coordinates Plot node.
3. Use the Tile View node to display chemical structures of the compounds. Configure the node to display 5 molecules per row and use the RowID as the title column.
4. Collect the information on how many compounds with the new and the old scaffold are present in the data set. Aggregate these values with the GroupBy node, where you group by the column "scaffold" and aggregate the counts of "Product Index". Visualize this table with the Table View node.
5. Connect the Tile View node to the Excel Writer node.
6. Create a Component with three interactive visualization elements: Parallel Coordinates Plot, Tile View, and Table View. To do so, select these three nodes, right click on the selection and choose "Create Component". You will get a warning message about the resetting of the nodes. Accept it and provide a name for the component.
7. Execute the component and then right click on it to open the interactive view.
   Optional: Configure the layout of the component.
   Right click on the component >> Component >> Open
   Select the Node and Layout Editor in the toolbar (the rightmost icon)
   Arrange the positions of the views in the tab "Composite view layout"

Save selected products
Use the given Excel Writer node to save a file with the results. Optionally you can also use an SDF Writer node to save an SD file.

Other labels on the workflow canvas: "Clean up the products", "carboxylic acid = 1 aromatic", Constant Value Column, Concatenate, RDKit Descriptor Calculation, Excel Writer, Column Rename, RDKit Functional Group Filter, RDKit Two Component Reaction.
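
The reaction settings described in the exercise (randomize the reactants, cap the number of reactants, matrix expansion, unique products) can be pictured with a small plain-Python sketch. This is not how the RDKit Two Component Reaction node is implemented, and no chemistry is performed: each "product" is simply the pair of reactants that would be combined. The function name `enumerate_pairs`, the `seed` parameter and the example SMILES are assumptions made for illustration.

```python
import itertools
import random


def enumerate_pairs(acids, amines, max_reactants=200, seed=0):
    """Return unique (acid, amine) pairs from a shuffled, capped matrix expansion."""
    rng = random.Random(seed)
    # Randomize: shuffle copies so the cap does not always keep the first rows.
    acids, amines = list(acids), list(amines)
    rng.shuffle(acids)
    rng.shuffle(amines)
    # Cap the number of reactants taken from each input table.
    acids, amines = acids[:max_reactants], amines[:max_reactants]
    # Matrix expansion: every acid against every amine; the set keeps pairs unique.
    seen = set()
    pairs = []
    for pair in itertools.product(acids, amines):
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


# Hypothetical building blocks; the first acid appears twice on purpose.
acids = ["OC(=O)c1ccccc1", "OC(=O)c1ccncc1", "OC(=O)c1ccccc1"]
amines = ["NCC", "NCCC"]
pairs = enumerate_pairs(acids, amines, max_reactants=200)
print(len(pairs))  # 4 unique pairs out of 3 x 2 = 6 combinations
```

The sketch also shows why the cap matters: the number of combinations grows with the product of the two input sizes, so 1000 reactants per side already means up to a million pairs.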

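
The labeling and counting steps (Constant Value Column, Concatenate, GroupBy) amount to the following, sketched in plain Python rather than KNIME. The rows are made up, the helper `add_constant` is hypothetical, and the label "old" for the matching rows is an assumption: the exercise only specifies the value "new" for the second output port.

```python
from collections import Counter

# Hypothetical rows standing in for the two outputs of the substructure filter.
matching = [{"Product Index": 0}, {"Product Index": 2}]
non_matching = [{"Product Index": 1}, {"Product Index": 3}, {"Product Index": 4}]


def add_constant(rows, column, value):
    """Append a column holding the same value in every row."""
    return [{**row, column: value} for row in rows]


# Concatenate: stack the two labelled tables into one.
table = (
    add_constant(matching, "scaffold", "old")
    + add_constant(non_matching, "scaffold", "new")
)

# GroupBy: group by "scaffold" and count the rows in each group.
counts = Counter(row["scaffold"] for row in table)
print(dict(counts))  # {'old': 2, 'new': 3}
```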