Icon

01_​Exercise

Data Manipulation

Solution to "Manipulation Data" exercise for basic Life Science User Training
- Concatenate data from two different sources
- Modify String values
- Join data from multiple tables
- Remove duplicates in the data






19. Select all nodes in this box, right-click on one of them and create a component. Explore and selectsome data for export. REOS rules-- "MW" if AMW is smaller than 100 or greater than 700-- "Complexity" if NumHeavyAtoms is smaller than 5 or greater than 50 or NumRotatableBonds is greater or equal to 12-- "HBond" if NumHBD is greater than 5 or NumHBA is greater than 10-- "logP" if SlogP is smaller -5 or greater than 7.5-- "Pass" for all other cases 3. Use the SDF Reader to read in themolecules and their properties. Set knime://knime.workflow/../data/malariahts_molecules.sdf as path. Extract the Mol Block. Rememberto extract the properties in the PropertyHandling tab. 4. Use the Table Reader to import themalariahts_molecules_feature.table fromthe data folder. Use a workflow relativepath to navigate to the table. 2. Use the Excel Reader to import themalariahts_experiment_no-hits.xlsx fromthe data folder. TIP: just drag and drop thefile from the KNIME Explorer on the leftside 5. Filter out the rows with missingvalues in the Pf3D7_pEC50 columnusing the Missing Value node. Chose"Remove Row" in the Column SettingsTab. 6. Concatenate the data from theMissing Value node and the ExcelReader. Notice the data types of allcolumns. Connect the output of theConcatenate node to the ColumnExpressions node. 7. This node is pre-configured.Just execute it and inspect theoutput table. Look at the Sample column. Is there a inconsistencyyou notice? 8. Use the String Manipulationnode to upper case all sampleIDs. Try to assemble theexpression yourself by selectingthe method "upperCase" anddouble-click on it, and double-clickon the column "Sample" to insert itinto the brackets. In case thatdoesn't work, copy-paste upperCase($Sample$)into the expression field. 9. Use the Joiner node to join thetables from the SDF Reader andthe Table Reader. Add a matchingcriterion and join based on theSample column. Stay with theinner join. Exclude the samplecolumn from the bottom input tablein the Column Selection tab toavoid having it twice. 13. This Joiner node is pre-configured and joins the tablewith the assay data with thetable containing the moleculedata 10. Let's clean up the data tableusing the Table Manipulator node. - exclude the ExactMW column- rename theAverageMolecularWeight columnto AMW- change the data type ofNumRotatableBond andNumHeavyAtoms from String toInteger 11. This node is pre-configuredand will create a new column thisREOS rule tags based on the rulesstated above. 12. Exclude all the rows that areduplicate using the Duplicate RowFIlter node. Detect duplicatesbased on the columns Sampleand Mol Block. 14. Use the Row Filter or RowSplitter node to exclude rows withmissing values in the columnPf3D7_pEC50. The Row Splitterallows you to inspect the excludeddata, the Row Filter doesn't. Makesure to select "Exclude rows byattribute value" 15. Use the Column Filter node toexclude the columnNumSaturatedRings. 1. This node is pre-configured. Justexecute it and inspect the output table. EXERCISETry to follow the instructions in each box. Bold words give you a hint on which node to use. In case you get lost, have a sneak peek to the 01_MalariaHTS_DataManipulation_Visualization solution workflow. Tip: Check and familiarize yourself with the output table of each node, to see how the data changed in comparison to theprevious step and to see how it still deviates from how you want it to look like. 16. This two nodes are pre-configured, justexecute them. Check the output table of the RDKitMolecule to SVG node. The molecular structuresare now rendered to images, so we can usethem in the interactive view. 18. Visualize the data using theScatter Plot node. - select the axes/columns to plot inthe options tab (green vs red)- give a title in the General PlotOptions tab- make sure it interacts with otherview nodes in the View Controlstab under Selection 20. Define locations youwant to export the data to. 17. Assign colours to thePf3D7_ps_hit column using theColor Manager node. Connect itsinput port to the RDKit Molecule toSVG node. Keep SelectedKNIME-native formatmalariahts_experiment_hits.csvcreateREOS rulescorrect column typeexclude "selected" column Row Filter Tile View Column Resorter GroupBy Table View Table Writer CSV Reader Rule Engine Column Expressions Joiner RDKit From Molecule RDKit Moleculeto SVG ParallelCoordinates Plot Color Manager Column Rename SDF Writer Excel Writer Column Filter 19. Select all nodes in this box, right-click on one of them and create a component. Explore and selectsome data for export. REOS rules-- "MW" if AMW is smaller than 100 or greater than 700-- "Complexity" if NumHeavyAtoms is smaller than 5 or greater than 50 or NumRotatableBonds is greater or equal to 12-- "HBond" if NumHBD is greater than 5 or NumHBA is greater than 10-- "logP" if SlogP is smaller -5 or greater than 7.5-- "Pass" for all other cases 3. Use the SDF Reader to read in themolecules and their properties. Set knime://knime.workflow/../data/malariahts_molecules.sdf as path. Extract the Mol Block. Rememberto extract the properties in the PropertyHandling tab. 4. Use the Table Reader to import themalariahts_molecules_feature.table fromthe data folder. Use a workflow relativepath to navigate to the table. 2. Use the Excel Reader to import themalariahts_experiment_no-hits.xlsx fromthe data folder. TIP: just drag and drop thefile from the KNIME Explorer on the leftside 5. Filter out the rows with missingvalues in the Pf3D7_pEC50 columnusing the Missing Value node. Chose"Remove Row" in the Column SettingsTab. 6. Concatenate the data from theMissing Value node and the ExcelReader. Notice the data types of allcolumns. Connect the output of theConcatenate node to the ColumnExpressions node. 7. This node is pre-configured.Just execute it and inspect theoutput table. Look at the Sample column. Is there a inconsistencyyou notice? 8. Use the String Manipulationnode to upper case all sampleIDs. Try to assemble theexpression yourself by selectingthe method "upperCase" anddouble-click on it, and double-clickon the column "Sample" to insert itinto the brackets. In case thatdoesn't work, copy-paste upperCase($Sample$)into the expression field. 9. Use the Joiner node to join thetables from the SDF Reader andthe Table Reader. Add a matchingcriterion and join based on theSample column. Stay with theinner join. Exclude the samplecolumn from the bottom input tablein the Column Selection tab toavoid having it twice. 13. This Joiner node is pre-configured and joins the tablewith the assay data with thetable containing the moleculedata 10. Let's clean up the data tableusing the Table Manipulator node. - exclude the ExactMW column- rename theAverageMolecularWeight columnto AMW- change the data type ofNumRotatableBond andNumHeavyAtoms from String toInteger 11. This node is pre-configuredand will create a new column thisREOS rule tags based on the rulesstated above. 12. Exclude all the rows that areduplicate using the Duplicate RowFIlter node. Detect duplicatesbased on the columns Sampleand Mol Block. 14. Use the Row Filter or RowSplitter node to exclude rows withmissing values in the columnPf3D7_pEC50. The Row Splitterallows you to inspect the excludeddata, the Row Filter doesn't. Makesure to select "Exclude rows byattribute value" 15. Use the Column Filter node toexclude the columnNumSaturatedRings. 1. This node is pre-configured. Justexecute it and inspect the output table. EXERCISETry to follow the instructions in each box. Bold words give you a hint on which node to use. In case you get lost, have a sneak peek to the 01_MalariaHTS_DataManipulation_Visualization solution workflow. Tip: Check and familiarize yourself with the output table of each node, to see how the data changed in comparison to theprevious step and to see how it still deviates from how you want it to look like. 16. This two nodes are pre-configured, justexecute them. Check the output table of the RDKitMolecule to SVG node. The molecular structuresare now rendered to images, so we can usethem in the interactive view. 18. Visualize the data using theScatter Plot node. - select the axes/columns to plot inthe options tab (green vs red)- give a title in the General PlotOptions tab- make sure it interacts with otherview nodes in the View Controlstab under Selection 20. Define locations youwant to export the data to. 17. Assign colours to thePf3D7_ps_hit column using theColor Manager node. Connect itsinput port to the RDKit Molecule toSVG node. Keep SelectedKNIME-native formatmalariahts_experiment_hits.csvcreateREOS rulescorrect column typeexclude "selected" column Row Filter Tile View Column Resorter GroupBy Table View Table Writer CSV Reader Rule Engine Column Expressions Joiner RDKit From Molecule RDKit Moleculeto SVG ParallelCoordinates Plot Color Manager Column Rename SDF Writer Excel Writer Column Filter

Nodes

Extensions

Links