
02. Data Manipulation

Data Manipulation

"Data Manipulation" exercise for basic Life Science User Training
- Concatenate data from two different sources
- Modify String values
- Join data from multiple tables
- Remove duplicates in the data
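
For readers who want to see the four operations outside of KNIME, the following is a minimal plain-Python sketch, not the workflow's actual implementation. Tables are modelled as lists of dicts. The helper names (`concatenate`, `uppercase_column`, `inner_join`, `drop_duplicates`) and the sample values are hypothetical; only the column names Sample, Pf3D7_ps_hit and AMW come from the exercise.

```python
def concatenate(*tables):
    """Stack the rows of several tables (lists of dicts) into one list."""
    return [dict(row) for table in tables for row in table]


def uppercase_column(rows, column):
    """Return copies of the rows with one string column converted to upper case."""
    return [{**row, column: row[column].upper()} for row in rows]


def inner_join(left, right, key):
    """Keep only row pairs whose key value appears in both tables."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def drop_duplicates(rows):
    """Remove rows that are identical in every column, keeping the first one."""
    seen = set()
    unique = []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique


# Hypothetical example data (values are made up for illustration).
hits = [
    {"Sample": "cmpd-a", "Pf3D7_ps_hit": 1},
    {"Sample": "cmpd-a", "Pf3D7_ps_hit": 1},  # exact duplicate row
]
no_hits = [{"Sample": "cmpd-b", "Pf3D7_ps_hit": 0}]
features = [
    {"Sample": "CMPD-A", "AMW": 310.4},
    {"Sample": "CMPD-B", "AMW": 95.0},
]

experiments = drop_duplicates(
    uppercase_column(concatenate(hits, no_hits), "Sample")
)
joined = inner_join(experiments, features, "Sample")
```

The Sample column is upper-cased before joining because the join compares key values exactly, so "cmpd-a" and "CMPD-A" would otherwise not match.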

URL: Advanced ETL Functionalities and Machine Learning based Pre-Processing https://youtu.be/IEAsUTN8q68
URL: Joining Data Tables https://youtu.be/6BigLM6vbhs
URL: Joining Data Tables - Inner Join https://youtu.be/9uV99ByH-TA
URL: Concatenate https://youtu.be/VzH2lHbDAg0
URL: Concatenate Node https://youtu.be/ku6SyEZ1Pv8
URL: Data Manipulation: Numbers, Strings, and Rules https://youtu.be/mJrBXmLQ4ko

Activity I: Filtering
- Remove rows where column Pf3D7_pEC50 contains missing values
- Use the Row Filter node to keep rows with values higher than 150 in column Pf3D7_ps_red
- Remove column Pf3D7_ps_green from the result
- Use String Manipulation to ensure that all entries of the Sample column use UPPERCASE letters

Activity II: Data Manipulation & Aggregation
- Join all data together by Sample name
- Concatenate malariahts_experiment hits and no-hits data into one single table
- Join molecule properties (from SDF Reader) with other features (from Table Reader) by Sample name
- Using the Table Writer node, write the joined data to a KNIME table file named malariahts_joined.table
Hint: use a relative file path: knime://knime.workflow/../../data/malariahts_joined.table for output location

Activity III: Data Manipulation (Optional)
- Use the Rule Engine node to add the following tags in a new column named REOS:
-- "MW" if AMW is smaller than 100 or greater than 700
-- "Complexity" if NumHeavyAtoms is smaller than 5 or greater than 50 or NumRotatableBonds is greater than or equal to 12
-- "HBond" if NumHBD is greater than 5 or NumHBA is greater than 10
-- "logP" if SlogP is smaller than -5 or greater than 7.5
-- "Pass" for all other cases
- Keep only the following columns: Sample, Pf3D7_ps_hit, AMW, NumRotatableBonds, NumHBD, NumHBA, NumHeavyAtoms, FractionCSP3, MFP2, REOS
- Write the results to a file using the Table Writer node
Hint: use a relative file path: knime://knime.workflow/../../data/malariahts_joined_REOS.table for output location

Files: malariahts_molecules.sdf, malariahts_experiment_hits.csv, malariahts_experiment_hits.csv, malariahts_experiment_no-hits.xlsx, malariahts_molecules_feature.sdf, malariahts_joined.table
Reader nodes: SDF Reader, File Reader, File Reader, Excel Reader, Table Reader, Table Reader
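
The filtering steps of Activity I and the REOS tagging of Activity III can be sketched in plain Python as follows. This is an illustration of the logic only, not the KNIME node configuration. It assumes that missing values are represented as `None`, that rows are dicts keyed by column name, and that the tags are checked in the order listed in the exercise with the first match winning. The function names `activity_one_filter` and `reos_tag` and all numeric example values are hypothetical.

```python
def activity_one_filter(rows):
    """Apply the Activity I steps to a list of row dicts."""
    cleaned = []
    for row in rows:
        # Drop rows with a missing Pf3D7_pEC50 (assumed to be None here).
        if row.get("Pf3D7_pEC50") is None:
            continue
        # Keep only rows whose Pf3D7_ps_red value is higher than 150.
        if not row["Pf3D7_ps_red"] > 150:
            continue
        # Remove the Pf3D7_ps_green column and upper-case the Sample name.
        kept = {k: v for k, v in row.items() if k != "Pf3D7_ps_green"}
        kept["Sample"] = kept["Sample"].upper()
        cleaned.append(kept)
    return cleaned


def reos_tag(row):
    """Return the first matching REOS tag; 'Pass' if no rule applies."""
    if row["AMW"] < 100 or row["AMW"] > 700:
        return "MW"
    if (row["NumHeavyAtoms"] < 5 or row["NumHeavyAtoms"] > 50
            or row["NumRotatableBonds"] >= 12):
        return "Complexity"
    if row["NumHBD"] > 5 or row["NumHBA"] > 10:
        return "HBond"
    if row["SlogP"] < -5 or row["SlogP"] > 7.5:
        return "logP"
    return "Pass"


# Hypothetical rows (values are made up for illustration).
raw = [
    {"Sample": "cmpd-a", "Pf3D7_pEC50": 6.2, "Pf3D7_ps_red": 180, "Pf3D7_ps_green": 40},
    {"Sample": "cmpd-b", "Pf3D7_pEC50": None, "Pf3D7_ps_red": 200, "Pf3D7_ps_green": 35},
    {"Sample": "cmpd-c", "Pf3D7_pEC50": 5.1, "Pf3D7_ps_red": 120, "Pf3D7_ps_green": 50},
]
filtered = activity_one_filter(raw)

molecule = {"AMW": 350.0, "NumHeavyAtoms": 25, "NumRotatableBonds": 4,
            "NumHBD": 2, "NumHBA": 5, "SlogP": 3.1}
tag = reos_tag(molecule)
```

Because the first matching rule decides the tag, a molecule that violates several criteria receives only the tag of the earliest rule it breaks, so the order of the rules matters.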
