01_Data Access and Transformation

This workflow fetches data from a database and preprocesses it so that it can be used for machine learning in the next exercise. Preprocessing includes creating the target column ("activity"), which contains the classification, and generating the 5 different fingerprints.

1. Use the DB Table Selector node to select the assay data table.
2. Use the DB Row Filter node to keep the rows that have "CHEMBL214" as target_chembl_id.
3. Use the DB Joiner node to perform an inner join based on the molregno.
4. As the molregno column is no longer needed, use the DB Column Filter node to exclude it.
5. Use the DB Reader node to read the data table into KNIME.
6. Use the String to Number node to convert the "standard_value" and "pchembl_value" columns to double.
7. Use the Molecule Type Cast node to convert the column "canonical_smiles" to the data type SMILES, then the RDKit From Molecule node to make it an RDKit molecule column, and then the Renderer to Image node to render the molecules and create PNG images.
8. Use the Rule Engine node to create the activity target column (set all values lower than 10 nM in the standard value to 'active' and all others to 'inactive').
9. Use 3 more RDKit Fingerprint nodes to create the ECFP4, the AtomPair and the RDKit fingerprints.
   ECFP4: 2048 bits
   AtomPair: 1024 bits
   RDKit: 1024 bits
10. Use the Table Writer node to save the preprocessed molecules with the fingerprints to a file.

Annotations in the workflow: compound data; pivot the data to have assay data in columns and compounds in rows; ECFP6; ECFC6

Nodes shown in the workflow: Column Rename (Regex), SQLite Connector, DB Table Selector, DB Pivot, Column Resorter, RDKit Fingerprint, RDKit Count-Based Fingerprint
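For readers who want to see the logic of steps 1 to 6 and step 8 outside of KNIME, the following is a minimal Python sketch using the standard-library sqlite3 module. It is not the workflow itself. The table names `assay_data` and `compound_data`, the helper names, and the demo rows are hypothetical and invented for illustration; only the column names, the "CHEMBL214" filter and the 10 nM threshold come from the steps above. It also assumes that the inner join combines the assay table with the compound table and that no values are missing.

```python
import sqlite3

ACTIVE_THRESHOLD_NM = 10.0  # step 8: values below 10 nM count as 'active'


def build_demo_db():
    """Create an in-memory database with made-up rows (illustration only).

    Table names are hypothetical; the real database schema may differ.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE assay_data (
            molregno INTEGER,
            target_chembl_id TEXT,
            standard_value TEXT,
            pchembl_value TEXT
        );
        CREATE TABLE compound_data (
            molregno INTEGER,
            canonical_smiles TEXT
        );
        INSERT INTO assay_data VALUES
            (1, 'CHEMBL214', '2.5', '8.60'),
            (2, 'CHEMBL214', '150', '6.82'),
            (3, 'CHEMBL999', '1.0', '9.00');
        INSERT INTO compound_data VALUES
            (1, 'CCO'), (2, 'c1ccccc1'), (3, 'CCN');
        """
    )
    return conn


def fetch_and_label(conn, target="CHEMBL214"):
    """Mirror steps 1-6 and 8: filter, join, drop molregno, convert, label."""
    # Steps 1-5: select the table, filter on the target, inner join on
    # molregno, and leave molregno out of the selected columns.
    rows = conn.execute(
        """
        SELECT c.canonical_smiles, a.standard_value, a.pchembl_value
        FROM assay_data AS a
        INNER JOIN compound_data AS c ON a.molregno = c.molregno
        WHERE a.target_chembl_id = ?
        ORDER BY a.molregno
        """,
        (target,),
    ).fetchall()

    result = []
    for smiles, standard_value, pchembl_value in rows:
        # Step 6: convert the string columns to floating-point numbers.
        standard = float(standard_value)
        pchembl = float(pchembl_value)
        # Step 8: rule for the 'activity' target column.
        activity = "active" if standard < ACTIVE_THRESHOLD_NM else "inactive"
        result.append(
            {
                "canonical_smiles": smiles,
                "standard_value": standard,
                "pchembl_value": pchembl,
                "activity": activity,
            }
        )
    return result


if __name__ == "__main__":
    for row in fetch_and_label(build_demo_db()):
        print(row)
```

Running the script prints one dictionary per compound measured against the chosen target. In real data, rows with missing standard values would need to be handled before the conversion in step 6.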
