Target Tractability Information Retrieval
This KNIME workflow facilitates comprehensive assessment of biological target tractability by integrating up-to-date data from multiple public resources including UniProt, ChEMBL, PDB, Open Targets, and Human Protein Atlas. Users can input target lists using UniProt accession codes or gene symbols, either via file upload or direct entry.
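As a rough illustration of the direct-entry input path, the sketch below turns pasted text into a clean target list. It is not the workflow's implementation: the function name `parse_target_input` and the accepted delimiters (whitespace, commas, semicolons) are assumptions made here. The 500-target cap and the removal of duplicate rows follow the workflow's own node annotations ("max 500 targets", "duplicate rows will be filtered out").

```python
MAX_TARGETS = 500  # cap mentioned in the workflow's "max 500 targets" annotation


def parse_target_input(raw_text, max_targets=MAX_TARGETS):
    """Split pasted text into unique target identifiers, keeping first-seen order.

    Hypothetical helper: the delimiters handled here are an assumption,
    not something the workflow description specifies.
    """
    normalized = raw_text.replace(",", " ").replace(";", " ")
    # dict.fromkeys drops duplicates while preserving insertion order.
    unique_targets = list(dict.fromkeys(normalized.split()))
    if len(unique_targets) > max_targets:
        raise ValueError(
            f"{len(unique_targets)} targets supplied; the limit is {max_targets}"
        )
    return unique_targets


# UniProt accessions and gene symbols can be mixed in this sketch.
print(parse_target_input("P62269\nEGFR, P00533\nEGFR"))
```

In the workflow itself the user selects which representation (UniProt accession or gene symbol) the input column holds; this sketch does not attempt to detect that.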
The workflow retrieves functional and expression information: protein function, subcellular localization, tissue-specific and disease-relevant expression data, and related disease annotations. This lets users evaluate target relevance in the biological context of interest.
For tractability, it collects druggability data: associated mechanisms of action (MOA), linked approved and investigational drugs with their activity profiles, and chemical probes validated for target modulation. It also aggregates compound bioactivity data by assay type, supporting ligand-based drug discovery and QSAR modeling. Structure-based tractability is informed by experimental 3D structures and ligand information retrieved from PDB. Small-molecule tractability scores and druggable family classifications come from Open Targets, adding a further dimension to the assessment of target feasibility.
Interactive visualizations summarize these data and allow filtering and detailed exploration of targets' clinical development status, drug-target interactions, chemical probe availability, compound activity breadth, and structural data quality. Results can be exported for downstream analysis, so the workflow can be used to prioritize targets and guide resource allocation in drug discovery projects.
URL: Assessing Target Tractability: A Hands-On KNIME Workflow Powered by UniProt, ChEMBL, PDB, and Open Targets Data https://medium.com/low-code-for-advanced-data-science/target-tractability-knime-workflow-using-uniprot-chembl-pdb-opentargets-dca70f9fb3f0
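The bioactivity aggregation described above can be pictured with a small in-memory sketch. It counts distinct compounds per target and activity type, keeping only quantitative records (standard_relation equal to "="), as the workflow's node annotations mention. The function name and the sample records are hypothetical; the field names follow ChEMBL's activity fields, and the real workflow performs this step with KNIME nodes against ChEMBL rather than with Python.

```python
from collections import defaultdict


def count_compounds_by_activity_type(activities):
    """Count distinct compounds per (target, activity type) pair.

    Only records with standard_relation "=" are kept, mirroring the
    "select only quantitative data" annotation. Hypothetical helper.
    """
    compounds = defaultdict(set)
    for row in activities:
        if row["standard_relation"] != "=":
            continue  # skip censored values such as ">" or "<"
        key = (row["target_chembl_id"], row["standard_type"])
        compounds[key].add(row["molecule_chembl_id"])
    return {key: len(molecules) for key, molecules in compounds.items()}


# Placeholder identifiers, not real ChEMBL IDs.
sample_activities = [
    {"target_chembl_id": "T1", "molecule_chembl_id": "M1", "standard_type": "IC50", "standard_relation": "="},
    {"target_chembl_id": "T1", "molecule_chembl_id": "M1", "standard_type": "IC50", "standard_relation": "="},
    {"target_chembl_id": "T1", "molecule_chembl_id": "M2", "standard_type": "IC50", "standard_relation": "="},
    {"target_chembl_id": "T1", "molecule_chembl_id": "M3", "standard_type": "IC50", "standard_relation": ">"},
    {"target_chembl_id": "T1", "molecule_chembl_id": "M2", "standard_type": "Ki", "standard_relation": "="},
]

compound_counts = count_compounds_by_activity_type(sample_activities)
print(compound_counts)
```

Using a set per key means a compound measured several times against the same target and activity type is counted once, which is what "distinct compounds" requires.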
Workflow sections (canvas annotations):
  Targets input
  Run mode selector: real vs (pre-loaded) example
  EXAMPLE: pre-loaded targets input
  Targets retrieval, revision and selection through UniProt data.
  Check max dataset size
  Ligand data: MOAs, indications, drug activity, bioactivities association and chemical probes through ChEMBL
  Retrieve basic target info from ChEMBL
  Retrieve MOA-indication data
  Retrieve activities for approved/investigational drugs (possible only for targets with MOA data).
  Retrieve the number of distinct compounds with available activity data for each target
  Extract the number and name of chemical probes for targets available in ChEMBL.
  Target structure data: association through PDB
  Target tractability data: small molecule tractability from Open Targets. Distinct Ensembl gene IDs can be associated with the same UniProt ID (e.g., P62269). For these cases, we retrieved target tractability separately from Open Targets and then combined the data uniquely under the same UniProt ID.
  UniProt data to export
  File output & export
  If the target ID column is not correctly selected by the user, or is not selected at all, the UniProt Accession or Gene to Target Info component will fail and the uniprot_data_retrieval_problem variable will be set to 1 (otherwise it is set to 0).
Node notes (repeated notes listed once):
  connect to chembl DB
  from uniprot accession retrieve PDB structures from PDB web service
  download output files
  target_input_representation
  target input method switch start
  1. from a file 2. write or paste in a text editor
  target input method switch end
  target_input_method
  input mols as file
  target_input_file_text
  wf_run_type case switch start
  wf_run_type case switch end
  example file
  target_input_representation (index)
  select the target_representation col and rename it as target_input_representation (duplicate rows will be filtered out)
  substitute ";" for "" on multi-row columns
  select molecule_chembl_id, target_chembl_id, target_variant, action_type
  remove duplicates
  join on: 1. target_chembl_id 2. target_variant 3. molecule_chembl_id
  target_variant
  split top activities where homologous targets were assigned
  sort by: 1. target_chembl_id 2. target_variant 3. drug_name
  left join drugs activity
  group by: target_chembl_id, target_variant and action_type; aggregate: aggregated_activity
  rename as drug_actvities
  max 500 targets
  exclude homologous targets by selecting target_confidence = 9 (direct single protein target assigned)
  exclude homologous targets by selecting target_confidence = 7 (direct single protein complex subunits assigned)
  exclude homologous targets by selecting target_confidence = 5 (multiple direct protein target assigned - protein family)
  concatenate
  select only quantitative data (standard_relation = "=")
  filter relevant cols
  filter only the activity standard types associated with the correct activity standard units
  activity_type_definition
  left join available activity data
  define target_variant from mutation
  resort cols
  filter out variant_id and tid
  group by: target_chembl_id and target_variant; aggregate: compounds_for_activity_type
  compounds_for_activity_type
  add " (general for target) " suffix to col
  MOA-indication empty table switch start
  filter only uniprot_accession
  left-join available structural data on the right (might be empty)
  target_druggability_data
  target_functional_expression_data
  left-join ChEMBL data on the right
  This adjusts general target col names for targets that are not available in ChEMBL
  rename "uniProt accession" as "Uniprot accession", otherwise the automatic col name adjuster thinks it is camel case
  If the geneName field is empty but geneSynonym is not, it moves the geneSynonym content into geneName (leaving geneSynonym null)
  full-outer join gene symbol and uniprot accession
  re-sort cols
  extract target_chembl_id from uniprot accession
  split bottom rows with missing target_chembl_id (i.e. targets not in chembl)
  rows with missing MoA bottom
  MOA-indication empty table switch end
  validate table and fill up with missing values
  filter only 2 cols: target_chembl_id, target_variant
  replace empty or missing target_variant with "WT"
  filter only chembl target general cols
  report targets present in chembl but without MOA data
  filter only target_chembl_id, target_variant & chemical_probes cols
  left join
  full outer join
  merge left & right target_chembl_id (necessary after full outer join)
  merge left & right target_variant (necessary after full outer join)
  no target retrieved case switch start
  rename molecule_chembl_id as drug_chembl_id
  filter out col with links (used just for visualisation in interactive view pages)
  add Human Protein Atlas target expression links
  split bottom HPA cols (to be added only to csv)
  select only results cols
  left join available target tractability data on the right (might be empty)
  split ensembl gene cols on new line char ";\n"
  filter only UniProt and Ensembl gene cols
  ungroup
  rename as Ensembl gene id
  split open target tractability
  split open target tractability description
  join by 1. uniprot id 2. ensembl gene id
  re-group by uniprot_id
  unique concatenate tractability info
  rename cols as original
  re-join HPA links cols to be exported on csv
  split top only relevant activity types: AC50, EC50, IC50, XC50, Ki, Kd, potency, inhibition
  rename chembl_id as target_chembl_id
  target unavailable in chembl switch start
  target unavailable in chembl switch end (it uses only the 1st non-empty input)
  select only uniprot_accession col, rename it
  validate table and insert missing for unavailable cols (it happens for targets not in chembl)
  target_representation_for_uniprot_to_target_info_component
  uniprot_data_retrieval try-catch start
  uniprot_data_retrieval try-catch end
  uniprot_target_retrieval_problem switch start
  uniprot_data_retrieval_problem
Workflow-specific nodes and components: Workflow intro, Workflow outro, WF Run Type, Target Input Representation Selection, Target Input Method Selection, Target Input By Text Editor, Selection of Target Representation Column, Target Type Selection, UniProt Target Search Selection, UniProt Accession or Gene to Target Info, ChEMBL DB Connection, ChEMBL UniProt Accession to Target Info, ChEMBL MOA-Indication From Target Retrieval, ChEMBL Activity From Molecule Retrieval, ChEMBL Activity Type Retrieval (Batch), ChEMBL Chemical Probes from Target Retrieval, PDB UniProt Accession to Structure, Open Targets Target Tractability Retrieval, Target Druggability Summary, Max Input Dataset Size Checker, Standardize Column Names, Temp File Path Generator, Download Output Files, No Target Selected, No Target Retrieved, Wrong Target ID column, adjust target representation table, join drug name on aggregated_activity col, handle data for targets not in ChEMBL, adjust content of Gene and Gene synonyms cols, report targets with no MOA, re-join chembl target cols, add Human Protein Atlas expression links
General-purpose nodes: CASE Switch Start, CASE Switch End, Empty Table Switch, Try (Data Ports), Catch Errors (Data Ports), File Upload, CSV Reader, CSV Writer, String Manipulation, String Manipulation (Variable), String Manipulation (Multi Column), Math Formula (Variable), Rule Engine, Rule Engine Variable, Column Filter, Column Renamer, Column Rename (Regex), Column Resorter, Column Merger, Column Splitter, Cell Splitter, Row Filter, Row Splitter, Duplicate Row Filter, Sorter, Joiner, GroupBy, Ungroup, Concatenate, Table Validator
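The handling of several Ensembl gene IDs under one UniProt ID (the P62269 case noted in the Open Targets annotation) can be sketched as a group-and-deduplicate step. This is an illustration under stated assumptions, not the workflow's code: the function name, the row layout, and the Ensembl IDs and tractability labels in the sample are all placeholders invented for the example.

```python
def merge_tractability_by_uniprot(rows):
    """Combine per-Ensembl-gene tractability rows under each UniProt ID.

    Each input row is assumed to hold one UniProt ID, one Ensembl gene ID
    and a list of tractability labels retrieved for that gene. Labels are
    concatenated uniquely, keeping first-seen order. Hypothetical helper.
    """
    merged = {}
    for row in rows:
        entry = merged.setdefault(
            row["uniprot_id"], {"ensembl_gene_ids": [], "tractability": []}
        )
        if row["ensembl_gene_id"] not in entry["ensembl_gene_ids"]:
            entry["ensembl_gene_ids"].append(row["ensembl_gene_id"])
        for label in row["tractability"]:
            if label not in entry["tractability"]:
                entry["tractability"].append(label)
    return merged


# Placeholder Ensembl IDs and labels; only the UniProt ID comes from the text.
sample_rows = [
    {"uniprot_id": "P62269", "ensembl_gene_id": "ENSG_A", "tractability": ["label_1", "label_2"]},
    {"uniprot_id": "P62269", "ensembl_gene_id": "ENSG_B", "tractability": ["label_2", "label_3"]},
]

merged_tractability = merge_tractability_by_uniprot(sample_rows)
print(merged_tractability)
```

The result has one entry per UniProt ID, which matches the annotation's goal of combining the data "uniquely under the same UniProt ID" before it is joined back to the target table.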