Target Tractability Information Retrieval
This KNIME workflow facilitates comprehensive assessment of biological target tractability by integrating up-to-date data from multiple public resources including UniProt, ChEMBL, PDB, Open Targets, and Human Protein Atlas. Users can input target lists using UniProt accession codes or gene symbols, either via file upload or direct entry.
The workflow retrieves extensive functional and expression information such as protein function, subcellular localization, tissue-specific and disease-relevant expression data, and related disease annotations. This enables evaluation of target relevance in the biological context of interest.
For tractability insights, it collects rich druggability data comprising associated mechanisms of action (MOA), linked approved and investigational drugs with detailed activity profiles, and chemical probes validated for target modulation. It further aggregates compound bioactivity data by assay type, supporting ligand-based drug discovery and QSAR modeling. Structure-based tractability is informed through retrieval of experimental 3D structures and ligand information from PDB. Additional small-molecule tractability scores and druggable family classifications come from Open Targets, delivering a multi-dimensional view of target feasibility.
Interactive visualizations summarize these data, enabling filtering and detailed exploration of targets’ clinical development status, drug-target interactions, chemical probe availability, compound activity breadth, and structural data quality. The workflow supports export of results for downstream analysis, making it a valuable tool for prioritizing targets to guide resource allocation in drug discovery projects.
URL: Assessing Target Tractability: A Hands-On KNIME Workflow Powered by UniProt, ChEMBL, PDB, and Open Targets Data https://medium.com/low-code-for-advanced-data-science/target-tractability-knime-workflow-using-uniprot-chembl-pdb-opentargets-dca70f9fb3f0
Targets retrieval, revision and selection through UniProt data.
If the user selects the wrong target ID column, or no column at all, the UniProt Accession or Gene to Target Info component fails and the uniprot_data_retrieval_problem variable is set to 1 (otherwise it is set to 0).
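The try-catch behaviour described in this annotation can be sketched outside KNIME roughly as follows. This is only an illustration in Python, not the component's actual implementation: the function name retrieve_with_flag and the fetch callback are hypothetical, and only the flag name uniprot_data_retrieval_problem comes from the workflow.

```python
from typing import Callable, Dict, List, Optional, Tuple


def retrieve_with_flag(
    target_ids: Optional[List[str]],
    fetch: Callable[[str], Dict[str, str]],
) -> Tuple[List[Dict[str, str]], int]:
    """Return (rows, uniprot_data_retrieval_problem).

    `fetch` is a hypothetical stand-in for the UniProt lookup; the real
    component queries UniProt, which is not reproduced here.
    """
    try:
        if not target_ids:
            # No target ID column selected (or it is empty).
            raise ValueError("no target ID column selected")
        rows = [fetch(target_id) for target_id in target_ids]
        return rows, 0  # retrieval succeeded: flag set to 0
    except Exception:
        # Any failure (wrong column, unknown ID) sets the flag to 1.
        return [], 1


# Hypothetical in-memory lookup used instead of a web service call.
KNOWN = {"P62269": {"accession": "P62269"}}


def lookup(accession: str) -> Dict[str, str]:
    return KNOWN[accession]  # raises KeyError for unknown values
```

Downstream steps can then branch on the returned flag, in the same way the workflow's case switch branches on uniprot_data_retrieval_problem.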
Ligand data: MOAs, indications, drug activity, bioactivities association and chemical probes through ChEMBL
Retrieve the number of distinct compounds with available activity data for each target
Retrieve activities for approved/investigational drugs (possible only for targets with MOA data).
Targets input
Retrieve MOA-indication data
WebPortal page
Run mode selector: real vs (pre-loaded) example
EXAMPLE: pre-loaded targets input
Check max dataset size
Target structure data: association through PDB
File output & export
Extract the number and name of chemical probes for targets available in ChEMBL.
Retrieve basic target info from ChEMBL
Target tractability data: small molecule tractability from Open Targets
Distinct Ensembl gene IDs can be associated with the same UniProt ID (e.g., P62269). For these cases, we retrieved target tractability separately from Open Targets and then combined the data uniquely under the same UniProt ID.
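The combination step described here (several Ensembl gene IDs collapsing onto one UniProt ID) can be sketched as a group-and-deduplicate operation. The snippet below is a minimal Python illustration under assumptions: the function name combine_tractability, the input row layout, the placeholder Ensembl IDs, and the tractability labels are all hypothetical and do not reproduce real Open Targets data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def combine_tractability(
    rows: Iterable[Tuple[str, str, str]],
) -> Dict[str, Dict[str, List[str]]]:
    """Group (uniprot_id, ensembl_gene_id, label) rows by UniProt ID.

    Gene IDs and tractability labels are kept once each, in first-seen
    order, so the result has one unique record per UniProt ID.
    """
    genes: Dict[str, List[str]] = defaultdict(list)
    labels: Dict[str, List[str]] = defaultdict(list)
    for uniprot_id, ensembl_id, label in rows:
        if ensembl_id not in genes[uniprot_id]:
            genes[uniprot_id].append(ensembl_id)
        if label not in labels[uniprot_id]:
            labels[uniprot_id].append(label)
    return {
        uniprot_id: {
            "ensembl_gene_ids": genes[uniprot_id],
            "tractability": labels[uniprot_id],
        }
        for uniprot_id in genes
    }


# Placeholder gene IDs and labels (hypothetical, for illustration only).
example_rows = [
    ("P62269", "ENSG_A", "label 1"),
    ("P62269", "ENSG_B", "label 1"),
    ("P62269", "ENSG_B", "label 2"),
]
combined = combine_tractability(example_rows)
```

Keeping first-seen order makes the concatenated output stable across runs, which is convenient when the combined table is exported.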
UniProt data to export
Check max dataset size
Target Input By Text Editor
Open Targets Target Tractability Retrieval
rename "uniProt accession" as "Uniprot accession", otherwise the automatic col name adjuster thinks it's camelCase
UniProt Target Search Selection
add "(general for target)" suffix
split bottom HPA cols (to be added only to csv)
Column Splitter (deprecated)
full outer join
input mols as file
target_input_file_text
String Manipulation (Variable)
This adjusts general target col names for targets that are not available in ChEMBL
handle data for targets not in ChEMBL
Selection of Target Representation Column
merge left & right target_chembl_id (necessary after full outer join)
merge left & right target_chembl_id (necessary after full outer join)
merge left & right target_variant (necessary after full outer join)
merge left & right target_variant (necessary after full outer join)
no target retrieved case switch start
connect to chembl DB
wf_run_type case switch end
wf_run_type case switch start
example file
Selection of Target Representation Column
MOA-indication empty table switch end
target_input_representation
String Manipulation (Variable)
max 500 targets
Max Input Dataset Size Checker
substitute ";" for "" on multi-rows columns
String Manipulation (Multi Column)
rows with missing MoA at bottom
Row Splitter (deprecated)
select the target_representation col and rename it as target_input_representation
adjust target representation table
exclude homologous targets by selecting target_confidence = 9 (direct single protein target assigned)
ChEMBL MOA-Indication From Target Retrieval
target_input_representation (index)
join by 1. uniprot id 2. Ensembl gene id
remove duplicates
rename cols as original
re-join HPA links cols to be exported on csv
ungroup
split bottom rows with missing target_chembl_id (i.e. targets not in ChEMBL)
Row Splitter (deprecated)
re-group by uniprot_id, unique concatenate tractability info
If the geneName field is empty but geneSynonym is not, it moves the geneSynonym content into geneName (leaving geneSynonym null)
adjust content of Gene and Gene synonyms cols
split top activities where homologous targets were assigned
Row Splitter (deprecated)
rename as Ensembl gene id
sort by: 1. target_chembl_id 2. target_variant 3. drug_name
ChEMBL Activity From Molecule Retrieval
filter only UniProt and Ensembl gene cols
target_variant
ungroup
group by: target_chembl_id, target_variant and action_type; aggregate: aggregated_activity
extract target_chembl_id from uniprot accession
ChEMBL UniProt Accession to Target Info
rename as drug_actvities
join drug name on aggregated_activity col
split open target tractability
left join drugs activity
split open target tractability description
full-outer join gene symbol and uniprot accession
join on: 1. target_chembl_id 2. target_variant 3. molecule_chembl_id
re-sort cols
split ensembl gene cols on new line char ";\n"
select only results cols
left join available target tractability data on the right (might be empty)
add "(general for target)" suffix to col
uniprot_data_retrieval try-catch end
Catch Errors (Data Ports)
target input method switch start: 1. from a file 2. write or paste in a text editor
target input method switch end
target_input_representation
Target Input Representation Selection
resort cols
ChEMBL Chemical Probes from Target Retrieval
validate table and insert missing for unavailable cols (it happens for targets not in ChEMBL)
compounds_for_activity_type
UniProt Accession or Gene to Target Info
target_representation_for_uniprot_to_target_info_component
filter out variant_id and tid
report targets present in ChEMBL but without MOA data
report targets with no MOA
max 500 targets
Max Input Dataset Size Checker
group by: target_chembl_id and target_variant; aggregate: compounds_for_activity_type
filter only target_chembl_id, target_variant & chemical_probes cols
select the target_representation col and rename it as target_input_representation (duplicate rows will be filtered out)
adjust target representation table
substitute ";" for "" on multi-rows columns
String Manipulation (Multi Column)
filter only the activity standard types associated with the correct activity standard units
filter only ChEMBL target general cols
select only uniprot_accession col
activity_type_definition
rename it
remove duplicates
replace empty or missing target_variant with "WT"
target unavailable in ChEMBL switch start
from uniprot accession retrieve PDB structures from PDB web service
PDB UniProt Accession to Structure
filter relevant cols
target unavailable in ChEMBL switch end (it uses only the 1st non-empty input)
left join available activity data
define target_variant from mutation
split top only relevant activity types: AC50, EC50, IC50, XC50, Ki, Kd, potency, inhibition
Row Splitter (deprecated)
split top only relevant activity types: AC50, EC50, IC50, XC50, Ki, Kd, potency, inhibition
Row Splitter (deprecated)
concatenate
filter only 2 cols: target_chembl_id, target_variant
rename chembl_id as target_chembl_id
select only quantitative data (standard_relation = "=")
Target Druggability Summary
exclude homologous targets by selecting target_confidence = 7 (direct single protein complex subunits assigned)
exclude homologous targets by selecting target_confidence = 5 (multiple direct protein target assigned - protein family)
validate table and fill up with missing values
ChEMBL Activity Type Retrieval (Batch)
add Human Protein Atlas target expression links
add Human Protein Atlas expression links
left-join ChEMBL data on the right
target_druggability_data
target_functional_expression_data
target_input_method
Target Input Method Selection
select molecule_chembl_id, target_chembl_id, target_variant, action_type
remove duplicates
add "(general for target)" suffix
left-join available structural data on the right (might be empty)
download output files
rename molecule_chembl_id as drug_chembl_id
filter out col with links (used just for visualisation in interactive view pages)
uniprot_data_retrieval try-catch start
MOA-indication empty table switch start
re-join chembl target cols
uniprot_target_retrieval_problem switch start
filter only uniprot_accession
replace empty or missing target_variant with "WT"
uniprot_data_retrieval_problem
left join