Target Tractability Information Retrieval

This KNIME workflow facilitates comprehensive assessment of biological target tractability by integrating up-to-date data from multiple public resources including UniProt, ChEMBL, PDB, Open Targets, and Human Protein Atlas. Users can input target lists using UniProt accession codes or gene symbols, either via file upload or direct entry.

The workflow retrieves extensive functional and expression information such as protein function, subcellular localization, tissue-specific and disease-relevant expression data, and related disease annotations. This enables evaluation of target relevance in the biological context of interest.

For tractability insights, it collects rich druggability data comprising associated mechanisms of action (MOA), linked approved and investigational drugs with detailed activity profiles, and chemical probes validated for target modulation. It further aggregates compound bioactivity data by assay type, supporting ligand-based drug discovery and QSAR modeling.

Structure-based tractability is informed through retrieval of experimental 3D structures and ligand information from PDB. Additional small-molecule tractability scores and druggable family classifications come from Open Targets, delivering a multi-dimensional view of target feasibility.

Interactive visualizations summarize these data, enabling filtering and detailed exploration of targets’ clinical development status, drug-target interactions, chemical probe availability, compound activity breadth, and structural data quality. The workflow supports export of results for downstream analysis, making it a valuable tool for prioritizing targets to guide resource allocation in drug discovery projects.

URL: Assessing Target Tractability: A Hands-On KNIME Workflow Powered by UniProt, ChEMBL, PDB, and Open Targets Data https://medium.com/low-code-for-advanced-data-science/target-tractability-knime-workflow-using-uniprot-chembl-pdb-opentargets-dca70f9fb3f0

Targets retrieval, revision and selection through UniProt data.
If the user selects the wrong target ID column, or selects none at all, the UniProt Accession or Gene to Target Info component fails and the uniprot_data_retrieval_problem variable is set to 1 (otherwise it is set to 0).
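The flag behaviour described above can be pictured as a try/catch around the lookup. The following is a minimal Python sketch, not the component's actual implementation: `lookup_targets`, `retrieve_with_flag`, and the list-of-dicts table are hypothetical; only the variable name uniprot_data_retrieval_problem comes from the workflow.

```python
def lookup_targets(rows, id_column):
    """Hypothetical stand-in for the UniProt lookup component.

    Raises ValueError if no column is selected or the selected
    column does not hold a usable ID in every row.
    """
    if id_column is None:
        raise ValueError("no target ID column selected")
    ids = [row.get(id_column) for row in rows]
    if not ids or any(not value for value in ids):
        raise ValueError(f"column {id_column!r} does not hold target IDs")
    return ids


def retrieve_with_flag(rows, id_column):
    """Return (ids, uniprot_data_retrieval_problem), mimicking Try/Catch."""
    try:
        return lookup_targets(rows, id_column), 0
    except ValueError:
        # The failure is caught and reported through the flag instead.
        return [], 1
```

Downstream logic can then branch on the flag, for example to show a "Wrong Target ID column" message when it equals 1.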
Ligand data: MOAs, indications, drug activity, bioactivities association and chemical probes through ChEMBL
Retrieve the number of distinct compounds with available activity data for each target
Retrieve activities for approved/investigational drugs (possible only for targets with MOA data).
Targets input
Retrieve MOA-indication data
WebPortal page
WebPortal
Run mode selector: real vs (pre-loaded) example
EXAMPLE: pre-loaded targets input
WebPortal
Check max dataset size
WebPortal
Target structure data: association through PDB
File output & export
WebPortal
Extract the number and name of chemical probes for targets available in ChEMBL.
Retrieve basic target info from ChEMBL
WebPortal
Target tractability data: small molecule tractability from Open Targets
Distinct Ensembl gene IDs can be associated with the same UniProt ID (e.g., P62269). In these cases, target tractability is retrieved from Open Targets separately for each Ensembl gene ID and the results are then combined, without duplicates, under the same UniProt ID.
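As an illustration of this combining step, here is a minimal Python sketch under stated assumptions: the record layout, the function name `combine_by_uniprot`, the "; " separator, and the placeholder values (`ENSG_A`, `label_1`, ...) are all hypothetical and are not real Open Targets output.

```python
def combine_by_uniprot(records):
    """Group (uniprot_id, ensembl_gene_id, tractability_labels) records.

    Each distinct Ensembl ID and tractability label is kept once per
    UniProt ID, in first-seen order, then joined with "; ".
    """
    grouped = {}
    for uniprot_id, ensembl_id, labels in records:
        entry = grouped.setdefault(uniprot_id, {"ensembl": [], "labels": []})
        if ensembl_id not in entry["ensembl"]:
            entry["ensembl"].append(ensembl_id)
        for label in labels:
            if label not in entry["labels"]:
                entry["labels"].append(label)
    return {
        uniprot_id: {
            "ensembl_gene_ids": "; ".join(entry["ensembl"]),
            "tractability": "; ".join(entry["labels"]),
        }
        for uniprot_id, entry in grouped.items()
    }


# Placeholder data: two Ensembl genes mapped to one UniProt accession.
records = [
    ("P62269", "ENSG_A", ["label_1", "label_2"]),
    ("P62269", "ENSG_B", ["label_2", "label_3"]),
]
combined = combine_by_uniprot(records)
```

With this placeholder input, the single P62269 row carries both Ensembl IDs and the three distinct labels.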
UniProt data to export
WebPortal
Check max dataset size
WebPortal
Target Input By Text Editor
Open Targets Target Tractability Retrieval
rename "uniProt accession" as "Uniprot accession", otherwise the automatic column name adjuster thinks it's camelCase
Column Renamer
UniProt Target Search Selection
add "(general for target)" suffix
Column Name Replacer
split bottom HPA cols (to be added only to csv)
Column Splitter (deprecated)
full outer join
Joiner
input mols as file
File Upload
target_input_file_text
String Manipulation (Variable)
This adjusts general target col names for targets that are not available in ChEMBL
handle data for targets not in ChEMBL
Selection of Target Representation Column
merge left & right target_chembl_id (necessary after full outer join)
Column Merger
merge left & right target_chembl_id (necessary after full outer join)
Column Merger
merge left & right target_variant (necessary after full outer join)
Column Merger
No Target Retrieved
Workflow outro
merge left & right target_variant (necessary after full outer join)
Column Merger
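Why a merge is needed after a full outer join: rows that exist on only one side leave the other side's key column empty, so the two key columns have to be coalesced into one. The sketch below is a plain-Python illustration, not the Column Merger node itself; the `coalesce` helper, the " (right)" column suffix, and the IDs `CHEMBL_X` / `CHEMBL_Y` are assumptions made for the example.

```python
def coalesce(row, left_col, right_col, out_col):
    """Merge two columns into one, preferring the left value when present."""
    merged = dict(row)
    left_value = merged.pop(left_col, None)
    right_value = merged.pop(right_col, None)
    merged[out_col] = left_value if left_value is not None else right_value
    return merged


# Hypothetical rows as they might look after a full outer join:
# each row has the key filled on only one side.
joined = [
    {"target_chembl_id": "CHEMBL_X", "target_chembl_id (right)": None},
    {"target_chembl_id": None, "target_chembl_id (right)": "CHEMBL_Y"},
]
merged_rows = [
    coalesce(row, "target_chembl_id", "target_chembl_id (right)", "target_chembl_id")
    for row in joined
]
```

The same helper would apply to target_variant, which the annotations say is merged in the same way.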
no target retrieved case switch start
Empty Table Switch
connect to chembl DB
ChEMBL DB Connection
wf_run_type case switch end
CASE Switch End
WF Run Type
wf_run_type case switch start
CASE Switch Start
example file
CSV Reader
Selection of Target Representation Column
MOA-indication empty table switch end
CASE Switch End
target_input_representation
String Manipulation (Variable)
max 500 targets
Max Input Dataset Size Checker
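The size check can be thought of as a simple guard on the number of input rows. This is a hedged sketch only: the 500 limit comes from the annotation above, while the function name `is_within_limit` and the choice to count rows as given (without de-duplication) are assumptions, and how the real component reacts to an oversized input is not specified here.

```python
MAX_TARGETS = 500  # limit stated in the workflow annotation


def is_within_limit(targets, limit=MAX_TARGETS):
    """Return True if the number of input targets does not exceed the limit."""
    return len(targets) <= limit
```

A caller could, for example, stop early and show a message when `is_within_limit(targets)` is False.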
substitute ";" for "" on multi-row columns
String Manipulation (Multi Column)
rows with missing MoA bottom
Row Splitter (deprecated)
Temp File Path Generator
select the target_representation col and rename it as target_input_representation
adjust target representation table
exclude homologous targets by selecting target_confidence = 9 (direct single protein target assigned)
Row Filter (deprecated)
ChEMBL MOA-Indication From Target Retrieval
target_input_representation (index)
Math Formula (Variable)
join by: 1. uniprot id 2. Ensembl gene id
Joiner
remove duplicates
Duplicate Row Filter
rename cols as original
Column Renamer
re-join HPA links cols to be exported to csv
Joiner
ungroup
Ungroup
split bottom rows with missing target_chembl_id (i.e. targets not in chembl)
Row Splitter (deprecated)
re-group by uniprot_id, unique concatenate tractability info
GroupBy
If the geneName field is empty but geneSynonym is not, it moves the geneSynonym content into geneName (leaving geneSynonym null)
adjust content of Gene and Gene synonyms cols
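The gene-name adjustment described above amounts to a small per-row rule. A minimal sketch, assuming each row is a dict with geneName and geneSynonym keys; the function name `fix_gene_columns` and the example values are hypothetical.

```python
def fix_gene_columns(row):
    """Move geneSynonym into geneName when geneName is empty.

    Rows that already have a geneName, or have neither value,
    are returned unchanged (as a copy).
    """
    fixed = dict(row)
    if not fixed.get("geneName") and fixed.get("geneSynonym"):
        fixed["geneName"] = fixed["geneSynonym"]
        fixed["geneSynonym"] = None  # synonym is left null after the move
    return fixed
```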
Empty Table Switch
split top activities where homologous targets were assigned
Row Splitter (deprecated)
rename as Ensembl gene id
Column Renamer
sort by: 1. target_chembl_id 2. target_variant 3. drug_name
Sorter
ChEMBL Activity From Molecule Retrieval
filter only UniProt and Ensembl gene cols
Column Filter
target_variant
Rule Engine
Workflow intro
ungroup
Ungroup
group by: target_chembl_id, target_variant and action_type; aggregate: aggregated_activity
GroupBy
extract target_chembl_id from uniprot accession
ChEMBL UniProt Accession to Target Info
No Target Selected
rename as drug_activities
Column Renamer
join drug name on aggregated_activity col
split open target tractability
Cell Splitter
Standardize Column Names
left join drugs activity
Joiner
split open target tractability description
Cell Splitter
full-outer join gene symbol and uniprot accession
Joiner
join on: 1. target_chembl_id 2. target_variant 3. molecule_chembl_id
Joiner
re-sort cols
Column Resorter
split ensembl gene cols on new line char ";\n"
Cell Splitter
select only results cols
Column Filter
left join available target tractability data on the right (might be empty)
Joiner
add "(general for target)" suffix to col
Column Renamer
uniprot_data_retrieval try-catch end
Catch Errors (Data Ports)
target input method switch start: 1. from a file 2. write or paste in a text editor
CASE Switch Start
target input method switch end
CASE Switch End
target_input_representation
Target Input Representation Selection
resort cols
Column Resorter
ChEMBL Chemical Probes from Target Retrieval
validate table and insert missing for unavailable cols (it happens for targets not in chembl)
Table Validator
compounds_for_activity_type
String Manipulation
UniProt Accession or Gene to Target Info
target_representation_for_uniprot_to_target_info_component
Rule Engine Variable
filter out variant_id and tid
Column Filter
report targets present in chembl but without MOA data
report targets with no MOA
max 500 targets
Max Input Dataset Size Checker
group by: target_chembl_id and target_variant; aggregate: compounds_for_activity_type
GroupBy
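One way to read this grouping step, together with the "number of distinct compounds with available activity data for each target" annotation, is sketched below. It is an illustration only: the tuple layout, the "TYPE: count" string format, the "; " separator, and the placeholder IDs are assumptions, not the workflow's actual output format.

```python
from collections import defaultdict


def compounds_for_activity_type(activities):
    """Summarise distinct compounds per activity type for each target.

    `activities` holds (target_chembl_id, target_variant,
    activity_type, molecule_chembl_id) tuples.
    """
    distinct = defaultdict(set)
    for target, variant, activity_type, molecule in activities:
        distinct[(target, variant, activity_type)].add(molecule)

    summary = defaultdict(list)
    # Keys are unique tuples of strings, so sorting never compares the sets.
    for (target, variant, activity_type), molecules in sorted(distinct.items()):
        summary[(target, variant)].append(f"{activity_type}: {len(molecules)}")
    return {key: "; ".join(parts) for key, parts in summary.items()}


# Placeholder rows: one target, one repeated measurement for compound M1.
example = [
    ("T1", "WT", "IC50", "M1"),
    ("T1", "WT", "IC50", "M1"),
    ("T1", "WT", "IC50", "M2"),
    ("T1", "WT", "Ki", "M1"),
]
summary = compounds_for_activity_type(example)
```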
filter only target_chembl_id, target_variant & chemical_probes cols
Column Filter
select the target_representation col and rename it as target_input_representation (duplicate rows will be filtered out)
adjust target representation table
substitute ";" for "" on multi-row columns
String Manipulation (Multi Column)
filter only the activity standard types associated with the correct activity standard units
Joiner
filter only chembl target general cols
Column Filter
select only uniprot_accession col
Column Filter
activity_type_definition
Column Renamer
rename it
Column Renamer
remove duplicates
Duplicate Row Filter
replace empty or missing target_variant with "WT"
Rule Engine
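The rule above is a simple default for the target_variant column. A minimal sketch, assuming missing values arrive as `None` and empty values as empty strings; the function name `normalize_variant` and the example variant label are illustrative only.

```python
def normalize_variant(value):
    """Return "WT" for a missing or empty target_variant, else the value."""
    if value is None or value == "":
        return "WT"
    return value


variants = [normalize_variant(v) for v in [None, "", "V600E"]]
```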
target unavailable in chembl switch start
Empty Table Switch
from uniprot accession retrieve PDB structures from PDB web service
PDB UniProt Accession to Structure
filter relevant cols
Column Filter
target unavailable in chembl switch end (it uses only the 1st non-empty input)
CASE Switch End
left join available activity data
Joiner
define target_variant from mutation
Rule Engine
split top only relevant activity types: AC50, EC50, IC50, XC50, Ki, Kd, potency, inhibition
Row Splitter (deprecated)
split top only relevant activity types: AC50, EC50, IC50, XC50, Ki, Kd, potency, inhibition
Row Splitter (deprecated)
concatenate
Concatenate
filter only 2 cols: target_chembl_id, target_variant
Column Filter
rename chembl_id as target_chembl_id
Column Renamer
select only quantitative data (standard_relation = "=")
Row Filter (deprecated)
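The two activity filters named in the annotations (relevant activity types, and standard_relation equal to "=") can be combined into one predicate for illustration. This is a sketch under assumptions: in the workflow these are separate nodes, the column name `standard_type` and the case-insensitive matching are assumed, and `keep_quantitative` is a hypothetical helper.

```python
# Activity types listed in the workflow annotations, lower-cased for matching.
RELEVANT_TYPES = {"ac50", "ec50", "ic50", "xc50", "ki", "kd", "potency", "inhibition"}


def keep_quantitative(activities):
    """Keep rows with a relevant activity type and an exact ("=") relation."""
    return [
        row
        for row in activities
        if row["standard_type"].lower() in RELEVANT_TYPES
        and row["standard_relation"] == "="
    ]


rows = [
    {"standard_type": "IC50", "standard_relation": "="},
    {"standard_type": "IC50", "standard_relation": ">"},  # censored value
    {"standard_type": "Tm", "standard_relation": "="},    # type not in the list
]
kept = keep_quantitative(rows)
```

Only the first row survives: the second is a censored measurement and the third has an activity type outside the listed set.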
Target Druggability Summary
exclude homologous targets by selecting target_confidence = 7 (direct protein complex subunits assigned)
Row Filter (deprecated)
exclude homologous targets by selecting target_confidence = 5 (multiple direct protein targets assigned - protein family)
Row Filter (deprecated)
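Each of the confidence filters above keeps rows with a single target_confidence value. The sketch below illustrates one such filter in Python; the `select_confidence` helper and the dict-based rows are hypothetical, and the score descriptions are paraphrased from the annotations rather than taken from a ChEMBL lookup.

```python
# Scores used by the workflow's filters, with paraphrased descriptions.
DIRECT_CONFIDENCE = {
    9: "direct single protein target assigned",
    7: "direct protein complex subunits assigned",
    5: "multiple direct protein targets assigned (protein family)",
}


def select_confidence(rows, score):
    """Keep only rows whose target_confidence equals the given direct score."""
    if score not in DIRECT_CONFIDENCE:
        raise ValueError(f"{score} is not one of the direct-assignment scores")
    return [row for row in rows if row["target_confidence"] == score]


rows = [
    {"target_chembl_id": "T1", "target_confidence": 9},
    {"target_chembl_id": "T2", "target_confidence": 8},
    {"target_chembl_id": "T3", "target_confidence": 7},
]
single_protein = select_confidence(rows, 9)
```

Rows with other scores, such as the 8 in the placeholder data, never pass any of the three filters.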
validate table and fill up with missing values
Table Validator
ChEMBL Activity Type Retrieval (Batch)
add Human Protein Atlas target expression links
add Human Protein Atlas expression links
left-join ChEMBL data on the right
Joiner
target_druggability_data
CSV Writer
target_functional_expression_data
CSV Writer
Target Type Selection
target_input_method
Target Input Method Selection
select molecule_chembl_id, target_chembl_id, target_variant, action_type
Column Filter
remove duplicates
Duplicate Row Filter
add "(general for target)" suffix
Column Name Replacer
left-join available structural data on the right (might be empty)
Joiner
download output files
Download Output Files
rename molecule_chembl_id as drug_chembl_id
Column Renamer
filter out col with links (used just for visualisation in interactive view pages)
Column Filter
uniprot_data_retrieval try-catch start
Try (Data Ports)
MOA-indication empty table switch start
Empty Table Switch
re-join chembl target cols
uniprot_target_retrieval_problem switch start
CASE Switch Start
filter only uniprot_accession
Column Filter
replace empty or missing target_variant with "WT"
Rule Engine
uniprot_data_retrieval_problem
Rule Engine Variable
left join
Joiner
Wrong Target ID column
