In particular, three different SMARTS operations are shown:
1. View filtered molecules from a single SMARTS query (top). This process takes a user-provided SMARTS query and filters the provided SMILES file based on that query. Substructures that match the query are highlighted.
2. View molecules filtered by SMARTS file (middle). This process performs filtering based on the given SMILES file and SMARTS file. The RDKit Molecule Substructure Filter node can be configured to specify how the filtering should be done.
3. Count the number of matches from SMARTS file (bottom). This process counts the substructure matches found in the given SMILES file, using the entries of the SMARTS file as queries. Each query becomes a column in the output table and each molecule (SMILES) a row, so every entry gives the number of matches between that query and that molecule.
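Outside KNIME, the filtering and counting operations listed above can be sketched with RDKit's Python API. This is only a minimal sketch under stated assumptions, not the workflow's own implementation: the helper names `filter_by_query` and `count_matches`, the three example molecules, and the three SMARTS patterns are all made up for illustration, and the highlighting of matched substructures is left out.

```python
from rdkit import Chem

# Hypothetical example data; the real workflow reads these from the SMILES and SMARTS files.
molecules = {"ethanol": "CCO", "acetic acid": "CC(=O)O", "phenol": "c1ccccc1O"}
queries = {"hydroxyl": "[OX2H]", "carbonyl": "[CX3]=[OX1]", "aromatic carbon": "c"}


def parse_smarts(smarts):
    """Parse a SMARTS string, raising ValueError if RDKit cannot read it."""
    pattern = Chem.MolFromSmarts(smarts)
    if pattern is None:
        raise ValueError(f"invalid SMARTS: {smarts!r}")
    return pattern


def filter_by_query(mols, smarts):
    """Operation 1: names of molecules that contain the single SMARTS query."""
    pattern = parse_smarts(smarts)
    hits = []
    for name, smiles in mols.items():
        mol = Chem.MolFromSmiles(smiles)
        # Unparseable SMILES are skipped rather than treated as non-matching.
        if mol is not None and mol.HasSubstructMatch(pattern):
            hits.append(name)
    return hits


def count_matches(mols, smarts_queries):
    """Operation 3: {molecule name: {query name: number of matches}}."""
    patterns = {name: parse_smarts(s) for name, s in smarts_queries.items()}
    table = {}
    for mol_name, smiles in mols.items():
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            continue
        # GetSubstructMatches returns one tuple of atom indices per unique match.
        table[mol_name] = {
            query_name: len(mol.GetSubstructMatches(pattern))
            for query_name, pattern in patterns.items()
        }
    return table


if __name__ == "__main__":
    print(filter_by_query(molecules, queries["carbonyl"]))
    for mol_name, row in count_matches(molecules, queries).items():
        print(mol_name, row)
```

Operation 2 (filtering by a whole SMARTS file) can be built from the same pieces, for example by keeping a molecule when any, or all, of its counts are non-zero. Which rule applies is what the RDKit Molecule Substructure Filter node lets the user configure.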
Load Data:
Example data is provided, although the user may change the path to the SMILES or SMARTS file (in the "Load data" section) to specify the file they'd like to use. The only requirements are that the SMILES table contains a column titled "SMILES" with SMILES entries and a column titled "Name" giving the molecule name or ID. Similarly, the SMARTS table should have a column titled "SMARTS" with SMARTS queries and a column titled "Name" giving the name or ID of the query. Note that you may need to adjust the Column Renamer and CSV Reader nodes to account for differences in formatting.
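To check a custom file against these column requirements before pointing the workflow at it, something like the following could be used. It is a rough sketch using only the Python standard library: the function `load_table`, the sample CSV text, and the source column names `smiles` and `mol_id` are hypothetical, and the `rename` mapping only mimics what the Column Renamer node does inside the workflow.

```python
import csv
import io

REQUIRED_SMILES_COLUMNS = {"SMILES", "Name"}
REQUIRED_SMARTS_COLUMNS = {"SMARTS", "Name"}


def load_table(csv_text, required, rename=None):
    """Read CSV text, optionally rename columns, and check the required ones exist.

    rename maps original column names to the names the workflow expects.
    Raises ValueError listing any required column that is still missing.
    """
    rename = rename or {}
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [{rename.get(key, key): value for key, value in row.items()} for row in reader]
    header = {rename.get(key, key) for key in (reader.fieldnames or [])}
    missing = sorted(set(required) - header)
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    return rows


if __name__ == "__main__":
    # Made-up sample whose headers differ from what the workflow expects.
    sample = "smiles,mol_id\nCCO,ethanol\nc1ccccc1O,phenol\n"
    rows = load_table(
        sample,
        REQUIRED_SMILES_COLUMNS,
        rename={"smiles": "SMILES", "mol_id": "Name"},
    )
    for row in rows:
        print(row["Name"], row["SMILES"])
```

Without the `rename` mapping the same sample raises a `ValueError`, which corresponds to the case where the Column Renamer node needs adjusting.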
Additional Resources:
If you'd like to use a command-line interface with similar (and more robust and faster) functionality, please visit the RDKit Tools repository linked in the external resources.
Acknowledgement:
Both the SMILES and SMARTS data used in this example workflow are taken from public sources. The SMILES file is based on the Tox21 project and is taken from the GitHub repository maintained by Guillaume Lambard (see external resources). The SMARTS file is based on the PAINS filters (Baell and Holloway) and is taken from the link provided in the external resources.
URL: RDKit Tools GitHub https://github.com/jeremyjyang/rdkit-tools?tab=readme-ov-file#smarts
URL: Tox21 (About) https://tox21.gov/tox21-library/
URL: Tox21 (File Used) https://github.com/GLambard/Molecules_Dataset_Collection/blob/master/latest/tox21.csv
URL: PAINS Paper (Baell and Holloway) https://pubs.acs.org/doi/10.1021/jm901137j
URL: PAINS (File Used) https://optibrium.com/downloads/PAINS_S8.txt