SMARTS Matching

Overview:

This workflow demonstrates three different SMARTS operations:

1. View filtered molecules from single SMARTS query (top). This branch takes a user-provided SMARTS query and keeps only the molecules in the SMILES file that contain a matching substructure. The matching substructures are highlighted.

2. View molecules filtered by SMARTS file (middle). This branch filters the molecules in the SMILES file against the queries in the SMARTS file. The RDKit Molecule Substructure Filter node can be configured to control how the filtering is done.

3. Count # of matches from SMARTS file (bottom). This branch counts the substructure matches in each molecule of the SMILES file, using the queries in the SMARTS file. Each query becomes a column in the output table, and each cell gives the number of matches of that query (column) in that molecule (row).
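
The three branches can be sketched in plain Python with RDKit. This is a minimal sketch, not the workflow's implementation: molecules and queries are assumed to be lists of (name, string) pairs, the helper names are hypothetical, and the `min_matching` threshold only approximates one way the RDKit Molecule Substructure Filter node can be configured. Counts use RDKit's default unique matching (`uniquify=True`), which may differ from how the workflow counts matches.

```python
from rdkit import Chem


def _compile(smarts):
    """Parse a SMARTS string, failing loudly on an invalid query."""
    query = Chem.MolFromSmarts(smarts)
    if query is None:
        raise ValueError(f"invalid SMARTS: {smarts!r}")
    return query


def _parse(molecules):
    """Yield (name, smiles, mol) for each parsable SMILES; unparsable entries are skipped."""
    for name, smiles in molecules:
        mol = Chem.MolFromSmiles(smiles)
        if mol is not None:
            yield name, smiles, mol


def filter_by_smarts(molecules, smarts):
    """Branch 1 (sketch): keep molecules matching a single SMARTS query.

    The atom-index tuples returned for each hit are the atoms a
    depiction step would highlight."""
    query = _compile(smarts)
    hits = []
    for name, smiles, mol in _parse(molecules):
        matches = mol.GetSubstructMatches(query)
        if matches:
            hits.append((name, smiles, matches))
    return hits


def filter_by_smarts_file(molecules, queries, min_matching=1):
    """Branch 2 (sketch): keep molecules matching at least `min_matching` queries.

    `min_matching` is a hypothetical stand-in for the filter node's settings."""
    compiled = [(qname, _compile(s)) for qname, s in queries]
    kept = []
    for name, smiles, mol in _parse(molecules):
        n_hit = sum(mol.HasSubstructMatch(q) for _, q in compiled)
        if n_hit >= min_matching:
            kept.append((name, smiles))
    return kept


def count_matches(molecules, queries):
    """Branch 3 (sketch): one row per molecule, one column per query."""
    compiled = [(qname, _compile(s)) for qname, s in queries]
    table = {}
    for name, _smiles, mol in _parse(molecules):
        # GetSubstructMatches returns unique matches by default (uniquify=True).
        table[name] = {qname: len(mol.GetSubstructMatches(q)) for qname, q in compiled}
    return table


if __name__ == "__main__":
    mols = [("benzene", "c1ccccc1"), ("ethanol", "CCO"), ("phenol", "Oc1ccccc1")]
    queries = [("aromatic_ring", "c1ccccc1"), ("hydroxyl", "[OX2H]")]
    print(filter_by_smarts(mols, "[OX2H]"))
    print(filter_by_smarts_file(mols, queries, min_matching=2))
    print(count_matches(mols, queries))
```

Compiling each SMARTS query once, outside the molecule loop, avoids re-parsing the same pattern for every row, which matters when the SMARTS file holds hundreds of queries such as the PAINS set.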


Load Data:
Example data is provided, but you can change the path to the SMILES or SMARTS file (in the "Load data" section) to use your own files. The only requirements are that the SMILES table contains a column titled "SMILES" with SMILES strings and a column "Name" giving the molecule name or ID. Similarly, the SMARTS table must have a column titled "SMARTS" with SMARTS queries and a column "Name" giving the name or ID of each query. You may need to adjust the Column Renamer and CSV Reader nodes to account for differences in file formatting.
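
The column requirements above can be checked before any matching is done. The sketch below is an illustration using Python's standard `csv` module, not part of the workflow: the hypothetical `rename` mapping plays the role of the Column Renamer node, and the source column names in the usage example (`mol_id`, `smiles`) are assumptions, not taken from the example files.

```python
import csv
import io


def load_table(text, required, rename=None, delimiter=","):
    """Parse delimited text and return one tuple per row holding the `required` columns, in order.

    `rename` (hypothetical) maps source column names to the required ones,
    mirroring what the Column Renamer node does in the workflow."""
    rename = rename or {}
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    header = [rename.get(col, col) for col in (reader.fieldnames or [])]
    missing = [col for col in required if col not in header]
    if missing:
        raise ValueError(f"missing required column(s): {missing}; found {header}")
    rows = []
    for raw in reader:
        row = {rename.get(k, k): v for k, v in raw.items()}
        rows.append(tuple(row[col] for col in required))
    return rows


if __name__ == "__main__":
    text = "mol_id,smiles\nm1,CCO\nm2,c1ccccc1\n"
    print(load_table(text, ("Name", "SMILES"), rename={"mol_id": "Name", "smiles": "SMILES"}))
```

With `required=("Name", "SMILES")` or `("Name", "SMARTS")`, the result is a list of (name, string) pairs; for a tab-separated file, pass `delimiter="\t"`.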

Additional Resources:
For a command-line interface with similar (and more robust and faster) functionality, see the RDKit Tools repository linked in the external resources.

Acknowledgement:
Both the SMILES and SMARTS data used in this example workflow are taken from public sources. The SMILES file is based on the Tox21 project and is taken from the GitHub repository created by Guillaume Lambard (see external resources). The SMARTS file is based on the PAINS filters (Baell and Holloway) and is taken from the link provided in the external resources.

URL: RDKit Tools GitHub https://github.com/jeremyjyang/rdkit-tools?tab=readme-ov-file#smarts
URL: Tox21 (About) https://tox21.gov/tox21-library/
URL: Tox21 (File Used) https://github.com/GLambard/Molecules_Dataset_Collection/blob/master/latest/tox21.csv
URL: PAINS Paper (Baell and Holloway) https://pubs.acs.org/doi/10.1021/jm901137j
URL: PAINS (File Used) https://optibrium.com/downloads/PAINS_S8.txt
