Extracts RDKit molecules from PNG images embedded in MS Word or MS PowerPoint files.
The images must have been generated with RDKit version 2020_09_1 or later. Since that version, RDKit adds the molecule information to the image as metadata, so the molecule can be extracted again.
The output contains an RDKit Molecule column and the index of the image each molecule was extracted from. (How Office products assign the image index has not been tested; I would assume order of insertion rather than order of pages/slides.)
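As an illustration of the idea only, not the component's actual implementation, here is a minimal Python sketch using just the standard library. It relies on two facts: .docx and .pptx files are ZIP archives that keep embedded images in a media folder, and PNG files can carry text metadata in tEXt and zTXt chunks. The function names png_text_chunks and embedded_png_metadata are hypothetical, and the index is simply the position in the archive listing, which is an assumption and may not match what the component reports.

```python
import io
import struct
import zipfile
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_text_chunks(png_bytes):
    """Return {keyword: raw bytes} for the tEXt and zTXt chunks of a PNG."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png_bytes):
        # Each chunk: 4-byte length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        pos += 12 + length
        if ctype == b"tEXt":
            key, _, value = data.partition(b"\x00")
            chunks[key.decode("latin-1")] = value
        elif ctype == b"zTXt":
            key, _, rest = data.partition(b"\x00")
            # rest[0] is the compression method byte (0 = zlib/deflate).
            chunks[key.decode("latin-1")] = zlib.decompress(rest[1:])
        elif ctype == b"IEND":
            break
    return chunks


def embedded_png_metadata(office_file):
    """Yield (index, archive_name, text_chunks) for each PNG in a .docx/.pptx.

    Assumption: the index is the position in the ZIP listing, which is
    not necessarily the order of pages or slides.
    """
    with zipfile.ZipFile(office_file) as archive:
        names = [n for n in archive.namelist()
                 if "/media/" in n and n.lower().endswith(".png")]
        for index, name in enumerate(names):
            yield index, name, png_text_chunks(archive.read(name))
```

With RDKit installed, the image bytes themselves could then be handed to Chem.MolFromPNGString to rebuild the molecule; images without RDKit metadata would simply yield no molecule-related text chunks here.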
If the image was generated with
https://github.com/kienerj/molecule-slide-generator
then, in addition to the molecules, all of their properties are extracted as well into accordingly named table columns.
To use this component in KNIME, download it from the below URL and open it in KNIME:
Download Component