Extract RDKit Molecules From Office

Extracts RDKit molecules from PNG images embedded in MS Word or MS PowerPoint files.

The images must have been generated with RDKit version 2020_09_1 or later. Starting with that version, RDKit writes the molecule information into the image as metadata, so the molecule can be recovered from the image later.
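To check whether a PNG carries any text metadata at all, its chunks can be inspected directly. The following is a minimal sketch using only the Python standard library; the function name png_text_keywords is hypothetical, and the sketch makes no assumption about which keywords RDKit actually writes. It only lists the keywords of whatever text chunks are present. To actually rebuild the molecule, RDKit's own Chem.MolFromPNGString would be the natural choice.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_text_keywords(data):
    """Return the keywords of all text chunks (tEXt, zTXt, iTXt) in PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    keywords = []
    pos = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte length, 4-byte type, <length> data bytes, 4-byte CRC.
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type in (b"tEXt", b"zTXt", b"iTXt"):
            # The keyword is the part of the chunk body before the first NUL byte.
            keywords.append(body.split(b"\x00", 1)[0].decode("latin-1"))
        if chunk_type == b"IEND":
            break
        pos += 12 + length
    return keywords
```

An image without any text chunks yields an empty list, which is a quick hint that it was not produced by a metadata-writing RDKit version (or that the metadata was stripped when the image was re-saved).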

The output contains an RDKit Molecule column and the index of the image each molecule was extracted from. (How Office products assign the image index has not been tested; it presumably follows the order of insertion rather than the order of pages or slides.)
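Outside of KNIME, the same idea can be sketched in a few lines of Python. This is not the node's implementation, only an illustration under stated assumptions: .docx and .pptx files are ZIP archives that typically store embedded images under word/media/ or ppt/media/, and the function names iter_office_pngs and extract_molecules are hypothetical. The index here is simply the position in the sorted list of archive names, which need not match the index the node reports.

```python
import zipfile


def iter_office_pngs(path):
    """Yield (index, archive_name, png_bytes) for PNGs embedded in a .docx/.pptx."""
    with zipfile.ZipFile(path) as archive:
        names = sorted(
            name for name in archive.namelist()
            if "/media/" in name and name.lower().endswith(".png")
        )
        for index, name in enumerate(names):
            yield index, name, archive.read(name)


def extract_molecules(path):
    """Return (index, mol) pairs for embedded PNGs that carry RDKit metadata."""
    # Imported lazily so the ZIP scan above works without RDKit installed.
    from rdkit import Chem  # requires RDKit 2020_09_1 or later

    results = []
    for index, _name, png in iter_office_pngs(path):
        try:
            mol = Chem.MolFromPNGString(png)
        except Exception:
            # Images without usable molecule metadata are skipped.
            mol = None
        if mol is not None:
            results.append((index, mol))
    return results
```

Calling extract_molecules("slides.pptx") (a placeholder file name) would then give one entry per image that RDKit can read a molecule from, mirroring the two output columns described above.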

If the image was generated with

https://github.com/kienerj/molecule-slide-generator

then, in addition to the molecules, all of their properties are extracted as well, into correspondingly named table columns.

Options

Select Presentation:
Select the PowerPoint file to extract from.

Input Ports

This node has no input ports

Output Ports

RDKit Molecules
