This is the fourth workflow in the PubChem Big Data story.
We prepare three datasets for the machine learning experiments. Set 1: Compounds, their chemical structures, and their bioactivity values as reported in PubChem. Set 2: Compounds, their chemical structures, and their bioactivity values, with missing values replaced by 0 (i.e., compounds were assumed to have shown no activity). Set 3: Unique compounds (duplicates removed).
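As a rough illustration of how the three sets relate, here is a minimal pure-Python sketch. The workflow itself does this with KNIME nodes on Spark, so this is not its implementation. The record layout (`cid`, `smiles`, `activity`), the sample values, and the helper names `fill_missing` and `unique_compounds` are hypothetical, and deduplicating by compound ID is an assumption about how duplicates are identified.

```python
# Hypothetical Set 1: records as reported, with None marking a missing bioactivity value.
set1 = [
    {"cid": 1, "smiles": "CCO", "activity": 1},
    {"cid": 2, "smiles": "CCN", "activity": None},
    {"cid": 1, "smiles": "CCO", "activity": 1},  # duplicate compound
]


def fill_missing(records):
    """Return copies of the records with missing activity replaced by 0 (Set 2)."""
    return [
        {**r, "activity": 0 if r["activity"] is None else r["activity"]}
        for r in records
    ]


def unique_compounds(records):
    """Keep the first record seen for each compound ID (Set 3, assumed key: cid)."""
    seen = set()
    unique = []
    for r in records:
        if r["cid"] not in seen:
            seen.add(r["cid"])
            unique.append(r)
    return unique


set2 = fill_missing(set1)
set3 = unique_compounds(set1)
```

Both helpers return new lists, so Set 1 stays exactly as reported and can be compared against the derived sets later.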
Additionally, in step 4, we collect the counts of active and inactive compounds per AID.
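The per-AID counting can be sketched in plain Python as follows. This is only an illustration under assumptions: the row layout `(aid, cid, activity)`, the encoding 1 = active and 0 = inactive, the sample rows, and the function name `count_by_aid` are all hypothetical and not taken from the workflow.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (assay ID, compound ID, activity), where 1 = active and 0 = inactive.
rows = [
    (100, 1, 1),
    (100, 2, 0),
    (100, 3, 0),
    (200, 1, 1),
]


def count_by_aid(rows):
    """Count active and inactive compounds for each assay ID (AID)."""
    counts = defaultdict(Counter)
    for aid, _cid, activity in rows:
        label = "active" if activity == 1 else "inactive"
        counts[aid][label] += 1
    # Report both labels for every AID, including zero counts.
    return {
        aid: {"active": c["active"], "inactive": c["inactive"]}
        for aid, c in counts.items()
    }


counts = count_by_aid(rows)
```

Reporting zero counts explicitly makes it easier to spot assays with no actives or no inactives, which matters when judging class balance for machine learning.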
The AWS Authentication component, the Paths to Livy and S3 component, and the Create Spark Context (Livy) node require configuration.
To use this workflow, download it and open it in KNIME.