
04_Generate_Features

This is the fourth workflow in the PubChem Big Data story.

We prepare three datasets for the machine learning experiments:

Set 1: Compounds, their chemical structures, and their bioactivity values as reported in PubChem.
Set 2: Compounds, their chemical structures, and their bioactivity values, where missing values were replaced with 0 (i.e., the compounds were assumed to have shown no activity).
Set 3: Unique compounds (duplicates removed).
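As a rough illustration of how Sets 2 and 3 could be derived in Spark, here is a minimal PySpark sketch. The input path and the column names (activity, canonical_smiles) are assumptions for illustration, not the workflow's actual identifiers:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Set 1: bioactivity values as reported in PubChem (nulls kept).
set1 = spark.read.orc("s3://my-bucket/pubchem/bioactivity.orc")  # hypothetical path

# Set 2: missing bioactivity replaced with 0 (compound assumed inactive).
set2 = set1.fillna(0, subset=["activity"])  # "activity" is a hypothetical column name

# Set 3: unique compounds, deduplicated on the canonical SMILES.
set3 = set1.dropDuplicates(["canonical_smiles"])  # hypothetical column name
```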

Additionally, in step 4, we collect the counts of active and inactive compounds per AID.
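A minimal PySpark sketch of that aggregation, assuming a hypothetical outcome column holding Active/Inactive labels and hypothetical S3 paths:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
bio = spark.read.orc("s3://my-bucket/pubchem/bioactivity.orc")  # hypothetical path

# Count active vs. inactive compounds per assay (AID).
# F.when(...) without otherwise() yields null for non-matching rows,
# and F.count() only counts non-null values.
counts = bio.groupBy("AID").agg(
    F.count(F.when(F.col("outcome") == "Active", 1)).alias("n_active"),
    F.count(F.when(F.col("outcome") == "Inactive", 1)).alias("n_inactive"),
)
counts.write.mode("overwrite").csv("s3://my-bucket/pubchem/aid_counts")  # hypothetical path
```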

The AWS Authentication component, the Paths to Livy and S3 component, and the Create Spark Context (Livy) node require configuration.

This workflow shows how to preprocess chemical structures using the KNIME Workflow Executor for Apache Spark. For more information see the workflow metadata (View -> Description).

Required extensions: KNIME Extension for Apache Spark, KNIME Workflow Executor for Apache Spark (Preview), KNIME Extension for Big Data File Formats, KNIME Base Chemistry Types & Nodes, RDKit Nodes Feature.

Setup: Connect to AWS and create the big data environment.
Step 1. Standardize the chemical structures.
Step 2. Calculate chemical fingerprints (Morgan2) and collect statistics.
Step 3. Collect SMILES of the duplicates, removing duplicates via canonical SMILES.
Step 4. Collect the counts of active and inactive CIDs per AID.

Key nodes: AWS Authentication, Amazon S3 Connector, Create Spark Context (Livy), ORC to Spark, Compute Features, Generate Canonical SMILES, Standardize the bioactivity values, Spark GroupBy, Spark SQL Query, Spark Column Filter, Spark to ORC, CSV Writer, Destroy Spark Context.
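The per-structure work of Steps 1 and 2 (canonical SMILES plus a radius-2 Morgan fingerprint) can be sketched with plain RDKit; the function name and the fingerprint length below are assumptions for illustration:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

def featurize(smiles: str):
    """Standardize a structure to canonical SMILES and compute a Morgan2 fingerprint."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None  # unparsable structure
    canonical = Chem.MolToSmiles(mol, canonical=True)
    # Radius-2 Morgan (ECFP4-like) fingerprint; 1024 bits is an assumed size.
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=1024)
    return canonical, fp.ToBitString()
```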
