
01_Fetch_BioAssays

This is the first workflow in the PubChem Big Data story.

In the top part of the workflow, we download the assay data from the PubChem database using its API and upload it to a specified S3 bucket on AWS, one file per assay/experiment (AID).
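As a rough illustration of that download step, here is a minimal Python sketch that pulls one assay's data table per AID through PubChem's PUG REST interface. The endpoint pattern follows the public PUG REST conventions, and the output directory is a placeholder, not a value taken from the workflow.

import os
import requests

# PubChem's PUG REST API serves an assay's data table as CSV, addressed by AID.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/aid/{aid}/CSV"

def fetch_assay(aid, out_dir="/tmp/pubchem"):
    """Download the data table of one assay (AID) into its own CSV file."""
    os.makedirs(out_dir, exist_ok=True)
    resp = requests.get(PUG_REST.format(aid=aid), timeout=120)
    resp.raise_for_status()
    path = os.path.join(out_dir, "assay_{}.csv".format(aid))
    with open(path, "wb") as f:
        f.write(resp.content)
    return path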

In the bottom part, we clean up the assay data using the KNIME Extension for Apache Spark and store the cleaned-up files on AWS.
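The cleanup itself is done with KNIME Spark nodes running on Livy. In plain PySpark, the operations named in the workflow annotations (deduplicate on SID/CID/Outcome, remove missing values, add an AID column, write ORC) amount to roughly the sketch below; the column names come from the canvas annotations and may differ from the actual files.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clean_bioassays").getOrCreate()

def clean_assay(csv_path, aid, orc_path):
    """Apply the cleanup steps from the workflow annotations to one fetched file."""
    df = spark.read.csv(csv_path, header=True, inferSchema=True)
    df = (df
          .dropDuplicates(["SID", "CID", "Outcome"])  # find duplicated requests
          .dropna()                                   # rm missing
          .withColumn("AID", F.lit(aid)))             # add AID column to each table
    df.write.mode("overwrite").orc(orc_path)          # store in the ORC format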

The AWS Authentication component, the Paths to Livy and S3 component, and the Create Spark Context (Livy) node require configuration.
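For orientation, the Create Spark Context (Livy) node ultimately opens a session against an Apache Livy endpoint, which is what the REST call below does. The Livy URL here is a placeholder you would replace with the value configured in the Paths to Livy and S3 component.

import requests

LIVY_URL = "http://livy.example.com:8998"  # placeholder; set via Paths to Livy and S3

# Open a Spark session on the cluster through Livy's REST API.
resp = requests.post(LIVY_URL + "/sessions", json={"kind": "spark"})
resp.raise_for_status()
print("Livy session id:", resp.json()["id"])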

Workflow annotations:

This workflow demonstrates how to fetch bioactivity data from the PubChem database via its REST API, preprocess the data using Apache Spark, and transfer it to the cloud on Amazon S3. For more information see the workflow metadata (View -> Description). Required extensions: KNIME Extension for Apache Spark, and KNIME Workflow Executor for Apache Spark (Preview).

Steps:
1. Connect to AWS and create the Big Data Environment.
2. GET the IDs of the experiments with type "Screening" (1562 assays, with 1 to 642275 molecules each; 480 assays have > 100k molecules).
3. Prioritize the experiments to work with.
4. Download the data for these experiments using the PubChem API into a temporary folder, with a delay between requests.
5. Upload the temporary files to AWS S3.
6. Clean up each fetched file using the KNIME Workflow Executor for Apache Spark: find duplicated SID & CID & Outcome requests, remove missing values, add an AID column to each table, and store the results in the ORC format.
7. Read the remote files and collect statistics.
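The upload steps are handled in KNIME by the Amazon S3 Connector and Transfer Files nodes. In script form, the equivalent boto3 sketch looks like this; bucket name and key prefix are placeholders.

import os
import boto3

s3 = boto3.client("s3")  # credentials come from the usual AWS credential chain

def upload_folder(local_dir, bucket, prefix):
    """Copy every file in a local folder to s3://bucket/prefix/."""
    for name in sorted(os.listdir(local_dir)):
        s3.upload_file(os.path.join(local_dir, name), bucket, prefix + "/" + name)

upload_folder("/tmp/pubchem", "my-bioassay-bucket", "pubchem/raw")  # placeholders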

Nodes

AWS Authentication
Row Filter
GroupBy
Table Row To Variable Loop Start
Variable Loop End
String Manipulation
CSV Writer
List Files/Folders
Timer Info
Path to String (Variable)
Merge Variables
Transfer Files
Amazon S3 Connector
Create Spark Context (Livy)
Destroy Spark Context
CSV to Spark
Spark to CSV
Spark Column Filter
Spark Row Filter
Spark DataFrame Java Snippet
Get IDs of Experiments
Extract Counts for Experiments
Fetch Experiment Details from PubChem
Generate Path
Paths to Livy and S3

Extensions

KNIME Extension for Apache Spark
KNIME Workflow Executor for Apache Spark (Preview)
