03_Fetch_SMILES

This is the third workflow in the PubChem Big Data story.

First, we obtain the SMILES strings for the required CIDs using PubChem REST services. Then, we use the KNIME Extension for Apache Spark to append the SMILES strings to the CID vs. AID matrix by joining on CID.
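Outside KNIME, the fetching step can be approximated with a short Python sketch. It is not the workflow's actual implementation: it assumes the PUG REST property endpoint with CSV output and the CanonicalSMILES property name, and the helper names (chunks, build_post_body, parse_smiles_csv, fetch_smiles) are hypothetical. The chunk size of 5000 mirrors the "5000 at a time" annotation in the workflow.

```python
import csv
import io
import urllib.parse
import urllib.request

# Assumed endpoint: PUG REST property request with CSV output.
# The property name (CanonicalSMILES) is an assumption, not taken from the workflow.
PUG_URL = (
    "https://pubchem.ncbi.nlm.nih.gov/rest/pug/"
    "compound/cid/property/CanonicalSMILES/CSV"
)


def chunks(items, size=5000):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_post_body(cids):
    """Encode the CIDs as a form field named `cid` for the POST body."""
    joined = ",".join(str(c) for c in cids)
    return urllib.parse.urlencode({"cid": joined}).encode("ascii")


def parse_smiles_csv(text):
    """Map CID -> SMILES from a two-column CSV response, skipping the header row."""
    rows = csv.reader(io.StringIO(text))
    next(rows, None)  # header row; its exact column names are not relied on
    return {int(row[0]): row[1] for row in rows if len(row) >= 2}


def fetch_smiles(cids, size=5000):
    """Fetch SMILES for all CIDs, one POST request per chunk (needs network access)."""
    result = {}
    for chunk in chunks(list(cids), size):
        request = urllib.request.Request(
            PUG_URL,
            data=build_post_body(chunk),
            headers={"Content-Type": "application/x-www-form-urlencoded"},
        )
        with urllib.request.urlopen(request, timeout=60) as response:
            result.update(parse_smiles_csv(response.read().decode("utf-8")))
    return result
```

Calling fetch_smiles([2244, 962]) would send a single request; for roughly 1.5 million CIDs it would send about 300 requests, so a real run should also respect PubChem's usage limits.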

The AWS Authentication component, the Paths to Livy and S3 component, and the Create Spark Context (Livy) node require configuration.

Fetch SMILES for CIDs using PubChem REST services

This workflow demonstrates how to fetch SMILES using PubChem REST services and append them to the pivoted PubChem data using Apache Spark. For more information, see the workflow metadata (View -> Description).

Required extensions: KNIME Extension for Apache Spark, KNIME Workflow Executor for Apache Spark (Preview), and KNIME Extension for Big Data File Formats.

Workflow sections: Connect to AWS and Create Big Data Environment; Add SMILES to the CID_AID matrix; Collect statistics.

Nodes in the workflow: AWS Authentication, Amazon S3 Connector, Paths to Livy and S3, Create Spark Context (Livy), Generate Path, ORC Reader, ORC to Spark, Chunk Loop Start, Fetch SMILES via POST request, Loop End, Table to Spark, Spark Repartition, Spark Joiner, Spark to ORC, ORC Writer, CSV Writer, Timer Info, Destroy Spark Context.

Node annotations: CIDs, ca. 1.5 mil; 5000 at a time; Collect results; join on CID; to 3; Node 516; CID vs AID, nulls filtered; CIDs vs AIDs, nulls filtered, and SMILES.