
02_Fetch_And_Transform_PubChem_Data

Fetch and Transform PubChem Data
This workflow prepares a data set using the Local Big Data Environment for Data Chefs Battle: Chemistry vs Biology. In the top part, the results of biological experiments are collected from the PubChem database using its API. In the middle part, the data are preprocessed using the Local Big Data Environment. In the bottom part, the data are backed up.

Workflow annotations:

Get IDs of the relevant experiments. Here we collect experiments with type "Screening".
Extract count of tested molecules for each experiment. Here we request counts for the first 150 experiments.
OPTIONAL: backup data.
Download results of the experiments from PubChem. Here we collect experiments in which 200k - 350k compounds were tested.
Process data in the Local Big Data Environment. Step I: Remove missing values and pivot. Step II: Clean up pivoting results.
Save data to the Local Big Data Environment.

Node labels: Get IDs of Experiments, Extract Counts for Experiments, Row Filter, Column Filter, Column Rename, Fetch Experiment Details from PubChem, Table Writer, Create Local Big Data Environment, ORC to Spark, Preprocess in Spark Step I, Preprocess in Spark Step II, Spark Column Rename (Regex), Spark Column Filter, Spark to ORC, Spark to Table, ORC Writer (used twice).

Output labels: Experiments with counts, 200k - 350k results, Pivoted data.
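The workflow performs these steps with KNIME and Spark nodes. As a rough illustration only, the selection by compound count and the "remove missing values and pivot" step can be sketched in plain Python. The function names, the `AID_` column prefix, the tuple layout of the rows, and the inclusive bounds are all assumptions made for this sketch; they are not taken from the workflow.

```python
from collections import defaultdict


def select_experiments(counts, low=200_000, high=350_000):
    """Return the experiment IDs whose tested-compound count lies in [low, high].

    `counts` maps an experiment ID to its number of tested compounds.
    Inclusive bounds are an assumption of this sketch.
    """
    return [aid for aid, n in counts.items() if low <= n <= high]


def pivot_results(rows):
    """Pivot long-format rows into one record per compound.

    `rows` is an iterable of (compound_id, experiment_id, outcome) tuples
    (a hypothetical layout). Rows containing a missing value (None) are
    dropped before pivoting, so each compound ends up with one column per
    experiment in which it has a result.
    """
    table = defaultdict(dict)
    for compound_id, experiment_id, outcome in rows:
        if compound_id is None or experiment_id is None or outcome is None:
            continue  # Step I: remove missing values
        table[compound_id][f"AID_{experiment_id}"] = outcome
    return dict(table)


# Toy data, invented for illustration
counts = {1001: 150_000, 1002: 220_000, 1003: 350_000, 1004: 400_000}
selected = select_experiments(counts)

rows = [
    (1, 1002, "active"),
    (1, 1003, "inactive"),
    (2, 1002, None),       # missing outcome, dropped
    (2, 1003, "active"),
]
pivoted = pivot_results(rows)
```

With the toy data above, `selected` contains experiments 1002 and 1003, and `pivoted` holds one record per compound with a column per experiment. In the workflow itself the equivalent work is done on Spark data frames, which is better suited to result sets of this size.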
