
07_Will_They_Blend_BigQuery_Databricks

Google BigQuery meets Databricks

This workflow connects to the Austin Bikeshare dataset, hosted among the Google BigQuery public datasets, and to a Databricks instance hosting the Austin Weather dataset. It first performs custom queries on the two platforms; the data are then imported into the workflow and merged. The blended data are visualized and uploaded both to the personal Google BigQuery space and to the Databricks cluster.

You need a P12 authentication key provided by the Google Cloud Platform to execute the upper part of the workflow, and a Databricks cluster to run the bottom part.
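The custom query on the BigQuery side can be sketched as a day-level aggregation over the public bikeshare table. The table name below is the real public dataset identifier; the column names are taken from the public schema at the time of writing, so verify them against the current schema before running the query. This is only a sketch of the SQL a custom-query node might issue, not the workflow's exact statement:

```python
# Hypothetical day-level aggregation for the Austin Bikeshare custom query.
# Table: bigquery-public-data.austin_bikeshare.bikeshare_trips (public dataset).
# Columns start_time and duration_minutes are assumed from the public schema.
QUERY = """
SELECT DATE(start_time) AS trip_date,
       COUNT(*) AS n_trips,
       AVG(duration_minutes) AS avg_duration_minutes
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`
GROUP BY trip_date
ORDER BY trip_date
"""
```

Grouping to one row per day on the BigQuery side is what later makes the join against the day-level Austin Weather data a simple one-to-one match.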





This workflow blends the Austin Bike Share Trips dataset, publicly available on Google BigQuery, with the Austin Weather dataset hosted on Databricks. After manipulation and visualization, the blended data are uploaded both to the personal BigQuery schema and to the Databricks cluster. Note: in the Databricks Community Edition (free), the cluster is terminated after 120 minutes of inactivity.

Workflow annotations:
- Google Authentication: provide your service account ID and the P12 key, then connect to the BigQuery schema.
- Austin Bikeshare custom query: select table and fields, group by day, query the DB, filter out some rows, and import the data into KNIME.
- Access Austin weather dataset via Spark: provide the cluster ID, workspace, and credentials; access the data via Spark; manipulate the data with Spark to extract the date field.
- Join data.
- Load blended data into Google BigQuery: create the table on BigQuery and upload the data.
- Load blended data into Databricks: create the table on Databricks and upload the data.

Nodes used: DB Table Selector, DB GroupBy, DB Reader, DB Row Filter, Data Manipulation, Spark to Table, PySpark Script (1 to 1), Visualization, CSV to Spark, DB Loader (x2), Create Databricks Environment, Joiner, DB Table Creator (x2), Google Authentication (API Key), Google BigQuery Connector.
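Once both sides have been reduced to one row per day, the Joiner node's blend step amounts to an inner join on the date key. A minimal pandas sketch of that step (the workflow uses KNIME's Joiner node, and the column names here are assumptions for illustration):

```python
import pandas as pd

# Day-level bikeshare aggregate, as produced by the BigQuery custom query.
trips = pd.DataFrame({
    "date": ["2017-07-01", "2017-07-02"],
    "n_trips": [120, 95],
})

# Day-level weather data, as produced by the Spark date-extraction step.
weather = pd.DataFrame({
    "date": ["2017-07-01", "2017-07-02"],
    "temp_high_f": [96, 99],
})

# Inner join on the shared date key - one row per day on each side,
# so the result pairs each day's trip count with that day's weather.
blended = trips.merge(weather, on="date", how="inner")
```

The same blended table is what the workflow then visualizes and uploads to both BigQuery and Databricks.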
