Icon

_​db_​sql_​bigdata_​hive_​spark_​meta_​collection

A meta collection of KNIME and databases (SQL, Big Data/Hive/Impala and Spark/PySpark)

A meta collection of KNIME and databases (SQL, Big Data/Hive/Impala and Spark/PySpark)

KNIME as a powerful data anaylytics platform is able to connecct to all sorts of data bases and supports a lot of functions mainly by SQL and SQL-like languages. KNIME also supports big data platforms and related frameworks - like Spark and PySpark. There are myriads of database related topics and books and online ressources but ...

What I try to do is present a collection of useful links and examples when you want to set out to use databases and KNIME abd hopefully by providing some sampel workflows that one could try out at home ease some quirks especially when it comes to dealing with big data systems and Spark (including PySpark). The 'normal' SQL examples will be with SQLite and H2 so you can easily try this at home (but they will work for other databases like Postgres or MySQL as well). The big data examples are based on KNIME's local big data environment - which is very useful if you happen not to have such a cluster at home.

A good starting point in general is the official "KNIME Database Extension Guide"
https://docs.knime.com/latest/db_extension_guide/index.html?u=mlauber71

=> you can dive right into the examples and will see quick results, but sometimes it helps to read about the concept.

-------------------------------
A - About SQL in general - a few links to refersh and learn

Structured Query Language Basics
https://www.nuwavesolutions.com/structured-query-language-basics/

SQL CASE WHEN - and other code in the sqltutorial
https://www.sqltutorial.org/sql-case/

KNIME courses Lesson 4. Bringing Things Together - Join and Concatenation
https://www.knime.com/self-paced-course/l1-dw-knime-analytics-platform-for-data-wranglers-basics/lesson4#join-concatenation?u=mlauber71

KNIME Nodeguide for Joining and Concatenating
https://www.knime.com/nodeguide/etl-data-manipulation/joining-and-concatenating?u=mlauber71

-------------------------------
B - KNIME and SQL databases - going from basic to advanced

Database - Simple IO (using standalone SQL-databse SQLite)
https://hub.knime.com/knime/spaces/Examples/latest/01_Data_Access/02_Databases/01_Database_Simple_IO_Example?u=mlauber71

KNIME and SQLite - simple example (with the New DB nodes)
https://hub.knime.com/mlauber71/spaces/Public/latest/sqlite_knime_40/t_001_sqlite_knime_40_db_nodes?u=mlauber71

H2 - use a SQL standalone Database from Scratch or from upload
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_h2_create_table_from_scratch?u=mlauber71

H2 - Example of H2 database - handling of database structure and use of Primary Keys - insert only new lines by ID
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_h2_primary_key?u=mlauber71

Databases - Advanced Usage (using SQLite)
https://hub.knime.com/knime/spaces/Examples/latest/01_Data_Access/02_Databases/02_Database_Advanced_Example?u=mlauber71

use H2 to produce a Position / Rank number within a group variable (window functions with new H2 JDBC drivers)
https://forum.knime.com/t/sqlite-and-window-functions/31608/4?u=mlauber71
https://hub.knime.com/mlauber71/spaces/Public/latest/forum/kn_forum_31608_h2_group_rank_window_function?u=mlauber71

-------------------------------
C - KNIME and Big Data

KNIME can also handle Big Data systems and databases. So you can go from handling your local or small SQL-DB right up to big enterprise systems with (eg.) Cloudera.

KNIME Big Data Extensions User Guide
https://docs.knime.com/latest/bigdata_extensions_user_guide/index.html?u=mlauber71

School of Hive – everything you need to know to work with Hive tables on a Big Data system
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_hive_school_of?u=mlauber71

=> if you want to understand the concepts of Big Data tables and partitions by doing it in code steps from scratch

An overview of KNIME based functions to access big data systems (with KNIME's local big data environment)
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_bigdata_nodes/m_020_db_access_local_bigdata_tables?u=mlauber71

=> see for yourself how the big data nodes work on your local computer

KNIME and Hive - load multiple CSV files at once via external table
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_csv_loader/m_001_import_hive_csv?u=mlauber71


-- C+ Bonus Track (some more advanced big data functions with Hive like external tables demonstrated and partitions half-open code)

work with Hive and external tables in CSV and Parquet

KNIME and Hive - load multiple CSV files at once via external table
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_csv_loader/m_001_import_hive_csv?u=mlauber71

KNIME and Hive - load multiple Parquet files at once via external table
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_parquet_loader/m_001_import_hive_parquet?u=mlauber71

Some more functions with Hive like adding fields to Hive table
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_add_column_db_40?u=mlauber71

If you experience slow connection when working with complex (Impala) queries you could check the "Retrieve in configure" settings
https://forum.knime.com/t/db-nodes-in-a-more-complex-flow-executing-very-slow/23649/2?u=mlauber71
https://docs.knime.com/latest/db_extension_guide/index.html#advanced_tab

Hive - how to handle missing tables in Hive? - us a try-if combination (THX to H. Stölting for the inspiration)
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_hive_missing_table_switch?u=mlauber71

D -------------------------------
KNIME and Spark

Being Lazy is Useful — Lazy Evaluation in Spark
https://medium.com/analytics-vidhya/being-lazy-is-useful-lazy-evaluation-in-spark-1f04072a3648

=> you can do without and jump right into it, but it makes sense to understand this basic concept of Spark

Comparison of Hive and Spark SQL - a gentle introductory example
https://hub.knime.com/knime/spaces/Examples/latest/10_Big_Data/02_Spark_Executor/07_SparkSQL_meets_HiveQL?u=mlauber71


An overview of KNIME based functions to access big data systems - use it on your own big data system (including PySpark)
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_bigdata_nodes/m_120_db_access_bigdata_tables?u=mlauber71

=> use the DB, Spark and PySpark nodes on your big data system

-- going further with Spark

Overview of Examples using Spark (and ML) with KNIME
https://hub.knime.com/knime/spaces/Education/latest/Courses/L4-BD%20Introduction%20to%20Big%20Data%20with%20KNIME%20Analytics%20Platform/3_Spark/4_Examples/?u=mlauber71

Local Big Data Irish Meter
https://hub.knime.com/knime/spaces/Examples/latest/10_Big_Data/02_Spark_Executor/09_Big_Data_Irish_Meter_on_Spark_only?u=mlauber71

Cleaning the NYC taxi dataset on Spark
https://hub.knime.com/knime/spaces/Examples/latest/50_Applications/49_NYC_Taxi_Visualization/Data_Preparation?u=mlauber71

E -------------------------------
More database and SQL related stuff

Tobias Kötter's KNIME hub space with a lot of DB related workflows
https://hub.knime.com/tobias.koetter/spaces/Public/latest/DB/?u=mlauber71

If you want to transfer data between data bases without down- and uploading it all
https://hub.knime.com/tobias.koetter/spaces/Public/latest/DB/DBStreamingDataTransfer?u=mlauber71

Microsoft Access is still a thing - and you can access it with KNIME
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_update_merge_ms_access?u=mlauber71

The full SQL can be found in this free eBook (not all SQL commands might work on all databases)
https://goalkicker.com/SQLBook/

A meta collection of KNIME and databases (SQL, Big Data/Hive/Impala and Spark/PySpark)KNIME as a powerful data anaylytics platform is able to connecct to all sorts of data bases and supports a lot of functions mainly by SQL and SQL-like languages. KNIME also supports big data platforms and related frameworks - like Spark and PySpark. There are myriads of database relatedtopics and books and online ressources but ...What I try to do is present a collection of useful links and examples when you want to set out to use databases and KNIME abd hopefully by providing some sampel workflows that one could try out at home ease some quirks especially when it comes to dealing with big data systems and Spark(including PySpark). The 'normal' SQL examples will be with SQLite and H2 so you can easily try this at home (but they will work for other databases like Postgres or MySQL as well). The big data examples are based on KNIME's local big data environment - which is very useful if you happennot to have such a cluster at home.A good starting point in general is the official "KNIME Database Extension Guide"https://docs.knime.com/latest/db_extension_guide/index.html?u=mlauber71=> you can dive right into the examples and will see quick results, but sometimes it helps to read about the concept.-------------------------------A - About SQL in general - a few links to refersh and learnStructured Query Language Basicshttps://www.nuwavesolutions.com/structured-query-language-basics/SQL CASE WHEN - and other code in the sqltutorialhttps://www.sqltutorial.org/sql-case/KNIME courses Lesson 4. Bringing Things Together - Join and Concatenationhttps://www.knime.com/self-paced-course/l1-dw-knime-analytics-platform-for-data-wranglers-basics/lesson4#join-concatenation?u=mlauber71KNIME Nodeguide for Joining and Concatenatinghttps://www.knime.com/nodeguide/etl-data-manipulation/joining-and-concatenating?u=mlauber71-------------------------------B - KNIME and SQL databases - going from basic to advancedDatabase - Simple IO (using standalone SQL-databse SQLite)https://hub.knime.com/knime/spaces/Examples/latest/01_Data_Access/02_Databases/01_Database_Simple_IO_Example?u=mlauber71KNIME and SQLite - simple example (with the New DB nodes)https://hub.knime.com/mlauber71/spaces/Public/latest/sqlite_knime_40/t_001_sqlite_knime_40_db_nodes?u=mlauber71H2 - use a SQL standalone Database from Scratch or from uploadhttps://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_h2_create_table_from_scratch?u=mlauber71H2 - Example of H2 database - handling of database structure and use of Primary Keys - insert only new lines by IDhttps://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_h2_primary_key?u=mlauber71Databases - Advanced Usage (using SQLite)https://hub.knime.com/knime/spaces/Examples/latest/01_Data_Access/02_Databases/02_Database_Advanced_Example?u=mlauber71use H2 to produce a Position / Rank number within a group variable (window functions with new H2 JDBC drivers)https://forum.knime.com/t/sqlite-and-window-functions/31608/4?u=mlauber71https://hub.knime.com/mlauber71/spaces/Public/latest/forum/kn_forum_31608_h2_group_rank_window_function?u=mlauber71-------------------------------C - KNIME and Big DataKNIME can also handle Big Data systems and databases. So you can go from handling your local or small SQL-DB right up to big enterprise systems with (eg.) Cloudera.KNIME Big Data Extensions User Guidehttps://docs.knime.com/latest/bigdata_extensions_user_guide/index.html?u=mlauber71School of Hive – everything you need to know to work with Hive tables on a Big Data systemhttps://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_hive_school_of?u=mlauber71=> if you want to understand the concepts of Big Data tables and partitions by doing it in code steps from scratchAn overview of KNIME based functions to access big data systems (with KNIME's local big data environment)https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_bigdata_nodes/m_020_db_access_local_bigdata_tables?u=mlauber71=> see for yourself how the big data nodes work on your local computerKNIME and Hive - load multiple CSV files at once via external tablehttps://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_csv_loader/m_001_import_hive_csv?u=mlauber71-- C+ Bonus Track (some more advanced big data functions with Hive like external tables demonstrated and partitions half-open code)work with Hive and external tables in CSV and ParquetKNIME and Hive - load multiple CSV files at once via external tablehttps://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_csv_loader/m_001_import_hive_csv?u=mlauber71KNIME and Hive - load multiple Parquet files at once via external tablehttps://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_parquet_loader/m_001_import_hive_parquet?u=mlauber71Some more functions with Hive like adding fields to Hive tablehttps://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_add_column_db_40?u=mlauber71 If you experience slow connection when working with complex (Impala) queries you could check the "Retrieve in configure" settingshttps://forum.knime.com/t/db-nodes-in-a-more-complex-flow-executing-very-slow/23649/2?u=mlauber71https://docs.knime.com/latest/db_extension_guide/index.html#advanced_tabHive - how to handle missing tables in Hive? - us a try-if combination (THX to H. Stölting for the inspiration)https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_hive_missing_table_switch?u=mlauber71D -------------------------------KNIME and Spark Being Lazy is Useful — Lazy Evaluation in Sparkhttps://medium.com/analytics-vidhya/being-lazy-is-useful-lazy-evaluation-in-spark-1f04072a3648 => you can do without and jump right into it, but it makes sense to understand this basic concept of SparkComparison of Hive and Spark SQL - a gentle introductory examplehttps://hub.knime.com/knime/spaces/Examples/latest/10_Big_Data/02_Spark_Executor/07_SparkSQL_meets_HiveQL?u=mlauber71An overview of KNIME based functions to access big data systems - use it on your own big data system (including PySpark)https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_bigdata_nodes/m_120_db_access_bigdata_tables?u=mlauber71=> use the DB, Spark and PySpark nodes on your big data system-- going further with SparkOverview of Examples using Spark (and ML) with KNIMEhttps://hub.knime.com/knime/spaces/Education/latest/Courses/L4-BD%20Introduction%20to%20Big%20Data%20with%20KNIME%20Analytics%20Platform/3_Spark/4_Examples/?u=mlauber71 Local Big Data Irish Meterhttps://hub.knime.com/knime/spaces/Examples/latest/10_Big_Data/02_Spark_Executor/09_Big_Data_Irish_Meter_on_Spark_only?u=mlauber71Cleaning the NYC taxi dataset on Sparkhttps://hub.knime.com/knime/spaces/Examples/latest/50_Applications/49_NYC_Taxi_Visualization/Data_Preparation?u=mlauber71E -------------------------------More database and SQL related stuffTobias Kötter's KNIME hub space with a lot of DB related workflowshttps://hub.knime.com/tobias.koetter/spaces/Public/latest/DB/?u=mlauber71If you want to transfer data between data bases without down- and uploading it allhttps://hub.knime.com/tobias.koetter/spaces/Public/latest/DB/DBStreamingDataTransfer?u=mlauber71Microsoft Access is still a thing - and you can access it with KNIMEhttps://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_update_merge_ms_access?u=mlauber71The full SQL can be found in this free eBook (not all SQL commands might work on all databases)https://goalkicker.com/SQLBook/ A meta collection of KNIME and databases (SQL, Big Data/Hive/Impala and Spark/PySpark)KNIME as a powerful data anaylytics platform is able to connecct to all sorts of data bases and supports a lot of functions mainly by SQL and SQL-like languages. KNIME also supports big data platforms and related frameworks - like Spark and PySpark. There are myriads of database relatedtopics and books and online ressources but ...What I try to do is present a collection of useful links and examples when you want to set out to use databases and KNIME abd hopefully by providing some sampel workflows that one could try out at home ease some quirks especially when it comes to dealing with big data systems and Spark(including PySpark). The 'normal' SQL examples will be with SQLite and H2 so you can easily try this at home (but they will work for other databases like Postgres or MySQL as well). The big data examples are based on KNIME's local big data environment - which is very useful if you happennot to have such a cluster at home.A good starting point in general is the official "KNIME Database Extension Guide"https://docs.knime.com/latest/db_extension_guide/index.html?u=mlauber71=> you can dive right into the examples and will see quick results, but sometimes it helps to read about the concept.-------------------------------A - About SQL in general - a few links to refersh and learnStructured Query Language Basicshttps://www.nuwavesolutions.com/structured-query-language-basics/SQL CASE WHEN - and other code in the sqltutorialhttps://www.sqltutorial.org/sql-case/KNIME courses Lesson 4. Bringing Things Together - Join and Concatenationhttps://www.knime.com/self-paced-course/l1-dw-knime-analytics-platform-for-data-wranglers-basics/lesson4#join-concatenation?u=mlauber71KNIME Nodeguide for Joining and Concatenatinghttps://www.knime.com/nodeguide/etl-data-manipulation/joining-and-concatenating?u=mlauber71-------------------------------B - KNIME and SQL databases - going from basic to advancedDatabase - Simple IO (using standalone SQL-databse SQLite)https://hub.knime.com/knime/spaces/Examples/latest/01_Data_Access/02_Databases/01_Database_Simple_IO_Example?u=mlauber71KNIME and SQLite - simple example (with the New DB nodes)https://hub.knime.com/mlauber71/spaces/Public/latest/sqlite_knime_40/t_001_sqlite_knime_40_db_nodes?u=mlauber71H2 - use a SQL standalone Database from Scratch or from uploadhttps://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_h2_create_table_from_scratch?u=mlauber71H2 - Example of H2 database - handling of database structure and use of Primary Keys - insert only new lines by IDhttps://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_h2_primary_key?u=mlauber71Databases - Advanced Usage (using SQLite)https://hub.knime.com/knime/spaces/Examples/latest/01_Data_Access/02_Databases/02_Database_Advanced_Example?u=mlauber71use H2 to produce a Position / Rank number within a group variable (window functions with new H2 JDBC drivers)https://forum.knime.com/t/sqlite-and-window-functions/31608/4?u=mlauber71https://hub.knime.com/mlauber71/spaces/Public/latest/forum/kn_forum_31608_h2_group_rank_window_function?u=mlauber71-------------------------------C - KNIME and Big DataKNIME can also handle Big Data systems and databases. So you can go from handling your local or small SQL-DB right up to big enterprise systems with (eg.) Cloudera.KNIME Big Data Extensions User Guidehttps://docs.knime.com/latest/bigdata_extensions_user_guide/index.html?u=mlauber71School of Hive – everything you need to know to work with Hive tables on a Big Data systemhttps://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_hive_school_of?u=mlauber71=> if you want to understand the concepts of Big Data tables and partitions by doing it in code steps from scratchAn overview of KNIME based functions to access big data systems (with KNIME's local big data environment)https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_bigdata_nodes/m_020_db_access_local_bigdata_tables?u=mlauber71=> see for yourself how the big data nodes work on your local computerKNIME and Hive - load multiple CSV files at once via external tablehttps://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_csv_loader/m_001_import_hive_csv?u=mlauber71-- C+ Bonus Track (some more advanced big data functions with Hive like external tables demonstrated and partitions half-open code)work with Hive and external tables in CSV and ParquetKNIME and Hive - load multiple CSV files at once via external tablehttps://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_csv_loader/m_001_import_hive_csv?u=mlauber71KNIME and Hive - load multiple Parquet files at once via external tablehttps://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_parquet_loader/m_001_import_hive_parquet?u=mlauber71Some more functions with Hive like adding fields to Hive tablehttps://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_add_column_db_40?u=mlauber71 If you experience slow connection when working with complex (Impala) queries you could check the "Retrieve in configure" settingshttps://forum.knime.com/t/db-nodes-in-a-more-complex-flow-executing-very-slow/23649/2?u=mlauber71https://docs.knime.com/latest/db_extension_guide/index.html#advanced_tabHive - how to handle missing tables in Hive? - us a try-if combination (THX to H. Stölting for the inspiration)https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_hive_missing_table_switch?u=mlauber71D -------------------------------KNIME and Spark Being Lazy is Useful — Lazy Evaluation in Sparkhttps://medium.com/analytics-vidhya/being-lazy-is-useful-lazy-evaluation-in-spark-1f04072a3648 => you can do without and jump right into it, but it makes sense to understand this basic concept of SparkComparison of Hive and Spark SQL - a gentle introductory examplehttps://hub.knime.com/knime/spaces/Examples/latest/10_Big_Data/02_Spark_Executor/07_SparkSQL_meets_HiveQL?u=mlauber71An overview of KNIME based functions to access big data systems - use it on your own big data system (including PySpark)https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_bigdata_nodes/m_120_db_access_bigdata_tables?u=mlauber71=> use the DB, Spark and PySpark nodes on your big data system-- going further with SparkOverview of Examples using Spark (and ML) with KNIMEhttps://hub.knime.com/knime/spaces/Education/latest/Courses/L4-BD%20Introduction%20to%20Big%20Data%20with%20KNIME%20Analytics%20Platform/3_Spark/4_Examples/?u=mlauber71 Local Big Data Irish Meterhttps://hub.knime.com/knime/spaces/Examples/latest/10_Big_Data/02_Spark_Executor/09_Big_Data_Irish_Meter_on_Spark_only?u=mlauber71Cleaning the NYC taxi dataset on Sparkhttps://hub.knime.com/knime/spaces/Examples/latest/50_Applications/49_NYC_Taxi_Visualization/Data_Preparation?u=mlauber71E -------------------------------More database and SQL related stuffTobias Kötter's KNIME hub space with a lot of DB related workflowshttps://hub.knime.com/tobias.koetter/spaces/Public/latest/DB/?u=mlauber71If you want to transfer data between data bases without down- and uploading it allhttps://hub.knime.com/tobias.koetter/spaces/Public/latest/DB/DBStreamingDataTransfer?u=mlauber71Microsoft Access is still a thing - and you can access it with KNIMEhttps://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_update_merge_ms_access?u=mlauber71The full SQL can be found in this free eBook (not all SQL commands might work on all databases)https://goalkicker.com/SQLBook/

Nodes

  • No nodes found

Extensions

  • No modules found

Links