A meta collection of KNIME and databases (SQL, Big Data/Hive/Impala and Spark/PySpark)
KNIME, as a powerful data analytics platform, is able to connect to all sorts of databases and supports a lot of functionality, mainly through SQL and SQL-like languages. KNIME also supports big data platforms and related frameworks - like Spark and PySpark. There are myriads of database-related topics, books and online resources but ...
What I try to do here is present a collection of useful links and examples for when you set out to use databases with KNIME, and hopefully - by providing some sample workflows that you can try out at home - ease some quirks, especially when it comes to dealing with big data systems and Spark (including PySpark). The 'normal' SQL examples use SQLite and H2, so you can easily try them at home (but they will work for other databases like Postgres or MySQL as well). The big data examples are based on KNIME's local big data environment - which is very useful if you happen not to have such a cluster at home.
A good starting point in general is the official "KNIME Database Extension Guide"
https://docs.knime.com/latest/db_extension_guide/index.html?u=mlauber71
=> you can dive right into the examples and will see quick results, but sometimes it helps to read about the concepts first.
-------------------------------
A - About SQL in general - a few links to refresh and learn
Structured Query Language Basics
https://www.nuwavesolutions.com/structured-query-language-basics/
SQL CASE WHEN - and other code examples at sqltutorial.org
https://www.sqltutorial.org/sql-case/
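=> the core pattern, as a minimal sketch with made-up column and table names (it works the same way in SQLite, H2 and Hive):
  SELECT customer_id,
         CASE
           WHEN age < 18 THEN 'minor'
           WHEN age < 65 THEN 'adult'
           ELSE 'senior'
         END AS age_group
  FROM customers;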
KNIME course - Lesson 4: Bringing Things Together - Join and Concatenation
https://www.knime.com/self-paced-course/l1-dw-knime-analytics-platform-for-data-wranglers-basics/lesson4#join-concatenation?u=mlauber71
KNIME Node Guide for Joining and Concatenating
https://www.knime.com/nodeguide/etl-data-manipulation/joining-and-concatenating?u=mlauber71
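=> in plain SQL the two operations roughly correspond to JOIN and UNION ALL - a small sketch with made-up table names:
  -- join: combine columns from two tables via a common key
  SELECT o.order_id, o.amount, c.customer_name
  FROM orders o
  LEFT JOIN customers c ON o.customer_id = c.customer_id;

  -- concatenation: stack the rows of two tables with the same structure
  SELECT order_id, amount FROM orders_2022
  UNION ALL
  SELECT order_id, amount FROM orders_2023;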
-------------------------------
B - KNIME and SQL databases - going from basic to advanced
Database - Simple IO (using the standalone SQL database SQLite)
https://hub.knime.com/knime/spaces/Examples/latest/01_Data_Access/02_Databases/01_Database_Simple_IO_Example?u=mlauber71
KNIME and SQLite - simple example (with the New DB nodes)
https://hub.knime.com/mlauber71/spaces/Public/latest/sqlite_knime_40/t_001_sqlite_knime_40_db_nodes?u=mlauber71
H2 - use a standalone SQL database from scratch or from an upload
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_h2_create_table_from_scratch?u=mlauber71
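=> creating a table from scratch boils down to a handful of SQL statements you could send e.g. via a DB SQL Executor node - a minimal sketch with made-up columns:
  CREATE TABLE IF NOT EXISTS sales (
    id    INT PRIMARY KEY,
    item  VARCHAR(100),
    price DECIMAL(10,2)
  );
  INSERT INTO sales (id, item, price) VALUES (1, 'coffee', 2.50);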
H2 - example of handling the database structure and using primary keys - insert only new rows by ID
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_h2_primary_key?u=mlauber71
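=> the underlying idea, sketched in H2 SQL with made-up names: the primary key guards against duplicate IDs, and an upsert-style MERGE only adds rows whose ID is not there yet (or updates existing ones):
  CREATE TABLE IF NOT EXISTS customers (
    id   INT PRIMARY KEY,
    name VARCHAR(100)
  );
  -- H2's MERGE ... KEY(...): insert if the id is new, otherwise update the existing row
  MERGE INTO customers (id, name) KEY (id) VALUES (42, 'John Doe');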
Databases - Advanced Usage (using SQLite)
https://hub.knime.com/knime/spaces/Examples/latest/01_Data_Access/02_Databases/02_Database_Advanced_Example?u=mlauber71
Use H2 to produce a position / rank number within a group (window functions with the newer H2 JDBC drivers)
https://forum.knime.com/t/sqlite-and-window-functions/31608/4?u=mlauber71
https://hub.knime.com/mlauber71/spaces/Public/latest/forum/kn_forum_31608_h2_group_rank_window_function?u=mlauber71
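=> the pattern behind it, sketched with made-up column names (it needs a reasonably recent H2 JDBC driver, since older versions do not support window functions):
  SELECT customer_id,
         order_date,
         amount,
         ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date)  AS position_in_group,
         RANK()       OVER (PARTITION BY customer_id ORDER BY amount DESC) AS rank_by_amount
  FROM orders;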
-------------------------------
C - KNIME and Big Data
KNIME can also handle big data systems and databases. So you can go from handling your local or small SQL database right up to big enterprise systems with, e.g., Cloudera.
KNIME Big Data Extensions User Guide
https://docs.knime.com/latest/bigdata_extensions_user_guide/index.html?u=mlauber71
School of Hive – everything you need to know to work with Hive tables on a Big Data system
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_hive_school_of?u=mlauber71
=> if you want to understand the concepts of big data tables and partitions by building them in code, step by step, from scratch
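=> the core building block, a partitioned Hive table, sketched with made-up names:
  -- a Hive table partitioned by year; each partition becomes its own folder in HDFS
  CREATE TABLE IF NOT EXISTS sales_part (
    id     INT,
    amount DOUBLE
  )
  PARTITIONED BY (sales_year INT)
  STORED AS PARQUET;

  -- fill one (static) partition from a staging table
  INSERT INTO TABLE sales_part PARTITION (sales_year = 2023)
  SELECT id, amount FROM sales_staging WHERE sales_year = 2023;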
An overview of KNIME-based functions to access big data systems (with KNIME's local big data environment)
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_bigdata_nodes/m_020_db_access_local_bigdata_tables?u=mlauber71
=> see for yourself how the big data nodes work on your local computer
-- C+ Bonus Track (some more advanced big data functions with Hive - like external tables and partitions - demonstrated in code)
work with Hive and external tables in CSV and Parquet
KNIME and Hive - load multiple CSV files at once via external table
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_csv_loader/m_001_import_hive_csv?u=mlauber71
KNIME and Hive - load multiple Parquet files at once via external table
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_parquet_loader/m_001_import_hive_parquet?u=mlauber71
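=> the Hive DDL behind both workflows looks roughly like this (paths and columns are made up; an external table just points at files that already exist under LOCATION):
  -- external table over CSV files in a folder
  CREATE EXTERNAL TABLE IF NOT EXISTS staging_csv (
    id   INT,
    name STRING
  )
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION '/data/landing/csv/'
  TBLPROPERTIES ('skip.header.line.count'='1');

  -- same idea for Parquet files
  CREATE EXTERNAL TABLE IF NOT EXISTS staging_parquet (
    id   INT,
    name STRING
  )
  STORED AS PARQUET
  LOCATION '/data/landing/parquet/';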
Some more Hive functions - like adding columns to an existing Hive table
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_add_column_db_40?u=mlauber71
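=> in HiveQL this is a single ALTER TABLE statement (made-up names; existing rows simply return NULL for the new fields):
  ALTER TABLE my_table ADD COLUMNS (discount DOUBLE, channel STRING);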
If you experience slow connections when working with complex (Impala) queries, you could check the "Retrieve in configure" setting
https://forum.knime.com/t/db-nodes-in-a-more-complex-flow-executing-very-slow/23649/2?u=mlauber71
https://docs.knime.com/latest/db_extension_guide/index.html#advanced_tab
Hive - how to handle missing tables in Hive? - use a try-if combination (thanks to H. Stölting for the inspiration)
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_hive_missing_table_switch?u=mlauber71
-------------------------------
D - KNIME and Spark
Being Lazy is Useful — Lazy Evaluation in Spark
https://medium.com/analytics-vidhya/being-lazy-is-useful-lazy-evaluation-in-spark-1f04072a3648
=> you can do without it and jump right in, but it makes sense to understand this basic concept of Spark
Comparison of Hive and Spark SQL - a gentle introductory example
https://hub.knime.com/knime/spaces/Examples/latest/10_Big_Data/02_Spark_Executor/07_SparkSQL_meets_HiveQL?u=mlauber71
An overview of KNIME-based functions to access big data systems - use it on your own big data system (including PySpark)
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_bigdata_nodes/m_120_db_access_bigdata_tables?u=mlauber71
=> use the DB, Spark and PySpark nodes on your big data system
-- going further with Spark
Overview of Examples using Spark (and ML) with KNIME
https://hub.knime.com/knime/spaces/Education/latest/Courses/L4-BD%20Introduction%20to%20Big%20Data%20with%20KNIME%20Analytics%20Platform/3_Spark/4_Examples/?u=mlauber71
Local Big Data Irish Meter
https://hub.knime.com/knime/spaces/Examples/latest/10_Big_Data/02_Spark_Executor/09_Big_Data_Irish_Meter_on_Spark_only?u=mlauber71
Cleaning the NYC taxi dataset on Spark
https://hub.knime.com/knime/spaces/Examples/latest/50_Applications/49_NYC_Taxi_Visualization/Data_Preparation?u=mlauber71
-------------------------------
E - More database and SQL related stuff
Tobias Kötter's KNIME Hub space with a lot of DB-related workflows
https://hub.knime.com/tobias.koetter/spaces/Public/latest/DB/?u=mlauber71
If you want to transfer data between databases without downloading and uploading it all
https://hub.knime.com/tobias.koetter/spaces/Public/latest/DB/DBStreamingDataTransfer?u=mlauber71
Microsoft Access is still a thing - and you can access it with KNIME
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_update_merge_ms_access?u=mlauber71
A comprehensive SQL reference can be found in this free eBook (not all SQL commands will work on all databases)
https://goalkicker.com/SQLBook/