
DB SQL - H2 School of duplicates - and how to deal with them

<p>School of duplicates - and how to deal with them - H2 version</p><p></p><p>Dealing with duplicates is a constant theme for data scientists, and a lot of things can go wrong. The easiest way to deal with them is GROUP BY or DISTINCT: just get rid of them and be done. But as these examples demonstrate, this might not always be the best option. Even if your data provider swears your combined IDs are unique, some muddy duplicates might still be lurking, especially in Big Data scenarios, and you should still be able to deal with them. You should also be able to bring a messy dataset into a meaningful table with a nice unique ID without losing too much information. This workflow would like to encourage you to think about what to do with your duplicates, so that you do not get caught off guard but take control :-)</p>
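
As a rough illustration of why DISTINCT alone may not be enough, here is a minimal sketch. It is not taken from the workflow itself: it uses Python's built-in sqlite3 module as a stand-in for H2 (window functions need SQLite 3.25 or newer), and the `customers` table, its columns, and the sample rows are all made up for the example. The idea is to keep the most recent row per ID with ROW_NUMBER() and to record how many rows each ID originally had, so the information about the duplicates is not silently lost.

```python
import sqlite3

# Hypothetical sample data: id 1 has two conflicting rows,
# id 3 has two fully identical rows.
ROWS = [
    (1, "Alice", "2020-01-01", "Berlin"),
    (1, "Alice", "2020-03-01", "Munich"),
    (2, "Bob", "2020-02-01", "Hamburg"),
    (3, "Carol", "2020-01-15", "Cologne"),
    (3, "Carol", "2020-01-15", "Cologne"),
]


def load(rows):
    """Create an in-memory table (assumed schema) and fill it."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE customers (id INTEGER, name TEXT, updated TEXT, city TEXT)"
    )
    con.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", rows)
    return con


def distinct_rows(rows):
    """SELECT DISTINCT only removes rows that are identical in every column."""
    con = load(rows)
    result = con.execute(
        "SELECT DISTINCT id, name, updated, city FROM customers ORDER BY id, updated"
    ).fetchall()
    con.close()
    return result


def latest_per_id(rows):
    """Keep the newest row per id and count how many rows the id had."""
    con = load(rows)
    result = con.execute(
        """
        SELECT id, name, updated, city, n_rows
        FROM (
            SELECT id, name, updated, city,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated DESC) AS rn,
                   COUNT(*) OVER (PARTITION BY id) AS n_rows
            FROM customers
        ) AS t
        WHERE rn = 1
        ORDER BY id
        """
    ).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(len(distinct_rows(ROWS)))  # id 1 still appears twice after DISTINCT
    for row in latest_per_id(ROWS):
        print(row)
```

In this sketch DISTINCT still leaves two rows for id 1, because the rows differ in `updated` and `city`, while the window-function query yields exactly one row per id. Which row should "win" (newest, most complete, or something else) is a decision that depends on your data; ordering by `updated` is only one possible rule.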

URL: Long forum debate about duplicates https://forum.knime.com/t/remove-rows-with-duplicate-values/11105/15?u=mlauber71
URL: School of duplicates - and how to deal with them (corresponding article) https://forum.knime.com/t/school-of-duplicates-and-how-to-deal-with-them/24164?u=mlauber71
URL: Window functions with new DB drivers https://forum.knime.com/t/sqlite-and-window-functions/31608/4?u=mlauber71
URL: A meta collection of KNIME and databases (SQL, Big Data/Hive/Impala and Spark/PySpark) https://hub.knime.com/mlauber71/spaces/Public/latest/_db_sql_bigdata_hive_spark_meta_collection?u=mlauber71
URL: Example how to use H2 database to create table with upload and from scratch https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_h2_create_table_from_scratch?u=mlauber71
URL: Medium: KNIME, Databases and SQL https://medium.com/low-code-for-advanced-data-science/knime-databases-and-sql-273e27c9702a
