Icon

m_​001_​hive_​partitions

Create a Big Data Hive/Parquet table with a partition based on an existing KNIME table and add more partitions later

Create a Big Data Hive/Parquet table with a partition based on an existing KNIME table and add more partitions later

You can create a Hive table with Parquet format with the DB Table Creator node with additional options specifying the PARQUET format and a PARTITION. You will leave the partition column out when creating the table from the example and later use the column as partition when you insert the KNIME table into your newly created Hiv table. You can later add more partitions just by uploading it thru "DB Loader" - the partitions will automatically be created or an existing partition will be appended.

=> please download the whole workflow group "kn_example_bigdata_hive_partitions"



STORED AS PARQUETPARTITIONED BY (education STRING)TBLPROPERTIES ( 'parquet.compression'='snappy' , 'external.table.purge'='true' , 'transactional'='false' , 'discover.partitions' = 'false' ) Create a Big Data Hive/Parquet table with a partition based on an existing KNIME table and add more partitions laterYou can create a Hive table with Parquet format with the DB Table Creator node with additional options specifying the PARQUET format and a PARTITION. You will leave the partition column out when creating the table from the example and later use the column as partition when you insert the KNIME table into your newly created Hiv table. You can later add more partitions just by uploading it thru "DB Loader" - the partitions will automatically be created or an existing partition will be appended.=> please download the whole workflow group "kn_example_bigdata_hive_partitions"https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_partitions~kotdKu7AOmLrlEbG/ create default.data_education_masters to have an "existing" big data table the same as using the DB Connection Table Writer but just with SQL code - to _SHOW PARTITIONS `default`.`data_train`see wich partitions are theredrop education for partition^(?!education$).*alternative: SQL Executor withDROP TABLE IF EXISTS default.data_all;customer_numbersimpulates existence of a customer number thatwould be needed to export the relevant data lines=> create a local big data contexif you encouter any problems, closeKNIME and delete all data from the folder/big_data/ and start over../data/census_income_train.parquetthe KNME datadata_trainset a partitioning command on the tab"Additional Options"PARTITIONED BY (education STRING)data_traindata_trainexclude education"Bachelors"to be added lateras a new partitiongroup by educationcheck resultsGROUPed byeducationdata_trainadd Hive Partitioneducation="Bachelors"check resultsGROUPed byeducationgroup by educationdata_trainALTER TABLE `default`.`data_train`DROP IF EXISTS partition (education = 'Bachelors');=> make sure the partition does not eixts(if you want to add to an existing partition do not use this part)SHOW PARTITIONS `default`.`sample_02`see wich partitions are theremove education lastalternative: SQL Executor withDROP TABLE IF EXISTS default.data_education_masters;data_education_mastersdata_education_mastersexclude education" Masters" to be added lateras a new partitionfrom an existing big data filedata_trainadd Hive Partitioneducation="Masters" from existing big data table"data_education_masters"=> partition is the last columnit will automatically be usedas partition(if you have more than one use them in the right order)data_traingroup by educationcheck resultsGROUPed byeducationSHOW PARTITIONS `default`.`data_train`see wich partitions are thereINSERT OVERWRITE TABLE `default`.`data_train`PARTITION (`education`)SELECT *FROM `default`.`data_education_masters`group by educationcheck resultsGROUPed byeducationdata_trainColumn Rename DB Query Reader Column Filter DB Table Remover Java Snippet(simple) local big datacontext create Parquet Reader DB Table Creator DB Loader DB Table Selector Row Splitter DB GroupBy DB Sorter DB Reader DB Loader DB Reader DB Sorter DB GroupBy DB Table Selector DB SQL Executor DB Query Reader Column Resorter DB Table Remover DB Loader DB Table Creator Row Splitter DB ConnectionTable Writer DB Table Selector DB GroupBy DB Sorter DB Reader DB Query Reader DB ConnectionExtractor DB SQL Executor DB Sorter DB GroupBy DB Reader DB Table Selector STORED AS PARQUETPARTITIONED BY (education STRING)TBLPROPERTIES ( 'parquet.compression'='snappy' , 'external.table.purge'='true' , 'transactional'='false' , 'discover.partitions' = 'false' ) Create a Big Data Hive/Parquet table with a partition based on an existing KNIME table and add more partitions laterYou can create a Hive table with Parquet format with the DB Table Creator node with additional options specifying the PARQUET format and a PARTITION. You will leave the partition column out when creating the table from the example and later use the column as partition when you insert the KNIME table into your newly created Hiv table. You can later add more partitions just by uploading it thru "DB Loader" - the partitions will automatically be created or an existing partition will be appended.=> please download the whole workflow group "kn_example_bigdata_hive_partitions"https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_partitions~kotdKu7AOmLrlEbG/ create default.data_education_masters to have an "existing" big data table the same as using the DB Connection Table Writer but just with SQL code - to _SHOW PARTITIONS `default`.`data_train`see wich partitions are theredrop education for partition^(?!education$).*alternative: SQL Executor withDROP TABLE IF EXISTS default.data_all;customer_numbersimpulates existence of a customer number thatwould be needed to export the relevant data lines=> create a local big data contexif you encouter any problems, closeKNIME and delete all data from the folder/big_data/ and start over../data/census_income_train.parquetthe KNME datadata_trainset a partitioning command on the tab"Additional Options"PARTITIONED BY (education STRING)data_traindata_trainexclude education"Bachelors"to be added lateras a new partitiongroup by educationcheck resultsGROUPed byeducationdata_trainadd Hive Partitioneducation="Bachelors"check resultsGROUPed byeducationgroup by educationdata_trainALTER TABLE `default`.`data_train`DROP IF EXISTS partition (education = 'Bachelors');=> make sure the partition does not eixts(if you want to add to an existing partition do not use this part)SHOW PARTITIONS `default`.`sample_02`see wich partitions are theremove education lastalternative: SQL Executor withDROP TABLE IF EXISTS default.data_education_masters;data_education_mastersdata_education_mastersexclude education" Masters" to be added lateras a new partitionfrom an existing big data filedata_trainadd Hive Partitioneducation="Masters" from existing big data table"data_education_masters"=> partition is the last columnit will automatically be usedas partition(if you have more than one use them in the right order)data_traingroup by educationcheck resultsGROUPed byeducationSHOW PARTITIONS `default`.`data_train`see wich partitions are thereINSERT OVERWRITE TABLE `default`.`data_train`PARTITION (`education`)SELECT *FROM `default`.`data_education_masters`group by educationcheck resultsGROUPed byeducationdata_trainColumn Rename DB Query Reader Column Filter DB Table Remover Java Snippet(simple) local big datacontext create Parquet Reader DB Table Creator DB Loader DB Table Selector Row Splitter DB GroupBy DB Sorter DB Reader DB Loader DB Reader DB Sorter DB GroupBy DB Table Selector DB SQL Executor DB Query Reader Column Resorter DB Table Remover DB Loader DB Table Creator Row Splitter DB ConnectionTable Writer DB Table Selector DB GroupBy DB Sorter DB Reader DB Query Reader DB ConnectionExtractor DB SQL Executor DB Sorter DB GroupBy DB Reader DB Table Selector

Nodes

Extensions

Links