
kn_example_hive_orc_external_table

Hive - upload data in several ORC files to HDFS and bring them together as an EXTERNAL table

You have several ORC files with the same structure that you upload to an HDFS folder. That folder is then addressed as an EXTERNAL table by your Hive system, and the data is finally inserted into a 'real' big data table.
The benefit: you can upload your compressed ORC files in chunks and then just use them as a single data table in your big data system.
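A minimal HiveQL sketch of that pattern (my_schema, my_external_table, my_final_table, the columns, and the LOCATION path are all illustrative placeholders, not taken from the workflow):

CREATE EXTERNAL TABLE IF NOT EXISTS my_schema.my_external_table (
    cluster_membership STRING,
    some_value         DOUBLE,
    d_date             STRING
)
STORED AS ORC
LOCATION '/data/upload/my_external_table/';   -- the HDFS folder holding the uploaded ORC chunks

-- every ORC file placed in that folder is now queryable as rows of the external table;
-- assuming my_final_table was created beforehand, the data can be copied over in one statement
INSERT INTO TABLE my_schema.my_final_table
SELECT * FROM my_schema.my_external_table;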

In the workflow, the external table is created over the upload folder with this statement (the $${S...}$$ placeholders are KNIME flow variables):

SET hive.vectorized.execution.enabled=false;
SET skip.header.line.count = 1;

CREATE EXTERNAL TABLE IF NOT EXISTS $${Sv_big_data_schema}$$.$${Sv_external_table_name}$$
    ($${Sv_sql_string2}$$)
COMMENT 'import data from ORC'
STORED AS ORC
/* LOCATION "/Users/m_lauber/Dropbox/knime-workspace/hub/kn_example_hive_orc_external_table/big_data/upload/my_external_table/" */
LOCATION "$${Sv_path_external_table}$$";

The final table $${Sv_big_data_schema}$$.$${Sv_name_final_table}$$ is first dropped (DROP TABLE IF EXISTS $${Sv_big_data_schema}$$.$${Sv_name_final_table}$$;) and then created as an empty, partitioned, Snappy-compressed Parquet table with these clauses:

PARTITIONED BY (d_date STRING)
STORED AS PARQUET
COMMENT 'this is my final table'
TBLPROPERTIES (
    'parquet.compression'='snappy',
    'external.table.purge'='true',
    'transactional'='false',
    'discover.partitions'='false',
    'table.owner'='John Doe',
    'table.owner_team'='Data Analytics',
    'table.housekeeping'='{"info": "housekeeping by daily partition", "attribute": "d_date", "attribute_string_format": "yyyy-MM-dd"}'
)

The column definition string ($${Sv_sql_string2}$$) is built from the table spec of one ORC file; a Rule Engine node maps the KNIME column types to Hive types:

$Column Type$ LIKE "String*" => "STRING"
$Column Type$ LIKE "*long*" => "BIGINT"
$Column Type$ LIKE "*double*" => "DOUBLE"
$Column Type$ LIKE "*integer*" => "INTEGER"
$Column Type$ LIKE "*Time*" => "TIMESTAMP"
$Column Type$ LIKE "*Date*" => "DATE"

You might have to extend these rules if you have timestamps, BIGINT columns, or unknown types.
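For illustration, assuming a table whose columns are a string Cluster_Membership, a date start_date, and a timestamp start_time, the rules above would produce a definition string like:

Cluster_Membership STRING, start_date DATE, start_time TIMESTAMP

The exact columns depend on your data. For the final table, d_date is handled separately as the partition column (PARTITIONED BY), so it is excluded from the regular column list there.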
How the workflow simulates the setup:
- A local big data environment stands in for your cluster; you can replace it with your own big data connection. Deleting the /big_data/ and /data/ folders will clear the local environment.
- Dummy data is generated with a d_date column spread over the last 9 days (40% assigned randomly) and written as several ORC files to the upload folder (../big_data/upload on MacOS and Linux, ..\big_data on Windows). This just simulates your ORC files in an HDFS folder; you can also upload them there.
- One ORC file is read back to get the table structure; a query with WHERE 1=2 against $${Sv_big_data_schema}$$.$${Sv_external_table_name}$$ returns just the structure, with the d_date partition column excluded.
- Flow variables: v_big_data_schema is the name of your big data schema, v_external_table_name the name of your external table, v_name_final_table the name of the final table, and v_path_external_table the path to the external table folder (which is created if it does not exist).
- After loading, the external and final tables are checked with SELECT ... LIMIT 1000, DESCRIBE EXTENDED, and SHOW PARTITIONS. In a real-world example you would maybe just select from the external table made of ORC files.

Nodes used: DB SQL Executor, DB Reader, DB Table Selector, DB Query Reader, Metadata for Big Data, Create Local Big Data Environment, Java Edit Variable (simple), Extract Table Spec, Rule Engine, Metanode SQL strings, Create File/Folder Variables, String Input, List Files/Folders, Data Generator, Column Rename, Counting Loop Start, Partitioning, Constant Value Column, String to Date&Time, Date&Time Shift, Math Formula, Date&Time to String, Column Filter, Variable Loop End, Merge Variables, ORC Writer, Row Filter, Table Row to Variable, Delete Files/Folders (Table), ORC Reader, DB Connection Extractor, DB Query, DB Column Filter, DB Table Creator, DB Connection Table Writer, Create Folder.
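A hedged sketch of how rows could be moved from the external table into the partitioned final table with a dynamic-partition insert (the statement is illustrative, not copied from the workflow; my_schema, my_external_table, my_final_table, and the column names are placeholders):

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- the partition column d_date must come last in the SELECT so Hive can route each row
INSERT INTO TABLE my_schema.my_final_table PARTITION (d_date)
SELECT cluster_membership, some_value, d_date
FROM my_schema.my_external_table;

SHOW PARTITIONS my_schema.my_final_table;   -- expect one partition per day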
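Note that the table.housekeeping entry in TBLPROPERTIES is purely descriptive metadata; a maintenance job that interprets it could, for example, drop expired daily partitions (hypothetical names and date):

ALTER TABLE my_schema.my_final_table DROP IF EXISTS PARTITION (d_date='2021-01-01');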
