Icon

kn_​example_​python_​excel_​table_​object_​read

KNIME and Python's openpyxl to extract named tables (ranges) from Excel file (using the bundled KNIME Python version)

KNIME and Python's openpyxl to extract named tables (ranges) from Excel file (using the bundled KNIME Python version)
https://forum.knime.com/t/read-from-excel-table/38190/8?u=mlauber71

The Excel File "Tables.xlsx" contains 4 named Table ranges which will be imported into KNIME. First in one step. The second Python node would give back the cell ranges and use that information to extract the tables

KNIME and Python's openpyxl to extract named tables (ranges) from Excel file (using the bundled KNIME Python version)https://forum.knime.com/t/read-from-excel-table/38190/8?u=mlauber71The Excel File "Tables.xlsx" contains 4 named Table ranges which will be imported into KNIME. First in one step. The second Python node would give back the cell ranges and use that information to extract the tables EXTRACT the cell ranges of all named tables and import them into KNIME and then export them back into single Excel files################################################################################ https://openpyxl.readthedocs.io/en/stable/worksheet_tables.html# https://stackoverflow.com/questions/69255564/how-to-extract-different-tables-in-excel-sheet-using-python# import the KNIME moduleimport knime.scripting.io as knioimport numpy as npimport pandas as pdfrom openpyxl import Workbookfrom openpyxl import load_workbookfrom openpyxl.worksheet.table import Table, TableStyleInfo# recreate the classic input_table from the new Python import since KNIME 4.5i# nput_table = knio.input_tables[0].to_pandas()var_workflow_path = knio.flow_variables['context.workflow.absolute-path']print("var_workflow_path : ", var_workflow_path )var_path_excel_file = knio.flow_variables['context.workflow.data-path'] + "Tables.xlsx"print("var_path_excel_file : ", var_path_excel_file )# read filewb = load_workbook(var_path_excel_file) # Set the filepath + filename# select the sheet where tables are locatedws = wb["Tables"]v_list = ws.tables.items()df = pd.DataFrame(v_list, columns=['table_name', 'table_range'])knio.output_tables[0] = knio.Table.from_pandas(df) ################################################################################ https://openpyxl.readthedocs.io/en/stable/worksheet_tables.html# https://stackoverflow.com/questions/69255564/how-to-extract-different-tables-in-excel-sheet-using-python# import the KNIME moduleimport knime.scripting.io as knioimport numpy as npimport pandas as pdfrom openpyxl import Workbookfrom openpyxl import load_workbookfrom openpyxl.worksheet.table import Table, TableStyleInfo# recreate the classic input_table from the new Python import since KNIME 4.5# input_table = knio.input_tables[0].to_pandas()var_path_separator = knio.flow_variables['path.separator.system']var_workflow_path = knio.flow_variables['context.workflow.absolute-path']print("var_workflow_path : ", var_workflow_path )var_path_excel_file = knio.flow_variables['context.workflow.data-path'] + "Tables.xlsx"print("var_path_excel_file : ", var_path_excel_file )# read filewb = load_workbook(var_path_excel_file) # Set the filepath + filename# select the sheet where tables are locatedws = wb["Tables"]# check what tables that exist in the worksheetprint({key : value for key, value in ws.tables.items()})mapping = {}# loop through all the tables and add to a dictionaryfor entry, data_boundary in ws.tables.items(): # parse the data within the ref boundary data = ws[data_boundary] ### extract the data ### # the inner list comprehension gets the values for each cell in the table content = [[cell.value for cell in ent] for ent in data] header = content[0] #the contents ... excluding the header rest = content[1:] #create dataframe with the column names #and pair table name with dataframe df = pd.DataFrame(rest, columns = header) mapping[entry] = df# print(mapping)# Extract all the tables to individually dataframes from the dictionaryTable2, Table3, Table4, Table5 = mapping.values()knio.output_tables[0] = knio.Table.from_pandas(Table2)knio.output_tables[1] = knio.Table.from_pandas(Table3)knio.output_tables[2] = knio.Table.from_pandas(Table4)knio.output_tables[3] = knio.Table.from_pandas(Table5) locate and create/data/ folderwith absolute pathsTables.xlsxif you know how many namedtables there are you canimport them allTables.xlsxTables.xlsxsheet: ""splittable_rangetable_range_start(\D+)(\d+)table_range_starttable_range_endstart_characterstart_numbertable_range_ end(\D+)(\d+)start_numberend_numberSTARTsave the extractednamed tables as new excel tablev_name_extracted_tablev_name*ENDCollect LocalMetadata Python Script Python Script Excel Reader Cell Splitter Regex Split Column Rename Column Rename Regex Split Column Rename Table Row ToVariable Loop Start Excel Writer Java EditVariable (simple) String to Path(Variable) Variable Loop End KNIME and Python's openpyxl to extract named tables (ranges) from Excel file (using the bundled KNIME Python version)https://forum.knime.com/t/read-from-excel-table/38190/8?u=mlauber71The Excel File "Tables.xlsx" contains 4 named Table ranges which will be imported into KNIME. First in one step. The second Python node would give back the cell ranges and use that information to extract the tables EXTRACT the cell ranges of all named tables and import them into KNIME and then export them back into single Excel files################################################################################ https://openpyxl.readthedocs.io/en/stable/worksheet_tables.html# https://stackoverflow.com/questions/69255564/how-to-extract-different-tables-in-excel-sheet-using-python# import the KNIME moduleimport knime.scripting.io as knioimport numpy as npimport pandas as pdfrom openpyxl import Workbookfrom openpyxl import load_workbookfrom openpyxl.worksheet.table import Table, TableStyleInfo# recreate the classic input_table from the new Python import since KNIME 4.5i# nput_table = knio.input_tables[0].to_pandas()var_workflow_path = knio.flow_variables['context.workflow.absolute-path']print("var_workflow_path : ", var_workflow_path )var_path_excel_file = knio.flow_variables['context.workflow.data-path'] + "Tables.xlsx"print("var_path_excel_file : ", var_path_excel_file )# read filewb = load_workbook(var_path_excel_file) # Set the filepath + filename# select the sheet where tables are locatedws = wb["Tables"]v_list = ws.tables.items()df = pd.DataFrame(v_list, columns=['table_name', 'table_range'])knio.output_tables[0] = knio.Table.from_pandas(df) ################################################################################ https://openpyxl.readthedocs.io/en/stable/worksheet_tables.html# https://stackoverflow.com/questions/69255564/how-to-extract-different-tables-in-excel-sheet-using-python# import the KNIME moduleimport knime.scripting.io as knioimport numpy as npimport pandas as pdfrom openpyxl import Workbookfrom openpyxl import load_workbookfrom openpyxl.worksheet.table import Table, TableStyleInfo# recreate the classic input_table from the new Python import since KNIME 4.5# input_table = knio.input_tables[0].to_pandas()var_path_separator = knio.flow_variables['path.separator.system']var_workflow_path = knio.flow_variables['context.workflow.absolute-path']print("var_workflow_path : ", var_workflow_path )var_path_excel_file = knio.flow_variables['context.workflow.data-path'] + "Tables.xlsx"print("var_path_excel_file : ", var_path_excel_file )# read filewb = load_workbook(var_path_excel_file) # Set the filepath + filename# select the sheet where tables are locatedws = wb["Tables"]# check what tables that exist in the worksheetprint({key : value for key, value in ws.tables.items()})mapping = {}# loop through all the tables and add to a dictionaryfor entry, data_boundary in ws.tables.items(): # parse the data within the ref boundary data = ws[data_boundary] ### extract the data ### # the inner list comprehension gets the values for each cell in the table content = [[cell.value for cell in ent] for ent in data] header = content[0] #the contents ... excluding the header rest = content[1:] #create dataframe with the column names #and pair table name with dataframe df = pd.DataFrame(rest, columns = header) mapping[entry] = df# print(mapping)# Extract all the tables to individually dataframes from the dictionaryTable2, Table3, Table4, Table5 = mapping.values()knio.output_tables[0] = knio.Table.from_pandas(Table2)knio.output_tables[1] = knio.Table.from_pandas(Table3)knio.output_tables[2] = knio.Table.from_pandas(Table4)knio.output_tables[3] = knio.Table.from_pandas(Table5) locate and create/data/ folderwith absolute pathsTables.xlsxif you know how many namedtables there are you canimport them allTables.xlsxTables.xlsxsheet: ""splittable_rangetable_range_start(\D+)(\d+)table_range_starttable_range_endstart_characterstart_numbertable_range_ end(\D+)(\d+)start_numberend_numberSTARTsave the extractednamed tables as new excel tablev_name_extracted_tablev_name*ENDCollect LocalMetadata Python Script Python Script Excel Reader Cell Splitter Regex Split Column Rename Column Rename Regex Split Column Rename Table Row ToVariable Loop Start Excel Writer Java EditVariable (simple) String to Path(Variable) Variable Loop End

Nodes

Extensions

Links