Icon

kn_​example_​python_​excel_​image_​extract

Extract images from and Excel file with the help of Python / OpenPyxl

Extract images from and Excel file with the help of Python / OpenPyxl

there is a Jupyter notebook to experiment with under /data/kn_example_python_excel_image_extract.ipynb

Extract images from and Excel file with the help of Python / OpenPyxlthere is a Jupyter notebook to experiment with under /data/kn_example_python_excel_image_extract.ipynb import knime_io as knio# This example script creates an output table containing randomly drawn integers using numpy and pandas.import numpy as npimport pandas as pdfrom openpyxl import load_workbookfrom pathlib import Path# pip install openpyxl-image-loaderfrom openpyxl_image_loader import SheetImageLoaderfrom openpyxl.utils import get_column_letterv_excel_name_only = knio.flow_variables['File name']v_excel_file = knio.flow_variables['Parent folder'] + "/" + v_excel_name_only + ".xlsx"v_image_path = knio.flow_variables['Parent folder'] + "/" + v_excel_name_only + "/"# https://stackoverflow.com/questions/31420817/convert-excel-row-column-indices-to-alphanumeric-cell-reference-in-python-openpy# create path for the image filesPath(v_image_path).mkdir(parents=True, exist_ok=True)wb = load_workbook(v_excel_file)ws_list = wb.sheetnamesdef xlref(row, column, zero_indexed=True): if zero_indexed: row += 1 column += 1 return get_column_letter(column) + str(row)df = pd.DataFrame({'Excel_File': pd.Series(dtype='str'),'Sheet_Name': pd.Series(dtype='str'),'Image_Cell': pd.Series(dtype='str'),'Image_Stored_Path': pd.Series(dtype='str') } )for s in ws_list: ws = wb[s] wb_images = ws._images for i in range(0,len(wb_images)): # yes this is sort of stupid but there is a bug out there # https://stackoverflow.com/questions/54847209/openpyxl-valueerror-i-o-operation-on-closed-file wb = load_workbook(v_excel_file) ws = wb[s] wb_images = ws._images wb_images_one = wb_images[i] v_cell_coordinates = xlref(row=wb_images_one.anchor._from.row, column=wb_images_one.anchor._from.col) # print(v_cell_coordinates) image_loader = SheetImageLoader(ws) image = image_loader.get(v_cell_coordinates) v_png_export_name = v_image_path + v_excel_name_only + "_" + ws.title + "_" + v_cell_coordinates + ".png" # print(v_png_export_name) image.save(v_png_export_name) df2 = pd.DataFrame( { "Excel_File" : [v_excel_file],"Sheet_Name" : [s], "Image_Cell" : [v_cell_coordinates],"Image_Stored_Path" : [v_png_export_name] } ) df = pd.concat([df, df2], ignore_index=True, axis=0)v_excel_export = v_image_path + "List_Images.xlsx"df.to_excel(v_excel_export, sheet_name='image_list')wb.closeknio.output_tables[0] = knio.write_table(df) Propagate Python environmentfor KNIME on Windows withMiniforge / Minicondaconfigure how to handle the environmentdefault = just check the nameslist *.xlsx files from directory/dataPropagate Python environmentfor KNIME on MacOSX withMiniforge / Minicondaconfigure how to handle the environmentdefault = just check the namesSTART LoopEND Loop conda_environment_kaggle_windows URL to File Path List Files/Folders Path to String Python Script(Labs) conda_environment_kaggle_macosx Table Row ToVariable Loop Start Loop End Extract images from and Excel file with the help of Python / OpenPyxlthere is a Jupyter notebook to experiment with under /data/kn_example_python_excel_image_extract.ipynb import knime_io as knio# This example script creates an output table containing randomly drawn integers using numpy and pandas.import numpy as npimport pandas as pdfrom openpyxl import load_workbookfrom pathlib import Path# pip install openpyxl-image-loaderfrom openpyxl_image_loader import SheetImageLoaderfrom openpyxl.utils import get_column_letterv_excel_name_only = knio.flow_variables['File name']v_excel_file = knio.flow_variables['Parent folder'] + "/" + v_excel_name_only + ".xlsx"v_image_path = knio.flow_variables['Parent folder'] + "/" + v_excel_name_only + "/"# https://stackoverflow.com/questions/31420817/convert-excel-row-column-indices-to-alphanumeric-cell-reference-in-python-openpy# create path for the image filesPath(v_image_path).mkdir(parents=True, exist_ok=True)wb = load_workbook(v_excel_file)ws_list = wb.sheetnamesdef xlref(row, column, zero_indexed=True): if zero_indexed: row += 1 column += 1 return get_column_letter(column) + str(row)df = pd.DataFrame({'Excel_File': pd.Series(dtype='str'),'Sheet_Name': pd.Series(dtype='str'),'Image_Cell': pd.Series(dtype='str'),'Image_Stored_Path': pd.Series(dtype='str') } )for s in ws_list: ws = wb[s] wb_images = ws._images for i in range(0,len(wb_images)): # yes this is sort of stupid but there is a bug out there # https://stackoverflow.com/questions/54847209/openpyxl-valueerror-i-o-operation-on-closed-file wb = load_workbook(v_excel_file) ws = wb[s] wb_images = ws._images wb_images_one = wb_images[i] v_cell_coordinates = xlref(row=wb_images_one.anchor._from.row, column=wb_images_one.anchor._from.col) # print(v_cell_coordinates) image_loader = SheetImageLoader(ws) image = image_loader.get(v_cell_coordinates) v_png_export_name = v_image_path + v_excel_name_only + "_" + ws.title + "_" + v_cell_coordinates + ".png" # print(v_png_export_name) image.save(v_png_export_name) df2 = pd.DataFrame( { "Excel_File" : [v_excel_file],"Sheet_Name" : [s], "Image_Cell" : [v_cell_coordinates],"Image_Stored_Path" : [v_png_export_name] } ) df = pd.concat([df, df2], ignore_index=True, axis=0)v_excel_export = v_image_path + "List_Images.xlsx"df.to_excel(v_excel_export, sheet_name='image_list')wb.closeknio.output_tables[0] = knio.write_table(df) Propagate Python environmentfor KNIME on Windows withMiniforge / Minicondaconfigure how to handle the environmentdefault = just check the nameslist *.xlsx files from directory/dataPropagate Python environmentfor KNIME on MacOSX withMiniforge / Minicondaconfigure how to handle the environmentdefault = just check the namesSTART LoopEND Loopconda_environment_kaggle_windows URL to File Path List Files/Folders Path to String Python Script(Labs) conda_environment_kaggle_macosx Table Row ToVariable Loop Start Loop End

Nodes

Extensions

Links