kn_forum_python_openpyxl_colour

Filter an Excel file using style elements like color of the cell extracted with the help of Python openpyxl

This can be done with the Python library openpyxl (as with so many Excel manipulation tasks in and around KNIME).

Basically, this is what happens:

* Import the Excel file into Python
* Activate "Sheet1"
* Find the edges of the data, in this case the number of rows
* Iterate through all cells with data in column A
* Extract the style information of each cell (that was some piece of work)
* Store it in the corresponding row of column C
* Check whether the style information contains the specific colour we want to see (you could drop that once you are comfortable with the solution)
* If yes, write TRUE into column B, if not, write FALSE
* Save the Excel file under a new name (so you keep your original file)
* Check the results
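The steps above can be condensed into a small, self-contained function. This is a minimal sketch, assuming openpyxl is installed and using an in-memory workbook in place of Example-2.xlsx. The helper names `last_used_row` and `flag_coloured_cells` are hypothetical and not part of the workflow. It follows the same idea as the workflow's script: turn the fill into text with `str(cell.fill)` and search it for the ARGB code FF00B0F0.

```python
from openpyxl import Workbook
from openpyxl.styles import PatternFill
from openpyxl.worksheet.worksheet import Worksheet


def last_used_row(ws: Worksheet) -> int:
    """Return the last row holding a value in any column (0 for an empty sheet)."""
    row = ws.max_row
    while row > 0 and all(cell.value is None for cell in ws[row]):
        row -= 1
    return row


def flag_coloured_cells(ws: Worksheet, colour: str = "FF00B0F0") -> None:
    """Hypothetical helper: write str(fill) of each column-A cell to column C
    and whether that text contains `colour` (True/False) to column B."""
    last_row = last_used_row(ws)
    if last_row == 0:
        return  # empty sheet, nothing to flag
    for (cell,) in ws.iter_rows(min_row=1, max_row=last_row, max_col=1):
        fill_text = str(cell.fill)  # text dump of the cell's PatternFill
        ws.cell(row=cell.row, column=3, value=fill_text)
        ws.cell(row=cell.row, column=2, value=colour in fill_text)


# Usage on a small in-memory workbook (instead of loading Example-2.xlsx)
wb = Workbook()
ws = wb.active
ws["A1"] = "blue cell"
ws["A1"].fill = PatternFill(fill_type="solid", fgColor="FF00B0F0")
ws["A2"] = "plain cell"
flag_coloured_cells(ws)
print(ws["B1"].value, ws["B2"].value)  # True False
# wb.save("processed_file.xlsx")  # save under a new name to keep the original
```

Python booleans written to column B show up as TRUE/FALSE once the file is opened in Excel.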

Please also check out the Jupyter notebook in the subdirectory of this workflow:
/script/kn_forum_python_openpyxl_colour.ipynb

Forum thread: https://forum.knime.com/t/filter-excel-table-by-colour/20633/3?u=mlauber71

Python Installation (the short story): https://forum.knime.com/t/problem-with-setting-a-python-deep-learning-environment/19477/2?u=mlauber71

This is the information openpyxl extracts from an Excel cell; one can later interpret the colour information:

```
<openpyxl.styles.fills.PatternFill object>
Parameters:
patternType='solid', fgColor=<openpyxl.styles.colors.Color object>
Parameters:
rgb='FF00B0F0', indexed=None, auto=None, theme=None, tint=0.0, type='rgb', bgColor=<openpyxl.styles.colors.Color object>
Parameters:
rgb=None, indexed=64, auto=None, theme=None, tint=0.0, type='indexed'
```

Code of the Python Script (1⇒1) node:

```python
# conda install -c anaconda openpyxl
import os

from openpyxl import load_workbook

# flow_variables, input_table and output_table are provided by the KNIME Python Script node
var_data_path = flow_variables['v_path_data_folder']
# var_data_path = "../data/"  # e.g. when running outside KNIME
print("var_data_path : ", var_data_path)

var_path_excel_file = os.path.join(var_data_path, "Example-2.xlsx")
print("var_path_excel_file : ", var_path_excel_file)

var_path_excel_file_export = os.path.join(var_data_path, "processed_file.xlsx")
print("var_path_excel_file_export : ", var_path_excel_file_export)

wb = load_workbook(var_path_excel_file)
ws = wb["Sheet1"]

# activate the worksheet 'Sheet1' (wb.active accepts the sheet index)
wb.active = wb.sheetnames.index(ws.title)
print(wb.active.title)


def find_edges(sheet):
    """Return (last used row, last used column) of the sheet, or (0, 0) if it is empty."""
    row = sheet.max_row
    while row > 0:
        cells = sheet[row]
        if all(cell.value is None for cell in cells):
            row -= 1
        else:
            break
    if row == 0:
        return 0, 0

    column = sheet.max_column
    while column > 0:
        cells = next(sheet.iter_cols(min_col=column, max_col=column, max_row=row))
        if all(cell.value is None for cell in cells):
            column -= 1
        else:
            break
    return row, column


v_edge_row, v_edge_column = find_edges(ws)

# create a lambda function to test for the presence of the specific RGB colour
v_color_to_test = "FF00B0F0"
# https://thispointer.com/python-how-to-use-if-else-elif-in-lambda-functions/
# if the substring is found within the string it is true
# in this case the string is so characteristic that a substring is sufficient
# for more complicated patterns you would have to employ a regular expression
fct_test_substring = lambda x: True if (x.find(v_color_to_test) != -1) else False

# https://stackoverflow.com/questions/29792134/how-we-can-use-iter-rows-in-python-openpyxl-package
# iterate through column A (column=1) until the last used row is reached
for row in ws.iter_rows(min_row=1, max_col=1, max_row=v_edge_row):
    for cell in row:
        print(cell.row)
        # cell object in column C (3) of the same row
        C_cell = ws.cell(row=cell.row, column=3)
        # extract the fill style as a string
        v_fill_style = str(cell.fill)
        print("fill: ", v_fill_style)
        # store the style string in column C
        C_cell.value = v_fill_style
        # cell object in column B (2) of the same row
        B_cell = ws.cell(row=cell.row, column=2)
        # test if the colour is there and write TRUE/FALSE into column B
        B_cell.value = fct_test_substring(v_fill_style)

# save the file under a new name so the original stays untouched
wb.save(var_path_excel_file_export)

# the node needs an output table; simply pass the input through
output_table = input_table.copy()
```

Node comments in the workflow:

* processed_file.xlsx
* v_path_data_folder
* Manipulate Excel file in Python
* Transfer the variables
* No real function there, but the Python script wants an input
* List file processed_file.xlsx
* Just to make sure we have a clean start

Nodes used: Excel Reader (XLS), Extract Context Properties, Java Edit Variable, Python Script (1⇒1), Variable to Table Row, List Files, Delete Files, String to URI
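Searching the text dump works, but the same information is also available as attributes. In the PatternFill dump from the workflow annotation, the colour sits in `fgColor` with `type='rgb'` and `rgb='FF00B0F0'`. The following is an alternative to the substring test, not the approach used in the workflow: a minimal sketch, assuming solid fills with explicit ARGB colours, that compares `cell.fill.fgColor.rgb` directly and collects the column-A values of matching rows, i.e. the actual filtering. The helpers `has_rgb_fill` and `rows_with_fill` are hypothetical names.

```python
from openpyxl import Workbook
from openpyxl.styles import Color, PatternFill
from openpyxl.worksheet.worksheet import Worksheet


def has_rgb_fill(cell, rgb: str) -> bool:
    """True if the cell has a solid fill whose foreground colour is the ARGB value `rgb`."""
    fill = cell.fill
    # getattr guards against fills without a patternType (e.g. gradient fills)
    if getattr(fill, "patternType", None) != "solid":
        return False
    colour = fill.fgColor
    # theme/indexed colours carry no plain ARGB string, so only compare type 'rgb'
    return colour.type == "rgb" and colour.rgb == rgb


def rows_with_fill(ws: Worksheet, rgb: str = "FF00B0F0") -> list:
    """Hypothetical helper: column-A values of all rows whose A cell has the given fill colour."""
    return [
        cell.value
        for (cell,) in ws.iter_rows(min_col=1, max_col=1)
        if cell.value is not None and has_rgb_fill(cell, rgb)
    ]


# Usage on a small in-memory workbook
wb = Workbook()
ws = wb.active
blue = PatternFill(fill_type="solid", fgColor="FF00B0F0")
for value, fill in [
    ("blue 1", blue),
    ("plain", None),
    ("blue 2", blue),
    ("theme colour", PatternFill(fill_type="solid", fgColor=Color(theme=4))),
]:
    ws.append([value])
    if fill is not None:
        ws.cell(row=ws.max_row, column=1).fill = fill

print(rows_with_fill(ws))  # ['blue 1', 'blue 2']
```

Comparing the attribute avoids matching the code somewhere else in the dump (for example in bgColor). The trade-off is that theme or indexed colours are simply skipped here; they would first have to be resolved to an RGB value.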
