Icon

kn_​forum_​72565_​python_​missing_​date_​pairs

Fill gaps in pairs of dates within a year

Fill gaps in pairs of dates within a year

Fill gaps in pairs of dates within a year import knime.scripting.io as knioimport pandas as pd# This example script simply outputs the node's input table.df = knio.input_tables[0].to_pandas()# Convert strings to datetimedf['Start date'] = pd.to_datetime(df['Start date'])df['End date'] = pd.to_datetime(df['End date'])filled_rows = []for ID, group in df.groupby('ID'): group = group.sort_values(by='Start date') for i in range(len(group) - 1): end_date_1 = group.iloc[i]['End date'] start_date_2 = group.iloc[i+1]['Start date'] # If there's a gap between two dates if end_date_1 + pd.Timedelta(days=1) != start_date_2: filled_rows.append({'ID': ID, 'Start date': end_date_1 + pd.Timedelta(days=1),'End date': start_date_2 - pd.Timedelta(days=1)})filled_df = pd.DataFrame(filled_rows)# Combine the original dataframe with the filled rows and sortresult_df = pd.concat([df, filled_df], ignore_index=True).sort_values(by=['ID', 'Startdate'])knio.output_tables[0] = knio.Table.from_pandas(filled_df)knio.output_tables[1] = knio.Table.from_pandas(result_df) The general approach I'm going to take is:For each ID, sort the date pairs.Check if there are gaps between the end date of one pair and the start date of the next pair.If there's a gap, insert a new row with the end date of the previous pair plus one day as the start date and the start date ofthe next pair minus one day as the end date. How to Set Up Your Python Extensionshttps://www.knime.com/blog/how-to-setup-the-python-extensionKNIME, ChatGPT and Python (and some waffles)https://medium.com/p/c05709dd3bf5 Test file.xlsxresultsNode 6 Excel Reader Python Script Sorter Sorter Fill gaps in pairs of dates within a year import knime.scripting.io as knioimport pandas as pd# This example script simply outputs the node's input table.df = knio.input_tables[0].to_pandas()# Convert strings to datetimedf['Start date'] = pd.to_datetime(df['Start date'])df['End date'] = pd.to_datetime(df['End date'])filled_rows = []for ID, group in df.groupby('ID'): group = group.sort_values(by='Start date') for i in range(len(group) - 1): end_date_1 = group.iloc[i]['End date'] start_date_2 = group.iloc[i+1]['Start date'] # If there's a gap between two dates if end_date_1 + pd.Timedelta(days=1) != start_date_2: filled_rows.append({'ID': ID, 'Start date': end_date_1 + pd.Timedelta(days=1),'End date': start_date_2 - pd.Timedelta(days=1)})filled_df = pd.DataFrame(filled_rows)# Combine the original dataframe with the filled rows and sortresult_df = pd.concat([df, filled_df], ignore_index=True).sort_values(by=['ID', 'Startdate'])knio.output_tables[0] = knio.Table.from_pandas(filled_df)knio.output_tables[1] = knio.Table.from_pandas(result_df) The general approach I'm going to take is:For each ID, sort the date pairs.Check if there are gaps between the end date of one pair and the start date of the next pair.If there's a gap, insert a new row with the end date of the previous pair plus one day as the start date and the start date ofthe next pair minus one day as the end date. How to Set Up Your Python Extensionshttps://www.knime.com/blog/how-to-setup-the-python-extensionKNIME, ChatGPT and Python (and some waffles)https://medium.com/p/c05709dd3bf5 Test file.xlsxresultsNode 6 Excel Reader Python Script Sorter Sorter

Nodes

Extensions

Links