Icon

kn_​forum_​72565_​python_​missing_​date_​pairs

Fill gaps in pairs of dates within a year

Fill gaps in pairs of dates within a year
https://forum.knime.com/t/insert-missing-dates-range/72565/9?u=mlauber71

Fill gaps in pairs of dates within a yearhttps://forum.knime.com/t/insert-missing-dates-range/72565/9?u=mlauber71 # Import the necessary modulesimport knime.scripting.io as knioimport pandas as pd# Fetch the input table from the KNIME node and convert it into a pandas DataFramedf = knio.input_tables[0].to_pandas()# Convert the 'Start date' and 'End date' columns from strings to datetime formatdf['Start date'] = pd.to_datetime(df['Start date'])df['End date'] = pd.to_datetime(df['End date'])# Create an empty list to store rows with filled date gapsfilled_rows = []# Group the DataFrame by the 'ID' columnfor ID, group in df.groupby('ID'): # Sort each group by 'Start date' group = group.sort_values(by='Start date') # Iterate through each row of the group to identify gaps for i in range(len(group) - 1): # Get the 'End date' of the current row end_date_1 = group.iloc[i]['End date'] # Get the 'Start date' of the next row start_date_2 = group.iloc[i+1]['Start date'] # Check if there's a gap between the end date of the current row and the start date of the next row if end_date_1 + pd.Timedelta(days=1) != start_date_2: # If there's a gap, create a new row that fills the gap and add it to the filled_rows list filled_rows.append({'ID': ID, 'Start date': end_date_1 + pd.Timedelta(days=1), 'End date': start_date_2 -pd.Timedelta(days=1)})# Convert the filled_rows list into a pandas DataFramefilled_df = pd.DataFrame(filled_rows)# Combine the original DataFrame with the filled rows DataFrame, reset the index, and sort by 'ID' and 'Start date'result_df = pd.concat([df, filled_df], ignore_index=True).sort_values(by=['ID', 'Start date'])# Convert the filled_df DataFrame into a KNIME table and set it as the first output of the nodeknio.output_tables[0] = knio.Table.from_pandas(filled_df)# Convert the result_df DataFrame into a KNIME table and set it as the second output of the nodeknio.output_tables[1] = knio.Table.from_pandas(result_df) The general approach I'm going to take is:For each ID, sort the date pairs.Check if there are gaps between the end date of one pair and the start date of the next pair.If there's a gap, insert a new row with the end date of the previous pair plus one day as the start date and the start date ofthe next pair minus one day as the end date. How to Set Up Your Python Extensionshttps://www.knime.com/blog/how-to-setup-the-python-extensionKNIME, ChatGPT and Python (and some waffles)https://medium.com/p/c05709dd3bf5 Test file.xlsxresultsNode 6 Excel Reader Python Script Sorter Sorter Fill gaps in pairs of dates within a yearhttps://forum.knime.com/t/insert-missing-dates-range/72565/9?u=mlauber71 # Import the necessary modulesimport knime.scripting.io as knioimport pandas as pd# Fetch the input table from the KNIME node and convert it into a pandas DataFramedf = knio.input_tables[0].to_pandas()# Convert the 'Start date' and 'End date' columns from strings to datetime formatdf['Start date'] = pd.to_datetime(df['Start date'])df['End date'] = pd.to_datetime(df['End date'])# Create an empty list to store rows with filled date gapsfilled_rows = []# Group the DataFrame by the 'ID' columnfor ID, group in df.groupby('ID'): # Sort each group by 'Start date' group = group.sort_values(by='Start date') # Iterate through each row of the group to identify gaps for i in range(len(group) - 1): # Get the 'End date' of the current row end_date_1 = group.iloc[i]['End date'] # Get the 'Start date' of the next row start_date_2 = group.iloc[i+1]['Start date'] # Check if there's a gap between the end date of the current row and the start date of the next row if end_date_1 + pd.Timedelta(days=1) != start_date_2: # If there's a gap, create a new row that fills the gap and add it to the filled_rows list filled_rows.append({'ID': ID, 'Start date': end_date_1 + pd.Timedelta(days=1), 'End date': start_date_2 -pd.Timedelta(days=1)})# Convert the filled_rows list into a pandas DataFramefilled_df = pd.DataFrame(filled_rows)# Combine the original DataFrame with the filled rows DataFrame, reset the index, and sort by 'ID' and 'Start date'result_df = pd.concat([df, filled_df], ignore_index=True).sort_values(by=['ID', 'Start date'])# Convert the filled_df DataFrame into a KNIME table and set it as the first output of the nodeknio.output_tables[0] = knio.Table.from_pandas(filled_df)# Convert the result_df DataFrame into a KNIME table and set it as the second output of the nodeknio.output_tables[1] = knio.Table.from_pandas(result_df) The general approach I'm going to take is:For each ID, sort the date pairs.Check if there are gaps between the end date of one pair and the start date of the next pair.If there's a gap, insert a new row with the end date of the previous pair plus one day as the start date and the start date ofthe next pair minus one day as the end date. How to Set Up Your Python Extensionshttps://www.knime.com/blog/how-to-setup-the-python-extensionKNIME, ChatGPT and Python (and some waffles)https://medium.com/p/c05709dd3bf5 Test file.xlsxresultsNode 6 Excel Reader Python Script Sorter Sorter

Nodes

Extensions

Links