Icon

KNIME_​date_​grouping_​across_​rows

It appeared that the KEY was a grouping so that A-G were in one KEY group and H-M were in another.

Therefore my understanding is that the date matching was to only occur within those groups.

The process in my head was more straightforward than my resultant workflow:
1. Group each of the rows by KEY (e.g. XXX_YYY_ZZZ and DDD_XXX_RRR) and for each of those find the current minimum date.
2. Mark the row that was the minimum date in each case as “START” to show it was the START of a 65 day period
3. Within each group, then join all the other rows to this START row and compare dates. Any rows that were within 65 days of the START should be marked as “DESELECT”.
4. Collect all rows that are either START or DESELECT and remove them from the input data stream.
5. The new input stream for the next iteration consists of all remaining rows.
6. Return to step 1 and repeat process until there are no rows remaining that are not either START or DESELECT
7. On completion of the loop, all START rows are marked as”1” and all DESELECT rows are marked as “0”

Dry running that process:
Iterations
1 START: A, H DESELECT: B, I
2 START: C, J DESELECT: D, K
3 START: E, L DESELECT: F,G
4 START: M DESELECT: N

That felt relatively straightforward. Then I attempted a workflow….

This workflow demonstrates that "justbecause you can... it doesn't mean youshould" ;-) It appeared that the KEY was a grouping so that A-G were in one KEY group and H-M were in another. Therefore my understanding is that the date matching was to only occur within those groups. The process in my head was more straightforward than my resultant workflow:1. Group each of the rows by KEY (e.g. XXX_YYY_ZZZ and DDD_XXX_RRR) and for each of those find the current minimum date. 2. Mark the row that was the minimum date in each case as “START” to show it was the START of a 65 day period3. Within each group, then join all the other rows to this START row and compare dates. Any rows that were within 65 days of the STARTshould be marked as “DESELECT”.4. Collect all rows that are either START or DESELECT and remove them from the input data stream.5. The new input stream for the next iteration consists of all remaining rows. 6. Return to step 1 and repeat process until there are no rows remaining that are not either START or DESELECT7. On completion of the loop, all START rows are marked as”1” and all DESELECT rows are marked as “0”Dry running that process:Iterations1 START: A, H DESELECT: B, I2 START: C, J DESELECT: D, K3 START: E, L DESELECT: F,G4 START: M DESELECT: NThat felt relatively straightforward. Then I attempted a workflow…. 19 April 2021 @takbb Brian Bates Sample DataFetch real dates for string CREATEDATESemi Cross Join the current START for each key group with all other rows forsame keyMark "DESELECT"on all rows wheredays difference fromcurrent START isless than 65 daysCalc Days DifferenceGet EarliestCREATEDATE foreach current group of KEYLoop until LoopEndVariable="true"(i.e. keep looping while we have rowsto process)The lower input flow is those rows that are neither START nor DESELECTED and these will go back round the loop for the next attemptJoin row with earliest create date for each keySend the remaining "X" rows round againMark all DE_SELECTas "X"Mark "START" againstearliest as this is a Start of a 65 day blockTidy superflous columnsSet LoopEndVariable to falseIteratePick up the current START rowsand current DESELECTED rows and send them out of the loopto be collected on completionDetermine if we haverun out of rowsEnd for Empty Table SwitchTidy superflous columnsSort back into original row orderGenerate a numeric RowNumber to allowus to sort back to original order laterSet LoopEndVariableto "false" Set LoopEndVariable to "true"Create "count" columnwhere 1=Start of 65 days0=Within 65 days of startTidy Superflous columnsDummy rowEmpty table so graba dummy rowto satisfy Knime! Table Creator String to Date&Time Joiner Rule Engine Date&TimeDifference GroupBy Recursive Loop End Joiner (Labs) Row Splitter ConstantValue Column String Manipulation Column Filter Java Edit Variable RecursiveLoop Start Concatenate Empty Table Switch End IF Column Filter Sorter Create NumericRowNumber Java Snippet Java Snippet Rule Engine Column Filter Table Creator Concatenate This workflow demonstrates that "justbecause you can... it doesn't mean youshould" ;-) It appeared that the KEY was a grouping so that A-G were in one KEY group and H-M were in another. Therefore my understanding is that the date matching was to only occur within those groups. The process in my head was more straightforward than my resultant workflow:1. Group each of the rows by KEY (e.g. XXX_YYY_ZZZ and DDD_XXX_RRR) and for each of those find the current minimum date. 2. Mark the row that was the minimum date in each case as “START” to show it was the START of a 65 day period3. Within each group, then join all the other rows to this START row and compare dates. Any rows that were within 65 days of the STARTshould be marked as “DESELECT”.4. Collect all rows that are either START or DESELECT and remove them from the input data stream.5. The new input stream for the next iteration consists of all remaining rows. 6. Return to step 1 and repeat process until there are no rows remaining that are not either START or DESELECT7. On completion of the loop, all START rows are marked as”1” and all DESELECT rows are marked as “0”Dry running that process:Iterations1 START: A, H DESELECT: B, I2 START: C, J DESELECT: D, K3 START: E, L DESELECT: F,G4 START: M DESELECT: NThat felt relatively straightforward. Then I attempted a workflow…. 19 April 2021 @takbb Brian Bates Sample DataFetch real dates for string CREATEDATESemi Cross Join the current START for each key group with all other rows forsame keyMark "DESELECT"on all rows wheredays difference fromcurrent START isless than 65 daysCalc Days DifferenceGet EarliestCREATEDATE foreach current group of KEYLoop until LoopEndVariable="true"(i.e. keep looping while we have rowsto process)The lower input flow is those rows that are neither START nor DESELECTED and these will go back round the loop for the next attemptJoin row with earliest create date for each keySend the remaining "X" rows round againMark all DE_SELECTas "X"Mark "START" againstearliest as this is a Start of a 65 day blockTidy superflous columnsSet LoopEndVariable to falseIteratePick up the current START rowsand current DESELECTED rows and send them out of the loopto be collected on completionDetermine if we haverun out of rowsEnd for Empty Table SwitchTidy superflous columnsSort back into original row orderGenerate a numeric RowNumber to allowus to sort back to original order laterSet LoopEndVariableto "false" Set LoopEndVariable to "true"Create "count" columnwhere 1=Start of 65 days0=Within 65 days of startTidy Superflous columnsDummy rowEmpty table so graba dummy rowto satisfy Knime!Table Creator String to Date&Time Joiner Rule Engine Date&TimeDifference GroupBy Recursive Loop End Joiner (Labs) Row Splitter ConstantValue Column String Manipulation Column Filter Java Edit Variable RecursiveLoop Start Concatenate Empty Table Switch End IF Column Filter Sorter Create NumericRowNumber Java Snippet Java Snippet Rule Engine Column Filter Table Creator Concatenate

Nodes

Extensions

Links