
JustKNIMEit_Challenge9_Anonymization_JeromeTreboux
Just KNIME it! Challenge 9 - Anonymization - Jérôme Treboux

I first remove column0, as it is a counter used as an ID.
!! I reset the column BodyType for some rows, as it may contain the name of the player !!
I then anonymize the names and photos using the Anonymization node.
I shuffle the columns and reset the RowID.
Finally, I rename the columns.

Description of the Challenge:
You would like to post a question on the KNIME forum, but you have confidential data that you cannot share. In this challenge you will create a workflow which removes (or transforms) any columns that reveal anything confidential in your data (such as location, name, gender, etc.). After that, you should shuffle the remaining columns' rows such that each numeric column maintains its original statistical distribution but does not have a relationship with any other column. Rename these columns as well, such that at the end of your workflow they do not have any specific meaning. Let's see an example:

Before anonymization

Row   Name     Fav_Num   Muscle_Mass
0     Victor   7         10
1     Aline    3         20
2     Scott    42        30

After anonymization

Row   column   column (#1)
0     3        30
1     42       10
2     7        20

Workflow (node: annotation)

CSV Reader: Node 1
Column Filter: Remove column 0
Python Script: Remove Player name in column BodyType
Anonymization: Name and photo anonymization
Anonymity Assessment: Check the anonymity
Column List Loop Start: For each column
Shuffle: Shuffle
Loop End (Column Append): Append all columns
RowID: Reset RowID
Column Rename (Regex): Anonymization of the columns with duplicates using (#1, ...)
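
The shuffle-and-rename idea from the challenge description can be sketched in plain Python. This is only an illustration under stated assumptions, not the KNIME workflow itself: the helper names (drop_columns, shuffle_columns, rename_columns) are hypothetical, the table is modelled as a dict of column lists, and the data is the small example from the challenge text.

```python
import random


def drop_columns(table, confidential):
    """Remove columns that reveal confidential information (e.g. names)."""
    return {name: values for name, values in table.items()
            if name not in confidential}


def shuffle_columns(table, seed=None):
    """Shuffle every column independently of the others.

    Each column keeps exactly the same values (so its distribution is
    unchanged), but the row-wise link between columns is broken.
    """
    rng = random.Random(seed)  # seed is only here to make runs repeatable
    shuffled = {}
    for name, values in table.items():
        column = list(values)  # copy so the input table is not modified
        rng.shuffle(column)
        shuffled[name] = column
    return shuffled


def rename_columns(table):
    """Give columns meaningless names: column, column (#1), column (#2), ..."""
    renamed = {}
    for index, values in enumerate(table.values()):
        new_name = "column" if index == 0 else f"column (#{index})"
        renamed[new_name] = values
    return renamed


# Example table from the challenge description.
table = {
    "Name": ["Victor", "Aline", "Scott"],
    "Fav_Num": [7, 3, 42],
    "Muscle_Mass": [10, 20, 30],
}

without_names = drop_columns(table, confidential={"Name"})
anonymized = rename_columns(shuffle_columns(without_names, seed=0))

for name, values in anonymized.items():
    print(name, values)
```

Shuffling each column with its own pass is what plays the role of the Column List Loop Start, Shuffle, and Loop End (Column Append) nodes in the workflow: a single shuffle of whole rows would keep the relationship between columns intact.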
