JKISeason2-24_tark

JKISeason2-24

Challenge 24: Fraudulent Email Address Detection
Level: Medium

Description: In this challenge you will take the role of a cybersecurity analyst and try to identify emails that pass themselves off as legitimate when they are in fact malicious. You notice that bad-actor emails try to trick the receiver by mimicking major email domains. For instance, @gnail, @gmial, etc. are trying to pass as @gmail. You then decide to count the occurrences of every domain: the domains with the lowest counts have a higher probability of being fraudulent. You must also check whether those low-count domains are trying to pose as one of the major email domains. Your answer should not mark @unique.com as fraudulent. Note: Try not to hard-code any values in your workflow; use a statistic such as the mean or median instead. Hint: Checking for string similarity might help.
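The approach described above (count the domains, treat low-count domains as suspects, then check whether they resemble a major domain) can be sketched outside KNIME in a few lines of Python. This is only an illustration under stated assumptions, not the workflow itself: the function name flag_fraudulent, the sample addresses, and the similarity_cutoff of 0.8 are hypothetical, and Python's difflib.SequenceMatcher stands in for whatever string-distance measure the workflow uses. The count threshold is the median of the domain counts, so it is not hard-coded.

```python
from collections import Counter
from difflib import SequenceMatcher
from statistics import median


def flag_fraudulent(emails, similarity_cutoff=0.8):
    """Return low-count domains that closely resemble a high-count domain.

    similarity_cutoff is an illustrative assumption, not a value from the challenge.
    """
    # Keep only the part after the last "@", skipping malformed addresses.
    domains = [e.rsplit("@", 1)[1].lower() for e in emails if "@" in e]
    counts = Counter(domains)
    if not counts:
        return set()

    # Data-driven threshold: domains seen more often than the median are "major".
    threshold = median(counts.values())
    major = [d for d, n in counts.items() if n > threshold]
    rare = [d for d, n in counts.items() if n <= threshold]

    flagged = set()
    for candidate in rare:
        # A rare domain is suspicious only if it looks like a major one.
        if any(
            SequenceMatcher(None, candidate, reference).ratio() >= similarity_cutoff
            for reference in major
        ):
            flagged.add(candidate)
    return flagged


# Hypothetical sample data, made up for this example.
sample = (
    ["user%d@gmail.com" % i for i in range(5)]
    + ["user%d@yahoo.com" % i for i in range(4)]
    + ["a@gnail.com", "b@gmial.com", "c@unique.com"]
)
flagged = flag_fraudulent(sample)
print(sorted(flagged))  # ['gmial.com', 'gnail.com']
```

In this sample, unique.com is rare but resembles neither gmail.com nor yahoo.com, so it is not flagged, which matches the requirement in the description.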

Author: Victor Palacios

Dataset: Domains Data in the KNIME Hub

Workflow overview (annotations on the canvas): Calculate distances to the reference domains. Labelling and calculate distances to the non-fraudulent reference domains.

Node labels: Calculate distances to the reference; Count the number of origins; Unique domains (reference); Sort ascending by distance; Calculate the maximum number of the origin within the group; Labelling; Set the following filter threshold; the first row; origin > domain (to use it recursively); Recalculate distances to the non-fraudulent reference.

Nodes used: CSV Reader, Cell Splitter, String Matcher, Column Rename, GroupBy, Numeric Row Splitter, Color Manager, Table View, Sorter, Table Manipulator, Math Formula, Rule Engine, Metanode, Joiner, Recursive Loop Start, Row Filter, Recursive Loop End, Reference Row Filter, Group Loop Start, Loop End.
