C24_Fraudulent Email Address Detection

In this challenge you will take the role of a cybersecurity analyst and see if you can identify emails that are trying to pass as legitimate when they are in fact malicious. You notice that bad-actor emails try to trick the receiver by mimicking major email domains. For instance, @gnail, @gmial, etc. are trying to pass as @gmail. You then decide to count how often each domain appears: the domains with the lowest counts have a higher probability of being fraudulent. You must also check whether those low-count domains are trying to pose as one of the major email domains. Your answer should not mark @unique.com as fraudulent.

Note: Try not to hard-code any variables in your workflow; derive thresholds from the data instead, for instance with the mean or median.

Hint: Checking for string similarity might help.

Workflow annotations:
1. Read & tokenise data.
2. Group same domains & calculate frequency of appearance.
3. Get the average frequency of all rows.
4. Split between most common and infrequent domains.
5. Calculate the distance between infrequent domains (least frequent) & reference domains (most frequent).
6. Use the median of the similarity distance as a gauge. If a domain is below the gauge, then it's fake. If it is above the gauge, then it's true (because it's an infrequent but possibly true domain). If it is 0, then it's the reference domain, hence true.
7. Join email accounts + predictions.

Node labels: Read Data, Tokenise on @, Rename col to Front & Domain, Get Frequency.

Nodes used: CSV Reader, Cell Splitter, Column Rename, GroupBy, Similarity Search, Numeric Row Splitter, Math Formula, Table Row to Variable, Joiner, Column Rename, Joiner, Rule Engine, Math Formula.
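For readers who want to trace the logic outside KNIME, the steps in the workflow annotations can be sketched in plain Python. This is only an illustration under stated assumptions, not a description of the workflow's actual configuration. Levenshtein edit distance stands in for whatever distance the Similarity Search node uses. A domain whose distance equals the gauge is treated as legitimate, because the annotations only cover "below" and "above". The function names and the sample addresses are hypothetical.

```python
from collections import Counter
from statistics import mean, median


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def flag_fraudulent(emails):
    """Return (email, is_fake) pairs. Assumes every address contains '@'."""
    # Tokenise on "@" and keep the domain part.
    domains = [e.rsplit("@", 1)[1].lower() for e in emails]

    # Frequency of each domain, and the average frequency as the split point.
    freq = Counter(domains)
    avg = mean(freq.values())
    reference = [d for d, n in freq.items() if n >= avg]   # most common
    infrequent = [d for d, n in freq.items() if n < avg]   # least common

    # Distance from each infrequent domain to its nearest reference domain.
    nearest = {d: min(levenshtein(d, r) for r in reference) for d in infrequent}

    fake = set()
    if nearest:
        # Median distance is the gauge: closer than the gauge means "fake".
        gauge = median(nearest.values())
        fake = {d for d, dist in nearest.items() if dist < gauge}

    # Join the predictions back onto the original addresses.
    return [(e, d in fake) for e, d in zip(emails, domains)]


# Hypothetical sample data, for illustration only.
sample = (
    [f"user{i}@gmail.com" for i in range(4)]
    + [f"user{i}@yahoo.com" for i in range(3)]
    + ["mallory@gnail.com", "eve@gmial.com", "bob@unique.com", "carol@example.org"]
)

for email, is_fake in flag_fraudulent(sample):
    if is_fake:
        print("suspicious:", email)
```

No threshold is hard-coded here: the frequency split comes from the mean and the distance gauge from the median. One consequence of a strict "below the median" rule is that at most half of the infrequent domains can ever be flagged, so the result depends on how many legitimate rare domains are present in the data.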
