Journal Entry Anomaly Detection

Two ML Methods For Outlier Detection

Suggested method to use: Isolation Forest

It can be useful to test and evaluate more than one!

Isolation Forest To Find Outliers

Ideally, train on a 'reference' set and evaluate on a separate test set.

In the output, the records with the shortest average path length (the fewest random splits needed to isolate them) are the outliers. Anomalies are few and differ from the rest of the data, so random splits separate them quickly; normal records sit in dense regions and need many more splits before they end up alone in a leaf.
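The path-length idea can be seen in a minimal from-scratch sketch. This is not H2O's implementation: each tree is grown on a random subsample of the reference set by splitting on a random feature at a random value between that feature's min and max, and a test point's score comes from its mean path length (the quantity the workflow's predictor annotation calls "Mean Length"), normalised by the constant c(n) from the original Isolation Forest paper. The two features, the synthetic data, 100 trees and a subsample size of 64 are arbitrary assumptions for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c(n):
    """Average path length of an unsuccessful BST search over n points;
    normalises path lengths so scores are comparable across sample sizes."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Split on a random feature at a random value until points are isolated."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    dim = rng.randrange(len(rows[0]))
    lo, hi = min(r[dim] for r in rows), max(r[dim] for r in rows)
    if lo == hi:
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([r for r in rows if r[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([r for r in rows if r[dim] >= split], depth + 1, max_depth, rng),
    }

def path_length(x, node, depth=0):
    """Edges from the root to x's leaf, plus c(size) for points left unseparated."""
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)

def fit_forest(reference, n_trees=100, sample_size=64, seed=0):
    """Grow each tree on a random subsample of the reference (training) set."""
    rng = random.Random(seed)
    size = min(sample_size, len(reference))
    max_depth = max(1, math.ceil(math.log2(size)))
    trees = [build_tree(rng.sample(reference, size), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, size

def score(x, trees, size):
    """Return (anomaly score, mean path length); shorter paths give scores nearer 1."""
    mean_len = sum(path_length(x, t) for t in trees) / len(trees)
    return 2 ** (-mean_len / c(size)), mean_len

# Hypothetical reference set: (amount, line count) pairs from 'normal' entries
data_rng = random.Random(42)
reference = [(data_rng.gauss(100, 10), data_rng.gauss(5, 1)) for _ in range(500)]
trees, size = fit_forest(reference)

# Score unseen test points: one typical, one far outside the reference data
normal_score, normal_len = score((100.0, 5.0), trees, size)
outlier_score, outlier_len = score((400.0, 20.0), trees, size)
print(f"normal: mean length {normal_len:.2f}, score {normal_score:.2f}")
print(f"outlier: mean length {outlier_len:.2f}, score {outlier_score:.2f}")
```

The outlier lands in a leaf after only a few splits in most trees, so its mean length is shorter and its score closer to 1 than the typical point's.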

Isolation Forest also works with categorical as well as numeric data (unlike the DBSCAN method, which relies on Euclidean distances between numeric columns).
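Not every Isolation Forest implementation accepts categorical columns directly; scikit-learn's `IsolationForest`, for example, expects numeric input. A common workaround is one-hot encoding, sketched below in plain Python. The `account_type` and `amount` columns are hypothetical, and this is an encoding step for numeric-only implementations, not a description of how H2O handles categories internally.

```python
def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other column unchanged
        new_row = {k: v for k, v in row.items() if k != column}
        # Add one indicator column per category seen in the data
        for cat in categories:
            new_row[f"{column}={cat}"] = 1.0 if row[column] == cat else 0.0
        encoded.append(new_row)
    return encoded, categories

# Hypothetical journal-entry lines with one categorical feature
entries = [
    {"amount": 120.0, "account_type": "expense"},
    {"amount": 5000.0, "account_type": "revenue"},
    {"amount": 75.5, "account_type": "expense"},
]
encoded, cats = one_hot(entries, "account_type")
print(encoded[1])
```

Keep the returned category list from the reference set so the test set is encoded with the same columns.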

Data Cleaning and Preparation
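The preparation steps in this workflow (Missing Value sets null debits and credits to zero, String Manipulation combines the entry date and time strings, Column Filter drops the originals) can be sketched per row in plain Python. The `debit`, `credit`, `entry_date` and `entry_time` column names and the ISO date/time format are assumptions; only `JE_ID` appears in the workflow.

```python
from datetime import datetime

def prepare_entry(row):
    """Clean one journal-entry line (column names and formats are hypothetical)."""
    cleaned = dict(row)
    # Null or empty debit/credit become 0, as in the Missing Value step
    for col in ("debit", "credit"):
        value = cleaned.get(col)
        cleaned[col] = 0.0 if value in (None, "") else float(value)
    # Combine the separate date and time strings into one timestamp
    cleaned["entry_datetime"] = datetime.strptime(
        f"{row['entry_date']} {row['entry_time']}", "%Y-%m-%d %H:%M:%S"
    )
    # Drop the original columns to avoid confusion, as the Column Filter does
    del cleaned["entry_date"], cleaned["entry_time"]
    return cleaned

raw = {"JE_ID": "JE-001", "debit": "250.00", "credit": None,
       "entry_date": "2023-03-31", "entry_time": "23:58:10"}
print(prepare_entry(raw))
```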

Check if each JE is balanced
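A journal entry is balanced when its debits and credits net to zero across all of its lines. The sketch below mirrors the Math Formula and Row Splitter steps: it treats missing amounts as zero, totals debit minus credit per `JE_ID`, and keeps only the entries that do not net to zero. The `debit`/`credit` field names are assumptions; `Decimal` is used so floating-point rounding cannot create spurious imbalances.

```python
from collections import defaultdict
from decimal import Decimal

def unbalanced_entries(lines):
    """Return {JE_ID: debit_total - credit_total} for entries that do not balance."""
    totals = defaultdict(lambda: Decimal("0"))
    for line in lines:
        # Missing amounts count as zero
        debit = Decimal(line.get("debit") or "0")
        credit = Decimal(line.get("credit") or "0")
        totals[line["JE_ID"]] += debit - credit
    return {je: diff for je, diff in totals.items() if diff != 0}

# Hypothetical lines: JE-1 balances, JE-2 is off by one cent
lines = [
    {"JE_ID": "JE-1", "debit": "100.00", "credit": None},
    {"JE_ID": "JE-1", "debit": None, "credit": "100.00"},
    {"JE_ID": "JE-2", "debit": "250.00", "credit": None},
    {"JE_ID": "JE-2", "debit": None, "credit": "249.99"},
]
print(unbalanced_entries(lines))  # {'JE-2': Decimal('0.01')}
```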

Find Outliers in DBSCAN Clustering

DBSCAN labels outliers as 'noise'; every other point is assigned to a cluster.
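To make the noise rule concrete, here is a small from-scratch DBSCAN in plain Python (not KNIME's node). A point with at least `min_pts` neighbours within `eps` is a core point and seeds a cluster; points reachable from a core point join that cluster; everything else is labelled -1, the same noise convention scikit-learn uses. Columns are z-score normalised first, as the workflow's Normalizer does, because Euclidean distance is scale sensitive. The data, `eps=0.5` and `min_pts=3` are arbitrary assumptions.

```python
import math
from statistics import mean, pstdev

NOISE = -1

def zscore(points):
    """Z-score normalise each column so no feature dominates the distance."""
    cols = list(zip(*points))
    stats = [(mean(col), pstdev(col) or 1.0) for col in cols]  # constant column -> divide by 1
    return [tuple((v - m) / s for v, (m, s) in zip(p, stats)) for p in points]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or NOISE (-1) if it belongs to no cluster."""
    labels = [None] * len(points)

    def neighbours(i):
        # Euclidean distance, including the point itself
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbours)
        cluster += 1
    return labels

# Hypothetical per-JE features: (total amount, number of lines)
features = [(100, 1), (105, 1), (98, 2), (102, 1), (101, 2), (5000, 40)]
labels = dbscan(zscore(features), eps=0.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 0, -1]
```

The workflow then treats the noise points as suspected fraud.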

Check Even Amounts
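One way to sketch this check, assuming "even" means a round amount (an exact multiple of some unit such as 1,000) and that the workflow's 0.01-precision check looks for amounts with more than two decimal places. Both readings are assumptions; the single helper covers them by changing the unit. `Decimal` avoids false results from binary floating point.

```python
from decimal import Decimal

def is_multiple_of(amount, unit):
    """True if amount is an exact multiple of unit (Decimal avoids float error)."""
    return Decimal(str(amount)) % Decimal(str(unit)) == 0

# Hypothetical amounts, kept as strings to preserve their exact decimals
amounts = ["5000.00", "1234.56", "99.999", "250000"]

# Round amounts: exact multiples of 1,000
round_thousands = [a for a in amounts if is_multiple_of(a, "1000")]
# Amounts more precise than one cent
sub_cent = [a for a in amounts if not is_multiple_of(a, "0.01")]

print(round_thousands)  # ['5000.00', '250000']
print(sub_cent)         # ['99.999']
```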

Rule-Based Checks

Workflow nodes (node type: annotation):
CSV Reader: journal entry data
Missing Value: set debit and credit to zero if null
H2O Isolation Forest Predictor: apply the Isolation Forest to get the mean path length
GroupBy: get some features per JE_ID (journal entry ID)
Expression: mark outliers (LIKE noise = fraud)
String Manipulation: combine the entry date and time strings
Column Filter: remove the separate entry date and time columns to avoid confusion
Table Partitioner: take a percentage of the 'ok' data to join with the 'outlier' data as a test set
Row Splitter: this dataset already shows whether each row is an anomaly, so split on it (top: ok, bottom: anomaly)
Table to H2O: create H2O frame
Concatenate: 30% of the 'ok' data plus all of the 'anomaly' data
Scorer
Row Filter: keep rows classified as 'outlier'
Table to H2O: create H2O frame
H2O to Table: create KNIME table
Expression
Normalizer: z-score normalisation
String to Date&Time: convert post date
H2O Isolation Forest Learner
String to Date&Time: convert entry date/time
H2O Local Context
Scorer: open the view to see the score
Math Formula: check that credit and debit are balanced
Row Splitter: imbalanced JEs in the top port, OK entries in the bottom port
Numeric Distances: Euclidean distance
Expression: check whether the credit or debit columns have a precision of 0.01
DBSCAN: train DBSCAN using Euclidean distance
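The evaluation path for the Isolation Forest can be sketched in plain Python: the Row Splitter separates 'ok' and 'anomaly' rows using the label already in the dataset, the Table Partitioner draws 30% of the 'ok' rows, Concatenate appends every anomaly, and the Scorer compares the known label with the predicted 'outlier' class. The `is_anomaly` field, the seed, and the predictions in the usage example are hypothetical; how the workflow turns a score into the 'outlier' class is not reproduced here.

```python
import random

def build_test_set(rows, ok_fraction=0.3, seed=0):
    """Sample a fraction of the 'ok' rows and append every 'anomaly' row."""
    ok = [r for r in rows if not r["is_anomaly"]]             # Row Splitter, top port
    anomalies = [r for r in rows if r["is_anomaly"]]          # Row Splitter, bottom port
    rng = random.Random(seed)
    sampled_ok = rng.sample(ok, round(len(ok) * ok_fraction))  # Table Partitioner
    return sampled_ok + anomalies                              # Concatenate

def confusion_counts(actual, predicted):
    """Count true/false positives/negatives, treating 'outlier' (True) as positive."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for is_anomaly, flagged in zip(actual, predicted):
        key = ("t" if is_anomaly == flagged else "f") + ("p" if flagged else "n")
        counts[key] += 1
    return counts

# Hypothetical data: 20 entries, the last two labelled as anomalies
rows = [{"JE_ID": f"JE-{i}", "is_anomaly": i >= 18} for i in range(20)]
test = build_test_set(rows)
print(len(test))  # 5 sampled 'ok' rows (30% of 18, rounded) + 2 anomalies = 7

# Illustrative predictions for four test rows
counts = confusion_counts([True, True, False, False], [True, False, True, False])
print(counts)
```

Keeping every anomaly but only a sample of normal rows gives the Scorer enough positives to measure recall without the test set being swamped by 'ok' entries.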
