15. Fraud_Detection-Embeddings

Select emails
Right-click -> 'Interactive View...', select some rows, then click 'Apply'.

Outlier Detection: Embedding Based
Outlier detection serves as a proxy for potential fraud detection. We try to detect numerical outliers in the high-dimensional embedding space. In the top right of this section we rank the emails numerically (in the multidimensional embedding space) by how much they diverge from the rest of the emails; this can reveal potentially suspicious activity. The lower two blocks visualise this information in either two or three dimensions. Note that we apply PCA dimension reduction, which compresses the information from about 1500 dimensions down to 2 or 3. Not all of the information is retained in the final plot, but it still allows us to detect outliers to a reasonable degree.

Discover numerical outliers in larger semantic space
The final table is sorted from the furthest outlier to the least.

PCA Dimension Reduction and Plotting
Project all embeddings onto a common space so as to remove the influence of components we are not interested in. This method is computationally efficient.
2D: right-click on 'Scatter Plot' -> 'Interactive View...'
3D: right-click on '3D Scatter' -> 'Interactive View...'

Note
This workflow is already partially executed, as the API service may not be available. You can try to run it to the final output if you like. If there are memory problems running the PCA nodes, this is usually due to running out of RAM, so you can do one of the following two things:
- Search for 'how to increase KNIME memory/RAM'.
- Easier: place a 'Row Sampling' node after this node and choose, for example, 100 rows (or however many you want to experiment with).

Node annotations in the workflow:
- Fetch embeddings for all emails using a custom model
- Fetch embeddings for all emails using (optionally) OpenAI's API
- User can select which emails to analyse
- Take only user-selected rows
- Fitting / 'Preprocess' for PCA
- Reduce to 100-dim first (not much information loss)
- Reduce to 3 dimensions
- Reduce to 2 dimensions
- Assign colors to classes

Node types in the workflow: Constant Value Column, Column Filter, String Manipulation, Missing Value, PCA Compute, PCA Apply, Color Manager, 3D Scatter Plot (Plotly), Scatter Plot (JavaScript), Mahalanobis Distance, Distance Matrix Calculate, OpenAI Embeddings, Private Model, Sorter, Table View (JavaScript), Rule-based Row Filter, CSV Reader
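For readers who want to see the idea outside KNIME, the ranking and projection steps can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation of the KNIME nodes: the function names (`pca_reduce`, `mahalanobis_scores`, `rank_outliers`) are hypothetical, the embeddings are synthetic random data standing in for real email embeddings, and it assumes the outlier score is a Mahalanobis distance computed on PCA-reduced vectors. The workflow reduces to 100 dimensions first; the sketch uses 10 only because the synthetic data has 50 dimensions.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    centered = X - X.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def mahalanobis_scores(Z):
    """Mahalanobis distance of each row of Z from the mean of Z."""
    diff = Z - Z.mean(axis=0)
    cov = np.atleast_2d(np.cov(Z, rowvar=False))
    # The pseudo-inverse keeps this usable if the covariance is near-singular.
    inv_cov = np.linalg.pinv(cov)
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))


def rank_outliers(X, n_components):
    """Return row indices sorted from furthest outlier to least, plus scores."""
    scores = mahalanobis_scores(pca_reduce(X, n_components))
    order = np.argsort(scores)[::-1]
    return order, scores


# Synthetic stand-in for email embeddings (an assumption, not real data):
# 200 "ordinary" rows plus one row shifted far from the rest (index 200).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 50))
embeddings = np.vstack([embeddings, np.full((1, 50), 8.0)])

order, scores = rank_outliers(embeddings, n_components=10)
coords_2d = pca_reduce(embeddings, 2)  # coordinates for a 2D scatter plot
coords_3d = pca_reduce(embeddings, 3)  # coordinates for a 3D scatter plot
print(order[:5])  # indices of the five most divergent rows
```

Reducing the dimensionality before computing the covariance keeps the matrix small and better conditioned, which is one plausible reason for an intermediate reduction step. If memory is tight, the same effect as the 'Row Sampling' suggestion can be had by passing only a subset of rows, for example `embeddings[:100]`.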
