JKISeason3-13_tomljh

Detecting Fraudulent Contracts

Level: Easy to Medium

Description: You work in the contracts department of a software company and are asked to detect fraudulent (or wrong) contracts based on their contract value. Given the PDF versions of the contracts, you need to extract their contract value (and, optionally, any other fields you find useful) and detect outliers among them. You can either use simpler outlier detection techniques, such as those based on statistics or visualization, or more advanced ones based on machine learning.
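As a rough illustration of the simpler, statistics-based route mentioned in the description, the sketch below flags values that fall outside the interquartile-range fences. The function name `iqr_outliers`, the multiplier `k = 1.5`, and the sample contract values are assumptions made for this example; they do not come from the challenge dataset.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical contract values, invented for illustration only.
contract_values = [1200.0, 1350.0, 1280.0, 1310.0, 1250.0, 98000.0, 1290.0]
outliers = iqr_outliers(contract_values)
print(outliers)
```

A box plot draws the same fences as whiskers, so a visual check and this rule should usually point at the same contracts.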

Author: Lada Rudnitckaia

Dataset: Contract data in the KNIME Community Hub

PS: Due to the particular nature of the data, it is possible to simply retrieve the payment amount here.

Method 1: Based on statistics or visualization

Method 2: Unsupervised learning

PS on Method 2: 1. Unsupervised learning is performed on the original distribution space of the data, which makes it convenient to set the "Epsilon" parameter intuitively. 2. Due to the characteristics of the data itself, setting it to 100 or 1000 is acceptable here.

Workflow annotations: Read Data (Contracts/*.pdf); Get the main information fields; Default Parameters; Delete duplicate blank characters; Organize column names and data types; Visualization of abnormal data; Abnormal data; Get file name; Only get payment amount; Epsilon = 1000; Cluster = "Noise"

Nodes used: Tika Parser, String Splitter (Regex), String Cleaner, Table Manipulator, Box Plot, Numeric Outliers, String Splitter (Regex), String Splitter (Regex), DBSCAN, Numeric Distances, Row Filter
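To make Method 2 more concrete, here is a minimal pure-Python sketch of DBSCAN on a single numeric column, followed by a filter that keeps only the rows labelled "Noise". It is not the implementation used by the KNIME DBSCAN node. The function name `dbscan_1d`, the `min_points = 3` setting, and the sample values are assumptions for illustration. The distance is the absolute difference, which is what Euclidean distance reduces to for one column.

```python
def dbscan_1d(values, eps, min_points):
    """Label each value with a cluster id, or "Noise" if it joins no cluster."""
    n = len(values)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of values[i], including i itself.
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_points:
            labels[i] = "Noise"  # may later be claimed as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == "Noise":
                labels[j] = cluster_id  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_points:  # core point: keep expanding
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels


# Hypothetical payment amounts, invented for illustration only.
amounts = [1200.0, 1350.0, 1280.0, 1310.0, 1250.0, 98000.0, 1290.0]

# Epsilon is expressed in the same units as the raw amounts.
labels = dbscan_1d(amounts, eps=1000, min_points=3)

# Equivalent of filtering rows where Cluster = "Noise".
noise = [a for a, label in zip(amounts, labels) if label == "Noise"]
print(noise)
```

Because the clustering runs on the raw amounts, Epsilon can be read directly as "how far apart two payment amounts may be and still count as neighbours". On this invented sample, an Epsilon of 100 isolates the same value as an Epsilon of 1000.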
