
Just KNIME It S03 _ CH13 _ Detecting Fraudulent Contracts

You work in the contracts department of a software company and are asked to detect fraudulent (or wrong) contracts based on their contract value. Given the PDF versions of the contracts, you need to extract their contract value (and, optionally, any other fields you find useful) and detect outliers among them. You can either use simpler outlier detection techniques, such as those based on statistics or visualization, or more advanced ones based on machine learning.
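As a rough illustration of the extraction step, the sketch below pulls a numeric contract value out of text that has already been parsed from a PDF. The text layout ("Contract value: $12,500.00"), the pattern, and the function name `extract_contract_value` are assumptions made for this example; the actual contracts may be laid out differently, so the pattern would need adjusting.

```python
import re
from typing import Optional

# Hypothetical pattern: assumes the parsed text contains something like
# "Contract value: $12,500.00". Adjust to the real contract layout.
VALUE_PATTERN = re.compile(
    r"contract\s+value\s*[:\-]?\s*[$€£]?\s*(\d[\d,]*(?:\.\d+)?)",
    re.IGNORECASE,
)


def extract_contract_value(text: str) -> Optional[float]:
    """Return the first contract value found in the text, or None."""
    match = VALUE_PATTERN.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting to a number.
    return float(match.group(1).replace(",", ""))


if __name__ == "__main__":
    sample = "Product: CRM Suite\nContract Value: $12,500.00\n"
    print(extract_contract_value(sample))
```

Returning `None` for text without a match keeps unparseable contracts visible, so they can be reviewed separately rather than silently dropped.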

Author: Lada Rudnitckaia

Just KNIME It - Season 3 - Challenge 13: Detecting Fraudulent Contracts
https://hub.knime.com/-/spaces/-/~7Gg-3sOHMyeHGru9/current-state/

Workflow annotations:

(1) Tukey's range test (JKI S03 CH 13 Detecting Outliers 'Tukey's range test' RESULTS)
(2) DBSCAN (Detecting Outliers 'DBSCAN' RESULTS)
(3) Numeric Outliers (Detecting Outliers 'DBSCAN' RESULTS)

Nodes and their labels, in workflow order:

Read PDFs from CWDA: get PDFs content (Tika Parser) from the current workflow data area [CWDA], ../data/PDF_files/contracts
Column Expressions: regex extract contract details
Column Filter: exclude $Content$ column
Python Script: outlier detection 'Tukey's range test' (grouped by 'Product')
Numeric Distances: distance definition, Euclidean
DBSCAN: density-based clustering
Group Loop Start: start grouping by 'Product'
Loop End: collect results
Python Script: outlier detection 'DBSCAN' (grouped by 'Product')
Group Loop Start: start grouping by 'Product'
Loop End: collect results
Numeric Outliers: Node 1374
Joiner: Node 1377
Rule Engine: Cluster column
Python Script: outlier detection 'Numeric Outliers' (grouped by 'Product')

Output ports of each outlier detection Python Script node:

upper == tukey's flagged [data frame]
middle == outliers [data frame]
lower == chart mosaic [image] (counts bar chart, box-plot)
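The first branch applies Tukey's range test per 'Product' group. The contents of the workflow's Python Script node are not shown here, so the following is only a minimal sketch of the technique using the standard library. The function name `tukey_outliers`, the `(product, value)` record shape, the conventional fence multiplier `k=1.5`, the "inclusive" quartile method, and the sample data are all assumptions for illustration.

```python
from collections import defaultdict
from statistics import quantiles


def tukey_outliers(records, k=1.5):
    """Flag (product, value) pairs outside Tukey's fences of their product group.

    Fences are Q1 - k*IQR and Q3 + k*IQR, computed separately per product.
    """
    groups = defaultdict(list)
    for product, value in records:
        groups[product].append(value)

    flagged = []
    for product, values in groups.items():
        if len(values) < 4:
            # Too few values for meaningful quartiles; skip (an assumption).
            continue
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        flagged.extend((product, v) for v in values if v < lower or v > upper)
    return flagged


if __name__ == "__main__":
    # Made-up contract values, for illustration only.
    contracts = [
        ("A", 100), ("A", 110), ("A", 105), ("A", 95), ("A", 102), ("A", 1000),
        ("B", 50), ("B", 52), ("B", 49), ("B", 51),
    ]
    print(tukey_outliers(contracts))
```

Grouping before computing the fences matters because contract values for different products can sit on very different scales; a value that is ordinary for one product may be extreme for another.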
