
DataCleaning_WebPortal

Data Cleaning WebPortal 2.0

This workflow implements a first attempt at guided analytics.

Data (in folder data)
- CRM-type data, including demographics, product history (insurances), and web history.
- Malaria data set.

Marketing Goal
- Up-selling yet one more insurance: a lawyer insurance.
- Predict malaria yes/no.

Pre-processing
- Assess data quality.
- Interactively remove data columns which are empty, have low variance, have almost zero skewness, and/or are highly correlated to other data columns.
- Interactively remove outliers.

Audit
- Build a report of action sequences and decision values.

Workflow steps
1. Upload & Read File. Keeps looping until an existing file with extension .csv or .table is selected.
2. Select Target Column.
3. Dimensionality Reduction. Data set quality and dimensionality reduction.
4. Final Data Set Quality.

Collecting Data for Auditing Report: original quality, final quality.

Data set quality is based on the cross-validation error ratio.

Node labels on the canvas: Generic Loop Start, Table Creator (dummy), File Upload, Variable Condition Loop End, File Correct?, User Infos (username, workflow workspace, timestamp), add filepath (file path of uploaded file), Read File (read file), Select Target (select target column), Data set quality, Dimensionality Reduction, Summary, XLS Writer (auditReport.xls).
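
The column-removal part of the pre-processing step can be illustrated outside KNIME with a short Python sketch. This is not the workflow's own implementation: the function name `reduce_columns`, the table layout (a dict of column name to list of values, with `None` for missing cells), and both thresholds are assumptions made for illustration. It covers only the empty, low-variance, and high-correlation criteria, leaves out the skewness criterion and outlier removal, and applies the thresholds automatically instead of asking the user. It also returns a list of decisions, in the spirit of the audit report.

```python
import math


def variance(values):
    """Population variance of a non-empty list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def reduce_columns(table, min_variance=1e-3, max_abs_corr=0.95):
    """Drop empty, low-variance, and highly correlated columns.

    `table` maps column name -> list of numbers (None = missing).
    Both thresholds are illustrative assumptions, not values from the workflow.
    Returns (kept_columns, audit), where audit lists (column, reason) pairs.
    """
    kept, audit = {}, []
    for name, values in table.items():
        present = [v for v in values if v is not None]
        if not present:
            audit.append((name, "empty"))
            continue
        if variance(present) < min_variance:
            audit.append((name, "low variance"))
            continue
        # Compare with every column kept so far, using rows present in both.
        correlated_with = None
        for other, other_values in kept.items():
            pairs = [(a, b) for a, b in zip(values, other_values)
                     if a is not None and b is not None]
            if len(pairs) < 2:
                continue
            r = pearson([a for a, _ in pairs], [b for _, b in pairs])
            if abs(r) > max_abs_corr:
                correlated_with = other
                break
        if correlated_with is not None:
            audit.append((name, f"correlated with {correlated_with}"))
            continue
        kept[name] = values
    return kept, audit


# Hypothetical toy table; the column names are made up for this example.
table = {
    "age": [23, 35, 47, 51, 62],
    "age_months": [276, 420, 564, 612, 744],  # 12 * age, so redundant
    "constant": [1, 1, 1, 1, 1],
    "empty": [None, None, None, None, None],
    "web_visits": [5, 1, 4, 2, 3],
}
kept, audit = reduce_columns(table)
print(list(kept))
print(audit)
```

Because columns are checked in order against those already kept, the first column of a correlated pair survives and the later one is dropped. An interactive version would let the user make that choice.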
