01_Guided_Labeling_for_Document_Classification

Guided Labeling for Document Classification
This workflow defines a fully automated, web-based application that labels your data using active learning. It was designed so that business analysts can easily go through documents and assign them to any number of classes. In each iteration, the user labels more documents and the model is retrained on the instances labeled so far. With every new iteration, the model proposes the most uncertain documents using the entropy scorer node. Once the user is happy with the performance achieved with the available labels, they can exit the loop and export the model to label the remaining instances.

Data Source and License
IMDB Review Dataset: kaggle.com/utathya/imdb-review-dataset
Data License: Database Contents License (DbCL) v1.0

The Process Step by Step
1. Upload your documents and enter / upload the labels you want to use
2. Start labeling your data
3. Monitor model accuracy as you provide more labels
4. When the accuracy reaches the desired level, exit the loop
5. Download the model and the labels, and visualize the results

Workflow annotations:
Show user current predictions and ask for more labels.
Allow user to download the model trained on all the labeled data.
top output: data to train model; bottom output: past iterations stats
Create empty table to track accuracies.
top: new labels; bottom: new stats
Remove current iter. stats
Remove current iter. probs
port 0: already labeled; port 1: still to be labeled; port 2: iterations stats

Workflow nodes: Recursive Loop Start (2 ports), Table Creator, Recursive Loop End (2 ports), Label, Concatenate, Text Preprocessing, Upload, Deploy, Pre-process for Visualization, Joiner, Column Filter, Column Splitter, Initialize / Train Classifier with Available Labels
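
To make the "most uncertain documents" step concrete, here is a minimal Python sketch of entropy-based uncertainty sampling. It is not the workflow's actual implementation: the function names, the sample probabilities, and the use of base-2 Shannon entropy are all assumptions for illustration, and the entropy scorer node may normalize or scale its score differently.

```python
import math
from typing import Dict, List, Sequence


def prediction_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in bits) of one class-probability distribution."""
    # Zero-probability classes contribute nothing, so they are skipped.
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def most_uncertain(class_probs: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the ids of the k documents with the highest prediction entropy."""
    ranked = sorted(
        class_probs,
        key=lambda doc_id: prediction_entropy(class_probs[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical model output for three unlabeled reviews: P(positive), P(negative).
unlabeled = {
    "review_a": [0.98, 0.02],
    "review_b": [0.55, 0.45],
    "review_c": [0.80, 0.20],
}

# Documents the user would be asked to label in the next iteration.
to_label_next = most_uncertain(unlabeled, k=2)
print(to_label_next)  # ['review_b', 'review_c']
```

A prediction close to 50/50 has entropy near 1 bit and is ranked first, while a confident prediction such as 98/2 has low entropy and is left for later iterations.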
