
Deposition Parser

Here we unpack the JSON structure the Microsoft API sends back.

Note: I saved the output of the Form Recognizer API in a .table file so that we can load it without invoking the API, as I had to remove my credentials from the Azure Form Recognizer node above, which resets all following nodes. This is only a workaround so the workflow can be executed without an actual Form Recognizer subscription.

Simple density-based clustering to identify all words on a line.

Node annotations:
Send to MS
PDF
Get pages
Each page one row
Only response body
words and page number
Each word one row
words and page number
Calculate the bounding box for each word
Cluster words based on their y-coordinate. Words close together in dimension y are on one line.
Define a distance only based on the y-coordinate of each word
Cluster_0 -> 0, Cluster_1 -> 1, etc.
Concat lines
Line numbers
Sort by line and x coord, as text goes top to bottom, left to right
lineNumber
Only valid line numbers
lineNumber to integer
Only rows with line numbers
Better column names
Person
ID for conversation snippets
Keep only counter where we have a name
Fill following lines with last seen conversation snippet ID
Group by convo snippet ID
Calculate text, min line, max line, min page, max page per convo
Remove start
Split line number from text
Remove old text column
split_1 = text
Cleanup
Example output by Azure Form Recognizer

Nodes in the workflow:
Azure Form Recognizer, List Files/Folders, Files to Binary Objects, Path to URI, JSON Path, Ungroup, Column Filter, JSON Path, Ungroup, JSON Path, Java Snippet, DBSCAN, Numeric Distances, String Manipulation, GroupBy, String To Number, Sorter, String Manipulation, Rule Engine, String To Number, Row Splitter, Table Manipulator, Regex Split, Counter Generation, Rule Engine, Missing Value, GroupBy, Row Filter, Regex Split, Column Filter, Column Rename, Table Manipulator, Table Reader
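
To make the line-detection idea concrete outside of KNIME, here is a minimal Python sketch. It is not the workflow's DBSCAN node. It is a simplified stand-in that sorts words by y-coordinate and starts a new line whenever the gap to the previous word exceeds a threshold. In one dimension this gives the same groups as DBSCAN with a minimum cluster size of 1. The Word class, the eps value, and the sample words are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word's bounding box (assumed field)
    y: float  # vertical position of the word's bounding box (assumed field)


def cluster_lines(words, eps=0.05):
    """Group words into text lines using only their y-coordinate.

    Words are sorted by y; a word joins the current line when its y is
    within eps of the previous word's y, otherwise it opens a new line.
    """
    if not words:
        return []
    ordered = sorted(words, key=lambda w: w.y)
    lines = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.y - prev.y <= eps:
            lines[-1].append(cur)
        else:
            lines.append([cur])
    # Lines are already top to bottom; order each line left to right.
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x))
            for line in lines]


# Made-up sample: two lines of text with slightly jittered y values.
words = [
    Word("Q.", 1.0, 2.00), Word("your", 2.6, 1.99),
    Word("State", 1.5, 2.01), Word("name.", 3.2, 2.00),
    Word("A.", 1.0, 2.30), Word("Smith.", 2.2, 2.31),
    Word("Jane", 1.5, 2.29),
]
lines = cluster_lines(words)
```

The threshold has to be smaller than the spacing between lines and larger than the vertical jitter within a line, so it would need tuning per document.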
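
The conversation-snippet steps (keep the counter only where a name is present, fill the following lines with the last seen ID, then group) can be sketched in plain Python as well. This is only an illustration of the idea under assumptions: the row fields (page, line, text, person) and the sample rows are hypothetical, and how the workflow detects the person is not shown here.

```python
def group_snippets(rows):
    """Merge transcript lines into conversation snippets.

    rows: dicts with 'page', 'line', 'text' and 'person', in reading order;
    'person' is None when the line does not start with a speaker.
    A row with a person opens a new snippet; rows without one are attached
    to the last seen snippet (the fill-forward step). Rows that appear
    before any speaker end up in a snippet whose person is None.
    """
    snippets = []
    for row in rows:
        if row["person"] is not None or not snippets:
            snippets.append({
                "id": len(snippets),
                "person": row["person"],
                "text": row["text"],
                "min_line": row["line"], "max_line": row["line"],
                "min_page": row["page"], "max_page": row["page"],
            })
        else:
            cur = snippets[-1]
            cur["text"] += " " + row["text"]
            cur["min_line"] = min(cur["min_line"], row["line"])
            cur["max_line"] = max(cur["max_line"], row["line"])
            cur["min_page"] = min(cur["min_page"], row["page"])
            cur["max_page"] = max(cur["max_page"], row["page"])
    return snippets


# Made-up sample rows for illustration.
rows = [
    {"page": 1, "line": 24, "text": "Where were you", "person": "Q"},
    {"page": 1, "line": 25, "text": "on that day?", "person": None},
    {"page": 2, "line": 1, "text": "At home.", "person": "A"},
]
snippets = group_snippets(rows)
```

Each resulting snippet carries its speaker, the joined text, and the line and page range it spans, which mirrors the per-convo aggregation named in the annotations.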
