
Solution

PART 1: Using the Works data table, find a configuration node that creates a drop-down list of values from the GenreType column. Connect this configuration node to a Row Filter and filter the data set according to the user's selection. Select "Comedy" to more easily follow the steps of this problem.

PART 2: Using the Paragraphs data table, split the PlainText column so you can count how many words are in each record. You will need to split the field twice, first by spaces and then by a newline character (see the hint below if you do not know how to designate a newline character). Once the PlainText field is separated into a one-word-per-row structure, summarize the data, grouping by chapter_id and character_id and counting the number of words. After splitting and before summarizing, you should have 894,201 records. After summarizing, you should have 4,876 records.

PART 3: Join the Chapters data table to the summarized word count from Part 2. Remove the duplicate chapter_id column. You should have 4,876 rows and 7 columns.

PART 4: Join the results from Part 3 with the results from Part 1 using work_id. Summarize the results to obtain the total word count for each work. Create a variable for the work_id with the lowest total word count. For the Comedy genre, which work is converted into a variable?

PART 5: Using the Characters table, perform a series of joins to determine the work and work_id values for each character. The resulting table will need to be aggregated to remove duplicates and sum each character's word count. It should include all columns from the Characters table plus work_id and the word count. You should have 1,331 rows and 6 columns.

PART 6: Filter the results from Part 5 using the variable created in Part 4. This will create a table of word counts by character for the dynamically chosen work. Using a variable expression, create a string that uses the following structure (copy/paste):

    join("../",variable("Title"), "_CharacterWordCount.table")

Next, convert this string into a path variable. You MUST select "Relative to" and "Current workflow" in the node configuration menu. Finally, use the Table Writer to generate an output table in the location designated by the string above.

Hint: The newline character is denoted by \n. In the Cell Splitter node, make sure you check the box that reads "Use \ as escape character."

Workflow nodes and their annotations:

Table Reader: works
Table Reader: paragraphs
Table Reader: chapters
Table Reader: characters
Value Selection Configuration: Select values from GenreType
Row Filter: Filter rows based on selection
Cell Splitter: Split on spaces into a list
Ungroup: Ungroup the list into rows
Cell Splitter: Split again on newline character
Ungroup: Ungroup the list into rows
GroupBy: Count words by chapter and character
Joiner: Inner join data streams on chapter_id
Joiner: Inner join streams on work_id
GroupBy: Group by title and work_id. Sum word count
Sorter: Sort word count ascending
Table Row to Variable: Turn work_id into a variable
Joiner: Join with Part 2 results on character_id
Joiner: Join with Part 3 results on chapter_id
GroupBy: Sum word count by Character table fields and work_id
Row Filter: Filter Part 5 by the work_id variable from Part 4
Variable Expressions: Use the string above to create a string file path variable
String to Path (Variable): Convert the string from the previous step into a path variable
Table Writer: Write output based on path variable
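
For readers who want to sanity-check the logic outside KNIME, the core of Parts 2, 4, and 6 can be sketched in plain Python. This is only an illustration under stated assumptions, not the workflow itself: the sample rows, the work_id values 100 and 200, the titles "Title A" and "Title B", and the function names are all hypothetical. Dropping empty strings after splitting is a choice made for this sketch and may not match how the KNIME nodes treat empty cells. The path helper assumes that join in the variable expression simply concatenates its arguments.

```python
from collections import Counter

# Hypothetical stand-in for the Paragraphs table:
# (chapter_id, character_id, PlainText)
paragraphs = [
    (1, 10, "to be or not\nto be"),
    (1, 11, "a short line"),
    (2, 10, "one more\nline here"),
]

# Hypothetical stand-ins for the Chapters table (chapter_id -> work_id)
# and the genre-filtered Works table (work_id -> Title).
chapter_to_work = {1: 100, 2: 200}
titles = {100: "Title A", 200: "Title B"}


def split_words(text):
    """Split on spaces, then on newlines (the two split-and-ungroup passes)."""
    words = []
    for chunk in text.split(" "):
        words.extend(chunk.split("\n"))
    # Assumption of this sketch: empty strings are not counted as words.
    return [w for w in words if w]


def count_words(rows):
    """Part 2: count words per (chapter_id, character_id)."""
    counts = Counter()
    for chapter_id, character_id, text in rows:
        counts[(chapter_id, character_id)] += len(split_words(text))
    return counts


def lowest_word_count_work(counts, chapter_to_work, titles):
    """Parts 3 and 4: total words per work, then pick the smallest total."""
    totals = Counter()
    for (chapter_id, _character_id), n in counts.items():
        work_id = chapter_to_work[chapter_id]
        if work_id in titles:  # inner join with the filtered works
            totals[work_id] += n
    return min(totals, key=totals.get)


def output_path(title):
    """Part 6: mirror join("../", variable("Title"), "_CharacterWordCount.table")."""
    return "".join(["../", title, "_CharacterWordCount.table"])


counts = count_words(paragraphs)
work_id = lowest_word_count_work(counts, chapter_to_work, titles)
path = output_path(titles[work_id])
```

With the toy rows above, the second split matters: "not\nto" survives the split on spaces as a single chunk and only becomes two words after the newline split, which is why the exercise asks for both passes.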
