04_Parsing_the_KNIME_Forum

Parsing the KNIME Forum

This workflow demonstrates how to parse the KNIME Forum in several stages. First, we read the list of topics from the forum page. We then handle each forum category separately: within a category, we collect all topics that are less than 9 days old (a limit chosen mainly to speed up the workflow), and if a next page is available, we process it as well. After parsing the thread pages of the forum categories, we read the full information for each individual topic.
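The category loop described above (keep only recent topics, follow the next page when there is one) can be sketched outside KNIME as follows. This is a minimal illustration, not the workflow's actual implementation: the page structure (a dict with "topics" and "next"), the field name "posted", and the function names `recent_topics` and `fetch_page` are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterator


def recent_topics(
    start_url: str,
    fetch_page: Callable[[str], Dict],
    now: datetime,
    max_age_days: int = 9,
) -> Iterator[str]:
    """Yield URLs of topics newer than max_age_days, following 'next' links.

    fetch_page is a hypothetical callable that returns a dict shaped like
    {"topics": [{"url": str, "posted": datetime}, ...], "next": str or None}.
    """
    cutoff = now - timedelta(days=max_age_days)
    seen = set()  # guards against a 'next' link that loops back
    url = start_url
    while url is not None and url not in seen:
        seen.add(url)
        page = fetch_page(url)
        for topic in page["topics"]:
            if topic["posted"] >= cutoff:
                yield topic["url"]
        url = page.get("next")  # None ends the loop
```

Passing `fetch_page` in as a parameter keeps the paging and age filter separate from the download step, so the logic can be tried against hand-written pages before pointing it at a real site.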

Workflow overview: This workflow extracts topics and contents of the KNIME forum posts. It downloads HTML pages from the web and parses the generated XML documents to extract topic and content.

Node annotations: collect all forums and all threads; remove missings; get all posts; parse first post and thread title; filter threads without comments; get all posts, only comments; enter URL from KNIME forum page; collect data.

Nodes used: Table Creator, HttpRetriever (shown as MISSING), HtmlParser (shown as MISSING), XPath, Row Filter, Column Filter, Concatenate.
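The XPath step (extracting the thread title, the first post, and the comments from the parsed XML) can be illustrated with Python's standard library. This is only a sketch under stated assumptions: the sample markup, the element names, and the class values "title" and "post" are invented for the example and are not the forum's real page structure, and `xml.etree.ElementTree` supports only a subset of XPath. It also requires well-formed XML, which is why the workflow converts the downloaded HTML first.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed sample standing in for one parsed thread page.
SAMPLE = """<html><body>
<h1 class="title">Example thread</h1>
<div class="post"><p>First post text</p></div>
<div class="post"><p>A comment</p></div>
</body></html>"""


def extract_thread(xml_text: str) -> dict:
    """Return the title, first post, and comments of one thread page."""
    root = ET.fromstring(xml_text)
    title = root.findtext(".//h1[@class='title']")
    posts = [
        "".join(div.itertext()).strip()
        for div in root.findall(".//div[@class='post']")
    ]
    return {
        "title": title,
        "first_post": posts[0] if posts else None,
        "comments": posts[1:],  # empty list means a thread without comments
    }


thread = extract_thread(SAMPLE)
has_comments = bool(thread["comments"])  # mirrors "filter threads without comments"
```

Treating the first matched post as the opening post and the rest as comments mirrors the split between "parse first post and thread title" and "get all posts, only comments" in the annotations.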
