
Challenge3

Here is my answer for Just KNIME It! Challenge 3.


Level: Easy

Description: You received the 2017 cancer data from the CDC for inspection, and your goal is to answer the following questions:
(1) What are the top-5 most frequent cancer types occurring in females?
(2) What are the top-5 most frequent cancer types occurring in males?
(3) Which US state has the highest cancer incidence rate (that is, the highest number of cancer cases normalized by the size of its population)?

Author: Janina Mothes
Dataset: Cancer and Population Data in the KNIME Hub

Our solution will appear here next Tuesday. In the meantime, feel free to discuss your work on the KNIME forum or on social media using the hashtag #justknimeit.

Remember to upload your solution with tag justknimeit-3 to your public space on the KNIME Hub. To increase the visibility of your solution, also post it to its challenge thread in the KNIME forum.

Data source: https://hub.knime.com/alinebessa/spaces/Just%20KNIME%20It!%20Datasets/latest/Challenge%203%20-%20Datasets~pZAJOPBXtHiXnRhq/

Data notes:
1. Count with missing cancer code: 1,319 rows.
2. The population master has one extra state name, "Puerto Rico", which is missing from the cancer data.
3. 126,277 rows are missing the count. It seems the cancer site codes and state codes were cross-joined.
4. Cancer codes sometimes contain "-" or letters. It seems one record can cover multiple cancer sites, which is expressed through the cancer code. I am not sure what the difference is between "Male and Female Breast" and "Male and Female Breast, In Situ". Judging from the names, we assume that:
- Letter codes should be converted to the number-based code.
- The code needs to be split by "-" and then aggregated by code.
- The site name of site code 0 is "All". As the code naming convention seems to be a 5-digit number, we assume this code should be excluded to avoid double counting.
- "Breast-InSitu-Female" and "Breast-InSitu" were exactly the same records, so one of them should be discarded. "Breast" is replaced by "26000".
- "26000-Female" and "26000" with female gender seem to be duplicates. Discard "26000-Female".
- Codes 31010, 33011, 32010, 25010, 35011, 21071, 21052, 33042 and 21041 had duplicated rows.

My answer:

(1) What are the top-5 most frequent cancer types occurring in females?
1. Male and Female Breast (26000)
2. Esophagus (21010)
3. Other Digestive Organs (21130)
4. Cecum (21041)
5. Cervix Uteri (27010)

(2) What are the top-5 most frequent cancer types occurring in males?
1. Esophagus (21010)
2. Other Digestive Organs (21130)
3. Urinary Bladder, invasive and in situ (29010)
4. Cecum (21041)
5. Nose, Nasal Cavity and Middle Ear (22010)

(3) Which US state has the highest cancer incidence rate (that is, the highest number of cancer cases normalized by the size of its population)?
Pennsylvania

[Workflow image. Annotations: population2017.xlsx; CDC_cancer_2017.csv; State is missing; Replace "." with blank; 52 states; Puerto Rico is missing in cancer data; + Population; Count is missing; cancer code 0; by gender, rank by gender, Top 5: (1) female, (2) male; by state, cancer incidence rate, rank by state: (3) state. Nodes: Excel Reader, CSV Reader, Row Splitter, String Replacer, Check State discrepancy, Joiner, Row Splitter, Row Splitter, Breast-related preprocessing, Cancer site code "-" preprocessing, Check data by translation, GroupBy, Rank, Row Filter, Row Splitter, GroupBy, Math Formula, Rank, Row Filter.]
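The cleaning assumptions in data notes 3 and 4 can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the KNIME workflow itself: the column names (state, sex, site_code, count), the LETTER_CODES table and the helper names are hypothetical, and only the "Breast" to "26000" mapping comes from the notes. Codes are cut at the first "-", so "26000-Female" collapses onto "26000" and is then dropped as an exact duplicate of the same state, sex and count.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

# Hypothetical lookup from letter-based site codes to numeric codes.
# Only "Breast" -> "26000" is confirmed by the data notes.
LETTER_CODES = {"Breast": "26000"}


def normalize_code(raw: str) -> str:
    """Keep the part before the first '-' and map letter codes to numbers."""
    head = raw.strip().split("-")[0]
    return LETTER_CODES.get(head, head)


def clean_and_aggregate(rows: Iterable[Mapping[str, str]]) -> Dict[Tuple[str, str], int]:
    """Return {(sex, site_code): total_count} after the cleaning steps.

    Each row is assumed to be a dict with the hypothetical keys
    'state', 'sex', 'site_code' and 'count'.
    """
    seen = set()
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for row in rows:
        raw_code = row.get("site_code")
        count = row.get("count")
        # Drop rows with a missing count or a missing cancer code.
        if count in (None, "") or raw_code in (None, ""):
            continue
        code = normalize_code(raw_code)
        # Site code 0 means "All"; skip it to avoid double counting.
        if code == "0":
            continue
        key = (row["state"], row["sex"], code, int(count))
        # Rows that are identical after normalizing the code are counted once.
        if key in seen:
            continue
        seen.add(key)
        totals[(row["sex"], code)] += int(count)
    return dict(totals)


# Illustrative rows (made-up numbers, not taken from the CDC file).
sample = [
    {"state": "Ohio", "sex": "Female", "site_code": "26000", "count": "10"},
    {"state": "Ohio", "sex": "Female", "site_code": "26000-Female", "count": "10"},
    {"state": "Ohio", "sex": "Female", "site_code": "0", "count": "99"},
    {"state": "Ohio", "sex": "Male", "site_code": "21010", "count": ""},
    {"state": "Iowa", "sex": "Female", "site_code": "Breast-InSitu", "count": "4"},
]
print(clean_and_aggregate(sample))  # {('Female', '26000'): 14}
```

Deduplicating on the whole normalized row, rather than on the code alone, keeps legitimate counts from different states or sexes that happen to share a code.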

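Questions (1) to (3) then reduce to grouping, ranking and filtering, which the workflow does with GroupBy, Rank, Row Filter and Math Formula nodes. The sketch below mirrors that logic in plain Python under stated assumptions: the function names are hypothetical, scaling the rate to cases per 100,000 inhabitants is an illustrative choice (it does not change the ranking), and the sample numbers are made up. It keeps only states present in both tables, which is how Puerto Rico drops out.

```python
from typing import Dict, List, Tuple


def top_n_by_sex(totals: Dict[Tuple[str, str], int], sex: str, n: int = 5) -> List[Tuple[str, int]]:
    """Return the n most frequent (site_code, count) pairs for one sex."""
    items = [(code, count) for (s, code), count in totals.items() if s == sex]
    # Highest count first; ties are broken by code so the result is deterministic.
    items.sort(key=lambda kv: (-kv[1], kv[0]))
    return items[:n]


def incidence_by_state(
    cases_by_state: Dict[str, int],
    population_by_state: Dict[str, int],
    per: int = 100_000,
) -> List[Tuple[str, float]]:
    """Cases per `per` inhabitants, ranked from highest to lowest.

    Only states found in both inputs with a non-zero population are kept
    (an inner join), which also avoids dividing by zero.
    """
    rates = {
        state: cases * per / population_by_state[state]
        for state, cases in cases_by_state.items()
        if population_by_state.get(state)
    }
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


# Made-up aggregates for illustration only.
totals = {("Female", "26000"): 50, ("Female", "21010"): 30,
          ("Male", "21010"): 40, ("Male", "29010"): 35}
print(top_n_by_sex(totals, "Male"))  # [('21010', 40), ('29010', 35)]

cases = {"State A": 600, "State B": 300}
# "Puerto Rico" appears only in the population table, so it is not ranked.
population = {"State A": 1_000_000, "State B": 200_000, "Puerto Rico": 3_000_000}
print(incidence_by_state(cases, population))  # [('State B', 150.0), ('State A', 60.0)]
```

Ranking by the normalized rate rather than the raw case count is what lets a smaller state outrank a larger one, as State B does here.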