05. Time Series - solution

Time Series Solution

Solution for "Time Series" exercise for advanced Life Science User Training:
- Extract granularities from a timestamp
- Aggregate by time granularities
- Calculate moving average
- Calculate moving aggregation

Literature Search on PubMed

The goal of this exercise is to get all the publications associated with smallpox from the PubMed database and analyze how the number of publications has changed over time.

Step 1: The PubMed Document Extractor component loads all publications associated with smallpox. The Document Data Extractor node extracts title, abstract, author and publication date for each publication. The Column Filter node removes document and query from the data table.
Step 2: Use the Extract Date&Time Fields node to extract the year into a separate column.
Step 3: Remove missing integer values using the Missing Value node. Use the GroupBy node to group all publications per year and count the number of titles.
Step 4: Use the Moving Average node to calculate the average of the titles using the Center Gaussian method with a window of 9.
Step 5: Use the Moving Aggregation node to calculate the maximum of the titles using a central window of size 9.

Activity I: Conversion and Filtering

Convert publication dates to Date&Time format and filter for publications from 1970-2000.

Step 1: Read the publications from smallpox.csv using the File Reader node.
Step 2: Convert the publication date from string format to Date&Time format using the String to Date&Time node.
Step 3: Filter for all publications from 1970-2000 using the Date&Time-based Row Filter node.

Activity II: Analyse Time Series

Analyze the number of publications related to smallpox over the last years.

Workflow nodes: File Reader (smallpox.csv), String to Date&Time, Date&Time-based Row Filter, PubMed Document Extractor, Document Data Extractor, Column Filter, Extract Date&Time Fields, Missing Value, GroupBy, Moving Average, Moving Aggregation, Line Plot.
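For readers who want to check the logic of Activity I outside KNIME, the read, convert and filter steps can be approximated in pandas. This is only a sketch, not the workflow itself: the column names `title` and `publication_date`, the date format, and the inline sample rows standing in for smallpox.csv are all assumptions, since the source does not show the file layout.

```python
import io

import pandas as pd

# Hypothetical stand-in for smallpox.csv; real column names and formats may differ.
csv_text = """title,publication_date
Paper A,1965-03-01
Paper B,1970-01-01
Paper C,1985-06-15
Paper D,2000-12-31
Paper E,2005-02-10
"""

# Step 1: read the publications (File Reader).
df = pd.read_csv(io.StringIO(csv_text))

# Step 2: convert the date column from string to datetime (String to Date&Time).
df["publication_date"] = pd.to_datetime(df["publication_date"], format="%Y-%m-%d")

# Step 3: keep publications from 1970 to 2000, both years included
# (Date&Time-based Row Filter).
in_range = df["publication_date"].dt.year.between(1970, 2000)
filtered = df[in_range].reset_index(drop=True)

print(filtered)
```

Filtering on the extracted year keeps the whole of 2000 in range; comparing against fixed start and end timestamps would work equally well as long as the end bound covers 31 December 2000.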
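Steps 2 to 5 of the time series analysis (extract the year, count titles per year, centered Gaussian moving average, centered moving maximum) can be sketched in pandas as follows. Several details are assumptions rather than KNIME behaviour: the sample data is invented, missing years are handled by dropping rows, the Gaussian weights use a hypothetical `sigma` of 2.0 (the kernel of KNIME's Center Gaussian method may be parameterized differently), and positions where the 9-wide window does not fit are left as missing values.

```python
import numpy as np
import pandas as pd

# Invented publication table: a few titles per year, plus one record without a date.
years = list(range(1990, 2001))
n_titles = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
rows = [(f"title {y}-{i}", f"{y}-01-01") for y, n in zip(years, n_titles) for i in range(n)]
rows.append(("undated title", None))
pubs = pd.DataFrame(rows, columns=["title", "publication_date"])
pubs["publication_date"] = pd.to_datetime(pubs["publication_date"])

# Step 2: extract the year into a separate column (Extract Date&Time Fields).
pubs["year"] = pubs["publication_date"].dt.year

# Step 3: drop rows with a missing year, then count titles per year
# (Missing Value + GroupBy).
pubs = pubs.dropna(subset=["year"]).astype({"year": int})
per_year = pubs.groupby("year")["title"].count().rename("n_titles")


def centered_gaussian_mean(series, window=9, sigma=2.0):
    """Centered moving average with normalized Gaussian weights (sigma is assumed)."""
    offsets = np.arange(window) - window // 2
    weights = np.exp(-0.5 * (offsets / sigma) ** 2)
    weights /= weights.sum()
    return series.rolling(window, center=True).apply(
        lambda values: float(np.dot(values, weights)), raw=True
    )


# Step 4: centered Gaussian moving average with a window of 9 (Moving Average).
moving_avg = centered_gaussian_mean(per_year, window=9)

# Step 5: centered moving maximum with a window of 9 (Moving Aggregation).
moving_max = per_year.rolling(9, center=True).max()

result = pd.DataFrame(
    {"n_titles": per_year, "moving_avg": moving_avg, "moving_max": moving_max}
)
print(result)
```

With a window of 9, the first and last four years have no complete window, so both derived columns are missing there. Plotting `result` against the year index gives the same kind of view as the Line Plot node.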
