Bonus. Time Series

Time Series

"Time Series" exercise for advanced Life Science User Training
- Extract granularities from a timestamp
- Aggregate by time granularities
- Calculate moving average
- Calculate moving aggregation
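
For readers who want to see the first two objectives outside of KNIME, here is a minimal Python sketch of extracting a granularity (the year) from a date and aggregating by it. The sample records and the helper names `extract_year` and `count_per_year` are hypothetical illustrations, not part of the workflow or real PubMed data.

```python
from collections import Counter
from datetime import date

# Hypothetical sample records (title, publication date); not real PubMed data.
publications = [
    ("Paper A", date(1972, 3, 14)),
    ("Paper B", date(1972, 11, 2)),
    ("Paper C", date(1980, 5, 8)),
    ("Paper D", None),  # a record with a missing publication date
]


def extract_year(d):
    """Extract the year granularity from a date; missing dates stay missing."""
    return d.year if d is not None else None


def count_per_year(records):
    """Group records by year and count the titles, skipping missing years."""
    years = (extract_year(d) for _, d in records)
    counts = Counter(y for y in years if y is not None)
    return dict(sorted(counts.items()))


print(count_per_year(publications))  # {1972: 2, 1980: 1}
```

Dropping missing years before counting is one possible choice; the workflow handles missing values with a dedicated node, whose configuration is not described here.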

Literature Search on PubMed

The goal of this exercise is to retrieve all publications associated with smallpox from the PubMed database and analyze how the number of publications has changed over time.

Step 1: The PubMed Document Extractor component loads all publications associated with smallpox. The Document Data Extractor node extracts the title, abstract, author, and publication date of each publication. The Column Filter node removes the document and query columns from the data table.

Step 2: Use the Extract Date&Time Fields node to extract the year into a separate column.

Step 3: Remove missing integer values using the Missing Value node. Use the GroupBy node to group all publications by year and count the number of titles.

Step 4: Use the Moving Average node to calculate the average of the title counts using the Center Gaussian method with a window of 9.

Step 5: Use the Moving Aggregation node to calculate the maximum of the title counts using a central window of size 9.

Activity I: Conversion and Filtering

Convert publication dates to Date&Time format and filter for publications from 1970 to 2000.

Step 1: Read the publications from smallpox.csv using the File Reader node.

Step 2: Convert the publication date from string format to Date&Time format using the String to Date&Time node.

Step 3: Filter for all publications from 1970 to 2000 using the Date&Time-based Row Filter node.

Activity II: Analyse Time Series

Analyze the number of publications related to smallpox over recent years.

Node labels: Column Filter, Missing Value, Document Data Extractor, PubMed Document Extractor, Line Plot
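
The date conversion and filtering of Activity I and the centered window calculations of Steps 4 and 5 can be approximated in plain Python as follows. This is a rough sketch under stated assumptions, not a reproduction of the KNIME nodes: the date format string, the Gaussian standard deviation (`size / 4`), and the choice to leave positions without a full window as `None` are all assumptions, and every function name is hypothetical.

```python
import math
from datetime import datetime


def parse_date(text, fmt="%Y-%m-%d"):
    """Convert a date string to a date. The format string is an assumption."""
    return datetime.strptime(text, fmt).date()


def in_year_range(d, first=1970, last=2000):
    """Return True if the date falls within first..last (inclusive years)."""
    return first <= d.year <= last


def centered_windows(values, size):
    """Yield the centered window at each position, or None without a full window.

    The window size is expected to be odd so that it has a single center.
    """
    half = size // 2
    for i in range(len(values)):
        if i - half < 0 or i + half >= len(values):
            yield None
        else:
            yield values[i - half : i + half + 1]


def gaussian_weights(size, sigma=None):
    """Normalized Gaussian weights centered on the middle of the window.

    sigma defaults to size / 4, which is an assumption; KNIME's kernel may differ.
    """
    half = size // 2
    sigma = sigma if sigma is not None else size / 4
    raw = [math.exp(-0.5 * ((k - half) / sigma) ** 2) for k in range(size)]
    total = sum(raw)
    return [w / total for w in raw]


def centered_gaussian_average(values, size=9):
    """Gaussian-weighted moving average over a centered window."""
    weights = gaussian_weights(size)
    return [
        None if win is None else sum(w * v for w, v in zip(weights, win))
        for win in centered_windows(values, size)
    ]


def centered_max(values, size=9):
    """Moving maximum over a centered window."""
    return [None if win is None else max(win) for win in centered_windows(values, size)]
```

For example, `centered_max(list(range(11)))` returns `None` for the first and last four positions and `8`, `9`, `10` for the three positions in the middle, since only those have a full window of 9 values around them.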
