
JKISeason3-14_tomljh_v2

Forecasting Avocado Prices


Level: Medium

Description: You are a data scientist, and your team has asked you to analyze an avocado dataset. The task is to pick a specific avocado type for the whole of the US and forecast its daily average prices. To do that, train, apply, and score an ARIMA model. Do you see any seasonality in the line plot or the autocorrelation plots? Do you think a seasonal ARIMA (SARIMA) model would perform better? For your model, visualize the forecasts and compute scoring metrics.
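The description asks whether the line plot or the autocorrelation plots reveal seasonality. The sketch below is a minimal, dependency-free illustration of the two computations behind the Differencer (Labs) and AutocorrelationPlot (Labs) nodes, assuming the textbook definitions: lag-k differencing and the sample autocorrelation function (ACF) with the common biased estimator. The names `difference` and `acf` are hypothetical helpers, not KNIME or library APIs, and the sine series is synthetic data, not the avocado prices.

```python
import math


def difference(series, lag=1):
    """Lag-k difference y[t] - y[t - lag]; the first `lag` points have no partner and are dropped."""
    if lag < 1:
        raise ValueError("lag must be >= 1")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def acf(series, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (biased estimator used by most ACF plots)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("constant series: autocorrelation is undefined")
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


# Synthetic weekly series with a yearly cycle (period m = 52), 169 points like the avocado data.
weekly = [math.sin(2 * math.pi * t / 52) for t in range(169)]
r = acf(weekly, 52)
print(round(r[26], 2), round(r[52], 2))  # negative at half a year, positive at lag 52
seasonal_diff = difference(weekly, lag=52)  # seasonal differencing removes the yearly cycle
```

A clear spike at lag 52 in weekly data is the kind of pattern that points to a seasonal period of m = 52; differencing at that lag and re-plotting the ACF is then how the seasonal orders are read off.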

Authors: Roberto Cadili, Swetha Kannan, and Corey Weisinger

Dataset: Avocado Price Data in the KNIME Community Hub
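The description asks for one avocado type across the whole US; the workflow annotations select type = "conventional" and region = "TotalUS" and keep only the Date and AveragePrice columns, sorted by Date. A minimal standard-library sketch of that preparation follows. It assumes the CSV header uses the column names Date, AveragePrice, type, and region (the names the annotations use) and that dates are written as YYYY-MM-DD; `load_series` is a hypothetical helper, and the in-memory sample rows are made up for illustration.

```python
import csv
import io
from datetime import date


def load_series(csv_file, avocado_type="conventional", region="TotalUS"):
    """Return [(Date, AveragePrice), ...] for one type and region, sorted by date."""
    rows = []
    for record in csv.DictReader(csv_file):
        if record["type"] != avocado_type or record["region"] != region:
            continue  # keep a single type and region
        if not record["Date"] or not record["AveragePrice"]:
            continue  # skip rows with missing values in the two kept columns
        # Assumption: ISO dates (YYYY-MM-DD); adjust the parsing if the file differs.
        rows.append((date.fromisoformat(record["Date"]), float(record["AveragePrice"])))
    rows.sort(key=lambda row: row[0])  # ascending by Date
    return rows


# Made-up sample rows, deliberately out of order and including rows to be filtered out.
sample = io.StringIO(
    "Date,AveragePrice,type,region\n"
    "2015-01-11,1.10,conventional,TotalUS\n"
    "2015-01-04,1.00,conventional,TotalUS\n"
    "2015-01-04,1.50,organic,TotalUS\n"
    "2015-01-04,0.90,conventional,Albany\n"
)
series = load_series(sample)
# series == [(date(2015, 1, 4), 1.0), (date(2015, 1, 11), 1.1)]
```

Reading a local copy of the file would look like `with open("avocado.csv", newline="") as f: series = load_series(f)`.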

Workflow annotations:

Read Data: avocado.csv. Filter: type = "conventional", region = "TotalUS" (PS: the requirements of the problem). Keep only two columns: Date, AveragePrice. Sort: Date. Generate timestamp.

Analysis of the original time series. Conclusion: d = 0, p = 1, q = 0 to 4, m = 52.

Differencing, then delete rows with null values: lag = 52 gives D = 0, P = 1, Q = 0 to 3; lag = 1 gives D = 1, P = 0, Q = 0.

Step 1: Based on the analysis of the ACF and PACF plots, manually set the hyperparameters of the SARIMA model (p, d, q)(P, D, Q)m.

Step 2: Manual adjustment: (p, d, q)(P, D, Q)m = (1, 0, 0)(0, 1, 0)52. Training set: 117 samples (2015 to 2017); test set: 52 samples (1 year), forecast over 52 steps. Scores: RMSE 0.171, MAPE 0.1.

Step 3: Check the model's ability to fit all of the data and whether it can forecast over a long period. Overall situation: (1, 0, 0)(0, 1, 0)52, scored on Row Number > 52; 104 = 2 * 52. Scores: RMSE 0.067, MAPE 0.048. Residual check: Qualified.

Nodes on the canvas: CSV Reader, String to Date&Time, Row Filter, SARIMA Learner (Labs), Partitioning, SARIMA Predictor (Labs), Joiner, Numeric Scorer, Differencer (Labs), AutocorrelationPlot (Labs), Row Filter, Line Plot, Line Plot, Sorter, RowID, SARIMA Learner (Labs), Line Plot, Joiner, Row Filter, Numeric Scorer, SARIMA Predictor (Labs), Concatenate, Line Plot, Column Filter, AutocorrelationPlot (Labs), ResidualAnalyzer (Labs), Row Filter, Column Expressions, Differencer (Labs), AutocorrelationPlot (Labs), Row Filter.
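To make the chosen model concrete: (1, 0, 0)(0, 1, 0)52 means one seasonal difference at lag 52, w_t = y_t - y_{t-52}, followed by an AR(1) term on the differenced series, w_t = c + phi * w_{t-1} + e_t, with no MA terms. The sketch below fits that model by conditional least squares (ordinary least squares on lagged values). This is a simpler estimator than the maximum-likelihood fitting SARIMA libraries typically use, so its coefficients will not match the SARIMA Learner (Labs) node exactly. It also assumes an intercept c, reuses the 117/52 chronological split from the annotations, and computes MAPE as a fraction (0.1 = 10 %), which is an assumption about how the scores above are expressed. All function names are hypothetical, and the series is synthetic, built so that the fit can be checked exactly.

```python
import math

M = 52  # seasonal period: one year of weekly observations


def fit_sarima_100_010(train, m=M):
    """Fit w_t = c + phi * w_{t-1}, where w_t = y_t - y_{t-m}, by ordinary least squares."""
    w = [train[t] - train[t - m] for t in range(m, len(train))]
    if len(w) < 3:
        raise ValueError("need more than m + 2 training points")
    x, z = w[:-1], w[1:]  # regress each w_t on its predecessor w_{t-1}
    mean_x, mean_z = sum(x) / len(x), sum(z) / len(z)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxz = sum((a - mean_x) * (b - mean_z) for a, b in zip(x, z))
    phi = sxz / sxx
    c = mean_z - phi * mean_x
    return c, phi


def forecast(train, c, phi, horizon, m=M):
    """Recursive multi-step forecast; undoes the seasonal difference with y_t = y_{t-m} + w_t."""
    history = list(train)
    w_prev = history[-1] - history[-1 - m]
    predictions = []
    for _ in range(horizon):
        w_next = c + phi * w_prev
        y_next = history[-m] + w_next  # history[-m] is y_{t-m} for the step being forecast
        history.append(y_next)  # beyond m steps, earlier forecasts stand in for observations
        predictions.append(y_next)
        w_prev = w_next
    return predictions


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.1 means 10 %)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic series (not avocado data) that follows the model exactly: c = 0.01, phi = 0.5.
y = [1.3 + 0.2 * math.sin(2 * math.pi * t / M) for t in range(M)]
w = 0.1
for t in range(M, 169):
    y.append(y[t - M] + w)
    w = 0.01 + 0.5 * w
train, test = y[:117], y[117:]  # 117 training samples, 52 test samples
c, phi = fit_sarima_100_010(train)
pred = forecast(train, c, phi, horizon=len(test))
print(round(c, 4), round(phi, 4), rmse(test, pred), mape(test, pred))
```

The first 52 rows have no y_{t-52} to difference against, which is presumably why the in-sample check only scores rows with Row Number > 52. If statsmodels is available, `SARIMAX(y, order=(1, 0, 0), seasonal_order=(0, 1, 0, 52))` expresses the same specification with maximum-likelihood estimation; expect its estimates to differ somewhat from this least-squares sketch on real, noisy data.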
