BEST MODEL SELECTION: AUTO ARIMA
Auto ARIMA was selected as the final forecasting model based on consistent evidence from two complementary validation approaches. In the train/test comparison (test set: January 2015 onward), Auto ARIMA had the lowest AICc (-320.774) and was the only model whose residuals passed the Ljung-Box white-noise test (p = 0.21), meaning the test found no evidence of autocorrelation left unexplained by the model. Both diagnostics are computed on the training fit rather than on the held-out data, and AICc is comparable only among ARIMA models with the same orders of differencing, not between ARIMA and ETS. Across time series cross-validation (TSCV; expanding window, 36-month initial training window, 12-month step, 24-month horizon), Auto ARIMA won the most individual folds (27 of 50), which suggests that its performance reflects genuine generalization rather than a favorable split.
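For readers who want to check the residual diagnostic by hand, the Ljung-Box statistic is Q = n(n+2) Σ r_k² / (n − k), summed over lags k = 1..L, where r_k is the lag-k sample autocorrelation of the residuals; Q is compared against a chi-square distribution with L minus the number of fitted ARMA parameters as degrees of freedom. The Python sketch below implements that test using only the standard library. The lag count and the `fitdf` adjustment are assumptions, since this section does not state which lag or degrees-of-freedom correction produced p = 0.21, and the function names are illustrative.

```python
import math


def _lower_gamma_series(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) via its power series (used when x < a + 1)."""
    term = 1.0 / a
    total = term
    denom = a
    for _ in range(1000):
        denom += 1.0
        term *= x / denom
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _upper_gamma_cf(a: float, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x) via Lentz's continued fraction (used when x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(-x + a * math.log(x) - math.lgamma(a)) * h


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    if half_x < a + 1.0:
        return 1.0 - _lower_gamma_series(a, half_x)
    return _upper_gamma_cf(a, half_x)


def ljung_box(resid, lags, fitdf=0):
    """Return (Q, p-value) for the Ljung-Box test on a residual series.

    fitdf is the number of estimated ARMA parameters subtracted from the
    degrees of freedom (an assumption; use 0 for raw series).
    """
    n = len(resid)
    mean = sum(resid) / n
    dev = [r - mean for r in resid]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, lags + 1):
        # Sample autocorrelation of the demeaned residuals at lag k.
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += r_k * r_k / (n - k)
    q *= n * (n + 2)
    return q, chi2_sf(q, lags - fitdf)
```

A large p-value such as the reported 0.21 means the test finds no evidence of remaining autocorrelation; it does not prove that the residuals are white noise.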
Auto ETS achieved the lowest test-set RMSE (0.258; MAPE 2.29%) but ranked last in TSCV mean RMSE (0.325) and showed the highest fold-to-fold variance, suggesting that its test-set advantage reflects a favorable match to the post-2015 window rather than consistent out-of-sample accuracy. SARIMA(1,1,1)(1,1,1)[12] produced a competitive AICc (-316.853, slightly higher than Auto ARIMA's), but its residuals failed the Ljung-Box test and it won fewer TSCV folds than Auto ARIMA.
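The TSCV comparison rests on three fold-level summaries: mean RMSE, the standard deviation of RMSE across folds, and the number of folds each model wins. A minimal Python sketch of the expanding-window split and those summaries follows. It assumes that only folds with a full 24-month horizon are scored, that SD_RMSE is the sample standard deviation, and that a fold is "won" by the lowest RMSE; the function names and the toy per-fold values are hypothetical, not results from this analysis.

```python
import math
import statistics


def expanding_window_folds(n_obs, initial=36, step=12, horizon=24):
    """Return (train_end, test_end) index pairs for expanding-window TSCV.

    Each fold trains on observations [0, train_end) and is scored on
    [train_end, test_end). Only folds with a full forecast horizon are kept.
    """
    return [(end, end + horizon) for end in range(initial, n_obs - horizon + 1, step)]


def rmse(actual, forecast):
    """Root mean squared error over one fold's forecast horizon."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def summarize(fold_rmse):
    """Mean RMSE, SD of RMSE, and fold-win counts per model.

    fold_rmse maps model name -> list of per-fold RMSEs (same fold order for
    every model). A tie on a fold goes to the model listed first.
    """
    models = list(fold_rmse)
    n_folds = len(fold_rmse[models[0]])
    wins = {m: 0 for m in models}
    for i in range(n_folds):
        winner = min(models, key=lambda m: fold_rmse[m][i])
        wins[winner] += 1
    mean_rmse = {m: statistics.mean(v) for m, v in fold_rmse.items()}
    sd_rmse = {m: statistics.stdev(v) for m, v in fold_rmse.items()}
    return mean_rmse, sd_rmse, wins


# Hypothetical per-fold RMSEs for three folds (illustration only).
toy = {
    "auto_arima": [0.20, 0.22, 0.60],
    "snaive": [0.25, 0.26, 0.30],
    "auto_ets": [0.30, 0.35, 0.50],
}
mean_rmse, sd_rmse, wins = summarize(toy)
print(sorted(mean_rmse, key=mean_rmse.get))  # ['snaive', 'auto_arima', 'auto_ets']
print(wins)  # {'auto_arima': 2, 'snaive': 1, 'auto_ets': 0}
```

The toy numbers show how the rankings in this section can coexist: one model can win most folds while another has the lowest mean RMSE, because a single poor fold inflates the mean.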
Tradeoff: Seasonal Naïve ranked first in TSCV mean RMSE (0.317) and was the most stable model across folds (SD_RMSE = 0.100). For lower-stakes uses such as internal reporting, rough planning estimates, or settings where non-technical staff must maintain the model, Seasonal Naïve is a fully defensible alternative. For a capital allocation decision, where a directional error carries material financial consequences, Auto ARIMA's superior fit, clean residuals, and lead in TSCV fold wins make it the stronger choice.
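Part of Seasonal Naïve's maintainability is that it has no parameters to estimate: each forecast repeats the value observed in the same month of the most recent year, ŷ(T+h) = y(T + h − m(k+1)) with k = ⌊(h − 1)/m⌋ and m = 12 for monthly data. The Python sketch below shows the entire method; the function name and the toy history are illustrative.

```python
def seasonal_naive(y, horizon, period=12):
    """Forecast each future step with the value from the same season one cycle earlier.

    y is the history in time order; period=12 assumes monthly data with a yearly cycle.
    """
    if len(y) < period:
        raise ValueError("need at least one full seasonal cycle of history")
    last_cycle = y[-period:]
    # Step i (0-based) repeats the observation at the same position in the last cycle.
    return [last_cycle[i % period] for i in range(horizon)]


history = list(range(24))  # hypothetical 24 months of data
print(seasonal_naive(history, horizon=14))
# [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 12, 13]
```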