
Vidar Vanilla system

Detached houses

Loading from CSV files

Merge and column addition

Plot_YearQuarter: 2006Q1=2006,0; 2006Q2=2006,25...
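The Plot_YearQuarter mapping above can be sketched in a Python Script node as follows. This is a minimal sketch, not the workflow's actual expression: the function name `plot_year_quarter` is hypothetical, and it assumes labels of the form `2006Q1`. The decimal comma in `2006,25` is written as a decimal point in Python.

```python
def plot_year_quarter(label: str) -> float:
    """Convert a label such as '2006Q2' to a decimal year (2006.25)."""
    year_text, quarter_text = label.upper().split("Q")
    quarter = int(quarter_text)
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1-4, got {quarter}")
    # Q1 maps to .0, Q2 to .25, Q3 to .5 and Q4 to .75
    return int(year_text) + (quarter - 1) / 4


print(plot_year_quarter("2006Q1"))  # 2006.0
print(plot_year_quarter("2006Q2"))  # 2006.25
```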

Splitting data into historical and out-of-sample data

Procedure

  • Update the exogenous and endogenous data

  • Specify the sample and forecast datasets (introduce variable)

  • The forecast period must be specified in the SARIMAX node (introduce variable)

  • Run the models and export the results to Excel

  • Remember that the exported time series consists entirely of model output, so the periods for which historical data are available must be overwritten with the actual values
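The last step of the procedure can be sketched as below. This is an illustration only: the function name `overwrite_with_history`, the period labels, and all numbers are hypothetical, and the real workflow does this step in Excel or in KNIME nodes rather than with this code.

```python
def overwrite_with_history(model_output, history):
    """Return the model output with observed values substituted where available."""
    merged = {}
    for period, predicted in model_output.items():
        observed = history.get(period)
        # Keep the model value only where no historical observation exists
        merged[period] = predicted if observed is None else observed
    return merged


# Hypothetical numbers: two observed quarters, two forecast quarters
model_output = {"2024Q3": 101.2, "2024Q4": 98.7, "2025Q1": 103.4, "2025Q2": 105.0}
history = {"2024Q3": 100.0, "2024Q4": 97.0}

final_series = overwrite_with_history(model_output, history)
print(final_series)
```

After this step, only the out-of-sample periods still carry model values.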

Summary of Current Model Dynamics

  • LRA (Long-Run Association):

    • Mechanism: Captures the Fundamental Equilibrium.

    • Role: Defines the structural relationship and long-term anchor. It identifies the "steady state" of housing starts based on macroeconomic fundamentals, ensuring the forecast doesn't drift unrealistically far from economic reality over time.

  • PRA (Polynomial Regression Analysis):

    • Mechanism: Captures Level-Dependent Sensitivity & Bounds.

    • Role: By including squared terms ($x^2$), it models how the impact of a variable (e.g., interest rates) changes depending on its current level. It captures tipping points and non-linear elasticity, allowing for "lower" or "higher" bounds where the market’s reaction to further changes either accelerates or plateaus.

    • Practical implementation: In KNIME you cannot remove individual polynomial orders for selected variables only. Instead, you need to identify the significant orders of the variables and include them in an LRA node.

  • ARIMAX:

    • Mechanism: Captures Autocorrelation & Stochastic Shocks.

    • Role: Focuses on the "memory" of the time series. It picks up short-term momentum (clusters of starts) and models how random shocks or past forecasting errors dissipate over time, cleaning up the residual noise left by the structural models.

    • Practical implementation: In KNIME you cannot include more than one exogenous variable in the SARIMAX node. Also, the SARIMAX node does not return p-values for the included variables. You therefore have to include Python scripts as well. However, if the dynamics can be specified in the SARIMAX node, use it, as this makes maintenance and sharing easier.
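The level-dependent sensitivity described under PRA can be made concrete with a small sketch. For a quadratic term $y = \beta_0 + \beta_1 x + \beta_2 x^2$, the marginal effect of $x$ is $\beta_1 + 2\beta_2 x$, and the effect changes sign at $x = -\beta_1 / (2\beta_2)$. The coefficient values and function names below are hypothetical and are not estimates from this workflow.

```python
def marginal_effect(beta_1, beta_2, x):
    """Slope of y = b0 + b1*x + b2*x^2 with respect to x, evaluated at level x."""
    return beta_1 + 2 * beta_2 * x


def turning_point(beta_1, beta_2):
    """Level of x at which the marginal effect is zero (the tipping point)."""
    if beta_2 == 0:
        raise ValueError("no turning point without a quadratic term")
    return -beta_1 / (2 * beta_2)


# Hypothetical coefficients for an interest-rate variable (in percent)
beta_1, beta_2 = -8.0, 0.5

for rate in (2.0, 5.0, 8.0):
    print(rate, marginal_effect(beta_1, beta_2, rate))  # -6.0, -3.0, 0.0

print(turning_point(beta_1, beta_2))  # 8.0
```

With these illustrative values, a rate increase hurts most at low rate levels and the effect fades out as the rate approaches 8 percent, which is the kind of plateau the PRA is meant to pick up.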


Potential Future Improvements

To move beyond the current setup and capture even deeper market dynamics, you could consider:

  1. Introduce variables to make it easy to define the sample and to increase consistency

  2. Regime-Switching (Markov Switching):

    • Why: To account for different "states of the world" (e.g., a "Credit Crunch" regime vs. a "Normal" regime). The relationship between your variables and housing starts might change entirely during a financial crisis, which a single polynomial equation cannot fully capture.

Assumptions and tests


To ensure your coefficients are reliable (consistent) and that your standard errors allow for valid inference (t-tests, p-values), the "only" truly indispensable requirement across all these models is that your residuals (errors) must not be correlated with your predictors or with themselves.

A. Linear Regression

  • Best for: Modeling direct, additive relationships.

  • Key Assumptions:

    1. Linearity: Relationship between X and Y is a straight line.

    2. Homoscedasticity: Error variance is constant across all X. (The residual plot should look like "snow", i.e. a random scatter with no visible pattern.)

    3. Independence: Observations are not related (no serial correlation).

  • Critical Tests:

    • Breusch-Pagan / White Test: For Homoscedasticity.

    • Durbin-Watson: For Autocorrelation (Independence).

    • VIF (Variance Inflation Factor): For Multicollinearity.
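As an illustration of the independence check, the Durbin-Watson statistic can be computed directly from the residuals. This is a plain-Python sketch of the standard formula, not the node or script used in the workflow. Values near 2 suggest no first-order autocorrelation, values towards 0 suggest positive autocorrelation, and values towards 4 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared first differences over sum of squares."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Residuals that alternate in sign push the statistic towards 4
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
# Residuals that never change push it to 0
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # 0.0
```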

B. Polynomial Regression

  • Best for: Capturing curves ($Y = \beta_0 + \beta_1 X + \beta_2 X^2 + ...$).

  • Key Assumptions:

    1. Correct Specification: You must choose the right "degree" (e.g., quadratic vs. cubic).

    2. Hierarchy Principle: If you include $X^2$, you must also include $X$.

    3. No Ill-Conditioning: High-degree polynomials create massive multicollinearity.

  • Critical Tests:

    • ANOVA (F-test): To compare if a higher-degree model (e.g., degree 3) is significantly better than a lower one (degree 2).

    • T-test on Highest Order Term: To see if the "curve" is statistically significant.

    • Residual vs. Fitted Plot: To see if a "bow" shape remains (indicating you need a higher degree).
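The F-test for comparing a lower-degree and a higher-degree model can be sketched from the two residual sums of squares. The function name and the numbers are hypothetical. The resulting statistic is compared with an F distribution with `n_restrictions` and `n_obs - n_params_full` degrees of freedom.

```python
def nested_f_statistic(rss_restricted, rss_full, n_obs, n_params_full, n_restrictions):
    """F statistic for testing a restricted (lower-degree) model against a full one."""
    df_resid = n_obs - n_params_full
    if df_resid <= 0:
        raise ValueError("not enough observations for the full model")
    # Improvement in fit per dropped term, relative to the full model's error variance
    return ((rss_restricted - rss_full) / n_restrictions) / (rss_full / df_resid)


# Hypothetical example: degree 2 (restricted) vs. degree 3 (full, 4 parameters)
f_stat = nested_f_statistic(
    rss_restricted=120.0, rss_full=100.0, n_obs=54, n_params_full=4, n_restrictions=1
)
print(f_stat)  # 10.0
```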

C. ARIMAX (ARIMA + Exogenous Variables)

  • Best for: Time-series data influenced by external factors.

  • Key Assumptions:

    1. Stationarity: The mean and variance must not change over time.

    2. White Noise Residuals: The model should leave no "patterns" behind in the errors.

    3. No Spurious Correlation: Two trended variables might look related but aren't (Unit Roots).

  • Critical Tests:

    • ADF (Augmented Dickey-Fuller): To check for Stationarity (Unit Roots).

    • Ljung-Box Test: To ensure residuals are "White Noise" (no remaining autocorrelation).

    • ACF/PACF Plots: To identify the correct $p$ and $q$ lags.
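The Ljung-Box check can be sketched in plain Python as $Q = n(n+2)\sum_{k=1}^{h} \hat{\rho}_k^2 / (n-k)$, where $\hat{\rho}_k$ is the sample autocorrelation of the residuals at lag $k$. This is an illustrative implementation with hypothetical function names, not the test used in the workflow. Under the null of white noise, $Q$ is compared with a chi-square distribution (for example, the 5% critical value with one degree of freedom is 3.84).

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    numerator = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Strongly patterned residuals: alternating signs, clearly not white noise
residuals = [1.0, -1.0] * 10
q = ljung_box_q(residuals, max_lag=1)
print(round(q, 2))  # 20.9, far above 3.84
```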

Extract

Transform

In- and out-of-sample forecast

Flats

Row-houses

Holiday houses

Garage

Residential Buildings

Merge Predictions

Industrial buildings

Storage buildings

Office buildings

Transport and communication

Commercial buildings

Hotel and restaurants

Agriculture buildings

Education buildings

Health buildings

Other buildings

NLP layer 1: research expectations for market drivers

Export to Excel

Detached houses + row chained

Workflow nodes (annotation: node type; each distinct node listed once)

  • Training: Column Filter
  • Complete: Column Filter
  • LRA
  • PRA
  • SARIMAX
  • Functional form test, correlations, etc.: Extra info
  • Joiner
  • Mass join
  • Insert prediction of houses: Joiner
  • Line Plot
  • Bar Chart
  • Driver effect
  • Add dummy: Expression
  • PlotDate creation for visuals and Index column for order: Expression
  • Overwrite NaN predictions: Expression
  • Expression
  • Anthropic Authenticator
  • Choose model: Anthropic LLM Selector
  • LLM Prompter
  • Select variables of interest: Table Creator
  • Row Filter
  • Full sample: Row Filter
  • Historical sample: Row Filter
  • Redefining sample without missing variables: Row Filter
  • Test to see the contribution of ARIMAX: Python ARIMAX test
  • Test to see if models share bias: Residual plot
  • Exogenous data: Excel Reader
  • Endogenous data: Excel Reader
  • Excel Writer
  • Detrend stock variables: Python Script
  • SA: Python Script
  • First difference: Python Script
