Find optimal model with cross validation

This workflow illustrates how to select an optimal value for a model parameter using cross-validation.

The workflow reads in the dataset and tries to find the optimal polynomial regression model (i.e., the polynomial with the minimal test error). We check degrees from 1 to 10 using the "Parameter Optimization Loop". For each degree, we estimate the test error using cross-validation.

We plot the estimated test error against the degree. We also identify the best degree (i.e., the one that minimizes the estimated test error) and use all the data to train a polynomial of that degree.

Cross-Validation for Model Selection

This sequence reads in the dataset, splits it into training and test sets multiple times (cross-validation), trains a polynomial regression model on each training set, predicts outcomes on the corresponding test set, and then evaluates prediction accuracy. By repeating this process for different model complexities, it helps identify which model best balances fit and generalization, reducing the risk of overfitting.

Find the Best Model Settings with Cross-Validation

This sequence automatically tests different values of the model parameter (the polynomial degree) to find the best setup for predicting your target. For each degree, the data is split into training and test sets multiple times (cross-validation). A polynomial regression model is trained on each training set, then used to predict the corresponding test set. The prediction error is measured and combined across the splits. After trying all degrees, the workflow identifies the one that gives the most accurate and reliable predictions, helping you avoid overfitting and choose the optimal model.
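The outer loop described above can be sketched in plain Python as a brute-force search over the degrees. This is only an illustration, not the workflow's implementation: `optimize_degree` and `toy_error` are hypothetical names, and `toy_error` is a made-up stand-in for the cross-validated error estimate. The range 1 to 10 is taken from the workflow description.

```python
def optimize_degree(estimate_test_error, degrees=range(1, 11)):
    """Evaluate every candidate degree and return (best_degree, best_error, results).

    `estimate_test_error` is any callable mapping a degree to an error estimate,
    for example a cross-validated error. `results` lists (degree, error) pairs.
    """
    results = [(d, estimate_test_error(d)) for d in degrees]
    # min() returns the first minimum, so ties go to the lowest (simplest) degree.
    best_degree, best_error = min(results, key=lambda row: row[1])
    return best_degree, best_error, results


# Hypothetical stand-in for the cross-validated error: smallest at degree 3.
def toy_error(degree):
    return (degree - 3) ** 2 + 1.0


best_degree, best_error, results = optimize_degree(toy_error)
print(best_degree, best_error)
```

Passing the error estimate as a callable keeps the search separate from the scoring, which mirrors how the loop start and loop end nodes wrap the cross-validation section.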

Cross-Validation Scoring Sequence

The data is repeatedly split into training and test sets. For each split, a polynomial regression model is trained on the training data, then used to predict values for the test data. The predicted results are compared to the actual values to measure prediction accuracy. This process is repeated across all splits, and the results are combined to give an overall estimate of how well the model performs on unseen data.
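The scoring sequence above can be sketched with the standard library alone. Everything named here is an assumption for illustration: the helpers `fit_polynomial`, `predict`, and `cross_validated_rmse`, the choice of 5 folds, RMSE as the error measure, and the synthetic data. The source does not state which settings or which metric the workflow uses. The demo keeps x in [-1, 1] so that the simple normal-equations solver stays well conditioned.

```python
import random


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients, lowest power first."""
    n = degree + 1
    # Normal equations (V^T V) c = V^T y, where V[i][j] = xs[i] ** j.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        known = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - known) / a[i][i]
    return coeffs


def predict(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def cross_validated_rmse(xs, ys, degree, k=5, seed=0):
    """k-fold CV: every row is predicted once by a model that never saw it."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint test sets
    squared_errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_x = [xs[i] for i in idx if i not in held_out]
        train_y = [ys[i] for i in idx if i not in held_out]
        coeffs = fit_polynomial(train_x, train_y, degree)
        # Collect out-of-fold errors; they are scored together after the loop.
        squared_errors.extend((predict(coeffs, xs[i]) - ys[i]) ** 2 for i in test_idx)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


# Synthetic demo data (an assumption): a noisy cubic on [-1, 1].
rng = random.Random(1)
xs = [rng.uniform(-1.0, 1.0) for _ in range(60)]
ys = [2.0 - x + 4.0 * x**3 + rng.gauss(0.0, 0.1) for x in xs]

rmse_linear = cross_validated_rmse(xs, ys, degree=1)
rmse_cubic = cross_validated_rmse(xs, ys, degree=3)
print(f"degree 1: {rmse_linear:.3f}, degree 3: {rmse_cubic:.3f}")
```

Pooling the out-of-fold predictions and scoring them once is one reasonable way to combine the splits; averaging a per-fold score is a common alternative and gives similar results when the folds are of equal size.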

Train and Visualize Final Model

After finding the best degree, the workflow uses it to train a polynomial regression model on the entire dataset. The resulting model is then visualized by plotting the regression line and displaying the model's coefficients and statistics, helping you understand how the model fits your data and how each polynomial term contributes to the fit.

Regression Line Plotter
Parameter Optimization Loop End
Polynomial Regression Learner
Numeric Scorer
Table View
Scatter Plot
Table Row to Variable
Polynomial Regression Learner
Read data. Target generated with g[x_] := 200 + 1000 x - 80 x^2 + x^3; plus noise
CSV Reader
X-Partitioner
X-Aggregator
Regression Predictor
Column Renamer
Parameter Optimization Loop Start
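
The "Read data" annotation above gives the target function in Wolfram-style notation. A minimal Python sketch of generating a comparable dataset follows. The sample size, x range, noise level, and seed are assumptions for illustration, since the source does not state them; `make_dataset` is a hypothetical helper.

```python
import random


def g(x):
    """Noise-free target from the annotation: 200 + 1000 x - 80 x^2 + x^3."""
    return 200 + 1000 * x - 80 * x**2 + x**3


def make_dataset(n=200, x_min=0.0, x_max=60.0, noise_sd=500.0, seed=0):
    """Sample n rows (x, g(x) + Gaussian noise). All defaults are assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(x_min, x_max)
        rows.append((x, g(x) + rng.gauss(0.0, noise_sd)))
    return rows


data = make_dataset()
print(len(data), "rows; first row:", data[0])
```

Because the true target is a cubic, a degree search on data like this would be expected to favor a degree near 3, although noise can shift the selected value.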
