Prediction with Spreadsheets
This workflow demonstrates how to train and evaluate a predictive model using spreadsheet data.
In this example, we access three datasets:
Biographical information about the athletes who participated in any of the Summer or Winter Olympics.
The athlete event results of the 1896-2020 Summer Olympic Games.
Additional information about the Summer Olympic Games.
The goal is to train a Gradient Boosted Trees model to predict whether an athlete will win a medal, based on features such as the sport, the athlete's home country, gender, and age, and whether the Games take place in the athlete's home country.
We start by preparing the data for model training: merging the datasets, removing missing values, creating the target variable ("Medal - Target Variable") along with two additional features, and filtering the data down to the columns relevant for model training. We then split the dataset into a training set and a test set, train the Gradient Boosted Trees model on the training set, and evaluate its performance on the test set using a confusion matrix and a ROC curve.
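For readers who think in code, the data preparation steps can be sketched roughly as follows. This is only an illustration in plain Python, not the workflow itself, which performs these steps with nodes. All column names and sample rows are hypothetical, and the sketch assumes that an empty medal field means the athlete did not win a medal. It derives an age and a home-game flag as example features; these may not match exactly how the workflow defines its two additional features.

```python
# Hypothetical sample rows; the real datasets have different columns and far more records.
athletes = [
    {"athlete_id": 1, "sex": "F", "born_year": 1990, "country": "USA"},
    {"athlete_id": 2, "sex": "M", "born_year": None, "country": "FRA"},
    {"athlete_id": 3, "sex": "M", "born_year": 1985, "country": "JPN"},
]
results = [
    {"athlete_id": 1, "edition": "2016 Summer", "sport": "Swimming", "medal": "Gold"},
    {"athlete_id": 2, "edition": "2016 Summer", "sport": "Fencing", "medal": None},
    {"athlete_id": 3, "edition": "2020 Summer", "sport": "Judo", "medal": None},
]
games = [
    {"edition": "2016 Summer", "year": 2016, "host_country": "BRA"},
    {"edition": "2020 Summer", "year": 2020, "host_country": "JPN"},
]


def prepare(athletes, results, games):
    """Merge the three tables, drop incomplete rows, and derive the target."""
    athletes_by_id = {a["athlete_id"]: a for a in athletes}
    games_by_edition = {g["edition"]: g for g in games}

    rows = []
    for result in results:
        athlete = athletes_by_id.get(result["athlete_id"])
        game = games_by_edition.get(result["edition"])
        # Inner join: skip results without a matching athlete or edition.
        if athlete is None or game is None:
            continue
        # Remove rows with missing values in the columns used as features.
        if athlete["born_year"] is None:
            continue
        # Keep only the columns relevant for training, plus the target.
        rows.append({
            "sport": result["sport"],
            "country": athlete["country"],
            "sex": athlete["sex"],
            # Approximate age: year of the Games minus year of birth.
            "age": game["year"] - athlete["born_year"],
            # True when the Games are hosted by the athlete's home country.
            "home_game": athlete["country"] == game["host_country"],
            # Target variable: True if any medal was won (assumption: empty = no medal).
            "medal_target": result["medal"] is not None,
        })
    return rows


prepared = prepare(athletes, results, games)
for row in prepared:
    print(row)
```

In this sample, the second athlete is dropped because the birth year is missing, which leaves two rows for the split into training and test sets.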
For a detailed overview of each node in this workflow, refer to the workflow description in the Info panel.
💡 To view each node's configuration, select the node and see the configuration pane on the right side of the workflow editor.