Decision_Tree

Decision Tree - exercise
Use Case Description

The dataset used in this exercise describes the sale of individual residential properties in Ames, Iowa, from 2006 to 2010. One of the columns is the overall condition ranking, with values between 1 and 10. The goal of this exercise is to train a binary classification model that predicts whether the overall condition is high or low. To do so, the workflow below reads the dataset and creates the class column, called rank, from the overall condition ranking: rank has the value low if the overall condition is less than or equal to 5, and high otherwise. It is now up to you to continue this workflow!

Exercise: Decision Tree

1) Use the Partitioning node to split the data into a training set (70%) and a test set (30%)
- Use stratified sampling based on the column rank to retain the distribution of the class values in both output tables
2) Train a Decision Tree model to predict the overall condition of a house (high/low) (Decision Tree Learner node)
- Select the "rank" column as the class column
3) Use the trained model to predict the rank of the houses in the test set (Decision Tree Predictor node)
4) Evaluate the accuracy of the decision tree model (Scorer (JavaScript) node)
- Select "rank" as the actual column and "Prediction (rank)" as the predicted column
- What is the accuracy of the model?
5) Visualize the ROC curve (ROC Curve node)
- Make sure that the checkbox "append columns with normalized class distribution" in the Decision Tree Predictor node is activated
- Select "rank" as Class column and "High" as Positive class value. Include only the "P (rank=High)" column
6) Optional: Try different settings for the decision tree algorithm. Can you improve the model performance?
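To make steps 1 and 4 more concrete outside of KNIME, here is a minimal plain-Python sketch of what the class column, a stratified 70/30 split, and the accuracy figure amount to. It is not how the KNIME nodes are implemented. The function names, the dictionary key overall_condition, the random seed, and the toy rankings are all hypothetical; only the "5 or less is low" rule and the 70/30 ratio come from the exercise text.

```python
import random
from collections import defaultdict


def make_rank(overall_condition):
    """Class label from the exercise text: low if the ranking is <= 5, else high."""
    return "low" if overall_condition <= 5 else "high"


def stratified_split(rows, label_of, train_fraction=0.7, seed=42):
    """Split rows so that each class keeps roughly the same share in both parts."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)

    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable splits
    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


def accuracy(actual, predicted):
    """Share of rows whose predicted class equals the actual class."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical toy rankings, not values taken from the Ames dataset.
conditions = [3, 4, 5, 5, 5, 5, 6, 6, 7, 7, 8, 5, 4, 6, 9, 5, 7, 5, 6, 5]
rows = [{"overall_condition": c, "rank": make_rank(c)} for c in conditions]
train, test = stratified_split(rows, lambda r: r["rank"])
```

Splitting each class separately is what keeps the low/high proportions similar in the training and test tables, which is the stated purpose of stratified sampling in step 1.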
[Workflow figure: Classification: Decision Tree. Nodes shown: CSV Reader (Read AmesHousing.csv), Extract Class Information]
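For step 5, it can help to see how a ROC curve is derived from the "P (rank=High)" probabilities. The following is a rough sketch in plain Python, not the ROC Curve node's implementation. The function names, the toy labels, and the toy scores are hypothetical, and the label spelling "High" follows the wording of step 5.

```python
def roc_points(actual, scores, positive="High"):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    pos = sum(1 for a in actual if a == positive)
    neg = len(actual) - pos

    # Rank rows from most to least confident for the positive class.
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Rows with the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve using the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical predictions: actual class and predicted probability of "High".
actual = ["High", "High", "Low", "High", "Low", "Low"]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(actual, scores)
area = auc(points)
```

Only the probability for the positive class is needed to build the curve, which is why the exercise asks to include only the "P (rank=High)" column.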
