Decision_Tree_Exercise

Decision Tree

Decision Tree: binary classification of house ranking (high/low rank).

- Create target column
- Filter unnecessary columns
- Split the dataset into train and test set
- Train and apply the model
- Evaluation
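The first three steps can be sketched outside KNIME in plain Python. This is only an illustration of the logic, not what the Rule Engine, Column Filter, or Partitioning nodes do internally; the helper names, the seed, and the sample values are assumptions made up for the example. Note that "Overall Cond" has to be dropped because the target is derived from it, so keeping it would leak the answer to the model.

```python
import random
from collections import defaultdict

def rank_from_condition(overall_cond):
    """Rule from the exercise: 'Low' if Overall Cond <= 5, otherwise 'High'."""
    return "Low" if overall_cond <= 5 else "High"

def drop_columns(row, unwanted):
    """Return a copy of the row without the unwanted columns."""
    return {key: value for key, value in row.items() if key not in unwanted}

def stratified_split(rows, label_key, train_fraction=0.7, seed=0):
    """Split rows so each class keeps roughly the same share in both parts."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable splits
    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test

# Made-up values standing in for the real AmesHousing.csv columns.
conditions = [3, 5, 5, 4, 5, 5, 2, 5, 4, 5, 6, 7, 8, 6, 9, 7, 6, 7, 8, 10]
rows = [{"PID": i, "Overall Cond": c} for i, c in enumerate(conditions)]

# 1. Create the target column.
for row in rows:
    row["rank"] = rank_from_condition(row["Overall Cond"])

# 2. Filter the columns the exercise asks to remove.
rows = [drop_columns(row, {"PID", "MS SubClass", "Overall Cond"}) for row in rows]

# 3. Stratified 70/30 split on the target column.
train, test = stratified_split(rows, "rank")
```

With ten "Low" and ten "High" rows in the sample, each class contributes seven rows to the training set and three to the test set, so both parts keep the original 50/50 class balance.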

Classification: decision tree

Exercise: Decision Tree

In this exercise we train a binary classification model that predicts the overall condition of a house (either high or low).

1) Read the AmesHousing.csv file (CSV Reader node). It describes the sale of individual residential properties in Ames, Iowa (USA).
   - One of the columns is the overall condition ranking, with values between 1 and 10.
2) Create a new column "rank" using the Rule Engine node.
   - A house has rank "Low" if the value of the attribute "Overall Cond" is <= 5. Otherwise it is ranked as "High".
3) Remove the following columns:
   • PID
   • MS SubClass
   • Overall Cond
4) Use a Partitioning node to split the data into a training set (70%) and a test set (30%).
   - Use stratified sampling based on the column "rank" to retain the distribution of the class values in both output tables.
5) Train a Decision Tree model to predict the overall condition of a house (high/low) (Decision Tree Learner node).
   - Select the "rank" column as the class column.
6) Use the trained model to predict the rank of the houses in the test set (Decision Tree Predictor node).
7) Evaluate the accuracy of the decision tree model (Scorer (JavaScript) node).
   - What is the accuracy of the model?
8) Visualize the ROC curve (ROC Curve node).
   - Make sure that the checkbox "append columns with normalized class distribution" in the Decision Tree Predictor node is activated.
   - Select "rank" as Class column and "High" as Positive class value. Include only the "P (rank=High)" column.
9) Try different setting options for the decision tree algorithm. Can you improve the model performance?
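To make the two evaluation steps concrete, here is a minimal sketch of how accuracy and a ROC curve can be computed from a predictor's output. It is not the implementation of the Scorer or ROC Curve nodes; the function names, the 0.5 cutoff, and the six scored rows are assumptions invented for illustration, with `scores` playing the role of the "P (rank=High)" column.

```python
def accuracy(actual, predicted):
    """Share of rows whose predicted class equals the actual class."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def roc_points(actual, scores, positive="High"):
    """(false positive rate, true positive rate) pairs, threshold swept high to low."""
    n_pos = sum(a == positive for a in actual)
    n_neg = len(actual) - n_pos
    ranked = sorted(zip(scores, actual), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Rows with the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Made-up test-set output: actual class and predicted probability of "High".
actual = ["High", "High", "Low", "High", "Low", "Low"]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

# Assumed cutoff of 0.5 to turn probabilities into class predictions.
predicted = ["High" if s >= 0.5 else "Low" for s in scores]

acc = accuracy(actual, predicted)
auc = area_under_curve(roc_points(actual, scores))
```

On this toy data five of the six rows are classified correctly, and eight of the nine (High, Low) pairs are ordered correctly by score, so the area under the curve is 8/9. Accuracy depends on the chosen cutoff, while the ROC curve summarizes behavior across all cutoffs, which is why the exercise asks for both.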
