01_Training_ChurnPredictor

Exercise to train and optimize classifiers (Decision Tree and Random Forest) for churn prediction.

Activity I: Decision Tree
1. Read the CallsData.xls and the ContractData.csv files
2. Join (inner join) the calls and contract files by Area Code and Phone
3. Convert Churn from number to string and assign colors to rows
4. Partition the dataset into training and test sets with stratified sampling
5. Handle class imbalance using down-sampling or over-sampling
6. Train and apply a Decision Tree using the Learner-Predictor nodes
7. Score model performance by computing a confusion matrix, class and overall statistics, and the ROC curve

Activity II: Random Forest
1. Read the CallsData.xls and the ContractData.csv files
2. Join (inner join) the calls and contract files by Area Code and Phone
3. Convert Churn from number to string and assign colors to rows
4. Partition the dataset into training and test sets with stratified sampling
5. Handle class imbalance using down-sampling or over-sampling
6. Train and apply a Random Forest using the Learner-Predictor nodes
7. Score model performance and compare it with that of the Decision Tree using the Binary Classification Inspector node
8. Save the best model

Activity III: Hyperparameter Optimization
1. Repeat the steps of Activities I and II to read, join, convert, partition and over-sample the dataset
2. Optimize the Random Forest using the Parameter Optimization Loop nodes to maximise accuracy
   a. Optimize the number of models (= trees)
   b. Optimize tree depth
3. Train and apply the model with the optimized parameters
4. Save the optimized model
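The data preparation steps shared by all three activities (join, convert Churn, stratified partitioning, class balancing) are done with KNIME nodes in the workflow. The Python sketch below only illustrates what those steps compute; it is not the workflow's implementation. It assumes rows are plain dicts and that both files have already been read; the key columns "Area Code", "Phone" and "Churn" come from the exercise, while "Day Mins" and all toy values are hypothetical.

```python
import random
from collections import defaultdict

def group_by(rows, column):
    """Group rows (dicts) by the value in one column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return groups

def inner_join(left, right, keys):
    """Inner join: keep only rows whose key values appear in both tables."""
    index = defaultdict(list)
    for row in right:
        index[tuple(row[k] for k in keys)].append(row)
    return [{**row, **match}
            for row in left
            for match in index.get(tuple(row[k] for k in keys), [])]

def churn_to_string(rows, column="Churn"):
    """Turn the numeric churn flag (0/1) into a string class label."""
    return [{**row, column: str(row[column])} for row in rows]

def stratified_split(rows, target, train_fraction, seed=0):
    """Shuffle each class separately so train and test keep the class proportions."""
    rng = random.Random(seed)
    train, test = [], []
    for members in group_by(rows, target).values():
        members = members[:]
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train += members[:cut]
        test += members[cut:]
    return train, test

def oversample(rows, target, seed=0):
    """Duplicate random minority-class rows until every class matches the largest."""
    rng = random.Random(seed)
    groups = group_by(rows, target)
    largest = max(len(m) for m in groups.values())
    balanced = []
    for members in groups.values():
        balanced += members
        balanced += [rng.choice(members) for _ in range(largest - len(members))]
    return balanced

def downsample(rows, target, seed=0):
    """Keep a random subset of each class, as large as the smallest class."""
    rng = random.Random(seed)
    groups = group_by(rows, target)
    smallest = min(len(m) for m in groups.values())
    return [row for members in groups.values() for row in rng.sample(members, smallest)]

# Toy tables (hypothetical values): 20 customers, every 5th one churned.
calls = [{"Area Code": 415, "Phone": f"555-{i:04d}", "Day Mins": 100.0 + i} for i in range(20)]
contract = [{"Area Code": 415, "Phone": f"555-{i:04d}", "Churn": int(i % 5 == 0)} for i in range(20)]
contract.append({"Area Code": 510, "Phone": "555-9999", "Churn": 0})  # no matching call record

data = churn_to_string(inner_join(calls, contract, ["Area Code", "Phone"]))
train, test = stratified_split(data, "Churn", train_fraction=0.75)
train_over = oversample(train, "Churn")   # 12 rows per class
train_down = downsample(train, "Churn")   # 3 rows per class
```

Balancing is applied only to the training partition, after the split, so duplicated or dropped rows never affect the test set used for scoring.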
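The scoring step names a confusion matrix, class and overall statistics, and the ROC curve. As a hedged illustration of what these quantities mean, the sketch below computes them with common textbook definitions; the exact set of statistics reported by the KNIME scoring nodes may differ. The predictions and churn probabilities are made-up values, not results from this workflow, and "1" is assumed to be the churn class.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows = actual class, columns = predicted class, in the order of `labels`."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

def accuracy(actual, predicted):
    """Overall statistic: share of correctly classified rows."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def class_statistics(actual, predicted, positive):
    """Per-class counts and rates, treating `positive` as the class of interest."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn, "precision": precision,
            "recall": recall, "specificity": specificity, "F1": f1}

def roc_curve(actual, scores, positive):
    """(false positive rate, true positive rate) points, sweeping the threshold downward."""
    n_pos = sum(a == positive for a in actual)
    n_neg = len(actual) - n_pos
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for i, (score, label) in enumerate(ranked):
        if label == positive:
            tp += 1
        else:
            fp += 1
        # Tied scores share one threshold, so emit a point only after the last tie.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical predictions on a tiny test set; scores are P(Churn = "1").
actual    = ["1", "0", "1", "0", "0", "1", "0", "0"]
predicted = ["1", "0", "0", "0", "1", "1", "0", "0"]
scores    = [0.9, 0.2, 0.4, 0.1, 0.6, 0.8, 0.3, 0.05]

cm = confusion_matrix(actual, predicted, labels=["0", "1"])  # [[4, 1], [1, 2]]
acc = accuracy(actual, predicted)                             # 0.75
stats = class_statistics(actual, predicted, positive="1")
area = auc(roc_curve(actual, scores, positive="1"))           # 14/15, about 0.933
```

The AUC summarizes the whole ROC curve in one number, which makes it a convenient single figure when comparing the Decision Tree and the Random Forest on the same test set.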
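The hyperparameter optimization in Activity III tries combinations of the number of trees and the tree depth and keeps the one with the highest accuracy. The sketch below shows that idea as an exhaustive grid search, which is only one possible search strategy. The parameter names `n_trees` and `max_depth`, the ranges, and `toy_accuracy` are hypothetical; `toy_accuracy` is an invented accuracy surface used only to make the loop runnable, not a result from the churn data.

```python
from itertools import product

def inclusive_range(start, stop, step):
    """Parameter values from start to stop (inclusive) in fixed steps."""
    return list(range(start, stop + 1, step))

def grid_search(evaluate, grid):
    """Try every combination in `grid`; return (best_params, best_score).

    Ties keep the first combination found, because only a strictly higher
    score replaces the current best.
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_accuracy(n_trees, max_depth):
    """Hypothetical stand-in for 'train a Random Forest, return its accuracy'."""
    return 0.95 - abs(n_trees - 100) / 10000 - abs(max_depth - 10) / 100

grid = {"n_trees": inclusive_range(50, 200, 50), "max_depth": inclusive_range(5, 20, 5)}
best, score = grid_search(toy_accuracy, grid)  # {'n_trees': 100, 'max_depth': 10}, 0.95
```

In practice `evaluate` would train the model on the training partition and measure accuracy on held-out data, ideally a validation split kept separate from the final test set, so that the reported test accuracy is not inflated by the search itself.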
