
01_Training_ChurnPredictor

Solution to train and optimize classifiers (Decision Tree and Random Forest) for churn prediction.

Activity I: Decision Tree
1. Read the CallsData.xls and the ContractData.csv
2. Join (inner join) the calls and contract files by Area Code and Phone
3. Convert Churn from number to string and assign colors to rows
4. Partition the dataset into training and test sets with stratified sampling
5. Handle class imbalance using down-sampling or over-sampling
6. Train and apply a Decision Tree using the Learner-Predictor nodes
7. Score model performance, computing a confusion matrix, class and overall statistics, and the ROC curve

Activity II: Random Forest
1. Read the CallsData.xls and the ContractData.csv
2. Join (inner join) the calls and contract files by Area Code and Phone
3. Convert Churn from number to string and assign colors to rows
4. Partition the dataset into training and test sets with stratified sampling
5. Handle class imbalance using down-sampling or over-sampling
6. Train and apply a Random Forest using the Learner-Predictor nodes
7. Score model performance and compare it with that of the Decision Tree using the Binary Classification Inspector node
8. Save the best model

Activity III: Hyperparameter Optimization
1. Repeat the steps of Activities I and II to read, join, convert, partition, and over-sample the dataset
2. Optimize the Random Forest using the Parameter Optimization Loop nodes to maximize accuracy
   a. Optimize the number of models (= trees)
   b. Optimize the tree depth
3. Train and apply the model with the optimized parameters
4. Save the optimized model

Workflow annotations (summary): churn is encoded as 1 and no churn as 0 in the contract data, and the "Churn" column is unbalanced. The calls and contract tables are inner-joined on the customer's Area Code and Phone, and the data is partitioned into 70% training and 30% test. The Decision Tree reaches roughly 88-89% accuracy, the Random Forest roughly 94%, and the optimized Random Forest roughly 95% with 215 trees and a tree depth of 16. The best and the optimized models are saved with the Model Writer node.
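For readers who want a code analogue of Activities I and II, the following is a minimal Python sketch, not the KNIME workflow itself. It assumes pandas, scikit-learn, and imbalanced-learn are installed, that CallsData.xls and ContractData.csv sit in the working directory, and that both files carry "Area Code" and "Phone" columns with a numeric "Churn" column in the contract data; the feature selection and model parameters are illustrative.

```python
import pandas as pd
from imblearn.over_sampling import SMOTE                      # over-sampling, like the SMOTE node
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1.-2. Read the two files and inner-join them on Area Code and Phone (Joiner node)
calls = pd.read_excel("CallsData.xls")        # .xls files need an Excel engine such as xlrd
contracts = pd.read_csv("ContractData.csv")
data = calls.merge(contracts, on=["Area Code", "Phone"], how="inner")

# 3. Convert Churn from number to string (Number To String node); row colors are a KNIME-only step
data["Churn"] = data["Churn"].astype(int).astype(str)         # "1" = churn, "0" = no churn

# 4. Stratified 70/30 partitioning (Partitioning node); only numeric columns are kept as features here
X = data.drop(columns=["Churn", "Area Code", "Phone"]).select_dtypes(include="number")
y = data["Churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=42)

# 5. Over-sample the minority class, on the training partition only (SMOTE node)
X_train, y_train = SMOTE(random_state=42).fit_resample(X_train, y_train)

# 6.-7. Train, apply, and score both classifiers (Learner/Predictor and Scorer nodes)
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    churn_col = list(model.classes_).index("1")               # predict_proba column of the churn class
    proba = model.predict_proba(X_test)[:, churn_col]
    print(f"--- {name} ---")
    print(confusion_matrix(y_test, pred))
    print(classification_report(y_test, pred))
    print("Accuracy:", round(accuracy_score(y_test, pred), 3))
    print("ROC AUC :", round(roc_auc_score((y_test == "1").astype(int), proba), 3))
```

As in the workflow, the over-sampling is applied only to the training partition, so no synthetic rows leak into the test set used for scoring.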
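Activity III replaces the fixed Random Forest settings with a parameter sweep. Below is a rough Python analogue of the Parameter Optimization Loop Start/End, Scorer, and Model Writer pattern, reusing X_train, y_train, X_test, and y_test from the previous sketch. The search ranges are illustrative (the workflow's own ranges are not listed; it reports 215 trees and depth 16 as the optimum), and scoring on the test partition mirrors the workflow, although a separate validation set or cross-validation would be the more careful choice.

```python
import joblib
from sklearn.ensemble import RandomForestClassifier

# Parameter Optimization Loop analogue: sweep the number of trees and the tree
# depth, and keep the combination with the highest accuracy on the test partition.
best = {"accuracy": -1.0}
for n_trees in range(25, 301, 25):            # number of models (= trees); illustrative range
    for depth in range(2, 21, 2):             # tree depth; illustrative range
        rf = RandomForestClassifier(n_estimators=n_trees, max_depth=depth, random_state=42)
        rf.fit(X_train, y_train)
        acc = rf.score(X_test, y_test)
        if acc > best["accuracy"]:
            best = {"accuracy": acc, "n_estimators": n_trees, "max_depth": depth, "model": rf}

print("Best number of trees:", best["n_estimators"])
print("Best tree depth     :", best["max_depth"])
print("Accuracy            :", round(best["accuracy"], 3))

# Model Writer analogue: persist the optimized model to disk
joblib.dump(best["model"], "churn_rf_optimized.joblib")
```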

Nodes

CSV Reader, Excel Reader, Data Explorer, Joiner, Number To String, Color Manager, Partitioning, SMOTE, Decision Tree Learner, Decision Tree Predictor, Random Forest Learner, Random Forest Predictor, Scorer (JavaScript), ROC Curve, Binary Classification Inspector, Scorer, Table Row to Variable, Parameter Optimization Loop Start, Parameter Optimization Loop End, Model Writer
