Icon

13_​Training_​a_​Churn_​Predictor_​Solution

Training a Churn Predictor - Solution

Solution to exercise 13 for the KNIME Analytics Platform for Data Wranglers course
- Join data from different sources
- Apply color formatting using the Color Manager node
- Create a train and test set partitioning the data
- Train a decision tree model and evaluate the performance
- Calculate feature using the Math Formula node
- Group data into bins
- Pivot and visualize data


Data ReadingRead the following data tables from the datafolder and check the output tables to get familiarwith the data CallsData.xlsContractData.csvHint: Activate the checkbox "Table containscolumn headers ..." for the CallsData.xls table Pre-Processing1. Join the two data tables based on the columns "Area Code" and"Phone" using a Joiner node2. Change the data type of the columns Churn and Area Code frominteger to string using the Number To String node3. Add a color to each row based on the column churn with the ColorManager node Model Training and Evaluation1. Create a training and a test set with Partitioning nodeRecommended settings:- Select "Relative" and set it to 80 %- Select "Stratified sampling" and use the column "Churn"2. Train a decision tree with the Decision Tree Learner node to predict the which customers are likely tochurn by using the training set.Recommended settings:- Connect the upper output port of the Partitioning node with the Learner node- Select the column "Churn" as Class Column3. Apply the trained decision tree model to the test set (lower output port of the Partitioning node) usingthe decisoin tree predictor node.4. Evalutate the performance of the trained model using the Scorer node. Optional: Which customers are happier with their contract? Frequent phone usersor infrequent users?1. Use the Math Formula node to calculate the total number of minutesRecommended settings: use the following expression in the Math Formula node and append a newcolumn with the column name Mins Total$Day Mins$+$Eve Mins$+$Night Mins$+$Intl Mins$2. Use the Auto-Binner node to create 10 Bins Recommended settings:Include only the column “Min Total”, set the Number of bins to 10, and select Midpoints for the BinNaming option3. Use the Pivoting node to find out how many customers in a bin churned.Recommended settings:- Groups: Include Min Total [Binned]- Pivots: Include Churn- Manuel Aggregation: Any Column with aggregation method count4. Plot your results in a percentage area chart using the Stacked Area Chart nodeRecommended settings:- Column for x-axis: Mins Total [Binned]- Include all columns- Select Percentage-Area-Chart in the second tab (General Plot Options) based ofArea Code and Phonearea codeand churn ->StringColor by Churn80%vs. 20%class = Churn Calculate totalnumber minutesCreate 10 binsNumber of churning andnon churning customersfor each binVisualize resultsContractDataCalls DataEvaluate the modelJoiner Number To String Color Manager Partitioning DecisionTree Learner Decision TreePredictor Math Formula Auto-Binner Pivoting Stacked Area Chart CSV Reader Excel Reader Scorer Data ReadingRead the following data tables from the datafolder and check the output tables to get familiarwith the data CallsData.xlsContractData.csvHint: Activate the checkbox "Table containscolumn headers ..." for the CallsData.xls table Pre-Processing1. Join the two data tables based on the columns "Area Code" and"Phone" using a Joiner node2. Change the data type of the columns Churn and Area Code frominteger to string using the Number To String node3. Add a color to each row based on the column churn with the ColorManager node Model Training and Evaluation1. Create a training and a test set with Partitioning nodeRecommended settings:- Select "Relative" and set it to 80 %- Select "Stratified sampling" and use the column "Churn"2. Train a decision tree with the Decision Tree Learner node to predict the which customers are likely tochurn by using the training set.Recommended settings:- Connect the upper output port of the Partitioning node with the Learner node- Select the column "Churn" as Class Column3. Apply the trained decision tree model to the test set (lower output port of the Partitioning node) usingthe decisoin tree predictor node.4. Evalutate the performance of the trained model using the Scorer node. Optional: Which customers are happier with their contract? Frequent phone usersor infrequent users?1. Use the Math Formula node to calculate the total number of minutesRecommended settings: use the following expression in the Math Formula node and append a newcolumn with the column name Mins Total$Day Mins$+$Eve Mins$+$Night Mins$+$Intl Mins$2. Use the Auto-Binner node to create 10 Bins Recommended settings:Include only the column “Min Total”, set the Number of bins to 10, and select Midpoints for the BinNaming option3. Use the Pivoting node to find out how many customers in a bin churned.Recommended settings:- Groups: Include Min Total [Binned]- Pivots: Include Churn- Manuel Aggregation: Any Column with aggregation method count4. Plot your results in a percentage area chart using the Stacked Area Chart nodeRecommended settings:- Column for x-axis: Mins Total [Binned]- Include all columns- Select Percentage-Area-Chart in the second tab (General Plot Options) based ofArea Code and Phonearea codeand churn ->StringColor by Churn80%vs. 20%class = Churn Calculate totalnumber minutesCreate 10 binsNumber of churning andnon churning customersfor each binVisualize resultsContractDataCalls DataEvaluate the modelJoiner Number To String Color Manager Partitioning DecisionTree Learner Decision TreePredictor Math Formula Auto-Binner Pivoting Stacked Area Chart CSV Reader Excel Reader Scorer

Nodes

Extensions

Links