Icon

13_​Training_​a_​Churn_​Predictor_​Solution

Training a Churn Predictor - Solution

Solution to exercise 13 for the KNIME Analytics Platform for Data Wranglers course
- Join data from different sources
- Apply color formatting using the Color Manager node
- Create a train and test set partitioning the data
- Train a decision tree model and evaluate the performance
- Calculate feature using the Math Formula node
- Group data into bins
- Pivot and visualize data


Data ReadingRead the following data tables fromthe data folder and check the outputtables to get familiar with the data CallsData.xlsContractData.csvHint: Activate the checkbox "Tablecontains column headers ..." for theCallsData.xls table Pre-Processing1. Join the two data tables based on the columns"Area Code" and "Phone" using a Joiner node2. Change the data type of the columns Churn andArea Code from integer to string using the NumberTo String node3. Add a color to each row based on the columnchurn with the Color Manager node Model Training and Evaluation1. Create a training and a test set with Partitioning nodeRecommended settings:- Select "Relative" and set it to 80 %- Select "Stratified sampling" and use the column "Churn"2. Train a decision tree with the Decision Tree Learner node to predict thewhich customers are likely to churn by using the training set.Recommended settings:- Connect the upper output port of the Partitioning node with the Learner node- Select the column "Churn" as Class Column3. Apply the trained decision tree model to the test set (lower output port of thePartitioning node) using the decisoin tree predictor node.4. Evalutate the performance of the trained model using the Scorer node. Optional: Which customers are happier with their contract?Frequent phone users or infrequent users?1. Use the Math Formula node to calculate the total number of minutesRecommended settings: use the following expression in the Math Formulanode and append a new column with the column name Mins Total$Day Mins$+$Eve Mins$+$Night Mins$+$Intl Mins$2. Use the Auto-Binner node to create 10 Bins Recommended settings:Include only the column “Min Total”, set the Number of bins to 10, and selectMidpoints for the Bin Naming option3. Use the Pivoting node to find out how many customers in a bin churned.Recommended settings:- Groups: Include Min Total [Binned]- Pivots: Include Churn- Manuel Aggregation: Any Column with aggregation method count4. Plot your results in a percentage area chart using the Stacked Area ChartnodeRecommended settings:- Column for x-axis: Mins Total [Binned]- Include all columns- Select Percentage-Area-Chart in the second tab (General Plot Options) area codeand churn ->StringColor by Churn80%vs. 20%class = ChurnCalculate totalnumber minutesCreate 10 binsNumber of churning andnon churning customersfor each binVisualize resultsContractDataCalls DataEvaluate the modelbased onArea Code andPhone Number To String Color Manager Partitioning DecisionTree Learner Decision TreePredictor Math Formula Auto-Binner Pivoting Stacked Area Chart CSV Reader Excel Reader Scorer Joiner Data ReadingRead the following data tables fromthe data folder and check the outputtables to get familiar with the data CallsData.xlsContractData.csvHint: Activate the checkbox "Tablecontains column headers ..." for theCallsData.xls table Pre-Processing1. Join the two data tables based on the columns"Area Code" and "Phone" using a Joiner node2. Change the data type of the columns Churn andArea Code from integer to string using the NumberTo String node3. Add a color to each row based on the columnchurn with the Color Manager node Model Training and Evaluation1. Create a training and a test set with Partitioning nodeRecommended settings:- Select "Relative" and set it to 80 %- Select "Stratified sampling" and use the column "Churn"2. Train a decision tree with the Decision Tree Learner node to predict thewhich customers are likely to churn by using the training set.Recommended settings:- Connect the upper output port of the Partitioning node with the Learner node- Select the column "Churn" as Class Column3. Apply the trained decision tree model to the test set (lower output port of thePartitioning node) using the decisoin tree predictor node.4. Evalutate the performance of the trained model using the Scorer node. Optional: Which customers are happier with their contract?Frequent phone users or infrequent users?1. Use the Math Formula node to calculate the total number of minutesRecommended settings: use the following expression in the Math Formulanode and append a new column with the column name Mins Total$Day Mins$+$Eve Mins$+$Night Mins$+$Intl Mins$2. Use the Auto-Binner node to create 10 Bins Recommended settings:Include only the column “Min Total”, set the Number of bins to 10, and selectMidpoints for the Bin Naming option3. Use the Pivoting node to find out how many customers in a bin churned.Recommended settings:- Groups: Include Min Total [Binned]- Pivots: Include Churn- Manuel Aggregation: Any Column with aggregation method count4. Plot your results in a percentage area chart using the Stacked Area ChartnodeRecommended settings:- Column for x-axis: Mins Total [Binned]- Include all columns- Select Percentage-Area-Chart in the second tab (General Plot Options) area codeand churn ->StringColor by Churn80%vs. 20%class = ChurnCalculate totalnumber minutesCreate 10 binsNumber of churning andnon churning customersfor each binVisualize resultsContractDataCalls DataEvaluate the modelbased onArea Code andPhone Number To String Color Manager Partitioning DecisionTree Learner Decision TreePredictor Math Formula Auto-Binner Pivoting Stacked Area Chart CSV Reader Excel Reader Scorer Joiner

Nodes

Extensions

Links