Icon

13_​Training_​a_​Churn_​Prediction_​Model_​Exercise

Training a Churn Predictor - Exercise

Exercise 13 for the KNIME Analytics Platform for Data Wranglers course
- Join data from different sources
- Apply color formatting using the Color Manager node
- Create a training and test set
- Train a decision tree model and evaluate performance
- Calculate feature using the Math Formula node
- Group data into bins
- Pivot and visualize data




Data ReadingRead the following data tables fromthe data folder and check the outputtables to get familiar with the data CallsData.xlsContractData.csvTip 1: Drag&drop the files from theKNIME Explorer to the canvas andcheck the preview in theconfiguration windowTip 2: Activate the checkbox "Tablecontains column headers ..." for theCallsData.xls table Pre-Processing1. Join the two data tables based on the columns"Area Code" and "Phone" using a Joiner node2. Change the data type of the columns Churn andArea Code from integer to string using the NumberTo String node3. Add a color to each row based on the columnchurn with the Color Manager node Model Training and Evaluation1. Create a training and a test set with Partitioning nodeRecommended settings:- Select "Relative" and set it to 80 %- Select "Stratified sampling" and use the column "Churn"2. Train a decision tree with the Decision Tree Learner node to predict thewhich customers are likely to churn by using the training set.Recommended settings:- Connect the upper output port of the Partitioning node with the Learner node- Select the column "Churn" as Class Column3. Apply the trained decision tree model to the test set (lower output port of thePartitioning node) using the decisoin tree predictor node.4. Evalutate the performance of the trained model using the Scorer node. Optional: Which customers are happier with their contract?Frequent phone users or infrequent users?1. Use the Math Formula node to calculate the total number of minutesRecommended settings: use the following expression in the Math Formulanode and append a new column with the column name Mins Total$Day Mins$+$Eve Mins$+$Night Mins$+$Intl Mins$2. Use the Auto-Binner node to create 10 Bins Recommended settings:Include only the column “Min Total”, set the Number of bins to 10, and selectMidpoints for the Bin Naming option3. Use the Pivoting node to find out how many customers in a bin churned.Recommended settings:- Groups: Include Min Total [Binned]- Pivots: Include Churn- Manuel Aggregation: Any Column with aggregation method count4. Plot your results in a percentage area chart using the Stacked Area ChartnodeRecommended settings:- Column for x-axis: Mins Total [Binned]- Include all columns- Select Percentage-Area-Chart in the second tab (General Plot Options) Data ReadingRead the following data tables fromthe data folder and check the outputtables to get familiar with the data CallsData.xlsContractData.csvTip 1: Drag&drop the files from theKNIME Explorer to the canvas andcheck the preview in theconfiguration windowTip 2: Activate the checkbox "Tablecontains column headers ..." for theCallsData.xls table Pre-Processing1. Join the two data tables based on the columns"Area Code" and "Phone" using a Joiner node2. Change the data type of the columns Churn andArea Code from integer to string using the NumberTo String node3. Add a color to each row based on the columnchurn with the Color Manager node Model Training and Evaluation1. Create a training and a test set with Partitioning nodeRecommended settings:- Select "Relative" and set it to 80 %- Select "Stratified sampling" and use the column "Churn"2. Train a decision tree with the Decision Tree Learner node to predict thewhich customers are likely to churn by using the training set.Recommended settings:- Connect the upper output port of the Partitioning node with the Learner node- Select the column "Churn" as Class Column3. Apply the trained decision tree model to the test set (lower output port of thePartitioning node) using the decisoin tree predictor node.4. Evalutate the performance of the trained model using the Scorer node. Optional: Which customers are happier with their contract?Frequent phone users or infrequent users?1. Use the Math Formula node to calculate the total number of minutesRecommended settings: use the following expression in the Math Formulanode and append a new column with the column name Mins Total$Day Mins$+$Eve Mins$+$Night Mins$+$Intl Mins$2. Use the Auto-Binner node to create 10 Bins Recommended settings:Include only the column “Min Total”, set the Number of bins to 10, and selectMidpoints for the Bin Naming option3. Use the Pivoting node to find out how many customers in a bin churned.Recommended settings:- Groups: Include Min Total [Binned]- Pivots: Include Churn- Manuel Aggregation: Any Column with aggregation method count4. Plot your results in a percentage area chart using the Stacked Area ChartnodeRecommended settings:- Column for x-axis: Mins Total [Binned]- Include all columns- Select Percentage-Area-Chart in the second tab (General Plot Options)

Nodes

  • No nodes found

Extensions

  • No modules found

Links