
Predict behavior to retain customers

Model Training for "Predict behavior to retain customers"

About Dataset

Content: each row represents a customer, and each column contains a customer attribute described in the column metadata. The data set includes information about:
- Customers who left within the last month – the column is called Churn
- Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
- Customer account information – how long they have been a customer, contract, payment method, paperless billing, monthly charges, and total charges
- Demographic info about customers – gender, age range, and whether they have partners and dependents

The raw data contains 7043 rows (customers) and 21 columns (features).

Task: predict the "Churn" column; it is our target. Find more information about the dataset on Kaggle at https://www.kaggle.com/datasets/blastchar/telco-customer-churn?select=WA_Fn-UseC_-Telco-Customer-Churn.csv (Data, Discussions, Metadata).

Workflow steps

Data Reading: each row represents a customer, and each column contains a customer attribute described in the column metadata.

Graphical Properties: assign colors by Churn.

Data Partitioning: create two separate partitions from the original data set: a training set (80%) and a test set (20%).

Train a Model: this node builds a decision tree. Other Learner nodes train other models. Most Learner nodes output a PMML model (blue square output port).

Apply the Model: Predictor nodes apply a specific model to a data set and append the model predictions.

Score the Model: compute a confusion matrix between real and predicted class values, and calculate the related accuracy measures.
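The Partitioning, Learner, Predictor, and Scorer steps above can be sketched in scikit-learn. This is a minimal illustration, not the KNIME workflow itself: the three feature columns and the toy churn rule below are synthetic stand-ins for the pre-processed Telco data.

```python
# Sketch of Partitioning -> Learner -> Predictor -> Scorer using scikit-learn.
# The features and target rule are synthetic stand-ins for the Telco columns.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.integers(0, 72, n),    # stand-in for tenure (months)
    rng.uniform(20, 120, n),   # stand-in for MonthlyCharges
    rng.integers(0, 3, n),     # stand-in for a Contract category code
])
# Toy target: short-tenure, high-charge customers churn
y = ((X[:, 0] < 12) & (X[:, 1] > 70)).astype(int)

# Data Partitioning: 80% training set, 20% test set (random drawing)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)

# Train a Model: build a decision tree on the training set
tree = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X_train, y_train)

# Apply the Model: append predictions for the test set
y_pred = tree.predict(X_test)

# Score the Model: confusion matrix and accuracy on the test set
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
print(cm)
print(acc)
```

On the real dataset the same four calls apply unchanged once the CSV has been read and pre-processed.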
Pre-processing

Manually, we deleted rows with missing values in the column TotalCharges, replaced "No phone service" with "No", and replaced "No internet service" with "No". Then:
1. Replace Yes with 1 and No with 0 in the columns Partner, Dependents, PhoneService, MultipleLines, OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, StreamingMovies, PaperlessBilling, and Churn.
2. Replace Male with 0 and Female with 1 in the column gender.
3. Categorize the columns InternetService, Contract, and PaymentMethod.

About Algorithm

Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.
- We build a tree recursively that contains two types of nodes: decision nodes and result/leaf nodes.
- All the data start at the root node; then we choose a split point and create two new nodes.
- How do we select the split point? With attribute selection measures: Entropy, Information Gain, Gini Index, and Chi-Square.

Random Forest algorithm – steps to build a random forest:
- Data bootstrapping
- Feature selection
- Building independent DTs
- Then we use majority voting for predicting new data.

Visualize: create interactive score plots.
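The three manual pre-processing steps can be sketched in pandas. The tiny frame below is an illustrative stand-in for the Kaggle CSV (only a few of the 21 columns are shown); the column names match the dataset.

```python
# Sketch of the pre-processing steps in pandas on a small stand-in frame.
import pandas as pd

df = pd.DataFrame({
    "gender": ["Male", "Female"],
    "Partner": ["Yes", "No"],
    "MultipleLines": ["No phone service", "Yes"],
    "OnlineSecurity": ["No internet service", "No"],
    "InternetService": ["DSL", "Fiber optic"],
    "Churn": ["No", "Yes"],
})

# Collapse the "No ... service" variants to plain "No"
df = df.replace({"No phone service": "No", "No internet service": "No"})

# 1) Yes -> 1, No -> 0 in the binary service/account columns
binary_cols = ["Partner", "MultipleLines", "OnlineSecurity", "Churn"]
df[binary_cols] = df[binary_cols].replace({"Yes": 1, "No": 0})

# 2) Male -> 0, Female -> 1 in the gender column
df["gender"] = df["gender"].map({"Male": 0, "Female": 1})

# 3) Categorize the multi-valued columns (one-hot encoding here)
df = pd.get_dummies(df, columns=["InternetService"])

print(df.head())
```

On the full dataset, step 1 covers all twelve Yes/No columns and step 3 also covers Contract and PaymentMethod; the calls are the same with longer column lists.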
Interactive Dashboard: display a dashboard of the entire data.

Workflow nodes and annotations
- CSV Reader – data reading
- Rule Engine – replace Male with 0 and Female with 1
- Replace Yes with 1 and No with 0
- String To Number – convert string columns to integers
- Category To Number – convert the remaining string columns to integers
- Color Manager – red for Churn = Yes, blue for Churn = No
- Partitioning – random drawing; 80% to the upper port, 20% to the lower port
- Random Forest Learner – train to predict the class "Churn"
- Random Forest Predictor – apply the random forest to the test set
- Scorer – compute accuracy on the test set
- Visualize score model
- Interactive Dashboard
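What the Random Forest Learner does internally follows the steps listed in the algorithm description: data bootstrapping, feature selection, independent decision trees, and majority voting. A toy, from-scratch sketch of those steps (tree count, depth, and the synthetic data are illustrative choices, not the workflow's settings):

```python
# Toy random forest: bootstrapping, per-tree feature subsets, independent
# decision trees, and majority voting. Data and parameters are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n, d = 600, 4
X = rng.normal(size=(n, d))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy target

trees, feats = [], []
for _ in range(25):
    # Data bootstrapping: sample rows with replacement
    rows = rng.integers(0, n, n)
    # Feature selection: a random subset of columns for this tree
    cols = rng.choice(d, size=2, replace=False)
    # Build an independent decision tree on the bootstrap sample
    t = DecisionTreeClassifier(max_depth=3).fit(X[rows][:, cols], y[rows])
    trees.append(t)
    feats.append(cols)

# Majority voting across the independent trees
votes = np.stack([t.predict(X[:, c]) for t, c in zip(trees, feats)])
y_pred = (votes.mean(axis=0) > 0.5).astype(int)
acc = (y_pred == y).mean()
print(acc)
```

In practice scikit-learn's RandomForestClassifier (or the KNIME Random Forest Learner node) wraps exactly these steps; the sketch only makes the mechanics visible.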
