Icon

KNIME 1st assignment v5

Data ingestion RFM preparation RFM calculation Output Segmentation (*) Remark about Nueric outliersnode: We make the assumption thatoutliers values are capped by thehighest value of the quartile forthose bigger that Q3 + 1.5*IQR; or bythe smallest value of the quartile forthose smaller that Q1 - 1.5*IQR Conversion for PowerBi - Invoice Time Data collection and filtering (from mainworkflow, so already preprocessed) Clients filtering and most valuable clients meancalculation Calculations and manipulations to obtain the final increase of revenues Extra on Knime:We make an assumption that 15% of mediumvaluable clients and 2% of less valuable clientsbecome most valuable clients in the aftermath ofan efficient marketing campaign.We suppose that 2% of less valuable clients are Transactionsfile readerGroups the ordersby invoice numberProduct*Priceto obtain SumValueChanges the names of the columnsRemove the orders that are not in time range(includes September)Converts the invoice_date into a Data&Type Calculates the recency of the invoice in days starting from 30/09/2019Creates bins R,F and M for each customerCounts the number of orders per a customerRenames the columns created by GroupbyGathers all the relevant columns in the same tableCleaning data(removing missing values)Divides Sumvalue by frequency to get monetary value (average)Switches the values of the recency column bins so that the most recent purchases have the highest bin numberConverts bin columns from string to integerCalculation of RFM score by sumSorting the customers indescending orderExtracts the tablein an Excel fileDivision of the customers in 3 segments based on their RFM scoreRemoves rows with negative valuesRemove customersthat appear multipletimesCreates a BoxPlot to determinateoutliersReplace outliersvalues by the closestpermitted value (*)Converts the invoice_date into a Data&Type Writes Transaction file preprocessedTransactionsfile readerCleaning data(removing missing values)Product*Priceto obtain SumValueCustomer filereaderCleaning data(removing "unspecified" values)Writes customer file preprocessedProduct file readerSorts category column to spot insoherencesRemoves product description to lighten the final fileWrites product file preprocessedRemoves rows with negative valuesExcel file from the main workflowFilters the columns not usedCalculates the average of total spending for the period for each segment. Most valuable clients total spending means=5874,945Removes the most valuable clientsSeparates the medium valuable clients and the less valuable clientsSelects 15% of the medium valuable clients randomlySelects 2% of the less valuable clientsrandomlyConcatenates the samplingsWe compare the value the client now brings to the company being in the segment of the most valuable clients. However, we need to remove the money he doesn't spend anymore since he switched segmentsSums the increase of revenue (delta) for each each segment of customers Only retains the numerical valuesTurns lines into columnsSum to get the final result:2 021 066.98Renames the columns to link the income increase with the segments Excel Reader GroupBy Math Formula Column Renamer Date&Time-basedRow Filter String to Date&Time Date&TimeDifference Auto-Binner GroupBy Column Renamer Joiner Row Filter Math Formula Rule Engine String Manipulation(Multi Column) Math Formula Sorter Excel Writer Rule Engine Rule-basedRow Filter DuplicateRow Filter Box Plot Numeric Outliers String to Date&Time Excel Writer Excel Reader Row Filter Math Formula CSV Reader Row Filter Excel Writer CSV Reader Column Resorter Column Splitter Excel Writer Rule-basedRow Filter Excel Reader Column Filter GroupBy Rule-basedRow Filter Rule-basedRow Splitter Partitioning Partitioning Concatenate Math Formula Pivot Column Filter Table Transposer Math Formula Column Renamer Data ingestion RFM preparation RFM calculation Output Segmentation (*) Remark about Nueric outliersnode: We make the assumption thatoutliers values are capped by thehighest value of the quartile forthose bigger that Q3 + 1.5*IQR; or bythe smallest value of the quartile forthose smaller that Q1 - 1.5*IQR Conversion for PowerBi - Invoice Time Data collection and filtering (from mainworkflow, so already preprocessed) Clients filtering and most valuable clients meancalculation Calculations and manipulations to obtain the final increase of revenues Extra on Knime:We make an assumption that 15% of mediumvaluable clients and 2% of less valuable clientsbecome most valuable clients in the aftermath ofan efficient marketing campaign.We suppose that 2% of less valuable clients are Transactionsfile readerGroups the ordersby invoice numberProduct*Priceto obtain SumValueChanges the names of the columnsRemove the orders that are not in time range(includes September)Converts the invoice_date into a Data&Type Calculates the recency of the invoice in days starting from 30/09/2019Creates bins R,F and M for each customerCounts the number of orders per a customerRenames the columns created by GroupbyGathers all the relevant columns in the same tableCleaning data(removing missing values)Divides Sumvalue by frequency to get monetary value (average)Switches the values of the recency column bins so that the most recent purchases have the highest bin numberConverts bin columns from string to integerCalculation of RFM score by sumSorting the customers indescending orderExtracts the tablein an Excel fileDivision of the customers in 3 segments based on their RFM scoreRemoves rows with negative valuesRemove customersthat appear multipletimesCreates a BoxPlot to determinateoutliersReplace outliersvalues by the closestpermitted value (*)Converts the invoice_date into a Data&Type Writes Transaction file preprocessedTransactionsfile readerCleaning data(removing missing values)Product*Priceto obtain SumValueCustomer filereaderCleaning data(removing "unspecified" values)Writes customer file preprocessedProduct file readerSorts category column to spot insoherencesRemoves product description to lighten the final fileWrites product file preprocessedRemoves rows with negative valuesExcel file from the main workflowFilters the columns not usedCalculates the average of total spending for the period for each segment. Most valuable clients total spending means=5874,945Removes the most valuable clientsSeparates the medium valuable clients and the less valuable clientsSelects 15% of the medium valuable clients randomlySelects 2% of the less valuable clientsrandomlyConcatenates the samplingsWe compare the value the client now brings to the company being in the segment of the most valuable clients. However, we need to remove the money he doesn't spend anymore since he switched segmentsSums the increase of revenue (delta) for each each segment of customers Only retains the numerical valuesTurns lines into columnsSum to get the final result:2 021 066.98Renames the columns to link the income increase with the segments Excel Reader GroupBy Math Formula Column Renamer Date&Time-basedRow Filter String to Date&Time Date&TimeDifference Auto-Binner GroupBy Column Renamer Joiner Row Filter Math Formula Rule Engine String Manipulation(Multi Column) Math Formula Sorter Excel Writer Rule Engine Rule-basedRow Filter DuplicateRow Filter Box Plot Numeric Outliers String to Date&Time Excel Writer Excel Reader Row Filter Math Formula CSV Reader Row Filter Excel Writer CSV Reader Column Resorter Column Splitter Excel Writer Rule-basedRow Filter Excel Reader Column Filter GroupBy Rule-basedRow Filter Rule-basedRow Splitter Partitioning Partitioning Concatenate Math Formula Pivot Column Filter Table Transposer Math Formula Column Renamer

Nodes

Extensions

Links