C02

https://havef.fun/s2c02

# Process
First, I used the Data Explorer node to inspect the data and found two columns with missing values: MINIMUM_PAYMENTS (313 rows) and CREDIT_LIMIT (1 row).
I handled them with two simple nodes: a Row Filter to drop the row with the missing CREDIT_LIMIT, and a Column Merger to replace the missing MINIMUM_PAYMENTS values with PAYMENTS.
I then normalized the data so that all features are on a comparable scale.
Finally, I passed the normalized data to the k-Means node.
To simulate the arrival of new data, I added Row Sampling, Normalizer (Apply), and Cluster Assigner nodes.
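The preprocessing and "new data" steps above could look roughly like this outside KNIME. This is a minimal sketch in plain Python, not the workflow's actual configuration: the sample rows, the helper names, the choice of min-max scaling, and the example cluster centers are all assumptions made for illustration.

```python
import math

# Hypothetical sample rows; None marks a missing value.
rows = [
    {"CREDIT_LIMIT": 1000.0, "PAYMENTS": 200.0, "MINIMUM_PAYMENTS": 140.0},
    {"CREDIT_LIMIT": 7000.0, "PAYMENTS": 4100.0, "MINIMUM_PAYMENTS": None},
    {"CREDIT_LIMIT": None, "PAYMENTS": 600.0, "MINIMUM_PAYMENTS": 90.0},
    {"CREDIT_LIMIT": 3000.0, "PAYMENTS": 900.0, "MINIMUM_PAYMENTS": 300.0},
]
COLUMNS = ["CREDIT_LIMIT", "PAYMENTS", "MINIMUM_PAYMENTS"]


def clean(rows):
    """Drop rows without CREDIT_LIMIT; fall back to PAYMENTS for MINIMUM_PAYMENTS."""
    cleaned = []
    for row in rows:
        if row["CREDIT_LIMIT"] is None:
            continue  # Row Filter idea: drop the incomplete row
        row = dict(row)  # copy so the input rows stay untouched
        if row["MINIMUM_PAYMENTS"] is None:
            row["MINIMUM_PAYMENTS"] = row["PAYMENTS"]  # Column Merger idea
        cleaned.append(row)
    return cleaned


def fit_min_max(rows):
    """Record the per-column (min, max) seen in the training rows."""
    return {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in COLUMNS}


def apply_min_max(row, bounds):
    """Scale one row with previously fitted bounds (the Normalizer (Apply) idea)."""
    scaled = []
    for c in COLUMNS:
        lo, hi = bounds[c]
        scaled.append((row[c] - lo) / (hi - lo) if hi > lo else 0.0)
    return scaled


def assign_cluster(point, centers):
    """Return the name of the nearest center by Euclidean distance."""
    return min(centers, key=lambda name: math.dist(point, centers[name]))


train = clean(rows)
bounds = fit_min_max(train)

# Hypothetical normalized centers standing in for the k-Means output.
centers = {"cluster_0": [0.1, 0.1, 0.1], "cluster_1": [0.9, 0.9, 0.9]}

new_customer = {"CREDIT_LIMIT": 6500.0, "PAYMENTS": 3800.0, "MINIMUM_PAYMENTS": 3500.0}
label = assign_cluster(apply_min_max(new_customer, bounds), centers)
```

The key design point is that the new row is scaled with the bounds fitted on the training data, not with bounds recomputed from the new data, so it lands in the same space as the cluster centers.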

# Cluster Analysis Part
This section covers denormalizing the cluster centers with the Denormalizer node and color-coding the clusters for identification. In addition, the features are grouped by value range and illustrated in two box plots.
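Denormalizing a cluster center means reversing the scaling so the center can be read in the original units. The sketch below assumes min-max normalization, which the source does not confirm; the feature names, bounds, and center values are hypothetical.

```python
def denormalize(center, bounds):
    """Map a min-max-normalized cluster center back to original units."""
    restored = {}
    for value, (column, (lo, hi)) in zip(center, bounds.items()):
        restored[column] = lo + value * (hi - lo)  # inverse of (x - lo) / (hi - lo)
    return restored


# Hypothetical (min, max) per feature, as recorded when the data was normalized.
feature_bounds = {"PURCHASES": (0.0, 5000.0), "CASH_ADVANCE": (0.0, 2000.0)}
normalized_center = [0.25, 0.5]

center_in_units = denormalize(normalized_center, feature_bounds)
```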

For instance, cluster_0 shows low purchases and installments but high cash advance and transactions. In contrast, cluster_1 shows higher purchases, one-off purchases, balance, credit limit, and payments. Cluster_2 sits closer to the average and requires more in-depth analysis.

The analysis suggests that cluster_0 users primarily use their credit cards for cash advances and less often for purchases. For the credit card company, the next step should be to find ways to encourage these users to use their cards for purchases. Cluster_1, on the other hand, consists of high-value users, who may need additional features or personalized offers to be retained.

More work is needed to refine the analysis, such as exploring the balance-to-credit-limit and purchase-to-balance ratios and trying more advanced clustering algorithms. For the purposes of this analysis, however, the current results are sufficient.
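As a starting point for that follow-up, the two ratios could be derived per customer as shown below. This is only a sketch: the column names BALANCE and PURCHASES, the helper names, and the sample values are assumptions, and returning None for a zero or missing denominator is one possible convention among several.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None if the denominator is zero or missing."""
    if not denominator:
        return None
    return numerator / denominator


def add_ratio_features(customer):
    """Return a copy of the customer record with the two ratio features added."""
    enriched = dict(customer)
    enriched["BALANCE_TO_LIMIT"] = safe_ratio(customer["BALANCE"], customer["CREDIT_LIMIT"])
    enriched["PURCHASES_TO_BALANCE"] = safe_ratio(customer["PURCHASES"], customer["BALANCE"])
    return enriched


# Hypothetical customers; the second has a zero balance.
active = add_ratio_features({"BALANCE": 500.0, "CREDIT_LIMIT": 2000.0, "PURCHASES": 250.0})
idle = add_ratio_features({"BALANCE": 0.0, "CREDIT_LIMIT": 1500.0, "PURCHASES": 0.0})
```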

Workflow annotations:

- Input file: CC GENERAL.csv
- EDA, find some missing data: (1) 313 rows missing MINIMUM_PAYMENTS, (2) 1 row missing CREDIT_LIMIT
- Drop the 1 row missing CREDIT_LIMIT
- 313 rows missing MINIMUM_PAYMENTS: simply use PAYMENTS to replace
- Check cluster distribution: Cluster -- Count(CUST_ID)
- Mock new customers: used to mimic the arrival of new data

Nodes used: CSV Reader, Data Explorer, Row Filter, Column Merger, Normalizer, k-Means, GroupBy, Row Sampling, Normalizer (Apply), Cluster Assigner, cluster_analysis, SilhouetteCoefficient

| Cluster | Count(CUST_ID) |
| --- | --- |
| cluster_0 | 4723 |
| cluster_1 | 1443 |
| cluster_2 | 2783 |

| RowID | Mean Silhouette Coefficient |
| --- | --- |
| cluster_0 | 0.4622952795214287 |
| cluster_1 | 0.17864330328916997 |
| cluster_2 | 0.3204000954424621 |
| Overall | 0.37243003212005127 |
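One quick sanity check on the reported numbers: the overall silhouette coefficient should equal the size-weighted mean of the per-cluster means. The sketch below uses the cluster counts and silhouette values reported above. It assumes the counts refer to the same rows the silhouette was computed on, which the source does not state explicitly.

```python
# Values copied from the reported cluster counts and silhouette coefficients.
cluster_sizes = {"cluster_0": 4723, "cluster_1": 1443, "cluster_2": 2783}
mean_silhouette = {
    "cluster_0": 0.4622952795214287,
    "cluster_1": 0.17864330328916997,
    "cluster_2": 0.3204000954424621,
}
reported_overall = 0.37243003212005127

total_rows = sum(cluster_sizes.values())

# Weight each cluster's mean silhouette by the number of rows in that cluster.
weighted_overall = (
    sum(cluster_sizes[name] * mean_silhouette[name] for name in cluster_sizes)
    / total_rows
)

difference = abs(weighted_overall - reported_overall)
```

The weighted mean agrees closely with the reported overall value, and cluster_1, the smallest cluster, is also the least well separated.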
