Cluster Analysis of Binary Data

<p><strong>Cluster Analysis of Binary Data</strong></p><p>The University of Saskatchewan</p><p>Ph.D. in Interdisciplinary Studies</p><p>Created by: Carlos Enrique Diaz, MBM, B.Eng.</p><p>Email: carlos.diaz@usask.ca</p><p>Supervisor: Lori Bradford, Ph.D.</p><p>Email: lori.bradford@usask.ca</p><p>This workflow begins by converting categorical binary data into a numerical (0/1) format so that it can be clustered with the k-Medoids algorithm using Manhattan distance. On 0/1 data, the Manhattan distance between two records is simply the number of attributes on which they differ. k-Means is unsuitable for this type of data because it represents each cluster by the average of its members, and averaging binary values does not produce a meaningful binary profile. The Silhouette Coefficient is used to evaluate the clustering visually and to determine the optimal number of clusters (k). In addition, a novel approach is introduced to estimate the maximum value of k: it identifies the point at which a preset value k = n causes the k-Medoids algorithm to return fewer clusters than specified. The workflow also examines how often the categorical values co-occur.</p>
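The workflow itself is assembled from KNIME nodes; the Python sketch below only illustrates the same ideas on a toy dataset and is not the workflow's implementation. Assumptions: "Yes"/"No" answers are encoded as 1/0; `encode`, `manhattan`, `k_medoids`, `silhouette`, and `estimate_max_k` are hypothetical helpers; the clustering is a simple alternating (Voronoi-iteration) k-Medoids rather than PAM or whatever the workflow's k-Medoids node uses; and the max-k rule is one reading of the description (stop at the first k whose run returns fewer non-empty clusters than requested), with a single random start per k.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Row = Sequence[int]  # one record as 0/1 flags, one per binary attribute


def encode(records: List[Dict[str, str]], positive: str = "Yes") -> List[List[int]]:
    """Map each categorical answer to 1 if it equals `positive`, else 0 (assumed encoding)."""
    columns = list(records[0])  # column order taken from the first record
    return [[1 if rec[col] == positive else 0 for col in columns] for rec in records]


def manhattan(a: Row, b: Row) -> int:
    """On 0/1 vectors this counts the attributes on which a and b differ."""
    return sum(abs(x - y) for x, y in zip(a, b))


def assign(data: List[Row], medoids: List[int]) -> List[int]:
    """Label each row with its nearest medoid; ties go to the lower cluster index."""
    return [min(range(len(medoids)), key=lambda c: manhattan(row, data[medoids[c]]))
            for row in data]


def k_medoids(data: List[Row], k: int, seed: int = 0, max_iter: int = 100,
              init: Optional[List[int]] = None) -> Tuple[List[int], List[int]]:
    """Alternating k-Medoids (not PAM); returns (medoid row indices, labels).

    If `init` is given it supplies the starting medoid indices and k is ignored.
    """
    medoids = list(init) if init is not None else random.Random(seed).sample(range(len(data)), k)
    for _ in range(max_iter):
        labels = assign(data, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # empty cluster: keep its old medoid
                new_medoids.append(old)
                continue
            # New medoid: the member with the smallest summed distance to its cluster.
            new_medoids.append(min(
                members, key=lambda i: sum(manhattan(data[i], data[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(data, medoids)


def silhouette(data: List[Row], labels: List[int]) -> float:
    """Mean silhouette coefficient s = (b - a) / max(a, b); singleton clusters score 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    scores = []
    for i, row in enumerate(data):
        own = [j for j, lab in enumerate(labels) if lab == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(manhattan(row, data[j]) for j in own) / len(own)  # mean intra-cluster distance
        b = min(sum(manhattan(row, data[j]) for j, lab in enumerate(labels) if lab == c)
                / labels.count(c) for c in clusters if c != labels[i])  # nearest other cluster
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


def estimate_max_k(data: List[Row], k_limit: int, seed: int = 0) -> int:
    """Hypothetical rule: k - 1 for the first k (from 2) giving fewer than k non-empty clusters."""
    upper = min(k_limit, len(data))
    for k in range(2, upper + 1):
        _, labels = k_medoids(data, k, seed=seed)
        if len(set(labels)) < k:
            return k - 1
    return upper  # no collapse observed up to the limit


# Usage on a tiny made-up survey (Yes/No answers to questions A-D).
answers = [("Yes", "Yes", "No", "No"), ("Yes", "Yes", "No", "Yes"), ("Yes", "Yes", "No", "No"),
           ("No", "No", "Yes", "Yes"), ("No", "Yes", "Yes", "Yes"), ("No", "No", "Yes", "Yes")]
data = encode([dict(zip("ABCD", ans)) for ans in answers])
for k in (2, 3):
    _, labels = k_medoids(data, k, seed=1)
    if len(set(labels)) >= 2:
        print(f"k={k}: silhouette={silhouette(data, labels):.3f}")
print("estimated max k:", estimate_max_k(data, k_limit=6))
```

In this sketch a cluster can only come back empty when two medoids share the same 0/1 pattern (the later one loses every tie), so duplicate rows, which are common in binary data, are what drive the max-k estimate; once k exceeds the number of distinct patterns in the data, the drop below k is guaranteed. Because a single random start is used per k, the estimate can vary with `seed`.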
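The description does not say how the co-occurrence of categorical values is computed. One common way to tabulate it for binary attributes is a pairwise count matrix: entry (i, j) is the number of records in which attributes i and j are both present, which for a 0/1 data matrix X equals XᵀX. The sketch below assumes that definition; `co_occurrence` is a hypothetical helper and the data are invented for illustration.

```python
from typing import List, Sequence


def co_occurrence(data: List[Sequence[int]]) -> List[List[int]]:
    """counts[i][j] = number of rows where attributes i and j are both 1 (X^T X for 0/1 X).

    The diagonal counts[i][i] is simply how often attribute i is present.
    """
    n_attr = len(data[0])
    counts = [[0] * n_attr for _ in range(n_attr)]
    for row in data:
        present = [i for i, v in enumerate(row) if v == 1]  # attributes set in this row
        for i in present:
            for j in present:
                counts[i][j] += 1
    return counts


# Usage: a small made-up 0/1 dataset with attributes A-D.
names = ["A", "B", "C", "D"]
data = [[1, 1, 0, 0], [1, 1, 0, 1], [1, 1, 0, 0],
        [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1]]
matrix = co_occurrence(data)
for name, row in zip(names, matrix):
    print(name, row)
```

The matrix is symmetric, so only its upper triangle carries new information; dividing off-diagonal counts by the diagonal gives conditional frequencies if proportions are easier to read than raw counts.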