TY - JOUR
T1 - The k-means algorithm
T2 - A comprehensive survey and performance evaluation
AU - Ahmed, Mohiuddin
AU - Seraj, Raihan
AU - Islam, Syed Mohammed Shamsul
PY - 2020/8
Y1 - 2020/8
N2 - The k-means clustering algorithm is considered one of the most powerful and popular data mining algorithms in the research community. However, despite its popularity, the algorithm has certain limitations, including problems associated with random initialization of the centroids, which leads to unexpected convergence. Additionally, such a clustering algorithm requires the number of clusters to be defined beforehand, which is responsible for different cluster shapes and outlier effects. A fundamental problem of the k-means algorithm is its inability to handle various data types. This paper provides a structured and synoptic overview of research conducted on the k-means algorithm to overcome such shortcomings. Variants of the k-means algorithm, including recent developments, are discussed, and their effectiveness is investigated through experimental analysis on a variety of datasets. The detailed experimental analysis, along with a thorough comparison among different k-means clustering algorithms, differentiates our work from other existing survey papers. Furthermore, it provides a clear and thorough understanding of the k-means algorithm along with its different research directions.
AB - The k-means clustering algorithm is considered one of the most powerful and popular data mining algorithms in the research community. However, despite its popularity, the algorithm has certain limitations, including problems associated with random initialization of the centroids, which leads to unexpected convergence. Additionally, such a clustering algorithm requires the number of clusters to be defined beforehand, which is responsible for different cluster shapes and outlier effects. A fundamental problem of the k-means algorithm is its inability to handle various data types. This paper provides a structured and synoptic overview of research conducted on the k-means algorithm to overcome such shortcomings. Variants of the k-means algorithm, including recent developments, are discussed, and their effectiveness is investigated through experimental analysis on a variety of datasets. The detailed experimental analysis, along with a thorough comparison among different k-means clustering algorithms, differentiates our work from other existing survey papers. Furthermore, it provides a clear and thorough understanding of the k-means algorithm along with its different research directions.
KW - Categorical attributes
KW - Clustering
KW - Cyber security
KW - Healthcare
KW - Initialization
KW - K-means
KW - Unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=85090372567&partnerID=8YFLogxK
U2 - 10.3390/electronics9081295
DO - 10.3390/electronics9081295
M3 - Review article
AN - SCOPUS:85090372567
VL - 9
SP - 1
EP - 12
JO - Electronics
JF - Electronics
SN - 2079-9292
IS - 8
M1 - 1295
ER -