Staleness-Reduction Mini-Batch K-Means

Xueying Zhu, Jie Sun, Zhenhao He, Jiantong Jiang, Zeke Wang

Research output: Contribution to journal › Article › peer-review

4 Citations (Web of Science)

Abstract

K-means (km) is a clustering algorithm that has been widely adopted due to its simple implementation and high clustering quality. However, the standard km suffers from high computational complexity and is therefore time-consuming. Accordingly, mini-batch (mbatch) km was proposed to significantly reduce computational costs by updating centroids after performing distance computations on just a mini-batch, rather than a full batch, of samples. Even though mbatch km converges faster, it degrades convergence quality because it introduces staleness during iterations. To this end, in this article, we propose staleness-reduction mbatch (srmbatch) km, which achieves the best of both worlds: low computational costs like mbatch km and high clustering quality like standard km. Moreover, srmbatch still exposes massive parallelism and can be efficiently implemented on multicore CPUs and many-core GPUs. The experimental results show that srmbatch can converge up to 40×-130× faster than mbatch when reaching the same target loss, and srmbatch is able to reach a 0.2%-1.7% lower final loss than that of mbatch.
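To make the baseline concrete, the mbatch km update the abstract describes can be sketched as follows. This is a minimal illustration of standard mini-batch k-means (Sculley-style per-center learning rates), not the srmbatch algorithm itself; the function name, parameters, and initialization scheme are assumptions for illustration.

```python
import numpy as np

def mini_batch_kmeans(X, k, batch_size=64, n_iters=100, seed=0):
    """Baseline mini-batch k-means sketch (not srmbatch).

    Each iteration draws a mini-batch, assigns its samples to the
    nearest centroid, and nudges only those centroids toward the batch
    samples. These cheap partial updates are what make mbatch km fast,
    and also what introduce the staleness the article targets.
    """
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct random samples.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    counts = np.zeros(k)  # per-centroid sample counts for learning rates
    for _ in range(n_iters):
        batch = X[rng.choice(len(X), size=batch_size, replace=False)]
        # Assign each mini-batch sample to its nearest centroid.
        dists = np.linalg.norm(batch[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each touched centroid toward its assigned samples with a
        # per-center learning rate eta = 1 / (samples seen so far).
        for x, c in zip(batch, labels):
            counts[c] += 1
            eta = 1.0 / counts[c]
            centroids[c] = (1.0 - eta) * centroids[c] + eta * x
    return centroids
```

Between consecutive centroid updates, distances computed earlier in the batch refer to centroid positions that have since moved; that gap is the staleness srmbatch reduces.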

Original language: English
Pages (from-to): 14424-14436
Number of pages: 13
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 35
Issue number: 10
Early online date: 16 Jun 2023
Publication status: Published - Oct 2024

