SUBSCALE: Fast and scalable subspace clustering for high dimensional data

Amardeep Kaur, Amitava Datta

    Research output: Conference paper (Chapter in Book/Conference paper), peer-reviewed

    5 Citations (Scopus)

    Abstract

    © 2014 IEEE. The aim of subspace clustering is to find groups of similar data points in all possible subspaces of a dataset. Since the number of subspaces grows exponentially with the number of dimensions, subspace clustering is usually computationally very expensive, and the performance of existing algorithms deteriorates drastically as the number of dimensions increases. Most of them use a bottom-up search strategy, and there are two main reasons for their inefficiency: (1) multiple database scans, and (2) the implicit or explicit generation of trivial subspace clusters during the process. We present SUBSCALE, a novel algorithm that directly finds the non-trivial subspace clusters at minimal cost and requires only k database scans for a k-dimensional dataset. Our algorithm scales very well with dimensionality and is highly parallelizable. The experimental evaluation has shown promising results.
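    To make the scale of the problem concrete: a k-dimensional dataset has 2^k − 1 non-empty axis-parallel subspaces, so k = 30 already gives more than a billion candidate subspaces. The Python sketch below counts and enumerates them, and adds a hypothetical helper, `dense_1d_groups`, that handles each dimension independently (one sorted pass per dimension, k passes in total), collecting runs of at least `min_pts` points whose consecutive projected values differ by at most `eps`. The helper and its `eps` and `min_pts` parameters are illustrative assumptions; this is a generic per-dimension density step, not a reproduction of SUBSCALE, whose method for combining per-dimension results is described in the paper itself.

```python
from itertools import combinations


def count_subspaces(k: int) -> int:
    """Number of non-empty axis-parallel subspaces of a k-dimensional dataset."""
    return 2 ** k - 1


def enumerate_subspaces(k: int):
    """Yield every non-empty subset of the dimension indices 0..k-1."""
    for size in range(1, k + 1):
        yield from combinations(range(k), size)


def dense_1d_groups(points, eps, min_pts):
    """Hypothetical per-dimension density step (not SUBSCALE itself).

    For each dimension d, sort the point indices by their value in d, then
    walk the sorted order once, grouping consecutive points whose values
    differ by at most eps. Groups with at least min_pts members are kept.
    Returns {d: [frozenset(point indices), ...]}.
    """
    if not points:
        return {}
    k = len(points[0])
    result = {}
    for d in range(k):  # one pass per dimension: k passes in total
        order = sorted(range(len(points)), key=lambda i: points[i][d])
        groups, run = [], [order[0]]
        for prev, cur in zip(order, order[1:]):
            if points[cur][d] - points[prev][d] <= eps:
                run.append(cur)  # still inside the current dense run
            else:
                if len(run) >= min_pts:
                    groups.append(frozenset(run))
                run = [cur]  # start a new run
        if len(run) >= min_pts:
            groups.append(frozenset(run))
        result[d] = groups
    return result


if __name__ == "__main__":
    print(count_subspaces(10))
    print(list(enumerate_subspaces(3)))
    pts = [(0.0, 5.0), (0.1, 9.0), (0.2, 5.1), (3.0, 9.1)]
    print(dense_1d_groups(pts, eps=0.15, min_pts=2))
```

    For the four sample points, dimension 0 yields the group {0, 1, 2} and dimension 1 yields {0, 2} and {1, 3}; only points 0 and 2 are grouped together in both dimensions. The subspace enumeration is shown only to illustrate the exponential count; an algorithm that visits every subset in this way is exactly what high-dimensional subspace clustering tries to avoid.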
    Original language: English
    Title of host publication: Proceedings 2014 IEEE International Conference on Data Mining Workshop (ICDMW)
    Place of publication: New Jersey, USA
    Publisher: IEEE, Institute of Electrical and Electronics Engineers
    Pages: 621-628
    Volume: N/A
    ISBN (Electronic): 9781479942756
    ISBN (Print): 9781479942749
    Publication status: Published, 2015
    Event: 2014 IEEE International Conference on Data Mining Workshop, Shenzhen, China
    Duration: 14 Dec 2014 to 14 Dec 2014

    Workshop

    Workshop: 2014 IEEE International Conference on Data Mining Workshop
    Abbreviated title: ICDMW
    Country/Territory: China
    City: Shenzhen
    Period: 14/12/14 to 14/12/14
