Fast and scalable subspace clustering of high dimensional data

Amardeep Kaur

    Research output: Thesis (Doctoral Thesis)

    Abstract

    This thesis focuses on finding clusters and outliers in the subspaces of high-dimensional data. A subspace is a subset of the data dimensions. The number of subspaces grows exponentially with the data dimensionality, which makes searching them all for clusters challenging. We propose SUBSCALE, a scalable and efficient subspace clustering algorithm that finds non-redundant clusters without using expensive indexing structures or performing multiple data scans. Parallel and distributed implementations further improve the performance of the SUBSCALE algorithm. Finally, we extend the SUBSCALE algorithm to find subspace outliers and rank them by the strength of their outlying behaviour.
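
The exponential growth mentioned in the abstract can be illustrated with a minimal sketch. It assumes only the abstract's definition of a subspace as a subset of the data dimensions, so a data set with d dimensions has 2^d - 1 non-empty subspaces. The function names and the dimension labels are hypothetical, and the code is not part of the SUBSCALE algorithm itself.

```python
from itertools import combinations


def enumerate_subspaces(dimensions):
    """Yield every non-empty subset of the given dimensions, smallest first."""
    for k in range(1, len(dimensions) + 1):
        yield from combinations(dimensions, k)


def count_subspaces(d):
    """Number of non-empty subsets of d dimensions."""
    return 2 ** d - 1


# Hypothetical dimension labels, used only for illustration.
dims = ("x", "y", "z")
subspaces = list(enumerate_subspaces(dims))
print(subspaces)

# The count roughly doubles with every added dimension.
for d in (3, 10, 20, 50):
    print(d, count_subspaces(d))
```

Enumerating every subspace this way quickly becomes impractical as the number of dimensions rises, which is the difficulty the abstract refers to.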
    Original language: English
    Qualification: Doctor of Philosophy
    Awarding Institution: The University of Western Australia
    Award date: 5 Oct 2016
    Publication status: Unpublished - 2016

    Cite this

    @phdthesis{ce046aee3549426cba35bb94bbea445d,
    title = "Fast and scalable subspace clustering of high dimensional data",
    abstract = "This thesis is focussed on finding clusters and outliers in the subspaces of high-dimensional data. A subspace is a subset of the data dimensions. The number of subspaces increases exponentially with the increase in the data dimensionality, which poses challenges. We propose SUBSCALE, a scalable and efficient subspace clustering algorithm, to find non-redundant clusters without using expensive indexing structures or performing multiple data scans. Using parallel and distributed implementations, we bring further improvements in the performance of the SUBSCALE algorithm. Finally, we extend the SUBSCALE algorithm to find subspace outliers and rank them by strength of their outlying behaviour.",
    keywords = "High dimension of data, Subspace clustering, Outlier detection, Outlier ranking, Data clustering, Data cleaning, Anomaly detection",
    author = "Amardeep Kaur",
    year = "2016",
    language = "English",
    school = "The University of Western Australia",

    }

    Kaur, A 2016, 'Fast and scalable subspace clustering of high dimensional data', Doctor of Philosophy, The University of Western Australia.

    Fast and scalable subspace clustering of high dimensional data. / Kaur, Amardeep.

    2016.

    TY - THES

    T1 - Fast and scalable subspace clustering of high dimensional data

    AU - Kaur, Amardeep

    PY - 2016

    Y1 - 2016

    AB - This thesis is focussed on finding clusters and outliers in the subspaces of high-dimensional data. A subspace is a subset of the data dimensions. The number of subspaces increases exponentially with the increase in the data dimensionality, which poses challenges. We propose SUBSCALE, a scalable and efficient subspace clustering algorithm, to find non-redundant clusters without using expensive indexing structures or performing multiple data scans. Using parallel and distributed implementations, we bring further improvements in the performance of the SUBSCALE algorithm. Finally, we extend the SUBSCALE algorithm to find subspace outliers and rank them by strength of their outlying behaviour.

    KW - High dimension of data

    KW - Subspace clustering

    KW - Outlier detection

    KW - Outlier ranking

    KW - Data clustering

    KW - Data cleaning

    KW - Anomaly detection

    M3 - Doctoral Thesis

    ER -