This thesis focuses on finding clusters and outliers in the subspaces of high-dimensional data. A subspace is a subset of the data dimensions. The number of subspaces grows exponentially with the dimensionality of the data (a dataset with d dimensions has 2^d - 1 non-empty subspaces), which makes searching all of them computationally expensive. We propose SUBSCALE, a scalable and efficient subspace clustering algorithm that finds non-redundant clusters without using expensive indexing structures or performing multiple data scans. We further improve the performance of SUBSCALE through parallel and distributed implementations. Finally, we extend the SUBSCALE algorithm to find subspace outliers and rank them by the strength of their outlying behaviour.
|Qualification||Doctor of Philosophy|
|Award date||5 Oct 2016|
|Publication status||Unpublished - 2016|