Big Data Architecture in Radio Astronomy: The effectiveness of the Hadoop/Hive/Spark ecosystem in data analysis of large astronomical data collections

Geoffrey Jack Duniam

    Research output: Thesis (Master's Thesis)

    Abstract

    In this study, alternatives to the classical High Performance Computing environment (MPI/OpenMP) are investigated for large-scale astronomy data analysis, specifically the Spark/Hive/Hadoop ecosystem. This framework, combined with astronomy-specific, Python-based machine learning libraries, was tested with a range of benchmarking exercises in the context of the analysis of very large collections of data. It was found to be very effective: although it may not outperform MPI/OpenMP, it offers reliability, elasticity, scalability and ease of use.
    Original language: English
    Qualification: Masters
    Awarding Institution
    • The University of Western Australia
    Supervisors/Advisors
    • Kitaeff, Slava, Supervisor
    • Datta, Amitava, Supervisor
    Award date: 20 Jul 2017
    Publication status: Unpublished - 2017
