This study investigates alternatives to the classical High Performance Computing environment (MPI/OpenMP) for large-scale astronomy data analysis, specifically the Spark/Hive/Hadoop ecosystem. The framework was combined with astronomy-specific, Python-based machine learning libraries for the analysis of very large collections of data, and then tested with a range of benchmarking exercises. It was found to be very effective: although it may not outperform MPI/OpenMP, it offers reliability, elasticity, scalability and ease of use.
20 Jul 2017
Unpublished - 2017