DALiuGE: A graph execution framework for harnessing the astronomical data deluge

C. Wu, R. Tobar, K. Vinsen, A. Wicenec, D. Pallot, B. Lao, R. Wang, T. T. Fricke, M. Boulton, I. Cooper, R. Dodson, M. Dolensky, Yaxing Mei, F. Wang

Research output: Contribution to journal (Article)

3 Citations (Scopus)

Abstract

The Data Activated Liu Graph Engine (DALiuGE) is an execution framework for processing large astronomical datasets at the scale required by the Square Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex data reduction pipelines consisting of both datasets and algorithmic components, and an implementation run-time to execute such pipelines on distributed resources. By mapping the logical view of a pipeline to its physical realisation, DALiuGE separates the concerns of multiple stakeholders, allowing them to collectively optimise large-scale data processing solutions in a coherent manner. Execution in DALiuGE is data-activated: each individual data item autonomously triggers the processing on itself. Such decentralisation also makes the execution framework very scalable and flexible, supporting pipeline sizes ranging from fewer than ten tasks running on a laptop to tens of millions of concurrent tasks on the second fastest supercomputer in the world. DALiuGE has been used in production for reducing interferometry datasets from the Karl E. Jansky Very Large Array and the Mingantu Ultrawide Spectral Radioheliograph, and is being developed as the execution framework prototype for the Science Data Processor (SDP) consortium of the Square Kilometre Array (SKA) telescope. This paper presents a technical overview of DALiuGE and discusses case studies from the CHILES and MUSER projects that use DALiuGE to execute production pipelines. In a companion paper, we provide an in-depth analysis of DALiuGE's scalability to very large numbers of tasks on two supercomputing facilities.
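The data-activated execution model summarised above can be illustrated with a minimal, single-process Python sketch. It assumes a toy pipeline in which a logical "scatter, process, gather" description is unrolled into a physical graph of parallel branches, and every data item, once written, notifies the tasks that consume it; a task runs only when all of its inputs are complete, so no central scheduler is involved. The names `DataItem`, `Task` and `build_physical_graph`, and the squaring and summing steps, are hypothetical placeholders chosen for illustration. They are not DALiuGE's actual API, and the real framework distributes such graphs across many nodes rather than running them in one process.

```python
from typing import Any, Callable, List


class DataItem:
    """Hypothetical data node: stores a value and notifies its consumers once written."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.value: Any = None
        self.completed = False
        self.consumers: List["Task"] = []

    def write(self, value: Any) -> None:
        # Completing a data item is the event that drives execution;
        # no central scheduler polls for tasks that are ready to run.
        self.value = value
        self.completed = True
        for task in self.consumers:
            task.on_input_completed()


class Task:
    """Hypothetical algorithmic node: runs once all of its input data items are complete."""

    def __init__(self, name: str, func: Callable[..., Any],
                 inputs: List[DataItem], output: DataItem) -> None:
        self.name = name
        self.func = func
        self.inputs = inputs
        self.output = output
        self.ran = False
        for item in inputs:
            item.consumers.append(self)  # subscribe to each input's completion event

    def on_input_completed(self) -> None:
        # Fire exactly once, and only when every input has been written.
        if not self.ran and all(item.completed for item in self.inputs):
            self.ran = True
            result = self.func(*(item.value for item in self.inputs))
            self.output.write(result)  # completing the output activates downstream tasks


def build_physical_graph(num_chunks: int):
    """Unroll a logical 'scatter -> process -> gather' pipeline into num_chunks parallel branches."""
    chunks = [DataItem(f"chunk_{i}") for i in range(num_chunks)]
    partials = [DataItem(f"partial_{i}") for i in range(num_chunks)]
    for chunk, partial in zip(chunks, partials):
        # Placeholder processing step (hypothetical): square each chunk.
        Task(f"process_{chunk.name}", lambda x: x * x, [chunk], partial)
    result = DataItem("result")
    # Placeholder reduction step (hypothetical): sum all partial results.
    Task("gather", lambda *xs: sum(xs), partials, result)
    return chunks, result


chunks, result = build_physical_graph(4)
for i, chunk in enumerate(chunks):
    chunk.write(i + 1)  # writing the inputs is what sets the whole pipeline in motion
print(result.value)  # prints 30 (1 + 4 + 9 + 16)
```

Because completion events propagate along the graph edges, the order in which the input chunks are written does not matter: the gather task fires only after the last partial result arrives.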

Original language: English
Pages (from-to): 1-15
Number of pages: 15
Journal: Astronomy and Computing
Volume: 20
DOI: https://doi.org/10.1016/j.ascom.2017.03.007
Publication status: Published - 1 Jul 2017

Cite this

Wu, C.; Tobar, R.; Vinsen, K.; Wicenec, A.; Pallot, D.; Lao, B.; Wang, R.; Fricke, T. T.; Boulton, M.; Cooper, I.; Dodson, R.; Dolensky, M.; Mei, Yaxing; Wang, F. / DALiuGE: A graph execution framework for harnessing the astronomical data deluge. In: Astronomy and Computing. 2017; Vol. 20. pp. 1-15.
@article{0919bc97816e46c4aa60c6a79a71278a,
title = "{DALiuGE}: A graph execution framework for harnessing the astronomical data deluge",
abstract = "The Data Activated Liu Graph Engine (DALiuGE) is an execution framework for processing large astronomical datasets at the scale required by the Square Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex data reduction pipelines consisting of both datasets and algorithmic components, and an implementation run-time to execute such pipelines on distributed resources. By mapping the logical view of a pipeline to its physical realisation, DALiuGE separates the concerns of multiple stakeholders, allowing them to collectively optimise large-scale data processing solutions in a coherent manner. Execution in DALiuGE is data-activated: each individual data item autonomously triggers the processing on itself. Such decentralisation also makes the execution framework very scalable and flexible, supporting pipeline sizes ranging from fewer than ten tasks running on a laptop to tens of millions of concurrent tasks on the second fastest supercomputer in the world. DALiuGE has been used in production for reducing interferometry datasets from the Karl E. Jansky Very Large Array and the Mingantu Ultrawide Spectral Radioheliograph, and is being developed as the execution framework prototype for the Science Data Processor (SDP) consortium of the Square Kilometre Array (SKA) telescope. This paper presents a technical overview of DALiuGE and discusses case studies from the CHILES and MUSER projects that use DALiuGE to execute production pipelines. In a companion paper, we provide an in-depth analysis of DALiuGE's scalability to very large numbers of tasks on two supercomputing facilities.",
keywords = "Data driven, Dataflow, Graph execution engine, Many-task computing, Square kilometre array",
author = "Wu, C. and Tobar, R. and Vinsen, K. and Wicenec, A. and Pallot, D. and Lao, B. and Wang, R. and Fricke, T. T. and Boulton, M. and Cooper, I. and Dodson, R. and Dolensky, M. and Mei, Yaxing and Wang, F.",
year = "2017",
month = jul,
day = "1",
doi = "10.1016/j.ascom.2017.03.007",
language = "English",
volume = "20",
pages = "1--15",
journal = "Astronomy and Computing",
issn = "2213-1337",
publisher = "Elsevier",

}

Wu, C, Tobar, R, Vinsen, K, Wicenec, A, Pallot, D, Lao, B, Wang, R, Fricke, TT, Boulton, M, Cooper, I, Dodson, R, Dolensky, M, Mei, Y & Wang, F 2017, 'DALiuGE: A graph execution framework for harnessing the astronomical data deluge', Astronomy and Computing, vol. 20, pp. 1-15. https://doi.org/10.1016/j.ascom.2017.03.007

DALiuGE: A graph execution framework for harnessing the astronomical data deluge. / Wu, C.; Tobar, R.; Vinsen, K.; Wicenec, A.; Pallot, D.; Lao, B.; Wang, R.; Fricke, T. T.; Boulton, M.; Cooper, I.; Dodson, R.; Dolensky, M.; Mei, Yaxing; Wang, F.

In: Astronomy and Computing, Vol. 20, 01.07.2017, pp. 1-15.


TY  - JOUR
T1  - DALiuGE: A graph execution framework for harnessing the astronomical data deluge
AU  - Wu, C.
AU  - Tobar, R.
AU  - Vinsen, K.
AU  - Wicenec, A.
AU  - Pallot, D.
AU  - Lao, B.
AU  - Wang, R.
AU  - Fricke, T. T.
AU  - Boulton, M.
AU  - Cooper, I.
AU  - Dodson, R.
AU  - Dolensky, M.
AU  - Mei, Yaxing
AU  - Wang, F.
PY  - 2017/7/1
Y1  - 2017/7/1
N2  - The Data Activated Liu Graph Engine (DALiuGE) is an execution framework for processing large astronomical datasets at the scale required by the Square Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex data reduction pipelines consisting of both datasets and algorithmic components, and an implementation run-time to execute such pipelines on distributed resources. By mapping the logical view of a pipeline to its physical realisation, DALiuGE separates the concerns of multiple stakeholders, allowing them to collectively optimise large-scale data processing solutions in a coherent manner. Execution in DALiuGE is data-activated: each individual data item autonomously triggers the processing on itself. Such decentralisation also makes the execution framework very scalable and flexible, supporting pipeline sizes ranging from fewer than ten tasks running on a laptop to tens of millions of concurrent tasks on the second fastest supercomputer in the world. DALiuGE has been used in production for reducing interferometry datasets from the Karl E. Jansky Very Large Array and the Mingantu Ultrawide Spectral Radioheliograph, and is being developed as the execution framework prototype for the Science Data Processor (SDP) consortium of the Square Kilometre Array (SKA) telescope. This paper presents a technical overview of DALiuGE and discusses case studies from the CHILES and MUSER projects that use DALiuGE to execute production pipelines. In a companion paper, we provide an in-depth analysis of DALiuGE's scalability to very large numbers of tasks on two supercomputing facilities.
AB  - The Data Activated Liu Graph Engine (DALiuGE) is an execution framework for processing large astronomical datasets at the scale required by the Square Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex data reduction pipelines consisting of both datasets and algorithmic components, and an implementation run-time to execute such pipelines on distributed resources. By mapping the logical view of a pipeline to its physical realisation, DALiuGE separates the concerns of multiple stakeholders, allowing them to collectively optimise large-scale data processing solutions in a coherent manner. Execution in DALiuGE is data-activated: each individual data item autonomously triggers the processing on itself. Such decentralisation also makes the execution framework very scalable and flexible, supporting pipeline sizes ranging from fewer than ten tasks running on a laptop to tens of millions of concurrent tasks on the second fastest supercomputer in the world. DALiuGE has been used in production for reducing interferometry datasets from the Karl E. Jansky Very Large Array and the Mingantu Ultrawide Spectral Radioheliograph, and is being developed as the execution framework prototype for the Science Data Processor (SDP) consortium of the Square Kilometre Array (SKA) telescope. This paper presents a technical overview of DALiuGE and discusses case studies from the CHILES and MUSER projects that use DALiuGE to execute production pipelines. In a companion paper, we provide an in-depth analysis of DALiuGE's scalability to very large numbers of tasks on two supercomputing facilities.
KW  - Data driven
KW  - Dataflow
KW  - Graph execution engine
KW  - Many-task computing
KW  - Square kilometre array
UR  - http://www.scopus.com/inward/record.url?scp=85019655766&partnerID=8YFLogxK
U2  - 10.1016/j.ascom.2017.03.007
DO  - 10.1016/j.ascom.2017.03.007
M3  - Article
VL  - 20
SP  - 1
EP  - 15
JO  - Astronomy and Computing
JF  - Astronomy and Computing
SN  - 2213-1337
ER  - 