TY - JOUR
T1 - Machine learning the gap between real and simulated nebulae
T2 - A domain-adaptation approach to classify ionised nebulae in nearby galaxies
AU - Belfiore, Francesco
AU - Ginolfi, Michele
AU - Blanc, Guillermo
AU - Boquien, Médéric
AU - Chevance, Melanie
AU - Congiu, Enrico
AU - Glover, Simon C.O.
AU - Groves, Brent
AU - Klessen, Ralf S.
AU - Méndez-Delgado, J. Eduardo
AU - Williams, Thomas G.
N1 - Publisher Copyright: © The Authors 2025.
PY - 2025/2/14
Y1 - 2025/2/14
AB - Classifying ionised nebulae in nearby galaxies is crucial to studying stellar feedback mechanisms and understanding the physical conditions of the interstellar medium. This classification task is generally performed by comparing observed line ratios with photoionisation simulations of different types of nebulae (HII regions, planetary nebulae, and supernova remnants). However, due to simplifying assumptions, such simulations are generally unable to fully reproduce the line ratios in observed nebulae. This discrepancy limits the performance of the classical machine-learning approach, where a model is trained on the simulated data and then used to classify real nebulae. For this study, we used a domain-adversarial neural network (DANN) to bridge the gap between photoionisation models (source domain) and observed ionised nebulae from the PHANGS-MUSE survey (target domain). The DANN is an example of a domain-adaptation algorithm, whose goal is to maximise the performance, on an unlabelled target domain, of a model trained on labelled data in the source domain by extracting domain-invariant features. Our results indicate a significant improvement in classification performance in the target domain when employing the DANN framework compared to a classical neural network (NN) classifier. Additionally, we investigated the impact of adding noise to the source dataset, finding that noise injection acts as a form of regularisation, further enhancing the performance of both the NN and DANN models on the observational data. The combined use of domain adaptation and noise injection improved the classification accuracy in the target domain by 23%. This study highlights the potential of domain-adaptation methods in tackling the domain-shift challenge when using theoretical models to train machine-learning pipelines in astronomy.
KW - Galaxies: ISM
KW - HII regions
KW - Methods: data analysis
KW - Methods: statistical
UR - http://www.scopus.com/inward/record.url?scp=85217544345&partnerID=8YFLogxK
U2 - 10.1051/0004-6361/202451934
DO - 10.1051/0004-6361/202451934
M3 - Article
AN - SCOPUS:85217544345
SN - 0004-6361
VL - 694
SP - 1
EP - 11
JO - Astronomy and Astrophysics
JF - Astronomy and Astrophysics
M1 - A212
ER -