Dataset for training and evaluating RFI detection schemes. It represents MeerKAT instrumentation with predominantly satellite-based contamination. The datasets are produced with Tabascal and stored in HDF5 format. HDF5 was chosen for ease of use in machine-learning workflows rather than for compatibility with other astronomy pipelines (for example, measurement sets), and the datasets are ready to load directly into TensorFlow. The accompanying config.json files describe the parameters used to generate each dataset.
Dataset parameters

| Name | Num Satellite Sources | Num Ground RFI Sources |
|---|---|---|
| obs_100AST_0SAT_0GRD_512BSL_64A_512T-0440-1462_016I_512F-1.227e+09-1.334e+09 | 0 | 0 |
| obs_100AST_1SAT_0GRD_512BSL_64A_512T-0440-1462_016I_512F-1.227e+09-1.334e+09 | 1 | 0 |
| obs_100AST_1SAT_3GRD_512BSL_64A_512T-0440-1462_016I_512F-1.227e+09-1.334e+09 | 1 | 3 |
| obs_100AST_2SAT_0GRD_512BSL_64A_512T-0440-1462_016I_512F-1.227e+09-1.334e+09 | 2 | 0 |
| obs_100AST_2SAT_3GRD_512BSL_64A_512T-0440-1462_016I_512F-1.227e+09-1.334e+09 | 2 | 3 |
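The dataset names encode their generation parameters. The table above documents only the SAT and GRD tokens. The parser below is a minimal sketch that splits a name into its numeric tokens. `parse_obs_name` and `ObsName` are hypothetical helpers. The meanings given for the remaining tokens (astronomical sources, baselines, antennas, time samples and range, channels and frequency range in Hz) are assumptions read off the names, and the meaning of the `I` token is left open. The config.json files are the authoritative record of these parameters.

```python
import re
from dataclasses import dataclass

# Hypothetical token layout inferred from the dataset names; only SAT and GRD
# are documented in the dataset table, the rest are assumptions.
_NAME_RE = re.compile(
    r"obs_(?P<n_ast>\d+)AST_(?P<n_sat>\d+)SAT_(?P<n_grd>\d+)GRD_"
    r"(?P<n_bsl>\d+)BSL_(?P<n_ant>\d+)A_"
    r"(?P<n_time>\d+)T-(?P<t_start>\d+)-(?P<t_end>\d+)_"
    r"(?P<n_int>\d+)I_"
    r"(?P<n_freq>\d+)F-(?P<f_start>\d+(?:\.\d+)?e[+-]\d+)-(?P<f_end>\d+(?:\.\d+)?e[+-]\d+)"
)


@dataclass(frozen=True)
class ObsName:
    n_ast: int      # assumed: number of astronomical sources
    n_sat: int      # number of satellite RFI sources (documented)
    n_grd: int      # number of ground RFI sources (documented)
    n_bsl: int      # assumed: number of baselines
    n_ant: int      # assumed: number of antennas
    n_time: int     # assumed: number of time samples
    t_start: int    # assumed: start of the time range (units unknown)
    t_end: int      # assumed: end of the time range (units unknown)
    n_int: int      # meaning of the "I" token is unknown; value kept as-is
    n_freq: int     # assumed: number of frequency channels
    f_start: float  # assumed: lowest frequency in Hz
    f_end: float    # assumed: highest frequency in Hz


def parse_obs_name(name: str) -> ObsName:
    """Split a dataset name into its numeric tokens (hypothetical helper)."""
    m = _NAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognised dataset name: {name!r}")
    g = m.groupdict()
    # Integer tokens may carry leading zeros (e.g. "0440", "016"); int() accepts them.
    ints = {k: int(v) for k, v in g.items() if k not in ("f_start", "f_end")}
    return ObsName(**ints, f_start=float(g["f_start"]), f_end=float(g["f_end"]))
```

For example, `parse_obs_name("obs_100AST_1SAT_3GRD_512BSL_64A_512T-0440-1462_016I_512F-1.227e+09-1.334e+09")` gives `n_sat == 1` and `n_grd == 3`, which matches the corresponding table row.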
Because the data are simulated, the ground truth for the noise contamination is known. Each dataset therefore contains the observed visibility amplitudes (without noise), the noise visibilities, and boolean pixel-wise masks obtained by thresholding the noise visibilities at several levels. The fields, data types and dimensions shared by all datasets are outlined below:
Dataset fields and data types

| Field | Datatype |
|---|---|
| vis | float32 |
| masks_orig | float32 |
| masks_0 | bool |
| masks_1 | bool |
| masks_2 | bool |
| masks_4 | bool |
| masks_8 | bool |
| masks_16 | bool |

Masks can be produced at any threshold, but several pre-computed options are included for convenience.
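A mask at a threshold not in the table can be recomputed from the noise visibilities. The sketch below rests on two assumptions that the dataset description does not confirm. First, a pixel is flagged when its noise visibility amplitude is strictly greater than the threshold. Second, the float32 `masks_orig` field holds those noise amplitudes. `threshold_mask` and `agreement` are hypothetical helpers. `agreement` lets you check the assumed rule against a shipped `masks_<t>` field before you rely on it.

```python
import numpy as np


def threshold_mask(noise_vis: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean mask that is True where |noise_vis| > threshold.

    Assumed rule (strict "greater than" on the noise amplitude); verify it
    against a pre-computed masks_<t> field before use.
    """
    return np.abs(noise_vis) > threshold


def agreement(noise_vis: np.ndarray, shipped_mask: np.ndarray, threshold: float) -> float:
    """Fraction of pixels where the recomputed mask equals a shipped mask."""
    return float(np.mean(threshold_mask(noise_vis, threshold) == shipped_mask))
```

If `agreement(masks_orig, masks_4, 4.0)` is not 1.0 on real data, the assumed rule (or the assumed role of `masks_orig`) is wrong. Check the generation parameters in config.json before using recomputed masks.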
All datasets and all fields have shape (512, 512, 512, 1), with axes ordered as (baseline, time, frequency, amplitude/mask).
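At this shape, each float32 field holds 512³ = 134,217,728 values, about 512 MiB in memory. Each bool field takes about 128 MiB. Reading only a subset of baselines can therefore help. The sketch below uses h5py and assumes that every field is stored as a top-level HDF5 dataset named after the field. The dataset description does not state the internal file layout. `load_field` and `FIELDS` are hypothetical names.

```python
import h5py
import numpy as np

# Field names from the table above; the top-level layout is an assumption.
FIELDS = ("vis", "masks_orig", "masks_0", "masks_1", "masks_2",
          "masks_4", "masks_8", "masks_16")


def load_field(path: str, field: str, baselines: slice = slice(None)) -> np.ndarray:
    """Read one field, optionally restricted to a range of baselines.

    Assumes the field is a top-level HDF5 dataset of shape
    (baseline, time, frequency, 1).
    """
    if field not in FIELDS:
        raise KeyError(f"unknown field: {field!r}")
    with h5py.File(path, "r") as f:
        # Slicing an h5py Dataset reads only the selected baselines from disk.
        return f[field][baselines]
```

The returned NumPy arrays can then be handed to TensorFlow, for example with `tf.data.Dataset.from_tensor_slices((vis, mask))`, which yields one element per baseline along the first axis.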
Date made available: 31 Oct 2023
Publisher: Zenodo