Data from: Integrating environmental DNA metabarcoding and remote sensing reveals known and novel fish diversity hotspots in a World Heritage Area

  • Manuela R. Bizzozzero (Creator)
  • Svenja M. Marfurt (Creator)
  • Florian Altermatt (Creator)
  • Erik P. Willems (Creator)
  • Alexander Damm-Reiser (Creator)
  • Simon Allen (Creator)
  • Jean Claude Walser (Creator)
  • Michael Krutzen (Creator)
  • Swiss National Science Foundation (Sponsor)

Dataset

Description

Aim: Shark Bay, a UNESCO World Heritage site in Western Australia, is
highly vulnerable to climate change, yet its fish biodiversity remains
poorly understood at fine spatial scales. We integrated environmental DNA
(eDNA) metabarcoding with high-resolution remote sensing to assess and
extrapolate fish diversity patterns, providing a scalable framework for
biodiversity monitoring in dynamic coastal ecosystems.

Location: Shark Bay, Western Australia.

Methods: We analysed 270 water samples across 560 km² using fish-specific
16S and 12S rRNA metabarcoding, linking biodiversity patterns to key
environmental variables (including depth, salinity, sea surface
temperature, and habitat characteristics) derived from high-resolution
satellite imagery. To predict fish biodiversity across unsampled areas, we
employed machine-learning models, enabling spatial extrapolation of eDNA
data across the seascape.

Results: eDNA metabarcoding identified 107 fish species across 132 genera
and 71 families, with substantial overlap with conventional monitoring but
broader coverage at higher taxonomic levels. Fish richness increased with
decreasing salinity, high channel habitat coverage, and moderate depths
with high seagrass coverage. We delineated five distinct fish communities
(A–E): two shallow seagrass communities, one in sparse seagrass (A) and
another in dense seagrass (B); one in channel habitats (C), with the
greatest fish diversity; one in deep sandy waters (D); and one in
medium-depth, seagrass-free areas (E). Additionally, we detected several
tropical species, suggesting poleward shifts due to rising water
temperatures.

Main conclusions: This study highlights the utility of combining marine
eDNA metabarcoding with remote sensing to detect fine-scale biodiversity.
The integration of machine learning enables spatial upscaling and timely
responses to habitat changes, enhancing marine conservation and
management. By identifying key environmental drivers of fish diversity,
this approach supports proactive conservation strategies, providing a
scalable model for biodiversity monitoring under climate change.

1 Environmental DNA

1.1 Sampling Design

Our sampling areas (combined ca. 557 km²) comprised two long-term dolphin
research sites within the eastern (ca. 230 km²) and western (ca. 327 km²)
gulfs of Shark Bay, Western Australia. To
support future research on the feeding ecology of Shark Bay’s iconic
bottlenose dolphins (Connor and Krützen, 2015), we focused our
biodiversity assessment on fish taxa. As dolphin behavioural data is
typically collected during austral winter, we aimed to capture a
representative snapshot of fish biodiversity during this season.
To maximise the biological signal while minimising sampling
effort, we employed a stratified random sampling design, thus enhancing
sample representativeness and efficient capture of underlying biological
patterns (Altermatt et al., 2023; Carvalho et al., 2016). Sampling units
were derived from the 2016 “Shark Bay Marine Habitat Classification”, a
byproduct of the 2016 seagrass extent from Strydom et al. (2020),
published as a map in Sutton and Shaw (2020). To account
for the diffuse nature of eDNA samples, we divided both gulf study sites
into 500 × 500 m grid cells (hereafter sampling grid), ensuring a minimum
sampling distance of 500 m. We considered this distance adequate as other
eDNA studies in nearshore marine environments report effective sampling
ranges from less than 100 m (O’Donnell et al., 2017; Port et al., 2016) to
800 m (Yamamoto et al., 2017).

1.2 Sampling and extraction

All eDNA samples were
collected between August 30 and September 27, 2021, by filtering seawater
through 0.45 µm cellulose nitrate (CN) filters using a peristaltic pump
(GeoPump™, Geotech Environmental Equipment, Inc.,
Denver, Colorado) on site. In total, we sampled 45 locations and collected
274 samples, including four field negative controls. Samples were
collected at the geographical centre of the selected grid cells at
mid-water depth. At each location, we collected six
samples of 3 L each, filtering a total of 18 L of seawater per location.
We immediately stored the filter papers in Longmire’s solution (Longmire
et al., 1997) at room temperature until eDNA extraction, following the
procedure described by Bizzozzero et al. (2024).
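
The sampling design in Sections 1.1 and 1.2 can be illustrated with a minimal Python sketch (the study itself used R). The cell size, replicate count, and volume per sample come from the text. The function names, the use of habitat class as the stratum, and the number of cells drawn per stratum are hypothetical, since the actual allocation is not stated here.

```python
import random
from collections import defaultdict

CELL_SIZE_M = 500             # grid cell edge length (from the text)
REPLICATES_PER_LOCATION = 6   # samples per location (from the text)
LITRES_PER_REPLICATE = 3      # litres filtered per sample (from the text)


def cell_centre(col, row, origin_x=0.0, origin_y=0.0):
    """Return the centre (x, y) in metres of grid cell (col, row)."""
    x = origin_x + (col + 0.5) * CELL_SIZE_M
    y = origin_y + (row + 0.5) * CELL_SIZE_M
    return (x, y)


def stratified_sample(cells, n_per_stratum, seed=0):
    """Randomly draw up to n_per_stratum cells from each stratum.

    cells is an iterable of (cell_id, stratum) pairs. Using the habitat
    class as the stratum is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for cell_id, stratum in cells:
        by_stratum[stratum].append(cell_id)
    chosen = {}
    for stratum, ids in sorted(by_stratum.items()):
        k = min(n_per_stratum, len(ids))
        chosen[stratum] = sorted(rng.sample(ids, k))
    return chosen


def sampling_effort(n_locations, n_field_negatives=0):
    """Summarise sample counts and filtered volume for a survey."""
    n_samples = n_locations * REPLICATES_PER_LOCATION
    return {
        "water_samples": n_samples,
        "litres_per_location": REPLICATES_PER_LOCATION * LITRES_PER_REPLICATE,
        "total_incl_field_negatives": n_samples + n_field_negatives,
    }
```

Calling `sampling_effort(45, 4)` reproduces the totals reported above: 270 water samples, 18 L per location, and 274 samples including the four field negative controls.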

1.3 PCR, library preparation, and sequencing

To cover a broad range of fish
species, we amplified the samples targeting two fish-specific metabarcodes
in different genomic regions as recommended by Kumar et al. (2022): a 16S
rRNA gene fragment, hereafter Fish16S, and a 12S rRNA gene fragment,
hereafter MiFish12S. For each metabarcode, we generated and sequenced
separate libraries following a published protocol (Bizzozzero et al.,
2024). To check for possible contaminants, we included several negative
controls and two positive controls: a mock community (MC) and a positive
index control (PCIndex).

1.4 Data processing and taxonomic assignments

To facilitate data processing, we applied the UNOISE3 workflow, part of
the USEARCH framework (v11.0.667_i86linux64; Edgar, 2016). After removing
PhiX-related reads and those with low complexity, the paired-end reads
were merged. To improve the merging process, low-quality read ends were
trimmed. The primer sites were then removed, and the amplicon reads were
filtered based on standard quality criteria (e.g., minimum mean quality,
length range, and GC-content range). The cleaned amplicon reads were
processed into zero-radius operational taxonomic units (ZOTUs). Finally,
the cleaned amplicon reads were
mapped to the ZOTUs to generate count tables (detailed workflow and
thresholds provided in the summary script of this repository).

Taxonomic classification was performed using SINTAX, a
k-mer-based method (Edgar, 2016). ZOTUs from the MiFish12S dataset were
annotated with the MIDORI2 srRNA database (GB248), whereas the Fish16S
dataset was enriched through annotations from multiple sources, including
MIDORI2 (GB259), MitoFish (v397), and NCBI RefSeq
(Fish-16S-v240202).

We processed and analysed our data in RStudio v2022.07.2 (RStudio Team,
2022), using R 4.3.0 (R Core Team,
2023). To improve data quality, we used read counts from positive and
negative controls to remove non-target taxa and external contaminants.
Negative controls helped identify and exclude contamination. ZOTUs were
filtered using a false assignment threshold (MiFish12S: 0.155%, Fish16S:
0.048%) based on PCIndex reads to correct for
sequencing errors (Galan et al., 2018). Finally, samples with
dysfunctional PCRs (MiFish12S and Fish16S: M2046, M2117; MiFish12S only:
M1027) were visually identified and removed following Taberlet et al.
(2018).

We evaluated the correctness of taxonomic
assignments by checking whether the identified taxa were documented in the
tropical Indo-West Pacific marine bioregion (Briggs and Bowen, 2012). This
was based on data from the Global Biodiversity Information Facility (GBIF,
2001; accessed: 24.04.2024), the Australian Faunal Directory (ABRS, 2020;
accessed: 24.04.2024) and FishBase (FishBase, 2021; accessed 24.04.2024).
If a taxon was not recorded in the region, we reassigned it to a lower
taxonomic level that more plausibly occurs in Shark Bay.
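
As a rough illustration of the false assignment filter, the Python sketch below zeroes any count that falls below the threshold fraction of a ZOTU's total reads across the run. The two rates are taken from the text. Applying the threshold per ZOTU against its run-wide total is an assumption about how it was used, and the function name and data layout are hypothetical; the exact rule is documented in the repository's workflow files.

```python
# False assignment rates from the text, converted from percent to fractions.
FALSE_ASSIGNMENT_RATE = {"MiFish12S": 0.00155, "Fish16S": 0.00048}


def filter_false_assignments(counts, rate):
    """Zero out counts below rate * (total reads of that ZOTU in the run).

    counts maps ZOTU id -> {sample id -> read count}. The input is left
    unchanged and a new nested dict is returned.
    """
    filtered = {}
    for zotu, per_sample in counts.items():
        cutoff = rate * sum(per_sample.values())
        filtered[zotu] = {
            sample: (n if n >= cutoff else 0)
            for sample, n in per_sample.items()
        }
    return filtered
```

For a ZOTU with 10,000 reads in total, the MiFish12S rate gives a cutoff of 15.5 reads, so a sample holding only 10 reads of that ZOTU would be set to zero, whereas the Fish16S rate (cutoff 4.8 reads) would retain it.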

2 Environmental data acquisition and processing

We extracted marine habitat
types, i.e., channel, sand, sand/silt, seagrass, and turf algae, from the
2016 “Shark Bay Marine Habitat Classification”, a byproduct of the 2016
seagrass extent from Strydom et al. (2020), published as a map in Sutton and
Shaw (2020), which also informed our sampling design. Given the potential
variability in the extent of seagrass in Shark Bay across different years
(Strydom et al., 2020), we adjusted the seagrass extent in the habitat map
using 2021 Sentinel-2 (level 2A) satellite imagery (Copernicus Marine
Service Information, 2023a), applying a random forest algorithm in
Google Earth Engine (Gorelick et al., 2017) with a custom JavaScript
script. We acquired the highest available resolution (10–1000 m) of
satellite-derived data describing sea surface temperature (SST),
chlorophyll-a (Chlo-a), and total suspended matter (TSM) for our region of
interest. Furthermore, we derived bathymetric variables (depth, slope,
complexity) from publicly available bathymetries (Beaman, 2023; Lebrec et
al., 2021). Environmental variables were aggregated to the 500 × 500 m
eDNA sampling grid. Continuous variables with a higher resolution than the
sampling grid were treated as follows: bathymetry was resampled using
‘bilinear’ reprojection (‘raster’ package; Hijmans, 2023), while median
values were calculated for Chlo-a and TSM.
Percentage coverage of categorical habitat types (10 × 10 m grid) was
calculated within the sampling grid. For further details on data
acquisition and processing, refer to the Supplementary Information of the
associated manuscript DDI-2025-0112 (in Diversity and Distributions).
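
The aggregation to the sampling grid can be sketched in Python as follows (the authors worked in R with the ‘raster’ package, so this is only an illustration). It assumes the 10 m habitat pixels falling inside one 500 m cell have already been extracted as a flat list of labels. The function names and label strings are hypothetical.

```python
from collections import Counter
from statistics import median

HABITATS = ("channel", "sand", "sand/silt", "seagrass", "turf algae")

# A 500 m cell holds (500 / 10) ** 2 = 2500 pixels of 10 m.
PIXELS_PER_CELL = (500 // 10) ** 2


def habitat_percentages(pixels):
    """Percentage cover of each habitat type within one sampling cell.

    pixels is an iterable of habitat labels, one per 10 m pixel. Labels
    not listed in HABITATS (for example land) count towards the total
    but are not reported.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("cell contains no pixels")
    counts = Counter(pixels)
    return {h: 100.0 * counts.get(h, 0) / len(pixels) for h in HABITATS}


def cell_median(values):
    """Median of a continuous variable (e.g. Chlo-a or TSM) in one cell."""
    return median(values)
```

A cell with 1250 seagrass, 625 sand, and 625 channel pixels would yield 50 % seagrass, 25 % sand, and 25 % channel cover, with 0 % for the remaining classes.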

## Integrating Environmental DNA Metabarcoding and Remote Sensing Reveals Known and Novel Fish Diversity Hotspots in a World Heritage Area

The Dryad repository contains all the eDNA raw data, filtering steps, and
metadata used in the study associated with the manuscript
*Integrating Environmental DNA Metabarcoding and Remote Sensing Reveals
Known and Novel Fish Diversity Hotspots in a World Heritage Area*
(DDI-2025-0112). The data consist of 273 eDNA samples and 17 negative
controls. All samples were sequenced in two sequencing runs:
p751_run_250522 for the MiFish12S metabarcode and p751_run_220617 for the
Fish16S metabarcode.

## Datasets:

All datasets are contained in the
eDNA_unveils_fine-scale_fish_biodiversity_datasets.zip file.

* **Sample_Overview.csv**: an overview of all samples and their metadata used in the study:
  * Sample.NR: unique sample ID
  * Location: ID of location; samples taken from the same location have the same ID
  * Extraction Date: date of DNA extraction
  * Sampling Date: date of eDNA sample collection
  * Study Site: which gulf (western or eastern) of Shark Bay the sample was collected in. WG = Western Gulf, EG = Eastern Gulf
  * longitude/latitude: GPS location at which the sample was taken
  * Sample.Depth: depth at which the sample was taken [m]
  * type: S for sample, FNC for field negative control, ENC for extraction negative control
  * Bathymetry [m], Channel_Perc. [% within 500 m cell], Complexity [Bathymetry SD within 500 m cell], Distance_to_Shore [m], Sand_Perc. [% within 500 m cell], Sand_Silt_Perc. [% within 500 m cell], Seagrass_Perc. [% within 500 m cell], Slope [Degree], Salinity [psu], Sea surface temperature daily difference (SST_Daily_Diff) [°C], Sea surface temperature (SST) [°C], Turf_Algae_Perc. [% within 500 m cell]: environmental variables extracted from remote sensing data at the sampling location
  * in_rich: if yes, the sample was included in the richness analysis in our study
  * in_comp: if yes, the sample was included in the composition analysis in our study

The rest of the data is organised by metabarcode sequencing run. There is
one folder for each run:

* **Fish16S**
* **MiFish12S**

In each folder there are:

* Mapfile: contains the index, primer, and run information for each sample
* xx__RawData.zip: contains an 'a_data' folder with all the raw reads for all samples (R1 denotes forward reads, R2 reverse reads) and a 'y_help' folder containing md5sums for each sample
* WorkflowSummaryLog: log file of the data filtering, ZOTU clustering, and taxonomic assignment steps
* xx_ZOTU_tax: taxonomic assignments for each ZOTU, with assignment confidence in brackets
* xx_ZOTU_Count_TH90: ZOTU count table and taxonomic assignment as used in the study
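
A minimal Python sketch of reading the sample overview is given below. It assumes a comma-delimited file whose header uses the column names exactly as listed above (`Sample.NR`, `Location`, `type`, `in_rich`, `in_comp`) and that the inclusion flags are stored as the text `yes`. The actual header spelling, delimiter, and flag values should be checked against the file, and the function names are hypothetical.

```python
import csv
from collections import Counter


def load_rows(handle):
    """Read the sample overview from an open text file into a list of dicts."""
    return list(csv.DictReader(handle))


def selected_samples(rows, flag):
    """IDs of true samples (type 'S') whose inclusion flag is 'yes'.

    flag is 'in_rich' (richness analysis) or 'in_comp' (composition
    analysis), following the column list above.
    """
    return [
        row["Sample.NR"]
        for row in rows
        if row["type"] == "S" and row[flag].strip().lower() == "yes"
    ]


def samples_per_location(rows):
    """Number of true samples (type 'S') recorded at each location."""
    return Counter(row["Location"] for row in rows if row["type"] == "S")
```

Typical use would be to open the file with `open("Sample_Overview.csv", newline="")`, pass the handle to `load_rows`, and then call `selected_samples(rows, "in_rich")` to obtain the samples used in the richness analysis.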
Date made available: 12 Nov 2025
Publisher: DRYAD
