Description
The OurDNA dataset is composed of harmonised, aggregated genome and exome sequences from the OurDNA program and provides the foundational reference set used by the OurDNA browser. The OurDNA program is a flagship initiative of the Centre for Population Genomics to increase the genomic representation of Australian multicultural communities. The OurDNA program aims to aggregate and share genetic variation data from over 20,000 Australians, including 8,000 new high-quality whole genome sequences from participants from genomically underrepresented groups recruited following participatory community engagement.
The OurDNA Browser is a resource intended for clinicians and researchers with formal training in genetics and genomics who understand the limitations of population genetic data. Use of the dataset is subject to conditions of use as outlined in OurDNA browser policies.
The OurDNA dataset v1 (GRCh38) includes 12,882 individuals:
10,671 exomes
2,211 genomes
Short variants
Total SNVs: 57,322,471
Total indels: 4,567,608
Variant type counts
Synonymous: 719,413
Missense: 1,321,931
Nonsense: 35,735
Frameshift: 36,991
Canonical splice site: 33,237
Versioned, aggregate data are available for download. Download instructions are provided on the OurDNA browser.
Methods
The OurDNA dataset contains individuals sequenced using a mix of exome and genome capture methods and sequencing chemistries, so coverage varies between individuals and across sites. This variation in coverage is incorporated into the frequency calculation for each variant. Quality control and analysis were performed using Hail, an open-source framework for scalable genetic analysis.
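To illustrate why coverage matters for frequencies, the sketch below computes an allele frequency from the alleles actually called at a site rather than from the full cohort size. It is a minimal illustration of the general allele count (AC) over allele number (AN) approach, not the OurDNA pipeline itself; the function name, the genotype encoding and the example values are all hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical encoding: a diploid genotype is a pair of allele indices
# (0 = reference, 1 = alternate); None marks a sample with no call at this
# site, for example because coverage was too low.
Genotype = Optional[Tuple[int, int]]


def allele_frequency(genotypes: Sequence[Genotype]) -> Tuple[int, int, Optional[float]]:
    """Return (AC, AN, AF) for one biallelic site.

    AC counts alternate alleles, AN counts all called alleles, and
    AF = AC / AN. Samples without a call contribute to neither count.
    """
    called = [gt for gt in genotypes if gt is not None]
    ac = sum(allele == 1 for gt in called for allele in gt)
    an = 2 * len(called)
    af = ac / an if an > 0 else None
    return ac, an, af


# Five samples, two of which have no call at this site (illustrative values).
site = [(0, 1), (0, 0), None, (1, 1), None]
ac, an, af = allele_frequency(site)

# Dividing by the full cohort instead would understate the frequency.
naive_af = ac / (2 * len(site))
```

In this example the denominator is 6 called alleles rather than 10, so a site that is poorly covered in some individuals is not treated as if those individuals carried the reference allele.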
All raw data from contributing projects and the OurDNA project were (re)processed through equivalent pipelines to increase consistency across projects. Short-read whole genome sequencing data were processed according to the DRAGEN-GATK Best Practices guidelines: alignment to GRCh38 using the open-source DRAGEN mapper (DRAGMAP, v1.3.0), followed by variant calling with GATK v4.2.6.1 HaplotypeCaller to discover single-nucleotide variants (SNVs) and insertions and deletions (indels). All samples were aggregated using the Hail gVCF Combiner, and sample and variant quality control was then performed on the joint call set in line with gnomAD best practices.
Funding
Garvan Institute of Medical Research (https://ror.org/01b3dvp57) and Murdoch Children’s Research Institute (https://ror.org/048fyec77) contribute to the development of this resource via their significant funding support for the Centre for Population Genomics, enabled through the generosity of donors.
Funding for this research has also been provided by the Australian Government’s Medical Research Future Fund (MRFF) grant 2015969 (CIA Daniel MacArthur; 2022-2027) from the Genomics Health Futures Mission and by the National Health and Medical Research Council (NHMRC, https://ror.org/011kf5r70) investigator grant 2009982 (CIA Daniel MacArthur; 2022-2026).
The contents of this published material are solely the responsibility of the authors and do not reflect the views of the Commonwealth of Australia or the NHMRC.
Christopher Richards, Katrina de Lange and Jennifer Piscionere are joint first authors.
| Date made available | 14 May 2025 |
|---|---|
| Publisher | Zenodo |