TY - JOUR
T1 - Chromosome-level de novo genome assembly of Telopea speciosissima (New South Wales waratah) using long-reads, linked-reads and Hi-C
AU - Chen, Stephanie H.
AU - Rossetto, Maurizio
AU - van der Merwe, Marlien
AU - Lu-Irving, Patricia
AU - Yap, Jia Yee S.
AU - Sauquet, Hervé
AU - Bourke, Greg
AU - Amos, Timothy G.
AU - Bragg, Jason G.
AU - Edwards, Richard J.
N1 - Funding Information:
We would like to acknowledge the contribution of the Genomics for Australian Plants Framework Initiative consortium (https://www.genomicsforaustralianplants.com/consortium/) in the generation of data used in this publication. The Initiative is supported by funding from Bioplatforms Australia (enabled by NCRIS), the Ian Potter Foundation, Royal Botanic Gardens Foundation (Victoria), Royal Botanic Gardens Victoria, the Royal Botanic Gardens and Domain Trust, the Council of Heads of Australasian Herbaria, CSIRO, Centre for Australian National Biodiversity Research and the Department of Biodiversity, Conservation and Attractions, Western Australia. SHC was supported through an Australian Government Research Training Program Scholarship. RJE was funded by the Australian Research Council (LP160100610 and LP18010072). We thank Stuart Allan for providing access to the sequenced plant and assistance with sample collection at Blue Mountains Botanic Garden and Carolyn Connelly for facilitating access to laboratory materials at the Royal Botanic Garden Sydney. We acknowledge Chris Jackson for advice on repeat annotation. We thank the members of UNSW Research Technology Services, particularly Duncan Smith, for help with software installation on the high-performance computing cluster Katana. We acknowledge Mabel Lum for assistance with the Bioplatforms Australia data portal. ONT and 10x sequencing were conducted at the Australian Genome Research Facility (AGRF). Hi-C library prep and sequencing were conducted at the Ramaciotti Centre for Genomics at the University of New South Wales.
Publisher Copyright:
© 2022 John Wiley & Sons Ltd.
PY - 2022/7
Y1 - 2022/7
N2 - Telopea speciosissima, the New South Wales waratah, is an Australian endemic woody shrub in the family Proteaceae. Waratahs have great potential as a model clade to better understand processes of speciation, introgression and adaptation, and are significant from a horticultural perspective. Here, we report the first chromosome-level genome for T. speciosissima. Combining Oxford Nanopore long-reads, 10x Genomics Chromium linked-reads and Hi-C data, the assembly spans 823 Mb (scaffold N50 of 69.0 Mb) with 97.8% of Embryophyta BUSCOs “Complete”. We present a new method in Diploidocus (https://github.com/slimsuite/diploidocus) for classifying, curating and QC-filtering scaffolds, which combines read depths, k-mer frequencies and BUSCO predictions. We also present a new tool, DepthSizer (https://github.com/slimsuite/depthsizer), for genome size estimation from the read depth of single-copy orthologues and estimate the genome size to be approximately 900 Mb. The largest 11 scaffolds contained 94.1% of the assembly, conforming to the expected number of chromosomes (2n = 22). Genome annotation predicted 40,158 protein-coding genes, 351 rRNAs and 728 tRNAs. We investigated CYCLOIDEA (CYC) genes, which have a role in determination of floral symmetry, and confirm the presence of two copies in the genome. Read depth analysis of 180 “Duplicated” BUSCO genes using a new tool, DepthKopy (https://github.com/slimsuite/depthkopy), suggests almost all are real duplications, increasing confidence in the annotation and highlighting a possible need to revise the BUSCO set for this lineage. The chromosome-level T. speciosissima reference genome (Tspe_v1) provides an important new genomic resource of Proteaceae to support the conservation of flora in Australia and further afield.
AB - Telopea speciosissima, the New South Wales waratah, is an Australian endemic woody shrub in the family Proteaceae. Waratahs have great potential as a model clade to better understand processes of speciation, introgression and adaptation, and are significant from a horticultural perspective. Here, we report the first chromosome-level genome for T. speciosissima. Combining Oxford Nanopore long-reads, 10x Genomics Chromium linked-reads and Hi-C data, the assembly spans 823 Mb (scaffold N50 of 69.0 Mb) with 97.8% of Embryophyta BUSCOs “Complete”. We present a new method in Diploidocus (https://github.com/slimsuite/diploidocus) for classifying, curating and QC-filtering scaffolds, which combines read depths, k-mer frequencies and BUSCO predictions. We also present a new tool, DepthSizer (https://github.com/slimsuite/depthsizer), for genome size estimation from the read depth of single-copy orthologues and estimate the genome size to be approximately 900 Mb. The largest 11 scaffolds contained 94.1% of the assembly, conforming to the expected number of chromosomes (2n = 22). Genome annotation predicted 40,158 protein-coding genes, 351 rRNAs and 728 tRNAs. We investigated CYCLOIDEA (CYC) genes, which have a role in determination of floral symmetry, and confirm the presence of two copies in the genome. Read depth analysis of 180 “Duplicated” BUSCO genes using a new tool, DepthKopy (https://github.com/slimsuite/depthkopy), suggests almost all are real duplications, increasing confidence in the annotation and highlighting a possible need to revise the BUSCO set for this lineage. The chromosome-level T. speciosissima reference genome (Tspe_v1) provides an important new genomic resource of Proteaceae to support the conservation of flora in Australia and further afield.
KW - genome assembly
KW - Hi-C
KW - long-read sequencing
KW - reference genome
KW - Telopea
KW - waratah
UR - http://www.scopus.com/inward/record.url?scp=85122680523&partnerID=8YFLogxK
U2 - 10.1111/1755-0998.13574
DO - 10.1111/1755-0998.13574
M3 - Article
C2 - 35016262
AN - SCOPUS:85122680523
SN - 1755-098X
VL - 22
SP - 1836
EP - 1854
JO - Molecular Ecology Resources
JF - Molecular Ecology Resources
IS - 5
ER -