Chromosome-level de novo genome assembly of Telopea speciosissima (New South Wales waratah) using long-reads, linked-reads and Hi-C

Stephanie H. Chen, Maurizio Rossetto, Marlien van der Merwe, Patricia Lu-Irving, Jia Yee S. Yap, Hervé Sauquet, Greg Bourke, Timothy G. Amos, Jason G. Bragg, Richard J. Edwards

Research output: Contribution to journalArticlepeer-review

2 Citations (Scopus)

Abstract

Telopea speciosissima, the New South Wales waratah, is an Australian endemic woody shrub in the family Proteaceae. Waratahs have great potential as a model clade to better understand processes of speciation, introgression and adaptation, and are significant from a horticultural perspective. Here, we report the first chromosome-level genome for T. speciosissima. Combining Oxford Nanopore long-reads, 10x Genomics Chromium linked-reads and Hi-C data, the assembly spans 823 Mb (scaffold N50 of 69.0 Mb) with 97.8% of Embryophyta BUSCOs “Complete”. We present a new method in Diploidocus (https://github.com/slimsuite/diploidocus) for classifying, curating and QC-filtering scaffolds, which combines read depths, k-mer frequencies and BUSCO predictions. We also present a new tool, DepthSizer (https://github.com/slimsuite/depthsizer), for genome size estimation from the read depth of single-copy orthologues and estimate the genome size to be approximately 900 Mb. The largest 11 scaffolds contained 94.1% of the assembly, conforming to the expected number of chromosomes (2n = 22). Genome annotation predicted 40,158 protein-coding genes, 351 rRNAs and 728 tRNAs. We investigated CYCLOIDEA (CYC) genes, which have a role in determination of floral symmetry, and confirm the presence of two copies in the genome. Read depth analysis of 180 “Duplicated” BUSCO genes using a new tool, DepthKopy (https://github.com/slimsuite/depthkopy), suggests almost all are real duplications, increasing confidence in the annotation and highlighting a possible need to revise the BUSCO set for this lineage. The chromosome-level T. speciosissima reference genome (Tspe_v1) provides an important new genomic resource of Proteaceae to support the conservation of flora in Australia and further afield.
Original languageEnglish
Pages (from-to)1836-1854
Number of pages19
JournalMolecular Ecology Resources
Volume22
Issue number5
DOIs
Publication statusPublished - Jul 2022
Externally publishedYes

Fingerprint

Dive into the research topics of 'Chromosome-level de novo genome assembly of Telopea speciosissima (New South Wales waratah) using long-reads, linked-reads and Hi-C'. Together they form a unique fingerprint.

Cite this