TY - JOUR
T1 - A differential k-mer analysis pipeline for comparing RNA-Seq transcriptome and meta-transcriptome datasets without a reference
AU - Chan, Chon Kit Kenneth
AU - Rosic, Nedeljka
AU - Lorenc, Michał T.
AU - Visendi, Paul
AU - Lin, Meng
AU - Kaniewska, Paulina
AU - Ferguson, Brett J.
AU - Gresshoff, Peter M.
AU - Batley, Jacqueline
AU - Edwards, David
PY - 2019/3/1
Y1 - 2019/3/1
N2 - Next-generation DNA sequencing technologies, such as RNA-Seq, currently dominate genome-wide gene expression studies. A standard approach to analyse this data requires mapping sequence reads to a reference and counting the number of reads which map to each gene. However, for many transcriptome studies, a suitable reference genome is unavailable, especially for meta-transcriptome studies which assay gene expression from mixed populations of organisms. Where a reference is unavailable, it is possible to generate a reference by the de novo assembly of the sequence reads. However, the high cost of generating high-coverage data for de novo assembly hinders this approach and more importantly the accurate assembly of such data is challenging, especially for meta-transcriptome data, and resulting assemblies frequently suffer from collapsed regions or chimeric sequences. As an alternative to the standard reference mapping approach, we have developed a k-mer-based analysis pipeline (DiffKAP) to identify differentially expressed reads between RNA-Seq datasets without the requirement for a reference. We compared the DiffKAP approach with the traditional Tophat/Cuffdiff method using RNA-Seq data from soybean, which has a suitable reference genome. We subsequently examined differential gene expression for a coral meta-transcriptome where no reference is available, and validated the results using qRT-PCR. We conclude that DiffKAP is an accurate method to study differential gene expression in complex meta-transcriptomes without the requirement of a reference genome.
AB - Next-generation DNA sequencing technologies, such as RNA-Seq, currently dominate genome-wide gene expression studies. A standard approach to analyse this data requires mapping sequence reads to a reference and counting the number of reads which map to each gene. However, for many transcriptome studies, a suitable reference genome is unavailable, especially for meta-transcriptome studies which assay gene expression from mixed populations of organisms. Where a reference is unavailable, it is possible to generate a reference by the de novo assembly of the sequence reads. However, the high cost of generating high-coverage data for de novo assembly hinders this approach and more importantly the accurate assembly of such data is challenging, especially for meta-transcriptome data, and resulting assemblies frequently suffer from collapsed regions or chimeric sequences. As an alternative to the standard reference mapping approach, we have developed a k-mer-based analysis pipeline (DiffKAP) to identify differentially expressed reads between RNA-Seq datasets without the requirement for a reference. We compared the DiffKAP approach with the traditional Tophat/Cuffdiff method using RNA-Seq data from soybean, which has a suitable reference genome. We subsequently examined differential gene expression for a coral meta-transcriptome where no reference is available, and validated the results using qRT-PCR. We conclude that DiffKAP is an accurate method to study differential gene expression in complex meta-transcriptomes without the requirement of a reference genome.
KW - Coral
KW - Host-microbe symbiosis
KW - K-mer analysis
KW - Meta-transcriptome
KW - RNA-Seq
KW - Soybean
UR - http://www.scopus.com/inward/record.url?scp=85057227053&partnerID=8YFLogxK
U2 - 10.1007/s10142-018-0647-3
DO - 10.1007/s10142-018-0647-3
M3 - Article
C2 - 30483906
AN - SCOPUS:85057227053
VL - 19
SP - 363
EP - 371
JO - Functional & Integrative Genomics
JF - Functional & Integrative Genomics
SN - 1438-793X
IS - 2
ER -