A differential k-mer analysis pipeline for comparing RNA-Seq transcriptome and meta-transcriptome datasets without a reference

Chon Kit Kenneth Chan, Nedeljka Rosic, Michał T. Lorenc, Paul Visendi, Meng Lin, Paulina Kaniewska, Brett J. Ferguson, Peter M. Gresshoff, Jacqueline Batley, David Edwards

Research output: Contribution to journal (Article)

Abstract

Next-generation DNA sequencing technologies, such as RNA-Seq, currently dominate genome-wide gene expression studies. A standard approach to analyse this data requires mapping sequence reads to a reference and counting the number of reads which map to each gene. However, for many transcriptome studies, a suitable reference genome is unavailable, especially for meta-transcriptome studies which assay gene expression from mixed populations of organisms. Where a reference is unavailable, it is possible to generate a reference by the de novo assembly of the sequence reads. However, the high cost of generating high-coverage data for de novo assembly hinders this approach and more importantly the accurate assembly of such data is challenging, especially for meta-transcriptome data, and resulting assemblies frequently suffer from collapsed regions or chimeric sequences. As an alternative to the standard reference mapping approach, we have developed a k-mer-based analysis pipeline (DiffKAP) to identify differentially expressed reads between RNA-Seq datasets without the requirement for a reference. We compared the DiffKAP approach with the traditional Tophat/Cuffdiff method using RNA-Seq data from soybean, which has a suitable reference genome. We subsequently examined differential gene expression for a coral meta-transcriptome where no reference is available, and validated the results using qRT-PCR. We conclude that DiffKAP is an accurate method to study differential gene expression in complex meta-transcriptomes without the requirement of a reference genome.
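The reference-free idea in the abstract can be illustrated with a minimal sketch. This is not the DiffKAP implementation: it only counts k-mers in two read sets, normalises the counts, and reports k-mers whose abundance differs by a chosen fold change. The function names, the per-million normalisation, the pseudocount and the threshold are all assumptions made for illustration, and the step of mapping differential k-mers back to reads is omitted.

```python
from collections import Counter
import math


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def normalise(counts):
    """Scale raw counts to k-mers per million so datasets of different
    sequencing depth can be compared (an assumed normalisation)."""
    total = sum(counts.values())
    return {kmer: c * 1e6 / total for kmer, c in counts.items()}


def differential_kmers(reads_a, reads_b, k=4, min_log2_fc=1.0, pseudo=1.0):
    """Return {k-mer: log2 fold change of B over A} for k-mers whose
    absolute log2 fold change is at least min_log2_fc.

    The pseudocount keeps the ratio defined when a k-mer is absent
    from one dataset; its value here is a hypothetical choice.
    """
    a = normalise(count_kmers(reads_a, k))
    b = normalise(count_kmers(reads_b, k))
    result = {}
    for kmer in set(a) | set(b):
        ratio = (b.get(kmer, 0.0) + pseudo) / (a.get(kmer, 0.0) + pseudo)
        log2_fc = math.log2(ratio)
        if abs(log2_fc) >= min_log2_fc:
            result[kmer] = log2_fc
    return result


if __name__ == "__main__":
    # Toy reads, invented for illustration only.
    control = ["ACGTACGT"]
    treated = ["TTTTTTTT"]
    for kmer, fc in sorted(differential_kmers(control, treated).items()):
        print(kmer, round(fc, 2))
```

A k-mer present only in the second dataset gets a positive log2 fold change and one present only in the first gets a negative value. A real analysis would also need replicate handling and a statistical test, which this sketch does not attempt.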

Original language: English
Pages (from-to): 363-371
Number of pages: 9
Journal: Functional and Integrative Genomics
Volume: 19
Issue number: 2
DOI: 10.1007/s10142-018-0647-3
Publication status: Published - 1 Mar 2019

Fingerprint

Transcriptome
RNA
Genome
Gene Expression
Anthozoa
DNA Sequence Analysis
Soybeans
Datasets
Technology
Costs and Cost Analysis
Polymerase Chain Reaction
Population
Genes

Cite this

Chan, Chon Kit Kenneth ; Rosic, Nedeljka ; Lorenc, Michał T. ; Visendi, Paul ; Lin, Meng ; Kaniewska, Paulina ; Ferguson, Brett J. ; Gresshoff, Peter M. ; Batley, Jacqueline ; Edwards, David. / A differential k-mer analysis pipeline for comparing RNA-Seq transcriptome and meta-transcriptome datasets without a reference. In: Functional and Integrative Genomics. 2019 ; Vol. 19, No. 2. pp. 363-371.
@article{fb70ea43a1e748de982d16b4c28f65ef,
title = "A differential k-mer analysis pipeline for comparing RNA-Seq transcriptome and meta-transcriptome datasets without a reference",
abstract = "Next-generation DNA sequencing technologies, such as RNA-Seq, currently dominate genome-wide gene expression studies. A standard approach to analyse this data requires mapping sequence reads to a reference and counting the number of reads which map to each gene. However, for many transcriptome studies, a suitable reference genome is unavailable, especially for meta-transcriptome studies which assay gene expression from mixed populations of organisms. Where a reference is unavailable, it is possible to generate a reference by the de novo assembly of the sequence reads. However, the high cost of generating high-coverage data for de novo assembly hinders this approach and more importantly the accurate assembly of such data is challenging, especially for meta-transcriptome data, and resulting assemblies frequently suffer from collapsed regions or chimeric sequences. As an alternative to the standard reference mapping approach, we have developed a k-mer-based analysis pipeline (DiffKAP) to identify differentially expressed reads between RNA-Seq datasets without the requirement for a reference. We compared the DiffKAP approach with the traditional Tophat/Cuffdiff method using RNA-Seq data from soybean, which has a suitable reference genome. We subsequently examined differential gene expression for a coral meta-transcriptome where no reference is available, and validated the results using qRT-PCR. We conclude that DiffKAP is an accurate method to study differential gene expression in complex meta-transcriptomes without the requirement of a reference genome.",
keywords = "Coral, Host-microbe symbiosis, K-mer analysis, Meta-transcriptome, RNA-Seq, Soybean",
author = "Chan, {Chon Kit Kenneth} and Nedeljka Rosic and Lorenc, {Michał T.} and Paul Visendi and Meng Lin and Paulina Kaniewska and Ferguson, {Brett J.} and Gresshoff, {Peter M.} and Jacqueline Batley and David Edwards",
year = "2019",
month = "3",
day = "1",
doi = "10.1007/s10142-018-0647-3",
language = "English",
volume = "19",
pages = "363--371",
journal = "Functional \& Integrative Genomics",
issn = "1438-793X",
publisher = "Springer-Verlag London Ltd.",
number = "2",

}

A differential k-mer analysis pipeline for comparing RNA-Seq transcriptome and meta-transcriptome datasets without a reference. / Chan, Chon Kit Kenneth; Rosic, Nedeljka; Lorenc, Michał T.; Visendi, Paul; Lin, Meng; Kaniewska, Paulina; Ferguson, Brett J.; Gresshoff, Peter M.; Batley, Jacqueline; Edwards, David.

In: Functional and Integrative Genomics, Vol. 19, No. 2, 01.03.2019, p. 363-371.


TY - JOUR
T1 - A differential k-mer analysis pipeline for comparing RNA-Seq transcriptome and meta-transcriptome datasets without a reference
AU - Chan, Chon Kit Kenneth
AU - Rosic, Nedeljka
AU - Lorenc, Michał T.
AU - Visendi, Paul
AU - Lin, Meng
AU - Kaniewska, Paulina
AU - Ferguson, Brett J.
AU - Gresshoff, Peter M.
AU - Batley, Jacqueline
AU - Edwards, David
PY - 2019/3/1
Y1 - 2019/3/1
AB - Next-generation DNA sequencing technologies, such as RNA-Seq, currently dominate genome-wide gene expression studies. A standard approach to analyse this data requires mapping sequence reads to a reference and counting the number of reads which map to each gene. However, for many transcriptome studies, a suitable reference genome is unavailable, especially for meta-transcriptome studies which assay gene expression from mixed populations of organisms. Where a reference is unavailable, it is possible to generate a reference by the de novo assembly of the sequence reads. However, the high cost of generating high-coverage data for de novo assembly hinders this approach and more importantly the accurate assembly of such data is challenging, especially for meta-transcriptome data, and resulting assemblies frequently suffer from collapsed regions or chimeric sequences. As an alternative to the standard reference mapping approach, we have developed a k-mer-based analysis pipeline (DiffKAP) to identify differentially expressed reads between RNA-Seq datasets without the requirement for a reference. We compared the DiffKAP approach with the traditional Tophat/Cuffdiff method using RNA-Seq data from soybean, which has a suitable reference genome. We subsequently examined differential gene expression for a coral meta-transcriptome where no reference is available, and validated the results using qRT-PCR. We conclude that DiffKAP is an accurate method to study differential gene expression in complex meta-transcriptomes without the requirement of a reference genome.

KW - Coral
KW - Host-microbe symbiosis
KW - K-mer analysis
KW - Meta-transcriptome
KW - RNA-Seq
KW - Soybean
UR - http://www.scopus.com/inward/record.url?scp=85057227053&partnerID=8YFLogxK
U2 - 10.1007/s10142-018-0647-3
DO - 10.1007/s10142-018-0647-3
M3 - Article
VL - 19
SP - 363
EP - 371
JO - Functional & Integrative Genomics
JF - Functional & Integrative Genomics
SN - 1438-793X
IS - 2
ER -