TY - JOUR
T1 - The GENCODE v7 catalog of human long noncoding RNAs
T2 - Analysis of their gene structure, evolution, and expression
AU - Derrien, Thomas
AU - Johnson, Rory
AU - Bussotti, Giovanni
AU - Tanzer, Andrea
AU - Djebali, Sarah
AU - Tilgner, Hagen
AU - Guernec, Gregory
AU - Martin, David
AU - Merkel, Angelika
AU - Knowles, David G.
AU - Lagarde, Julien
AU - Veeravalli, Lavanya
AU - Ruan, Xiaoan
AU - Ruan, Yijun
AU - Lassmann, Timo
AU - Carninci, Piero
AU - Brown, James B.
AU - Lipovich, Leonard
AU - Gonzalez, Jose M.
AU - Thomas, Mark
AU - Davis, Carrie A.
AU - Shiekhattar, Ramin
AU - Gingeras, Thomas R.
AU - Hubbard, Tim J.
AU - Notredame, Cedric
AU - Harrow, Jennifer
AU - Guigó, Roderic
PY - 2012/9
Y1 - 2012/9
AB - The human genome contains many thousands of long noncoding RNAs (lncRNAs). While several studies have demonstrated compelling biological and disease roles for individual examples, analytical and experimental approaches to investigate these genes have been hampered by the lack of comprehensive lncRNA annotation. Here, we present and analyze the most complete human lncRNA annotation to date, produced by the GENCODE consortium within the framework of the ENCODE project and comprising 9277 manually annotated genes producing 14,880 transcripts. Our analyses indicate that lncRNAs are generated through pathways similar to that of protein-coding genes, with similar histone-modification profiles, splicing signals, and exon/intron lengths. In contrast to protein-coding genes, however, lncRNAs display a striking bias toward two-exon transcripts, they are predominantly localized in the chromatin and nucleus, and a fraction appear to be preferentially processed into small RNAs. They are under stronger selective pressure than neutrally evolving sequences - particularly in their promoter regions, which display levels of selection comparable to protein-coding genes. Importantly, about one-third seem to have arisen within the primate lineage. Comprehensive analysis of their expression in multiple human organs and brain regions shows that lncRNAs are generally lower expressed than protein-coding genes, and display more tissue-specific expression patterns, with a large fraction of tissue-specific lncRNAs expressed in the brain. Expression correlation analysis indicates that lncRNAs show particularly striking positive correlation with the expression of antisense coding genes. This GENCODE annotation represents a valuable resource for future studies of lncRNAs.
UR - http://www.scopus.com/inward/record.url?scp=84865727393&partnerID=8YFLogxK
U2 - 10.1101/gr.132159.111
DO - 10.1101/gr.132159.111
M3 - Article
C2 - 22955988
AN - SCOPUS:84865727393
SN - 1088-9051
VL - 22
SP - 1775
EP - 1789
JO - Genome Research
JF - Genome Research
IS - 9
ER -