RECLU: A pipeline to discover reproducible transcriptional start sites and their alternative regulation using capped analysis of gene expression (CAGE)

H. Ohmiya, M. Vitezic, M.C. Frith, M. Itoh, P. Carninci, Alistair R.R. Forrest, Y. Hayashizaki, T. Lassmann, I. Alam, D. Albanese, G. Altschuler, R. Andersson, T. Arakawa, J. Archer, E. Arner, P. Arner, M. Babina, K. Baillie, V. Bajic, S. Baker, A. Balic, P. Balwierz, A. Beckhouse, N. Bertin, J.A. Blake, A. Blumenthal, B. Bodega, A. Bonetti, J. Briggs, F. Brombacher, M. Burroughs, A. Califano, C. Cannistraci, D. Carbajo, Y. Chen, M. Chierici, Y. Ciani, H. Clevers, E. Dalla, C. Daub, C. Davis, M. de Hoon, D. de Lima Morais, M. Detmar, A. Diehl, E. Dimont, T. Dohi, F. Drabros, A. Edge, M. Edinger, K. Ekwall, M. Endoh, H. Enomoto, M. Fagiolini, L. Fairbairn, H. Fang, M.C. Farach-Carson, G. Faulkner, A. Favorov, M. Fisher, M. Francescatto, T. Freeman, R. Fujita, S. Fukuda, C. Furlanello, M. Furuno, J-I. Furusawa, T.B.H. Geijtenbeek, A. Gibson, T. Gingeras, D. Goldowitz, J. Gough, S. Guhl, R. Guler, S. Gustincich, T. Ha, V. Haberle, M. Hamaguchi, M. Hara, M. Harbers, J. Harshbarger, A. Hasegawa, Y. Hasegawa, T. Hashimoto, M. Herlyn, P. Heutink, W. Hide, K. Hitchens, S. Ho Sui, O. Hofmann, I. Hoof, F. Hori, D. Hume, L. Huminiecki, K. Ilda, T. Ikawa, Y. Ishizu, B. Jankovic, H. Jia, M. Jorgensen, A. Joshi, G. Jurman, B. Kaczkowski, C. Kai, K. Kaida, A. Kaiho, K. Kajiyama, M. Kanamori-Katayama, A. Kasianov, T. Kasukawa, S. Katayama, S. Kato, S. Kawaguchi, J. Kawai, H. Kawaji, H. Kawamoto, Y. Kawamura, T. Kawashima, J. Kempfle, T. Kenna, J. Kere, L. Khachigian, T. Kitamura, Peter P. Klinken, A. Knox, M. Kojima, S. Kojima, N. Kondo, H. Koseki, S. Koyasu, S. Krampitz, A. Kubosaki, I. Kulakovskiy, A.T.J. Kwon, J. Laros, T. Lenhard, A. Lennartsson, K. Li, B. Lilje, L. Lipovich, M. Lizio, A. Mackay-Sim, V. Makeev, R. Manabe, J. Mar, B. Marchand, A. Mathelier, Y. Medvedeva, T.F. Meehan, A. Mejhert, A. Meynert, Y. Mizuno, H. Morikawa, M. Morimoto, K. Moro, E. Motakis, H. Motohashi, C. Mummery, C.J. Mungall, M. Murata, S. Nagao, Y. Nakachi, F. Nakahara, T. Nakamura, Y. Nakamura, K. Nakazato, N. Ninomiya, H. Nishiyori, S. Noma, T. Nozaki, S. Ogishima, N. Ohkura, H. Ohno, M. Ohshima, M. Okada-Hatakeyama, Y. Okzaki, V. Orlando, D. Ovchinnikov, A. Pain, R. Passier, H. Persson, S. Piazza, C. Plessy, S. Pradhan-Bhatt, J. Prendergast, O. Rackham, J. Ramilowski, M. Rashid, T. Ravasi, M. Rehli, P. Rizzu, M. Roncador, S. Roy, M. Rye, E. Saijyo, A. Sajantila, A. Saka, S. Sakaguchi, M. Sakai, A. Sandelin, H. Sato, H. Satoh, S. Savvi, A. Saxena, U. Schaefer, S. Schmeier, C. Schmidl, C. Schneider, E.A. Schultes, G. Schulze-Tanzil, A. Schwegmann, C. Semple, T. Sengstag, J. Severin, G. Sheng, H. Shimoji, Y. Shimoni, J. Shin, C. Simon, D. Sugiyama, T. Sugiyama, K. Summers, H. Suzuki, M. Suzuki, N. Suzuki, R. Swoboda, P. T Hoen, M. Tagami, N. Takahashi, J. Takai, H. Tanaka, H. Tatsukawa, Z. Tatum, M. Taylor, M. Thompson, H. Toyoda, T. Toyoda, E. Valen, M. van De Wetering, L. van Den Berg, E. van Nimwegen, R. Verardo, D. Vijayan, I. Vorontzov, W. Wasserman, S. Watanabe, C. Wells, Louise Winteringham, E. Wolvetang, E.J. Wood, Y. Yamaguchi, M. Yamamoto, M. Yoneda, Y. Yonekura, S. Yoshida, R. Young, S.E. Zabierowski, P. Zhang, X. Zhao, S. Zucchelli

Research output: Contribution to journal, Article

23 Citations (Scopus)

Abstract

Background
Next-generation sequencing-based technologies are used extensively to study transcriptomes. Among these, cap analysis of gene expression (CAGE) specializes in detecting the most 5’ ends of RNA molecules. After the sequenced reads are mapped back to a reference genome, CAGE data highlight the transcriptional start sites (TSSs) and their usage at single-nucleotide resolution.

Results
We propose a pipeline to group single-nucleotide TSSs into larger reproducible peaks and compare their usage across biological states. Importantly, our pipeline discovers broad peaks as well as the fine structure of individual transcriptional start sites embedded within them. We assess the performance of our approach on a large CAGE dataset including 156 primary cell types and two cell lines with biological replicates. We demonstrate that genes have complicated structures of transcription initiation events. In particular, we discover that narrow peaks embedded in broader regions of transcriptional activity can be differentially used even if the larger region is not.

Conclusions
By examining the reproducible fine-scale organization of TSSs, we can detect many differentially regulated peaks that previous approaches left undetected.
Original language: English
Article number: 269
Number of pages: 15
Journal: BMC Genomics
Volume: 15
Publication status: Published - 2014
