-
Masayoshi Itoh,
Miki Kojima,
Sayaka Nagao-Sato,
Eri Saijo,
Timo Lassmann,
Mutsumi Kanamori-Katayama,
Ai Kaiho,
Marina Lizio,
Hideya Kawaji,
Piero Carninci,
Alistair R. R. Forrest,
Yoshihide Hayashizaki
ABSTRACT: Background
Cap analysis of gene expression (CAGE) is a 5′ sequence tag technology that globally determines transcription start sites in the genome and their expression levels, and it has most recently been adapted to the HeliScope single-molecule sequencer. Despite significant simplifications, the CAGE protocol has until now been labour-intensive.
Methodology
In this study we set out to adapt the protocol to a robotic workflow, which would increase throughput and reduce handling. The automated CAGE cDNA preparation system we present here can prepare 96 ‘HeliScope ready’ CAGE cDNA libraries in 8 days, as opposed to 6 weeks by a manual operator. We compare the results obtained using the same RNA in manual libraries and across multiple automation batches to assess reproducibility.
Conclusions
We show that the sequencing was highly reproducible and comparable to manual libraries, with an 8-fold increase in productivity. The automated CAGE cDNA preparation system can prepare 96 CAGE sequencing samples simultaneously. Finally, we discuss how the system could be used for CAGE on Illumina/SOLiD platforms, RNA-seq and full-length cDNA generation.
PLoS ONE 01/2012; 7(1):e30809. · 4.09 Impact Factor
-
ABSTRACT: The pluripotency of mouse embryonic stem cells (ESCs) and induced pluripotent stem cells (iPSCs) can be maintained by feeder cells, which secrete leukemia inhibitory factor (LIF). We found that feeder cells provide a relatively low concentration (25 units/ml) of LIF, which is insufficient to maintain ESC/iPSC pluripotency in feeder-free conditions. To identify additional factors involved in the maintenance of pluripotency, we carried out global transcript expression profiling of mouse iPSCs cultured on feeder cells and in feeder-free (LIF-treated) conditions. This identified 17 significantly differentially expressed genes (adjusted p value <0.05) including seven chemokines overexpressed in iPSCs grown on feeder cells. Ectopic expression of these chemokines in iPSCs revealed that CC chemokine ligand 2 (Ccl2) induced the key transcription factor genes for pluripotency, Klf4, Nanog, Sox2, and Tbx3. Furthermore, addition of recombinant Ccl2 protein drastically increased the number of Nanog-green fluorescent protein-positive iPSCs grown in low-LIF feeder-free conditions. We further revealed that pluripotency promotion by Ccl2 is mediated by activating the Stat3-pathway followed by Klf4 upregulation. We demonstrated that Ccl2-mediated increased pluripotency is independent of phosphoinositide 3-kinase and mitogen-activated protein kinase pathways and that Tbx3 may be upregulated by Klf4. Overall, Ccl2 cooperatively activates the Stat3-pathway with LIF in feeder-free conditions to maintain pluripotency for ESCs/iPSCs.
Stem Cells 06/2011; 29(8):1196-205. · 7.78 Impact Factor
-
Mutsumi Kanamori-Katayama,
Masayoshi Itoh,
Hideya Kawaji,
Timo Lassmann,
Shintaro Katayama,
Miki Kojima,
Nicolas Bertin,
Ai Kaiho,
Noriko Ninomiya,
Carsten O Daub,
Piero Carninci,
Alistair R R Forrest,
Yoshihide Hayashizaki
ABSTRACT: We report the development of a simplified cap analysis of gene expression (CAGE) protocol adapted for single-molecule sequencers that avoids second strand synthesis, ligation, digestion, and PCR. HeliScopeCAGE directly sequences the 3' end of cap trapped first-strand cDNAs. As with previous versions of CAGE, we better define transcription start sites (TSS) than known models, identify novel regions of transcription and alternative promoters, and find two major classes of TSS signal, sharp peaks and broad regions. However, using this protocol, we observe reproducible evidence of regulation at the much finer level of individual TSS positions. The libraries are quantitative over 5 orders of magnitude and highly reproducible (Pearson's correlation coefficient of 0.987). We have also scaled down the sample requirement to 5 μg of total RNA for a standard HeliScopeCAGE library and 100 ng for a low-quantity version. When the same RNA was run as 5-μg and 100-ng versions, the 100 ng was still able to detect expression for ∼60% of the 13,468 loci detected by a 5-μg library using the same threshold, allowing comparative analysis of even rare cell populations. Testing the protocol for differential gene expression measurements on triplicate HeLa and THP-1 samples, we find that the log fold change compared to Illumina microarray measurements is highly correlated (0.871). In addition, HeliScopeCAGE finds differential expression for thousands more loci including those with probes on the array. Finally, although the majority of tags are 5' associated, we also observe a low level of signal on exons that is useful for defining gene structures.
Genome Research 05/2011; 21(7):1150-9. · 13.61 Impact Factor
-
Mutsumi Kanamori-Katayama,
Ai Kaiho,
Yuri Ishizu,
Yuko Okamura-Oho,
Okio Hino,
Masaaki Abe,
Takumi Kishimoto,
Hisahiko Sekihara,
Yukio Nakamura,
Harukazu Suzuki,
Alistair R R Forrest,
Yoshihide Hayashizaki
ABSTRACT: Mesothelioma is a highly malignant tumor that is primarily caused by occupational or environmental exposure to asbestos fibers. Despite worldwide restrictions on asbestos usage, further cases are expected as diagnosis is typically 20-40 years after exposure. Once diagnosed there is a very poor prognosis, with a median survival of 9 months. Considering this, the development of early preclinical diagnostic markers may help improve clinical outcomes.
Microarray expression profiles of mesothelium and other tissues dissected from mice were used to identify candidate mesothelial lineage markers. Candidates were further tested by qRT-PCR and in-situ hybridization across a mouse tissue panel. Two candidate biomarkers with the potential for secretion, uroplakin 3B (UPK3B) and leucine rich repeat neuronal 4 (LRRN4), and one commercialized mesothelioma marker, mesothelin (MSLN), were then chosen for validation across a panel of normal human primary cells, 16 established mesothelioma cell lines, 10 lung cancer lines, and a further set of 8 unrelated cancer cell lines.
Within the primary cell panel, LRRN4 was only detected in primary mesothelial cells, but MSLN and UPK3B were also detected in other cell types. MSLN was detected in bronchial epithelial cells and alveolar epithelial cells, and UPK3B was detected in retinal pigment epithelial cells and urothelial cells. Testing the cell line panel, MSLN was detected in 15 of the 16 mesothelioma cell lines, whereas LRRN4 was only detected in 8 and UPK3B in 6. Interestingly, MSLN levels appear to be upregulated in the mesothelioma lines compared to the primary mesothelial cells, while LRRN4 and UPK3B are either lost or down-regulated. Despite the higher fraction of mesothelioma lines positive for MSLN, it was also detected at high levels in 2 lung cancer lines and 3 other unrelated cancer lines derived from papillotubular adenocarcinoma, signet ring carcinoma and transitional cell carcinoma.
PLoS ONE 01/2011; 6(10):e25391. · 4.09 Impact Factor
-
ABSTRACT: Perturbation and time-course data sets, in combination with computational approaches, can be used to infer transcriptional regulatory networks which ultimately govern the developmental pathways and responses of cells. Here, we individually knocked down the four transcription factors PU.1, IRF8, MYB and SP1 in the human monocyte leukemia THP-1 cell line and profiled the genome-wide transcriptional response of individual transcription starting sites using deep sequencing based Cap Analysis of Gene Expression. From the proximal promoter regions of the responding transcription starting sites, we derived de novo binding-site motifs, characterized their biological function and constructed a network. We found a previously described composite motif for PU.1 and IRF8 that explains the overlapping set of transcriptional responses upon knockdown of either factor.
Nucleic Acids Research 12/2010; 38(22):8141-8. · 8.03 Impact Factor
-
Hideya Kawaji,
Jessica Severin,
Marina Lizio,
Alistair R R Forrest,
Erik van Nimwegen,
Michael Rehli,
Kate Schroder,
Katharine Irvine,
Harukazu Suzuki,
Piero Carninci,
Yoshihide Hayashizaki,
Carsten O Daub
ABSTRACT: The international Functional Annotation Of the Mammalian Genomes 4 (FANTOM4) research collaboration set out to better understand the transcriptional network that regulates macrophage differentiation and to uncover novel components of the transcriptome employing a series of high-throughput experiments. The primary and unique technique is cap analysis of gene expression (CAGE), sequencing mRNA 5'-ends with a second-generation sequencer to quantify promoter activities even in the absence of gene annotation. Additional genome-wide experiments complement the setup including short RNA sequencing, microarray gene expression profiling on large-scale perturbation experiments and ChIP-chip for epigenetic marks and transcription factors. All the experiments are performed in a differentiation time course of the THP-1 human leukemic cell line. Furthermore, we performed a large-scale mammalian two-hybrid (M2H) assay between transcription factors and monitored their expression profile across human and mouse tissues with qRT-PCR to address combinatorial effects of regulation by transcription factors. These interdependent data have been analyzed individually and in combination with each other and are published in related but distinct papers. We provide all data together with systematic annotation in an integrated view as resource for the scientific community (http://fantom.gsc.riken.jp/4/). Additionally, we assembled a rich set of derived analysis results including published predicted and validated regulatory interactions. Here we introduce the resource and its update after the initial release.
Nucleic Acids Research 11/2010; 39(Database issue):D856-60. · 8.03 Impact Factor
-
Atsutaka Kubosaki,
Gabriella Lindgren,
Michihira Tagami,
Christophe Simon,
Yasuhiro Tomaru,
Hisashi Miura,
Takahiro Suzuki,
Erik Arner,
Alistair R R Forrest,
Katharine M Irvine,
Kate Schroder,
Yuki Hasegawa,
Mutsumi Kanamori-Katayama,
Michael Rehli,
David A Hume,
Jun Kawai,
Masanori Suzuki,
Harukazu Suzuki,
Yoshihide Hayashizaki
ABSTRACT: Gene regulatory networks in living cells are controlled by the interaction of multiple cell type-specific transcription regulators with DNA binding sites in target genes. Interferon regulatory factor 8 (IRF8), also known as interferon consensus sequence binding protein (ICSBP), is a transcription factor expressed predominantly in myeloid and lymphoid cell lineages. To find the functional direct target genes of IRF8, the gene expression profiles of siRNA knockdown samples and genome-wide binding locations by ChIP-chip were analyzed in THP-1 myelomonocytic leukemia cells. Consequently, 84 genes were identified as functional direct targets. The ETS family transcription factor PU.1, also known as SPI1, binds to IRF8 and regulates basal transcription in macrophages. Using the same approach, we identified 53 direct target genes of PU.1; these overlapped with 19 IRF8 targets. These 19 genes included key molecules of IFN signaling such as OAS1 and IRF9, but excluded other IFN-related genes amongst the IRF8 functional direct target genes. We suggest that IRF8 and PU.1 can have both combined and independent actions on different promoters in myeloid cells.
Molecular Immunology 08/2010; 47(14):2295-302. · 2.90 Impact Factor
-
Timothy Ravasi,
Harukazu Suzuki,
Carlo Vittorio Cannistraci,
Shintaro Katayama,
Vladimir B Bajic,
Kai Tan,
Altuna Akalin,
Sebastian Schmeier,
Mutsumi Kanamori-Katayama,
Nicolas Bertin, [......],
Michihira Tagami,
Shiro Fukuda,
Kengo Imamura,
Chikatoshi Kai,
Ryoko Ishihara,
Yayoi Kitazume,
Jun Kawai,
David A Hume,
Trey Ideker,
Yoshihide Hayashizaki
ABSTRACT: Combinatorial interactions among transcription factors are critical to directing tissue-specific gene expression. To build a global atlas of these combinations, we have screened for physical interactions among the majority of human and mouse DNA-binding transcription factors (TFs). The complete networks contain 762 human and 877 mouse interactions. Analysis of the networks reveals that highly connected TFs are broadly expressed across tissues, and that roughly half of the measured interactions are conserved between mouse and human. The data highlight the importance of TF combinations for determining cell fate, and they lead to the identification of a SMAD3/FLI1 complex expressed during development of immunity. The availability of large TF combinatorial networks in both human and mouse will provide many opportunities to study gene regulation, tissue differentiation, and mammalian evolution.
Cell 03/2010; 140(5):744-52. · 32.40 Impact Factor
-
ABSTRACT: Non-coding RNA (ncRNA) transcripts are RNA molecules that do not code for proteins, but elicit function by other mechanisms. The vast majority of RNA produced in a cell is non-coding ribosomal RNA, produced from relatively few loci; however, more recent complementary DNA (cDNA) cloning, tag sequencing, and genome tiling array studies suggest that ncRNAs also account for the majority of RNA species produced by a cell. ncRNA-based regulation has been referred to as a 'hidden layer' of signals or 'dark matter' that controls gene expression in cellular processes by poorly described mechanisms. These terms have arisen because ncRNAs were, until recently, ignored by expression profiling and cDNA annotation projects, and because their modes of action are diverse (e.g. influencing chromatin structure and epigenetics, translational silencing, transcriptional silencing). Here, we highlight recent functional genomics strategies toward identifying and assigning function to ncRNA transcription.
Briefings in Functional Genomics and Proteomics 11/2009; 8(6):437-43.
-
Ryan J Taft,
Evgeny A Glazov,
Nicole Cloonan,
Cas Simons,
Stuart Stephen,
Geoffrey J Faulkner,
Timo Lassmann,
Alistair R R Forrest,
Sean M Grimmond,
Kate Schroder, [......],
Hiromi Nishiyori,
Shiro Fukuda,
Jun Kawai,
Carsten O Daub,
David A Hume,
Harukazu Suzuki,
Valerio Orlando,
Piero Carninci,
Yoshihide Hayashizaki,
John S Mattick
Nature Genetics 08/2009; 41(7):859. · 35.53 Impact Factor
-
Geoffrey J Faulkner,
Yasumasa Kimura,
Carsten O Daub,
Shivangi Wani,
Charles Plessy,
Katharine M Irvine,
Kate Schroder,
Nicole Cloonan,
Anita L Steptoe,
Timo Lassmann, [......],
Takahiro Arakawa,
Hazuki Takahashi,
Jun Kawai,
Alistair R R Forrest,
Harukazu Suzuki,
Yoshihide Hayashizaki,
David A Hume,
Valerio Orlando,
Sean M Grimmond,
Piero Carninci
ABSTRACT: Although repetitive elements pervade mammalian genomes, their overall contribution to transcriptional activity is poorly defined. Here, as part of the FANTOM4 project, we report that 6-30% of cap-selected mouse and human RNA transcripts initiate within repetitive elements. Analysis of approximately 250,000 retrotransposon-derived transcription start sites shows that the associated transcripts are generally tissue specific, coincide with gene-dense regions and form pronounced clusters when aligned to full-length retrotransposon sequences. Retrotransposons located immediately 5' of protein-coding loci frequently function as alternative promoters and/or express noncoding RNAs. More than a quarter of RefSeqs possess a retrotransposon in their 3' UTR, with strong evidence for the reduced expression of these transcripts relative to retrotransposon-free transcripts. Finally, a genome-wide screen identifies 23,000 candidate regulatory regions derived from retrotransposons, in addition to more than 2,000 examples of bidirectional transcription. We conclude that retrotransposon transcription has a key influence upon the transcriptional output of the mammalian genome.
Nature Genetics 05/2009; 41(5):563-71. · 35.53 Impact Factor
-
Ryan J Taft,
Evgeny A Glazov,
Nicole Cloonan,
Cas Simons,
Stuart Stephen,
Geoffrey J Faulkner,
Timo Lassmann,
Alistair R R Forrest,
Sean M Grimmond,
Kate Schroder, [......],
Hiromi Nishiyori,
Shiro Fukuda,
Jun Kawai,
Carsten O Daub,
David A Hume,
Harukazu Suzuki,
Valerio Orlando,
Piero Carninci,
Yoshihide Hayashizaki,
John S Mattick
ABSTRACT: It has been reported that relatively short RNAs of heterogeneous sizes are derived from sequences near the promoters of eukaryotic genes. In conjunction with the FANTOM4 project, we have identified tiny RNAs with a modal length of 18 nt that map within -60 to +120 nt of transcription start sites (TSSs) in human, chicken and Drosophila. These transcription initiation RNAs (tiRNAs) are derived from sequences on the same strand as the TSS and are preferentially associated with G+C-rich promoters. The 5' ends of tiRNAs show peak density 10-30 nt downstream of TSSs, indicating that they are processed. tiRNAs are generally, although not exclusively, associated with highly expressed transcripts and sites of RNA polymerase II binding. We suggest that tiRNAs may be a general feature of transcription in metazoa and possibly all eukaryotes.
Nature Genetics 05/2009; 41(5):572-8. · 35.53 Impact Factor
-
Harukazu Suzuki,
Alistair R R Forrest,
Erik van Nimwegen,
Carsten O Daub,
Piotr J Balwierz,
Katharine M Irvine,
Timo Lassmann,
Timothy Ravasi,
Yuki Hasegawa,
Michiel J L de Hoon, [......],
Noriko Ninomiya,
Hiromi Nishiyori,
Shohei Noma,
Chihiro Ogawa,
Takuma Sano,
Christophe Simon,
Michihira Tagami,
Yukari Takahashi,
Jun Kawai,
Yoshihide Hayashizaki
ABSTRACT: Using deep sequencing (deepCAGE), the FANTOM4 study measured the genome-wide dynamics of transcription-start-site usage in the human monocytic cell line THP-1 throughout a time course of growth arrest and differentiation. Modeling the expression dynamics in terms of predicted cis-regulatory sites, we identified the key transcription regulators, their time-dependent activities and target genes. Systematic siRNA knockdown of 52 transcription factors confirmed the roles of individual factors in the regulatory network. Our results indicate that cellular states are constrained by complex networks involving both positive and negative regulatory interactions among substantial numbers of transcription factors and that no single transcription factor is both necessary and sufficient to drive the differentiation process.
Nature Genetics 05/2009; 41(5):553-62. · 35.53 Impact Factor
-
ABSTRACT: MicroRNAs (miRNAs) are short single stranded noncoding RNAs that suppress gene expression through either translational repression or degradation of target mRNAs. Annealing between messenger RNAs and the 5' seed region of miRNAs is believed to be essential for the specific suppression of target gene expression. One miRNA can have several hundred different targets in a cell. Rapidly accumulating evidence suggests that many miRNAs are involved in cell cycle regulation and consequently play critical roles in carcinogenesis.
Introduction of synthetic miR-107 or miR-185 suppressed growth of human non-small cell lung cancer cell lines. Flow cytometry analysis revealed that these miRNAs induce a G1 cell cycle arrest in H1299 cells and that the suppression of cell cycle progression is stronger than that by Let-7 miRNA. Gene expression analyses with oligonucleotide microarrays showed that hundreds of genes are affected by transfection of these miRNAs. Using miRNA-target prediction analyses and the array data, we compiled a set of likely targets of miR-107 and miR-185 for G1 cell cycle arrest and validated a subset of them using real-time RT-PCR and immunoblotting for CDK6.
We identified new cell cycle regulating miRNAs, miR-107 and miR-185, localized in frequently altered chromosomal regions in human lung cancers. For miR-107 in particular, a large number of down-regulated genes are annotated with the gene ontology term 'cell cycle'. Our results suggest that these miRNAs may contribute to regulating the cell cycle in human malignant tumors.
PLoS ONE 02/2009; 4(8):e6677. · 4.09 Impact Factor
-
Nicole Cloonan,
Alistair R R Forrest,
Gabriel Kolle,
Brooke B A Gardiner,
Geoffrey J Faulkner,
Mellissa K Brown,
Darrin F Taylor,
Anita L Steptoe,
Shivangi Wani,
Graeme Bethel,
Alan J Robertson,
Andrew C Perkins,
Stephen J Bruce,
Clarence C Lee,
Swati S Ranade,
Heather E Peckham,
Jonathan M Manning,
Kevin J McKernan,
Sean M Grimmond
ABSTRACT: We developed a massive-scale RNA sequencing protocol, short quantitative random RNA libraries or SQRL, to survey the complexity, dynamics and sequence content of transcriptomes in a near-complete fashion. This method generates directional, random-primed, linear cDNA libraries that are optimized for next-generation short-tag sequencing. We surveyed the poly(A)(+) transcriptomes of undifferentiated mouse embryonic stem cells (ESCs) and embryoid bodies (EBs) at an unprecedented depth (10 Gb), using the Applied Biosystems SOLiD technology. These libraries capture the genomic landscape of expression, state-specific expression, single-nucleotide polymorphisms (SNPs), the transcriptional activity of repeat elements, and both known and new alternative splicing events. We investigated the impact of transcriptional complexity on current models of key signaling pathways controlling ESC pluripotency and differentiation, highlighting how SQRL can be used to characterize transcriptome content and dynamics in a quantitative and reproducible manner, and suggesting that our understanding of transcriptional complexity is far from complete.
Nature Methods 08/2008; 5(7):613-9. · 19.28 Impact Factor
-
ABSTRACT: Cap analysis gene expression (CAGE) is a high-throughput, tag-based method designed to survey the 5' end of capped full-length cDNAs. CAGE has previously been used to define global transcription start site usage and monitor gene activity in mammals. A drawback of the CAGE approach thus far has been the removal of as many as 40% of CAGE sequence tags due to their mapping to multiple genomic locations. Here, we address the origins of multimap tags and present a novel strategy to assign CAGE tags to their most likely source promoter region. When this approach was applied to the FANTOM3 CAGE libraries, the percentage of protein-coding mouse transcriptional frameworks detected by CAGE improved from 42.9 to 57.8% (an increase of 5516 frameworks) with no reduction in CAGE to microarray correlation. These results suggest that the multimap tags produced by high-throughput, short sequence tag-based approaches can be rescued to augment greatly the transcriptome coverage provided by single-map tags alone.
Genomics 04/2008; 91(3):281-8. · 3.02 Impact Factor
-
Piero Carninci,
Albin Sandelin,
Boris Lenhard,
Shintaro Katayama,
Kazuro Shimokawa,
Jasmina Ponjavic,
Colin A M Semple,
Martin S Taylor,
Pär G Engström,
Martin C Frith, [......],
Sean M Grimmond,
Christine A Wells,
Valerio Orlando,
Claes Wahlestedt,
Edison T Liu,
Matthias Harbers,
Jun Kawai,
Vladimir B Bajic,
David A Hume,
Yoshihide Hayashizaki
Nature Genetics 08/2007; 39(9):1174-1174. · 35.53 Impact Factor
-
Piero Carninci,
Albin Sandelin,
Boris Lenhard,
Shintaro Katayama,
Kazuro Shimokawa,
Jasmina Ponjavic,
Colin A M Semple,
Martin S Taylor,
Pär G Engström,
Martin C Frith, [......],
Sean M Grimmond,
Christine A Wells,
Valerio Orlando,
Claes Wahlestedt,
Edison T Liu,
Matthias Harbers,
Jun Kawai,
Vladimir B Bajic,
David A Hume,
Yoshihide Hayashizaki
ABSTRACT: Mammalian promoters can be separated into two classes, conserved TATA box-enriched promoters, which initiate at a well-defined site, and more plastic, broad and evolvable CpG-rich promoters. We have sequenced tags corresponding to several hundred thousand transcription start sites (TSSs) in the mouse and human genomes, allowing precise analysis of the sequence architecture and evolution of distinct promoter classes. Different tissues and families of genes differentially use distinct types of promoters. Our tagging methods allow quantitative analysis of promoter usage in different tissues and show that differentially regulated alternative TSSs are a common feature in protein-coding genes and commonly generate alternative N termini. Among the TSSs, we identified new start sites associated with the majority of exons and with 3' UTRs. These data permit genome-scale identification of tissue-specific promoters and analysis of the cis-acting elements associated with them.
Nature Genetics 07/2006; 38(6):626-35. · 35.53 Impact Factor
-
Rajith N Aturaliya,
J Lynn Fink,
Melissa J Davis,
Melvena S Teasdale,
Kelly A Hanson,
Kevin C Miranda,
Alistair R R Forrest,
Sean M Grimmond,
Harukazu Suzuki,
Mutsumi Kanamori,
Chikatoshi Kai,
Jun Kawai,
Piero Carninci,
Yoshihide Hayashizaki,
Rohan D Teasdale
ABSTRACT: Application of a computational membrane organization prediction pipeline, MemO, identified putative type II membrane proteins as proteins predicted to encode a single alpha-helical transmembrane domain (TMD) and no signal peptides. MemO was applied to RIKEN's mouse isoform protein set to identify 1436 non-overlapping genomic regions or transcriptional units (TUs), which encode exclusively type II membrane proteins. Proteins with overlapping predicted InterPro and TMDs were reviewed to discard false positive predictions, resulting in a dataset comprised of 1831 transcripts in 1408 TUs. This dataset was used to develop a systematic protocol to document subcellular localization of type II membrane proteins. This approach combines mining of published literature to identify subcellular localization data and a high-throughput, polymerase chain reaction (PCR)-based approach to experimentally characterize subcellular localization. These approaches have provided localization data for 244 and 169 proteins, respectively. Type II membrane proteins are localized to all major organelle compartments; however, some biases were observed towards the early secretory pathway and punctate structures. Collectively, this study reports the subcellular localization of 26% of the defined dataset. All reported localization data are presented in the LOCATE database (http://www.locate.imb.uq.edu.au).
Traffic 06/2006; 7(5):613-25. · 4.92 Impact Factor
-
Norihiro Maeda,
Takeya Kasukawa,
Rieko Oyama,
Julian Gough,
Martin Frith,
Pär G Engström,
Boris Lenhard,
Rajith N Aturaliya,
Serge Batalov,
Kirk W Beisel, [......],
Koji Sugiura,
Yoichi Takenaka,
Rohan D Teasdale,
Christine A Wells,
Yunxia Zhu,
Chikatoshi Kai,
Jun Kawai,
David A Hume,
Piero Carninci,
Yoshihide Hayashizaki
ABSTRACT: The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed with manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.
PLoS Genetics 05/2006; 2(4):e62. · 8.69 Impact Factor