About
158 Publications
31,633 Reads
31,818 Citations
Publications (158)
GENCODE produces comprehensive reference gene annotation for human and mouse. Entering its twentieth year, the project remains highly active as new technologies and methodologies allow us to catalog the genome at ever-increasing granularity. In particular, long-read transcriptome sequencing enables us to identify large numbers of missing transcript...
Accurate and complete gene annotations are indispensable for understanding how genome sequences encode biological functions. For twenty years, the GENCODE consortium has developed reference annotations for the human and mouse genomes, becoming a foundation for biomedical and genomics communities worldwide. Nevertheless, collections of important yet...
RNA therapeutics (RNATx) aim to treat diseases, including cancer, by targeting or employing RNA molecules for therapeutic purposes. Amongst the most promising targets are long non-coding RNAs (lncRNAs), which regulate oncogenic molecular networks in a cell type-restricted manner. lncRNAs are distinct from protein-coding genes in important ways that...
A key attribute of some long noncoding RNAs (lncRNAs) is their ability to regulate expression of neighbouring genes in cis. However, such ‘cis-lncRNAs’ are presently defined using ad hoc criteria that, we show, are prone to false-positive predictions. The resulting lack of cis-lncRNA catalogues hinders our understanding of their extent, characteris...
The human mitochondrial genome (mtDNA) is a circular DNA molecule with a length of 16.6 kb, which contains a total of 37 genes. Somatic mtDNA mutations accumulate with age and environmental exposure, and some types of mtDNA variants may play a role in carcinogenesis. Recent studies observed mtDNA variants not only in kidney tumors but also in adjac...
Background:
Cardiac fibroblasts have crucial roles in the heart. In particular, fibroblasts differentiate into myofibroblasts in the damaged myocardium, contributing to scar formation and interstitial fibrosis. Fibrosis is associated with heart dysfunction and failure. Myofibroblasts therefore represent attractive therapeutic targets. However, the...
Long noncoding RNAs (lncRNAs) are linked to cancer via pathogenic changes in their expression levels. Yet, it remains unclear whether lncRNAs can also impact tumour cell fitness via function-altering somatic “driver” mutations. To search for such driver-lncRNAs, we here perform a genome-wide analysis of fitness-altering single nucleotide variants (...
Background:
Birt-Hogg-Dubé (BHD) syndrome, caused by germline alteration of the folliculin (FLCN) gene, gives rise to hybrid oncocytic/chromophobe tumour (HOCT) and chromophobe renal cell carcinoma (ChRCC), whereas sporadic ChRCC does not harbor FLCN alteration. To date, molecular characteristics of these similar histological types of tumours have been inc...
Scientists have been trying to identify all of the genes in the human genome since the initial draft of the genome was published in 2001. Over the intervening years, much progress has been made in identifying protein-coding genes, and the estimated number has shrunk to fewer than 20,000, although the number of distinct protein-coding isoforms has e...
Von Hippel-Lindau (VHL) disease is an autosomal dominant, inherited syndrome with variants in the VHL gene, causing predisposition to multi-organ neoplasms with vessel abnormality. Germline variants in VHL can be detected in 80-90% of patients clinically diagnosed with VHL disease. Here, we summarize the results of genetic tests for 206 Japanese VH...
GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with...
Aims:
The major cardiac cell types composing the adult heart arise from common multipotent precursor cells. Cardiac lineage decisions are guided by extrinsic and cell-autonomous factors, including recently discovered long noncoding RNAs (lncRNAs). The human lncRNA CARMEN, which is known to dictate specification towards the cardiomyocyte (CM) and t...
Evolutionary conservation is a measure of gene functionality that is widely used to prioritise long noncoding RNAs (lncRNA) in cancer research. Intriguingly, while updating our Cancer LncRNA Census (CLC), we observed an inverse relationship between year of discovery and evolutionary conservation. This observation is specific to cancer over other di...
Long noncoding RNAs (lncRNAs) can positively and negatively regulate expression of target genes encoded in cis. However, the extent, characteristics and mechanisms of such cis-regulatory lncRNAs (cis-lncRNAs) remain obscure. Until now, they have been defined using inconsistent, ad hoc criteria that can result in false-positive predictions. Here, we...
Evolutionary conservation is a measure of gene functionality that is widely used to prioritise long noncoding RNAs (lncRNA) in cancer research. Intriguingly, while updating our Cancer LncRNA Census, we observed an inverse relationship between year of discovery and evolutionary conservation. This observation is specific to cancer over other diseases...
Long noncoding RNAs (lncRNAs) are widely dysregulated in cancer, yet their functional roles in cancer hallmarks remain unclear. We employ pooled CRISPR deletion to perturb 831 lncRNAs detected in KRAS-mutant non-small cell lung cancer (NSCLC) and measure their contribution to proliferation, chemoresistance, and migration across two cell backgrounds...
Human and other genomes encode tens of thousands of long noncoding RNAs (lncRNAs), the vast majority of which remain uncharacterised. High-throughput functional screening methods, notably those based on pooled CRISPR-Cas perturbations, promise to unlock the biological significance and biomedical potential of lncRNAs. Such screens are based on libra...
CRISPR-Cas9 screening libraries have arisen as a powerful tool to identify protein-coding (pc) and non-coding genes that play a role in different processes. In particular, the use of a nuclease-active Cas9 coupled to a single gRNA has proven to efficiently impair the expression of pc-genes by generating deleterious frameshifts. Here, we first de...
Long noncoding RNAs (lncRNAs) can act as tumour suppressors or oncogenes to restrain or promote tumour cell proliferation via RNA-dependent mechanisms. Recently, genome sequencing has identified elevated densities of tumour somatic single nucleotide variants (SNVs) in lncRNA genes. However, this has been attributed to phenotypically-neutral “passenger”...
Tumour DNA contains thousands of single nucleotide variants (SNVs) in non-protein-coding regions, yet it remains unclear which are driver mutations that promote cell fitness. Amongst the most highly mutated non-coding elements are long noncoding RNAs (lncRNAs), which can promote cancer and may be targeted therapeutically. We here searched for evide...
Previous large-scale studies have uncovered many features that determine the processing of microRNA (miRNA) precursors; however, they have been conducted in vitro. Here, we introduce MapToCleave, a method to simultaneously profile processing of thousands of distinct RNA structures in living cells. We find that miRNA precursors with a stable lower b...
Many developmental and differentiation processes take substantially longer in human than in mouse. To investigate the molecular mechanisms underlying this phenomenon, here we have specifically focused on the transdifferentiation from B cells to macrophages. The process is triggered by exactly the same molecular mechanism -- the induction by the tra...
Long noncoding RNAs (lncRNAs) are widely dysregulated in cancer, yet their functional roles in cellular disease hallmarks remain unclear. Here we employ pooled CRISPR deletion to perturb all 831 lncRNAs in KRAS-mutant non-small cell lung cancer (NSCLC), and measure their contribution to proliferation, chemoresistance and migration across two cell b...
CRISPR-Cas9 screening libraries have arisen as a powerful tool to identify protein-coding (pc) and non-coding genes that play a role in different processes. In particular, the use of a nuclease-active Cas9 coupled to a single gRNA has proven to efficiently impair the expression of pc-genes by generating deleterious frameshifts. Here, we first de...
CRISPR-Cas9 screening libraries have arisen as a powerful tool to identify both protein-coding (pc) and non-coding genes that play a role in different processes. In particular, the use of a nuclease-active Cas9 coupled to a single gRNA has proven to efficiently impair the expression of pc-genes by generating deleterious frameshifts. Here, we fir...
Long non-coding RNAs (lncRNAs) play key roles in cancer and are at the vanguard of precision therapeutic development. These efforts depend on large and high-confidence collections of cancer lncRNAs. Here, we present the Cancer LncRNA Census 2 (CLC2). With 492 cancer lncRNAs, CLC2 is 4-fold greater in size than its predecessor, without compromising...
CRISPR-Cas9 deletion (CRISPR-del) is the leading approach for eliminating DNA from mammalian cells and underpins a variety of genome-editing applications. Target DNA, defined by a pair of double-strand breaks (DSBs), is removed during nonhomologous end-joining (NHEJ). However, the low efficiency of CRISPR-del results in laborious experiments and fa...
Metazoan genomes produce thousands of long-noncoding RNAs (lncRNAs), of which just a small fraction have been well characterized. Understanding their biological functions requires accurate annotations, or maps of the precise location and structure of genes and transcripts in the genome. Current lncRNA annotations are limited by compromises between...
We have monitored the transcriptomic and epigenomic status of cells at twelve time-points during the transdifferentiation of human pre-B cells into macrophages. Using this data, we have investigated some fundamental questions regarding the role of chromatin in gene expression. We have found that, over time, genes are characterized by a limited numb...
The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to...
Long noncoding RNAs (lncRNAs) can promote or repress the cellular hallmarks of cancer. Understanding their molecular roles and realising their therapeutic potential depend on high-quality catalogues of cancer lncRNA genes. Presently, such catalogues depend on labour-intensive curation of heterogeneous data with permissive criteria, resulting in unk...
Regulatory non-protein coding RNAs perform a remarkable variety of complex biological functions. Previously, we demonstrated a role of the human non-coding vault RNA1-1 (vtRNA1-1) in inhibiting intrinsic and extrinsic apoptosis in several cancer cell lines. Yet on the molecular level, the function of the vtRNA1-1 is still not fully clear. Here, we...
CRISPR-Cas9 deletion (CRISPR-del) is the leading approach for eliminating DNA from mammalian cells and underpins a variety of genome-editing applications. Target DNA, defined by a pair of double strand breaks (DSBs), is removed during non-homologous end-joining (NHEJ). However, the low efficiency of CRISPR-del results in laborious experiments and f...
The discovery of driver mutations is one of the key motivations for cancer genome sequencing. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumour types, we describe DriverPower, a software package that uses mutational burden and func...
Long non-coding RNAs (lncRNAs) are a growing focus of cancer genomics studies, creating the need for a resource of lncRNAs with validated cancer roles. Furthermore, it remains debated whether mutated lncRNAs can drive tumorigenesis, and whether such functions could be conserved during evolution. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of...
The discovery of drivers of cancer has traditionally focused on protein-coding genes [1-4]. Here we present analyses of driver point mutations and structural variants in non-coding regions across 2,658 genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium [5] of the International Cancer Genome Consortium (ICGC) and The Cancer Genom...
Motivation:
CRISPR-Cas9 loss-of-function pooled screening promises to identify which long noncoding RNAs (lncRNAs), amongst the many thousands to have been annotated so far, are capable of mediating cellular functions. The two principal loss-of-function perturbations, CRISPR-inhibition and CRISPR-deletion, employ one and two guide RNAs, respective...
We analyze the physical origin and the chemical and biological consequences of the asymmetry that occurs in DNA·RNA hybrids when the purine/pyrimidine (Pu/Py) ratio is different in the DNA and RNA strands. When the DNA strand of the hybrid is Py rich, the duplex is much more stable, rigid, and A-like than when the DNA strand is Pu rich. The origins...
The localization of long noncoding RNAs (lncRNAs) within the cell is the primary determinant of their molecular functions. LncRNAs are often thought of as chromatin-restricted regulators of gene transcription and chromatin structure. However, a rich population of cytoplasmic lncRNAs has come to light, with diverse roles including translational regu...
Long non-coding RNAs (lncRNAs) represent a huge reservoir of potential cancer targets. Such “onco-lncRNAs” have resisted traditional RNAi methods, but CRISPR-Cas9 genome editing now promises functional screens at high throughput and low cost. The unique biology of lncRNAs demands screening strategies distinct from protein-coding genes. The first su...
The sequence domains underlying long noncoding RNA (lncRNA) activities, including their characteristic nuclear enrichment, remain largely unknown. It has been proposed that these domains can originate from neofunctionalised fragments of transposable elements (TEs), otherwise known as RIDLs (Repeat Insertion Domains of Long Noncoding RNA), although...
The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GE...
In a 2018 paper posted to bioRxiv, Pertea et al. presented the CHESS database, a new catalog of human gene annotations that includes 1,178 new protein-coding predictions. These are based on evidence of transcription in human tissues and homology to earlier annotations in human and other mammals. Here, we reanalyze the evidence used by CHESS, and fi...
Gene maps, or annotations, enable us to navigate the functional landscape of our genome. They are a resource upon which virtually all studies depend, from single-gene to genome-wide scales and from basic molecular biology to medical genetics. Yet present-day annotations suffer from trade-offs between quality and size, with serious but often unappre...
Discovery of cancer drivers has traditionally focused on the identification of protein-coding genes. Here we present a comprehensive analysis of putative cancer driver mutations in both protein-coding and non-coding genomic regions across >2,500 whole cancer genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium. We developed a st...
Accurate annotation of genes and their transcripts is a foundation of genomics, but currently no annotation technique combines throughput and accuracy. As a result, reference gene collections remain incomplete—many gene models are fragmentary, and thousands more remain uncataloged, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRN...
Accurate annotation of genes and their transcripts is a foundation of genomics, but no annotation technique presently combines throughput and accuracy. As a result, reference gene collections remain incomplete: many gene models are fragmentary, while thousands more remain uncatalogued, particularly for long noncoding RNAs (lncRNAs). To accelerate l...
Long non-coding RNAs (lncRNAs) that drive tumorigenesis are a growing focus of cancer genomics studies. To facilitate further discovery, we have created the “Cancer LncRNA Census” (CLC), a manually-curated and strictly-defined compilation of lncRNAs with causative roles in cancer. CLC has two principal applications: first, as a resource for trainin...
The subcellular localisation of long noncoding RNAs (lncRNAs) holds valuable clues to their molecular function. However, measuring localisation of newly-discovered lncRNAs involves time-consuming and costly experimental methods. We have created "LncATLAS", a comprehensive resource of lncRNA localisation in human cells based on RNA-sequencing datase...
Background
The subcellular localisation of long noncoding RNAs (lncRNAs) holds valuable clues to their molecular function. However, measuring localisation of newly-discovered lncRNAs involves time-consuming and costly experimental methods.
Results
We have created “LncATLAS”, a comprehensive resource of lncRNA localisation in human cells based on R...
CRISPR-Cas9 technology can be used to engineer precise genomic deletions with pairs of single guide RNAs (sgRNAs). This approach has been widely adopted for diverse applications, from disease modelling of individual loci, to parallelized loss-of-function screens of thousands of regulatory elements. However, no solution has been presented for the un...
Design spreadsheet for creating DECKO2 oligonucleotides.
(XLSX)
Distances to closest protospacers boxplot.
For every filtered protospacer, the distance to the next nearest filtered protospacer is calculated. Boxplots show the distribution of these distances. Thick bar indicates the median, and boxes indicate the interquartile range.
(PDF)
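The nearest-protospacer distance described in the caption above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each filtered protospacer is reduced to a single integer coordinate on one chromosome, and the function names `nearest_neighbour_distances` and `summarise` are hypothetical. The source does not specify how distances were measured or which software produced the boxplots.

```python
from statistics import median, quantiles


def nearest_neighbour_distances(positions):
    """Return, for each position in sorted order, the distance to the
    closest other position (assumed: one integer coordinate per protospacer)."""
    if len(positions) < 2:
        raise ValueError("need at least two positions")
    ordered = sorted(positions)
    distances = []
    for i, pos in enumerate(ordered):
        candidates = []
        if i > 0:
            candidates.append(pos - ordered[i - 1])   # gap to left neighbour
        if i < len(ordered) - 1:
            candidates.append(ordered[i + 1] - pos)   # gap to right neighbour
        distances.append(min(candidates))
    return distances


def summarise(distances):
    """Median and quartiles, i.e. the values a boxplot would display."""
    q1, _, q3 = quantiles(distances, n=4)
    return {"median": median(distances), "q1": q1, "q3": q3}


# Hypothetical coordinates, for illustration only.
example_positions = [100, 130, 180, 400, 410]
example_distances = nearest_neighbour_distances(example_positions)
example_summary = summarise(example_distances)
```

Sorting first means each protospacer only needs to be compared with its two immediate neighbours, rather than with every other protospacer.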
Extended DECKO2 cloning protocol.
(DOCX)
Estimation of QC-PCR primer efficiencies.
(PDF)
Filtered protospacer scores density plot.
Density distribution of filtered protospacer scores computed with the RuleSet1 algorithm (“Doench Score”, [16]). Vertical lines indicate the median for each distribution.
(PDF)
Oligonucleotide sequences.
(DOCX)
Details of MALAT1 sgRNA pairs.
(XLSX)
Assessing the accuracy of the QC-PCR method.
We tested the accuracy of QC-PCR using gDNA templates containing known proportions of a target allele. In a previous study, we generated a mutant clone of the human, diploid cell line HCT-116 [4], where one copy of the TFRC gene promoter was deleted by DECKO. This was verified by careful genotyping. Thus...
Long noncoding RNAs (lncRNAs) represent a vast unexplored genetic space that may hold missing drivers of tumourigenesis, but few such “driver lncRNAs” are known. Until now, they have been discovered through changes in expression, leading to problems in distinguishing between causative roles and passenger effects. We here present a different approac...