Roderic Guigo
Centre for Genomic Regulation | CRG · Bioinformatics and Genomics

738
Publications
192,422
Reads
153,461
Citations
Publications (738)
Article
Full-text available
Accumulating evidence suggests that genetic and epigenetic biomarkers hold potential for enhancing the early detection and monitoring of breast cancer (BC). Epigenetic alterations of the Homeobox A2 (HOXA2) gene have recently garnered significant attention in the clinical management of various malignancies. However, the precise role of HOXA2 in b...
Article
Full-text available
We present a genome assembly from an individual female Lycaena helle (the Violet Copper; Arthropoda; Insecta; Lepidoptera; Lycaenidae). The genome sequence is 547.31 megabases in span. The entirety of the genome sequence was assembled into 25 contiguous chromosomal pseudomolecules with no gaps, including the Z and W sex chromosomes. The mitochondri...
Article
Full-text available
GENCODE produces comprehensive reference gene annotation for human and mouse. Entering its twentieth year, the project remains highly active as new technologies and methodologies allow us to catalog the genome at ever-increasing granularity. In particular, long-read transcriptome sequencing enables us to identify large numbers of missing transcript...
Preprint
Full-text available
Mammalian genomes contain millions of regulatory elements that control the complex patterns of gene expression. Previously, the ENCODE consortium mapped biochemical signals across many cell types and tissues and integrated these data to develop a Registry of 0.9 million human and 300 thousand mouse candidate cis-Regulatory Elements (cCREs) annotate...
Preprint
In the era of rapidly expanding human genomics in research and healthcare, efficient data reuse is essential to maximise benefits for society. In response, Federated EGA was launched in 2022, and as of 2024, the FEGA Network is composed of seven national nodes. Here we describe the complexities, challenges, and achievements of FEGA, unravelling the...
Article
Full-text available
Striving to build an exhaustive guidebook of the types and properties of human cells, the Human Cell Atlas’ (HCA) success relies on the sampling of diverse populations, developmental stages, and tissue types. Its open science philosophy preconizes the rapid, seamless sharing of data – as openly as possible. In light of the scope and ambition of suc...
Article
Full-text available
The Human Cell Atlas (HCA) is a global partnership "to create comprehensive reference maps of all human cells - the fundamental units of life - as a basis for both understanding human health and diagnosing, monitoring, and treating disease." (https://www.humancellatlas.org/) The atlas shall characterize cells from diverse individuals across the glo...
Preprint
Full-text available
Accurate and complete gene annotations are indispensable for understanding how genome sequences encode biological functions. For twenty years, the GENCODE consortium has developed reference annotations for the human and mouse genomes, becoming a foundation for biomedical and genomics communities worldwide. Nevertheless, collections of important yet...
Article
The discovery of functional long non-coding RNAs (lncRNAs) changed their initial concept as transcriptional noise. LncRNAs have been identified as regulators of multiple biological processes, including chromatin structure, gene expression, splicing, mRNA degradation, and translation. However, functional studies of lncRNAs are hindered by the usual...
Article
Full-text available
The Catalan Initiative for the Earth BioGenome Project (CBP) is an EBP-affiliated project network aimed at sequencing the genome of the >40 000 eukaryotic species estimated to live in the Catalan-speaking territories (Catalan Linguistic Area, CLA). These territories represent a biodiversity hotspot. While covering less than 1% of Europe, they are h...
Article
Full-text available
Long-read RNA sequencing is essential to produce accurate and exhaustive annotation of eukaryotic genomes. Despite advancements in throughput and accuracy, achieving reliable end-to-end identification of RNA transcripts remains a challenge for long-read sequencing methods. To address this limitation, we develop CapTrap-seq, a cDNA library preparati...
Article
Full-text available
The Long-read RNA-Seq Genome Annotation Assessment Project Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. Using different protocols and sequencing platforms, the consortium generated over 427 million long-read sequences from complementary DNA and direct RNA datasets, encompassing human, mouse...
Preprint
Full-text available
Exogenous spike-ins like SIRV and ERCC mixes lack the typical 7-Methylguanosine (m7G) cap structure found in natural eukaryotic RNA transcripts, rendering them incompatible with certain library preparation protocols. This method details the addition of 5’ m7G caps to these spike-ins, enabling their use as external references in cDNA library product...
Preprint
Full-text available
Long-read RNA sequencing is crucial for generating precise and comprehensive annotations of eukaryotic genomes. Despite advancements in throughput and accuracy, achieving reliable end-to-end identification of RNA transcripts remains a challenge for long-read sequencing methods. To address this limitation, we developed CapTrap-seq, a cDNA library pr...
Preprint
Full-text available
The discovery of functional long non-coding RNAs (lncRNAs) changed their initial concept as transcriptional noise. LncRNAs have been found to participate in the regulation of multiple biological processes, including chromatin structure, gene expression, splicing, and mRNA degradation and translation. However, functional studies of lncRNAs are hinde...
Preprint
Full-text available
The precise coordination of important biological processes, such as differentiation and development, is highly dependent on the regulation of expression of the genetic information. The flow of the genetic information is tightly regulated on multiple levels. Among them, RNA export to cytosol is an essential step for the production of proteins in euk...
Article
Full-text available
Background Long non-coding RNAs (lncRNAs) are pivotal players in cellular processes, and their unique cell-type specific expression patterns render them attractive biomarkers and therapeutic targets. Yet, the functional roles of most lncRNAs remain enigmatic. To address the need to identify new druggable lncRNAs, we developed a comprehensive approa...
Article
Background: Microglial dysfunction plays a causative role in Alzheimer's disease (AD) pathogenesis. Here we focus on a germline insertion/deletion variant mapping to SIRPβ1, a surface receptor that triggers amyloid-β (Aβ) phagocytosis via TYROBP. Objective: To analyze the impact of this copy-number variant on SIRPβ1 expression and how it affects AD...
Preprint
Full-text available
Biodiversity genomics projects are underway with the aim of sequencing the genomes of all eukaryotic species on Earth. Here we describe the BioGenome Portal, a web-based application to facilitate organization and access to the data produced by biodiversity genomics projects. The portal integrates user-generated data with data deposited in public re...
Article
Social insect reproductives and non‐reproductives represent ideal models with which to understand the expression and regulation of alternative phenotypes. Most research in this area has focused on the developmental regulation of reproductive phenotypes in obligately social taxa such as honey bees, while relatively few studies have addressed the mol...
Preprint
Full-text available
Background Long non-coding RNAs (lncRNAs) are pivotal players in cellular processes, and their unique cell-type specific expression patterns make them attractive biomarkers and therapeutic targets. Yet, the functional roles of most lncRNAs remain enigmatic. To address the need to identify new druggable lncRNAs, we developed a comprehensive approach...
Preprint
Full-text available
Identifying transcriptional enhancers and their target genes is essential for understanding gene regulation and the impact of human genetic variation on disease. Here we create and evaluate a resource of >13 million enhancer-gene regulatory interactions across 352 cell types and tissues, by integrating predictive models, measurements of chromatin s...
Article
Full-text available
The increasing availability of multidimensional phenotypic data in large cohorts of genotyped individuals requires efficient methods to identify genetic effects on multiple traits. Permutational multivariate analysis of variance (PERMANOVA) offers a powerful non-parametric approach. However, it relies on permutations to assess significance, which h...
Article
Within the next decade, the genomes of 1.8 million eukaryotic species will be sequenced. Identifying genes in these sequences is essential to understand the biology of the species. This is challenging due to the transcriptional complexity of eukaryotic genomes, which encode hundreds of thousands of transcripts of multiple types. Among these, a smal...
Preprint
Full-text available
The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. The consortium generated over 427 million long-read sequences from cDNA and direct RNA datasets, encompassing human, mouse, and manatee species, using different protocols and seq...
Preprint
Full-text available
Long-read RNA sequencing is essential to produce accurate and exhaustive annotation of eukaryotic genomes. Despite advancements in throughput and accuracy, achieving reliable end-to-end identification of RNA transcripts remains a challenge for long-read sequencing methods. To address this limitation, we developed CapTrap-seq, a cDNA library prepara...
Preprint
Full-text available
The majority of mammalian genes encode multiple transcript isoforms that result from differential promoter use, changes in exonic splicing, and alternative 3’ end choice. Detecting and quantifying transcript isoforms across tissues, cell types, and species has been extremely challenging because transcripts are much longer than the short reads norma...
Article
Full-text available
Hornets are the largest of the social wasps, and are important regulators of insect populations in their native ranges. Hornets are also very successful as invasive species, with often devastating economic, ecological and societal effects. Understanding why these wasps are such successful invaders is critical to managing future introductions and mi...
Article
Full-text available
Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read p...
Preprint
Full-text available
Scientists have been trying to identify all of the genes in the human genome since the initial draft of the genome was published in 2001. Over the intervening years, much progress has been made in identifying protein-coding genes, and the estimated number has shrunk to fewer than 20,000, although the number of distinct protein-coding isoforms has e...
Article
Full-text available
Scientists have been trying to identify all of the genes in the human genome since the initial draft of the genome was published in 2001. Over the intervening years, much progress has been made in identifying protein-coding genes, and the estimated number has shrunk to fewer than 20,000, although the number of distinct protein-coding isoforms has e...
Article
Full-text available
Aquaporin-mediated oocyte hydration is considered important for the evolution of pelagic eggs and the radiative success of marine teleosts. However, the molecular regulatory mechanisms controlling this vital process are not fully understood. Here, we analyzed >400 piscine genomes to uncover a previously unknown teleost-specific aquaporin-1 cluster...
Article
Full-text available
Large-scale sharing of genomic quantification data requires standardized access interfaces. In this Global Alliance for Genomics and Health (GA4GH) project we developed RNAget, an API for secure access to genomic quantification data in matrix form. RNAget provides for slicing matrices to extract desired subsets of data and is applicable to all expr...
Article
Full-text available
Circadian and circannual cycles trigger physiological changes whose reflection on human transcriptomes remains largely uncharted. We used the time and season of death of 932 individuals from GTEx to jointly investigate transcriptomic changes associated with those cycles across multiple tissues. Overall, most variation across tissues during day-nigh...
Article
Full-text available
GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with...
Article
Recent genetic association studies have suggested a polygenic contribution to white matter hyperintensities (WMH). The aim of this study was to provide a genetic characterization of WMH in middle‐aged cognitively unimpaired individuals at higher risk of Alzheimer’s disease (AD) by investigating whether the genetic predisposition to specific complex di...
Article
Gene‐environment interactions are important in understanding Alzheimer’s disease (AD) etiology. Current research is limited, possibly due to weak effects of individual genetic variants. We analysed interaction between genetics of hippocampal volume, environmental exposures and levels of AD biomarkers in cognitively unimpaired individuals at increas...
Article
Telomere length (TL) is a well‐known hallmark of biological aging, with telomere shortening associated with overall mortality and increased rates of age‐related diseases, such as Alzheimer’s disease (AD). However, observational studies are limited in their ability to conclude whether TL is causally associated with those outcomes or with related underlying patholog...
Preprint
Full-text available
Social insect queens and workers represent ideal models with which to understand the expression and regulation of alternative reproductive phenotypes. Most research in this area has focused on the molecular regulation of reproductive castes in obligately social taxa with complex social systems, while relatively few studies have addressed the molecu...
Article
Full-text available
Understanding the consequences of individual transcriptome variation is fundamental to deciphering human biology and disease. We implement a statistical framework to quantify the contributions of 21 individual traits as drivers of gene expression and alternative splicing variation across 46 human tissues and 781 individuals from the Genotype-Tissue...
Preprint
Microglia play an important role in the maintenance of brain homeostasis, and microglial dysfunction plays a causative role in Alzheimer disease pathogenesis. Here we focus on the signal regulatory protein SIRPβ1, a surface receptor expressed on myeloid cells that triggers amyloid-β and cell debris phagocytosis via TYROBP. We found that a common...
Article
Full-text available
Telomere length (TL) is a biomarker of biological aging. Shorter telomeres have been associated with mortality and increased rates of age-related diseases. However, observational studies are unable to conclude whether TL is causally associated with those outcomes. Mendelian randomization (MR) was developed for assessing causality using genetic vari...
Article
Full-text available
Neurodegenerative and neuropsychiatric disorders (ND-NPs) are multifactorial, polygenic and complex behavioral phenotypes caused by brain abnormalities. Large-scale collaborative efforts have tried to identify the genetic architecture of these conditions. However, the specific and shared underlying molecular pathobiology of brain illnesses is not c...
Preprint
The increasing availability of multidimensional phenotypic data in large cohorts of genotyped individuals requires efficient methods to identify genetic effects on multiple traits. Permutational multivariate analysis of variance (PERMANOVA) offers a powerful non-parametric approach. However, it relies on permutations to assess significance, which h...
Article
Full-text available
Over the last decade, the increasing interest in long non-coding RNAs (lncRNAs) has led to the discovery of these transcripts in multiple organisms. LncRNAs tend to be specifically, and often lowly, expressed in certain tissues, cell types and biological contexts. Although lncRNAs participate in the regulation of a wide variety of biological proces...
Article
Full-text available
CRISPR-Cas9 screening libraries have arisen as a powerful tool to identify protein-coding (pc) and non-coding genes that play a role in different processes. In particular, the use of a nuclease-active Cas9 coupled to a single gRNA has proven to efficiently impair the expression of pc-genes by generating deleterious frameshifts. Here, we first de...
Preprint
Full-text available
Background During development, most cells undergo striking changes in order to develop into functional tissues. All along this process, the identity of each tissue arises from the particular combination of regulatory transcription factors that specifically control the expression of relevant genes for growth, pattern formation and differentiation. I...
Article
Background The hippocampus is involved in several complex diseases which display sex differences in their prevalence. Therefore, it can be hypothesized that sex‐specific genetic vulnerability in hippocampal formation might partially explain these differences. The aim of this study was to investigate whether the genetic predisposition to several com...
Article
Background Perivascular spaces (PVS) have an important role in the elimination of metabolic waste from the brain. However, the relationship between enlarged PVS (ePVS) and risk factors or pathophysiological mechanisms involved in Alzheimer’s disease (AD) remains unknown. The aim of this study was to investigate the association between the ePVS, dem...
Article
Full-text available
The European Genome-phenome Archive (EGA - https://ega-archive.org/) is a resource for long term secure archiving of all types of potentially identifiable genetic, phenotypic, and clinical data resulting from biomedical research projects. Its mission is to foster hosted data reuse, enable reproducibility, and accelerate biomedical and translational...
Preprint
Full-text available
Many developmental and differentiation processes take substantially longer in human than in mouse. To investigate the molecular mechanisms underlying this phenomenon, here we have specifically focused on the transdifferentiation from B cells to macrophages. The process is triggered by exactly the same molecular mechanism -- the induction by the tra...
Article
Full-text available
Functional annotation allows adding biologically relevant information to predicted features in genomic sequences, and it is, therefore, an important procedure of any de novo genome sequencing project. It is also useful for proofreading and improving gene structural annotation. Here, we introduce FA-nf, a pipeline implemented in Nextflow, a versatil...
Preprint
Full-text available
CRISPR-Cas9 screening libraries have arisen as a powerful tool to identify protein-coding (pc) and non-coding genes that play a role in different processes. In particular, the use of a nuclease-active Cas9 coupled to a single gRNA has proven to efficiently impair the expression of pc-genes by generating deleterious frameshifts. Here, we first de...
Article
Full-text available
Imaging genetic studies aim to test how genetic information influences brain structure and function by combining neuroimaging-based brain features and genetic data from the same individual. Most studies focus on individual correlation and association tests between genetic variants and a single measurement of the brain. Despite the great success of...
Preprint
Full-text available
Neurodegenerative and neuropsychiatric disorders (ND-NPs) are multifactorial, polygenic and complex behavioral phenotypes caused by brain abnormalities. Large-scale collaborative efforts have tried to identify the genetic architecture of these conditions. However, specific and shared underlying molecular pathobiology of brain illnesses is not clear...
Article
Full-text available
Background Perivascular spaces (PVS) have an important role in the elimination of metabolic waste from the brain. It has been hypothesized that the enlargement of PVS (ePVS) could be affected by pathophysiological mechanisms involved in Alzheimer’s disease (AD), such as abnormal levels of CSF biomarkers. However, the relationship between ePVS and t...
Preprint
Full-text available
With increased usage of long-read sequencing technologies to perform transcriptome analyses, there is a growing need to evaluate different methodologies including library preparation, sequencing platform, and computational analysis tools. Here, we report the study design of a community effort called the Long-read RNA-Seq Genome Annotation Asse...
Article
Full-text available
Tissue function and homeostasis reflect the gene expression signature by which the combination of ubiquitous and tissue-specific genes contributes to tissue maintenance and stimuli-responsive function. Enhancers are central to controlling this tissue-specific gene expression pattern. Here, we explore the correlation between the genomic location of e...
Article
Full-text available
Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassig...
Article
Full-text available
There is clear evidence that hippocampal subfield volumes have partly distinct genetic determinants associated with specific biological processes. The identification of genetic correlates of hippocampal subfield volumes may help to elucidate the mechanisms of neurologic diseases, as well as aging and neurodegenerative processes. However, despite th...
Article
Full-text available
This study investigated whether genetic factors involved in Alzheimer’s disease (AD) are associated with enlargement of Perivascular Spaces (ePVS) in the brain. A total of 680 participants with T2-weighted MRI scans and genetic information were acquired from the ALFA study. ePVS in the basal ganglia (BG) and the centrum semiovale (CS) were assessed...
Article
Full-text available
In contrast to the western honey bee, Apis mellifera, other honey bee species have been largely neglected despite their importance and diversity. The genetic basis of the evolutionary diversification of honey bees remains largely unknown. Here, we provide a genome-wide comparison of three honey bee species each representing one of the three subgene...
Article
Evaluating the impact of genetic variants on transcriptional regulation is a central goal in biological science that has been constrained by reliance on a single reference genome. To address this, we constructed phased, diploid genomes for four cadaveric donors (using long-read sequencing) and systematically charted noncoding regulatory elements an...
Preprint
Full-text available
CRISPR-Cas9 screening libraries have arisen as a powerful tool to identify both protein-coding (pc) and non-coding genes that play a role in different processes. In particular, the use of a nuclease-active Cas9 coupled to a single gRNA has proven to efficiently impair the expression of pc-genes by generating deleterious frameshifts. Here, we fir...
Preprint
Full-text available
Evaluating the impact of genetic variants on transcriptional regulation is a central goal in biological science that has been constrained by reliance on a single reference genome. To address this, we constructed phased, diploid genomes for four cadaveric donors (using long-read sequencing) and systematically charted noncoding regulatory elements an...
Article
Full-text available
The ability of nucleic acids to form double-stranded structures is essential for all living systems on Earth. Current knowledge on functional RNA structures is focused on locally-occurring base pairs. However, crosslinking and proximity ligation experiments demonstrated that long-range RNA structures are highly abundant. Here, we present the most c...
Article
Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype Tissue Expression (GTEx) project v...