About
68
Publications
23,847
Reads
5,187
Citations
Additional affiliations
January 2019 - present
April 2014 - December 2018
May 2011 - March 2014
Publications (68)
Long non-coding RNAs (lncRNAs) are pivotal mediators of systemic immune response to viral infection, yet most studies concerning their expression and functions upon immune stimulation are limited to in vitro bulk cell populations. This strongly constrains our understanding of how lncRNA expression varies at single-cell resolution, and how their cel...
BACKGROUND
Cardiomyopathy (CMP) is a genetic disease of the heart muscle that causes heart failure and sudden cardiac death in children. Our goal was to define the missing genetic etiology of childhood onset CMP by identifying the role of functionally active genomic variants in regulatory elements of CMP genes.
METHODS AND RESULTS
A total of 225 u...
Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype Tissue Expression (GTEx) project v...
Ebola virus (EBOV) causes epidemics with high mortality yet remains understudied due to the challenge of experimentation in high-containment and outbreak settings. Here, we used single-cell transcriptomics and CyTOF-based single-cell protein quantification to characterize peripheral immune cells during EBOV infection in rhesus monkeys. We obtained...
Cardiomyopathy (CMP) is a heritable genetic disorder. Protein-coding variants account for 20-30% of cases. The contribution of variants in non-coding DNA elements that regulate gene expression has not been explored. We performed whole-genome sequencing (WGS) of 228 unrelated CMP families. Besides pathogenic protein-coding variants in known CMP gene...
The Genotype-Tissue Expression (GTEx) project was established to characterize genetic effects on the transcriptome across human tissues and to link these regulatory mechanisms to trait and disease associations. Here, we present analyses of the version 8 data, examining 15,201 RNA-sequencing samples from 49 tissues of 838 postmortem donors. We compr...
Background:
Gene expression differences between species are driven by both cis and trans effects. Whereas cis effects are caused by genetic variants located on the same DNA molecule as the target gene, trans effects are due to genetic variants that affect diffusible elements. Previous studies have mostly assessed the impact of cis and trans effect...
Ebola virus (EBOV) causes epidemics with high case fatality rates, yet remains understudied due to the challenge of experimentation in high-containment and outbreak settings. To better understand EBOV infection in vivo, we used single-cell transcriptomics and CyTOF-based single-cell protein quantification to characterize peripheral immune cell act...
Gene expression differences between species are driven by both cis and trans effects. Whereas cis effects are caused by genetic variants in close proximity to the target gene, trans effects are due to distal genetic variants that affect diffusible elements such as transcription factors. Previous studies have mostly assessed the impact of cis and tr...
Transcription initiates at both coding and noncoding genomic elements, including mRNA and long noncoding RNA (lncRNA) core promoters and enhancer RNAs (eRNAs). However, each class has a different expression profile with lncRNAs and eRNAs being the most tissue specific. How these complex differences in expression profiles and tissue specificities ar...
Transcription initiates at both coding and non-coding genomic elements, including mRNA and long non-coding RNA (lncRNA) core promoters and enhancer RNAs (eRNAs). However, each class has different expression profiles with lncRNAs and eRNAs being the most tissue-specific. How these complex differences in expression profiles and tissue-specificities a...
Imaging and chromatin capture techniques have provided important insights into nuclear organization. A limitation of these techniques is the inability to resolve allele-specific spatiotemporal properties of genomic loci in living cells. Here, we describe an allele-specific CRISPR live-cell DNA imaging technique (SNP-CLING) to provi...
To better understand the molecular and cellular differences in brain organization between human and nonhuman primates, we performed transcriptome sequencing of 16 regions of adult human, chimpanzee, and macaque brains. Integration with human single-cell transcriptomic data revealed global, regional, and cell-type–specific species expression differe...
Aboriginal Australians represent one of the oldest continuous cultures outside Africa, with evidence indicating that their ancestors arrived in the ancient landmass of Sahul (present-day New Guinea and Australia) ~55 thousand years ago. Genetic studies, though limited, have demonstrated both the uniqueness and antiquity of Aboriginal Australian gen...
While long intergenic noncoding RNAs (lincRNAs) and mRNAs share similar biogenesis pathways, these transcript classes differ in many regards. LincRNAs are less evolutionarily conserved, less abundant, and more tissue-specific, suggesting that their pre- and post-transcriptional regulation is different from that of mRNAs. Here, we perform an in-dept...
Aboriginal Australians are one of the more poorly studied populations from the standpoint of human evolution and genetic diversity. Thus, to investigate their genetic diversity, the possible date of their ancestors’ arrival and their relationships with neighboring populations, we analyzed mitochondrial DNA (mtDNA) diversity in a large sample of Abo...
Background
Ebola virus is the causative agent of a severe syndrome in humans with a fatality rate that can approach 90%. During infection, the host immune response is thought to become dysregulated, but the mechanisms through which this happens are not entirely understood. In this study, we analyze RNA sequencing data to determine the host respons...
There is growing evidence that transcription and nuclear organization are tightly linked. Yet, whether transcription of thousands of long noncoding RNAs (lncRNAs) could play a role in this packaging process remains elusive. Although some lncRNAs have been found to have clear roles in nuclear architecture (e.g., FIRRE, NEAT1, XIST, and others), the...
microRNAs are crucial post-transcriptional regulators of gene expression involved in a wide range of biological processes. Although microRNAs are highly conserved among species, the functional implications of existing lineage-specific changes and their role in determining differences between humans and other great apes have not been specifically ad...
Distribution of miRNAs in the PCA analysis.
(PDF)
Conservation among miRNA regions.
Top right in grey: p-values for the generalized linear model used to compare conservation among different miRNA regions. The model takes into account GC content: SNV density ~ GC content + miRNA region. Bottom left in white: p-values for the paired t-test comparing concatenated nucleotide diversity values of each m...
Top significant pathways and networks associated with genes exclusively deregulated by the human or the non-human miRNAs.
Data based on the Ingenuity Pathway Analysis software.
(PDF)
Primers used for miRNA cloning and qPCR expression analysis.
(PDF)
Main characteristics of the 235 miRNAs carrying human-specific nucleotide substitutions.
(XLS)
Target genes predicted by PITA for each miRNA.
(XLS)
Number of individuals in each great ape population used in this study.
Information based on Prado-Martinez et al. (2013).
(PDF)
Minimum free energy values (MFE).
MFE in kcal/mol of the secondary structures predicted for the four studied miRNAs according to RNAfold.
(PDF)
Candidate target genes exclusively deregulated by one of the two miRNA variants and predicted by PITA with high confidence for that variant.
(XLS)
Aging is one of the most important biological processes and is a known risk factor for many age-related diseases in humans. Studying age-related transcriptomic changes in tissues across the whole body can provide valuable information for a holistic understanding of this fundamental process. In this work, we catalogue age-related gene expression chan...
Transcriptional regulation and posttranscriptional processing underlie many cellular and organismal phenotypes. We used RNA sequence data generated by the Genotype-Tissue Expression (GTEx) project to investigate the patterns of transcriptome variation across individuals and tissues. Tissues exhibit characteristic transcriptional signatures that show st...
Understanding the functional consequences of genetic variation, and how it affects complex human disease and quantitative traits, remains a critical challenge for biomedicine. We present an analysis of RNA sequencing data from 1641 samples across 43 tissues from 175 individuals, generated as part of the pilot phase of the Genotype-Tissue Expression...
Most great ape genetic variation remains uncharacterized; however, its study is critical for understanding population history, recombination, selection and susceptibility to disease. Here we sequence to high coverage a total of 79 wild- and captive-born individuals representing all six great ape species and seven subspecies and report 88.8 million...
The only known albino gorilla, named Snowflake, was a male wild born individual from Equatorial Guinea who lived at the Barcelona Zoo for almost 40 years. He was diagnosed with non-syndromic oculocutaneous albinism, i.e. white hair, light eyes, pink skin, photophobia and reduced visual acuity. Despite previous efforts to explain the genetic cause,...
Haplogroup H dominates present-day Western European mitochondrial DNA variability (>40%), yet was less common (~19%) among Early Neolithic farmers (~5450 BC) and virtually absent in Mesolithic hunter-gatherers. Here we investigate this major component of the maternal population history of modern Europeans and sequence 39 complete haplogroup H mitoc...
We report the genome sequence of melon, an important horticultural crop worldwide. We assembled 375 Mb of the double-haploid line DHL92, representing 83.3% of the estimated melon genome. We predicted 27,427 protein-coding genes, which we analyzed by reconstructing 22,218 phylogenetic trees, allowing mapping of the orthology and paralogy relationshi...
For decades, the peopling of the Americas has been explored through the analysis of uniparentally inherited genetic systems in Native American populations and the comparison of these genetic data with current linguistic groupings. In northern North America, two language families predominate: Eskimo-Aleut and Na-Dene. Although the genetic evidence f...
We have analyzed human genetic diversity in 33 Old World populations including 23 populations obtained through Genographic Project studies. A set of 1,536 SNPs in five X chromosome regions were genotyped in 1,288 individuals (mostly males). We use a novel analysis employing subARG network construction with recombining chromosomal segments. Here, a...
Given a set of extant haplotypes, IRiS first detects high-confidence recombination events in their shared genealogy. Next, using the local sequence topology defined by each detected event, it integrates these recombinations into an ancestral recombination graph. While the current system has been calibrated for human population data, it is easily exte...
Y-chromosome Haplogroup O is the dominant lineage of East Asians, comprising more than a quarter of all males in the world; however, its internal phylogeny remains insufficiently investigated. In this study, we determined the phylogenetic position of recently defined markers (L127, KL1, KL2, P164, and PK4) in the background of Haplogroup O. In the...
The information left by recombination in our genomes can be used to make inferences on our recent evolutionary history. Specifically, the number of past recombination events in a population sample is a function of its effective population size (Ne). We have applied a method, Identifying Recombination in Sequences (IRiS), to detect specific past rec...
Recombination rate estimates (4Ner/kb) corrected for effective population size for successive SNP-pairs for chromosome 22 and in each of 28 populations, grouped into geographical regions.
(PDF)
Mean of the recombination estimate (4Ner/kb) for all populations calculated for 10 categories of SNPs based on their minor allele frequency. MAFs are calculated using the global allele frequency of all populations together.
(TIF)
Mantel's r values between the genetic distance and recombination dissimilarity matrices. The first row shows the chromosome for which the genetic distance was calculated; the first column shows the chromosome for which the recombination matrix was calculated. All values were highly significant (p<0.00001).
(DOC)
Heatmap showing patterns of hotspots observed for 300 SNPs of chromosome 22 for the 28 populations, grouped according to their geographical region. The first 300 SNPs of chromosome 22, for which a hotspot is present in at least one population, are reported on the x axis. In color, for each population the value of the recombination estimate (4Ner/kb...
Recombination varies greatly among species, as illustrated by the poor conservation of the recombination landscape between humans and chimpanzees. Thus, shorter evolutionary time frames are needed to understand the evolution of recombination. Here, we analyze its recent evolution in humans. We calculated the recombination rates between adjacent pai...
Mean values taken from the analysis of 100 simulations with different IRiS settings: grain sizes (5, 10, 15, 20 and 30), different thresholds, defined as number of detections to be considered as true divided by the grain size or the double of the grain size in the cases in which the algorithm is run in two directions. For each setting the algorithm...
Mean values taken from the analysis of 100 simulations with different IRiS settings that combine different grain sizes (indicated with different colors), different thresholds (defined as number of detections to be considered as true divided by the sum of the different grain size and multiplied by two since the algorithm is run in the two directions...
Plot showing the relationship between the false discovery rate and the number of COSI simulations under a scenario in which IRiS is given a different dataset than the one used to compare it with the COSI results.
(0.02 MB DOC)
Each dot represents mean values of false discovery rate and median age of the detected recombinations taken from the analysis of 100 simulations with different IRiS settings that combine different grain sizes (indicated with different colors) and different thresholds. All settings included running the algorithm in both possible directions.
(0.29 MB...
Plot showing values of the number of times in silico recombination events were detected by IRiS run with no threshold depending on the breakpoint location along the sequence. Different colors indicate different ways to produce the recombinant sequence, from light gray to black: “random” indicates that parental haplotypes were taken at random, “1dif...
Percentage of times each simulated event is not detected, detected as 1 recombination, or detected as 2 recombinations. The percentage values are calculated over 1,000 in silico simulations.
(0.03 MB DOC)
Number of recombinations detected in each of the 18 regions in the male dataset, female dataset and female dataset when removing putative phasing errors. Females were phased using both PHASE and fastPHASE without using male phase information.
(0.05 MB DOC)
MDS 2D plot based on a recombinational distance matrix. The stress is 0.081, which is below the 0.16 stress obtained with 1% probability with random data sets (citation: Sturrock K, Rocha J (2000) A Multidimensional Scaling Stress Evaluation Table. Field Methods 12: 49-60).
(0.03 MB DOC)
Evaluation of IRiS with the optimal parameters for different SNP ascertainments. SNP selection process is explained in the methods section. Mean SNP density values are calculated over all simulations.
(0.04 MB DOC)
The main characteristics of 18 X-chromosome regions. From left to right: start position and end position in base pairs (based on NCBI Build 36 assembly), length of each in base pairs, number of SNPs (N SNPs), number of haplotypes (N haplo), recombination rate calculated by means of LDhat, number of recombinations detected, number of recotypes, aver...
Recombination is one of the main forces shaping genome diversity, but the information it generates is often overlooked. A recombination event creates a junction between two parental sequences that may be transmitted to the subsequent generations. Just like mutations, these junctions carry evidence of the shared past of the sequences. We present the...
We address the problem of studying recombinational variations in (human) populations. In this paper, our focus is on one computational aspect of the general task: Given two networks G1 and G2, with both mutation and recombination events, defined on overlapping sets of extant units, the objective is to compute a consensus network G3 with minimum numb...