
Chun-Long Chen
- Doctor of Philosophy
- Group Leader at Institut Curie
About
- Publications: 85
- Reads: 19,894
- Citations: 5,950
Introduction
As a computational biologist, I have long-term experience in using high-throughput genomic approaches to address fundamental biological problems, with particular interest in DNA replication and gene transcription regulation, as well as their impact on the organization, evolution and stability of genomes. I am now focusing on developing new genome-wide approaches to study replication and transcription at the single-molecule and single-cell level, in order to study cell-to-cell heterogeneity.
Current institution
Additional affiliations
January 2016 - present
Institut Curie
Position
- Principal Investigator
October 2011 - present
Education
September 2014 - December 2017
Publications (85)
Neutral nucleotide substitutions occur at varying rates along genomes, and it remains a major issue to unravel the mechanisms that cause these variations and to analyze their evolutionary consequences. Here, we study the role of replication in the neutral substitution pattern. We obtained a high-resolution replication timing profile of the whole hu...
Despite intense investigation, human replication origins and termini remain elusive. Existing data have shown strong discrepancies. Here we sequenced highly purified Okazaki fragments from two cell types and, for the first time, quantitated replication fork directionality and delineated initiation and termination zones genome-wide. Replication init...
The timing of DNA replication is largely regulated by the location and timing of replication origin firing. Therefore, much effort has been invested in identifying and analyzing human replication origins. However, the heterogeneous nature of eukaryotic replication kinetics and the low efficiency of individual origins in metazoans has made mapping t...
Common fragile sites (CFSs) are chromosome regions prone to breakage upon replication stress known to drive chromosome rearrangements during oncogenesis. Most CFSs nest in large expressed genes, suggesting that transcription could elicit their instability; however, the underlying mechanisms remain elusive. Genome-wide replication timing analyses he...
Non-coding (nc)RNAs are key players in numerous biological processes such as gene regulation, chromatin domain formation and genome stability. Large ncRNAs interact with histone modifiers and are involved in cancer development, X-chromosome inactivation and autosomal gene imprinting. However, despite recent evidence showing that pervasive transcrip...
Genomic heterogeneity has largely been overlooked in single-cell replication timing (scRT) studies. Here, we develop MnM, an efficient machine learning-based tool that allows disentangling scRT profiles from heterogenous samples. We use single-cell copy number data to accurately perform missing value imputation, identify cell replication states, an...
Replication stress, a major hallmark of cancers, sources from replication fork slowing or blocking. In response, activation of extra-origins, otherwise dormant, supports the replication rate. Whether the DNA replication checkpoint drives this compensation process remained unclear. Here, DNA combing analyses show that a linear relationship ties inte...
We introduce MnM, an efficient tool for characterising single-cell DNA replication states and revealing genomic subpopulations in heterogeneous samples, notably cancers. MnM uses single-cell copy-number data to accurately perform missing-value imputation, classify cell replication states and detect genomic heterogeneity, which allows to separate...
Maintaining chromatin integrity at the repetitive non-coding DNA sequences underlying centromeres is crucial to prevent replicative stress, DNA breaks and genomic instability. The concerted action of transcriptional repressors, chromatin remodelling complexes and epigenetic factors controls transcription and chromatin structure in these regions. Th...
Replication stress, a major hallmark of cancers, and ensuing genome instability source from impaired progression of replication forks. The first line of defense against fork slowing is compensation, a long-described process that elicits firing of otherwise dormant origins. It remains unclear whether compensation requires activation of the DNA repli...
Genome integrity requires replication to be completed before chromosome segregation. The DNA-replication checkpoint (DRC) contributes to this coordination by inhibiting CDK1, which delays mitotic onset. Under-replication of common fragile sites (CFSs), however, escapes surveillance, resulting in mitotic chromosome breaks. Here we asked whether loos...
Studying the dynamics of genome replication in mammalian cells has been historically challenging. To reveal the location of replication initiation and termination in the human genome, we developed Okazaki fragment sequencing (OK-seq), a quantitative approach based on the isolation and strand-specific sequencing of Okazaki fragments, the lagging str...
During each cell division, tens of thousands of DNA replication origins are co-ordinately activated to ensure the complete duplication of the human genome. However, replication fork progression can be challenged by many factors, including co-directional and head-on transcription-replication conflicts (TRC). Head-on TRCs are more dangerous for genom...
In eukaryotes, DNA replication initiation requires assembly and activation of the minichromosome maintenance (MCM) 2-7 double hexamer (DH) to melt origin DNA strands. However, the mechanism for this initial melting is unknown. Here, we report a 2.59-Å cryo-electron microscopy structure of the human MCM-DH (hMCM-DH), also known as the pre-replicatio...
Background
Despite having been extensively studied, it remains largely unclear why humans bear a particularly high risk of cancer. The antagonistic pleiotropy hypothesis predicts that primate-specific genes (PSGs) tend to promote tumorigenesis, while the molecular atavism hypothesis predicts that PSGs involved in tumors may represent recently deriv...
DNA replication occurs through an intricately regulated series of molecular events and is fundamental for genome stability¹,². At present, it is unknown how the locations of replication origins are determined in the human genome. Here we dissect the role of topologically associating domains (TADs)³⁻⁶, subTADs⁷ and loops⁸ in the positioning of repli...
Mammalian genomes are replicated in a cell type-specific order and in coordination with transcription and chromatin organization. Currently, single-cell replication studies require individual processing of sorted cells, yielding a limited number (<100) of cells. Here, we develop Kronos scRT, a software for single-cell Replication Timing (scRT) anal...
Eukaryotic genes are interrupted by introns that must be accurately spliced from mRNA precursors. With an average length of 25 nt, the >90,000 introns of Paramecium tetraurelia stand among the shortest introns reported in eukaryotes. The mechanisms specifying the correct recognition of these tiny introns remain poorly understood. Splicing can occur...
Mutational signatures defined by single base substitution (SBS) patterns in cancer have elucidated potential mutagenic processes that contribute to malignancy. Two prevalent mutational patterns in human cancers are attributed to the APOBEC3 cytidine deaminase enzymes. Among the seven human APOBEC3 proteins, APOBEC3A is a potent deaminase and propos...
Motivation: During each cell division, tens of thousands of DNA replication origins are coordinately activated to ensure the complete duplication of the entire human genome. However, the progression of replication forks can be challenged by numerous factors. One such factor is transcription-replication conflicts (TRC), which can either be co-direct...
Faithful genome duplication through DNA replication is pivotal for genome maintenance. A variety of stresses could challenge this fundamental process and therefore endanger the genome integrity and change the cell fate. These stresses include misincorporation of ribonucleotides, unusual DNA structures, common fragile sites, replication-transcriptio...
Mammalian genomes are replicated in a cell-type specific order and in coordination with transcription and chromatin organization. Although the field of replication is also entering the single-cell era, current studies require cell sorting, individual cell processing and have yielded a limited number (<100) of cells. Here, we have developed Kronos s...
Despite long being considered as “junk”, transposable elements (TEs) are now accepted as catalysts of evolution. One example is Mutator-like elements (MULEs, one type of terminal inverted repeat DNA TEs, or TIR TEs) capturing sequences as Pack-MULEs in plants. However, their origination mechanism remains perplexing, and whether TIR TEs mediate dup...
The heterogeneous nature of eukaryotic replication kinetics and the low efficiency of individual initiation sites make mapping the location and timing of replication initiation in human cells difficult. To address this challenge, we have developed optical replication mapping (ORM), a high-throughput single-molecule approach, and used it to map earl...
The replication strategy of metazoan genomes is still unclear, mainly because definitive maps of replication origins are missing. High-throughput methods are based on population average and thus may exclusively identify efficient initiation sites, whereas inefficient origins go undetected. Single-molecule analyses of specific loci can detect both c...
Fragile X syndrome (FXS) is a neurodevelopmental disorder caused by mutations in the FMR1 gene and deficiency of a functional FMRP protein. FMRP is known as a translation repressor whose nuclear function is not understood. We investigated the global impact on genome stability due to FMRP loss. Using Break-seq, we map spontaneous and replication str...
We are looking for a motivated bioinformatics engineer to join our ATIP-Avenir team Replication Program and Genome Instability at Institut Curie (Paris, France). The team focuses on using cutting-edge high-throughput genomic approaches and genome-wide data analyses to study the spatio-temporal replication program of the human genome and its impact...
R-loops have both positive and negative impacts on chromosome functions. To identify toxic R-loops, we mapped RNA:DNA hybrids, markers of replication fork stalling and DNA double-strand breaks along the human genome. This analysis indicates that transient replication fork pausing occurs at the transcription termination sites of highly expressed gen...
Genome integrity requires replication to be completed before chromosome segregation. This coordination essentially relies on replication-dependent activation of a dedicated checkpoint that inhibits CDK1, delaying mitotic onset. Under-replication of Common Fragile Sites (CFSs) however escapes surveillance, which triggers chromosome breakage. Using h...
Significance
Meiotic crossovers are essential for the production of gametes with balanced chromosome content. MutLγ (Mlh1–Mlh3) endonuclease, a mismatch repair heterodimer, also functions during meiosis to generate crossovers. Its activity requires Exo1 as well as the MutSγ heterodimer (Msh4–Msh5). Crossovers also require the polo kinase Cdc5 in a...
Background:
APOBEC-driven mutagenesis and functional positive selection of mutated genes may synergistically drive the higher frequency of some hotspot driver mutations compared to other mutations within the same gene, as we reported for FGFR3 S249C. Only a few APOBEC-associated driver hotspot mutations have been identified in bladder cancer (BCa)...
Highlights
- Fragile X cells have increased DNA damage with or without DNA replication stress
- Break-seq identifies genome-wide DNA double-strand breaks in fragile X cells
- Fragile X cells have elevated R-loop formation under DNA replication stress
- FMRP, but not a disease-causing FMRP-I304N mutant, reduces R-loop-induced DSBs...
DNA replication is a vital process in all living organisms. At each cell division, > 30,000 replication origins are activated in a coordinated manner to ensure the duplication of > 6 billion base pairs of the human genome. During differentiation and development, this program must adapt to changes in chromatin organization and gene transcription: it...
DNA replication is regulated by the location and timing of replication initiation. Therefore, much effort has been invested in identifying and analyzing the sites of human replication initiation. However, the heterogeneous nature of eukaryotic replication kinetics and the low efficiency of individual initiation site utilization in metazoans has mad...
R-loops have both positive and negative impacts on chromosome functions. To identify toxic R-loops in the human genome, here, we map RNA:DNA hybrids, replication stress markers and DNA double-strand breaks (DSBs) in cells depleted for Topoisomerase I (Top1), an enzyme that relaxes DNA supercoiling and prevents R-loop formation. RNA:DNA hybrids are...
How parental histones, the carriers of epigenetic modifications, are deposited onto replicating DNA remains poorly understood. Here, we describe the eSPAN method (enrichment and sequencing of protein-associated nascent DNA) in mouse embryonic stem (ES) cells and use it to detect histone deposition onto replicating DNA strands with a relatively smal...
DNA replication timing is regulated by the timing of initiation across the genome. However, there is no consensus as to how initiation timing is regulated. Deterministic models contend that different initiation sites are programed to initiate at different, well‐defined times. Stochastic models posit that different initiation sites have different in...
Common Fragile Sites (CFSs) are chromosome regions prone to breakage under replication stress, known to drive chromosome rearrangements during oncogenesis. Most CFSs nest in large expressed genes, suggesting that transcription elicits their instability but the underlying mechanisms remained elusive. Analyses of genome-wide replication timing of hum...
Fragile X syndrome (FXS) is the most prevalent inherited intellectual disability caused by mutations in the Fragile X Mental Retardation gene (FMR1) and deficiency of its product, FMRP. FMRP is a predominantly cytoplasmic protein thought to bind specific mRNA targets and regulate protein translation. Its potential role in the nucleus is not well un...
FGFR3 is one of the most frequently mutated genes in bladder cancer and a driver of an oncogenic dependency. Here we report that only the most common recurrent FGFR3 mutation, S249C (TCC→TGC), represents an APOBEC-type motif and is probably caused by the APOBEC-mediated mutagenic process, accounting for its over-representation. We observed signific...
Supplementary Figures 1-22, Supplementary Tables 1-2 and Supplementary References
The three-dimensional (3D) architecture of the mammalian nucleus is now being unraveled thanks to the recent development of chromatin conformation capture (3C) technologies. Here we report the results of a combined multiscale analysis of genome-wide mean replication timing and chromatin conformation data that reveal some intimate relationships betw...
We review the existence of a new type of megabase-sized replication domains along the human genome. These domains are revealed in 7 somatic cell types by U-shaped patterns in the replication timing profiles. In the germline, these domains appear as N-shaped patterns in the DNA compositional asymmetry profiles resulting from replication-associated m...
In higher eukaryotes, the absence of specific sequence motifs, marking the origins of replication has been a serious hindrance to the understanding of (i) the mechanisms that regulate the spatio-temporal replication program, and (ii) the links between origins activation, chromatin structure and transcription. In this chapter, we review the partitio...
In this protocol, we describe the use of the LastWave open-source signal-processing command language (http://perso.ens-lyon.fr/benjamin.audit/LastWave/) for analyzing cellular DNA replication timing profiles. LastWave makes use of a multiscale, wavelet-based signal-processing algorithm that is based on a rigorous theoretical analysis linking timing...
In paper I, we addressed the impact of the spatio-temporal program on the DNA composition evolution in the case of time homogeneous and neighbor-independent substitution rates. But substitution rates do depend on the flanking nucleotides as exemplified in vertebrates where CpG sites are hypermutable so that the substitution rate \(C \rightarrow T\)...
Same as in Supplementary Fig. S1 but for the fibroblast BJ cell line (Replicate experiment 1: 1150 replication timing U-domains).
(PDF)
Same as in Supplementary Fig. S1 but for the HeLa cell line (Replicate experiment 1: 1422 replication timing U-domains).
(PDF)
Pearson correlation (R values) of the derivative of MRT, dMRT/dx, between different pairs of human cell lines (Methods). dMRT/dx was calculated in non-overlapping 100 kb windows over the 22 human autosomes. All p-values are .
(PDF)
Number of matchings between replication timing U-domains in different pairs of cell lines including skew N-domains in the germline. A U-domain in a given cell line (column) was considered as matching a U-domain in another cell line (row) if more than 80% nucleotides of each of these U-domains were common to the two domains.
(PDF)
Number of matchings between randomly re-positioned replication timing U-domains in different pairs of cell lines including skew N-domains in the germline (1000 simulations were used to obtain the mean values). A U-domain in a given cell line (column) was considered as matching a U-domain in another cell line (row) if more than 80% nucleotides of ea...
Same as in Supplementary Fig. S1 but for the HeLa cell line (Replicate experiment 2: 1498 replication timing U-domains).
(PDF)
Same as in Supplementary Fig. S1 but for the fibroblast BJ cell line (Replicate experiment 2: 1247 replication timing U-domains).
(PDF)
Supplementary methods: (i) Substitution rate matrix associated to replication (ii) Determination of mean replication timing profiles from experimental data and (iii) Detection of U-domains along mean replication timing profiles.
(PDF)
Percentage of matchings between randomly re-positioned replication timing U-domains in different pairs of cell lines including skew N-domains in the germline (1000 simulations were used to obtain the mean values). A U-domain in a given cell line (column) was considered as matching a U-domain in another cell line (row) if more than 80% nucleotides o...
In higher eukaryotes, replication program specification in different cell types remains to be fully understood. We show for seven human cell lines that about half of the genome is divided in domains that display a characteristic U-shaped replication timing profile with early initiation zones at borders and late replication at centers. Significant o...
Replication timing profiles segmented in CTRs/TTRs and multiscale analysis of apparent replication speeds. (A) Profile of replication timing (TR50 in hours) along the genome. Small TR50 values correspond to early replicating regions; large TR50 values correspond to late replicating regions. The replication timing profile was segmented into regions...
Complete set of all molecules of the IGH TTR analyzed by DNA combing. The top diagram shows a map of the IGH region, the position of the fosmid probes (red lines) and chromosome coordinates. The bottom panel shows the complete set of molecules schematized in Figure 8D.
(PDF)
Mean coverage (relative to the genome average) by DNase I hypersensitive zones, as a function of the distance to the closest U-domain border in H0287 (blue solid line : DNase GM06990, genome-wide mean value = 0.0107), in TL010 (blue dashed line : DNase GM06990, genome-wide mean value = 0.0107), in BJ R1 (light blue solid line : DNase BJtert, genome...
Models for replication fork progression in Constant Timing Regions (CTRs) and Timing Transition Region (TTRs). (A) A CTR is passively replicated from left to right in one half of the cells and from right to left in the other half. The average replication time is in mid-S phase for all sequences. (B) A CTR is replicated from multiple, synchronous in...
Comparison of replication timing data with replication bubble data in ENCODE regions. Each page shows: (top) the extent of each ENCODE region (dark line), the segmentation into CTRs (blue) and TTRs (red), the mapping of replication bubbles in log-phase HeLa library Rep3 (orange) and Rep4 (purple) and when the two libraries were combined (pale blue)...
Genome-wide replication timing studies have suggested that mammalian chromosomes consist of megabase-scale domains of coordinated origin firing separated by large originless transition regions. Here, we report a quantitative genome-wide analysis of DNA replication kinetics in several human cell types that contradicts this view. DNA combing in HeLa...
The heterochromatin-like structure formed by the yeast silent information regulator complex (SIR) represses transcription at the silent mating type loci and telomeres. Here, we report that tight protein-DNA complexes induce ectopic recruitment of the SIR complex, promoting gene silencing and changes in subnuclear localization when cis-acting elemen...
During evolution, mutations occur at rates that can differ between the two DNA strands. In the human genome, nucleotide substitutions occur at different rates on the transcribed and non-transcribed strands that may result from transcription-coupled repair. These mutational asymmetries generate transcription-associated compositional skews. To date,...
Potential base-pairing between box H/ACA snoRNAs and rRNAs. The data showed the functional analysis of the N. crassa box H/ACA snoRNAs.
SnoRNAs represent an excellent model for studying the structural and functional evolution of small non-coding RNAs involved in the post-transcriptional modification machinery for rRNAs and snRNAs in eukaryotic cells. Identification of snoRNAs from Neurospora crassa, an important model organism playing key roles in the development of modern genetics...
The compact genome of the unicellular eukaryote Paramecium tetraurelia contains noncoding DNA (ncDNA) distributed into >39,000 intergenic sequences and >90,000 introns of 390 base pairs (bp) and 25 bp on average, respectively. Here we analyzed the molecular features of the ncRNA genes, introns, and intergenic sequences of this genome. We mainly use...
Chlamydomonas reinhardtii is a unicellular green alga, the lineage of which diverged from that of land plants >1 billion years ago. Using the powerful small nucleolar RNA (snoRNA) mining platform to screen the C. reinhardtii genome, we identified 322 snoRNA genes grouped into 118 families. The 74 box C/D families can potentially guide methylation a...
Chlamydomonas reinhardtii is a unicellular green alga whose lineage diverged from land plants over 1 billion years ago. It is a model system for studying chloroplast-based photosynthesis, as well as the structure, assembly, and function of eukaryotic flagella (cilia), which were inherited from the common ancestor of plants and animals, but lost in...
2'-O-ribose methylation of eukaryotic ribosomal RNAs is guided by RNA duplexes consisting of rRNA and box C/D small nucleolar (sno)RNA sequences, the methylated sites invariably mapping five positions apart from the D box. Here we have analyzed the RNA duplex pairing constraints by investigating the features of 415 duplexes from the fungus, plant a...
Small nucleolar RNAs (snoRNAs) are an abundant group of noncoding RNAs mainly involved in the post-transcriptional modifications of rRNAs in eukaryotes. In this study, a large-scale genome-wide analysis of the two major families of snoRNA genes in the fruit fly Drosophila melanogaster has been performed using experimental and computational RNomics...
Using a powerful computer‐assisted analysis strategy, a large‐scale search of small nucleolar RNA (snoRNA) genes in the recently released draft sequence of the rice genome was carried out. This analysis identified 120 different box C/D snoRNA genes with a total of 346 gene variants, which were predicted to guide 135 2′‐O‐ribose methylation sites in...
Based on the analysis of structural features and conserved elements, 27 novel snoRNA genes have been identified from rice. All of them belong to the C/D box-containing snoRNA family except for one that belongs to the H/ACA box type. The newly found genes fall into six clusters that comprise at least three snoRNA genes, and in one case as many as ni...