Mapping of scaffold/matrix attachment regions in human genome: a data mining exercise

Nucleic Acids Research, 2019 1–15
doi: 10.1093/nar/gkz562
Nitin Narwade1, Sonal Patel2, Aftab Alam2, Samit Chattopadhyay2,*, Smriti P.K. Mittal3,* and Abhijeet Kulkarni1,*
1Bioinformatics Centre, Savitribai Phule Pune University, Pune, Maharashtra 411007, India
2Chromatin and Disease Biology Lab, National Centre for Cell Science, Pune, Maharashtra 411007, India
3Department of Biotechnology, Savitribai Phule Pune University, Pune, Maharashtra 411007, India
Received April 25, 2019; Revised June 8, 2019; Editorial Decision June 13, 2019
ABSTRACT
Scaffold/matrix attachment regions (S/MARs) are DNA elements that compartmentalize chromatin into structural and functional domains. These elements are involved in the control of gene expression, which governs the phenotype, and also play a role in disease biology. A genome-wide understanding of these elements therefore holds great therapeutic promise. Several attempts have been made to identify S/MARs in the genomes of various organisms, including human. However, a comprehensive genome-wide map of human S/MARs is not yet available. Toward this objective, ChIP-Seq data of 14 S/MAR binding proteins were analyzed and the binding site coordinates of these proteins were used to prepare a non-redundant S/MAR dataset of the human genome. Along with the coordinates (locations) of S/MARs, the dataset also revealed S/MAR features, namely, length, inter-S/MAR length (the chromatin loop size), nucleotide repeats, motif abundance, chromosomal distribution and genomic context. The S/MARs identified in the present study and their subsequent analysis also suggest that these elements act as hotspots for integration of retroviruses. These data will therefore help toward a better understanding of genome functioning and the design of effective anti-viral therapeutics. To facilitate user-friendly browsing and retrieval of the data obtained in the present study, a web interface, MARome (http://bioinfo.net.in/MARome), has been developed.
INTRODUCTION
The eukaryotic cell is compartmentalized into several organelles and a well-defined nucleus that harbors the genetic material. Human DNA, with an approximate length of 3 m, is highly compacted to fit into the relatively small nucleus. This compaction, however, does not render the DNA inactive. Rather, DNA is accessed in a tightly controlled and dynamic manner to facilitate regulated gene expression. The nuclear matrix, a three-dimensional filamentous RNA–protein meshwork, forms the basis of structural support for orderly compaction of DNA (1). The chromatin is organized into loops by virtue of DNA sequences that tether the chromatin to the nuclear matrix (2). These anchor sequences are known as scaffold/matrix attachment regions (S/MARs). Various proteins, called S/MAR binding proteins (S/MARBPs), are known to interact with S/MARs to facilitate chromatin looping (2). Such looping of DNA has been shown to be crucial for many cellular processes such as DNA replication, transcription, chromatin to chromosome transition and DNA repair (3,4). Interestingly, the S/MARs that tether these loops to the nuclear matrix lack sequence conservation (5,6). However, features related to their secondary structure appear to be conserved and functionally relevant (5,7). S/MAR sequences are thus known to possess features such as origin of replication (OriC), AT richness, kinked and curved DNA, TG richness, MAR signature and Topoisomerase-II sites (7–9).
The human genome comprises about 3.2 billion base pairs organized into 23 pairs of chromosomes. It is estimated to contain 20 000 protein-coding genes. Each chromosome thus harbors several genes that are transcribed in a highly regulated manner under well-studied spatio-temporal control. Croft et al., in 1999, reported the importance of the nuclear matrix in regulating the expression of genes on chromosomes 18 and 19. The study indicated that genes on chromosome 19, which occupies an internal position in the nucleus and is closely associated with the nuclear matrix, are actively transcribed, whereas chromosome 18, which preferentially occupies a peripheral position in the nucleus, shows lower gene expression (10). Similarly, S/MARs have been shown to increase the expression and stability of transgenes in various organisms (5,11–13). Thus, the crucial role of S/MARs and the nuclear matrix in the organization and functioning of the genetic material is evident. Further, the interplay between S/MARs and the nuclear matrix has been well studied in various conditions, including diseases (14–17). These two important players that control genome topology and function therefore appear to be attractive targets for therapeutic interventions. However, even after significant efforts toward a better understanding of chromatin biology, a comprehensive genome-wide map of S/MARs is not yet available for the human genome.
*To whom correspondence should be addressed. Tel: +91 020 2569 0195; Fax: +91 020 2569 0087; Email: abhijeet@bioinfo.net.in
Correspondence may also be addressed to Smriti Mittal. Tel: +91 020 2569 4952; Fax: +91 020 2569 1821; Email: spmittal@unipune.ac.in
Correspondence may also be addressed to Samit Chattopadhyay. Tel: +91 33 2413 1157; Fax: +91 33 2473 5197; Email: samit@iicb.res.in
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Present addresses:
Nitin Narwade, National Centre for Microbial Resources, National Centre for Cell Science, Pashan, Pune, Maharashtra 411021, India.
Samit Chattopadhyay, Indian Institute of Chemical Biology, 4, Raja S.C. Mullick Road, Kolkata, West Bengal 700032, India.
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Advancements in DNA sequencing technologies (next-generation sequencing, NGS) have made it possible to generate large amounts of sequence data in a high-throughput manner. Chromatin pull-down using antibodies specific to chromatin binding proteins, followed by sequencing of the enriched DNA fragments (ChIP-Seq), is one such NGS application. ChIP-Seq experiments for various S/MARBPs have been performed independently by various laboratories and the data are available in public repositories (18–21). In the present study, we reanalyzed ChIP-Seq data of 14 different human S/MARBPs to understand their genome-wide binding patterns. This information was then used to make a comprehensive S/MAR dataset that is genome-wide and non-redundant across the selected proteins.
The dataset thus provides genomic coordinates of human S/MARs. It also reveals S/MAR details such as length, chromatin loop size, nucleotide repeats, abundant motifs, chromosomal distribution and genomic context. Further analysis of this dataset also indicates that the identified S/MARs indeed act as hotspots for integration of retroviruses. The data presented here therefore give better insight into chromatin organization mediated by S/MARs and its implications in diseases.
MATERIALS AND METHODS
Dataset preparation
The ChIP-Seq data for 14 selected S/MARBPs, namely, BRCA1, BRIGHT, SMAR1, CEBPB, CUX1/CDP, CTCF, Fast1/FOXH1, HoxC11, Ku autoantigen, NMP4, Mut-p53, SAF-A/hnRNPU, SATB1 and YY1, were retrieved from the ENCODE and NCBI-SRA databases in FASTQ format, along with their appropriate IgG control/input/mock samples (18–23). If available, sequence data for experimental replicates were also retrieved. Only data generated on a single sequencing platform (i.e. Illumina Genome Analyzer) with a single-end read layout for control human samples were considered for the study. These sequence files were then analyzed using the standard ChIP-Seq data analysis pipeline described below.
Raw data quality control
The raw data quality of each sample was assessed using FastQC v0.11.5 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Reads were then trimmed using NGSQC toolkit v2.3.3 (http://www.nipgr.res.in/ngsqctoolkit.html) (24) to retain good-quality, adapter-free reads with an average Phred score of at least 20.
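As an illustration of the read-level filter described above, the following Python sketch keeps reads whose mean Phred score reaches 20. It is not the NGSQC toolkit implementation; the Phred+33 encoding, the function names and the toy reads are assumptions made for the example.

```python
def mean_phred(quality, offset=33):
    """Average Phred score of one FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(records, min_mean_q=20.0):
    """Keep (header, sequence, quality) records with mean Phred >= min_mean_q."""
    return [rec for rec in records if rec[2] and mean_phred(rec[2]) >= min_mean_q]


# Toy reads, invented for illustration only
reads = [
    ("@read1", "ACGT", "IIII"),  # 'I' encodes Phred 40 under Phred+33
    ("@read2", "ACGT", "####"),  # '#' encodes Phred 2
    ("@read3", "ACGT", "5555"),  # '5' encodes Phred 20
]
kept = filter_reads(reads)
print([header for header, _, _ in kept])
```

A per-read mean is only one possible criterion; real trimming tools typically also remove adapters and low-quality read ends, which this sketch does not attempt.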
Raw read alignment
The high-quality reads from individual control and pull-down samples were aligned independently to the human genome GRCh38/hg38 assembly using the bowtie aligner v1.0.0 (25) with default parameters. A pre-built bowtie genome index available at http://bowtie-bio.sourceforge.net/tutorial.shtml#preb was used for these alignments. The SAM files generated after alignment were converted into the binary alignment format (BAM) using the view utility of SAMtools v1.3.1 (26). Polymerase chain reaction (PCR) duplicates were removed from the resulting alignment files using the rmdup utility of SAMtools with default parameters.
Peak calling
Peak calling was carried out on the BAM files of the 14 S/MARBPs (control and pull-down) using MACS v1.4.2 with default parameters. The resulting BED files were concatenated into a single file for each S/MAR binding protein and then sorted with the sortBed utility. The sorted BED files were merged using mergeBed, independently for each S/MARBP, to obtain unique peaks across the replicates (if available). This resulted in 14 different BED files. These were further combined using the Bedtools multiIntersect utility, generating a single BED file with intersect peak coordinates across all S/MARBPs. Finally, the bedtools merge utility was used with default parameters to merge the overlapping peaks in this file. The genomic DNA sequences corresponding to these coordinates were fetched from the UCSC-DAS server (http://genome.ucsc.edu/cgi-bin/das/hg38/dna?segment=chr:start.end) and saved as a multi-FASTA file. These sequences and BED coordinates were used for subsequent analysis.
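The final merging step can be pictured with a short Python sketch. It mimics what an interval merge does on BED-like records (overlapping or directly adjacent intervals on the same chromosome are fused); it is not the bedtools code, and the function name and toy peaks are illustrative assumptions.

```python
from collections import defaultdict


def merge_intervals(peaks):
    """Merge overlapping or book-ended (chrom, start, end) intervals per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_start is None:
                current_start, current_end = start, end
            elif start <= current_end:
                # Overlaps (or touches) the running interval: extend it
                current_end = max(current_end, end)
            else:
                merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        merged.append((chrom, current_start, current_end))
    return merged


# Toy peaks pooled from several hypothetical proteins
peaks = [
    ("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 300, 350),
    ("chr1", 400, 500), ("chr2", 10, 20),
]
print(merge_intervals(peaks))
```

Sorting each chromosome once and sweeping left to right keeps the merge linear after the sort, which matters for a pooled set of several hundred thousand peaks.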
Motif and nucleotide repeat analysis
The extracted DNA sequences were analyzed for the presence of motifs using the Linux-compatible, standalone MEME-ChIP v4.10.1 tool (27). The motif analysis was carried out using the default parameters of the MEME-ChIP program. The abundance of mono-, di-, tri-, tetra-, penta- and hexa-nucleotide repeats in these sequences was estimated using the standalone MISA v1.0 microsatellite finding PERL program.
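A minimal Python sketch of perfect microsatellite detection is given here for orientation. It is not the MISA program: the minimum repeat counts per unit length, the function names and the example sequence are assumptions chosen for illustration.

```python
import re

# Minimum number of copies per repeat-unit length. These thresholds are
# assumptions for this sketch, not settings reported in the text.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(unit):
    """True when the unit is not itself a repetition of a shorter unit."""
    return not any(
        len(unit) % size == 0 and unit == unit[:size] * (len(unit) // size)
        for size in range(1, len(unit))
    )


def find_repeats(sequence):
    """Return sorted (start, unit, copies) tuples for perfect 1-6 nt tandem repeats."""
    sequence = sequence.upper()
    hits = []
    for size, min_copies in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_copies - 1))
        for match in pattern.finditer(sequence):
            unit = match.group(1)
            if is_primitive(unit):  # skip e.g. "AA" runs already counted as "A"
                hits.append((match.start(), unit, len(match.group(0)) // size))
    return sorted(hits)


# Invented example: a 12-nt poly(A) run and seven CA repeats
example = "GC" + "A" * 12 + "GG" + "CA" * 7 + "T"
print(find_repeats(example))
```

Counting such hits across all extracted sequences gives per-class repeat abundances of the kind summarized later in the repeat analysis.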
Annotation of peak coordinates
The peak coordinates were annotated using the R package ChIPseeker v1.12.1 (28). The tool annotates ChIP-Seq peaks and reports the nearest downstream gene and the peak distribution in different genomic elements such as promoters, untranslated regions, introns, exons and intergenic regions. The pathways associated with the nearest downstream gene were retrieved using the KEGGREST R package and gene ontologies were retrieved from the UniProt/SwissProt database (https://www.uniprot.org/).
S/MAR-associated features
S/MARs are characterized by the presence of features such as OriC, AT richness, kinked and curved DNA, TG richness, MAR signature and Topoisomerase-II sites. The extracted DNA sequences were therefore checked for the presence of one or more of these features. The motifs that reveal the presence of these features have been defined earlier (8,9), so the presence of each feature in a sequence was determined by the presence of specific motifs. In brief, the presence of OriC was determined by detecting the ATTA, ATTTA or ATTTTA motif; AT richness by the presence of two WWWWWW motifs (where W is A or T) separated by 8–12 nt; kinked DNA by the presence of the TAN3TGN3CA, TAN3CAN3TG, TGN3TAN3CA, TGN3CAN3TA, CAN3TAN3TG or CAN3TGN3TA motif (where N is any nucleotide); curved DNA by the presence of AAAAN7AAAAN7AAAA, TTTTN7TTTTN7TTTT or TTTAAA (where N is any nucleotide); TG richness by the presence of the TGTTTTG, TGTTTTTTG or TTTTGGGG motif; MAR signature by the presence of a bipartite sequence containing AATAAYAA and AWWRTAANNWWGNNNC (where W is A or T, Y is a pyrimidine, R is a purine and N is any nucleotide); and Topoisomerase II binding site by the presence of the RNYNNCNNGYNGKTNYNY or GTNWAYATTNATNNR consensus.
These patterns were matched using a custom in-house PERL script. Counts of sequences that have a single feature or a combination of these features are represented as a Venn diagram prepared using custom in-house JavaScript.
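The motif definitions above translate directly into regular expressions. The Python sketch below is a hedged stand-in for the in-house PERL script, which is not shown in the text: the function names are hypothetical, only the given strand is scanned, and the two halves of the MAR signature are simply required to both occur, because no spacing between them is stated here.

```python
import re

# IUPAC codes used in the motif definitions
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]",
         "R": "[AG]", "Y": "[CT]", "K": "[GT]", "N": "[ACGT]"}


def to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus)


N3, N7 = "N" * 3, "N" * 7
CONSENSUS = {
    "OriC": ["ATTA", "ATTTA", "ATTTTA"],
    "Kinked DNA": [a + N3 + b + N3 + c for a, b, c in [
        ("TA", "TG", "CA"), ("TA", "CA", "TG"), ("TG", "TA", "CA"),
        ("TG", "CA", "TA"), ("CA", "TA", "TG"), ("CA", "TG", "TA")]],
    "Curved DNA": ["AAAA" + N7 + "AAAA" + N7 + "AAAA",
                   "TTTT" + N7 + "TTTT" + N7 + "TTTT", "TTTAAA"],
    "TG richness": ["TGTTTTG", "TGTTTTTTG", "TTTTGGGG"],
    "Topo II site": ["RNYNNCNNGYNGKTNYNY", "GTNWAYATTNATNNR"],
}
PATTERNS = {name: re.compile("|".join(to_regex(c) for c in motifs))
            for name, motifs in CONSENSUS.items()}
# AT richness: two W6 blocks separated by 8-12 nt
PATTERNS["AT richness"] = re.compile("[AT]{6}[ACGT]{8,12}[AT]{6}")
# MAR signature: both halves must be present (spacing not enforced, an assumption)
MAR_SIGNATURE = [re.compile(to_regex(part))
                 for part in ("AATAAYAA", "AWWRTAANNWWGNNNC")]


def smar_features(sequence):
    """Return the set of S/MAR-associated features found in one sequence."""
    sequence = sequence.upper()
    found = {name for name, pattern in PATTERNS.items() if pattern.search(sequence)}
    if all(part.search(sequence) for part in MAR_SIGNATURE):
        found.add("MAR signature")
    return found


# Invented test sequences
print(smar_features("CCGATTTAGCC"))
print(smar_features("AAAAAACCCCCCCCTTTTTT"))
```

Applying `smar_features` to every extracted sequence and tallying the resulting sets would give the per-feature and combined counts that a Venn diagram summarizes.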
Nuclear matrix isolation
Cells (5 × 10^6) were washed twice with phosphate-buffered saline and lysed in extraction buffer (10 mM HEPES-KOH pH 7.2, 24 mM KCl, 10 mM MgCl2, 1 mM PMSF, 2 mM DTT, 0.03% NP40 with protease inhibitors). The lysate was loaded on a 0.8 M sucrose bed and centrifuged at 6000 rpm for 20 min. The pellet containing nuclei was digested with DNase I for 30 min and then centrifuged at 6000 rpm for 10 min. The pellet was then washed sequentially with low salt buffer (10 mM HEPES-KOH, 0.2 mM MgCl2 and 10 mM β-mercaptoethanol), high salt buffer (1.6 M NaCl, 10 mM HEPES, 0.2 mM MgCl2, 10 mM β-mercaptoethanol) and again low salt buffer. EcoRI treatment was given for 2 h at 37°C, followed by centrifugation. The pellet was collected as nuclear matrix. DNA was purified using phenol-chloroform and precipitated using ethanol. The quality of the matrix was checked by agarose DNA electrophoresis and also by amplifying previously experimentally verified S/MARs (29,30). Two S/MARs from Girod et al. (29), namely, MAR 3–5 (P1) and MAR X-29 (P2), and three from Keaton et al. (30), namely, seq = 94 (P3) (chr18:23835886-23838503; Length = 2617), seq = 99 (P4) (chr18:24001839-24004790; Length = 2951) and seq = 1 (P5) (chr1:149425310-149430000; Length = 4690), were used as positive controls. The DNA was further used for amplifying S/MAR sequences using specific primers (Supplementary Table S3).
Mapping retroviral integration sites
The Retrovirus Integration Database (RID) archives retroviral integration sites (IS), particularly those of HIV and HTLV. This information is archived as the genomic locus of integration (i.e. chromosome and coordinate as per the hg19 genome build). RID archives 1 141 461 and 11 283 IS for HIV and HTLV, respectively. In the present study, the S/MAR peak coordinates were deduced from the hg38 assembly. Therefore, before mapping, all peak coordinates were converted to the hg19 assembly using the online version of the UCSC liftOver tool (https://genome.ucsc.edu/cgi-bin/hgLiftOver). HIV and HTLV IS were then mapped onto the converted peak coordinates, and the number of IS residing within peak coordinates was estimated. If an IS resided outside the peak coordinates, its distance from the nearest upstream and downstream S/MAR peaks was determined. Only those IS that are flanked on both sides by S/MAR peaks were considered for this analysis. All mapping and distance estimations were carried out using a custom in-house PERL script.
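The mapping logic can be sketched in Python as follows. This is an illustration under stated assumptions rather than the in-house PERL script: intervals are taken to be sorted, non-overlapping and half-open (BED-like) on a single chromosome, and the function names, window sizes and toy coordinates are hypothetical.

```python
from bisect import bisect_right


def distance_to_nearest(position, intervals):
    """Distance (bp) from a site to the closest S/MAR; 0 when the site lies inside one.

    Returns None when the site is not flanked by an S/MAR on both sides.
    """
    starts = [start for start, _ in intervals]
    idx = bisect_right(starts, position) - 1  # last interval starting at or before the site
    if idx < 0:
        return None  # no upstream S/MAR
    _, end = intervals[idx]
    if position < end:
        return 0  # site falls within the S/MAR
    if idx + 1 >= len(intervals):
        return None  # no downstream S/MAR
    upstream = position - (end - 1)
    downstream = intervals[idx + 1][0] - position
    return min(upstream, downstream)


def summarise(sites, intervals, windows=(5000, 15000)):
    """Count sites inside S/MARs and within each distance window."""
    distances = [d for d in (distance_to_nearest(p, intervals) for p in sites)
                 if d is not None]
    inside = sum(d == 0 for d in distances)
    within = {w: sum(d <= w for d in distances) for w in windows}
    return inside, within, len(distances)


# Toy data for one chromosome
smars = [(1000, 2000), (10000, 11000), (40000, 41000)]
sites = [1500, 3000, 20000, 500, 50000]
print(summarise(sites, smars))
```

In a genome-wide run the same lookup would be repeated per chromosome, with the list of interval starts built once rather than on every call.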
Development of web interface, MARome
The MARome web interface has been developed us-
ing Spring Framework - 1.2.1, Apache Maven, HTML5,
JavaScript5, CSS3, Bootstrap3, Java - 1.8, PostgreSQL -
9.3.19. For automation/parsing, custom PERL scripts have
been used wherever necessary. MARome is freely available
at http://bioinfo.net.in/MARome.
RESULTS
Identification of S/MAR coordinates in the human genome:
the dataset preparation
S/MARBPs are known to bind S/MAR regions. A non-redundant set of binding patterns of several S/MARBPs can thus be used to trace S/MARs in a genome-wide manner. Therefore, ChIP-Seq data of 14 different S/MARBPs, namely, BRCA1 (31), BRIGHT (32), SMAR1 (33), CEBPB (34,35), CUX1/CDP (36), CTCF (35,37,38), Fast1/FOXH1 (35), HoxC11 (35), Ku autoantigen (39), NMP4 (35,40), Mut-p53 (41), SAF-A/hnRNPU (35,42), SATB1 (35,36) and YY1 (40), were retrieved from public repositories. The accession numbers and other relevant information about the data used in the present study are provided in Supplementary Table S1. After quality assessment and filtering of raw data, the high-quality reads were aligned to the human genome hg38 assembly. The detailed alignment statistics are provided in Supplementary Table S2.
Peak calling using MACS14 resulted in a total of 452 881 peaks across all S/MARBPs, which also includes peaks from their experimental replicates. Overlapping coordinates were then merged, resulting in a total of 298 443 peak coordinates. These sequences were analyzed for S/MAR features, and 283 568 sequences showed the presence of at least one S/MAR feature. These peak coordinates are thus a non-redundant, averaged representation of the binding sites of one or more of the selected 14 S/MARBPs.
Validation of dataset
In order to verify whether the identified peak coordinates are indeed genomic locations of DNA sequences that resemble S/MARs, the nucleotide sequences corresponding to these coordinates were fetched from the UCSC-DAS server. The nucleotide sequences were then analyzed for the presence of S/MAR-associated features such as OriC, AT richness, kinked and curved DNA, TG richness, MAR signature and Topoisomerase-II sites. The analysis revealed that, out of 298 443 curated sequences, 283 568 sequences show the presence of at least one of these features, indicating the S/MAR-like nature of these sequences. There were 14 857 sequences that lacked these features. OriC (272 016, 91%), AT richness (196 611, 66%) and kinked DNA (178 960, 60%) were the most abundant features. The least represented feature was the presence of Topoisomerase-II sites (9973, 3.3%). A total of 52 567 S/MARs showed combinations of six features and only 190 S/MARs showed all seven features (Figure 1A and B).
S/MARs and inferred topological details
In the present study, a total of 283 568 S/MARs were identified in the human genome. The length of these S/MARs ranges from 33 to 61 755 bp, with a median length of 596 bp. The aggregate length of all these sequences comes to 230 177.6 kb, which accounts for 7.4% of the human genome. Of these sequences, 269 046 (94.87%) have a length of ≤2 kb (Figure 2A).
The chromatin is tethered to the nuclear matrix by virtue of S/MARs, thereby generating inter-S/MAR chromatin loops. We therefore searched for segments of the genome that are flanked on both sides by identified S/MAR coordinates/sequences. We identified a total of 283 453 inter-S/MAR regions or loops. Analysis of these loops revealed that their size ranges from 1 bp to 30 025.7 kb, with a median length of 4923 bp. Further, 267 096 chromatin loops, i.e. 94.23% of the total identified loops, have lengths less than or equal to 31 kb (Figure 2B).
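For illustration, inter-S/MAR loop sizes can be derived from sorted S/MAR coordinates as the gaps between consecutive intervals on a chromosome. The Python sketch below assumes half-open, non-overlapping (start, end) pairs; the function names and the toy coordinates are hypothetical and are not taken from the dataset.

```python
from statistics import median


def smar_lengths(smars):
    """Length (bp) of each S/MAR given half-open (start, end) coordinates."""
    return [end - start for start, end in smars]


def loop_sizes(smars):
    """Gap (bp) between each pair of consecutive S/MARs on one chromosome."""
    ordered = sorted(smars)
    return [next_start - prev_end
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


# Toy S/MARs on a single chromosome
smars = [(1000, 1600), (100, 200), (500, 900)]
print(smar_lengths(smars), median(smar_lengths(smars)))
print(loop_sizes(smars), median(loop_sizes(smars)))
```

Each chromosome with n S/MARs contributes n - 1 flanked loops under this definition, since regions before the first and after the last S/MAR are not bounded on both sides.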
Chromosome-wise distribution of S/MARs
In order to determine whether S/MARs follow a random distribution or preferentially localize to specific chromosomes, the S/MAR coordinates obtained in the present study were visualized over chromosomes as a circular ideogram plot (Figure 3A). The S/MAR density per chromosome was also calculated. It was observed to be 95.74 S/MARs per Mb of genome for autosomes. Allosomes, however, showed a distinctly lower S/MAR density than autosomes: the Y and X chromosomes showed 10.8- and 1.7-fold lower densities of S/MARs than autosomes, respectively. On average, approximately 10 S/MARs per gene were detected. The S/MAR count per chromosome is represented in Figure 3B. Further, a positive correlation was observed between S/MAR density and gene density (Figure 3C). The details of gene number/density and S/MAR number/density for each human chromosome are presented in Table 1.
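The density figures can be reproduced with simple arithmetic. The Python sketch below uses five rows copied from Table 1 and the autosomal density of 95.74 S/MARs per Mb quoted above; the function names are hypothetical, and the correlation over this small subset is only illustrative, not the value underlying Figure 3C.

```python
from math import sqrt

# chromosome: (size in Mb, S/MAR count, number of genes), copied from Table 1
TABLE = {
    "chr1": (248.9564, 25689, 2785),
    "chr18": (80.37329, 6487, 425),
    "chr19": (58.61762, 7813, 1774),
    "chrX": (156.0409, 8773, 1151),
    "chrY": (57.22742, 509, 141),
}
AUTOSOME_DENSITY = 95.74  # S/MARs per Mb for autosomes, as stated in the text


def densities(table):
    """Return {chromosome: (S/MAR density per Mb, gene density per Mb)}."""
    return {chrom: (count / size, genes / size)
            for chrom, (size, count, genes) in table.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


dens = densities(TABLE)
fold_x = AUTOSOME_DENSITY / dens["chrX"][0]
fold_y = AUTOSOME_DENSITY / dens["chrY"][0]
r = pearson([d[0] for d in dens.values()], [d[1] for d in dens.values()])
print(round(fold_x, 1), round(fold_y, 1), r > 0)
```

Running the same calculation over all 24 rows of Table 1 would be the natural way to check the reported positive correlation between S/MAR density and gene density.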
Distribution of S/MARs in genomic elements
We determined the distribution of S/MARs in various genomic elements. Approximately 96.3% of S/MARs were found to be located in the non-coding region of the genome. Of these, 21% were located in promoter regions. The presence of an S/MAR in a promoter region is associated with transcriptional regulation of the downstream gene. Notably, the miR-222, miR-34a, miR-371a, Bax, Cyclin D1, NF-κB, CD40, FN1 and PDGFRB genes showed the presence of an S/MAR within the 1 kb region upstream of their transcription start sites (TSS). The presence of S/MARs in the promoters of these genes has already been demonstrated experimentally (21,43–49). Further, 35.57% of the total S/MARs were found to be located in the intergenic region (Figure 4A). It was also observed that 15 614 of the total identified S/MARs were present within −100 to +100 bp of the TSS of 14 425 genes (Figure 4B). This accounts for 26.78% of total human genes (the total number of genes is 58 288 as per GENCODE hg38 statistics, https://www.gencodegenes.org/stats/current.html). The presence of S/MARs around the TSS of such a high number of genes highlights the importance of these elements for transcriptional regulation of genes.
Functional categorization of S/MAR-associated genes
It was observed that 20 905 of the total S/MARs overlap exactly with the TSS of 15 319 genes. Therefore, functional characterization of the genes containing S/MARs within 1.5 kb of their TSS was carried out. The genes were analyzed for enriched GO terms and pathways using UniProt/SwissProt and KEGG pathway analysis, respectively. The most represented molecular functions included transcription and post-translation; biological processes included immune response, transcription and cell signaling; and cellular components included extracellular regions, nucleus and extracellular space. This highlights the importance of S/MARs in the overall gene expression program (Figure 5A). Pathway analysis revealed that 26% of these genes belong to metabolic pathways, 23% to signaling pathways, 16% to cancer-related pathways, 7% to human papillomavirus infection-related pathways and 5% to HTLV1 infection (Figure 5B). A high fraction of these S/MAR-associated genes showed a link with diseases (data not shown).
Nucleotide composition of S/MARs
The nucleotide sequence of DNA is known to strongly influence its structure. Changes in nucleotide composition or order have been shown to influence DNA structure and the DNA–protein interactions that regulate vital cellular processes (50,51). The function of S/MARs is also associated with structural features such as kinks and curves in DNA, and these elements thus have a characteristic nucleotide composition. Therefore, nucleotide repeat and motif analysis of S/MAR sequences was carried out. The abundance of various mono-, di-, tri-, tetra-, penta- and hexa-nucleotide repeats was determined (Figure 6A). The analysis revealed that the [A]10/[T]10 repeat was the most abundant pattern (75 023 times) in the dataset, indicating the A/T richness of these sequences. The same was also evident from motif analysis done using the MEME-ChIP program. Motif 1, with the pattern GAGGYRGAGGTTGCAGTGAGC, occurred in 7161 S/MARs. Motif 2, with the A/T-rich pattern TTTTTTTTTTTGAGAYRGAGTYTYRCTCT, occurred in 4055 S/MARs. Details of other nucleotide repeats and motifs predicted by MEME are shown in Figure 6B–D. The abundance of different types of repeat patterns was also checked. Tandem repeats, direct repeats and palindromes were found to be the most represented in the S/MAR dataset (Figure 6E).
Figure 1. Validation of dataset by determining presence of S/MAR-associated features. (A) Abundance (in percentage) of seven S/MAR features including OriC, TG richness, curved DNA, kinked DNA, Topo II site, AT richness and MRS in the dataset. (B) Venn diagram depicting number of S/MAR sequences having one or more features.
Table 1. Distribution of genes and S/MARs on human chromosomes
Chromosome   Size (Mb)   S/MAR count   S/MAR density/Mb   Number of genes   Gene density/Mb
chr1         248.9564    25 689        103.1867           2785              11.1867
chr2         242.1935    24 405        100.7665           1791              7.394913
chr3         198.2956    18 543        93.51193           1541              7.771228
chr4         190.2146    14 907        78.3694            1066              5.604198
chr5         181.5383    16 524        91.02214           1288              7.094923
chr6         170.806     16 841        98.59725           1416              8.290108
chr7         159.346     15 428        96.82077           1318              8.27131
chr8         145.1386    13 440        92.60112           1008              6.945084
chr9         138.3947    11 982        86.57845           1105              7.984409
chr10        133.7974    13 264        99.13494           1084              8.1018
chr11        135.0866    13 270        98.23327           1658              12.27361
chr12        133.2753    13 964        104.7756           1369              10.27197
chr13        114.3643    7703          67.35492           619               5.412527
chr14        107.0437    8567          80.03272           931               8.697381
chr15        101.9912    9017          88.4096            988               9.687111
chr16        90.33835    9427          104.3521           1125              12.45318
chr17        83.25744    10 989        131.9882           1556              18.68902
chr18        80.37329    6487          80.7109            425               5.287827
chr19        58.61762    7813          133.2876           1774              30.26394
chr20        64.44417    7349          114.0367           772               11.97936
chr21        46.70998    3428          73.38902           410               8.777567
chr22        50.81847    4527          89.08179           633               12.4561
chrX         156.0409    8773          56.22244           1151              7.376271
chrY         57.22742    509           8.894338           141               2.463854
Figure 2. Length distribution of S/MARs and chromatin loops. (A) Length of S/MARs (in bp) was plotted against their occurrence. (B) Inter-S/MAR distance or chromatin loop size (in kb) was plotted against their occurrence.
Experimental validation of human S/MARs
To experimentally validate the presence of S/MAR sequences from the present dataset, the nuclear matrix from the human colon cancer cell line HCT116 was isolated and used as template. The matrix was validated by agarose gel electrophoresis and also by amplifying five previously experimentally validated S/MARs (29,30) (Figure 7A). Thirty representative S/MAR sequences from the entire dataset were chosen randomly and amplified using specific primers. Two random inter-S/MAR sequences were used as controls (Figure 7B). All 30 S/MARs showed specific amplification, although S/MAR sequence number 19 showed a very faint band (Figure 7B–D). Thus, the 30 randomly chosen sequences were experimentally shown to be part of the nuclear matrix.
Figure 3. Distribution of S/MARs on human chromosomes. (A) Visualization of S/MARs on all human chromosomes. Color coding for different parts of the chromosomes: centromeres, pink; G negative band, white; G positive band, black; G positive (25, 50, 75%), light gray/gray/dark gray; variable region, blue; stalk, dark blue. Vertical blue lines above each chromosome represent the S/MAR distribution on the chromosomes. (B) Number of S/MARs present on each human chromosome. (C) Gene density and S/MAR density correlation graph for each human chromosome.
Figure 4. Genomic context of S/MARs. (A) Percentage distribution of S/MARs in different genomic regions. (B) Distance of S/MARs from the TSS of nearest downstream gene versus S/MAR count.
S/MARs: hotspots of retroviral integration
Retrovirus integration is not a random event; various viral and host factors are known to mediate this process. One such factor discussed earlier is the S/MARs of the host genome (17). In order to determine whether the S/MARs identified in the present study have any correlation with retrovirus integration events, HIV and HTLV insertion coordinates were mapped onto the identified S/MAR coordinates. A very strong correlation was observed between ‘presence of S/MAR’ and ‘presence of IS’ for HIV and HTLV. Out of a total of 1 141 899 mapped HIV IS, 102 408 IS were present within S/MAR coordinates. Further, 599 389 (52.5%) insertion sites were present within 5 kb and 956 873 (84%) within 15 kb of the identified S/MARs (Figure 8A). In the case of HTLV, out of a total of 11 286 mapped IS, 1059 were located within S/MAR coordinates. A total of 4986 (44%) IS were present within 5 kb of S/MAR sites and a total of 8169 (72%) HTLV IS were present within 15 kb of S/MARs (Figure 8B).
MARome web interface
Using MARome, the S/MARs identified in the present study and related annotation (for both the hg19 and hg38 assemblies) can easily be browsed using various search strategies. MARome provides search by unique IDs, genomic coordinates, query sequences and gene ID/symbol. In MARome, every S/MAR entry is represented by a unique identifier. With prior knowledge of these identifiers, users can browse particular S/MARs using the search by ID strategy. Users can submit genomic coordinates of interest in standard BED format to retrieve S/MARs located at and around those loci. The search by sequence strategy allows users to search for S/MARs similar to a query sequence of interest. This strategy internally runs NCBI-blast+ blastn against the identified S/MAR sequences and returns the best hit along with the top 10 alignments. Similarly, users can search for S/MARs associated with genes of interest using the search by Gene Name/Symbol strategy. The tabular output obtained through every search strategy further provides the S/MAR binding proteins targeting the S/MARs, S/MAR-associated features, the location of S/MARs in the genomic context/element and their distance from the TSS of the nearest gene along with the gene details, and the HTLV/HIV insertion sites associated with the S/MARs. The output data are also cross-linked to public databases such as NCBI-gene, Ensembl, RID, etc. for further annotation details, and to the UCSC Genome Browser for data visualization. The interface also allows complete and S/MARBP-wise download of S/MAR sequences, coordinate files, annotations, etc. in BED and TSV formats. Further, a scoring scheme (details provided in the online help manual of MARome) that considers the number of S/MARBPs, the number of different ‘S/MAR associated features’ and the number of times ‘S/MAR associated features’ appear in a particular S/MAR has been implemented in the database to score the S/MAR entries.
Figure 5. Functional classification of S/MAR-associated genes. (A) Classification of genes based on gene ontology; Biological Processes. (B) Classification of genes based on their involvement in different pathways.
DISCUSSION
Spatio-temporal control of gene expression is a hallmark
of multicellular organisms. Apart from the individual's genetic
makeup, epigenetics also plays a vital role in shaping
differential phenotypic traits. Epigenetic regulation occurs
through histone modifications, DNA methylation, non-coding
RNAs and regulatory elements such as Locus Control
Regions (LCRs), S/MARs etc. Chromatin organization,
an integral part of gene regulation, is brought about
by DNA sequences called S/MARs (1). These S/MARs act
as topological sinks that hold the chromatin loops to the
nuclear matrix and are involved in context-dependent activation
or repression of the surrounding genes. However, the
molecular mechanism underlying this loop organization
remains poorly characterized. Defects in S/MARs have also
been implicated in various diseases such as cancers, inflammatory
diseases, facioscapulohumeral dystrophy and viral
infections (14–16,52). In this context, a map of all the characterized
S/MARs in the human genome would be beneficial in
understanding chromatin and disease biology. Toward this
objective, we reanalyzed ChIP-Seq data of 14 different human
S/MARBPs, namely, BRCA1, BRIGHT, SMAR1,
CEBPB, CUX1/CDP, CTCF, Fast1/FOXH1, HoxC11, Ku
autoantigen, NMP4, Mut-p53, SAF-A/hnRNPU, SATB1
and YY1 to understand their genome-wide binding patterns.
This information was then used to make a comprehensive
S/MAR dataset that is genome-wide and non-redundant
across the selected proteins.
Figure 6. Repeats and motifs present in S/MAR sequences. (A) Graphical representation of the number of various mono-, di-, tri-, tetra-, penta- and hexanucleotide
repeats present in S/MARs. (B) Occurrence of 12 highly occurring nucleotide repeats in S/MAR sequences. (C) Three most abundant motifs
as identified by the MEME-ChIP program in the S/MARs. (D) Graphical representation of abundance of the identified motifs. (E) Abundance of various
repeats in S/MAR dataset.
We obtained 452 881 peak coordinates by analyzing
ChIP-Seq data of the selected S/MARBPs. The number of
peaks reduced to 298 443 after drawing peak intersects and
merging the overlapping peaks. This indicates that there
is ~70% redundancy in the identified binding sites and that
multiple S/MARBPs target the same or adjacent genomic loci.
Analysis of protein-protein interaction data available in the
'Biological General Repository for Interaction Datasets'
(BioGRID) indicates that the selected S/MARBPs interact
with each other. Therefore, these proteins can form
multi-protein complexes or co-localize while targeting
specific genomic loci. This can account for the redundancy
in their binding sites observed in the present study.
It also confirms the strong S/MAR potential of the identified
coordinates. DNA sequences corresponding to these
coordinates can thus be considered an S/MAR dataset.
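The reduction from raw peaks to a non-redundant set can be illustrated with a minimal sketch. It assumes half-open coordinates and merges peaks that overlap or directly abut on the same chromosome; the function name `merge_peaks` and the toy coordinates are hypothetical and are not taken from the study, whose actual tooling is not restated here.

```python
from collections import defaultdict


def merge_peaks(peaks):
    """Merge overlapping or abutting peaks per chromosome.

    peaks: iterable of (chrom, start, end, protein), half-open coordinates.
    Returns a list of (chrom, start, end, proteins), where proteins is a
    sorted tuple of the proteins whose peaks were merged into the region.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, protein in peaks:
        by_chrom[chrom].append((start, end, protein))

    merged = []
    for chrom in sorted(by_chrom):
        cur_start = cur_end = None
        cur_proteins = set()
        for start, end, protein in sorted(by_chrom[chrom]):
            if cur_start is not None and start <= cur_end:
                # Peak overlaps or touches the running region: extend it.
                cur_end = max(cur_end, end)
                cur_proteins.add(protein)
            else:
                if cur_start is not None:
                    merged.append(
                        (chrom, cur_start, cur_end, tuple(sorted(cur_proteins)))
                    )
                cur_start, cur_end, cur_proteins = start, end, {protein}
        if cur_start is not None:
            merged.append((chrom, cur_start, cur_end, tuple(sorted(cur_proteins))))
    return merged


# Invented toy input: two overlapping peaks on chr1 collapse into one region.
toy_peaks = [
    ("chr1", 100, 300, "CTCF"),
    ("chr1", 250, 500, "SATB1"),
    ("chr1", 900, 1000, "YY1"),
    ("chr2", 50, 150, "SMAR1"),
]
merged = merge_peaks(toy_peaks)
```

Keeping the set of contributing proteins with each merged region makes it easy to see which loci are targeted by more than one S/MARBP.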
Curves and kinks in DNA have been recognized as vital
structural features that favor DNA–protein interactions.
Sequences with kinked and curved DNA signatures are prone
to undergo kinking and curving in response to binding of
accessory factors that induce distortions in DNA. Such
distortions, in turn, favor binding of other protein factors
to mediate biological processes (53–55). In the present study,
~60 and 43% of the identified S/MARs have kinked and curved
DNA signatures, respectively. The ability of S/MARs to
interact with a variety of regulatory proteins, which ultimately
regulates gene expression, can thus be explained.
Similarly, DNA molecules that are rich in AT stretches
are flexible and are prone to strand separation. They are
also susceptible to superhelical stress-induced duplex
destabilization (56). OriC is one such element that contains AT
stretches, making it prone to strand separation, thereby
facilitating initiation of DNA replication (57). S/MARs are
known to possess both these features. In the present study,
~91% of the identified S/MARs have OriC signatures and
~66% of them have signatures of AT richness. Thus, the role
played by S/MARs in biological processes such as replication,
transcription and repair (viz., regulated DNA strand
separation) can be supported.
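As a rough illustration of what an AT-richness check involves, the sketch below computes the AT fraction of a sequence and the length of its longest uninterrupted A/T run. Both helper names are hypothetical, and no threshold is implied: the criteria used in the study to call an "AT richness signature" are not reproduced here.

```python
def at_fraction(seq):
    """Fraction of A/T among unambiguous bases (A, C, G, T) in seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "AT" for b in bases) / len(bases)


def longest_at_run(seq):
    """Length of the longest stretch consisting only of A and T."""
    best = run = 0
    for b in seq.upper():
        run = run + 1 if b in "AT" else 0
        best = max(best, run)
    return best


# Invented toy sequence, used only to exercise the helpers.
toy_seq = "GCATATATGCAT"
toy_fraction = at_fraction(toy_seq)
toy_run = longest_at_run(toy_seq)
```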
Figure 7. Experimental validation of S/MAR sequences present in the nuclear matrix. Semi-quantitative PCR for positive (A) and negative controls (B)
N1 and N2. (C–E) Semi-quantitative PCRs for randomly selected 30 S/MAR sequences.
The S/MAR length and the inter-S/MAR chromatin
loop size are major determinants of chromatin structure
and function. There is considerable disparity in the published
literature about the length of S/MARs, which are reported to be
100 bp to several kb long (30,58,59). The median S/MAR
length observed in the present study is 596 bp and 94.87%
of the identified S/MARs have length ≤2 kb. Thus, in general,
S/MARs are small stretches of DNA of varied
lengths. The dataset also contains a small number of exceptional
S/MARs that are longer or shorter than the observed
median length. Similarly, the size of chromatin loops is
reported to vary from 20 to 200 kb (60,61). Functionally
related genes tend to co-localize on the same chromatin loop
to facilitate their expression in a concomitant manner (45).
In the present study, the median length of the chromatin
loop was observed to be 4.923 kb and 94.23% of the identified
chromatin loops have length ≤31 kb. The dataset also
contains a small number of exceptional chromatin loops that
are longer or shorter than the observed median length,
accounting for the large standard deviation of 76.35 kb. It
has been reported that the chromatin loop size varies
depending upon its position on the chromosome and correlates
with the size of the replicon (62,63). Telomeric regions tend to
have smaller loop sizes than regions away from the
telomeres (64). The size of loops is also hypothesized to influence
the biological state of the cell. An increase in the length
of loops is linked with cellular differentiation, whereas a
decrease is associated with proliferation (65). Thus, the
observed chromatin loop lengths should be considered with the
clear caveat that they can be influenced by various factors
in the dynamic cellular environment.
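The two summary quantities discussed above, S/MAR length and inter-S/MAR distance, can be derived from a coordinate list as in the following sketch. It assumes the S/MARs are already merged (non-overlapping) and given as half-open intervals; the helper names and toy coordinates are hypothetical.

```python
from statistics import median


def smar_lengths(smars):
    """Length in bp of each (chrom, start, end) S/MAR, in input order."""
    return [end - start for _, start, end in smars]


def inter_smar_gaps(smars):
    """Distances between consecutive S/MARs on the same chromosome.

    Assumes non-overlapping intervals; each gap is a putative loop size.
    """
    gaps = []
    last_end = {}
    for chrom, start, end in sorted(smars):
        if chrom in last_end:
            gaps.append(start - last_end[chrom])
        last_end[chrom] = end
    return gaps


def fraction_at_most(values, limit):
    """Share of values that are less than or equal to limit."""
    return sum(v <= limit for v in values) / len(values)


# Invented toy coordinates, not taken from the MARome dataset.
toy_smars = [
    ("chr1", 1000, 1600),
    ("chr1", 5000, 5400),
    ("chr1", 40000, 43000),
    ("chr2", 200, 900),
]
lengths = smar_lengths(toy_smars)
gaps = inter_smar_gaps(toy_smars)
median_length = median(lengths)
share_short = fraction_at_most(lengths, 2000)
```

A few very long gaps inflate the standard deviation far more than the median, which is why the median and a cumulative fraction are the more informative summaries for such skewed distributions.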
S/MARs found on different chromosomes have different
structural as well as functional implications. Chromosomes
18 and 19 have been shown to have differential S/MAR densities
that correlate well with the expression profile of genes located
on them (10). In the present study, S/MAR density was
determined for the different chromosomes. Allosomes were
observed to have lower S/MAR density than autosomes.
The data revealed a positive correlation between
gene density and S/MAR density. It is known that chromosomes
have a preference for nuclear territories (66). It was
observed that the chromosomes that occupy a central position
in the nucleus (chr1, 16, 17 and 19) had higher S/MAR density
than the chromosomes that occupy the nuclear periphery (chr2,
4, 13, 18).
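A density comparison of this kind reduces to normalizing counts by chromosome length and correlating two per-chromosome series. The sketch below shows that arithmetic with a hand-written Pearson coefficient; the chromosome labels and all numbers are invented placeholders, not values from the study.

```python
from math import sqrt


def density_per_mb(count, length_bp):
    """Number of features per megabase of sequence."""
    return count / (length_bp / 1_000_000)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Placeholder chromosomes: (length in bp, S/MAR count, gene count).
toy_chroms = {
    "chrA": (100_000_000, 12_000, 900),
    "chrB": (80_000_000, 6_000, 400),
    "chrC": (50_000_000, 7_500, 600),
}
smar_density = [density_per_mb(n, length) for length, n, _ in toy_chroms.values()]
gene_density = [density_per_mb(g, length) for length, _, g in toy_chroms.values()]
r = pearson(smar_density, gene_density)
```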
Anchorage of S/MARs to the nuclear matrix is known to
play a dual role: (i) a structural role in maintaining the
higher-order chromatin conformation and (ii) a functional role in
regulation of DNA replication and gene expression. The
S/MAR size and loop length are responsible for maintaining
the structural domains of chromatin. The functional
aspect of S/MARs can partly be explained on the basis of
the genomic loci they occupy. Recent reports suggest that
S/MARs can influence transcription by insulating nearby
genes (67,68), thus acting either as activators or
repressors of the transgene in a context-dependent
manner (69). Localization of S/MARs in different genomic
elements such as promoters, introns and intergenic regions
has been demonstrated earlier (70,71). The differential
distribution of S/MARs across various genomic elements,
determined in the present study, revealed an inverse
correlation between coding regions of the genome and the presence
of S/MARs. Thus, a majority of S/MARs were present in
the non-coding regions of the genome, indicating their regulatory
functions. Also, S/MARs have been reported to be
associated with the TSS, thereby influencing the transcription of
Figure 8. Correlation between S/MARs and retrovirus integration sites. (A) Distance of HIV integration sites from the nearest upstream and downstream
S/MARs plotted against their count. (B) Distance of HTLV integration sites from the nearest upstream and downstream S/MARs plotted against their
count.
the downstream gene (72,73). In agreement with this, a number
of S/MARs identified in the present study overlapped with
the TSS of a large number of genes, which can be attributed to
their role in transcriptional regulation.
S/MARs are known to physically associate with the nuclear
matrix, a three-dimensional filamentous RNA-protein
meshwork. Therefore, the most direct and legitimate
evidence for any sequence to be an S/MAR is its presence in the
nuclear matrix fraction. The matrix–DNA isolation method
provides the complete nucleic acid complement that is in close
physical association with the nuclear matrix. Therefore, matrix
DNA-PCR was used to validate the identified S/MARs.
This method is more cost- and time-efficient than other laboratory
methods and allows validation of multiple S/MARs.
ChIP-PCR, S/MARBP-S/MAR co-localization studies and
electrophoretic mobility shift assays, which can also be used for
validation, need recombinant purified S/MARBPs
and antibodies specific to the S/MARBPs, making them
time-consuming and resource-intensive.
Moreover, the data used as the starting point in the
present study are based on ChIP experiments. Therefore,
performing a similar experiment for validation would be redundant.
Retroviral infection is almost incurable owing to the stable
integration of the viral genome into the host genome. This event in
the viral life cycle makes the pathogen unique, leading to lifelong
infection that escapes the immune system and anti-retroviral
therapy. The integration of the viral genome into the host
genome is known to occur only at the terminal ends of the
viral DNA; for the host genome, however, integration sites can
be random. Determining whether this integration has a preference
for any specific site would be of great advantage
in designing effective anti-retroviral therapy. It is believed
that host cis elements and chromosomal topography play
an indispensable role in viral integration and latency. Further,
a large number of genes coding for inflammatory cytokines
and transcriptional regulators also get disrupted by viral
integration, thereby providing favorable conditions for viral
survival. S/MARs are predicted to be the most potent sites for
retroviral integration owing to their structural features such as
DNA bending, topoisomerase sites, DNA hypersensitivity,
AT richness, kinked DNA etc. (17,74–77). Researchers all
over the world hold contradictory assumptions and hypotheses
regarding retroviral integration into the host genome. To
decipher whether it is a random event or a sequence/topology-associated
phenomenon, HIV-1 and HTLV IS archived in the
RID database were mapped onto the identified S/MARs.
It was observed that ~84 and 72% of the total HIV-1 and
HTLV IS are located within 15 kb distance from their nearest
S/MAR, respectively. Thus, a major fraction of the known
IS for these viruses are located within S/MARs and the chromatin
loop regions in their close proximity. In summary, the closer
a locus is to an S/MAR, the higher is the probability of retroviral
integration. A number of reports have shown that HIV-1
prefers integration in intronic regions as well as near
highly expressed genes (78). HIV-1 tends to target active
genes for its active transcription and viral propagation. A
number of active genes with S/MAR regions around their
TSS were also identified in the present study, which further
highlights the importance of S/MAR sites in retroviral
infection. Thus, HIV and HTLV integration is not a random
event and S/MARs indeed act as hotspots for their integration
into the human genome.
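The distance analysis described above amounts to a nearest-interval lookup for every integration site. A minimal sketch using binary search over sorted S/MAR start positions is given below; it assumes sorted, non-overlapping, half-open intervals on a single chromosome, and the function name, coordinates and 15 kb cutoff variable are illustrative assumptions rather than the study's implementation.

```python
from bisect import bisect_left


def nearest_smar_distance(pos, smars):
    """Distance in bp from pos to the nearest S/MAR on one chromosome.

    smars: sorted, non-overlapping list of (start, end), half-open.
    Returns 0 if pos lies inside an S/MAR, or None if smars is empty.
    """
    starts = [s for s, _ in smars]
    i = bisect_left(starts, pos)
    best = None
    # Only the intervals just upstream (i - 1) and downstream (i) can be nearest.
    for j in (i - 1, i):
        if 0 <= j < len(smars):
            start, end = smars[j]
            if start <= pos < end:
                return 0
            dist = start - pos if pos < start else pos - (end - 1)
            best = dist if best is None else min(best, dist)
    return best


# Invented toy data: S/MAR intervals and integration sites on one chromosome.
toy_smars = [(1000, 1600), (50000, 50500)]
toy_sites = [1200, 5000, 30000, 66000]
distances = [nearest_smar_distance(p, toy_smars) for p in toy_sites]

CUTOFF_BP = 15_000  # the 15 kb window mentioned in the text
share_within = sum(d <= CUTOFF_BP for d in distances) / len(distances)
```

For a genome-wide run, the list of start positions would be built once per chromosome rather than on every call; it is rebuilt here only to keep the sketch short.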
In the light of the above observations, our study will facilitate
a better understanding of the genome-wide location data for
S/MARs and help unravel the functional aspects of chromatin.
Understanding S/MARs as HIV integration sites
will greatly facilitate the design of a therapeutic arsenal against
the latent infection. Targeted genome editing with new
genetic engineering tools such as CRISPR/Cas9 can work as
a potential therapy against this deadly infection. The ability
of retroviruses to stably integrate into the host genome has
also been harnessed to use them as vehicles for transduction
(79). Insertion of these retroviral vectors at the wrong loci has
been associated with activation of proto-oncogenes. In
view of this fact, a better understanding of the integration
sites will help in designing suitable retroviral vectors for
treating and targeting various genetic disorders.
Several algorithms have been developed for in silico
prediction of S/MAR elements. However, the efficacy and predictive
potential of these algorithms have so far been restricted
by the limited number of sequences available for training
the models and the lack of features that define S/MARs
effectively. Our attempt to make a genome-wide map of
S/MARs in human can complement the development of
better-performing predictive tools. A collection of experimentally
proven S/MARs and nuclear matrix proteins of
various organisms including human is available in the form
of a database (S/MAR transaction database, S/MARt DB)
(80). This database, however, was published in 2002, a
year before the release of the first draft of the human genome,
which itself has now been extensively revised with respect
to sequence information. Therefore, there is a need to revisit
this problem and develop a database with updated human
S/MAR sequence information. Further, such data will
be useful to researchers working in the fields of computational
biology, genomics, functional genomics and virology.
The web interface MARome, developed by us, will
facilitate such use of the data.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
ACKNOWLEDGMENTS
The authors thank the Director, Bioinformatics Centre, Savitribai
Phule Pune University (SPPU) for providing infrastructural
facilities. The Bioinformatics Centre, the Department of
Biotechnology at SPPU and the National Centre for Cell Science,
Pune are centres supported by the Department of Biotechnology (DBT),
Government of India. The authors acknowledge
DBT, Government of India.
FUNDING
Departmental Research Development Programme (DRDP)
Grant of Savitribai Phule Pune University, Pune (to
A.K., S.P.K.M.); CSIR-Senior Research Fellowship (to
S.P.); CSIR and UGC Senior Research Fellowship (to
A.A.). Funding for open access charge: Savitribai Phule
Pune University.
Conflict of interest statement. None declared.
REFERENCES
1. Heng,H.H.Q. (2004) Chromatin loops are selectively anchored using
scaffold/matrix-attachment regions. J. Cell Sci.,117, 999–1008.
2. Capco,D.G., Wan,K.M., Penman,S., Weber,K., Franke,W.W. and
Fyne,C.-T. (1982) The nuclear matrix: three-dimensional architecture
and protein composition. Cell,29, 847–858.
3. Razin,S. V, Gromova,I.I. and Iarovaia,O. V (1995) Specificity and
functional significance of DNA interaction with the nuclear matrix:
new approaches to clarify the old questions. Int. Rev. Cytol.,162B,
405–448.
4. Stein,G.S., Zaidi,S.K., Braastad,C.D., Montecino,M., van
Wijnen,A.J., Choi,J.-Y., Stein,J.L., Lian,J.B. and Javed,A. (2003)
Functional architecture of the nucleus: organizing the regulatory
machinery for gene expression, replication and repair. Trends Cell
Biol.,13, 584–592.
5. Breyne,P., van Montagu,M., Depicker,N. and Gheysen,G. (1992)
Characterization of a plant scaffold attachment region in a DNA
fragment that normalizes transgene expression in tobacco. Plant Cell,
4, 463–471.
6. Laemmli,U.K., Käs,E., Poljak,L. and Adachi,Y. (1992)
Scaffold-associated regions: cis-acting determinants of chromatin
structural loops and functional domains. Curr. Opin. Genet. Dev.,2,
275–285.
7. Tikhonov,A.P., Bennetzen,J.L. and Avramova,Z. V (2000) Structural
domains and matrix attachment regions along colinear chromosomal
segments of maize and sorghum. Plant Cell,12, 249–264.
8. Singh,G.B., Kramer,J.A. and Krawetz,S.A. (1997) Mathematical
model to predict regions of chromatin attachment to the nuclear
matrix. Nucleic Acids Res.,25, 1419–1425.
9. Van Drunen,C.M., Sewalt,R.G.A.B., Oosterling,R.W., Weisbeek,P.J.,
Smeekens,S.C.M. and Van Driel,R. (1999) A bipartite sequence
element associated with matrix/scaffold attachment regions. Nucleic
Acids Res.,27, 2924–2930.
10. Croft,J.A., Bridger,J.M., Boyle,S., Perry,P., Teague,P. and
Bickmore,W.A. (1999) Differences in the localization and
morphology of chromosomes in the human nucleus. J. Cell Biol.,145,
1119–1131.
11. Allen,G.C., Spiker,S. and Thompson,W.F. (2000) Use of matrix
attachment regions (MARs) to minimize transgene silencing. Plant
Mol. Biol.,43, 361–376.
12. Zhao,C.-P., Guo,X., Chen,S.-J., Li,C.-Z., Yang,Y., Zhang,J.-H.,
Chen,S.-N., Jia,Y.-L. and Wang,T.-Y. (2017) Matrix attachment
region combinations increase transgene expression in transfected
Chinese hamster ovary cells. Sci. Rep.,7, 42805.
13. Vain,P., Worland,B., Kohli,A., Snape,J.W., Christou,P., Allen,G.C.
and Thompson,W.F. (1999) Matrix attachment regions increase
transgene expression levels and stability in transgenic rice plants and
their progeny. Plant J.,18, 233–242.
14. Barboro,P., Repaci,E., D’Arrigo,C. and Balbi,C. (2012) The role of
nuclear matrix proteins binding to matrix attachment regions
(MARs) in prostate cancer cell differentiation. PLoS One,7, e40617.
15. Gluch,A., Vidakovic,M. and Bode,J. (2008) Scaffold/Matrix
Attachment Regions (S/MARs): Relevance for Disease and Therapy.
In: Springer, Berlin, Heidelberg, pp. 67–103.
16. Petrov,A., Pirozhkova,I., Carnac,G., Laoudj,D., Lipinski,M. and
Vassetzky,Y.S. (2006) Chromatin loop domain organization within
the 4q35 locus in facioscapulohumeral dystrophy patients versus
normal human myoblasts. Proc. Natl. Acad. Sci. U.S.A.,103,
6982–6987.
17. Johnson,C.N. and Levy,L.S. (2005) Matrix attachment regions as
targets for retroviral integration. Virol. J.,2, 68.
18. Kim,S.W., Yoon,S.-J., Chuong,E., Oyolu,C., Wills,A.E., Gupta,R.
and Baker,J. (2011) Chromatin and transcriptional signatures for
Nodal signaling during endoderm formation in hESCs. Dev. Biol.,
357, 492–504.
19. Do,P.M., Varanasi,L., Fan,S., Li,C., Kubacka,I., Newman,V.,
Chauhan,K., Daniels,S.R., Boccetta,M., Garrett,M.R. et al. (2012)
Mutant p53 cooperates with ETS2 to promote etoposide resistance.
Genes Dev.,26, 830–845.
20. Walsh,C.A., Bolger,J.C., Byrne,C., Cocchiglia,S., Hao,Y., Fagan,A.,
Qin,L., Cahalin,A., McCartan,D., McIlroy,M. et al. (2014) Global
gene repression by the steroid receptor coactivator SRC-1 promotes
oncogenesis. Cancer Res.,74, 2533–2544.
21. Mathai,J., Mittal,S.P.K., Alam,A., Ranade,P., Mogare,D., Patel,S.,
Saxena,S., Ghorai,S., Kulkarni,A.P. and Chattopadhyay,S. (2016)
SMAR1 binds to T(C/G) repeat and inhibits tumor progression by
regulating miR-371-373 cluster. Sci. Rep.,6, 33779.
22. ENCODE Project Consortium (2012) An integrated
encyclopedia of DNA elements in the human genome. Nature,489,
57–74.
23. Winick-Ng,W., Caetano,F.A., Winick-Ng,J., Morey,T.M., Heit,B.
and Rylett,R.J. (2016) 82-kDa choline acetyltransferase and SATB1
localize to β-amyloid induced matrix attachment regions. Sci. Rep.,6,
23914.
24. Patel,R.K. and Jain,M. (2012) NGS QC Toolkit: a toolkit for quality
control of next generation sequencing data. PLoS One,7, e30619.
25. Langmead,B., Trapnell,C., Pop,M. and Salzberg,S.L. (2009) Ultrafast
and memory-efficient alignment of short DNA sequences to the
human genome. Genome Biol.,10, R25.
26. Li,H., Handsaker,B., Wysoker,A., Fennell,T., Ruan,J., Homer,N.,
Marth,G., Abecasis,G. and Durbin,R. (2009) The Sequence
Alignment/Map format and SAMtools. Bioinformatics,25,
2078–2079.
27. Bailey,T.L., Williams,N., Misleh,C. and Li,W.W. (2006) MEME:
discovering and analyzing DNA and protein sequence motifs. Nucleic
Acids Res.,34, W369–W373.
28. Yu,G., Wang,L.-G. and He,Q.-Y. (2015) ChIPseeker: an
R/Bioconductor package for ChIP peak annotation, comparison and
visualization. Bioinformatics,31, 2382–2383.
29. Girod,P.-A., Nguyen,D.-Q., Calabrese,D., Puttini,S., Grandjean,M.,
Martinet,D., Regamey,A., Saugy,D., Beckmann,J.S., Bucher,P. et al.
(2007) Genome-wide prediction of matrix attachment regions that
increase gene expression in mammalian cells. Nat. Methods,4,
747–753.
30. Keaton,M.A., Taylor,C.M., Layer,R.M. and Dutta,A. (2011)
Nuclear scaffold attachment sites within ENCODE regions associate
with actively transcribed genes. PLoS One,6, e17912.
31. Huber,L.J. and Chodosh,L.A. (2005) Dynamics of DNA repair
suggested by the subcellular localization of Brca1 and Brca2 proteins.
J. Cell. Biochem.,96, 47–55.
32. Herrscher,R.F., Kaplan,M.H., Lelsz,D.L., Das,C., Scheuermann,R.
and Tucker,P.W. (1995) The immunoglobulin heavy-chain
matrix-associating regions are bound by Bright: A B cell-specific
trans-activator that describes a new DNA-binding protein family.
Genes Dev.,9, 3067–3082.
33. Chattopadhyay,S., Kaul,R., Charest,A., Housman,D. and Chen,J.
(2000) SMAR1, a novel, alternatively spliced gene product, binds the
scaffold/matrix-associated region at the T cell receptor locus.
Genomics,68, 93–96.
34. van Wijnen,A.J., Bidwell,J.P., Fey,E.G., Penman,S., Lian,J.B.,
Stein,J.L. and Stein,G.S. (1993) Nuclear matrix association of
multiple sequence-specific DNA binding activities related to SP-1,
ATF, CCAAT, C/EBP, OCT-1, and AP-1. Biochemistry,32,
8397–8402.
35. Maksimenko,O., Gasanov,N.B. and Georgiev,P. (2015) Regulatory
elements in vectors for efficient generation of cell lines producing
target proteins. Acta Nat.,7, 15–26.
36. Chattopadhyay,S., Whitehurst,C.E. and Chen,J. (1998) A nuclear
matrix attachment region upstream of the T cell receptor β gene
enhancer binds Cux/CDP and SATB1 and modulates
enhancer-dependent reporter gene expression but not endogenous
gene expression. J. Biol. Chem.,273, 29838–29846.
37. Yusufzai,T.M. and Felsenfeld,G. (2004) The 5′-HS4 chicken β-globin
insulator is a CTCF-dependent nuclear matrix-associated element.
Proc. Natl. Acad. Sci. U.S.A.,101, 8620–8624.
38. Dunn,K.L., Zhao,H. and Davie,J.R. (2003) The insulator binding
protein CTCF associates with the nuclear matrix. Exp. Cell Res.,288,
218–223.
39. Galande,S. and Kohwi-Shigematsu,T. (1999) Poly(ADP-ribose)
polymerase and Ku autoantigen form a complex and synergistically
bind to matrix attachment sequences. J. Biol. Chem.,274,
20521–20528.
40. Torrungruang,K., Alvarez,M., Shah,R., Onyia,J.E., Rhodes,S.J. and
Bidwell,J.P. (2002) DNA binding and gene activation properties of the
Nmp4 nuclear matrix transcription factors. J. Biol. Chem.,277,
16153–16159.
41. Will,K., Warnecke,G., Wiesmüller,L. and Deppert,W. (1998) Specific
interaction of mutant p53 with regions of matrix attachment region
DNA elements (MARs) with a high potential for base-unpairing.
Proc. Natl. Acad. Sci. U.S.A.,95, 13681–13686.
42. Göhring,F. and Fackelmayer,F.O. (1997) The scaffold/matrix
attachment region binding protein hnRNP-U (SAF-A) is directly
bound to chromosomal DNA in vivo: a chemical cross-linking study.
Biochemistry,36, 8276–8283.
43. Mittal,S.P.K., Mathai,J., Kulkarni,A.P., Pal,J.K. and
Chattopadhyay,S. (2013) miR-320a regulates erythroid differentiation
through MAR binding protein SMAR1. Int. J. Biochem. Cell Biol.,
45, 2519–2529.
44. Sinha,S., Malonia,S.K., Mittal,S.P.K., Mathai,J., Pal,J.K. and
Chattopadhyay,S. (2012) Chromatin remodelling protein SMAR1
inhibits p53 dependent transactivation by regulating acetyl
transferase p300. Int. J. Biochem. Cell Biol.,44, 46–52.
45. Sinha,S., Malonia,S.K., Mittal,S.P.K.K., Singh,K., Kadreppa,S.,
Kamat,R., Mukhopadhyaya,R., Pal,J.K. and Chattopadhyay,S.
(2010) Coordinated regulation of p53 apoptotic targets BAX and
PUMA by SMAR1 through an identical MAR element. EMBO J.,
29, 830–842.
46. Rampalli,S., Pavithra,L., Bhatt,A., Kundu,T.K. and
Chattopadhyay,S. (2005) Tumor suppressor SMAR1 mediates cyclin
D1 repression by recruitment of the SIN3/histone deacetylase 1
complex. Mol. Cell. Biol.,25, 8415–8429.
47. Singh,K., Sinha,S., Malonia,S.K., Bist,P., Tergaonkar,V. and
Chattopadhyay,S. (2009) Tumor suppressor SMAR1 represses IκBα
expression and inhibits p65 transactivation through matrix
attachment regions. J. Biol. Chem.,284, 1267–1278.
48. Chemmannur,S. V., Badhwar,A.J., Mirlekar,B., Malonia,S.K.,
Gupta,M., Wadhwa,N., Bopanna,R., Mabalirajan,U., Majumdar,S.,
Ghosh,B. et al. (2015) Nuclear matrix binding protein SMAR1
regulates T-cell differentiation and allergic airway disease. Mucosal.
Immunol.,8, 1201–1211.
49. Song,G., Liu,K., Yang,X., Mu,B., Yang,J., He,L., Hu,X., Li,Q.,
Zhao,Y., Cai,X. et al. (2017) SATB1 plays an oncogenic role in
esophageal cancer by up-regulation of FN1 and PDGFRB.
Oncotarget,8, 17771–17784.
50. Travers,A.A. (1995) Reading the minor groove. Nat. Struct. Biol.,2,
615–618.
51. Steitz,T.A. (1990) Structural studies of protein-nucleic acid
interaction: the sources of sequence-specific binding. Q. Rev.
Biophys.,23, 205–280.
52. Zink,D., Fische,A.H., Nickerson,J.A., Lozano,M., Kobayashi,R.,
Ross,S., Dudley,J., Romeyn,L. and Copeland,N. (2004) Nuclear
structure in cancer cells. Nat. Rev. Cancer,4, 677–687.
53. Han,W., Lindsay,S.M., Dlakic,M. and Harrington,R.E. (1997)
Kinked DNA. Nature,386, 563–563.
54. Singh,R.K., Sasikala,W.D. and Mukherjee,A. (2015) Molecular
origin of DNA kinking by transcription factors. J. Phys. Chem. B,
119, 11590–11596.
55. Chen,C.-Y., Ko,T.-P., Lin,T.-W., Chou,C.-C., Chen,C.-J. and
Wang,A.H.-J. (2005) Probing the DNA kink structure induced by the
hyperthermophilic chromosomal protein Sac7d. Nucleic Acids Res.,
33, 430–438.
56. Benham,C., Kohwi-Shigematsu,T. and Bode,J. (1997) Stress-induced
duplex DNA destabilization in scaffold/matrix attachment regions. J.
Mol. Biol.,274, 181–196.
57. Boulikas,T. (1993) Nature of DNA sequences at the attachment
regions of genes to the nuclear matrix. J. Cell. Biochem.,52, 14–22.
58. Shaposhnikov,S.A., Akopov,S.B., Chernov,I.P., Thomsen,P.D.,
Joergensen,C., Collins,A.R., Frengen,E. and Nikolaev,L.G. (2007) A
map of nuclear matrix attachment regions within the breast cancer
loss-of-heterozygosity region on human chromosome 16q22.1.
Genomics,89, 354–361.
59. Frisch,M., Frech,K., Klingenhoff,A., Cartharius,K., Liebich,I. and
Werner,T. (2002) In silico prediction of scaffold/matrix attachment
regions in large genomic sequences. Genome Res.,12, 349–354.
60. Razin,S. V (1999) Chromosomal DNA loops may constitute basic
units of the eukaryotic genome organization and evolution. Crit. Rev.
Eukaryot. Gene Expr.,9, 279–283.
61. Jackson,D.A., Dickinson,P. and Cook,P.R. (1990) The size of
chromatin loops in HeLa cells. EMBO J.,9, 567–571.
62. Marilley,M. and Gassend-Bonnet,G. (1989) Supercoiled loop
organization of genomic DNA: a close relationship between loop
domains, expression units, and replicon organization in rDNA from
Xenopus laevis. Exp. Cell Res.,180, 475–489.
63. Buongiorno-Nardelli,M., Micheli,G., Carri,M.T. and Marilley,M.
(1982) A relationship between replicon size and supercoiled loop
domains in the eukaryotic genome. Nature,298, 100–102.
64. Heng,H.H., Krawetz,S.A., Lu,W., Bremer,S., Liu,G. and Ye,C.J.
(2001) Re-defining the chromatin loop domain. Cytogenet. Cell
Genet.,93, 155–161.
65. Vassetzky,Y.S., Hair,A. and Razin,S. V (2000) Rearrangement of
chromatin domains in cancer and development. J. Cell. Biochem.
Suppl.,(Suppl. 35), 54–60.
66. Boyle,S., Gilchrist,S., Bridger,J.M., Mahy,N.L., Ellis,J.A. and
Bickmore,W.A. (2001) The spatial organization of human
chromosomes within the nuclei of normal and emerin-mutant cells.
Hum. Mol. Genet.,10, 211–219.
67. Bushey,A.M., Dorman,E.R. and Corces,V.G. (2008) Chromatin
insulators: regulatory mechanisms and epigenetic inheritance. Mol.
Cell,32, 1–9.
68. Namciu,S.J. and Fournier,R.E.K. (2004) Human matrix attachment
regions are necessary for the establishment but not the maintenance
of transgene insulation in Drosophila melanogaster. Mol. Cell. Biol.,
24, 10236–10245.
69. Brouwer,C., Bruce,W., Maddock,S., Avramova,Z. and Bowen,B.
(2002) Suppression of transgene silencing by matrix attachment
regions in maize: a dual role for the maize 5′ ADH1 matrix
attachment region. Plant Cell,14, 2251–2264.
70. Pascuzzi,P.E., Flores-Vergara,M.A., Lee,T.-J., Sosinski,B.,
Vaughn,M.W., Hanley-Bowdoin,L., Thompson,W.F. and Allen,G.C.
(2014) In vivo mapping of arabidopsis scaffold/matrix attachment
regions reveals link to nucleosome-disfavoring poly(dA:dT) tracts.
Plant Cell,26, 102–120.
71. Chattopadhyay,S. and Pavithra,L. (2007) MARs and MARBPs.
Chromatin Dis.,41, 213–230.
72. Liebich,I., Bode,J., Reuter,I. and Wingender,E. (2002) Evaluation of
sequence motifs found in scaffold/matrix-attached regions
(S/MARs). Nucleic Acids Res.,30, 3433–3442.
73. Pathak,R.U., Srinivasan,A. and Mishra,R.K. (2014) Genome-wide
mapping of matrix attachment regions in Drosophila melanogaster.
BMC Genomics,15, 1022.
74. Mielke,C., Maass,K., Tümmler,M. and Bode,J. (1996) Anatomy of
highly expressing chromosomal sites targeted by retroviral vectors.
Biochemistry,35, 2239–2252.
75. D’ugo,E., Bruni,R., Argentini,C., Giuseppetti,R. and Rapicetta,M.
(1998) Identification of scaffold/matrix attachment region in
recurrent site of woodchuck hepatitis virus integration. DNA Cell
Biol.,17, 519–527.
76. Shera,K.A., Shera,C.A. and James,K. (2001) Small tumor virus
genomes are integrated near nuclear matrix attachment regions in
transformed cells. J. Virol.,75, 12339–12346.
77. Kulkarni,A., Pavithra,L., Rampalli,S., Mogare,D., Babu,K.,
Shiekh,G., Ghosh,S. and Chattopadhyay,S. (2004) HIV-1 integration
sites are flanked by potential MARs that alone can act as promoters.
Biochem. Biophys. Res. Commun.,322, 672–677.
78. Craigie,R. and Bushman,F.D. (2012) HIV DNA integration. Cold
Spring Harb. Perspect. Med.,2, a006890.
79. Barquinero,J., Eixarch,H. and Pérez-Melgosa,M. (2004) Retroviral
vectors: new applications for an old tool. Gene Ther.,11, S3–S9.
80. Liebich,I., Bode,J., Frisch,M. and Wingender,E. (2002) S/MARt DB:
a database on scaffold/matrix attached regions. Nucleic Acids Res.,
30, 372–374.
... The number of replication origins shows a clear enrichment in the vicinity of loop anchors marked with a vertical dashed line; however, the number of origins drops significantly, approaching the loop center. We also found a weak association between scaffold/nuclear matrix attached regions (S/MARs), gathered in the MARome database [62], and replication origins which were previously reported to have a possible association [26]. We also found mORI to be overrepresented at topologically associating domain (TAD) borders and underrepresented in TAD middle, based on ENCODE data [63]. ...
... txt. gzChromatin loops Chromatin loop data were obtained from the GEO database (ID: GSE63525) for K562 cells[61].S/MARsLocations of scaffold/nuclear matrix attached regions (S/MARs) for hg19 were downloaded from the MARome database in a BED file format[62] TADs Genomic coordinates (hg19) of TADs mapped in 8 cell lines we downloaded from the ENCODE project website at https:// www. encod eproj ect. ...
Article
Full-text available
Background Despite the process of DNA replication being mechanistically highly conserved, the location of origins of replication (ORI) may vary from one tissue to the next, or between rounds of replication in eukaryotes, suggesting flexibility in the choice of locations to initiate replication. Lists of human ORI therefore vary widely in number and location, and there are currently no methods available to compare them. Here, we propose a method of detection of ORI based on somatic mutation patterns generated by the mutator phenotype of damaged DNA polymerase epsilon (POLE). Results We report the genome-wide localization of constitutive ORI in POLE-mutated human tumors using whole genome sequencing data. Mutations accumulated after many rounds of replication of unsynchronized dividing cell populations in tumors allow to identify constitutive origins, which we show are shared with high fidelity between individuals and tumor types. Using a Smith–Waterman-like dynamic programming approach, we compared replication origin positions obtained from multiple different methods. The comparison allowed us to define a consensus set of replication origins, identified consistently by multiple ORI detection methods. Many DNA features co-localized with the consensus set of ORI, including chromatin loop anchors, G-quadruplexes, S/MARs, and CpGs. Among all features, the H2A.Z histone exhibited the most significant association. Conclusions Our results show that mutation-based detection of replication origins is a viable approach to determining their location and associated sequence features.
... S/MARs are characterized by AT- or TG-rich sequences and curved DNA, with a median length ≤2 kb, usually possessing origin of replication (OriC) features. The median length of the chromatin loops formed by S/MARs is ≤31 kb [41][42][43], a considerably smaller scale than TADs, particularly that of the HBB locus, which spans almost 300 kb [44]. Lengthwise, within the human HBB locus, four MAR elements are predicted [45] and suggested to cooperate at several developmental stages of the fetus by recruiting, in a preset order, the HBE, HBG2 and HBG1, and finally HBB promoters to the LCR, leading to a dominant expression of a different hemoglobin at each stage. ...
... Looping models between LCR and globin genes' promoters have long been proposed and further expanded in light of new data derived from current techniques [24,46]. The proposed looping model derived from the present study (Figure 4) is consistent with recently published models and combines all basic factors influencing HbF transversion to HbA [24,43]. Additionally, this clarifies the characterization of LRF/ZBTB7A as an indirect epigenetic modifier factor with an evolving role in lncRNAs' regulation of expression. ...
Article
The hemoglobin switch from fetal (HbF) to adult (HbA) has been studied intensively as an essential model for gene expression regulation, but also as a beneficial therapeutic approach for β-hemoglobinopathies, towards the objective of reactivating HbF. The transcription factor LRF (Leukemia/lymphoma-related), encoded by the ZBTB7A gene, has been implicated in fetal hemoglobin silencing, though it has a wide range of functions that have not been fully clarified. We thus established LRF/ZBTB7A-overexpressing and ZBTB7A-knockdown K562 (human erythroleukemia cell line) clones to assess fetal vs. adult hemoglobin production pre- and post-induction. Transgenic K562 clones were further developed and studied under the influence of epigenetic chromatin regulators, such as DNA methyl transferase 3 (DNMT3) and Histone Deacetylase 1 (HDAC1), to evaluate LRF’s potential disturbance upon the aberrant epigenetic background and provide valuable information on the preferable epigenetic frame in which LRF unfolds its action on the β-type globin’s expression. The ChIP-seq analysis demonstrated that LRF binds to γ-globin genes (HBG2/1) and apparently associates with BCL11A for their silencing, but also that, during erythropoiesis induction, LRF binds the BGLT3 gene, promoting BGLT3-lncRNA production through the γ-δ intergenic region of β-type globin’s locus, triggering the transcriptional events of the γ- to β-globin switch. Our findings are supported by an up-to-date looping model, which highlights chromatin alterations during erythropoiesis at late stages of gestation, to establish an “open” chromatin conformation across the γ-δ intergenic region and accomplish β-globin expression and hemoglobin switch.
... During spermiogenesis, most of these histones are replaced by protamines, which is followed by the disassembly of nucleosomes and a specialized compaction of chromatin through toroids (Ward and Coffey, 1990; Barral et al., 2017) (Fig. 2). Toroid structures are anchored to the proteinaceous nuclear matrix, known as matrix attachment regions (Narwade et al., 2019). Unlike histones, protamines contain arginine to allow for stronger binding to DNA to form a toroid-like ridged structure (Ward and Coffey, 1991; DeRouchey et al., 2013). ...
Article
Background: It has long been thought that the factors affecting embryo and foetal development were exclusively maternally derived; hence, if issues regarding fertility and embryo development were to arise, the blame has traditionally been placed solely on the mother. An escalating interest in how paternal factors influence embryo development, however, has begun to prove otherwise. Evidence suggests that both seminal plasma (SP) and sperm contribute multiple factors that shape embryogenesis. This review thus focuses on the role that semen has in driving early embryonic development, and describes how paternal factors, such as SP, sperm centriole, sperm proteins, sperm RNA, sperm DNA, and its integrity, together with epigenetics, may influence the female reproductive tract and post-fertilization events. The important contributions of paternal factors to embryo development highlight the imperative need for further research in this area, which is sure to bring forth breakthroughs leading to improvements in infertility diagnosis and ART as well as reducing the risk of miscarriage. Objective and rationale: This review provides a comprehensive overview of the role of human semen in development of the early embryo, with the aim of providing a better understanding of the influence of SP and sperm on early embryonic divisions, gene and protein expression, miscarriage, and congenital diseases. Search methods: PubMed searches were performed using the terms 'sperm structure', 'capacitation', 'acrosome reaction', 'fertilization', 'oocyte activation', 'PLCζ', 'PAWP', 'sperm-borne oocyte activation factor', 'oocyte activation deficiency', 'sperm centriole', 'sperm transport', 'sperm mitochondria', 'seminal plasma', 'sperm epigenetics', 'sperm histone modifications', 'sperm DNA methylation', 'sperm-derived transcripts', 'sperm-derived proteins', 'sperm DNA fragmentation', 'sperm mRNA', 'sperm miRNAs', 'sperm piRNAs', and 'sperm-derived aneuploidy'. 
The reviewed articles were restricted to those published in English between 1980 and 2022. Outcomes: The data suggest that male-derived factors contribute much more than just the male haploid genome to the early embryo. Evidence indicates that semen contributes multiple factors that help shape the fate of embryogenesis. These male-derived factors include contributions from SP, the paternal centriole, RNA and proteins, and DNA integrity. In addition, epigenetic changes have an impact on the female reproductive tract, fertilization, and early stages of embryo development. For example, recent proteomic and transcriptomic studies have identified several sperm-borne markers that play important roles in oocyte fertilization and embryogenesis. Wider implications: This review highlights that several male-derived factors are required to work in tandem with female counterparts to allow for correct fertilization and development of the early embryo. A deeper understanding of the contributions of paternal factors that are shuttled over from the sperm cell to the embryo can shed light on how to improve ART from an andrological perspective. Further studies may aid in preventing the passing on of genetic and epigenetic abnormalities of paternal origin, thus decreasing the incidence of male factor infertility. In addition, understanding the exact mechanisms of paternal contribution may assist reproductive scientists and IVF clinicians in determining new causes of recurrent early miscarriage or fertilization failure.
... Thus, MARs are speculated to be highly cell context-dependent (Dobson et al. 2017). ChIP-seq of 14 S/MAR binding proteins from the human NuMat has shed light on the molecular signatures of S/MARs, which are implicated as hotspots for integrating retroviruses (Narwade et al. 2019). S/MARs have also found applications in minimizing the effect of transgene silencing (Allen et al. 2000). ...
Article
The nucleus is a complex organelle that hosts the genome and is essential for vital processes like DNA replication, DNA repair, transcription, and splicing. The genome is non-randomly organized in the three-dimensional space of the nucleus. This functional sub-compartmentalization was thought to be organized on the framework of the nuclear matrix (NuMat), a non-chromatin scaffold that functions as a substratum for various molecular processes of the nucleus. More recently, nuclear bodies or membrane-less subcompartments of the nucleus are thought to arise due to phase separation of chromatin, RNA, and proteins. The nuclear architecture is an amalgamation of the relative organization of chromatin, epigenetic landscape, the nuclear bodies, and the nucleoskeleton in the three-dimensional space of the nucleus. During mitosis, the nucleus undergoes drastic changes in morphology to the degree that it ceases to exist as such; various nuclear components, including the envelope that defines the nucleus, disintegrate, and the chromatin acquires mitosis-specific epigenetic marks and condenses to form chromosomes. Upon mitotic exit, chromosomes are decondensed, re-establish hierarchical genome organization, and regain epigenetic and transcriptional status similar to that of the mother cell. How this mitotic memory is inherited during cell division remains a puzzle. NuMat components that are a part of the mitotic chromosome in the form of mitotic chromosome scaffold (MiCS) could potentially be the seeds that guide the relative re-establishment of the epigenome, chromosome territories, and the nuclear bodies.
Here, we synthesize the advances towards understanding cellular memory of nuclear architecture across mitosis and propose a hypothesis that a subset of NuMat proteome essential for nucleation of various nuclear bodies are retained in MiCS to serve as seeds of mitotic memory, thus ensuring the daughter cells re-establish the complex status of nuclear architecture similar to that of the mother cells, thereby maintaining the pre-mitotic transcriptional status.
... As expected, by their SAFB binding, these cobound loci demonstrated characteristic sequence features of S/MAR regions, such as AT-proportion, topoisomerase II binding, and kinked or curved DNA [27,28], with cobinding demonstrated at or near the classic S/MAR regions of the MYC [29], TOPI, TOP2A [30], and β-GLOBIN [31] genes (supplemental Figure 5A-B). ...
Article
HOXA9 is commonly upregulated in acute myeloid leukemia (AML), where it confers poor prognosis. Characterising the protein interactome of endogenous HOXA9 in human AML, we identified a chromatin complex of HOXA9 with the nuclear matrix attachment protein SAFB. SAFB perturbation phenocopied HOXA9 knockout, decreasing AML proliferation and increasing differentiation and apoptosis in vitro, and prolonging survival in vivo. Integrated genomic, transcriptomic and proteomic analyses further demonstrated that the HOXA9-SAFB-chromatin complex associates with NuRD and HP1g to repress the expression of factors associated with differentiation and apoptosis, including NOTCH1, CEBPd, S100A8, and CDKN1A. Chemical or genetic perturbation of NuRD- and HP1g-associated catalytic activity also triggered differentiation, apoptosis and the induction of these tumor-suppressive genes. Importantly, this mechanism is operative in other HOXA9-dependent AML genotypes. This mechanistic insight demonstrates active HOXA9-dependent differentiation block as a potent mechanism of disease maintenance in AML that may be amenable to therapeutic intervention via therapies targeting the HOXA9/SAFB interface and/or NuRD and HP1g activity.
... Also, they present highly dynamic properties enabling chromatin to anchor different domains to the nuclear matrix, thus promoting physical interactions between distant genomic regions that may be located apart in the genetic sequence, for instance, bringing into close proximity distant gene promoters and their target genes, or generating gene clusters with related functions. Distal interactions lead to the generation of active or inactive foci that are dependent on the cell type and the phase of the cell cycle (Narwade et al. 2019; Linnemann et al. 2009). These interactions are important to understand gene expression clusters, and numerous studies are performed using advanced methods such as Hi-C (Miko et al. 2021; Yang et al. 2021). ...
Sperm nuclei present a highly organized and condensed chromatin due to the interchange of histones by protamines during spermiogenesis. This high DNA condensation leads to almost inert chromatin, with the impossibility of conducting gene transcription as in most other somatic cells. The major chromosomal structure responsible for DNA condensation is the formation of protamine-DNA toroids containing 25–50 kilobases of DNA. These toroids are connected by toroid linker regions (TLR), which attach them to the nuclear matrix, as matrix attachment regions (MAR) do in somatic cells. Despite this high degree of condensation, evidence shows that sperm chromatin contains vulnerable elements that can be degraded even in fully condensed chromatin, which may correspond to chromatin regions that transfer functionality to the zygote at fertilization. This chapter covers an updated review of our model for sperm chromatin structure and its potential functional elements that affect embryo development. Keywords: Sperm; Chromatin condensation; Toroid linker region; Matrix attachment region; Double-stranded DNA breaks
... In spite of knowing these properties of MARs, the identities of MARs at whole genome level remain unexplored, and thus their role in gene regulation is not completely understood. With recent advances of NGS techniques a few attempts have been made to identify MARs at the genome-wide scale [10,11]. However, none of the studies addressed the dynamics of MARs during development and differentiation in any model organism. ...
Article
Background: The eukaryotic genome is compartmentalized into structural and functional domains. One of the concepts of higher order organization of chromatin posits that the DNA is organized in constrained loops that behave as independent functional domains. Nuclear Matrix (NuMat), a ribo-proteinaceous nucleoskeleton, provides the structural basis for this organization. DNA sequences located at the base of the loops are known as the Matrix Attachment Regions (MARs). NuMat relates to multiple nuclear processes and is partly cell type specific in composition. It is a biochemically defined structure and several protocols have been used to isolate the NuMat where some of the steps have been critically evaluated. As these sequences play an important role in genomic organization, it is imperative to know their dynamics during development and differentiation. Results: Here we look into the dynamics of MARs when the preparation process is varied and during embryonic development of D. melanogaster. A subset of MARs termed “Core-MARs”, present abundantly in pericentromeric heterochromatin, are constant unalterable anchor points, as they associate with NuMat through embryonic development and are independent of the isolation procedure. Euchromatic MARs are dynamic and reflect the transcriptomic profile of the cell. New MARs are generated by nuclear stabilization, and during development, mostly at paused RNA polymerase II promoters. Paused Pol II MARs depend on RNA transcripts for NuMat association. Conclusions: Our data reveal the role of MARs in the functionally dynamic nucleus and contribute to the current understanding of nuclear architecture in genomic context.
Preprint
The nuclear matrix is a nuclear compartment that has diverse functions in chromatin regulation and transcription. However, how this structure influences epigenetic modifications and gene expression in plants is largely unknown. In this study, we showed that a nuclear matrix binding protein, AHL22, together with the two transcriptional repressors FRS7 and FRS12, regulates hypocotyl elongation by suppressing the expression of a group of genes known as SMALL AUXIN UP RNAs (SAURs) in Arabidopsis thaliana. The transcriptional repression of SAURs depends on their attachment to the nuclear matrix. The AHL22 complex not only brings these SAURs, which contained matrix attachment regions (MARs), to the nuclear matrix, but it also recruits the histone deacetylase HDA15 to the SAUR loci. This leads to the removal of H3 acetylation at the SAUR loci and the suppression of hypocotyl elongation. Taken together, our results indicate that MAR-binding proteins act as a hub for chromatin and epigenetic regulators. Moreover, we present a novel mechanism by which nuclear matrix attachment to chromatin regulates histone modifications, transcription, and hypocotyl elongation.
Article
Fourier transform infrared (FTIR) spectroscopy is a rapid, non-destructive and label-free technique for identifying subtle changes in all bio-macromolecules, and has been used as a method of choice for studying DNA conformation, secondary DNA structure transition and DNA damage. In addition, the specific level of chromatin complexity is introduced via epigenetic modifications forcing the technological upgrade in the analysis of such an intricacy. As the most studied epigenetic mechanism, DNA methylation is a major regulator of transcriptional activity, involved in the suppression of a broad spectrum of genes and its deregulation is involved in all non-communicable diseases. The present study was designed to explore the use of synchrotron-based FTIR analysis to monitor the subtle changes in molecule bases regarding the DNA methylation status of cytosine in the whole genome. In order to reveal the conformation-related best sample for FTIR-based DNA methylation analysis in situ, we used methodology for nuclear HALO preparations and slightly modified it to isolated DNA in HALO formations. Nuclear DNA-HALOs represent samples with preserved higher-order chromatin structure liberated of any protein residues that are closer to native DNA conformation than genomic DNA (gDNA) isolated by the standard batch procedure. Using FTIR spectroscopy we analyzed the DNA methylation profile of isolated gDNA and compared it with the DNA-HALOs. This study demonstrated the potential of FTIR microspectroscopy to detect DNA methylation marks in analyzed DNA-HALO specimens more precisely in comparison with classical DNA extraction procedures that yield unstructured whole genomic DNA. In addition, we used different cell types to assess their global DNA methylation profile, as well as defined specific infrared peaks that can be used for screening DNA methylation.
Article
USH2A mutations are a common cause of autosomal recessive retinitis pigmentosa (RP) and Usher syndrome, for which there are currently no approved treatments. Gene augmentation is a valuable therapeutic strategy for treating many inherited retinal diseases, however conventional adeno-associated virus (AAV) gene therapy cannot accommodate cDNAs exceeding 4.7kb, such as the 15.6kb-long USH2A coding sequence. In the present study, we adopted an alternative strategy to successfully generate scaffold/matrix attachment region (S/MAR) DNA plasmid vectors containing the full-length human USH2A coding sequence, a GFP reporter gene and a ubiquitous promoter (CMV or CAG), reaching a size of approximately 23kb. We assessed the vectors in transfected HEK-293 cells and USH2A patient-derived dermal fibroblasts, in addition to ush2au507 zebrafish microinjected with the vector at the one-cell stage. pS/MAR-USH2A vectors drove persistent transgene expression in patient fibroblasts with restoration of usherin. Twelve months of GFP expression was detected in the photoreceptor cells, with rescue of Usher 2 complex localisation in the photoreceptors of ush2au507 zebrafish retina injected with pS/MAR-USH2A. To our knowledge, this is the first reported vector which can be used to express full-length usherin with functional rescue. S/MAR DNA vectors have shown promise as a novel non-viral retinal gene therapy, warranting further translational development.
Article
Matrix attachment regions (MARs) are cis-acting DNA elements that can increase transgene expression levels in a CHO cell expression system. To investigate the effects of MAR combinations on transgene expression and the underlying regulatory mechanisms, we generated constructs in which the enhanced green fluorescent protein (eGFP) gene, driven by the cytomegalovirus (CMV) or simian virus (SV) 40 promoter, was flanked by different combinations of human β-interferon and β-globin MAR (iMAR and gMAR, respectively). These were transfected into CHO-K1 cells, which were screened with geneticin; eGFP expression was detected by flow cytometry. The presence of MAR elements increased transfection efficiency and both transient and stable eGFP expression under both promoters; the level was higher when the two MARs differed (i.e., iMAR and gMAR) under the CMV but not the SV40 promoter. For the latter, two gMARs showed the highest activity. We also found that MARs increased the ratio of stably transfected positive colonies. These results indicate that combining the CMV promoter with two different MAR elements or the SV40 promoter with two gMARs is effective for inducing high expression level and stability of transgenes.
Article
Esophageal cancer is a highly aggressive malignancy with very poor overall prognosis. Despite the strong clinical relevance of SATB1 in esophageal cancer and other cancers suggested by previous studies, the exact function of SATB1 in esophageal cancer development is still unknown. Here we showed that the knockdown of SATB1 in esophageal cancer cell lines diminished cell proliferation, survival and invasion. Whole genome transcriptome analysis of SATB1 knockdown cells revealed different gene expression profiles between TE-1 cells and MDA-MB-231 cells. Network analysis and functional experiments further identified FN1 and PDGFRB to be key downstream genes regulated by SATB1 in esophageal cancer cells. Importantly, FN1 and PDGFRB were found to be highly expressed in human esophageal cancer. In summary, we provided the first molecular evidence that SATB1 plays an oncogenic role in esophageal cancer by up-regulation of FN1 and PDGFRB.
Article
Chromatin architecture and dynamics are regulated by various histone and non-histone proteins. The matrix attachment region binding proteins (MARBPs) play a central role in chromatin organization and function through numerous regulatory proteins. In the present study, we demonstrate that nuclear matrix protein SMAR1 orchestrates global gene regulation as determined by massively parallel ChIP-sequencing. The study revealed that SMAR1 binds to T(C/G) repeat and targets genes involved in diverse biological pathways. We observe that SMAR1 binds and targets distinctly different genes based on the availability of p53. Our data suggest that SMAR1 binds and regulates one of the imperative microRNA clusters in cancer and metastasis, miR-371-373. It negatively regulates miR-371-373 transcription as confirmed by SMAR1 overexpression and knockdown studies. Further, deletion studies indicate that a ~200 bp region in the miR-371-373 promoter is necessary for SMAR1 binding and transcriptional repression. Recruitment of HDAC1/mSin3A complex by SMAR1, concomitant with alteration of histone marks, results in downregulation of the miRNA cluster. The regulation of miR-371-373 by SMAR1 inhibits breast cancer tumorigenesis and metastasis as determined by in vivo experiments. Overall, our study highlights the binding of SMAR1 to T(C/G) repeat and its role in cancer through miR-371-373.
Article
The M-transcript of human choline acetyltransferase (ChAT) produces an 82-kDa protein (82-kDa ChAT) that concentrates in nuclei of cholinergic neurons. We assessed the effects of acute exposure to oligomeric amyloid-β1–42 (Aβ1–42) on 82-kDa ChAT disposition in SH-SY5Y neural cells, finding that acute exposure to Aβ1–42 results in increased association of 82-kDa ChAT with chromatin and formation of 82-kDa ChAT aggregates in nuclei. Using chromatin immunoprecipitation with next-generation sequencing (ChIP-seq), we found that Aβ1–42 exposure increases 82-kDa ChAT association with gene promoters and introns. The Aβ1–42-induced 82-kDa ChAT aggregates co-localize with special AT-rich binding protein 1 (SATB1), which anchors DNA to scaffolding/matrix attachment regions (S/MARs). SATB1 had a similar genomic association as 82-kDa ChAT, with both proteins associating with synapse and cell stress genes. After Aβ1–42 exposure, both SATB1 and 82-kDa ChAT are enriched at the same S/MAR on the APP gene, with 82-kDa ChAT expression attenuating an increase in an isoform-specific APP mRNA transcript. Finally, 82-kDa ChAT and SATB1 have patterned genomic association at regions enriched with S/MAR binding motifs. These results demonstrate that 82-kDa ChAT and SATB1 play critical roles in the response of neural cells to acute Aβ exposure.
Article
To date, there has been an increasing number of drugs produced in mammalian cell cultures. In order to enhance the expression level and stability of target recombinant proteins in cell cultures, various regulatory elements with poorly studied mechanisms of action are used. In this review, we summarize and discuss the potential mechanisms of action of such regulatory elements.
Article
Although a gene’s location can greatly influence its expression, genome sequencing has shown that orthologous genes may exist in very different environments in the genomes of closely related species. Four genes in the maize alcohol dehydrogenase (adh1) region represent solitary genes dispersed among large repetitive blocks, whereas the orthologous genes in sorghum are located in a different setting surrounded by low-copy-number DNAs. A specific class of DNA sequences, matrix attachment regions (MARs), was found to be in comparable positions in the two species, often flanking individual genes. If these MARs define structural domains, then the orthologous genes in maize and sorghum should experience similar chromatin environments. In addition, MARs were divided into two groups, based on the competitive affinity of their association with the matrix. The “durable” MARs retained matrix associations at the highest concentrations of competitor DNA. Most of the durable MARs mapped outside genes, defining the borders of putative chromatin loops. The “unstable” MARs lost their association with the matrix under similar competitor conditions and mapped mainly within introns. These results suggest that MARs possess both domain-defining and regulatory roles. Miniature inverted repeat transposable elements (MITEs) often were found on the same fragments as the MARs. Our studies showed that many MITEs can bind to isolated nuclear matrices, suggesting that MITEs may function as MARs in vivo.
Article
Both the accomplishment of developmental programs and neoplastic transformation are linked to changes in the long‐range organization of chromatin, in particular, DNA loop domains. The development of new methods that allow the study of interactions between the bases of DNA loops and the proteins of the nuclear matrix will help our understanding of the molecular mechanisms in such changes. These methods should also allow the establishment of a fingerprint “signature” for many cancers that may serve for diagnostic purposes. J. Cell. Biochem. Suppl. 35:54–60, 2000. © 2001 Wiley‐Liss, Inc.
Article
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
Article
Binding of transcription factor (TF) proteins with DNA may cause severe kinks in the latter. Here, we investigate the molecular origin of the DNA kinks observed in the TF-DNA complexes using small molecule intercalation pathway, crystallographic analysis, and free energy calculations involving four different transcription factor (TF) protein-DNA complexes. We find that although protein binding may bend the DNA, bending alone is not sufficient to kink the DNA. We show that partial, not complete, intercalation is required to form the kink at a particular place in the DNA. It turns out that while amino acid alone can induce the desired kink through partial intercalation, protein provides thermodynamic stabilization of the kinked state in TF-DNA complexes.