About
82 publications, 7,106 reads, 2,559 citations
Publications (82)
SARSNTdb offers a curated, nucleotide-centric database for users of varying levels of SARS-CoV-2 knowledge. Its user-friendly interface enables querying coding regions and coordinate intervals to find out the various functional and selective constraints that act upon the corresponding nucleotides and amino acids. Users can easily obtain information...
tRNA-derived fragments (tRFs) are a class of emerging post-transcriptional regulators of gene expression likely binding to the transcripts of target genes. However, only a few tRF targets have been experimentally validated, making it hard to extrapolate the functions or binding mechanisms of tRFs. The paucity of resources supporting the identifica...
Controversial reports of human-virus chimeric reads (HVCRs) suggested a possible integration of SARS-CoV-2 sequences into human DNA (1-3).…
Accumulating evidence has suggested that tRNA-derived fragments (tRFs) could be loaded to Argonaute proteins and function as regulatory small RNAs. However, their mode of action remains largely unknown, and investigations of their binding mechanisms have been limited, revealing little more than microRNA-like seed regions in a handful of tRFs and a...
The most abundant cellular RNA species, ribosomal RNA (rRNA), appears to be a source of massive amounts of non-randomly generated fragments. We found rRNA fragments (rRFs) in immunoprecipitated Argonaute (Ago-IP) complexes in human and mouse cells and in small RNA sequencing datasets. In human Ago1-IP, guanine-rich rRFs were preferentially cut in s...
Transfer RNA and ribosomal RNA are known for their traditional roles in translation. Here we reviewed evidence of their novel functionality as sources of short RNA fragments and the emerging role of these fragments as posttranscriptional regulators, including neuronal and aging processes. Such transfer RNA and ribosomal RNA fragments (tRFs and rRFs...
Transfer RNA fragments (tRFs) are an emerging class of small RNA molecules derived from mature or precursor tRNAs. They are found across a wide range of organisms and tissues, in the small RNA fraction or loaded onto Argonaute in numbers comparable to microRNAs. Their functions and mechanisms of action are largely unknown, and results obtained on individ...
Background
Accumulating evidence points to functional roles of rRNA-derived fragments (rRFs), often considered degradation byproducts. Small RNAs, including miRNAs and tRNA-derived fragments (tRFs), have been implicated in the aging process, and we considered rRFs in this context.
Objective
We performed a computational analysis of Argonaute-lo...
Investigation of large structural variants (SVs) is a challenging yet important task in understanding trait differences in highly repetitive genomes. Combining different bioinformatic approaches for SV detection, we analyzed whole-genome sequencing data from 3000 rice genomes and identified 63 million individual SV calls that grouped into 1.5 milli...
Ancient DNA (aDNA) studies often rely on standard methods of mutation calling, optimized for high-quality contemporary DNA but not for excessive contamination, time- or environment-related damage of aDNA. In the absence of validated datasets and despite showing extreme sensitivity to aDNA quality, these methods have been used in many published stud...
Background
Current human whole genome sequencing projects produce massive amounts of data, often creating significant computational challenges. Different approaches have been developed for each type of genome variant and method of its detection, necessitating users to run multiple algorithms to find variants.
Results
We present GROM (Genome Rearra...
Comparative genomics studies typically limit their focus to single nucleotide variants (SNVs) and that was the case for previous comparisons of woolly mammoth genomes. We extended the analysis to systematically identify not only SNVs but also larger structural variants (SVs) and indels and found multiple mammoth-specific deletions and duplications...
Due to advancements in sequencing technology, sequence data production is no longer a constraint in the field of microbiology and has made it possible to study uncultured microbes or whole environments using metagenomics. However, these new technologies introduce different biases in metagenomic sequencing, affecting the nucleotide distribution of r...
Background: The progress of next-generation sequencing technologies has unveiled various non-coding RNAs that have previously been considered products of random degradation and attracted only minimal interest. Among small RNA families, microRNAs (miRNAs) have traditionally been considered key post-transcriptional regulators. However, recent studies...
Development of sequencing technologies and supporting computation enable discovery of small RNA molecules that previously escaped detection or were ignored due to low count numbers. While the focus in the analysis of small RNA libraries has been primarily on microRNAs (miRNAs), recent studies have reported findings of fragments of transfer RNAs (tR...
Amplifications or deletions of genome segments, known as copy number variants (CNVs), have been associated with many diseases. Read depth analysis of next-generation sequencing (NGS) is an essential method of detecting CNVs. However, genome read coverage is frequently distorted by various biases of NGS platforms, which reduce predictive capabilitie...
Metagenomics by next generation sequencing has become an important tool for interrogating complex microbial communities. In this study we analyzed several pairs of metagenomic samples obtained by different methods and observed biases, resulting in different nucleotide composition of the sequenced reads. The pairwise sample comparison was based on t...
Background
RNA-related applications of next-generation sequencing (NGS) technologies require context-specific interpretations: e.g., sequence mismatches may indicate sites of RNA editing, and uneven read coverage often points to the mature form of a microRNA. Existing visualization tools traditionally show RNA molecules in two dimensions, with their b...
The temperature in the Arctic region has been increasing in the recent past accompanied by melting of its glaciers. We took a snapshot of the current microbial inhabitation of an Alaskan glacier (which can be considered as one of the simplest possible ecosystems) by using metagenomic sequencing of 16S rRNA recovered from ice/snow samples. Somewhat...
MicroRNAs (miRNAs) are 20- to ∼24-nucleotide (nt) small RNAs that impact a variety of biological processes, from development to age-associated events. To study the role of miRNAs in aging, studies have profiled the levels of miRNAs with time. However, evidence suggests that miRNAs show heterogeneity in length and sequence in different biological co...
Cold environments, such as glaciers, are large reservoirs of microbial life. The present study employed 16S rRNA gene amplicon metagenomic sequencing to survey the prokaryotic microbiota on Alaskan glacial ice, revealing a rich and diverse microbial community of some 2,500 species of bacteria and archaea.
We build a model of storage of well-defined positional information in probabilistic sequence patterns. Once a pattern is defined, it is possible to judge the effect of any mutation in it. We show that the frequency of beneficial mutations can be high in general and the same mutation can be either advantageous or deleterious depending on the pattern...
Functional proteins are known to contain stretches of amino acid sequences highly conserved in different protein families and across species. These conserved sequences constitute protein domains that are generally integral structural units, conferring specific functionalities and often self-folding. The observation of highly conserved protein dom...
NCBI completed the transition of its main genome annotation database from Locuslink to Entrez Gene in Spring 2005. However, to this date few parsers exist for the Entrez Gene annotation file. Owing to the widespread use of Locuslink and the popularity of Perl programming language in bioinformatics, a publicly available high performance Entrez Gene...
The yeast proteome and its interactome (that is, the sum of all protein interactions) are the best studied of all organisms. Currently, there are about 3000 verified protein interactions and several thousand nonverified interactions known in yeast. Independent studies estimated that there may be more than 30 000 interactions in yeast although most...
We present evidence of remarkable genome-wide mobility and evolutionary expansion for a class of protein domains whose borders locate close to the borders of their encoding exons. These exon-bordering domains are more numerous and widely distributed in the human genome than other domains. They also co-occur with more diverse domains to form a large...
Here, we describe the identification of a chromosomal DNA replication origin (oriC) from the hyperthermophilic archaeon Sulfolobus solfataricus (subdomain of Crenarchaeota). By means of a cumulative GC-skew analysis of the Sulfolobus genome sequence, a candidate oriC was mapped within a 1.12-kb region located between the two divergently transcribed...
We conducted a multi-genome analysis correlating protein domain organization with the exon-intron structure of genes in nine eukaryotic genomes. We observed a significant correlation between the borders of exons and domains on a genomic scale for both invertebrates and vertebrates. In addition, we found that the more complex organisms displayed con...
Rapid development of genomic and proteomic methodologies has provided a wealth of data for deciphering the biomolecular circuitry of a living cell. The main areas of computational research of proteomes outlined in this review are: understanding the system, its features and parameters to help plan the experiments; data integration, to help produce m...
Focused efforts by several international laboratories have resulted in the sequencing of the genome of the causative agent of severe acute respiratory syndrome (SARS), novel coronavirus SARS-CoV, in record time. Using cumulative skew diagrams, I found that mutational patterns in the SARS-CoV genome were strikingly different from other coronaviruses...
Using two different approaches, we estimated that on average there are about five interacting partners per protein in the proteome of the yeast Saccharomyces cerevisiae. In the first approach, we used a novel method to model sampling overlap by a Bernoulli process, compared the results of two independent yeast two‐hybrid interaction screens and tes...
Retroviral RNA genomes are known to have a biased nucleotide composition. For instance, the plus-strand RNA of human immunodeficiency virus (HIV) is A-rich, and the genome of human T cell leukemia virus (HTLV) is C-rich, and other retroviruses have a U-rich or G-rich genome. The biased composition of these genomes is most likely caused by direction...
The relationship between the similarity of expression patterns for a pair of genes and interaction of the proteins they encode is demonstrated both for the simple genome of the bacteriophage T7 and the considerably more complex genome of the yeast Saccharomyces cerevisiae. Statistical analysis of large-scale gene expression and protein interaction...
We show here that transcription by the bacteriophage T7 RNA polymerase increases the deamination of cytosine bases in the non-transcribed strand to uracil, causing C to T mutations in that strand. Under optimal conditions, the mutation frequency increases about fivefold over background, and is similar to that seen with the Escherichia coli RNA poly...
Analysis of 22 complete sequences of double-stranded DNA viruses reveals striking compositional asymmetries between leading and lagging, and between transcribed and non-transcribed strands. In all bi-directionally replicated genomes analyzed, the observed leading strand GC skew (measuring relative excess of guanines versus cytosines) is different f...
A novel method of cumulative diagrams shows that the nucleotide composition of a microbial chromosome changes at two points separated by about a half of its length. These points coincide with sites of replication origin and terminus for all bacteria where such sites are known. The leading strand is found to contain more guanine than cytosine residu...
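The cumulative-diagram idea in the abstract above can be illustrated with a minimal sketch. It assumes a windowed GC skew, (G - C)/(G + C), summed along the sequence; the function names, the window size, and the toy sequence are hypothetical choices for illustration and are not taken from the paper. If the leading strand is G-rich, the cumulative curve would be expected to reach its minimum near the replication origin and its maximum near the terminus.

```python
from itertools import accumulate


def cumulative_gc_skew(seq, window=1000):
    """Cumulative GC skew, one value per non-overlapping window.

    The per-window skew is (G - C) / (G + C); windows with no G or C
    contribute 0. The window size is an illustrative assumption.
    """
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return list(accumulate(skews))


def skew_extrema(seq, window=1000):
    """End coordinates of the windows holding the cumulative minimum and maximum.

    Under the assumption of a G-rich leading strand, these are candidate
    positions for the replication origin and terminus, respectively.
    """
    cum = cumulative_gc_skew(seq, window)
    i_min = min(range(len(cum)), key=cum.__getitem__)
    i_max = max(range(len(cum)), key=cum.__getitem__)
    return (i_min + 1) * window, (i_max + 1) * window


# Toy sequence (hypothetical): C-rich, then G-rich, then C-rich again.
toy = "C" * 3000 + "G" * 5000 + "C" * 2000
print(skew_extrema(toy, window=1000))
```

On a real circular chromosome the published sequence can start at an arbitrary point, so the curve may need to be read with that offset in mind; the two turning points would still be expected to lie roughly half a chromosome apart.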
This paper describes a prototype genome display and query system for the World Wide Web, which could play the role of a graphical interactive gateway to online genome information services. It provides a uniform interface to display mapping and sequencing data for the human, mouse and yeast genomes and could be easily extended to accommodate more in...
With the main focus of the Human Genome Project shifting to sequencing, bioinformatics support for constructing large-scale genomic maps of other organisms is still required. We attempt to provide for this with our work, aimed at the delivery of robust and user-friendly contig-building software on the WWW.
We present a prototype distributed analyti...
The integrated X chromosome database (IXDB) is a repository for physical mapping data of the human X chromosome. Its current content is the result of a strict integration of data stemming from many different sources. The main features of IXDB include a flexible and extendible schema, a comfortable and fully crossreferenced WWW interface (http://ixd...
The human X chromosome is associated with a large number of disease phenotypes, principally because of its unique mode of inheritance that tends to reveal all recessive disorders in males. With the longer term goal of identifying and characterizing most of these genes, we have adopted a chromosome-wide strategy to establish a YAC contig map. We hav...
We present a radiation hybrid (RH) map of human Chromosome (Chr) X, using 50 markers on 72 radiation hybrids. The markers, obtained from the consensus map, form a grid spanning the entire chromosome. To check the RH map, the marker order was determined by analysis of presence or absence of retained human DNA fragments in the RHs; the comparison wit...
We describe an approach to integration of different types of genetic and physical mapping data for the human chromosome X, including radiation hybrids (RHs), STSs, microsatellites, genes and clones. It includes a rapid construction of the marker-RH map over the entire chromosome and a subsequent positioning of yeast artificial chromosome (YAC) clon...
The paper describes a new software package DNASUN developed for supporting gene engineering laboratories. The package provides a user-friendly interface for experimental researches and supports the traditional nucleotide/protein sequence analysis as well as physical mapping, sequencing, plasmid manipulations, optimal oligonucleotide probe selection...
Experimental noise and noncontiguous clone inserts can pose serious problems in reconstructing genomic maps from hybridization data. We describe an algorithm that easily identifies false positive signals and clones containing chimeric inserts/internal deletions. The algorithm "dechimerizes" clones, splitting them into independent contiguous compone...
We describe here the construction of an ordered clone map of human chromosome 21, based on the identification of ordered sets of YAC clones covering > 90% of the chromosome, and their use to identify groups of cosmid clones (cosmid pockets) localised to subregions defined by the YAC clone map. This is to our knowledge the highest resolution map of...
A complete set of software tools to aid the physical mapping of a genome has been developed and successfully applied to the genomic mapping of the fission yeast Schizosaccharomyces pombe. Two approaches were used for ordering single-copy hybridisation probes: one was based on the simulated annealing algorithm to order all probes, and another on inf...
Gridded on high density filters, a P1 genomic library of 17-fold coverage and a cosmid library of 8 genome equivalents, both made from S. pombe strain 972h-, were ordered by hybridizing genetic markers and individual clones from the two libraries. Yeast artificial chromosome (YAC) clones covering the entire genome were used to subdivide the librari...
Genome mapping by anchoring random clones has recently been the subject of intensive theoretical study. In this paper, differences between published predictions of properties of anchored groups of clones ("contigs") are analyzed and simplifications of the mathematical formulae describing these properties are presented. The theoretical predictions a...
The genome of the fission yeast, Schizosaccharomyces pombe, consists of some 14 million base pairs of DNA contained in three chromosomes. On account of its excellent genetics we used it as a test system for a strategy designed to map mammalian chromosomes and genomes. Data obtained from hybridization fingerprinting established an ordered library of...
Various cooling schedules for the simulated annealing algorithm are analyzed as applied to restriction map construction. The behaviour of the algorithm under three different cost functions is considered, and the discrete cost function is found to handle experimental data with realistic error sizes successfully. A program using this func...
Instead of the traditional manipulation of given fixed fragment lengths in restriction map construction, a method that varies the lengths is proposed and implemented within the simulated annealing scheme. The described approach has no upper limit on the number of fragments mapped, even with ordinary hardware. A program has been derived fr...