Publications (305)
Predefined sets of short DNA sequences are commonly used as barcodes to identify individual biomolecules in pooled populations. Such use requires either sufficiently small DNA error rates, or else an error-correction methodology. Most existing DNA error-correcting codes (ECCs) correct only one or two errors per barcode in sets of typically ≲ 10⁴ ba...
Engineered SpCas9s and AsCas12a cleave fewer off-target genomic sites than wild-type (wt) Cas9. However, understanding their fidelity, mechanisms and cleavage outcomes requires systematic profiling across mispaired target DNAs. Here we describe NucleaSeq (nuclease digestion and deep sequencing), a massively parallel platform that measures the cleavage...
Synthetic DNA is rapidly emerging as a durable, high-density information storage platform. A major challenge for DNA-based information encoding strategies is the high rate of errors that arise during DNA synthesis and sequencing. Here, we describe the HEDGES (Hash Encoded, Decoded by Greedy Exhaustive Search) error-correcting code that repairs all...
Engineered Streptococcus pyogenes (Sp) Cas9s and Acidaminococcus sp. (As) Cas12a (formerly Cpf1) improve cleavage specificity in human cells. However, the fidelity, enzymatic mechanisms, and cleavage products of emerging CRISPR nucleases have not been profiled systematically across partially mispaired off-target DNA sequences. Here, we describe Nuc...
Historically, the evolution of bats has been analyzed using a small number of genetic loci for many species or many genetic loci for a few species. Here we present a phylogeny of 18 bat species, each of which is represented in 1,107 orthologous gene alignments used to build the tree. We generated a transcriptome sequence of Hypsignathus monstrosus...
Synthetic DNA can in principle be used for the archival storage of arbitrary data. Because errors are introduced during DNA synthesis, storage, and sequencing, an error-correcting code (ECC) is necessary for error-free recovery of the data. Previous work has utilized ECCs that can correct substitution errors, but not insertion or deletion errors (i...
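The gap between substitution-only codes and insertion/deletion errors can be made concrete. The sketch below is our illustration, not code from the paper: it compares Hamming distance (a position-by-position count, which is what a substitution-only code effectively sees) with Levenshtein (edit) distance after a single base is deleted from a hypothetical strand. The helper names are ours.

```python
def hamming(a: str, b: str) -> int:
    """Mismatched positions; the shorter string is padded with '-' so lengths match."""
    n = max(len(a), len(b))
    return sum(x != y for x, y in zip(a.ljust(n, "-"), b.ljust(n, "-")))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete a[i-1]
                           cur[j - 1] + 1,             # insert b[j-1]
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


written = "ACGTTGCAACGT"              # hypothetical synthesized strand
read = written[:3] + written[4:]      # one base lost during synthesis or sequencing

print(hamming(written, read))         # 7: every position after the deletion is shifted
print(levenshtein(written, read))     # 1
```

A single deletion looks like a burst of substitutions to a code that compares positions in place, which is why such codes fail on indels even when the underlying damage is one event.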
Modern high-throughput biological assays study pooled populations of individual members by labeling each member with a unique DNA sequence called a “barcode.” DNA barcodes are frequently corrupted by DNA synthesis and sequencing errors, leading to significant data loss and incorrect data interpretation. Here, we describe an error corre...
Many large-scale high-throughput experiments use DNA barcodes—short DNA sequences prepended to DNA libraries—for identification of individuals in pooled biomolecule populations. However, DNA synthesis and sequencing errors confound the correct interpretation of observed barcodes and can lead to significant data loss or spurious results. Widely-used...
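As a generic baseline for the decoding problem described above (not the error-correcting code these papers propose), a read can be assigned to the nearest barcode in edit distance. If every pair of barcodes is at least d edits apart, nearest-neighbor decoding is guaranteed correct for reads carrying at most ⌊(d − 1)/2⌋ edits, because edit distance is a metric. The barcode set and the rejection rule below are hypothetical.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def min_pairwise_distance(barcodes):
    """Smallest edit distance between any two barcodes in the set."""
    return min(levenshtein(x, y) for x, y in combinations(barcodes, 2))


def decode(read, barcodes, max_dist):
    """Return the unique nearest barcode within max_dist edits, else None."""
    scored = sorted((levenshtein(read, bc), bc) for bc in barcodes)
    best_dist, best = scored[0]
    if best_dist > max_dist:
        return None                      # too far from every barcode
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None                      # tie: refuse to guess
    return best


barcodes = ["AAAAAA", "CCCCCC", "GGGGGG", "TTTTTT"]  # toy set
d = min_pairwise_distance(barcodes)    # 6
t = (d - 1) // 2                       # 2 correctable edits
print(decode("AACAAA", barcodes, t))   # substitution -> AAAAAA
print(decode("AAAGAAA", barcodes, t))  # insertion -> AAAAAA
print(decode("ACGTAC", barcodes, t))   # too far -> None
```

Rejecting ties and distant reads trades some data loss for fewer misassignments; the brute-force search is O(reads × barcodes) and would not scale to large barcode sets without indexing.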
CRISPR-Cas nucleoproteins target foreign DNA via base pairing with a crRNA. However, a quantitative description of protein binding and nuclease activation at off-target DNA sequences remains elusive. Here, we describe a chip-hybridized association-mapping platform (CHAMP) that repurposes next-generation sequencing chips to simultaneously measure th...
Schlafen11 (encoded by the SLFN11 gene) has been shown to inhibit the accumulation of HIV-1 proteins. We show that the SLFN11 gene is under positive selection in simian primates and is species-specific in its activity against HIV-1. The activity of human Schlafen11 is relatively weak compared to that of some other primate versions of this protein,...
The expression of Schlafen11 is stimulated by interferon β in 293T cells.
Four common human cell lines were treated with 1 × 10⁶ IU/mL of interferon β-1b for 24 hours before cell lysates were harvested. RNA was purified from these extracts and reverse transcribed. A fragment of the SLFN11 transcript was then amplified by PCR. It can be noted that in...
No single mutation in human Schlafen11 conveys the ability to inhibit translation. (A) A multiple sequence alignment of human, bonobo, and chimpanzee Schlafen11 is shown. Differences are highlighted as indicated. (B) Site-directed mutagenesis was used to change indicated residues in human Schlafen11 to those found in chimpanzee Schlafen11. Plasmids...
Unprocessed HIV-1 Gag is affected by Schlafen11.
Image of a western blot showing that unprocessed Gag is affected by marmoset Schlafen11 in our experiments.
Codons in SLFN11 identified with dN/dS > 1 in four different tests for positive selection.
Ribosome profiling produces snapshots of the locations of actively translating ribosomes on messenger RNAs. These snapshots can be used to make inferences about translation dynamics. Recent ribosome profiling studies in yeast, however, have reached contradictory conclusions regarding the average translation rate of each codon. Some experiments have...
Comparison of P-site occupancies between experiments.
(A) To measure how frequently ribosomes are observed with a particular codon identity (in this example, ACT) in the P-site, the mean of the relative enrichments at all codon positions one codon downstream of an occurrence of the codon identity is computed. Panels (B) and (C) are constructed as...
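The occupancy measure above reduces to averaging enrichments at a fixed offset from every occurrence of a codon identity. The sketch assumes (our assumption, not stated here) that reads have already been assigned to A-site codon positions and that a position's relative enrichment is its count divided by the gene's mean count per codon. Offset 0 then reports the A-site, offset 1 the P-site, and offset 2 the E-site, matching the one- and two-codon-downstream rules in the P-site and E-site captions; sweeping the offset gives an enrichment profile around a codon. Function names are hypothetical.

```python
from statistics import mean


def relative_enrichments(counts):
    """Assumed normalization: each codon's count over the gene's mean count per codon."""
    avg = sum(counts) / len(counts)
    return [c / avg for c in counts]


def mean_enrichment_at_offset(genes, codon, offset):
    """Mean relative enrichment at position i + offset over all i with codons[i] == codon.

    genes: list of (codons, counts) pairs; counts[i] = reads whose A-site is codon i.
    offset 0 -> codon in the A-site, 1 -> P-site, 2 -> E-site.
    """
    values = []
    for codons, counts in genes:
        if sum(counts) == 0:
            continue  # no reads: enrichments undefined for this gene
        enrichments = relative_enrichments(counts)
        for i, c in enumerate(codons):
            j = i + offset
            if c == codon and 0 <= j < len(codons):
                values.append(enrichments[j])
    return mean(values) if values else float("nan")


# Toy gene (hypothetical counts).
genes = [(["ATG", "ACT", "GGC", "ACT", "TAA"], [2, 4, 6, 0, 3])]
for site, offset in [("A", 0), ("P", 1), ("E", 2)]:
    print(site, mean_enrichment_at_offset(genes, "ACT", offset))
```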
Downstream peaks for all Gerashchenko [34] CHX concentration gradient experiments.
Profiles of mean relative enrichments around CGA for each set of experiments (unstressed, oxidatively stressed, and heat shocked cells) are both plotted and shown in heatmap form.
Some no-CHX-pretreatment experiments show evidence of small amounts of elongation with disrupted dynamics.
Enrichments around CGA (top panel) and CGG (bottom panel) in the immediate vicinity of the tRNA binding sites in no-CHX-pretreatment experiments from several studies. In data from Weinberg and from Gerashchenko NAR’s oxidatively stressed sampl...
Enrichment profiles around CGA for all CHX-pretreatment experiments.
Each row in the heatmap shows mean relative enrichments around CGA in a different experiment, with columns corresponding to different offsets. Experiments are grouped by source study. The number of uniquely mapped reads entering into the computations for each experiment is given o...
Enrichment profiles around CGA for all no-CHX-pretreatment experiments.
Figure is constructed as in S5 Fig but shows experiments annotated as being performed without CHX pretreatment. Downstream peaks are observed only in three experiments from Pop et al. [36].
Scatter plot of downstream wave area vs active site changes in Gerashchenko [34].
Figure is constructed as in Fig 5 but compares changes between Gerashchenko’s oxidative_noCHX and oxidative_8x_CHX experiments.
Downstream peaks in experiments from Zinshteyn [23].
Each panel shows enrichment profiles around all 61 non-stop codons for a particular experiment from Zinshteyn et al. [23], with CGA highlighted in red. All experiments show clear downstream peaks, suggesting that tRNA binding site enrichments no longer reflect in vivo translation dynamics.
Distributions of changes in corrected aggregate enrichments of each codon identity for all Zinshteyn et al. experiments.
Each panel is constructed as in Fig 8B and shows comparisons between a wild-type experiment and a wild-type replicate (top panel) or different mcm5s2U pathway deletion strains (all other panels). AAA shows a consistently large increas...
Downstream peaks in experiments from Pop [36].
Figure is constructed as in S5 Fig but shows five experiments from Pop [36]. WT-URA_footprint, AGG_OE_footprint, and AGG-QC_footprint all show clear downstream peaks.
Computational details.
Details of simulations and analytical model of translation.
Comparison of E-site occupancies between experiments.
(A) To measure how frequently ribosomes are observed with a particular codon identity in the E-site (in this example, ACT), the mean of the relative enrichments at all codon positions two codons downstream of an occurrence of the codon identity is computed. Panels (B) and (C) are constructed as...
A-site enrichments in some experiments from Pop et al. [36] cluster with CHX-pretreatment experiments.
Figure is constructed as in Fig 1C but includes all experiments from each study. Three no-CHX-pretreatment experiments from Pop [36] are more similar to CHX-pretreatment experiments than they are to other no-CHX-pretreatment experiments.
Enrichment profiles around windows containing multiple occurrences of CGA.
For several different experiments (different colors), mean relative enrichments around all occurrences of CGA that either contain no additional CGAs within the subsequent 4 codons (thick lines) or contain exactly one additional CGA within the subsequent 4 codons (thin lines)...
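The grouping of CGA occurrences described above can be sketched as a simple scan. This is our illustration: codons are assumed to be a list of in-frame triplets, and occurrences near the end of a gene see a truncated window, which a real analysis might exclude. The function name is hypothetical.

```python
def split_occurrences(codons, target="CGA", window=4):
    """Split occurrences of `target` by how many more copies follow within `window` codons.

    Returns (isolated, paired): positions i with no additional copy, and positions
    with exactly one additional copy, in codons[i + 1 : i + 1 + window].
    Occurrences followed by two or more copies are dropped.
    """
    isolated, paired = [], []
    for i, codon in enumerate(codons):
        if codon != target:
            continue
        extra = codons[i + 1:i + 1 + window].count(target)
        if extra == 0:
            isolated.append(i)
        elif extra == 1:
            paired.append(i)
    return isolated, paired


seq = ["CGA", "AAA", "CGA", "GGG", "GGG", "GGG", "GGG", "GGG", "CGA"]
print(split_occurrences(seq))  # ([2, 8], [0])
```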
A-, P-, and E-site occupancy changes across CHX concentration gradients.
Each panel is constructed as in Fig 2. Each row reports occupancies of a different tRNA binding site (top, A-site; middle, P-site; bottom, E-site). Each column reports occupancies for samples from Gerashchenko [34] under different conditions (left, unstressed; middle, oxidativ...
Enrichment profiles around codons for different amino acids in our CHX-pretreatment experiment.
Each panel is constructed as in Fig 3B but shows codons encoding a different amino acid. Several amino acids show substantial variation in the magnitude and direction of downstream peaks between different codons.
Increasing the relative elongation time of a codon creates downstream waves of depletion.
(A) In a simulation of translation, the average relative elongation time of each codon identity was changed from its A-site enrichment in the no-CHX experiment of Weinberg to its A-site enrichment in our CHX experiment. Allowing translation to proceed for a b...
Scatter plot of downstream wave area vs active site changes in Jan [41] excluding CGA, CGG, and CCG.
Figure is constructed as in Fig 5 but excludes CGA, CGG, and CCG from the regression. Insets highlight examples of codons with no substantial change (ACT) or a moderate increase (TTG) in net tRNA binding site enrichments in the presence of CHX. The...
Correlations in total ribosome occupancy per gene between experiments.
Figure is constructed as in Fig 1C but displays correlations in reads per kilobase per million mapped reads (RPKM) for each gene between experiments. Total ribosome occupancy per gene is not systematically different in CHX-pretreatment experiments (labeled in orange) than in no-...
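For reference, RPKM divides a gene's read count by its length in kilobases and by the total mapped reads in millions. The sketch below computes RPKM per gene and a Pearson correlation between two experiments; whether Fig 1C uses Pearson or rank correlation, and on raw or log values, is not stated here, so the log-Pearson choice and all counts and lengths are hypothetical.

```python
import math


def rpkm(reads, length_nt, total_mapped):
    """Reads per kilobase of feature length per million mapped reads."""
    return reads / (length_nt / 1e3) / (total_mapped / 1e6)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-gene read counts and coding-sequence lengths for two experiments.
lengths = [1500, 900, 3000, 600]
exp_a = [rpkm(r, L, 2_000_000) for r, L in zip([300, 120, 900, 40], lengths)]
exp_b = [rpkm(r, L, 3_500_000) for r, L in zip([520, 230, 1500, 90], lengths)]
print(pearson([math.log10(x) for x in exp_a], [math.log10(x) for x in exp_b]))
```

Because RPKM rescales each experiment by its own depth, differences in sequencing depth alone do not change the correlation.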
Data sources.
Accession numbers and sample names for data sets analyzed.
It has been proposed that patterns in the usage of synonymous codons provide evidence that individual tRNA molecules are recycled through the ribosome, translating several occurrences of the same amino acid before diffusing away. The claimed evidence is based on counting the frequency with which pairs of synonymous codons are used at nearby occurre...
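The counting described above can be sketched as follows, under our assumption that "nearby" means successive occurrences of the same amino acid within a coding sequence. The codon table is a hypothetical subset of the standard genetic code, large enough for the example. Whether the resulting same-codon count is surprising depends entirely on the null model it is compared against, which this sketch does not attempt.

```python
# Hypothetical subset of the standard genetic code, enough for the example.
CODON_TO_AA = {"GCT": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
               "AAA": "Lys", "AAG": "Lys"}


def successive_pair_counts(codons):
    """Count (same codon, different codon) over successive occurrences of each amino acid."""
    last_codon = {}  # amino acid -> codon used at its previous occurrence
    same = different = 0
    for codon in codons:
        aa = CODON_TO_AA.get(codon)
        if aa is None:
            continue  # codon outside the toy table
        if aa in last_codon:
            if last_codon[aa] == codon:
                same += 1
            else:
                different += 1
        last_codon[aa] = codon
    return same, different


print(successive_pair_counts(["GCT", "AAA", "GCT", "AAG", "GCC", "AAG"]))  # (2, 2)
```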
Schmitt et al. (1) raise the concern that circle sequencing (2) may spuriously count information from the same starting molecule multiple times. There are indeed several mechanisms by which multiple final reads produced by the circle-sequencing process could be derived from the same starting molecule, as discussed briefly in the last paragraph of o...
This paper presents a library preparation method that dramatically improves the error rate associated with high-throughput DNA sequencing and is substantially more cost-effective than existing error-correction methods. In this strategy, DNA templates are circularized, copied multiple times in tandem with a rolling circle polymerase, an...
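The error-reduction idea, that independent tandem copies of the same template rarely share an error, can be illustrated with a per-position majority vote. This is a simplified sketch under stated assumptions: the template length (`period`) is known and the copies are already in register. Real reads contain indels and need alignment first, and the paper's actual consensus procedure may differ.

```python
from collections import Counter


def tandem_consensus(read, period):
    """Majority-vote consensus over consecutive copies of length `period`.

    Assumes copies are in register (no indels); an incomplete trailing copy is
    ignored, and ties are resolved by Counter ordering (effectively arbitrary).
    """
    copies = [read[i:i + period] for i in range(0, len(read) - period + 1, period)]
    consensus = []
    for column in zip(*copies):          # same position across all copies
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Three copies of ACGT; the second carries an error at position 2.
print(tandem_consensus("ACGTACCTACGT", 4))  # ACGT
```

With three copies, a single-copy error is outvoted; with only two disagreeing copies, the position cannot be resolved by voting.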
Would you spend money today to make the world a substantially better place for your children and grandchildren? Most of us would. But what if the benefit would accrue only to your great-great-great-great-grandchildren, not born until the 22nd century? That's an awfully distant time horizon for most people. Many would probably spend today's resource...
The two-player Iterated Prisoner's Dilemma game is a model for both sentient and evolutionary behaviors, especially including the emergence of cooperation. It is generally assumed that there exists no simple ultimatum strategy whereby one player can enforce a unilateral claim to an unfair share of rewards. Here, we show that such strategies unexpec...
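A numerical check of one such strategy, an extortionate "zero-determinant" strategy, can be sketched. We assume the conventional payoffs (R, S, T, P) = (3, 0, 5, 1) and memory-one strategies given as cooperation probabilities after the outcomes CC, CD, DC, DD from each player's own point of view. The vector below is the standard extortion example with factor χ = 3, and the opponent's strategy is hypothetical. If the strategy works as claimed, the long-run payoffs satisfy s_X − P = χ(s_Y − P) whatever Y plays.

```python
R, S, T, P = 3, 0, 5, 1   # conventional Prisoner's Dilemma payoffs
S_X = (R, S, T, P)        # X's payoff in states CC, CD, DC, DD (X's move listed first)
S_Y = (R, T, S, P)        # Y's payoff in the same states


def stationary_payoffs(p, q, iterations=5000):
    """Long-run payoffs of memory-one strategies p (for X) and q (for Y).

    Power iteration on the 4-state Markov chain over outcomes (CC, CD, DC, DD).
    """
    swap = (0, 2, 1, 3)   # Y sees X's state CD as its own DC, and vice versa
    v = [0.25] * 4
    for _ in range(iterations):
        nxt = [0.0] * 4
        for i in range(4):
            px, py = p[i], q[swap[i]]
            nxt[0] += v[i] * px * py
            nxt[1] += v[i] * px * (1 - py)
            nxt[2] += v[i] * (1 - px) * py
            nxt[3] += v[i] * (1 - px) * (1 - py)
        v = nxt
    s_x = sum(vi * s for vi, s in zip(v, S_X))
    s_y = sum(vi * s for vi, s in zip(v, S_Y))
    return s_x, s_y


extort = (11 / 13, 1 / 2, 7 / 26, 0.0)   # extortion factor chi = 3
other = (0.6, 0.3, 0.8, 0.2)             # hypothetical opponent
sx, sy = stationary_payoffs(extort, other)
print(sx - P, 3 * (sy - P))  # agree up to rounding error, for any q
```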
In the situation of interest, we want to classify an object into one of two categories, call them A and B, on the basis of the presence or absence of multiple features, call them F1, F2,.... A specific feature F is characterized by the probabilities with which it occurs (denoted +F) or does not occur (denoted −F)
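Given the per-feature probabilities defined above, the evidence from several features can be combined by adding log-likelihood ratios. The sketch assumes, as our own simplification, that features occur independently within each category (the "naive Bayes" assumption); the feature names and probabilities are hypothetical.

```python
import math


def posterior_A(observed, features, prior_A=0.5):
    """Posterior probability of category A given presence/absence of features.

    observed: dict feature -> True (+F observed) or False (-F observed).
    features: dict feature -> (P(+F | A), P(+F | B)).
    Assumes features are independent within each category.
    """
    log_odds = math.log(prior_A / (1 - prior_A))
    for name, present in observed.items():
        pa, pb = features[name]
        if present:
            log_odds += math.log(pa / pb)               # evidence from +F
        else:
            log_odds += math.log((1 - pa) / (1 - pb))   # evidence from -F
    return 1 / (1 + math.exp(-log_odds))


features = {"F1": (0.8, 0.2), "F2": (0.5, 0.5)}
print(posterior_A({"F1": True}, features))   # 0.8 (up to rounding)
print(posterior_A({"F1": False}, features))  # 0.2 (up to rounding)
```

A feature with equal probabilities in both categories (F2 here) contributes nothing, whether present or absent.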
Abbondanzieri et al. [1] have observed single base-pair stepping by RNA polymerase along DNA as the polymerase produces a new RNA transcript. At each step, the RNAP binds a coding-appropriate NTP (that is, one of ATP, CTP, GTP, and UTP) and incorporates it into the RNA. Varying the concentration
This note is about how to get a full statistical reconstruction of the internal-node state probabilities and the Markov transition matrices of a phylogenetic tree, given only data at the leaf nodes (that is, data for extant taxa). Almost all of this is standard textbook stuff—except that I can’t find any standard textbook
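The standard first step of such a reconstruction is Felsenstein's pruning algorithm, which computes, for each internal node, the likelihood of the leaf data below it conditioned on each state of that node. The sketch covers only this upward pass and the resulting root posterior. The downward pass for other internal nodes and re-estimation of the transition matrices (for example by EM) are omitted. The nested-list tree encoding and edge labels are hypothetical.

```python
def partial_likelihood(node, transition, n_states):
    """Felsenstein pruning: L[a] = P(leaf data below node | node in state a).

    node: an int (observed state at a leaf) or a list of (child, edge_id) pairs.
    transition[edge_id][a][b] = P(child in state b | parent in state a).
    """
    if isinstance(node, int):
        return [1.0 if s == node else 0.0 for s in range(n_states)]
    result = [1.0] * n_states
    for child, edge in node:
        child_l = partial_likelihood(child, transition, n_states)
        P = transition[edge]
        for a in range(n_states):
            result[a] *= sum(P[a][b] * child_l[b] for b in range(n_states))
    return result


def root_posterior(tree, transition, prior):
    """Total likelihood of the leaf data and posterior state probabilities at the root."""
    L = partial_likelihood(tree, transition, len(prior))
    joint = [pa * la for pa, la in zip(prior, L)]
    total = sum(joint)
    return total, [j / total for j in joint]


P = [[0.9, 0.1], [0.2, 0.8]]              # hypothetical two-state transition matrix
transition = {"e1": P, "e2": P}
tree = [(0, "e1"), (0, "e2")]             # root with two leaves, both observed in state 0
print(root_posterior(tree, transition, [0.5, 0.5]))
```

Deeper trees nest the same structure, e.g. `[(0, "e1"), ([(0, "e3"), (1, "e4")], "e2")]`, with one matrix per edge label.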
In a world threatened by terrorists from a small number of countries, it is tempting to think that racial profiling for security purposes, even if morally objectionable, might save lives. But is it mathematically sound? William Press shows that even with unrealistically perfect data it is surprisingly difficult to gain any benefit from such profilin...
As electronic medical records enable increasingly ambitious studies of treatment outcomes, ethical issues previously important only to limited clinical trials become relevant to unlimited whole populations. For randomized clinical trials, adaptive assignment strategies are known to expose substantially fewer patients to avoidable treatment failures...
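One well-known adaptive assignment strategy is Thompson sampling, chosen here as an illustration; the abstract does not say which strategies it analyzes. Each treatment keeps a Beta posterior on its success rate, and each new patient receives the treatment whose posterior draw is largest, so assignments drift toward the better treatment as evidence accumulates. The true success rates below are hypothetical and are used only to simulate outcomes.

```python
import random


def thompson_assign(success_probs, n_patients, seed=0):
    """Simulate Thompson-sampling allocation of patients to treatments.

    Each treatment keeps a Beta(successes + 1, failures + 1) posterior. The
    success_probs are unknown to the trial and only generate simulated outcomes.
    Returns the number of patients assigned to each treatment.
    """
    rng = random.Random(seed)
    k = len(success_probs)
    wins, losses, assigned = [0] * k, [0] * k, [0] * k
    for _ in range(n_patients):
        draws = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(k)]
        arm = max(range(k), key=lambda i: draws[i])
        assigned[arm] += 1
        if rng.random() < success_probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return assigned


print(thompson_assign([0.7, 0.3], n_patients=500, seed=1))
```

Compared with a fixed 50/50 randomization, most of the 500 simulated patients end up on the better treatment, which is the ethical advantage the abstract refers to; the price is a less balanced design for estimating the inferior arm.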
The following two books are reviewed: Encyclopedia of Algorithms (Kao, M.-Y., Ed.; 2008) and Algorithms in a Nutshell: A Desktop Quick Reference (Heineman, G.T. et al.; 2008).
The use of profiling by ethnicity or nationality to trigger secondary security screening is a controversial social and political issue. Overlooked is the question of whether such actuarial methods are in fact mathematically justified, even under the most idealized assumptions of completely accurate prior probabilities, and secondary screenings conc...
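One toy model in which the question can be posed exactly (our assumptions: exactly one malfeasor, individual i is the malfeasor with prior probability p_i, and each screening picks individual i at random with probability q_i, with replacement) gives an expected number of screenings of Σ p_i/q_i. In this model, screening in proportion to the prior does no better than uniform screening (both give N), while weighting by √p_i gives (Σ √p_i)², which by the Cauchy-Schwarz inequality is never more than N. The population below is hypothetical.

```python
import math


def expected_screenings(prior, sampling):
    """Expected screenings (with replacement) until the malfeasor is found: sum p_i / q_i."""
    return sum(p / q for p, q in zip(prior, sampling))


def normalized(weights):
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical population: one individual 50x more likely a priori than each of 99 others.
prior = normalized([50.0] + [1.0] * 99)
n = len(prior)
uniform = [1.0 / n] * n
strong = prior                                          # sample in proportion to prior
sqrt_biased = normalized([math.sqrt(p) for p in prior])  # square-root weighting

for name, q in [("uniform", uniform), ("proportional", strong), ("sqrt", sqrt_biased)]:
    print(name, expected_screenings(prior, q))
```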
Previous attempts to correct Type Ia supernovae (SN Ia's) for host galaxy extinction have given strange results: increased dispersion on the Hubble diagram or impossibly low values of the reddening ratio for dust in distant galaxies. The cause is the incorrect assumption that SN Ia's have a uniform intrinsic luminosity and color at maximum light. O...
JASON was asked to recommend ways in which the DOD/IC can handle present and future sensor data in fundamentally different ways, taking into account the state of the art, the potential for advances in areas such as data structures, the shaping of sensor data for exploitation, and methodologies for data discovery. This report examines th...
The tasking for this study was to evaluate the potential for adversaries to exploit advances in Human Performance Modification, and thus create a threat to national security. In making this assessment, we were asked to evaluate long-term scenarios. We have thus considered the present state of the art in pharmaceutical intervention in cognition and...
This paper will summarize, presumably for criticism and improvement by the N-body community, a few of the ideas brought out in my impromptu talk at Piet Hut's splendidly organized workshop "The Use of Supercomputers in Stellar Dynamics". The reader's forbearance is asked for the picaresque style here adopted, reflecting the nature of both the talk...
The spectrum of initial cosmological perturbations posited by Zel'dovich and co-workers for the "pancake theory" of galaxy formation has (i) adiabatic perturbations only, and (ii) constant perturbation amplitude on all scales at their respective horizon times. Assumption (i) has recently been shown to follow from grand unified gauge theories. This...
Co-authored by four leading scientists from academia and industry, Numerical Recipes Third Edition starts with basic mathematics and computer science and proceeds to complete, working routines. Widely recognized as the most comprehensive, accessible and practical basis for scientific computing, this new edition incorporates more than 400 Numerical...
Götz, Druckmüller, and, independently, Brady have defined a discrete Radon transform (DRT) that sums an image's pixel values along a set of aptly chosen discrete lines, complete in slope and intercept. The transform is fast, O(N² log N) for an N × N image; it uses only addition, not multiplication or interpolation, and it admits a fast, exact algori...
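The recursion below is one way to realize such a transform for one quadrant of slopes. It is our reconstruction under a stated assumption: a line of total rise s across a block of N columns is built from lines of rise ⌊s/2⌋ in each half-block, with the right half starting ⌈s/2⌉ rows higher. It uses only additions, and each of the log₂N levels touches every (slope, intercept) pair once. Other slope quadrants would follow by flipping or transposing the image. Function names are hypothetical.

```python
def drt_quadrant(img):
    """Discrete Radon transform over slopes 0..N-1 (one quadrant, sketch).

    img: list of rows, each a list of N numbers; N (the column count) a power of 2.
    Returns R with R[s][h] = sum of pixels on the discrete line that rises s rows
    across the N columns, starting at padded row h of the first column. N zero
    rows are prepended so every pixel lies on exactly one line of each slope.
    """
    rows, n = len(img), len(img[0])
    if n & (n - 1):
        raise ValueError("number of columns must be a power of 2")
    height = rows + n
    columns = [[0.0] * n + [float(img[r][j]) for r in range(rows)] for j in range(n)]
    return _drt(columns, height)


def _drt(columns, height):
    n = len(columns)
    if n == 1:
        return [list(columns[0])]         # single column: each "line" is one pixel
    m = n // 2
    left = _drt(columns[:m], height)      # lines across the left half, rises 0..m-1
    right = _drt(columns[m:], height)     # lines across the right half
    out = []
    for s in range(n):
        half, k = s // 2, (s + 1) // 2    # each half rises s//2; right half starts k rows up
        out.append([left[half][h] + (right[half][h + k] if h + k < height else 0.0)
                    for h in range(height)])
    return out


eye = [[1 if r == c else 0 for c in range(4)] for r in range(4)]
print(drt_quadrant(eye)[3][4])  # 4.0: the diagonal is the steepest line in this quadrant
```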
The genomes of mammals and birds can be partitioned into megabase-long regions, termed isochores, with consistently high, or low, average C + G content. Isochores with high CG contain a mixture of CG-rich and AT-rich genes, while high-AT isochores contain predominantly AT-rich genes. The two gene populations in the high-CG isochores are functionall...
While investigating microRNA targets, we have found that human genes divide into two roughly equal populations, based on the fraction of A plus T bases in their 3' UTRs. Using the Gene Ontology database, we find significant functional differences between the two gene populations, with AT-rich genes implicated in transcription and translation proces...
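The classification quantity here is simple to compute: the fraction of A plus T bases in each 3' UTR. The sketch below computes it and splits genes at a cutoff; the abstract does not give the boundary between the two populations, so the 0.5 threshold and the sequences are hypothetical.

```python
def at_fraction(seq):
    """Fraction of A plus T bases (U counted as T, for RNA input)."""
    seq = seq.upper()
    return sum(base in "ATU" for base in seq) / len(seq)


def split_by_at(utrs, threshold=0.5):
    """Partition {gene: 3' UTR sequence} into (AT-rich genes, other genes)."""
    at_rich = {gene for gene, seq in utrs.items() if at_fraction(seq) > threshold}
    return at_rich, set(utrs) - at_rich


utrs = {"g1": "AATTAAGC", "g2": "GGCCGCAT"}  # toy sequences
print(split_by_at(utrs))  # ({'g1'}, {'g2'})
```

In practice the cutoff would be read off the observed distribution of AT fractions rather than fixed in advance.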
The passive interaction of a black hole with its ambient environment is examined. Criteria for gravitational collapse are presented for nonrotating stars, supermassive stars, and galactic nuclei as well as for rotating supermassive stars and galactic nuclei. Spherical collapse is described analytically together with nonspherical collapse, and prope...
The two Numerical Recipes books are marvellous. The principal book, The Art of Scientific Computing, contains program listings for almost every conceivable requirement, and it also contains a well written discussion of the algorithms and the numerical methods involved. The Example Book provides a complete driving program, with helpful notes, for ne...
Celebrate our new astronomical insight into the ultimate fate of the Universe.
DARPA requested a JASON summer study on Small Unit Operations, with the emphasis to be on the SUO vision of total situational awareness for small ground units, remote commanders and remote weapons systems. The study focused on new technologies and concepts which might lead to a dramatic improvement in battlefield situational awareness.
Galaxies in the Las Campanas Redshift Survey are classified according to their spectra, and the resulting spectral types are analyzed to determine if local environment affects their properties. We find that the luminosity function of early-type objects varies as a function of local density. Our results suggest that early-type galaxies (presumably e...
We construct a spectral classification scheme for the galaxies of the Las Campanas Redshift Survey (LCRS) based on a principal component analysis of the measured galaxy spectra. We interpret the physical significance of the spectral classes and conclude that they are sensitive to morphological type and the amount of active star formation. In this f...
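The core of such a scheme is a principal component analysis of mean-subtracted spectra. The sketch below is generic PCA, finding only the first component by power iteration, and does not reproduce the survey's preprocessing or the number of components used for its classes. The two-pixel "spectra" are toy data, and the all-ones starting vector is an assumption that fails only if it happens to be orthogonal to the leading component.

```python
import math


def first_principal_component(spectra, iterations=200):
    """Leading eigenvector of the spectra's covariance, and each spectrum's projection on it."""
    n, m = len(spectra), len(spectra[0])
    mean = [sum(s[j] for s in spectra) / n for j in range(m)]
    centered = [[s[j] - mean[j] for j in range(m)] for s in spectra]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(m)]
           for a in range(m)]
    v = [1.0] * m
    for _ in range(iterations):            # power iteration
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    projections = [sum(c[j] * v[j] for j in range(m)) for c in centered]
    return v, projections


v, proj = first_principal_component([[1, 2], [2, 4], [3, 6]])
print(v, proj)
```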
We have shown that hypercarrier modulation with binary on/off keying, and with bit times longer than the whole "ringing time" of the urban environment, is potentially capable of achieving megabit per second data rates in a manner that is completely insensitive to multipath fading. A straw-man design, using current COTS components, has been presente...
Just as the "central dogma" of the mathematical software community turned most program libraries of the 1970s and 80s into black boxes with defined interfaces, there is a similar emerging dogma of the 1990s, that scientific programmers should move to high-level "total environments" (here called "TEs") such as Mathematica, MATLAB, IDL,...
“Non-Gaussian” is the casual explanation often given for anything unexpected in an astronomical time series. What better place to look for non-Gaussianity, therefore, than in the light curve of 0957+561, the gravitational lens that, until recently, had yielded frustratingly inconsistent determinations of its lag. We discuss the difficulties in meas...
Nature is the international weekly journal of science: a magazine style journal that publishes full-length research papers in all disciplines of science, as well as News and Views, reviews, news, features, commentaries, web focuses and more, covering all branches of science and how science impacts upon all aspects of society and life.
To understand their data better, astronomers need to use statistical tools that are more advanced than traditional "freshman lab" statistics. As an illustration, the problem of combining apparently incompatible measurements of a quantity is presented from both the traditional, and a more sophisticated Bayesian, perspective. Explicit formulas are...
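The traditional side of that comparison is the inverse-variance weighted mean, with χ² as a consistency check; the abstract's Bayesian formulas are elided, so they are not attempted here. The measurement values below are hypothetical.

```python
import math


def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean, its standard error, and the chi-square of the fit."""
    weights = [1 / s ** 2 for s in sigmas]
    mean = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    err = 1 / math.sqrt(sum(weights))
    chi2 = sum(((x - mean) / s) ** 2 for x, s in zip(values, sigmas))
    return mean, err, chi2


print(weighted_mean([10.0, 12.0], [1.0, 1.0]))  # mean 11, error 1/sqrt(2), chi2 2
```

A χ² much larger than the number of measurements minus one signals that the quoted errors cannot all be right, which is exactly the "apparently incompatible" case where the simple weighted mean becomes unreliable.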
We present an empirical method that uses multicolor light-curve shapes (MLCSs) to estimate the luminosity, distance, and total line-of-sight extinction of Type Ia supernovae (SNe Ia). The empirical correlation between the MLCSs and the luminosity is derived from a "training set" of nine SN Ia light curves with independent distance and reddening est...
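Once a light-curve method supplies an absolute magnitude and a color excess, the distance follows from the standard distance-modulus relation with extinction A = R·E(B − V). The sketch shows only that last step; the light-curve-shape correlation itself is not reproduced. R = 3.1 is a typical Milky Way value, not a fitted one, and the input magnitudes are hypothetical.

```python
def distance_pc(m_apparent, M_absolute, color_excess, R=3.1):
    """Distance modulus and distance in parsecs, correcting for extinction A = R * E(B-V)."""
    A = R * color_excess
    mu = m_apparent - M_absolute - A     # extinction-corrected distance modulus
    return mu, 10 ** (mu / 5 + 1)        # mu = 5 log10(d / 10 pc)


mu, d = distance_pc(15.0, -19.3, 0.1)
print(mu, d)  # mu about 33.99, d about 6.3e7 pc
```

The reddening ratio R enters linearly: an underestimated R undercorrects extinction and biases distances high, which is why its value for dust in distant galaxies matters.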
Edge effects and Gibbs phenomena are a ubiquitous problem in signal processing. We show how this problem can arise from a mismatch between the “topology” of the data D (e.g., an interval in the case of a time series or a rectangle in the case of a photographic image) and the topology X (often a circle or torus) natural to the construction of the tr...
About 20 years elapsed between my first and second papers on gravitational lenses (Press & Gunn 1973; Press, Rybicki & Hewitt 1992ab). Therefore, the conference organizers have asked me to prognosticate on the future of gravitational lenses. Their reasoning, if I understand it correctly, is that I will likely go to sleep for another 20 years immedi...
Since the success of our algorithm depends on the likelihood of having at least one pair of views whose corresponding parameter estimates converge to a good solution even with trivial initial guesses, it is important that such a pair generally exist. Recently, Weinshall, Wer...
We have measured our Galaxy's motion relative to distant galaxies in which Type Ia supernovae (SNe Ia) have been observed. The effective recession velocity of this sample is 7000 km/sec, which approaches the depth of the survey of brightest cluster galaxies by Lauer & Postman (1994). We use the light curve shape (LCS) method for deriving distances...
With the ansatz that a data set's correlation matrix has a certain parameterized form (one general enough, however, to allow the arbitrary specification of a slowly-varying decorrelation distance and population variance) the general machinery of Wiener or optimal filtering can be reduced from O(n³) to O(n) operations, where n is the size of the d...
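As an illustration of how such a reduction can work, the sketch assumes the simplest parameterized form: an exponential correlation C_ij = σ² exp(−|t_i − t_j|/τ) with constant σ² and τ (the abstract allows these to vary slowly; we do not), plus white measurement noise of constant variance. For sorted sample times this covariance has a tridiagonal inverse. The Wiener estimate S(S + N)⁻¹y therefore equals the solution x of the tridiagonal system (S⁻¹ + N⁻¹)x = N⁻¹y, which the Thomas algorithm solves in O(n). Function names are hypothetical.

```python
import math


def ou_precision(times, tau, sigma2):
    """Tridiagonal inverse of C[i][j] = sigma2 * exp(-|t_i - t_j| / tau).

    times must be strictly increasing. Returns (diag, off): the main diagonal
    (length n) and the symmetric off-diagonal (length n - 1).
    """
    n = len(times)
    r = [math.exp(-(times[i + 1] - times[i]) / tau) for i in range(n - 1)]
    inv = [1.0 / (1.0 - ri * ri) for ri in r]
    diag = []
    for i in range(n):
        left = inv[i - 1] if i > 0 else 1.0
        right = inv[i] if i < n - 1 else 1.0
        diag.append((left + right - 1.0) / sigma2)
    off = [-r[i] * inv[i] / sigma2 for i in range(n - 1)]
    return diag, off


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm: O(n) solve of a tridiagonal system (fine for SPD matrices)."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def wiener_estimate(times, y, noise_var, tau, sigma2):
    """Signal estimate S (S + N)^-1 y, computed as (S^-1 + N^-1)^-1 N^-1 y in O(n)."""
    diag, off = ou_precision(times, tau, sigma2)
    a = [di + 1.0 / noise_var for di in diag]
    rhs = [yi / noise_var for yi in y]
    return solve_tridiagonal(off, a, off, rhs)


print(wiener_estimate([0.0, 0.5, 2.0, 2.1], [1.0, -0.5, 0.3, 0.8], 0.7, 1.3, 2.0))
```

Irregular sampling costs nothing extra here: only the gaps t_{i+1} − t_i enter, through the factors r_i.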
The National Information Infrastructure (NII) is a vast undertaking to provide a web of networks, computers and databases carrying communication and information throughout the country. One of the more difficult topics is privacy and security on the NII. These are areas that are crucial to making the NII fully useful for government and for commerce. The J...
It is widely agreed that urban military operations demand greater 'situational awareness' than now exists. Soldiers need mapping tools to tell them where they are, real time information on what's around the corner and behind walls as well as reliable data links to receive and send orders and intelligence. At the same time, commanders need accurate...
We present an empirical method that uses visual band light curve shapes (LCSs) to estimate the luminosity of Type Ia supernovae (SN Ia's). This method is first applied to a 'training set' of eight SN Ia light curves with independent distance estimates to derive the correlation between the LCS and the luminosity. We employ a linear estimation algorit...