
Luís A Nunes Amaral
- Ph.D.
- Professor at Northwestern University
About
393 Publications
101,690 Reads
57,702 Citations
Introduction
Current institution
Additional affiliations
September 1995 - December 1996
September 2009 - August 2015
July 2002 - present
Publications (393)
Mining of electronic health records (EHR) promises to automate the identification of comprehensive disease phenotypes. However, the realization of this promise is hindered by the unavailability of generalizable ground-truth information, data incompleteness and heterogeneity, and the lack of generalization to multiple cohorts. We present here a data...
More than half of all drug clinical trials fail due to lack of efficacy, highlighting gaps in our understanding of disease-target relationships. While it is widely believed that scientific research is essential for identifying drug-targets, whether scientists can allocate their research attention according to the clinical potential of disease-targe...
The evolution of T cell molecular signatures in the distal lung of patients with severe pneumonia is understudied. Here, we analyzed T cell subsets in longitudinal bronchoalveolar lavage fluid samples from 273 patients with severe pneumonia, including unvaccinated patients infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) o...
Physicians, particularly intensivists, face information overload and decision fatigue, underscoring the need for automated diagnostic tools. Acute Respiratory Distress Syndrome (ARDS) affects over 10% of critical care patients and has a mortality rate above 40%, yet is only recognized in 30-70% of cases in clinical settings. We present a reproducible co...
Present-day publications on human genes primarily feature genes that already appeared in many publications prior to completion of the Human Genome Project in 2003. These patterns persist despite the subsequent adoption of high-throughput technologies, which routinely identify novel genes associated with biological processes and disease. Although se...
Gene expression is a regulated process fueled by ATP consumption. Therefore, regulation must be coupled to constraints imposed by the level of energy metabolism. Here, we explore this relationship both theoretically and experimentally. A stylized mathematical model predicts that activators of gene expression have variable impact depending on metabo...
Under-recognition of acute respiratory distress syndrome (ARDS) by clinicians is an important barrier to adoption of evidence-based practices such as low tidal volume ventilation. The burden created by the COVID-19 pandemic makes it even more critical to develop scalable data-driven tools to improve ARDS recognition. The objective of this study was...
Aging is among the most important risk factors for morbidity and mortality. To contribute toward a molecular understanding of aging, we analyzed age-resolved transcriptomic data from multiple studies. Here, we show that transcript length alone explains most transcriptional changes observed with aging in mice and humans. We present three lines of ev...
Objectives
Critical illness reduces β-lactam pharmacokinetic/pharmacodynamic (PK/PD) attainment. We sought to quantify PK/PD attainment in patients with hospital-acquired pneumonia.
Methods
Meropenem plasma PK data (n = 70 patients) were modelled, PK/PD attainment rates were calculated for empirical and definitive targets, and between-patient vari...
To craft effective public policy, modern governments must gather and analyze data on both the performance of their public functions and the responses by the public. Federal administrative agencies such as the Patent Office and Centers for Disease Control routinely do this, as does the United States Congress. More importantly, they make such data fr...
Background: Adoption of innovations in the field of medicine is frequently hindered by a failure to recognize the condition targeted by the innovation. This is particularly true in cases where recognition requires integration of patient information from different sources, or where disease presentation can be heterogeneous and the recognitio...
Throughout the last 2 decades, several scholars observed that present day research into human genes rarely turns toward genes that had not already been extensively investigated in the past. Guided by hypotheses derived from studies of science and innovation, we present here a literature-wide data-driven meta-analysis to identify the specific scient...
Some patients infected with Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) develop severe pneumonia and the acute respiratory distress syndrome (ARDS) [1]. Distinct clinical features in these patients have led to speculation that the immune response to virus in the SARS-CoV-2-infected alveolus differs from other types of pneumonia [2]. We c...
Single cell RNA sequencing (scRNA-seq) data are now routinely generated in experimental practice because of their promise to enable the quantitative study of biological processes at the single cell level. However, cell type and cell state annotations remain an important computational challenge in analyzing scRNA-seq data. Here, we report on the dev...
Court records are unstructured and costly to access—here's how to fix it
Female representation has been slowly but steadily increasing in many sectors of society. One sector where one would expect to see gender parity is the movie industry, yet the representation of females in most functions within the U.S. movie industry remains surprisingly low. Here, we study the historical patterns of female representation among acto...
Mosaic analysis provides a means to probe developmental processes in situ by generating loss-of-function mutants within otherwise wildtype tissues. Combining these techniques with quantitative microscopy enables researchers to rigorously compare RNA or protein expression across the resultant clones. However, visual inspection of mosaic tissues rema...
Tremendous advances have been made in our understanding of the properties and evolution of complex networks. These advances were initially driven by information-poor empirical networks and theoretical analysis of unweighted and undirected graphs. Recently, information-rich empirical data on complex networks supported the development of more sophistica...
One of the most widely used approaches in natural language processing and information retrieval is the so-called bag-of-words model. A common component of such methods is the removal of uninformative words, commonly referred to as stopwords. Currently, most practitioners use manually curated stopword lists. This approach is problematic because it c...
Importance
Despite its efficacy, low tidal volume ventilation (LTVV) remains severely underutilized for patients with acute respiratory distress syndrome (ARDS). Physician under-recognition of ARDS is a significant barrier to LTVV use. We propose a computational method that addresses some of the limitations of the current approaches to automated me...
Katahira et al. investigated the potential impact of skewness in the marginal distributions of personality traits on the findings reported by us in Gerlach et al. We concur with Katahira et al.’s finding in synthetic 2-dimensional data that there exists a mechanism by which skewness can induce detection of “meaningful clusters” using our proposed me...
Aging manifests itself through a decline in organismal homeostasis and a multitude of cellular and physiological functions. Efforts to identify a common basis for vertebrate aging face many challenges; for example, while there have been documented changes in the expression of many hundreds of mRNAs, the results across tissues and species have been...
The analysis of citations to scientific publications has become a tool that is used in the evaluation of a researcher’s work, especially in the face of an ever-increasing production volume [1–6]. Despite the acknowledged shortcomings of citation analysis and the ongoing debate on the meaning of citations [7,8], citations are still primarily viewed as end...
Metabolic conditions affect the developmental tempo of most animal species. Consequently, developmental gene regulatory networks (GRNs) must faithfully adjust their dynamics to a variable time scale. We find evidence that layered weak repression of genes provides the necessary coupling between GRN output and cellular metabolism. Using a mathematica...
Topic models are in widespread use in natural language processing and beyond. Here, we propose a new framework for the evaluation of probabilistic topic modeling algorithms based on synthetic corpora containing an unambiguously defined ground truth topic structure. The major innovation of our approach is the ability to quantify the agreement betwee...
Rationale: The contributions of diverse cell populations in the human lung to pulmonary fibrosis pathogenesis are poorly understood. Single-cell RNA sequencing can reveal changes within individual cell populations during pulmonary fibrosis that are important for disease pathogenesis.
Objectives: To determine whether single-cell RNA sequencing can r...
In this Formal Comment, the authors of the recent publication "Large-scale investigation of the reasons why potentially important genes are ignored" maintain that it can be read as an opportunity to explore the unknown.
Rationale: The identification of informative elements of the host response to infection may improve the diagnosis and management of bacterial pneumonia. Objectives: To determine whether the absence of alveolar neutrophilia can exclude bacterial pneumonia in critically ill patients with suspected infection and to test whether signatures of bacterial...
Understanding human personality has been a focus for philosophers and scientists for millennia [1]. It is now widely accepted that there are about five major personality domains that describe the personality profile of an individual [2,3]. In contrast to personality traits, the existence of personality types remains extremely controversial [4]. Despite the...
Cells must reliably respond to changes in transcription factor levels in order to execute cell state transitions in the correct time and place. These transitions are typically thought to be triggered by changes in the absolute nuclear concentrations of relevant transcription factors. We have identified a developmental context in which cell fate tra...
Cell state transitions are usually thought to be triggered by changes in the absolute concentrations of relevant transcription factors. In the Drosophila eye, the transcription factor Yan maintains cells in a progenitor state by repressing gene expression, while the Pointed transcription factor activates gene expression programs that promote photor...
Biomedical research has been previously reported to primarily focus on a minority of all known genes. Here, we demonstrate that these differences in attention can be explained, to a large extent, exclusively from a small set of identifiable chemical, physical, and biological properties of genes. Together with knowledge about homologous genes from m...
Study of homologous genes predicts study of human genes.
(A) Prediction of the number of research publications using the model of Fig 1A, extended to include the year of the initial publications on homologous nonhuman genes (S1 Data). (B) Number of publications for individual genes conditioned on the existence of homologous genes in nonhuman model...
Mapping of PubMed IDs to Web of Science IDs.
Mapping of PubMed IDs to Web of Science IDs for publications linked to genes.
(XLSX)
Comparison of feature importance for prediction of the year of initial publication and the total number of publications.
Median importance of features over 500 independent randomizations of the models for predicting the number of publications and the year of their discovery.
(XLSX)
Nearby accessible important genes that are studied less than expected.
Closest gene of S8 Table for every other gene in the 15-dimensional feature space in Fig 1B.
(XLSX)
Physical, chemical, and biological features of genes predict the number of publications.
(A) Ward-clustering of feature importance of 500 gradient boosting regression models. Numbers in brackets indicate order of features in heatmaps in Fig 1B. (B) Prediction of the number of publications for the 12,948 genes with a complete catalog of features usi...
Physical, chemical, and biological features mapped to individual genes.
z-score of individual features for genes in the tSNE mapping of Fig 1. Numbers in brackets indicate order of features in heatmaps in Fig 1 (S1 Data). tSNE, t-distributed stochastic neighbor embedding.
(TIF)
Publications reporting the discovery of new genes preferentially cite model organisms.
(A) As Fig 2D, but for individual years during the 1980s and 1990s, the decades in which most human genes were discovered. Also see S5D Fig (S1 Data). (B) Fraction of nonhuman organisms cited by initial publications of human genes. Enrichment represents log2 ratio...
Attention in publications closely tracks number of publications.
Fractional counting, in which the occurrence of a gene in a publication counts as 1/(number of genes in publication), versus normal counting, in which the occurrence of a gene in a publication counts as 1, of publications with multiple genes (S1 Data).
(TIF)
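The two counting schemes in this caption can be illustrated with a minimal sketch. The input format (one list of gene symbols per publication), the function name `count_attention`, and the example gene lists are assumptions made for illustration; they are not taken from the paper's code or data.

```python
from collections import defaultdict


def count_attention(publications):
    """Return (normal, fractional) per-gene counts.

    `publications` is a hypothetical input: an iterable of gene lists,
    one list per publication.
    """
    normal = defaultdict(int)        # each occurrence counts as 1
    fractional = defaultdict(float)  # each occurrence counts as 1 / (genes in publication)
    for genes in publications:
        unique_genes = set(genes)    # count a gene at most once per publication
        if not unique_genes:
            continue
        share = 1 / len(unique_genes)
        for gene in unique_genes:
            normal[gene] += 1
            fractional[gene] += share
    return dict(normal), dict(fractional)


# Illustrative publications only (not real data).
pubs = [["TP53"], ["TP53", "BRCA1"], ["BRCA1", "EGFR", "TP53", "MYC"]]
normal, fractional = count_attention(pubs)
# normal["TP53"] is 3; fractional["TP53"] is 1 + 1/2 + 1/4 = 1.75
```

Under fractional counting, a gene that appears mostly in large multi-gene studies accumulates far less attention than its raw publication count suggests.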
Career rewards disfavor novelty.
(A) Career prospects of junior scientists correlate with the preceding attention directed towards genes: probability to transition to principal investigator (PI) status for authors of publications, according to the median attention of the genes in these publications. If, in the preceding years, this attention fell i...
Large-scale studies are a reference for many other publications.
(A) Kernel-density estimation of the fraction of genes with a given number of publications versus the median number of genes co-occurring in the respective publications. The observed pattern is consistent with the notions of “small science” and “big science” (S1 Data). (B) Median perc...
Literature survey of genes with increased attention between 2011 and 2015.
Enrichment in publications per gene between 2011 and 2015 over the time until 2010. The count of publications until 2010 has been normalized such that the total number of publications matches the time between 2011 and 2015.
(XLSX)
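One possible reading of the normalization described in this caption is sketched below: the earlier counts are rescaled so that their total equals the total of the recent period, and the enrichment is then the ratio of recent to rescaled earlier counts. Whether the authors report a plain ratio or a logarithm of it is not stated here, so the plain ratio is an assumption, as are the function name `enrichment` and the toy counts.

```python
def enrichment(early, recent):
    """Ratio of recent counts to rescaled early counts, per gene.

    `early` and `recent` are hypothetical dicts mapping gene -> publication
    count for the period until 2010 and for 2011-2015, respectively.
    """
    total_early = sum(early.values())
    total_recent = sum(recent.values())
    if total_early == 0:
        raise ValueError("no publications in the early period")
    # Rescale early counts so that both periods have the same total.
    scale = total_recent / total_early
    result = {}
    for gene, recent_count in recent.items():
        expected = early.get(gene, 0) * scale
        if expected > 0:  # skip genes with no early publications (assumption)
            result[gene] = recent_count / expected
    return result


# Toy counts only (not real data).
ratios = enrichment({"A": 80, "B": 20}, {"A": 10, "B": 10})
# Gene "B" held 20% of early publications but 50% of recent ones,
# so its ratio is about 2.5; gene "A" drops to about 0.625.
```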
Fraction of unstudied homologs.
Number and fraction of unstudied homologs of unstudied human genes for different taxa. Unstudied genes were defined as in S12 Fig, that is, genes that have not been covered by the research effort corresponding to one single-gene study.
(XLSX)
Accessible important genes that are studied less than expected.
Genes with characteristics that have occurred in fewer publications than predicted by models of Fig 1A and carry the three favorable strategic properties described in Fig 4E (strong loss-of-function sensitivity and GWAS associations, experimental approachability, and the presence of in...
Extreme inequality in the research attention given to human protein-coding genes.
(A) Frequency of the number of research publications associated with human protein-coding genes in MEDLINE. Black line shows a log-normal fit to the data (S1 Data). (B) Human-curated GO annotations for individual genes, binned by number of publications. Upper limit of...
Catalog of absence of features.
(A) Hamming-clustering of genes according to absence of features (S1 Data). (B) Number of research publications for genes with and without a complete catalog of features.
(TIF)
Health research funding correlates with the number of publications.
(A) The number of grants for genes as a function of the number of publications on a gene. (B) Correlation between the attention of NIH-sponsored research publications and the amount of allocated NIH budget on individual genes (dots). The latter is approximated by equal allocation o...
What we know about poorly studied genes.
(A) Distribution of the attention (measured by fractional publications) in publications given to genes. Genes with attention levels below 1 are denoted unstudied (blue), whereas genes with attention levels above 1 are denoted studied (orange). (B) Percentage of genes with indicated characteristic. (C) As B,...
Decrease in the fraction of scientists working on model organisms.
Fraction of scientists who—within the indicated year—publish exclusively on nonhuman genes (or gene products) or exclusively on human genes (or gene products), or both. The fraction of scientists who exclusively published on human genes had been stable in the 1980s and 1990s, while...
Accessible important genes.
List of genes that have strong loss-of-function sensitivity and GWAS associations, experimental approachability, and the presence of invertebrate model organisms for genes in the 15-dimensional feature space. GWAS, genome-wide association study.
(XLSX)
Predictability of research effort.
(A) Cumulative share of publications in MEDLINE covered by the fraction of most common genes in decreasing order (S1 Data). (B) Gini coefficient (a measure of inequality) for genes in publications over time. When looking at income or wealth, Gini coefficients of 0.6 are considered extreme (S1 Data). (C) Correlatio...
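For readers unfamiliar with the Gini coefficient mentioned in this caption, the following is a minimal sketch of one standard way to compute it from per-gene publication counts. The rank-based formula used here is a common textbook form and is not confirmed to be the authors' implementation; the function name `gini` and the example values are illustrative.

```python
def gini(counts):
    """Gini coefficient of non-negative values (0 = perfect equality).

    Uses the rank-based form G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted in ascending order and ranks i = 1..n.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Illustrative values only: equal attention versus all attention on one gene.
equal = gini([5, 5, 5, 5])     # 0.0
extreme = gini([0, 0, 0, 1])   # 0.75, the maximum (n - 1) / n for n = 4
```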
List of genes with an incomplete catalog of features.
NCBI gene identifiers (Entrez genes), NCBI gene symbols, and Ensembl gene IDs are provided. NCBI, National Center for Biotechnology Information.
(XLSX)
Map of the 15-dimensional space.
Coordinates of genes in Fig 1B. In addition, the inferred number of publications, NCBI gene symbols, and Ensembl gene IDs are provided. NCBI, National Center for Biotechnology Information.
(XLSX)
Gene-specific context for further exploration of genes.
Gene-specific information to facilitate further experimentation. Tissue and cell line with highest RNA expression (“highest tissue,” “highest cells”); flag indicating whether frequently differentially expressed in EBI-GXA (https://www.ebi.ac.uk/gxa); flag indicating whether frequently reported...
The Shine-Dalgarno (SD) sequence motif facilitates translation initiation and is frequently found upstream of bacterial start codons. However, thousands of instances of this motif occur throughout the middle of protein coding genes in a typical bacterial genome. Here, we use comparative evolutionary analysis to test whether SD sequences located wit...
Pulmonary fibrosis is a devastating disorder that results in the progressive replacement of normal lung tissue with fibrotic scar. Available therapies slow disease progression, but most patients go on to die or require lung transplantation. Single-cell RNA-seq is a powerful tool that can reveal cellular identity via analysis of the transcriptome, b...
The unequal utilization of synonymous codons affects numerous cellular processes including translation rates, protein folding and mRNA degradation. In order to understand the biological impact of variable codon usage bias (CUB) between genes and genomes, it is crucial to be able to accurately measure CUB for a given sequence. A large number of metr...
The Shine-Dalgarno (SD) sequence motif is frequently found upstream of protein coding genes and is thought to be the dominant mechanism of translation initiation used by bacteria. Experimental studies have shown that the SD sequence facilitates start codon recognition and enhances translation initiation by directly interacting with the highly conse...
Reduced motor control is one of the most frequent features associated with aging and disease. Nonlinear and fractal analyses have proved to be useful in investigating human physiological alterations with age and disease. Similar findings have not been established for any of the model organisms typically studied by biologists, though. If the physiol...
The Shine-Dalgarno (SD) sequence is often found upstream of protein coding genes across the bacterial kingdom, where it enhances start codon recognition via hybridization to the anti-SD (aSD) sequence on the small ribosomal subunit. Despite widespread conservation of the aSD sequence, the proportion of SD-led genes within a genome varies widely acr...
Death from chronic lung disease is increasing and Chronic Obstructive Pulmonary Disease has become the third leading cause of death in the United States in the past decade. Both chronic and acute lung diseases disproportionately affect elderly individuals, making it likely that these diseases will become more frequent and severe as the worldwide po...
Frequent school shootings are a unique US phenomenon that has defied understanding [1,2]. Uncovering the aetiology of this problem is hampered by the lack of an established dataset [3,4]. Here we assemble a carefully curated dataset for the period 1990–2013 that is built upon an exhaustive review of existing data and original sources. Using this dataset,...
Studies dating back to the 1970s established that sequence complementarity between the anti-Shine–Dalgarno (aSD) sequence on prokaryotic ribosomes and the 5′ untranslated region of mRNAs helps to facilitate translation initiation. The optimal location of aSD sequence binding relative to the start codon, the full extents of the aSD sequence and the...
The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to gene...
Distribution of the GC contents of random sequences when an allowable GC content range is specified.
When the desired GC content is set to a range instead of a single value, the GC content distribution for the random sequences will be uniform within most of the GC content range, with a decaying tail at both ends (top). To get a uniform distribution...
The dependence of GC content on β given amino acid usage frequencies.
For a given amino acid usage frequency, the GC content of the generated sequence will depend on the value of β. Low values of β will yield sequences with higher GC content, and vice versa. The GC content of the sequence is also dependent on the amino acid usage frequency of t...
Expected GC content of random sequences depends on amino acid usage if synonymous codons are chosen with uniform probability.
The histogram shows the GC content distribution for three different amino acid usage frequencies, from a high GC organism (Streptomyces coelicolor), a low GC organism (Anaeromyxobacter dehalogenans), and uniform usage. The m...
Amino acid usage probabilities.
The high GC content organism is Streptomyces coelicolor and the low GC content organism is Anaeromyxobacter dehalogenans.
(PDF)
Author Summary
Collaboration plays an increasingly important role in promoting research productivity and impact. What remains unclear is whether female and male researchers differ in their collaboration practices. In our study, we report on an empirical analysis of the complete publication records of 3,980 faculty members in six science, technology...
Gender difference in the propensity to repeat previous co-authors measured using the disparity index.
Distribution of the disparity index measuring the repetition of co-authors for females (orange) and males (purple). The p-values indicate the significance of the gender difference, obtained with the Kolmogorov-Smirnov test. The result is in good agreeme...
Heterogeneity in the number of publications and team size masks the effect of gender difference in the propensity to repeat co-authors.
Survival curves of the simulated total number of distinct co-authors with fixed number of publications and team size (A), fixed number of publications and team sizes sampled from real data (B), and both number of p...
Growth of average number of co-authors during considered period.
Average number of co-authors per publication for females (orange) and males (purple) as a function of publication year. The data are smoothed using a moving average with window size 3. The shaded region indicates the 99% confidence interval obtained with bootstrapping. Data f...
Correlation between the average number of co-authors corrected for the annual average versus the fraction of publications authored by female faculty in psychology departments.
See the caption of S7 Fig for details. Data for this figure are in S4 Data.
(EPS)
Research topics in molecular biology.
We show for each topic the list of most representative words and journals. The topic numbers and words are given by the topic classifying method [35], and the journals are those in which the number of publications is significantly more than expected to occur by chance if drawn from a hypergeometric distribution...
The 20 most prolific scientists in our dataset publishing in topic B21 identified as telomere research.
(PDF)
Data for Fig 3.
(XLSX)
Data for S6 Fig.
(XLSX)
Gender differences in the propensity to repeat previous collaboration measured using the Gini coefficient.
Distribution of the Gini coefficient of collaboration heterogeneity [33] for females (orange) and males (purple) in the dataset with at least 10 publications. We exclude single-author publications. We obtain p-values for the validity of the nu...
In molecular biology departments, female faculty work in smaller teams than male faculty.
Logarithm of the ratio of observed number of publications authored by females over that expected from a hypergeometric distribution (orange circles). The publications are binned by the number of co-authors corrected for the annual average with a bin size of 0....
Correlation between the average number of co-authors corrected for the annual average versus the fraction of publications authored by female faculty in ecology departments.
See the caption of S7 Fig for details. Data for this figure are in S4 Data.
(EPS)
Data for Fig 2.
(XLSX)
Data for Fig 4, and S7 Fig through S11 Fig.
(XLSX)
Data for S2 Fig.
(XLSX)
Data for S3 Fig.
(XLSX)
Correlation between Gini coefficient and probability to repeat previous co-authors.
Orange (female) and purple (male) lines are linear fits to data, and R_F^2 and R_M^2 are the corresponding coefficients of determination. Data for this figure are in S8 Data.
(EPS)
Correlation between the average number of co-authors corrected for the annual average versus the fraction of publications authored by female faculty in chemical engineering departments.
Publications are grouped by journal. We restricted the publication types to “article”, “letter”, and “note”. The size of the circle is proportional to the logarithm...
Correlation between the average number of co-authors corrected for the annual average versus the fraction of publications authored by female faculty in materials science departments.
See the caption of S7 Fig for details. Data for this figure are in S4 Data.
(EPS)
The 20 most prolific scientists in our dataset publishing in topic B10 (outlier topic 7 in Table 2).
(PDF)
Data for S1 Fig.
(XLSX)