Thomas Stoeger's research while affiliated with Northwestern University and other places

Publications (42)

Preprint
Full-text available
Background: Patients with severe SARS-CoV-2 pneumonia experience longer durations of critical illness yet similar mortality rates compared to patients with severe pneumonia secondary to other etiologies. As secondary bacterial infection is common in SARS-CoV-2 pneumonia, we hypothesized that unresolving ventilator-associated pneumonia (VAP) drives t...
Preprint
Full-text available
The condition of having a healthy, functional proteome is known as protein homeostasis, or proteostasis. Establishing and maintaining proteostasis is the province of the proteostasis network, approximately 2,500 genes that regulate protein synthesis, folding, localization, and degradation. The proteostasis network is a fundamental entity in biology...
Article
Objectives: Critical illness reduces β-lactam pharmacokinetic/pharmacodynamic (PK/PD) attainment. We sought to quantify PK/PD attainment in patients with hospital-acquired pneumonia. Methods: Meropenem plasma PK data (n = 70 patients) were modelled, PK/PD attainment rates were calculated for empirical and definitive targets, and between-patient vari...
Conference Paper
Rationale: Clinical and laboratory tests provide rich detail regarding patients’ clinical trajectory. However, making sense of the nearly 100,000 tests captured by the Logical Observation Identifiers Names and Codes (LOINC) system is challenging for machines and even clinicians. Capturing which tests are equivalent and which ones encode unique info...
Article
Full-text available
Nucleotide sequence reagents underpin molecular techniques that have been applied across hundreds of thousands of publications. We have previously reported wrongly identified nucleotide sequence reagents in human research publications and described a semi-automated screening tool Seek & Blastn to fact-check their claimed status. We applied Seek & B...
Article
Full-text available
Mathematical models have many applications in infectious diseases: epidemiologists use them to forecast outbreaks and design containment strategies; systems biologists use them to study complex processes sustaining pathogens, from the metabolic networks empowering microbial cells to ecological networks in the microbiome that protects its host. Here...
Article
Full-text available
Throughout the last 2 decades, several scholars observed that present day research into human genes rarely turns toward genes that had not already been extensively investigated in the past. Guided by hypotheses derived from studies of science and innovation, we present here a literature-wide data-driven meta-analysis to identify the specific scient...
Preprint
Full-text available
Nucleotide sequence reagents underpin a range of molecular genetics techniques that have been applied across hundreds of thousands of research publications. We have previously reported wrongly identified nucleotide sequence reagents in human gene function publications and described a semi-automated screening tool Seek & Blastn to fact-check the tar...
Article
Full-text available
Some patients infected with Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) develop severe pneumonia and the acute respiratory distress syndrome (ARDS)1. Distinct clinical features in these patients have led to speculation that the immune response to virus in the SARS-CoV-2-infected alveolus differs from other types of pneumonia2. We c...
Article
Alveolar macrophages orchestrate the response to viral infections. Age-related changes in these cells may underlie the differential severity of pneumonia in older patients. We performed an integrated analysis of single-cell RNA-Seq data that revealed homogenous age-related changes in the alveolar macrophage transcriptome in humans and mice. Using g...
Article
The relation between the ethos of large-scale projects in the life sciences and the epistemic culture of molecular biology has been the subject of heated discussions for the past 30 years. Molecular biology is typically a ‘small science’, organized around a laboratory leader who decides what to pursue, placing ‘bets’ on different research strategie...
Article
Full-text available
It is known that research into human genes is heavily skewed towards genes that have been widely studied for decades, including many genes that were being studied before the productive phase of the Human Genome Project. This means that the genes most frequently investigated by the research community tend to be only marginally more important to huma...
Preprint
Aging is associated with an increased risk for the development of many diseases. This is exemplified by the increased incidence of lung injury, muscle dysfunction and cognitive impairment in the elderly following influenza infection. Because the infectious cycle of flu is dependent upon the properties of the host, we examined the proteome of alveol...
Preprint
A dysfunctional response to inhaled pathogens and toxins drives a substantial portion of the susceptibility to acute and chronic lung disease in the elderly. We used transcriptomic profiling combined with genetic lineage tracing, heterochronic adoptive transfer, parabiosis and treatment with metformin to show that the lung microenvironment defines...
Preprint
Full-text available
Aging manifests itself through a decline in organismal homeostasis and a multitude of cellular and physiological functions. Efforts to identify a common basis for vertebrate aging face many challenges; for example, while there have been documented changes in the expression of many hundreds of mRNAs, the results across tissues and species have been...
Article
Full-text available
In this Formal Comment, the authors of the recent publication "Large-scale investigation of the reasons why potentially important genes are ignored" maintain that it can be read as an opportunity to explore the unknown.
Data
Publications reporting the discovery of new genes preferentially cite model organisms. (A) As Fig 2D, but for individual years during the 1980s and 1990s, the decades in which most human genes were discovered. Also see S5D Fig (S1 Data). (B) Fraction of nonhuman organisms cited by initial publications of human genes. Enrichment represents log2 ratio...
Data
Health research funding correlates with the number of publications. (A) The number of grants for genes as a function of the number of publications on a gene. (B) Correlation between the attention of NIH-sponsored research publications and the amount of allocated NIH budget on individual genes (dots). The latter is approximated by equal allocation o...
Data
Career rewards disfavor novelty. (A) Career prospects of junior scientists correlate with the preceding attention directed towards genes: probability to transition to principal investigator (PI) status for authors of publications, according to the median attention of the genes in these publications. If, in the preceding years, this attention fell i...
Data
Attention in publications closely tracks number of publications. Fractional counting, in which the occurrence of a gene in a publication counts as 1/(number of genes in publication), versus normal counting, in which the occurrence of a gene in a publication counts as 1, of publications with multiple genes (S1 Data). (TIF)
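The two counting schemes in this caption can be illustrated with a minimal sketch. The function name `count_attention`, the input layout (one list of gene symbols per publication), and the toy data are assumptions made for illustration; they are not taken from the original analysis.

```python
from collections import defaultdict


def count_attention(publications):
    """Return (normal, fractional) counts per gene.

    `publications` is an iterable of gene-symbol lists, one list per
    publication (a hypothetical input format chosen for this sketch).
    """
    normal = defaultdict(float)
    fractional = defaultdict(float)
    for genes in publications:
        unique_genes = set(genes)  # count each gene once per publication
        if not unique_genes:
            continue
        for gene in unique_genes:
            # Normal counting: every occurrence counts as 1.
            normal[gene] += 1.0
            # Fractional counting: 1 / (number of genes in publication).
            fractional[gene] += 1.0 / len(unique_genes)
    return dict(normal), dict(fractional)


# Toy data (hypothetical): three publications mentioning 1, 2, and 4 genes.
toy_publications = [
    ["TP53"],
    ["TP53", "BRCA1"],
    ["TP53", "BRCA1", "EGFR", "MYC"],
]
normal_counts, fractional_counts = count_attention(toy_publications)
```

Under fractional counting, a publication that lists many genes contributes little to each of them, so the fractional counts for one publication always sum to 1.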
Data
Study of homologous genes predicts study of human genes. (A) Prediction of the number of research publications using the model of Fig 1A, extended to include the year of the initial publications on homologous nonhuman genes (S1 Data). (B) Number of publications for individual genes conditioned on the existence of homologous genes in nonhuman model...
Data
Decrease in the fraction of scientists working on model organisms. Fraction of scientists who—within the indicated year—publish exclusively on nonhuman genes (or gene products) or exclusively on human genes (or gene products), or both. The fraction of scientists who exclusively published on human genes had been stable in the 1980s and 1990s, while...
Data
Mapping of PubMed IDs to Web of Science IDs. Mapping of PubMed IDs to Web of Science IDs for publications linked to genes. (XLSX)
Data
Accessible important genes. List of genes that have strong loss-of-function sensitivity and GWAS associations, experimental approachability, and the presence of invertebrate model organisms for genes in 15-dimensional feature space. GWAS, genome-wide association study. (XLSX)
Data
Comparison of feature importance for prediction of the year of initial publication and the total number of publications. Median importance of features over 500 independent randomizations of the models for predicting the number of publications and the year of their discovery. (XLSX)
Data
List of genes with an incomplete catalog of features. NCBI gene identifiers (Entrez genes), NCBI gene symbols, and Ensembl Gene IDs are provided. NCBI, National Center for Biotechnology Information. (XLSX)
Article
Full-text available
Author summary: Biomedical research is one of the largest areas of present-day science and embeds the hope and potential to improve the lives of the general public. In order to understand how individual scientists choose individual research questions, we study why certain genes are well studied but others are not. While it has been previously observ...
Data
Extreme inequality in the research attention given to human protein-coding genes. (A) Frequency of the number of research publications associated with human protein-coding genes in MEDLINE. Black line shows a log-normal fit to the data (S1 Data). (B) Human-curated GO annotations for individual genes, binned by number of publications. Upper limit of...
Data
Accessible important genes that are studied less than expected. Genes with characteristics that have occurred in fewer publications than predicted by models of Fig 1A and carry the three favorable strategic properties described in Fig 4E (strong loss-of-function sensitivity and GWAS associations, experimental approachability, and the presence of in...
Data
Catalog of absence of features. (A) Hamming-clustering of genes according to absence of features (S1 Data). (B) Number of research publications for genes with and without complete catalog of features. (TIF)
Data
Physical, chemical, and biological features of genes predict the number of publications. (A) Ward-clustering of feature importance of 500 gradient boosting regression models. Numbers in brackets indicate order of features in heatmaps in Fig 1B. (B) Prediction of the number of publications for the 12,948 genes with a complete catalog of features usi...
Data
Nearby accessible important genes that are studied less than expected. Closest gene of S8 Table for every other gene in the 15-dimensional feature space in Fig 1B. (XLSX)
Data
Predictability of research effort. (A) Cumulative share of publications in MEDLINE covered by the fraction of most common genes in decreasing order (S1 Data). (B) Gini coefficient (a measure of inequality) for genes in publications over time. When looking at income or wealth, Gini coefficients of 0.6 are considered extreme (S1 Data). (C) Correlatio...
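For readers unfamiliar with the Gini coefficient mentioned in panel (B), the following is a minimal sketch of one standard way to compute it from per-gene publication counts. The function name `gini` and the toy inputs are assumptions for illustration; the caption does not specify which estimator the authors used.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality).

    Uses the sorted-rank formula
        G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    with x sorted in ascending order and ranks i starting at 1.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention chosen for this sketch
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Toy data (hypothetical publication counts per gene).
equal_attention = gini([5, 5, 5, 5])    # every gene studied equally
skewed_attention = gini([0, 0, 0, 10])  # one gene receives all publications
```

With this estimator, the maximum value for n genes is (n - 1) / n, reached when a single gene receives every publication.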
Data
Physical, chemical, and biological features mapped to individual genes. z-score of individual features for genes in the tSNE mapping of Fig 1. Numbers in brackets indicate order of features in heatmaps in Fig 1 (S1 Data). tSNE, t-distributed stochastic neighbor embedding. (TIF)
Data
What we know about poorly studied genes. (A) Distribution of the attention (measured by fractional publications) in publications given to genes. Genes with attention levels below 1 are denoted unstudied (blue), whereas genes with attention levels above 1 are denoted studied (orange). (B) Percentage of genes with indicated characteristic. (C) As B,...
Data
Large-scale studies are a reference for many other publications. (A) Kernel-density estimation of the fraction of genes with a given number of publications versus the median number of genes co-occurring in the respective publications. The observed pattern is consistent with the notions of “small science” and “big science” (S1 Data). (B) Median perc...
Data
Gene-specific context for further exploration of genes. Gene-specific information to facilitate further experimentation. Tissue and cell line with highest RNA expression (“highest tissue,” “highest cells”); flag indicating whether frequently differentially expressed in EBI-GXA (https://www.ebi.ac.uk/gxa); flag indicating whether frequently reported...
Data
Map of the 15-dimensional space. Coordinates of genes in Fig 1B. In addition, the inferred number of publications, NCBI gene symbols, and Ensembl Gene IDs are provided. NCBI, National Center for Biotechnology Information. (XLSX)
Data
Literature survey of genes with increased attention between 2011 and 2015. Enrichment in publications per gene between 2011 and 2015 over the time until 2010. The count of publications until 2010 has been normalized such that the total number of publications matches the time between 2011 and 2015. (XLSX)
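The normalization described in this caption can be sketched as follows. This is an illustration under stated assumptions: the function name `enrichment_by_gene`, the dictionary inputs, and the use of a plain ratio (rather than, say, a log ratio) are hypothetical choices, since the caption does not give the exact formula.

```python
def enrichment_by_gene(early_counts, recent_counts):
    """Ratio of recent to normalized early publication counts per gene.

    Early counts (until 2010) are rescaled so that their total equals the
    total of the recent counts (2011 to 2015). Genes without early
    publications are skipped because the ratio is undefined for them.
    """
    total_early = sum(early_counts.values())
    total_recent = sum(recent_counts.values())
    if total_early == 0:
        return {}
    scale = total_recent / total_early  # makes both periods comparable
    enrichment = {}
    for gene, early in early_counts.items():
        normalized_early = early * scale
        if normalized_early > 0:
            enrichment[gene] = recent_counts.get(gene, 0) / normalized_early
    return enrichment


# Toy data (hypothetical): gene B gains attention relative to gene A.
early = {"A": 80, "B": 20}   # publications until 2010
recent = {"A": 10, "B": 10}  # publications between 2011 and 2015
ratios = enrichment_by_gene(early, recent)
```

A ratio above 1 marks a gene whose share of publications grew in the later period; a ratio below 1 marks a shrinking share.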
Data
Fraction of unstudied homologs. Number and fraction of unstudied homologs of unstudied human genes for different taxa. Unstudied genes were defined as in S12 Fig, marking genes that have not been covered by the research effort corresponding to a single single-gene study. (XLSX)

Citations

... Lasers are widely used for the detection of microorganisms because of high-intensity and monochromatic features. Various light-scattering theories, including Rayleigh theory, Mie scattering, and Rayleigh-Gans theory, have been applied to predict homogeneous particles [82][83][84][85][86]. Modern devices based on light-scattering techniques are designed based on mathematical and physics-related models. ...
... 1,2 Fraudulent and otherwise compromised papers are not just a drain on publisher and editorial office resources, but contaminate the scientific record, undermine legitimate research, and erode trust in that research (and the publishers). [3][4][5][6][7] However, we are not at the point where we click a button, or run a script, and the machines spit out an answer. False alarms, moral and ethical complexity, missed problems, and good old human ingenuity (to attempt to circumvent checks) make for a bumpy road to full automation. ...
... Human gene research is commonly biased toward known pathogenic genes and pathways that are fairly well established (Stoeger & Amaral, 2022). However, given the development of novel data-driven tools, it is now possible to move beyond the targets identified in previous articles and into the realm of big data. ...
... Some forms of error detection performed manually should benefit from automation. We have reported how misidentified cell lines 3 were misused in selected biomedical publications (Park et al., 2021). Biological research using (or citing papers using) misidentified materials is not only a waste of time and resources but also a risk for health. ...
... The importance of AMs in controlling IAV infection is exhibited by the rapid weight loss, increased tissue damage, and poor survival in IAV-infected murine models of genetic AM deficiency or pharmacological AM depletion 8,9 . Adoptive transfer experiments in which AMs isolated from young mice are transferred into aged mice and vice versa show that the aging-associated transcriptomic differences in AMs before infection are driven by the local aged lung microenvironment 10 . However, the specific factors within the aged lung that contribute to the aging-associated defects of AMs remain unknown, thereby precluding our ability to target the underlying signals. ...
... Although macrophages are an important component of innate immunity, they are also associated with adaptive immunity. A report found a positive feedback loop between SARS-CoV-2 containing macrophages and activated T cells that promote inflammation and subsequent injury [123]. The study noted that the virus first infects and replicates in the nasopharynx cells because of high levels of ACE2. ...
... Although the identified variants may be common in the population and their allelic variability confers only a small but detectable effect on the disease phenotype, they can underline the key molecular pathways implicated in severe disease. Stoeger and Amaral, revising candidate genes in COVID-19 publications, flag that research into human protein-coding genes is disproportionately skewed towards a comparably small set of genes, and that genes that are identified by genome-wide datasets, and hence likely to have biological significance in the context of COVID-19, are at a risk of remaining ignored by researchers (Stoeger and Amaral 2020). ...
... Yet, there is a relative paucity of studies dedicated to the examination of proteostasis in the lung during healthy aging compared to other tissue types. A recent proteomic analysis of AT2 cells isolated from young and old mice revealed maladaptive collapse of the proteostasis network with age and an important role for the co-chaperone adaptive response (CARE) network in handling chronic misfolded proteins in the aging lung (Loguercio et al., 2019), paving the way for further investigation into manipulating proteostasis pathways to rejuvenate pulmonary health. ...
... When mice were treated with bleomycin every 2 months and the profile of InfResMacs was analysed 3 weeks after the first and second bleomycin treatments 60 , the monocyte-derived AMs that had developed during the first bleomycin injury responded similarly to the second bleomycin injury as the ResAMs that were present in the tissue before the first injury. This indicates that the 2 months between the bleomycin treatments was sufficient to convert the InfResMacs that developed during the first bleomycin injury into ResMacs. However, the monocytes recruited to the lung during the second injury gave rise to InfResMacs with a stronger pro-fibrotic profile than the monocytes that gave rise to InfResMacs during the first bleomycin injury 60 . This suggests that inflammatory memory is not linked to ontogeny itself and that a return to homeostasis for 2 months is sufficient to erase most of the inflammatory signature, but that the environmental cues imprinting the InfResMacs profile during the second injury must be stronger. ...
... Consistent with previous reports, R-loops were associated with known genomic hotspots such as gene termini and specific genic features such as high GC content, gene length, and expression level. The aging transcriptomes of multiple organisms and cell types show a positive correlation between transcriptional downregulation and specific genic features, such as gene length and GC content (Stoeger et al., 2019), and R-loops are known to play a key physiological role in transcription regulation due to their presence at promoters and terminators, where they regulate transcription initiation and termination, respectively (Niehrs & Luke, 2020). Thus, R-loop accumulation over these genomic regions may be a conserved mechanism that contributes to gene expression regulation in multiple cell types, including neurons. ...