Gustavo Stolovitzky
IBM · Computational Biology Center

PhD

About

301
Publications
61,577
Reads
24,047
Citations
Publications (301)
Article
Full-text available
From eukaryotes to prokaryotes, all cells secrete extracellular vesicles (EVs) as part of their regular homeostasis, intercellular communication, and cargo disposal. Accumulating evidence suggests that small EVs carry functional small RNAs, potentially serving as extracellular messengers and liquid‐biopsy markers. Yet, the complete transcriptomic l...
Article
Motivation: The integration of vast, complex biological data with computational models offers profound insights and predictive accuracy. Yet, such models face challenges: poor generalization and limited labeled data. Results: To overcome these difficulties in binary classification tasks, we developed the Method for Optimal Classification by Aggregat...
Article
Characterizing the effect of combination therapies is vital for treating diseases like cancer. We introduce correlated drug action (CDA), a baseline model for the study of drug combinations in both cell cultures and patient populations, which assumes that the efficacy of drugs in a combination may be correlated. We apply temporal CDA (tCDA) to clin...
Article
Every year, 11% of infants are born preterm with significant health consequences, with the vaginal microbiome a risk factor for preterm birth. We crowdsource models to predict (1) preterm birth (PTB; <37 weeks) or (2) early preterm birth (ePTB; <32 weeks) from 9 vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals,...
Preprint
Full-text available
Globally, every year about 11% of infants are born preterm, defined as a birth prior to 37 weeks of gestation, with significant and lingering health consequences. Multiple studies have related the vaginal microbiome to preterm birth. We present a crowdsourcing approach to predict: (a) preterm or (b) early preterm birth from 9 publicly available vag...
Article
Full-text available
Importance: An automated, accurate method is needed for unbiased assessment that quantifies the accrual of joint space narrowing and erosions on radiographic images of the hands, wrists, and feet, for use in clinical trials, in monitoring joint damage over time, and in assisting rheumatologists with treatment decisions. Such a method has the potential to be directly...
Chapter
Systems biology is predicated on the use of model-based, genome-anchored methodologies to elucidate biological phenotypes and mechanisms. Models belong to one of two classes: first-principle, mathematical models aimed at representing the underlying biology that determines cell behavior—such as the interaction between two proteins, based on their st...
Article
Full-text available
We now know RNA can survive the harsh environment of biofluids when encapsulated in vesicles or by associating with lipoproteins or RNA binding proteins. These extracellular RNA (exRNA) play a role in intercellular signaling, serve as biomarkers of disease, and form the basis of new strategies for disease treatment. The Extracellular RNA Communicat...
Article
Full-text available
Despite decades of intensive search for compounds that modulate the activity of particular protein targets, a large proportion of the human kinome remains as yet undrugged. Effective approaches are therefore required to map the massive space of unexplored compound–kinase interactions for novel and potent activities. Here, we carry out a crowdsource...
Preprint
Full-text available
To develop machine learning methods to quantify joint damage in patients with rheumatoid arthritis (RA), we developed the RA2 DREAM Challenge, a crowdsourced competition that utilized existing radiographic images and "gold-standard" scores on 674 sets of films from 562 patients. Training and leaderboard sets were provided to participants to develop...
Preprint
Full-text available
Circulating extracellular vesicles (EVs) contain molecular footprints from their cell of origin and may provide potential non-invasive access for detection, characterization, and monitoring of numerous diseases. Despite their growing promise, the integrated proteo-transcriptomic landscape of EVs and their donor cells remains poorly understood. To as...
Article
Full-text available
Significance: While it would be desirable that the output of binary classification algorithms be the probability that the classification is correct, most algorithms do not provide a method to calculate such a probability. We propose a probabilistic output for binary classifiers based on an unexpected mapping of the probability of correct classificat...
Article
Objective: Surveillance tools for the early detection of cancers, including hepatocellular carcinoma (HCC), are suboptimal, and biomarkers are urgently needed. Extracellular vesicles (EVs) have gained increasing scientific interest due to their involvement in tumour initiation and metastasis; however, most extracellular RNA (exRNA) blood-based biomarker stud...
Article
Full-text available
Identification of pregnancies at risk of preterm birth (PTB), the leading cause of newborn deaths, remains challenging given the syndromic nature of the disease. We report a longitudinal multi-omics study coupled with a DREAM challenge to develop predictive models of PTB. The findings indicate that whole-blood gene expression predicts ultrasound-ba...
Article
Full-text available
The accurate identification and quantitation of RNA isoforms present in the cancer transcriptome is key for analyses ranging from the inference of the impacts of somatic variants to pathway analysis to biomarker development and subtype discovery. The ICGC-TCGA DREAM Somatic Mutation Calling in RNA (SMC-RNA) challenge was a crowd-sourced effort to b...
Article
Full-text available
Nanoscale deterministic lateral displacement (nanoDLD) has emerged as an effective method for separating nanoscopic colloids for applications in molecular biology, yet present limits in throughput, purification, on‐chip filtration, and workflow restricting its adoption as a practical separation technology. To overcome these impediments, array scali...
Article
Full-text available
Background: Assistive automatic seizure detection can empower human annotators to shorten patient monitoring data review times. We present a proof-of-concept for a seizure detection system that is sensitive, automated, patient-specific, and tunable to maximise sensitivity while minimizing human annotation times. The system uses custom data preparati...
Article
Full-text available
Our ability to discover effective drug combinations is limited, in part by insufficient understanding of how the transcriptional response of two monotherapies results in that of their combination. We analyzed matched time course RNAseq profiling of cells treated with single drugs and their combinations and found that the transcriptional signature o...
Article
The advent of microfluidics in the 1990s promised a revolution in multiple industries from healthcare to chemical processing. Deterministic lateral displacement (DLD) is a continuous-flow microfluidic particle separation method discovered in 2004 that has been applied successfully and widely to the separation of blood cells, yeast, spores, bacter...
Conference Paper
Introduction: There is limited understanding of epigenetic intra-tumoral heterogeneity (ITH) in hepatocellular carcinoma (HCC) and its potential role in driving cancer evolution. We aimed at deciphering the contribution of DNA methylation to molecular heterogeneity in HCC and identifying novel epi-drivers of cancer evolution. Methods: We conducted...
Article
Full-text available
Cancer is driven by genomic alterations, but the processes causing this disease are largely performed by proteins. However, proteins are harder and more expensive to measure than genes and transcripts. To catalyze developments of methods to infer protein levels from other omics measurements, we leveraged crowdsourcing via the NCI-CPTAC DREAM proteo...
Preprint
Full-text available
Identification of pregnancies at risk of preterm birth (PTB), the leading cause of newborn deaths, remains challenging given the syndromic nature of the disease. We report a longitudinal multi-omics study coupled with a DREAM challenge to develop predictive models of PTB. We found that whole blood gene expression predicts ultrasound-based gestation...
Article
Full-text available
Importance: Mammography screening currently relies on subjective human interpretation. Artificial intelligence (AI) advances could be used to increase mammography screening accuracy by reducing missed cancers and false positives. Objective: To evaluate whether AI can overcome human mammography interpretation limitations with a rigorous, unbiased eval...
Article
Full-text available
Importance: Mammography screening currently relies on subjective human interpretation. Artificial intelligence (AI) advances could be used to increase mammography screening accuracy by reducing missed cancers and false positives. Objective: To evaluate whether AI can overcome human mammography interpretation limitations with a rigorous, unbiased eva...
Article
Full-text available
Clonal evolution of a tumor ecosystem depends on different selection pressures that are principally immune and treatment mediated. We integrate RNA-seq, DNA sequencing, TCR-seq and SNP array data across multiple regions of liver cancer specimens to map spatio-temporal interactions between cancer and immune cells. We investigate how these interactio...
Article
Full-text available
Challenges are achieving broad acceptance for addressing many biomedical questions and enabling tool assessment. But ensuring that the methods evaluated are reproducible and reusable is complicated by the diversity of software architectures, input and output file formats, and computing environments. To mitigate these problems, some challen...
Preprint
Full-text available
Our ability to predict the effects of drug combinations is limited, in part by limited understanding of how the transcriptional response of two monotherapies results in that of their combination. We performed the first analysis of matched time course RNAseq profiling of cells treated with both single drugs and their combinations. The transcriptiona...
Preprint
Full-text available
Single-cell RNA-seq technologies are rapidly evolving, but while they are very informative, standard scRNAseq experiments lose the spatial organization of the cells in the tissue of origin. Conversely, spatial RNA-seq technologies designed to keep the localization of the cells have limited throughput and gene coverage. Mapping scRNAseq to genes with sp...
Article
Full-text available
Biological and regulatory mechanisms underlying many multi-gene expression-based disease biomarkers are often not readily evident. We describe an innovative framework, NeTFactor, that combines network analyses with gene expression data to identify transcription factors (TFs) that significantly and maximally regulate such a biomarker. NeTFactor uses...
Conference Paper
Hepatocellular carcinoma (HCC) is the tumor with the highest increase in incidence and mortality in the last 20 years. Exosomes are nano-sized particles loaded with nucleic acids and proteins involved in cell-to-cell communication. The aims of this study are: 1) to establish a method to isolate exosomes from plasma of HCC patients, and 2) to detect...
Article
Full-text available
The effectiveness of most cancer targeted therapies is short-lived. Tumors often develop resistance that might be overcome with drug combinations. However, the number of possible combinations is vast, necessitating data-driven approaches to find optimal patient-specific treatments. Here we report AstraZeneca's large drug combination dataset, consis...
Article
We studied the trajectories of polymers being advected while diffusing in a pressure driven flow along a periodic pillar nanostructure known as nanoscale deterministic lateral displacement (nanoDLD) array. We found that polymers follow different trajectories depending on their length, flow velocity and pillar array geometry, demonstrating that nano...
Article
Full-text available
To develop a map of cell-cell communication mediated by extracellular RNA (exRNA), the NIH Extracellular RNA Communication Consortium created the exRNA Atlas resource (https://exrna-atlas.org). The Atlas version 4P1 hosts 5,309 exRNA-seq and exRNA qPCR profiles from 19 studies and a suite of analysis and visualization tools. To analyze variation be...
Article
Full-text available
Individual cells in clonal populations often respond differently to environmental changes; for binary phenotypes, such as cell death, this can be measured as a fractional response. These types of responses have been attributed to cell-intrinsic stochastic processes and variable abundances of biochemical constituents, such as proteins, but the influ...
Preprint
Full-text available
Identification of modules in molecular networks is at the core of many current analysis methods in biomedical research. However, how well different approaches identify disease-relevant modules in different types of gene and protein networks remains poorly understood. We launched the “Disease Module Identification DREAM Challenge”, an open competiti...
Article
Full-text available
Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease where substantial heterogeneity in clinical presentation urgently requires a better stratification of patients for the development of drug trials and clinical care. In this study we explored stratification through a crowdsourcing approach, the DREAM Prize4Life ALS Stratificati...
Article
Full-text available
Background: The phenotypes of cancer cells are driven in part by somatic structural variants. Structural variants can initiate tumors, enhance their aggressiveness, and provide unique therapeutic opportunities. Whole-genome sequencing of tumors can allow exhaustive identification of the specific structural variants present in an individual cancer,...
Preprint
Full-text available
Background: The phenotypes of cancer cells are driven in part by somatic structural variants. Structural variants can initiate tumors, enhance their aggressiveness and provide unique therapeutic opportunities. Whole-genome sequencing of tumors can allow exhaustive identification of the specific structural variants present in an individual cancer, fa...
Article
Efficient isolation of circulating biomarkers is key for enabling liquid biopsies for cancer diagnosis and prognosis. Circulating biomarkers such as exosomes and cell-free DNA hold promise for non-invasive early cancer diagnosis, determining cancer stage and prognosis, as well as monitoring treatment progression and the development of drug resistan...
Article
Background and aims: Clonal evolution of a tumor ecosystem depends not only on somatic mutations driving uncontrolled growth, but on a full array of selection pressures, principally immune and resource mediated. We aimed at mapping the spatio-temporal interactions between cancer and immune cells in hepatocellular carcinoma (HCC) by quantifying regi...
Article
Fractional killing of a population of cancer cells is often linked to cell-to-cell variability due to stochastic gene expression and fluctuations in protein levels. We found that mitochondria abundance is a major source of cell-to-cell variability that determines the fraction of cells that live or die in response to TNF-related apoptosis-inducing l...
Preprint
Full-text available
Individual cells in clonal populations often respond differently to environmental changes; for binary phenotypes, such as cell death, this can be measured as a fractional response. These types of responses have been attributed to cell-intrinsic stochastic processes and variable abundances of biochemical constituents, such as proteins, but the influ...
Preprint
Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease with substantial heterogeneity in clinical presentation and an urgent need for better stratification tools for clinical development and care. In this study we used a crowdsourcing approach to address the problem of ALS patient stratification. The DREAM Prize4Life ALS Stratifi...
Article
Full-text available
Identification of modules in molecular networks is at the core of many current analysis methods in biomedical research. However, how well different approaches identify disease-relevant modules in different types of gene and protein networks remains poorly understood. We launched the “Disease Module Identification DREAM Challenge”, an open competiti...
Article
Learning algorithms that aggregate predictions from an ensemble of diverse base classifiers consistently outperform individual methods. Many of these strategies have been developed in a supervised setting, where the accuracy of each base classifier can be empirically measured and this information is incorporated in the training process. However, th...
Preprint
Full-text available
The effectiveness of most cancer targeted therapies is short-lived since tumors evolve and develop resistance. Combinations of drugs offer the potential to overcome resistance; however, the number of possible combinations is vast, necessitating data-driven approaches to find optimal treatments tailored to a patient’s tumor. AstraZeneca carried out 11...
Article
We characterize the transcriptional splicing landscape of a prostate cancer cell line treated with a previously identified synergistic drug combination. We use a combination of third generation long-read RNA sequencing technology and short-read RNAseq to create a high-fidelity map of expressed isoforms and fusions to quantify splicing events trigge...
Article
Full-text available
Purpose: Docetaxel has a demonstrated survival benefit for patients with metastatic castration-resistant prostate cancer (mCRPC); however, 10% to 20% of patients discontinue docetaxel prematurely because of toxicity-induced adverse events, and the management of risk factors for toxicity remains a challenge. Patients and methods: The comparator a...
Article
We report the results of a DREAM challenge designed to predict relative genetic essentialities based on a novel dataset testing 98,000 shRNAs against 149 molecularly characterized cancer cell lines. We analyzed the results of over 3,000 submissions over a period of 4 months. We found that algorithms combining essentiality data across multiple genes...
Article
Full-text available
Significance: Deterministic lateral displacement (DLD) is a technique for size fractionation of particles in continuous flow that has shown great potential for biological and clinical applications. Several theoretical models have been proposed to explain the trajectories of different-sized particles in relation to the geometry of the pillar array, b...
Article
Full-text available
How will this molecule smell? We still do not understand what a given substance will smell like. Keller et al. launched an international crowd-sourced competition in which many teams tried to solve how the smell of a molecule will be perceived by humans. The teams were given access to a database of responses from subjects who had sniffed a large nu...
Article
Full-text available
Wafer-scale fabrication of complex nanofluidic systems with integrated electronics is essential to realizing ubiquitous, compact, reliable, high-sensitivity and low-cost biomolecular sensors. Here we report a scalable fabrication strategy capable of producing nanofluidic chips with complex designs and down to single-digit nanometre dimensions over...
Data
Supplementary Figures 1-14, Supplementary Notes 1-6 and Supplementary References
Article
Background: Improvements to prognostic models in metastatic castration-resistant prostate cancer have the potential to augment clinical trial design and guide treatment strategies. In partnership with Project Data Sphere, a not-for-profit initiative allowing data from cancer clinical trials to be shared broadly with researchers, we designed an ope...
Preprint
Full-text available
Despite 25 years of progress in understanding the molecular mechanisms of olfaction, it is still not possible to predict whether a given molecule will have a perceived odor, or what olfactory percept it will produce. To address this stimulus-percept problem for olfaction, we organized the crowd-sourced DREAM Olfaction Prediction Challenge. Working...
Article
Full-text available
Nature Communications 7: Article number: 12460; 10.1038/ncomms12460 (2016); Published: 23 August 2016; Updated: 10 October 2016. The HTML version of this Article incorrectly duplicated the authors S.
Article
Full-text available
Rheumatoid arthritis (RA) affects millions worldwide. While anti-TNF treatment is widely used to reduce disease progression, treatment fails in approximately one-third of patients. No biomarker currently exists that identifies non-responders before treatment. A rigorous community-based assessment of the utility of SNP data for predicting anti-TNF treatment eff...
Data
Supplementary Figures 1-6, Supplementary Tables 1-4, Supplementary Note 1 and Supplementary References
Article
Deterministic lateral displacement (DLD) pillar arrays are an efficient technology to sort, separate and enrich micrometre-scale particles, which include parasites, bacteria, blood cells and circulating tumour cells in blood. However, this technology has not been translated to the true nanoscale, where it could function on biocolloids, such as exos...
Article
Full-text available
The generation of large-scale biomedical data is creating unprecedented opportunities for basic and translational science. Typically, the data producers perform initial analyses, but it is very likely that the most informative methods may reside with other groups. Crowdsourcing the analysis of complex and massive data has emerged as a framework to...