Elias Chaibub Neto, Sage Bionetworks
PhD
114 Publications
26,964 Reads
4,426 Citations
Additional affiliations
July 2011 - present
Education
August 2004 - April 2010
Publications (114)
Healthcare researchers are increasingly utilizing smartphone sensor data as a scalable and cost-effective approach to studying individualized health-related behaviors in real-world settings. However, to develop reliable and robust digital behavioral signatures that may help in the early prediction of the individualized disease trajectory and future...
Objective
Psoriatic disease remains underdiagnosed and undertreated. We developed and validated a suite of novel, smartphone sensor-based assessments that can be self-administered to measure cutaneous and musculoskeletal signs and symptoms of psoriatic disease.
Methods
Participants with psoriasis, psoriatic arthritis, or healthy controls were rec...
Non-parametric two-sample tests based on energy distance or maximum mean discrepancy are widely used statistical tests for comparing multivariate data from two populations. While these tests enjoy desirable statistical properties, their test statistics can be expensive to compute as they require the computation of 3 distinct Euclidean distance (or...
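As a rough illustration of why these statistics are costly, the sketch below computes a plain quadratic-time V-statistic estimate of the energy distance, which needs one between-sample and two within-sample sets of pairwise Euclidean distances, and wraps it in a permutation test. It is a minimal sketch, not the method proposed in the paper; the function names, the permutation count, and the seed are assumptions chosen for the example.

```python
import math
import random


def mean_pairwise_distance(a, b):
    """Average Euclidean distance over all pairs (one point from a, one from b)."""
    total = sum(math.dist(p, q) for p in a for q in b)
    return total / (len(a) * len(b))


def energy_distance(x, y):
    """V-statistic estimate: 2*mean|x-y| - mean|x-x'| - mean|y-y'|.

    Three sets of pairwise distances are needed: x vs y, x vs x, y vs y.
    """
    return (2.0 * mean_pairwise_distance(x, y)
            - mean_pairwise_distance(x, x)
            - mean_pairwise_distance(y, y))


def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Permutation p-value: reshuffle the pooled sample and recompute the statistic."""
    rng = random.Random(seed)
    observed = energy_distance(x, y)
    pooled = list(x) + list(y)
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if energy_distance(pooled[:n], pooled[n:]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

Every permutation recomputes all pairwise distances here, so the cost grows quadratically with the pooled sample size and linearly with the number of permutations.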
Healthcare researchers are increasingly utilizing smartphone sensor data as a scalable and cost-effective approach to studying individualized health-related behaviors in real-world settings. However, to develop reliable and robust digital behavioral signatures that may help predict disease trajectory early and future prognosis, there is a critical...
Background
The two-way partial AUC has been recently proposed as a way to directly quantify partial area under the ROC curve with simultaneous restrictions on the sensitivity and specificity ranges of diagnostic tests or classifiers. The metric, as originally implemented in the tpAUC R package, is estimated using a nonparametric estimator based on...
In the field of statistical disclosure control, the tradeoff between data confidentiality and data utility is measured by comparing disclosure risk and information loss metrics. Distance based metrics such as the mean absolute error (MAE), mean squared error (MSE), mean variation (IL1), and its scaled alternative (IL1s) are popular information loss...
We propose a counterfactual approach to train “causality-aware” predictive models that are able to leverage causal information in static anticausal machine learning tasks (i.e., prediction tasks where the outcome influences the inputs). In applications plagued by confounding, the approach can be used to generate predictions that are free from the i...
Ideally, a patient’s response to medication can be monitored by measuring changes in performance of some activity. In observational studies, however, any detected association between treatment (“on-medication” vs “off-medication”) and the outcome (performance in the activity) might be due to confounders. In particular, causal inferences at the pers...
Background
Psoriasis and psoriatic arthritis are common immune-mediated inflammatory conditions that primarily affect the skin, joints and entheses and can lead to significant disability and worsening quality of life. Although early recognition and treatment can prevent the development of permanent damage, psoriatic disease remains underdiagnosed a...
Remote health assessments that gather real-world data (RWD) outside clinic settings require a clear understanding of appropriate methods for data collection, quality assessment, analysis and interpretation. Here we examine the performance and limitations of smartphones in collecting RWD in the remote mPower observational study of Parkinson’s diseas...
Consumer wearables and sensors are a rich source of data about patients’ daily disease and symptom burden, particularly in the case of movement disorders like Parkinson’s disease (PD). However, interpreting these complex data into so-called digital biomarkers requires complicated analytical approaches, and validating these biomarkers requires suffi...
Linear residualization is a common practice for confounding adjustment in machine learning (ML) applications. Recently, causality-aware predictive modeling has been proposed as an alternative causality-inspired approach for adjusting for confounders. The basic idea is to simulate counterfactual data that is free from the spurious associations gener...
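For readers unfamiliar with the term, linear residualization can be sketched as follows: regress a feature on the confounder by ordinary least squares and keep the residuals as the adjusted feature. This is a minimal single-confounder sketch, not the paper's procedure or its causality-aware alternative; the function name `residualize` and the toy variables in the usage note are hypothetical.

```python
def residualize(feature, confounder):
    """Return residuals of a simple least-squares fit of feature on confounder."""
    n = len(feature)
    mean_c = sum(confounder) / n
    mean_f = sum(feature) / n
    # Closed-form OLS slope and intercept for one predictor.
    sxx = sum((c - mean_c) ** 2 for c in confounder)
    sxy = sum((c - mean_c) * (f - mean_f) for c, f in zip(confounder, feature))
    slope = sxy / sxx
    intercept = mean_f - slope * mean_c
    # Residuals are the part of the feature not linearly explained by the confounder.
    return [f - (intercept + slope * c) for f, c in zip(feature, confounder)]
```

Calling `residualize(feature, age)` on a toy feature and an age vector yields values that are linearly uncorrelated with age in that sample; any nonlinear dependence on the confounder is left untouched.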
In health related machine learning applications, the training data often corresponds to a non-representative sample from the target populations where the learners will be deployed. In anticausal prediction tasks, selection biases often make the associations between confounders and the outcome variable unstable across different target environments....
There are many approaches to maintaining wellness, ranging from taking a simple vacation to attending highly structured wellness retreats, which typically regulate the attendee's personal time and activities. In a healthy English-speaking cohort of 112 women and men (aged 30–80 years), this study examined the effects of participating in either a 6-day...
While the past decade has seen meaningful improvements in clinical outcomes for multiple myeloma patients, a subset of patients does not benefit from current therapeutics for unclear reasons. Many gene expression-based models of risk have been developed, but each model uses a different combination of genes and often involves assaying many genes mak...
Causal modeling has been recognized as a potential solution to many challenging problems in machine learning (ML). While counterfactual thinking has been leveraged in ML tasks that aim to predict the consequences of actions/interventions, it has not yet been applied to more traditional/static supervised learning tasks, such as the prediction of lab...
IMPORTANCE Mammography screening currently relies on subjective human interpretation. Artificial intelligence (AI) advances could be used to increase mammography screening accuracy by reducing missed cancers and false positives. OBJECTIVE To evaluate whether AI can overcome human mammography interpretation limitations with a rigorous, unbiased eval...
Importance
Mammography screening currently relies on subjective human interpretation. Artificial intelligence (AI) advances could be used to increase mammography screening accuracy by reducing missed cancers and false positives.
Objective
To evaluate whether AI can overcome human mammography interpretation limitations with a rigorous, unbiased eva...
Digital technologies such as smartphones are transforming the way scientists conduct biomedical research using real-world data. Several remotely-conducted studies have recruited thousands of participants over a span of a few months. Unfortunately, these studies are hampered by substantial participant attrition, calling into question the representat...
Mobile health, the collection of data using wearables and sensors, is a rapidly growing field in health research with many applications. Deriving validated measures of disease and severity that can be used clinically or as outcome measures in clinical trials, referred to as digital biomarkers, has proven difficult. In part due to the complicated an...
While counterfactual thinking has been used in ML tasks that aim to predict the consequences of different actions, policies, and interventions, it has not yet been leveraged in more traditional/static supervised learning tasks, such as the prediction of discrete labels in classification tasks or continuous responses in regression problems. Here, we...
Collection of high-dimensional, longitudinal digital health data has the potential to support a wide-variety of research and clinical applications including diagnostics and longitudinal health tracking. Algorithms that process these data and inform digital diagnostics are typically developed using training and test sets generated from multiple repe...
Machine learning practice is often impacted by confounders. Confounding can be particularly severe in remote digital health studies where the participants self-select to enter the study. While many different confounding adjustment approaches have been proposed in the literature, most of these methods rely on modeling assumptions, and it is unclear...
While the past decade has seen meaningful improvements in clinical outcomes for multiple myeloma patients, a subset of patients does not benefit from current therapeutics for unclear reasons. Many gene expression-based models of risk have been developed, but each model uses a different combination of genes and often involves assaying many genes making...
Machine learning applications are often plagued with confounders that can impact the generalizability of the learners. In clinical settings, demographic characteristics often play the role of confounders. Confounding is especially problematic in remote digital health studies where the participants self-select to enter the study, thereby making it d...
The effectiveness of most cancer targeted therapies is short-lived. Tumors often develop resistance that might be overcome with drug combinations. However, the number of possible combinations is vast, necessitating data-driven approaches to find optimal patient-specific treatments. Here we report AstraZeneca's large drug combination dataset, consis...
Clinical machine learning applications are often plagued with confounders that can impact the generalizability and predictive performance of the learners. Confounding is especially problematic in remote digital health studies where the participants self-select to enter the study, thereby making it challenging to balance the demographic characterist...
Current clinimetrics assessment of Parkinson's disease (PD) is insensitive, episodic, subjective, and provider-centered. Ubiquitous technologies such as smartphones promise to fundamentally change PD assessments. To enable frequent remote assessment of PD tremor severity, here we present a 39-month smartphone research study in a real-world setting...
Clinical machine learning applications are often plagued with confounders that are clinically irrelevant, but can still artificially boost the predictive performance of the algorithms. Confounding is especially problematic in mobile health studies run "in the wild", where it is challenging to balance the demographic characteristics of participants...
The roles played by learning and memorization represent an important topic in deep learning research. Recent work on this subject has shown that the optimization behavior of DNNs trained on shuffled labels is qualitatively different from DNNs trained with real labels. Here, we propose a novel permutation approach that can differentiate memorization...
The effectiveness of most cancer targeted therapies is short-lived since tumors evolve and develop resistance. Combinations of drugs offer the potential to overcome resistance; however, the number of possible combinations is vast, necessitating data-driven approaches to find optimal treatments tailored to a patient’s tumor. AstraZeneca carried out 11...
Recently, Saeb et al (2017) showed that, in diagnostic machine learning applications, having data of each subject randomly assigned to both training and test sets (record-wise data split) can lead to massive underestimation of the cross-validation prediction error, due to the presence of "subject identity confounding" caused by the classifier's abi...
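The difference between the two splitting schemes can be illustrated with a small sketch. A record-wise split shuffles individual records, so the same subject usually appears in both training and test sets; a subject-wise split assigns whole subjects to one side. The record layout (dictionaries with a `subject` key), the function names, and the toy data are assumptions made for this example.

```python
import random


def record_wise_split(records, test_fraction=0.3, seed=0):
    """Shuffle individual records; a subject's records can land on both sides."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = int(round(test_fraction * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]


def subject_wise_split(records, test_fraction=0.3, seed=0):
    """Assign whole subjects to train or test, so no subject is shared."""
    rng = random.Random(seed)
    subjects = sorted({r["subject"] for r in records})
    rng.shuffle(subjects)
    n_test = max(1, int(round(test_fraction * len(subjects))))
    test_subjects = set(subjects[:n_test])
    train = [r for r in records if r["subject"] not in test_subjects]
    test = [r for r in records if r["subject"] in test_subjects]
    return train, test


def shared_subjects(train, test):
    """Subjects that contribute records to both sets."""
    return {r["subject"] for r in train} & {r["subject"] for r in test}


# Toy data: 10 subjects with 20 records each.
records = [{"subject": s, "record": k} for s in range(10) for k in range(20)]
rw_train, rw_test = record_wise_split(records)
sw_train, sw_test = subject_wise_split(records)
```

Inspecting `shared_subjects(rw_train, rw_test)` against `shared_subjects(sw_train, sw_test)` shows the overlap that lets a classifier exploit subject identity under the record-wise scheme.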
Purpose:
Docetaxel has a demonstrated survival benefit for patients with metastatic castration-resistant prostate cancer (mCRPC); however, 10% to 20% of patients discontinue docetaxel prematurely because of toxicity-induced adverse events, and the management of risk factors for toxicity remains a challenge.
Patients and methods:
The comparator a...
Mental health conditions are now amongst the top five burdensome diseases in the US. Disparities in access to services and health outcomes vary due to several factors including socioeconomic status, shortage of mental health professionals, stigma and the linguistic gap between providers and non-English-speaking minority populations. This study explo...
In this work we provide a couple of contributions to the analysis of longitudinal data collected by smartphones in mobile health applications. First, we propose a novel statistical approach to disentangle personalized treatment and "time-of-the-day" effects in observational studies. Under the assumption of no unmeasured confounders, we show how to...
Mindboggle (http://mindboggle.info) is an open source brain morphometry platform that takes in preprocessed T1-weighted MRI data and outputs volume, surface, and tabular data containing label, feature, and shape information for further analysis. In this article, we document the software and demonstrate its use in studies of shape variation in healt...
Mindboggle flowchart.
Nipype automatically generates a flow diagram of the processing steps when running Mindboggle.
(PDF)
Mindboggle output directory tree.
(PDF)
Tables of shape differences between scans and between hemispheres.
(PDF)
Variance components analysis of the shapes of 62 cortical regions in 101 human brains.
(PDF)
Background:
Improvements to prognostic models in metastatic castration-resistant prostate cancer have the potential to augment clinical trial design and guide treatment strategies. In partnership with Project Data Sphere, a not-for-profit initiative allowing data from cancer clinical trials to be shared broadly with researchers, we designed an ope...
Human genome-wide association studies (GWAS) have shown that genetic variation at >130 gene loci is associated with type 2 diabetes (T2D). We asked if the expression of the candidate T2D-associated genes within these loci is regulated by a common locus in pancreatic islets. Using an obese F2 mouse intercross segregating for T2D, we show that the ex...
Regional association plots for NFAT and fasting insulin in human GWAS.
Association (-log10 P-value) to fasting insulin levels for SNPs near NFATC1 (A) and NFATC2 (B). Plots were generated using LocusZoom [13] and data provided in [14]. Color scale shows correlation (r²) between the SNP with the strongest association within the plotted region (lead...
T2D GWAS islet eQTLs and plasma insulin QTL that are conditional on Nfatc2.
Heat maps show the linkage for plasma insulin and the islet eQTLs for Nfatc2 and 54 transcripts for genes identified in human GWAS that are associated with Type 2 Diabetes (T2D). Linkage data was obtained from an F2 intercross between diabetes resistant (B6) and diabetes-su...
Genotype dependence of T2D GWAS trans-eQTLs on Chr 2.
Expression of GWAS gene candidates in islets of 491 F2 mice. For each gene, mice are grouped according to genotype at the peak locus of the respective eQTL; homozygous B6 (B6:B6), heterozygous (B6:BTBR), or homozygous BTBR (BTBR:BTBR). The expression of 26 GWAS gene candidates increased (A) in r...
Genotype dependence of expression of Nfatc2 in pancreatic islets.
Expression of Nfatc2 in pancreatic islets of 491 F2 mice. Mice are grouped according to their genotype at ~168.4 Mb on Chr 2 (rs3024096), the marker position closest to the maximum LOD (~70) of the cis-eQTL for Nfatc2. At this position, mice were homozygous B6 (B6:B6, N = 127), heter...
Sequence comparison of mouse and human Nfatc1 and Nfatc2.
Amino acid sequences of mouse and human proteins for equivalent isoforms of Nfatc2 (A) and Nfatc1 (B) were aligned using Clustal Omega. For Nfatc2, we used isoforms A (NP_035029.2) and C (NP_775114.1) for mouse and human, respectively. For Nfatc1, we used isoforms 1 (NP_058071.2) and I (NP_...
Expression of the NFAT gene family in mouse islets transduced with adenoviruses.
Normalized RNA-sequencing values for the NFAT gene family in mouse islets 48 hr after transduction with adenoviruses containing GFP, ca-Nfatc1 or ca-Nfatc2 (A). Average expression values (± S.E.M., N = 5) are shown for each gene/virus combination. Western blot analysis...
Comparison of GWAS genes that were regulated by NFAT in mouse and human islets.
Heat maps illustrate the change in the expression of T2D-associated GWAS gene candidates in mouse (left) and human (right) islets replotted from Fig 6 as the average Z-score for each transcript; N = 5 for mouse and N = 3 for human.
(TIF)
The overexpression of ca-Nfatc2 promotes cell cycle progression and not DNA damage repair pathways in mouse islets.
Immunocytochemistry of mouse islets for (A) Ki67, (B) pHH3 (S10) and (C) 53BP1 following Ad-LacZ (control) and Ad-ca-Nfatc2 transduction (72 hr). To identify β-cells, islets were stained for insulin. All islets were exposed to BrdU (1...
Additional cell cycle regulatory genes that were differentially regulated by ca-Nfatc1 in mouse and human islets.
The regulation of expression for cell cycle genes is illustrated in mouse (A) and human (B) islets following overexpression of either ca-Nfatc1 or ca-Nfatc2. The data is plotted as the log2 fold-change in expression relative to that mea...
Gene candidates linked to T2D risk loci from human GWAS, and their mouse homologues.
Separate tabs list: Tab 1, ~300 entries in the GWAS catalog for genomic loci associated with the disease/trait “Type 2 diabetes”; Tab 2, distinct genomic loci and their associated P-values; and Tab 3, candidate human genes reported for the loci, with accompanying m...
Summary scores for islet transcription factor cis-eQTLs on chromosome 2 for conditional dependence on T2D GWAS gene candidates.
Table summarizing conditional dependence of T2D GWAS gene candidates on islet Chr 2 cis-eQTLs for genes annotated as playing a role in "transcription" or "DNA binding" (https://david.ncifcrf.gov/). For each cis-eQTL, the s...
Isoform-specific expression of the NFAT gene family in mouse islets.
Islets from B6 mice were used for deep RNA-sequencing to determine isoform-specific expression of all genes. The table shows the expression level and relative proportion of each isoform for the NFAT gene family. Expression values for all genes are available at GEO submission GSE736...
Islet eQTLs for T2D GWAS candidates in mouse islets.
Excel sheet lists the 205 eQTLs for GWAS candidate genes that were identified genome-wide in islets from ~500 B6:BTBR-F2 obese mice. Genomic positions for the genes and their eQTLs are shown. Cis is defined as an eQTL that occurred within 2.5 cM (~5 Mbp) of the genomic position of the correspond...
Donor information for human islet preparations.
For several islet preparations, multiple studies were conducted, which are listed in the final column labeled “Experiments”. Values that are missing are not known.
(PDF)
Quantitative real time PCR primers used for gene expression measurements in human islets.
(XLSX)
Normalized expression values for all genes from RNA-sequencing of mouse islets following overexpression of GFP, ca-Nfatc1 or ca-Nfatc2.
Excel spreadsheet contains normalized expression values for all genes (Tab 1) identified in mouse islets 48 hr after overexpression of GFP, ca-Nfatc1 or ca-Nfatc2 (N = 5 ea.). Posterior probabilities (PP) for diffe...
Mindboggle (http://mindboggle.info) is an open source brain morphometry platform that takes in preprocessed T1-weighted MRI data and outputs volume, surface, and tabular data containing label, feature, and shape information for further analysis. In this article, we document the software and demonstrate its use in studies of shape variation in hea...
A major challenge in biomedical research is to identify causal relationships among genotypes, phenotypes, and clinical outcomes from high-dimensional measurements. Causal networks have been widely used in systems genetics for modeling gene regulatory systems and for identifying causes and risk factors of diseases. In this chapter, we describe funda...
Nature Communications 7, Article number: 12460, doi:10.1038/ncomms12460 (2016); Published: 23 August 2016; Updated: 10 October 2016. The HTML version of this Article incorrectly duplicated the authors S.
Rheumatoid arthritis (RA) affects millions world-wide. While anti-TNF treatment is widely used to reduce disease progression, treatment fails in ∼one-third of patients. No biomarker currently exists that identifies non-responders before treatment. A rigorous community-based assessment of the utility of SNP data for predicting anti-TNF treatment eff...
Supplementary Figures 1-6, Supplementary Tables 1-4, Supplementary Note 1 and Supplementary References
Over-fitting is a dreaded foe in challenge-based competitions. Because participants rely on public leaderboards to evaluate and refine their models, there is always the danger they might over-fit to the holdout data supporting the leaderboard. The recently published Ladder algorithm aims to address this problem by preventing the participants from e...
Clinical trials traditionally employ blinding as a design mechanism to reduce the influence of placebo effects. In practice, however, it can be difficult or impossible to blind study participants and unblinded trials are common in medical research. Here we show how instrumental variables can be used to quantify and disentangle treatment and placebo...
Mobile health studies can leverage longitudinal sensor data from smartphones to guide the application of personalized medical interventions. These studies are particularly appealing due to their ability to attract a large number of participants. In this paper, we argue that the adoption of an instrumental variable approach for randomized trials wit...
Identifying accurate biomarkers of cognitive decline is essential for advancing early diagnosis and prevention therapies in Alzheimer's disease. The Alzheimer's disease DREAM Challenge was designed as a computational crowdsourced project to benchmark the current state-of-the-art in predicting cognitive outcomes in Alzheimer's disease based on high...
Current measures of health and disease are often insensitive, episodic, and subjective. Further, these measures generally are not designed to provide meaningful feedback to individuals. The impact of high-resolution activity data collected from mobile phones is only beginning to be explored. Here we present data from mPower, a clinical observationa...
We propose hypothesis tests for detecting dopaminergic medication response in Parkinson disease patients, using longitudinal sensor data collected by smartphones. The processed data is composed of multiple features extracted from active tapping tasks performed by the participant on a daily basis, before and after medication, over several months. Ea...
Biological functions are carried out by groups of interacting molecules, cells or tissues, known as communities. Membership in these communities may overlap when biological components are involved in multiple functions. However, traditional clustering methods detect non-overlapping communities. These detected communities may also be unstable and di...
DREAM challenges are community competitions designed to advance computational methods and address fundamental questions in systems biology and translational medicine. Each challenge asks participants to develop and apply computational methods to either predict unobserved outcomes or to identify unknown model parameters given a set of training data....