Jason H Moore

University of Pennsylvania | UP · Perelman School of Medicine

Ph.D.

About

947
Publications
128,981
Reads
37,697
Citations
Introduction
Director, Institute for Quantitative Biomedical Sciences (iQBS); Director, Graduate Program in Quantitative Biomedical Sciences (QBS); Associate Director, Norris-Cotton Cancer Center (NCCC); Editor-in-Chief, BioData Mining
Additional affiliations
August 2004 - February 2015
Dartmouth College
Position
  • Director, Institute for Quantitative Biomedical Sciences

Publications (947)
Preprint
Full-text available
Large database sources, such as the National Health and Nutrition Examination Survey (NHANES), while being a great utility for epidemiological studies, pose challenges for machine learning due to data heterogeneity, varied sample sizes, missing values/outliers and variations in data collection and interpretation requiring thorough data-quality asse...
Article
Full-text available
In response to the escalating global obesity crisis and its associated health and financial burdens, this paper presents a novel methodology for analyzing longitudinal weight loss data and assessing the effectiveness of financial incentives. Drawing from the Keep It Off trial—a three-arm randomized controlled study with 189 participants—we examined...
Chapter
Alzheimer’s disease (AD) leads to irreversible cognitive decline, with Mild Cognitive Impairment (MCI) as its prodromal stage. Early detection of AD and related dementia is crucial for timely treatment and slowing disease progression. However, classifying cognitive normal (CN), MCI, and AD subjects using machine learning models faces class imbalanc...
Article
Full-text available
Motivation Biomedical and healthcare domains generate vast amounts of complex data that can be challenging to analyze using machine learning tools, especially for researchers without computer science training. Results Aliro is an open-source software package designed to automate machine learning analysis through a clean web interface. By infusing...
Article
Full-text available
There are not currently any univariate outlier detection algorithms that transform and model arbitrarily shaped distributions to remove univariate outliers. Some algorithms model skew, even fewer model kurtosis, and none of them model bimodality and monotonicity. To overcome these challenges, we have implemented an algorithm for Skew and Tail-heavi...
Preprint
Full-text available
Background and Objectives: The two most common neurodegenerative diseases are Alzheimer's disease (AD) and Parkinson's disease (PD), both of which are related to age and affect millions of people across the world, especially as life expectancy increases in certain countries. Here, we explore the potential predictiveness of the genetic risk of AD and PD separate...
Preprint
Full-text available
Motivation: Genome-Wide Association Studies (GWAS) commonly assume phenotypic and genetic homogeneity that is not present in complex conditions. We designed Transformative Regression Analysis of Combined Effects (TRACE), a GWAS methodology that better accounts for clinical phenotype heterogeneity and identifies gene-by-environment (GxE) interacti...
Article
Full-text available
In computational toxicology, prediction of complex endpoints has always been challenging, as they often involve multiple distinct mechanisms. State-of-the-art models are either limited by low accuracy, or lack of interpretability due to their black-box nature. Here, we introduce AIDTox, an interpretable deep learning model which incorporates curate...
Article
Full-text available
The introduction of large language models (LLMs) that allow iterative “chat” in late 2022 is a paradigm shift that enables generation of text often indistinguishable from that written by humans. LLM-based chatbots have immense potential to improve academic work efficiency, but the ethical implications of their fair use and inherent bias must be con...
Article
Investigating the relationship between genetic variation and phenotypic traits is a key issue in quantitative genetics. Specifically for Alzheimer's disease, the association between genetic markers and quantitative traits remains vague; once identified, it will provide valuable guidance for the study and development of genetics-based treatment a...
Article
Full-text available
Machine learning (ML) models trained for triggering clinical decision support (CDS) are typically either accurate or interpretable but not both. Scaling CDS to the panoply of clinical use cases while mitigating risks to patients will require many ML models be intuitively interpretable for clinicians. To this end, we adapted a symbolic regression me...
Preprint
Full-text available
Statistical epistasis has been studied extensively because of its potential to provide evidence for genetic interactions for phenotypes, but there have been methodological limitations to its exhaustive, widespread application. We present new algorithms for the interaction coefficients for standard regression models for epistasis that permit many va...
Article
Full-text available
Background Quantitative Trait Locus (QTL) analysis and Genome-Wide Association Studies (GWAS) have the power to identify variants that capture significant levels of phenotypic variance in complex traits. However, effort and time are required to select the best methods and optimize parameters and pre-processing steps. Although machine learning appro...
Article
Leveraging linkage disequilibrium (LD) patterns as representative of population substructure enables the discovery of additive association signals in genome-wide association studies (GWASs). Standard GWASs are well-powered to interrogate additive models; however, new approaches are required for investigating other modes of inheritance such as domina...
Chapter
In many evolutionary computation systems, parent selection methods can affect, among other things, convergence to a solution. In this paper, we present a study comparing the role of two commonly used parent selection methods in evolving machine learning pipelines in an automated machine learning system called Tree-based Pipeline Optimization Tool (...
Preprint
BACKGROUND As global populations age and become susceptible to neurodegenerative illnesses, new therapies for Alzheimer’s Disease (AD) are urgently needed. Existing data resources for drug discovery and repurposing fail to capture heterogeneous biomedical knowledge that is central to the disease’s etiology and response to drugs. We designed the Alz...
Article
Full-text available
Background Loss-of-Function (LoF) variants in human genes are important due to their impact on clinical phenotypes and frequent occurrence in the genomes of healthy individuals. The association of LoF variants with complex diseases and traits may lead to the discovery and validation of novel therapeutic targets. Current approaches predict high-conf...
Preprint
Full-text available
In many evolutionary computation systems, parent selection methods can affect, among other things, convergence to a solution. In this paper, we present a study comparing the role of two commonly used parent selection methods in evolving machine learning pipelines in an automated machine learning system called Tree-based Pipeline Optimization Tool (...
Preprint
Background Quantitative Trait Locus (QTL) analysis and Genome-Wide Association Studies (GWAS) have the power to identify variants that capture significant levels of phenotypic variance in complex traits. However, effort and time are required to select the best methods and optimize parameters and pre-processing steps. Although machine learning appro...
Article
The primary efforts of disease and epidemiological research can be divided into two areas: identifying the causal mechanisms and utilizing important variables for risk prediction. The latter is generally perceived as a more obtainable goal due to the vast number of readily available tools and the faster pace of obtaining results. However, the lower...
Chapter
Artificial Intelligence (AI) in medicine stands at the cusp of revolutionizing clinician reasoning and decision-making. Since its foundational years in the mid-20th century, the progression of medical AI has seen considerable advancements, concurrently grappling with various challenges. Early attempts of AI showcased immense potential, yet faced hu...
Chapter
The selection and tuning of feature selection, feature engineering, and classification or regression algorithms is a major challenge in machine learning, affecting both beginners and experts. Automated machine learning (AutoML) offers a solution by automating the creation of machine learning pipelines, eliminating the guesswork associated with a ma...
Article
Brain imaging genetics examines associations between imaging quantitative traits (QTs) and genetic factors such as single nucleotide polymorphisms (SNPs) to provide important insights into the pathogenesis of Alzheimer’s disease (AD). Given the high dimensionality, the individual level SNP‐QT signals typically have small effect sizes and are hard...
Conference Paper
Investigating the relationship between genetic variation and phenotypic traits is a key issue in quantitative genetics. Specifically for Alzheimer's disease, the association between genetic markers and quantitative traits remains vague; once identified, it will provide valuable guidance for the study and development of genetic-based treatment ap...
Preprint
Full-text available
Automated machine learning (AutoML) algorithms have grown in popularity due to their high performance and flexibility to adapt to different problems and data sets. With the increasing number of AutoML algorithms, deciding which would best suit a given problem becomes increasingly more work. Therefore, it is essential to use complex and challenging...
Article
Full-text available
Objectives Ascertain and compare the performances of Automated Machine Learning (AutoML) tools on large, highly imbalanced healthcare datasets. Materials and Methods We generated a large dataset using historical de-identified administrative claims including demographic information and flags for disease codes in four different time windows prior to...
Article
Full-text available
Understanding the strengths and weaknesses of machine learning (ML) algorithms is crucial to determine their scope of application. Here, we introduce the Diverse and Generative ML Benchmark (DIGEN), a collection of synthetic datasets for comprehensive, reproducible, and interpretable benchmarking of ML algorithms for classification of binary outcom...
Preprint
Full-text available
Evolutionary multi-agent systems (EMASs) are very good at dealing with difficult, multi-dimensional problems; their efficacy was proven theoretically based on analysis of the relevant Markov-Chain based model. Now the research continues on introducing autonomous hybridization into EMAS. This paper focuses on a proposed hybrid version of the EMAS, a...
Preprint
Full-text available
Leveraging linkage disequilibrium (LD) patterns as representative of population substructure enables the discovery of additive association signals in genome-wide association studies (GWAS). Standard GWAS are well-powered to interrogate additive models; however, new approaches are required to investigate other modes of inheritance such as dominance...
Preprint
Full-text available
In computational toxicology, prediction of complex endpoints has always been challenging, as they often involve multiple distinct mechanisms. State-of-the-art models are either limited by low accuracy, or lack of interpretability due to their black-box nature. Here we introduce AIDTox, an interpretable deep learning model which incorporates curated...
Article
The rapid increase of interest in, and use of, artificial intelligence (AI) in computer applications has raised a parallel concern about its ability (or lack thereof) to provide understandable, or explainable, output to users. This concern is especially legitimate in biomedical contexts, where patient safety is of paramount importance. This positio...
Article
Full-text available
The genetic analysis of complex traits has been dominated by parametric statistical methods due to their theoretical properties, ease of use, computational efficiency, and intuitive interpretation. However, there are likely to be patterns arising from complex genetic architectures which are more easily detected and modeled using machine learning me...
Article
Full-text available
Idiopathic pulmonary fibrosis (IPF) is a chronic, progressive, fibrosing interstitial pneumonia of unknown etiology. The role of genetic risk factors has been the focus of numerous studies probing for associations of genetic variants with IPF. We aimed to determine whether single-nucleotide polymorphisms (SNPs) of four candidate genes are associate...
Article
Full-text available
Evolutionary multi-agent systems (EMASs) are very good at dealing with difficult, multi-dimensional problems; their efficacy was proven theoretically based on analysis of the relevant Markov-Chain based model. Now the research continues on introducing autonomous hybridization into EMAS. This paper focuses on a proposed hybrid version of the EMAS, a...
Article
Full-text available
Brain imaging genetics examines associations between imaging quantitative traits (QTs) and genetic factors such as single nucleotide polymorphisms (SNPs) to provide important insights into the pathogenesis of Alzheimer’s disease (AD). The individual level SNP-QT signals are high dimensional and typically have small effect sizes, making them hard to...
Article
Full-text available
Genetic heterogeneity describes the occurrence of the same or similar phenotypes through different genetic mechanisms in different individuals. Robustly characterizing and accounting for genetic heterogeneity is crucial to pursuing the goals of precision medicine, for discovering novel disease biomarkers, and for identifying targets for treatments....
Article
Full-text available
Background Alzheimer’s disease (AD) is a complex neurodegenerative disorder and the most common type of dementia. AD is characterized by a decline of cognitive function and brain atrophy, and is highly heritable with estimated heritability ranging from 60 to 80%. The most straightforward and widely used strategy to identify AD genetic basi...
Article
Objective For multi-center heterogeneous Real-World Data (RWD) with time-to-event outcomes and high-dimensional features, we propose the SurvMaximin algorithm to estimate Cox model feature coefficients for a target population by borrowing summary information from a set of health care centers without sharing patient-level information. Materials and...
Article
Full-text available
In drug development, a major reason for attrition is the lack of understanding of cellular mechanisms governing drug toxicity. The black-box nature of conventional classification models has limited their utility in identifying toxicity pathways. Here we developed DTox (deep learning for toxicology), an interpretation framework for knowledge-guided...
Article
Full-text available
The opioid epidemic continues to contribute to loss of life through overdose and significant social and economic burdens. Many individuals who develop problematic opioid use (POU) do so after being exposed to prescribed opioid analgesics. Therefore, it is important to accurately identify and classify risk factors for POU. In this review, we discuss...
Article
ComptoxAI is a new data infrastructure for computational and artificial intelligence research in predictive toxicology. Here, we describe and showcase ComptoxAI's graph-structured knowledge base in the context of three real-world use-cases, demonstrating that it can rapidly answer complex questions about toxicology that are infeasible using previou...
Article
Full-text available
Integrating data across institutions can improve learning efficiency. To integrate data efficiently while protecting privacy, we propose A one-shot, summary-statistics-based, Distributed Algorithm for fitting Penalized (ADAP) regression models across multiple datasets. ADAP utilizes patient-level data from a lead site and incorporates the first-ord...
Preprint
When seeking a predictive model in biomedical data, one often has more than a single objective in mind, e.g., attaining both high accuracy and low complexity (to promote interpretability). We investigate herein whether multiple objectives can be dynamically tuned by our recently proposed coevolutionary algorithm, SAFE (Solution And Fitness Evolutio...
Preprint
We recently highlighted a fundamental problem recognized to confound algorithmic optimization, namely, conflating the objective with the objective function. Even when the former is well defined, the latter may not be obvious, e.g., in learning a strategy to navigate a maze to find a goal (objective), an effective objective function to \tex...
Preprint
We have recently presented SAFE -- Solution And Fitness Evolution -- a commensalistic coevolutionary algorithm that maintains two coevolving populations: a population of candidate solutions and a population of candidate objective functions. We showed that SAFE was successful at evolving solutions within a robotic maze domain. Herein we present an i...
Preprint
Modifying standard gradient boosting by replacing the embedded weak learner in favor of a strong(er) one, we present SyRBo: Symbolic-Regression Boosting. Experiments over 98 regression datasets show that by adding a small number of boosting stages (between 2 and 5) to a symbolic regressor, statistically significant improvements can often be attain...
Article
Full-text available
Given the growing number of prediction algorithms developed to predict COVID-19 mortality, we evaluated the transportability of a mortality prediction algorithm using a multi-national network of healthcare systems. We predicted COVID-19 mortality using baseline commonly measured laboratory values and standard demographic and clinical covariates acr...
Article
Full-text available
The medical field has seen a rapid increase in the development of artificial intelligence (AI)-based prediction models. With the introduction of such AI-based prediction model tools and software in cardiovascular patient care, the cardiovascular researcher and healthcare professional are challenged to understand the opportunities as well as the lim...
Preprint
Full-text available
Accurate disease risk stratification can lead to more precise and personalized prevention and treatment of diseases. As an important component to disease risk, genetic risk factors can be utilized as an early and stable predictor for disease onset. Recently, the polygenic risk score (PRS) method has combined the effects from hundreds to millions of...
Preprint
Full-text available
In drug development, a major reason for attrition is the lack of understanding of cellular mechanisms governing drug toxicity. The black-box nature of conventional classification models has limited their utility in identifying toxicity pathways. Here we developed DTox (Deep learning for Toxicology), an interpretation framework for knowledge-guid...
Article
Full-text available
Background Gene set enrichment analysis (GSEA) uses gene-level univariate associations to identify gene set-phenotype associations for hypothesis generation and interpretation. We propose that GSEA can be adapted to incorporate SNP and gene-level interactions. To this end, gene scores are derived by Relief-based feature importance algorithms that e...
Preprint
Full-text available
Objective For multi-center heterogeneous Real-World Data (RWD) with time-to-event outcomes and high-dimensional features, we propose the SurvMaximin algorithm to estimate Cox model feature coefficients for a target population by borrowing summary information from a set of health care centers without sharing patient-level information. Materials and...
Article
Full-text available
Quantitative Structure-Activity Relationship (QSAR) modeling is a common computational technique for predicting chemical toxicity, but a lack of new methodological innovations has impeded QSAR performance on many tasks. We show that contemporary QSAR modeling for predictive toxicology can be substantially improved by incorporating semantic graph da...
Article
Scientific innovation has long been heralded as the collaborative effort of many people, groups, and studies to drive forward research. However, the traditional peer review process relies on reviewers acting in a silo to critically judge research. As research becomes more cross-disciplinary, finding reviewers with appropriate expertise to provide feed...
Article
Full-text available
The genetic basis of phenotypic variation across populations has not been well explained for most traits. Several factors may cause disparities, from variation in environments to divergent population genetic structure. We hypothesized that a population-level polygenic risk score (PRS) can explain phenotypic variation among geographic populations ba...
Article
Full-text available
Semantic GP is a promising branch of GP that introduces semantic awareness during genetic evolution to improve various aspects of GP. This paper presents a new Semantic GP approach based on Dynamic Target (SGP-DT) that divides the search problem into multiple GP runs. The evolution in each run is guided by a new (dynamic) target based on the residu...
Article
We present AddGBoost, a gradient boosting-style algorithm, wherein the decision tree is replaced by a succession of (possibly) stronger learners, which are optimized via a state-of-the-art hyperparameter optimizer. Through experiments over 90 regression datasets we show that AddGBoost emerges as the top performer for 33% (with 2 stages) up to 42% (...
Article
Background Multimodal neuroimaging data can provide complementary information that a single modality cannot about neurodegenerative diseases such as Alzheimer's disease (AD). Deep Generalized Canonical Correlation Analysis (DGCCA) is able to learn a shared feature representation from different views of data by applying non‐linear transformation usi...
Article
Background Brain imaging genetics is an emerging research topic in the study of Alzheimer’s disease (AD). The conventional approach, such as canonical correlation analysis (CCA), has been widely used to identify imaging genetic associations. A deep learning model has recently been proposed to better understand the roots of the complex association b...
Article
The advances in technologies for acquiring brain imaging and high-throughput genetic data allow the researcher to access a large amount of multi-modal data. Although the sparse canonical correlation analysis is a powerful bi-multivariate association analysis technique for feature selection, we are still facing major challenges in integrating multi-...
Article
Full-text available
Motivation Novel machine learning and statistical modeling studies rely on standardized comparisons to existing methods using well-studied benchmark datasets. Few tools exist that provide rapid access to many of these datasets through a standardized, user-friendly interface that integrates well with popular data science workflows. Results This rel...
Article
Full-text available
Neurological complications worsen outcomes in COVID-19. To define the prevalence of neurological conditions among hospitalized patients with a positive SARS-CoV-2 reverse transcription polymerase chain reaction test in geographically diverse multinational populations during early pandemic, we used electronic health records (EHR) from 338 participat...
Article
Neurological complications worsen outcomes in COVID-19. To define the prevalence of neurological conditions among hospitalized patients with a positive SARS-CoV-2 reverse transcription polymerase chain reaction test in geographically diverse multinational populations during early pandemic, we used electronic health records (EHR) from 338 participat...