Kenney Ng

IBM Research - Thomas J. Watson Research Center

PhD

About

175 Publications
23,913 Reads
4,426 Citations
Additional affiliations
January 2005 - January 2012
IBM
Position
  • Software Engineer
January 2000 - January 2006
iPhrase Technologies
Position
  • Principal Software Engineer
January 1995 - January 2000
Massachusetts Institute of Technology
Position
  • Research Assistant

Publications (175)
Preprint
Foundation models applied to bio-molecular space hold promise to accelerate drug discovery. Molecular representation is key to building such models. Previous works have typically focused on a single representation or view of the molecules. Here, we develop a multi-view foundation model approach that integrates molecular views of graph, image and t...
Preprint
Full-text available
Background Developing medicine from scratch to governmental authorization and detecting adverse drug reactions (ADR) have barely been economical, expeditious, and risk-averse investments. The availability of large-scale observational healthcare databases and the popularity of large language models offer an unparalleled opportunity to enable automat...
Conference Paper
Full-text available
Biomedical foundation models, trained on diverse sources of small molecule data, hold great potential for accelerating drug discovery. However, their complex nature often presents a barrier for researchers seeking scientific insights and drug candidate generation. SPARK addresses this challenge by providing a user-friendly, web-based interface that...
Article
There is mounting interest in the possibility that metformin, indicated for glycemic control in type 2 diabetes, has a range of additional beneficial effects. Randomized trials have shown that metformin prevents adverse cardiovascular events, and metformin use has also been associated with reduced cognitive decline and cancer incidence. In this pap...
Article
Full-text available
Causal inference from observational data often rests on the unverifiable assumption of no unmeasured confounding. Recently, Tchetgen Tchetgen and colleagues have introduced proximal inference to leverage negative control outcomes and exposures as proxies to adjust for bias from unmeasured confounding. However, some of the key assumptions that proxi...
Article
OBJECTIVE: To characterize distinct islet autoantibody profiles preceding stage 3 type 1 diabetes. RESEARCH DESIGN AND METHODS: The T1DI (Type 1 Diabetes Intelligence) study combined data from 1,845 genetically susceptible prospectively observed children who were positive for at least one islet autoantibody: insulin autoantibody (IAA), GAD antibody (...
Article
Full-text available
Fibrotic diseases affect multiple organs and are associated with morbidity and mortality. To examine organ-specific and shared biologic mechanisms that underlie fibrosis in different organs, we developed machine learning models to quantify T1 time, a marker of interstitial fibrosis, in the liver, pancreas, heart and kidney among 43,881 UK Biobank p...
Article
Full-text available
Increased left atrial volume and decreased left atrial function have long been associated with atrial fibrillation. The availability of large-scale cardiac magnetic resonance imaging data paired with genetic data provides a unique opportunity to assess the genetic contributions to left atrial structure and function, and understand their relationshi...
Article
Objective The COVID-19 pandemic presented a challenge to inpatient safety. It is unknown whether there were spillover effects due to COVID-19 into non–COVID-19 care and safety. We sought to evaluate the changes in inpatient Agency for Healthcare Research and Quality patient safety indicators (PSIs) in the United States before and during the first s...
Preprint
Full-text available
We developed a visual analytics system called Latent Space Explorer. Latent Space Explorer provides interactive visualizations that enable users to explore the multimodal representation of subjects, define subgroups of interest, interactively decode data with different modalities with the selected subjects, and inspect the accuracy of the embedding...
Preprint
Full-text available
In empirical studies with time-to-event outcomes, investigators often leverage observational data to conduct causal inference on the effect of exposure when randomized controlled trial data is unavailable. Model misspecification and lack of overlap are common issues in observational studies, and they often lead to inconsistent and inefficient estim...
Article
Full-text available
A fundamental challenge in diagnostics is integrating multiple modalities to develop a joint characterization of physiological state. Using the heart as a model system, we develop a cross-modal autoencoder framework for integrating distinct data modalities and constructing a holistic representation of cardiovascular state. In particular, we use our...
Article
Full-text available
Myocardial interstitial fibrosis is associated with cardiovascular disease and adverse prognosis. Here, to investigate the biological pathways that underlie fibrosis in the human heart, we developed a machine learning model to measure native myocardial T1 time, a marker of myocardial fibrosis, in 41,505 UK Biobank participants who underwent cardiac...
Article
Background: As the largest conduit vessel, the aorta is responsible for the conversion of phasic systolic inflow from ventricular ejection into more continuous peripheral blood delivery. Systolic distention and diastolic recoil conserve energy and are enabled by the specialized composition of the aortic extracellular matrix. Aortic distensibility...
Article
Objective: To estimate the risk of progression to stage 3 type 1 diabetes based on varying definitions of multiple islet autoantibody positivity (mIA). Research design and methods: Type 1 Diabetes Intelligence (T1DI) is a combined prospective data set of children from Finland, Germany, Sweden, and the U.S. who have an increased genetic risk for...
Preprint
Full-text available
The role of race in medical decision-making has been a contentious issue. Insights from history and population genetics suggest that considering race as a differentiating marker for medical practices can be influenced by systemic bias, leading to serious errors. This may negatively impact treatment of complex diseases such as cardiovascular disease (CVD...
Article
Background: Screening for islet autoantibodies in children and adolescents identifies individuals who will later develop type 1 diabetes, allowing patient and family education to prevent diabetic ketoacidosis at onset and to enable consideration of preventive therapies. We aimed to assess whether islet autoantibody screening is effective for predi...
Article
Full-text available
For any given body mass index (BMI), individuals vary substantially in fat distribution, and this variation may have important implications for cardiometabolic risk. Here, we study disease associations with BMI-independent variation in visceral (VAT), abdominal subcutaneous (ASAT), and gluteofemoral (GFAT) fat depots in 40,032 individuals of the UK...
Chapter
This chapter provides a comprehensive overview of data-driven disease progression modeling techniques. It adopts a broad approach to disease progression, focusing on all computational methods able to model any temporal aspects of disease progression. Consequently, we have focused on three classes of analysis: staging and trajectory estimation analy...
Article
Full-text available
Aims/hypothesis The aim of this study was to explore the utility of islet autoantibody (IAb) levels for the prediction of type 1 diabetes in autoantibody-positive children. Methods Prospective cohort studies in Finland, Germany, Sweden and the USA followed 24,662 children at increased genetic or familial risk of developing islet autoimmunity and d...
Conference Paper
Full-text available
Disease risk models can identify high-risk patients and help clinicians provide more personalized care. However, risk models developed on one dataset may not generalize across diverse subpopulations of patients in different datasets and may have unexpected performance. It is challenging for clinical researchers to inspect risk models across differe...
Article
Full-text available
Our previous data-driven analysis of evolving patterns of islet autoantibodies (IAbs) against insulin (IAA), glutamic acid decarboxylase (GADA) and islet antigen 2 (IA-2A) discovered three trajectories characterized by either multiple IAbs (TR1), IAA (TR2), or GADA (TR3) as the first appearing autoantibodies. Here we examined the evolution of IAb l...
Preprint
Full-text available
Disease risk models can identify high-risk patients and help clinicians provide more personalized care. However, risk models developed on one dataset may not generalize across diverse subpopulations of patients in different datasets and may have unexpected performance. It is challenging for clinical researchers to inspect risk models across differe...
Article
Full-text available
The clinical presentation of amyotrophic lateral sclerosis (ALS), a fatal neurodegenerative disease, varies widely across patients, making it challenging to determine if potential therapeutics slow progression. We sought to determine whether there were common patterns of disease progression that could aid in the design and analysis of clinical tria...
Article
Background The first surge of the COVID-19 pandemic entirely altered healthcare delivery. Whether this also altered the receipt of high- and low-value care is unknown. Objective To test the association between the April through June 2020 surge of COVID-19 and various high- and low-value care measures to determine how the delivery of care changed. Des...
Article
Background The left ventricular outflow tract (LVOT) and ascending aorta are spatially complex, with distinct pathologies and embryologic origins. Prior work examined the genetics of thoracic aortic diameter in a single plane. Objectives We sought to elucidate the genetic basis for the diameter of the LVOT, aortic root, and ascending aorta. Metho...
Article
Full-text available
Background State-of-the-art genetic risk interpretation for a common complex disease such as coronary artery disease (CAD) requires assessment for both monogenic variants—such as those related to familial hypercholesterolemia—as well as the cumulative impact of many common variants, as quantified by a polygenic score. Objectives The objective of t...
Preprint
Full-text available
Causal inference from observational data often rests on the unverifiable assumption of no unmeasured confounding. Recently, Tchetgen Tchetgen and colleagues have introduced proximal inference to leverage negative control outcomes and exposures as proxies to adjust for bias from unmeasured confounding. However, some of the key assumptions that proxi...
Article
Full-text available
Inter-individual variation in fat distribution is increasingly recognized as clinically important but is not routinely assessed in clinical practice, in part because medical imaging has not been practical to deploy at scale for this task. Here, we report a deep learning model trained on an individual's body shape outline, or "silhouette", that enabl...
Article
Full-text available
Prediction models are commonly used to estimate risk for cardiovascular diseases, to inform diagnosis and management. However, performance may vary substantially across relevant subgroups of the population. Here we investigated heterogeneity of accuracy and fairness metrics across a variety of subgroups for risk prediction of two common diseases: a...
Article
Full-text available
For any given level of overall adiposity, individuals vary considerably in fat distribution. The inherited basis of fat distribution in the general population is not fully understood. Here, we study up to 38,965 UK Biobank participants with MRI-derived visceral (VAT), abdominal subcutaneous (ASAT), and gluteofemoral (GFAT) adipose tissue volumes. B...
Article
Congenital heart diseases often involve maldevelopment of the evolutionarily recent right heart chamber. To gain insight into right heart structure and function, we fine-tuned deep learning models to recognize the right atrium, right ventricle and pulmonary artery, measuring right heart structures in 40,000 individuals from the UK Biobank with magn...
Preprint
Full-text available
A fundamental challenge in diagnostics is integrating multiple modalities to develop a joint characterization of physiological state. Using the heart as a model system, we develop a cross-modal autoencoder framework for integrating distinct data modalities and constructing a holistic representation of cardiovascular state. In particular, we use ou...
Article
Our previous data-driven analysis from five large-scale prospective studies discovered three trajectories (TR1, TR2, and TR3) composed of latent states for evolving patterns of islet autoantibodies (IAbs): IAA, GADA and IA-2A. Here we examined the evolution of IAb levels within these trajectories for 2145 IAb-positive participants, followed from e...
Article
Full-text available
With availability of voluminous sets of observational data, an empirical paradigm to screen for drug repurposing opportunities (i.e., beneficial effects of drugs on non‐indicated outcomes) is feasible. In this paper, we use a linked claims and electronic health record database to comprehensively explore repurposing effects of anti‐hypertensive drug...
Article
Importance: Pathogenic variants associated with inherited cardiomyopathy are recognized as important and clinically actionable when identified, leading some clinicians to recommend population-wide genomic screening. Objective: To determine the prevalence and clinical importance of pathogenic variants associated with inherited cardiomyopathy with...
Article
Full-text available
Rapid advances in artificial intelligence (AI) and availability of biological, medical, and healthcare data have enabled the development of a wide variety of models. Significant success has been achieved in a wide range of fields, such as genomics, protein folding, disease diagnosis, imaging, and clinical tasks. Although widely used, the inherent o...
Article
Full-text available
Development of islet autoimmunity precedes the onset of type 1 diabetes in children; however, the presence of autoantibodies does not necessarily lead to manifest disease and the onset of clinical symptoms is hard to predict. Here we show, by longitudinal sampling of islet autoantibodies (IAb) to insulin, glutamic acid decarboxylase and islet antig...
Article
Full-text available
Importance: Familial hypercholesterolemia variants impair clearance of cholesterol from the circulation and increase risk of coronary artery disease (CAD). The extent to which adherence to a healthy lifestyle is associated with a lower risk of CAD in carriers and noncarriers of variants warrants further study. Objective: To assess the associatio...
Article
Full-text available
Context Rapid growth has been suggested to promote islet autoimmunity and progression to type 1 diabetes. In most previous studies, childhood growth has not been analyzed separately from the infant growth period, although the two stages may have distinct features due to differences in development. Objective We aimed to analyze the association of chi...
Article
Full-text available
The central task of causal inference is to remove (via statistical adjustment) confounding bias that would be present in naive unadjusted comparisons of outcomes in different treatment groups. Statistical adjustment can roughly be broken down into two steps. In the first step, the researcher selects some set of variables to adjust for. In the secon...
Article
Full-text available
To date, there have been 180 million confirmed cases of COVID-19, with more than 3.8 million deaths, reported to WHO worldwide. In this paper we address the problem of understanding the host genome's influence, in concert with clinical variables, on the severity of COVID-19 manifestation in the patient. Leveraging positive-unlabeled machine learnin...
Preprint
Full-text available
Background: Inter-individual variation in fat distribution is increasingly recognized as clinically important but is not routinely assessed in clinical practice because quantification requires medical imaging. Objectives: We hypothesized that a deep learning model trained on an individual's body shape outline, or silhouette, would enable accurate...
Article
Introduction: Pathogenic DNA variants associated with inherited cardiomyopathies are recognized as clinically important and actionable when identified, leading some clinicians to recommend population-wide genomic screening. The prevalence and clinical importance of such variants within the context of contemporary clinical care warrant further study...
Article
Background: Obesity is defined based on body-mass index (BMI), a proxy for overall adiposity. However, for any given BMI, individuals vary substantially in fat distribution. The clinical implications of this variability are not fully understood. Methods: We studied MRI data of 40,032 UK Biobank participants. Using previously quantified visc...
Article
Objective: To use islet autoantibody titers to improve the estimation of future type 1 diabetes risk in children. Research design and methods: Prospective cohort studies in Finland, Germany, Sweden, and the U.S. followed 24,662 children at increased genetic or familial risk to develop islet autoimmunity and diabetes. For 1,604 children with conf...
Preprint
Full-text available
Myocardial interstitial fibrosis is a common thread in multiple cardiovascular diseases including heart failure, atrial fibrillation, conduction disease and sudden cardiac death. To investigate the biologic pathways that underlie interstitial fibrosis in the human heart, we developed a machine learning model to measure myocardial T1 time, a marker...
Preprint
Full-text available
Background The left ventricular outflow tract (LVOT) and ascending aorta are spatially complex, with distinct pathologies and embryologic origins. Prior work examined genetics of thoracic aortic diameter in a single plane. We sought to elucidate the genetic basis for the diameter of the LVOT, the aortic root, and the ascending aorta. Methods We use...
Preprint
Full-text available
As the largest conduit vessel, the aorta is responsible for the conversion of phasic systolic inflow from ventricular ejection into more continuous blood delivery to peripheral arteries. Distension during systole and recoil during diastole conserve ventricular energy and are enabled by the specialized composition of the aortic extracellular matrix....
Article
Full-text available
Background Polygenic scores—which quantify inherited risk by integrating information from many common sites of DNA variation—may enable a tailored approach to clinical medicine. However, alongside considerable enthusiasm, we and others have highlighted a lack of standardized approaches for score disclosure. Here, we review the landscape of polygeni...
Article
Full-text available
Current cardiovascular risk assessment tools use a small number of predictors. Here, we study how machine learning might: (1) enable principled selection from a large multimodal set of candidate variables and (2) improve prediction of incident coronary artery disease (CAD) events. An elastic net-based Cox model (ML4HEN-COX) trained and evaluated in...
Preprint
Full-text available
Prediction models are commonly used to estimate risk for cardiovascular diseases; however, performance may vary substantially across relevant subgroups of the population. Here we investigated the variability of performance and fairness across a variety of subgroups for risk prediction of two common diseases, atherosclerotic cardiovascular disease (...
Preprint
Full-text available
For any given level of overall adiposity, as commonly quantified by body mass index (BMI) within clinical practice, individuals vary considerably in fat distribution. We and others have noted that increased visceral fat (VAT) is associated with increased cardiometabolic risk, while gluteofemoral fat (GFAT) may be protective. Familial partial lipo...
Preprint
Full-text available
Aims Increased left atrial (LA) volume is a known risk factor for atrial fibrillation (AF). There is also emerging evidence that alterations in LA function due to an atrial cardiomyopathy are associated with an increased risk of AF. The availability of large-scale cardiac MRI data paired with genetic data provides a unique opportunity to assess the...
Article
Background Individuals of South Asian ancestry represent 23% of the global population, corresponding to 1.8 billion people, and have substantially higher risk of atherosclerotic cardiovascular disease compared with most other ethnicities. US practice guidelines now recognize South Asian ancestry as an important risk-enhancing factor. The magnitude...
Article
Full-text available
Background Parkinson's disease is heterogeneous in symptom presentation and progression. Increased understanding of both aspects can enable better patient management and improve clinical trial design. Previous approaches to modelling Parkinson's disease progression assumed static progression trajectories within subgroups and have not adequately acc...
Article
OBJECTIVE: To combine prospective cohort studies, by including HLA harmonization, and estimate risk of islet autoimmunity and progression to clinical diabetes. RESEARCH DESIGN AND METHODS: For prospective cohorts in Finland, Germany, Sweden, and the U.S., 24,662 children at increased genetic risk for development of islet autoantibodies and type 1 di...
Preprint
Full-text available
Bayesian decision theory provides an elegant framework for acting optimally under uncertainty when tractable posterior distributions are available. Modern Bayesian models, however, typically involve intractable posteriors that are approximated with potentially crude surrogates. This difficulty has engendered loss-calibrated techniques that aim to...
Article
While associations between the type and number of islet autoantibodies and progression to type 1 diabetes (T1D) have been reported, the effect of titer values is less well understood. We aim to quantify the ability of autoantibody titers at seroconversion to improve T1D onset prediction. Prospective cohorts in Finland, Germany, Sweden, and the US h...
Preprint
Full-text available
Amyotrophic Lateral Sclerosis (ALS) is a neurodegenerative disease that is complex in its onset, pattern of spread, and disease progression. The heterogeneity of ALS makes it extremely challenging to determine if a disease modifying therapy is effectively slowing progression. While accurately modeling ALS progression is critical to developing thera...
Article
Full-text available
Deep learning architectures have an extremely high capacity for modeling complex data in a wide variety of domains. However, these architectures have been limited in their ability to support complex prediction problems using insurance claims data, such as readmission at 30 days, mainly due to data sparsity. Consequently, classical machine lea...
Preprint
Full-text available
Background Obesity is defined based on body-mass index (BMI), a proxy for overall adiposity. However, for any given BMI, individuals vary substantially in fat distribution. The clinical implications of this variability are not fully understood. Methods We studied MRI data of 40,032 UK Biobank participants. Using previously quantified viscer...
Preprint
Full-text available
Background: Polygenic scores, which quantify inherited risk by integrating information from many common sites of DNA variation, may enable a tailored approach to clinical medicine. However, alongside considerable enthusiasm, we and others have highlighted a lack of systematic approaches for score disclosure. Here, we review the landscape of polyg...
Preprint
Full-text available
Deep learning architectures have an extremely high capacity for modeling complex data in a wide variety of domains. However, these architectures have been limited in their ability to support complex prediction problems using insurance claims data, such as readmission at 30 days, mainly due to data sparsity. Consequently, classical machine lea...
Preprint
Full-text available
The SARS-CoV-2 virus behind the COVID-19 pandemic is manifesting itself in different ways among infected people. While many are experiencing mild flu-like symptoms or are even remaining asymptomatic after infection, the virus has also led to serious complications, overloading ICUs while claiming more than 2.6 million lives worldwide. In this work,...
Preprint
Full-text available
The heart evolved hundreds of millions of years ago. During mammalian evolution, the cardiovascular system developed with complete separation between pulmonary and systemic circulations incorporated into a single pump with chambers dedicated to each circulation. A lower pressure right heart chamber supplies deoxygenated blood to the lungs, while a...
Article
Full-text available
Analyzing disease progression patterns can provide useful insights into the disease processes of many chronic conditions. These analyses may help inform recruitment for prevention trials or the development and personalization of treatments for those affected. We learn disease progression patterns using Hidden Markov Models (HMM) and distill them in...