Jialiang Li

  • PhD
  • Professor (Full) at National University of Singapore

About

221
Publications
35,577
Reads
5,121
Citations
Current institution
National University of Singapore
Current position
  • Professor (Full)
Additional affiliations
January 2008 - present
Duke-NUS Medical School
Position
  • Professor (Associate)
January 2008 - present
Singapore Eye Research Institute
Position
  • Researcher
July 2006 - present
National University of Singapore
Position
  • Professor (Associate)
Education
July 2003 - June 2005
University of Wisconsin–Madison
Field of study
  • Population Health Sciences
July 2001 - June 2006
University of Wisconsin–Madison
Field of study
  • Statistics
July 1997 - June 2001

Publications (221)
Preprint
Full-text available
The community detection problem on multilayer networks has drawn much interest. When nodal covariates are also present, little work has been done to integrate information from both sources. To leverage the multilayer networks and the covariates, we propose two new algorithms: the spectral clustering on aggregated networks with covariates (SCANC),...
Article
Full-text available
Background In randomized clinical trials (RCTs) with non-compliance, evaluating the causal effects of interventions would lead to a more precise estimation of treatment effect when the estimand of interest is the effect of treatment amongst compliers. While there is a large body of literature addressing the issue of non-compliance for continuous, b...
Article
The additive hazard model, which focuses on risk differences rather than risk ratios, has been widely applied in practice. In this paper, we consider an additive hazard model with varying coefficients to analyze recurrent events data. The model allows for both varying and constant coefficients. We first propose an estimating equation‐based approach...
Article
Pharmacogenomics stands as a pivotal driver toward personalized medicine, aiming to optimize drug efficacy while minimizing adverse effects by uncovering the impact of genetic variations on inter-individual outcome variability. Despite its promise, the intricate landscape of drug metabolism introduces complexity, where the correlation between drug...
Article
Using informative sources to enhance statistical analysis in target studies has become an increasingly popular research topic. However, cohorts with time‐to‐event outcomes have not received sufficient attention, and external studies often encounter issues of incomparability due to population heterogeneity and unmeasured risk factors. To improve ind...
Article
Full-text available
Background Dementia, particularly Alzheimer's disease, is a major healthcare challenge in ageing societies. Objective This study aimed to investigate the efficacy and safety of a dietary compound, ergothioneine, in delaying cognitive decline in older individuals. Methods Nineteen subjects aged 60 or above with mild cognitive impairment were recru...
Article
To complement the conventional area under the ROC curve (AUC) which cannot fully describe the diagnostic accuracy of some non‐standard biomarkers, we introduce a transformed ROC curve and its associated transformed AUC (TAUC) in this article, and show that TAUC can relate the original improper biomarker to a proper biomarker after a non‐monotone tr...
Article
Model averaging is an attractive ensemble technique to construct fast and accurate prediction. Despite having been widely practiced in cross-sectional data analysis, its application to longitudinal data is rather limited so far. We consider model averaging for longitudinal response when the number of covariates is ultrahigh. To this end, we prop...
Preprint
Background and objective: Dementia, particularly Alzheimer's disease, is a major healthcare challenge in ageing societies. Therefore, this study aimed to investigate the efficacy and safety of a dietary compound, ergothioneine, in delaying cognitive decline in elderly individuals. Design, intervention and measurements: Nineteen subjects aged 60 or...
Article
In real-world applications involving multi-class ordinal discrimination, a common approach is to aggregate multiple predictive variables into a linear combination, aiming to develop a classifier with high prediction accuracy. Assessment of such multi-class classifiers often utilizes the hypervolume under ROC manifolds (HUM). When dealing with a sub...
Article
The analysis of streaming time-to-event cohorts has garnered significant research attention. Most existing methods require observed cohorts from a study sequence to be independent and identically sampled from a common model. This assumption may be easily violated in practice. Our methodology operates within the framework of online data updating, wh...
Article
Background and Objective: There is a rising interest in exploiting aggregate information from external medical studies to enhance the statistical analysis of a modestly sized internal dataset. Currently available software packages for analyzing survival data with a cure fraction ignore the potentially available auxiliary information. This paper aim...
Article
Background Diabetic kidney disease (DKD) and diabetic retinopathy (DR) are major diabetic microvascular complications, contributing significantly to morbidity, disability, and mortality worldwide. The kidney and the eye, having similar microvascular structures and physiological and pathogenic features, may experience similar metabolic changes in di...
Article
We propose a new auxiliary information synthesis method to utilize subgroup survival information at multiple time points under the semi-parametric mixture cure rate model. After summarizing the auxiliary information via estimating equations, a control variate technique is adopted to reduce the variance efficiently, together with a test statistic to...
Article
Full-text available
Background: Machine learning (ML) techniques improve disease prediction by identifying the most relevant features in multidimensional data. We compared the accuracy of ML algorithms for predicting incident diabetic kidney disease (DKD). Methods: We utilized longitudinal data from 1365 Chinese, Malay, and Indian participants aged 40–80 y with diabet...
Article
Full-text available
There is a lack of data on the adequacy of nutrient intake and prevalence of malnutrition risk in Asian populations. The aim was to report on the nutrient intake and prevalence of malnutrition risk in a community sample of older adults in Singapore. Analysis was performed on 738 (n = 206 male, n = 532 male, aged 67.6 ± 6.0 years) adults 60 years an...
Article
Mendelian randomization is a technique used to examine the causal effect of a modifiable exposure on a trait using an observational study by utilizing genetic variants. The use of many instruments can help to improve the estimation precision but may suffer bias when the instruments are weakly associated with the exposure. To overcome the difficulty...
Article
Full-text available
Central to personalized medicine and tailored therapies is discovering the subpopulations that account for treatment effect heterogeneity and are likely to benefit more from given interventions. In this article, we introduce a change plane model averaging method to identify subgroups characterized by linear combinations of predictive variables and...
Article
We propose a new model averaging approach to investigate segment regression models with multiple threshold variables and multiple structural breaks. We first fit a series of models, each with a single threshold variable and multiple breaks over its domain, using a two‐stage change point detection method. Then these models are combined together to p...
Preprint
BACKGROUND Diabetic kidney disease (DKD) and diabetic retinopathy (DR) are major diabetic microvascular complications, contributing significantly to morbidity, disability, and mortality worldwide. The kidney and the eye, having similar microvascular structures and physiological and pathogenic features, may experience similar metabolic changes in di...
Article
There has been a growing interest in incorporating auxiliary summary information from external studies into the analysis of internal individual‐level data. In this paper, we propose an adaptive estimation procedure for an additive risk model to integrate auxiliary subgroup survival information via a penalized method of moments technique. Our approa...
Article
Computation of hypervolume under ROC manifold (HUM) is necessary to evaluate biomarkers for their capability to discriminate among multiple disease types or diagnostic groups. However the original definition of HUM involves multiple integration and thus a medical investigation for multi‐class receiver operating characteristic (ROC) analysis could s...
Article
Partly interval‐censored event time data arise naturally in medical, biological, sociological and demographic studies. In practice, some patients may be immune from the event of interest, invoking a cure model for survival analysis. Choosing an appropriate parametric distribution for the failure time of susceptible patients is an important step to...
Article
Full-text available
Results from multiple diagnostic tests are combined in many ways to improve the overall diagnostic accuracy. For binary classification, maximization of the empirical estimate of the area under the receiver operating characteristic curve has widely been used to produce an optimal linear combination of multiple biomarkers. However, in the presence of...
Article
We propose a novel two‐stage procedure for change point detection and parameter estimation in a multi‐threshold proportional hazards model. In the first stage, we estimate the number of thresholds by formulating the threshold detection problem as a variable selection problem and applying the penalized partial likelihood approach. In the second stag...
Article
Full-text available
Minimizing the divergence between two probability distributions offers an alternative parameter estimation method. The current literature mainly focuses on minimizing the Kullback-Leibler (K-L) divergence between the true and the proposed models in which the true model is assumed to be known or fixed. In this paper, we propose a parameter estimatio...
Preprint
Full-text available
Background: Machine learning (ML) techniques improve disease prediction by identifying the most relevant features in multi-dimensional data. We compared the accuracy of ML algorithms for predicting incident diabetic kidney disease (DKD). Methods: We utilized longitudinal data from 1365 Chinese, Malay and Indian participants aged 40-80 years with di...
Preprint
AIMS Using machine learning integrated with clinical and metabolomic data to identify biomarkers associated with diabetic kidney disease (DKD) and diabetic retinopathy (DR), and to improve the performance of DKD/DR detection models beyond traditional risk factors. METHODS We examined a population-based cross-sectional sample of 2,772 adults with t...
Article
We consider the structural equation models with both high-dimensional instrumental variables and observed exogenous covariates. To overcome the challenge of unknown optimal instruments and high-dimensional exogenous confounders, a novel two-stage sparse boosting approach is proposed to select the optimal instruments and important exogenous variable...
Article
We consider the problem of estimating multiple change points for a functional data process. There are numerous examples in science and finance in which the process of interest may be subject to some sudden changes in the mean. The process data that are not in a close vicinity of any change point can be analysed by the usual nonparametric smoothing...
Article
In this paper, we consider the instrumental variable estimation for causal regression parameters with multiple unknown structural changes across subpopulations. We propose a multiple change point detection method to determine the number of thresholds and estimate the threshold locations in the two-stage least squares procedure. After identifying th...
Article
Full-text available
Introduction Chronic kidney disease (CKD) is increasing in Asia, but there are sparse data on incident CKD among different ethnic groups. We aimed to describe the incidence and risk factors associated with CKD in the three major ethnic groups in Asia: Chinese, Malays and Indians. Research design and methods Prospective cohort study of 5580 general...
Article
Medical multi-category diagnostic problems may involve discrete biomarkers. Many traditional accuracy measures are based on the assumption that all biomarkers follow continuous distributions and consequently may underestimate the true discrimination ability of the discrete markers. In particular, we focus on Hypervolume Under ROC Manifold (HUM) in...
Article
Forecasting survival risks for time‐to‐event data is an essential task in clinical research. Practitioners often rely on well‐structured statistical models to make predictions for patient survival outcomes. The nonparametric proportional hazards model, as an extension of the Cox proportional hazards model, involves an additive nonlinear combination...
Article
We propose a multithreshold change plane regression model which naturally partitions the observed subjects into subgroups with different covariate effects. The underlying grouping variable is a linear function of observed covariates and thus multiple thresholds produce change planes in the covariate space. We contribute a novel two‐stage estimation...
Article
The instrumental variable (IV) methods are attractive since they can lead to a consistent answer to the main question in causal modelling, i.e., the estimation of average causal effect of an exposure on the outcome in the presence of unmeasured confounding. However, it is now acknowledged in the literature that using weak IVs might not suit the inf...
Article
Full-text available
We conducted a randomized controlled trial to examine choral singing's effect on cognitive decline in aging. Older Singaporeans who were at high risk of future dementia were recruited: 47 were assigned to choral singing intervention (CSI) and 46 were assigned to health education program (HEP). Participants attended weekly one-hour choral singing or...
Article
Model averaging has attracted considerable attention in the past decades as it emerges as an impressive forecasting device in econometrics, social sciences and medicine. So far most developed model averaging methods focus only on either parametric models or nonparametric models with a continuous response. In this paper, we propose a semiparametric mod...
Article
Factor modeling is an essential tool for exploring intrinsic dependence structures in financial and economic studies through the construction of common latent variables, including the famous Fama–French three factor models for the description of asset returns in finance. However, most of the existing statistical methods for analyzing latent factors...
Article
Full-text available
The mixture cure model has been widely applied to survival data in which a fraction of the observations never experience the event of interest, despite long-term follow-up. In this paper, we study the Cox proportional hazards mixture cure model where the covariate effects on the distribution of uncured subjects’ failure time may jump when a covaria...
Article
Model averaging techniques are very useful for model-based prediction. However, most earlier works in this field focused on parametric models and continuous responses. In this paper, we study varying coefficient multinomial logistic models and propose a semiparametric model averaging prediction (SMAP) approach for multi-category outcomes. The proposed...
Article
Statistical learning methods are widely used in medical literature for the purpose of diagnosis or prediction. Conventional accuracy assessment via sensitivity, specificity, and ROC curves does not fully account for clinical utility of a specific model. Decision curve analysis (DCA) becomes a novel complement as it incorporates a clinical judgment...
Article
Full-text available
We respond here to a recent letter in this journal on the transformation based on likelihood ratio.
Article
Full-text available
Background: The concurrent sampling design was developed for case-control studies of recurrent events. It involves matching for time. Standard conditional logistic-regression (CLR) analysis ignores the dependence among recurrent events. Existing methods for clustered observations for CLR do not fit the complex data structure arising from the concu...
Article
Full-text available
Purpose Plasma galectin-3 (pG3) regulates inflammation. B-type natriuretic peptide (BNP), high-sensitivity Troponin I (hsTnI), and pG3 concentrations are elevated in chronic kidney disease (CKD) patients. The associations of pG3 with hsTnI/BNP are unclear. We explored the relationship of hsTnI and BNP with pG3 in Asian CKD patients and healthy contr...
Article
Objective To evaluate the performance of machine learning (ML) algorithms and to compare them to logistic regression for the prediction of risk of cardiovascular diseases (CVD), chronic kidney disease (CKD), diabetes (DM), and hypertension (HTN) in a prospective cohort study using simple clinical predictors. Study Design and Setting We conduct...
Article
Full-text available
We propose a non-monotone transformation to biomarkers in order to improve the diagnostic and screening accuracy. The proposed quadratic transformation only involves modeling the distribution means and variances of the biomarkers and is therefore easy to implement in practice. Mathematical justification was rigorously established to support the val...
Article
Full-text available
Background: Previous simulation studies of the case-control study design using incidence density sampling, which required individual matching for time, showed biased estimates of association from conditional logistic regression (CLR) analysis; however, the reason for this is unknown. Separately, in the analysis of case-control studies using the ex...
Article
Full-text available
Background Socioeconomic disparities in infant mortality have persisted for decades in high-income countries and may have become stronger in some populations. Therefore, new understandings of the mechanisms that underlie socioeconomic differences in infant deaths are essential for creating and implementing health initiatives to reduce these deaths....
Article
Screening for ultrahigh dimensional features may encounter complicated issues such as outlying observations, heteroscedasticity or heavy-tailed distribution, multi-collinearity and confounding effects. Standard correlation-based marginal screening methods may be a weak solution to these issues. We contribute a novel robust joint screener to safegua...
Article
In the clinical trial community, it is usually not easy to find a treatment that benefits all patients since the reaction to treatment may differ substantially across different patient subgroups. The heterogeneity of treatment effect plays an essential role in personalized medicine. To facilitate the development of tailored therapies and improve th...
Article
The thresholding variable plays a crucial role in subgroup identification for personalized medicine. Most existing partitioning methods split the sample based on one predictor variable. In this paper, we consider setting the splitting rule from a combination of multivariate predictors, such as the latent factors, principal components, and weighted sum...
Article
Full-text available
We examined the cross-sectional association between mushroom intake and mild cognitive impairment (MCI) using data from 663 participants aged 60 and above from the Diet and Healthy Aging (DaHA) study in Singapore. Compared with participants who consumed mushrooms less than once per week, participants who consumed mushrooms >2 portions per week had...
Article
Deep learning neural network models such as multilayer perceptron (MLP) and convolutional neural network (CNN) are novel and attractive artificial intelligence computing tools. However, evaluation of the performance of these methods is not readily available for practitioners yet. We provide a tutorial for evaluating classification accuracy for vari...
Article
Recurrent hospitalizations are common in longitudinal studies; however, many forms of cumulative event analyses assume recurrent events are independent. We explore the presence of event dependence when readmissions are spaced apart by at least 30 and 60 days. We set up a comparative framework with the assumption that patients with emergency percuta...
Article
Estimating optimal individualized treatment rules (ITRs) in single or multi-stage clinical trials is one key solution to personalized medicine and has received more and more attention in the statistical community. Recent development suggests that using machine learning approaches can significantly improve the estimation over model-based methods. Howeve...
Article
Full-text available
It is often necessary to differentiate subjects from multiple categories using medical tests. We may then adopt statistical measures to characterize the performance of these tests. The three-way ROC analysis has been proposed to evaluate the diagnostic accuracy of medical tests with three categories, reflecting the correct classification probabilit...
Article
We present a novel model averaging method to construct a prediction function in semi‐parametric form. The weighted sum of candidate semi‐parametric models is taken as a prediction of the mean response. Marginal non‐parametric regression models are approximated by spline basis functions and we apply a Bayesian Monte Carlo approach to fit such models...
Article
Varying-coefficient models are widely used to model nonparametric interaction and recently adopted to analyze longitudinal data measured repeatedly over time. We focus on high-dimensional longitudinal observations in this article. A novel two-step sparse boosting approach is proposed to carry out the variable selection and the model-based predictio...
Preprint
Statistical learning evolves quickly with more and more sophisticated models proposed to incorporate the complicated data structure from modern scientific and business problems. Varying index coefficient models extend varying coefficient models and single index models, becoming the latest state-of-the-art for semiparametric regression. This new cla...
Preprint
We propose a multi-threshold change plane regression model which naturally partitions the observed subjects into subgroups with different covariate effects. The underlying grouping variable is a linear function of covariates and thus multiple thresholds form parallel change planes in the covariate space. We contribute a novel 2-stage approach to es...
Article
Full-text available
Introduction: This study is a parallel-arm randomized controlled trial evaluating choral singing’s efficacy and underlying mechanisms in preventing cognitive decline in at-risk older participants. Methods: Three-hundred and sixty community-dwelling, non-demented older participants are recruited for a 2-year intervention. Inclusion criteria are self...
Preprint
Full-text available
Screening for ultrahigh dimensional features may encounter complicated issues such as outlying observations, heteroscedasticity or heavy-tailed distribution, multi-collinearity and confounding effects. Standard correlation-based marginal screening methods may be a weak solution to these issues. We contribute a novel robust joint screener to safegua...
Article
In this paper, we consider a robust approach to the ultrahigh dimensional variable screening under varying coefficient models. While the existing works focus on the mean regression function, we propose a procedure based on conditional quantile correlation sure independence screening (CQC-SIS). This proposal is applicable to heterogeneous or heav...
Article
Full-text available
Longitudinal data analysis requires a proper estimation of the within-cluster correlation structure in order to achieve efficient estimates of the regression parameters. When applying likelihood-based methods one may select an optimal correlation structure by the AIC or BIC. However, such information criteria are not applicable for estimating equat...
Article
Instrumental variable (IV) methods are popular in non-experimental settings to estimate the causal effects of scientific interventions. These approaches allow for the consistent estimation of treatment effects even if major confounders are unavailable. There have been some extensions of IV methods to survival analysis recently. We specifically cons...
Article
The Fama-French three factor models are commonly used in the description of asset returns in finance. Statistically speaking, the Fama-French three factor models imply that the return of an asset can be accounted for directly by the Fama-French three factors, i.e. market, size and value factor, through a linear function. A natural question is: woul...
Article
A two-stage procedure for simultaneously detecting multiple thresholds and achieving model selection in the segmented accelerated failure time (AFT) model is developed in this paper. In the first stage, we formulate the threshold problem as a group model selection problem so that a concave 2-norm group selection method can be applied. In the second...
Article
Forecasting in economic data analysis is dominated by linear prediction methods where the predicted values are calculated from a fitted linear regression model. With multiple predictor variables, multivariate nonparametric models were proposed in the literature. However, empirical studies indicate the prediction performance of multi-dimensional non...
Article
Motivated by high-throughput profiling studies in biomedical research, variable selection methods have been a focus for biostatisticians. In this paper, we consider semiparametric varying-coefficient accelerated failure time models for right censored survival data with high-dimensional covariates. Instead of adopting the traditional regularization...
Article
We consider sample size calculation to obtain sufficient estimation precision and control the length of confidence intervals under high dimensional assumptions. In particular, we intend to provide more general results for sample size determination when a large number of parameter values need to be computed for a fixed sample. We consider three desi...
Article
Forecasting and predictive inference are fundamental data analysis tasks. Most studies employ parametric approaches making strong assumptions about the data generating process. On the other hand, while nonparametric models are applied, it is sometimes found in situations involving low signal to noise ratios or large numbers of covariates that their...
Article
The functional autoregressive (FAR) model belongs to an important class of models for dependent functional data analysis (FDA) and has been investigated intensively in many applications, especially for modeling the autoregressive dynamics of high-volume time series data. In this paper, we extend the classical FAR model to address the intrinsic loca...
Article
Motivated by risk prediction studies with ultra-high dimensional biomarkers, we propose a novel improvement screening methodology. Accurate risk prediction can be quite useful for patient treatment selection, prevention strategy or disease management in evidence-based medicine. The question of how to choose new markers in addition to the conventio...
Article
We provide a detailed review for the statistical analysis of diagnostic accuracy in a multi-category classification task. For qualitative response variables with more than two categories, many traditional accuracy measures such as sensitivity, specificity and area under the ROC curve are no longer applicable. In recent literature, new diagnostic ac...
Article
We propose a competing risks approach to analyse customer behaviours in freemium products and services. The event of interest is when a customer starts to pay for additional features or functionalities. The observation of such an event may be preempted by an event where the customer quits using the product before paying and consuming the additional...
Article
Full-text available
Polytomous discrimination index is a novel and important diagnostic accuracy measure for multi-category classification. After reconstructing its probabilistic definition, we propose a nonparametric approach to the estimation of polytomous discrimination index based on an empirical sample of biomarker values. In this paper, we provide the finite-sam...
Article
Full-text available
High-throughput profiling is now common in biomedical research. In this paper we consider the layout of an etiology study composed of a failure time response, and gene expression measurements. In current practice, a widely adopted approach is to select genes according to a preliminary marginal screening and a follow-up penalized regression for mode...
Article
Background: Several risk scores have been developed for predicting cognitive impairment and dementia, but none have been validated in Asian samples. We aimed to produce a risk score that best predicts incident neurocognitive disorder (NCD) among Chinese elderly and to validate this score against the modified risk score derived from the Cardiovascul...
Article
Full-text available
Motivated by the analysis of imaging data, we propose a novel functional varying-coefficient single index model (FVCSIM) to carry out the regression analysis of functional response data on a set of covariates of interest. FVCSIM represents a new extension of varying-coefficient single index models for scalar responses collected from cross-sectional...
Article
Panel data analysis is an important topic in statistics and econometrics. In such analysis, it is very common to assume the impact of a covariate on the response variable remains constant across all individuals. While the modelling based on this assumption is reasonable when only the global effect is of interest, in general, it may overlook some in...
Article
Full-text available
The Fama-French three factor models are commonly used in the description of asset returns in finance. Statistically speaking, the Fama-French three factor models imply that the return of an asset can be accounted for directly by the Fama-French three factors, i.e. market, size and value factor, through a linear function. A natural question is: woul...
Preprint
The Fama-French three factor models are commonly used in the description of asset returns in finance. Statistically speaking, the Fama-French three factor models imply that the return of an asset can be accounted for directly by the Fama-French three factors, i.e. market, size and value factor, through a linear function. A natural question is: woul...
Article
Generalized varying coefficient model (GVCM) is an important extension of generalized linear model and varying coefficient model. It has been widely applied in many areas. This paper mainly considers the variable screening problem with dichotomous response data under GVCM, where a spline approximation is employed to estimate the coefficient functio...
Article
We propose a Bayesian approach to the estimation of the net reclassification improvement (NRI) and three versions of the integrated discrimination improvement (IDI) under the logistic regression model. Both NRI and IDI were proposed as numerical characterizations of accuracy improvement for diagnostic tests and were shown to retain certain practica...
Article
Full-text available
Motivated by ultrahigh-dimensional biomarkers screening studies, we propose a model-free screening approach tailored to censored lifetime outcomes. Our proposal is built upon the introduction of a new measure, survival impact index (SII). By its design, SII sensibly captures the overall influence of a covariate on the outcome distribution, and can...
Article
Full-text available
In many longitudinal studies, the covariates and response are often intermittently observed at irregular, mismatched and subject-specific times. Last observation carried forward (LOCF) is one of the most commonly used methods to deal with such data when covariates and response are observed asynchronously. However, this can lead to considerable bias...
