Longhai LI

University of Saskatchewan | U of S · Department of Mathematics and Statistics

Doctor of Philosophy
I am currently working on developing residual diagnostics tools for various statistical models.

37
Publications
15,989
Reads
295
Citations
Introduction
My research focuses on developing and applying statistical learning methods for high-throughput data and spatial-temporal data. My research articles, with PDF files, are also archived on Zotero: https://www.zotero.org/longhai
Additional affiliations
July 2018 - present
University of Saskatchewan
Position
  • Professor (Full)
July 2007 - July 2018
University of Saskatchewan
Position
  • Professor (Assistant)
September 2002 - July 2007
University of Toronto
Position
  • PhD Student

Publications (37)
Article
Autism spectrum disorder (ASD) is a neurological developmental disorder that typically causes impaired communication and compromised social interactions. The current clinical assessment of ASD is typically based on behavioral observations and lacks an understanding of the neurological mechanism and the progression of brain development. The f...
Article
Full-text available
Background For differential abundance analysis, zero-inflated generalized linear models, typically zero-inflated NB models, have been increasingly used to model microbiome and other sequencing count data. A common assumption in estimating the false discovery rate is that the p values are uniformly distributed under the null hypothesis, which demand...
Article
Genome-Wide Association Studies (GWAS) have demonstrated their power in discovering genetic variations associated with particular traits related to agronomically important features in crops. The typical output of a GWAS program includes a series of Single Nucleotide Polymorphisms (SNPs) and their significance. Currently, there is no standard way to compare results...
Article
Residuals in normal regression are used to assess a model's goodness‐of‐fit (GOF) and discover directions for improving the model. However, there is a lack of residuals with a characterized reference distribution for censored regression. In this article, we propose to diagnose censored regression with normalized randomized survival probabilities (R...
Article
In modern neuroscience and clinical study, neuroscientists and clinicians often use non-invasive imaging techniques to validate theories and computational models, observe brain activities and diagnose brain disorders. Functional Magnetic Resonance Imaging (fMRI) is one of the commonly used imaging modalities that can be used to understand human...
Article
Full-text available
Microbiome data consists of operational taxonomic unit (OTU) counts characterized by zero-inflation, over-dispersion, and grouping structure among samples. Currently, statistical testing methods are commonly performed to identify OTUs that are associated with a phenotype. The limitations of statistical testing methods include that the validity of p...
Article
Full-text available
Background: Examining residuals is a crucial step in statistical analysis to identify the discrepancies between models and data, and assess the overall model goodness-of-fit. In diagnosing normal linear regression models, both Pearson and deviance residuals are often used, which are equivalently and approximately standard normally distributed when...
Article
Full-text available
Feature selection is demanded in many modern scientific research problems that use high-dimensional data. A typical example is to identify gene signatures that are related to a certain disease from high-dimensional gene expression data. The expression of genes may have grouping structures, for example, a group of co-regulated genes that have simila...
Book
Full-text available
I am very pleased that, thanks to the hard work of Mohsen Soltanifar and Longhai Li, this solutions manual for my book is now available. I hope readers will find these solutions helpful as you struggle with learning the foundations of measure-theoretic probability. Of course, you will learn best if you first attempt to solve the exercises on your o...
Article
Full-text available
Background: Antimicrobial resistance (AMR) is a major threat to global public health because it makes standard treatments ineffective and contributes to the spread of infections. It is important to understand AMR's biological mechanisms for the development of new drugs and more rapid and accurate clinical diagnostics. The increasing availability o...
Preprint
Full-text available
Residual analysis is extremely important in regression modelling. Residuals are used to graphically and numerically check the overall goodness-of-fit of a model, to discover the direction for improving the model, and to identify outlier observations. Cox-Snell residuals, which are transformed from survival probabilities (SPs), are typically used fo...
Article
Full-text available
Background: Falls pose major health problems to middle-aged and older adults and may potentially lead to various levels of injuries. Sleep duration and disturbances have been shown to be associated with falls in the literature; however, studies of the joint and distinct effects of those sleep problems are still sparse. To fill this gap, we aimed t...
Article
Full-text available
Background: Transposable elements (TEs) are interspersed DNA sequences that can move or copy to new positions within a genome. TEs are believed to promote speciation and their activities play a significant role in human disease. In the human genome, the 22 AluY and 6 AluS TE subfamilies have been the most recently active, and their transposition h...
Article
Full-text available
Residual analysis is a standard tool for assessing normal regression. However, for a discrete response, the traditional Pearson and deviance residuals cluster on lines and their distributions are far from normality. Graphical and quantitative inspection of these residuals provides little information for model diagnosis. Marshall and Spiegelhalter (...
Article
In practice, survival data are often collected over geographical regions. Shared spatial frailty models have been used to model spatial variation in survival times, which are often implemented using the Bayesian Markov chain Monte Carlo method. However, this method comes at the price of slow mixing rates and heavy computational cost, which may rend...
Article
Full-text available
An important statistical task in disease mapping problems is to identify divergent regions with unusually high or low risk of disease. Leave-one-out cross-validatory (LOOCV) model assessment is the gold standard for estimating predictive p-values that can flag such divergent regions. However, actual LOOCV is time-consuming because one needs to reru...
Chapter
Ischaemic heart disease is the leading cause of death in the world; however, quantifying its burden in a population is a challenge. Hospitalization data provide a proxy for measuring the severity of ischaemic heart disease. Length of stay (LOS) in hospital is often used as an indicator of hospital efficiency and a proxy of resource consumption, whi...
Article
Full-text available
Feature selection is demanded in many modern scientific research problems that use high-dimensional data. A typical example is to find the most useful genes that are related to a certain disease (eg, cancer) from high-dimensional gene expressions. The expressions of genes have grouping structures, for example, a group of co-regulated genes that hav...
Article
Full-text available
An important statistical task in disease mapping problems is to identify outlier/divergent regions with unusually high or low residual risk of disease. Leave-one-out cross-validatory (LOOCV) model assessment is a gold standard for computing predictive p-value that can flag such outliers. However, actual LOOCV is time-consuming because one needs t...
Article
Full-text available
High-dimensional feature selection arises in many areas of modern sciences. For example, in genomic research we want to find the genes that can be used to separate tissues of different classes (eg, cancer and normal) from tens of thousands of genes that are active (expressed) in certain tissue cells. To this end, we wish to fit regression and class...
Article
Full-text available
A natural method for approximating out-of-sample predictive evaluation is leave-one-out cross-validation (LOOCV): we alternately hold out each case from a full data set and then train a Bayesian model using Markov chain Monte Carlo (MCMC) without the held-out case; finally, we evaluate the posterior predictive distribution of all cases with their actu...
Article
Full-text available
The mode of a distribution provides an important summary of data and is often estimated on the basis of some non-parametric kernel density estimator. This article develops a new data analysis tool called modal linear regression in order to explore high-dimensional data. Modal linear regression models the conditional mode of a response Y given a set...
Article
Full-text available
The problem of selecting the most useful features from a great many (eg, thousands of) candidates arises in many areas of modern sciences. An interesting problem from genomic research is that, from thousands of genes that are active (expressed) in certain tissue cells, we want to find the genes that can be used to separate tissues of different clas...
Article
Full-text available
Concerns about the concentrations of chlorofluorocarbons (CFCs) in the atmosphere are based on their effects on the ozone layer by catalytically destroying ozone. The recent steady decline in atmospheric concentration of CFC could be a direct result of the Montréal Protocol's ban on CFC products, in effect since 1989. However, CFCs have long atmosp...
Article
Discriminant analysis (DA) procedures based on parsimonious mean and/or covariance structures have recently been proposed for repeated measures data. However, these procedures rest on the assumption of a multivariate normal distribution. This study examines repeated measures DA (RMDA) procedures based on maximum likelihood (ML) and coordinatewise t...
Article
Full-text available
Solving label switching is crucial for interpreting the results of fitting Bayesian mixture models. Label switching originates from the invariance of the posterior distribution to permutations of component labels. As a result, the component labels in Markov chain simulation may switch to another equivalent permutation, and the marginal posterior dis...
Article
Full-text available
Discriminant analysis (DA) procedures based on parsimonious mean and/or covariance structures have been proposed for repeated measures (RM) data. Bias and mean squared error of discriminant function coefficients (DFCs) for DA procedures are investigated when the mean and/or covariance structures are correctly specified and misspecified.
Article
Full-text available
Class prediction based on high-dimensional features has received a great deal of attention in many areas. For example, biologists are interested in using microarray gene expression profiles for diagnosis or prognosis of a certain disease (eg, cancer). For computational and other reasons, it is necessary to select a subset of features before fittin...

Question (1)
Question
How to define data science?

Projects (2)
Project
Using MCMC methods with heavy-tailed priors to select sparse signal