Yan Zhou

Shenzhen University · College of Mathematics and Statistics

PhD

About

Publications: 44
Reads: 8,290
Citations: 531
Additional affiliations
September 2009 - December 2013
Northeast Normal University
Position
  • PhD Student

Publications (44)
Article
Full-text available
In this paper, we study the model estimation for the partial linear varying coefficient errors-in-variables (EV) models with longitudinal data. Based on the empirical likelihood and quadratic inference functions, we propose an orthogonality-based bias-corrected empirical likelihood estimation method using the QR decomposition method of matrix. The...
Article
Full-text available
While single-cell RNA sequencing (scRNA-seq) allows researchers to analyze gene expression in individual cells, its unique characteristics like over-dispersion, zero-inflation, high gene-gene correlation, and large data volume with many features pose challenges for most existing feature selection methods. In this paper, we present a feature selecti...
Article
We study the goodness of fit tests for checking the normality of the model errors under the additive distortion measurement errors settings. Neither the response variable nor the covariates can be directly observed but are distorted in additive fashions by an observed confounding variable. The proposed test statistics are based on logarithmic trans...
Article
Full-text available
Motivation: The utilization of single-cell bisulfite sequencing (scBS-seq) methods allows for precise analysis of DNA methylation patterns at the individual cell level, enabling the identification of rare populations, revealing cell-specific epigenetic changes, and improving differential methylation analysis. Nonetheless, the presence of sparse da...
Article
This paper considers linear regression models when neither the response variable nor the covariates can be directly observed, but are measured with multiplicative distortion measurement errors. We propose new identifiability conditions for the distortion functions via the varying coefficient models, then moment-based estimators of parameters in the...
Article
Variable selection for varying coefficient models includes the separation of varying and constant effects, and the selection of variables with nonzero varying effects and those with nonzero constant effects. This paper proposes a unified variable selection approach called the double-penalized quadratic inference functions method for varying coeffic...
Article
Full-text available
Background Using single-cell RNA sequencing (scRNA-seq) data to diagnose disease is an effective technique in medical research. Several statistical methods have been developed for the classification of RNA sequencing (RNA-seq) data, including, for example, Poisson linear discriminant analysis (PLDA), negative binomial linear discriminant analysis (...
Article
We consider quantile functional regression with a functional part and a scalar linear part. We establish the optimal prediction rate for the model under mild assumptions in the reproducing kernel Hilbert space (RKHS) framework. Under stronger assumptions related to the capacity of the RKHS, the non-functional linear part is shown to have asymptotic...
Article
Full-text available
Early detection is crucial to improve breast cancer (BC) patients’ outcomes and survival. Mammogram and ultrasound adopting the Breast Imaging Reporting and Data System (BI-RADS) categorization are widely used for BC early detection, while suffering high false-positive rate leading to unnecessary biopsy, especially in BI-RADS category-4 patients. P...
Article
We consider partially linear quantile regression with a high-dimensional linear part, with the nonparametric function assumed to be in a reproducing kernel Hilbert space. We establish the overall learning rate in this setting, as well as the rate of the linear part separately. Our proof relies heavily on the empirical processes and the Rademacher c...
Article
Full-text available
In medical studies, the collected covariates contain underlying outliers. For clustered/longitudinal data with censored observations, the traditional Gehan-type estimator is robust to outliers in response but sensitive to outliers in the covariate domain, and it also ignores the within-cluster correlations. To take account of within-cluster correla...
Article
Objective To detect the concentration of high-sensitivity cardiac troponin T (hs-cTnT) in healthy children aged 0–14 years by electrochemiluminescence immunoassay (ECLIA), so as to explore the differences in different ages and genders. The aim of this study is to establish the reference interval for hs-cTnT in healthy children aged 0–14 years. Met...
Article
Full-text available
High-throughput omics data are becoming more and more popular in various areas of science. Given that many publicly available datasets address the same questions, researchers have applied meta-analysis to synthesize multiple datasets to achieve more reliable results for model estimation and prediction. Due to the high dimensionality of omics data,...
Article
Full-text available
Background Identifying differentially expressed genes between the same or different species is an urgent demand for biological and medical research. For RNA-seq data, systematic technical effects and different sequencing depths are usually encountered when conducting experiments. Normalization is regarded as an essential step in the discovery of bi...
Article
Full-text available
Bulk and single‐cell RNA‐seq (scRNA‐seq) data are being used as alternatives to traditional technology in biology and medicine research. These data are used, for example, for the detection of differentially expressed (DE) genes. Several statistical methods have been developed for the classification of bulk and single‐cell RNA‐seq data. These featur...
Article
We propose a kernel density based estimation by constructing a nonparametric kernel version of the maximum profile likelihood estimator for partial linear multivariate responses regression models. The method proposed in this article makes use of multivariate kernel smoothing nonparametric techniques to estimate the unknown multivariate density func...
Article
Quantile regression estimate gives more complete information about the response distribution but is more costly to compute than mean regression. When the dimension is large, a ridge penalty is conventionally used to stabilize the estimate which achieves better bias‐variance trade‐off. We investigate a random projection approach to ease the computat...
Article
In this paper, we consider the estimation and model selection for longitudinal partial linear varying coefficient errors-in-variables (EV) models when the covariates are measured with some additive errors. Bias-corrected penalized quadratic inference functions method is proposed based on quadratic inference functions with two penalty function terms...
Article
Full-text available
Next-generation sequencing has emerged as an essential technology for the quantitative analysis of gene expression. In medical research, RNA sequencing (RNA-seq) data are commonly used to identify which type of disease a patient has. Because of the discrete nature of RNA-seq data, the existing statistical methods that have been developed for microa...
Article
Full-text available
This paper considers estimation and variable selection for multiplicative linear regression models when neither the response variable nor the covariates can be directly observed, but are distorted by unknown functions of a commonly observable confounding variable. After taking logarithmic transformation on the response variable, we propose two esti...
Preprint
Full-text available
In medical studies, the collected covariates usually contain underlying outliers. For clustered/longitudinal data with censored observations, the traditional Gehan-type estimator is robust to outliers existing in response but sensitive to outliers in the covariate domain, and it also ignores the within-cluster correlations. To take account of with...
Article
This paper considers linear regression models when neither the response variable nor the covariates can be directly observed, but are measured with multiplicative distortion measurement errors. To eliminate the effect caused by the distortion, we propose two calibration procedures: the conditional absolute mean calibration and the conditional varia...
Article
Generalized estimating equations (GEE) approach has been used to estimate the parameters in semiparametric accelerated failure time (AFT) models with clustered and censored data. However, the working correlation model has a substantial impact on estimator efficiency when using the GEE method. This paper proposes a general correlation model to incor...
Article
Full-text available
Background High-throughput techniques bring novel tools and also statistical challenges to genomic research. Identifying genes with differential expression between different species is an effective way to discover evolutionarily conserved transcriptional responses. To remove systematic variation between different species for a fair comparison, norm...
Article
In this paper, we propose several dimension reduction methods when the covariates are measured with additive distortion measurement errors. These distortions are modeled by unknown functions of a commonly observable confounding variable. To estimate the central subspace, we propose residuals-based dimension reduction estimation methods and direct e...
Article
A popular approach, generalized estimating equations (GEE), has been applied to the multivariate accelerated failure time (AFT) model. However, it is necessary to estimate the correlation parameters and calculate the inversion of the correlation matrix. On the other hand, the efficiency is low when the marginal distribution is heavy-tailed. This pape...
Article
Full-text available
DNA methylation is an essential epigenetic modification involved in regulating the expression of mammalian genomes. A variety of experimental approaches to generate genome-wide or whole-genome DNA methylation data have emerged in recent years. Methylated DNA immunoprecipitation followed by sequencing (MeDIP-seq) is one of the major tools used in wh...
Chapter
Full-text available
Next-generation sequencing has become a powerful tool for gene expression analysis with the development of high-throughput techniques. Discriminating which type of diseases a new sample belongs to is a fundamental issue in medical and biological studies. Different from continuous microarray data, next-generation sequencing reads are mapped onto the...
Article
Full-text available
We study varying coefficient partially linear models when some linear covariates are error-prone, but their ancillary variables are available. After calibrating the error-prone covariates, we study quantile regression estimates for parametric coefficients and nonparametric varying coefficient functions, and we develop a semiparametric composite qua...
Article
We consider simultaneous semiparametric estimation of conditional quantiles for multiple responses using a dynamic single-index structure. Motivated by a financial application, a market factor index is constructed that is shared among different portfolios which results in a more interpretable and efficient model, compared to separately building mul...
Article
Based on the difference in the objective function proposed by Dominguez and Lobato between the unconstrained and constrained estimators, a simple test is proposed for hypothesis testing of parameters in conditional moment restriction models. This test is guaranteed to be consistent. The asymptotic distribution of the proposed test statistic is prov...
Article
Full-text available
Motivation: With the development of high-throughput techniques, RNA-sequencing (RNA-seq) is becoming increasingly popular as an alternative for gene expression analysis, such as RNAs profiling and classification. Identifying which type of diseases a new patient belongs to with RNA-seq data has been recognized as a vital problem in medical research...
Article
We consider estimation and hypothesis test for partial linear measurement errors models when the response variable and covariates in the linear part are measured with additive distortion measurement errors, which are unknown functions of a commonly observable confounding variable. We propose a transformation based profile least squares estimator...
Article
We consider the estimation for the unknown single-index parameter in the conditional density function. Firstly, estimation method and asymptotic properties for the estimator are obtained. Secondly, to test a hypothesis on the single-index parameter, a test statistic based on the difference between the minimization criteria under the null and altern...
Article
Full-text available
High-throughput techniques bring novel tools and also statistical challenges to genomic research. Identifying which type of diseases a new patient belongs to has been recognized as an important problem. For high-dimensional small sample size data, the classical discriminant methods suffer from the singularity problem and are therefore no longer appl...
Article
Full-text available
Next-generation sequencing technologies have made RNA sequencing (RNA-seq) a popular choice for measuring gene expression level. To reduce the noise of gene expression measures and compare them between several conditions or samples, normalization is an essential step to adjust for varying sample sequencing depths and other unwanted technical effect...
Article
Full-text available
To analyze the reliability of a complex system described by minimal paths, an empirical likelihood method is proposed to solve the reliability test problem when the subsystem distributions are unknown. Furthermore, we provide a reliability test statistic of the complex system and extract the limit distribution of the test statistic. Therefore, we c...
Article
Full-text available
The change-point detection problem is determining whether a change has taken place. Two nonparametric methods based on empirical likelihood and the likelihood ratio are proposed for detecting a change-point problem in distributions for independent observations. Numerical studies are carried out to evaluate the performance of the proposed methods. T...
Article
Functional Sliced Inverse Regression (FSIR) and Functional Sliced Average Variance Estimation (FSAVE) are two popular functional effective dimension reduction methods. However, both of them have restrictions: FSIR is vulnerable to symmetric dependencies and FSAVE has low efficiency for monotone dependencies and is sensitive to the number of slices....
Article
Full-text available
Background Aberrant DNA methylation is a hallmark of many cancers. Classically there are two types of endometrial cancer, endometrioid adenocarcinoma (EAC), or Type I, and uterine papillary serous carcinoma (UPSC), or Type II. However, the whole genome DNA methylation changes in these two classical types of endometrial cancer are still unknown. Res...
Article
Full-text available
DNA methylation plays key roles in diverse biological processes such as X chromosome inactivation, transposable element repression, genomic imprinting, and tissue-specific gene expression. Sequencing-based DNA methylation profiling provides an unprecedented opportunity to map and compare complete DNA methylomes. This includes one of the most widely...
Code
Full-text available
M&M was developed for analyzing data derived from methylated DNA immunoprecipitation (MeDIP) experiments followed by sequencing (MeDIP-Seq) and the digestions with the methyl-sensitive restriction enzymes (MRE-Seq). Nevertheless, functionalities like the quality controls may be applied to other types of sequencing data (e.g. ChIP-Seq). MeDIP-MRE (m...
