
Z. D. Bai
Northeast Normal University · School of Mathematics and Statistics
Doctor of Philosophy
328 Publications
59,484 Reads
14,252 Citations
Publications (328)
In this paper, we investigate hypothesis testing for the linear combination of mean vectors across multiple populations through the method of random integration. We have established the asymptotic distributions of the test statistics under both null and alternative hypotheses. Additionally, we provide a theoretical explanation for the special use o...
In this paper, we focus on the test for high-dimensional covariance structures using double-normalized observations for an elliptical population. By investigating the limiting spectral properties of the sample covariance matrix of double-normalized observations, we propose test statistics applicable for testing the diagonality of the population cov...
This paper investigates the central limit theorem for linear spectral statistics of high dimensional sample covariance matrices of the form $B_n = n^{-1} \sum_{j=1}^n Q x_j x_j^* Q^*$ under the assumption that $p/n \to y > 0$, where $Q$ is a $p \times k$ nonrandom matrix and $\{x_j\}_{j=1}^n$ is a sequence of independent $k$-dimensional random vectors with i...
In this paper, we extend the CLT for sample spiked eigenvalues in the generalized spiked covariance model proposed in Jiang and Bai (2021a) to the case where RDS is considered free, i.e., except for an upper limit of the RDS to guarantee that the spiked eigenvalue is distant, there is no limit for p/n, which is the Ratio of Dimension to sample Size...
The BDS test is a test for detecting whether a random sequence is i.i.d. (independent and identically distributed). It has been used in economics and finance to examine whether a fitted time series model is adequate by examining whether the residual sequence is nearly i.i.d. Though the BDS test is widely used in the literature, it has a weakness of...
In this paper, we adopt the eigenvector empirical spectral distribution (VESD) to investigate the limiting behavior of eigenvectors of a large dimensional Wigner matrix W_n. In particular, we derive the optimal bound for the rate of convergence of the expected VESD of W_n to the semicircle law, which is of order O(n^{-1/2}) under the assumption of...
We analyze the impact of the most recent global financial crisis (GFC) on the seven most important Latin American stock markets. Our mean-variance analysis shows that the markets are significantly less volatile and, in general, investors prefer to invest in the post-GFC period. Our results from the Hurst exponent and runs and variance-ratio tests s...
In this paper, we primarily focus on simultaneously testing the mean vector and covariance matrix with high-dimensional non-Gaussian data, based on the classical likelihood ratio test. Applying the central limit theorem for linear spectral statistics of sample covariance matrices, we establish a new modification of the likelihood ratio test, and find that...
The multivariate nonlinear Granger causality developed by Bai et al. (2010) (Mathematics and Computers in simulation. 2010; 81: 5-17) plays an important role in detecting the dynamic interrelationships between two groups of variables. Following the idea of Hiemstra-Jones (HJ) test proposed by Hiemstra and Jones (1994) (Journal of Finance. 1994; 49(...
We establish a joint central limit theorem for sums of squares and the fourth powers of residuals in a high-dimensional regression model. We then apply this CLT to detect the existence of heteroscedasticity for linear regression models without assuming randomness of covariates when the sample size n tends to infinity and the number of covariates p...
This paper investigates the central limit theorem for linear spectral statistics of high dimensional sample covariance matrices of the form $\mathbf{B}_n=n^{-1}\sum_{j=1}^{n}\mathbf{Q}\mathbf{x}_j\mathbf{x}_j^{*}\mathbf{Q}^{*}$ where $\mathbf{Q}$ is a nonrandom matrix of dimension $p\times k$, and $\{\mathbf{x}_j\}$ is a sequence of independent $k$...
In this paper, we propose a new test statistic for testing the equality of high-dimensional covariance matrices for multiple populations. The proposed test statistic generalizes the test of the equality of two population covariance matrices proposed by Li and Chen [11] Li, J. and Chen, S. X. (2012). Two sample tests for high-dimensional covariance...
Magnetoencephalography (MEG) is an advanced imaging technique used to measure the magnetic fields outside the human head produced by the electrical activity inside the brain. Various source localization methods in MEG require knowledge of the underlying active sources, which are identified a priori. Common methods used to estimate the number...
This paper considers the optimal modification of the likelihood ratio test (LRT) for the equality of two high-dimensional covariance matrices. The classical LRT is not well defined when the dimensions are larger than or equal to one of the sample sizes. In this paper, an optimally modified test that works well in cases where the dimensions may be l...
Random Fisher matrices arise naturally in multivariate statistical analysis, and understanding the properties of their eigenvalues is of primary importance for many hypothesis testing problems, such as testing the equality between two covariance matrices or testing the independence between sub-groups of a multivariate random vector. Most of the existing...
The multivariate nonlinear Granger causality developed by Bai et al. (2010) plays an important role in detecting the dynamic interrelationships between two groups of variables. Following the idea of Hiemstra-Jones (HJ) test proposed by Hiemstra and Jones (1994), they attempt to establish a central limit theorem (CLT) of their test statistic by appl...
The famous Hiemstra-Jones (HJ) test developed by Hiemstra and Jones (1994) plays a significant role in studying nonlinear causality. Over the last two decades, there have been numerous applications and theoretical extensions based on this pioneering work. However, several works note that counterintuitive results are obtained from the HJ test, and s...
In this paper, we propose a quick and efficient method to examine whether a time series Yt possesses any nonlinear feature by testing a kind of dependence remaining in the residuals after fitting Yt with a linear model. The advantage of our proposed nonlinearity test is that it does not require knowing the exact nonlinear features and the detailed no...
In this study, we propose a procedure for simultaneous testing \(l (l\ge 1)\) linear relations on \(k(k\ge 2)\) high-dimensional mean vectors with heterogeneous covariance matrices, which extends the result derived by Nishiyama et al. (J Stat Plan Inference 143(11):1898–1911, 2013) and does not need the normality assumption. The newly proposed test...
In this paper, we establish the limit of empirical spectral distributions of quaternion sample covariance matrices. Motivated by Bai and Silverstein (Spectral analysis of large dimensional random matrices, Springer, New York, 2010) and Marčenko and Pastur (Matematicheskii Sb, 114:507-536, 1967), we can extend the results of the real or complex sampl...
This paper proves the asymptotic normality of a statistic for detecting the existence of heteroscedasticity for linear regression models without assuming randomness of covariates when the sample size $n$ tends to infinity and the number of covariates $p$ is either fixed or tends to infinity. Moreover, our approach indicates that its asymptotic...
In this paper, we introduce the so-called naive tests and give a brief review of recent developments. Naive testing methods are easy to understand and perform robustly, especially when the dimension is large. We mainly focus on reviewing some naive testing methods for the mean vectors and covariance matrices of high dimensional...
Consider the following dynamic factor model:
$\mathbf{R}_{t}=\sum_{i=0}^{q}\boldsymbol{\Lambda}_{i}\mathbf{f}_{t-i}+\mathbf{e}_{t},\ t=1,...,T$, where
$\boldsymbol{\Lambda}_{i}$ is an $n\times k$ loading matrix of full rank, $\{\mathbf{f}_t\}$ are
i.i.d. $k\times1$-factors, and $\mathbf{e}_t$ are independent $n\times1$ white
noises. Now, assuming that $n/T\to c>0$, we want to estimate the orders $k$ and
$q$ res...
Central limit theorems (CLT) of linear spectral statistics of general Fisher matrices ${\bf F}$ are widely
used in multivariate statistical analysis where ${\bf F}={\bf S}_y{\bf M}{\bf S}_x^{-1}{\bf M}^*$ with a deterministic complex matrix ${\bf M}$ and two sample covariance matrices ${\bf S}_x$ and ${\bf S}_y$
from two independent samples with s...
NOTE. The published version is at
https://www.researchgate.net/publication/282331316_CLT_for_large_dimensional_general_Fisher_matrices_and_its_applications_in_high-dimensional_data_analysis
Random Fisher matrices arise naturally in multivariate statistical analysis,
and understanding the properties of their eigenvalues is of primary imp...
Sample covariance matrices are widely used in multivariate statistical
analysis. The central limit theorems (CLT's) for linear spectral statistics of
high-dimensional non-centered sample covariance matrices have received
considerable attention in random matrix theory and have been applied to many
high-dimensional statistical problems. However, know...
High-dimensional data appear in many fields, and their analysis has become increasingly important in modern statistics. However, it has long been observed that several well-known methods in multivariate analysis become inefficient, or even misleading, when the data dimension p is larger than, say, several tens. A seminal example is the well-known i...
To evaluate the performance of the prospects X and Y, financial professionals are interested in testing the equality of their Sharpe ratios (SRs), the ratios of the excess expected returns to their standard deviations. Bai et al. (Statistics and Probability Letters 81, 1078–1085, 2011d) have developed the mean-variance-ratio (MVR) statistic to test...
In this chapter, we recommend the use of both the mean-variance (MV) rule and mean-variance-ratio (MVR) test to examine the performance of investment assets. We illustrate the approaches by investigating the performance of different Asian hedge funds over an entire sample period as well as over sub-periods that may be described as boom, crisis, and...
In this paper, convergence rates of the spectral distributions of quaternion self-dual Hermitian matrices are investigated. We show that under conditions of finite 6th moments, the expected spectral distribution of a large quaternion self-dual Hermitian matrix converges to the semicircular law at a rate of \(O(n^{-1/2})\) and the spectral distribut...
Skorokhod’s representation theorem states that if, on a Polish space, there is a weakly convergent sequence of probability measures μ_n → μ_0 as n → ∞, then there exist a probability space and a sequence of random elements X_n such that X_n → X almost surely and X_n has the distribution function μ_n, n = 0, 1, 2, … We shall extend the Skorokhod representati...
This paper proposes a CLT for linear spectral statistics of random matrix
$S^{-1}T$ for a general non-negative definite and non-random Hermitian
matrix $T$.
Note added in June 2015: This preprint is no longer reliable. An improved version of this work appears in "CLT for linear spectral statistics of a rescale...
When the multiple correlation coefficient is used to measure how strongly a given variable can be linearly associated with
a set of covariates, it suffers from an upward bias that cannot be ignored in the presence of a moderately high dimensional
covariate. Under an independent component model, we derive an asymptotic approximation to the distribut...
This paper studies the limiting spectral distribution (LSD) of a symmetrized auto-cross covariance matrix. The auto-cross covariance matrix is defined as $M_\tau = \frac{1}{2T}\sum_{j=1}^{T}(e_j e_{j+\tau}^{*} + e_{j+\tau} e_j^{*})$, where $e_j$ is an $N$-dimensional vector of independent standard complex components with properties stated in Theorem (1.1) and $\tau$ is the lag. $M_0$ is well studied in the...
In this article, we focus on the problem of testing the equality of several
high dimensional mean vectors with unequal covariance matrices. This is one of
the most important problems in multivariate statistical analysis and there have
been various tests proposed in the literature. Motivated by Bai and Saranadasa (1996) and
Chen and Qin (2010), a test statistic is...
The method of generalized estimating equations (GEE) introduced by K. Y. Liang and S. L. Zeger has been widely used to analyze longitudinal data. Recently, this method has been criticized for a failure to protect against misspecification of working correlation models, which in some cases leads to loss of efficiency or infeasibility of solutions. In...
Since E. P. Wigner (1958) established his famous semicircle law, lots of
attention has been paid by physicists, probabilists and statisticians to study
the asymptotic properties of the largest eigenvalues for random matrices. Bai
and Yin (1988) obtained the necessary and sufficient conditions for the strong
convergence of the extreme eigenvalues of a...
In Jin et al. (2014), the limiting spectral distribution (LSD) of a
symmetrized auto-cross covariance matrix is derived using matrix manipulation,
with finite $(2+\delta)$-th moment assumption. Here we give an alternative
method using a result in Bai and Silverstein (2010), in which a weaker
condition of finite 2nd moment is assumed.
The book contains three parts: Spectral theory of large dimensional random matrices; Applications to wireless communications; and Applications to finance. In the first part, we introduce some basic theorems of spectral analysis of large dimensional random matrices that are obtained under finite moment conditions, such as the limiting spectral distr...
In this paper, we shall investigate the almost sure limits of the largest and
smallest eigenvalues of a quaternion sample covariance matrix. Suppose that
$\mathbf X_n$ is a $p\times n$ matrix whose elements are independent quaternion
variables with mean zero, variance 1 and uniformly bounded fourth moments.
Denote $\mathbf S_n=\frac{1}{n}\mathbf X_...
The auto-cross covariance matrix is defined as $${\bf
M}_n=\frac{1}{2T}\sum_{j=1}^{T} ({\bf e}_{j}{\bf e}_{j+\tau}^{*}+{\bf
e}_{j+\tau}{\bf e}_{j}^{*}),$$ where ${\bf e}_{j}$'s are $n$-dimensional
vectors of independent standard complex components with a common mean 0,
variance $\sigma^{2}$, and uniformly bounded $2+\eta$-th moments and $\tau$ is
t...
The eigenvector Empirical Spectral Distribution (VESD) is adopted to
investigate the limiting behavior of eigenvectors and eigenvalues of covariance
matrices. In this paper, we shall show that the Kolmogorov distance between the
expected VESD of sample covariance matrix and the Mar\v{c}enko-Pastur
distribution function is of order $O(N^{-1/2})$. Gi...
This paper introduces a new method to estimate the spectral distribution of a
population covariance matrix from high-dimensional data. The method is founded
on a meaningful generalization of the seminal Marcenko-Pastur equation,
originally defined in the complex plane, to the real line. Beyond its easy
implementation and the established asymptotic c...
This paper proposes the corrected likelihood ratio test (LRT) and large-dimensional trace criterion to test the independence of two large sets of multivariate variables of dimensions $p_1$ and $p_2$ when the dimensions $p = p_1 + p_2$ and the sample size $n$ tend to infinity simultaneously and proportionally. Both theoretical and simulation results demonst...
In this paper we establish the limit of the empirical spectral distribution
of quaternion sample covariance matrices. Suppose $\mathbf X_n =
({x_{jk}^{(n)}})_{p\times n}$ is a quaternion random matrix. For each $n$, the
entries $\{x_{ij}^{(n)}\}$ are independent random quaternion variables with a
common mean $\mu$ and variance $\sigma^2>0$. It is s...
It is well known that the Gaussian symplectic ensemble (GSE) is defined on the
space of $n\times n$ quaternion self-dual Hermitian matrices with Gaussian
random elements. There is a huge body of literature regarding matrices of this
kind. As a natural idea we want to get more universal results by removing
the Gaussian condition. For the first step, in...
Many kernel-based learning algorithms have the computational load scaled with the sample size n due to the column size of a full kernel Gram matrix K. This article considers the Nyström low-rank approximation. It uses a reduced kernel K̂, which is n × m, consisting of m columns (say columns i1, i2, ⋯ , i m) randomly drawn from K. This approximation...
The measurement error model (MEM) is an important model in statistics because in a regression problem, the measurement error of the explanatory variable will seriously affect the statistical inferences if measurement errors are ignored. In this paper, we revisit the MEM when both the response and explanatory variables are further involved with roun...
For a multivariate linear model, Wilks' likelihood ratio test (LRT)
constitutes one of the cornerstone tools. However, the computation of its
quantiles under the null or the alternative requires complex analytic
approximations and more importantly, these distributional approximations are
feasible only for moderate dimension of the dependent variabl...
The sample covariance matrix and the multivariate $F$-matrix play important roles in
multivariate statistical analysis. The central limit theorems (CLT) of
linear spectral statistics associated with these matrices were established in
Bai and Silverstein (2004) and Zheng (2012), which have received considerable
attention and have been applied to solve many...
To investigate properties of the eigenvector matrix of the sample covariance matrix \(\mathbf {S}_n\), we establish the central limit theorem of linear spectral statistics associated with a new form of empirical spectral distribution \(H^{\mathbf {S}_n}\), based on eigenvectors and eigenvalues of sample covariance matrix \(\math...
Let $B_n=S_n(S_n+a_nT_N)^{-1}$, where $S_n$ and $T_N$ are two independent
sample covariance matrices with dimension $p$ and sample sizes $n$ and $N$
respectively. This is the so-called Beta matrix. In this paper, we focus on the
limiting empirical spectral distribution function and the central limit theorem
of linear spectral statistics (LSS) of $B...
In this paper we investigate the Fisher information matrix of a rounded ranked set sampling (RSS) sample and show that the
sample is always more informative than a rounded simple random sampling (SRS) sample of the same size. On the other hand,
we propose a new method to approximate maximum likelihood estimates (MLE) of unknown parameters for this...
The paper proposes new estimators of spiked eigenvalues of the population covariance matrix from the sample covariance matrix and investigates their consistency and asymptotic normality.
In the spiked population model introduced by Johnstone (2001), the population
covariance matrix has all its eigenvalues equal to one except for a few fixed
eigenvalues (spikes). The question is to quantify the effect of the
perturbation caused by the spike eigenvalues. Baik and Silverstein (2006)
establishes the almost sure limits of the extreme sa...
A new form of empirical spectral distribution of a Wigner matrix W_n with weights specified by the eigenvectors is defined and it is then shown to converge with probability one to the semicircular law. Moreover, central limit theorem for linear spectral statistics defined by the eigenvectors and eigenvalues is also established under some moment con...