Junni Zhang

Peking University | PKU · Guanghua School of Management

About

48
Publications
3,071
Reads
788
Citations

Publications (48)
Article
To improve covariate balance over a complete randomization, a number of methods have been proposed recently to utilize modern computational capabilities to find allocations with balance in observed covariates. Asymptotic inference on treatment effects based on these designs is more complicated than that under complete randomization, and this is why...
Article
This paper studies the competence-loyalty tradeoff and its evolution in China's political system characterized by hierarchical selection. From the eyes of the central controllers, the rational selection rule is to mix competence and loyalty when officials are selected to fill lower-tier positions and to select from them the more loyal to fill highe...
Article
Causal inference with observational data is a central goal in many fields. Propensity score methods are design-based approaches that try to ensure covariate balance without using information from the outcome variables. Analysis-based approaches, such as the Bayesian Additive Regression Tree and the Causal Forest, bypass the issue of covariate balan...
Chapter
Full-text available
Local-level demographic forecasts are in high demand. Constructing local-level forecasts requires confronting the problems of random variation and sparse data. Bayesian methods offer promising solutions to both these problems. We illustrate using the example of inter-regional migration in Iceland.
Preprint
Full-text available
The gargantuan plethora of opinions, facts and tweets on financial business offers the opportunity to test and analyze the influence of such text sources on future directions of stocks. It also creates though the necessity to distill via statistical technology the informative elements of this prodigious and indeed colossal data source. Using mixed...
Article
Full-text available
Estimates for small areas defined by social, demographic, and geographic variables are increasingly important for official statistics. To overcome problems of small sample sizes, statisticians usually derive model-based estimates. When aggregated, however, the model-based estimates typically do not agree with aggregate estimates (benchmarks) obtai...
Preprint
Full-text available
Rerandomization is a strategy of increasing efficiency as compared to complete randomization. The idea with rerandomization is that of removing allocations with imbalance in the observed covariates and then randomizing within the set of allocations with balance in these covariates. Standard asymptotic inference based on mean difference estimator is...
Article
Demographic estimation becomes a problem of small area estimation when detailed disaggregation leads to small cell counts. The usual difficulties of small area estimation are compounded when the available data sources contain measurement errors. We present a Bayesian approach to the problem of small area estimation with imperfect data sources. The...
Article
Full-text available
User comments, as a large group of online short texts, are becoming increasingly prevalent with the development of online communications. These short texts are characterized by their co-occurrences with usually lengthier normal documents. For example, there could be multiple user comments following one news article, or multiple reader reviews follo...
Article
We investigate the estimation of subgroup treatment effects with observational data. Existing propensity score matching and weighting methods are mostly developed for estimating overall treatment effect. Although the true propensity score should balance covariates for the subgroup populations, the estimated propensity score may not balance covariat...
Article
Population forecasts for small areas within a country are an important planning tool. Standard methods for forecasting demographic rates do not, however, perform well with the noisy data that are typical of small areas. We develop a Bayesian model that combines ideas from the demographic, time series, and small area estimation literatures. We apply...
Article
It is still unclear whether business organizations benefit from promoting the public reading their Corporate Social Responsibility (CSR) reports. We study how readers’ comprehension of CSR report can have a positive effect on their trust toward the organization by using the data in China. From an organization’s CSR report, readers receive signals o...
Article
The gargantuan plethora of opinions, facts, and tweets on financial business offers the opportunity to test and analyze the influence of such text sources on future directions of stocks. It also creates though the necessity to distill via statistical technology the informative elements of this prodigious and indeed colossal data source. Using mixed...
Article
News carry information of market moves. The gargantuan plethora of opinions, facts and tweets on financial business offers the opportunity to test and analyze the influence of such text sources on future directions of stocks. It also creates though the necessity to distill via statistical technology the informative elements of this prodigious and i...
Article
Bayesian p values are a popular and important class of approaches for Bayesian model checking. They are used to quantify the degree of surprise from the observed data given the specified data model and prior distribution. A systematic investigation is conducted to compare three Bayesian p values: the posterior predictive p value, the sampled po...
Article
We propose a new nonlinear classification method based on a Bayesian "sum-of-trees" model, the Bayesian Additive Classification Tree (BACT), which extends the Bayesian Additive Regression Tree (BART) method into the classification context. Like BART, the BACT is a Bayesian nonparametric additive model specified by a prior and a likelihood in which...
Article
Government-sponsored job-training programs must be subject to evaluation to assess whether their effectiveness justifies their cost to the public. The evaluation usually focuses on employment and total earnings, although the effect on wages is also of interest, because this effect reflects the increase in human capital due to the training program,...
Article
In an evaluation of a job training program, the causal effects of the program on wages are often of more interest to economists than the program's effects on employment or on income. The reason is that the effects on wages reflect the increase in human capital due to the training program, whereas the effects on total earnings or income may be simpl...
Book
We propose a new nonlinear classification method based on a Bayesian "sum-of-trees" model, the Bayesian Additive Classification Tree (BACT), which extends the Bayesian Additive Regression Tree (BART) method into the classification context. Like BART, the BACT is a Bayesian nonparametric additive model specified by a prior and a likelihood in which...
Article
Full-text available
The traditional variable selection problem has attracted renewed attention from statistical researchers due to the recent advances in data collection, especially in fields such as bioinformatics and marketing. In this paper, we formulate regression variable selection as an optimization problem, propose and study several deterministic and stochast...
Article
Full-text available
Sequential Monte Carlo methods, especially the particle filter (PF) and its various modifications, have been used effectively in dealing with stochastic dynamic systems. The standard PF samples the current state through the underlying state dynamics, then uses the current observation to evaluate the sample's importance weight. However, there is a...
Article
Introduction; Key assumptions for the LATE interpretation of the IV estimand; Estimating causal effects with IV; Some recent applications; Discussion
Article
The topic of “truncation by death” in randomized experiments arises in many fields, such as medicine, economics and education. Traditional approaches addressing this issue ignore the fact that the outcome after the truncation is neither “censored” nor “missing,” but should be treated as being defined on an extended sample space. Using an educationa...
Article
Full-text available
The clustering problem has attracted much attention from both statisticians and computer scientists in the past fifty years. Methods such as hierarchical clustering and the K-means method are convenient and competitive first choices off the shelf for the scientist. Gaussian mixture modeling is another popular but computationally expensive clusterin...
Article
Full-text available
The sequential importance sampling method and its various modifications have been developed intensively and used effectively in diverse research areas ranging from polymer simulation to signal processing and statistical inference. We propose a new variant of the method, sequential importance sampling with pilot-exploration resampling (SISPER), and...
Article
Thesis (Ph. D., Dept. of Statistics)--Harvard University, 2002. Includes bibliographical references (leaves 109-113).

Project (1)
Project
To address the text classification problem, we propose the Class Structured Topic Model (CSTM) which extends the basic topic model, Latent Dirichlet Allocation (LDA), to incorporate class structure on the latent topics. We develop Bayesian inference of the CSTM in supervised and semi-supervised scenarios, and also address the model selection issue. In detailed analysis of two real data sets, we compare our approach with two competing approaches, including (1) a two-stage approach that first derives the documents' topic proportions using the LDA model, and then uses random forest or support vector machine to classify the documents; and (2) an L1-penalized logistic regression based on normalized term frequencies. We demonstrate that our approach can improve classification accuracy of test documents, and at the same time better capture the semantic structure of documents within each class.