Jian Zou

Worcester Polytechnic Institute | WPI · Department of Mathematical Sciences

Professor

45
Publications
2,897
Reads
437
Citations
Citations since 2017
23 Research Items
318 Citations

Publications (45)
Article
Clustering a large number of time series into relatively homogeneous groups is a well‐studied unsupervised learning technique that has been widely used for grouping financial instruments (say, stocks) based on their stochastic properties across the entire time period under consideration. However, clustering algorithms ignore the notion of bicluster...
Article
Clustering large financial time series data enables pattern extraction that facilitates risk management. The knowledge gathered from unsupervised learning is useful for improving portfolio optimization and making stock trading recommendations. Most methods available in the literature for clustering financial time series are based on exploiting line...
Article
Pancreatic ductal adenocarcinoma (PDAC) represents one of the most common cancers with dismal prognosis. Definitive diagnosis of PDAC remains challenging due to the lack of specific biomarkers. A transcription factor essential for pancreatic development named HNF-1B can be a potential biomarker for PDAC. However, HNF-1B was not entirely specific fo...
Preprint
Full-text available
Tissue microarray (TMA) images have emerged as an important high-throughput tool for cancer study and the validation of biomarkers. Efforts have been dedicated to further improve the accuracy of TACOMA, a cutting-edge automatic scoring algorithm for TMA images. One major advance is due to deepTacoma, an algorithm that incorporates suitable deep rep...
Article
Squamous cell carcinoma (SqCC) is the most common malignancy of the anal canal, where it is strongly associated with HPV infection. Characteristic genomic alterations have been identified in anal SqCC, but their clinical significance and correlation with HPV status, pathologic features, and immunohistochemical markers are not well established. We e...
Article
Full-text available
In finance, it is often of interest to study market volatility for portfolios that may consist of a large number of assets using multivariate stochastic volatility models. However, such models, though useful, do not usually incorporate investor views that might be available. In this paper we introduce a novel hierarchical Bayesian methodology of mo...
Article
An optimal and flexible multiple hypotheses testing procedure is constructed for dependent data based on Bayesian techniques, aiming at handling two challenges, namely dependence structure and non-null distribution specification. Ignoring dependence among hypotheses tests may lead to loss of efficiency and bias in decision. Misspecification in the...
Article
Full-text available
Tissue microarray (TMA) images have been used increasingly often in cancer studies and the validation of biomarkers. TACOMA, a cutting-edge automatic scoring algorithm for TMA images, is comparable to pathologists in terms of accuracy and repeatability. Here we consider how this algorithm may be further improved. Inspired by the recent success of...
Preprint
Full-text available
Tissue microarray (TMA) images have been used increasingly often in cancer studies and the validation of biomarkers. TACOMA, a cutting-edge automatic scoring algorithm for TMA images, is comparable to pathologists in terms of accuracy and repeatability. Here we consider how this algorithm may be further improved. Inspired by the recent success of...
Preprint
Full-text available
Anomalies and outliers are common in real-world data, and they can arise from many sources, such as sensor faults. Accordingly, anomaly detection is important both for analyzing the anomalies themselves and for cleaning the data for further analysis of its ambient structure. Nonetheless, a precise definition of anomalies is important for automated...
Preprint
Anomalies and outliers are common in real-world data, and they can arise from many sources, such as sensor faults. Accordingly, anomaly detection is important both for analyzing the anomalies themselves and for cleaning the data for further analysis of its ambient structure. Nonetheless, a precise definition of anomalies is important for automated...
Article
High‐frequency financial data are more readily available and provide a deeper understanding of market infrastructure, market dynamics, and structural instability. The class of autoregressive conditional duration models is useful for statistical analysis of intra‐event durations in asset prices. Often, time series of durations exhibit structural bre...
Article
Accurate modeling under least restrictive assumptions of patterns in inter-event durations is of considerable interest in the analysis of high-frequency financial data which show liquidity induced patterns for different stocks and often exhibit diurnal patterns in addition to temporal dependence. For analyzing durations between user-defined events...
Article
Due to the low signal‐to‐noise ratio and high‐dimensional structure, spatiotemporal data analysis is challenging. In outbreak detection, the assumptions for control charts, including independence, normality, and stationarity, are often violated in syndromic surveillance data. We develop a novel hybrid hierarchical Bayesian model through the combina...
Article
Full-text available
Background Given the widespread adoption of electronic health record (EHR) systems in health care organizations, public health agencies are interested in accessing EHR data to improve health assessment and surveillance. Yet there exist few examples in the U.S. of governmental health agencies using EHR data routinely to examine disease prevalence an...
Article
With recent technological advances, high-frequency transaction-by-transaction data are widely available to investors and researchers. To explore the microstructure of variability of stock prices on transaction-level intra-day data and to dynamically study patterns of comovement over multiple trading days, we propose a multiple day time series biclu...
Article
Asset allocation strategy involves dividing an investment portfolio among different assets according to their risk levels. In recent decades, estimating volatilities of asset returns based on high-frequency data has emerged as a topic of interest in financial econometrics. However, most available methods are not directly applicable when the number...
Article
OBJECTIVE Several similarities exist between the Massachusetts health care reform law of 2006 and the Affordable Care Act (ACA). The authors’ prior neurosurgical research showed a decrease in uninsured surgeries without a significant change in surgical volume after the Massachusetts reform. An analysis of the payer-mix status and the age of spine s...
Article
Background: Hospital readmission rate has become a major indicator of quality of care, with penalties given to hospitals with high rates of readmission. At the same time, insurers are increasing pressure for greater efficiency and reduced costs, including decreasing hospital lengths of stay (LOS). Objective: To analyze the authors' service to de...
Conference Paper
Traditionally, investors try to estimate short term portfolio volatility based on daily return. When tick-by-tick data are available, investors use different volatility estimators based on high-frequency data to evaluate the portfolio risk in the hope of outperforming those based on low-frequency data. In this paper, we optimize block realized kern...
Article
Full-text available
Background: Community health assessments assist health departments in identifying health needs as well as disparities, and they enable linking of needs with available interventions. Electronic health record (EHR) systems possess growing volumes of clinical and administrative data, making them a valuable source of data for ongoing community health a...
Chapter
Financial statistics covers a wide array of applications in the financial world, such as (high-frequency) trading, risk management, pricing and valuation of securities and derivatives, and various business and economic analytics. Portfolio allocation is one of the most important problems in financial risk management. One most challenging part in po...
Article
Quantum computation performs calculations by using quantum devices instead of electronic devices following classical physics and used by classical computers. Although general purpose quantum computers of practical scale may be many years away, special purpose quantum computers are being built with capabilities exceeding classical computers. One pro...
Article
Hospital readmission rate has become a major indicator of quality of care, with penalties given to hospitals that have high rates of readmission. At the same time, insurers are applying increasing pressure to improve efficiency and reduce costs, including decreasing hospital lengths of stay. We analyze these trends to determine if reducing lengths...
Article
In financial practices and research studies, we often encounter a large number of assets. The availability of high-frequency financial data makes it possible to estimate the large volatility matrix of these assets. Existing volatility matrix estimators such as kernel realized volatility and pre-averaging realized volatility perform poorly when the...
Article
This paper aims to introduce jump tests to the actuarial community. In actuarial science, semimartingales are extensively used in the models for interest rates, options, variable annuities and equity-linked annuities. Those models usually assume without justification that the underlying asset process follows a continuous stochastic process such as...
Article
The Massachusetts Healthcare Policy of 2006 has many similarities to the Affordable Care Act (ACA). There are concerns the ACA will negatively impact case volume and reimbursement for physicians. Analyzing neurosurgical cases and patient insurance status before and after the Massachusetts policy change can provide insight into the future of neurosu...
Article
Financial statistics covers a wide array of applications in the financial world, such as (high frequency) trading, risk management, pricing and valuation of securities and derivatives, and various business and economic analytics. Portfolio allocation is one of the most important problems in financial risk management. One most challenging part in po...
Article
Full-text available
Background For researchers and public health agencies, the complexity of high-dimensional spatio-temporal data in surveillance for large reporting networks presents numerous challenges, which include low signal-to-noise ratios, spatial and temporal dependencies, and the need to characterize uncertainties. Central to the problem in the context of di...
Article
In this paper we provide a detailed review of univariate and multivariate volatility analysis within the framework of high-frequency financial data setting. The field of volatility modeling and analysis for high-frequency financial data has experienced a rapid development over the past decade. High-frequency financial data pose tremendous challenge...
Article
Portfolio allocation is one of the most important problems in financial risk management. It involves dividing an investment portfolio among different assets based on the volatilities of the asset returns. In the recent decades, it gains popularity to estimate volatilities of asset returns based on high-frequency data in financial economics. In this...
Article
Portfolio allocation is one of the most fundamental problems in finance. The process of determining the optimal mix of assets to hold in the portfolio is a very important issue in risk management. It involves dividing an investment portfolio among different assets based on the volatilities of the asset returns. In the recent decades, it gains popul...
Article
Early and accurate detection of outbreaks is one of the most important objectives of syndromic surveillance systems. We propose a general Bayesian framework for syndromic surveillance systems. The methodology incorporates Gaussian Markov random field (GMRF) and spatio-temporal conditional autoregressive (CAR) modeling. By contrast, most previous ap...
Article
Reliable surveillance models are an important tool in public health because they aid in mitigating disease outbreaks, identify where and when disease outbreaks occur, and predict future occurrences. Although many statistical models have been devised for surveillance purposes, none are able to simultaneously achieve the important practical goals of...
Article
Full-text available
In order to study climate at scales where policy decisions can be made, regional climate models (RCMs) have been developed with much finer resolution (~50 km) than the ~500 km resolution of atmosphere-ocean general circulation models (AOGCMs). The North American Regional Climate Change Assessment Program (NARCCAP) is an international program that p...
Article
Full-text available
It is increasingly important in financial economics to estimate volatilities of asset returns. However, most of the available methods are not directly applicable when the number of assets involved is large, due to the lack of accuracy in estimating high-dimensional matrices. Therefore it is pertinent to reduce the effective size of volatility matri...
Article
Full-text available
High-frequency data observed on the prices of financial assets are commonly modeled by diffusion processes with micro-structure noise, and realized volatility-based methods are often used to estimate integrated volatility. For problems involving a large number of assets, the estimation objects we face are volatility matrices of large size. The exis...
Article
In this dissertation we focus on two points in the study of financial statistics, volatility estimation and option pricing. In the first chapter we briefly introduce the relevant terms and concepts as well as a general literature review for the diffusion and GARCH model and realized volatility. The second chapter investigates the convergence speed...
Article
Full-text available
It is well known that as the time interval between two consecutive observations shrinks to zero, a properly constructed GARCH model will weakly converge to a bivariate diffusion. Naturally the European option price under the GARCH model will also converge to its bivariate diffusion counterpart. This paper investigates the convergence speed of the G...

Projects (4)
Archived project
Methodology for the analysis of tissue microarray images.
Archived project
Mining, inference, and learning with random projection forests