
Bonnie K. Ray, IBM · Thomas J. Watson Research Center
78
Publications
24,973
Reads
3,066
Citations
Publications (78)
This paper presents a strategy to quantify the influence major point sources in a region have on extreme pollution values observed at each of the monitors in the network. We focus on the number of hours in a day the levels at a monitor exceed a specified health threshold. The number of daily exceedances is modeled using observation-driven negative...
Strategic risks represent the largest challenge for corporate risk management, often due to lack of data or incompatibility with existing financial modeling frameworks. Indeed, industry surveys found that from 2002 to 2012 strategic risks accounted for over 80% of the cases of significant shareholder value loss among Top 1000 companies. To meet the...
Distinguishing among linear and nonlinear time series or between nonlinear time series generated by different underlying processes is challenging, as second-order properties are generally insufficient for the task. Different nonlinear processes have different nonconstant bispectral signatures, whereas the bispectral density function of a Gaussian o...
The importance of strategic planning is universally recognized in the business world as an effective approach to enable achievement of enterprise business objectives over long time periods. However, despite the criticality of the task, the strategic planning process often does not take advantage of analytics to support the process in a consistent w...
Often there is substantial disparity in sales performance across various units of an organization. It is crucial to model the effects of various drivers/inhibitors on sales performance, particularly those that can be acted upon, since insight into such drivers/inhibitors is essential for determining optimal actions for improving performance. We pre...
Although business process modeling is considered as a core activity in enterprise risk management, existing process modeling languages do not include a complete notation for documenting how processes can fail. This paper develops a conceptual framework for extending standard business process metamodels to include comprehensive information that is u...
We present a new L1-distance-based k-means clustering algorithm to address the challenge of clustering high-dimensional proportional vectors. The new algorithm explicitly incorporates proportionality constraints in the computation of the cluster centroids, resulting in reduced L1 error rates. We compare the new method to two competing methods, an a...
We present algorithms for improved Viterbi decoding for the case of hidden semi-Markov models. By carefully constructing directed acyclic graphs, we pose the decoding problem as that of finding the longest path between specific pairs of nodes. We consider fully connected models as well as restrictive topologies and state duration conditions, and sh...
Web-delivered service is an emerging approach for IT service to reduce IT service cost and improve delivery efficiency by leveraging the partnership and web technology. This paper proposes a three-tier analytical framework of web-delivered service to improve business design based on the centralised information from the platform. At business model de...
Summary: A general framework is presented for Bayesian inference of multivariate time series exhibiting long-range dependence. The series are modelled using a vector autoregressive fractionally integrated moving-average (VARFIMA) process, which can capture both short-term correlation structure and long-range dependence characteristics of the individu...
We look at the implications of modeling observations from a fractionally differenced noise process using an approximating AR(p) model. The approximation is used because of computational difficulties in the estimation of the differencing parameter of the fractional noise model. Because the fractional noise process is long-range dependent, we assess...
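To make the idea of an approximating AR(p) model concrete, here is a minimal sketch, assuming the approximation is obtained by truncating the AR(∞) representation of fractional noise, (1 - B)^d x_t = e_t, at lag p. This truncation is an illustrative choice and is not necessarily how the paper builds or fits its AR(p) model; the function name is hypothetical.

```python
def fractional_noise_ar_coefficients(d, p):
    """Return the first p AR coefficients of fractional noise (1 - B)^d x_t = e_t.

    Assumption (not from the abstract): the AR(p) approximation is a plain
    truncation of the AR(infinity) expansion. Writing
    (1 - B)^d = sum_j pi_j B^j with pi_0 = 1, the weights satisfy
    pi_j = pi_{j-1} * (j - 1 - d) / j, and the AR coefficient at lag j is -pi_j.
    """
    if not -0.5 < d < 0.5:
        raise ValueError("d must lie in (-0.5, 0.5) for a stationary, invertible process")
    if p < 1:
        raise ValueError("p must be a positive integer")

    coefficients = []
    pi_j = 1.0  # pi_0
    for j in range(1, p + 1):
        pi_j *= (j - 1 - d) / j      # binomial-expansion recursion
        coefficients.append(-pi_j)   # x_t = sum_j (-pi_j) x_{t-j} + e_t
    return coefficients


if __name__ == "__main__":
    # The coefficients decay slowly (hyperbolically) when d > 0, which is
    # why a large p may be needed to mimic long-range dependence.
    for lag, phi in enumerate(fractional_noise_ar_coefficients(0.3, 5), start=1):
        print(lag, round(phi, 4))
```

The slow decay of these coefficients for d > 0 gives some intuition for why the choice of p matters when a short-memory AR model stands in for a long-range dependent process.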
We describe the use of a latent Markov process governing the parameters of a nonhomogeneous Poisson process (NHPP) model for characterizing the software development defect discovery process. Use of a Markov switching process allows us to characterize non-smooth variations in the rate at which defects are found, better reflecting the industrial soft...
We present two new spectral-based methods for detection of changes in autocorrelation structure in a continuous-valued time series in an online process monitoring setting. Our methods are based on the idea that changes in the autocorrelation structure are reflected by changes in the Fourier or wavelet-based spectrum and can be detected by comparing...
This paper presents a framework for the modeling and analysis of business model designs involving a network of interconnected business entities. The framework includes an ecosystem-modeling component, a simulation component, and a service-analysis component, and integrates methods from value network modeling, game theory analysis, and multiagent sy...
Reuse-based development effort is an important factor to be considered when selecting appropriate reusable components. However, it's rarely considered very seriously in current practice, as current methods for estimation of reuse development effort rely heavily on personal experience and different developers may provide very diverse estimates. In t...
We develop techniques for mining labor records from a large number of historical IT consulting projects in order to discover clusters of projects exhibiting similar resource usage over the project life-cycle. The clustering results, together with domain expertise, are used to build a meaningful project taxonomy that can be linked to project resourc...
We present a methodology for managing outsourcing projects from the vendor's perspective, designed to maximize the value to both the vendor and its clients. The methodology is applicable across the outsourcing lifecycle, providing the capability to select and target new clients, manage the existing client portfolio and quantify the realized benefit...
In order to successfully deliver a labor-based professional service, the right people with the right skills must be available to deliver the service when it is needed. Meeting this objective requires a systematic, repeatable approach for determining the staffing requirements that enable informed staffing management decisions. We present a methodolo...
Summary form only given. Most organizations do not have sufficient resources to meet all of their obligations; selecting which projects should be funded is not just ranking projects and funding them 'top-down' until resources are depleted. Organizations need to balance the benefits that project portfolios provide with their respective constraints and they...
We propose a simulation-based Bayesian approach to analyze multivariate time series with possible common long-range dependent factors. A state-space approach is used to represent the likelihood function in a tractable manner. The approach taken here allows for extension to fit a non-Gaussian multivariate stochastic volatility (MVSV) model with comm...
We extend the functional coefficient autoregressive (FCAR) model to the multivariate nonlinear time series framework. We show how to estimate parameters of the model using kernel regression techniques, discuss properties of the estimators, and provide a bootstrap test for determining the presence of nonlinearity in a vector time series. The power o...
This article presents a new model for software reliability characterization using a growth curve formulation that allows model parameters to vary as a function of covariate information. In the software reliability framework, covariates may include such things as the number of lines of code for a product throughout its development cycle and the numb...
We introduce an approach for model-based sequence clustering that addresses several drawbacks of existing algorithms. The approach uses a combination of Hidden Markov Models (HMMs) for sequence estimation and Dynamic Time Warping (DTW) for hierarchical clustering, with interlocking steps of model selection, estimation and sequence grouping. We demo...
Professional services firms are project based. Execution of these projects involves identifying and planning for the right skills. In this paper, we model the problem of predicting the skills requirement for the projects in the pipeline of a professional services firm from the bill of resources of some similar projects that had been executed in the...
We develop NHPP models to characterize categorized event data, with application to modelling the discovery process for categorized software defects. Conditioning on the total number of defects, multivariate models are proposed for modelling the defects by type. A latent vector autoregressive structure is used to characterize dependencies among the...
This paper presents and evaluates alternative methods for multi-step forecasting using univariate and multivariate functional coefficient autoregressive (FCAR) models. The methods include a simple “plug-in” approach, a bootstrap-based approach, and a multi-stage smoothing approach, where the functional coefficients are updated at each step to incor...
Projecting defect occurrences over time is a necessary component in the development of methods to mitigate the risks of software defects for software producers and software consumers. In this paper, we examine user-reported software defect occurrences across twenty-two releases of four widely-deployed business-critical production software syst...
Traditional software metrics, such as code coverage, McCabe complexity, etc., address the needs of a software engineer. In contrast, managers of software development organizations face a broader set of issues. For example, an executive responsible for multiple products and releases has to understand the customer views of those products and put in pl...
A key problem in time series analysis is the determination or estimation of the degree of integration, d, of the series. In the case of autoregressive moving-average (ARMA) models, for example, a series with a unit root in the AR polynomial (and all MA roots outside the unit circle) has d = 1 and is nonstationary, a series with a unit root in the...
A modified multivariate adaptive regression splines method for modeling vector nonlinear time series is investigated. The method results in models that can capture certain types of vector self-exciting threshold autoregressive behavior, as well as provide good predictions for more general vector nonlinear time series. The effect of different model...
We propose a new semiparametric estimator of the degree of persistence in volatility for long memory stochastic volatility (LMSV) models. The estimator uses the periodogram of the log squared returns in a local Whittle criterion which explicitly accounts for the noise term in the LMSV model. Finite-sample and asymptotic standard errors for the esti...
We describe a Bayesian method for detecting structural changes in a long-range dependent process. In particular, we focus on changes in the long-range dependence parameter, d, and changes in the process level, μ. Markov chain Monte Carlo (MCMC) methods are used to estimate the posterior probability and size of a change at time t, along with other m...
The development of long-memory stochastic volatility (LMSV) models has increased the interest in the estimation of persistent processes observed with added noise. This paper investigates the performance of semi-parametric methods for estimating the long-memory-parameter in the long-range dependence plus noise case and demonstrates improvements obta...
We present new methods for modelling nonlinear threshold-type autoregressive behaviour in periodically correlated time series. The methods are illustrated using a series of average monthly flows of the Fraser River in British Columbia. Commonly used nonlinearity tests of the river flow data in each month indicate nonlinear behaviour in certain mont...
Standard quality control chart interpretation assumes that the observed data are uncorrelated. The presence of autocorrelation in process data has adverse effects on the performance of control charts. The objective of this paper is to assess the behavior of moving average forecast-based control charts on data having correlation that is persistent o...
We provide explicit formulae for the joint predictive distribution of a Gaussian vector autoregressive fractionally integrated moving average (VARFIMA) process and describe a Bayesian method for its feasible evaluation. Inference for the parameters in the Bayesian framework is based on the joint posterior distribution of the model parameters using...
Presented are investigations into the spatial structure of teleconnections between both the winter El Niño-Southern Oscillation (ENSO) and global sea surface temperatures (SSTs), and a measure of continental U.S. summer drought during the twentieth century. Potential nonlinearities and nonstationarities in the relationships are noted. During the fi...
Various authors claim to have found evidence of stochastic long memory behavior in futures' contract returns using the Hurst statistic. This paper reexamines futures' returns for evidence of persistent behavior using a bias-corrected version of the Hurst statistic, a nonparametric spectral test, and a spectral regression estimate of the long-m...
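For readers unfamiliar with the Hurst statistic, the following is a minimal sketch of the classical rescaled-range (R/S) statistic on which it is based. It is the textbook, uncorrected form, not the bias-corrected version used in the paper, whose details are not given in the abstract; the function name is hypothetical.

```python
import math


def rescaled_range(series):
    """Classical rescaled-range (R/S) statistic of a sequence of numbers.

    R is the range of the cumulative sums of deviations from the sample
    mean (with the starting value 0 included), and S is the sample standard
    deviation using the 1/n convention. This is the uncorrected statistic,
    shown only for illustration.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")

    mean = sum(series) / n
    running = 0.0
    highest = 0.0
    lowest = 0.0
    for value in series:
        running += value - mean          # partial sum of deviations
        highest = max(highest, running)
        lowest = min(lowest, running)

    variance = sum((value - mean) ** 2 for value in series) / n
    if variance == 0.0:
        raise ValueError("series is constant, so R/S is undefined")
    return (highest - lowest) / math.sqrt(variance)


if __name__ == "__main__":
    print(rescaled_range([1.0, 2.0, 3.0, 4.0]))
```

For a short-memory series, R/S is expected to grow roughly like the square root of the sample size, while faster growth is read as a sign of persistence. The classical statistic is known to be sensitive to short-range dependence, which is one motivation for corrected versions and for the spectral methods the abstract mentions.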
We describe statistical methods for sensitivity and performance analysis of complex computer simulation experiments. Graphical methods, such as trellis plots, are suggested for exploratory analysis of individual or aggregate performance metrics conditional on different experiment inputs. More formal statistical methods, such as analysis of variance...
Software recreates are necessitated due to inadequate diagnostic capability following a failure. They impact the service process and the perception of availability, but have never been adequately quantified. This paper develops a technique to make the key measurements of: percent recreate, arrival rate and open time, from problem service data without r...
Exploratory methods for determining appropriate lagged variables in a vector nonlinear time series model are investigated. The first is a multivariate extension of the R statistic considered by Granger and Lin (1994), which is based on an estimate of the mutual information criterion. The second method uses Kendall's τ and partial τ statistics for lag d...
Various authors claim to have found evidence of stochastic long memory behavior in futures' contract returns using the Hurst statistic. This paper reexamines futures' returns for evidence of persistent behavior using a bias-corrected version of the Hurst statistic and an estimate of the long-memory parameter based on the process spectrum. Resul...
A multivariate extension of the univariate nonlinearity test of Tsay (1986) is presented. Simulation results show that the multivariate test is more powerful than its univariate counterpart, especially for series having nonlinear structure involving several components of the vector process and weakly or moderately cross-correlated process error ter...
We present a sampling-based Bayesian approach for modeling and forecasting a bivariate process having a long-range dependent component that is common to each series and additive short-range dependent noise terms that are unique to each series. A common long-range dependent component might be observed, for instance, in two climatological series coll...
We propose a method to identify common persistent components in a k-dimensional time series. Assuming that the individual series of the vector process have long-range dependence, we apply canonical correlation analysis to the series and its lagged values. A zero canonical correlation implies the existence of a short-memory linear combination, hence...
In this article we use the Time Series Multivariate Adaptive Regression Splines (TSMARS) methodology to estimate and forecast non-linear structure in weekly exchange rates for four major currencies during the 1980s. The methodology is applied in three steps. First, univariate models are fitted to the data and the residuals are checked for outliers....
We investigate the effect of long-range dependence on bandwidth selection for kernel regression with the plug-in method of Herrmann, Gasser & Kneip (1992). A new bandwidth estimator is proposed to allow for long-range dependence. Properties of the proposed estimator are investigated theoretically and via simulation. We find that the proposed estima...
We analyze a time series of 20 years of daily sea surface temperatures measured off the California coast. The temperatures exhibit quite complicated features, such as effects on many different time scales, nonlinear effects, and long-range dependence. We show how a time series version of the multivariate adaptive regression splines (MARS) algorithm...
We present a general framework for Bayesian inference of multivariate time series exhibiting both long and short memory behavior. The series are modeled using a multivariate autoregressive fractionally integrated moving average (MVARFIMA) process, which can capture both the short and long memory characteristics of the individual series, as well as...
We present a methodology for estimation, prediction and model assessment of multivariate autoregressive moving-average (VARMA) models in the Bayesian framework using Markov chain Monte Carlo algorithms. The sampling-based Bayesian framework for inference allows for the incorporation of parameter restrictions, such as stationarity restrictions or ze...
The purpose of this study is to analyze time series of daily and monthly values for the Tokyo Stock Price Index (TOPIX) and stock price values for 15 companies listed on the Tokyo Stock Exchange, Section 1 (TSE-I), to determine the contribution of permanent and temporary components to Japanese stock prices. The existence of temporary components in...
Fractionally integrated autoregressive moving-average (ARFIMA) models have proved useful tools in the analysis of time series with long-range dependence. However, little is known about various practical issues regarding model selection and estimation methods, and the impact of selection and estimation methods on forecasts. By means of a large-scale...
We discuss the problem of bandwidth selection for a kernel regression trend estimator when the errors are long-range dependent. The iterative plug-in bandwidth selection method is investigated and modified to account for long memory in the errors. We compare the mean average-squared errors of the trend estimates using the bandwidth obtained from th...
We consider the asymptotic characteristics of the periodogram ordinates of a fractionally integrated process having memory parameter d ≥ 0.5, for which the process is nonstationary, or d ≤ -0.5, for which the process is noninvertible. Series having d outside the range (-0.5, 0.5) may arise in practice when a raw series is modeled without preliminary cons...
Nonstationary ARIMA processes and nearly nonstationary ARMA processes, such as autoregressive processes having a root of the AR polynomial close to the unit circle, have sample autocovariance and spectral properties that are, in practice, almost indistinguishable from those of a stationary long-memory process, such as a Fractionally Integrated A...
This paper presents an automatic technique for making simple inferences about the stages in a software production process, discusses implementation of the technique, and validates the technique using defect data from several software development projects. The technique represents an approach to automate process feedback that may be based on either...
We use a series of monthly IBM product revenues to illustrate the usefulness of seasonal fractionally differenced ARMA models for business forecasting. By allowing two seasonal fractional differencing parameters in the model, one at lag three and the other at lag twelve, we obtain a stationary series without losing information about the process beh...
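As a hedged illustration, a model with two seasonal fractional differencing parameters, one at lag three and one at lag twelve, could be written in the following general form. The symbols d_3, d_12, φ(B), θ(B), and μ are notation assumed here, not taken from the abstract, and the paper's exact specification may differ.

```latex
\phi(B)\,(1 - B^{3})^{d_{3}}\,(1 - B^{12})^{d_{12}}\,(X_t - \mu) = \theta(B)\,\varepsilon_t ,
\qquad
(1 - B^{s})^{d} = \sum_{j=0}^{\infty} \binom{d}{j} (-1)^{j} B^{sj}
```

Here B is the backshift operator, ε_t is white noise, and each fractional difference expands into an infinite-order filter acting only at multiples of its seasonal lag s.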
Orthogonal defect classification (ODC), a concept that enables in-process feedback to software developers by extracting signatures on the development process from defects, is described. The ideas are evolved from an earlier finding that demonstrates the use of semantic information from defects to extract cause-effect relationships in the developmen...
Bivariate time series which display nonstationary behavior, such as cycles or long-term trends, are common in fields such as oceanography and meteorology. These are usually very large-scale data sets and often may contain long gaps of missing values in one or both series, with the gaps perhaps occurring at different time periods in the two series....
The authors present a reliability growth model for defects that have been categorized into defect types associated with specific stages in the software development process. Modeling the reliability growth of defects for each type separately allows identification of problems in the development process which may otherwise be masked when defects of al...
We present a sampling-based Bayesian approach for modeling and forecasting a common factor time series model in which the common components are long-range dependent. The Gibbs sampling framework allows us to use a less computationally demanding ARMA model to approximate the common long-range dependent behavior in the sampling algorithm; we then adj...
We investigate bootstrap inference methods for nonlinear time series models obtained using Multivariate Adaptive Regression Splines for Time Series (TSMARS). Bootstrapping is carried out using two methods: resampling residuals from an initial fitted model and resampling regression pairs. Bootstrap AGGregatING (bagging) (Breiman, 1996) is implemente...
this paper, we propose a multivariate 2
Recent empirical studies show that the squares of high-frequency stock returns are long-range dependent and can be modeled as fractionally integrated processes, using, for example, long-memory stochastic volatility models. Are such long-range dependencies common among stocks? Are they caused by the same sources of variation? In this paper, we class...
Building Bayesian belief networks in the absence of data involves the challenging task of eliciting conditional probabilities from experts. In this paper, we develop analytical methods for determining the order in which parameters are to be elicited, based on a proximity criterion for the distribution of either the entire set of variables, or a s...
Thesis (Ph. D.)--Columbia University, 1991. Includes bibliographical references (leaves 153-157). Microfilm. "91-27,958."