Bonnie K. Ray

IBM · Thomas J. Watson Research Center

About

78
Publications
24,973
Reads
3,066
Citations


Publications (78)
Article
Full-text available
This paper presents a strategy to quantify the influence major point sources in a region have on extreme pollution values observed at each of the monitors in the network. We focus on the number of hours in a day the levels at a monitor exceed a specified health threshold. The number of daily exceedances is modeled using observation-driven negative...
Article
Strategic risks represent the largest challenge for corporate risk management, often due to lack of data or incompatibility with existing financial modeling frameworks. Indeed, industry surveys found that from 2002 to 2012 strategic risks accounted for over 80% of the cases of significant shareholder value loss among Top 1000 companies. To meet the...
Article
Full-text available
Distinguishing among linear and nonlinear time series or between nonlinear time series generated by different underlying processes is challenging, as second-order properties are generally insufficient for the task. Different nonlinear processes have different nonconstant bispectral signatures, whereas the bispectral density function of a Gaussian o...
Article
The importance of strategic planning is universally recognized in the business world as an effective approach to enable achievement of enterprise business objectives over long time periods. However, despite the criticality of the task, the strategic planning process often does not take advantage of analytics to support the process in a consistent w...
Article
Full-text available
Often there is substantial disparity in sales performance across various units of an organization. It is crucial to model the effects of various drivers/inhibitors on sales performance, particularly those that can be acted upon, since insight into such drivers/inhibitors is essential for determining optimal actions for improving performance. We pre...
Article
Full-text available
Although business process modeling is considered a core activity in enterprise risk management, existing process modeling languages do not include a complete notation for documenting how processes can fail. This paper develops a conceptual framework for extending standard business process metamodels to include comprehensive information that is u...
Conference Paper
Full-text available
We present a new L1-distance-based k-means clustering algorithm to address the challenge of clustering high-dimensional proportional vectors. The new algorithm explicitly incorporates proportionality constraints in the computation of the cluster centroids, resulting in reduced L1 error rates. We compare the new method to two competing methods, an a...
Conference Paper
Full-text available
We present algorithms for improved Viterbi decoding for the case of hidden semi-Markov models. By carefully constructing directed acyclic graphs, we pose the decoding problem as that of finding the longest path between specific pairs of nodes. We consider fully connected models as well as restrictive topologies and state duration conditions, and sh...
Article
Web-delivered service is an emerging approach to IT service that reduces service cost and improves delivery efficiency by leveraging partnerships and web technology. This paper proposes a three-tier analytical framework of web-delivered service to improve business design based on the centralised information from the platform. At business model de...
Article
Summary: A general framework is presented for Bayesian inference of multivariate time series exhibiting long-range dependence. The series are modelled using a vector autoregressive fractionally integrated moving-average (VARFIMA) process, which can capture both short-term correlation structure and long-range dependence characteristics of the individu...
Article
We look at the implications of modeling observations from a fractionally differenced noise process using an approximating AR(p) model. The approximation is used because of computational difficulties in the estimation of the differencing parameter of the fractional noise model. Because the fractional noise process is long-range dependent, we assess...
Article
We describe the use of a latent Markov process governing the parameters of a nonhomogeneous Poisson process (NHPP) model for characterizing the software development defect discovery process. Use of a Markov switching process allows us to characterize non-smooth variations in the rate at which defects are found, better reflecting the industrial soft...
Article
We present two new spectral-based methods for detection of changes in autocorrelation structure in a continuous-valued time series in an online process monitoring setting. Our methods are based on the idea that changes in the autocorrelation structure are reflected by changes in the Fourier or wavelet-based spectrum and can be detected by comparing...
Article
Full-text available
This paper presents a framework for the modeling and analysis of business model designs involving a network of interconnected business entities. The framework includes an ecosystem-modeling component, a simulation component, and a service-analysis component, and integrates methods from value network modeling, game theory analysis, and multiagent sy...
Conference Paper
Reuse-based development effort is an important factor to be considered when selecting appropriate reusable components. However, it's rarely considered very seriously in current practice, as current methods for estimation of reuse development effort rely heavily on personal experience and different developers may provide very diverse estimates. In t...
Article
We develop techniques for mining labor records from a large number of historical IT consulting projects in order to discover clusters of projects exhibiting similar resource usage over the project life-cycle. The clustering results, together with domain expertise, are used to build a meaningful project taxonomy that can be linked to project resourc...
Article
We present a methodology for managing outsourcing projects from the vendor's perspective, designed to maximize the value to both the vendor and its clients. The methodology is applicable across the outsourcing lifecycle, providing the capability to select and target new clients, manage the existing client portfolio and quantify the realized benefit...
Article
In order to successfully deliver a labor-based professional service, the right people with the right skills must be available to deliver the service when it is needed. Meeting this objective requires a systematic, repeatable approach for determining the staffing requirements that enable informed staffing management decisions. We present a methodolo...
Conference Paper
form only given. Most organizations do not have sufficient resources to meet all of their obligations; selecting which projects should be funded is not just ranking projects and funding them 'top-down' until resources are depleted. Organizations need to balance the benefits that project portfolios provide with their respective constraints and they...
Article
Full-text available
We propose a simulation-based Bayesian approach to analyze multivariate time series with possible common long-range dependent factors. A state-space approach is used to represent the likelihood function in a tractable manner. The approach taken here allows for extension to fit a non-Gaussian multivariate stochastic volatility (MVSV) model with comm...
Article
Full-text available
We extend the functional coefficient autoregressive (FCAR) model to the multivariate nonlinear time series framework. We show how to estimate parameters of the model using kernel regression techniques, discuss properties of the estimators, and provide a bootstrap test for determining the presence of nonlinearity in a vector time series. The power o...
Article
This article presents a new model for software reliability characterization using a growth curve formulation that allows model parameters to vary as a function of covariate information. In the software reliability framework, covariates may include such things as the number of lines of code for a product throughout its development cycle and the numb...
Conference Paper
We introduce an approach for model-based sequence clustering that addresses several drawbacks of existing algorithms. The approach uses a combination of Hidden Markov Models (HMMs) for sequence estimation and Dynamic Time Warping (DTW) for hierarchical clustering, with interlocking steps of model selection, estimation and sequence grouping. We demo...
Article
Professional services firms are project based. Execution of these projects involves identifying and planning for the right skills. In this paper, we model the problem of predicting the skills requirement for the projects in the pipeline of a professional services firm from the bill of resources of some similar projects that had been executed in the...
Article
We develop NHPP models to characterize categorized event data, with application to modelling the discovery process for categorized software defects. Conditioning on the total number of defects, multivariate models are proposed for modelling the defects by type. A latent vector autoregressive structure is used to characterize dependencies among the...
Article
This paper presents and evaluates alternative methods for multi-step forecasting using univariate and multivariate functional coefficient autoregressive (FCAR) models. The methods include a simple “plug-in” approach, a bootstrap-based approach, and a multi-stage smoothing approach, where the functional coefficients are updated at each step to incor...
Conference Paper
Projecting defect occurrences over time is a necessary component in the development of methods to mitigate the risks of software defects for software producers and software consumers. In this paper, we examine user-reported software defect occurrences across twenty-two releases of four widely-deployed business-critical production software syst...
Conference Paper
Traditional software metrics, such as code coverage, McCabe complexity, etc. address the needs of a software engineer. In contrast, managers of software development organizations face a broader set of issues. For example, an executive responsible for multiple products and releases has to understand the customer views of those products and put in pl...
Chapter
A key problem in time series analysis is the determination or estimation of the degree of integration, d, of the series. In the case of autoregressive moving-average (ARMA) models, for example, a series with a unit root in the AR polynomial (and all MA roots outside the unit circle) has d = 1 and is nonstationary, a series with a unit root in the...
Article
A modified multivariate adaptive regression splines method for modeling vector nonlinear time series is investigated. The method results in models that can capture certain types of vector self-exciting threshold autoregressive behavior, as well as provide good predictions for more general vector nonlinear time series. The effect of different model...
Article
We propose a new semiparametric estimator of the degree of persistence in volatility for long memory stochastic volatility (LMSV) models. The estimator uses the periodogram of the log squared returns in a local Whittle criterion which explicitly accounts for the noise term in the LMSV model. Finite-sample and asymptotic standard errors for the esti...
Article
We describe a Bayesian method for detecting structural changes in a long-range dependent process. In particular, we focus on changes in the long-range dependence parameter, d, and changes in the process level, μ. Markov chain Monte Carlo (MCMC) methods are used to estimate the posterior probability and size of a change at time t, along with other m...
Article
The development of long-memory stochastic volatility (LMSV) models has increased the interest in the estimation of persistent processes observed with added noise. This paper investigates the performance of semi-parametric methods for estimating the long-memory-parameter in the long-range dependence plus noise case and demonstrates improvements obta...
Article
We present new methods for modelling nonlinear threshold-type autoregressive behaviour in periodically correlated time series. The methods are illustrated using a series of average monthly flows of the Fraser River in British Columbia. Commonly used nonlinearity tests of the river flow data in each month indicate nonlinear behaviour in certain mont...
Article
Standard quality control chart interpretation assumes that the observed data are uncorrelated. The presence of autocorrelation in process data has adverse effects on the performance of control charts. The objective of this paper is to assess the behavior of moving average forecast-based control charts on data having correlation that is persistent o...
Article
We provide explicit formulae for the joint predictive distribution of a Gaussian vector autoregressive fractionally integrated moving average (VARFIMA) process and describe a Bayesian method for its feasible evaluation. Inference for the parameters in the Bayesian framework is based on the joint posterior distribution of the model parameters using...
Article
Full-text available
Presented are investigations into the spatial structure of teleconnections between both the winter El Nino-Southern Oscillation (ENSO) and global sea surface temperatures (SSTs), and a measure of continental U.S. summer drought during the twentieth century. Potential nonlinearities and nonstationarities in the relationships are noted. During the fi...
Article
Various authors claim to have found evidence of stochastic long memory behavior in futures' contract returns using the Hurst statistic. This paper reexamines futures' returns for evidence of persistent behavior using a bias-corrected version of the Hurst statistic, a nonparametric spectral test, and a spectral regression estimate of the long-m...
Article
Full-text available
We describe statistical methods for sensitivity and performance analysis of complex computer simulation experiments. Graphical methods, such as trellis plots, are suggested for exploratory analysis of individual or aggregate performance metrics conditional on different experiment inputs. More formal statistical methods, such as analysis of variance...
Article
Full-text available
Software recreates are necessitated due to inadequate diagnostic capability following a failure. They impact the service process and the perception of availability, but have never been adequately quantified. This paper develops a technique to make the key measurements of: percent recreate, arrival rate and open time, from problem service data without r...
Article
Full-text available
Exploratory methods for determining appropriate lagged variables in a vector nonlinear time series model are investigated. The first is a multivariate extension of the R statistic considered by Granger and Lin (1994), which is based on an estimate of the mutual information criterion. The second method uses Kendall's τ and partial τ statistics for lag d...
Article
Full-text available
Various authors claim to have found evidence of stochastic long memory behavior in futures' contract returns using the Hurst statistic. This paper reexamines futures' returns for evidence of persistent behavior using a bias-corrected version of the Hurst statistic and an estimate of the long-memory parameter based on the process spectrum. Resul...
Article
Full-text available
A multivariate extension of the univariate nonlinearity test of Tsay (1986) is presented. Simulation results show that the multivariate test is more powerful than its univariate counterpart, especially for series having nonlinear structure involving several components of the vector process and weakly or moderately cross-correlated process error ter...
Article
Full-text available
We present a sampling-based Bayesian approach for modeling and forecasting a bivariate process having a long-range dependent component that is common to each series and additive short-range dependent noise terms that are unique to each series. A common long-range dependent component might be observed, for instance, in two climatological series coll...
Conference Paper
We describe statistical methods for sensitivity and performance analysis of complex computer simulation experiments. Graphical methods, such as trellis plots, are suggested for exploratory analysis of individual or aggregate performance metrics conditional on different experiment inputs. More formal statistical methods, such as analysis of variance...
Article
We propose a method to identify common persistent components in a k-dimensional time series. Assuming that the individual series of the vector process have long-range dependence, we apply canonical correlation analysis to the series and its lagged values. A zero canonical correlation implies the existence of a short-memory linear combination, hence...
Article
In this article we use the Time Series Multivariate Adaptive Regression Splines (TSMARS) methodology to estimate and forecast non-linear structure in weekly exchange rates for four major currencies during the 1980s. The methodology is applied in three steps. First, univariate models are fitted to the data and the residuals are checked for outliers....
Article
We investigate the effect of long-range dependence on bandwidth selection for kernel regression with the plug-in method of Herrmann, Gasser & Kneip (1992). A new bandwidth estimator is proposed to allow for long-range dependence. Properties of the proposed estimator are investigated theoretically and via simulation. We find that the proposed estima...
Article
We analyze a time series of 20 years of daily sea surface temperatures measured off the California coast. The temperatures exhibit quite complicated features, such as effects on many different time scales, nonlinear effects, and long-range dependence. We show how a time series version of the multivariate adaptive regression splines (MARS) algorithm...
Article
We present a general framework for Bayesian inference of multivariate time series exhibiting both long and short memory behavior. The series are modeled using a multivariate autoregressive fractionally integrated moving average (MVARFIMA) process, which can capture both the short and long memory characteristics of the individual series, as well as...
Article
We present a methodology for estimation, prediction and model assessment of multivariate autoregressive moving-average (VARMA) models in the Bayesian framework using Markov chain Monte Carlo algorithms. The sampling-based Bayesian framework for inference allows for the incorporation of parameter restrictions, such as stationarity restrictions or ze...
Article
The purpose of this study is to analyze time series of daily and monthly values for the Tokyo Stock Price Index (TOPIX) and stock price values for 15 companies listed on the Tokyo Stock Exchange, Section 1 (TSE-I), to determine the contribution of permanent and temporary components to Japanese stock prices. The existence of temporary components in...
Article
Fractionally integrated autoregressive moving-average (ARFIMA) models have proved useful tools in the analysis of time series with long-range dependence. However, little is known about various practical issues regarding model selection and estimation methods, and the impact of selection and estimation methods on forecasts. By means of a large-scale...
Chapter
We discuss the problem of bandwidth selection for a kernel regression trend estimator when the errors are long-range dependent. The iterative plug-in bandwidth selection method is investigated and modified to account for long memory in the errors. We compare the mean average-squared errors of the trend estimates using the bandwidth obtained from th...
Article
We consider the asymptotic characteristics of the periodogram ordinates of a fractionally integrated process having memory parameter d ≥ 0.5, for which the process is nonstationary, or d ≤ -0.5, for which the process is noninvertible. Series having d outside the range (-0.5, 0.5) may arise in practice when a raw series is modeled without preliminary cons...
Article
Full-text available
Nonstationary ARIMA processes and nearly nonstationary ARMA processes, such as autoregressive processes having a root of the AR polynomial close to the unit circle, have sample autocovariance and spectral properties that are, in practice, almost indistinguishable from those of a stationary long-memory process, such as a Fractionally Integrated A...
Article
This paper presents an automatic technique for making simple inferences about the stages in a software production process, discusses implementation of the technique, and validates the technique using defect data from several software development projects. The technique represents an approach to automate process feedback that may be based on either...
Article
We use a series of monthly IBM product revenues to illustrate the usefulness of seasonal fractionally differenced ARMA models for business forecasting. By allowing two seasonal fractional differencing parameters in the model, one at lag three and the other at lag twelve, we obtain a stationary series without losing information about the process beh...
Article
Full-text available
Orthogonal defect classification (ODC), a concept that enables in-process feedback to software developers by extracting signatures on the development process from defects, is described. The ideas are evolved from an earlier finding that demonstrates the use of semantic information from defects to extract cause-effect relationships in the developmen...
Article
Bivariate time series which display nonstationary behavior, such as cycles or long-term trends, are common in fields such as oceanography and meteorology. These are usually very large-scale data sets and often may contain long gaps of missing values in one or both series, with the gaps perhaps occurring at different time periods in the two series....
Conference Paper
The authors present a reliability growth model for defects that have been categorized into defect types associated with specific stages in the software development process. Modeling the reliability growth of defects for each type separately allows identification of problems in the development process which may otherwise be masked when defects of al...
Article
The authors present a reliability growth model for defects that have been categorized into defect types associated with specific stages in the software development process. Modeling the reliability growth of defects for each type separately allows identification of problems in the development process which may otherwise be masked when defects of al...
Article
Full-text available
We present a sampling-based Bayesian approach for modeling and forecasting a common factor time series model in which the common components are long-range dependent. The Gibbs sampling framework allows us to use a less computationally demanding ARMA model to approximate the common long-range dependent behavior in the sampling algorithm; we then adj...
Article
We investigate bootstrap inference methods for nonlinear time series models obtained using Multivariate Adaptive Regression Splines for Time Series (TSMARS). Bootstrapping is carried out using two methods: resampling residuals from an initial fitted model and resampling regression pairs. Bootstrap AGGregatING (bagging) (Breiman, 1996) is implemente...
Article
Full-text available
this paper, we propose a multivariate 2
Article
Recent empirical studies show that the squares of high-frequency stock returns are long-range dependent and can be modeled as fractionally integrated processes, using, for example, long-memory stochastic volatility models. Are such long-range dependencies common among stocks? Are they caused by the same sources of variation? In this paper, we class...
Article
Full-text available
Building Bayesian belief networks in the absence of data involves the challenging task of eliciting conditional probabilities from experts. In this paper, we develop analytical methods for determining the order in which parameters are to be elicited, based on a proximity criterion for the distribution of either the entire set of variables, or a s...
Article
Thesis (Ph. D.)--Columbia University, 1991. Includes bibliographical references (leaves 153-157). Microfilm. "91-27,958."
