J. R. M. Hosking’s research while affiliated with IBM Research - Thomas J. Watson Research Center and other places


Publications (48)


Scoring and thresholding for availability
  • Article

January 2008 · 16 Reads · IBM Systems Journal · J. R. M. Hosking

As the capacity of hardware systems has grown and workload consolidation has taken place, the volume of performance metrics and diagnostic data streams has outscaled the capability of people to handle these systems using traditional methods. As work of different types (such as database, batch, and Web processing), each in its own monitoring silo, runs concurrently on a single image (operating system instance), both the complexity and the business consequences of a single image failure have increased. This paper presents two techniques for generating actionable information out of the overwhelming amount of performance and diagnostic data available to human analysts. Failure scoring is used to identify high-risk failure events that may be obscured in the myriad system events. This replaces human expertise in scanning tens of thousands of records per day and results in a short, prioritized list for action by systems staff. Adaptive thresholding is used to drive predictive and descriptive machine-learning-based modeling to isolate and identify misbehaving processes and transactions. The attraction of this technique is that it does not require human intervention and can be reapplied continually, resulting in models that are not brittle. Both techniques reduce the quantity and increase the relevance of data available for programmatic and human processes.
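One simple way to realize the adaptive-thresholding idea described above is a rolling-quantile rule: an observation is flagged when it exceeds a high quantile of the metric's own recent history, so the threshold adapts without manual tuning. The sketch below is only an illustration of that idea, not the paper's implementation; the function name, window length, and quantile level are assumptions.

```python
import numpy as np

def adaptive_threshold(metric, window=288, q=0.99):
    """Hypothetical sketch: flag observations that exceed a high quantile
    of the metric's own recent history, so the threshold tracks changing
    workload without human intervention."""
    metric = np.asarray(metric, dtype=float)
    flags = np.zeros(metric.size, dtype=bool)
    for i in range(window, metric.size):
        threshold = np.quantile(metric[i - window:i], q)  # recent-history quantile
        flags[i] = metric[i] > threshold
    return flags

# Example: response times with an injected anomaly at the end
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=0.3, size=2000)
x[-5:] *= 10
print(np.nonzero(adaptive_threshold(x))[0][-5:])
```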


Some theory and practical uses of trimmed L-moments

September 2007 · 329 Reads · 97 Citations · Journal of Statistical Planning and Inference

Trimmed L-moments, defined by Elamir and Seheult [2003. Trimmed L-moments. Comput. Statist. Data Anal. 43, 299–314], summarize the shape of probability distributions or data samples in a way that remains viable for heavy-tailed distributions, even those for which the mean may not exist. We derive some further theoretical results concerning trimmed L-moments: a relation with the expansion of the quantile function as a weighted sum of Jacobi polynomials; the bounds that must be satisfied by trimmed L-moments; recurrences between trimmed L-moments with different degrees of trimming; and the asymptotic distributions of sample estimators of trimmed L-moments. We also give examples of how trimmed L-moments can be used, analogously to L-moments, in the analysis of heavy-tailed data. Examples include identification of distributions using a trimmed L-moment ratio diagram, shape parameter estimation for the generalized Pareto distribution, and fitting generalized Pareto distributions to a heavy-tailed data sample of computer network traffic.
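As a companion to the abstract, the following sketch estimates sample trimmed L-moments directly from the Elamir–Seheult definition, in which the TL-moment of order r with symmetric trimming t is a signed combination of expected order statistics of a conceptual sample of size r + 2t, each expectation estimated unbiasedly from the data. The function name and defaults are illustrative, not from the paper.

```python
import numpy as np
from scipy.special import comb

def tl_moment(x, r, t=1):
    """Sample TL-moment of order r with symmetric trimming t
    (t=0 reduces to the ordinary sample L-moment)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    m = r + 2 * t                     # size of the conceptual sample
    i = np.arange(1, n + 1)           # ranks of the observed order statistics
    total = 0.0
    for k in range(r):
        j = r + t - k                 # which order statistic of the size-m sample
        # unbiased estimator of E[X_{j:m}] from the observed sample of size n
        w = comb(i - 1, j - 1) * comb(n - i, m - j) / comb(n, m)
        total += (-1) ** k * comb(r - 1, k) * np.sum(w * x)
    return total / r

# TL-skewness and TL-kurtosis ratios for heavy-tailed (Cauchy) data, which
# have no ordinary moments but well-defined trimmed L-moments.
y = np.random.default_rng(1).standard_cauchy(5000)
print(tl_moment(y, 3) / tl_moment(y, 2), tl_moment(y, 4) / tl_moment(y, 2))
```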


Distributions with maximum entropy subject to constraints on their L-moments or expected order statistics

September 2007 · 125 Reads · 12 Citations · Journal of Statistical Planning and Inference

We find the distribution that has maximum entropy conditional on having specified values of its first r L-moments. This condition is equivalent to specifying the expected values of the order statistics of a sample of size r. The maximum-entropy distribution has a density-quantile function, the reciprocal of the derivative of the quantile function, that is a polynomial of degree r; the quantile function of the distribution can then be found by integration. This class of maximum-entropy distributions includes the uniform, exponential and logistic, and two new generalizations of the logistic distribution. It provides a new method of nonparametric fitting of a distribution to a data sample. We also derive maximum-entropy distributions subject to constraints on expected values of linear combinations of order statistics.
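To make the construction concrete, here is the low-degree case worked out under the usual convention that the density-quantile function is f(Q(u)); the particular normalizations are assumed for illustration only.

```latex
% If the density-quantile function is the quadratic
%   f(Q(u)) = u(1 - u),   0 < u < 1   (degree r = 2),
% then Q'(u) = 1 / [u(1 - u)], and integration gives the logistic quantile function
\[
  Q(u) \;=\; \int \frac{\mathrm{d}u}{u(1-u)} \;=\; \log\frac{u}{1-u} + c .
\]
% Degree 0, f(Q(u)) = 1, gives the uniform distribution;
% degree 1, f(Q(u)) = 1 - u, gives the exponential, Q(u) = -log(1-u) + c.
```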


Spatial Comparability of the Palmer Drought Severity Index

June 2007 · 152 Reads · 149 Citations · JAWRA Journal of the American Water Resources Association

The Palmer Drought Severity Index, which is intended to be of reasonably comparable local significance both in space and time, has been extensively used as a measure of drought for both agricultural and water resource management. This study examines the spatial comparability of Palmer's (1965) definition of severe and extreme drought. Index values have been computed for 1035 sites with at least 60 years of record that are scattered across the contiguous United States, and quantile values corresponding to a specified index value were calculated for given months and then mapped. The analyses show that severe or extreme droughts, as defined by Palmer (1965), are not spatially comparable in terms of identifying rare events. The wide variation across the country in the frequency of occurrence of Palmer's (1965) extreme droughts reflects the differences in the variability of precipitation, as well as the average amount of precipitation. It is recommended first, that a drought index be developed which considers both variability and averages; and second, that water resource managers and planners define a drought in terms of an index value that corresponds to the expected quantile (return period) of the event.
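The recommendation to express drought severity as a quantile (return period) rather than a fixed index value can be illustrated with a small empirical calculation. The sketch below is only illustrative; the threshold of -4.0 is Palmer's extreme-drought cutoff, and the function name and synthetic data are assumptions.

```python
import numpy as np

def nonexceedance_prob(pdsi_series, threshold=-4.0):
    """Empirical probability that the monthly PDSI at a site falls at or
    below `threshold` (Palmer's extreme-drought cutoff is -4.0).  Comparing
    this probability across sites shows whether a fixed index value
    identifies events of comparable rarity everywhere."""
    x = np.asarray(pdsi_series, dtype=float)
    x = x[~np.isnan(x)]
    return np.mean(x <= threshold)

# Two hypothetical sites with 60 years of monthly values but different variability
rng = np.random.default_rng(2)
site_a = rng.normal(0.0, 2.0, size=60 * 12)   # higher month-to-month variability
site_b = rng.normal(0.0, 1.2, size=60 * 12)   # lower variability
print(nonexceedance_prob(site_a), nonexceedance_prob(site_b))
```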


Decomposition of heterogeneous classification problems

June 2006 · 18 Reads · 4 Citations
Chidanand Apte · Jonathan R. M. Hosking · [...] · Barry K. Rosen

In some classification problems the feature space is heterogeneous in that the best features on which to base the classification are different in different parts of the feature space. In some other problems the classes can be divided into subsets such that distinguishing one subset of classes from another and classifying examples within such subsets require very different decision rules, involving different sets of features. In such heterogeneous problems, many modeling techniques (including decision trees, rules, and neural networks) evaluate the performance of alternative decision rules by averaging over the entire problem space, and are prone to generating a model that is suboptimal in any of the regions or subproblems. Better overall models can be obtained by splitting the problem appropriately and modeling each subproblem separately. This paper presents a new measure to determine the degree of dissimilarity between the decision surfaces of two given problems, and suggests a way to search for a strategic splitting of the feature space that identifies regions with different characteristics. We illustrate the concept using a multiplexor problem.
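A hedged sketch of the heterogeneity idea on a small multiplexor problem: a single global tree must interleave features that matter only in some regions of the feature space, whereas splitting on the control bits and modeling each region separately yields much simpler subproblems. The decomposition used here (splitting on the control bits) is chosen by hand for illustration; the paper's dissimilarity measure is what would discover such a split automatically.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = rng.integers(0, 2, size=(1000, 8))     # bits 0-1: control, 2-5: signal, 6-7: irrelevant
addr = 2 * X[:, 0] + X[:, 1]               # the control bits select one signal bit
y = X[np.arange(len(X)), 2 + addr]         # multiplexor output

# A single model over the whole heterogeneous feature space; it typically
# needs far more nodes than the four per-region models combined.
global_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print("global tree nodes:", global_tree.tree_.node_count)

# Hand-chosen decomposition: one simple model per control-bit region
for a in range(4):
    mask = addr == a
    sub = DecisionTreeClassifier(random_state=0).fit(X[mask, 2:], y[mask])
    print(f"region {a}: nodes={sub.tree_.node_count}, "
          f"train accuracy={sub.score(X[mask, 2:], y[mask]):.2f}")
```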


On the characterization of distributions by their L-moments

January 2006 · 204 Reads · 82 Citations · Journal of Statistical Planning and Inference

A distribution with finite mean is uniquely determined by the set of expectations of the largest (or smallest) order statistics from samples of size 1, 2, …. However, this characterization contains some redundancy; some of the expectations can be dropped from the set and the remaining elements of the set still suffice to characterize the distribution. The rth L-moment of a distribution is a linear combination of the expectations of the largest (or smallest) order statistics from samples of size 1, 2, …, r. We show that a wide range of distributions can be characterized by their L-moments with no redundancy; a set that contains all of the L-moments except one no longer suffices to characterize the distribution.
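For reference, the connection stated in the abstract can be written explicitly; this is the standard definition of L-moments in terms of expected order statistics, with notation assumed to follow Hosking's usual conventions.

```latex
% The r-th L-moment as a linear combination of expectations of
% order statistics from a sample of size r:
\[
  \lambda_r \;=\; \frac{1}{r}\sum_{k=0}^{r-1} (-1)^k \binom{r-1}{k}\,
                  \mathrm{E}\!\left[X_{r-k:r}\right],
  \qquad r = 1, 2, \ldots
\]
% e.g. \lambda_1 = E[X_{1:1}] and \lambda_2 = \tfrac12\bigl(E[X_{2:2}] - E[X_{1:2}]\bigr).
% Each E[X_{j:r}] can itself be written as a linear combination of the expected
% sample maxima E[X_{k:k}], k <= r, which gives the representation used in the abstract.
```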


Ensemble modeling through multiplicative adjustment of class probability

February 2002 · 34 Reads · 4 Citations
We develop a new concept for aggregating items of evidence for class probability estimation. In Naive Bayes, each feature contributes an independent multiplicative factor to the estimated class probability. We modify this model to include an exponent in each factor in order to introduce feature importance. These exponents are chosen to maximize the accuracy of estimated class probabilities on the training data. For Naive Bayes, this modification accomplishes more than what feature selection can. More generally, since the individual features can be the outputs of separate probability models, this yields a new ensemble modeling approach, which we call APM (Adjusted Probability Model), along with a regularized version called APMR.
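The multiplicative-adjustment idea can be sketched in a few lines: per-feature class-probability estimates are combined as in Naive Bayes but raised to feature-specific exponents, and the exponents are chosen to minimize log-loss on the training data. Everything below is an illustrative assumption rather than the paper's implementation: the function names, the use of scipy's general-purpose optimizer, and the L2 term standing in loosely for the regularized variant (APMR).

```python
import numpy as np
from scipy.optimize import minimize

def combine(per_feature_probs, weights, prior):
    """Weighted multiplicative combination: P(c|x) proportional to
    prior(c) * prod_i p_i(c|x_i) ** w_i, renormalized over classes.
    per_feature_probs has shape (n_samples, n_features, n_classes)."""
    log_p = np.log(prior) + np.einsum('j,ijc->ic', weights, np.log(per_feature_probs))
    log_p -= log_p.max(axis=1, keepdims=True)        # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def fit_weights(per_feature_probs, y, prior, l2=0.0):
    """Choose the exponents that minimize (optionally L2-penalized) log-loss
    of the combined class probabilities on the training data."""
    n, d, _ = per_feature_probs.shape
    def loss(w):
        p = combine(per_feature_probs, w, prior)
        return -np.mean(np.log(p[np.arange(n), y])) + l2 * np.sum(w ** 2)
    return minimize(loss, x0=np.ones(d), method='L-BFGS-B').x
```

Because each per-feature factor can itself be the output of a separate probability model, the same weighting scheme doubles as an ensemble-combination rule, which is the sense in which the abstract describes APM.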


A Statistical Perspective on Data Mining

October 1999 · 594 Reads · 64 Citations · Future Generation Computer Systems

Data mining can be regarded as a collection of methods for drawing inferences from data. The aims of data mining, and some of its methods, overlap with those of classical statistics. However, there are some philosophical and methodological differences. We examine these differences, and we describe three approaches to machine learning that have developed largely independently: classical statistics, Vapnik's statistical learning theory, and computational learning theory. Comparing these approaches, we conclude that statisticians and data miners can profit by studying each other's methods and using a judiciously chosen combination of them.


Table 1: Results for multiplexor classification problems. Tabulated values are merit and IPA measures for each of the eight feature variables. Numbers of 0s and 1s in the 1000 examples are also given, both for the feature variables and for the class variable C. Starred values are the smallest of the IPA values for the control inputs and the largest of the IPA values for the signal and irrelevant inputs. "Gap" is the difference between the two starred values in the row.
Table 2: Summary of the trees generated by C4.5 and CART for Cases 1–4.
Table 4: IPA values for feature-based subproblems of the DNA data.

Decomposition of Heterogeneous Classification Problems
  • Article
  • Full-text available

October 1999 · 115 Reads · 29 Citations · Intelligent Data Analysis

In some classification problems the feature space is heterogeneous in that the best features on which to base the classification are different in different parts of the feature space. In some other problems the classes can be divided into subsets such that distinguishing one subset of classes from another and classifying examples within the subsets require very different decision rules, involving different sets of features. In such heterogeneous problems, many modeling techniques (including decision trees, rules, and neural networks) evaluate the performance of alternative decision rules by averaging over the entire problem space, and are prone to generating a model that is suboptimal in any of the regions or subproblems. Better overall models can be obtained by splitting the problem appropriately and modeling each subproblem separately. This paper presents a new measure to determine the degree of dissimilarity between the decision surfaces of two given problems, and suggests a...


Table 1: Contingency table for feature–class cross-classification.
Use of Randomization to Normalize Feature Merits

October 1999 · 67 Reads · 5 Citations

Feature merits are used for feature selection in classification and regression as well as for decision tree generation. Commonly used merit functions exhibit a bias towards features that take a large variety of values. We present a scheme based on randomization for neutralizing this bias by normalizing the merits. The merit of a feature is normalized by division by the expected merit of a feature that is random noise taking the same distribution of values as the given feature. The noise feature is obtained by randomly permuting the values of the given feature. The scheme can be used for any merit function including the Gini and entropy measures. We demonstrate its effectiveness by applying it to the contextual merit defined by Hong (IBM Res. Rep. RC19664, 1994). Keywords: Classification, Data mining, Feature merits, Feature selection. Area of interest: Concept formation and classification. Contact: Jonathan R. M. Hosking, hosking@watson.ibm.com, phone +1 914 945 1031, fax +1 914 945 3434. ...
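The randomization scheme itself is simple to sketch: divide a feature's merit by the average merit obtained after randomly permuting that feature's values. Mutual information is used below as the merit function purely for illustration (the paper applies the idea to Hong's contextual merit), and the function name, sample data, and defaults are assumptions.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def normalized_merit(feature, target, n_permutations=20, seed=0):
    """Merit of `feature` for predicting `target`, divided by the expected
    merit of a 'noise' feature obtained by permuting the same values."""
    rng = np.random.default_rng(seed)
    raw = mutual_info_score(target, feature)
    noise = np.mean([
        mutual_info_score(target, rng.permutation(feature))
        for _ in range(n_permutations)
    ])
    return raw / noise if noise > 0 else np.inf

# A many-valued but irrelevant feature gets an inflated raw merit yet
# normalizes to about 1, while a genuinely informative feature stays well above 1.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=500)
noisy_id = rng.integers(0, 100, size=500)        # many distinct, irrelevant values
informative = y ^ (rng.random(500) < 0.1)        # mostly equals the class
print(normalized_merit(noisy_id, y), normalized_merit(informative, y))
```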


Citations (45)


... The three parameters β (shape), α (scale), and γ (location) can be estimated via the maximum likelihood or with probability-weighted moments (PWMs; Hosking, 1986, 1990). Following Beguería et al. (2014), the unbiased estimator for a PWM (Hosking, 1986) was considered: ...

Reference:

Generalised drought index: a novel multi-scale daily approach for drought assessment
L-Moments: Analysis and Estimation of Distributions Using Linear Combinations of Order Statistics
  • Citing Article
  • September 1990

Journal of the Royal Statistical Society Series B (Methodological)

... The primary reason for applying the GEVD is its support by large sample theory (Coles 2001). The probability density function (PDF) of the stationary GEVD is as follows (Hosking & Wallis 1997): ...

Regional Frequency Analysis: An Approach Based on L-Moments
  • Citing Article
  • September 1998

... Events Nevertheless, there is a multitude of events in time series data that can be relevant for the detection of possible correlations in the IT landscape. In the literature different types of anomalies in time series data have been identified, e.g. by [38] [21] [39]: ...

Reallocation Outliers in Time Series
  • Citing Article
  • January 1993

Journal of the Royal Statistical Society Series C Applied Statistics

... As an example, in the case where n=5, b=6, there are 16,807 combinations of parts' order on the CONV, meaning 16,807 IF-THEN rules have to be created. An approach to reduce the multiple combinatorial problems such as evolutionary classifier (17), (18), (19), (20) is unsuitable for implementation due to the huge combinations required for large calculation times to encode the individuals. A tree search-based algorithm (21), (22), (23) is another approach for handling this problem, but it is difficult to employ because making a correct tree construction is difficult for our production problem. ...

Decomposition of Heterogeneous Classification Problems
  • Citing Article
  • December 1998

Intelligent Data Analysis

... Unless there is a compelling reason to use a different probability distribution to model flows, the USGS employs log-Pearson Type 3 (LP3). While it is considered the "base method" in the US [per the recommendation of the US Water Resources Council (Committee, 1967, 1975)], it is far from being the only option and others may be more effective in a given area (Hosking and Wallis, 1993; Kuczera, 1982; Singh, 1998; Wallis and Wood, 1985). ...

Erratum: "Some statistics useful in regional frequency analysis" [Water Resources Research, 29(2), 271-281 (1993)]
  • Citing Article
  • January 1995

Water Resources Research

... In this context, we describe how a given reference distribution gives rise to a whole collection of L-functionals. If Legendre polynomials and a uniform reference distribution are used, the resulting class of L-functionals of order 1, 2,… corresponds to L-moments [13], whereas the ratios of L-functionals of order (2,1), (3,2), and (4,2) agree with the L-coefficient of variation, the L-skewness, and the L-kurtosis [14][15][16] up to normalizing constants. This Legendre class of L-functionals is a natural choice for distributions with bounded support, whereas Hermite polynomials (with a Gaussian reference distribution) or Laguerre polynomials (with an exponential reference distribution) are preferable for data whose support is on the real line and on the positive real line, respectively, particularly so if the distribution of the response is close to the reference distribution. ...

On the characterization of distributions by their L-moments
  • Citing Article
  • January 2006

Journal of Statistical Planning and Inference

... As summarized in ESM Table S1, not all GCMs reach with the assumption that the reference period and each GWL can be considered roughly stationary. We fitted a generalized extreme value (GEV) distribution to the samples by using the maximum likelihood parameter estimation (Hosking and Wallis, 1993), as implemented in the R 'extRemes' package (Gilleland, 2024). ...

Some Statistics Useful in Regional Frequency Analysis
  • Citing Article
  • February 1993