Richard J. Hathaway’s research while affiliated with Mathematical Sciences Research Institute and other places


Publications (75)


Colorized iVAT Images for Labeled Data
  • Article

August 2023 · 5 Reads
Advances in Science Technology and Engineering Systems Journal
Elizabeth Dixon Hathaway · Richard Joseph Hathaway


Clustering Random Variables

March 2015 · 38 Reads
IETE Journal of Research

The fuzzy c-means (FCM) clustering algorithm has long been used to cluster numerical data. More recently, FCM has found application in certain schemes for clustering heterogeneous data sets consisting of mixtures of numerical, interval, and fuzzy data. The range of applicability of FCM is extended here to include clustering data whose features are continuous random variables. Parametric, nonparametric, and empirical models are presented, and in each case the distributional laws of the random variables are encoded to give a real, finite-dimensional representation to which FCM can be applied. Subsequent decoding of the results yields cluster prototypes that are similar in nature to the original data themselves. Some properties of the approach are noted and the results of preliminary computational experimentation are given.
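For readers unfamiliar with the base algorithm, here is a minimal sketch of standard fuzzy c-means on plain numerical feature data. This is the classical FCM iteration only, not the random-variable encoding proposed in the paper; all names are illustrative:

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, eps=1e-3, seed=0):
    """Minimal fuzzy c-means sketch: X is (n, d), c clusters, fuzzifier m > 1."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0)                       # membership columns sum to 1
    for _ in range(n_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)        # cluster prototypes
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)  # (c, n) distances
        D = np.fmax(D, 1e-12)                # guard against division by zero
        U_new = 1.0 / (D ** (2.0 / (m - 1.0)))
        U_new /= U_new.sum(axis=0)           # standard FCM membership update
        if np.abs(U_new - U).max() < eps:
            U = U_new
            break
        U = U_new
    return U, V
```

The prototype and membership updates above are the usual alternating-optimization steps for the FCM objective; the random-variable extension in the paper would feed an encoded finite-dimensional representation into such a routine.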


Density-Weighted Fuzzy c-Means Clustering

March 2009 · 144 Reads · 72 Citations
IEEE Transactions on Fuzzy Systems

In this short paper, a unified framework for performing density-weighted fuzzy c-means (FCM) clustering of feature and relational datasets is presented. The proposed approach consists of reducing the original dataset to a smaller one, assigning each selected datum a weight reflecting the number of nearby data, clustering the weighted reduced dataset using a weighted version of the feature or relational data FCM algorithm, and, if desired, extending the reduced-data results back to the original dataset. Several methods are given for each of the tasks of data subset selection, weight assignment, and extension of the weighted clustering results. The newly proposed weighted version of the non-Euclidean relational FCM algorithm is proved to produce results identical to those of its feature-data analog for a certain type of relational data. Artificial and real data examples are used to demonstrate and contrast various instances of this general approach.
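The weighted feature-data case can be sketched as a small change to plain FCM: each datum's contribution to the prototype update is scaled by its weight, while the membership update is unchanged. The code below is an illustrative sketch under that reading, not the paper's exact formulation:

```python
import numpy as np

def weighted_fcm(X, w, c, m=2.0, n_iter=100, seed=0):
    """FCM where datum k carries weight w[k], e.g. a count of nearby points
    it represents after dataset reduction. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0)
    for _ in range(n_iter):
        Uw = (U ** m) * w                    # scale column k by its weight w[k]
        V = (Uw @ X) / Uw.sum(axis=1, keepdims=True)   # weighted prototypes
        D = np.fmax(np.linalg.norm(X[None] - V[:, None], axis=2), 1e-12)
        U = 1.0 / (D ** (2.0 / (m - 1.0)))   # membership update: weights cancel
        U /= U.sum(axis=0)
    return U, V
```

With all weights equal to one this reduces to ordinary FCM, which is a useful sanity check when experimenting with weighting schemes.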


A Simple Acronym for Doing Calculus: CAL

November 2008 · 24 Reads · 3 Citations
PRIMUS

An acronym is presented that provides students with a potentially useful, unifying view of the major topics covered in an elementary calculus sequence. The acronym (CAL) is based on viewing the calculus procedure for solving a problem P∗ in three steps: 1. recognizing that the problem cannot be solved using simple (non-calculus) techniques; 2. approximating the solution of P∗ using simple techniques; and 3. perfecting the approximation by taking an appropriate limit. The modest contribution of this note is to package this point of view in a form that can be easily remembered and appreciated by students. The acronym is presented, explained, and illustrated using several important calculus topics.
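The three-step view can be made concrete with the classic area problem: the area under y = x² on [0, 1] resists simple geometry (step 1), Riemann sums approximate it with simple rectangle areas (step 2), and refining the partition perfects the approximation in the limit (step 3). A small numerical illustration of steps 2 and 3:

```python
def riemann_sum(f, a, b, n):
    """Right-endpoint Riemann sum of f on [a, b] with n subintervals."""
    h = (b - a) / n
    return sum(f(a + (k + 1) * h) for k in range(n)) * h

# Step 2 approximates; step 3 'perfects by limit': the sums approach 1/3.
for n in (10, 100, 1000):
    print(n, riemann_sum(lambda x: x * x, 0.0, 1.0, n))
```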


An algorithm for clustering tendency assessment

July 2008 · 42 Reads · 9 Citations

The visual assessment of tendency (VAT) technique, developed by J.C. Bezdek, R.J. Hathaway and J.M. Huband, uses a visual approach to find the number of clusters in data. In this paper, we develop a new algorithm that processes the numeric output of VAT, rather than the gray-level images VAT displays, and produces tendency curves. Possible cluster borders appear as high-low patterns on the curves, which can be caught not only by human eyes but also by the computer. Our numerical results are very promising. The program caught cluster structures even in cases where the visual outputs of VAT are virtually useless.
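For orientation, the underlying VAT step reorders the dissimilarity matrix with a Prim-like minimum-spanning-tree traversal so clusters form dark diagonal blocks; a curve read off the reordered matrix then exposes cluster borders as peaks. The sketch below illustrates that idea and is not the paper's exact algorithm; the curve definition here is a simplified stand-in:

```python
import numpy as np

def vat_order(D):
    """VAT-style reordering of a symmetric dissimilarity matrix (Prim-like)."""
    n = D.shape[0]
    i, _ = np.unravel_index(np.argmax(D), D.shape)   # start at an extreme object
    order, rest = [i], set(range(n)) - {i}
    while rest:
        # next object: nearest to the already-ordered set
        j = min(rest, key=lambda r: min(D[s, r] for s in order))
        order.append(j)
        rest.remove(j)
    return np.array(order)

def tendency_curve(D, order):
    """Illustrative curve: dissimilarity between consecutive reordered objects.
    Peaks suggest borders between clusters."""
    R = D[np.ix_(order, order)]
    return np.array([R[k, k + 1] for k in range(len(order) - 1)])
```

On well-separated data the ordering visits one cluster at a time, so the curve stays low within a cluster and spikes at each border.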


Tendency curves for visual clustering assessment

May 2008 · 29 Reads · 2 Citations

We improve the visual assessment of tendency (VAT) technique, which was developed by J.C. Bezdek, R.J. Hathaway and J.M. Huband and uses a visual approach to find the number of clusters in data. Instead of using square gray-level images of dissimilarity matrices as in VAT, we further process the matrices and produce tendency curves. Possible cluster structure shows up as peak-valley patterns on the curves, which can be caught not only by human eyes but also by the computer. Our numerical experiments showed that the computer can catch cluster structures from the tendency curves even in cases where the visual outputs of VAT are virtually useless.


Figure captions (images not shown):
Fig. 1. Scatter plot of D with m = 20 row and n = 40 column objects.
Fig. 3. Assessment of clustering and co-clustering tendency in D (cf. Fig. 1).
Fig. 4. coVAT "best case": 64 row and 136 column objects, all co-clusters.
Fig. 5. Assessment of clustering and co-clustering tendency in D1 (cf. Fig. 4). (a) VAT image I(D̃) for P1. (b) VAT image I(D̃) for P2. (c) VAT image I(D̃) for P3. (d) coVAT image I(D̃) for P4.
Fig. 6. Approximation error E = ‖D − D̂‖/‖D‖ versus m for Example 2.

Visual Assessment of Clustering Tendency for Rectangular Dissimilarity Matrices
  • Article
  • Full-text available

November 2007 · 925 Reads · 98 Citations
IEEE Transactions on Fuzzy Systems

We have an m × n matrix D, and assume that its entries correspond to pairwise dissimilarities between m row objects Or and n column objects Oc, which, taken together (as a union), comprise a set O of N = m + n objects. This paper develops a new visual approach that applies to four different cluster assessment problems associated with O. The problems are the assessment of cluster tendency: P1) amongst the row objects Or; P2) amongst the column objects Oc; P3) amongst the union of the row and column objects Or ∪ Oc; and P4) amongst the unions of row and column objects that contain at least one object of each type (co-clusters). The basis of the method is to regard D as a subset of known values that is part of a larger, unknown N × N dissimilarity matrix, and then impute the missing values from D. This results in estimates for three square matrices (Dr, Dc, Dr∪c) that can be visually assessed for clustering tendency using the previous VAT or sVAT algorithms. The output from assessment of Dr∪c ultimately leads to a rectangular coVAT image which exhibits clustering tendencies in D. Five examples are given to illustrate the new method. Two important points: i) because VAT is scalable by sVAT to data sets of arbitrary size, and because coVAT depends explicitly (and only) on VAT, this new approach is immediately scalable to, say, the scoVAT model, which works for even very large (unloadable) data sets without alteration; and ii) VAT, sVAT and coVAT are autonomous, parameter-free models - no "hidden values" are needed to make them work.
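The imputation idea can be illustrated simply: a dissimilarity between two row objects can be estimated from how differently they relate to the column objects, e.g. as the distance between the corresponding rows of D. The sketch below is in that spirit and does not reproduce coVAT's exact formulas:

```python
import numpy as np

def impute_row_dissimilarities(D):
    """Estimate an m x m dissimilarity matrix Dr among the row objects
    from the rectangular m x n matrix D, by comparing rows of D.
    Illustrative estimate only, not coVAT's exact imputation."""
    diff = D[:, None, :] - D[None, :, :]     # (m, m, n) pairwise row differences
    return np.linalg.norm(diff, axis=2)      # Euclidean distance between rows
```

The resulting Dr is square, symmetric, and zero on the diagonal, so it can be fed directly to a VAT-style reordering for visual assessment.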



Extending fuzzy and probabilistic clustering to very large data sets

November 2006 · 304 Reads · 149 Citations
Computational Statistics & Data Analysis

Approximating clusters in very large (VL=unloadable) data sets has been considered from many angles. The proposed approach has three basic steps: (i) progressive sampling of the VL data, terminated when a sample passes a statistical goodness of fit test; (ii) clustering the sample with a literal (or exact) algorithm; and (iii) non-iterative extension of the literal clusters to the remainder of the data set. Extension accelerates clustering on all (loadable) data sets. More importantly, extension provides feasibility—a way to find (approximate) clusters—for data sets that are too large to be loaded into the primary memory of a single computer. A good generalized sampling and extension scheme should be effective for acceleration and feasibility using any extensible clustering algorithm. A general method for progressive sampling in VL sets of feature vectors is developed, and examples are given that show how to extend the literal fuzzy (c-means) and probabilistic (expectation-maximization) clustering algorithms onto VL data. The fuzzy extension is called the generalized extensible fast fuzzy c-means (geFFCM) algorithm and is illustrated using several experiments with mixtures of five-dimensional normal distributions.
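The three-step scheme can be sketched generically. Below, the "literal" algorithm is a crude Lloyd-style prototype fit on the sample, and the extension is the non-iterative FCM membership formula applied to the full data; this is a simplified stand-in for geFFCM, with all names illustrative:

```python
import numpy as np

def extend_memberships(X, V, m=2.0):
    """Non-iterative extension: FCM membership formula for data X
    against fixed prototypes V."""
    D = np.fmax(np.linalg.norm(X[None] - V[:, None], axis=2), 1e-12)
    U = 1.0 / (D ** (2.0 / (m - 1.0)))
    return U / U.sum(axis=0)

def sample_cluster_extend(X, c, frac=0.1, seed=0):
    """(i) sample the data, (ii) cluster the sample with a literal algorithm
    (crude Lloyd iterations stand in here), (iii) extend to all of X."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=max(c, int(frac * len(X))), replace=False)
    S = X[idx]
    V = S[rng.choice(len(S), size=c, replace=False)].astype(float)
    for _ in range(50):
        lab = np.linalg.norm(S[:, None] - V[None], axis=2).argmin(axis=1)
        V = np.array([S[lab == i].mean(axis=0) if (lab == i).any() else V[i]
                      for i in range(c)])
    return V, extend_memberships(X, V)
```

The point of step (iii) is that only the prototypes, not the full dataset, need to fit in memory at once; the remaining data can be streamed through the extension formula.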


Citations (64)


... In the case of s = 2 features we do not need a new approach as we can represent all the available information of object data X and labels L in a single, colorized scatterplot. We illustrate this below in Fig. 1 with a scatterplot of the hypothetical school data example from [2] involving 26 students (the objects) with their corresponding scaled SAT and high school GPA scores (the object data X) and their freshman math course outcomes of Pass or Fail (the labels or categories L). Note that the scatterplot reveals the clusters of the object data X along with the distribution of the labels among those clusters. ...

Reference:

Colorized iVAT Images for Labeled Data
Diagonally Colorized iVAT Images for Labeled Data
  • Citing Conference Paper
  • November 2022

... Aiming to compute h 1 (n) and h 2 (n) in an iterative way, we follow here an alternating optimization strategy [10], [18], which assumes that one of the (virtual) adaptive filters is kept fixed while optimizing the other one, and vice versa. Such a strategy allows us to construct the following local optimization problems [2], [4], [7]: ...

Two New Convergence Results for Alternating Optimization
  • Citing Chapter
  • January 2003

... This general algorithm has been implemented under the name 'fuzzy product', and is illustrated by examples in the next section. It can be seen that this algorithm is a particular case of a grouped coordinate descent method for which Bezdek et al. [4] have proven global and local convergence properties. ...

Coordinate descent and clustering
  • Citing Article
  • January 1986

Control and Cybernetics

... Our approach leverages the established elbow method [13] to automatically choose the key information to be covered and used by a TLS framework. This method has been shown to be effective in choosing representative key points from data samples, such as determining the number of clusters used by a K-means algorithm [7]. To determine the timeline length, our approach first scores and ranks the candidate dates of a long-running event. ...

An iterative procedure for minimizing a generalized sum-of-squared-errors clustering criterion
  • Citing Article
  • January 1994

Neural Parallel and Scientific Computations
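The elbow method referred to in the citing passage above is easy to sketch: run k-means for increasing k, record the within-cluster sum of squares, and stop where the improvement flattens. The drop-ratio rule below is one simple heuristic among many, and the deterministic farthest-first initialization is an illustrative choice:

```python
import numpy as np

def kmeans_inertia(X, k, n_iter=50):
    """Within-cluster sum of squares after Lloyd iterations, using a
    deterministic farthest-first initialization."""
    idx = [0]
    for _ in range(k - 1):                       # farthest-first seeding
        d = np.linalg.norm(X[:, None] - X[idx][None], axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    V = X[idx].astype(float)
    for _ in range(n_iter):                      # Lloyd iterations
        lab = np.linalg.norm(X[:, None] - V[None], axis=2).argmin(axis=1)
        V = np.array([X[lab == i].mean(axis=0) if (lab == i).any() else V[i]
                      for i in range(k)])
    return float(((X - V[lab]) ** 2).sum())

def elbow_k(X, k_max=6):
    """One simple elbow heuristic: stop at the k after which the drop in
    inertia falls below half of the previous drop."""
    inertias = [kmeans_inertia(X, k) for k in range(1, k_max + 1)]
    drops = [inertias[i] - inertias[i + 1] for i in range(len(inertias) - 1)]
    for i in range(1, len(drops)):
        if drops[i] < 0.5 * drops[i - 1]:
            return i + 1
    return k_max
```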

... A prominent use of the FL across several scientific disciplines is referred to as fuzzy clustering. Fuzzy c-means (FCM) is a clustering algorithm [102][103][104]. It relies on iterative optimization and partitioning a dataset into N clusters, thus allocating every data point to every cluster with various degrees of membership. ...

Convergence Theory for Fuzzy C-Means: Counter-Examples and Repairs

IEEE Transactions on Systems Man and Cybernetics

... Consequently, the proposed approach for optimizing r n ensures non-decreasing values of q(r n ). As the objective function in (21) is non-decreasing, the AO algorithm for optimizing r is guaranteed to converge [25]. Based on these considerations, both the solutions for W and r ensure that the SCNR objective value does not decrease, i.e., SCNR(W t , r t ) ≥ SCNR(W t−1 , r t−1 ), where t denotes the iteration index of the AO algorithm. ...

Convergence of alternating optimization

Neural Parallel and Scientific Computations

... In this part we present a general principle of alternating optimization and existing approaches of photometric stereo which exploit this principle. Alternating optimization is a heuristic algorithm that minimizes the objective function by successively fixing all but one variable, [97], [98], [99]. ...

Local convergence of tri-level alternating optimization

Neural Parallel and Scientific Computations
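The alternating optimization principle described in the citing passage above can be shown on a toy convex objective: minimize f(x, y) = (x − 2)² + (y + 1)² + xy/2 by exact coordinate minimizations, each step fixing one variable (hypothetical example, unrelated to the cited paper's applications):

```python
def alternating_optimization(n_iter=100):
    """Minimize f(x, y) = (x - 2)**2 + (y + 1)**2 + 0.5*x*y by alternating
    exact minimizations: over x with y fixed, then over y with x fixed."""
    x, y = 0.0, 0.0
    for _ in range(n_iter):
        x = 2.0 - y / 4.0        # argmin_x f(x, y): set df/dx = 0
        y = -1.0 - x / 4.0       # argmin_y f(x, y): set df/dy = 0
    return x, y
```

Because the objective is strictly convex, the alternating sweeps contract toward the unique minimizer (x, y) = (2.4, −1.6), matching the kind of convergence guarantee the cited result establishes for AO.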

... The proposed method not only is a deterministic algorithm which leads to the same solutions in several independent runs, but is also capable of incorporating heterogeneous aims such as financial and aesthetic considerations, without loss of generality. In fact, several aims can be formulated within the norm function of the fuzzy membership matrix [28]. ...

Fusing heterogeneous fuzzy data for clustering
  • Citing Article
  • July 1997

Proceedings of SPIE - The International Society for Optical Engineering

... The clustering of similarity- and dissimilarity-based relational data is also important [2]. A given representation of objects may not be readily defined in terms of features, yet it can be characterized by the relationships between the objects [3][4][5]. This type of representation is common in many fields, including bioinformatics, computer vision, and psychology. ...

Clustering with relational c-means partitions from pairwise distance data
  • Citing Article
  • December 1987

Mathematical Modelling