Article

Two additions to hierarchical cluster analysis

Authors:
  • Independent Statistician and Author

Abstract

Two additions to Johnson's hierarchical cluster analysis (1967) are proposed. The first is a procedure for ordering the clusters to make the obtained solution unique and frequently more meaningful. The second is a similarity measure, whereby two (or more) cluster solutions of the same set of variables can be compared. The value of ordered clusterings is illustrated with a reanalysis of data from Miller & Nicely (1955).

... • Spectral seriation ("Spect") [17]: • Gruvaeus and Wainer algorithm ("GW") [76]: ...
... By minimizing this path length, we globally minimize the similarities between adjacent objects. The same criterion is used in algorithms such as GW [76] or OLO [16] (see section 4.2.1). However, we propose here to incrementally optimize this criterion. ...
... The quality of the proposed ("Prop") algorithm was tested using seven quality indexes (BAR, Moore Stress, Least Square, Weighted Gradient, Path Length, Inertia and 2SUM, see section 4.2.1). Its quality is compared with the quality obtained by random permutations of the matrix ("Rand") and the quality of six existing algorithms: "OLO" [16], "GW" [76], "HC" [83], "VAT" [24], "Spectral" [58], "MDS" [32], see section 4.2.1. The state-of-the-art algorithms and the quality indexes were computed using the "seriation" package [82] in R 3.3.2. ...
Thesis
The research work presented in this thesis concerns the development of unsupervised learning approaches adapted to large relational and dynamic data-sets. The combination of these three characteristics (size, complexity and evolution) is a major challenge in the field of data mining and few satisfactory solutions exist at the moment, despite the obvious needs of companies. This is a real challenge, because the approaches adapted to relational data have a quadratic complexity, unsuited to the analysis of dynamic data. We propose here two complementary approaches for the analysis of this type of data. The first approach is able to detect well-separated clusters from a signal created during an incremental reordering of the dissimilarity matrix, with no parameter to choose (e.g., the number of clusters). The second proposes to use support points among the objects in order to build a representation space to define representative prototypes of the clusters. Finally, we apply the proposed approaches to real-time profiling of connected users. Profiling tasks are designed to recognize the "state of mind" of users through their navigations on different web-sites.
... In this paper, seriation is considered different from clustering as shown in Fig. 1. The linking element between those two methods is a clustering with optimal leaf ordering [8][9][10], which, from the perspective of clustering, is a procedure 'to order the clusters at each level so that the objects on the edge of each cluster are adjacent to that object outside the cluster to which it is nearest' [8]. From the perspective of seriation, the result is an optimal ordering and rearrangement, but an additional grouping procedure is necessary to identify cluster boundaries and clusters [11][12][13]. ...
... However, there are '2^(n-1) linear orderings consistent with the structure of the tree' [9] generated by hierarchical clustering. To remedy this situation, several authors have proposed additional procedures to perform optimal leaf ordering of the dendrogram [8][9][10][117][118][119]. ...
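A minimal Python sketch of the count quoted above, assuming the dendrogram is a binary tree written as nested 2-tuples (a representation chosen here for illustration, not taken from the cited works). Each of the n-1 internal nodes can be flipped independently, which yields 2^(n-1) leaf orderings. The helper name `leaf_orderings` is hypothetical.

```python
from itertools import product

def leaf_orderings(tree):
    """Yield every leaf order reachable by flipping subtrees of a binary tree.

    A leaf is any non-tuple value; an internal node is a 2-tuple (left, right).
    """
    if not isinstance(tree, tuple):
        yield [tree]
        return
    left, right = tree
    for a, b in product(list(leaf_orderings(left)), list(leaf_orderings(right))):
        yield a + b  # keep the two children in their current orientation
        yield b + a  # flip the two children at this node

# Hypothetical four-leaf dendrogram: (a, b) and (c, d) merge first, then join.
tree = (("a", "b"), ("c", "d"))
orders = list(leaf_orderings(tree))
n_leaves = 4
print(len(orders), 2 ** (n_leaves - 1))
```

Enumeration is only practical for small trees, which is why the procedures cited above select a single ordering by a rule or a cost function instead.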
... Dendrogram seriation was introduced by Gruvaeus and Wainer (1972), whose motivation was to obtain a unique ordering of the objects from a hierarchical clustering. Since then, many authors (Degerman 1982; Gale et al. 1984; Eisen et al. 1998; Alon et al. 1999; Wishart 1999; Bar-Joseph et al. 2001) have presented other dendrogram seriation algorithms, with the objective of placing similar objects nearby. ...
... Our motivation is improvement of data visualizations, including but not limited to scatterplot matrices, parallel coordinate displays, glyph displays and heatmaps. Hurley (2004) used the original Gruvaeus and Wainer (1972) (GW) algorithm for this purpose; however, DendSer with various cost functions tackles this problem in a more systematic and general way. The work presented here is based on the unpublished PhD dissertation of Earle (2010), and a brief description of some topics appeared in Earle and Hurley (2011). ...
... The original (GW) dendrogram seriation algorithm of Gruvaeus and Wainer (1972) examines each node in turn and rearranges its left and right sub-nodes so that the two most low-weight objects at the edges of the sub-nodes are placed adjacently. [Figure 2: Four re-arrangements of node N examined by the Gruvaeus and Wainer (1972) algorithm.] ...
Article
Visualizations of statistical data benefit from systematic ordering of data objects to highlight features and structure. This article concerns ordering via dendrogram seriation based on hierarchical clustering of data objects. It describes DendSer, a general-purpose dendrogram seriation algorithm which when coupled with various seriation cost functions is easily adapted to different visualization settings. Comparisons are made with other dendrogram seriation algorithms and applications are presented. Supplementary materials for this article are available online.
... n, as an oricentric tensor. Considering the orientation differences d_orientation only, oricentric tensors have no inherent order and, hence, can be placed at random positions during ordered hierarchical clustering [GW72]. Therefore, we propose a custom, top-down hierarchical clustering approach. ...
... As suggested by Gruvaeus and Wainer [GW72], we concatenate two subsets such that elements on the edge of different subsets that are most similar are placed next to each other to show the relation between subsets. To concatenate two non-oricentric subsets, we choose the pair of edge elements with the smallest orientation difference, while for a non-oricentric subset and an oricentric subset, we choose the pair with the smallest shape difference. ...
Article
Full-text available
A Diffusion Tensor Imaging (DTI) group study consists of a collection of volumetric diffusion tensor datasets (i.e., an ensemble) acquired from a group of subjects. The multivariate nature of the diffusion tensor imposes challenges on the analysis and the visualization. These challenges are commonly tackled by reducing the diffusion tensors to scalar-valued quantities that can be analyzed with common statistical tools. However, reducing tensors to scalars poses the risk of losing intrinsic information about the tensor. Visualization of tensor ensemble data without loss of information is still a largely unsolved problem. In this work, we propose an overview + detail visualization to facilitate the tensor ensemble exploration. We define an ensemble representative tensor and variations in terms of the three intrinsic tensor properties (i.e., scale, shape, and orientation) separately. The ensemble summary information is visually encoded into the newly designed aggregate tensor glyph which, in a spatial layout, functions as the overview. The aggregate tensor glyph guides the analyst to interesting areas that would need further detailed inspection. The detail views reveal the original information that is lost during aggregation. It helps the analyst to further understand the sources of variation and formulate hypotheses. To illustrate the applicability of our prototype, we compare with most relevant previous work through a user study and we present a case study on the analysis of a brain diffusion tensor dataset ensemble from healthy volunteers.
... The first dendrogram seriation algorithm was developed by Gruvaeus and Wainer (1972) with an objective to obtain a unique dendrogram order. Since then, other dendrogram seriation algorithms have been developed with an objective to find an optimal permutation of dendrogram leaves that minimizes a cost function, i.e. a function used to evaluate a permutation of a set of objects (Bar-Joseph et al. 2001; Wu et al. 2010; Earle and Hurley 2014; Hahsler et al. 2008; Forina et al. 2002). ...
... We demonstrate the use of Gruvaeus and Wainer's (1972) dendrogram seriation algorithm (GW) for plant breeding data. This method flips each leaf of the dendrogram moving up the clustering so that adjacent entities are the most similar. ...
Article
A dendrogram is often used to display the results from hierarchical clustering; however, the order of objects in a standard dendrogram is arbitrary and so similarity cannot be readily interpreted. An optimized dendrogram, a dendrogram produced by re-ordering the objects using a seriation method, has a customized ordering that reflects the similarity among objects with most similar objects located closest together. Hierarchical clustering has been applied to the analysis of data from plant breeding programs to identify the patterns in breeding populations and to study genotype by environment interactions. In this paper we demonstrate the advantage of an optimized dendrogram for interpretation of plant breeding data and, given this advantage, argue that an optimized dendrogram should be used as the default whenever hierarchical clustering is used.
... However, the objects in the matrix must be organized in an order that reflects the data structure: objects that are similar to each other should be positioned close to each other in the matrix. To find the optimal order of objects in a similarity matrix, several authors proposed different reordering algorithms optimizing different cost functions: Optimal Leaf Ordering algorithm [1] ("OLO") and Gruvaeus and Wainer algorithm [11] ("GW") minimize the Hamiltonian Path Length. Spectral seriation [2] ("Spect") optimizes the 2-SUM problem whereas Multidimensional Scaling [5] ("MDS") optimizes the Least Square Criterion. ...
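The Hamiltonian path length named above can be sketched as the sum of dissimilarities between neighbours in a given order. The function name `path_length` and the small matrix `D` are illustrative assumptions, not data or code from the cited papers.

```python
from itertools import permutations

def path_length(order, dist):
    """Sum of dissimilarities between consecutive objects in `order`."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))

# Hypothetical symmetric dissimilarity matrix for four objects (illustration only).
D = [
    [0, 1, 4, 6],
    [1, 0, 2, 5],
    [4, 2, 0, 3],
    [6, 5, 3, 0],
]

natural = path_length([0, 1, 2, 3], D)   # 1 + 2 + 3
shuffled = path_length([0, 2, 1, 3], D)  # 4 + 2 + 5

# Brute force over all orders is feasible only for tiny n; it is shown here
# to make the criterion concrete, not as a practical seriation method.
best = min(permutations(range(4)), key=lambda p: path_length(p, D))
print(natural, shuffled, best)
```

Dendrogram-based methods such as OLO and GW search only among the orders consistent with the tree, which is what makes the criterion "restricted" in that setting.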
... The quality of the proposed algorithm was tested using seven classical indexes for matrix reordering (BAR, Moore Stress, Least Square, Weighted Gradient, Path Length, Inertia and 2SUM, see [14]). Its quality is compared with the quality of six existing methods: "OLO" [1], "GW" [11], "HC" [12], "VAT" [4], "Spectral" [9] and "MDS" [5]. The algorithms and the quality indices were computed using the "seriation" package [12] in R 3.3.2. ...
Conference Paper
Full-text available
Visualization methods are important to describe the underlying structure of a data set. When the data is not described as a vector of numerical values, a visualization can be obtained through the reordering of the corresponding similarity matrix. Although several methods of reordering exist, they all need the complete similarity matrix in memory. However, this is not possible for the analysis of dynamic data sets. The goal of this paper is to propose an original algorithm for the incremental reordering of a similarity matrix adapted to dynamic data sets. The proposed method is compared with state-of-the-art algorithms for static data-sets and applied to a dynamic data-set in order to demonstrate its efficiency.
...
  • Hierarchical clustering (HC) (Eisen et al., 1998): Other (depends on linkage)
  • Gruvaeus and Wainer reordering (GW) (Gruvaeus & Wainer, 1972): Restricted path length
  • Optimal leaf ordering reordering (OLO) (Bar-Joseph et al., 2001): Restricted path length
  • DendSer reordering (Earle & Hurley, 2015): Various (restricted)
Other methods
  • Multidimensional scaling (MDS) (Kendall, 1971): Other (stress)
  • Rank-two ellipse seriation (R2E) (Chen, 2002): None
  • Sorting Points Into Neighborhoods (SPIN) (Tsafrir et al., 2005): Other (energy)
  • Visual Assessment of Tendency (VAT) (Bezdek & Hathaway, 2002): Other (MST)
... produce good seriations without directly targeting a specific seriation criterion. Table 2 summarizes popular methods and indicates for each what, if any, seriation criterion is optimized. ...
... To improve the presentation of the dendrogram, several methods for rotating subtrees to minimize an objective function under the constraints given by the dendrogram have been proposed. Gruvaeus & Wainer (1972) suggest obtaining a unique order by requiring the leaf nodes to be ordered such that at each level the objects at the edge of each cluster are adjacent to that object outside the cluster to which it is nearest, and they provide a simple heuristic. Bar-Joseph et al. (2001) developed an efficient procedure to rearrange the dendrogram such that the Hamiltonian path connecting the leaves is minimized and called this the optimal leaf order. ...
Article
Seriation aims at finding a linear order for a set of objects to reveal structural information which can be used for deriving data-driven decisions. It presents a difficult combinatorial optimization problem with its roots and applications in many fields including operations research. This paper focuses on a popular seriation problem which tries to find an order for a single set of objects that optimizes a given seriation criterion defined on one-mode two-way data, i.e., an object-by-object dissimilarity matrix. Over the years, members of different research communities have introduced many criteria and seriation methods for this problem. It is often not clear how different seriation criteria and methods relate to each other and which criterion or seriation method to use for a given application. These methods represent tools for analytics and are therefore of theoretical and practical interest to the operations research community. The purpose of this paper is to provide a consistent overview of the most popular criteria and seriation methods and to present a comprehensive experimental study to compare their performance using artificial and a representative set of real-world datasets.
... Kleiner and Hartigan (1981) solved this problem in a simple and ingenious way. They noted that a tree is a common icon to represent a hierarchical cluster structure, and that the structure of the tree is reasonably well determined (Johnson, 1967; Gruvaeus and Wainer, 1972) (averaged across all ten states). We see immediately that there are two major groupings of variables: one includes life expectancy, percent of high school graduates and income; and the other includes homicide rate, literacy and temperature. ...
Article
The past decade has seen a substantial growth in methods and schemes for the display of multivariate data. This paper encompasses a sketch of the history of multivariate displays, examines a number of techniques, describes their construction, illustrates their use, and comments on their efficacy.
... Ma et al. [49] ordered categorical values by: (1) In hierarchical cluster analysis, the terminal nodes of a tree can be arranged based on some criteria to best reveal the relationship between the nodes and enhance the visual display. Gruvaeus and Wainer [51] presented an algorithm that applied a series of tests for locally orienting the nodes so that objects displayed on the left and right edges of each cluster are adjacent to those objects outside the cluster to which they are most similar. Due to the large number of applications that construct trees for analyzing datasets, many different heuristics have been suggested to solve the problem of ordering the leaves of a binary hierarchical clustering tree [52,53]. ...
... In this case we performed hierarchical clustering on the distance matrix of Euclidean distances, using the complete linkage algorithm. In Figure 6, adjacency matrices are visualized as cluster heat maps to compare results of the default hierarchical clustering (HC), the Gruvaeus and Wainer's method (GW) [10], the optimal leaf ordering (OLO) [6], and the MOLO method (MOLO). These matrices are diagonally symmetric and rows and columns are reordered based on the leaf order of dendrograms. ...
Article
Full-text available
Dendrograms are graphical representations of binary tree structures resulting from agglomerative hierarchical clustering. In Life Science, a cluster heat map is a widely accepted visualization technique that utilizes the leaf order of a dendrogram to reorder the rows and columns of the data table. The derived linear order is more meaningful than a random order, because it groups similar items together. However, two consecutive items can be quite dissimilar despite proximity in the order. In addition, there are 2^(n-1) possible orderings given n input elements as the orientation of clusters at each merge can be flipped without affecting the hierarchical structure. We present two modular leaf ordering methods to encode both the monotonic order in which clusters are merged and the nested cluster relationships more faithfully in the resulting dendrogram structure. We compare dendrogram and cluster heat map visualizations created using our heuristics to the default heuristic in R and seriation-based leaf ordering methods. We find that our methods lead to a dendrogram structure with global patterns that are easier to interpret, more legible given a limited display space, and more insightful for some cases. The implementation of methods is available as an R package, named "dendsort", from the CRAN package repository. Further examples, documentation, and the source code are available at [https://bitbucket.org/biovizleuven/dendsort/].
... • Fuzzy C-means Clustering (FCM) [166][167][168][169][170][171][172][173][174][175] algorithm - one of the most widely used fuzzy clustering algorithms. • Hierarchical clustering [176][177][178][179][180][181][182][183][184] (also called Hierarchical Cluster Analysis or HCA [185][186][187][188][189][190][191][192][193][194]) - method of cluster analysis which seeks to build a hierarchy of clusters. Used in data mining and statistics. ...
Thesis
Human-Machine Interaction (HMI) is progressively becoming part of the coming future. As an example of HMI, embedded eye tracking systems allow users to interact with objects placed in a known environment by using natural eye movements. The EyeDee™ portable eye tracking solution (developed by SuriCog) is an example of an HMI-based product. It includes the Weetsy™ portable wired/wireless system (comprising the Weetsy™ frame and Weetsy™ board), the π-Box™ remote smart sensor, and a PC-based processing unit running the SuriDev eye/head tracking and gaze estimation software, which delivers its results in real time to a client’s application through SuriSDK (Software Development Kit). Because of its wearable form factor, the eye-tracking system must conform to certain constraints, the most important being low power consumption, low heat generation, low electromagnetic radiation, and low MIPS (Million Instructions per Second); it must also support wireless eye data transmission and be space efficient in general. Eye image acquisition, detection of the eye pupil ROI (Region Of Interest), compression of the ROI, and its wireless transmission in compressed form over a medium are the first steps of the eye tracking algorithm, whose aim is to find the coordinates of the human eye pupil. It is therefore necessary to reach the highest possible performance at each step of the chain. In contrast with state-of-the-art general-purpose image compression systems, it is possible to construct entirely new image processing and compression methods, approaches and algorithms specific to eye tracking; their design and implementation are the goal of this thesis.
... Therefore, it is of interest to find an approach for achieving an optimal ordering in a smaller number of steps. Gruvaeus and Wainer (1972) present an algorithm applying a series of tests for locally orienting the nodes so that objects displayed on the left and right edges of each cluster are adjacent to those objects outside the cluster to which they are most similar. Their procedure of ascendant hierarchy aims to obtain an ordering of the terminal nodes in a chain with minimal length. ...
Article
A new version of the single–link hierarchical clustering algorithm is presented. It produces a dendrogram which gives a better graphical presentation of the proximity between the observed objects than the standard algorithms. A definition of a perfect chain is proposed. Chains of this kind are useful for describing interesting properties of the algorithm. Some sufficient conditions for the shortest trajectory connecting all objects are included. Another useful property of the main idea is that the dendrogram produced by any clustering algorithm may be rearranged to give a better interpretation.
... Hierarchical Cluster Analysis (HCA) groups data into clusters having similar attributes (Gruvaeus & Wainer, 1972). HCA tries to identify concentrated groups (clusters) of objects, while no information about membership is available, and usually not even the number of clusters is known (Matera et al., 2014). ...
Article
Full-text available
Elemental determination was carried out on 36 grape juice samples (19 organic and 17 ordinary), with the goal of identifying significant differences between the two types of juice for classification purposes. Inductively coupled plasma-mass spectrometry was used for the determination of 24 elements, Al, As, Ba, Ca, Cd, Ce, Co, Cr, Cu, Fe, La, Mg, Mn, Mo, Na, Ni, P, Pb, Rb, Se, Sn, Ti, V, and Zn. Ba, Ce, La, Mg, P, Pb, Rb, Sn, and Ti concentrations were found to be higher in organic versus ordinary samples, while Na and V concentrations were higher in ordinary versus organic samples. The remaining investigated elements exhibited statistically equivalent concentration levels in both types of samples. Principal component analysis (PCA) and soft independent modeling of class analogy (SIMCA) statistical techniques of the elemental fingerprints were readily able to discriminate organic from ordinary samples and can be used as alternative methods for adulteration evaluation.
... To see more accurately the influence of the ground-surface on the traffic mobile, we identify accurate groups of cells correlated regarding the ground-surface by the use of clustering methods. The Hierarchical Ascending Classification (HAC) [5] and the k-means [6] methods were used for this purpose. The projections of cells on principal factors were considered as new variables, and then HAC was launched to see the hierarchical tree (Figure 6). ...
Conference Paper
Full-text available
Cellular phones can be used as indicators of the customer's location using the detection of presence of the mobile phones during a day time in urban environment. An urban environment is composed of intrinsically different types of ground. In this paper, we study the impact of ground-surface types on the traffic mobile using statistical methods. Results show that the traffic is greatly influenced by the ground-surfaces covered by the cells.
... Since the order of leaf nodes in a dendrogram is not unique (each subtree can be rotated) and to further improve the presentation, the leaf nodes can be reordered using heuristics (e.g., Gruvaeus and Wainer, 1972). Only more recently Bar-Joseph et al. (2001) developed an O(n^4) algorithm that finds the optimal order of leaf nodes which minimizes the sum of distances between the nodes in the order. ...
Article
Full-text available
For hierarchical clustering, dendrograms are a convenient and powerful visualization technique. Although many visualization methods have been suggested for partitional clustering, their usefulness deteriorates quickly with increasing dimensionality of the data and/or they fail to represent structure between and within clusters simultaneously. In this paper we extend (dissimilarity) matrix shading with several reordering steps based on seriation techniques. Both ideas, matrix shading and reordering, have been well-known for a long time. However, only recent algorithmic improvements allow us to solve or approximately solve the seriation problem efficiently for larger problems. Furthermore, seriation techniques are used in a novel stepwise process (within each cluster and between clusters) which leads to a visualization technique that is able to present the structure between clusters and the micro-structure within clusters in one concise plot. This not only allows us to judge cluster quality but also makes mis-specification of the number of clusters apparent. We give a detailed discussion of the construction of dissimilarity plots and demonstrate their usefulness with several examples. Experiments show that dissimilarity plots scale very well with increasing data dimensionality.
... The color display shows the original data matrix in which rows and columns are permuted according to an algorithm in Gruvaeus and Wainer (1972). Different colors represent the magnitude/strength (0-12) of the linkage between cases (species) and variables (a-biotic factors) in the matrix (Ling 1973). ...
... Hierarchical cluster analysis with complete linkage (furthest neighbour) was applied following the standard hierarchical amalgamation method of Hartigan (1975). According to the default settings of Systat 9.0, the algorithm of Gruvaeus and Wainer (1972) was used to order the trees. The three mesowear variables, % high, % sharp and % blunt were analysed by cluster statistics. ...
Article
The so-called Hundsheim rhinoceros, Stephanorhinus hundsheimensis, was a very common faunal element of the Early to early Middle Pleistocene period in the western Palaearctic. In this study, individuals from two different central European populations of the Hundsheim rhinoceros were analysed in order to determine whether their local dietary signals could reflect differing food availability between the two populations, and whether such information could provide a better understanding of the ecological role of S. hundsheimensis within corresponding faunal assemblages, and of its principal subsistence strategy in the western Palaearctic. The mesowear traits observed in the studied S. hundsheimensis populations have been interpreted as representing biome-specific signals, indicating grassland vegetation at the site of Süßenborn, and dense to open forests at Voigtstedt (both localities in Germany). The analyses performed on the fossil rhino material demonstrate the most pronounced dietary variability ever established for a single herbivorous ungulate species by mesowear studies. This variability ranges from an attrition dominated grazing regime to one of predominantly browsing, and characterises S. hundsheimensis as the most ecologically tolerant rhinoceros of the Palaearctic Plio-Pleistocene. Although such dietary flexibility proved an effective enough subsistence strategy over a period of 600–900 ka (1.4/1.2–0.6/0.5 Myr) in the western Palaearctic, the situation changed dramatically after 0.6 Myr BP, when the new species of rhinoceroses, Stephanorhinus hemitoechus and Stephanorhinus kirchbergensis, appeared and started to compete for both the grass and the browse. For the generalist S. hundsheimensis, this bilateral interference was detrimental to its success in all of its habitats. The successful competition of specialised forms of rhinoceroses, which might have originated as a result of the development of 100 ka periodicity in the global climatic record, is proposed as the main reason for the extinction of S. hundsheimensis during the early Middle Pleistocene.
... The standard hierarchical amalgamation method of Hartigan (1975) is used in hierarchical cluster statistics. The algorithm of Gruvaeus and Wainer (1972) was then applied to order the cluster tree using three mesowear variables. We used the typical dataset of 27 herbivore species originally selected by Fortelius and Solounias (2000), since their clustering pattern was relatively "free of abnormalities" and the discriminate analysis indicated 96% correctly classified cases (jack-knifed matrix) for each dietary class (browser, grazer, and intermediate feeder). ...
Article
Food preferences of the sand gazelle (Gazella marica) from the Mahazat as-Sayd Protected Area in Saudi Arabia were evaluated using focal animal sampling in conjunction with an eco-morphological method examining two parameters of tooth wear, i.e., occlusal relief and cusp shape. Observations of live, free-ranging animals (n = 53) showed that sand gazelles generally consumed more grass (58.4%) than browse (41.6%). However, during the dry season, gazelles spent significantly more time browsing (51.0%) and less time grazing (49.0%) than under wet conditions (browsing: 17.6%; grazing: 82.4%). Thus, consistent with predictions, sand gazelles are intermediate feeders but shift towards browsing when grass is scarce. The mesowear signature of the sand gazelle is consistent with a grazing signal in other ruminants. In other words, the browse component of the diets of live animals was not reflected in the tooth wear. This could have occurred because browse is less abrasive than grass, but more likely because all food types are heavily abrasive in this dusty habitat. We conclude that the sand gazelle population in Mahazat as-Sayd encounters a highly abrasive diet, which has implications for their ability to meet nutritional demands.
... X-axes shows rank order of species, y-axes shows proportional abundance of species. The slope is related to evenness (steep slope-low evenness), x-intercept is potential richness. The color display shows the original data matrix in which rows and columns are permuted according to an algorithm in Gruvaeus and Wainer (1972). Different colors represent the magnitude/strength (0-12) of the linkage between cases (species) and variables (a-biotic factors) in the matrix (Ling, 1973). ...
Article
Full-text available
Hard substratum surface area of framework cavities constitutes a major habitat in coral reefs. We studied the community composition and distribution of cryptic sessile macro-organisms in framework cavities in relation to abiotic parameters on a reef slope in Curaçao. Spatial characteristics were measured with a CaveCam (video) cave-explorer to investigate the macro-faunal community composition. Light intensity and water movement were measured. Bacterial densities were counted in and outside the cavities over a year. Cover of the fauna and flora in cavities was about 95% of total hard surface area. Cavities harbored a distinctive macro-fauna. Species composition was very diverse, with a total of 88 species/taxa found. Diversity (H') was high and evenness (V') low, indicating the presence of dominant species. Community composition was related to abiotic parameters. Light intensity decreased with a factor of 10 from front to back of cavities, with a consequent decrease in crustose coralline algae in the same direction, but there was no other relation between light and distribution of organisms. Water motion and turbidity, generally less in cavities than on the open reef, were significantly related to biotic distribution. Inside cavities we found sponge and total suspension-feeder cover to decrease with increasing water movement and turbidity. There was an average depletion of bacteria of 40% in cavity water. In a functional sense reef framework cavities are a uniform trophodynamic environment characterized by high bacterioplankton removal rates and efflux of DIN and it is surprising to find each cavity having a different species composition and abundance.
... The standard hierarchical amalgamation method of Hartigan (1975) is used in hierarchical cluster statistics. The algorithm of Gruvaeus and Wainer (1972) was then applied to order the cluster tree using three mesowear variables. We used the comparative dataset of herbivore species following Fortelius and Solounias (2000), Kaiser and Solounias (2003) as well as the mesowear data obtained from Gazella, Eudorcas and Nanger by Louys et al. (2011). ...
Article
Food preferences of Arabian gazelle (Gazella arabica) on Farasan Islands, Saudi Arabia, were evaluated using focal animal sampling, ‘food tracking’ and an eco-morphological method examining tooth wear. Behavioural observations showed that gazelles generally consumed foliage, fruits and flowers of trees and shrubs, and to much lower extent annual and perennial herbs. Grass represents only 4.4% of the total time spent feeding at ground level. During dry season, gazelles spent significantly more time browsing on trees and shrubs and less time browsing on herbs than under wet conditions. Thus, consistent with predictions, gazelles are selective browsers. Major food plants are Acacia ehrenbergiana, Corchorus depressus and Capparis sinaica. During dry season the time spent feeding on fallen pods of Acacia and the time spent browsing on hind legs significantly increased. No sexual and age-related differences in the dietary choice of gazelles were revealed. The tooth wear signature of Farasan gazelles is dominated by abrasive food components classifying them with seasonally variable browsers and frugivorous dwarf ungulates. Based on the feeding behaviour we linked abrasion to the high mineral component of pods. Since the Farasan Islands are subject to high dust load, we also attribute the observed tooth wear signal to the high grit load.
... In hierarchical displays, a decision is needed at each merge to specify which subtree should go to the left and which to the right (Hurley, 2012). We used the order suggested in Gruvaeus and Wainer (1972), which ensures that objects at the boundaries of each class were located next to objects outside the class which they most resembled (Gordon, 1987). At a merge of clusters A and B, the new cluster is one of (A, B), (A', B), (A, B'), (A', B'), where A' denotes A in reverse order. ...
Article
We propose Bayesian model averaging (BMA) as a method for postprocessing the results of model-based clustering. Given a number of competing models, appropriate model summaries are averaged, using the posterior model probabilities, instead of being taken from a single "best" model. We demonstrate the use of BMA in model-based clustering for a number of datasets. We show that BMA provides a useful summary of the clustering of observations while taking model uncertainty into account. Further, we show that BMA in conjunction with model-based clustering gives a competitive method for density estimation in a multivariate setting. Applying BMA in the model-based context is fast and can give enhanced modeling performance.
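The merge rule quoted in the citing snippet above (choose one of the four orientations of A and B at each merge) can be sketched in a few lines. This is a simplified illustration and not the published Gruvaeus and Wainer procedure or any package's implementation: it assumes the tree is given as nested pairs of integer leaf labels, that `dist` is a symmetric distance matrix, and that only the two items meeting at the junction of the two blocks are compared. The names `order_leaves`, `tree` and `dist` are hypothetical.

```python
def order_leaves(tree, dist):
    """Order the leaves of a binary merge tree bottom-up.

    `tree` is either a leaf label (an int indexing `dist`) or a pair
    (left_subtree, right_subtree). At each merge the four orientations
    (A, B), (A, B'), (A', B), (A', B') are tried, where X' is X reversed,
    and the one whose two junction items are closest is kept.
    """
    if not isinstance(tree, tuple):
        return [tree]
    left = order_leaves(tree[0], dist)
    right = order_leaves(tree[1], dist)
    options = [
        left + right,
        left + right[::-1],
        left[::-1] + right,
        left[::-1] + right[::-1],
    ]
    k = len(left)  # the junction lies between positions k - 1 and k
    # Ties are resolved in favour of the first option listed above.
    return min(options, key=lambda o: dist[o[k - 1]][o[k]])


# Illustrative (made-up) distances between four items.
example_dist = [
    [0, 1, 5, 2],
    [1, 0, 6, 7],
    [5, 6, 0, 1],
    [2, 7, 1, 0],
]
example_order = order_leaves(((0, 1), (2, 3)), example_dist)
```

With these made-up distances, items 0 and 3 are the closest pair across the two clusters, so both blocks are reversed to bring them together at the junction.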
... Hierarchical Cluster Analysis (HCA) groups data into clusters having similar attributes (Gruvaeus & Wainer, 1972). HCA tries to identify concentrated groups (clusters) of objects, while no information about membership is available, and usually not even the number of clusters is known (Matera et al., 2014). ...
... Two major issues must be solved when using HC analysis. The first is the similarity measure that can be used as a scalar distance between different clusters, and the second is the linkage method that orders the clusters to produce a unique and meaningful solution (Johnson, 1967; Gruvaeus and Wainer, 1972; Langfelder et al., 2008). In this study, we used the average linkage as the linkage method between groups, which is defined on the basis that the similarity between two clusters is equal to the mean distance between elements of each cluster (http://home.deib.polimi.it/matteucc/Clustering/ ...
Article
The grain-size distribution (GSD) of sediments provides information on sediment provenance, transport processes, and the sedimentary environment. Although a wide range of statistical parameters have been applied to summarize GSDs, most are directed at only parts of the distribution, which limits the amount of environmental information that can be retrieved. Endmember modeling provides a flexible method for unmixing GSDs; however, the calculation of the exact number of endmembers and geologically meaningful endmember spectra remain unresolved using existing modeling methods. Here we present the methodology hierarchical clustering endmember modeling analysis (CEMMA) for unmixing the GSDs of sediments. Within the CEMMA framework, the number of endmembers can be inferred from agglomeration coefficients, and the grain-size spectra of endmembers are defined on the basis of the average distance between the samples in the clusters. After objectively defining grain-size endmembers, we use a least squares algorithm to calculate the fractions of each GSD endmember that contributes to individual samples. To test the CEMMA method, we use a grain-size data set from a sediment core from Wulungu Lake in the Junggar Basin in China, and find that application of the CEMMA methodology yields geologically and mathematically meaningful results. We conclude that CEMMA is a rapid and flexible approach for analyzing the GSDs of sediments.
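The average-linkage definition quoted in the citing snippet above (the distance between two clusters is the mean distance between their elements) reduces to a short function. The sketch below is illustrative only: `average_linkage`, `dist` and the sample matrix are assumptions made for this example and are not taken from the CEMMA paper.

```python
def average_linkage(cluster_a, cluster_b, dist):
    """Mean of all pairwise distances between items of two clusters.

    `cluster_a` and `cluster_b` are lists of integer item labels and
    `dist` is a symmetric matrix indexed by those labels.
    """
    total = sum(dist[i][j] for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


# Illustrative (made-up) distances between four items.
sample_dist = [
    [0, 1, 5, 2],
    [1, 0, 6, 7],
    [5, 6, 0, 1],
    [2, 7, 1, 0],
]
between = average_linkage([0, 1], [2, 3], sample_dist)  # (5 + 2 + 6 + 7) / 4
```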
... Sakai et al. (2014) obtain a unique ordering by sorting the dendrogram obtained from a classificatory clustering method as a separate step following the hierarchical clustering. They discuss their method, a 'leaf' ordering following hierarchical clustering, along with other methods involving such a two-step process (Bar-Joseph et al. 2001; Gruvaeus and Wainer 1972). These latter methods are available through the seriation R Package (Buchta et al. 2008). ...
Chapter
OBJECTIVES: Visualization in state sequence data has been developed extensively through an R package described in Gabadinho et al. (2011). Graphics depicting states prevalent at each cross section in time can be generated for all data as well as for covariate level sets for datasets with a large number of subjects with state transitions over time. Special longitudinal sequence sets can also be carved out using similarity measures across sequences. In our work, we believed there may be latent informative images inherent in data on changes in cancer states (degrees of response, progressions, and deaths). We obtain a longitudinal as well as a cross-sectional informative image through a novel heuristic grounded in the framework of hierarchical clustering. METHODS: We used iconic known images, stripped them of all ordering information, and attempted to recover the known latent image underlying the randomly permuted data using our heuristic as well as other alternative methods such as those in Sakai et al. (2014). RESULTS: Results validate our methods. The method is demonstrated through a visual representation of changes in cancer states for two induction therapies in a cancer trial. A further application to a two-way ordering of gene sample heat maps is also presented. CONCLUSIONS: When cancer state transition graphics for competing therapies are juxtaposed, there can be a quick read of early versus late response to therapy, the depth and duration of response as well as a rough gauge of events such as progression and death. This is a good complement to quantitative inferences. For gene expression data, we hope that our methods will bring out finer distinctions in addition to presenting gross patterns in the data like those seen using prevalent methods.
... At each merge point, this function orders the sub-trees so that the left one has merged at a lower level. (ii) OM2: GW-method, the ordering method proposed by Gruvaeus and Wainer (Gruvaeus and Wainer, 1972). At each merge point for two clusters, this method orders the sub-trees so that the distance between the closest items is minimized. ...
Article
Association rule mining is one of the most popular data mining techniques; however, users often have difficulty interpreting and exploiting the association rules extracted from large, high-dimensional transaction data. The reasons for this difficulty are twofold. First, conventional association rule mining algorithms can produce too many rules, and second, some rules partly overlap. This problem can be addressed if the user selects the relevant items to be used in association rule mining; however, the relations among items in large transaction data are often quite complex. In this context, this paper proposes a novel visual exploration tool, the structured association map (SAM), which enables users to find groups of relevant items visually. SAM looks similar to the well-known cluster heat map; however, the items in SAM are sorted in a more intelligent way so that users can easily find the interesting areas formed by sets of associated items, which are likely to constitute interesting many-to-many association rules. Moreover, this paper introduces an index called S2C, designed to evaluate the quality of a SAM, and explains the SAM-based association analysis procedure in a comprehensive manner. For illustration, this procedure is applied to a mass health examination data set, and the experimental results demonstrate that a SAM with a high S2C value significantly reduces the complexity of association analysis and makes it possible to focus on a specific region of the search space of association rule mining while avoiding irrelevant association rules.
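The first ordering rule mentioned in the citing snippet above (at each merge, place on the left the sub-tree that merged at the lower level) can be illustrated as follows. This is a hedged sketch and not the cited paper's code: it assumes internal nodes are given as `(height, left, right)` tuples with integer leaves, treats a leaf as having height 0, and uses the hypothetical names `merge_height` and `tight_left_order`.

```python
def merge_height(node):
    """Merge level of a node; leaves are treated as height 0 (an assumption)."""
    return node[0] if isinstance(node, tuple) else 0.0


def tight_left_order(node):
    """Return the leaf order with the lower-merging sub-tree placed on the left.

    `node` is either an integer leaf or a tuple (height, left, right).
    Sub-trees with equal merge heights keep their given order.
    """
    if not isinstance(node, tuple):
        return [node]
    _, left, right = node
    if merge_height(right) < merge_height(left):
        left, right = right, left
    return tight_left_order(left) + tight_left_order(right)


# Items 0 and 1 merge at height 2.0; item 2 joins them at height 3.0.
small_tree = (3.0, (2.0, 0, 1), 2)
small_order = tight_left_order(small_tree)
```

In this small example the singleton leaf 2 counts as having merged at level 0, so it is moved to the left of the pair (0, 1).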
... (b) Clustering Algorithms: Clustering algorithms, in the context of matrix reordering, are based on deriving clusters of "similar" data elements (e.g., nodes) and ordering each cluster individually. Building on this, Gruvaeus and Wainer [GW72] suggested ordering clusters at different levels using a hierarchical clustering (dendrogram). Elements at the margin of each cluster, i.e., the first and last element in the obtained order for the respective clusters, should also be similar to the first (or last) element in the adjacent cluster. ...
Article
This survey provides a description of algorithms to reorder visual matrices of tabular data and adjacency matrix of networks. The goal of this survey is to provide a comprehensive list of reordering algorithms published in different fields such as statistics, bioinformatics, or graph theory. While several of these algorithms are described in publications and others are available in software libraries and programs, there is little awareness of what is done across all fields. Our survey aims at describing these reordering algorithms in a unified manner to enable a wide audience to understand their differences and subtleties. We organize this corpus in a consistent manner, independently of the application or research field. We also provide practical guidance on how to select appropriate algorithms depending on the structure and size of the matrix to reorder, and point to implementations when available.
... As we can see, SYSTAT generates a plot showing the original data matrix, with the rows (cases) and columns (variables) permuted according to the algorithm proposed by Gruvaeus and Wainer (1972). The different features are expressed in association with the magnitude of the values in the matrix (Ling, 1973). ...
Article
The dietary regime of Equus capensis from the Middle Pleistocene of South Africa is investigated by mesowear analysis. Results indicate that the mesowear signature of this species resembles that of two extant mixed feeders, the Grant's Gazelle (Gazella granti) and the Thomson's Gazelle (Gazella thomsoni), suggesting a mixed feeding dietary strategy for E. capensis. The mesowear signature of a contemporaneous population of Equus mosbachensis from Europe (Arago, France) is also determined for comparative purposes and has a typical grazing signature. In general, all extant species of Equus are believed to be almost exclusively grazers. However, a considerable degree of dietary flexibility is recently reported. The dietary signal of E. capensis is considered to be the result of feeding on the unique fynbos vegetation, which was beginning to establish itself at this time in southwestern South Africa. Grasses are a minor component of this floral kingdom. Our findings thus provide further evidence for the unexpected flexibility in feeding strategies of Equus, the most widely distributed equid taxon in the Quaternary. They highlight the potential use of the attrition–abrasion wear equilibrium as a habitat indicator, by mirroring the availability of food items in mammalian herbivore ecosystems.
Article
Three bovid species are present at Dorn-Dürkheim 1. The overwhelmingly abundant species is a boselaphine, Miotragocerus sp., with smaller and less advanced teeth than Tragoportax amalthea from Pikermi, Greece. Miotragocerus was present in the latest middle Miocene and Vallesian of western and central Europe and survived into the Turolian. The other two bovid species remain enigmatic and of uncertain tribal affiliation. Each is represented by very few teeth, none of them associated. One species is larger and the other smaller than the Miotragocerus sp. Mesowear analysis of M2s and M3s was used to investigate the dietary regime of Miotragocerus sp. from Dorn-Dürkheim 1. Miotragocerus was found to be linked to extant browsing ungulates close to the transition to mixed feeders. The percentage of abrasive food components like grass in the diet of this species was probably close to 10%. This ratio suggests Miotragocerus is intermediate between the two hipparionine horse species of this early Turolian (MN11) palaeoenvironment.
Chapter
The Pliocene ungulate fauna from the hominid-bearing Laetolil succession (Southern Serengeti, Tanzania) is investigated with regard to dietary adaptation, niche segregation and change over time. The fossiliferous Upper Laetolil Beds (ULB) (3.63–3.85 Ma) are unconformably overlain by the Upper Ndolanya Beds (UNB) (2.66 Ma). Both stratigraphic units contain a rich mammalian fauna, with ungulates predominating. Analysis of dental mesowear is applied to 23 ungulate taxa from both units, including Equidae, Bovidae and Giraffidae, and the results are compared to extant species. The equids at Laetoli represent the only specialized grazers throughout the succession. All Upper Laetolil Alcelaphini and Hippotragini have mesowear signatures that indicate intermediate feeding strategies, different from their modern counterparts that are mostly specialized grazers. This indicates a dietary shift in these lineages, a finding that is also supported by isotope studies. Mesowear data of ungulates from the ULB also suggest that extant ungulates representing closely related lineages in the same genus or even tribe may not serve as actualistic model taxa in faunal reconstructions using taxonomic uniformitarianism. The three species of giraffids and the remaining bovid taxa were either browsers or intermediate feeders, but not grazers. The almost complete absence of grazing guilds, and the heavy reliance on browse by most fossil herbivores, do not support the inference that the Laetoli environment was dominated by grassland. Within the Laetoli succession it appears that fundamental feeding niches converged over time, with grazers increasingly engaged in feeding on less abrasive components and intermediate-feeders closing the dietary gap by exploiting more abrasive feeding niches. Niche partitioning in the Laetoli ungulates appears to reflect environmental change and evolutionary trajectories in the major lineages. 
This distribution of feeding niches may serve as an overall indicator of niche diversity. Within the succession it appears that the diversity of feeding niches generally decreased. A decrease in feeding niches would suggest that the diverse habitat structure, which was typical of the ULB environment, no longer existed after the faunal and environmental transition that occurred after deposition of the ULB. After a hiatus of 1.0 million years, the UNB environment was more or less free of forest and woodland patches and can be characterized as more or less open grassland. Keywords: Ruminantia; Bovidae; Giraffidae; Equidae; Paleodiet; Mesowear; Tooth wear; Pliocene; Paleoecology
Article
A method is presented for the graphic display of proximity matrices as a complement to the common data analysis techniques of hierarchical clustering. The procedure involves the use of computer generated shaded matrices based on unclassed choropleth mapping in conjunction with a strategy for matrix reorganization. The latter incorporates a combination of techniques for seriation and the ordering of binary trees.
Article
In this study we present the first palaeodietary investigation of three-toed horses from southern Africa and a systematic revision. The dietary regime of 'Eurygnathohippus' cf. baardi from Langebaanweg (South Africa) was evaluated using the mesowear method. This hipparion was originally identified as Hipparion cf. baardi. However, recent evidence discussed here suggests that it belongs to the Eurygnathohippus clade. Cluster analysis comparing this equid to other fossil hipparionines from central Europe and North America indicates that 'E.' cf. baardi was a dedicated grazer at Langebaanweg. Subtle differences in the feeding preference between populations of 'E.' cf. baardi from the two river channel deposits [Pelletal Phosphate Member (PPM), Beds 3aS and 3aN] and from the Quartzose Sand Member (QSM) were also found. 'Eurygnathohippus' cf. baardi from the two PPM deposits have similar grazing dietary signals, which are most similar to those of extant Connochaetes taurinus (wildebeest) and Alcelaphus buselaphus (hartebeest). 'Eurygnathohippus' cf. baardi from the QSM, a floodplain and salt marsh deposit underlying the river channel, clusters separately and is more similar to the grazing bovid, Damaliscus lunatus (topi). This study shows that 'Eurygnathohippus' cf. baardi was an eclectic feeder with a strong grazing signal. The presence of high-crowned dentitions in the 'Sivalhippus' Complex, to which 'Eurygnathohippus' cf. baardi belongs, can be considered an exaptation of the group. Our results also provide some evidence for either differential habitat or habitat change during the late Miocene/early Pliocene.
Article
The phylogenetic relationships of the three Abies species grown in the Balkan Peninsula were investigated through their volatile secondary metabolites. The leaf oil of a statistically representative sample of sympatric Abies alba, A. cephalonica and A. borisii-regis trees was obtained and analyzed by means of GC and GC/MS. Forty components were identified and quantified on the basis of their retention indices, their mass spectra characteristics and relative peak areas. The majority of the investigated oils were found to consist of monoterpenes. The major constituents were the same in all studied species but their content varied significantly, thus allowing the assignment of characteristic chemical profiles based on their contribution rank. The variation of the oil constituents and their possible taxonomic significance are discussed.
Chapter
This paper addresses the problem of automatic seriation of mouse brain cross-sections stained with green fluorescent protein (GFP). This is fundamental to helping the neuroscience community process and analyze the huge amount of experimental data. It is also a challenging problem since, during the manual procedure of cutting the brains and acquiring hundreds of images, the human operator can unwittingly change their natural sequence, lose data, induce large morphological distortions, or introduce artifacts. Most image seriation methods are two-step: first, a distance matrix is obtained from image processing, and second, the optimal seriation is determined for this matrix. However, these methods are very sensitive to noise, distortion, and missing data, since the optimal solution for the matrix does not match the true seriation. Instead, we propose a graph-based method where the images are iteratively revisited and the image similarity information is refined, until a linear graph representing the seriation is obtained. This similarity information is based on Histogram of Oriented Gradients (HOG) features, computed from random locations in the images in each iteration/revisitation. Experimental results based on both synthetic and real data are used to validate and illustrate the application of the method. It is shown that the proposed method outperforms the other state-of-the-art methods used for comparison purposes in this specific type of data.
Conference Paper
Exploratory data analysis (EDA: Tukey, 1977) has been in extensive use for more than 30 years, yet the boxplot and the scatterplot are still the major EDA tools for visualizing continuous data in the 21st century. On the other hand, multiple correspondence analysis (MCA) type methods and mosaic plots are the most popular in practice for visualizing multivariate binary and nominal data. But all these methods lose their efficiency when data dimensionality gets really high (hundreds/thousands), particularly when the data are of a non-continuous nature. Matrix visualization (MV) instead can simultaneously explore the associations of up to thousands of variables, subjects, and their interactions, without reducing dimension. MV permutes rows and columns of the raw data matrix together with two corresponding proximity matrices by suitable seriation (reordering) algorithms. These permuted matrices are then displayed as matrix maps through suitable color spectra for extracting the subject-clusters, variable-groups, and the subjects/variables interaction patterns. For binary data, conventional visualization techniques (boxplot, scatterplot (matrix), mosaic display, parallel coordinate plot, etc.) cannot provide users with much visual information, while the binary generalized association plots (bGAP), by integrating matrix visualization with suitably chosen proximity for binary data, can effectively present complex patterns for thousands of binary variables for thousands of subjects in one matrix visualization.
Article
A review is presented of methods of summarizing the relationships within a set of objects by a set of hierarchically-nested classes of similar objects, representable by a rooted tree diagram. Material covered includes algorithms for obtaining tree diagrams, comments on the selection of appropriate methods of analysis and the validation of classifications, distributions of different types of tree, and consensus trees.
Article
Correlation and covariance matrices provide the basis for all classical multivariate techniques. Many statistical tools exist for analyzing their structure, but, surprisingly, there are few techniques for exploratory visual display, and for depicting the patterns of relations among variables in such matrices directly, particularly when the number of variables is moderately large. This paper describes a set of techniques we subsume under the name “corrgram”, based on two main schemes: (a) rendering the value of a correlation to depict its sign and magnitude. We consider some of the properties of several iconic representations, in relation to the kind of task to be performed. (b) re-ordering the variables in a correlation matrix so that “similar” variables are positioned adjacently, facilitating perception. In addition, the extension of this visualization to matrices for conditional independence and partial independence is described and illustrated, and we provide an easily-used SAS implementation of these methods.
Article
The dietary regimes of 15 ungulate species from the middle Pleistocene levels of the hominid-bearing locality of Elandsfontein, South Africa, are investigated using the mesowear technique. Previous studies, using taxonomic analogy, classified twelve of the studied species as grazers (Redunca arundinum, Hippotragus gigas, Hippotragus leucophaeus, Antidorcas recki, Homoiceras antiquus, Damaliscus aff. lunatus, Connochaetes gnou laticornutus, Rabaticerus arambourgi, Damaliscus niro, Damaliscus sp. nov., an unnamed “spiral horn” antelope and Equus capensis), one as a mixed feeder (Taurotragus oryx) and two as browsers (Tragelaphus strepsiceros and Raphicerus melanotis). Although results from mesowear analysis sustain previous dietary classifications in the majority of cases, five species were reclassified. Three species previously classified as grazers, were reclassified as mixed feeders (H. gigas, D. aff. lunatus and R. arambourgi), one previously classified as a grazer, was reclassified as a browser (the “spiral horn” antelope), and one previously classified as a mixed feeder, was reclassified as a browser (T. oryx). While current results broadly support previous reconstructions of the Elandsfontein middle Pleistocene environment as one which included a substantial C3 grassy component, the reclassifications suggest that trees, broad-leaved bush and fynbos were probably more prominent than what was previously thought.
Article
Little is known about whether culture influences social correlates of dietary behaviors. Questionnaires on parent- and child-reported family and peer influences on children’s fruit, juice and vegetable consumption were analyzed for ethnic group differences in responses. Grade 4-6 students completed the questionnaires in the classroom and their parents completed telephone or in-home interviews. Analyses of variance across ethnic categories and χ² analysis of differences in ethnic group composition between clusters of scales were conducted. Few ethnic group differences were detected, suggesting substantial commonality among respondents. Ethnic differences might be accommodated by interventions tailored to particular behaviors among ethnic groups.
Article
The floral community along South Africa's southwest coast today is dominated by shrubby strandveld, renosterveld, and coastal fynbos vegetation. The grass family (Poaceae), represented primarily by C3 taxa, is scarce by comparison. Nevertheless, grass has a long history along this coast, as indicated by the presence of ∼5-million-year-old C3 grass pollen and phytoliths in sediments at the fossil locality of Langebaanweg E Quarry. Because the pollen and phytoliths of other plant families, including fynbos, have also been found, it has been difficult to determine whether grass was scarce or abundant in this environment. In order to shed light on this issue, I analyzed the dental mesowear of the E Quarry bovids. Results indicate that only one (Simatherium demissum) of seven analyzed species was a grazer. These compare well with the results of a microwear texture analysis, which indicate that none of the seven analyzed species were obligate grazers. These two studies point strongly toward a heavily wooded environment and not one that was dominated by grass. Although a conventional dental microwear analysis did identify three out of seven E Quarry bovid species as grazers (Bed3aN Damalacra, Kobus subdolus, and S. demissum), only S. demissum probably actually was a grazer. I suggest that the grazer signal exhibited by the other two bovid samples indicates that these species were taking advantage of a spike in grass abundance, probably during the winter growth season.
Chapter
Beavers are increasingly viewed as “ecological engineers,” having broad effects on physical, chemical, and biological attributes of north-temperate landscapes. We examine the influence of both local successional processes associated with beaver activity and regional geomorphic boundaries on spatial variation in fish assemblages along the Kabetogama Peninsula in Voyageurs National Park, northern Minnesota, USA. Based on the results, we present a hierarchical conceptual model suggesting how geomorphic boundaries and beaver pond succession interact to influence fish assemblage attributes. The presence of a productive and diverse fish assemblage in headwater streams of north-temperate areas requires the entire spatial and temporal mosaic of successional habitats associated with beaver activity, including those due to the creation and abandonment of beaver ponds. The ultimate impact of the local successional mosaic on fishes, however, will be strongly influenced by the regional geomorphic context in which the mosaic occurs.
Article
Eurygnathohippus is a genus of hipparionine horse that evolved in and was confined to the African continent from the late Miocene to Pleistocene interval. Eurygnathohippus woldegabrieli is a new species from Aramis, Ethiopia, dated between 4.4 and 4.2 Ma. The hypodigm is currently restricted to 157 specimens from 14 Aramis localities and one nearby Gona locality. We nominate a mandible as the type and a maxillary dentition and two complete metacarpal IIIs as paratypes. Our analysis reveals that Eurygnathohippus woldegabrieli is derived compared with late Miocene Eurygnathohippus feibeli in its overall size, cheek tooth crown height, mandibular symphysis length, and robusticity of distal limb elements. It is primitive in mandibular symphysis length and robusticity of distal limb elements compared with the more advanced medial Pliocene species Eurygnathohippus hasumense. A study of Eurygnathohippus woldegabrieli’s paleodiet as measured by two mesowear methods, corroborated by carbon isotope studies, reveals that it was a dedicated grazer with a coarse C4 diet akin to that of modern zebras, wildebeests, and white rhinoceroses.
Article
The aim of the present study is to locate and decipher the groundwater quality, types, and hydrogeochemical reactions that are responsible for the elevated concentration of fluoride in the Chhindwara district in Madhya Pradesh, India. Groundwater samples, quality data and other ancillary information were collected for 26 villages in the Chhindwara District, M.P., India, during May 2006. The saturation index was computed for the selected samples in the region, which suggests that generally most of the minerals are saturated with respect to water. The concentration of fluoride in the region varies from 0.6 to 4.74 mg/l, which is much higher than the national and international water quality standards permit. The study also reveals that the fluoride-bearing rock formations are the main source of the higher concentration of fluoride in groundwater, along with the conjuncture of land use change. Moreover, the area is a hard rock terrain and consists of fractured granites and amygdaloidal and highly jointed compact basalt acting as a good aquifer, which is probably enriching the high content of fluoride in groundwater. A high concentration of fluoride is found at deeper levels of groundwater, possibly due to rock-water interaction, which requires further detailed investigation. The highly alkaline conditions indicate fluorite dissolution, which works as a major process for the higher concentration of fluoride in the study area. The results of this study will ultimately help in the identification of risk areas and in taking measures to mitigate negative impacts related to fluoride pollution and toxicity.
Article
Previous seriation algorithms are confronted with a balance problem. Some approaches provide permutations with perfect wholeness, where matrix rows/columns are associated with an increasing or decreasing gradient. However, this smooth permutation may lead to a blurred representation of the data structure, such as clustering structures and detailed structures inside clusters. Other approaches reveal these structures well by aggregating similar rows/columns more tightly, but this aggregation always comes at the cost of losing necessary coherence of the matrix rows/columns. In this paper, we introduce a seriation algorithm that aims at balancing the smoothness of the permutation and the clarity of the matrix structure. The permutation algorithm greedily and recursively replaces high-dissimilar object pairs with low-dissimilar ones, and the optimization algorithm searches for the globally optimal solution by applying the simulated annealing algorithm. A comparison study shows both empirical and statistical evidence that Recut can provide more accurate and visually appropriate permutations by considering the balance problem.
Article
Feature selection for software birthmarks has a direct bearing on the software recognition rate. We apply constrained clustering to analyze software features. The within- and between-class distances of features are measured based on mutual information. Information gain and penalty functions are constructed using homogeneous and heterogeneous software features, respectively. Then the software birthmark features with high class distinction and minimum redundancy are selected. Analysis and comparison show that the algorithm provides an effective approach to software birthmark feature selection and optimization.