Fig 3 - uploaded by James C. Bezdek
"Two cluster data" scatterplot and initial prototypes c = 2.

"Two cluster data" scatterplot and initial prototypes c = 2.

Source publication
Article
Fuzzy c-means (FCM) is a useful clustering technique. Modifications of FCM using L1 norm distances increase robustness to outliers. Object and relational data versions of FCM clustering are defined for the more general case where the Lp norm (p ≥ 1) or semi-norm (0 < p < 1) is used as the measure of dissimilarity. We give simple...
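The Lp dissimilarity described in the abstract can be sketched as follows; this is an illustrative NumPy implementation of the standard measure, not the authors' code:

```python
import numpy as np

def lp_dissimilarity(x, v, p):
    """Sum of coordinate-wise |x_i - v_i|^p.

    For p >= 1 this is the p-th power of the Lp norm; for 0 < p < 1
    it is only a semi-norm (the triangle inequality fails), but it
    remains a valid dissimilarity for clustering and down-weights
    outliers compared to the squared Euclidean distance.
    """
    return np.sum(np.abs(np.asarray(x, float) - np.asarray(v, float)) ** p)
```

For example, with x = (0, 0) and v = (3, 4), p = 2 gives 25 (squared Euclidean), while p = 1 gives 7 (Manhattan), illustrating how smaller p compresses large coordinate deviations.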

Context in source publication

Context 1
... computed prototypes and partition matrices of the two approaches are themselves very similar? The remaining numerical experiments use other artificial two-dimensional data sets that allow us to graphically depict the effect of p on the placement of the terminal prototype vectors. The two data sets (and initial prototype values) are depicted in Figs. 3 and 4 and are, respectively, called the "two cluster data" and "no cluster data." Using identical data values, initializations, and stopping criteria, we compute the terminal partitions and prototypes using the relational and object data approaches. We calculated the Frobenius norm difference in the terminal partitions and prototypes produced by the two ...

Similar publications

Article
This paper investigates the need for the development of a fuzzy-temporal relational database model (FTRDM) and examines some important and necessary issues and concepts for the development of such a model. This paper highlights the importance of building a temporal ontology for database applications in the relational environment. The main emphasis is...

Citations

... FCM is widely used for its efficiency and simplicity, yet it struggles with complex, high-dimensional, and non-Euclidean datasets. To mitigate these limitations, several variants have been introduced, incorporating improved objective functions and constraints, such as adaptive FCM [19], generalized FCM [20], fuzzy weighted c-means [21], and generalized FCM with improved fuzzy partitioning [22]. Kernel-based approaches like kernel FCM (KFCM) [5] and constrained models, including agglomerative fuzzy k-means (AFKM) [23], robust self-sparse fuzzy clustering (RSSFCA) [18], robust and sparse fuzzy k-means (RSFKM) [24], possibilistic FCM (PFCM) [25], and principal component analysis-embedded FCM (PSFCM) [26], as well as hyperbolic extensions such as hyperbolic smoothing-based fuzzy clustering (HSFC) [27] and the integration of hyperbolic tangent and Gaussian kernels for FCM (HGFCM) [28], have also been explored. ...
Preprint
Clustering algorithms play a pivotal role in unsupervised learning by identifying and grouping similar objects based on shared characteristics. While traditional clustering techniques, such as hard and fuzzy center-based clustering, have been widely used, they struggle with complex, high-dimensional, and non-Euclidean datasets. In particular, the Fuzzy C-Means (FCM) algorithm, despite its efficiency and popularity, exhibits notable limitations in non-Euclidean spaces. Euclidean spaces assume linear separability and uniform distance scaling, limiting their effectiveness in capturing complex, hierarchical, or non-Euclidean structures in fuzzy clustering. To overcome these challenges, we introduce Filtration-based Hyperbolic Fuzzy C-Means (HypeFCM), a novel clustering algorithm tailored for better representation of data relationships in non-Euclidean spaces. HypeFCM integrates the principles of fuzzy clustering with hyperbolic geometry and employs a weight-based filtering mechanism to improve performance. The algorithm initializes weights using a Dirichlet distribution and iteratively refines cluster centroids and membership assignments based on a hyperbolic metric in the Poincar\'e Disc model. Extensive experimental evaluations demonstrate that HypeFCM significantly outperforms conventional fuzzy clustering methods in non-Euclidean settings, underscoring its robustness and effectiveness.
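For reference, the hyperbolic metric in the Poincaré disc model that HypeFCM relies on has a standard closed form; the sketch below is a generic implementation of that metric, not the paper's code:

```python
import numpy as np

def poincare_distance(u, v):
    # Hyperbolic distance in the Poincare disc model (||u||, ||v|| < 1):
    # d(u, v) = arccosh(1 + 2*||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    u = np.asarray(u, float)
    v = np.asarray(v, float)
    num = 2.0 * np.sum((u - v) ** 2)
    den = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + num / den)
```

Distances blow up near the disc boundary, which is what lets the model embed hierarchical, tree-like structure that a flat Euclidean metric cannot capture.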
... Bobrowski and Bezdek extended the k-means clustering algorithm by exploring two similarity measures with nonhyperelliptical topologies: the square of the l1 norm and the square of the supremum norm l∞ [5]. Hathaway et al. examined clustering with general lp norm distances, with p varying from 0 to 1, and evaluated clustering performance with artificial datasets [18]. Aside from members of the Minkowski or p-norm family, Filippone et al. surveyed modified k-means clustering algorithms incorporating kernel-based similarity measures that produce nonlinear separating hypersurfaces between clusters [15]. ...
Article
Recognizing the pivotal role of choosing an appropriate distance metric in designing the clustering algorithm, our focus is on innovating the k-means method by redefining the distance metric in its distortion. In this study, we introduce a novel k-means clustering algorithm utilizing a distance metric derived from the ℓp quasi-norm with p ∈ (0, 1). Through an illustrative example, we showcase the advantageous properties of the proposed distance metric compared to commonly used alternatives for revealing natural groupings in data. Subsequently, we present a novel k-means type heuristic by integrating this sub-one quasi-norm-based distance, offer a step-by-step iterative relocation scheme, and prove the convergence to the Kuhn-Tucker point. Finally, we empirically validate the effectiveness of our clustering method through experiments on synthetic and real-life datasets, both in their original form and with additional noise introduced. We also investigate the performance of the proposed method as a subroutine in a deep learning clustering algorithm. Our results demonstrate the efficacy of the proposed k-means algorithm in capturing distinctive patterns exhibited by certain data types.
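A k-means-style assignment step under a sub-one ℓp quasi-norm distance can be sketched as below; the function names and the choice p = 0.5 are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def quasi_norm_distance(x, c, p=0.5):
    # l_p "distance" for 0 < p < 1: a quasi-norm, since the triangle
    # inequality holds only up to a constant factor. It emphasizes
    # similarities and avoids overweighting large deviations.
    assert 0 < p < 1
    return np.sum(np.abs(x - c) ** p) ** (1.0 / p)

def assign(points, centers, p=0.5):
    # Hard assignment: each point goes to its nearest center under
    # the quasi-norm distance, as in a k-means relocation scheme.
    return np.array([
        np.argmin([quasi_norm_distance(x, c, p) for c in centers])
        for x in points
    ])
```

Plugging this distance into the usual assign/update loop yields the k-means type heuristic the abstract describes; the center update is no longer the coordinate mean in general, which is why the paper needs a dedicated relocation scheme and convergence argument.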
... Bobrowski and Bezdek extended the k-means clustering algorithm [4] by investigating the use of two similarity measures that have nonhyperelliptical topologies, the square of the l1 norm and the square of the sup norm l∞. Hathaway et al. [17] examined clustering with general lp norm distances, with p varying from 0 to 1, and evaluated clustering performance using artificial datasets. Aside from members of the Minkowski or p-norm family, Filippone et al. [14] presented a survey of modified k-means clustering algorithms incorporating kernel-based similarity measures that produce nonlinear separating hypersurfaces between clusters. ...
Preprint
In this paper, we investigate the impact of distance metrics on the performance of the k-means algorithm, which is one of the most popular clustering schemes. Our focus is on the sub-one ℓp quasi-norm-based distance metric, which has recently gained attention due to its ability to better leverage similarities between data-items while avoiding overemphasizing the dissimilarities. Through an illustrative example, we demonstrate the superiority of the proposed distance metric over commonly used metrics in revealing natural groupings in data. To validate the effectiveness of the proposed sub-one ℓp distance based k-means method, we conduct experiments on synthetic datasets as well as real-life datasets from the UCI machine learning repository both in their original form and with additional noise introduced. Overall, our results demonstrate the effectiveness of the ℓp quasi-norm-based distance metric in improving the performance of the k-means algorithm in various situations, particularly in noisy data.
... To remedy the shortcoming of performance sensitivity to initialization, many alternative methods have been proposed, such as the FCM variants with improved objective function and initialization, and additional constraints. FCM-like algorithms with improved objective function include adaptive FCM algorithm [38], generalized FCM clustering [39], enhanced FCM [40], fast generalized FCM [41], fuzzy weighted c-means [42], [43], generalized FCM algorithm with improved fuzzy partition [44], fuzzy local information c-means [45], Bayesian fuzzy clustering (BFC) [46], and kernel fuzzy c-means clustering (KFCM) [47]. FCM with improved initialization includes multistage random sampling [48], the genetic algorithm [49], the Gustafson-Kessel algorithm [50], initialization schemes by utilizing color space in image segmentation [51], [52], the Markov random field [53], and the two-phase fuzzy c-means (2PFCM) [54]. ...
Article
The fuzzy c-means clustering algorithm is the most widely used soft clustering algorithm. In contrast to hard clustering, the cluster membership of data generated using the fuzzy c-means algorithm is ambiguous. Similar to hard clustering algorithms, the clustering results of the fuzzy c-means clustering algorithm are also sub-optimal, with performance varying depending on initial solutions. In this paper, a collaborative annealing fuzzy c-means algorithm is presented. To address the issue of ambiguity, the proposed algorithm leverages an annealing procedure to phase the fuzzy cluster membership degrees toward crisp ones by reducing the exponent gradually according to a cooling schedule. To address the issue of sub-optimality, the proposed algorithm employs multiple fuzzy c-means modules to generate alternative clusters based on memberships repeatedly re-initialized using a meta-heuristic rule. Experimental results on eight benchmark datasets are elaborated to demonstrate the superiority of the proposed algorithm over thirteen prevailing hard and soft algorithms in terms of internal and external cluster validity indices.
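The cooling-schedule idea, gradually driving the fuzzifier toward 1 so memberships harden toward a crisp partition, can be sketched as follows; the geometric rate and floor here are illustrative assumptions, not the paper's schedule:

```python
def fuzzifier_schedule(m0=2.0, rate=0.9, m_min=1.01, steps=20):
    """Geometric cooling of the FCM exponent toward 1.

    As m -> 1, the fuzzy membership update approaches a hard
    (winner-take-all) assignment, so annealing m phases out
    ambiguity over the course of the run.
    """
    m = m0
    out = []
    for _ in range(steps):
        out.append(m)
        # shrink the gap (m - 1) geometrically, but keep m > 1
        m = max(m_min, 1.0 + (m - 1.0) * rate)
    return out
```

Each value of the returned schedule would be used as the exponent for one (or a few) FCM iterations before cooling further.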
... An assumption of Q = 2 usually leads to appropriate results for most problems (e.g., Hathaway and Bezdek 2001). Calculating the derivatives of Eq. (15) with respect to the u_jk and the c_k and setting them equal to zero, the FCM algorithm is run iteratively by updating the u_jk as (e.g., Hathaway et al. 2000; Sun and Li 2015; Singh and Sharma 2018): and the centers c_k at each iteration (Hathaway et al. 2000): ...
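The update formulas alluded to in the excerpt are the classical FCM equations, u_jk = 1 / Σ_l (d_jk/d_jl)^{2/(Q-1)} and c_k = Σ_j u_jk^Q x_j / Σ_j u_jk^Q. A minimal NumPy sketch of one alternating-optimization step (the standard equations, not the cited authors' code):

```python
import numpy as np

def fcm_update(X, U, Q=2.0):
    """One FCM alternating-optimization iteration.

    X: (n, d) data, U: (n, k) memberships, Q: fuzzifier (> 1).
    Both formulas come from zeroing the derivatives of the FCM
    objective with respect to the centers and the memberships.
    """
    W = U ** Q
    # center update: c_k = sum_j u_jk^Q x_j / sum_j u_jk^Q
    C = (W.T @ X) / W.sum(axis=0)[:, None]
    # membership update: u_jk = 1 / sum_l (d_jk / d_jl)^(2/(Q-1))
    D = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    D = np.maximum(D, 1e-12)  # guard against a point sitting on a center
    ratios = (D[:, :, None] / D[:, None, :]) ** (2.0 / (Q - 1.0))
    U_new = 1.0 / ratios.sum(axis=2)
    return C, U_new
```

Each row of the updated membership matrix sums to 1 by construction, and iterating this step until the change in U falls below a tolerance gives the usual FCM stopping rule.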
Article
This work illustrates the application of a fuzzy-guided focused technique for cooperative 2D modeling of magnetic and gravity data on common geological sources responsible for anomalous observation, whereby a well-known fuzzy c-means clustering tool is inserted in the center of the inversion mechanism to search different clusters of geophysical properties which are magnetic susceptibility and density contrast within several zones. An unstructured meshing is performed with triangular cells which captures the accurate borders of a rugged topography area and any complex-shaped sought sources. The efficiency of the proposed cooperative inversion algorithm is examined along 2D profiles with simulation of several synthetic sources by which the retrieved geophysical properties indicate sharp edges, correct depth and shape pattern of the sought sources. Imposing a model stabilizer, a depth weighting function and petrophysics constraints of physical models within the clustering inversion, greatly promotes the performance of the constructed models over the outputs of running individually each data set. Of note is that a conjugate gradient solver is utilized here with a preconditioner to estimate approximately sought physical properties from an objective function with two constituents that are a model and a misfit norm. Ground-based gravity and magnetic observations over a plausible oil-trapping structure are investigated at the Kifl region, situated in Iraq. Cooperative inversion output can pave the way for imaging of a fault feature which has been filled by a thick sequence of sediments, presenting a configuration of a graben-horst structure. The main notable output of this work proves the existence of an oil-trapping structure responsible for a distinct potential field geophysics anomaly.
... The description of rock mass quality follows fuzzy criteria. Therefore, in this study, using fuzzy logic principles via FCM (Richard et al. 2000), a model was presented to predict Q and Qsrm, as well as D.K. The fuzzy model inputs, which are considered as independent variables, are the parameters and indicators related to the geophysical data (Vp, ER, Kp and DCE). ...
Article
Since rock mass classification based on the common Q classification system and its modified system for sedimentary rocks (Qsrm) requires direct access to the rock mass, indirect methods are usually used to predict Q and Qsrm. The most important methods are geophysical methods such as pressure wave velocity (Vp) and electrical resistivity (ER). Considering the characteristics of cavities and other important properties of limestones in Qsrm, this classification is a good measure of quality in these rocks. The degree of karstification (D.K), in addition to the Qsrm, is also an important parameter in these rocks. Therefore, in this study, in addition to primary statistical analysis, a multi-input-multi-output model for indirect estimation of Qsrm and D.K is presented using the fuzzy C-means clustering (FCM) method. In addition to Vp and ER, two other independent variables, called DCE and Kp, were used for indirect prediction of Qsrm and D.K. DCE is the difference in the error of calculating the experimental relationships in estimating Q and Qsrm using Vp and ER, and Kp represents the ratio of Vp in the field to its value in the laboratory (Kp = VPLab/VPField). The proposed model has the ability to delete any of the mentioned independent variables, except Vp, in the absence of sufficient data. The R2, RMSE and VAF evaluation indicators are highly consistent with the FCM model performance, and when all four independent variables are used in the input data, we will have the best efficiency of the model.
... 2) is the distance metric between unlabeled samples and the cluster centroids. Classical or standard distance metrics are norm-based, including the Euclidean distance [30], Manhattan distance [31], log-Euclidean distance [32] and so on. Further, the Wishart-based dissimilarity measure [23] has become a popular distance metric. ...
Article
In the complicated geographical environment, there will be a seriously deleterious effect to the performance of synthetic aperture radar (SAR)-ground moving target indication (SAR-GMTI) system, because it is difficult to obtain the homogeneous training samples to accurately estimate the clutter covariance matrix (CCM) without prior information of the observed scene. To this end, this paper proposes a SAR-GMTI approach aided by online knowledge with an airborne multichannel quadrature-polarimetric (quad-pol) radar system. Generally, this paper can be divided into two parts: online knowledge acquisition and polarization knowledge-aided (Pol-KA) SAR-GMTI processing. Firstly, based on the similarity of pixels from the multichannel and multi-polarization information, a weighed estimation method of polarimetric coherency matrix is proposed, which can overcome the over-smoothing problem and increase the estimation accuracy of coherency matrix. Further, a hybrid weighted local K-means based on geodesic distance (GD-HWLKM) clustering algorithm is proposed to achieve the aim of unsupervised classification. Here, geodesic distance (GD) is exploited to measure the distance between multi-feature region covariance matrixes (MFRCMs) and a hybrid weight from different scales (including local cover class distribution, region and pixel) is calculated to automatically update the cluster centroid, which can make full use of the local spatial information by taking the interclass samples' similarity and the diversity of different classes into consideration. Secondly, with the assistance of the previous polarization SAR (PolSAR) image classification result, a Pol-KA SAR-GMTI method is developed. For each ground cover category, an accurate CCM can be estimated with the independent and identically distributed (IID) training samples. Then, the multichannel clutter suppression and preliminary constant false alarm rate (CFAR) detection are performed. 
Finally, with an airborne multichannel quad-pol radar system, the experimental results on real measured data demonstrate that the proposed method can efficiently improve the clutter suppression performance and moving-target detection performance.
... The most popular fuzzy clustering method is Fuzzy C-Means (FCM) [69]. Many improvements have been made upon FCM, including methods that more easily identify centers [70], generalize the algorithm to arbitrary distance metrics [71], reduce time complexity [72], and more. Fuzzy clustering can also be combined with hierarchical clustering, as was done in Hierarchical Unsupervised Fuzzy Clustering (HUFC) [73]. ...
Article
Clustering algorithms are a class of unsupervised machine learning (ML) algorithms that feature ubiquitously in modern data science, and play a key role in many learning-based application pipelines. Recently, research in the ML community has pivoted to analyzing the fairness of learning models, including clustering algorithms. Furthermore, research on fair clustering varies widely depending on the choice of clustering algorithm, fairness definitions employed, and other assumptions made regarding models. Despite this, a comprehensive survey of the field does not exist. In this paper, we seek to bridge this gap by categorizing existing research on fair clustering, and discussing possible avenues for future work. Through this survey, we aim to provide researchers with an organized overview of the field, and motivate new and unexplored lines of research regarding fairness in clustering.
... The study proposed to replace dealing with image pixels with the gray level histogram. The latter issue, regarding the Euclidean distance, was solved by studies [51,52]. These studies proposed Lp norms (0 < p ≤ 1) as distance measures, rather than the Euclidean distance, to minimize the effect of outliers. ...
... Besides the original FCM method [19], a number of Fuzzy C-Means derivatives have been developed in the literature. These variants either aim at speeding up the algorithm (e.g., fast Fuzzy C-Means by random sampling and fast sweeping Fuzzy C-Means) or at improving clustering performance in the presence of artifacts or noise (e.g., Lp space clustering [20], probabilistic neural clustering [21], fuzzy noise clustering [22]). ...