Book

Pattern Recognition With Fuzzy Objective Function Algorithms


Chapters (6)

Section 1 (S1) describes specifically the problems to be discussed in succeeding chapters. In S2 a short analysis of the modeling process suggests that information and uncertainty will be key concepts in the development of new mathematical structures for pattern recognition. Fuzzy sets are introduced in the third section as a natural and tractable way to model physical situations which seem to require something more than is offered by deterministic or stochastic representations.
In this chapter the foundations for models of subsequent chapters are discussed. S4 contains some definitions and first properties of fuzzy sets. The important device of mathematical embedding first appears here. Interestingly enough, however, a property of the embedded structure (that a subset and its complement reproduce the original set under union) which is not preserved under embedment turns out to be one of its most useful aspects! In S5 partitions of finite data sets are given a matrix-theoretic characterization, culminating in a theorem which highlights the difference between conventional and fuzzy partitions. S6 is devoted to exploration of the algebraic and geometric nature of the partition embedment: the main results are that fuzzy partition space is compact, convex, and has dimension n (c − 1). S7 considers hard and fuzzy relations in finite data sets and records some connections between relations and partitions that tie the observational and relation-theoretic approaches together.
S8 illustrates some of the difficulties inherent in cluster analysis; its aim is to alert investigators to the fact that various algorithms can suggest radically different substructures in the same data set. The balance of Chapter 3 concerns objective-functional methods based on fuzzy c-partitions of finite data. The nucleus of all these methods is optimization of nonlinear objectives involving the weights u_ik; functionals using these weights will be differentiable over M_fc—but not over M_c—a decided advantage for the fuzzy embedding of hard c-partition space. Classical first- and second-order conditions yield iterative algorithms for finding the optimal fuzzy c-partitions defined by various clustering criteria.
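The alternating optimization that these first-order conditions yield can be sketched in code. The following is an illustrative Python sketch of a generic fuzzy c-means loop (Euclidean norm, default fuzzifier m = 2), not the book's algorithm (A11.1) verbatim; all function and variable names are mine:

```python
import numpy as np

def fcm(X, c, m=2.0, tol=1e-5, max_iter=100, seed=0):
    """Minimal fuzzy c-means: alternate membership and prototype updates."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial fuzzy c-partition: columns of U sum to 1 over clusters.
    U = rng.random((c, n))
    U /= U.sum(axis=0)
    for _ in range(max_iter):
        Um = U ** m
        # Prototype (cluster center) update: membership-weighted means.
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Squared Euclidean distances from each point to each prototype.
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2) + 1e-12
        # Membership update from the first-order necessary conditions:
        # u_ik proportional to d2_ik^(-1/(m-1)), normalized over clusters.
        w = d2 ** (-1.0 / (m - 1.0))
        U_new = w / w.sum(axis=0)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return U, V
```

Each pass holds one block of variables fixed while solving exactly for the other, which is why differentiability over M_fc matters: the necessary conditions give closed-form updates for both U and V.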
Section 14 reviews the validity problem. The main difficulty is this: complex algorithms stand squarely between the data for which substructure is hypothesized and the solutions they generate; hence it is all but impossible to transfer a theoretical null hypothesis about X to U ∈ M_fc which can be used to statistically substantiate or repudiate the validity of algorithmically suggested clusters. As a result, a number of scalar measures of partition fuzziness (which are interesting in their own right) have been used as heuristic validity indicants. Sections 15, 16, and 17 discuss three such measures: the Anderson iris data surfaces in S15 and S17. S18 contains several approaches aimed towards connecting a null hypothesis about X to U ∈ M_fc: this idea is currently being heavily studied, and S18 is transitory at best. Sections 19 and 20 discuss measures of hard cluster validity which have been related by their inventors to fuzzy algorithms in several ways. S19 contains a particularly interesting application to the design of interstellar navigational systems. S20 provides an additional insight into the geometric property that data which cluster well using the FCM algorithm (A11.1) must have.
In this chapter we discuss three modifications of previous algorithms which have been proposed in an attempt to compensate for the difficulties caused by variations in cluster shape. The basic dilemma is that “clusters” defined by criterion functions usually take mathematical substance via metrical distances in data space. Each metric d induces its own unseen but quite pervasive topological structure on ℝ^p due to the geometric shape of the open balls it defines. This often forces the criterion function employing d to unwittingly favor clusters in X having this basic shape—even when none are present! In S21, we discuss a novel approach due to Backer which “inverts” several previous strategies. S22 considers an interesting modification of the FCM functional J_m due to Gustafson and Kessel (54), which uses a different norm for each cluster! S23 and S24 discuss generalization of the fuzzy c-means algorithms (A11.1) in a different way—the prototypes v_i for J_m(U, v) become r-dimensional linear varieties in ℝ^p, 0 ≤ r ≤ p − 1.
In this final chapter we consider several fuzzy algorithms that effect partitions of feature space ℝ^p, enabling classification of unlabeled (future) observations, based on the decision functions which characterize the classifier. S25 describes the general problem in terms of a canonical classifier, and briefly discusses Bayesian statistical decision theory. In S26 estimation of the parameters of a mixed multivariate normal distribution via statistical (maximum likelihood) and fuzzy (c-means) methods is illustrated. Both methods generate very similar estimates of the optimal Bayesian classifier. S27 considers the utilization of the prototypical means generated by (A11.1) for characterization of a (single) nearest prototype classifier, and compares its empirical performance to the well-known k-nearest-neighbor family of deterministic classifiers. In S28, an implicit classifier design based on Ruspini’s algorithm is discussed and exemplified.
... • Fuzzy C-means (FCM) clustering was first introduced by Dunn [2] in 1973 and refined by Bezdek [3] in 1981. ...
... Thus, a PC value close to 1 indicates a "harder" clustering result (each data point belongs almost exclusively to a single cluster), whereas a low PC indicates a "fuzzier" result. In experiments on tumor segmentation in medical images, a high PC typically indicates that the tumor and background regions are more clearly separated, which supports checking the consistency of the results (Bezdek, 1981; Xie & Beni, 1991). ...
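The partition coefficient described here can be computed directly from the membership matrix. A minimal sketch, assuming U is stored as a c × n NumPy array whose columns sum to 1 (the function name is illustrative):

```python
import numpy as np

def partition_coefficient(U):
    """Bezdek's partition coefficient: the mean of squared memberships.

    U is a c x n fuzzy partition matrix whose columns sum to 1.
    PC = 1 for a hard partition; PC = 1/c when every membership is 1/c.
    """
    n = U.shape[1]
    return float((U ** 2).sum() / n)
```

A hard partition attains the maximum PC = 1, and the maximally fuzzy partition (all memberships equal to 1/c) attains the minimum PC = 1/c, which is what makes the index usable as a hardness gauge.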
... In the fuzzy clustering algorithm (FCM), the parameter m controls the "softness" of the boundaries between clusters: the closer m is to 1, the harder the partition (resembling k-means), while the larger m becomes, the more each point tends to spread its membership evenly across several clusters. Following the original works of Dunn (1973) and Bezdek (1981), m = 2.0 is the widely adopted standard value, since it yields moderate fuzziness, letting the algorithm retain the advantages of fuzzy clustering (reduced sensitivity to noise) while preserving the detailed structure of tumor regions in mammograms. If m is chosen too small, the model easily misses transitional regions; if too large, the result becomes so fuzzy that the tumor boundary is hard to localize accurately ...
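The effect of the fuzzifier described above can be seen by computing the memberships of a single point lying at squared distances 1 and 4 from two cluster centers. A small illustrative sketch (the numbers and names are mine, not from the cited works):

```python
import numpy as np

def memberships(d2, m):
    """FCM membership of one point, given squared distances d2 to each center."""
    w = d2 ** (-1.0 / (m - 1.0))
    return w / w.sum()

d2 = np.array([1.0, 4.0])            # point twice as far from the second center
u_hardish = memberships(d2, m=1.1)   # m near 1: nearly hard assignment
u_standard = memberships(d2, m=2.0)  # the common default, u = [0.8, 0.2]
u_soft = memberships(d2, m=5.0)      # large m: memberships flatten toward 1/c
```

With m = 1.1 the nearer cluster receives essentially all the membership (k-means-like); with m = 2 the split is 0.8 / 0.2; with m = 5 it flattens toward 0.59 / 0.41, illustrating why very large m blurs boundaries.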
Thesis
Early detection of breast cancer through mammography is essential for improving survival rates and reducing treatment costs, especially as incidence keeps rising in Vietnam and worldwide. This study addresses the pressing need for an automatic, accurate, and efficient diagnostic system in a big-data environment. The study uses fuzzy clustering algorithms (FCM, EnFCM, and the proposed MFCM algorithm), deployed on Apache Spark and Hadoop with MapReduce and treeReduce for parallel processing. MFCM is a new improvement that combines an adaptive filter with neighborhood spatial information, reducing noise and increasing segmentation accuracy compared with traditional methods. The method exploits histograms to reduce the computational load while integrating advanced distributed processing on a 12-worker cluster. Experimental results show that MFCM achieves the highest segmentation accuracy (96.39% on the VinDr-Mammo dataset) and a twofold speedup with 12 workers compared with a local environment. The study's novel contributions include completing the VinDr-Mammo dataset with segmentation masks, proposing MFCM with superior accuracy and speed, and a method for selecting the optimal number of clusters based on the Xie–Beni, IoU, and PC indices. These contributions promise to support physicians in the early diagnosis of breast cancer, reduce workload, save medical costs, and widen access to high-quality diagnostic services, particularly in localities with limited resources.
... With FCM, each data point has a membership value indicating how strongly it is associated with each cluster, which provides flexibility when grouping data whose boundaries are unclear. The FCM process involves initializing the centroids, computing membership degrees based on the distance to each centroid, and updating the centroids until convergence is reached (Bezdek, 1981). Fixing the fuzzifier at the value 2, as in Bezdek (1981), gives Fuzzy C-Means the desired fuzzy-membership property, in which each data point can contribute to more than one cluster in a measurable way. The accuracy level is the tolerance bound for stopping the iteration when the change in the membership values is very small, indicating convergence. ...
... Where: U = the partition matrix at iteration r; c = the desired number of clusters; n = the number of objects; u_il = the membership degree of object i in cluster l (Bezdek, 1981). c) Compute the cluster centers (centroids) using the following formula: ...
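The centroid formula elided in the excerpt is, in standard FCM, the membership-weighted mean of the objects: each center is the average of all data points, weighted by their memberships raised to the fuzzifier m. A sketch of that update under those standard assumptions (notation is mine):

```python
import numpy as np

def update_centroids(U, X, m=2.0):
    """Standard FCM centroid update.

    U: c x n membership matrix, X: n x p data matrix. Each center is the
    mean of all objects weighted by membership^m; returns a c x p array.
    """
    Um = U ** m
    return (Um @ X) / Um.sum(axis=1, keepdims=True)
```

When U is hard (all memberships 0 or 1), this reduces to the ordinary k-means centroid, the plain mean of each cluster's members.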
Article
Full-text available
The Family Planning Program (KB) is a government initiative that controls both population size and birth spacing in communities. The program works to enhance family and community welfare through education and the delivery of reproductive health services, which simultaneously helps decrease maternal and infant death rates. The Modern KB Method requires active participation from couples who have children and continue using modern contraception without interruption. Cluster analysis is a data-organization technique that groups similar data points into clusters, so that items within the same group are highly similar to one another and differ substantially from items in other groups. Fuzzy C-Means is a clustering approach that allows each data point to belong to different groups with varying degrees of membership. The Fuzzy C-Means cluster analysis produced 3 clusters in this research study.
... This variance of the global series can be estimated from the expression for the variance of the theoretical distribution. In this article the GEV distribution was used, and the variance of the global series is therefore given by the corresponding formula. The spatial variance of the mean values of the annual maxima was computed from a relation, and cluster membership degrees were computed by minimizing a function (Bezdek, 1981): ...
Preprint
Full-text available
The article describes a regional frequency analysis of daily and multi-day precipitation totals from the perspective of the ergodic theorem. According to the ergodic theorem, the long-term time averages of realizations of a stationary stochastic process can be replaced by the ensemble average, i.e., the average of individual realizations over a homogeneous space in a short time. The concept of mean ergodicity is explained both theoretically and in the practical task of estimating design values of one- and multi-day rainfall totals. Using Fuzzy C-means clustering, rain-gauge stations from the whole territory of Slovakia were divided into five clusters. The suitability of this division was tested statistically by analysis of variance. Stations situated at high-mountain elevations and in the Low and High Tatras were identified as one cluster, but the analysis of variance did not confirm the homogeneity of this cluster. In the future, more attention will need to be paid to this region. For each cluster, global series were assembled whose length far exceeds the length of the observation series at the individual rain-gauge stations. This made it possible to derive quantiles of daily and multi-day precipitation totals for a wide range of exceedance probabilities (p = 0.5-0.001). The article presents results for the quantiles of 1- to 5-day precipitation totals for the period 2001-2020. Using Bayesian inference of the GEV distribution parameters and Markov Chain Monte Carlo simulations, uncertainties in the form of credible intervals were also computed. In the context of ongoing climate change, nonstationarity is expected even in extreme precipitation totals. The advantage of the ergodicity concept in regional frequency analysis is that it allows us to work with short observation series. Pooling shorter series into a global series has the advantage that the assumption of stationarity is satisfied, which could be a solution for estimating quantiles at small probabilities even under a rapidly changing climate.
... Neighboring members on the PC phase plane were then divided into four groups using fuzzy c-means clustering, in which each member was assigned a membership value, a real number between 0 and 1, based on its distance from the cluster centroid. The membership values were obtained by minimizing the following function (Bezdek 1981): ...
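The function minimized here is, in the standard notation of the fuzzy c-means literature, the objective

```latex
J_m(U, v) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^m \, \lVert x_k - v_i \rVert^2 ,
```

where u_{ik} is the membership of datum x_k in cluster i, v_i is the i-th cluster centroid, and m > 1 is the fuzzifier. This is a reconstruction from the standard formulation, supplied because the excerpt's own formula did not survive extraction.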
Preprint
Full-text available
We describe a novel method for objectively selecting the most probable scenario among four typhoon forecasts. The method uses ensemble clustering to progressively incorporate a sequence of analytical data leading up to the most recent. Each ensemble member and the analytical data were initially projected onto the phase space spanned by the two leading principal components from an empirical orthogonal function analysis of the ensemble clustering. We then employed a particle filter-based Bayesian approach to assess the similarity between the forecasts and the analytical results within phase space. The scenario with the highest probability was then selected as the optimal scenario by application of the selective ensemble method. This new method was applied to Typhoon Hagibis (2019) using the regional ensemble prediction system of the Japan Meteorological Agency (JMA). The selected scenario successfully predicted a mesoscale front and associated coastal heavy rainfall that exceeded 100 mm per 3 hours in Miyagi Prefecture, which JMA's deterministic mesoscale model failed to forecast. Notably, the optimal scenario was identified prior to the onset of heavy rainfall. Statistical analysis of multiple typhoon cases demonstrated that the proposed method allows for optimal scenario selection up to six hours in advance of the target time. These results suggest that the new method can identify more realistic scenarios than operational deterministic forecasts of significant weather phenomena before their occurrence.
Article
The freight forwarding (FF) industry plays a key role in running global supply chains, with a sales revenue of $180.66 billion in 2021. Digitization in supply chain management, a thriving topic in the past few years, has been accelerated by the challenges of COVID-19 and presents both challenges and opportunities for the FF industry, requiring freight forwarders to adapt. Since technological foresight studies for the FF space are scarce, the specific expected impacts of digitization in FF remain unrevealed today. The aim of this study is to examine upcoming changes in the FF industry expected by FF professionals and academics over the next 30 years against the background of current technological developments, and to identify related resilience measures for FF organizations. Overall, 84 international experts shared their estimates in a Delphi survey. The results are grouped into four clusters that provide an outlook for the future of FF. They show that FF organizations should adapt to developing customer expectations, a change in the FF profile due to new requirements, and a change of vision for the development of human resources. Still, experts see a high relevance for FF services in the future despite all technological developments.
Chapter
A basic problem in the design of atmospheric experiments is presented by the choice of a sampling rate for the measurement of experimental variables. An approach to the solution of this problem is presented under the assumption that the sampling rate decision can be made prior to the execution of the experiment, as opposed to being made while the experiment is in progress. The technique used is to employ a newly developed and versatile family of fuzzy clustering algorithms, the Fuzzy c-Elliptotypes algorithms, and then to assess the fuzziness of the algorithmically determined clusters as a measure of the quality of the data.