Alfredo Cobá’s research while affiliated with Autonomous University of Yucatán and other places


Publications (1)


A depiction of compositional data and its representation in the probability simplex. All instances of compositional data can be represented as points in the probability simplex $\Delta^{3}$. In the example, points in pink represent possible foods in terms of their percentage of protein, carbohydrates and fat. Red meat foods are represented by red squares, blue circles indicate vegetables, and green triangles show some types of fish and seafood
The k-neighborhood of a vector. LOF identifies the k-nearest neighbors for each vector based on the k-distance. Vector v has as its neighbors vectors $w_{1}, w_{2}$, and $w_{3}$. The expected distance from v to its neighbors serves as the basis for the characterization of v. The neighbors of v have to be characterized in terms of their own neighbors. The characterization of v and those of the vectors in its context (neighborhood) are compared to compute the local outlier factor of v
Annulus under different distances and divergences. The shape of the obtained annulus is affected by the applied distance
A sketch of the test datasets. Top: in the first group, several annuli were created within the probability simplex (blue points). A few points that do not fulfil the pattern criteria were added to the probability simplex (red). The latter are to be identified as anomalies. Bottom: several histograms were generated from a fixed probability function. Each histogram consists of n bins and is embedded into the probability simplex. A few histograms obtained from a different probability function are included to function as anomalies. In both groups, the regular class is 5 to 10 times more abundant than the anomaly class
Precision, recall and accuracy of anomaly detection for the dataset of points defining an annulus in the probability simplex $\Delta^{d}$, for $d = 3, 5, 10, 20$
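
The k-neighborhood caption above describes how LOF characterizes a vector through its neighbors and then compares that characterization with those of the neighbors themselves. The following is a minimal sketch of the standard local outlier factor computation, not the authors' implementation. The function names, the toy points, and the choice k = 3 are assumptions made for illustration. Ties at the k-th neighbor are ignored and all points are assumed to be distinct.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance; any other distance function can be passed instead."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_outlier_factors(points, k, dist=euclidean):
    """Return one LOF score per point (illustrative sketch, O(n^2) distances)."""
    n = len(points)
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]

    neighbors, k_distance = [], []
    for i in range(n):
        # Indices of the k nearest neighbors of point i (excluding i itself).
        order = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])
        knn = order[:k]
        neighbors.append(knn)
        k_distance.append(d[i][knn[-1]])  # distance to the k-th neighbor

    # Local reachability density: inverse of the mean reachability distance,
    # where reach(i, j) = max(k_distance(j), d(i, j)).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], d[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbors divided by the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


# Hypothetical toy data in the 3-part simplex: six similar compositions
# and one composition close to a vertex.
points = [
    (0.30, 0.30, 0.40), (0.32, 0.30, 0.38), (0.30, 0.32, 0.38),
    (0.28, 0.30, 0.42), (0.30, 0.28, 0.42), (0.31, 0.31, 0.38),
    (0.90, 0.05, 0.05),
]
scores = local_outlier_factors(points, k=3)
for p, s in zip(points, scores):
    print(p, round(s, 2))
```

Scores close to 1 indicate a point whose local density is similar to that of its neighbors, while the isolated last point receives a much larger score. Because `dist` is a parameter, a simplex-specific distance could be substituted for the Euclidean one.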


Anomaly detection in the probability simplex under different geometries
  • Article
  • Full-text available

May 2023 · 147 Reads · 2 Citations · Information Geometry

Sergio Mota · Sergio Martinez · [...]

An open problem in data science is that of anomaly detection. Anomalies are instances that do not maintain a certain property that is present in the remaining observations in a dataset. Several anomaly detection algorithms exist because the problem itself is ill-posed: the criteria that separate common or expected vectors from anomalies are not unique. In the most extreme case, data are not labelled and the algorithm has to identify the vectors that are anomalous, or assign a degree of anomaly to each vector. The majority of anomaly detection algorithms do not make any assumptions about the properties of the feature space in which observations are embedded, which may affect the results when those spaces present certain properties. Compositional data such as normalized histograms, which can be embedded in a probability simplex, constitute a particularly relevant case. In this contribution, we address the problem of detecting anomalies in the probability simplex, relying on concepts from Information Geometry, mainly by focusing on the distance functions commonly applied in that context. We report the results of a series of experiments and conclude that when a specific distance-based anomaly detection algorithm relies on Information Geometry-related distance functions instead of the Euclidean distance, the performance is significantly improved.
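
The abstract contrasts the Euclidean distance with distance functions from Information Geometry but does not list them. As a hedged illustration only, the sketch below uses three distances that are common on the probability simplex (Hellinger, Fisher-Rao, and the square root of the Jensen-Shannon divergence). Choosing these three is an assumption, as are the function names and the example compositions. It shows that two pairs of points with the same Euclidean gap can be far apart or close together depending on where they sit in the simplex.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def hellinger(p, q):
    # Euclidean distance between square-root vectors, scaled to lie in [0, 1].
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def fisher_rao(p, q):
    # Geodesic distance under the Fisher information metric on the simplex.
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return 2.0 * math.acos(min(1.0, max(0.0, bc)))    # clamp against rounding


def jensen_shannon(p, q):
    # Square root of the Jensen-Shannon divergence (natural logarithm).
    def kl(u, v):
        return sum(a * math.log(a / b) for a, b in zip(u, v) if a > 0)

    m = [(a + b) / 2 for a, b in zip(p, q)]
    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))


# Hypothetical compositions: both pairs differ by (0.04, -0.02, -0.02).
center_pair = ((0.34, 0.33, 0.33), (0.30, 0.35, 0.35))
vertex_pair = ((0.98, 0.01, 0.01), (0.94, 0.03, 0.03))

for name, fn in [("euclidean", euclidean), ("hellinger", hellinger),
                 ("fisher_rao", fisher_rao), ("jensen_shannon", jensen_shannon)]:
    print(name, fn(*center_pair), fn(*vertex_pair))
```

The Euclidean distance is the same for both pairs, whereas the three simplex-aware distances are larger for the pair near the vertex. This is one way a change of geometry can alter which points a distance-based detector treats as neighbors.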


Citations (1)


... The concentration of the distribution varies based on a shape parameter, γ. When γ is greater than 1, the density concentrates in the center of the probability simplex, defined as the set of all non-negative vectors whose components sum to one (Frigyik et al., 2010; Legaria et al., 2023). If γ is between 0 and 1, the concentration shifts to the vertices of the simplex (Frigyik et al., 2010). ...

Reference:

Integrated Water Management Under Different Water Rights Institutions and Population Patterns: Methodology and Application
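
The excerpt above does not name the distribution. The sketch below assumes it is a symmetric Dirichlet distribution with every concentration parameter equal to γ, and draws samples by normalizing independent gamma variates. The function names, the sample size, and the two γ values are illustrative assumptions. The mean of the largest component is used as a rough indicator of how close samples lie to a vertex.

```python
import random


def sample_symmetric_dirichlet(gamma, dim, rng):
    """Draw one point of the simplex from Dirichlet(gamma, ..., gamma)."""
    draws = [rng.gammavariate(gamma, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [g / total for g in draws]


def mean_largest_component(gamma, dim=3, n=2000, seed=0):
    """Average of the largest coordinate over n samples (near 1 means near a vertex)."""
    rng = random.Random(seed)
    return sum(max(sample_symmetric_dirichlet(gamma, dim, rng)) for _ in range(n)) / n


low = mean_largest_component(0.2)    # gamma between 0 and 1
high = mean_largest_component(10.0)  # gamma greater than 1
print("gamma = 0.2 :", round(low, 3))
print("gamma = 10.0:", round(high, 3))
```

With γ below 1 the largest component is on average close to 1, so samples lie near the vertices. With γ well above 1 it stays close to 1/3, which corresponds to the center of the 3-part simplex.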