Anna Cena

Warsaw University of Technology · Faculty of Mathematics and Information Science

M.Sc.


27 Publications
14,537 Reads
263 Citations


Publications (27)
Article
Full-text available
Minimum spanning trees (MSTs) provide a convenient representation of datasets in numerous pattern recognition activities. Moreover, they are relatively fast to compute. In this paper, we quantify the extent to which they are meaningful in low-dimensional partitional data clustering tasks. By identifying the upper bounds for the agreement between th...
Preprint
Full-text available
Minimum spanning trees (MSTs) provide a convenient representation of datasets in numerous pattern recognition activities. Moreover, they are relatively fast to compute. In this paper, we quantify the extent to which they can be meaningful in data clustering tasks. By identifying the upper bounds for the agreement between the best (oracle) algorithm...
Preprint
Full-text available
Agglomerative hierarchical clustering based on Ordered Weighted Averaging (OWA) operators not only generalises the single, complete, and average linkages, but also includes intercluster distances based on a few nearest or farthest neighbours, trimmed and winsorised means of pairwise point similarities, amongst many others. We explore the relationsh...
Article
Full-text available
This paper aims to find the reasons why some citation models can predict a set of specific bibliometric indices extremely well. We show why fitting a model that preserves the total sum of a vector can be beneficial in the case of heavy-tailed data that are frequently observed in informetrics and similar disciplines. Based on this observation, we in...
Preprint
Full-text available
The time needed to apply a hierarchical clustering algorithm is most often dominated by the number of computations of a pairwise dissimilarity measure. For larger data sets, such a constraint puts all the classical linkage criteria except single linkage at a disadvantage. However, it is known that the single linkage clustering algo...
Article
Full-text available
We analyse the usefulness of Jain’s fairness measure and the related Prathap’s bibliometric z-index as proxies when estimating the parameters of the 3DSI (three dimensions of scientific impact) model.
Preprint
Full-text available
Internal cluster validity measures (such as the Calinski-Harabasz, Dunn, or Davies-Bouldin indices) are frequently used for selecting the appropriate number of partitions a dataset should be split into. In this paper we consider what happens if we treat such indices as objective functions in unsupervised learning activities. Is the optimal grouping...
Article
Full-text available
There are many approaches to the modelling of citation vectors of individual authors. Models may serve different purposes, but usually they are evaluated with regard to how well they align with citation distributions in large networks of papers. Here we compare a few leading models in terms of their ability to correctly reproduce the values of selec...
Article
Full-text available
We demonstrate that by using a triple of simple numerical summaries: an author’s productivity, their overall impact, and a single other bibliometric index that aims to capture the shape of the citation distribution, we can reconstruct other popular metrics of bibliometric impact with a sufficient degree of precision. We thus conclude that the use o...
Article
Full-text available
Internal cluster validity measures (such as the Caliński–Harabasz, Dunn, or Davies–Bouldin indices) are frequently used for selecting the appropriate number of partitions a dataset should be split into. In this paper we consider what happens if we treat such indices as objective functions in unsupervised learning activities. Is the optimal grouping...
Article
The growing popularity of bibliometric indexes (whose most famous example is the h index by J. E. Hirsch [J. E. Hirsch, Proc. Natl. Acad. Sci. U.S.A. 102, 16569–16572 (2005)]) is opposed by those claiming that one’s scientific impact cannot be reduced to a single number. Some even believe that our complex reality fails to submit to any quantitative...
Article
Full-text available
We investigate the application of the Ordered Weighted Averaging (OWA) data fusion operator in agglomerative hierarchical clustering. The examined setting generalises the well-known single, complete and average linkage schemes. It allows expert knowledge to be embodied in the cluster merge process and provides a much wider range of possible linkages....
Article
This paper concerns the use of high-resolution images from a Hasselblad H6D-100c medium-format camera in the investigation of wall paintings. The investigation demonstrated the great potential of an image-based approach combined with this camera for such documentation. The analysis shows that the obtained accuracy allows not on...
Book
Full-text available
**see also: https://datawranglingpy.gagolewski.com** The authors' aim is to prepare the reader to carry out the entire data analysis process independently, from downloading and loading a data set, through its preprocessing and cleaning, to the analysis itself, the visualisation of the results, and their interpretation. We know that certain solutions, ...
Conference Paper
Full-text available
The paper discusses a generalization of the nearest centroid hierarchical clustering algorithm. A first extension deals with the incorporation of generic distance-based penalty minimizers instead of the classical aggregation by means of centroids. As a result, the presented algorithm can be applied in spaces equipped with an arbitrary dissimilarity...
Conference Paper
We discuss a generalization of the fuzzy (weighted) k-means clustering procedure and point out its relationships with data aggregation in spaces equipped with arbitrary dissimilarity measures. In the proposed setting, a data set partitioning is performed based on the notion of points’ proximity to generic distance-based penalty minimizers. Moreover...
Article
Full-text available
The time needed to apply a hierarchical clustering algorithm is most often dominated by the number of computations of a pairwise dissimilarity measure. For larger data sets, such a constraint puts all the classical linkage criteria except single linkage at a disadvantage. However, it is known that the single linkage clustering algo...
Article
Full-text available
Hirsch's h-index is perhaps the most popular citation-based measure of scientific excellence. In 2013, G. Ionescu and B. Chopard proposed an agent-based model for this index to describe the process by which publications and citations are generated in an abstract scientific community. With such an approach one can simulate a single scientist's activity, an...
Chapter
Full-text available
The K-means algorithm is one of the most often used clustering techniques. However, when it comes to discovering clusters in informetric data sets that consist of non-increasingly ordered vectors of not necessarily conforming lengths, such a method cannot be applied directly. Hence, in this paper, we propose a K-means-like algorithm to determine gr...
Article
Classically, unsupervised machine learning techniques are applied to data sets with a fixed number of attributes (variables). However, many problems encountered in the field of informetrics require extending these methods so that they can be computed over a set of nonincreasingly ordered vectors of unequal lengths. T...
Article
Full-text available
The recently-introduced OM3 aggregation operators fulfill three appealing properties: they are simultaneously minitive, maxitive, and modular. Among the instances of OM3 operators we find e.g. OWMax and OWMin operators, the famous Hirsch's h-index and all its natural generalizations. In this paper the basic axiomatic and probabilistic properties of...
Chapter
Full-text available
Recently, a very interesting relation between symmetric minitive, maxitive, and modular aggregation operators has been shown. It turns out that the intersection between any pair of the mentioned classes is the same. This result introduces what we here propose to call the OM3 operators. In the first part of our contribution on the analysis of the OM...
Chapter
Full-text available
This article is the second part of the contribution on the analysis of the recently proposed class of symmetric maxitive, minitive and modular aggregation operators. Recent results [M. Gagolewski and R. Mesiar, “Aggregating different paper quality measures with a generalized h-index”, J. Informetr. 6, No. 4, 566–579 (2012)] indicated some unstable be...
