Giorgio Maria Di Nunzio

University of Padova | UNIPD · Department of Information Engineering

PhD Computer Science

About

159 Publications
12,878 Reads
1,191 Citations
Additional affiliations
January 2007 - present
University of Padova

Publications (159)
Preprint
Full-text available
In this paper, we want to speculate about the possibility of modelling all the currently known or proposed approaches to terminology in a single schema. We will use the Entity-Relationship (ER) diagram as our tool for the conceptual data model of the problem and to express the associations between the objects of the study. We will analyse the onomasiolo...
Preprint
Full-text available
Ranking is a fundamental operation in information access systems, to filter information and direct user attention towards items deemed most relevant to them. Due to position bias, items of similar relevance may receive significantly different exposure, raising fairness concerns for item providers and motivating recent research into fair ranking. Wh...
Article
Full-text available
In Information Retrieval (IR), the semantic gap represents the mismatch between users’ queries and how retrieval models answer these queries. In this paper, we explore how to use external knowledge resources to enhance bag-of-words representations and reduce the effect of the semantic gap between queries and documents. In this regard, we propose...
Article
Full-text available
Evidence-based healthcare integrates the best research evidence with clinical expertise in order to make decisions based on the best practices available. In this context, the task of collecting all the relevant information (a recall-oriented task) in order to make the right decision within a reasonable time frame has become an important issue. In t...
Article
The present research is aimed at conducting a study on Russian-Italian medical translation with regard to the current development of two Machine Translation tools that feature prominently in today’s Neural Machine Translation framework, namely DeepL and Yandex. For the purpose of our research, we have selected three highly specialized and three pop...
Chapter
Ranking is a fundamental operation in information access systems, to filter information and direct user attention towards items deemed most relevant to them. Due to position bias, items of similar relevance may receive significantly different exposure, raising fairness concerns for item providers and motivating recent research into fair ranking. Wh...
Conference Paper
Full-text available
We present FullBrain, a social e-learning platform where students share and track their knowledge. FullBrain users can post notes, ask questions and share learning resources in dedicated course and concept spaces. We detail two components of FullBrain: a Social Information Retrieval (SIR) system equipped with query autocomplete and query autosugge...
Article
Full-text available
Language interference is common in today’s multilingual societies, where several languages are in contact, and it ultimately leads to the creation of hybrid languages. These, together with doubts about their right to be officially recognised, highlight the problem of their automatic identification and further elaboration in the area of computational...
Article
Terminology standardization reflects two different aspects involving the meaning of terms and the structure of terminological resources. In this paper, we focus on the structural aspect of standardization and we present the work of re-modeling TriMED, a multilingual terminological database conceived to support multi-register medical communication....
Chapter
Systematic reviews are scientific investigations that use strategies to include a comprehensive search of all potentially relevant articles and the use of explicit, reproducible criteria in the selection of articles for review. As time and resources are limited for compiling a systematic review, limits to the search are needed. In this paper, we de...
Conference Paper
Full-text available
The process of standardization plays an important role in the management of terminological resources. In this context, we present the work of re-modeling an existing multilingual terminological database for the medical domain, named TriMED. This resource was conceived in order to tackle some problems related to the complexity of medical terminology...
Article
Full-text available
In this work, we compare and analyze a variety of approaches to the task of medical publication retrieval and, in particular, to the Technology Assisted Review (TAR) task. This problem consists of collecting articles that summarize all the evidence that has been published regarding a certain medical topic. This task requires long search...
Preprint
Language interference is common in today's multilingual societies, where several languages are in contact, and it ultimately leads to the creation of hybrid languages. These, together with doubts about their right to be officially recognised, have brought to the fore, in the area of computational linguistics, the problem of their automatic identification...
Article
Full-text available
Sir Arthur Conan Doyle was an esteemed and highly experienced physician, and much of his medical knowledge found its way into his literary works. In this paper, we propose to study the medical terminology in the stories of Sherlock Holmes through a mixed-method quantitative and qualitative analysis. Our approach is based on 1) the automatic extraction o...
Conference Paper
Full-text available
This paper aims at comparing and reproducing the predictions of two publicly available computational auditory models for speaker localization in different simulated environments. The direction of arrival (DOA) of sound sources in the horizontal plane can be extracted using binaural spatial cues from room and user acoustics. Since our predictions c...
Chapter
This paper describes the steps that led to the invention, design and development of the Distributed Information Retrieval Evaluation Campaign Tool (DIRECT) system for managing and accessing the data used and produced within experimental evaluation in Information Retrieval (IR). We present the context in which DIRECT was conceived, its conceptual mo...
Article
In-region location verification (IRLV) aims at verifying whether a user is inside a region of interest (ROI). In wireless networks, IRLV can exploit the features of the channel between the user and a set of trusted access points. In practice, the channel feature statistics are not available and we resort to machine learning (ML) solutions for IRLV....
Conference Paper
The Precision Medicine (PM) track at the Text REtrieval Conference (TREC) focuses on providing useful precision medicine-related information to clinicians treating cancer patients. The PM track gives the unique opportunity to evaluate medical IR systems using the same set of topics on two different collections: scientific literature and clinical tr...
Preprint
Full-text available
In this paper, we investigate how semantic relations between concepts extracted from medical documents can be employed to improve the retrieval of medical literature. Semantic relations explicitly represent relatedness between concepts and carry high informative power that can be leveraged to improve the effectiveness of retrieval functionalities o...
Chapter
In this paper, we present a methodology for the development of a new eHealth resource in the context of Computational Terminology. This resource, named TriMED, is a digital library of terminological records designed to satisfy the information needs of different categories of users within the healthcare field: patients, language professionals and ph...
Conference Paper
Full-text available
In this paper, we focus on the teaching of specialised translation and, in particular, on the preliminary phase of the translation process which is based on a broad and systematic work on the terminology of the micro-language considered. We present a new model of bilingual terminological record, as a digital tool supporting the process of translati...
Preprint
Full-text available
In-region location verification (IRLV) aims at verifying whether a user is inside a specific region, and in wireless networks it can exploit the features of the channel between the user to be verified and the set of trusted access points. As IRLV is a hypothesis testing problem, we can resort to the Neyman-Pearson (N-P) theorem, when channel stati...
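For readers unfamiliar with the N-P approach mentioned above, here is a minimal sketch of a likelihood-ratio test. Purely for illustration, it assumes a single scalar channel feature that is Gaussian with equal variance under both hypotheses (H0: user inside the region, H1: user outside); the feature model, parameter values and function names are hypothetical and are not taken from the paper.

```python
from statistics import NormalDist

# Hypothetical model: a scalar channel feature x (e.g. a path-loss value) is
# Gaussian with the same spread sigma under both hypotheses.
# H0: user inside the region of interest (mean mu0); H1: outside (mean mu1 > mu0).

def log_likelihood_ratio(x: float, mu0: float, mu1: float, sigma: float) -> float:
    """log p(x | H1) - log p(x | H0), written in closed form to avoid underflow."""
    return ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)

def np_threshold(mu0: float, sigma: float, alpha: float) -> float:
    """Feature threshold t with P(x > t | H0) = alpha (the false-alarm constraint)."""
    return NormalDist(mu0, sigma).inv_cdf(1.0 - alpha)

def np_decide(x: float, mu0: float, mu1: float, sigma: float, alpha: float) -> bool:
    """Return True (decide H1) if the likelihood ratio exceeds the N-P threshold."""
    t = np_threshold(mu0, sigma, alpha)
    # With equal variances and mu1 > mu0 the LLR is increasing in x, so this is
    # equivalent to comparing x itself with t.
    return log_likelihood_ratio(x, mu0, mu1, sigma) > log_likelihood_ratio(t, mu0, mu1, sigma)

def detection_probability(mu0: float, mu1: float, sigma: float, alpha: float) -> float:
    """P(decide H1 | H1) achieved by the N-P test at false-alarm level alpha."""
    return 1.0 - NormalDist(mu1, sigma).cdf(np_threshold(mu0, sigma, alpha))
```

The N-P theorem says this test maximises the detection probability for the chosen false-alarm level; once the feature statistics are unknown, as the related IRLV abstracts note, the likelihoods have to be learned from data instead.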
Preprint
Full-text available
We consider the in-region location verification problem of deciding whether or not a message coming from a user equipment over a wireless network originated from a specific physical area (e.g., a safe room). The detection process exploits the features of the channel over which the transmission occurs with respect to a set of network access point...
Conference Paper
Full-text available
This contribution provides an overview of the syntactic and semantic behavior of medical terms in the literary works of Conan Doyle. The object of study is the analysis of the scientific terms in the stories of Sherlock Holmes through the model of terminological record set out in a multilingual terminological database (TriMED) implemente...
Conference Paper
Supervised machine learning algorithms require a set of labelled examples to be trained; however, labelling is a costly and time-consuming task carried out by domain experts, who label the dataset through an iterative process that filters out its non-relevant objects. In this paper, we describe a set of experi...
Conference Paper
Full-text available
Three categories of people are confronted with the complexity of medical language, each with its own need for a remedy: physicians, technical and scientific translators, and patients. This work proposes to develop a tool that helps remedy the opacity that characterizes communication among the various actors in the medical field...
Conference Paper
Full-text available
Technology-Assisted Review (TAR) systems are essential to minimize the effort of the user during the search and retrieval of relevant documents for a specific information need. In this paper, we present a failure analysis based on terminological and linguistic aspects of a TAR system for systematic medical reviews. In particular, we analyze the re...
Conference Paper
Full-text available
Three specific categories of people are confronted with the complexity of medical language: physicians, patients and scientific translators. The purpose of this work is to develop a methodology for the implementation of a terminological tool that helps solve problems related to the opacity that characterizes communication in the medical fie...
Chapter
In this chapter we present a two-dimensional representation of probabilities called likelihood spaces. In particular, we show the geometrical properties of Bayes’ rule when projected into this two-dimensional space and extend this concept to Naïve Bayes classifiers. We apply this geometrical interpretation to a real machine learning problem of text...
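The chapter's own construction is not reproduced here. As a rough illustration only, the sketch below assumes that each document is mapped to the point (x, y) = (log P(d|c1), log P(d|c2)) under a two-class multinomial Naïve Bayes model, so that Bayes' rule becomes a straight line of slope 1 whose intercept depends on the class priors. The toy word probabilities and all function names are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(tokens, word_probs):
    """log P(d | c) for a multinomial Naive Bayes model.

    The multinomial coefficient is omitted: it is identical for both classes
    and cancels in the comparison. word_probs is assumed to be already
    smoothed, so every token has a non-zero probability.
    """
    counts = Counter(tokens)
    return sum(n * math.log(word_probs[w]) for w, n in counts.items())

def to_likelihood_space(tokens, probs_c1, probs_c2):
    """Map a document to the 2-D point (x, y) = (log P(d|c1), log P(d|c2))."""
    return log_likelihood(tokens, probs_c1), log_likelihood(tokens, probs_c2)

def classify(point, prior_c1, prior_c2):
    """Bayes' rule as a line: choose c1 iff the point lies below y = x + q,
    where q = log(P(c1) / P(c2)) shifts the line according to the priors."""
    x, y = point
    q = math.log(prior_c1 / prior_c2)
    return "c1" if y < x + q else "c2"

# Toy, made-up class-conditional word probabilities.
sport = {"goal": 0.6, "match": 0.3, "vote": 0.1}
politics = {"goal": 0.1, "match": 0.2, "vote": 0.7}
point = to_likelihood_space(["vote", "vote"], sport, politics)
print(classify(point, 0.5, 0.5))    # prints c2
print(classify(point, 0.99, 0.01))  # prints c1: a strong prior moves the line
```

The same point changes class when only the priors change, which is the kind of effect that becomes visible once the decision rule is drawn as a line in the plane.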
Article
Background: Dozens of glycemic variability (GV) indices are available in the literature to characterize the dynamic properties of glucose concentration profiles from continuous glucose monitoring (CGM) sensors. However, how to exploit the plethora of GV indices for classifying subjects is still controversial. For instance, the basic problem of using...
Conference Paper
Efficiently allocating resources and predicting cell handovers is essential in modern wireless networks; however, this is only possible if there is an efficient way to estimate the future state of the network. In order to accomplish this, we investigate two learning techniques to predict the long-term channel gains in a wireless network. Previous w...
Chapter
Educational data mining (EDM) is an emerging discipline that studies methods for exploring the data that come from educational environments. This chapter focuses on students that are studying foundations of machine learning (ML) and, in particular, probabilistic models for classification. Interactive ML (IML) is a relatively new area of ML where in...
Article
Full-text available
The 38th European Conference on Information Retrieval took place from the 20th to the 23rd of March 2016 in Padua, Italy. This report summarizes the conference in terms of the presented keynotes, scientific and social programme, industry day, tutorials, workshops and student support.
Conference Paper
Automatic text categorization is an effective way to organize large text datasets in Digital Libraries (DL). However, most of the available machine learning tools are complex and go beyond what a digital library curator needs or is able to do in order to classify the objects of a DL. Drawing inspiration from the field of Learning Analyt...
Article
Full-text available
Nowadays, most mobile devices are equipped with multiple wireless interfaces, which has sparked growing research interest in device-to-device (D2D) communication: the idea behind the D2D paradigm is to exploit the proper interface to communicate directly with another user, without traversing any network infrastructure. A first issue related to this parad...
Conference Paper
In this demo, we present two applications which allow users to ‘see’ a geometric interpretation of Bayes’ rule and interact with a Naïve Bayes text classifier on a real dataset, namely the Reuters-21578 newswire collection. The main objective of this demo is to show how the pattern recognition capabilities of humans increase the effectivenes...
Conference Paper
In this demo, we present a web application which allows users to interact with two retrieval models, namely the Binary Independence Model (BIM) and the BM25 model, on a standard TREC collection. The goal of this demo is to give students deeper insight into the consequences of modeling assumptions (BIM vs. BM25) and the consequences of tuning parame...
Conference Paper
Full-text available
In this position paper, we discuss the issue of how to ensure reproducibility of the results when off-the-shelf open source Information Retrieval (IR) systems are used. These systems have greatly advanced the field, but they rely on many configuration parameters which are often implicit or hidden in the documentation and/or source code. I...
Article
In this paper, we present the initial findings about a possible geometric interpretation of the BM25 model and a comparison of the BM25 with the Binary Independence Model (BIM) on a two-dimensional space. A Web application was developed in R to show an example of this geometric view on a standard TREC collection. The application is accessible at th...
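As a point of reference for the BIM/BM25 comparison, here is a minimal sketch of the textbook BM25 term weight, built on the Robertson/Sparck Jones (BIM) weight without relevance information. The defaults k1 = 1.2 and b = 0.75 are common choices assumed here, not values from the paper, and the geometric view and the R application are not reproduced.

```python
import math

def bim_weight(N: int, n: int) -> float:
    """Robertson/Sparck Jones (BIM) term weight with no relevance information.

    N: number of documents in the collection; n: documents containing the term.
    The weight is negative for terms that occur in more than half the documents.
    """
    return math.log((N - n + 0.5) / (n + 0.5))

def bm25_term_score(tf: int, dl: float, avgdl: float, N: int, n: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 contribution of one query term to one document.

    The BIM weight is multiplied by a saturating, length-normalised tf factor:
    k1 controls tf saturation, b controls document-length normalisation.
    """
    if tf == 0:
        return 0.0
    norm = 1 - b + b * dl / avgdl
    return bim_weight(N, n) * tf * (k1 + 1) / (tf + k1 * norm)

def bm25_score(query_terms, doc_tf: dict, dl: float, avgdl: float, N: int, df: dict,
               k1: float = 1.2, b: float = 0.75) -> float:
    """Sum the term contributions over the query terms present in the collection."""
    return sum(bm25_term_score(doc_tf.get(t, 0), dl, avgdl, N, df[t], k1, b)
               for t in query_terms if t in df)
```

With k1 = 0 every matching term contributes exactly its BIM weight, one way to see BIM as the binary limit of BM25; larger k1 lets repeated occurrences add weight up to a ceiling of (k1 + 1) times the BIM weight, while b decides how strongly long documents are penalised.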
Article
Full-text available
In Information Retrieval (IR), measuring the distance between rankings is a way to compare evaluation measures and assess system rankings. In this paper, we present a variation of the Spearman footrule which allows us to define two measures that have nice analytical and geometrical properties that can be effectively used to compare different rankings...
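The paper's variation is not reproduced here. For orientation, the sketch below computes the classical Spearman footrule between two rankings of the same items (for instance, two orderings of the same set of IR systems), together with a normalisation by its maximum value; the function names are illustrative.

```python
def spearman_footrule(ranking_a, ranking_b):
    """Classical Spearman footrule: sum over items of |rank_a(i) - rank_b(i)|.

    Both arguments are rankings (lists, best first) of the same distinct items.
    """
    same_items = set(ranking_a) == set(ranking_b)
    distinct = len(set(ranking_a)) == len(ranking_a) == len(ranking_b)
    if not (same_items and distinct):
        raise ValueError("rankings must be permutations of the same distinct items")
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    return sum(abs(i - pos_b[item]) for i, item in enumerate(ranking_a))

def normalized_footrule(ranking_a, ranking_b):
    """Footrule divided by its maximum floor(n^2 / 2), reached by reversing a ranking."""
    n = len(ranking_a)
    max_distance = n * n // 2
    return 0.0 if max_distance == 0 else spearman_footrule(ranking_a, ranking_b) / max_distance
```

For example, a ranking of four systems and its reverse are at footrule distance 8, the maximum for n = 4, so their normalised distance is 1.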
Chapter
The goal of supervised classification is to assign a new object to a class from a given set of classes based on the attribute values of this object and on a training set. Although “supervised,” classification algorithms provide only very limited forms of guidance by the user. Typically, the user selects the dataset and sets the values for some para...
Article
Full-text available
Research in dialectal variation allows linguists to understand the fundamental principles that underlie language systems and grammatical changes in time and space. Since different dialectal variants do not occur randomly on the territory and geographical patterns of variation are recognizable for an individual syntactic form, we believe that a sy...
Article
Practical classification problems often involve some kind of trade-off between the decisions a classifier may take. Indeed, it may be the case that decisions are not equally good or costly; therefore, it is important for the classifier to be able to predict the risk associated with each classification decision. Bayesian decision theory is a fundame...
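As a generic illustration of the Bayesian decision-theoretic idea mentioned above (not the article's specific model), the sketch picks the action with minimum conditional risk given class posteriors and a loss matrix. The spam/ham classes and the cost values are invented for the example.

```python
def min_risk_decision(posteriors, loss):
    """Bayesian decision with costs.

    posteriors: {class: P(class | x)}
    loss: {action: {class: cost of taking `action` when the true class is `class`}}
    Returns the action minimising the conditional risk
    R(action | x) = sum_c loss[action][c] * P(c | x), together with all risks.
    """
    risks = {a: sum(costs[c] * p for c, p in posteriors.items())
             for a, costs in loss.items()}
    return min(risks, key=risks.get), risks

# Hypothetical example: filing a legitimate message as spam is ten times
# worse than letting a spam message through.
posteriors = {"spam": 0.7, "ham": 0.3}
loss = {"spam": {"spam": 0.0, "ham": 10.0},
        "ham":  {"spam": 1.0, "ham": 0.0}}
decision, risks = min_risk_decision(posteriors, loss)
# decision is "ham" even though "spam" is the more probable class
```

With a 0/1 loss the rule reduces to picking the most probable class; unequal costs are what let the classifier trade a likelier decision for a cheaper one.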
Conference Paper
Full-text available
Syntactic comparison across languages is essential in the research field of linguistics, e.g. when investigating the relationship among closely related languages. In IR and NLP, syntactic information is used to understand the meaning of word occurrences according to the context in which they appear. In this paper, we discuss a mathematical fra...
Article
Full-text available
Geolinguistic systems explore the relationship between language and cultural adaptation and change and they can be used as instructional tools, presenting complex data and relationships in a way accessible to all educational levels. However, the heterogeneity of geolinguistic projects has been recognised as a key problem limiting the reusability of...
Conference Paper
This article examines manual textual categorisation by human coders with the hypothesis that the law of total probability may be violated for difficult categories. An empirical evaluation was conducted to compare a one-step categorisation task with a two-step categorisation task using crowdsourcing. It was found that the law of total probability wa...
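For reference, the law of total probability that the study tests can be stated as follows, assuming (our notation, not the article's) that the first step of the two-step task assigns one of the mutually exclusive intermediate categories B_1, ..., B_k and the second step decides on the target category A:

```latex
P(A) = \sum_{i=1}^{k} P(A \mid B_i)\, P(B_i), \qquad \sum_{i=1}^{k} P(B_i) = 1
```

Under this reading, a violation would show up as a one-step estimate of P(A) that differs systematically from the right-hand side estimated from the two-step judgements.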
Conference Paper
This poster discusses the main assumptions of classical probabilistic models in IR by means of a visual data analysis approach. Starting from the problem of classifying documents into relevant and non-relevant classes, we derive exactly the same formula for the relevance weight of the Binary Independence Model but with more degrees of interactio...
Article
This work discusses the consequences of choosing a sample space based on the interpretation of an experiment. We discuss paradoxes that have been extensively studied in the literature, and then we propose an alternative interpretation of the problem of document classification.
Conference Paper
Digital Geolinguistic systems encourage collaboration between linguists, historians, archaeologists and ethnographers, as they explore the relationship between language and cultural adaptation and change. These systems can be used as instructional tools, presenting complex data and relationships in a way accessible to all educational levels. In this...
Conference Paper
Full-text available
Digital Geolinguistic systems encourage collaboration between linguists, historians, archaeologists and ethnographers, as they explore the relationship between language and cultural adaptation and change. In this demo, we propose a Linked Open Data approach for increasing the level of interoperability of geolinguistic applications and the reuse of the...
Article
In this paper, we present some ideas about possible directions for a new interpretation of the Okapi BM25 ranking formula. In particular, we have focused on a fully Bayesian approach for deriving a smoothed formula that takes into account a priori knowledge of the probability of terms. In fact, most of the efforts in improving the BM25 were done in c...
Article
The Open Language Archives Community, which recently celebrated its first 10 years of activity, is a worldwide network dedicated to collecting information on language resources and developing standard protocols for interoperability. In this context, the Linked Open Data paradigm is very promising, because it eases interoperability between different syst...
Article
Full-text available
The PROMISE network of excellence organized a two-day brainstorming workshop on 30th and 31st May 2012 in Padua, Italy, to discuss and envisage future directions and perspectives for the evaluation of information access and retrieval systems in multiple languages and multiple media. This document reports on the outcomes of this event and provides...
Conference Paper
In this paper we introduce the Atlante Sintattico d'Italia, Syntactic Atlas of Italy (ASIt) enterprise which is a linguistic project aiming to account for minimally different variants within a sample of closely related languages. One of the main goals of ASIt is to share and make linguistic data re-usable. In order to create a universally available...
Conference Paper
Naïve Bayes (NB) classifiers are simple probabilistic classifiers still widely used in supervised learning due to their trade-off between efficient model training and good empirical results. One of the drawbacks of these classifiers is that in situations of data sparsity (i.e. when the training set is small) the maximum likelihood estimation...
Article
In the last decade, the importance of analyzing information management systems logs has grown, because log data constitute a relevant aspect in evaluating the quality of such systems. A review of 10 years of research on log analysis is presented in this paper. About 50 papers and posters from five major conferences and about 30 related journal pape...
Article
This paper describes the Atlante Sintattico d'Italia, Syntactic Atlas of Italy (ASIt) linguistic linked dataset. ASIt is a scientific project aiming to account for minimally different variants within a sample of closely related languages; it is part of the Edisyn network, the goal of which is to establish a European network of researchers in the ar...
Conference Paper
Full-text available
Naïve Bayes probabilistic models are widely used in text categorization because of their efficient model training and good empirical results. Bayesian classifiers face a common issue, called the data sparsity problem, which makes adequate estimation of probabilities a difficult task. Therefore, smoothing techniques are needed in order to adjust the ma...
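The specific smoothing techniques studied in the paper are not listed in this summary. As background, the sketch below shows plain additive (Laplace/Lidstone) smoothing of the class-conditional word probabilities, contrasted with the maximum-likelihood estimate; the toy data and the function name are hypothetical.

```python
from collections import Counter

def word_probabilities(class_tokens, vocabulary, alpha=1.0):
    """Estimate P(w | c) for one class of a multinomial Naive Bayes model.

    alpha = 0 gives the maximum-likelihood estimate (unseen words get zero
    probability); alpha = 1 is Laplace smoothing; 0 < alpha < 1 is Lidstone
    smoothing. Tokens outside `vocabulary` are ignored.
    """
    counts = Counter(class_tokens)
    total = sum(counts[w] for w in vocabulary)
    denominator = total + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / denominator for w in vocabulary}

vocabulary = ["goal", "match", "vote"]
tokens = ["goal", "goal", "match"]  # toy training data for one class
mle = word_probabilities(tokens, vocabulary, alpha=0.0)
laplace = word_probabilities(tokens, vocabulary, alpha=1.0)
```

Under the maximum-likelihood estimate "vote" gets probability 0, so log P(d | c) is undefined for any document containing it; the smoothed estimate keeps every probability positive while still summing to one.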
Conference Paper
Full-text available
Dynamic Quantum Clustering (DQC) is a recent clustering technique based on physical intuition from quantum mechanics. Clusters are identified as the minima of the potential function of the Schrödinger equation. In this poster, we apply this technique to explore the possibility of selecting highly relevant documents for a user's query. In pa...
Conference Paper
Dynamic Quantum Clustering is a recent clustering technique which uses a Parzen window estimator to construct a potential function whose minima are related to the clusters to be found. The dynamics of the system are computed by means of the Schrödinger differential equation. In this paper, we apply this technique in the context of Information R...
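To make "a potential function whose minima are related to the clusters" concrete, the sketch below computes, in one dimension, the potential used in quantum clustering, the static step on which DQC builds: the Parzen window estimate psi(x) = sum_i exp(-(x - x_i)^2 / (2 sigma^2)) is treated as an eigenstate of the Schrödinger equation and the potential is solved for. Constant terms are dropped because they do not move the minima, the time evolution that makes the method dynamic is not shown, and the data and sigma are made up.

```python
import math

def quantum_potential(x, points, sigma):
    """Quantum-clustering potential at x (1-D), up to an additive constant.

    psi(x) = sum_i exp(-(x - x_i)^2 / (2 sigma^2)) is the Parzen window estimate.
    Requiring H psi = E psi with H = -(sigma^2 / 2) d^2/dx^2 + V gives
    V(x) = const + sum_i (x - x_i)^2 exp(...) / (2 sigma^2 psi(x)).
    """
    sq = [(x - p) ** 2 for p in points]
    weights = [math.exp(-s / (2 * sigma ** 2)) for s in sq]
    return sum(s * w for s, w in zip(sq, weights)) / (2 * sigma ** 2 * sum(weights))

def potential_minima(points, sigma, lo, hi, step):
    """Grid points in [lo, hi] where the potential has a strict local minimum."""
    n = int(round((hi - lo) / step)) + 1
    xs = [lo + i * step for i in range(n)]
    v = [quantum_potential(x, points, sigma) for x in xs]
    return [xs[i] for i in range(1, n - 1) if v[i] < v[i - 1] and v[i] < v[i + 1]]

# Two made-up groups of 1-D points.
data = [-0.1, 0.0, 0.1, 4.9, 5.0, 5.1]
minima = potential_minima(data, sigma=0.5, lo=-2.0, hi=7.0, step=0.05)
```

Each group produces one well in the potential, so the grid search finds one minimum near each group centre; sigma sets how coarse the resulting clusters are.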
Conference Paper
Interactions between users and information access systems can be analyzed and studied to gather user preferences and to learn what a user likes the most, and to use this information to adapt the search to users and personalize the presentation of results. The LogCLEF lab - "A benchmarking activity on Multilingual Log File Analysis: Language identif...
Conference Paper
Full-text available
The current lack of recent and long-term query logs makes the verifiability and repeatability of log analysis experiments very limited. A first attempt in this direction has been made within the Cross-Language Evaluation Forum in 2009 in a track named LogCLEF which aims to stimulate research on user behaviour in multilingual environments and promot...
Conference Paper
The current lack of recent and long-term query logs makes the verifiability and repeatability of log analysis experiments very limited. A first attempt in this direction has been made within the Cross-Language Evaluation Forum in 2009 in a track named LogCLEF which aims to stimulate research on user behaviour in multilingual environments and promot...
Conference Paper
Full-text available
ASIt aims to observe, collect and analyse the linguistic variation displayed by the dialects of a language. The main theoretical hypothesis is that linguistic variation is not due to chance, but depends on the combination of a finite number of parameters. It is a first step towards the creation of a European digital library for recording and studyi...
Conference Paper
Full-text available
The paper illustrates the methodology at the basis of the design of a digital library system that enables the management of linguistic resources of curated dialect data. Since dialects are rarely recognized as official languages, first of all linguists need a dedicated information management system providing the unambiguous identification of each d...
Conference Paper
EuropeanaConnect delivers core components which are essential for the realisation of the European Digital Library (Europeana) as a truly interoperable, multilingual and user-oriented service for all European citizens.
Article
This article reports the findings of a user study conducted in the context of the TELplus project to gain insights about user needs and preferences for the digital library services offered by The European Library Web portal. The user requirements collection for the Web portal was designed by adopting a comprehensive survey approach. This combined e...
Conference Paper
This paper presents the design, implementation and evaluation of a query suggestion tool (named i-TEL-u) that allows for the management and the exploitation of different contexts in an integrated way within the same search interface for accessing the contents of The European Library portal. i-TEL-u allows users to seamlessly move from one context t...