
Antonio Origlia - University of Naples Federico II
About
98 Publications
13,579 Reads
731 Citations
Publications (98)
The BRILLO (Bartending Robot for Interactive Long-Lasting Operations) project aims to create an autonomous robotic bartender that can interact with customers while accomplishing its bartending tasks. In such a scenario, the novelty effect that people experience with an attractive technology is bound to wear off and, consequently, negatively aff...
Socially assistive robots represent a promising tool in assistive contexts for improving people’s quality of life and well-being through social, emotional, cognitive, and physical support. However, the effectiveness of interactions heavily relies on the robots’ ability to adapt to the needs of the assisted individuals and to offer support proactive...
In linguistics, research on dialogue systems has highlighted the need to focus on various pragmatic aspects of their management and modelling. Among the most important pragma-linguistic speech acts in dialogue system studies are Clarification Requests, a form of corrective feedback that in some circumstances requires access to the set of shared knowledge kn...
This paper explores the application of the Influence Diagrams model to decision-making in conversational agents. The system is a Conversational Recommender System (CoRS) in which the decision-making module is separate from the language generation module. It provides the capability to evolve a belief based on user responses...
Conversational recommender systems aim to recommend the most relevant information to users based on textual or spoken dialogues, through which users can communicate their preferences to the system more efficiently. Argumentative conversational recommender systems represent a kind of deliberation dialogue in which participants share their specif...
This study presents the results of two perception experiments aimed at evaluating the effect that specific patterns of disfluencies have on people listening to synthetic speech. We consider the particular case of Cultural Heritage presentations and propose a linguistic model to support the positioning of disfluencies throughout the utterances in th...
Automatic speech recognition systems based on end-to-end models (E2E-ASRs) can achieve performance comparable to conventional ASR systems while reproducing all their essential parts automatically, from speech units to the language model. However, they hide the perceptual processes they model, if any, and they have lower adaptability to mul...
Socially assistive robots represent a promising tool in assistive contexts to improve people's quality of life and well-being through social and emotional support, as well as cognitive and physical support. However, the effectiveness of interactions depends significantly on the robots' ability to adapt to the needs of the assisted persons and act proactively in an...
The literature provides increasing evidence that co-verbal gestures partake in utterances. This position is supported by the observation, in several languages, that a suspension of speech due to planning issues often corresponds to a suspension of gestural activity. However, studies on the correlation between speech disfluency phenomena and co-occu...
The paper introduces the topic of Environmental Artificial Intelligence, i.e., Artificial Intelligence approaches based on the use of natural language and applied to architecture, to support the design of systems that monitor the progress of degenerative states and to test their functionality through simulations with real-time interactive 3D models. Beginn...
Monitoring the state of railway infrastructure is crucial for travel safety. Inspections are mostly carried out by dedicated and expensive vehicles, which cause significant disruption to the normal operation of a line. An emerging solution is to equip train vehicles with low-cost sensors (mostly accelerometers) able to scan...
The paper presents the results of a scientific collaboration between the Interdepartmental Research Center Urban/Eco of the University of Naples Federico II and the MANN (Museo Archeologico Nazionale di Napoli, National Archaeological Museum of Naples). The research activity was aimed at the digitisation, design, and development of an AR/VR-powered...
The BRILLO (Bartending Robot for Interactive Long-Lasting Operations) project has the overall goal of creating an autonomous robotic bartender that can interact with customers while accomplishing its bartending tasks. In such a scenario, the novelty effect that people experience with an attractive technology is bound to wear off and, consequently,...
Analysing text to detect semantic similarities is a recent breakthrough in Natural Language Processing that has enabled many novel applications in different fields. A domain that could greatly benefit from this innovation is that of Location-based and/or Touristic Recommender Systems, where the user receives suggestions based on his/her past l...
Nowadays, graph databases combined with the analysis of text corpora seem to play a pivotal role in supporting dialogue system design and implementation. However, dialogues are rarely put in an explicit relationship with the graph structures representing the knowledge domain. In this work, we show how native graph databases provide a frame...
In this work, a spoken dialogue system architecture capable of dealing with Common Ground inconsistencies is proposed. Specifically, attention is drawn to the Conflict Search Graph, with insights into its ability to recognise problems and make them explicit via polar questions. Appropriate question forms are, indeed, adopted for the occurring...
The paper presents the results of the PRIN project CHROME (Cultural Heritage Orienting Multimodal Experiences) on the three charterhouses of Campania, with a specific focus on research activities related to the connections between representation, survey, AI and VR. The project has formalized a methodology for the collection, analysis and modeling of mult...
Past research has concentrated on the use of different forms of polar questions in specific contexts, defined in terms of the relationship between original bias and contextual evidence. It has been shown that, for English and German, people tend to prefer specific forms given the pragmatic context. Based on previous experiments, in this work, we o...
The continuous growth of resources available on the web, both in the form of Linked Open Data and on Social Networks, provides an important opportunity to gather information concerning specific kinds of touristic activities such as cultural tourism, eco-tourism, bike-tourism, and so on. Both decision makers and tourists can take advantag...
Continuous monitoring procedures are becoming ever more crucial for assessing the potential deterioration of architectural structures, owing to the many advantages they bring. A cultural heritage site, in fact, is constantly subject to degradation, due in particular to atmospheric agents. Preserving it with preventive analyses is an important goal for...
Understanding the human spoken language recognition process is still a distant scientific goal. Nowadays, commercial automatic speech recognisers (ASRs) achieve high performance at recognising clean speech, but their approaches are poorly related to human speech recognition. They commonly process the phonetic structure of speech while neglecting supra-...
The present paper reports on the advantages of learning inferences and understanding strategies from the interactive structure of a corpus. First of all, we introduce the SUGAR corpus for the cooking domain, describing its peculiar collection and annotation procedures. After this first overview, we show how information included within the corpus ca...
Providing technologies to support the visiting experience in cultural venues of artistic value is an important issue that needs to be addressed by considering the delicate nature of the places. Architectural heritage and visual arts are two valuable examples: the most sensible choice for augmenting the comprehension and the experience concerning th...
Parking Guidance and Information (PGI) systems aim at supporting drivers in finding suitable parking spaces, also by predicting the availability at the driver’s Estimated Time of Arrival (ETA), leveraging information about the general parking availability situation. To make these predictions, most of the proposals in the literature dealing with on-street...
Intact phonetic perception abilities are necessary for normal future speech development. Since the ability to discriminate linguistic sounds is typically associated with the correct acquisition and production of the same sounds, an alteration of this ability could contribute to the onset of speech and language disorders. Suppor...
Investigating the multimodal communication of Tourist Guides to implement a Virtual Tourist Guide leading tourists in three Italian Charterhouses, the paper focuses on an aspect of the human guide's speech that would be useful to create a very realistic Virtual Guide: linguistic disfluencies. On a corpus of three guided tours in S. Martino Charterh...
With the recent availability of industry-grade, high-performing engines for video games production, researchers in different fields have been exploiting the advanced technologies offered by these artefacts to improve the quality of the interactive experiences they design. While these engines provide excellent and easy-to-use tools to design interfa...
Monitoring the occupancy of on-street parking spaces on a city-wide scale is still an open issue. Past research demonstrated the viability of parking crowd-sensing by means of the standard on-board sensors of probe vehicles, foreseeing the use of high-mileage vehicles, like taxis. Nevertheless, the achievable spatio-temporal sensing coverage has ne...
Definition and experimentation of a methodology for collecting, analysing and modelling multimodal data in designing virtual agents serving in museums, through an “anthropomorphic” human-machine dialog system.
- Research the features and regularities of verbal behavioural patterns to be implemented in text-to-speech (TTS) systems in order to improve their p...
We report on the organization of the IDIAL (Evaluation of Italian DIALogue systems) task at EVALITA 2018, the first shared task aimed at assessing interactive characteristics of conversational agents for the Italian language. From this perspective, IDIAL considers a dialogue system as a "black box" (i.e., evaluation cannot access internal compone...
The SUGAR task is intended to develop a baseline to train a voice-controlled robotic agent to act as a cooking assistant. The starting point will therefore be to provide authentic spoken data collected in a simulated natural context, from which semantic predicates will be extracted to classify the actions to perform. Three different approaches were...
In this paper, we propose a new set of experiments to further evaluate the performance of a previously presented system based on an adaptive strategy for stimuli selection masked behind a gamified activity. This involves two virtual agents creating a social setting designed to support a narrative to engage young children. With respect to previously...
Various vocalizations, such as coughs, hiccups and laughter, occur in everyday conversation and in TV debates and talk shows, with or without communicative import. While laughter has been the object of intense research, a peculiar vocalization has been less investigated: the sigh. While Boncinelli (2012) and other authors explained its structure as a peculi...
We present here the conversion of Linguistic Linked Open Data into Semantic Maps to be used to produce contents in a set of technological applications for Cultural Heritage. The paper describes the architectural data collection and annotation procedure adopted in the Cultural Heritage Orienting Multimodal Experiences (CHROME) project (PRIN 2015 fun...
Cultural Heritage (CH) is a challenging domain of application for novel Information and Communication Technologies (ICT), where visualization plays a major role in enhancing visitors' experience, either onsite or online. Technology-supported natural human-computer interaction is a key factor in enabling access to CH assets. Advances in ICT ease vis...
With the advent of artificial intelligence and natural user interfaces, the need for multimedia material that can be semantically interpreted in real time becomes critical. In the field of 3D architectural survey, a significant amount of research has been conducted to allow domain experts to represent semantic data while keeping spatial references. Su...
EVALITA is a periodic evaluation campaign of Natural Language Processing (NLP) and speech tools for the Italian language. The general objective of EVALITA is to promote the development of language and speech technologies for the Italian language, providing a shared framework where different systems and approaches can be evaluated in a consistent ma...
Knowing in advance where to park is a feature many drivers wish for. In recent years, considerable research effort has been spent analysing massive amounts of parking information, learning availability trends and thus predicting, within a Parking Guidance and Information (PGI) system, where there is the highest chance to find free parking spaces. T...
The present paper reports on the advantages of using graph databases in the development of dynamic language models in Spoken Language Understanding applications, such as spoken dialogue systems. First of all, we introduce Neo4j graph databases and, specifically, MultiWordNet-Extended, a graph representing linguistic knowledge. After this first over...
Ubiquitous computing is extending its applications to an increasing number of domains. “Monolithic” approaches use centralised systems, controlling devices and users’ requests. A different solution can be found in works proposing “distributed” intelligent devices that communicate without a central reasoner, creating small communities to support t...
The series publishes the proceedings of the annual conference on Computational Linguistics (CLiC-it), which aims to be a reference forum for discussion in the field of computational linguistics research. The proceedings include contributions on natural language processing, including theoretical and methodological reflections on the topic...
The success of software applications, in a worldwide setup offering simple development and distribution models, is often determined by the quality and ease of use of provided interfaces. In this paper, we present a framework for multimodal signal analysis operating in conjunction with any other Android application to estimate the cognitive load imp...
This paper presents a CAVE-like architecture to support the interaction of small groups of people with a leader in a multi-projection environment, in the unusual condition where a vertical depth camera records people and their movements. In this framework, modelling people as Gaussians, we localise and track people when they step into a defined are...
We propose a method for syllabic stress annotation which does not require manual labels for the learning process, but uses stress labels automatically generated from a multiscale model of rhythm perception. The model outputs a sequence of events, corresponding to the sequences of strong-weak syllables present in speech, based on which a stressed/un...
Prosodic prominence is an umbrella term encompassing various related but conceptually and functionally different phenomena such as phonological stress, paralinguistic emphasis, lexical, syntactic, semantic or pragmatic salience, to mention a few. Due to the high interest prominence has received from various disciplines, it has been studied from mul...
In this paper, we present a further step in the development of an emotion tracking system based on phonetic syllables and machine learning algorithms. A system built on phonetically defined units has advantages both in the amount of data needed to train the classifier and in the ability to improve our knowledge about how humans use sp...
In the present work we take into account the needs of real-time systems to give an estimate of the emotional content of an utterance during its production rather than waiting for it to be completed. The potential impact of this approach on the design of affective computing systems is also analysed. Past works have shown the importance of syllables...
In this report, we describe the EVALITA 2014 Emotion Recognition Task (ERT). Specifically, we describe the datasets and the evaluation procedure, and we summarize the results obtained by the proposed systems. On this basis we provide our view on the current state of emotion recognition systems for Italian, whose development appears to be sever...
As research on the extraction of acoustic properties of speech for emotion recognition progresses, the need to investigate feature extraction methods that take into account the requirements of real-time processing systems becomes more important. Past works have shown the importance of syllables for the transmission of emotions, while classical res...
In this work we present a new version of our previously published Optimal Stylization (OpS) algorithm for pitch stylization. Here we give a better perceptual representation of the pitch curve for linguistics research. While the OpS algorithm produced good stylizations for naive listeners, when deployed in a prosodic analysis tool, we observed that,...
In this Forced Alignment on Children Speech (FACS) task, systems are required to align audio recordings of sentences read aloud by children to the corresponding transcriptions provided, and the task is to be considered speaker-independent.
Evaluating human machine interaction in the case of multimodal systems is often a difficult task involving the monitoring of multiple sources, data fusion and results interpretation. While subtasks are highly dependent on the specific goal of the application and on the available interaction modalities, it is possible to formalize this workflow into...
Automatic pitch stylization is an important resource for researchers working both on prosody and speech technologies. In order to be useful, the stylized F0 curve should contain the smallest possible number of control points while remaining, at the same time, close to the original curve from a perceptual point of view. Here, a pitch stylization algor...
Forced alignment of both words and phones is a challenging and interesting task for automatic speech processing systems because the difficulties introduced by natural speech are many and hard to deal with. Furthermore, forced alignment approaches have been tested on Italian in only a few studies. In this task, the main goal was to evaluate the per...
Past works have shown the importance of syllables for the transmission of emotions, while classical research methods adopted in prosody show that it is important to concentrate on specific areas of the speech signal to study intonation phenomena. Technological approaches, however, are often designed to use the whole speech signal without taking into...
From a cognitive point of view, personality perception corresponds to capturing individual differences and can be thought of as positioning the people around us in an ideal personality space. The more similar the personality of two individuals, the closer their position in the space. This work shows that the mutual position of two individuals in th...
In this paper, we propose a human-robot interaction system that exploits emotion and attention to regulate and adapt the robotic interactive behavior. In particular, we will focus on the relation between arousal, predictability, and attentional allocation considering as a case study a robotic manipulator interacting with a human operator. We rely o...
In this paper we extend a multimodal framework based on speech and gestures to include emotional information by means of anger detection. In recent years multimodal interaction has become of great interest thanks to the increasing availability of mobile devices allowing a number of different interaction modalities. Taking intelligent decisions is a...
We present OpS, a divide et impera algorithm that addresses the problem of pitch stylization as an optimization process in O(N log N). We aim at balancing the quality of the stylized curve and its cost in terms of the number of control points used. We also investigate how the occurrence of prominent syllables can be exploited to obtain less expensive sty...
In this paper we will investigate the usefulness of the rhythmogram, a speech rhythm representation based on the Auditory Primal Sketch model, for the automatic detection of prominent syllables. This representation was compared to other features usually used for this task and it showed a higher performance in the identification of prominent/non-pro...
Nonverbal behaviour influences to a significant extent our perception of others, especially during the earliest stages of an interaction. This article considers the phenomenon in two zero acquaintance scenarios: the first is the attribution of personality traits to speakers we listen to for the first time, the second is the social attractiveness of...
In this paper we introduce the €motion database, a multilingual emotional database consisting of emotional sentences elicited in four European languages: Italian, French, English and German. Along with this, a new set of features, containing both global and local prosodic features, for automatic classification of emotions is presented and their app...
In this paper we explore the usefulness of prosodic features for syllable classification. In order to do this, we represent the syllable as a static analysis unit such that its acoustic-temporal dynamics could be merged into a set of features that the SVM classifier will consider as a whole. In the first part of our experiment we used MFCC as featu...
In this paper a non-supervised approach for automatic syllable prominence recognition is presented. Previous research in this field showed that syllable nuclei energy and duration are the main cues for prominence detection. The role of the fundamental frequency has also been investigated in the past but was considered secondary or irrelevant for th...
We present an automatic procedure (RateEstimator) for computing speech rate by means of an algorithm that searches for syllable nuclei in the energy profile of the signal. Searching for syllable nuclei by detecting the peaks in the intensity profile of the signal, from which the speech rate measure is derived, repre...
In this paper we propose a formal approach to the generative design of artefacts. The founding idea, bridging the gap between the domain of architectural artefacts and the field of ontologies, is to represent the notion of species as it exists in the context of generative design by the concept of class existing in the field of formal ontologies. In...
In this report we present the system proposed by our group for the Evalita 2009 Connected Digits Recognition task and the results obtained. The recognition system uses the syllable as its base unit. In a first stage, the continuous speech sequence is divided into syllable-like units using an energy-based algorithm. Then, the obtained syllables are passed...
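The RateEstimator and Evalita 2009 abstracts above both start from the same idea: syllable nuclei show up as peaks in the signal's energy profile, and counting them over time gives a speech rate. The following is a minimal sketch of that general idea under stated assumptions. It is not the authors' RateEstimator or their Evalita segmentation algorithm. The 25 ms frames, 10 ms hop, 5-frame smoothing, 30% relative threshold and minimum peak distance (`frame_ms`, `hop_ms`, `smooth_frames`, `threshold_ratio`, `min_gap`) are illustrative assumptions, as are all function names.

```python
def short_time_energy(samples, frame_len, hop):
    """Mean squared amplitude of each analysis frame."""
    return [
        sum(x * x for x in samples[start:start + frame_len]) / frame_len
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def smooth(values, width):
    """Centred moving average; the window shrinks at the edges."""
    half = width // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def find_nuclei(energy, threshold_ratio=0.3, min_gap=5):
    """Frame indices of local energy maxima treated as syllable nuclei.

    Hypothetical rule: a peak must reach threshold_ratio * max(energy);
    peaks closer than min_gap frames are merged, keeping the stronger one.
    """
    if not energy:
        return []
    threshold = threshold_ratio * max(energy)
    peaks = []
    for i in range(1, len(energy) - 1):
        is_peak = energy[i] > energy[i - 1] and energy[i] >= energy[i + 1]
        if not is_peak or energy[i] < threshold:
            continue
        if peaks and i - peaks[-1] < min_gap:
            if energy[i] > energy[peaks[-1]]:
                peaks[-1] = i  # keep the stronger of two close peaks
        else:
            peaks.append(i)
    return peaks


def speech_rate(samples, sample_rate, frame_ms=25, hop_ms=10, smooth_frames=5):
    """Estimated syllables per second: detected nuclei / signal duration."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    energy = smooth(short_time_energy(samples, frame_len, hop), smooth_frames)
    return len(find_nuclei(energy)) / (len(samples) / sample_rate)
```

Given a mono signal as a list of floats, `speech_rate(samples, 16000)` returns an estimate in syllables per second. A fixed relative threshold is a deliberate simplification. Real recordings would likely need tuning, or extra checks such as voicing or intensity dips between peaks, before the count matches perceived syllables.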