
Stefano Ferilli
- PhD
- Professor (Full) at University of Bari Aldo Moro
About
390
Publications
59,488
Reads
2,616
Citations
Additional affiliations
October 2006 - present
Publications (390)
The increasing integration of Artificial Intelligence (AI) in education has led to the development of innovative tools like Intelligent Question-Answering Systems (IQASs), aiming to revolutionize traditional learning paradigms. However, many existing IQASs struggle with the nuances of natural language and the complexities of student questions. This...
This research introduces an artificial intelligence-based strategy for improving symbol recognition within the field of library science, concentrating on the creation and application of sophisticated technological solutions. Consistent with the objectives of the CHANGES project—Cultural Heritage Active Innovation for Sustainable Society, which focu...
The traditional record-based approach to the description of Cultural Heritage is nowadays obsolete. It is unable to properly handle complex descriptions and it cannot support advanced functions provided by Artificial Intelligence techniques for helping practitioners, scholars, researchers and end-users in carrying out their tasks. A graph-based, se...
Graph reachability is the task of understanding whether two distinct points in a graph are interconnected by arcs to which in general a semantic is attached. Reachability has plenty of applications, ranging from motion planning to routing. Improving reachability requires structural knowledge of relations so as to avoid the complexity of traditional...
Ontologies are essential for the management and integration of heterogeneous datasets. This paper presents OntoBuilder, an advanced tool that leverages the structural capabilities of semantic labeled property graphs (SLPGs) in strict alignment with semantic web standards to create a sophisticated framework for data management. We detail OntoBuilder...
Process mining can be applied to systems for the management of Workflow, Business Processes and, in general, Process-Aware Information to discover and analyse implicit processes. In recent times, semantic interoperability has also become of crucial importance in the area of business processes. In particular, interoperability enables the discovery o...
This paper introduces Semantic Knowledge Advanced Tool for Extraction Browsing Organisation Annotation Retrieval and Discovery (SKATEBOARD), a tool designed to facilitate knowledge exploration through the application of semantic technologies. The demand for advanced solutions that streamline Knowledge Extraction, management, and visualisation, char...
Digital media have enabled the access to unprecedented literary knowledge. Authors, readers, and scholars are now able to discover and share an increasing amount of information about books and their authors. However, these sources of knowledge are fragmented and do not adequately represent non-Western writers and their works. In this paper we prese...
With the progressive improvements in the power, effectiveness, and reliability of AI solutions, more and more critical human problems are being handled by automated AI-based tools and systems. For more complex or particularly critical applications, the level of knowledge, not just information, must be handled by systems where explicit relationships...
(Extended Abstract) While most previous research focused only on the textual content of documents, advanced support for document management in Digital Libraries, for Open Science, requires handling all aspects of a document: from structure, to content, to context. These different but inter-related aspects cannot be handled separately, and were trad...
In the era of big data, linked data interfaces play a critical role in enabling access to and management of large-scale, heterogeneous datasets. This survey investigates forty-seven interfaces developed by the semantic web community in the context of the Web of Linked Data, displaying information about general topics and digital library contents. T...
The increasing scale and pace of the production of digital documents have generated a need for automatic tools to analyze documents and extract underlying concepts and knowledge in order to help humans manage information overload. Specifically, since most information comes in the form of text, natural language processing tools are needed that are a...
A significant part of the current research in the field of Artificial Intelligence is devoted to knowledge bases. New techniques and methodologies are emerging every day for the storage, maintenance and reasoning over knowledge bases. Recently, the most common way of representing knowledge bases is by means of graph structures. More specifically, a...
The pervasive use of AI today has caused an urgent need for human-compliant AI approaches and solutions that can explain their behavior and decisions in human-understandable terms, especially in critical domains, so as to enforce trustworthiness and support accountability. The symbolic/logic approach to AI supports this need because it aims at reproduc...
The article provides a factual foundation for the possibility of organizing and implementing e-learning in Ukrainian higher educational institutions during the war. The current research topicality is supported by the urgent need for training experience, organization and implementation during wartime because of the fact that both the educational pro...
The traditional record-based approach used in Digital Libraries has gone as far as it could. To support the needs and activities of different kinds of users, we propose a graph-based ‘holistic’ representation of DL knowledge, describing the documents’ metadata, physical aspects, content, context and lifecycle. We also propose its implementation bas...
While most previous research focused only on the textual content of documents, advanced support for document management in digital libraries, for open science, requires handling all aspects of a document: from structure, to content, to context. These different but inter-related aspects cannot be handled separately and were traditionally ignored in...
The state of the art in Artificial Intelligence, the Internet, and the computational power reached by current technologies make it possible to envision Intelligent Tutoring Systems much more advanced than their original definition. The KEPLAIR project envisages an online platform, designed to help all players involved in the educational endeavors, especi...
Capturing and analyzing interaction data in real-time from development environments can help in understanding how programmers handle coding activities. We propose the use of process mining to learn coding behavior from event logs captured from a customized Integrated Development Environment, concerning interactions with both such an environment and...
Anaphora resolution is a crucial task for information extraction. Syntax-based approaches are based on the syntactic structure of sentences. Knowledge-poor approaches aim at avoiding the need for further external resources or knowledge to carry out their task. This paper proposes a knowledge-poor, syntax-based approach to anaphora resolution in Eng...
In Probabilistic Abductive Logic Programming we are given a probabilistic logic program, a set of abducible facts, and a set of constraints. Inference in probabilistic abductive logic programs aims to find a subset of the abducible facts that is compatible with the constraints and that maximizes the joint probability of the query and the constraint...
Ontologies, and especially formal ones, have traditionally been investigated as a means to formalize an application domain so as to carry out automated reasoning on it. The union of the terminological part of an ontology and the corresponding assertional part is known as a Knowledge Graph. On the other hand, database technology has often focused on...
Tools for Natural Language Processing rely on linguistic resources that are language-specific. The complexity of building such resources causes many languages to lack them. So, learning them automatically from sample texts would be a desirable solution. This usually requires huge training corpora, which are not available for many local language...
Since 2005, the Italian Research Conference on Digital Libraries (IRCDL) has been a yearly event for researchers on Digital Libraries and related topics, organized by the Italian Research Community. Over the years, IRCDL has become an essential national forum focused on digital libraries and associated technical, practical, and social issues. IRCDL encompasses the...
In its original definition, the Abstract Argumentation framework considers atomic claims and a binary attack relationship among them, based on which different semantics would select subsets of claims consistently supporting the same position in a dispute or debate. While attack is obviously the core relationship in this setting, in more complex (an...
Natural Language Processing tools use language-specific linguistic resources that might be unavailable for many languages. Since manually building them is complex, it would be desirable to learn these resources automatically from sample texts. In this paper we focus on stopwords, i.e., terms which are not relevant to understand the topic and conte...
The possibility of inter-relating different information items is crucial in the perspective of enhanced storage, handling and fruition of knowledge. GraphBRAIN is a general-purpose tool that allows users to design and collaboratively populate knowledge graphs, and provides advanced solutions for their fruition, consultation and analysis. Its functionalit...
Financial analysts constitute an important element of financial decision-making in stock exchanges throughout the world. By leveraging on argumentative reasoning, we develop a method to predict financial analysts’ recommendations in earnings conference calls (ECCs), an important type of financial communication. We elaborate an analysis to select th...
New technologies for storing and handling knowledge provide unprecedented opportunities for enhanced fruition of digital libraries and archives. Going beyond document retrieval based on lexical content or metadata, using the context of documents, and/or of their content, may provide very new ways to put them in perspective and grasp a deeper unders...
Process Mining and Automated Process Management have become a hot research topic in recent years. While most state-of-the-art approaches were developed for very specific and constrained application domains, where also the available data were limited, the WoMan framework is also able to deal with non-standard application domains (such as human routi...
This book constitutes the thoroughly refereed proceedings of the 16th Italian Research Conference on Digital Libraries, IRCDL 2020, held in Bari, Italy, in January 2020.
The 12 full papers and 6 short papers presented were carefully selected from 26 submissions. The papers are organized in topical sections on information retrieval, big data and dat...
Welcome to EVALITA 2020! EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for Italian. EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC, http://www.ai-lc.it) and it is endorsed by the Italian Association for Artificial Intelligence (AIxIA, http://www.aixia.it) and the Italian...
Most people believe that their choices are based only on a rational analysis of available alternatives. Actually, emotions affect our choices and are activated as feedback during the decision process. In Recommender Systems research, preference learning is an important element to produce good recommendations. Preferences can be acquired by explicit...
Positive-Unlabeled (PU) learning works by considering a set of positive samples, and a (usually larger) set of unlabeled ones. This challenging setting requires algorithms to cleverly exploit dependencies hidden in the unlabeled data in order to build models able to accurately discriminate between positive and negative samples. We propose to exploi...
Process Management techniques are useful in domains where the availability of a (formal) process model can be leveraged to monitor, supervise, and control a production process. While their classical application is in the business and industrial fields, other domains may profitably exploit Process Management techniques. Some of these domains (e.g.,...
Emotions have an impact on almost all decisions. They affect our choices and are activated as feedback during the decision process. This work aims at investigating whether behavior patterns can be learned and used to predict the user’s choice. Specifically, we focused on pairwise image comparisons in a preference elicitation experiment, and exploit...
Since most content in Digital Libraries and Archives is text, there is an interest in the application of Natural Language Processing (NLP) to extract valuable information from it in order to support various kinds of user activities. Most NLP techniques exploit linguistic resources that are language-specific, costly and error prone to produce manual...
In Abstract Argumentation, the task of modeling and analyzing semantics is a hot problem. An alternative representation of computational models of argument, based on the matrix theory, is proposed, in order to obtain a deeper understanding of extension-based semantics and of ranking semantics too. In this paper, we start from the concept of matrix...
Real-world problems often require purely deductive reasoning to be supported by other techniques that can cope with noise in the form of incomplete and uncertain data. Abductive inference tackles incompleteness by guessing unknown information, provided that it is compliant with given constraints. Probabilistic reasoning tackles uncertainty by weake...
Noisy (uncertain, missing, or inconsistent) information, typical of many real-world domains, may dramatically affect the performance of logic-based Machine Learning. Multistrategy Learning approaches have been tried to solve this problem by coupling Inductive Logic Programming with other kinds of inference. While uncertainty has been tackled using...
Cutset Networks (CNets) are density estimators leveraging context-specific independencies recently introduced to provide exact inference in polynomial time. Learning a CNet is done by firstly building a weighted probabilistic OR tree and then estimating tractable distributions as its leaves. Specifically, selecting an optimal OR split node requires...
Natural Language Processing techniques are of utmost importance for the proper management of Digital Libraries. These techniques are based on language-specific linguistic resources, that might be unavailable for many languages. Since manually building them is costly, time-consuming and error-prone, it would be desirable to learn these resources aut...
Analogy is the cognitive process of matching the characterizing features of two different items. This may enable reuse of knowledge across domains, which can help to solve problems. Indeed, abstracting the ‘role’ of the features away from their specific embodiment in the single items is fundamental to recognize the possibility of an analogical mapp...
Sentence-based extractive summarization aims at automatically generating shorter versions of texts by extracting from them the minimal set of sentences that are necessary and sufficient to cover their content. Providing effective solutions to this task would allow the users to save time in selecting the most appropriate documents to read for satisf...
Discussions on social Web platforms carry a lot of information which is more and more difficult to analyze. Given a virtual community of users that discuss a particular topic of interest, an important task is to extract a model of the whole debate in order to automatically evaluate what are the most reliable claims. This paper proposes to approach...
In the phase of evaluation of accepted arguments, one may find that not all the arguments of discussion are essential when drawing conclusions. Especially when the cardinality of the set of arguments is high, the task of identifying the most relevant arguments of the whole discussion in huge Argument Systems through the analysis of its synthesis ma...
This paper aims at studying complex behaviors of search and rescue robots in emergency situations. We used NetProLogo as the simulation environment in order to: i) build a simulated scenario with robots, human beings, and emergency exits, ii) endow robots with reasoning rules, and iii) evaluate robots' behavior on the basis of two search strateg...
Several high-level tasks in the management of Digital Libraries require the application of Natural Language Processing (NLP) techniques. In turn, most NLP solutions are based on linguistic resources that are costly to produce, and so motivate research for automated ways to build them. In particular, Language Identification is a crucial NLP task, th...
The main objective of this paper is checking whether, and to what extent, advanced process mining techniques can support efficient and effective knowledge discovery in complex domains. This is done on chess playing, cast as a process. A secondary objective is checking whether the discovered information can provide interesting insight in the game ru...
In addition to the classical exploitation of process models for checking process enactment conformance, a very relevant but almost neglected task concerns the prediction of which activities will be carried out next at a given moment during process execution. The outcomes of this task may make it possible to save time and money by taking suitable actions that...
In addition to the classical exploitation as a means for checking process enactment conformance, process models may be used to predict which activities will be carried out next. The prediction performance may provide indirect indications on the correctness and reliability of a process model. This paper proposes a strategy for activity prediction us...
Computational models of argument aim at engaging argumentation-related activities with human users. In the present work we propose a new generalized version of abstract argument system, called Trust-affected Bipolar Weighted Argumentation Framework (T-BWAF). In this framework, two mainly interacting components are exploited to reason about the ac...
Medical diagnosis in general is a hard task, requiring significant skill and expertise. Psychological diagnosis, in particular, is peculiar for several reasons: since the illness is mental rather than physical, no instrumental measurements can be done, more subjectivity is involved in the diagnostic process, and there is more chance of comorbidity....
While nowadays most newspapers are born-digital (typeset directly in PDF), up to a few years ago they were only available in printed form. Digitizing the paper artifact to make it available in digital libraries yields a sequence of raster images of the pages that make up the documents. Such images consist of just matrices of pixels, and carry no ex...
The possibility for people to leave comments in blogs and forums on the Internet makes it possible to study their attitude (in terms of valence or even of specific feelings) on various topics. For some digital libraries this may be a precious opportunity to understand how their content is perceived by their users and, as a consequence, to suitably direct thei...
When used as an interface in the context of Ambient Assisted Living (AAL), a social robot should not just provide a task-oriented support. It should also try to establish a social empathic relation with the user. To this aim, it is crucial to endow the robot with the capability of recognizing the user's affective state and reason on it for triggeri...
Several high-level tasks in the management of Digital Libraries require the application of Natural Language Processing (NLP) techniques. In turn, most NLP solutions are based on linguistic resources that are costly to produce, and so motivate research for automated ways to build them. In particular, Language Identification is a crucial NLP task, that...
This book constitutes the refereed proceedings of the 16th International Conference of the Italian Association for Artificial Intelligence, AI*IA 2017, held in Bari, Italy, in November 2017.
The 37 full papers presented were carefully reviewed and selected from 91 submissions. The papers are organized in topical sections on applications of AI; nat...
This book constitutes the thoroughly refereed proceedings of the 12th Italian Research Conference on Digital Libraries, IRCDL 2016, held in Florence, Italy, in February 2016.
The 15 papers presented were carefully selected from 23 submissions and cover topics such as formal methods, long-term preservation, metadata creation, management and curation,...
In addition to the classical exploitation as a means for checking process enactment conformance, process models may be precious for making various kinds of predictions about the process enactment itself (e.g., which activities will be carried out next, or which of a set of candidate processes is actually being executed). These predictions may be mu...
This paper describes our first participation in the EVALITA challenge. We participated only in the SENTIPOLC Sentiment Polarity subtask and, for this purpose, tested two systems in the context of sentiment analysis, both developed for a generic Text Categorization task: SentimentWS and SentiPy. Both were developed according to t...
Building a diversified portfolio is an appealing strategy in the analysis of stock market dynamics. It aims at reducing risk in market capital investments. Grouping stocks by similar latent trend can be cast into a clustering problem. The classical K-Means clustering algorithm does not fit the task of financial data analysis. Hence, we investigate No...
Speaker identification can be summarized as the classification task that determines if two voices were spoken by the same person or not. It is a thoroughly studied topic, since it has applications in many fields. One is forensic phonetics, considered very hard since the expert has to face ambient noise, very short recordings, interference, loss of...
Three relevant areas of interest in symbolic Machine Learning are incremental supervised learning, multistrategy learning and predicate invention. In many real-world tasks, new observations may point out the inadequacy of the learned model. In such a case, incremental approaches allow adjusting it, instead of learning a new model from scratch. Spec...
Workflow management is fundamental to efficiently, effectively and economically carry out complex processes. In turn, the formalism used for representing workflow models is crucial for effectiveness. The formalism introduced by the WoMan framework for workflow management, based on First-Order Logic, is more expressive than standard formalisms adopt...
WorkFlow Management Systems provide automatic support to learn process models or to check compliance of process enactment to correct models. The expressive power of the adopted formalism for representing process models is fundamental to determine the effectiveness or even feasibility of a correct model. In particular, a desirable feature is the pos...
Author identification is a hot topic, especially in the Internet age. Following our previous work in which we proposed a novel approach to this problem, based on relational representations that take into account the structure of sentences, here we present a tool that computes and visualizes a numerical and graphical characterization of the authors/...
Analogy is the cognitive process of matching the characterizing features of two different items. This may enable reuse of knowledge across domains, which can be helpful to solve problems. Analogy is strongly related to semantics, because the mappings are based on the role and meaning of the features, which goes beyond simple syntactic association....
The possibility for people to leave comments in blogs and forums on the Internet makes it possible to study their attitude (in terms of valence or even of specific feelings) on various topics. For some digital libraries this may be a precious opportunity to understand how their content is perceived by their users and, as a consequence, to suitably direct the...
EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for the Italian language: since 2007 shared tasks have been proposed covering the analysis of both written and spoken language with the aim of enhancing the development and dissemination of resources and technologies for Italian. EVALITA is an initiative of the Itali...
Current tools to create OWL-S annotations have been designed starting from the knowledge engineer’s point of view. Unfortunately, the formalisms underlying Semantic Web languages are often obscure to the developers of Web services. To bridge this gap, it is desirable that developers are provided with suitable tools that do not necessarily require k...
Predicate Invention aims at discovering new emerging concepts in a logic theory. Since there is usually a combinatorial explosion of candidate concepts to be invented, only those that are really relevant should be selected, which cannot be done manually due to the huge number of candidates. While purely logical automatic approaches may be too rigid...
This paper proposes an architecture for agents that are in charge of handling a given environment in an Ambient Intelligence context, ensuring suitable contextualized and personalized support to the user’s actions, adaptivity to the user’s peculiarities and to changes over time, and automated management of the environment itself. Functionality invo...
Several studies report successful results on how social assistive robots can be employed as interface in the assisted living domain. In this domain, a natural way to interact with robots is to use speech. However, humans often use particular intonation in the voice that can change the meaning of the sentence. For this reason, a social assistive r...
The availability on the Internet of huge amounts of blog posts, messages and comments makes it possible to study the attitude of people on various topics. Sentiment Analysis, Opinion Mining and Emotion Analysis denote the area of research in Computer Science aimed at studying, analyzing and classifying text documents based on the underlying opinions expressed...
Several studies report successful results on how social assistive robots can be employed as interface in the assisted living domain. In our opinion, to plan their response and interact successfully with people, it is crucial to recognize human emotions. To this aim, features of the prosody of the speech together with facial expressions and gestures...
Predicate Invention is the branch of symbolic Machine Learning aimed at discovering new emerging concepts in the available knowledge. The outcome of this task may have important consequences on the efficiency and effectiveness of many kinds of exploitation of the available knowledge. Two fundamental problems in Predicate Invention are how to handle...