Felice Dell'Orletta

Italian National Research Council (CNR) · Institute of Computational Linguistics "Antonio Zampolli" (ILC)

PhD

About

175 Publications
31,973 Reads
2,404 Citations
Introduction
Felice is a researcher at the Institute of Computational Linguistics "Antonio Zampolli" (ILC) of the CNR in Pisa, where he heads the ItaliaNLP Lab (www.italianlp.it). His research interests centre on probabilistic models of language and natural language processing, including part-of-speech tagging, dependency parsing, information extraction, readability assessment, native language identification and stylometry; more generally, his research focuses on machine-learning algorithms applied to NLP and on advanced techniques for multilingual text analysis. He has supervised undergraduate, graduate and PhD theses in different areas of Computational Linguistics, and since 2010 he has taught Computational Linguistics courses at the University of Pisa.
Additional affiliations
September 2011 - present
Italian National Research Council
Position
  • Researcher

Publications

Publications (175)
Preprint
AI-generated counterspeech offers a promising and scalable strategy to curb online toxicity through direct replies that promote civil discourse. However, current counterspeech is one-size-fits-all, lacking adaptation to the moderation context and the users involved. We propose and evaluate multiple strategies for generating tailored counterspeech t...
Preprint
Mobile app review analysis presents unique challenges due to the low quality, subjective bias, and noisy content of user-generated documents. Extracting features from these reviews is essential for tasks such as feature prioritization and sentiment analysis, but it remains a challenging task. Meanwhile, encoder-only models based on the Transformer...
Preprint
Full-text available
Large Language Models (LLMs) are increasingly used as "content farm" models (CFMs), to generate synthetic text that could pass for real news articles. This is already happening even for languages that do not have high-quality monolingual LLMs. We show that fine-tuning Llama (v1), mostly trained on English, on as little as 40K Italian news articles,...
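To make the setup described above concrete, here is a minimal sketch of fine-tuning a causal language model on a monolingual news corpus with the Hugging Face Trainer. The base checkpoint (gpt2 stands in for Llama v1, whose weights are gated), the file path italian_news.txt and all hyperparameters are illustrative assumptions, not the paper's actual configuration.

# Minimal sketch: continued training of a causal LM on Italian news text.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForCausalLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "gpt2"  # assumed stand-in checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Hypothetical corpus: one news article per line in a plain-text file.
dataset = load_dataset("text", data_files={"train": "italian_news.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(output_dir="cfm-italian", num_train_epochs=1,
                         per_device_train_batch_size=4)
trainer = Trainer(model=model, args=args, train_dataset=tokenized,
                  data_collator=collator)
trainer.train()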
Preprint
Automatic methods for generating and gathering linguistic data have proven effective for fine-tuning Language Models (LMs) in languages less resourced than English. Still, while there has been emphasis on data quantity, less attention has been given to its quality. In this work, we investigate the impact of human intervention on machine-generated d...
Conference Paper
The healthcare industry is experiencing an unprecedented era of transformation, driven by the proliferation of Electronic Health Records (EHRs) and the emergence of vast amounts of natural language data from sources like social media. This dynamic landscape presents novel opportunities for companies, healthcare practitioners, and researchers, under...
Article
Full-text available
This paper presents a study based on the linguistic profiling methodology to explore the relationship between the linguistic structure of a text and how it is perceived in terms of writing quality by humans. The approach is tested on a selection of Italian L1 learners' essays, which were taken from a larger longitudinal corpus of essays written by I...
Article
Purpose: The authors’ goal is to investigate variations in the writing style of book reviews published on different social reading platforms and referring to books of different genres, which enables acquiring insights into communication strategies adopted by readers to share their reading experiences. Design/methodology/approach: The authors propose...
Article
Full-text available
Introduction: Language is usually considered the social vehicle of thought in intersubjective communications. However, the relationship between language and high-order cognition seems to evade this canonical and unidirectional description (i.e., the notion of language as a simple means of thought communication). In recent years, clinical high at-risk...
Article
Full-text available
The outstanding performance recently reached by neural language models (NLMs) across many natural language processing (NLP) tasks has steered the debate towards understanding whether NLMs implicitly learn linguistic competence. Probes, i.e., supervised models trained using NLM representations to predict linguistic properties, are frequently adopted...
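As a minimal illustration of the probing paradigm described above, here is a sketch under explicit assumptions: multilingual BERT as the NLM, mean-pooled sentence vectors, and a toy binary linguistic property; none of this reproduces the paper's actual probing suite.

# Minimal diagnostic probe: a linear classifier over frozen NLM representations.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

def embed(sentence):
    # Mean-pool the last hidden layer into a fixed-size sentence vector.
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state
    return hidden.mean(dim=1).squeeze(0).numpy()

# Toy probing set labelled for a hypothetical property
# (1 = contains a subordinate clause, 0 = does not).
sentences = ["Penso che il gatto dorma.", "Il gatto dorme.",
             "Credo che domani piova.", "Domani piove."]
labels = [1, 0, 1, 0]

X = [embed(s) for s in sentences]
probe = LogisticRegression(max_iter=1000).fit(X, labels)
print("Probe training accuracy:", probe.score(X, labels))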
Article
Full-text available
The act of lying and its detection have raised interest in many fields, from the legal system to our daily lives. Considering that testimonies are commonly based on linguistic parameters, natural language processing, a research field concerned with programming computers to process and analyse natural language texts or speech, is a topic of interest...
Preprint
Full-text available
On the quest for interpreting deep Neural Language Models (NLMs), linguistically annotated probing sets are essential tools for investigating models' linguistic abilities. Such a line of research is currently dominated by dependency-based syntactic annotation formalisms, and particularly by Universal Dependencies treebanks. In this work, we test wh...
Article
Full-text available
Natural Language Processing (NLP) is a discipline at the intersection between Computer Science (CS), Artificial Intelligence (AI), and Linguistics that leverages unstructured human-interpretable (natural) language text. In recent years, it gained momentum also in health-related applications and research. Although preliminary, studies concerning Low...
Chapter
In this paper, we propose an evaluation of a Transformer-based punctuation restoration model for the Italian language. Experimenting with a BERT-base model, we perform several fine-tuning runs with different training data and sizes and test them in in- and cross-domain scenarios. Moreover, we conduct an error analysis of the main weaknesses of the...
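A minimal sketch of how punctuation restoration can be framed as token classification, in the spirit of the chapter above; the label set, the data-preparation heuristic and the Italian BERT checkpoint name are assumptions made purely for illustration.

from transformers import AutoTokenizer, AutoModelForTokenClassification

LABELS = ["O", "COMMA", "PERIOD", "QUESTION"]

def to_training_example(punctuated):
    # Strip punctuation and emit, for each remaining word, the label of the
    # punctuation mark (if any) that followed it.
    words, labels = [], []
    for token in punctuated.split():
        if token.endswith(","):
            words.append(token[:-1]); labels.append("COMMA")
        elif token.endswith("?"):
            words.append(token[:-1]); labels.append("QUESTION")
        elif token.endswith("."):
            words.append(token[:-1]); labels.append("PERIOD")
        else:
            words.append(token); labels.append("O")
    return words, labels

print(to_training_example("Domani, se possibile, arrivo presto."))

checkpoint = "dbmdz/bert-base-italian-cased"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(checkpoint, num_labels=len(LABELS))
# The model would then be fine-tuned on (words, labels) pairs aligned to subword tokens.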
Article
Full-text available
In this paper, we present an in-depth investigation of the linguistic knowledge encoded by the transformer models currently available for the Italian language. In particular, we investigate how the complexity of two different architectures of probing models affects the performance of the Transformers in encoding a wide spectrum of linguistic featur...
Preprint
Transformer-based language models are known to display anisotropic behavior: the token embeddings are not homogeneously spread in space, but rather accumulate along certain directions. A related recent finding is the outlier phenomenon: the parameters in the final element of Transformer layers that consistently have unusual magnitude in the same di...
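The anisotropy mentioned above can be estimated very simply as the mean pairwise cosine similarity between contextual token embeddings; the sketch below uses bert-base-uncased and a handful of sentences purely as an assumed example, not the preprint's experimental setup.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = ["The cat sleeps on the mat.",
             "Stock prices fell sharply yesterday.",
             "She plays the violin beautifully."]

reps = []
for s in sentences:
    inputs = tok(s, return_tensors="pt")
    with torch.no_grad():
        reps.append(model(**inputs).last_hidden_state.squeeze(0))

tokens = torch.cat(reps, dim=0)                 # all contextual token vectors
normed = F.normalize(tokens, dim=-1)
sim = normed @ normed.T                         # pairwise cosine similarities
n = sim.shape[0]
anisotropy = (sim.sum() - n) / (n * (n - 1))    # mean off-diagonal similarity
print(f"Estimated anisotropy: {anisotropy.item():.3f}")  # an isotropic space would be near 0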
Article
Full-text available
In this paper, we present an overview of existing parallel corpora for Automatic Text Simplification (ATS) in different languages focusing on the approach adopted for their construction. We make the main distinction between manual and (semi)–automatic approaches in order to investigate in which respect complex and simple texts vary and whether and...
Chapter
The paper illustrates the results of a first experiment in which Natural Language Processing was used to support the revision of a children’s dictionary, in particular for what concerns style and wording of definitions and the enrichment of the list of lemmas. The results achieved are promising and demonstrate the potential of a synergy to be stren...
Article
In this paper, we propose a comprehensive linguistic study aimed at assessing the implicit behaviour of one of the most prominent Neural Language Models (NLMs) based on Transformer architectures, BERT [15], when dealing with a particular source of noisy data, namely essays written by L1 Italian learners containing a variety of errors targeting gramma...
Article
Full-text available
In recent years, the explainable artificial intelligence (XAI) paradigm is gaining wide research interest. The natural language processing (NLP) community is also approaching the shift of paradigm: building a suite of models that provide an explanation of the decision on some main task, without affecting the performances. It is not an easy job for...
Conference Paper
In recent years, the paradigm of eXplainable Artificial Intelligence (XAI) systems has gained wide research interest and beyond. The Natural Language Processing (NLP) community is also approaching this new way of understanding AI applications: building a suite of models that provide an explanation for the decision, without affecting performance. Th...
Article
Full-text available
In this study we present a Natural Language Processing (NLP)-based stylometric approach for tracking the evolution of written language competence in Italian L1 learners. The approach relies on a wide set of linguistically motivated features capturing stylistic aspects of a text, which were extracted from students’ essays contained in CItA (Corpus I...
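A minimal, self-contained sketch of the kind of linguistically motivated surface features such a stylometric approach can rely on; the specific features and the toy essay below are illustrative assumptions, not the feature set actually extracted from CItA.

import re

def stylometric_features(text):
    # A few shallow, linguistically motivated indicators of writing style.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    return {
        "n_sentences": len(sentences),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
        "type_token_ratio": len(set(words)) / max(len(words), 1),
    }

essay = "Oggi ho giocato al parco. Poi sono tornato a casa e ho fatto i compiti."
print(stylometric_features(essay))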
Conference Paper
In this paper, we present our approach to the task of binary sentiment classification for Italian reviews in the healthcare domain. We first collected a new dataset for this domain. Then, we compared the results obtained by two different systems, one including a Support Vector Machine and one with BERT. For the first one, we linguistically pre-processed th...
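A minimal sketch of the SVM side of such a comparison, using a TF-IDF pipeline over toy Italian healthcare reviews; the example reviews and the preprocessing are assumptions, and the paper's actual system additionally relied on dedicated linguistic pre-processing.

from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy labelled reviews (1 = positive, 0 = negative), purely illustrative.
reviews = ["Medici gentili e visita accurata.",
           "Attesa infinita e personale scortese.",
           "Struttura pulita, sono stato seguito benissimo.",
           "Esperienza pessima, non ci tornerò."]
labels = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(reviews, labels)
print(clf.predict(["Personale molto disponibile e cortese."]))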
Preprint
Full-text available
An ongoing debate in the NLG community concerns the best way to evaluate systems, with human evaluation often being considered the most reliable method, compared to corpus-based metrics. However, tasks involving subtle textual differences, such as style transfer, tend to be hard for humans to perform. In this paper, we propose an evaluation method...
Article
Full-text available
The paper illustrates a novel methodology meeting a twofold goal, namely quantifying the reliability of automatically generated dependency relations without using gold data on the one hand, and identifying which are the linguistic constructions negatively affecting the parser performance on the other hand. These represent objectives typically inves...
Article
Tender documents contain a large quantity of technical, legal and managerial information mixed in a nested and complex net of relationships. Extracting technical and design information from a document whose aim is both legal and technical, and that is written using several specific jargons, is not a trivial task: the purpose of the rese...
Preprint
Full-text available
In this paper we investigate the linguistic knowledge learned by a Neural Language Model (NLM) before and after a fine-tuning process and how this knowledge affects its predictions during several classification problems. We use a wide set of probing tasks, each of which corresponds to a distinct sentence-level feature extracted from different level...
Article
Introduction: The term pro-ana (pro-anorexia) refers to the spread of restrictive eating behaviors and anorectic advice in virtual spaces written by teenagers. The purpose of this pilot study is a qualitative and quantitative analysis of the foods mentioned in a linguistic corpus made up of users' comments on pro-ana websites. Method: The corp...
Conference Paper
Full-text available
The ongoing phenomenon of digitisation is changing social and work life, with tangible effects on the socioeconomic context. Understanding the impact, opportunities, and threats of digital transformation requires the identification of viewpoints from a large diversity of stakeholders, from policy makers to domain experts, and from engineers to com...
Article
Full-text available
Moving from the assumption that formal, rather than content features, can be used to detect differences and similarities among textual genres and registers, this paper presents a new approach to linguistic profiling – a well-established methodological framework to study language variation – which is applied to detect significant variations within t...
Preprint
In the last few years, pre-trained neural architectures have provided impressive improvements across several NLP tasks. Still, generative language models are available mainly for English. We develop GePpeTto, the first generative language model for Italian, built using the GPT-2 architecture. We provide a thorough analysis of GePpeTto's quality by...
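A usage sketch of a GPT-2-style Italian generator via the transformers text-generation pipeline; the Hub identifier below is an assumption based on the public release of GePpeTto and should be replaced with the official one if it differs.

from transformers import pipeline

# Assumed model identifier; substitute the official release if it differs.
generator = pipeline("text-generation", model="LorenzoDeMattei/GePpeTto")
out = generator("La linguistica computazionale", max_length=40, num_return_sequences=1)
print(out[0]["generated_text"])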
Chapter
The Topic, Age, and Gender (TAG-it) prediction task in Italian was organised in the context of EVALITA 2020, using forum posts as textual evidence for profiling their authors. The task was articulated in two separate subtasks: one where all three dimensions (topic, gender, age) were to be predicted at once; the other where training and test sets we...
Chapter
Full-text available
In this paper, we present our approach to the task of binary sentiment classification for Italian reviews in the healthcare domain. We first collected a new dataset for this domain. Then, we compared the results obtained by two different systems, one including a Support Vector Machine and one with BERT. For the first one, we linguistically pre-processed t...
Chapter
In this paper we present an in-depth investigation of the linguistic knowledge encoded by the transformer models currently available for the Italian language. In particular, we investigate whether and how using different architectures of probing models affects the performance of Italian transformers in encoding a wide spectrum of linguistic feature...
Article
Contrary to what happens in forecasting, in which the repetitive nature of events lends itself to the ex post validation of expert judgments, it is usually very difficult to compare directly the forecast of technology foresight studies with realized outcomes. When the comparison is feasible, therefore, there is large opportunity for learning and me...
Article
Full-text available
Background Cancer cells are characterized by chromosomal instability (CIN) and it is thought that errors in pathways involved in faithful chromosome segregation play a pivotal role in the genesis of CIN. Cohesin forms a large protein ring that binds DNA strands by encircling them. In addition to this central role in chromosome segregation, cohesin...