Johanna Monti

University of Naples "L'Orientale" | IUO · Dipartimento di Studi Letterari, Linguistici e Comparati

PhD

About

87
Publications
21,072
Reads
536
Citations
Introduction
Johanna Monti is Associate Professor of Modern Languages Teaching at the University of Naples "L'Orientale". She was the Computational Linguistics Research Manager of the Thamus Consortium (Italy). She received her PhD in Computational Linguistics from the University of Salerno, Italy. Her research focuses on hybrid approaches to Machine Translation and NLP applications.
Additional affiliations
May 2016 - May 2020
University of Naples "L'Orientale"
Position
  • Professor (Associate)

Publications (87)
Chapter
The investigation of phraseology through corpus-based and computational approaches holds significant relevance for various professionals, including translators, interpreters, terminologists, lexicographers, language instructors, and learners. Computational Phraseology, and in particular the computational analysis of multiword expressions (also know...
Conference Paper
Terminology translation plays a significant role in domain-specific machine translation. However, some knowledge domains and languages still suffer from the lack of high-quality machine translation results due to the mistranslation of terminology. This is the case in the legal domain and the Arabic language. Most machine translation systems fail in...
Article
The lack of annotated datasets affects the development of Natural Language Processing applications and heavily impacts the access to textual data, in particular for specific domains and specific languages. In this paper, we propose a methodology to annotate texts concerning domain-specific knowledge, to provide a reliable source of data for the tas...
Article
Full-text available
Learning idiomatic expressions is seen as one of the most challenging stages in second-language learning because of their unpredictable meaning. A similar situation holds for their identification within natural language processing applications such as machine translation and parsing. The lack of high-quality usage samples exacerbates this challenge...
Preprint
Full-text available
Languages differ in terms of the absence or presence of gender features, the number of gender classes and whether and where gender features are explicitly marked. These cross-linguistic differences can lead to ambiguities that are difficult to resolve, especially for sentence-level MT systems. The identification of ambiguity and its subsequent reso...
Chapter
Inspired by the historical models of artificial and auxiliary languages, Emojitaliano is the result of a social and crowdsourcing experiment which was conducted by a group of seventeen translators, followers of the “Scritture brevi” blog, and led to the creation of an international language based on emojis. The experiment was carried out during 201...
Preprint
Full-text available
This report presents an analysis of #hashtags used by Italian Cultural Heritage institutions to promote and communicate cultural content during the COVID-19 lockdown period in Italy. Several activities to support and engage users have been proposed using social media. Most of these activities present one or more #hashtags which help to aggregate...
Chapter
Computational Stylometry develops techniques that allow scholars to find out information about authors of texts by means of an automatic stylistic analysis. Indeed, each author’s style is unique, and no two authors are characterized by the same set of stylistic features. Several scholars focus on the analysis of different stylistic features and spe...
Preprint
Learning idiomatic expressions is seen as one of the most challenging stages in second language learning because of their unpredictable meaning. A similar situation holds for their identification within natural language processing applications such as machine translation and parsing. The lack of high-quality usage samples exacerbates this challenge...
Chapter
Terminological resources, invaluable tools for language experts, translators, learners, among others, are widely employed in many applicative scenarios from Machine Translation (MT) to Natural Language Processing (NLP). Automatic terminology extraction from unstructured texts represents a useful, yet non-trivial task, in order to create terminologi...
Experiment Findings
https://uniornlp.carto.com/builder/04f2cca9-08cd-4b9f-90cd-79fc0d93af42/embed A first map with preliminary results on alerts concerning illegal fires posted on Twitter between 2013 and 2020. This map was built by our research group using a model that is able to discriminate between alert and no-alert tweets on the basis of an annotated subsection of UNI...
Conference Paper
Full-text available
In this paper, we describe the UniOr ExpSys team's participation in the TRAC-2 (Trolling, Aggression and Cyberbullying) shared task, a workshop organized as part of LREC 2020. The TRAC-2 shared task is organized in two sub-tasks: Aggression Identification (a 3-way classification between "Overtly Aggressive", "Covertly Aggressive" and "Non-aggressive" text data)...
Conference Paper
Full-text available
In this paper, we present a web service platform for disinformation detection in hotel reviews written in English. The platform relies on a hybrid approach of computational stylometry techniques, machine learning and linguistic rules written using COGITO, Expert System Corp.'s semantic intelligence software thanks to which it is possible to analyze...
Conference Paper
In this paper, we describe a Telegram bot, Mago della Ghigliottina (Ghigliottina Wizard), able to solve the game La Ghigliottina (The Guillotine), the final game of the Italian TV quiz show L'Eredità. Our system relies on linguistic resources and artificial intelligence and achieves better results than human players (and competitors of L'Eredità too)....
Preprint
Full-text available
In this paper, we describe the UniOr ExpSys team's participation in the TRAC-2 (Trolling, Aggression and Cyberbullying) shared task, a workshop organized as part of LREC 2020. The TRAC-2 shared task is organized in two sub-tasks: Aggression Identification (a 3-way classification between "Overtly Aggressive", "Covertly Aggressive" and "Non-aggressive" text data)...
Preprint
Full-text available
In this paper, we present a web service platform for disinformation detection in hotel reviews written in English. The platform relies on a hybrid approach of computational stylometry techniques, machine learning and linguistic rules written using COGITO, Expert System Corp.'s semantic intelligence software thanks to which it is possible to analyz...
Preprint
Full-text available
In this paper, we describe a Telegram bot, Mago della Ghigliottina (Ghigliottina Wizard), able to solve the game La Ghigliottina (The Guillotine), the final game of the Italian TV quiz show L'Eredità. Our system relies on linguistic resources and artificial intelligence and achieves better results than human players (and competitors of L'Eredità too)....
Chapter
This paper describes Il mago della Ghigliottina, a bot which took part in the Ghigliottin-AI task of the Evalita 2020 evaluation campaign. The aim is to build a system able to solve the TV game “La Ghigliottina”. Our system has already participated in the Evalita 2018 task NLP4FUN. Compared to that occasion, it improved its accuracy from 61% to 68....
Chapter
Evaluating Artificial Players for the Language Game “La Ghigliottina” (Ghigliottin-AI) task is one of the tasks organized in the context of the 2020 EVALITA edition, a periodic evaluation campaign of Natural Language Processing (NLP) and speech tools for the Italian language. Ghigliottin-AI participants are asked to build an artificial player able...
Chapter
This paper presents the results of an evaluation of Google Translate, DeepL and Bing Microsoft Translator with reference to natural gender translation and provides statistics about the frequency of female, male and neutral forms in the translations of a list of personality adjectives, and nouns referring to professions and bigender nouns. The evalu...
Chapter
“Spotted” posts represent one of the most popular forms of Computer-mediated Communication (CMC) among university students in Italy, and as such, they represent a privileged context to analyze the Italian language used by students on the Web. This kind of informal communication channel is active especially on Instagram, and provides relevant insig...
Chapter
In this paper, we present the Archaeo-Term Project, along with one of its first efforts in enhancing multilingual access to archaeological data, making available a resource of archaeological terms within the framework of the YourTerm CULT project. In order to enhance and promote the use of a terminological common ground across different languages the A...
Chapter
This paper presents the results of research carried out on the UNIOR Eye corpus, a corpus which has been built by downloading tweets related to environmental crimes. The corpus is made up of 228,412 tweets organized into four different subsections, each one concerning a specific environmental crime. For the current study we focused on the subsectio...
Conference Paper
Full-text available
In this paper, we show the results of a stylometric analysis conducted on Paul McCartney's interview transcriptions using three different approaches in order to detect differences and similarities in his speeches before and after 9th November 1966, the date of his supposed death. Our research is based on the Let IT Corpus, a corpus of Paul McCartne...
Article
Full-text available
The paper describes the PARSEME-It corpus, developed within the PARSEME-It project which aims at the development of methods, tools and resources for multiword expressions (MWE) processing for the Italian language. The project is a spin-off of a larger multilingual project for more than 20 languages from several language families, namely the PARSEME...
Conference Paper
Full-text available
The MUMTTT workshop will be held on the last day of the Europhras'2019 conference, namely on 27th September 2019. It will provide a forum for researchers and practitioners in the fields of (Computational) Linguistics, (Computational) Phraseology, Translation Studies and Translation Technology to discuss recent advances in the area of multi-word uni...
Conference Paper
Full-text available
The aim of this paper is to show the importance of Computational Stylometry (CS) and Machine Learning (ML) in detecting an author's gender and age in cyberbullying texts. We developed a cyberbullying detection platform and report its performance in terms of Precision, Recall and F-Measure for gender and age detection in cyberbully...
Presentation
Full-text available
In this contribution we show how computational stylometry can help identify the gender and age of the author of a text containing cyberbullying, thanks to its synergy with artificial intelligence and a rule-based approach. We report the results of an experiment carried out at the Futuro...
Article
Full-text available
The aim of this paper is to show the importance of Computational Stylometry (CS) and Machine Learning (ML) in detecting an author's gender and age in cyberbullying texts. We developed a cyberbullying detection platform and report its performance in terms of Precision, Recall and F-Measure for gender and age detection in cyberbully...
Conference Paper
Full-text available
The paper describes UNIOR4NLP, a system developed to solve the game "La Ghigliottina", which took part in the NLP4FUN task of the Evalita 2018 evaluation campaign. The system is the best performing one in the competition and achieves better results than human players.
Conference Paper
Full-text available
On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference seri...
Conference Paper
The paper describes the UNIOR4NLP system, developed to solve the game “La Ghigliottina”, which took part in the NLP4FUN challenge of the Evalita 2018 evaluation campaign. The system is the best in the competition and outperforms human players.
Book
Full-text available
Multiword expressions (MWEs) are known as a “pain in the neck” due to their idiosyncratic behaviour. While some categories of MWEs have been largely studied, verbal MWEs (VMWEs) such as to take a walk, to break one’s heart or to turn off have been relatively rarely modelled. We describe an initiative meant to bring about substantial progress in und...
Conference Paper
Full-text available
This paper describes the PARSEME Shared Task 1.1 on automatic identification of verbal multi-word expressions. We present the annotation methodology, focusing on changes from last year's shared task. Novel aspects include enhanced annotation guidelines, additional annotated data for most languages, corpora for some new languages, and new evaluation...
Chapter
On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference seri...
Chapter
Full-text available
On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference seri...
Chapter
Full-text available
EVALITA is a periodic evaluation campaign of Natural Language Processing (NLP) and speech tools for the Italian language. The general objective of EVALITA is to promote the development of language and speech technologies for the Italian language, providing a shared framework where different systems and approaches can be evaluated in a consistent ma...
Conference Paper
This paper describes a new language resource annotated with verbal multiword expressions (VMWEs) in Italian. The paper discusses the state of the art in VMWE identification and annotation in Italian, the methodology adopted, the various VMWE categories annotated, the corpus and the annotation process. Finally, the paper ends with results, conclusio...
Conference Paper
Full-text available
Topics of Interest: The MUMTTT 2017 workshop invites the submission of papers reporting on original and unpublished research on topics related to MWU processing in machine translation and translation technology, including:
  • Lexical, syntactic, semantic and translational aspects in MWU representation
  • Theoretical approaches to MWUs (e.g., collostru...
Article
Full-text available
Multiword expressions (MWEs) are a class of linguistic forms spanning conventional word boundaries that are both idiosyncratic and pervasive across different languages. The structure of linguistic processing that depends on the clear distinction between words and phrases has to be re-thought to accommodate MWEs. The issue of MWE handling is crucial...
Chapter
The series publishes the proceedings of the annual Italian Conference on Computational Linguistics (CLiC-it), which aims to provide a reference forum for discussion in the field of computational linguistics research. The proceedings include contributions on natural language processing, comprising theoretical and methodological reflections on the topic...
Book
Full-text available
This volume documents the proceedings of the 2nd Workshop on Multi-word Units in Machine Translation and Translation Technology (MUMTTT 2015), held on 1-2 July 2015 as part of the EUROPHRAS 2015 conference: "Computerised and Corpus-based Approaches to Phraseology: Monolingual and Multilingual Perspectives" (Málaga, 29 June – 1 July 2015). The works...
Conference Paper
Full-text available
This paper summarizes the preliminary results of an ongoing survey on multiword resources carried out within the IC1207 Cost Action PARSEME (PARSing and Multi-word Expressions). Despite the availability of language resource catalogs and the inventory of multi-word datasets on the SIGLEX-MWE website, multiword resources are scattered and difficult t...
Chapter
The annual conference CLiC-it ("Italian Conference on Computational Linguistics") is an initiative of the "Italian Association of Computational Linguistics" (AILC – www.ai-lc.it) which is intended to meet the need for a national and international forum for the promotion and dissemination of high-level original research in the field of Computati...
Conference Paper
Full-text available
English. The translation of Multiword expressions (MWE) by Machine Translation (MT) represents a big challenge, and although MT has considerably improved in recent years, MWE mistranslations still occur very frequently. There is the need to develop large data sets, mainly parallel corpora, annotated with MWEs, since they are useful both for SMT tra...
Conference Paper
Recent studies have highlighted that the translation of Multiword Units (MWU) by Machine Translation (MT) is still an open challenge, whatever the approach adopted (statistical, rule-based or example-based). The difficulties in automatically translating this recurrent, complex and varied lexical phenomenon originate from its lexical, syntactic, semanti...
Conference Paper
Full-text available
Following the success of the MT SUMMIT 2013 Workshop on Multi-word Units in Machine Translation and Translation Technology, we are announcing the 2015 edition to be held in conjunction with the Europhras conference on Computerised and Corpus-based Approaches to Phraseology: Monolingual and Multilingual Perspectives (Malaga, Spain, 29 June – 1 July...
Book
NooJ is a linguistic development environment that provides tools for linguists to construct linguistic resources that formalise a large gamut of linguistic phenomena: typography, orthography, lexicons for simple words, multiword units and discontinuous expressions, inflectional and derivational morphology, local, structural and transformational syn...
Conference Paper
Full-text available
In the history of translation, since the classical age, the munus interpretum (the task of the translator) proposed by Cicero has to be considered as the departure point of the definition of the notion of equivalence. According to this point of view, the translator does not have to translate from a linguistic system to another, but he needs to refo...
Chapter
Full-text available
CLiC-it 2015 is held in Trento on December 3-4 2015, hosted and locally organized by Fondazione Bruno Kessler (FBK), one of the most important Italian research centers for CL. The organization of the conference is the result of a fruitful joint effort of different research groups (Università di Torino, Università di Roma Tor Vergata a...
Conference Paper
Full-text available
This paper presents a systematic human evaluation of translations of English support verb constructions produced by a rule-based machine translation (RBMT) system (OpenLogos) and a statistical machine translation (SMT) system (Google Translate) for five languages: French, German, Italian, Portuguese and Spanish. We classify support verb constructio...
Article
This paper aims to outline the current trends that are contributing to a rapid development of translation technologies by promoting their wide dissemination among translation professionals and internauts, i.e. cloud computing technologies that offer ubiquitous access to digital content and multi-language translation tools within online collaborativ...
Article
Two emerging phenomena of the internet, crowdsourcing (the exploitation of a community/group of people to perform tasks normally performed by employees) and cloud computing (which allows users ubiquitous access to services and online tools for translation and multilingual digital content), have been widely adopted in the field of Machine Translation...
Article
Crowdsourcing and cloud computing have been widely adopted in the field of Machine Translation and Computer Aided Translation in the last fifteen years. They are also increasingly used for the development and maintenance of lexical and terminological resources. This paper aims to outline the state of the art of these two emerging phenomena of the...
Conference Paper
In recent years, important initiatives such as the development of the European Library and Europeana have aimed to increase the availability of cultural content from various types of providers and institutions. Access to these resources requires the development of environments which make it possible both to manage multilingual complexity and to preserve t...
Conference Paper
This paper describes a computational linguistics-based approach for providing interoperability between multi-lingual systems in order to overcome crucial issues like cross-language and cross-collection retrieval. Our proposal is a system which improves capabilities of language-technology-based information extraction. In the last few years various t...
Conference Paper
Full-text available
This paper addresses the impact of multiword translation errors in machine translation (MT). We have analysed translations of multiwords in the OpenLogos rule-based system (RBMT) and in the Google Translate statistical system (SMT) for the English-French, English-Italian, and English-Portuguese language pairs. Our study shows that, for distinct rea...
Conference Paper
Full-text available
Machine Translation (MT) has evolved along with different types of computer-assisted translation tools and notable progress has been achieved in improving the quality of translations. However, in spite of the recent positive developments in translation technologies, not all problems have been solved and in particular the identification, interpret...
Conference Paper
Extracting relevant information in a multilingual context from massive amounts of unstructured, structured and semi-structured data is a challenging task. Various theories have been developed and applied to ease the access to multicultural and multilingual resources. This paper describes a methodology for the development of an ontology-based Cross-L...
Conference Paper
One of the most relevant problems with Information Retrieval (IR) software is the correct processing of complex lexical units, today also known as multiword units. The shortcomings are mainly due to the fact that such units are often considered as extemporaneous combinations of words retrievable by means of statistical routines. On the contrary, s...
Article
Full-text available
This paper discusses the qualitative comparative evaluation performed on the results of two machine translation systems with different approaches to the processing of multi-word units. It proposes a solution for overcoming the difficulties multi-word units present to machine translation by adopting a methodology that combines the lexicon grammar ap...
Article
Full-text available
With the rapid evolution of the Internet, translation has become part of the daily life of ordinary users, not only of professional translators. Machine translation has evolved along with different types of computer-assisted translation tools. Qualitative progress has been made in the field of machine translation, but not all problems have been sol...
Article
Full-text available
Although a vast amount of content and knowledge has been made available in electronic format and on the web in recent years, translators still do not have user-friendly and targeted tools at their disposal for the various aspects of a translation process, i.e., the analysis phase, automatic creation and management of the linguistic resources needed and...
