Gabriela Ferraro’s research while affiliated with Australian National University and other places


Publications (39)


The conceptual schema of HateNet (on gray background) and t-HateNet (whole schema)—our transfer learning architectures. (HateNet) The text of tweets is processed through two pre-trained units (shown in blue background), and three units whose parameters are trainable end-to-end via back-propagation. The output of the chain is the hate prediction for the input text. (t-HateNet) The tweets in two data sets (the red and the blue data set) are processed through a shared pipeline (pre-processing, ELMo, bi-LSTM and max-pooling) and a task-specific component (the Hate classification). The tweet representation space constructed at the output of the max-pooling unit is adequate for both learning tasks
The Map of Hate constructed on the Davidson data set (a, b) and the Waseem data set (c, d), using the tweet embeddings generated by HateNet (a, c) and t-HateNet (b, d). Note that the axes of the Map of Hate are synthetic and not interpretable
The Map of Hate constructed by t-HateNet jointly on 10% of the Davidson and Waseem data sets. Crosses and circles show incorrectly and correctly predicted examples, respectively. (Interactive version available online at https://bit.ly/39OVBgX). a All the six classes in the two data sets. b The Harmless classes of the two data sets overlap. c The hateful classes: Hate and Offensive (Davidson), Racism and Sexism (Waseem)
Warning: this figure contains real-world examples of offensive language! Automatically highlighting offensive terms. It is not possible to illustrate the performance of the technique without quoting actual examples of speech classified as hate speech and to that end we include this figure. Examples of six tweets from the Davidson and Waseem data sets, together with their predicted and observed hate category. The top three tweets are hateful (sexist, offensive and racist respectively) and their category was correctly predicted. The bottom three tweets are harmless, but they were incorrectly predicted as hateful. The color map shows how many times a word’s representation was selected for the tweet representation, normalized by the size of the embedding (here 512)
a Prediction performances on two data sets: Davidson and Waseem. Boxplot summarizing macro-F1 score for each data set, and each approach (baselines, HateNet, t-HateNet). Red diamonds and values indicate mean F1. Each box consists of 10 independent runs. b Prediction performances with limited amounts of training data on Waseem datasets. The x-axis shows the percentage of the training set used for training, the y-axis shows the macro-F1 measure. Each bar shows the mean value over 10 runs, and the standard deviation
Transfer learning for hate speech detection in social media
  • Article
  • Full-text available

October 2023

·

238 Reads

·

62 Citations

Journal of Computational Social Science

Lanqin Yuan

·

Tianyu Wang

·

Gabriela Ferraro

·

[...]

·

Today, the internet is an integral part of our daily lives, enabling people to be more connected than ever before. However, this greater connectivity and access to information increase exposure to harmful content, such as cyber-bullying and cyber-hatred. Models based on machine learning and natural language offer a way to make online platforms safer by identifying hate speech in web text autonomously. However, the main difficulty is annotating a sufficiently large number of examples to train these models. This paper uses a transfer learning technique to leverage two independent datasets jointly and builds a single representation of hate speech. We build an interpretable two-dimensional visualization tool of the constructed hate speech representation—dubbed the Map of Hate—in which multiple datasets can be projected and comparatively analyzed. The hateful content is annotated differently across the two datasets (racist and sexist in one dataset, hateful and offensive in another). However, the common representation successfully projects the harmless class of both datasets into the same space and can be used to uncover labeling errors (false positives). We also show that the joint representation boosts prediction performances when only a limited amount of supervision is available. These methods and insights hold the potential for safer social media and reduce the need to expose human moderators and annotators to distressing online messaging.
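The word-highlighting figure for this paper describes a color map that counts how many times a word's representation was selected for the tweet representation, normalized by the embedding size. A minimal sketch of that counting step is given below, assuming that "selected" means the word supplies the maximum value in a given dimension under max-pooling. The function name `max_pool_attribution` and the toy 4-dimensional vectors are hypothetical and are not taken from the paper, which reports an embedding size of 512.

```python
from collections import Counter


def max_pool_attribution(token_vectors):
    """For each token, return the fraction of embedding dimensions in which
    that token supplies the max-pooled value (hypothetical helper)."""
    if not token_vectors:
        return []
    dim = len(token_vectors[0])
    wins = Counter()
    for d in range(dim):
        # Index of the token with the largest value in dimension d.
        # max() returns the first such index when values are tied.
        best = max(range(len(token_vectors)), key=lambda t: token_vectors[t][d])
        wins[best] += 1
    # Normalize by the embedding size so the scores sum to 1.
    return [wins[t] / dim for t in range(len(token_vectors))]


# Toy example: 3 tokens, 4 dimensions (made-up numbers, not real embeddings).
vectors = [
    [0.1, 0.9, 0.2, 0.0],
    [0.8, 0.3, 0.7, 0.1],
    [0.2, 0.1, 0.1, 0.5],
]
scores = max_pool_attribution(vectors)
```

In this toy input the second token wins two of the four dimensions, so it would receive the strongest highlight.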



Explore BiLSTM-CRF-Based Models for Open Relation Extraction

April 2021

·

153 Reads

Extracting multiple relations from text sentences is still a challenge for current Open Relation Extraction (Open RE) tasks. In this paper, we develop several Open RE models based on the bidirectional LSTM-CRF (BiLSTM-CRF) neural network and different contextualized word embedding methods. We also propose a new tagging scheme to solve overlapping problems and enhance models' performance. From the evaluation results and comparisons between models, we select the best combination of tagging scheme, word embedder, and BiLSTM-CRF network to achieve an Open RE model with a remarkable extracting ability on multiple-relation sentences.


Figure 1: Exiting a local minimum while continual learning.
Learning to Continually Learn Rapidly from Few and Noisy Data

March 2021

·

53 Reads

Neural networks suffer from catastrophic forgetting and are unable to sequentially learn new tasks without guaranteed stationarity in data distribution. Continual learning could be achieved via replay, that is, by concurrently training on externally stored old data while learning a new task. However, replay becomes less effective when each past task is allocated less memory. To overcome this difficulty, we supplemented replay mechanics with meta-learning for rapid knowledge acquisition. By employing a meta-learner, which learns a learning rate per parameter per past task, we found that base learners produced strong results when less memory was available. Additionally, our approach inherited several meta-learning advantages for continual learning: it demonstrated strong robustness when learning continually in the presence of noise and brought base learners to higher accuracy in fewer updates.
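The idea of a learning rate per parameter in the abstract above can be illustrated with a single element-wise update step. This is only a sketch under stated assumptions: the function name `per_parameter_step` is hypothetical, the rates are fixed by hand here, whereas in the paper they are produced by a meta-learner, and the replay and per-task aspects are not modelled.

```python
def per_parameter_step(params, grads, rates):
    """One gradient step in which every parameter has its own learning rate.

    Hypothetical illustration: `rates` is supplied by hand, not meta-learned.
    """
    if not (len(params) == len(grads) == len(rates)):
        raise ValueError("params, grads and rates must have the same length")
    # Element-wise update: theta_i <- theta_i - rate_i * grad_i
    return [p - r * g for p, g, r in zip(params, grads, rates)]


# Two parameters with identical gradients but different learning rates.
updated = per_parameter_step([1.0, 1.0], [2.0, 2.0], [0.25, 0.5])
```

With equal gradients, the parameter with the larger rate moves further, which is the effect a learned per-parameter rate is meant to exploit.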


Pruning and repopulating a lexical taxonomy: experiments in Spanish, English and French

December 2020

·

92 Reads

·

5 Citations

In this paper we present the problem of a noisy lexical taxonomy and suggest two tasks as potential remedies. The first task is to identify and eliminate incorrect hypernymy links, and the second is to repopulate the taxonomy with new relations. The first task consists of revising the entire taxonomy and returning a Boolean for each assertion of hypernymy between two nouns (e.g. brie is a kind of cheese). The second task consists of recursively producing a chain of hypernyms for a given noun, until the most general node in the taxonomy is reached (e.g. brie → cheese → food → etc.). In order to achieve these goals, we implemented a hybrid hypernym-detection algorithm that incorporates various intuitions, such as syntagmatic, paradigmatic and morphological association measures as well as lexical patterns. We evaluate these algorithms individually and collectively and report findings in Spanish, English and French.
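The second task in the abstract above, producing a chain of hypernyms until the most general node is reached, can be sketched as a walk over a parent mapping. This is an illustration only: the paper uses a hybrid hypernym-detection algorithm to predict each link, while this sketch assumes the links are already available in a dictionary. The function name `hypernym_chain`, the toy taxonomy, and the root node "entity" are hypothetical.

```python
def hypernym_chain(noun, parent_of):
    """Follow hypernym links from `noun` up to the most general node.

    `parent_of` maps each noun to its direct hypernym (hypothetical lookup;
    a real system would predict these links).
    """
    chain = [noun]
    seen = {noun}
    while chain[-1] in parent_of:
        parent = parent_of[chain[-1]]
        if parent in seen:
            # Guard against cycles, which can occur in a noisy taxonomy.
            break
        chain.append(parent)
        seen.add(parent)
    return chain


# Toy taxonomy; "entity" as the root is an assumption for illustration.
toy_taxonomy = {"brie": "cheese", "cheese": "food", "food": "entity"}
chain = hypernym_chain("brie", toy_taxonomy)
```

The cycle guard matters for the noisy setting the paper describes, since an incorrect link could otherwise make the walk loop forever.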


Lightme: analysing language in internet support groups for mental health

October 2020

·

74 Reads

·

4 Citations

Health Information Science and Systems

Background: Assisting moderators to triage harmful posts in Internet Support Groups is relevant to ensuring their safe use. Automated text classification methods that analyse the language expressed in online forum posts are a promising solution. Methods: Natural Language Processing and Machine Learning technologies were used to build a triage post classifier using a dataset from the Reachout.com mental health forum for young people. Results: When compared with the state of the art, a solution based mainly on features from lexical resources achieved the best classification performance for crisis posts (52%), the most severe class. Six salient linguistic characteristics were found when analysing the crisis posts: (1) posts expressing hopelessness, (2) short posts expressing concise negative emotional responses, (3) long posts expressing variations of emotions, (4) posts expressing dissatisfaction with available health services, (5) posts utilising storytelling, and (6) posts expressing users seeking advice from peers during a crisis. Conclusion: It is possible to build a competitive triage classifier using features derived only from the textual content of the post. Further research is needed to translate our quantitative and qualitative findings into features, as this may improve overall performance.


Fig. 2. Normative Rules Generation Framework
Data set splits and average length of input and output sequences (a dash '-' indicates no data for that cell)
Relation extraction evaluation results using the RegTech test set
Automatic Extraction of Legal Norms: Evaluation of Natural Language Processing Tools

September 2020

·

597 Reads

·

6 Citations

Lecture Notes in Computer Science

Extracting and formalising legal norms from legal documents is a time-consuming and complex procedure. Therefore, automatic methods that can accelerate this process are in high demand. In this paper, we address two major questions related to this problem: (i) what are the challenges in formalising legal documents into a machine-understandable formalism? (ii) to what extent can the data-driven state-of-the-art approaches developed in the Natural Language Processing (NLP) community be used to automate the normative mining process? The results of our experiments indicate that NLP technologies such as relation extraction and semantic parsing are promising research avenues to advance research in this area.


Figure 1: Neural optimiser applicability
Figure 3: A comparison between the LSTM-optimiser and MTL2L
Figure 4: Meta-testing on the same single domain as meta-training
Figure 5: Meta-tested on an unseen Modified Cifar10 dataset
MTL2L: A Context Aware Neural Optimiser

July 2020

·

71 Reads

Learning to learn (L2L) trains a meta-learner to assist the learning of a task-specific base learner. Previously, it was shown that a meta-learner could learn the direct rules to update learner parameters; and that the learnt neural optimiser updated learners more rapidly than handcrafted gradient-descent methods. However, we demonstrate that previous neural optimisers were limited to update learners on one designated dataset. In order to address input-domain heterogeneity, we introduce Multi-Task Learning to Learn (MTL2L), a context aware neural optimiser which self-modifies its optimisation rules based on input data. We show that MTL2L is capable of updating learners to classify on data of an unseen input-domain at the meta-testing phase.


Severity label descriptions and examples in the Reachout dataset.
Feature set used for triage classification with the Reachout dataset. '*' indicates a lexicon that has been tested in previous studies (see Table 3).
Lightme: Analysing Language in Internet Support Groups for Mental Health

July 2020

·

46 Reads

Background: Assisting moderators to triage harmful posts in Internet Support Groups is relevant to ensuring their safe use. Automated text classification methods that analyse the language expressed in online forum posts are a promising solution. Methods: Natural Language Processing and Machine Learning technologies were used to build a triage post classifier using a dataset from the Reachout.com mental health forum for young people. Results: When compared with the state of the art, a solution based mainly on features from lexical resources achieved the best classification performance for crisis posts (52%), the most severe class. Six salient linguistic characteristics were found when analysing the crisis posts: 1) posts expressing hopelessness, 2) short posts expressing concise negative emotional responses, 3) long posts expressing variations of emotions, 4) posts expressing dissatisfaction with available health services, 5) posts utilising storytelling, and 6) posts expressing users seeking advice from peers during a crisis. Conclusion: It is possible to build a competitive triage classifier using features derived only from the textual content of the post. Further research is needed to translate our quantitative and qualitative findings into features, as this may improve overall performance.



Citations (31)


... Recent studies have introduced a novel approach using transfer learning models for hate speech detection. Yuan et al. [48] demonstrated that transfer learning significantly improved detection accuracy compared to traditional methods. Masud and Chakraborty [49] conducted research with the goal of measuring the power balance between the governing and opposition parties by linking observed online patterns with real occurrences, categorizing political attacks as a distinct form of offense. ...

Reference:

A Multi-Architecture Approach for Offensive Language Identification Combining Classical Natural Language Processing and BERT-Variant Models
Transfer learning for hate speech detection in social media

Journal of Computational Social Science

... Thus, new categories cannot be handled by a model with static architecture. In some CL approaches [49], [50], the network is usually constructed by two modules: a feature extractor and a classifier. Currently, to handle new classes, many works assign a new classifier to the network when a new class emerges [51], [46]. ...

Plastic and Stable Gated Classifiers for Continual Learning
  • Citing Conference Paper
  • June 2021

... This may indicate that the current Health Gym GAN requires further fine-tuning to fully capture the complexity of a dataset consisting of multiple inter-connected categorical variables. For instance, the recurrent components of the Health Gym GAN could potentially benefit from existing work on network simplification 87 . ...

An Input Residual Connection for Simplifying Gated Recurrent Neural Networks
  • Citing Conference Paper
  • July 2020

... In this scenario, researchers have found that the writings posted by individuals on social media platforms are valuable evidence for looking for early signs of depression [7][8][9][10][11][12]. Individuals experiencing depression find comfort in expressing their thoughts and emotions on these platforms, motivated by factors such as privacy or anonymity [13,14]. Consequently, social media provides a complementary opportunity to access valuable information about individuals' state of mind beyond traditional professional therapy. ...

Lightme: analysing language in internet support groups for mental health

Health Information Science and Systems

... In larger scales, the encoding process may face severe scalability issues and turn out to be a potential bottleneck for efficient large scale reasoning. Automating this process with the help of efficient natural language processing tools is an open research problem; there are several examples of preliminary results in literature [25,28,15,77,55]. ...

Automatic Extraction of Legal Norms: Evaluation of Natural Language Processing Tools

Lecture Notes in Computer Science

... Learning rates β are the meta-learning target of MetaSGD, and hence a unique learning rate is learnt for every parameter to enable important features to be updated more quickly. In their paper, showed that MetaSGD achieved strong results for few shot learning; and Kuo et al. (2020) showed that MetaSGD could converge base learners much more rapidly than SGD. We will now simplify our notation by denoting $\nabla_{\theta} L_{t+1}(\theta_t)$ as $\nabla L$. ...

M²SGD: Learning to Learn Important Weights
  • Citing Conference Paper
  • June 2020

... In the social media realm, many academics have investigated hate speech detection and suggested different methods to identify it, with particular emphasis on the English language [9], [10]. Detecting hate speech in Arabic media, on the other hand, is still a developing field. ...

Transfer Learning for Hate Speech Detection in Social Media

... With this study, the WIPO-alpha dataset entered the world of patent data research. Taking this study as an example, many different studies were carried out in the following years [25,39,40]. Aiolli et al. ...

Linking Patents to Knowledge Sources: A Context Matching Technique using Automatic Patent Classification
  • Citing Conference Paper
  • December 2018

... Such information can enhance design activities throughout the design process, including market research, problem formulation, detailed design, and testing (Vasantha et al., 2017). Yet, despite their potential utility, patent documents are regarded as difficult to read compared to other documents (Verberne et al., 2010;Brügmann et al., 2015;Suominen et al., 2018;Casola and Lavelli, 2022). This has been attributed to the linguistic characteristics of patents, such as long sentences and the use of novel terms (Mille and Wanner, 2007;Verberne et al., 2010), complex syntactic structure (Verberne et al., 2010), and complex linguistic style (Brügmann et al., 2015). ...

User Study for Measuring Linguistic Complexity and Its Reduction by Technology on a Patent Website

... Due to its capability of extracting useful information and benefiting many NLP applications (e.g., information retrieval (Fetahu et al., 2021; Guo et al., 2009) and question answering (Longpre et al., 2021)), NER appeals to many researchers (Jiang et al., 2021; Feng et al., 2018; Kim et al., 2015; Lee et al., 2018; Qu et al., 2016; Rodriguez et al., 2018; Wang et al., 2018; Zhang et al., 2021b; Yang et al., 2017; Yang and Katiyar, 2020; Fei et al., 2021). Recently, to reduce the huge cost of annotating data, researchers have started to explore cross-domain NER methods. ...

Named Entity Recognition for Novel Types by Transfer Learning
  • Citing Conference Paper
  • January 2016