Graeme Hirst

University of Toronto | U of T · Department of Computer Science

PhD, Brown University

About

241
Publications
166,685
Reads
14,303
Citations
Additional affiliations
January 1984 - present
University of Toronto
Position
  • Professor (Full)

Publications (241)
Article
Background: The negative psychosocial impacts of cancer diagnoses and treatments are well documented. Virtual care has become an essential mode of care delivery during the COVID-19 pandemic, and online support groups (OSGs) have been shown to improve accessibility to psychosocial and supportive care. de Souza Institute offers CancerChatCanada, a t...
Preprint
The recent onset of the COVID-19 pandemic and the social distancing requirement have created a further demand for virtual groups. Advances in artificial intelligence (AI) may offer novel solutions to management challenges such as the lack of emotional connections within virtual groups’ interventions. Using typed text from Online Sup...
Preprint
Full-text available
We present the Project Dialogism Novel Corpus, or PDNC, an annotated dataset of quotations for English literary texts. PDNC contains annotations for 35,978 quotations across 22 full-length novels, and is by an order of magnitude the largest corpus of its kind. Each quotation is annotated for the speaker, addressees, type of quotation, referring exp...
Article
Full-text available
Background Cancer and its treatment can significantly impact the short- and long-term psychological well-being of patients and families. Emotional distress and depressive symptomatology are often associated with poor treatment adherence, reduced quality of life, and higher mortality. Cancer support groups, especially those led by health care profes...
Preprint
Full-text available
Developing moral awareness in intelligent systems has shifted from a topic of philosophical inquiry to a critical and practical issue in artificial intelligence over the past decades. However, automated inference of everyday moral situations remains an under-explored problem. We present a text-based approach that predicts people's intuitive judgmen...
Article
Full-text available
Background Word embeddings are dense numeric vectors used to represent language in neural networks. Until recently, there had been no publicly released embeddings trained on clinical data. Our work is the first to study the privacy implications of releasing these models. Objective This paper aims to demonstrate that traditional word embeddings cre...
Preprint
Background: Cancer and its treatment can significantly impact the short- and long-term psychological well-being of patients and families. Emotional distress and depressive symptomatology are often associated with poor treatment adherence, reduced quality of life, and higher mortality. Cancer support groups, especially those led by health care profe...
Article
Full-text available
Objective: In this work, we introduce a privacy technique for anonymizing clinical notes that guarantees all private health information is secured (including sensitive data, such as family history, that are not adequately covered by current techniques). Materials and methods: We employ a new "random replacement" paradigm (replacing each token in...
Preprint
BACKGROUND Word embeddings are dense numeric vectors used to represent language in neural networks. Until recently, there had been no publicly released embeddings trained on clinical data. Our work is the first to study the privacy implications of releasing these models. OBJECTIVE This paper aims to demonstrate that traditional word embeddings cre...
Preprint
We present a text-based framework for investigating moral sentiment change of the public via longitudinal corpora. Our framework is based on the premise that language use can inform people's moral perception toward right or wrong, and we build our methodology by exploring moral biases learned from diachronic word embeddings. We demonstrate how a pa...
Preprint
Word embeddings are often criticized for capturing undesirable word associations such as gender stereotypes. However, methods for measuring and removing such biases remain poorly understood. We show that for any embedding model that implicitly does matrix factorization, debiasing vectors post hoc using subspace projection (Bolukbasi et al., 2016) i...
Article
Full-text available
We propose a novel method for enriching word embeddings without the need for a labeled corpus. Instead, we show that relying on a regressor – trained with a small lexicon to predict pseudo-labels – significantly improves performance over current techniques that rely on human-derived sentence-level labels for an entire corpus. Our approach enables e...
Article
Full-text available
Background: A verbal autopsy (VA) is a post-hoc written interview report of the symptoms preceding a person's death in cases where no official cause of death (CoD) was determined by a physician. Current leading automated VA coding methods primarily use structured data from VAs to assign a CoD category. We present a method to automatically determin...
Preprint
Full-text available
A surprising property of word vectors is that vector algebra can often be used to solve word analogies. However, it is unclear why, and when, linear operators correspond to non-linear embedding models such as skip-gram with negative sampling (SGNS). We provide a rigorous explanation of this phenomenon without making the strong assumptions that...
Article
Full-text available
Background: A scoping review to characterize the literature on the use of conversations in social media as a potential source of data for detecting adverse events (AEs) related to health products. Methods: Our specific research questions were (1) What social media listening platforms exist to detect adverse events related to health products, and...
Article
Full-text available
Background: Language is one of the first faculties afflicted by Alzheimer’s disease (AD). A growing body of work has focussed on leveraging automated analysis of speech to accurately predict the onset of AD. Previous work, however, did not address the effects of AD on the structure of discourse in spontaneous speech and literature. Aims: Our goal i...
Article
Full-text available
Current approaches to cross-lingual sentiment analysis try to leverage the wealth of labeled English data using bilingual lexicons, bilingual vector space embeddings, or machine translation systems. Here we show that it is possible to use a single linear transformation, with as few as 2000 word pairs, to capture fine-grained sentiment relationships...
Article
Full-text available
The vast amount of data and increase of computational capacity have allowed the analysis of texts from several perspectives, including the representation of texts as complex networks. Nodes of the network represent the words, and edges represent some relationship, usually word co-occurrence. Even though networked representations have been applied t...
Article
Full-text available
This paper describes the digitization and enrichment of the Canadian House of Commons English Debates from 1901 to present. We start by laying out the general framework in which this project took place and then present the structure of the database and provide guidelines to prospective users. The paper concludes with the introduction of www.lipad.c...
Article
Full-text available
An impressive breadth of interdisciplinary research suggests that emotions have an influence on human behavior. Nonetheless, we still know very little about the emotional states of those actors whose daily decisions have a lasting impact on our societies: politicians in parliament. We address this question by making use of methods of natural langua...
Data
Emotional Polarity in Britain using General-Purpose Lexicons. Alternative measures of emotional polarity computed using three popular sentiment lexicons that are not specific to the domain of parliamentary debates: NRC, OpinionFinder and SentiWordNet. (EPS)
Data
Power Spectral Densities of Emotional Polarity Measures (Log Scale). (EPS)
Data
Emotional Polarity and Economic Indicators in the United Kingdom. Heat map of the five main indicators, illustrating the change from lowest to highest values during the whole time-period. (EPS)
Data
The Effect of Labor Disputes on Emotional Polarity, by Government and Opposition Status. Orthogonalized impulse response functions with bootstrapped error bands computed from bivariate VECMs with unrestricted constants. Yearly models (upper part) are computed with 2 lags in levels and quarterly models (lower part) with 5 lags in levels. The left pa...
Data
Autocorrelation Functions of Emotional Polarity Measures. (EPS)
Data
Supporting Information for “Measuring Emotion in Parliamentary Debates with Automated Textual Analysis”. (PDF)
Conference Paper
We examine whether using frame choices in forum statements can help us identify framing strategies in parliamentary discourse. In this analysis, we show how features based on embedding representations can improve the discovery of various frames in argumentative political speech. Given the complex nature of the parliamentary discourse, the initial r...
Conference Paper
Concepts and methods of complex networks can be used to analyse texts at their different complexity levels. Examples of natural language processing (NLP) tasks studied via topological analysis of networks are keyword identification, automatic extractive summarization and authorship attribution. Even though a myriad of network measurements have been...
Article
Full-text available
Concepts and methods of complex networks can be used to analyse texts at their different complexity levels. Examples of natural language processing (NLP) tasks studied via topological analysis of networks are keyword identification, automatic extractive summarization and authorship attribution. Even though a myriad of network measurements have been...
Conference Paper
Full-text available
Numerous studies have shown that language impairments, particularly semantic deficits, are evident in the narrative speech of people with Alzheimer's disease from the earliest stages of the disease. Here, we present a novel technique for capturing those changes, by comparing distributed word representations constructed from healthy controls and Alz...
Article
Modernist authors such as Virginia Woolf and James Joyce greatly expanded the use of ‘free indirect discourse’, a form of third-person narration that is strongly influenced by the language of a viewpoint character. Unlike traditional approaches to analyzing characterization using common words, such as those based on Burrows (1987), the nature of fr...
Article
This report documents the program and the outcomes of Dagstuhl Seminar 16161 "Natural Language Argumentation: Mining, Processing, and Reasoning over Textual Arguments", 17–22 April 2016. The seminar brought together leading researchers from computational linguistics, argumentation theory and cognitive psychology communities to discuss the obtained...
Chapter
In Macroanalysis (2013), Matthew Jockers provocatively declares that large digitized collections of literary texts have rendered close reading “totally inappropriate as a method of studying literary history.” Hammond, Brooke and Hirst respond by demonstrating the productive interpretive interplay that results when close reading is placed in a “feed...
Conference Paper
Discourse parsing in Portuguese has two critical limitations. The first is that the task has been explored using only symbolic approaches, i.e., using manually extracted lexical patterns. The second is related to the domain of the lexical patterns, which were extracted through the analysis of a corpus of academic texts, generating many domain-speci...
Article
T. S. Eliot’s poem The Waste Land is a notoriously challenging example of modernist poetry, mixing the independent viewpoints of over ten distinct characters without any clear demarcation of which voice is speaking when. In this work, we apply unsupervised techniques in computational stylistics to distinguish the particular styles of these voices,...
Conference Paper
Full-text available
Automatic analysis of impaired speech for screening or diagnosis is a growing research field; however there are still many barriers to a fully automated approach. When automatic speech recognition is used to obtain the speech transcripts, sentence boundaries must be inserted before most measures of syntactic complexity can be computed. In this pape...
Article
Full-text available
Document enrichment focuses on retrieving relevant knowledge from external resources, which is essential because text is generally replete with gaps. Since conventional work primarily relies on special resources, we instead use triples of Subject, Predicate, Object as knowledge and incorporate distributional semantics to rank them. Our model first...
Article
Full-text available
We propose two improvements on lexical association used in embedding learning: factorizing individual dependency relations and using lexicographic knowledge from monolingual dictionaries. Both proposals provide low-entropy lexical cooccurrence information, and are empirically shown to improve embedding learning by performing notably better than sev...
Chapter
I analyze Berners-Lee, Hendler, and Lassila’s description of the Semantic Web, discussing what it implies for a Multilingual Semantic Web and the barriers that the nature of language itself puts in the way of that vision. Issues raised include the mismatch between natural language lexicons and hierarchical ontologies, the limitations of a purely wr...
Article
Full-text available
Previous attempts at RST-style discourse segmentation typically adopt features centered on a single token to predict whether to insert a boundary before that token. In contrast, we develop a discourse segmenter utilizing a set of pairing features, which are centered on a pair of adjacent tokens in the sentence, by equally taking into account the in...
Conference Paper
Full-text available
We replace the overlap mechanism of the Lesk algorithm with a simple, general-purpose Naive Bayes model that measures many-to-many association between two sets of random variables. Even with simple probability estimates such as maximum likelihood, the model gains significant improvement over the Lesk algorithm on word sense disambiguation tasks. Wit...
Conference Paper
Full-text available
Text-level discourse parsing remains a challenge. The current state-of-the-art overall accuracy in relation assignment is 55.73%, achieved by Joty et al. (2013). However, their model has a high order of time complexity, and thus cannot be applied in practice. In this work, we develop a much faster model whose time complexity is linear in the number...
Article
We define a model of discourse coherence based on Barzilay and Lapata’s entity grids as a stylometric feature for authorship attribution. Unlike standard lexical and character-level features, it operates at a discourse (cross-sentence) level. We test it against and in combination with standard features on nineteen book-length texts by nine nineteen...
Conference Paper
Full-text available
We use computational techniques to extract a large number of different features from the narrative speech of individuals with primary progressive aphasia (PPA). We examine several different types of features, including part-of-speech, complexity, context-free grammar, fluency, psycholinguistic, vocabulary richness, and acoustic, and discuss the...
Conference Paper
Full-text available
We use computational techniques to extract a large number of different features from the narrative speech of individuals with primary progressive aphasia (PPA). We examine several different types of features, including part-of-speech, complexity, context-free grammar, fluency, psycholinguistic, vocabulary richness, and acoustic, and discuss the...
Conference Paper
Full-text available
Agrammatic aphasia is a serious language impairment which can occur after a stroke or traumatic brain injury. We present an automatic method for analyzing aphasic speech using surface level parse features and context-free grammar production rules. Examining these features individually, we show that we can uncover many of the same characteristics...
Article
Full-text available
In argumentative political speech, the way an issue is framed may indicate the unstated assumptions of the argument and hence the ideological position of the speaker. Our goal is to use and extend our prior work on discourse parsing and the identification of argumentation schemes to identify specific instances of issue framing and, more generally,...
Conference Paper
Full-text available
Article
Full-text available
An analysis of linguistic approaches to determining the lexical cohesion in text reveals differences in the types of lexical semantic relations (term relationships) that contribute to the continuity of lexical meaning in the text. Differences were also found in how these lexical relations join words together, sometimes with grammatical relation...
Conference Paper
Full-text available
Interpreting anaphoric shell nouns (ASNs) such as 'this issue' and 'this fact' is essential to understanding virtually any substantial natural language text. One obstacle in developing methods for automatically interpreting ASNs is the lack of annotated data. We tackle this challenge by exploiting cataphoric shell nouns (CSNs) whose construction mak...
Article
Full-text available
Knowing the degree of semantic contrast between words has widespread application in natural language processing, including machine translation, information retrieval, and dialogue systems. Manually-created lexicons focus on opposites, such as hot and cold. Opposites are of many kinds such as antipodals, complementaries, and gradable. Ho...
Article
In preparation for a clinical information system implementation, the Centre for Addiction and Mental Health (CAMH) Clinical Information Transformation project completed multiple preparation steps. An automated process was desired to supplement the onerous task of manual analysis of clinical forms. We used natural language processing (NLP) and machi...
Article
Scientific literature on biodiversity is longevous, but even when legacy publications are available online, researchers often fail to search it adequately or effectively for prior publications; consequently, new research may replicate, or fail to adequately take into account, previously published research. The mechanisms of the Semantic Web and met...
Article
We adopt Koppel et al.'s unmasking approach [5] as the major framework of our authorship verification system. We enrich Koppel et al.'s original word frequency features with a novel set of coherence features, derived from our earlier work [2], together with a full set of stylometric features. For texts written in languages other than English, some...