Catherine Havasi's research while affiliated with Massachusetts Institute of Technology and other places

Publications (58)

Preprint
Full-text available
Retrofitting is a technique used to move word vectors closer together or further apart in their space to reflect their relationships in a Knowledge Base (KB). However, retrofitting only works on concepts that are present in that KB. RetroGAN uses a pair of Generative Adversarial Networks (GANs) to learn a one-to-one mapping between concepts and the...
Preprint
Full-text available
In recent years, transformer-based language models have achieved state-of-the-art performance in various NLP benchmarks. These models are able to extract mostly distributional information with some semantics from unstructured text; however, it has proven challenging to integrate structured information, such as knowledge graphs, into these models. We...
Conference Paper
Crowdsourcing common sense training data was born twenty years ago. It began with the idea to "harness the power of bored people on the Internet" to collect "what everyone knows but no one writes down". This was an era when we were all just starting to learn how to search the web; before people learned the dismal art of keywords, they tried typing...
Article
Machine learning about language can be improved by supplying it with specific knowledge and sources of external information. We present here a new version of the linked open data resource ConceptNet that is particularly well suited to be used with modern NLP techniques such as word embeddings. ConceptNet is a knowledge graph that connects words and...
Article
Full-text available
Languages differ in how they package the components of an event into words to form sentences. For example, while some languages typically encode the manner of motion in the verb (e.g., running), others more often use verbs that encode the path (e.g., ascending). Prior research has demonstrated that children and adults have lexicalization biases; th...
Conference Paper
The Narratarium Colorizer device receives either keyboard input or speech recognition input and uses natural language processing to extract key terms. The terms are queried for in a knowledge base of words and associated colors, created by leveraging the Open Mind Common Sense database and ConceptNet. The system outputs a continually changing color...
Article
The guest editors introduce novel statistical approaches to concept-level sentiment analysis that go beyond a mere syntactic-driven analysis of text and provide semantic-based methods. Such approaches allow a more efficient passage from (unstructured) textual information to (structured) machine-processable data, in potentially any domain.
Article
The distillation of knowledge from the Web—also known as opinion mining and sentiment analysis—is a task that has recently raised growing interest for purposes such as customer service, predicting financial markets, monitoring public security, investigating elections, and measuring a health-related quality of life. This article considers past, pres...
Article
Full-text available
The guest editors introduce novel approaches to opinion mining and sentiment analysis that go beyond a mere word-level analysis of text and provide concept-level methods. Such approaches allow a more efficient passage from (unstructured) textual information to (structured) machine-processable data, in potentially any domain.
Article
ConceptNet is a knowledge representation project, providing a large semantic graph that describes general human knowledge and how it is expressed in natural language. Here we present the latest iteration, ConceptNet 5, with a focus on its fundamental design decisions and ways to interoperate with it.
Article
This editorial introduction describes the aims and scope of the special issue on Common Sense for Interactive Systems of the ACM Transactions on Interactive Intelligent Systems. It explains why the common sense knowledge problem is crucial for both artificial intelligence and human-computer interaction, and it shows how the four articles selected f...
Article
Cyberbullying (harassment on social networks) is widely recognized as a serious social problem, especially for adolescents. It is as much a threat to the viability of online social networks for youth today as spam once was to email in the early days of the Internet. Current work to tackle this problem has involved social and psychological studies o...
Article
Full-text available
In a world in which millions of people express their opinions about commercial products in blogs, wikis, fora, chats and social networks, the distillation of knowledge from this huge amount of unstructured information can be a key factor for marketers who want to create an image or identity in the minds of their customers for their product, brand o...
Article
Most organizations have a wealth of knowledge about themselves available online, but little for a visitor to interact with on-site. At the MIT Media Lab, we have designed and deployed a novel intelligent signage system, the Glass Infrastructure (GI), that enables small groups of users to physically interact through a touch screen display with this...
Article
Web 2.0 has changed the ways people communicate, collaborate, and express their opinions and sentiments. But despite social data on the Web being perfectly suitable for human consumption, they remain hardly accessible to machines. To bridge the cognitive and affective gap between word-level natural language data and the concept-level sentiments con...
Article
Most organizations have a wealth of knowledge about themselves available online, but little for a visitor to interact with on-site. At the MIT Media Lab, we have designed and deployed a novel intelligent signage system, the Glass Infrastructure (GI) that enables small groups of users to physically interact with this data and to discover the latent...
Article
Full-text available
The Association for the Advancement of Artificial Intelligence (AAAI) presented the 2010 Fall Symposium Series on November 11-13, 2010. The eight symposia included Cognitive and Metacognitive Educational Systems, Commonsense Knowledge, Complex Adaptive Systems: Resilience, Robustness, and Evolvability, Computational Models of Narrative, Dialog with...
Conference Paper
The results of 2,256 neuroimaging experiments were analyzed using singular value decomposition (SVD) and non-negative matrix factorization (NMF) to extract patterns in the data. To evaluate the techniques’ efficacy at capturing regularities in the data, one positive and one negative result from each of 100 random experiments were treated as missin...
Conference Paper
Most organizations have a wealth of knowledge about themselves available online, but little for a visitor to interact with on-site. At the MIT Media Lab, we have designed and deployed a novel intelligent signage system, the Glass Infrastructure (GI), that enables small groups of users to interact physically through a touch-screen display with this...
Conference Paper
Next-generation patients are far from being peripheral to health-care. They are central to understanding the effectiveness and efficiency of services and how they can be improved. Today a lot of patients are used to reviewing local health services on-line but this social information is just stored in natural language text and it is not machine-acce...
Conference Paper
In AI, we often need to make sense of data that can be measured in many different dimensions -- thousands of dimensions or more -- especially when this data represents natural language semantics. Dimensionality reduction techniques can make this kind of data more understandable and more powerful, by projecting the data into a space of many fewer di...
Conference Paper
In a world in which millions of people express their feelings and opinions about any issue in blogs, wikis, fora, chats and social networks, the distillation of knowledge from this huge amount of unstructured information is a challenging task. In this work we build a knowledge base which merges common sense and affective knowledge and visualize it...
Article
We present Luminoso, a tool that helps researchers to visualize and understand a dimensionality-reduced semantic space based on textual information by exploring it interactively. It streamlines the process of creating such a space by taking input from a directory of text documents, and optionally including common-sense background information. This...
Article
Today millions of web-users express their opinions about many topics through blogs, wikis, fora, chats and social networks. For sectors such as e-commerce and e-tourism, it is very useful to automatically analyze the huge amount of social information available on the Web, but the extremely unstructured nature of these contents makes it a difficult...
Article
Full-text available
Online patient opinions are a very important instrument for the effective evaluation of local hospitals, hospices and mental health services but the distillation of knowledge from this unstructured information remains a difficult and complex task. Within this paper we aim to effectively mine and analyze this social information to make a comprehensive...
Conference Paper
What is Common Sense Computing? And why is it so important for the technological evolution of humankind? This paper presents an overview of past, present and future efforts of the AI community to give computers the capacity for Common Sense reasoning, from Minsky’s Society of Mind to Media Laboratory’s Digital Intuition theory, and beyond. Is it ac...
Article
Full-text available
Understanding the world we live in requires access to a large amount of background knowledge: the commonsense knowledge that most people have and most computer systems don't. Many of the limitations of artificial intelligence today relate to the problem of acquiring and understanding common sense. The Open Mind Common Sense project began to collect...
Conference Paper
Words mean different things to different people, and capturing these differences is often a subtle art. These differences are often “a matter of perspective”. Perspective can be taken to be the set of beliefs held by a person as a result of their background, culture, tastes, and experience. But how can we represent perspective computationally? In t...
Conference Paper
Full-text available
We present a game-based interface for acquiring common sense knowledge. In addition to being interactive and entertaining, our interface guides the knowledge acquisition process to learn about the most salient characteristics of a particular concept. We use statistical classification methods to discover the most informative characteristics in the...
Conference Paper
In order to be helpful to people, the intelligent interfaces of the future will have to acquire, represent, and infer simple knowledge about everyday life and activities. While much work in AI has represented this knowledge at the word, sentence, and logical assertion level, we see a growing need to understand it at a larger granularity, that of st...
Conference Paper
Emotions are a fundamental component in human experience, cognition, perception, learning and communication. In this paper we explore how the use of Common Sense Computing can significantly enhance computers’ emotional intelligence i.e. their capability of perceiving and expressing emotions, to allow machines to make more human-like decisions and i...
Article
Full-text available
The detection of emotions in text is a key issue for the development of intelligent systems. As demonstrated by the Turing test, a machine cannot be considered really intelligent unless it is also capable of perceiving and expressing emotions. In this work we focus on building a knowledge base which merges Common Sense and affective knowledge and...
Conference Paper
We are interested in the problem of reasoning over very large common sense knowledge bases. When such a knowledge base contains noisy and subjective data, it is important to have a method for making rough conclusions based on similarities and tendencies, rather than absolute truth. We present Analogy Space, which accomplishes this by forming the an...
Conference Paper
We present an overview of the workshop on Common Sense Knowledge and Goal-Oriented Interfaces held at the 2008 Intelligent User Interfaces conference. Six papers were accepted from diverse research groups, each offering innovative new research on interfaces that incorporate common sense knowledge and that are oriented around the goals of their user...
Article
The Open Mind Common Sense project has been collecting common-sense knowledge from volunteers on the Internet since 2000. This knowledge is represented in a machine-interpretable semantic network called ConceptNet. We present ConceptNet 3, which improves the acquisition of new knowledge in ConceptNet and facilitates turning edges of the network b...
Conference Paper
There is a mutually beneficial relationship between user interfaces and common sense reasoning and acquisition. Common sense knowledge enables interfaces to better understand and to be more grounded in the world of the user, thus improving the user's overall experience with the interface. This would not be possible without large sources of common s...
Article
Full-text available
In this paper we consider the problem of identifying and classifying discourse coherence relations. We report initial results over the recently released Discourse GraphBank (Wolf and Gibson, 2005). Our approach considers, and determines the contributions of, a variety of syntactic and lexico-semantic features. We achieve 81% accuracy on the task...
Conference Paper
Full-text available
Recent work in lexical resource construction has recognized the importance of contextualizing the knowledge in existing resources and ontologies with information derived from text corpora. This paper describes the integration of a corpus-based lexical acquisition process with a large, linguistically motivated lexical ontology. This semi-automat...
Article
Full-text available
In this paper we describe the structure and development of the Brandeis Semantic Ontology (BSO), a large generative lexicon ontology and lexical database. The BSO has been designed to allow for more widespread access to Generative Lexicon-based lexical resources and help researchers in a variety of computational tasks. The specification of the type...
Article
The expression of motion verbs differs between languages. The path of motion, such as crossing or entering, is more prominently featured in path-based languages such as Spanish than in manner-based languages such as English. Here, we revisit the data from a study on manner and path biases in verb lexicalization (Havasi & Snedeker, 2004), and crea...
Article
Full-text available
Natural language processing researchers currently have access to a wealth of information about words and word senses. This presents problems as well as resources, as it is often difficult to search through and coordinate lexical information across various data sources. We have approached this problem by creating a shared environment for various lex...
Article
Full-text available
The Brandeis Semantic Ontology (BSO) is a new English resource in the generative lexicon tradition. Although still in development, the BSO is a sizable resource containing both a type system and a network of qualia relations. In this paper, we demonstrate that the BSO tends to contain correct qualia relationships by matching the current progress on...
Article
Singular value decomposition (SVD) is a powerful technique for finding similarities and patterns in large data sets. SVD has applications in text analysis, bioinformatics, and recommender systems, and in particular was used in many of the top entries to the Netflix Challenge. It can also help generalize and learn from knowledge represented in a sp...
Article
Understanding language in any form requires understanding connections among words, concepts, phrases and thoughts. Many of the problems faced today in artificial intelligence depend in some manner on understanding this network of relationships that represents the facts that each of us knows about the world and how words relate to one another. Resea...
Article
In this paper, we describe the motivation behind and implementation of GIOMI: Game for Interactive OpenMind Improvement. GIOMI is designed to draw upon the internet community in order to rate assertions in the OpenMind database and thereby improve the quality of the database. To encourage user participation in this project, GIOMI contains a number...

Citations

... People have created several large-scale commonsense knowledge bases to store relational knowledge about objects in structured triples (Speer et al., 2017; Ji et al., 2021; Nayak et al., 2021), such as (person, riding, horse) and (plant, on, windowsill). Intuitively, relational triples in commonsense knowledge bases store expected prior relations between objects, which can provide useful disembodied learning signals for relation detectors. ...
... Liang et al. [38] regard the ground object as a node, use the detector to detect the object, and then define the adjacent relationship between nodes according to the spatial distance between entities. Lin et al. [39] constructed a concept map for the whole label set by using human knowledge in conceptnet [40], and then fused the two global feature vectors generated by CNN and GNN. In this method, the feature elements of GCN output and CNN output are multiplied, and then output to the final classifier for scene classification. ...
... The Glass Infrastructure [10] allows the discovery of latent connections between people, projects, and ideas in an organisation. The system uses spectral association to generate a "semantic space" from project information, where closeness in the space signifies similarity of people, projects and ideas. ...
... Widely used commonsense knowledge graphs (CKGs) include ATOMIC [28], ConceptNet [33], etc. Commonsense knowledge is particularly important for ERC, since the colloquial expressions often occur in a conversation, making it difficult for the model to understand the semantics of sentences. Therefore, the CKGs containing abundant commonsense, are leveraged to incorporate such commonsense into the ERC model to improve ERC performance. ...
... Possibly due to limited preprocessing, unlike our work, ConceptNet seems to treat entities such as "hairdryer" and "a hairdryer" differently [56]. The overall difference between Figs. 7(a) and 9(a) is that ConceptNet includes common-sense ontological relationships [57], and our work includes domain-specific technical relationships that are extracted directly from the text. ...
... When creating the Enchanted Wearable system, our objective was to create a portable projection system by attaching a projector inside the dress. For the optics system, we drew inspiration from Hayden and Novy's Narratorium [15]. In this project, a system of mirrors and projectors reflect dynamic interactive animations on the 4 walls of a fabric tent creating an immersive environment called Dream Room. ...
... We noticed that most existing datasets (see Pang et al. 2002; Akhmedova et al. 2018; Poria et al. 2016; Turney 2002; Esuli and Sebastiani 2006; Hu and Liu 2004; Vilares et al. 2015; Zhou and Chaovalit 2008; Cambria et al. 2010, 2012; Balahur et al. 2011; Chifu and Chifu 2019) are either reviews or taken from data banks. However, some works used datasets extracted from social networks such as in Montejo-Ráez et al. (2012), Keyvanpour et al. (2020), Krishna et al. (2018), Meriem et al. (2021) and Cui et al. (2011). ...
... QM9 has been broadly employed in tasks related to molecule synthesis and optimization [3,121,128,132,141,144,153,182]. Similarly, ZINC250K [269] is another dataset for molecule design that contains around 250k commercially available Text English Yelp [225], Amazon [290], Wikipedia [111], STS benchmark (STSb) [291], One Billion Word [292], RottenTomatoes [104], VerbNet [293], ConceptNet [294], Stanford Sentiment Treebank (SST) [295], CAPTIONS [296], Project Gutenberg [297], ViGGO corpus [298], ParaNMT-50M [299], IMDB text corpus [300], TimeBank [301], Facebook politicians [214], EMNLP2017 WMT News [7], OpenWebText [302], ROCStories [113] Chinese ...
... First, the relational nature of knowledge calls for connections between disparate results and insights. [15][16][17] Second, the acceleration of scientific activity and attendant accumulation of new results and insights increase the need to connect new and prior knowledge to extend and apply what is learned. 18 Third, different types of knowledge exist and necessarily have dissimilar computer-processable representations. ...