Marti A Hearst

University of California, Berkeley | UCB · School of Information

About

90
Publications
10,540
Reads
7,979
Citations
Citations since 2016
22 Research Items
3315 Citations

Publications (90)
Preprint
Full-text available
Researchers are expected to keep up with an immense literature, yet often find it prohibitively time-consuming to do so. This paper explores how intelligent agents can help scaffold in-situ information seeking across scientific papers. Specifically, we present Scim, an AI-augmented reading interface designed to help researchers skim papers by autom...
Preprint
Full-text available
Generating diverse, interesting responses to chitchat conversations is a problem for neural conversational agents. This paper makes two substantial contributions to improving diversity in dialogue generation. First, we propose a novel metric which uses Natural Language Inference (NLI) to measure the semantic diversity of a set of model responses fo...
Preprint
Full-text available
When seeking information not covered in patient-friendly documents, like medical pamphlets, healthcare consumers may turn to the research literature. Reading medical papers, however, can be a challenging experience. To improve access to medical papers, we introduce a novel interactive interface, Paper Plain, with four features powered by natural lang...
Preprint
News podcasts are a popular medium to stay informed and dive deep into news topics. Today, most podcasts are handcrafted by professionals. In this work, we advance the state-of-the-art in automatically generated podcasts, making use of recent advances in natural language processing and text-to-speech technology. We present NewsPod, an automatically...
Article
Full-text available
In the summarization domain, a key requirement for summaries is to be factually consistent with the input document. Previous work has found that natural language inference (NLI) models do not perform competitively when applied to inconsistency detection. In this work, we revisit the use of NLI for inconsistency detection, finding that past work suf...
Preprint
In the summarization domain, a key requirement for summaries is to be factually consistent with the input document. Previous work has found that natural language inference (NLI) models do not perform competitively when applied to inconsistency detection. In this work, we revisit the use of NLI for inconsistency detection, finding that past work suf...
Preprint
The Shuffle Test is the most common task to evaluate whether NLP models can measure coherence in text. Most recent work uses direct supervision on the task; we show that by simply finetuning a RoBERTa model, we can achieve a near perfect accuracy of 97.8%, a state-of-the-art. We argue that this outstanding performance is unlikely to lead to a good...
Preprint
This work presents Keep it Simple (KiS), a new approach to unsupervised text simplification which learns to balance a reward across three properties: fluency, salience and simplicity. We train the model with a novel algorithm to optimize the reward (k-SCST), in which the model proposes several candidate simplifications, computes each candidate's re...
Preprint
The task of definition detection is important for scholarly papers, because papers often make use of technical terminology that may be unfamiliar to readers. Despite prior work on definition detection, current approaches are far from being accurate enough to use in real-world applications. In this paper, we first perform in-depth error analysis of...
Preprint
Despite the central importance of research papers to scientific progress, they can be difficult to read. Comprehension is often stymied when the information needed to understand a passage resides somewhere else: in another section, or in another paper. In this work, we envision how interfaces can bring definitions of technical terms and symbols to...
Preprint
Full-text available
The COVID-19 pandemic has sparked unprecedented mobilization of scientists, already generating thousands of new papers that join a litany of previous biomedical work in related areas. This deluge of information makes it hard for researchers to keep track of their own research area, let alone explore new directions. Standard search engines are desig...
Conference Paper
Programmers frequently learn from examples produced and shared by other programmers. However, it can be challenging and time-consuming to produce concise, working code examples. We conducted a formative study where 12 participants made examples based on their own code. This revealed a key hurdle: making meaningful simplifications without introducin...
Conference Paper
Programmers frequently turn to the web to solve problems and find example code. For the sake of brevity, the snippets in online instructions often gloss over the syntax of languages like CSS selectors and Unix commands. Programmers must compensate by consulting external documentation. In this paper, we propose language-specific routines called Tuto...
Conference Paper
This research investigates how to introduce synchronous interactive peer learning into an online setting appropriate both for crowdworkers (learning new tasks) and students in massive online courses (learning course material). We present an interaction framework in which groups of learners are formed on demand and then proceed through a sequence of...
Conference Paper
For an instructor who is teaching a massive open online course (MOOC), what is the best way to understand their class? What is the best way to view how the students are interacting with the content while the course is running? To help prepare for the next iteration, how should the course's data be best analyzed after the fact? How do these instruct...
Conference Paper
We study effects of introducing a real-time chatroom into a massive open online course with several thousand students, supplementing an existing forum. The chatroom was supported by teaching assistants, and generated thousands of lines of discussion by 28% of 681 consenting chat condition participants, mostly on-topic. Despite this, chat activity...
Conference Paper
Massive open online courses (MOOCs) rely primarily on discussion forums for interaction among students. We investigate how forum design affects student activity and learning outcomes through a field experiment with 1101 participants on the edX platform. We introduce a reputation system, which gives students points for making useful posts. We show t...
Conference Paper
The organizers of this workshop have issued a call for help, stating that one challenge in using knowledge and semantic annotations in search is that "standard text search excels at shallow information needs expressed by short keyword queries, and so semantic annotation contributes very little". I suggest that the answer is right there in the quest...
Article
Full-text available
We study the problem of semantic interpretation of noun compounds such as bee honey, malaria mosquito, apple cake, and stem cell. In particular, we explore the potential of using predicates that make explicit the hidden relation that holds between the nouns that form the noun compound. For example, mosquito that carries malaria is a paraphrase of t...
Article
Full-text available
Recent years have shown a gradual shift in the content of biomedical publications that is freely accessible, from titles and abstracts to full text. This has enabled new forms of automatic text analysis and has given rise to some interesting questions: How informative is the abstract compared to the full-text? What important information in the full...
Data
Supplement Material 1: Excel file with the manual annotation data. Supplement Material 2: Log file listing all the data used for the automatic annotation. Supplement Material 3: Annotation counts for all categories used in the automatic analysis.
Article
Full-text available
When reading bioscience journal articles, many researchers focus attention on the figures and their captions. This observation led to the development of the BioText literature search engine, a freely available Web-based application that allows biologists to search over the contents of Open Access Journals, and see figures from the articles displaye...
Article
This paper reports on the results of two questionnaires asking biologists about the incorporation of text-extracted entity information, specifically gene and protein names, into bioscience literature search user interfaces. Among the findings are that study participants want to see gene/protein metadata in combination with organism information; tha...
Conference Paper
Full-text available
We present a simple linguistically-motivated method for characterizing the semantic relations that hold between two nouns. The approach leverages the vast size of the Web in order to build lexically-specific features. The main idea is to look for verbs, prepositions, and coordinating conjunctions that can help make explicit the hidden relatio...
Article
Full-text available
The BioText Search Engine is a freely available Web-based application that provides biologists with new ways to access the scientific literature. One novel feature is the ability to search and browse article figures and their captions. A grid view juxtaposes many different figures associated with the same keywords, providing new insight into the li...
Article
This paper presents the results of a pilot usability study of a novel approach to search user interfaces for bioscience journal articles. The main idea is to support search over figure captions explicitly, and show the corresponding figures directly within the search results. Participants in a pilot study expressed surprise at the idea, not...
Conference Paper
Full-text available
For the WMT 2007 shared task, the UC Berkeley team employed three techniques of interest. First, we used monolingual syntactic paraphrases to provide syntactic variety to the source training set sentences. Second, we trained two language models: a small in-domain model and a large out-of-domain model. Finally, we made use of results from prior rese...
Conference Paper
Full-text available
We present a novel, simple, unsupervised method for characterizing the semantic relations that hold between nouns in noun-noun compounds. The main idea is to discover predicates that make explicit the hidden relations between the nouns. This is accomplished by writing Web search engine queries that restate the noun compound as a relative clause...
Conference Paper
We address the problem of multi-way relation classification, applied to identification of the interactions between proteins in bioscience text. A major impediment to such work is the acquisition of appropriately labeled training data; for our experiments we have identified a database that serves as a proxy for training data. We use two grap...
Conference Paper
Full-text available
Recent work has shown that very large corpora can act as training data for NLP algorithms even without explicit labels. In this paper we show how the use of surface features and paraphrases in queries against search engines can be used to infer labels for structural ambiguity resolution tasks. Using unsupervised algorithms, we achieve 84% precisi...
Article
Full-text available
Researchers who use MEDLINE for text mining, information extraction, or natural language processing may benefit from having a copy of MEDLINE that they can manage locally. The National Library of Medicine (NLM) distributes MEDLINE in eXtensible Markup Language (XML)-formatted text files, but it is difficult to query MEDLINE in that format. We have...
Conference Paper
A crucial step toward the goal of automatic extraction of propositional information from natural language text is the identification of semantic relations between constituents in sentences. We examine the problem of distinguishing among seven relation types that can occur between the entities "treatment" and "disease" in bioscience text, and...
Conference Paper
Full-text available
The BioText group participated in the two main tasks of the TREC 2004 Genomics track. Our approach to the ad hoc task was similar to the one used in the 2003 Genomics track, but due to the lack of training data, we did not achieve the high scores of the previous year. The most novel aspect of our submission for the categorization task centers aroun...
Article
The volume of biomedical text is growing at a fast rate, creating challenges for humans and computer systems alike. One of these challenges arises from the frequent use of novel abbreviations in these texts, thus requiring that biomedical lexical ontologies be continually updated. In this paper we show that the problem of identifying abbreviations'...
Conference Paper
Full-text available
There are currently two dominant interface types for searching and browsing large image collections: keyword-based search, and searching by overall similarity to sample images. We present an alternative based on enabling users to navigate along conceptual dimensions that describe the images. The interface makes use of hierarchical faceted metadata...
Article
In many types of technical texts, meaning is embedded in noun compounds. A language understanding program needs to be able to interpret these in order to ascertain sentence meaning.
Article
Labeling of sentence boundaries is a necessary prerequisite for many natural language processing tasks, including part-of-speech tagging and sentence alignment. End-of-sentence punctuation marks are ambiguous; to disambiguate them most systems use brittle, special-purpose regular expression grammars and exception rules. As an alternative, we have developed an efficient, trainable algorithm that...
Article
Using quantitative measures of the informational, navigational, and graphical aspects of a Web site, a quality checker aims to help nonprofessional designers improve their Web sites. As part of the WebTango project, we explore automated approaches for helping designers improve their sites. Our goal is to create an interactive tool that helps steer...
Article
The physical context of architectural design includes large workspaces, typically drafting tables covered with piles of images and sketches. We are investigating if and how a large computerized workspace can be integrated usefully into such a design environment. To this end, we compared a large computerized desktop (digital desk) to a standard desk...
Article
We are creating an interactive tool to help non-professional web site builders create high quality designs. We have previously reported that quantitative measures of web page structure can predict whether a site will be highly or poorly rated by experts, with accuracies ranging from 67-80%. In this paper we extend that work in several ways. First,...
Article
Usability evaluation is an increasingly important part of the user interface design process. However, usability evaluation can be expensive in terms of time and human resources, and automation is therefore a promising way to augment existing approaches. This article presents an extensive survey of usability evaluation methods, organized according t...
Article
A quantitative analysis of a large collection of expert-rated web sites reveals that page-level metrics can accurately predict if a site will be highly rated. The analysis also provides empirical evidence that important metrics, including page composition, page formatting, and overall page characteristics, differ among web site categories such as e...
Article
We present preliminary findings of a quantitative analysis of several attributes of Web page layout and composition and their relation to usability. We compared Web sites that have been favorably rated by experts with those that have not been rated, and found that 6 out of 12 measured attributes were significantly associated with highly rated sites...
Article
Usability evaluation is an increasingly important part of the user interface design process. However, usability evaluation can be expensive in terms of time and human resources, and automation is therefore a promising way to augment existing approaches. This article presents an extensive survey of usability evaluation methods, organized according t...
Article
We present Scatter/Gather, a cluster-based document browsing method, as an alternative to ranked titles for the organization and viewing of retrieval results. We systematically evaluate Scatter/Gather in this context and find significant improvements over similarity search ranking alone. This result provides evidence validating the cluster hypothes...
Article
An important information access problem arises when the user is confronted with a very large number of documents that have been retrieved in response to a query. In this paper we explore the use of a technique, called Scatter/Gather, for the navigation of large collections of retrieved documents. Scatter/Gather clusters the documents into semantica...
Article
Labeling of sentence boundaries is a necessary prerequisite for many natural language processing tasks, including part-of-speech tagging and sentence alignment. End-of-sentence punctuation marks are ambiguous; to disambiguate them most systems use brittle, special-purpose regular expression grammars and exception rules. As an alternative, we have de...
Article
Qualitative results are presented of a user study comparing a large digital desk with stylus, a digital tablet with stylus, and a standard monitor and mouse. Participants split preferences over the desk and the tablet for a sketching task, but generally preferred the monitor and mouse for an image sorting task. Participants did not object to the re...
Conference Paper
Qualitative results are presented of a user study comparing a large digital desk with stylus, a digital tablet with stylus, and a standard monitor and mouse. Participants split preferences over the desk and the tablet for a sketching task, but generally preferred the monitor and mouse for an image sorting task. Participants did not object to the re...
Article
The article comprises two sections. In the first, the author suggests that speculative markets are a neglected way to help us find out what people know. Such markets pool the information that is known to diverse individuals into a common resource, and have many advantages over standard institutions for information aggregation, such as news media, p...
Article
This article presents an efficient, trainable system for sentence boundary disambiguation. The system, called Satz, makes simple estimates of the parts of speech of the tokens immediately preceding and following each punctuation mark, and uses these estimates as input to a machine learning algorithm that then classifies the punctuation mark. Satz i...
Article
Labeling of sentence boundaries is a necessary prerequisite for many natural language processing tasks, including part-of-speech tagging and sentence alignment. End-of-sentence punctuation marks are ambiguous; to disambiguate them most systems use brittle, special-purpose regular expression grammars and exception rules. As an alternative, we have de...
Article
Full-text available
Despite the Web's current disorganized and anarchic state, many AI researchers believe that it will become the world's largest knowledge base. We examine a line of research whose final goal is to make disparate data sources work together to better serve users' information needs. This work is known as information integration. The authors talk about...
Article
Full-text available
My first exposure to Support Vector Machines came this spring when I heard Sue Dumais present impressive results on text categorization using this analysis technique. This issue's collection of essays should help familiarize our readers with this interesting new racehorse in the Machine Learning stable. Bernhard Scholkopf, in an introductory overview...
Article
Full-text available
this article cross-cut a number of the usual layers of digital library architectures including retrieval engines, access protocols, intermediate services, and user inter...
Article
In this report, Section 2 describes our routing experiments, Section 3 describes our ad hoc experiments, and Section 4 discusses our results and possible future experiments.
Article
Trends & Controversies is a regular feature of IEEE Expert that focuses on current topics related to intelligent systems. This is the first issue in which I am acting as editor. Craig Knoblock, my predecessor, has given me an excellent example to follow. I have also been fortunate to inherit the aesthetically pleasing cartooning skills of Kevin Kni...
Article
Digital property rights take into account the practices and uses of documents and their newly mutable forms, and attempt to satisfy the needs of publishers and users of published works. The technological base rests on the idea of trusted systems, that is, "a computer system that can be relied upon to respect the rules governing the use of a digita...
Article
Full-text available
An important information access problem arises when the user is confronted with a very large number of documents that have been retrieved in response to a query. In this paper we explore the use of a technique, called Scatter/Gather, for the navigation of large collections of retrieved documents. Scatter/Gather clusters the documents into semantica...
Article
I suggest that artificial intelligence be presented as a broad, interdisciplinary, humanistic area of study taught through several courses. I suggest avoiding problem sets, examinations, and other elements of engineering pedagogy; I advocate curriculum ...
Conference Paper
When learning classifiers, more extensive search for rules is shown to lead to lower predictive accuracy on many of the real-world domains investigated. This counter-intuitive result is particularly relevant to recent system the search methods that ...
Article
In this paper, we discuss mixed-media access, an information access paradigm for multimedia data in which the media type of a query may differ from that of the data. The types of media considered in this paper are speech, images of text, and full-length text. Some examples of metadata for mixed-media access are locations of keywords in speech and i...
Conference Paper
An important information access problem arises when the user is confronted with a very large number of documents that have been retrieved in response to a query. In this paper we explore the use of a technique, called Scatter/Gather, for the navigation of large collections of retrieved documents. Scatter/Gather clusters the documents into semantica...
Article
Full-text available
We propose the use of the text of the sentences surrounding citations as an important tool for semantic interpretation of bioscience text. We hypothesize several different uses of citation sentences (which we call citances), including the creation of training and testing data for semantic analysis (especially for entity and relation recogniti...
Article
Usability evaluation is an increasingly important part of the iterative design process. It is also a major bottleneck. Automated usability evaluation is one way to mitigate this bottleneck. We present an extensive survey of existing automated methods for evaluating WIMP and Web interfaces. Our survey shows that while potentially beneficial, aut...
Article
To gain insight about new methods for automated usability assessment, we systematically compare methods for usability evaluation to those of performance evaluation. We find that the major challenges include automatically generating high-level usage traces, mapping these high-level traces into UI operations, and simulating UI behavior across a r...
Article
Key Trends in Technology Use. What does the future hold for search user interfaces? The familiar web search interface of today works well for millions of people and billions of queries a year. Very few innovations in search interfaces gain wide enough acceptance to replace the standard type-keywords-in-entry-form, view-results-in-a-vertical-result...
Article
One of the most pressing usability issues in the design of web sites is that of how to improve navigation and search. We are conducting a series of usability studies to address this problem, focusing on web sites that consist of large collections of loosely organized information. This article describes our method and presents preliminary results...
