Susan T. Dumais's research while affiliated with Microsoft and other places

Publications (278)

Conference Paper
Email remains one of the most frequently used means of online communication. People spend a significant amount of time every day on emails to exchange information, manage tasks and schedule events. Previous work has studied different ways for improving email productivity by prioritizing emails, suggesting automatic replies or identifying intents to r...
Preprint
Asking clarifying questions in response to search queries has been recognized as a useful technique for revealing the underlying intent of the query. Clarification has applications in retrieval systems with different interfaces, from the traditional web search interfaces to the limited bandwidth interfaces as in speech-only and small screen devices...
Article
Limited labeled data is becoming one of the largest bottlenecks for supervised learning systems. This is especially the case for many real-world tasks where large scale labeled examples are either too expensive to acquire or unavailable due to privacy or data access constraints. Weak supervision has been shown to be effective in mitigating the scarcity...
Article
When people communicate with each other, their choice of what to say is tied to their perceptions of the audience. For many communication channels, people have some ability to explicitly specify their audience members and the different roles they can play. While existing accounts of communication behavior have largely focused on how people tailor t...
Preprint
Email remains one of the most frequently used means of online communication. People spend a significant amount of time every day on emails to exchange information, manage tasks and schedule events. Previous work has studied different ways for improving email productivity by prioritizing emails, suggesting automatic replies or identifying intents to...
Article
2020 ACM. Personalized document recommendation systems aim to provide users with a quick shortcut to the documents they may want to access next, usually with an explanation about why the document is recommended. Previous work explored various methods for better recommendations and better explanations in different domains. However, there are few eff...
Preprint
Email is an integral part of people's work and life, enabling them to perform activities such as communicating, searching, managing tasks and storing information. Modern email clients take a step forward and help improve users' productivity by automatically creating reminders, tasks or responses. The act of reading is arguably the only activity tha...
Preprint
Leveraging weak or noisy supervision for building effective machine learning models has long been an important research problem. The growing need for large-scale datasets to train deep learning models has increased its importance. Weak or noisy supervision could originate from multiple sources including non-expert annotators or automatic labeling b...
Preprint
Limited labeled data is becoming the largest bottleneck for supervised learning systems. This is especially the case for many real-world tasks where large scale annotated examples are either too expensive to acquire or unavailable due to privacy or data access constraints. Weak supervision has been shown to be a good means to mitigate the scarcity of an...
Conference Paper
Email remains a critical channel for communicating information in both personal and work accounts. The number of emails people receive every day can be overwhelming, which in turn creates challenges for efficient information management and consumption. Having a good estimate of the significance of emails forms the foundation for many downstream tas...
Article
We show that incorporating user behavior data can significantly improve ordering of top results in a real web search setting. We examine alternatives for incorporating feedback into the ranking process and explore the contributions of user feedback compared to other common web search features. We report results of a large scale evaluation over 3,000...
Preprint
Email triage involves going through unhandled emails and deciding what to do with them. This familiar process can become increasingly challenging as the number of unhandled emails grows. During a triage session, users commonly defer handling emails that they cannot immediately deal with to later. These deferred emails are often related to tasks tha...
Article
The purpose of the Strategic Workshop in Information Retrieval in Lorne is to explore the long-range issues of the Information Retrieval field, to recognize challenges that are on, or even over, the horizon, to build consensus on some of the key challenges, and to disseminate the resulting information to the research community. The intent is that thi...
Conference Paper
In this paper, we study how to leverage calendar information to help with email re-finding using a zero-query prototype, Calendar-Aware Proactive Email Recommender System (CAPERS). CAPERS proactively selects and displays potentially useful emails to users based on their upcoming calendar events with a particular focus on meeting preparation. We app...
Article
Interactive Information Seeking, Behaviour and Retrieval - edited by Ian Ruthven
Article
Interactive Information Seeking, Behaviour and Retrieval - edited by Ian Ruthven December 2013
Conference Paper
Complex software applications expose hundreds of commands to users through intricate menu hierarchies. One of the most popular productivity software suites, Microsoft Office, has recently developed functionality that allows users to issue free-form text queries to a search system to quickly find commands they want to execute, retrieve help document...
Conference Paper
The success of information retrieval systems depends critically on both the ability of systems to efficiently and effectively retrieve information, and to support people in articulating their information needs and making sense of the results. This interdisciplinary, user-centered perspective on information systems motivated my early work on Latent...
Conference Paper
Email continues to be one of the most important means of online communication, leading to a number of challenges related to information overload and email management. To better understand email management practices in detail, we examine the distribution of visits to emails over time. During their lifetime, emails may be visited one or more times, a...
Article
We formulate and study search algorithms that consider a user's prior interactions with a wide variety of content to personalize that user's current Web search. Rather than relying on the unrealistic assumption that people will precisely specify their intent when searching, we pursue techniques that leverage implicit information about the user's in...
Conference Paper
Any learning algorithm for recommendation faces a fundamental trade-off between exploiting partial knowledge of a user's interests to maximize satisfaction in the short term and discovering additional user interests to maximize satisfaction in the long term. To enable discovery, a machine learning algorithm typically elicits feedback on items it is...
Conference Paper
Email is still among the most popular online activities. People spend a significant amount of time sending, reading and responding to email in order to communicate with others, manage tasks and archive personal information. Most previous research on email is based on either relatively small data samples from user surveys and interviews, or on consu...
Conference Paper
Email has been a dominant form of communication for many years, and email search is an important problem. In contrast to other search settings, such as web search, there have been few studies of user behavior and models of email search success. Research in email search is challenging for many reasons including the personal and private nature of the...
Conference Paper
As the number of email users and messages continues to grow, search is becoming more important for finding information in personal archives. In spite of its importance, email search is much less studied than web search, particularly using large-scale behavioral log analysis. In this paper we report the results of a large-scale log analysis of email...
Conference Paper
Email has been central to online communication for the past two decades. Through constant use, new information flows are being defined around users' interactions with emails. Alongside traditional messages, the email inbox is an always-available repository of to-do lists, reminders, files and notes. In this paper, we investigate the use of self-add...
Conference Paper
Email continues to be an important form of communication as well as a way to manage tasks and archive personal information. As the volume of email grows, organizing and finding relevant email remains challenging. In this paper, we present a large-scale log analysis of the activities that people perform on email messages (accessing external informa...
Conference Paper
Traditionally search engines have returned the same results to everyone who asks the same question. However, using a single ranking for everyone in every context at every point in time limits how well a search engine can do in providing relevant information. In this talk I present a framework to quantify the "potential for personalization" which we...
Conference Paper
Web search functionality is increasingly integrated into operating systems, software applications, and other interactive environments that extend beyond the traditional web browser. In particular, intelligent virtual assistants (e.g., Microsoft Cortana or Apple Siri) often "fall-back" to generic web search in cases where utterances fall outside the...
Conference Paper
In this paper, we study shortlists as an interface component for recommender systems with the dual goal of supporting the user's decision process, as well as improving implicit feedback elicitation for increased recommendation quality. A shortlist is a temporary list of candidates that the user is currently considering, e.g., a list of a few movies...
Article
In this paper, we study the impact of design choices for recommender systems on one-choice tasks where users want to select one item out of a variety of options. Instead of focusing on only user factors or recommendation quality, we consider how an interface design that provides the user with digital short-term memory impacts both user behavior and...
Conference Paper
Web searchers sometimes struggle to find relevant information. Struggling leads to frustrating and dissatisfying search experiences, even if searchers ultimately meet their search objectives. Better understanding of search tasks where people struggle is important in improving search systems. We address this important issue using a mixed methods stu...
Conference Paper
Scrolling is an integral part of our everyday computing experience, and many techniques and devices have been developed to enhance scrolling. We have conducted an 18-participant user study to understand how users' gaze position and scrolling strategies are coordinated. Our data showed that people scrolled within preferred reading regions of the scr...
Article
Personalization in computing helps tailor content to a person’s individual tastes. As a result, the tasks that benefit from personalization are inherently subjective. Many of the most robust approaches to personalization rely on large sets of other people’s preferences. However, existing preference data is not always available. In these cases we pr...
Patent
A system that facilitates ranking search results returned by a search engine in response to receipt of a query is described herein. The system includes a receiver component that receives categorical metadata pertaining to an item and categorical metadata pertaining to the query and a computation component that computes at least one of a document fe...
Conference Paper
Over the last decade, the rise of web services has made it possible to gather traces of human behavior in situ at a scale and fidelity previously unimaginable. Large-scale behavioral data enables researchers and practitioners to detect adverse drug reactions and interactions, to understand how information diffuses through social networks, how peopl...
Article
We present methods to automatically identify and recommend sub-tasks to help users explore and accomplish complex search tasks. Although Web searchers often exhibit directed search behaviors such as navigating to a particular Website or locating a particular item of information, many search scenarios involve more complex tasks such as learning abou...
Book
Significant effort in machine learning and information retrieval has been devoted to identifying personalized content such as recommendations and search results. Personalized human computation has the potential to go beyond existing techniques like collaborative filtering to provide personalized results on demand, over personal data, and for comple...
Patent
The claimed subject matter relates to an architecture that can scale a content feed in terms of the amount of content delivered in order to facilitate satisfactory experiences in connection with a social network. In particular, the architecture can utilize a content feed to disseminate content between members of a network community, generally relat...
Article
Gaze tracking technology is increasingly common in desktop, laptop and mobile scenarios. Most previous research on eye gaze patterns during human-computer interaction has been confined to controlled laboratory studies. In this paper we present an in situ study of gaze and mouse coordination as participants went about their normal activities. We ana...
Patent
Personalized navigation for one or more individuals' use of a search engine is provided. Identification of a query submitted to the search engine is performed. If the query is identified to be a personal navigational query, which is a query via which the individuals intend to navigate to a particular site or information object that they have previo...
Article
Significant time and effort has been devoted to reducing the time between query receipt and search engine response, and for good reason. Research suggests that even slightly higher retrieval latency by Web search engines can lead to dramatic decreases in users' perceptions of result quality and engagement with the search results. While users have c...
Patent
Concepts and technologies are described herein for hyperlocal smoothing. The hyperlocal smoothing solutions described herein provide a smooth view of data and events across hyperlocal geographic areas by combining sparse data available with inferred or extrapolated data. Additionally, the hyperlocal smoothing solutions described herein make use of...
Patent
Multiple pieces of information can be arranged into a single construct that allows the employee to ascertain information quickly while at her workstation. Selection of information for placement into the construct can employ various statistical models and the like. Selective pieces of information can be masked for a user's construct based upon acces...
Article
A person often uses a single search engine for very different tasks. For example, an author editing a manuscript may use the same academic search engine to find the latest work on a particular topic or to find the correct citation for a familiar article. The author's tolerance for latency and accuracy may vary according to task. However, search eng...
Article
Over the last two decades the information retrieval landscape has changed dramatically. Twenty years ago, there were fewer than 3k web sites and the earliest web search engines indexed approximately 50k pages. Today, search engines index billions of web pages, images, videos, news, music, social media, books, etc., and have become the main entry po...
Patent
One or more systems and/or techniques are provided for constructing a query classification index that can be used to classify a query into relevant categories. Where documents in an index are classified into one or more category predictions for a category hierarchy, classification metadata is generated for categories to which a document in the inde...
Patent
Providing for generation of a task oriented data structure that can correlate natural language descriptions of computer related tasks to application level commands and functions is described herein. By way of example, a system can include an activity translation component that can receive a natural language description of an application level task....
Patent
A query processing system is described herein for personalizing results for a particular user. The query processing system operates by receiving a query from a particular user u who intends to find results that satisfy the query with respect to a topic Tu, the user being characterized by user information θu. In one implementation, the query process...
Chapter
HCI researchers are increasingly collecting rich behavioral traces of user interactions with online systems in situ at a scale not previously possible. These logs can be used to characterize user interactions with existing systems and compare different designs. Large-scale log studies give rise to new challenges in experimental design, data collect...
Conference Paper
Web searchers often exhibit directed search behaviors such as navigating to a particular Website. However, in many circumstances they exhibit different behaviors that involve issuing many queries and visiting many results. In such cases, it is not clear whether the user's rationale is to intentionally explore the results or whether they are struggl...
Conference Paper
The Internet is the largest source of information in the world. Search engines help people navigate the huge space of available data in order to acquire new skills and knowledge. In this paper, we present an in-depth analysis of sessions in which people explicitly search for new knowledge on the Web based on the log files of a popular search engin...
Patent
One or more techniques and/or systems are provided for transitioning between representations of an electronic document. Elements, such as visual elements, common between a first set of elements from a first representation of the document and a second set of elements from a second representation of the document are identified. The non-intersecting e...
Article
Significant time and effort has been devoted to reducing the time between query receipt and search engine response, and for good reason. Research suggests that even slightly higher retrieval latency by Web search engines can lead to dramatic decreases in users' perceptions of result quality and engagement with the search results. While users have c...
Conference Paper
Web content increasingly reflects the current state of the physical and social world, manifested both in traditional news media sources along with user-generated publishing sites such as Twitter, Foursquare, and Facebook. At the same time, web searching increasingly reflects problems grounded in the real world. As a result of this blending of the w...
Conference Paper
Many search engines identify bursts of activity around particular topics and reflect these back to users as Popular Now or Hot Searches. Activity around these topics typically evolves quickly in real-time during the course of a trending event. Users' informational needs when searching for such topics will vary depending on the stage at which they e...
Article
The queries people issue to a search engine and the results clicked following a query change over time. For example, after the earthquake in Japan in March 2011, the query japan spiked in popularity and people issuing the query were more likely to click government-related results than they would prior to the earthquake. We explore the modeling and...
Patent
The subject disclosure pertains to anonymous network interaction. More specifically, mechanisms are provided to ensure anonymity with respect to network interaction such that third parties are unable to determine the source and/or intent of communications. Accordingly, entities may anonymize all outgoing and/or incoming data packets so as to mitigate...
Conference Paper
Synchronous social question-and-answer (Q&A) systems help people find answers by connecting them with others via instant messaging. To understand how such systems can quickly and effectively establish fruitful connections, we analyze conversations collected from a working enterprise social Q&A system. We show that when askers start with underspecifi...
Conference Paper
Forming an accurate mental model of a user is crucial for the qualitative design and evaluation steps of many information-centric applications such as web search, content recommendation, or advertising. This process can often be time-consuming as search and interaction histories become verbose. In this work, we present and analyze the usefulness of...
Patent
The claimed subject matter relates to an architecture that can facilitate creation and management of an event-oriented transient network and can further manage decommission of the transient network. In particular, the architecture can construct temporary communities based upon a particular event, project, or activity; manage (e.g., filter, prioriti...
Conference Paper
The World Wide Web is highly dynamic and is constantly evolving to cover the latest information about the physical and social updates in the world. At the same time, the changes in web contents are entangled with new information needs and time-sensitive user interactions with information sources. To address these temporal information needs effectiv...
Conference Paper
Most research in Web search personalization models users as static or slowly evolving entities with a given set of preferences defined by their past behavior. However, recent publications as well as empirical evidence suggest that for a significant number of search sessions, users diverge from their regular search profiles in order to satisfy atypi...
Article
The SIGIR 2012 Workshop on Time-aware Information Access (#TAIA2012) was held in Portland on Thursday, 16 August 2012. The workshop brought together about 50 researchers from academia and industry for a full-day programme on time-sensitive information access that involved three keynotes and nine paper presentations.
Article
We present PivotPaths, an interactive visualization for exploring faceted information resources. During both work and leisure, we increasingly interact with information spaces that contain multiple facets and relations, such as authors, keywords, and citations of academic publications, or actors and genres of movies. To navigate these interlinked r...
Conference Paper
Users of search engines often abandon their searches. Despite the high frequency of Web search abandonment and its importance to Web search engines, little is known about why searchers abandon beyond that it can be for good or bad reasons. In this paper, we extend previous work by studying search abandonment using both a retrospective survey and a...
Article
We describe an investigation of the use of probabilistic models and cost-benefit analyses to guide resource-intensive procedures used by a Web-based question answering system. We first provide an overview of research on question-answering systems. Then, we present details on AskMSR, a prototype web-based question answering system. We discuss Bayesi...
Article
Content on the Internet is always changing. We explore the value of biasing search result snippets towards new webpage content. We present results from a user study comparing traditional query-focused snippets with snippets that emphasize new page content for two query types: general and trending. Our results indicate that searchers prefer the incl...