Susan T. Dumais's research while affiliated with Microsoft and other places
What is this page?
This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.
It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.
If you're a ResearchGate member, you can follow this page to keep up with this author's work.
If you are this author, and you don't want us to display this page anymore, please let us know.
Publications (285)
Many search engines identify bursts of activity around particular topics and reflect these back to users as Popular Now or Hot Searches. Activity around these topics typically evolves quickly in real-time during the course of a trending event. Users’ informational needs when searching for such topics will vary depending on the stage at which they e...
Leveraging weak or noisy supervision for building effective machine learning models has long been an important research problem. Its importance has further increased recently due to the growing need for large-scale datasets to train deep learning models. Weak or noisy supervision could originate from multiple sources including non-expert annotators...
Email remains one of the most frequently used means of online communication. People spend a significant amount of time every day on emails to exchange information, manage tasks and schedule events. Previous work has studied different ways for improving email productivity by prioritizing emails, suggesting automatic replies or identifying intents to r...
Asking clarifying questions in response to search queries has been recognized as a useful technique for revealing the underlying intent of the query. Clarification has applications in retrieval systems with different interfaces, from the traditional web search interfaces to the limited bandwidth interfaces as in speech-only and small screen devices...
Limited labeled data is becoming one of the largest bottlenecks for supervised learning systems. This is especially the case for many real-world tasks where large-scale labeled examples are either too expensive to acquire or unavailable due to privacy or data access constraints. Weak supervision has been shown to be effective in mitigating the scarcity...
When people communicate with each other, their choice of what to say is tied to their perceptions of the audience. For many communication channels, people have some ability to explicitly specify their audience members and the different roles they can play. While existing accounts of communication behavior have largely focused on how people tailor t...
Email remains one of the most frequently used means of online communication. People spend a significant amount of time every day on emails to exchange information, manage tasks and schedule events. Previous work has studied different ways for improving email productivity by prioritizing emails, suggesting automatic replies or identifying intents to...
2020 ACM. Personalized document recommendation systems aim to provide users with a quick shortcut to the documents they may want to access next, usually with an explanation about why the document is recommended. Previous work explored various methods for better recommendations and better explanations in different domains. However, there are few eff...
Email is an integral part of people's work and life, enabling them to perform activities such as communicating, searching, managing tasks and storing information. Modern email clients take a step forward and help improve users' productivity by automatically creating reminders, tasks or responses. The act of reading is arguably the only activity tha...
Leveraging weak or noisy supervision for building effective machine learning models has long been an important research problem. The growing need for large-scale datasets to train deep learning models has increased its importance. Weak or noisy supervision could originate from multiple sources including non-expert annotators or automatic labeling b...
Limited labeled data is becoming the largest bottleneck for supervised learning systems. This is especially the case for many real-world tasks where large-scale annotated examples are either too expensive to acquire or unavailable due to privacy or data access constraints. Weak supervision has been shown to be a good means to mitigate the scarcity of an...
Email remains a critical channel for communicating information in both personal and work accounts. The number of emails people receive every day can be overwhelming, which in turn creates challenges for efficient information management and consumption. Having a good estimate of the significance of emails forms the foundation for many downstream tas...
We show that incorporating user behavior data can significantly improve the ordering of top results in a real web search setting. We examine alternatives for incorporating feedback into the ranking process and explore the contributions of user feedback compared to other common web search features. We report results of a large-scale evaluation over 3,000...
Email triage involves going through unhandled emails and deciding what to do with them. This familiar process can become increasingly challenging as the number of unhandled emails grows. During a triage session, users commonly defer handling emails that they cannot immediately deal with to later. These deferred emails are often related to tasks tha...
The purpose of the Strategic Workshop in Information Retrieval in Lorne is to explore the long-range issues of the Information Retrieval field, to recognize challenges that are on, or even over, the horizon, to build consensus on some of the key challenges, and to disseminate the resulting information to the research community. The intent is that thi...
In this paper, we study how to leverage calendar information to help with email re-finding using a zero-query prototype, Calendar-Aware Proactive Email Recommender System (CAPERS). CAPERS proactively selects and displays potentially useful emails to users based on their upcoming calendar events with a particular focus on meeting preparation. We app...
Interactive Information Seeking, Behaviour and Retrieval - edited by Ian Ruthven December 2013
Complex software applications expose hundreds of commands to users through intricate menu hierarchies. One of the most popular productivity software suites, Microsoft Office, has recently developed functionality that allows users to issue free-form text queries to a search system to quickly find commands they want to execute, retrieve help document...
The success of information retrieval systems depends critically on both the ability of systems to efficiently and effectively retrieve information, and to support people in articulating their information needs and making sense of the results. This interdisciplinary, user-centered perspective on information systems motivated my early work on Latent...
Email continues to be one of the most important means of online communication, leading to a number of challenges related to information overload and email management. To better understand email management practices in detail, we examine the distribution of visits to emails over time. During their lifetime, emails may be visited one or more times, a...
We formulate and study search algorithms that consider a user's prior interactions with a wide variety of content to personalize that user's current Web search. Rather than relying on the unrealistic assumption that people will precisely specify their intent when searching, we pursue techniques that leverage implicit information about the user's in...
Any learning algorithm for recommendation faces a fundamental trade-off between exploiting partial knowledge of a user's interests to maximize satisfaction in the short term and discovering additional user interests to maximize satisfaction in the long term. To enable discovery, a machine learning algorithm typically elicits feedback on items it is...
Email is still among the most popular online activities. People spend a significant amount of time sending, reading and responding to email in order to communicate with others, manage tasks and archive personal information. Most previous research on email is based on either relatively small data samples from user surveys and interviews, or on consu...
Email has been a dominant form of communication for many years, and email search is an important problem. In contrast to other search settings, such as web search, there have been few studies of user behavior and models of email search success. Research in email search is challenging for many reasons including the personal and private nature of the...
Harman, D. K. and Kelly, D., editors
As the number of email users and messages continues to grow, search is becoming more important for finding information in personal archives. In spite of its importance, email search is much less studied than web search, particularly using large-scale behavioral log analysis. In this paper we report the results of a large-scale log analysis of email...
Email has been central to online communication for the past two decades. Through constant use, new information flows are being defined around users' interactions with emails. Alongside traditional messages, the email inbox is an always-available repository of to-do lists, reminders, files and notes. In this paper, we investigate the use of self-add...
Email continues to be an important form of communication as well as a way to manage tasks and archive personal information. As the volume of email grows, organizing and finding relevant email remains challenging. In this paper, we present a large-scale log analysis of the activities that people perform on email messages (accessing external informa...
Traditionally search engines have returned the same results to everyone who asks the same question. However, using a single ranking for everyone in every context at every point in time limits how well a search engine can do in providing relevant information. In this talk I present a framework to quantify the "potential for personalization" which we...
Web search functionality is increasingly integrated into operating systems, software applications, and other interactive environments that extend beyond the traditional web browser. In particular, intelligent virtual assistants (e.g., Microsoft Cortana or Apple Siri) often "fall-back" to generic web search in cases where utterances fall outside the...
In this paper, we study shortlists as an interface component for recommender systems with the dual goal of supporting the user's decision process, as well as improving implicit feedback elicitation for increased recommendation quality. A shortlist is a temporary list of candidates that the user is currently considering, e.g., a list of a few movies...
In this paper, we study the impact of design choices for recommender systems on one-choice tasks where users want to select one item out of a variety of options. Instead of focusing on only user factors or recommendation quality, we consider how an interface design that provides the user with digital short-term memory impacts both user behavior and...
Web searchers sometimes struggle to find relevant information. Struggling leads to frustrating and dissatisfying search experiences, even if searchers ultimately meet their search objectives. Better understanding of search tasks where people struggle is important in improving search systems. We address this important issue using a mixed methods stu...
Scrolling is an integral part of our everyday computing experience, and many techniques and devices have been developed to enhance scrolling. We have conducted an 18-participant user study to understand how users' gaze position and scrolling strategies are coordinated. Our data showed that people scrolled within preferred reading regions of the scr...
Personalization in computing helps tailor content to a person’s individual tastes. As a result, the tasks that benefit from personalization are inherently subjective. Many of the most robust approaches to personalization rely on large sets of other people’s preferences. However, existing preference data is not always available. In these cases we pr...
A system that facilitates ranking search results returned by a search engine in response to receipt of a query is described herein. The system includes a receiver component that receives categorical metadata pertaining to an item and categorical metadata pertaining to the query and a computation component that computes at least one of a document fe...
Over the last decade, the rise of web services has made it possible to gather traces of human behavior in situ at a scale and fidelity previously unimaginable. Large-scale behavioral data enables researchers and practitioners to detect adverse drug reactions and interactions, to understand how information diffuses through social networks, how peopl...
We present methods to automatically identify and recommend sub-tasks to help users explore and accomplish complex search tasks. Although Web searchers often exhibit directed search behaviors such as navigating to a particular Website or locating a particular item of information, many search scenarios involve more complex tasks such as learning abou...
Significant effort in machine learning and information retrieval has been devoted to identifying personalized content such as recommendations and search results. Personalized human computation has the potential to go beyond existing techniques like collaborative filtering to provide personalized results on demand, over personal data, and for comple...
The claimed subject matter relates to an architecture that can scale a content feed in terms of the amount of content delivered in order to facilitate satisfactory experiences in connection with a social network. In particular, the architecture can utilize a content feed to disseminate content between members of a network community, generally relat...
Gaze tracking technology is increasingly common in desktop, laptop and mobile scenarios. Most previous research on eye gaze patterns during human-computer interaction has been confined to controlled laboratory studies. In this paper we present an in situ study of gaze and mouse coordination as participants went about their normal activities. We ana...
Personalization is a way for computers to support people’s diverse interests and needs by providing content tailored to the individual. While strides have been made in algorithmic approaches to personalization, most require access to a significant amount of data. However, even when data is limited, online crowds can be used to infer an individual’s...
Personalized navigation for one or more individuals' use of a search engine is provided. Identification of a query submitted to the search engine is performed. If the query is identified to be a personal navigational query, which is a query via which the individuals intend to navigate to a particular site or information object that they have previo...
Significant time and effort has been devoted to reducing the time between query receipt and search engine response, and for good reason. Research suggests that even slightly higher retrieval latency by Web search engines can lead to dramatic decreases in users' perceptions of result quality and engagement with the search results. While users have c...
Concepts and technologies are described herein for hyperlocal smoothing. The hyperlocal smoothing solutions described herein provide a smooth view of data and events across hyperlocal geographic areas by combining sparse data available with inferred or extrapolated data. Additionally, the hyperlocal smoothing solutions described herein make use of...
Multiple pieces of information can be arranged into a single construct that allows the employee to ascertain information quickly while at her workstation. Selection of information for placement into the construct can employ various statistical models and the like. Selective pieces of information can be masked for a user's construct based upon acces...
A person often uses a single search engine for very different tasks. For example, an author editing a manuscript may use the same academic search engine to find the latest work on a particular topic or to find the correct citation for a familiar article. The author's tolerance for latency and accuracy may vary according to task. However, search eng...
Over the last two decades the information retrieval landscape has changed dramatically. Twenty years ago, there were fewer than 3k web sites and the earliest web search engines indexed approximately 50k pages. Today, search engines index billions of web pages, images, videos, news, music, social media, books, etc., and have become the main entry po...
One or more systems and/or techniques are provided for constructing a query classification index that can be used to classify a query into relevant categories. Where documents in an index are classified into one or more category predictions for a category hierarchy, classification metadata is generated for categories to which a document in the inde...
A query processing system is described herein for personalizing results for a particular user. The query processing system operates by receiving a query from a particular user u who intends to find results that satisfy the query with respect to a topic Tu, the user being characterized by user information θu. In one implementation, the query process...
Providing for generation of a task oriented data structure that can correlate natural language descriptions of computer related tasks to application level commands and functions is described herein. By way of example, a system can include an activity translation component that can receive a natural language description of an application level task....
HCI researchers are increasingly collecting rich behavioral traces of user interactions with online systems in situ at a scale not previously possible. These logs can be used to characterize user interactions with existing systems and compare different designs. Large-scale log studies give rise to new challenges in experimental design, data collect...
Web searchers often exhibit directed search behaviors such as navigating to a particular Website. However, in many circumstances they exhibit different behaviors that involve issuing many queries and visiting many results. In such cases, it is not clear whether the user's rationale is to intentionally explore the results or whether they are struggl...
The Internet is the largest source of information in the world. Search engines help people navigate the huge space of available data in order to acquire new skills and knowledge. In this paper, we present an in-depth analysis of sessions in which people explicitly search for new knowledge on the Web based on the log files of a popular search engin...
One or more techniques and/or systems are provided for transitioning between representations of an electronic document. Elements, such as visual elements, common between a first set of elements from a first representation of the document and a second set of elements from a second representation of the document are identified. The non-intersecting e...
Significant effort in machine learning and information retrieval has been devoted to identifying personalized content such as recommendations and search results. Personalized human computation has the potential to go beyond existing techniques like collaborative filtering to provide personalized results on demand, over personal data, and for compl...
Significant time and effort has been devoted to reducing the time between query receipt and search engine response, and for good reason. Research suggests that even slightly higher retrieval latency by Web search engines can lead to dramatic decreases in users' perceptions of result quality and engagement with the search results. While users have c...
Web content increasingly reflects the current state of the physical and social world, manifested both in traditional news media sources along with user-generated publishing sites such as Twitter, Foursquare, and Facebook. At the same time, web searching increasingly reflects problems grounded in the real world. As a result of this blending of the w...
Many search engines identify bursts of activity around particular topics and reflect these back to users as Popular Now or Hot Searches. Activity around these topics typically evolves quickly in real-time during the course of a trending event. Users' informational needs when searching for such topics will vary depending on the stage at which they e...
The queries people issue to a search engine and the results clicked following a query change over time. For example, after the earthquake in Japan in March 2011, the query japan spiked in popularity and people issuing the query were more likely to click government-related results than they would prior to the earthquake. We explore the modeling and...
The subject disclosure pertains to anonymous network interaction. More specifically, mechanisms are provided to ensure anonymity with respect to network interaction such that third parties are unable to determine the source and/or intent of communications. Accordingly, entities may anonymize all outgoing and/or incoming data packets so as to mitigate...
Synchronous social question-and-answer (Q&A) systems help people find answers by connecting them with others via instant messaging. To understand how such systems can quickly and effectively establish fruitful connections, we analyze conversations collected from a working enterprise social Q&A system. We show that when askers start with underspecifi...
Forming an accurate mental model of a user is crucial for the qualitative design and evaluation steps of many information-centric applications such as web search, content recommendation, or advertising. This process can often be time-consuming as search and interaction histories become verbose. In this work, we present and analyze the usefulness of...
The claimed subject matter relates to an architecture that can facilitate creation and management of an event-oriented transient network and can further manage decommission of the transient network. In particular, the architecture can construct temporary communities based upon a particular event, project, or activity; manage (e.g., filter, prioriti...
The World Wide Web is highly dynamic and is constantly evolving to cover the latest information about the physical and social updates in the world. At the same time, the changes in web contents are entangled with new information needs and time-sensitive user interactions with information sources. To address these temporal information needs effectiv...
Most research in Web search personalization models users as static or slowly evolving entities with a given set of preferences defined by their past behavior. However, recent publications as well as empirical evidence suggest that for a significant number of search sessions, users diverge from their regular search profiles in order to satisfy atypi...
The SIGIR 2012 Workshop on Time-aware Information Access (#TAIA2012) was held in Portland on Thursday, 16 August 2012. The workshop brought together about 50 researchers from academia and industry for a full-day programme on time-sensitive information access that involved three keynotes and nine paper presentations.
We present PivotPaths, an interactive visualization for exploring faceted information resources. During both work and leisure, we increasingly interact with information spaces that contain multiple facets and relations, such as authors, keywords, and citations of academic publications, or actors and genres of movies. To navigate these interlinked r...
Users of search engines often abandon their searches. Despite the high frequency of Web search abandonment and its importance to Web search engines, little is known about why searchers abandon beyond that it can be for good or bad reasons. In this paper, we extend previous work by studying search abandonment using both a retrospective survey and a...
We describe an investigation of the use of probabilistic models and cost-benefit analyses to guide resource-intensive procedures used by a Web-based question answering system. We first provide an overview of research on question-answering systems. Then, we present details on AskMSR, a prototype web-based question answering system. We discuss Bayesi...
Content on the Internet is always changing. We explore the value of biasing search result snippets towards new webpage content. We present results from a user study comparing traditional query-focused snippets with snippets that emphasize new page content for two query types: general and trending. Our results indicate that searchers prefer the incl...
User behavior provides many cues to improve the relevance of search results through personalization. One aspect of user behavior that provides especially strong signals for delivering better relevance is an individual's history of queries and clicked documents. Previous studies have explored how short-term behavior or long-term behavior can be pred...
Many important search tasks require multiple search sessions to complete. Tasks such as travel planning, large purchases, or job searches can span hours, days, or even weeks. Inevitably, life interferes, requiring the searcher either to recover the "state" of the search manually (most common), or plan for interruption in advance (unlikely). The goa...
It is a very challenging task to understand a short query, especially if that query is considered in isolation. Luckily, queries do not magically appear in a search box; rather, they are issued by real people, trying to accomplish a task, at a given point in time and space, and this "context" can be used to aid query understanding. Traditionally search...
We present a new approach for personalizing Web search results to a specific user. Ranking functions for Web search engines are typically trained by machine learning algorithms using either direct human relevance judgments or indirect judgments obtained from click-through data from millions of users. The rankings are thus optimized to this generic...
Understanding the impact of individual and task differences on search result page examination strategies is important in developing improved search engines. Characterizing these effects using query and click data alone is common but insufficient since they provide an incomplete picture of result examination behavior. Cursor- or gaze-tracking studie...
Web search engines now offer more than ranked results. Queries on topics like weather, definitions, and movies may return inline results called answers that can resolve a searcher's information need without any additional interaction. Despite the usefulness of answers, they are limited to popular needs because each answer type is manually authored....
User behavior on the Web changes over time. For example, the queries that people issue to search engines, and the underlying informational goals behind the queries vary over time. In this paper, we examine how to model and predict user behavior over time. We develop a temporal modeling framework adapted from physics and signal processing that can...