September 2021
·
13 Reads
·
12 Citations
August 2021
·
8 Reads
·
2 Citations
Proceedings of the International AAAI Conference on Web and Social Media
Many search engines identify bursts of activity around particular topics and reflect these back to users as Popular Now or Hot Searches. Activity around these topics typically evolves quickly in real-time during the course of a trending event. Users’ informational needs when searching for such topics will vary depending on the stage at which they engage with an event. Through a survey and log study, we observe that interaction with content about trending events varies significantly with prior awareness of the event. Building on this observation, we conduct a larger-scale analysis of query logs and social media data associated with hundreds of trending events. We find that search and social media activity tend to follow similar temporal patterns, but that social media activity leads by a few hours. While user interest in trending event content predictably diverges during peak activity periods, the overlap between content searched and shared increases. We discuss how these findings relate to the design of interfaces to better support sensemaking around trending events by integrating real-time social media content with traditional search results.
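The finding that social media activity leads search activity by a few hours can be illustrated with a lagged-correlation check. The sketch below is not the authors' analysis. It assumes two equally spaced hourly count series, and the function names `pearson` and `estimate_lead` and the toy data are hypothetical. It shifts one series against the other and reports the shift with the highest Pearson correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def estimate_lead(social, search, max_lag):
    """Return (lag, correlation) for the shift that best aligns the two series.

    A positive lag means `search` trails `social` by that many samples.
    """
    best_lag, best_corr = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = social[:len(social) - lag], search[lag:]
        else:
            a, b = social[-lag:], search[:len(search) + lag]
        n = min(len(a), len(b))
        if n < 2:
            continue  # too little overlap to correlate
        corr = pearson(a[:n], b[:n])
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr


# Synthetic hourly counts (assumed data): the search burst is the social
# burst delayed by three hours.
social_counts = [0, 0, 1, 5, 9, 5, 1, 0, 0, 0, 0, 0]
search_counts = [0, 0, 0, 0, 0, 1, 5, 9, 5, 1, 0, 0]
lag, corr = estimate_lead(social_counts, search_counts, max_lag=5)
```

On real logs the peak is rarely this clean, so the per-event lag would typically be aggregated over many events rather than read off a single pair of series.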
May 2021
·
33 Reads
·
137 Citations
Proceedings of the AAAI Conference on Artificial Intelligence
Leveraging weak or noisy supervision for building effective machine learning models has long been an important research problem. Its importance has further increased recently due to the growing need for large-scale datasets to train deep learning models. Weak or noisy supervision could originate from multiple sources including non-expert annotators or automatic labeling based on heuristics or user interaction signals. There is an extensive amount of previous work focusing on leveraging noisy labels. Most notably, recent work has shown impressive gains by using a meta-learned instance re-weighting approach where a meta-learning framework is used to assign instance weights to noisy labels. In this paper, we extend this approach by posing the problem as a label correction problem within a meta-learning framework. We view the label correction procedure as a meta-process and propose a new meta-learning-based framework termed MLC (Meta Label Correction) for learning with noisy labels. Specifically, a label correction network is adopted as a meta-model to produce corrected labels for noisy labels while the main model is trained to leverage the corrected labels. Both models are jointly trained by solving a bi-level optimization problem. We run extensive experiments with different label noise levels and types on both image recognition and text classification tasks. We compare the re-weighting and correction approaches, showing that the correction framing addresses some of the limitations of re-weighting. We also show that the proposed MLC approach outperforms previous methods in both image and language tasks.
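The bi-level structure mentioned in the abstract can be sketched in generic form. The notation here is an assumption made for illustration and is not taken from the paper: $f_w$ is the main model with weights $w$, $g_\alpha$ is the label correction network with parameters $\alpha$, $\ell$ is a classification loss, $D_{\text{noisy}}$ is the noisily labeled set, and $D_{\text{clean}}$ is a small trusted set, which is itself an assumption since the abstract does not say what the outer objective is evaluated on.

```latex
\begin{aligned}
\min_{\alpha}\;& \mathbb{E}_{(x,y)\in D_{\text{clean}}}\,
  \ell\bigl(y,\, f_{w^{*}(\alpha)}(x)\bigr) \\
\text{s.t.}\;& w^{*}(\alpha) = \arg\min_{w}\;
  \mathbb{E}_{(x,y')\in D_{\text{noisy}}}\,
  \ell\bigl(g_{\alpha}(x, y'),\, f_{w}(x)\bigr)
\end{aligned}
```

In this reading, the inner problem trains the main model on corrected labels $g_\alpha(x, y')$, and the outer problem adjusts the correction network so that the resulting main model performs well on the trusted data.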
March 2021
·
14 Reads
July 2020
·
497 Reads
·
34 Citations
Email remains one of the most frequently used means of online communication. People spend a significant amount of time every day on emails to exchange information, manage tasks and schedule events. Previous work has studied different ways for improving email productivity by prioritizing emails, suggesting automatic replies or identifying intents to recommend appropriate actions. The problem has been mostly posed as a supervised learning problem where models of different complexities were proposed to classify an email message into a predefined taxonomy of intents or classes. The need for labeled data has always been one of the largest bottlenecks in training supervised models. This is especially the case for many real-world tasks, such as email intent classification, where large-scale annotated examples are either hard to acquire or unavailable due to privacy or data access constraints. Email users often take actions in response to intents expressed in an email (e.g., setting up a meeting in response to an email with a scheduling request). Such actions can be inferred from user interaction logs. In this paper, we propose to leverage user actions as a source of weak supervision, in addition to a limited set of annotated examples, to detect intents in emails. We develop an end-to-end robust deep neural network model for email intent identification that leverages both clean annotated data and noisy weak supervision along with a self-paced learning mechanism. Extensive experiments on three different intent detection tasks show that our approach can effectively leverage the weakly supervised data to improve intent detection in emails.
July 2020
·
35 Reads
·
84 Citations
May 2020
·
31 Reads
Asking clarifying questions in response to search queries has been recognized as a useful technique for revealing the underlying intent of the query. Clarification has applications in retrieval systems with different interfaces, from traditional web search interfaces to limited-bandwidth interfaces such as speech-only and small-screen devices. Generation and evaluation of clarifying questions have been recently studied in the literature. However, user interaction with clarifying questions is relatively unexplored. In this paper, we conduct a comprehensive study by analyzing large-scale user interactions with clarifying questions in a major web search engine. In more detail, we analyze the user engagement received by clarifying questions based on different properties of search queries, clarifying questions, and their candidate answers. We further study click bias in the data, and show that even though reading clarifying questions and candidate answers does not take significant effort, there still exist some position and presentation biases in the data. We also propose a model for learning representations for clarifying questions based on the user interaction data as implicit feedback. The model is used for re-ranking a number of automatically generated clarifying questions for a given query. Evaluation on both click data and human-labeled data demonstrates the high quality of the proposed method.
May 2020
·
428 Reads
·
33 Citations
Intelligent Systems, IEEE
Limited labeled data is becoming one of the largest bottlenecks for supervised learning systems. This is especially the case for many real-world tasks where large-scale labeled examples are either too expensive to acquire or unavailable due to privacy or data access constraints. Weak supervision has been shown to be effective in mitigating the scarcity of labeled data by leveraging weak labels or injecting constraints from heuristic rules and/or extrinsic knowledge sources. Social media has little labeled data but possesses unique characteristics that make it suitable for generating weak supervision, resulting in a new type of weak supervision, i.e., weak social supervision. In this article, we illustrate how various aspects of social media can be used as weak social supervision. Specifically, we use the recent research on fake news detection as the use case, where social engagements are abundant but annotated examples are scarce, to show that weak social supervision is effective when facing the labeled data scarcity problem. This article opens the door to learning with weak social supervision for similar emerging tasks when labeled data is limited.
May 2020
·
225 Reads
·
11 Citations
Proceedings of the ACM on Human-Computer Interaction
When people communicate with each other, their choice of what to say is tied to their perceptions of the audience. For many communication channels, people have some ability to explicitly specify their audience members and the different roles they can play. While existing accounts of communication behavior have largely focused on how people tailor the content of their messages, we focus on the configuring of the audience as a complementary family of decisions in communication. We formulate a general description of audience configuration choices, highlighting key aspects of the audience that people could configure to reflect a range of communicative goals. We then illustrate these ideas via a case study of email usage, a realistic domain where audience configuration choices are particularly fine-grained and explicit in how email senders fill the To and Cc address fields. In a large collection of enterprise emails, we explore how people configure their audiences, finding salient patterns relating a sender's choice of configuration to the types of participants in the email exchange, the content of the message, and the nature of the subsequent interactions. Our formulation and findings show how analyzing audience configurations can enrich and extend existing accounts of communication behavior, and frame research directions on audience configuration decisions in communication and collaboration.
May 2020
·
185 Reads
... They found that most humans are worse than the algorithm while a few were more accurate, and people relied more on item content and demographic information to make predictions. Organisciak et al. (Organisciak et al. 2014) proposed a crowdsourcing system to predict ratings on items for requesters. They compared a collaborative filtering approach (predicting based on ratings from crowdworkers who share similar preferences) with a crowd prediction approach (crowdworkers predicting ratings based on the requester's past ratings). ...
September 2014
Proceedings of the AAAI Conference on Human Computation and Crowdsourcing
... These methods are particularly effective at the image level. This challenge also extends to pixel-level labels [61] and textual data [62]. In this paper, we propose an innovative first-break picking algorithm capable of identifying outliers or noisy labels in the data. ...
May 2021
Proceedings of the AAAI Conference on Artificial Intelligence
... Inductive recommendation. Inductive recommendation refers to models capable of recommending new items that were not seen during the model's training phase [58,59,62]. Existing methods achieve the new item recommendation capability by leveraging side information like tags or descriptions [29,38], modality representations [16,18,46,52,63], and behavior patterns [58,59]. ...
September 2021
... Weakly-supervised text classification (WTC) aims to use various weakly supervised signals to perform text classification. There are many sources of weak supervision signals, including: 1) external knowledge bases (Gabrilovich et al., 2007;Yin et al., 2019), 2) seed words (Meng et al., 2020b;Zhang et al., 2021;Mekala and Shang, 2020;Wang et al., 2021;Zhao et al., 2023a), 3) heuristic rules (Badene et al., 2019;Shu et al., 2020), 4) language models (prompt methods) (Holtzman et al., 2022;Han et al., 2022b). Among these, the most popular ones at present are seed-words and prompt methods, where the former generates pseudo-labels based on word matching, and the latter generates the class probability distribution of each text by prompting a large language model. ...
July 2020
... In this scenario, the user is seeking information about setting up a distribution list in Outlook, and the clarification question aims to clarify the version of Outlook that the user is working with. While clarification has become an important component of many conversational and interactive information-seeking systems [1], previous research has shown that even though clarification questions receive positive engagement, users are not frequently engaged with them [2,3]. ...
July 2020
... Research on online communities producing public information goods has found evidence that audience size motivates contributors [85]. Additionally, numerous studies have shown that users of social networking sites frequently consider the audience that their posts and messages may reach [57,83]. As individuals on social media typically have little information about who sees their posts, they conceive of "imagined audiences" based on cues from visible activity [7] and target imagined audiences using deliberate strategies, such as using multiple platforms to reach distinct audiences, in order to control who sees or does not see their posts [54,57,86]. ...
May 2020
Proceedings of the ACM on Human-Computer Interaction
... Finally, social media consumers from different networks discuss their preferences, themes, and relationships. Moreover, false news distribution processes establish an echo chamber cycle, underscoring the importance of network-based feature extraction in detecting fake news (Ruchansky et al., 2017;Sahoo et al., 2021;Shu et al. 2020). ...
May 2020
Intelligent Systems, IEEE
... Conversational search (CS) enables users and systems to collaboratively refine queries through dialogue (Radlinski and Craswell, 2017), addressing limitations of traditional keyword-matching systems where single queries often fail to capture complete information needs (Aliannejadi et al., 2019;Zamani et al., 2020). Query clarification has emerged as a key mechanism for improving ...
April 2020
... Explanations that extend a user's prior knowledge or fulfill their immediate needs should be prioritized [60]. Moreover, previous research has suggested that presenting detailed and personalized explanations is useful for better understanding AI outcomes [78,101,113,192,228]. ...
April 2020
... Explanations that extend a user's prior knowledge or fulfill their immediate needs should be prioritized [60]. Moreover, previous research has suggested that presenting detailed and personalized explanations is useful for better understanding AI outcomes [78,101,113,192,228]. ...
March 2020