About
70 publications
26,940 reads
2,429 citations
Publications (70)
We examine a large dialog corpus obtained from the conversation history of a single individual with 104 conversation partners. The corpus consists of half a million instant messages, across several messaging platforms. We focus our analyses on seven speaker attributes, each of which partitions the set of speakers, namely: gender; relative age; fami...
We propose a novel system to help fact-checkers formulate search queries for known misinformation claims and effectively search across multiple social media platforms. We introduce an adaptable rewriting strategy, where editing actions (e.g., swap a word with its synonym; change verb tense into present simple) for queries containing claims are auto...
An important challenge for news fact-checking is the effective dissemination of existing fact-checks. This in turn brings the need for reliable methods to detect previously fact-checked claims. In this paper, we focus on automatically finding existing fact-checks for claims made in social media posts (tweets). We conduct both classification and ret...
The quality of prostate cancer (PCa) content on Instagram is unknown. We examined 62 still-images and 64 video Instagram posts using #prostatecancer on 5/18/20. Results were assessed with validated tools. Most content focused on raising awareness or sharing patient stories (46%); only 9% was created by physicians. 90% of content was low-to-moderate...
A growing number of people engage in online health forums, making it important to understand the quality of the advice they receive. In this paper, we explore the role of expertise in responses provided to help-seeking posts regarding mental health. We study the differences between (1) interactions with peers; and (2) interactions with self-identif...
In this paper, we explore the construction of natural language explanations for news claims, with the goal of assisting fact-checking and news evaluation applications. We experiment with two methods: (1) an extractive method based on Biased TextRank -- a resource-effective unsupervised graph-based algorithm for content extraction; and (2) an abstra...
TikTok is a social network launched in 2016, which is used to create and share short videos (≤60 seconds). TikTok was the most downloaded app in the U.S. in 2018 and 2019 and is currently available in >55 countries. Similar to other social networks, TikTok users can follow other content creators and view a feed of videos. Users may associate their...
The COVID-19 pandemic dramatically impacted society and health care on a global scale. To capture the lived experience of patients with prostate cancer and family members/caregivers during the COVID-19 pandemic, we performed a mixed-methods study of posts to two online networks. We compared all 6187 posts to the Inspire Us TOO Prostate Cancer onlin...
In this paper, we introduce personalized word embeddings, and examine their value for language modeling. We compare the performance of our proposed prediction model when using personalized versus generic word representations, and study how these representations can be leveraged for improved performance. We provide insight into what types of words c...
We introduce Biased TextRank, a graph-based content extraction method inspired by the popular TextRank algorithm that ranks text spans according to their importance for language processing tasks and according to their relevance to an input "focus." Biased TextRank enables focused content extraction for text by modifying the random restarts in the e...
Word embeddings are usually derived from corpora containing text from many individuals, thus leading to general purpose representations rather than individually personalized representations. While personalized embeddings can be useful to improve language model performance and other language processing tasks, they can only be computed for people wit...
Hearings of witnesses and defendants play a crucial role when reaching court trial decisions. Given the high-stakes nature of trial outcomes, developing computational models that assist the decision-making process is an important research avenue. In this paper, we address the identification of deception in real-life trial data. We use a dataset cons...
The ongoing COVID-19 pandemic has raised concerns for many regarding personal and public health implications, financial security and economic stability. Alongside many other unprecedented challenges, there are increasing concerns over social isolation and mental health. We introduce Expressive Interviewing, an interview-style conversationa...
Worldwide, an increasing number of people are suffering from mental health disorders such as depression and anxiety. In the United States alone, one in every four adults suffers from a mental health condition, which makes mental health a pressing concern. In this paper, we explore the use of multimodal cues present in social media posts to predict...
In this paper, we introduce a counseling dialogue system that provides real-time assistance to counseling trainees. The system generates sample counselors' reflections, i.e., responses that reflect back on what the client has said given the dialogue history. We build our model upon the recent generative pretrained transformer architecture and levera...
Although there is a large amount of user-generated content about urological health issues on social media, much of this content has not been vetted for information accuracy. In this article, we review the literature on the quality and balance of information on urological health conditions on social networks. Across a wide range of benign and malign...
Recent years have witnessed a significant increase in the online sharing of medical information, with videos representing a large fraction of such online sources. Previous studies have shown, however, that more than half of the health-related videos on platforms such as YouTube contain misleading information and biases. Hence, it is crucial to build...
We explore the use of longitudinal dialog data for two dialog prediction tasks: next message prediction and response time prediction. We show that a neural model using personal data that leverages a combination of message content, style matching, time features, and speaker attributes leads to the best results for both tasks, with error rate reducti...
Sarcasm is often expressed through several verbal and non-verbal cues, e.g., a change of tone, overemphasis on a word, a drawn-out syllable, or a straight-looking face. Most of the recent work in sarcasm detection has been carried out on textual data. In this paper, we argue that incorporating multimodal cues can improve the automatic classificatio...
This paper introduces a multimodal approach for detecting individuals' affective state while being exposed to visual narratives. We use four modalities, namely visual facial behaviors, heart rate measurements, thermal imaging, and verbal descriptions, and show that we can predict changes in the affect that people experience when they are exposed to...
The proliferation of misleading information in everyday access media outlets such as social media feeds, news blogs, and online newspapers has made it challenging to identify trustworthy news sources, thus increasing the need for computational tools able to provide insights into the reliability of online content. In this paper, we focus on the aut...
In this paper, we explore the hypothesis that multimodal features as well as demographic information can play an important role in increasing the performance of automatic lie detection. We introduce a large, multimodal deception detection dataset balanced across genders, and we analyze the patterns associated with the thermal, linguistic, and visua...
Automatic gender classification is receiving increasing attention in the computer interaction community as the need for personalized, reliable, and ethical systems arises. To date, most gender classification systems have been evaluated on textual and audiovisual sources. This work explores the possibility of enhancing such systems with physiologica...
This paper explores gender-based differences in multimodal deception detection. We introduce a new large, gender-balanced dataset, consisting of 104 subjects with 520 different responses covering multiple scenarios, and perform an extensive analysis of different feature sets extracted from the linguistic, physiological, and thermal data streams rec...
Deception detection has received an increasing amount of attention in recent years, due to the significant growth of digital media, as well as increased ethical and security concerns. Earlier approaches to deception detection were mainly focused on law enforcement applications and relied on polygraph tests, which had proven to falsely accuse the in...
Rates of childhood obesity in the United States remain at historic highs. The pediatric primary care office represents an important yet underused setting to intervene with families. One factor contributing to underuse of the primary care setting is lack of effective available interventions. One evidence-based method to help engage and motivate pati...
Hearings of witnesses and defendants play a crucial role when reaching court trial decisions. Given the high-stakes nature of trial outcomes, implementing accurate and effective computational methods to evaluate the honesty of court testimonies can offer valuable support during the decision-making process. In this paper, we address the identificatio...
In this paper, we address the task of cross-cultural deception detection. Using crowdsourcing, we collect four deception datasets, two in English (one originating from the United States and one from India), one from Romanian speakers, and one in Spanish obtained from speakers from Mexico, covering three predetermined topics. We also collect two additio...
The widespread use of deception in written content has motivated the need for methods to automatically profile and identify deceivers. Particularly, the identification of deception based on demographic data such as gender, age, and religion, has become of importance due to ethical and security concerns. Previous work on deception detection has stud...
This paper lays the groundwork for a new methodology for detecting thermal discomfort, which can potentially reduce building energy usage while improving the comfort of the building's inhabitants. The paper describes our explorations in automatic human discomfort prediction using physiological signals directly collected from a building's inhabitants. Using in...
In this paper, we address the automatic identification of deceit by using a multimodal approach. We collect deceptive and truthful responses using a multimodal setting where we acquire data using a microphone, a thermal camera, and physiological sensors. Among all available modalities, we focus on three, namely language use, physio...
In this paper, we address the task of cross-cultural deception detection. Using crowdsourcing, we collect three deception datasets, two in English (one originating from the United States and one from India), and one in Spanish obtained from speakers from Mexico. We run comparative experiments to evaluate the accuracies of deception classifiers built fo...
This paper presents the construction of a multimodal dataset for deception detection, including physiological, thermal, and visual responses of human subjects under three deceptive scenarios. We present the experimental protocol, as well as the data acquisition process. To evaluate the usefulness of the dataset for the task of deception detection,...
This paper presents experiments in building a classifier for the automatic detection of deceit. Using a dataset of deceptive videos, we run several comparative evaluations focusing on the verbal component of these videos, with the goal of understanding the difference in deceit detection when using manual versus automatic transcriptions, as well as...
During real-life interactions, people are naturally gesturing and modulating their voice to emphasize specific points or to express their emotions. With the recent growth of social websites such as YouTube, Facebook, and Amazon, video reviews are emerging as a new source of multimodal and natural opinions that has been left almost untapped by automa...
In this paper, we explore a thermal imaging approach to sensing affective state. Using features extracted from a thermal map of the face, obtained from a dataset consisting of 70 recordings of positive, negative, or neutral states, we show that we can effectively predict the presence of affect, with an error reduction of up to 50% as compared to a...
Using multimodal sentiment analysis, the presented method integrates linguistic, audio, and visual features to identify sentiment in online videos. In particular, experiments focus on a new dataset consisting of Spanish videos collected from YouTube that are annotated for sentiment polarity.
This paper describes several experiments in building a sentiment analysis classifier for spoken reviews. We specifically focus on the linguistic component of these reviews, with the goal of understanding the difference in sentiment classification performance when using manual versus automatic transcriptions, as well as the difference between spoken...
In this paper, we explore a multimodal approach to sensing affective state during exposure to visual narratives. Using four different modalities, consisting of visual facial behaviors, thermal imaging, heart rate measurements, and verbal descriptions, we show that we can effectively predict changes in human affect. Our experiments show that these m...
In this paper we present a framework to derive sentiment lexicons in a target language by using manually or automatically annotated data available in an electronic resource rich language, such as English. We show that bridging the language gap using the multilingual sense-level aligned WordNet structure allows us to generate a high accuracy (90%) p...
Some hybrid packing systems integrate several algorithms to solve the bin packing problem (BPP) based on their past performance and the problem characterization. These systems relate BPP characteristics to the performance of the set of solution algorithms and allow us to estimate which algorithm will yield the best performance for a previously u...
The problem of algorithm selection for solving NP problems arises with the appearance of a variety of heuristic algorithms. The first works claimed the supremacy of some algorithm for a given problem. Subsequent works revealed that the supremacy of algorithms only applied to a subset of instances. However, it was not explained why an algorithm solv...
Data preprocessing plays an important role in many data mining processes. The widely adopted practice in this area is to use only one preprocessing method, such as discretization. In this paper, we propose an ordered scheme to combine various important methods of data preprocessing. The aim is to increase the accuracy of the most used classification al...