Publications (96)
The increasing prevalence of software vulnerabilities necessitates automated vulnerability repair (AVR) techniques. This Systematization of Knowledge (SoK) provides a comprehensive overview of the AVR landscape, encompassing both synthetic and real-world vulnerabilities. Through a systematic literature review and quantitative benchmarking across di...
While Endpoint Detection and Response (EDR) systems can efficiently monitor threats by comparing static rules to the event stream, their inability to incorporate past system context leads to high rates of false alarms. Recent work has demonstrated Provenance-based Intrusion Detection Systems (Prov-IDS) that can examine the causal relationships betw...
Many research attempts have been made to develop explanation techniques to provide interpretable explanations for deep learning results. However, the produced methods are optimized for non-security tasks (e.g., image analysis). Their key assumptions are often violated in security applications, resulting in a low explanation fidelity. In this chapte...
With lifestyles shifting during the pandemic, online health communities have started to attract more users (including healthcare workers and patients) to discuss health-related questions. While such online platforms provide convenience to users, with health-related information shared broadly over text and images (e.g., X-Ray scans, photocopies of document...
Federal funding agencies and industry entities are seeking innovative approaches to address the ever-growing cybersecurity crisis. Increasingly, numerous cybersecurity thought leaders are indicating that Artificial Intelligence (AI)-enabled analytics can help tackle key cybersecurity tasks and deploy defenses. This half-day workshop, co-located wit...
Malware classifiers are subject to training-time exploitation due to the need to regularly retrain using samples collected from the wild. Recent work has demonstrated the feasibility of backdoor attacks against malware classifiers, and yet the stealthiness of such attacks is not well understood. In this paper, we investigate this phenomenon under t...
Understanding the linkability of online user identifiers (IDs) is critical to both service providers (for business intelligence) and individual users (for assessing privacy risks). Existing methods are designed to match IDs across two services but face key challenges of matching multiple services in practice, particularly when users have multiple I...
The GPS has empowered billions of users and various critical infrastructures with its positioning and time services. However, GPS spoofing attacks have also become a growing threat to GPS-dependent systems. Existing detection methods either require expensive hardware modifications to current GPS devices or lack the basic robustness against sophisticate...
Social computing researchers are using data from location-based social networks (LBSN), e.g., "Check-in" traces, as approximations of human movement. Recent work has questioned the validity of this approach, showing large discrepancies between check-in data and actual user mobility. To further validate and understand such discrepancies, we perform...
This book constitutes selected and extended papers from the Second International Workshop on Deployable Machine Learning for Security Defense, MLHat 2021, held in August 2021. Due to the COVID-19 pandemic the conference was held online.
The 6 full papers were thoroughly reviewed and selected from 7 qualified submissions. The papers are organized...
Healthcare applications on Voice Personal Assistant systems (e.g., Amazon Alexa) have shown great promise to deliver personalized health services via a conversational interface. However, concerns are also raised about privacy, safety, and service quality. In this paper, we propose VerHealth, to systematically assess health-related applications on...
Today’s video streaming market is crowded with various content providers (CPs). For individual CPs, understanding user behavior, in particular how users migrate among different CPs, is crucial for improving users’ on-site experience and the CP’s chance of success. In this paper, we take a data-driven approach to analyze and model user migration beh...
The massive payment card industry (PCI) involves various entities such as merchants, issuer banks, acquirer banks, and card brands. Ensuring security for all entities that process payment card information is a challenging task. The PCI Security Standards Council requires all entities to be compliant with the PCI Data Security Standard (DSS), which...
The popularity of smart-home assistant systems such as Amazon Alexa and Google Home leads to a booming third-party application market (over 70,000 applications across the two stores). While existing works have revealed security issues in these systems, it is not well understood how to help application developers to enforce security requirements. In...
This book constitutes selected papers from the First International Workshop on Deployable Machine Learning for Security Defense, MLHat 2020, held in August 2020. Due to the COVID-19 pandemic the conference was held online.
The 8 full papers were thoroughly reviewed and selected from 13 qualified submissions. The papers are organized in the follow...
Human mobility trajectories are increasingly collected by ISPs to assist academic research and commercial applications. In this paper, we collected a large-scale ground-truth trajectory dataset from 2,161,500 users of a cellular network, and two matched external trajectory datasets from a large social network (56,683 users) and a check-in/review se...
Online scan engines such as VirusTotal are heavily used by researchers to label malicious URLs and files. Unfortunately, it is not well understood how the labels are generated and how reliable the scanning results are. In this paper, we focus on VirusTotal and its 68 third-party vendors to examine their labeling process on phishing URLs. We perform...
Ad-blocking systems such as Adblock Plus rely on crowdsourcing to build and maintain filter lists, which are the basis for determining which ads to block on web pages. In this work, we seek to advance our understanding of the ad-blocking community as well as the errors and pitfalls of the crowdsourcing process. To do so, we collected and analyzed a...
While deep learning models have achieved unprecedented success in various domains, there is also a growing concern of adversarial attacks against related applications. Recent results show that by adding a small amount of perturbations to an image (imperceptible to humans), the resulting adversarial examples can force a classifier to make targeted m...
Phishing has been a big concern due to its active roles in recent data breaches and state-sponsored attacks. While existing works have extensively analyzed phishing websites and their operations, there is still a limited understanding of the information sharing flows throughout the end-to-end phishing process. In this paper, we perform an empirical...
With the wide adoption of mobile devices, it becomes increasingly important to understand how users use mobile apps. Knowing when and where certain apps are used is instrumental for app developers to improve app usability and for Internet service providers (ISPs) to optimize their network services. However, modeling spatio-temporal patterns of app...
Today’s online question and answer (Q&A) services are receiving a large volume of questions. It becomes increasingly challenging to motivate domain experts to provide quick and high-quality answers. Recent systems seek to engage real-world experts by allowing them to set a price on their answers. This leads to a “targeted” Q&A model where users ask...
While deep learning has shown great potential in various domains, the lack of transparency has limited its application in security or safety-critical areas. Existing research has attempted to develop explanation techniques to provide interpretable explanations for each classification decision. Unfortunately, current methods are optimized for non-...
Understanding mobile app usage has become instrumental to service providers to optimize their online services. Meanwhile, there is a growing privacy concern that users' app usage may uniquely reveal who they are. In this paper, we seek to understand how likely a user can be uniquely re-identified in the crowd by the apps she uses. We systematically...
Darknet markets are online services behind Tor where cybercriminals trade illegal goods and stolen datasets. In recent years, security analysts and law enforcement have started to investigate darknet markets to study the cybercriminal networks and predict future incidents. However, vendors in these markets often create multiple accounts (i.e., Syb...
Real-time crowdsourced maps such as Waze provide timely updates on traffic, congestion, accidents, and points of interest. In this paper, we demonstrate how lack of strong location authentication allows creation of software-based Sybil devices that expose crowdsourced map systems to a variety of security and privacy attacks. Our experiments show t...
Leaked passwords from data breaches can pose a serious threat if users reuse or slightly modify the passwords for other services. With more services getting breached today, there is still a lack of a quantitative understanding of this risk. In this paper, we perform the first large-scale empirical analysis of password reuse and modification pattern...
Online question and answer (Q&A) services are facing key challenges to motivate domain experts to provide quick and high-quality answers. Recent systems seek to engage real-world experts by allowing them to set a price on their answers. This leads to a "targeted" Q&A model where users ask questions to a target expert by paying the price. In this...
The email system is the central battleground against phishing and social engineering attacks, and yet email providers still face key challenges to authenticate incoming emails. As a result, attackers can apply spoofing techniques to impersonate a trusted entity to conduct highly deceptive phishing attacks. In this work, we study email spoofing to a...
Email spoofing is a critical step of phishing, where the attacker impersonates someone the victim knows or trusts. In this paper, we conduct a qualitative study to explore why email spoofing is still possible after years of efforts to develop and deploy anti-spoofing protocols (e.g., SPF, DKIM, DMARC). First, we measure the protocol adoption by sca...
Today's video streaming market is crowded with various content providers (CPs). For individual CPs, understanding user behavior, in particular how users migrate among different CPs, is crucial for improving users' on-site experience and the CP's chance of success. In this paper, we take a data-driven approach to analyze and model user migration beh...
Social media is often viewed as a sensor into various societal events such as disease outbreaks, protests, and elections. We describe the use of social media as a crowdsourced sensor to gain insight into ongoing cyber-attacks. Our approach detects a broad range of cyber-attacks (e.g., distributed denial of service (DDoS) attacks, data breaches, and...
The ever-increasing sophistication of malware has made malicious binary collection and analysis an absolute necessity for proactive defenses. Meanwhile, malware authors seek to harden their binaries against analysis by incorporating environment detection techniques, in order to identify if the binary is executing within a virtual environment or in...
The next generation of Internet services is driven by users and user-generated content. The complex nature of user behavior makes it highly challenging to manage and secure online services. On one hand, service providers cannot effectively prevent attackers from creating large numbers of fake identities to disseminate unwanted content (e.g., spam)....
Leaked passwords from data breaches can pose a serious threat to users if the password is reused elsewhere. With more online services getting breached today, there is still a lack of large-scale quantitative understanding of the risks of password reuse across services. In this paper, we analyze a large collection of 28.8 million users and their 61....
For millions around the globe, digital payment apps such as Venmo are replacing cash as the preferred method of payment between friends and vendors. Apps like Venmo bring a unique blend of convenience and social interactions into financial transactions. In this paper, we study the role of social relationships in the adoption of the Venmo digital pa...
It is often difficult to separate the highly capable “experts” from the average worker in crowdsourced systems. This is especially true for challenging application domains that require extensive domain knowledge. The problem of stock analysis is one such domain, where even the highly paid, well-educated domain experts are prone to make mistakes. As a...
Watching videos from multiple content providers (CP) has become prevalent. For individual CPs, understanding user video consumption patterns among CPs is critical for improving on-site user experience and CP's opportunity of success. In this paper, based on a two-month dataset recording 9 million users' 269 million video viewing requests over 6 mos...
Inter-Component Communication (ICC) provides a message passing mechanism for data exchange between Android applications. It has been long believed that inter-app ICCs can be abused by malware writers to launch collusion attacks using two or more apps. However, because of the complexity of performing pairwise program analysis on apps, the scale of e...
Inter-Component Communication (ICC) enables useful interactions between mobile apps. However, misuse of ICC exposes users to serious threats such as intent hijacking/spoofing and app collusions, allowing malicious apps to access privileged user data via another app. Unfortunately, existing ICC analyses are largely incompetent in both accuracy and s...
Community based question answering (CQA) services receive a large volume of questions today. It is increasingly challenging to motivate domain experts to give timely answers. Recently, payment-based CQA services explore new incentive models to engage real-world experts and celebrities by allowing them to set a price on their answers. In this paper,...
Social media is often viewed as a sensor into various societal events such as disease outbreaks, protests, and elections. We describe the use of social media as a crowdsourced sensor to gain insight into ongoing cyber-attacks. Our approach detects a broad range of cyber-attacks (e.g., distributed denial of service (DDoS) attacks, data breaches, and...
With smartphones making video recording easier than ever, new apps like Periscope and Meerkat brought personalized interactive video streaming to millions. With a touch, viewers can switch between first person perspectives across the globe, and interact in real-time with broadcasters. Unlike traditional video streaming, these services require low-l...
Crowdsourcing is a unique and practical approach to obtain personalized data and content. Its impact is especially significant in providing commentary, reviews and metadata, on a variety of location based services. In this study, we examine reliability of the Waze mapping service, and its vulnerability to a variety of location-based attacks. Our go...
Real-time crowdsourced maps such as Waze provide timely updates on traffic, congestion, accidents and points of interest. In this paper, we demonstrate how lack of strong location authentication allows creation of software-based Sybil devices that expose crowdsourced map systems to a variety of security and privacy attacks. Our experiments show tha...
Today’s ubiquitous online social networks serve multiple purposes, including social communication (Facebook, Renren), and news dissemination (Twitter). But how does a social network’s design define its functionality? Answering this would need social network providers to take a proactive role in defining and guiding user behavior.
In this paper, we...
Online services are increasingly dependent on user participation. Whether it's online social networks or crowdsourcing services, understanding user behavior is important yet challenging. In this paper, we build an unsupervised system to capture dominating user behaviors from clickstream data (traces of users' click events), and visualize the detect...
Compared to traditional online maps, crowdsourced maps such as Waze are unique in providing real-time updates on traffic, congestion, accidents and points of interest. In this paper, we explore the practical impact of attacks against crowdsourced map systems, and develop robust defenses against them. Our experiments show that a single attacker with...
Today, most spectrum allocation algorithms use conflict graphs to capture interference conditions. The use of conflict graphs, however, is often questioned by the wireless community for two reasons. First, building accurate conflict graphs requires significant overhead, and hence does not scale to outdoor networks. Second, conflict graphs cannot pr...
In crowdsourced systems, it is often difficult to separate the highly capable "experts" from the average worker. In this paper, we study the problem of evaluating and identifying experts in the context of SeekingAlpha and StockTwits, two crowdsourced investment services that are encroaching on a space dominated for decades by large investment ba...
In crowdsourced systems, it is often difficult to separate the highly capable "experts" from the average worker. In this paper, we study the problem of evaluating and identifying experts in the context of SeekingAlpha and StockTwits, two crowdsourced investment services that are encroaching on a space dominated for decades by large investment banks...
Social interactions and interpersonal communication have undergone significant changes in recent years. Increasing awareness of privacy issues and events such as the Snowden disclosures have led to the rapid growth of a new generation of anonymous social networks and messaging applications. By removing traditional concepts of strong identities and s...
Recent work in security and systems has embraced the use of machine learning (ML) techniques for identifying misbehavior, e.g. email spam and fake (Sybil) users in social networks. However, ML models are typically derived from fixed datasets, and must be periodically retrained. In adversarial environments, attackers can adapt by modifying their beh...
For decades, the world of financial advisors has been dominated by large investment banks such as Goldman Sachs. In recent years, user-contributed investment services such as SeekingAlpha and StockTwits have grown to millions of users. In this paper, we seek to understand the quality and impact of content on social investment platforms, by empirica...
Mobile networking researchers have long searched for large-scale, fine-grained traces of human movement, which have remained elusive for both privacy and logistical reasons. Recently, researchers have begun to focus on geosocial mobility traces, e.g. Foursquare checkin traces, because of their availability and scale. But are we conceding correctnes...
The users of microblogging services, such as Twitter, use the count of followers of an account as a measure of its reputation or influence. For those unwilling or unable to attract followers naturally, a growing industry of "Twitter follower markets" provides followers for sale. Some markets use fake accounts to boost the follower count of their cu...
Popular Internet services in recent years have shown that remarkable things can be achieved by harnessing the power of the masses. However, crowd-sourcing systems also pose a real challenge to existing security mechanisms deployed to protect Internet services, particularly those tools that identify malicious activity by detecting activities of auto...