Sadia Afroz

University of California, Berkeley | UCB · Department of Electrical Engineering and Computer Sciences

About

44 Publications
20,493 Reads
2,512 Citations

Publications (44)
Article
Full-text available
Low-code software development (LCSD) is an emerging approach to democratize application development for software practitioners from diverse backgrounds. LCSD platforms promote rapid application development with a drag-and-drop interface and minimal programming by hand. As it is a relatively new paradigm, it is vital to study developers’ difficultie...
Preprint
Full-text available
Low-code software development (LCSD) is an emerging approach to democratize application development for software practitioners from diverse backgrounds. LCSD platforms promote rapid application development with a drag-and-drop interface and minimal programming by hand. As it is a relatively new paradigm, it is vital to study developers' difficultie...
Preprint
Full-text available
We propose MALIGN, a novel malware family detection approach inspired by genome sequence alignment. MALIGN encodes malware using four nucleotides and then uses genome sequence alignment approaches to create a signature of a malware family based on the code fragments conserved in the family, making it robust to evasion by modification and addition of...
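The excerpt does not say how MALIGN maps binaries to nucleotides. The sketch below makes two assumptions. First, each byte becomes four bases, two bits per base, using the hypothetical mapping A=00, C=01, G=10, T=11. Second, textbook Smith-Waterman local alignment, with made-up scoring parameters, stands in for MALIGN's alignment stage. It illustrates the general idea that a fragment conserved across family members produces a high local alignment score.

```python
# Hypothetical 2-bit mapping (not taken from the MALIGN paper): 00->A, 01->C, 10->G, 11->T.
NUCLEOTIDES = "ACGT"


def encode_bytes(data: bytes) -> str:
    """Encode each byte as four nucleotides, two bits per base, most significant bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(NUCLEOTIDES[(byte >> shift) & 0b11])
    return "".join(out)


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero so an alignment can restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

For example, `smith_waterman(encode_bytes(sample_a), encode_bytes(sample_b))` scores two binaries, where `sample_a` and `sample_b` are hypothetical byte strings. Binaries that share a conserved code fragment score higher than unrelated ones, even when extra bytes are inserted around that fragment.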
Article
Full-text available
The proliferation of fake news and its propagation on social media has become a major concern due to its ability to create devastating impacts. Different machine learning approaches have been suggested to detect fake news. However, most of those focused on a specific type of news (such as political) which leads us to the question of dataset-bias of...
Preprint
Recent advances in adversarial attacks have shown that machine learning classifiers based on static analysis are vulnerable to adversarial attacks. However, real-world antivirus systems do not rely only on static classifiers; thus, many of these static evasions get detected by dynamic analysis whenever the malware runs. The real question is to what...
Conference Paper
Recent years have seen a dramatic increase in applications of Artificial Intelligence (AI) and Machine Learning (ML) to security and privacy problems. The analytic tools and intelligent behavior provided by these techniques make AI and ML increasingly important for autonomous real-time analysis and decision making in domains with a wealth of data o...
Preprint
Full-text available
The proliferation of fake news and its propagation on social media have become a major concern due to its ability to create devastating impacts. Different machine learning approaches have been attempted to detect it. However, most of those focused on a special type of news (such as political) and did not apply many advanced techniques. In this rese...
Preprint
Cybercrime forums enable modern criminal entrepreneurs to collaborate with other criminals on increasingly efficient and sophisticated criminal endeavors. Understanding the connections between different products and services can often illuminate effective interventions. However, generating this understanding of supply chains currently requires ti...
Conference Paper
Facing undesired traffic from the Tor anonymity network, online service providers discriminate against Tor users. In this study, we characterize the extent of discrimination faced by Tor users and the nature of undesired traffic exiting from the Tor network, a task complicated by Tor's need to maintain user anonymity. We leverage multiple independe...
Preprint
Full-text available
This paper examines different reasons why websites may vary in their availability by location. Prior work on availability mostly focuses on censorship by nation states. We look at three forms of server-side blocking: blocking visitors from the EU to avoid GDPR compliance, blocking based upon the visitor's country, and blocking due to security concer...
Preprint
Full-text available
One of the Internet's greatest strengths is the degree to which it facilitates access to any of its resources from users anywhere in the world. However, users in the developing world have complained of websites blocking their countries. We explore this phenomenon using a measurement study. With a combination of automated page loads, manual checking...
Conference Paper
We propose monotonic classification with selection of monotonic features as a defense against evasion attacks on classifiers for malware detection. The monotonicity property of our classifier ensures that an adversary will not be able to evade the classifier by adding more features. We train and test our classifier on over one million executables c...
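The sketch below illustrates the monotonicity property in its simplest form. It assumes a linear model over binary features whose weights are constrained to be non-negative. In such a model, adding a feature can only raise the score, so an adversary who can only add features cannot push a detected sample below the threshold. This is a simplified stand-in, not the classifier from the paper, and the feature names in the usage note are hypothetical.

```python
from typing import Dict, Set


def monotonic_score(features: Set[str], weights: Dict[str, float], bias: float) -> float:
    """Linear score with non-negative weights: adding features never lowers the score."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("a monotonic model requires non-negative weights")
    # Features without a learned weight (e.g. unseen ones) contribute 0.
    return bias + sum(weights.get(f, 0.0) for f in features)


def is_malicious(features: Set[str], weights: Dict[str, float], bias: float,
                 threshold: float = 0.0) -> bool:
    """Flag a sample when its monotonic score reaches the decision threshold."""
    return monotonic_score(features, weights, bias) >= threshold
```

Take hypothetical weights such as `{"CreateRemoteThread": 1.5, "WriteProcessMemory": 1.0}` with a bias of -2.0. A sample with both features is flagged, and padding it with any number of extra (benign-looking) features leaves the score unchanged or higher.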
Article
One weakness of machine-learned NLP models is that they typically perform poorly on out-of-domain data. In this work, we study the task of identifying products being bought and sold in online cybercrime forums, which exhibits particularly challenging cross-domain effects. We formulate a task that represents a hybrid of slot-filling information extr...
Conference Paper
Sites for online classified ads selling sex are widely used by human traffickers to support their pernicious business. The sheer quantity of ads makes manual exploration and analysis unscalable. In addition, discerning whether an ad is advertising a trafficked victim or an independent sex worker is a very difficult task. Very little concrete ground...
Conference Paper
Underground forums are widely used by criminals to buy and sell a host of stolen items, datasets, resources, and criminal services. These forums contain important resources for understanding cybercrime. However, the number of forums, their size, and the domain expertise required to understand the markets make manual exploration of these forums uns...
Conference Paper
We present and evaluate a large-scale malware detection system integrating machine learning with expert reviewers, treating reviewers as a limited labeling resource. We demonstrate that even in small numbers, reviewers can vastly improve the system’s ability to keep pace with evolving threats. We conduct our evaluation on a sample of VirusTotal sub...
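The excerpt does not state how the system chooses which submissions go to reviewers. The sketch below assumes uncertainty sampling, a common active-learning query strategy. It spends a fixed review budget on the samples whose classifier scores lie closest to the decision threshold. The function name and the 0.5 threshold are hypothetical.

```python
from typing import Dict, List


def select_for_review(scores: Dict[str, float], budget: int, threshold: float = 0.5) -> List[str]:
    """Return up to `budget` sample IDs whose scores are closest to the decision threshold."""
    # Ties on distance are broken by sample ID so the selection is deterministic.
    ranked = sorted(scores, key=lambda sid: (abs(scores[sid] - threshold), sid))
    return ranked[:budget]
```

Treating reviewers as a limited resource means the budget stays small. The samples the model is least sure about are the ones where a human label is most likely to change a decision.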
Conference Paper
Online underground forums serve a key role in facilitating information exchange and commerce between gray market or even cybercriminal actors. In order to streamline bilateral communication to complete sales, merchants often publicly post their IM contact details, such as their Skype handle. Merchants that publicly post their Skype handle potential...
Conference Paper
The utility of anonymous communication is undermined by a growing number of websites treating users of such services in a degraded fashion. The second-class treatment of anonymous users ranges from outright rejection to limiting their access to a subset of the service’s functionality or imposing hurdles such as CAPTCHA-solving. To date, the observa...
Article
Full-text available
The malware detection arms race involves constant change: malware changes to evade detection and labels change as detection mechanisms react. Recognizing that malware changes over time, prior work has enforced temporally consistent samples by requiring that training binaries predate evaluation binaries. We present temporally consistent labels, requ...
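The abstract is cut off before it defines temporally consistent labels. The sketch below assumes the natural reading: in addition to requiring that training binaries predate the evaluation period, the labels used for training must also have been observed before that period. The `Sample` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Sample:
    sha256: str       # hypothetical identifier of the binary
    first_seen: date  # when the binary first appeared
    label_date: date  # when the label used for training was observed
    label: int        # 1 = malicious, 0 = benign


def temporally_consistent_training_set(samples: List[Sample], eval_start: date) -> List[Sample]:
    """Keep samples whose binary AND label both predate the evaluation period."""
    return [s for s in samples if s.first_seen < eval_start and s.label_date < eval_start]
```

Filtering only on `first_seen` would still let labels from the future (for example, detections added after evaluation begins) leak into training.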
Conference Paper
We examine the problem of aggregating the results of multiple anti-virus (AV) vendors' detectors into a single authoritative ground-truth label for every binary. To do so, we adapt a well-known generative Bayesian model that postulates the existence of a hidden ground truth upon which the AV labels depend. We use training based on Expectation Maxim...
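The excerpt does not name the generative model. As an assumption, the sketch below uses the classic two-class Dawid-Skene formulation. In it, each vendor has its own sensitivity and specificity, the true label is hidden, and Expectation Maximization alternates between estimating the posterior that each binary is malicious and re-estimating the vendor parameters. It is an illustration of EM-based label aggregation, not the paper's exact model or training procedure.

```python
import math
from typing import List


def _clip(p: float, eps: float) -> float:
    """Keep probabilities away from 0 and 1 so their logs stay finite."""
    return min(max(p, eps), 1.0 - eps)


def em_aggregate(labels: List[List[int]], iters: int = 50, eps: float = 1e-6) -> List[float]:
    """Return P(malicious) per binary from a matrix of 0/1 vendor labels.

    labels[i][j] is vendor j's verdict on binary i (1 = malicious). Vendor j is
    modeled by sens[j] = P(y=1 | malicious) and spec[j] = P(y=0 | benign).
    """
    n, m = len(labels), len(labels[0])
    # Initialize posteriors with the fraction of vendors flagging each binary.
    q = [sum(row) / m for row in labels]
    for _ in range(iters):
        # M-step: re-estimate the class prior and per-vendor accuracies from q.
        mal_mass = sum(q)
        ben_mass = n - mal_mass
        prior = _clip(mal_mass / n, eps)
        sens, spec = [], []
        for j in range(m):
            tp = sum(q[i] * labels[i][j] for i in range(n))
            tn = sum((1.0 - q[i]) * (1 - labels[i][j]) for i in range(n))
            sens.append(_clip(tp / max(mal_mass, eps), eps))
            spec.append(_clip(tn / max(ben_mass, eps), eps))
        # E-step: posterior P(malicious | labels), computed in log space to avoid underflow.
        q = []
        for row in labels:
            log_mal = math.log(prior)
            log_ben = math.log(1.0 - prior)
            for j, y in enumerate(row):
                log_mal += math.log(sens[j] if y else 1.0 - sens[j])
                log_ben += math.log(1.0 - spec[j] if y else spec[j])
            d = log_ben - log_mal
            # Numerically stable logistic: 1 / (1 + exp(d)).
            if d <= 0:
                q.append(1.0 / (1.0 + math.exp(d)))
            else:
                q.append(math.exp(-d) / (1.0 + math.exp(-d)))
    return q
```

Unlike a plain majority vote, this weights each vendor by its estimated reliability. A vendor that frequently disagrees with the consensus therefore has less influence on the final ground-truth label.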
Conference Paper
This work addresses fundamental questions about the nature of cybercriminal organization. We investigate the organization of three underground forums: BlackhatWorld, Carders and L33tCrew to understand the nature of distinct communities within a forum, the structure of organization and the impact of enforcement, in particular banning members, on the...
Article
Stylometry is a method for identifying the authors of anonymous texts by analyzing their writing style. While stylometric methods have produced impressive results in previous experiments, we wanted to explore their performance on a challenging dataset of particular interest to the security research community. Analysis of underground forums ca...
Article
Recent studies on Website Fingerprinting (WF) claim to have found highly effective attacks on Tor. However, these studies make assumptions about user settings, adversary capabilities, and the nature of the Web that do not necessarily hold in practical scenarios. The following study critically evaluates these assumptions by conducting the attack whe...
Conference Paper
Full-text available
Active learning is an area of machine learning examining strategies for allocating finite resources, particularly human labeling effort and, to an extent, feature extraction, in situations where available data exceeds available resources. In this open problem paper, we motivate the necessity of active learning in the security domain, identify pro...
Article
We argue that the evaluation of censorship evasion tools should depend upon economic models of censorship. We illustrate our position with a simple model of the costs of censorship. We show how this model makes suggestions for how to evade censorship. In particular, from it, we develop evaluation criteria. We examine how our criteria compare to the...
Conference Paper
Stylometry is a form of authorship attribution that relies on the linguistic information found in a document. While there has been significant work in stylometry, most research focuses on the closed-world problem where the author of the document is in a known suspect set. For open-world problems where the author may not be in the suspect set, tradi...
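The open-world setting differs from the closed-world one in that the classifier must be able to answer "none of the suspects." The sketch below shows that idea with a deliberately simple swapped-in technique. Documents are represented as character trigram profiles, the most similar suspect is chosen by cosine similarity, and the method abstains (returns `None`) when the best similarity falls below a threshold. The 0.5 threshold and the function names are hypothetical, and this is not the method from the paper.

```python
import math
from collections import Counter
from typing import Dict, Optional


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(doc: str, suspects: Dict[str, str], threshold: float = 0.5) -> Optional[str]:
    """Return the most similar suspect, or None if no suspect is similar enough (open world)."""
    profile = char_ngrams(doc)
    best_author, best_sim = None, 0.0
    for author, sample in suspects.items():
        sim = cosine(profile, char_ngrams(sample))
        if sim > best_sim:
            best_author, best_sim = author, sim
    return best_author if best_sim >= threshold else None
```

A closed-world classifier would always return the nearest suspect. The threshold is what lets the sketch reject a document whose author is outside the suspect set.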
Conference Paper
In this position paper, we argue that to be of practical interest, a machine-learning based security system must engage with the human operators beyond feature engineering and instance labeling to address the challenge of drift in adversarial environments. We propose that designers of such systems broaden the classification goal into an explanatory...
Conference Paper
Underground forums enable technical innovation among criminals as well as allow for specialization, thereby making cybercrime economically efficient. The success of these forums is contingent on collective action among a variety of stakeholders. What distinguishes sustainable forums from those that fail? We begin to address these questions by exami...
Conference Paper
We examine how consumers perceive publicized instances of privacy flaws and private information data breaches. Using three real-world privacy breach incidents, we study how these flaws affected consumers' future purchasing behavior and perspective on a company's trustworthiness. We investigate whether despite a lack of widespread privacy enhancing t...
Article
The use of stylometry, authorship recognition through purely linguistic means, has contributed to literary, historical, and criminal investigation breakthroughs. Existing stylometry research assumes that authors have not attempted to disguise their linguistic writing style. We challenge this basic assumption of existing stylometry methodologies and...
Conference Paper
In this work, we design a method for blog comment spam detection using the assumption that spam is any kind of uninformative content. To measure the "informativeness" of a set of blog comments, we construct a language and tokenization independent metric which we call content complexity, providing a normalized answer to the informal question "how mu...
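The excerpt does not give the formula for content complexity. As an assumption, the sketch below uses a compression ratio (compressed size divided by raw size, via `zlib`) as a language- and tokenization-independent proxy for how much information a set of comments carries. Repetitive, uninformative text such as copy-pasted spam compresses well and so scores low.

```python
import zlib
from typing import Iterable


def content_complexity(texts: Iterable[str]) -> float:
    """Compressed size over raw size of the concatenated texts.

    Lower values mean more redundancy. Very short inputs can exceed 1.0
    because of zlib's fixed header overhead.
    """
    raw = "\n".join(texts).encode("utf-8")
    if not raw:
        return 0.0
    return len(zlib.compress(raw, 9)) / len(raw)
```

Because it works on raw bytes, this proxy needs no tokenizer or language model. That matches the language-independence goal the abstract describes, though the paper's actual normalization may differ.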
Conference Paper
This paper presents Anonymouth, a novel framework for anonymizing writing style. Without accounting for style, anonymous authors risk identification. This framework is necessary to provide a tool for testing the consistency of anonymized writing style and a mechanism for adaptive attacks against stylometry techniques. Our framework defines the ste...
Article
In digital forensics, questions often arise about the authors of documents: their identity, demographic background, and whether they can be linked to other documents. The field of stylometry uses linguistic features and machine learning techniques to answer these questions. While stylometry techniques can identify authors with high accuracy in non-...
Conference Paper
Phishing is a security attack that involves obtaining sensitive or otherwise private data by presenting oneself as a trustworthy entity. Phishers often exploit users' trust in the appearance of a site by using web pages that are visually similar to an authentic site. This paper proposes a phishing detection approach, PhishZoo, that uses profiles...
Conference Paper
Security decision-making is hard for both humans and machines. This is because security decisions are context-dependent, require highly dynamic, specialized knowledge, and require complex risk analysis. Multiple user studies show that humans have difficulty making these decisions, due to insufficient information and bounded rationality. However, current...
Article
Phishing is a web-based attack that uses social engineering techniques to exploit Internet users and acquire sensitive data. Most phishing attacks work by creating a fake version of the real site's web interface to gain the user's trust. Despite the fact that these phishing sites look identical or nearly identical to the real sites they imita...
Article
In general, algorithm visualization (AV) with animation helps to construct a mental model of the dynamic behavior of an algorithm in action. In this project, we propose an AV system for elementary algorithms that allows both active and passive learning. Our AV system, AlgoVis, illustrates an algorithm in the form of animation, code and plain en...
