J. D. Tygar’s research while affiliated with University of California, Berkeley and other places


Publications (135)


Adversarial Machine Learning
  • Book

February 2019 · 447 Reads · 196 Citations

Anthony D. Joseph · Blaine Nelson · [...] · J. D. Tygar



Reviewer Integration and Performance Measurement for Malware Detection

July 2016 · 78 Reads · 83 Citations

Lecture Notes in Computer Science

Brad Miller · Alex Kantchelian · Michael Carl Tschantz · [...] · J. D. Tygar

We present and evaluate a large-scale malware detection system integrating machine learning with expert reviewers, treating reviewers as a limited labeling resource. We demonstrate that even in small numbers, reviewers can vastly improve the system’s ability to keep pace with evolving threats. We conduct our evaluation on a sample of VirusTotal submissions spanning 2.5 years and containing 1.1 million binaries with 778 GB of raw feature data. Without reviewer assistance, we achieve 72% detection at a 0.5% false positive rate, performing comparably to the best vendors on VirusTotal. Given a budget of 80 accurate reviews daily, we improve detection to 89% and are able to detect 42% of malicious binaries undetected upon initial submission to VirusTotal. Additionally, we identify a previously unnoticed temporal inconsistency in the labeling of training datasets. We compare the impact of training labels obtained at the same time training data is first seen with training labels obtained months later. We find that using training labels obtained well after samples appear, and thus unavailable in practice for current training data, inflates measured detection by almost 20 percentage points. We release our cluster-based implementation, as well as a list of all hashes in our evaluation and 3% of our entire dataset.
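
The reviewer-integration loop described above can be illustrated with a short sketch: each day the detector scores incoming binaries, the fixed review budget is spent on the samples the model is least certain about, and the resulting labels are folded back into the training set before retraining. The helper names, the logistic-regression model, and the uncertainty heuristic below are illustrative assumptions rather than the paper's implementation.

```python
# Illustrative sketch (assumed names and models, not the paper's code):
# spend a small daily budget of expert reviews on the binaries the
# detector is least certain about, then retrain.
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_for_review(scores, budget=80, threshold=0.5):
    """Return indices of the `budget` binaries whose malware scores are
    closest to the decision threshold (least certain)."""
    uncertainty = -np.abs(scores - threshold)   # higher = less certain
    return np.argsort(uncertainty)[-budget:]

def daily_update(model, X_train, y_train, X_new, reviewer, budget=80):
    """One day of the loop: score new binaries, query the reviewer on the
    most uncertain ones, fold the labels in, and retrain."""
    scores = model.predict_proba(X_new)[:, 1]
    queried = select_for_review(scores, budget)
    y_reviewed = np.array([reviewer(X_new[i]) for i in queried])
    X_train = np.vstack([X_train, X_new[queried]])
    y_train = np.concatenate([y_train, y_reviewed])
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return model, X_train, y_train
```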


Fig. 5: Figure 5a presents the impact of each component in our customized query strategy. We improve detection over the uncertainty sampling approach from prior work. Figure 5b presents the performance of our detector for imperfect reviewers with the specified true and false positive rates. For example, given a reviewer with a 5% false positive rate and 80% true positive rate, our detector's true positive rate only decreases by 5% at a 1% false positive rate.
Fig. 6: Figure 6a presents performance for different reviewer query budgets, with significant return on minimal efforts and diminishing returns occurring around 80 queries/day. Figure 6b demonstrates that retraining more quickly improves detector performance.
Fig. 7: Feature categories ranked by importance.

Back to the Future: Malware Detection with Temporally Consistent Labels
  • Article
  • Full-text available

October 2015 · 420 Reads · 1 Citation

The malware detection arms race involves constant change: malware changes to evade detection and labels change as detection mechanisms react. Recognizing that malware changes over time, prior work has enforced temporally consistent samples by requiring that training binaries predate evaluation binaries. We present temporally consistent labels, requiring that training labels also predate evaluation binaries since training labels collected after evaluation binaries constitute label knowledge from the future. Using a dataset containing 1.1 million binaries from over 2.5 years, we show that enforcing temporal label consistency decreases detection from 91% to 72% at a 0.5% false positive rate compared to temporal samples alone. The impact of temporal labeling demonstrates the potential of improved labels to increase detection results. Hence, we present a detector capable of selecting binaries for submission to an expert labeler for review. At a 0.5% false positive rate, our detector achieves a 72% true positive rate without an expert, which increases to 77% and 89% with 10 and 80 expert queries daily, respectively. Additionally, we detect 42% of malicious binaries initially undetected by all 32 antivirus vendors from VirusTotal used in our evaluation. For evaluation at scale, we simulate the human expert labeler and show that our approach is robust against expert labeling errors. Our novel contributions include a scalable malware detector integrating manual review with machine learning and the examination of temporal label consistency.
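
A minimal sketch of the temporal-label-consistency rule described above: a binary may enter the training set for a given evaluation date only if both the binary and the label used for it were observed before that date. The record fields below are hypothetical.

```python
# Minimal sketch of temporally consistent training-set construction,
# assuming each record carries the dates its sample and its label were
# first observed (field names are hypothetical).
from datetime import date

def temporally_consistent_training_set(records, eval_date):
    """Keep a record only if both the binary and its label predate the
    evaluation date; labels obtained later would leak knowledge from
    the future."""
    return [r for r in records
            if r["first_seen"] < eval_date and r["label_date"] < eval_date]

# Hypothetical usage: the second record is excluded because its label
# arrived after the evaluation date.
records = [
    {"sha256": "a" * 64, "first_seen": date(2013, 1, 5),
     "label_date": date(2013, 2, 1), "label": 1},
    {"sha256": "b" * 64, "first_seen": date(2013, 1, 5),
     "label_date": date(2014, 6, 1), "label": 0},
]
train = temporally_consistent_training_set(records, date(2013, 6, 1))
```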


Better Malware Ground Truth

October 2015 · 52 Reads · 88 Citations

We examine the problem of aggregating the results of multiple anti-virus (AV) vendors' detectors into a single authoritative ground-truth label for every binary. To do so, we adapt a well-known generative Bayesian model that postulates the existence of a hidden ground truth upon which the AV labels depend. We use training based on Expectation Maximization for this fully unsupervised technique. We evaluate our method using 279,327 distinct binaries from VirusTotal, each of which appeared for the first time between January 2012 and June 2014. Our evaluation shows that our statistical model is consistently more accurate at predicting the future-derived ground truth than all unweighted rules of the form "k out of n" AV detections. In addition, we evaluate the scenario where partial ground truth is available for model building. We train a logistic regression predictor on the partial label information. Our results show that as few as 100 randomly selected training instances with ground truth are enough to achieve an 80% true positive rate at a 0.1% false positive rate. In comparison, the best unweighted threshold rule provides only a 60% true positive rate at the same false positive rate.
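
The unsupervised aggregation idea can be sketched with a generic binary-label EM procedure (a Dawid-Skene-style model with per-vendor sensitivity and specificity); this is an assumption-laden illustration, not the authors' exact model or code.

```python
# Generic EM aggregator for binary AV votes: a hidden ground-truth label
# per binary plus per-vendor sensitivity/specificity (Dawid-Skene style).
# Initial values and priors are arbitrary illustrative choices.
import numpy as np

def em_aggregate(votes, n_iter=50, eps=1e-6):
    """votes: (n_binaries, n_vendors) 0/1 matrix of AV detections.
    Returns the posterior probability that each binary is malicious."""
    n, m = votes.shape
    pi = 0.5                      # prior P(malicious)
    sens = np.full(m, 0.7)        # P(detect | malicious)
    spec = np.full(m, 0.95)       # P(no detect | benign)
    for _ in range(n_iter):
        # E-step: posterior over the hidden label of each binary.
        log_p1 = (np.log(pi) + votes @ np.log(sens + eps)
                  + (1 - votes) @ np.log(1 - sens + eps))
        log_p0 = (np.log(1 - pi) + votes @ np.log(1 - spec + eps)
                  + (1 - votes) @ np.log(spec + eps))
        gamma = 1.0 / (1.0 + np.exp(log_p0 - log_p1))
        # M-step: re-estimate the prior and per-vendor reliabilities.
        pi = np.clip(gamma.mean(), eps, 1 - eps)
        sens = (gamma @ votes + eps) / (gamma.sum() + eps)
        spec = ((1 - gamma) @ (1 - votes) + eps) / ((1 - gamma).sum() + eps)
    return gamma

# Hypothetical usage: 3 binaries scanned by 4 vendors.
votes = np.array([[1, 1, 1, 0], [0, 0, 1, 0], [1, 0, 1, 1]])
print(em_aggregate(votes))
```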


Remote Operating System Classification over IPv6

October 2015 · 38 Reads · 7 Citations

Differences in the implementation of common networking protocols make it possible to identify the operating system of a remote host by the characteristics of its TCP and IP packets, even in the absence of application-layer information. This technique, "OS fingerprinting," is relevant to network security because of its relationship to network inventory, vulnerability scanning, and tailoring of exploits. Various techniques of fingerprinting over IPv4 have been in use for over a decade; however, IPv6 has received comparatively scant attention in both research and practical tools. In this paper we describe an IPv6-based OS fingerprinting engine that is based on a linear classifier. It introduces innovative classification features and network probes that take advantage of the specifics of IPv6, while also making use of existing proven techniques. The engine is deployed in Nmap, a widely used network security scanner. This engine provides good performance at a fraction of the maintenance costs of classical signature-based systems. We describe our work in progress to enhance the deployed system: new network probes that help to further distinguish operating systems, and imputation of incomplete feature vectors.
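
As a toy illustration of the linear-classification approach, probe responses can be encoded as numeric feature vectors (e.g. initial hop limit, TCP window size, one-hot encodings of option orderings) and fed to a multinomial linear model; the features, values, and classifier choice below are assumptions for illustration only.

```python
# Toy illustration only: hypothetical feature vectors (initial hop limit,
# TCP window size, and two one-hot flags for option ordering) mapped to
# OS labels with a multinomial linear model.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[64, 65535, 1, 0],
              [128, 8192, 0, 1],
              [64, 65535, 1, 0],
              [255, 4128, 0, 0]], dtype=float)
y = np.array(["Linux", "Windows", "Linux", "Solaris"])

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([[64, 65535, 1, 0]]))   # likely ['Linux'] on this toy data
```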


Evasion and Hardening of Tree Ensemble Classifiers

September 2015 · 128 Reads · 134 Citations

Recent work has successfully constructed adversarial "evading" instances for differentiable prediction models. However, generating adversarial instances for tree ensembles, a piecewise constant class of models, has remained an open problem. In this paper, we construct both exact and approximate evasion algorithms for tree ensembles: for a given instance x we find the "nearest" instance x' such that the classifier predictions of x and x' are different. First, we show that finding such instances is practically possible despite tree ensemble models being non-differentiable and the optimal evasion problem being NP-hard. In addition, we quantify the susceptibility of such models applied to the task of recognizing handwritten digits by measuring the distance between the original instance and the modified instance under the L0, L1, L2 and L-infinity norms. We also analyze a wide variety of classifiers including linear and RBF-kernel models, max-ensemble of linear models, and neural networks for comparison purposes. Our analysis shows that tree ensembles produced by a state-of-the-art gradient boosting method are consistently the least robust models notwithstanding their competitive accuracy. Finally, we show that a sufficient number of retraining rounds with L0-adversarial instances makes the hardened model three times harder to evade. This retraining also marginally improves classification accuracy, but simultaneously makes the model more susceptible to L1, L2 and L-infinity evasions.
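
A rough feel for approximate evasion of a tree ensemble can be given by a greedy search that nudges one feature at a time until a gradient-boosted model changes its prediction, then reports the distance of the evading instance; this is a simplified stand-in, not the exact or approximate algorithms from the paper.

```python
# Simplified greedy evasion against a gradient-boosted ensemble on a toy
# digits task; a stand-in for the paper's exact/approximate algorithms.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier

digits = load_digits()
mask = np.isin(digits.target, [0, 1])          # binary task: digit 0 vs 1
X, y = digits.data[mask], digits.target[mask]
clf = GradientBoostingClassifier(n_estimators=50).fit(X, y)

def greedy_evade(clf, x, step=4.0, max_changes=10):
    """Nudge one pixel at a time, always picking the change that most
    lowers the probability of the original class, until the prediction flips."""
    x_adv = x.copy()
    orig_idx = list(clf.classes_).index(clf.predict([x])[0])
    for _ in range(max_changes):
        probs = clf.predict_proba([x_adv])[0]
        if np.argmax(probs) != orig_idx:
            break
        best_trial, best_p = None, probs[orig_idx]
        for i in range(len(x_adv)):
            for delta in (step, -step):
                trial = x_adv.copy()
                trial[i] = np.clip(trial[i] + delta, 0, 16)  # pixel value range
                p = clf.predict_proba([trial])[0][orig_idx]
                if p < best_p:
                    best_trial, best_p = trial, p
        if best_trial is None:
            break
        x_adv = best_trial
    return x_adv

x_adv = greedy_evade(clf, X[0])
print("L0:", np.count_nonzero(x_adv - X[0]), "L2:", np.linalg.norm(x_adv - X[0]))
```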


Figure 2: Excess risk R(f) − R(f*) of learned models.

Adversarial Active Learning

November 2014 · 876 Reads · 71 Citations

Active learning is an area of machine learning examining strategies for allocation of finite resources, particularly human labeling efforts and to an extent feature extraction, in situations where available data exceeds available resources. In this open problem paper, we motivate the necessity of active learning in the security domain, identify problems caused by the application of present active learning techniques in adversarial settings, and propose a framework for experimentation and implementation of active learning systems in adversarial contexts. More than other contexts, adversarial contexts particularly need active learning as ongoing attempts to evade and confuse classifiers necessitate constant generation of labels for new content to keep pace with adversarial activity. Just as traditional machine learning algorithms are vulnerable to adversarial manipulation, we discuss assumptions specific to active learning that introduce additional vulnerabilities, as well as present vulnerabilities that are amplified in the active learning setting. Lastly, we present a software architecture, Security-oriented Active Learning Testbed (SALT), for the research and implementation of active learning applications in adversarial contexts.


On Modeling the Costs of Censorship

September 2014 · 37 Reads · 3 Citations

We argue that the evaluation of censorship evasion tools should depend upon economic models of censorship. We illustrate our position with a simple model of the costs of censorship. We show how this model makes suggestions for how to evade censorship. In particular, from it, we develop evaluation criteria. We examine how our criteria compare to the traditional methods of evaluation employed in prior works.


I Know Why You Went to the Clinic: Risks and Realization of HTTPS Traffic Analysis

March 2014 · 124 Reads · 108 Citations

Lecture Notes in Computer Science

Revelations of large scale electronic surveillance and data mining by governments and corporations have fueled increased adoption of HTTPS. We present a traffic analysis attack against over 6000 webpages spanning the HTTPS deployments of 10 widely used, industry-leading websites in areas such as healthcare, finance, legal services and streaming video. Our attack identifies individual pages in the same website with 89% accuracy, exposing personal details including medical conditions, financial and legal affairs and sexual orientation. We examine evaluation methodology and reveal accuracy variations as large as 18% caused by assumptions affecting caching and cookies. We present a novel defense reducing attack accuracy to 27% with a 9% traffic increase, and demonstrate significantly increased effectiveness of prior defenses in our evaluation context, inclusive of enabled caching, user-specific cookies and pages within the same website.
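
The attack setting can be sketched generically: represent each encrypted page load by coarse traffic statistics and train a per-site classifier over page labels. The features, data, and random-forest choice below are illustrative assumptions, not the paper's pipeline or defenses.

```python
# Generic sketch of per-site page identification from encrypted traffic:
# crude size/direction statistics per page load and a random forest.
# The traces, features, and labels below are made up for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def trace_features(packet_sizes):
    """packet_sizes: signed sizes, positive = outgoing, negative = incoming."""
    sizes = np.asarray(packet_sizes, dtype=float)
    incoming = -sizes[sizes < 0]
    outgoing = sizes[sizes > 0]
    return [len(sizes), incoming.sum(), outgoing.sum(),
            incoming.mean() if len(incoming) else 0.0,
            outgoing.mean() if len(outgoing) else 0.0]

traces = [([-1500, -1500, 300, -800], "page_a"),
          ([300, -1500, -1500, -1500, 500], "page_b"),
          ([-1500, -1400, 310, -900], "page_a"),
          ([290, -1500, -1500, -1450, 480], "page_b")]
X = np.array([trace_features(sizes) for sizes, _ in traces])
y = [label for _, label in traces]
clf = RandomForestClassifier(n_estimators=100).fit(X, y)
```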


Large-margin convex polytope machine

January 2014 · 49 Reads · 34 Citations

Advances in Neural Information Processing Systems

We present the Convex Polytope Machine (CPM), a novel non-linear learning algorithm for large-scale binary classification tasks. The CPM finds a large margin convex polytope separator which encloses one class. We develop a stochastic gradient descent based algorithm that is amenable to massive data sets, and augment it with a heuristic procedure to avoid sub-optimal local minima. Our experimental evaluations of the CPM on large-scale data sets from distinct domains (MNIST handwritten digit recognition, text topic, and web security) demonstrate that the CPM trains models faster, sometimes by several orders of magnitude, than state-of-the-art similar approaches and kernel-SVM methods while achieving comparable or better classification performance. Our empirical results suggest that, unlike prior similar approaches, we do not need to control the number of sub-classifiers (sides of the polytope) to avoid overfitting.
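
A minimal SGD sketch of the max-of-linear ("convex polytope") decision rule described above: the enclosed class must score below the margin on every face, while the other class only needs one face to score above it. The update rule, hyperparameters, and assignment heuristic here are simplified assumptions, not the paper's algorithm.

```python
# Simplified SGD training of a max-of-linear separator: the -1 class must
# satisfy every face's constraint (inside the polytope), the +1 class only
# the most responsive face. Hyperparameters are arbitrary illustrative values.
import numpy as np

def train_cpm(X, y, n_faces=5, lr=0.01, reg=1e-4, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(n_faces, X.shape[1]))
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            scores = W @ X[i]
            k = int(np.argmax(scores))           # most responsive face
            if y[i] == 1 and scores[k] < 1:      # outside class: push face up
                W[k] += lr * X[i]
            elif y[i] == -1 and scores[k] > -1:  # enclosed class: push face down
                W[k] -= lr * X[i]
            W *= (1 - lr * reg)                  # weight decay for a large margin
    return W

def predict_cpm(W, X):
    return np.where((X @ W.T).max(axis=1) > 0, 1, -1)

# Hypothetical usage on a toy 2-D problem; a constant column gives each
# face a bias term.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(3.0, 1.0, size=(100, 2)),
               rng.normal(0.0, 0.5, size=(100, 2))])
X = np.hstack([X, np.ones((len(X), 1))])
y = np.array([1] * 100 + [-1] * 100)
W = train_cpm(X, y)
print("training accuracy:", (predict_cpm(W, X) == y).mean())
```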


Citations (82)


... Discussing attackers' goals, knowledge, and capabilities is crucial to designing realistic attacks, which can be used for robustness assessments [25]. ...

Reference:

FRAUD-RLA: A new reinforcement learning adversarial attack against credit card fraud detection
Adversarial Machine Learning
  • Citing Book
  • February 2019

... In order to face the many challenges, but also to leverage the opportunities it is encountering, the discipline of digital forensics has to rethink in some ways established principles and reorganize well-known workflows, and even include and use tools not previously considered viable for forensic use - concerns regarding the security of some machine learning algorithms have been voiced, for instance in [BBC+08]. On the other hand, forensic analysts' skills need to be rounded out to make better use of these new tools in the first place, but also to help integrate them in forensic best practices and validate them. ...

Open problems in the security of learning
  • Citing Article
  • January 2008

... The DREBIN20 dataset comprises ≈ 150K Android applications collected between January 2016 and December 2018, while APIGraph contains ≈ 323K Android applications from January 2012 to December 2018. Specifically, DREBIN20 includes 135,708 goodware and 15,765 malware labeled according to criteria adapted from [50], where an app is classified as goodware if it has zero VirusTotal (VT) detections, and as malware if it has four or more VT detections. Moreover, APIGraph comprises 290,505 goodware and 32,089 malware, labeled using the criteria established in [51]: an app is considered goodware if it has zero VT detections and malware if it has 15 or more VT detections. ...

Reviewer Integration and Performance Measurement for Malware Detection
  • Citing Conference Paper
  • July 2016

Lecture Notes in Computer Science

... This set of attacks represents the most important part of the security vulnerabilities identified for the SUANET project and listed in Section II-B. The TIK algorithm offers the possibility of calculating the expiration time and the distance traveled by the packets [44]. This information is then included in the packets, so that the other UAVs can infer whether or not the packet has been altered. ...

BiBa Broadcast Authentication
  • Citing Chapter
  • January 2003

... Machine learning is a promising approach that has been applied in several proposals in the OS fingerprinting area, such as those cited in Fifield et al. (2015), Ordorica (2017) and Schwartzenberg (2010), and it has proven to be an effective method for classification. Machine learning gives computers the ability to learn behaviours without being explicitly programmed (configured). ...

Remote Operating System Classification over IPv6
  • Citing Conference Paper
  • October 2015

... GBDTs are susceptible to adversarial perturbation attacks [13], [14] that perturb feature values imperceptibly to cause misclassification, similarly to adversarial perturbation attacks on Convolutional Neural Networks (CNNs) [15], [16]. Black-box gradient-based attacks [17], [18] approximate the gradient of a classifier based on the classifier outputs and can be applied to GBDTs. ...

Evasion and Hardening of Tree Ensemble Classifiers
  • Citing Article
  • September 2015

... This phenomenon stems from the ML model being "overly-reliant" on its training data: if the data seen at inference differs substantially from that seen by the ML model during its learning phase, then its output may not be correct [10]. Unfortunately, modern networks are constantly mutating ecosystems, from both the "benign" (e.g., devices may be added or removed) and "malicious" (attackers refine their tactics) perspectives [11,42]. Extant cybersecurity literature has mostly investigated concept drift in malware detection contexts [14,16,62], with some recent efforts considering this (still open) problem also for ML-NIDS [5,46,70,72,75]. ...

Adversarial Active Learning