Charles V. Wright’s research while affiliated with Johns Hopkins University and other places


Publications (13)


Uncovering Spoken Phrases in Encrypted Voice over IP Conversations
  • Article

January 2011 · 48 Reads · 53 Citations

Charles V Wright · S E Coull · F Monrose · G M Masson

Although Voice over IP (VoIP) is rapidly being adopted, its security implications are not yet fully understood. Since VoIP calls may traverse untrusted networks, packets should be encrypted to ensure confidentiality. However, we show that it is possible to identify the phrases spoken within encrypted VoIP calls when the audio is encoded using variable bit rate codecs. To do so, we train a hidden Markov model using only knowledge of the phonetic pronunciations of words, such as those provided by a dictionary, and search packet sequences for instances of specified phrases. Our approach does not require examples of the speaker's voice, or even example recordings of the words that make up the target phrase. We evaluate our techniques on a standard speech recognition corpus containing over 2,000 phonetically rich phrases spoken by 630 distinct speakers from across the continental United States. Our results indicate that we can identify phrases within encrypted calls with an average accuracy of 50%, and with accuracy greater than 90% for some phrases. Clearly, such an attack calls into question the efficacy of current VoIP encryption standards. In addition, we examine the impact of various features of the underlying audio on our performance and discuss methods for mitigation.
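The phrase-spotting idea can be illustrated with a much smaller model. The sketch below is not the authors' phonetic HMM: it assumes packet lengths have already been quantized into small integer symbols, scores each sliding window with the forward algorithm, and flags windows where a hypothetical phrase model is more likely than a hypothetical background model. The names `log_forward` and `spot` and both toy models are illustrative only.

```python
import math


def _log(p):
    # log(0) is -inf, so impossible transitions or emissions drop out cleanly.
    return math.log(p) if p > 0 else -math.inf


def _logsumexp(xs):
    # Numerically stable log(sum(exp(x) for x in xs)).
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_forward(obs, start, trans, emit):
    """Log-likelihood of a symbol sequence under a discrete HMM (forward algorithm).

    start[s]    : P(first state is s)
    trans[s][t] : P(next state is t | current state is s)
    emit[s][o]  : P(observe symbol o | state s)
    """
    n = len(start)
    alpha = [_log(start[s]) + _log(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        alpha = [
            _logsumexp([alpha[s] + _log(trans[s][t]) for s in range(n)]) + _log(emit[t][o])
            for t in range(n)
        ]
    return _logsumexp(alpha)


def spot(symbols, phrase_model, background_model, width, threshold=0.0):
    """Return start offsets of windows whose log-likelihood ratio
    (phrase model vs. background model) exceeds threshold."""
    hits = []
    for i in range(len(symbols) - width + 1):
        window = symbols[i:i + width]
        llr = log_forward(window, *phrase_model) - log_forward(window, *background_model)
        if llr > threshold:
            hits.append(i)
    return hits
```

Here each model is a `(start, trans, emit)` tuple. In the paper the phrase model is derived from the target phrase's pronunciation; in this sketch both models are hand-specified. For example, a one-state phrase model that mostly emits symbol 1, compared against a background that mostly emits symbol 0, flags windows dominated by 1s.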


Uncovering Spoken Phrases in Encrypted Voice over IP Conversations

December 2010 · 48 Reads · 35 Citations

ACM Transactions on Information and System Security

Charles V. Wright · Lucas Ballard · Scott E. Coull · [...] · Gerald M. Masson

Although Voice over IP (VoIP) is rapidly being adopted, its security implications are not yet fully understood. Since VoIP calls may traverse untrusted networks, packets should be encrypted to ensure confidentiality. However, we show that it is possible to identify the phrases spoken within encrypted VoIP calls when the audio is encoded using variable bit rate codecs. To do so, we train a hidden Markov model using only knowledge of the phonetic pronunciations of words, such as those provided by a dictionary, and search packet sequences for instances of specified phrases. Our approach does not require examples of the speaker’s voice, or even example recordings of the words that make up the target phrase. We evaluate our techniques on a standard speech recognition corpus containing over 2,000 phonetically rich phrases spoken by 630 distinct speakers from across the continental United States. Our results indicate that we can identify phrases within encrypted calls with an average accuracy of 50%, and with accuracy greater than 90% for some phrases. Clearly, such an attack calls into question the efficacy of current VoIP encryption standards. In addition, we examine the impact of various features of the underlying audio on our performance and discuss methods for mitigation.


Generating Client Workloads and High-Fidelity Network Traffic for Controllable, Repeatable Experiments in Computer Security
  • Conference Paper
  • Full-text available

January 2010 · 259 Reads · 24 Citations

Lecture Notes in Computer Science

Rigorous scientific experimentation in system and network security remains an elusive goal. Recent work has outlined three basic requirements for experiments, namely that hypotheses must be falsifiable, experiments must be controllable, and experiments must be repeatable and reproducible. Despite their simplicity, these goals are difficult to achieve, especially when dealing with client-side threats and defenses, where often user input is required as part of the experiment. In this paper, we present techniques for making experiments involving security and client-side desktop applications like web browsers, PDF readers, or host-based firewalls or intrusion detection systems more controllable and more easily repeatable. First, we present techniques for using statistical models of user behavior to drive real, binary, GUI-enabled application programs in place of a human user. Second, we present techniques based on adaptive replay of application dialog that allow us to quickly and efficiently reproduce reasonable mock-ups of remotely-hosted applications to give the illusion of Internet connectedness on an isolated testbed. We demonstrate the utility of these techniques in an example experiment comparing the system resource consumption of a Windows machine running anti-virus protection versus an unprotected system.
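As a rough illustration of driving applications from a statistical user model, the sketch below assumes a first-order Markov chain over a few hypothetical user actions. The action names and transition probabilities are invented, and the step that turns each action into real GUI input is left out. Seeding the generator makes a session exactly reproducible, which is the repeatability property the abstract emphasizes.

```python
import random

# Hypothetical first-order Markov model of desktop-user actions:
# USER_MODEL[current][next] = P(next action | current action).
USER_MODEL = {
    "open_browser": {"visit_page": 0.9, "idle": 0.1},
    "visit_page": {"visit_page": 0.5, "open_pdf": 0.2, "idle": 0.3},
    "open_pdf": {"visit_page": 0.6, "idle": 0.4},
    "idle": {"open_browser": 0.5, "visit_page": 0.5},
}


def generate_session(model, start, steps, seed=None):
    """Sample a sequence of `steps` actions beginning at `start`.

    A private Random instance keeps the run reproducible for a fixed seed
    without disturbing the global random state.
    """
    rng = random.Random(seed)
    state = start
    actions = [state]
    for _ in range(steps - 1):
        nxt = model[state]
        state = rng.choices(list(nxt), weights=list(nxt.values()))[0]
        actions.append(state)
    return actions
```

Calling `generate_session(USER_MODEL, "open_browser", 20, seed=7)` twice yields the same 20-action session, so the same synthetic "user" can be replayed against, for example, a protected and an unprotected machine.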


Traffic Morphing: An Efficient Defense Against Statistical Traffic Analysis

January 2009 · 476 Reads · 301 Citations

Recent work has shown that properties of network traffic that remain observable after encryption, namely packet sizes and timing, can reveal surprising information about the traffic's contents (e.g., the language of a VoIP call (29), passwords in secure shell logins (20), or even web browsing habits (21, 14)). While there are some legitimate uses for encrypted traffic analysis, these techniques also raise important questions about the privacy of encrypted communications. A common tactic for mitigating such threats is to pad packets to uniform sizes or to send packets at fixed timing intervals; however, this approach is often inefficient. In this paper, we propose a novel method for thwarting statistical traffic analysis algorithms by optimally morphing one class of traffic to look like another class. Through the use of convex optimization techniques, we show how to optimally modify packets in real time to reduce the accuracy of a variety of traffic classifiers while incurring much less overhead than padding. Our evaluation of this technique against two published traffic classifiers for VoIP (29) and web traffic (14) shows that morphing works well on a wide range of network data, in some cases simultaneously providing better privacy and lower overhead than naïve defenses.
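The padding-only case of morphing can be sketched with discrete packet-size distributions. This is a simplification and not the paper's convex program. It assumes traffic is summarized by a probability vector over sorted size bins and that packets may be padded but never split. It builds the morphing matrix with a monotone (quantile) coupling of the source and target distributions, which moves packets only to larger bins exactly when the target distribution stochastically dominates the source; otherwise it raises an error. The function names are hypothetical.

```python
import random


def padding_morph_matrix(src, dst, tol=1e-12):
    """Build A with A[j][i] = P(send size bin j | source packet in bin i).

    src and dst are probability vectors over the same size bins, sorted by
    increasing size. Mass is matched in quantile order (a monotone coupling),
    so sum_i A[j][i] * src[i] == dst[j] for every bin j.
    """
    n = len(src)
    A = [[0.0] * n for _ in range(n)]
    src_left, dst_left = list(src), list(dst)
    i = j = 0
    while i < n and j < n:
        if src_left[i] <= tol:
            i += 1
        elif dst_left[j] <= tol:
            j += 1
        elif j < i:
            # Matching this mass would shrink packets, which padding cannot do.
            raise ValueError("target needs smaller packets than the source")
        else:
            m = min(src_left[i], dst_left[j])
            A[j][i] += m / src[i]
            src_left[i] -= m
            dst_left[j] -= m
    # Bins the source never produces: pass through unchanged (never used).
    for k in range(n):
        if src[k] == 0:
            A[k][k] = 1.0
    return A


def morph(size_index, A, sizes, rng=random):
    """Sample the padded size for one packet whose size falls in bin size_index."""
    weights = [A[j][size_index] for j in range(len(sizes))]
    return rng.choices(sizes, weights=weights)[0]
```

With sizes `[100, 200, 300]`, source `[0.5, 0.5, 0.0]` and target `[0.2, 0.3, 0.5]`, a 100-byte packet is sent as 100 or 200 bytes and a 200-byte packet is always padded to 300. When the output must match the target exactly and packets can only grow, the expected overhead equals the difference between the target and source mean sizes, so in this restricted setting every feasible matrix costs the same on average. Richer formulations, such as allowing packets to be split, are outside this sketch.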


Spot Me if You Can: Uncovering Spoken Phrases in Encrypted VoIP Conversations

June 2008 · 142 Reads · 202 Citations

Despite the rapid adoption of Voice over IP (VoIP), its security implications are not yet fully understood. Since VoIP calls may traverse untrusted networks, packets should be encrypted to ensure confidentiality. However, we show that when the audio is encoded using variable bit rate codecs, the lengths of encrypted VoIP packets can be used to identify the phrases spoken within a call. Our results indicate that a passive observer can identify phrases from a standard speech corpus within encrypted calls with an average accuracy of 50%, and with accuracy greater than 90% for some phrases. Clearly, such an attack calls into question the efficacy of current VoIP encryption standards. In addition, we examine the impact of various features of the underlying audio on our performance and discuss methods for mitigation.


[Preview caption fragment: "Table 3 shows each of"]
Taming the Devil: Techniques for Evaluating Anonymized Network Data.

January 2008 · 167 Reads · 46 Citations

Anonymization plays a key role in enabling the public release of network datasets, and yet there are few, if any, techniques for evaluating the efficacy of network data anonymization techniques with respect to the privacy they afford. In fact, recent work suggests that many state-of-the-art anonymization techniques may leak more information than first thought. In this paper, we propose techniques for evaluating the anonymity of network data. Specifically, we simulate the behavior of an adversary whose goal is to deanonymize objects, such as hosts or web pages, within the network data. By doing so, we are able to quantify the anonymity of the data using information-theoretic metrics, objectively compare the efficacy of anonymization techniques, and examine the impact of selective deanonymization on the anonymity of the data. Moreover, we provide several concrete applications of our approach on real network data in the hope of underscoring its usefulness to data publishers.
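One common information-theoretic way to quantify how well an anonymized object hides is the Shannon entropy of the adversary's belief about which real object it corresponds to. The abstract does not spell out its exact metrics, so the sketch below is an assumption: it computes that entropy in bits, plus a version normalized to [0, 1], for a hypothetical list of candidate weights.

```python
import math


def anonymity_entropy(candidate_weights):
    """Shannon entropy (bits) of the adversary's belief over real objects.

    candidate_weights[k] is the adversary's (possibly unnormalized) weight
    that the anonymized object is real object k.
    """
    total = sum(candidate_weights)
    h = 0.0
    for w in candidate_weights:
        if w > 0:
            q = w / total
            h -= q * math.log2(q)
    return h


def normalized_anonymity(candidate_weights):
    """Entropy divided by its maximum log2(N).

    1.0 means the adversary can do no better than a uniform guess;
    0.0 means the object is fully deanonymized.
    """
    n = len(candidate_weights)
    return anonymity_entropy(candidate_weights) / math.log2(n) if n > 1 else 0.0
```

The quantity `2 ** anonymity_entropy(w)` can be read as an effective anonymity set size: four equally likely candidates give 2 bits, an effective set of 4, while a belief concentrated on one candidate gives 0 bits.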


Figure 2. Binary tree with nodes indicating bits of the anonymized address. Root indicates left-most bit, and shaded nodes indicate compromised bits where the mapping to the unanonymized address is known.
Playing Devil's Advocate: Inferring Sensitive Information from Anonymized Network Traces.

January 2007 · 376 Reads · 115 Citations

Encouraging the release of network data is central to promoting sound network research practices, though the publication of this data can leak sensitive information about the publishing organization. To address this dilemma, several techniques have been suggested for anonymizing network data by obfuscating sensitive fields. In this paper, we present new techniques for inferring network topology and deanonymizing servers present in anonymized network data, using only the data itself and public information. Via analyses on three different network datasets, we quantify the effectiveness of our techniques, showing that they can uncover significant amounts of sensitive information. We also discuss prospects for preventing these deanonymization attacks.


Language Identification of Encrypted VoIP Traffic: Alejandra y Roberto or Alice and Bob

January 2007 · 122 Reads · 150 Citations

Voice over IP (VoIP) has become a popular protocol for making phone calls over the Internet. Due to the potential transit of sensitive conversations over untrusted network infrastructure, it is well understood that the contents of a VoIP session should be encrypted. However, we demonstrate that current cryptographic techniques do not provide adequate protection when the underlying audio is encoded using bandwidth-saving Variable Bit Rate (VBR) coders. Explicitly, we use the length of encrypted VoIP packets to tackle the challenging task of identifying the language of the conversation. Our empirical analysis of 2,066 native speakers of 21 different languages shows that a substantial amount of information can be discerned from encrypted VoIP traffic. For instance, our 21-way classifier achieves 66% accuracy, almost a 14-fold improvement over random guessing. For 14 of the 21 languages, the accuracy is greater than 90%. We achieve an overall binary classification (e.g., "Is this a Spanish or English conversation?") rate of 86.6%. Our analysis highlights what we believe to be interesting new privacy issues in VoIP.
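The abstract does not describe its classifier, so the following is a stand-in under stated assumptions: packet lengths produced by a VBR codec are assumed to be quantized into symbols 0..V-1, each language is modeled as a multinomial over symbol bigrams with add-one smoothing, and a call is assigned to the language with the highest log-likelihood. The class and method names are hypothetical.

```python
import math
from collections import Counter


def bigrams(seq):
    """Consecutive symbol pairs, e.g. [a, b, c] -> [(a, b), (b, c)]."""
    return list(zip(seq, seq[1:]))


class BigramClassifier:
    """Toy multinomial naive Bayes over packet-length symbol bigrams."""

    def __init__(self, vocab_size):
        self.num_bigrams = vocab_size ** 2  # number of possible bigrams
        self.counts = {}                    # label -> Counter of bigrams
        self.totals = {}                    # label -> total bigram count

    def fit(self, label, sequences):
        c = self.counts.setdefault(label, Counter())
        for seq in sequences:
            c.update(bigrams(seq))
        self.totals[label] = sum(c.values())

    def log_likelihood(self, label, seq):
        # Add-one smoothing: unseen bigrams keep a small nonzero probability.
        c, total = self.counts[label], self.totals[label]
        return sum(math.log((c[b] + 1) / (total + self.num_bigrams)) for b in bigrams(seq))

    def predict(self, seq):
        return max(self.counts, key=lambda label: self.log_likelihood(label, seq))
```

For scale, random guessing among 21 languages is correct 1/21 ≈ 4.8% of the time, so the reported 66% accuracy is about 0.66 × 21 ≈ 13.9 times better, consistent with the abstract's "almost a 14-fold improvement".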


On Inferring Application Protocol Behaviors in Encrypted Network Traffic.

December 2006 · 146 Reads · 249 Citations

Journal of Machine Learning Research

Several fundamental security mechanisms for restricting access to network resources rely on the ability of a reference monitor to inspect the contents of traffic as it traverses the network. However, with the increasing popularity of cryptographic protocols, the traditional means of inspecting packet contents to enforce security policies is no longer a viable approach as message contents are concealed by encryption. In this paper, we investigate the extent to which common application protocols can be identified using only the features that remain intact after encryption, namely packet size, timing, and direction. We first present what we believe to be the first exploratory look at protocol identification in encrypted tunnels which carry traffic from many TCP connections simultaneously, using only post-encryption observable features. We then explore the problem of protocol identification in individual encrypted TCP connections, using much less data than in other recent approaches. The results of our evaluation show that our classifiers achieve accuracy greater than 90% for several protocols in aggregate traffic, and, for most protocols, greater than 80% when making fine-grained classifications on single connections. Moreover, perhaps most surprisingly, we show that one can even estimate the number of live connections in certain classes of encrypted tunnels to within, on average, better than 20%.
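Before any classifier sees the data, the three observables that survive encryption (packet size, timing, and direction) have to be turned into a usable representation. The sketch below shows one hypothetical way to do that: it discretizes size and inter-arrival time into bins and folds direction into the sign of the size bin. The bin edges and the symbol format are assumptions for illustration, not values from the paper.

```python
from bisect import bisect_right

# Hypothetical bin edges; a real system would choose or learn them from data.
SIZE_EDGES = [64, 128, 512, 1024]      # bytes
GAP_EDGES = [0.001, 0.01, 0.1, 1.0]    # seconds since the previous packet


def symbolize(packets):
    """Map (timestamp, size, direction) tuples to discrete symbols.

    direction is +1 (client to server) or -1 (server to client). Each symbol
    is (signed size bin, inter-arrival-time bin).
    """
    symbols = []
    prev_t = None
    for t, size, direction in packets:
        gap = 0.0 if prev_t is None else t - prev_t
        size_bin = bisect_right(SIZE_EDGES, size)
        gap_bin = bisect_right(GAP_EDGES, gap)
        # +1 keeps the smallest size bin nonzero so its sign still encodes direction.
        symbols.append((direction * (size_bin + 1), gap_bin))
        prev_t = t
    return symbols
```

The resulting symbol sequences could then feed any sequence classifier; which model to use on top of them is a separate design choice.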


Using visual motifs to classify encrypted traffic

November 2006 · 34 Reads · 26 Citations

In an effort to make robust traffic classification more accessible to human operators, we present visualization techniques for network traffic. Our techniques are based solely on network information that remains intact after application-layer encryption, and so offer a way to visualize traffic "in the dark". Our visualizations clearly illustrate the differences between common application protocols, both in their transient (i.e., time-dependent) and steady-state behavior. We show how these visualizations can be used to assist a human operator to recognize application protocols in unidentified traffic and to verify the results of an automated classifier via visual inspection. In particular, our preliminary results show that we can visually scan almost 45,000 connections in less than one hour and correctly identify known application behaviors. Moreover, using visualizations together with an automated comparison technique based on Dynamic Time Warping of the motifs, we can rapidly develop accurate recognizers for new or previously unknown applications.
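Dynamic Time Warping compares two sequences that may be stretched or compressed in time, which suits motifs from connections that progress at different speeds. The sketch below implements the textbook DTW distance and a nearest-reference recognizer. It assumes a motif is simply a list of numbers (for example, signed packet sizes), which may differ from the motif representation used in the paper, and the reference motifs in the example are invented.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with |x - y| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = cheapest alignment cost of a[:i] with b[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend by a step in a, a step in b, or a step in both.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def classify(motif, references):
    """Return the name of the reference motif closest to `motif` under DTW."""
    return min(references, key=lambda name: dtw_distance(motif, references[name]))
```

Because warping lets one element align with several, `[1, 2, 3]` and `[1, 2, 2, 3]` are at distance 0. A recognizer for a new application then only needs one or a few reference motifs, in the spirit of the abstract's claim about rapidly developing recognizers.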


Citations (13)


... PHMMs are applied to the problem of protocol identification in [113,115], which has relevance in information security in the context of firewalls and network-based intrusion detection. In [114], PHMMs are applied to encrypted VoIP traffic and in [116], PHMMs are used to analyze applications based on network traces. ...

Reference:

A Survey of Machine Learning Algorithms and Their Application in Information Security: An Artificial Intelligence Approach
Towards better protocol identification using profile HMMs (JHU Technical Report JHU-SPAR051201)
  • Citing Article

... Various types of side channel attacks are built on traffic analysis. For instance, multiple studies use traffic analysis to uncover not only the language spoken over encrypted VoIP connections [54], but also the spoken phrases [53], [52]. Chen et al. [39] use traffic analysis to infer the video streams being watched over encrypted channels. ...

Language Identification of Encrypted VoIP Traffic: Alejandra y Roberto or Alice and Bob
  • Citing Article
  • January 2007

... For example, in IP-based internetworking, traffic can hardly be modelled in a generic format [17]. Recently, the use of Hidden Markov Model (HMM) [18][19][20] for learning and prediction has increased in many fields including traffic engineering, speech processing, finance etc. due to its simple learning mechanism. Solving performance related issues of networks and ensuring better QoS for end-users calls for simple, tractable and realistic traffic measurement and modelling techniques. ...

HMM Profiles for Network Traffic Classification (Extended Abstract)
  • Citing Article
  • January 2004

... Similarly, early versions of other applications of traffic analysis attacks, such as video fingerprinting (SVMs [12], other statistical methods [34,35]) and identifying spoken phrases on VoIP apps (Hidden Markov Models [53,54]), leveraged traditional machine learning methods too. Inspired by the massive success of deep learning models in recent years, most recent work on encrypted traffic classification also leveraged them to improve performance. ...

Uncovering Spoken Phrases in Encrypted Voice over IP Conversations
  • Citing Article
  • January 2011

... Researchers have proposed various de-anonymization attacks. Most of these attacks are based on traffic injection [38,39], fingerprinting [40,41] and cryptanalysis [5]. Attacks based on cryptanalysis can be defended against by employing new cryptography-based schemes. ...

Playing Devil's Advocate: Inferring Sensitive Information from Anonymized Network Traces.

... This approach could be extended to mask packet timings by sending packets at a fixed frequency corresponding to the minimum IPAT. However, as doing so has a prohibitively high bandwidth cost, Wright, Coull, and Monrose instead proposed traffic morphing [16]. Under their scheme, packets produced by one class of traffic (e.g., VoIP) were padded or delayed to reshape the exposed pattern into that of a different class (e.g., web traffic). ...

Traffic Morphing: An Efficient Defense Against Statistical Traffic Analysis
  • Citing Conference Paper
  • January 2009

... This essentially requires accurately profiling the execution behavior of servers to create formal models that can be used in the generation of synthetic datasets. As an example, the widely used CERT dataset [28] used for insider threat detection is aimed at simulating human behavior based on observations made by earlier work on the network and host events in a large enterprise [29][30][31]. The ability to create such datasets with a high degree of realism requires extending such measurements to servers as the most important components of an enterprise network. ...

Generating Client Workloads and High-Fidelity Network Traffic for Controllable, Repeatable Experiments in Computer Security

Lecture Notes in Computer Science

... A graph structure is well suited to encode logical links between various types of events. As a consequence, and with notable exceptions such as [35,66,158], link graphs are often used for visualizing relations among similar entities (e.g. network nodes or users) or various types of entities (e.g., the various features of an alert coming from an IDS or from a pcap file). ...

Using visual motifs to classify encrypted traffic
  • Citing Conference Paper
  • November 2006

... Hidden Markov Models (HMM) [1] are still one of the most widely used machine learning methods across an extensive range of data modalities (from texts to images, and from time-series of hand movements to weather events) and virtually all disciplines analyzing data, from political science [2] to education [3] and computer security [4] to autonomous vehicles [5]. ...

HMM profiles for network traffic classification
  • Citing Conference Paper
  • October 2004