Alex X. Liu

Michigan State University, East Lansing, Michigan, United States

Publications (65) · 20.28 Total Impact Points

  • ABSTRACT: The Forwarding Information Base (FIB) of backbone routers has been rapidly growing in size. An ideal IP lookup algorithm should achieve constant, yet small, IP lookup time and on-chip memory usage. However, no prior IP lookup algorithm achieves both requirements at the same time. In this paper, we first propose SAIL, a Splitting Approach to IP Lookup. One splitting is along the dimension of the lookup process, namely finding the prefix length and finding the next hop, and another splitting is along the dimension of prefix length, namely IP lookup on prefixes of length less than or equal to 24 and IP lookup on prefixes of length longer than 24. Second, we propose a suite of algorithms for IP lookup based on our SAIL framework. Third, we implemented our algorithms on four platforms: CPU, FPGA, GPU, and many-core. We conducted extensive experiments to evaluate our algorithms using real FIBs and real traffic from a major ISP in China. Experimental results show that our SAIL algorithms are several times or even two orders of magnitude faster than well-known IP lookup algorithms.
    08/2014;
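The length-24 split described above can be illustrated with a toy structure. This is a hypothetical sketch, not the paper's implementation: real SAIL uses compact on-chip bitmaps and flat off-chip arrays, while here a dict stands in for the direct-index table over the top 24 bits and an overflow table holds the longer prefixes.

```python
class SailLikeFib:
    """Toy FIB illustrating the prefix-length-24 split (illustrative only)."""
    def __init__(self):
        self.short = {}  # 24-bit index -> (plen, next_hop); a flat array in hardware
        self.long = {}   # 24-bit index -> [(prefix, plen, next_hop)] for plen > 24

    def insert(self, prefix, plen, next_hop):
        if plen <= 24:
            idx = prefix >> 8            # top 24 bits of the 32-bit prefix
            span = 1 << (24 - plen)      # number of /24 slots this prefix covers
            for i in range(idx, idx + span):
                old = self.short.get(i)
                if old is None or old[0] <= plen:   # keep the longest match
                    self.short[i] = (plen, next_hop)
        else:
            self.long.setdefault(prefix >> 8, []).append((prefix, plen, next_hop))

    def lookup(self, addr):
        idx = addr >> 8
        best = None
        # prefixes longer than 24 take priority (longest-prefix match)
        for prefix, plen, nh in self.long.get(idx, []):
            mask = ~((1 << (32 - plen)) - 1) & 0xFFFFFFFF
            if addr & mask == prefix and (best is None or plen > best[0]):
                best = (plen, nh)
        if best:
            return best[1]
        hit = self.short.get(idx)
        return hit[1] if hit else None
```

Most addresses hit only the direct-index table, which is why the split bounds lookup time; the overflow path is rarely taken because few real prefixes are longer than /24.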
  • ABSTRACT: Bulk data migration between datacenters is often a critical step in deploying new services, improving reliability under failures, or implementing various cost reduction strategies for cloud companies. Transferring such bulk data consumes massive bandwidth and can incur severe network congestion. Leveraging the temporal and spatial characteristics of inter-datacenter bulk data traffic, in this paper we investigate the Multiple Bulk Data Transfers Scheduling (MBDTS) problem to reduce network congestion. Temporally, we apply the store-and-forward transfer mode to reduce the peak traffic load on each link. Spatially, we propose to lexicographically minimize the congestion of all links among datacenters. To solve the MBDTS problem, we first model it as an optimization problem, and then propose the novel Elastic Time-Expanded Network technique to represent the time-varying network status as a static one with a reasonable expansion cost. Using this transformation, we reformulate the problem as a Linear Programming (LP) model, and obtain the optimal solution by iteratively solving the LP model. We have conducted extensive simulations on a real network topology. The results show that our algorithm can significantly reduce network congestion as well as balance the entire network traffic with practical computational costs.
    Computer Networks 08/2014; · 1.23 Impact Factor
  • Muhammad Shahzad, Alex X. Liu
    ABSTRACT: With the growth in the number and significance of emerging applications that require extremely low latencies, network operators face an increasing need to perform latency measurement on a per-flow basis for network monitoring and troubleshooting. In this paper, we propose COLATE, the first per-flow latency measurement scheme that requires neither probe packets nor time stamping. Given a set of observation points, COLATE records packet timing information at each point so that later, for any two points, it can accurately estimate the average and standard deviation of the latencies experienced by the packets of any flow in passing the two points. The key idea is that when recording packet timing information, COLATE purposely allows noise to be introduced to minimize storage space, and when querying the latency of a target flow, COLATE uses statistical techniques to denoise and obtain an accurate latency estimate. COLATE is designed to be efficiently implementable on network middleboxes. In terms of processing overhead, COLATE performs only one hash and one memory update per packet. In terms of storage space, COLATE uses less than 0.1 bit per packet, which means that, on a backbone link with about half a million packets per second, using a 256GB drive, COLATE can accumulate time stamps of packets traversing the link for over 1.5 years. We evaluated COLATE using three real traffic traces that include a backbone traffic trace, an enterprise network traffic trace, and a data center traffic trace. Results show that COLATE always achieves the required reliability for any given confidence interval.
    ACM SIGMETRICS Performance Evaluation Review 06/2014; 42(1).
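The record-noisily-then-denoise idea above can be sketched in miniature. Everything here is an illustrative assumption rather than COLATE's actual design: each observation point folds every packet's timestamp into one of m shared accumulators (so per-packet storage is tiny), and a flow's mean passage time is recovered by subtracting the per-slot noise estimated from slots the flow never touched.

```python
import random

class Recorder:
    """One observation point: m accumulators are the only state kept."""
    def __init__(self, m=8192, salt=0):
        self.m, self.salt = m, salt
        self.acc = [0.0] * m          # per-slot sum of timestamps

    def record(self, flow_id, pkt_id, t):
        self.acc[hash((self.salt, flow_id, pkt_id)) % self.m] += t

    def flow_mean_time(self, flow_id, pkt_ids):
        slots = {hash((self.salt, flow_id, p)) % self.m for p in pkt_ids}
        in_sum = sum(self.acc[s] for s in slots)
        # estimate per-slot noise from the slots this flow never touched
        out = [self.acc[s] for s in range(self.m) if s not in slots]
        noise = sum(out) / len(out)
        return (in_sum - noise * len(slots)) / len(pkt_ids)

random.seed(42)
a, b = Recorder(salt=1), Recorder(salt=2)
pkts = list(range(200))
for p in pkts:                        # flow 7: true latency 5.0 between a and b
    t0 = random.uniform(0, 10)
    a.record(7, p, t0)
    b.record(7, p, t0 + 5.0)
for _ in range(500):                  # background traffic from other flows
    f, q = random.randrange(1000, 2000), random.randrange(10**6)
    a.record(f, q, random.uniform(0, 10))
    b.record(f, q, random.uniform(0, 10))
est = b.flow_mean_time(7, pkts) - a.flow_mean_time(7, pkts)
```

With enough slots relative to the background traffic, the noise subtraction makes the estimate land close to the true 5.0-unit latency despite the deliberately lossy storage.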
  • ABSTRACT: Content Delivery Networks (CDNs) differ from other caching systems in terms of both workload characteristics and performance metrics. However, there has been little prior work on large-scale measurement and characterization of content requests and caching performance in CDNs. For workload characteristics, CDNs deal with extremely large content volume, high content diversity, and strong temporal dynamics. For performance metrics, other than hit ratio, CDNs also need to minimize the disk operations and the volume of traffic from origin servers. In this paper, we conduct a large-scale measurement study to characterize the content request patterns using real-world data from a commercial CDN provider.
    ACM SIGMETRICS Performance Evaluation Review 06/2014; 42(1).
  • ABSTRACT: Mobile network operators have a significant interest in the performance of streaming video on their networks because network dynamics directly influence the Quality of Experience (QoE). However, unlike video service providers, network operators are not privy to the client- or server-side logs typically used to measure key video performance metrics, such as user engagement. To address this limitation, this paper presents the first large-scale study characterizing the impact of cellular network performance on mobile video user engagement from the perspective of a network operator. Our study on a month-long anonymized data set from a major cellular network makes two main contributions. First, we quantify the effect that 31 different network factors have on user behavior in mobile video. Our results provide network operators direct guidance on how to improve user engagement; for example, improving mean signal-to-interference ratio by 1 dB reduces the likelihood of video abandonment by 2%. Second, we model the complex relationships between these factors and video abandonment, enabling operators to monitor mobile video user engagement in real-time. Our model can predict whether a user completely downloads a video with more than 87% accuracy by observing only the initial 10 seconds of video streaming sessions. Moreover, our model achieves significantly better accuracy than prior models that require client- or server-side logs, yet we only use standard radio network statistics and/or TCP/IP headers available to network operators.
    ACM SIGMETRICS Performance Evaluation Review 06/2014; 42(1).
  • ABSTRACT: Multipath TCP (MPTCP) allows the concurrent use of multiple paths between two end points, and as such holds great promise for improving application performance. However, in this paper, we report a newly discovered class of attacks on MPTCP that may jeopardize and hamper its wide-scale adoption. The attacks stem from the interdependence between the multiple subflows in an MPTCP connection. MPTCP congestion control algorithms are designed to achieve resource pooling and fairness with single-path TCP users at shared bottlenecks. Therefore, multiple MPTCP subflows are inherently coupled with each other, resulting in potential side-channels that can be exploited to infer cross-path properties. In particular, an ISP monitoring one or more paths used by an MPTCP connection can infer sensitive and proprietary information (e.g., level of network congestion, end-to-end TCP throughput, packet loss, network delay) about its competitors. Since the side-channel information enabled by the coupling among the subflows in an MPTCP connection results directly from the design goals of MPTCP congestion control algorithms, it is not obvious how to circumvent this attack easily. We believe our findings provide insights that can be used to guide future security-related research on MPTCP and other similar multipath extensions.
    Proceedings of the Twelfth ACM Workshop on Hot Topics in Networks; 11/2013
  • ABSTRACT: Recently, the Graphics Processing Unit (GPU) has proven to be an exciting new platform for software routers, providing high throughput and flexibility. However, it remains challenging to deploy core routing functions such as IP address lookup into GPU-based software routers with the anticipated performance and scalability. Existing solutions have good performance, but their scalability to IPv6 and to frequent updates is not so encouraging. In this paper, we investigate the GPU's characteristics in parallelism and memory accessing, and then encode a multi-bit trie into a state-jump table. On this basis, a fast and scalable IP lookup engine called GPU-Accelerated Multi-bit Trie (GAMT) is presented. According to our experiments on real-world routing data, based on the multi-stream pipeline, GAMT enables lookup speeds as high as 1072 and 658 Million Lookups Per Second (MLPS) for IPv4 and IPv6 respectively, when processing a 16M traffic trace under highly frequent updates (70,000 updates/s). Even using a small batch size, GAMT can still achieve 339 and 240 MLPS respectively, while keeping the average lookup latency below 100 μs. These results clearly show that GAMT makes significant progress in both scalability and performance.
    2013 the ninth ACM/IEEE symposium on Architectures for networking and communications systems; 10/2013
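The "multi-bit trie encoded as a state-jump table" idea can be sketched as follows. This is a hypothetical miniature, not GAMT's actual encoding: fixed 8-bit strides and prefix lengths that are multiples of 8 are simplifying assumptions, and the flat tables here stand in for the GPU-resident arrays.

```python
STRIDE = 8

def build(prefixes):
    """prefixes: list of (prefix_as_32bit_int, plen, next_hop), plen % 8 == 0.
    Returns (jump, nhop): jump[state][byte] -> child state (0 means no child),
    nhop[state][byte] -> next hop recorded at that trie node, or None."""
    jump = [[0] * 256]        # state 0 is the root; children are appended
    nhop = [[None] * 256]
    for prefix, plen, nh in prefixes:
        state = 0
        levels = plen // STRIDE
        for level in range(levels):
            byte = (prefix >> (24 - 8 * level)) & 0xFF
            if level == levels - 1:
                nhop[state][byte] = nh
            else:
                if jump[state][byte] == 0:
                    jump[state][byte] = len(jump)
                    jump.append([0] * 256)
                    nhop.append([None] * 256)
                state = jump[state][byte]
    return jump, nhop

def lookup(jump, nhop, addr):
    state, best = 0, None
    for level in range(4):
        byte = (addr >> (24 - 8 * level)) & 0xFF
        if nhop[state][byte] is not None:
            best = nhop[state][byte]   # remember the longest match so far
        state = jump[state][byte]
        if state == 0:                 # no deeper child: stop walking
            break
    return best
```

Because each step is just a table index on the next address byte, the walk maps naturally onto the regular, branch-light memory accesses that GPUs execute efficiently.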
  • ABSTRACT: With the rich functionalities and enhanced computing capabilities available on mobile computing devices with touch screens, users not only store sensitive information (such as credit card numbers) but also use privacy sensitive applications (such as online banking) on these devices, making them attractive targets for hackers and thieves. To protect private information, such devices typically lock themselves after a few minutes of inactivity and prompt a password/PIN/pattern screen when reactivated. Password/PIN/pattern based schemes are inherently vulnerable to shoulder surfing attacks and smudge attacks. Furthermore, passwords/PINs/patterns are inconvenient for users to enter frequently. In this paper, we propose GEAT, a gesture based user authentication scheme for the secure unlocking of touch screen devices. Unlike existing authentication schemes for touch screen devices, which use what the user inputs as the authentication secret, GEAT authenticates users mainly based on how they input, using distinguishing features such as finger velocity, device acceleration, and stroke time. Even if attackers see what gesture a user performs, they cannot reproduce the user's behavior in performing gestures through shoulder surfing or smudge attacks. We implemented GEAT on the Samsung Focus running Windows, collected 15009 gesture samples from 50 volunteers, and conducted real-world experiments to evaluate GEAT's performance. Experimental results show that our scheme achieves an average equal error rate of 0.5% with 3 gestures using only 25 training samples.
    Proceedings of the 19th annual international conference on Mobile computing & networking; 09/2013
  • ABSTRACT: Ternary content addressable memories (TCAMs) perform lookups in a single operation and are the core component of many networking devices such as routers, switches, firewalls, and intrusion detection/prevention systems. Unfortunately, they are susceptible to errors caused by environmental factors such as radiation. TCAM errors may have a significant impact on search results. In fact, a single error in a TCAM can cause 100% of search keys to have wrong lookup results. Therefore, TCAM error detection and correction schemes are needed to enhance the reliability of TCAM-based systems. All prior solutions require hardware changes to TCAM circuitry and are therefore difficult to deploy. In this paper, we propose TCAMChecker, the first software-based solution for TCAM error detection and correction. Given a search key, TCAMChecker probabilistically decides to verify the lookup result. If TCAMChecker decides to verify the lookup result, it performs two parallel lookups for the given search key. If the lookup results do not match, then at least one error is detected and is corrected using a backup error-free memory. Note that the probability of lookup verification can be tuned to trade off performance against reliability: a higher probability of verification provides a more reliable TCAM system at the cost of performance. TCAMChecker can be easily deployed on existing TCAM-based networking devices to improve system reliability.
    Journal of Network and Systems Management 09/2013; 21(3). · 0.43 Impact Factor
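The probabilistic verify-and-repair loop described above can be modeled in a few lines of software. This is an illustrative sketch with assumed names, not the paper's design: a real TCAM matches all entries in parallel in hardware, and the repair here naively restores the whole table from the backup copy.

```python
import random

class CheckedTcam:
    """Toy software model of probabilistic TCAM lookup verification."""
    def __init__(self, entries, p_verify=0.1, seed=0):
        # entries: ordered (value, mask, action) triples; first match wins
        self.primary = list(entries)     # may silently suffer bit errors
        self.backup = list(entries)      # assumed error-free reference copy
        self.p = p_verify
        self.rng = random.Random(seed)
        self.corrections = 0

    def _match(self, table, key):
        for value, mask, action in table:
            if key & mask == value & mask:
                return action
        return None

    def lookup(self, key):
        action = self._match(self.primary, key)
        if self.rng.random() < self.p:              # verify this lookup?
            good = self._match(self.backup, key)
            if action != good:                      # error detected
                self.primary = list(self.backup)    # repair from backup
                self.corrections += 1
                return good
        return action
```

Raising `p_verify` catches errors sooner at the cost of extra lookups, which mirrors the performance/reliability knob the abstract describes.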
  • Faraz Ahmed, Rong Jin, Alex X. Liu
    ABSTRACT: Online social networks are increasingly used for analyzing various societal phenomena such as epidemiology, information dissemination, marketing, and sentiment flow. Popular analysis techniques such as clustering and influential node analysis require the computation of eigenvectors of the real graph's adjacency matrix. Recent de-anonymization attacks on Netflix and AOL datasets show that open access to such graphs poses privacy threats. Among the various privacy preserving models, differential privacy provides the strongest privacy guarantees. In this paper we propose a privacy preserving mechanism for publishing social network graph data, which satisfies differential privacy guarantees by combining random matrix theory with differential privacy. The key idea is to project each row of an adjacency matrix to a low dimensional space using the random projection approach and then perturb the projected matrix with random noise. We show that, compared to existing approaches for differentially private approximation of eigenvectors, our approach is computationally efficient, preserves utility, and satisfies differential privacy. We evaluate our approach on social network graphs of Facebook, LiveJournal, and Pokec. The results show that even for a high noise variance (sigma = 1) the clustering quality, measured by normalized mutual information gain, is as low as 0.74. For influential node discovery, the proposed approach is able to correctly recover 80% of the most influential nodes. We also compare our results with an approach presented in [43], which directly perturbs the eigenvector of the original data with Laplace noise. The results show that this approach requires a large random perturbation in order to preserve differential privacy, which leads to a poor estimation of eigenvectors for large social networks.
    07/2013;
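The project-then-perturb mechanism described above can be sketched as a data-flow toy. The parameter choices that actually yield a formal differential-privacy guarantee are derived in the paper; this sketch only shows the two steps, with an assumed Gaussian projection matrix and Gaussian noise.

```python
import math
import random

def private_projection(adj, k, sigma, seed=0):
    """Project each row of the n x n adjacency matrix into k dimensions with
    a random Gaussian matrix, then add Gaussian noise of scale sigma.
    Illustrative sketch only; not a calibrated privacy mechanism."""
    rng = random.Random(seed)
    n = len(adj)
    # random projection matrix P (n x k), entries N(0, 1/k)
    proj = [[rng.gauss(0.0, 1.0 / math.sqrt(k)) for _ in range(k)]
            for _ in range(n)]
    out = []
    for row in adj:
        y = [sum(row[j] * proj[j][d] for j in range(n)) for d in range(k)]
        out.append([v + rng.gauss(0.0, sigma) for v in y])
    return out
```

Dimensionality reduction before perturbation is the point: the noise needed for privacy is added in the small k-dimensional space, so far less total noise is injected than when perturbing the full n x n matrix directly.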
  • Muhammad Shahzad, Alex X. Liu
    ABSTRACT: Radio Frequency Identification (RFID) systems are widely used in various applications such as supply chain management, inventory control, and object tracking. Identifying RFID tags in a given tag population is the most fundamental operation in RFID systems. While the Tree Walking (TW) protocol has become the industrial standard for identifying RFID tags, little is known about the mathematical nature of this protocol and only some ad-hoc heuristics exist for optimizing it. In this paper, first, we analytically model the TW protocol, and then using that model, propose the Tree Hopping (TH) protocol that optimizes TW both theoretically and practically. The key novelty of TH is to formulate tag identification as an optimization problem and find the optimal solution that ensures the minimal average number of queries. With this solid theoretical underpinning, for different tag population sizes ranging from 100 to 100K tags, TH significantly outperforms the best prior tag identification protocols on the metrics of the total number of queries per tag, the total identification time per tag, and the average number of responses per tag by an average of 50%, 10%, and 30%, respectively, when tag IDs are uniformly distributed in the ID space, and of 26%, 37%, and 26%, respectively, when tag IDs are non-uniformly distributed.
    Proceedings of the ACM SIGMETRICS/international conference on Measurement and modeling of computer systems; 06/2013
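As background for the TW/TH discussion above, standard binary Tree Walking can be sketched in a few lines. This toy simulates the reader/tag interaction in software; it does not capture TH's optimized hopping over the tree, which is the paper's contribution.

```python
def tree_walk(tags, prefix=""):
    """tags: a set of equal-length binary-string IDs. Simulates a reader that
    queries ID prefixes and recurses on collisions; returns IDs in the order
    they would be identified."""
    responders = [t for t in tags if t.startswith(prefix)]
    if not responders:
        return []                      # idle query: nobody answers
    if len(responders) == 1:
        return responders              # exactly one reply: tag identified
    # collision: split the responding population on the next ID bit
    return tree_walk(tags, prefix + "0") + tree_walk(tags, prefix + "1")
```

Counting the recursive calls in this sketch gives exactly the "number of queries" metric that TH minimizes by choosing where in the tree to start and hop.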
  • ABSTRACT: During crowded events, cellular networks face voice and data traffic volumes that are often orders of magnitude higher than those on routine days. Despite the use of portable base stations for temporarily increasing communication capacity and free Wi-Fi access points for offloading Internet traffic from cellular base stations, crowded events still present significant challenges for cellular network operators looking to reduce dropped call events and improve Internet speeds. For effective cellular network design, management, and optimization, it is crucial to understand how cellular network performance degrades during crowded events, what causes this degradation, and how practical mitigation schemes would perform in real-life crowded events. This paper takes a first step towards this end by characterizing the operational performance of a tier-1 cellular network in the United States during two high-profile crowded events in 2012. We illustrate how changes in population distribution, user behavior, and application workload during crowded events result in significant voice and data performance degradation, including a more than two orders of magnitude increase in connection failures. Our findings suggest two mechanisms that can improve performance without resorting to costly infrastructure changes: radio resource allocation tuning and opportunistic connection sharing. Using trace-driven simulations, we show that more aggressive release of radio resources, via RRC timeouts 1-2 seconds shorter than on routine days, helps achieve a better tradeoff between wasted radio resources, energy consumption, and delay during crowded events; and that opportunistic connection sharing can reduce connection failures by 95% when employed by a small number of devices in each cell sector.
    Proceedings of the ACM SIGMETRICS/international conference on Measurement and modeling of computer systems; 06/2013
  • M. Zubair Shafiq, Alex X. Liu
    ABSTRACT: Cascades represent an important phenomenon across various disciplines such as sociology, economy, psychology, political science, marketing, and epidemiology. An important property of cascades is their morphology, which encompasses the structure, shape, and size. However, cascade morphology has not been rigorously characterized and modeled in prior literature. In this paper, we propose a Multi-order Markov Model for the Morphology of Cascades ($M^4C$) that can represent and quantitatively characterize the morphology of cascades with arbitrary structures, shapes, and sizes. $M^4C$ can be used in a variety of applications to classify different types of cascades. To demonstrate this, we apply it to an unexplored but important problem in online social networks -- cascade size prediction. Our evaluations using real-world Twitter data show that $M^4C$ based cascade size prediction scheme outperforms the baseline scheme based on cascade graph features such as edge growth rate, degree distribution, clustering, and diameter. $M^4C$ based cascade size prediction scheme consistently achieves more than 90% classification accuracy under different experimental scenarios.
    02/2013;
  • ABSTRACT: Regular expression (RegEx) matching is a core function of deep packet inspection in modern network devices. Previous TCAM-based RegEx matching algorithms assume a priori that a deterministic finite automaton (DFA) can be built for a given set of RegEx patterns. However, practical RegEx patterns contain complex terms like wildcard closures and repeated characters, and it may be impossible to build a DFA with a reasonable number of states. This renders prior work infeasible in practice. Moreover, TCAM-based RegEx matching must scale to large sets of RegEx patterns. In this paper, we propose a Compressed Finite Automaton (CFA) implementation for scalable TCAM-based RegEx matching. CFA is designed to reduce TCAM space by using three compression techniques: transition, character, and state compression. Experiments on realistic RegEx pattern sets show that CFA significantly outperforms previous solutions in terms of TCAM space, matching throughput, and TCAM power consumption.
    Architectures for Networking and Communications Systems (ANCS), 2013 ACM/IEEE Symposium on; 01/2013
  • Eric Norige, Alex X. Liu, Eric Torng
    ABSTRACT: Packet classification is the key mechanism for enabling many networking and security services. Ternary Content Addressable Memory (TCAM) has been the industrial standard for implementing high-speed packet classification because of its constant classification time. However, TCAM chips have small capacity, high power consumption, high heat generation, and large area size. This paper focuses on the TCAM-based Classifier Compression problem: given a classifier C, we want to construct the smallest possible list of TCAM entries T that implement C. In this paper, we propose the Ternary Unification Framework (TUF) for this compression problem and three concrete compression algorithms within this framework. The framework allows us to find more optimization opportunities and design new TCAM-based classifier compression algorithms. Our experimental results show that the TUF can speed up the prior algorithm TCAM Razor by twenty times or more and leads to new algorithms that improve compression performance over prior algorithms by an average of 13.7% on our largest real life classifiers.
    Architectures for Networking and Communications Systems (ANCS), 2013 ACM/IEEE Symposium on; 01/2013
  • ABSTRACT: Regular expression (RegEx) matching has been widely used in various networking and security applications. Despite much effort, it remains a fundamentally difficult problem. DFA-based solutions can achieve high throughput, but require too much memory to be executed in high speed SRAM. NFA-based solutions require little memory, but are too slow. In this paper, we propose RegexFilter, a prefiltering approach. The basic idea is to generate a RegEx print of the RegEx set and use it to prefilter out most unmatched items. There are two key technical challenges: the generation of the RegEx print and the matching process of the RegEx print. The generation is tricky because we need to trade off two conflicting goals: filtering effectiveness, which means that we want the RegEx print to filter out as many unmatched items as possible, and matching speed, which means that we want the matching speed of the RegEx print to be as high as possible. To address the first challenge, we propose measurement tools for RegEx complexity and filtering effectiveness, and use them to guide the generation of the RegEx print. To address the second challenge, we propose a fast RegEx print matching solution using Ternary Content Addressable Memory. We implemented our approach and conducted experiments on real world data sets. Our experimental results show that RegexFilter can speed up the potential throughput of RegEx matching by 21.5 times and 20.3 times for the RegEx sets of the Snort and L7-Filter systems, at the cost of less than 0.2 Mb of TCAM.
    Proceedings of the 10th international conference on Applied Cryptography and Network Security; 06/2012
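The prefiltering idea above can be illustrated with a minimal sketch. The pattern set and the "print" here are assumptions for illustration: the print is just an alternation of short substrings each full pattern requires, whereas the paper constructs the RegEx print systematically and matches it in TCAM rather than software.

```python
import re

# Assumed example pattern set (not from the paper)
patterns = [r"GET /admin/.*\.php", r"User-Agent: evil-bot/\d+"]
compiled = [re.compile(p) for p in patterns]

# Cheap "print": required substrings of the full patterns, OR-ed together
reg_print = re.compile(r"/admin/|evil-bot")

def match(item):
    if not reg_print.search(item):   # fast path: most items rejected here
        return False
    # slow, exact path: only items that pass the print reach the full set
    return any(p.search(item) for p in compiled)
```

The speedup comes from the first branch: in typical traffic the vast majority of items fail the cheap print, so the expensive full-set matching runs only on a small residue.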
  • ABSTRACT: Cookies are the primary means for web applications to authenticate HTTP requests and to maintain client states. Many web applications (such as those for electronic commerce) demand a secure cookie scheme. Such a scheme needs to provide four services: authentication, confidentiality, integrity, and anti-replay. Several secure cookie schemes have been proposed in previous literature; however, none of them is completely satisfactory. In this paper, we propose a secure cookie scheme that is effective, efficient, and easy to deploy. In terms of effectiveness, our scheme provides all four security services. In terms of efficiency, our scheme does not involve any database lookup or public key cryptography. In terms of deployability, our scheme can be easily deployed on existing web services, and it does not require any change to the Internet cookie specification. We implemented our secure cookie scheme using PHP and conducted experiments. The experimental results show that our scheme is very efficient on both the client side and the server side. A notable adoption of our scheme in industry is that it has been used by WordPress since version 2.4. WordPress is a widely used open source content management system.
    Computer Networks 04/2012; 56(6):1723–1730.
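A minimal HMAC-based cookie in the spirit of the scheme described above can be sketched as follows. This covers only the authentication and integrity services; the paper's construction, including its confidentiality and anti-replay mechanisms, is more complete, and the key name and cookie layout here are assumptions.

```python
import hashlib
import hmac
import time

SERVER_KEY = b"assumed-server-secret"   # placeholder key for illustration

def make_cookie(username, ttl=3600, now=None):
    """Cookie = username | expiration | HMAC(key, username | expiration)."""
    expires = int(now if now is not None else time.time()) + ttl
    payload = f"{username}|{expires}"
    tag = hmac.new(SERVER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{tag}"

def verify_cookie(cookie, now=None):
    """Returns the username if the cookie is authentic and unexpired, else None."""
    try:
        username, expires, tag = cookie.rsplit("|", 2)
    except ValueError:
        return None
    payload = f"{username}|{expires}"
    good = hmac.new(SERVER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, good):   # constant-time tag comparison
        return None
    if int(expires) < (now if now is not None else time.time()):
        return None
    return username
```

Note that verification needs no database lookup and no public key cryptography: one HMAC computation per request, matching the efficiency goals stated in the abstract.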
  • ABSTRACT: Cellular network based Machine-to-Machine (M2M) communication is fast becoming a market-changing force for a wide spectrum of businesses and applications such as telematics, smart metering, point-of-sale terminals, and home security and automation systems. In this paper, we aim to answer the following important question: Does traffic generated by M2M devices impose new requirements and challenges for cellular network design and management? To answer this question, we take a first look at the characteristics of M2M traffic and compare it with traditional smartphone traffic. We have conducted our measurement analysis using a week-long traffic trace collected from a tier-1 cellular network in the United States. We characterize M2M traffic from a wide range of perspectives, including temporal dynamics, device mobility, application usage, and network performance. Our experimental results show that M2M traffic exhibits significantly different patterns than smartphone traffic in multiple aspects. For instance, M2M devices have a much larger ratio of uplink to downlink traffic volume, their traffic typically exhibits different diurnal patterns, they are more likely to generate synchronized traffic resulting in bursty aggregate traffic volumes, and are less mobile compared to smartphones. On the other hand, we also find that M2M devices are generally competing with smartphones for network resources in co-located geographical regions. These and other findings suggest that better protocol design, more careful spectrum allocation, and modified pricing schemes may be needed to accommodate the rise of M2M devices.
    01/2012;
  • ABSTRACT: The Enterprise Privacy Authorization Language (EPAL) is a formal language for specifying fine-grained enterprise privacy policies. With the adoption of EPAL, especially in web applications, the performance of EPAL policy evaluation engines becomes a critical issue. In this paper, we propose Eengine, an engine for efficient EPAL policy evaluation. Eengine first converts all string values in an EPAL policy to numerical values. Second, it converts a numericalized EPAL policy specified as a list of rules following the first-match semantics to a tree structure for efficient processing of numericalized requests.
    The Journal of Supercomputing 01/2012; 59:1577-1595. · 0.92 Impact Factor
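The first step described above, numericalizing string values and evaluating first-match rules, can be sketched as follows. The rule shape, wildcard handling, and default decision are assumptions for illustration, and the tree structure Eengine builds for fast evaluation is omitted; this sketch only shows the numericalized first-match semantics it accelerates.

```python
from itertools import count

_ids = {}
_next = count()

def numericalize(value):
    """Intern a string value as a small integer, assigning ids on first use."""
    if value not in _ids:
        _ids[value] = next(_next)
    return _ids[value]

WILDCARD = -1   # assumed encoding for a field that matches anything

def compile_policy(rules):
    """rules: list of ((user, action, resource), decision); first match wins."""
    return [([WILDCARD if f == "*" else numericalize(f) for f in fields], dec)
            for fields, dec in rules]

def evaluate(policy, request):
    req = [numericalize(f) for f in request]
    for fields, decision in policy:
        if all(f == WILDCARD or f == r for f, r in zip(fields, req)):
            return decision
    return "deny"   # assumed default when no rule matches
```

Comparing integers instead of strings is what makes the subsequent tree construction pay off: every node test becomes a cheap numeric comparison rather than a string match.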
  • Alex X. Liu
    ABSTRACT: Firewalls are the cornerstones of the security infrastructure for most enterprises. They have been widely deployed for protecting private networks. The quality of the protection provided by a firewall directly depends on the quality of its policy (i.e., configuration). Due to the lack of tools for analyzing firewall policies, many firewalls used today have policy errors. A firewall policy error either creates security holes that will allow malicious traffic to sneak into a private network or blocks legitimate traffic and disrupts normal business processes, which in turn could lead to irreparable, if not tragic, consequences. A major cause of policy errors is policy changes. Firewall policies often need to be changed as networks evolve and new threats emerge. Users behind a firewall often request the firewall administrator to modify rules to allow or protect the operation of some services. In this article, we first present the theory and algorithms for firewall policy change-impact analysis. Our algorithms take as input a firewall policy and a proposed change, then output the accurate impact of the change. Thus, a firewall administrator can verify a proposed change before committing it. We implemented our firewall change-impact analysis algorithms, and tested them on both real-life and synthetic firewall policies. The experimental results show that our algorithms are effective in terms of ensuring firewall policy correctness and efficient in terms of computing the impact of policy changes. Thus, our tool can be practically used in the iterative process of firewall policy design and maintenance. Although the focus of this article is on firewalls, the change-impact analysis algorithms proposed in this article are not limited to firewalls. Rather, they can be applied to other rule-based systems, such as router access control lists (ACLs), as well.
    ACM Transactions on Internet Technology (TOIT) 01/2012;

Publication Stats

390 Citations
20.28 Total Impact Points

Institutions

  • 2007–2014
    • Michigan State University
      • Department of Computer Science and Engineering
      East Lansing, Michigan, United States
  • 2011
    • Nanjing University
      • Department of Computer Science & Technology
      Nanjing, Jiangsu, China
  • 2009
    • North Carolina State University
      • Department of Computer Science
      Raleigh, NC, United States
  • 2004–2006
    • University of Texas at Austin
      • Department of Computer Science
      Austin, TX, United States