Alex X. Liu

Nanjing University, Nanjing, Jiangsu, China

Publications (95) · 53.43 Total impact

  • Faraz Ahmed · Jeffrey Erman · Zihui Ge · Alex X. Liu · Jia Wang · He Yan
    ACM SIGMETRICS Performance Evaluation Review 06/2015; 43(1):459-460. DOI:10.1145/2796314.2745892
  • ABSTRACT: This paper presents a novel approach that transforms the feature space into a new feature space such that a range query in the original space is mapped into an equivalent box query in the transformed space. Since box queries are axis-aligned, several implementation advantages can be exploited to speed up the retrieval of query results using R-tree-like indexing schemes [9]. For two-dimensional data, the transformation is precise. For more than two dimensions, we propose a space transformation scheme based on disjoint planar rotation and a new type of query, the pruning box query, to obtain precise results. Experimental results with large synthetic databases and some real databases show the effectiveness of the proposed transformation scheme, and these results have been corroborated with suitable mathematical models. In disjoint planar rotation, additional computation time is required to remove the false positives produced because the bounding box is not precise. A second topological transformation scheme, based on an optimized bounding box, reduces the number of false positives, and this reduction grows with the number of dimensions. The optimized bounding box for higher dimensions is computed using a novel approach of simultaneous local optimal projections.
    IEEE Transactions on Knowledge and Data Engineering 05/2015; 27(5):1438-1451. DOI:10.1109/TKDE.2014.2363658 · 2.07 Impact Factor
  • ABSTRACT: Unknown RFID tags appear when unread tagged objects are moved in or when tagged objects are misplaced. This paper studies the practically important problem of unknown tag detection while taking into consideration both the time efficiency and the energy efficiency of battery-powered active tags. We first propose a Sampling Bloom Filter, which generalizes the standard Bloom Filter. Using the new filtering technique, we propose the Sampling Bloom Filter-based Unknown tag Detection Protocol (SBF-UDP), whose detection accuracy is tunable by end users. We present a theoretical analysis to minimize the time and energy costs. SBF-UDP can be tuned to either a time-saving mode or an energy-saving mode, according to the specific requirements. Extensive simulations are conducted to evaluate the performance of the proposed protocol. The experimental results show that SBF-UDP considerably outperforms previous related protocols in terms of both time efficiency and energy efficiency. For example, when 3 or more unknown tags appear in an RFID system with 30 000 known tags, SBF-UDP successfully reports the existence of unknown tags with confidence greater than 99%, while running 9 times faster than the fastest existing scheme and reducing energy consumption by more than 80%.
    IEEE Transactions on Communications 04/2015; 63(4):1432-1442. DOI:10.1109/TCOMM.2015.2402660 · 1.99 Impact Factor
  • ABSTRACT: These are the PPT slides. The source code is available at
  • ABSTRACT: The identification of encrypted Instant Messaging (IM) channels between users is made difficult by the presence of variable and high levels of uncorrelated background traffic. In this paper, we propose a novel Cross-correlation Outlier Detector (CCOD) to identify communicating end-users in a large group of users. Our technique uses traffic flow traces between individual users and the IM service provider's data center. We evaluate CCOD on a data set of Yahoo! IM traffic traces with an average SNR of −6.11 dB (the data set includes ground truth). Results show that our technique achieves an 88% true positive (TP) rate, a 3% false positive (FP) rate, and a 96% ROC area. The performance of previous correlation-based schemes on the same data set was limited to a 63% TP rate, a 4% FP rate, and an 85% ROC area.
    49th Annual Conference on Information Sciences and Systems (CISS), Johns Hopkins University, Baltimore, MD, USA; 03/2015
  • Muhammad Shahzad · Alex X. Liu
    ABSTRACT: Radio frequency identification (RFID) systems have been widely deployed for various applications such as object tracking, 3-D positioning, supply chain management, inventory control, and access control. This paper concerns the fundamental problem of estimating RFID tag population size, which is needed in many applications such as tag identification, warehouse monitoring, and privacy-sensitive RFID systems. We propose a new scheme for estimating tag population size called Average Run-based Tag estimation (ART). The technique is based on the average run length of ones in the bit string received using the standardized framed slotted Aloha protocol. ART is significantly faster than prior schemes. For example, given a required confidence interval of 0.1% and a required reliability of 99.9%, ART is consistently 7 times faster than the fastest existing schemes (UPE and EZB) for any tag population size. Furthermore, ART's estimation time is provably independent of the tag population size. ART works with multiple readers with overlapping regions and can estimate the sizes of arbitrarily large tag populations. ART is easy to deploy because it requires modifications neither to tags nor to the communication protocol between tags and readers; it only needs to be implemented on readers as a software module.
    IEEE/ACM Transactions on Networking 02/2015; 23(1):241-254. DOI:10.1109/TNET.2014.2298039 · 1.81 Impact Factor
  • Wei Wang · Alex X. Liu · Muhammad Shahzad · Kang Ling · Sanglu Lu
    ABSTRACT: Several pioneering WiFi-signal-based human activity recognition systems have been proposed. Their key limitation lies in the lack of a model that can quantitatively correlate CSI dynamics and human activities. In this paper, we propose CARM, a CSI-based human Activity Recognition and Monitoring system. CARM has two theoretical underpinnings: a CSI-speed model, which quantifies the correlation between CSI value dynamics and human movement speeds, and a CSI-activity model, which quantifies the correlation between the movement speeds of different human body parts and a specific human activity. Using these two models, we quantitatively build the correlation between CSI value dynamics and a specific human activity. CARM uses this correlation as the profiling mechanism and recognizes a given activity by matching it to the best-fit profile. We implemented CARM using commercial WiFi devices and evaluated it in several different environments. Our results show that CARM achieves an average accuracy greater than 96%.
    ACM MobiCom; 01/2015
  • Kamran Ali · Alex X. Liu · Wei Wang · Muhammad Shahzad
    ABSTRACT: Keystroke privacy is critical for ensuring the security of computer systems and the privacy of human users, because what is being typed could be passwords or other privacy-sensitive information. In this paper, we show for the first time that WiFi signals can also be exploited to recognize keystrokes. The intuition is that while typing a certain key, the hands and fingers of a user move in a unique formation and direction and thus generate a unique pattern in the time series of Channel State Information (CSI) values, which we call the CSI-waveform for that key. We propose a WiFi-signal-based keystroke recognition system called WiKey. WiKey consists of two Commercial Off-The-Shelf (COTS) WiFi devices, a sender (such as a router) and a receiver (such as a laptop). The sender continuously emits signals and the receiver continuously receives them. When a human subject types on a keyboard, WiKey recognizes the typed keys based on how the CSI values change at the WiFi signal receiver end. We implemented the WiKey system using a TP-Link TL-WR1043ND WiFi router and a Lenovo X200 laptop. WiKey achieves a detection rate of more than 97.5% for detecting keystrokes and 96.4% recognition accuracy for classifying single keys. In real-world experiments, WiKey can recognize keystrokes in a continuously typed sentence with an accuracy of 93.5%.
    ACM MobiCom; 01/2015
  • ABSTRACT: Breaches of privacy in encrypted instant messenger (IM) services are a serious threat to user anonymity. The performance of previous de-anonymization strategies was limited to 65%. We perform network de-anonymization by taking advantage of the cause-effect relationship between sent and received packet streams and demonstrate this approach on a data set of Yahoo! IM service traffic traces. An investigation of various measures of causality shows that IM networks can be breached with a hit rate of 99%. A KCI-causality-based approach alone can provide a true positive rate of about 97%. The individual performance of Granger, Zhang, and IGCI causality is limited by the very low SNR of packet traces and variable network delays.
    IEEE Global Communications Conference (GLOBECOM), Austin, TX, USA; 12/2014
  • Jignesh Patel · Alex X. Liu · Eric Torng
    ABSTRACT: Network intrusion detection and prevention systems commonly use regular expression (RE) signatures to represent individual security threats. While the deterministic finite state automaton (DFA) for any one RE is typically small, the DFA that corresponds to the entire set of REs is usually too large to be constructed or deployed. To address this issue, a variety of alternative automata implementations that compress the size of the final automaton have been proposed, such as extended finite automata (XFA) and delayed input DFA (D²FA). The resulting final automata are typically much smaller than the corresponding DFA. However, previously proposed automata construction algorithms suffer from two drawbacks. First, most employ a “Union then Minimize” framework where the automata for the individual REs are first joined before minimization occurs. This leads to an expensive nondeterministic finite automaton (NFA) to DFA subset construction on a relatively large NFA. Second, most construct the corresponding large DFA as an intermediate step. In some cases, this DFA is so large that the final automaton cannot be constructed even though the final automaton is small enough to be deployed. In this paper, we propose a “Minimize then Union” framework for constructing compact alternative automata, focusing on the D²FA. We show that we can construct an almost optimal final D²FA with small intermediate parsers. The key to our approach is a space- and time-efficient routine for merging two compact D²FA into a compact D²FA. In our experiments, our algorithm runs on average 155 times faster and uses 1500 times less memory than previous algorithms. For example, we are able to construct a D²FA with over 80 000 000 states using only 1 GB of main memory in only 77 min.
    IEEE/ACM Transactions on Networking 12/2014; 22(6):1701-1714. DOI:10.1109/TNET.2014.2309014 · 1.81 Impact Factor
  • ABSTRACT: Virtual network embedding, which maps virtual networks requested by users onto a shared substrate network maintained by an Internet service provider, is a key function that network virtualization needs to provide. Prior work on virtual network embedding has primarily focused on maximizing the revenue of the Internet service provider and did not consider the energy cost of accommodating such requests. Because energy accounts for more than half of the operating cost of substrate networks, minimizing energy cost while accommodating more virtual network requests is critical for infrastructure providers. In this paper, we make the first effort toward energy-aware virtual network embedding. We first propose an energy cost model and formulate the energy-aware virtual network embedding problem as an integer linear programming problem. We then propose two efficient energy-aware virtual network embedding algorithms: a heuristic-based algorithm and an algorithm based on particle swarm optimization. We implemented our algorithms in C++ and performed side-by-side comparisons with prior algorithms. The simulation results show that our algorithms reduce the energy cost by up to 50% compared with the existing algorithm when accommodating the same sequence of virtual network requests.
    IEEE/ACM Transactions on Networking 10/2014; 22(5):1607-1620. DOI:10.1109/TNET.2013.2286156 · 1.81 Impact Factor
  • Alex X. Liu · Chad R. Meiners · Eric Norige · Eric Torng
    ABSTRACT: In this paper, we propose FlowSifter, a framework for automated online application protocol field extraction. FlowSifter is based on a new grammar model called Counting Regular Grammars (CRG) and a corresponding automata model called Counting Automata (CA). The CRG and CA models add counters with update functions and transition guards to regular grammars and finite state automata. These additions give CRGs and CAs the ability to parse and extract fields from context-sensitive application protocols, and they also facilitate fast and stackless approximate parsing of recursive structures. These new grammar models enable FlowSifter to generate optimized Layer 7 field extractors from simple extraction specifications. We compare FlowSifter against both BinPAC and UltraPAC, which represent the state-of-the-art field extractors. Our experiments show that, compared to BinPAC parsers, FlowSifter runs more than 21 times faster and uses 49 times less memory. Compared to UltraPAC parsers, FlowSifter extractors run 12 times faster and use 24 times less memory.
    IEEE Journal on Selected Areas in Communications 10/2014; 32(10):1864-1880. DOI:10.1109/JSAC.2014.2358817 · 3.45 Impact Factor
  • Proceedings of the VLDB Endowment 10/2014; 7(14):1953-1964. DOI:10.14778/2733085.2733100
  • Tingwen Liu · Alex X. Liu · Jinqiao Shi · Yong Sun · Li Guo
    ABSTRACT: Regular Expression (RegEx) matching, a core operation in many network and security applications, is typically performed on Deterministic Finite Automata (DFA) to process packets at wire speed; however, DFA size is often exponential in the number of RegExes. RegEx grouping is the practical way to address DFA state explosion, but prior RegEx grouping algorithms are extremely slow and memory intensive. In this paper, we first propose DFAestimator, an algorithm that can quickly estimate the DFA size for a given RegEx set without building the actual DFA. Second, we propose RegexGrouper, a RegEx grouping algorithm based on DFA size estimation. In terms of speed and memory consumption, our work is orders of magnitude more efficient than prior art because DFA size estimation is much faster and more memory-efficient than DFA construction. In terms of the resulting total size of the DFAs, our work is significantly more effective than prior art because we use a much finer-grained quantification of the degree of interaction between two RegExes. For example, to divide the RegEx set of the L7-filter system into 7 groups, prior art takes 279.3 minutes and the resulting 7 DFAs have a total of 29047 states, whereas RegexGrouper takes 3.2 minutes and the resulting 7 DFAs have a total of 15578 states.
    IEEE Journal on Selected Areas in Communications 10/2014; 32(10):1797-1809. DOI:10.1109/JSAC.2014.2358839 · 3.45 Impact Factor
  • ABSTRACT: The Forwarding Information Base (FIB) of backbone routers has been growing rapidly in size. An ideal IP lookup algorithm should achieve constant, yet small, IP lookup time and on-chip memory usage. However, no prior IP lookup algorithm achieves both requirements at the same time. In this paper, we first propose SAIL, a Splitting Approach to IP Lookup. One split is along the dimension of the lookup process, namely finding the prefix length and finding the next hop; the other split is along the dimension of prefix length, namely IP lookup on prefixes of length less than or equal to 24 and IP lookup on prefixes longer than 24. Second, we propose a suite of algorithms for IP lookup based on our SAIL framework. Third, we implemented our algorithms on four platforms: CPU, FPGA, GPU, and many-core. We conducted extensive experiments to evaluate our algorithms using real FIBs and real traffic from a major ISP in China. Experimental results show that our SAIL algorithms are several times to two orders of magnitude faster than well-known IP lookup algorithms.
    ACM SIGCOMM Computer Communication Review 08/2014; 44(4). DOI:10.1145/2619239.2626297 · 1.12 Impact Factor
  • Yiwen Wang · Sen Su · Alex X. Liu · Zhongbao Zhang
    ABSTRACT: Bulk data migration between datacenters is often a critical step in deploying new services, improving reliability under failures, or implementing various cost reduction strategies for cloud companies. These bulk transfers consume massive bandwidth and in turn cause severe network congestion. Leveraging the temporal and spatial characteristics of inter-datacenter bulk data traffic, in this paper we investigate the Multiple Bulk Data Transfers Scheduling (MBDTS) problem to reduce network congestion. Temporally, we apply the store-and-forward transfer mode to reduce the peak traffic load on each link. Spatially, we propose to lexicographically minimize the congestion of all links among datacenters. To solve the MBDTS problem, we first model it as an optimization problem and then propose a novel Elastic Time-Expanded Network technique to represent the time-varying network status as a static one with a reasonable expansion cost. Using this transformation, we reformulate the problem as a Linear Programming (LP) model and obtain the optimal solution by iteratively solving the LP model. We have conducted extensive simulations on a real network topology. The results show that our algorithm can significantly reduce network congestion and balance the overall network traffic at practical computational cost.
    Computer Networks 08/2014; 68. DOI:10.1016/j.comnet.2014.02.017 · 1.26 Impact Factor
  • Conference Paper: Noise can help
    Muhammad Shahzad · Alex X. Liu
    The 2014 ACM international conference; 06/2014
  • Muhammad Shahzad · Alex X. Liu
    ABSTRACT: With the growth in number and significance of emerging applications that require extremely low latencies, network operators face an increasing need to perform latency measurement on a per-flow basis for network monitoring and troubleshooting. In this paper, we propose COLATE, the first per-flow latency measurement scheme that requires no probe packets and no time stamping. Given a set of observation points, COLATE records packet timing information at each point so that later, for any two points, it can accurately estimate the average and standard deviation of the latencies experienced by the packets of any flow in passing between the two points. The key idea is that when recording packet timing information, COLATE purposely allows noise to be introduced to minimize storage space, and when querying the latency of a target flow, COLATE uses statistical techniques to denoise and obtain an accurate latency estimate. COLATE is designed to be efficiently implementable on network middleboxes. In terms of processing overhead, COLATE performs only one hash and one memory update per packet. In terms of storage space, COLATE uses less than 0.1 bit per packet, which means that, on a backbone link with about half a million packets per second, COLATE can accumulate time stamps of packets traversing the link for over 1.5 years using a 256 GB drive. We evaluated COLATE using three real traffic traces: a backbone traffic trace, an enterprise network traffic trace, and a data center traffic trace. Results show that COLATE always achieves the required reliability for any given confidence interval.
    ACM SIGMETRICS Performance Evaluation Review 06/2014; 42(1). DOI:10.1145/2637364.2591988
  • ABSTRACT: Content Delivery Networks (CDNs) differ from other caching systems in terms of both workload characteristics and performance metrics. However, there has been little prior work on large-scale measurement and characterization of content requests and caching performance in CDNs. In terms of workload, CDNs deal with extremely large content volume, high content diversity, and strong temporal dynamics. In terms of performance metrics, in addition to hit ratio, CDNs also need to minimize disk operations and the volume of traffic from origin servers. In this paper, we conduct a large-scale measurement study to characterize content request patterns using real-world data from a commercial CDN provider.
    ACM SIGMETRICS Performance Evaluation Review 06/2014; 42(1). DOI:10.1145/2591971.2592021
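
The TKDE abstract on range-to-box transformation states that, in two dimensions, a range query maps exactly onto a box query. The sketch below assumes the range query uses the L1 (Manhattan) distance, which the abstract does not state. Under that assumption, the scaled 45-degree rotation u = x + y, v = x − y turns the diamond |x − qx| + |y − qy| ≤ r into the axis-aligned square |u − qu| ≤ r, |v − qv| ≤ r. The function names are hypothetical, and the higher-dimensional schemes (disjoint planar rotation, pruning box query) are not shown.

```python
import random

def to_rotated(p):
    """Map (x, y) to (u, v) = (x + y, x - y): a 45-degree rotation scaled by sqrt(2)."""
    x, y = p
    return (x + y, x - y)

def l1_range_query(points, q, r):
    """Baseline: points whose L1 (Manhattan) distance to q is at most r."""
    return [p for p in points if abs(p[0] - q[0]) + abs(p[1] - q[1]) <= r]

def box_query_rotated(points, q, r):
    """Same query, answered as an axis-aligned box check in the rotated space.

    Because |a| + |b| == max(|a + b|, |a - b|), the L1 diamond of radius r
    around q becomes the square [qu - r, qu + r] x [qv - r, qv + r].
    """
    qu, qv = to_rotated(q)
    hits = []
    for p in points:
        u, v = to_rotated(p)
        if qu - r <= u <= qu + r and qv - r <= v <= qv + r:
            hits.append(p)
    return hits

random.seed(0)
pts = [(random.randint(-50, 50), random.randint(-50, 50)) for _ in range(2000)]
assert l1_range_query(pts, (3, -7), 12) == box_query_rotated(pts, (3, -7), 12)
```

In an indexed setting, the rotated points would be stored in an R-tree-like index, so the box check becomes an ordinary window query. Keeping the unnormalized x + y and x − y preserves exact integer arithmetic.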
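
The SBF-UDP abstract describes its Sampling Bloom Filter as a generalization of the standard Bloom filter. The sketch below shows only the standard filter applied to the unknown-tag question. It is not the SBF or the SBF-UDP protocol. The sizes `m` and `k`, the SHA-256-based hashing, and the helper `detect_unknown` are illustrative assumptions.

```python
import hashlib

class BloomFilter:
    """Standard Bloom filter with m bit positions and k hash functions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit: simple, not space-optimal

    def _positions(self, item):
        # Derive k positions from salted SHA-256 digests (an illustrative choice).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def detect_unknown(known_ids, present_ids, m=8192, k=4):
    """Report present IDs that are definitely not in the known set.

    Known IDs are never reported (Bloom filters have no false negatives), but
    an unknown ID can be missed when all k of its positions happen to be set.
    """
    bf = BloomFilter(m, k)
    for tag in known_ids:
        bf.add(tag)
    return [tag for tag in present_ids if tag not in bf]


known = list(range(1000))
present = known + [5001, 5002, 5003]
print(detect_unknown(known, present))  # always a subset of [5001, 5002, 5003]
```

The one-sided error is why a detection confidence is quoted rather than a guarantee: an unknown tag is missed only when it collides with bits set by known tags.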
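
The CCOD abstract combines cross-correlation of user traffic traces with outlier detection. The following is a generic sketch of that combination under stated assumptions. It is not the CCOD detector. It scores each candidate by its peak Pearson correlation with a sender's trace over a small set of delays, then flags scores that lie far above the rest using a z-score. The traces, `max_lag`, and the threshold `z=2.5` are invented for illustration.

```python
import math
import random

def peak_xcorr(a, b, max_lag):
    """Largest Pearson correlation between a and b, with b delayed by 0..max_lag samples."""
    best = -1.0
    for lag in range(max_lag + 1):
        n = min(len(a), len(b)) - lag
        x, y = a[:n], b[lag:lag + n]
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
        sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
        if sx > 0 and sy > 0:
            best = max(best, cov / (sx * sy))
    return best

def flag_outliers(scores, z=2.5):
    """Indices whose score lies more than z standard deviations above the mean."""
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    if std == 0:
        return []
    return [i for i, s in enumerate(scores) if (s - mean) / std > z]

# Toy traces: candidate 0 receives the sender's traffic 2 samples later, plus noise;
# the other 19 candidates carry unrelated traffic.
random.seed(7)
sender = [random.random() for _ in range(300)]
candidates = [[0.0, 0.0] + [v + random.gauss(0, 0.05) for v in sender[:-2]]]
candidates += [[random.random() for _ in range(300)] for _ in range(19)]
scores = [peak_xcorr(sender, c, max_lag=5) for c in candidates]
print(flag_outliers(scores))
```

Treating the true partner as an outlier among many candidates, rather than applying a fixed correlation cutoff, adapts to the background correlation level. This matters at low SNR.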
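
The ART abstract bases its estimate on the average run length of ones in the bit string that a framed slotted Aloha frame produces. The sketch below computes only that statistic on a simulated frame. It assumes each tag picks one slot uniformly at random and that a slot reads as 1 when at least one tag replies. ART's actual estimator and its parameter choices are not given in the abstract and are not reproduced here. The frame size of 1024 is arbitrary.

```python
import random

def aloha_frame(num_tags, frame_size, rng):
    """One framed-slotted-Aloha frame: bit i is 1 if at least one tag replied in slot i."""
    bits = [0] * frame_size
    for _ in range(num_tags):
        bits[rng.randrange(frame_size)] = 1
    return bits

def average_run_length_of_ones(bits):
    """Mean length of maximal runs of consecutive 1s (0.0 if there are none)."""
    runs = []
    current = 0
    for b in bits:
        if b:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return sum(runs) / len(runs) if runs else 0.0

rng = random.Random(42)
for n in (100, 500, 2000):
    frame = aloha_frame(n, 1024, rng)
    print(n, round(average_run_length_of_ones(frame), 2))
```

Larger tag populations fill more slots, so runs of ones lengthen. An estimator can invert this monotone relationship to recover the population size.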
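
The Patel, Liu, and Torng abstract focuses on the D²FA, which compresses a DFA by letting a state drop transitions it shares with another state and follow a default transition that consumes no input. The sketch below shows only how input is processed by a small hand-built D²FA for the pattern "ab" over the alphabet {a, b, c}. It does not show the paper's "Minimize then Union" construction or merging routine. The tables `DFA`, `LABELED`, and `DEFAULT` are illustrative.

```python
# DFA over {a, b, c} that tracks whether the input ends in "ab" (state 2 = match).
DFA = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"a": 1, "b": 2, "c": 0},
    2: {"a": 1, "b": 0, "c": 0},
}

# Equivalent D2FA: states 1 and 2 keep only the transitions that differ from
# state 0 and defer everything else to state 0 through a default transition.
LABELED = {
    0: {"a": 1, "b": 0, "c": 0},  # root keeps a full transition table
    1: {"b": 2},
    2: {},
}
DEFAULT = {1: 0, 2: 0}

def d2fa_step(state, ch):
    """Follow default transitions (consuming no input) until ch has a labeled edge."""
    while ch not in LABELED[state]:
        state = DEFAULT[state]
    return LABELED[state][ch]

def run(step, text, start=0):
    """Return the 1-based end positions where the automaton reaches state 2."""
    state, matches = start, []
    for i, ch in enumerate(text, start=1):
        state = step(state, ch)
        if state == 2:
            matches.append(i)
    return matches

text = "cabbabcab"
print(run(lambda s, ch: DFA[s][ch], text))  # [3, 6, 9]
print(run(d2fa_step, text))                 # [3, 6, 9]
```

Here the D²FA stores 4 labeled transitions plus 2 defaults instead of 9. The cost is that some input characters trigger extra default hops before a labeled edge is found.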
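
One of the two splits in the SAIL abstract separates prefixes of length at most 24 from longer ones. The sketch below illustrates that split alone, using a classic two-level table in the spirit of DIR-24-8, not SAIL's own algorithms. SAIL additionally splits the lookup into finding the prefix length and then the next hop. Python dicts stand in for the hardware arrays. The routes are made up, and very short prefixes such as /0 are left out because prefix expansion would create millions of entries.

```python
import ipaddress

def build_tables(routes):
    """Split a FIB at prefix length 24.

    routes maps "a.b.c.d/len" to a next hop. Prefixes of length <= 24 are
    expanded into a table indexed by the top 24 address bits; prefixes longer
    than 24 get a 256-entry chunk indexed by the last 8 bits.
    """
    level24 = {}  # top 24 bits -> next hop (a dict stands in for a 2**24 array)
    chunks = {}   # top 24 bits -> list of 256 next hops
    nets = [(ipaddress.ip_network(p), nh) for p, nh in routes.items()]
    # Insert shorter prefixes first so longer (more specific) ones overwrite them.
    for net, nh in sorted(nets, key=lambda t: t[0].prefixlen):
        base, plen = int(net.network_address), net.prefixlen
        if plen <= 24:
            first = base >> 8
            for top in range(first, first + (1 << (24 - plen))):
                level24[top] = nh
        else:
            # A new chunk inherits the covering <=24 next hop (or None).
            chunk = chunks.setdefault(base >> 8, [level24.get(base >> 8)] * 256)
            low = base & 0xFF
            for i in range(low, low + (1 << (32 - plen))):
                chunk[i] = nh
    return level24, chunks

def lookup(addr, level24, chunks):
    """Longest-prefix match with at most two table accesses."""
    a = int(ipaddress.ip_address(addr))
    top = a >> 8
    if top in chunks:
        return chunks[top][a & 0xFF]
    return level24.get(top)

routes = {
    "10.0.0.0/8": "A",
    "10.1.0.0/16": "B",
    "10.1.2.0/24": "C",
    "10.1.2.128/25": "D",
    "10.1.2.192/26": "E",
}
tables = build_tables(routes)
for ip in ["10.9.9.9", "10.1.9.9", "10.1.2.5", "10.1.2.130", "10.1.2.200", "11.0.0.1"]:
    print(ip, lookup(ip, *tables))
```

Most backbone prefixes are /24 or shorter, so most lookups finish with a single access to the first table. Only addresses under a longer prefix pay for the second access.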
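
The COLATE abstract connects three figures: less than 0.1 bit per packet, about half a million packets per second, and over 1.5 years on a 256 GB drive. The arithmetic below makes the link explicit. It assumes decimal gigabytes and 365-day years. Under those assumptions, lasting 1.5 years leaves a budget of roughly 0.087 bits per packet, within the stated bound of less than 0.1 bit.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 365-day years, ignoring leap days
packets_per_second = 500_000        # "about half a million packets per second"
years = 1.5
drive_bytes = 256 * 10**9           # assumes decimal gigabytes

total_packets = packets_per_second * years * SECONDS_PER_YEAR
budget_bits_per_packet = drive_bytes * 8 / total_packets
print(f"{total_packets:.3e} packets -> {budget_bits_per_packet:.3f} bits/packet")
# 2.365e+13 packets -> 0.087 bits/packet
```

With binary gigabytes (2**30 bytes), the same calculation gives about 0.093 bits per packet, which is still below 0.1.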

Publication Stats

701 Citations
53.43 Total Impact Points


  • 2011–2014
    • Nanjing University
      • Department of Computer Science & Technology
      Nanjing, Jiangsu, China
  • 2007–2014
    • Michigan State University
      • Department of Computer Science and Engineering
      East Lansing, Michigan, United States
  • 2004–2008
    • University of Texas at Austin
      • Department of Computer Science
      Austin, Texas, United States