Alex X. Liu

Michigan State University, East Lansing, Michigan, United States

Publications (119) · 70.87 Total impact

  • ABSTRACT: Set queries are fundamental operations in computer systems and applications. This paper addresses the fundamental problem of designing a probabilistic data structure that can quickly process set queries using a small amount of memory. We propose a Shifting Bloom Filter (ShBF) framework for representing and querying sets. We demonstrate the effectiveness of ShBF using three types of popular set queries: membership, association, and multiplicity queries. The key novelty of ShBF lies in encoding the auxiliary information of a set element in a location offset. In contrast, prior BF-based set data structures allocate additional memory to store auxiliary information. To evaluate ShBF in comparison with prior art, we conducted experiments using real-world network traces. Results show that ShBF significantly advances the state of the art on all three types of set queries.
    Full-text · Conference Paper · Sep 2016
  • ABSTRACT: Set queries are fundamental operations in computer systems and applications. This paper addresses the fundamental problem of designing a probabilistic data structure that can quickly process set queries using a small amount of memory. We propose a Shifting Bloom Filter (ShBF) framework for representing and querying sets. We demonstrate the effectiveness of ShBF using three types of popular set queries: membership, association, and multiplicity queries. The key novelty of ShBF lies in encoding the auxiliary information of a set element in a location offset. In contrast, prior BF-based set data structures allocate additional memory to store auxiliary information. To evaluate ShBF in comparison with prior art, we conducted experiments using real-world network traces. Results show that ShBF significantly advances the state of the art on all three types of set queries.
    No preview · Article · Oct 2015 · Proceedings of the VLDB Endowment
  • ABSTRACT: Privacy has been the key roadblock to cloud computing, as clouds may not be fully trusted. This paper is concerned with the problem of privacy-preserving range query processing on clouds. Prior schemes are weak in privacy protection as they cannot achieve index indistinguishability, and therefore allow the cloud to statistically estimate the values of data and queries using domain knowledge and historical query results. In this paper, we propose the first range query processing scheme that achieves index indistinguishability under the indistinguishability against chosen keyword attack (IND-CKA). Our key idea is to organize indexing elements in a complete binary tree called PBtree, which satisfies structure indistinguishability (i.e., two sets of data items have the same PBtree structure if and only if the two sets have the same number of data items) and node indistinguishability (i.e., the values of PBtree nodes are completely random and have no statistical meaning). We prove that our scheme is secure under the widely adopted IND-CKA security model. We propose two algorithms, namely PBtree traversal width minimization and PBtree traversal depth minimization, to improve query processing efficiency. We prove that the worst-case complexity of our query processing algorithm using PBtree is O(|R| log n), where n is the total number of data items and R is the set of data items in the query result. We implemented and evaluated our scheme on a real-world dataset with 5 million items. For example, for a query whose results contain 10 data items, it takes only 0.17 ms.
    No preview · Article · Aug 2015 · IEEE/ACM Transactions on Networking
  • Faraz Ahmed · Jeffrey Erman · Zihui Ge · Alex X. Liu · Jia Wang · He Yan

    No preview · Article · Jun 2015 · ACM SIGMETRICS Performance Evaluation Review
  • ABSTRACT: This paper presents a novel approach that transforms the feature space into a new feature space such that a range query in the original space is mapped into an equivalent box query in the transformed space. Since box queries are axis aligned, there are several implementation advantages that can be exploited to speed up the retrieval of query results using R-Tree [9] like indexing schemes. For two-dimensional data, the transformation is precise. For more than two dimensions, we propose a space transformation scheme based on disjoint planar rotation and a new type of query, the pruning box query, to get the precise results. Experimental results with large synthetic databases and some real databases show the effectiveness of the proposed transformation scheme. These experimental results have been corroborated with suitable mathematical models. In disjoint planar rotation, additional computation time is required to remove the false positives produced because the bounding box is not precise. A second topological transformation scheme is presented based on an optimized bounding box, which reduces the number of false positives. This reduction grows as the number of dimensions increases. The optimized bounding box for higher dimensions is computed based on a novel approach of simultaneous local optimal projections.
    No preview · Article · May 2015 · IEEE Transactions on Knowledge and Data Engineering
  • Xiulong Liu · Keqiu Li · Jie Wu · Alex X Liu · Heng Qi · Xin Xie
    ABSTRACT: The widely used RFID tags raise serious privacy concerns because a tag responds to queries from readers whether or not they are authorized. The common solution is to use a commercially available blocker tag, which behaves as if a set of tags with known blocking IDs are present. The use of blocker tags makes RFID estimation much more challenging, as some genuine tag IDs are covered by the blocker tag and some are not. In this paper, we propose REB, the first RFID estimation scheme that works in the presence of blocker tags. REB uses the framed slotted Aloha protocol specified in the C1G2 standard. For each round of the Aloha protocol, REB first executes the protocol on the genuine tags and the blocker tag, and then virtually executes the protocol on the known blocking IDs using the same Aloha protocol parameters. The basic idea of REB is to conduct statistical inference from the two sets of responses and estimate the number of genuine tags. We conduct extensive simulations to evaluate the performance of REB in terms of time-efficiency and estimation reliability. The experimental results reveal that our REB scheme runs tens of times faster than the fastest identification protocol with the same accuracy requirement.
    Full-text · Conference Paper · May 2015
  • ABSTRACT: Unknown RFID tags appear when unread tagged objects are moved in or tagged objects are misplaced. This paper studies the practically important problem of unknown tag detection while taking both the time-efficiency and the energy-efficiency of battery-powered active tags into consideration. We first propose a Sampling Bloom Filter, which generalizes the standard Bloom Filter. Using the new filtering technique, we propose the Sampling Bloom Filter-based Unknown tag Detection Protocol (SBF-UDP), whose detection accuracy is tunable by the end users. We present the theoretical analysis to minimize the time and energy costs. SBF-UDP can be tuned to either the time-saving mode or the energy-saving mode, according to the specific requirements. Extensive simulations are conducted to evaluate the performance of the proposed protocol. The experimental results show that SBF-UDP considerably outperforms the previous related protocols in terms of both time-efficiency and energy-efficiency. For example, when 3 or more unknown tags appear in an RFID system with 30 000 known tags, the proposed SBF-UDP is able to successfully report the existence of unknown tags with a confidence of more than 99%, while running 9 times faster than the fastest existing scheme and reducing the energy consumption by more than 80%.
    Full-text · Article · Apr 2015 · IEEE Transactions on Communications
  • ABSTRACT: These are the PPT slides. The source code is available at http://fi.ict.ac.cn/firg.php?n=PublicationsAmpTalks.OpenSource
    Full-text · Dataset · Mar 2015
  • Full-text · Dataset · Mar 2015
  • ABSTRACT: The identification of encrypted Instant Messaging (IM) channels between users is made difficult by the presence of variable and high levels of uncorrelated background traffic. In this paper, we propose a novel Cross-correlation Outlier Detector (CCOD) to identify communicating end-users in a large group of users. Our technique uses traffic flow traces between individual users and the IM service provider's data center. We evaluate the CCOD on a data set of Yahoo! IM traffic traces with an average SNR of −6.11 dB (the data set includes ground truth). Results show that our technique achieves an 88% true positive (TP) rate, a 3% false positive (FP) rate, and a 96% ROC area. The performance of previous correlation-based schemes on the same data set was limited to a 63% TP rate, a 4% FP rate, and an 85% ROC area.
    Full-text · Conference Paper · Mar 2015
  • Muhammad Shahzad · Alex X. Liu
    ABSTRACT: Radio frequency identification (RFID) systems have been widely deployed for various applications such as object tracking, 3-D positioning, supply chain management, inventory control, and access control. This paper concerns the fundamental problem of estimating RFID tag population size, which is needed in many applications such as tag identification, warehouse monitoring, and privacy-sensitive RFID systems. In this paper, we propose a new scheme for estimating tag population size called Average Run-based Tag estimation (ART). The technique is based on the average run length of ones in the bit string received using the standardized framed slotted Aloha protocol. ART is significantly faster than prior schemes. For example, given a required confidence interval of 0.1% and a required reliability of 99.9%, ART is consistently 7 times faster than the fastest existing schemes (UPE and EZB) for any tag population size. Furthermore, ART's estimation time is provably independent of the tag population sizes. ART works with multiple readers with overlapping regions and can estimate sizes of arbitrarily large tag populations. ART is easy to deploy because it neither requires modification to tags nor to the communication protocol between tags and readers. ART only needs to be implemented on readers as a software module.
    No preview · Article · Feb 2015 · IEEE/ACM Transactions on Networking
  • Wei Wang · Alex X. Liu · Muhammad Shahzad · Kang Ling · Sanglu Lu
    ABSTRACT: Several pioneering WiFi signal based human activity recognition systems have been proposed. Their key limitation lies in the lack of a model that can quantitatively correlate CSI dynamics and human activities. In this paper, we propose CARM, a CSI based human Activity Recognition and Monitoring system. CARM has two theoretical underpinnings: a CSI-speed model, which quantifies the correlation between CSI value dynamics and human movement speeds, and a CSI-activity model, which quantifies the correlation between the movement speeds of different human body parts and a specific human activity. With these two models, we quantitatively build the correlation between CSI value dynamics and a specific human activity. CARM uses this correlation as the profiling mechanism and recognizes a given activity by matching it to the best-fit profile. We implemented CARM using commercial WiFi devices and evaluated it in several different environments. Our results show that CARM achieves an average accuracy of greater than 96%.
    No preview · Conference Paper · Jan 2015
  • Kamran Ali · Alex X. Liu · Wei Wang · Muhammad Shahzad
    ABSTRACT: Keystroke privacy is critical for ensuring the security of computer systems and the privacy of human users, because what is being typed could be passwords or privacy-sensitive information. In this paper, we show for the first time that WiFi signals can also be exploited to recognize keystrokes. The intuition is that while typing a certain key, the hands and fingers of a user move in a unique formation and direction and thus generate a unique pattern in the time-series of Channel State Information (CSI) values, which we call the CSI-waveform for that key. In this paper, we propose a WiFi signal based keystroke recognition system called WiKey. WiKey consists of two Commercial Off-The-Shelf (COTS) WiFi devices, a sender (such as a router) and a receiver (such as a laptop). The sender continuously emits signals and the receiver continuously receives signals. When a human subject types on a keyboard, WiKey recognizes the typed keys based on how the CSI values change at the WiFi signal receiver end. We implemented the WiKey system using a TP-Link TL-WR1043ND WiFi router and a Lenovo X200 laptop. WiKey achieves a detection rate of more than 97.5% for detecting keystrokes and a recognition accuracy of 96.4% for classifying single keys. In real-world experiments, WiKey can recognize keystrokes in a continuously typed sentence with an accuracy of 93.5%.
    No preview · Conference Paper · Jan 2015
  • ABSTRACT: The breach of privacy in encrypted instant messenger (IM) service is a serious threat to user anonymity. Performance of previous de-anonymization strategies was limited to 65%. We perform network de-anonymization by taking advantage of the cause-effect relationship between sent and received packet streams and demonstrate this approach on a data set of Yahoo! IM service traffic traces. An investigation of various measures of causality shows that IM networks can be breached with a hit rate of 99%. A KCI Causality based approach alone can provide a true positive rate of about 97%. Individual performances of Granger, Zhang and IGCI causality are limited owing to the very low SNR of packet traces and variable network delays.
    Full-text · Conference Paper · Dec 2014
  • Jignesh Patel · Alex X. Liu · Eric Torng
    ABSTRACT: Network intrusion detection and prevention systems commonly use regular expression (RE) signatures to represent individual security threats. While the corresponding deterministic finite state automata (DFA) for any one RE is typically small, the DFA that corresponds to the entire set of REs is usually too large to be constructed or deployed. To address this issue, a variety of alternative automata implementations that compress the size of the final automaton have been proposed such as extended finite automata (XFA) and delayed input DFA (D²FA). The resulting final automata are typically much smaller than the corresponding DFA. However, the previously proposed automata construction algorithms do suffer from some drawbacks. First, most employ a “Union then Minimize” framework where the automata for each RE are first joined before minimization occurs. This leads to an expensive nondeterministic finite automata (NFA) to DFA subset construction on a relatively large NFA. Second, most construct the corresponding large DFA as an intermediate step. In some cases, this DFA is so large that the final automaton cannot be constructed even though the final automaton is small enough to be deployed. In this paper, we propose a “Minimize then Union” framework for constructing compact alternative automata focusing on the D²FA. We show that we can construct an almost optimal final D²FA with small intermediate parsers. The key to our approach is a space- and time-efficient routine for merging two compact D²FA into a compact D²FA. In our experiments, our algorithm runs on average 155 times faster and uses 1500 times less memory than previous algorithms. For example, we are able to construct a D²FA with over 80 000 000 states using only 1 GB of main memory in only 77 min.
    No preview · Article · Dec 2014 · IEEE/ACM Transactions on Networking
  • ABSTRACT: Privacy has been the key roadblock to cloud computing, as clouds may not be fully trusted. This paper concerns the problem of privacy-preserving range query processing on clouds. Prior schemes are weak in privacy protection as they cannot achieve index indistinguishability, and therefore allow the cloud to statistically estimate the values of data and queries using domain knowledge and historical query results. In this paper, we propose the first range query processing scheme that achieves index indistinguishability under the indistinguishability against chosen keyword attack (IND-CKA). Our key idea is to organize indexing elements in a complete binary tree called PBtree, which satisfies structure indistinguishability (i.e., two sets of data items have the same PBtree structure if and only if the two sets have the same number of data items) and node indistinguishability (i.e., the values of PBtree nodes are completely random and have no statistical meaning). We prove that our scheme is secure under the widely adopted IND-CKA security model. We propose two algorithms, namely PBtree traversal width minimization and PBtree traversal depth minimization, to improve query processing efficiency. We prove that the worst-case complexity of our query processing algorithm using PBtree is O(|R| log n), where n is the total number of data items and R is the set of data items in the query result. We implemented and evaluated our scheme on a real-world data set with 5 million items. For example, for a query whose results contain ten data items, it takes only 0.17 milliseconds.
    No preview · Article · Oct 2014 · Proceedings of the VLDB Endowment
  • ABSTRACT: Virtual network embedding, which means mapping virtual networks requested by users to a shared substrate network maintained by an Internet service provider, is a key function that network virtualization needs to provide. Prior work on virtual network embedding has primarily focused on maximizing the revenue of the Internet service provider and did not consider the energy cost in accommodating such requests. As energy cost is more than half of the operating cost of the substrate networks, while trying to accommodate more virtual network requests, minimizing energy cost is critical for infrastructure providers. In this paper, we make the first effort toward energy-aware virtual network embedding. We first propose an energy cost model and formulate the energy-aware virtual network embedding problem as an integer linear programming problem. We then propose two efficient energy-aware virtual network embedding algorithms: a heuristic-based algorithm and a particle-swarm-optimization-technique-based algorithm. We implemented our algorithms in C++ and performed side-by-side comparison with prior algorithms. The simulation results show that our algorithms significantly reduce the energy cost by up to 50% over the existing algorithm for accommodating the same sequence of virtual network requests.
    No preview · Article · Oct 2014 · IEEE/ACM Transactions on Networking
  • Alex X. Liu · Chad R. Meiners · Eric Norige · Eric Torng
    ABSTRACT: In this paper, we propose FlowSifter, a framework for automated online application protocol field extraction. FlowSifter is based on a new grammar model called Counting Regular Grammars (CRG) and a corresponding automata model called Counting Automata (CA). The CRG and CA models add counters with update functions and transition guards to regular grammars and finite state automata. These additions give CRGs and CAs the ability to parse and extract fields from context sensitive application protocols. These additions also facilitate fast and stackless approximate parsing of recursive structures. These new grammar models enable FlowSifter to generate optimized Layer 7 field extractors from simple extraction specifications. We compare FlowSifter against both BinPAC and UltraPAC, which represent the state-of-the-art field extractors. Our experiments show that when compared to BinPAC parsers, FlowSifter runs more than 21 times faster and uses 49 times less memory. When compared to UltraPAC parsers, FlowSifter extractors run 12 times faster and use 24 times less memory.
    No preview · Article · Oct 2014 · IEEE Journal on Selected Areas in Communications
  • Tingwen Liu · Alex X. Liu · Jinqiao Shi · Yong Sun · Li Guo
    ABSTRACT: Regular Expression (RegEx) matching, as a core operation in many network and security applications, is typically performed on Deterministic Finite Automata (DFA) to process packets at wire speed; however, DFA size is often exponential in the number of RegExes. RegEx grouping is the practical way to address DFA state explosion. Prior RegEx grouping algorithms are extremely slow and memory intensive. In this paper, we first propose DFAestimator, an algorithm that can quickly estimate DFA size for a given RegEx set without building the actual DFA. Second, we propose RegexGrouper, a RegEx grouping algorithm based on DFA size estimation. In terms of speed and memory consumption, our work is orders of magnitude more efficient than prior art because DFA size estimation is much faster and memory efficient than DFA construction. In terms of the resulting size sum of DFAs, our work is significantly more effective than prior art because we use a much finer grained quantification of the degree of interaction between two RegExes. For example, to divide the RegEx set of the L7-filter system into 7 groups, prior art uses 279.3 minutes and the resulting 7 DFAs have a total of 29047 states, whereas RegexGrouper uses 3.2 minutes and the resulting 7 DFAs have a total of 15578 states.
    No preview · Article · Oct 2014 · IEEE Journal on Selected Areas in Communications
  • ABSTRACT: Many data center transports have been proposed in recent times (e.g., DCTCP, PDQ, and pFabric). Contrary to the common perception that they are competitors (i.e., protocol A vs. protocol B), we claim that the underlying strategies used in these protocols are, in fact, complementary. Based on this insight, we design PASE, a transport framework that synthesizes existing transport strategies, namely, self-adjusting endpoints (used in TCP style protocols), in-network prioritization (used in pFabric), and arbitration (used in PDQ).
    No preview · Conference Paper · Aug 2014

Publication Stats

1k Citations
70.87 Total Impact Points

Institutions

  • 2007-2015
    • Michigan State University
      • Department of Computer Science and Engineering
      East Lansing, Michigan, United States
  • 2011-2014
    • Nanjing University
      • Department of Computer Science & Technology
      Nanjing, Jiangsu, China
  • 2004-2008
    • University of Texas at Austin
      • Department of Computer Science
      Austin, Texas, United States