Alex X. Liu

Nanjing University, Nan-ching, Jiangsu Sheng, China

Publications (89) · 49.74 Total impact

  • ABSTRACT: This paper presents a novel approach that transforms the feature space into a new feature space such that a range query in the original space is mapped into an equivalent box query in the transformed space. Since box queries are axis aligned, there are several implementation advantages that can be exploited to speed up the retrieval of query results using R-Tree-like [9] indexing schemes. For two-dimensional data, the transformation is precise. For more than two dimensions, we propose a space transformation scheme based on disjoint planar rotation and a new type of query, the pruning box query, to get precise results. Experimental results with large synthetic databases and some real databases show the effectiveness of the proposed transformation scheme. These experimental results have been corroborated with suitable mathematical models. In disjoint planar rotation, additional computation time is required to remove the false positives produced because the bounding box is not precise. A second topological transformation scheme based on an optimized bounding box is presented, which reduces the number of false positives. This reduction grows with the number of dimensions. The optimized bounding box for higher dimensions is computed using a novel approach of simultaneous local optimal projections.
    IEEE Transactions on Knowledge and Data Engineering 05/2015; 27(5):1438-1451. DOI:10.1109/TKDE.2014.2363658 · 1.82 Impact Factor
  • ABSTRACT: Unknown RFID tags appear when unread tagged objects are moved in or tagged objects are misplaced. This paper studies the practically important problem of unknown tag detection while taking both the time-efficiency and the energy-efficiency of battery-powered active tags into consideration. We first propose a Sampling Bloom Filter, which generalizes the standard Bloom Filter. Using the new filtering technique, we propose the Sampling Bloom Filter-based Unknown tag Detection Protocol (SBF-UDP), whose detection accuracy is tunable by the end users. We present a theoretical analysis to minimize the time and energy costs. SBF-UDP can be tuned to either a time-saving mode or an energy-saving mode, according to the specific requirements. Extensive simulations are conducted to evaluate the performance of the proposed protocol. The experimental results show that SBF-UDP considerably outperforms the previous related protocols in terms of both time-efficiency and energy-efficiency. For example, when 3 or more unknown tags appear in an RFID system with 30 000 known tags, the proposed SBF-UDP is able to successfully report the existence of unknown tags with a confidence of more than 99%. Moreover, our protocol runs 9 times faster than the fastest existing scheme and reduces the energy consumption by more than 80%.
    IEEE Transactions on Communications 04/2015; 63(4):1432-1442. DOI:10.1109/TCOMM.2015.2402660 · 1.98 Impact Factor
  • Source
    ABSTRACT: These are the PPT slides. The source code is available at http://fi.ict.ac.cn/firg.php?n=PublicationsAmpTalks.OpenSource
  • Source
    ABSTRACT: The identification of encrypted Instant Messaging (IM) channels between users is made difficult by the presence of variable and high levels of uncorrelated background traffic. In this paper, we propose a novel Cross-correlation Outlier Detector (CCOD) to identify communicating end-users in a large group of users. Our technique uses traffic flow traces between individual users and the IM service provider's data center. We evaluate the CCOD on a data set of Yahoo! IM traffic traces with an average SNR of −6.11 dB (the data set includes ground truth). Results show that our technique achieves an 88% true positive (TP) rate, a 3% false positive (FP) rate, and a 96% ROC area. The performance of previous correlation-based schemes on the same data set was limited to a 63% TP rate, a 4% FP rate, and an 85% ROC area.
    49th Annual Conference on Information Sciences and Systems (CISS), Johns Hopkins University, Baltimore, MD, USA; 03/2015
  • Muhammad Shahzad, Alex X. Liu
    ABSTRACT: Radio frequency identification (RFID) systems have been widely deployed for various applications such as object tracking, 3-D positioning, supply chain management, inventory control, and access control. This paper concerns the fundamental problem of estimating RFID tag population size, which is needed in many applications such as tag identification, warehouse monitoring, and privacy-sensitive RFID systems. In this paper, we propose a new scheme for estimating tag population size called Average Run-based Tag estimation (ART). The technique is based on the average run length of ones in the bit string received using the standardized framed slotted Aloha protocol. ART is significantly faster than prior schemes. For example, given a required confidence interval of 0.1% and a required reliability of 99.9%, ART is consistently 7 times faster than the fastest existing schemes (UPE and EZB) for any tag population size. Furthermore, ART's estimation time is provably independent of the tag population size. ART works with multiple readers with overlapping regions and can estimate the sizes of arbitrarily large tag populations. ART is easy to deploy because it requires modification neither to tags nor to the communication protocol between tags and readers. ART only needs to be implemented on readers as a software module.
    IEEE/ACM Transactions on Networking 02/2015; 23(1):241-254. DOI:10.1109/TNET.2014.2298039 · 1.99 Impact Factor
  • Jignesh Patel, Alex X. Liu, Eric Torng
    ABSTRACT: Network intrusion detection and prevention systems commonly use regular expression (RE) signatures to represent individual security threats. While the corresponding deterministic finite state automaton (DFA) for any one RE is typically small, the DFA that corresponds to the entire set of REs is usually too large to be constructed or deployed. To address this issue, a variety of alternative automata implementations that compress the size of the final automaton have been proposed, such as extended finite automata (XFA) and delayed input DFA (D²FA). The resulting final automata are typically much smaller than the corresponding DFA. However, the previously proposed automata construction algorithms suffer from some drawbacks. First, most employ a “Union then Minimize” framework where the automata for each RE are first joined before minimization occurs. This leads to an expensive nondeterministic finite automata (NFA) to DFA subset construction on a relatively large NFA. Second, most construct the corresponding large DFA as an intermediate step. In some cases, this DFA is so large that the final automaton cannot be constructed even though the final automaton is small enough to be deployed. In this paper, we propose a “Minimize then Union” framework for constructing compact alternative automata, focusing on the D²FA. We show that we can construct an almost optimal final D²FA with small intermediate parsers. The key to our approach is a space- and time-efficient routine for merging two compact D²FA into a compact D²FA. In our experiments, our algorithm runs on average 155 times faster and uses 1500 times less memory than previous algorithms. For example, we are able to construct a D²FA with over 80 000 000 states using only 1 GB of main memory in only 77 min.
    IEEE/ACM Transactions on Networking 12/2014; 22(6):1701-1714. DOI:10.1109/TNET.2014.2309014 · 1.99 Impact Factor
  • Source
    ABSTRACT: The breach of privacy in encrypted instant messenger (IM) service is a serious threat to user anonymity. Performance of previous de-anonymization strategies was limited to 65%. We perform network de-anonymization by taking advantage of the cause-effect relationship between sent and received packet streams and demonstrate this approach on a data set of Yahoo! IM service traffic traces. An investigation of various measures of causality shows that IM networks can be breached with a hit rate of 99%. A KCI Causality based approach alone can provide a true positive rate of about 97%. Individual performances of Granger, Zhang and IGCI causality are limited owing to the very low SNR of packet traces and variable network delays.
    IEEE Global Communications Conference (GLOBECOM), Austin, TX, USA; 12/2014
  • 10/2014; 7(14):1953-1964. DOI:10.14778/2733085.2733100
  • ABSTRACT: Virtual network embedding, which means mapping virtual networks requested by users to a shared substrate network maintained by an Internet service provider, is a key function that network virtualization needs to provide. Prior work on virtual network embedding has primarily focused on maximizing the revenue of the Internet service provider and did not consider the energy cost of accommodating such requests. As energy cost is more than half of the operating cost of the substrate networks, minimizing energy cost while trying to accommodate more virtual network requests is critical for infrastructure providers. In this paper, we make the first effort toward energy-aware virtual network embedding. We first propose an energy cost model and formulate the energy-aware virtual network embedding problem as an integer linear programming problem. We then propose two efficient energy-aware virtual network embedding algorithms: a heuristic-based algorithm and a particle-swarm-optimization-technique-based algorithm. We implemented our algorithms in C++ and performed a side-by-side comparison with prior algorithms. The simulation results show that our algorithms reduce the energy cost by up to 50% over the existing algorithm for accommodating the same sequence of virtual network requests.
    IEEE/ACM Transactions on Networking 10/2014; 22(5):1607-1620. DOI:10.1109/TNET.2013.2286156 · 1.99 Impact Factor
  • Source
    ABSTRACT: The Forwarding Information Base (FIB) of backbone routers has been rapidly growing in size. An ideal IP lookup algorithm should achieve constant, yet small, IP lookup time and on-chip memory usage. However, no prior IP lookup algorithm achieves both requirements at the same time. In this paper, we first propose SAIL, a Splitting Approach to IP Lookup. One splitting is along the dimension of the lookup process, namely finding the prefix length and finding the next hop, and another splitting is along the dimension of prefix length, namely IP lookup on prefixes of length less than or equal to 24 and IP lookup on prefixes of length longer than 24. Second, we propose a suite of algorithms for IP lookup based on our SAIL framework. Third, we implemented our algorithms on four platforms: CPU, FPGA, GPU, and many-core. We conducted extensive experiments to evaluate our algorithms using real FIBs and real traffic from a major ISP in China. Experimental results show that our SAIL algorithms are several times or even two orders of magnitude faster than well-known IP lookup algorithms.
    ACM SIGCOMM Computer Communication Review 08/2014; 44(4). DOI:10.1145/2619239.2626297 · 1.10 Impact Factor
  • ABSTRACT: Bulk data migration between datacenters is often a critical step in deploying new services, improving reliability under failures, or implementing various cost reduction strategies for cloud companies. These bulk data transfers consume massive bandwidth and can cause severe network congestion. Leveraging the temporal and spatial characteristics of inter-datacenter bulk data traffic, in this paper, we investigate the Multiple Bulk Data Transfers Scheduling (MBDTS) problem to reduce the network congestion. Temporally, we apply the store-and-forward transfer mode to reduce the peak traffic load on the link. Spatially, we propose to lexicographically minimize the congestion of all links among datacenters. To solve the MBDTS problem, we first model it as an optimization problem, and then propose the novel Elastic Time-Expanded Network technique to represent the time-varying network status as a static one with a reasonable expansion cost. Using this transformation, we reformulate the problem as a Linear Programming (LP) model, and obtain the optimal solution by iteratively solving the LP model. We have conducted extensive simulations on a real network topology. The results show that our algorithm can significantly reduce the network congestion as well as balance the entire network traffic with practical computational costs.
    Computer Networks 08/2014; 68. DOI:10.1016/j.comnet.2014.02.017 · 1.28 Impact Factor
  • Conference Paper: Noise can help
    Muhammad Shahzad, Alex X. Liu
    The 2014 ACM international conference; 06/2014
  • Muhammad Shahzad, Alex X. Liu
    ABSTRACT: With the growth in number and significance of emerging applications that require extremely low latencies, network operators are facing an increasing need to perform latency measurement on a per-flow basis for network monitoring and troubleshooting. In this paper, we propose COLATE, the first per-flow latency measurement scheme that requires neither probe packets nor time stamping. Given a set of observation points, COLATE records packet timing information at each point so that later, for any two points, it can accurately estimate the average and standard deviation of the latencies experienced by the packets of any flow in passing the two points. The key idea is that when recording packet timing information, COLATE purposely allows noise to be introduced in order to minimize storage space, and when querying the latency of a target flow, COLATE uses statistical techniques to denoise and obtain an accurate latency estimate. COLATE is designed to be efficiently implementable on network middleboxes. In terms of processing overhead, COLATE performs only one hash and one memory update per packet. In terms of storage space, COLATE uses less than 0.1 bit per packet, which means that, on a backbone link with about half a million packets per second, using a 256 GB drive, COLATE can accumulate time stamps of packets traversing the link for over 1.5 years. We evaluated COLATE using three real traffic traces: a backbone traffic trace, an enterprise network traffic trace, and a data center traffic trace. Results show that COLATE always achieves the required reliability for any given confidence interval.
    ACM SIGMETRICS Performance Evaluation Review 06/2014; 42(1). DOI:10.1145/2637364.2591988
  • ABSTRACT: Content Delivery Networks (CDNs) differ from other caching systems in terms of both workload characteristics and performance metrics. However, there has been little prior work on large-scale measurement and characterization of content requests and caching performance in CDNs. For workload characteristics, CDNs deal with extremely large content volume, high content diversity, and strong temporal dynamics. For performance metrics, other than hit ratio, CDNs also need to minimize the disk operations and the volume of traffic from origin servers. In this paper, we conduct a large-scale measurement study to characterize the content request patterns using real-world data from a commercial CDN provider.
    ACM SIGMETRICS Performance Evaluation Review 06/2014; 42(1). DOI:10.1145/2591971.2592021
  • ABSTRACT: Mobile network operators have a significant interest in the performance of streaming video on their networks because network dynamics directly influence the Quality of Experience (QoE). However, unlike video service providers, network operators are not privy to the client- or server-side logs typically used to measure key video performance metrics, such as user engagement. To address this limitation, this paper presents the first large-scale study characterizing the impact of cellular network performance on mobile video user engagement from the perspective of a network operator. Our study on a month-long anonymized data set from a major cellular network makes two main contributions. First, we quantify the effect that 31 different network factors have on user behavior in mobile video. Our results provide network operators with direct guidance on how to improve user engagement; for example, improving the mean signal-to-interference ratio by 1 dB reduces the likelihood of video abandonment by 2%. Second, we model the complex relationships between these factors and video abandonment, enabling operators to monitor mobile video user engagement in real time. Our model can predict whether a user completely downloads a video with more than 87% accuracy by observing only the initial 10 seconds of video streaming sessions. Moreover, our model achieves significantly better accuracy than prior models that require client- or server-side logs, yet we only use standard radio network statistics and/or TCP/IP headers available to network operators.
    ACM SIGMETRICS Performance Evaluation Review 06/2014; 42(1). DOI:10.1145/2591971.2591975
  • Source
    ABSTRACT: In this paper we study the time evolution of academic collaboration networks by predicting the appearance of new links between authors. The accurate prediction of new collaborations between members of a collaboration network can help accelerate the realization of new synergies, foster innovation, and raise productivity. For this study, the authors collected a large data set of publications from 630 conferences of the IEEE and ACM of more than 257,000 authors, 61,000 papers, capturing more than 818,000 collaborations spanning a period of 10 years. The data set is rich in semantic data that allows exploration of many features that were not considered in previous approaches. We considered a comprehensive set of 98 features, and after processing identified eight features as significant. Most significantly, we identified two new features as the most significant predictors of future collaborations: 1) the number of common title words, and 2) the number of common references in two authors’ papers. The link prediction problem is formulated as a binary classification problem, and three different supervised learning algorithms are evaluated, i.e., Naïve Bayes, C4.5 decision tree, and Support Vector Machines. Extensive efforts are made to ensure complete spatial isolation of information used in training and test instances, which to the authors’ best knowledge is unprecedented. Results were validated using a modified form of the classic 10-fold cross validation (the change was necessitated by the way training and test instances were separated). The Support Vector Machine classifier performed the best among the tested approaches; it correctly classified on average more than 80% of test instances and had a receiver operating characteristic (ROC) curve area greater than 0.80.
    ASE International Conference on Social Computing (SocialCom - Stanford), CA, USA; 05/2014
  • Alex X. Liu, Chad R. Meiners, Eric Torng
    ABSTRACT: Packet classification is the core mechanism that enables many networking devices. Although using Ternary Content Addressable Memories (TCAMs) to perform high speed packet classification has become the widely adopted solution, TCAMs are very expensive, have limited capacity, consume large amounts of power, and generate tremendous amounts of heat because of their extremely dense and parallel circuitry. In this paper, we propose the first packet classification scheme that uses Binary Content Addressable Memories (BCAMs). BCAMs are similar to TCAMs except that in BCAMs, every bit has only two possible states: 0 or 1; in contrast, in TCAMs, every bit has three possible states: 0, 1, or * (don't care). Because of the high complexity in implementing the extra “don't care” state, TCAMs have much higher circuit density than BCAMs. As the power consumption, heat generation, and price grow non-linearly with circuit density, BCAMs consume much less power, generate much less heat, and cost much less money than TCAMs. Our BCAM based packet classification scheme is built on two key ideas. First, we break a multi-dimensional lookup into a series of one-dimensional lookups. Second, for each one-dimensional lookup, we convert the ternary matching problem into a binary string exact matching problem. To speed up the lookup process, we propose a number of optimization techniques including skip lists, free expansion, minimizing maximum lookup time, minimizing average lookup time, and lookup short circuiting. We evaluated our BCAM scheme on 17 real-life packet classifiers. On these classifiers, our BCAM scheme requires roughly 5 times fewer CAM bits than the traditional TCAM based scheme. The penalty is a throughput that is roughly 4 times less.
    IEEE INFOCOM 2014 - IEEE Conference on Computer Communications; 04/2014
  • Alex X. Liu, Eric Torng
    ABSTRACT: Regular expression (RegEx) matching, the core operation of intrusion detection and prevention systems, remains a fundamentally challenging problem. A desired RegEx matching scheme should satisfy four requirements: DFA speed, NFA size, automated construction, and scalable construction. Despite lots of work on RegEx matching, no prior scheme satisfies all four of these requirements. In this paper, we approach this holy grail by proposing OverlayCAM, a RegEx matching scheme that satisfies all four requirements. The theoretical underpinning of our scheme is OD2FA, a new automata model proposed in this paper that captures both state and transition replication inherent in DFAs. Our RegEx matching solution processes one input character per lookup like a DFA, requires only the space of an NFA, is grounded in sound automata models, is easy to deploy in existing network devices, and comes with scalable and automated construction algorithms.
    IEEE INFOCOM 2014 - IEEE Conference on Computer Communications; 04/2014
  • ABSTRACT: Owing to its superior properties over barcode systems, such as fast identification and a relatively long interrogating range, Radio Frequency Identification (RFID) technology has promising application prospects in inventory management. This paper studies the problem of complete identification of missing RFID tags, which is important in practice. Time efficiency is the key performance metric of missing tag identification. However, the existing protocols are inefficient in terms of execution time and can hardly satisfy the requirements of real-time applications. In this paper, a Multi-hashing based Missing Tag Identification (MMTI) protocol is proposed, which achieves better time efficiency by improving the utilization of the time frame used for identification. Specifically, the reader recursively sends bitmaps that reflect the current slot occupation state to guide the slot selection of the next hashing process, thereby changing more empty or collision slots into the expected singleton slots. We investigate the optimal parameter settings to maximize the performance of the MMTI protocol. Furthermore, we discuss the case of channel error and propose countermeasures to make MMTI workable in scenarios with imperfect communication channels. Extensive simulation experiments are conducted to evaluate the performance of MMTI, and the results demonstrate that this new protocol significantly outperforms other related protocols reported in the current literature.
    IEEE Transactions on Communications 03/2014; 62(3):1046-1057. DOI:10.1109/TCOMM.2014.011914.130089 · 1.98 Impact Factor
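
Illustrative sketch for the ART entry in the list above: that abstract says the estimate is based on the average run length of ones in the bit string received under framed slotted Aloha, but it does not give the estimator itself. The Python below therefore only computes that statistic under a simplified model that is an assumption here (each tag picks one slot uniformly at random, no channel errors). The function names `simulate_frame` and `average_run_of_ones` are hypothetical and are not taken from the paper.

```python
import random
from itertools import groupby


def simulate_frame(num_tags, frame_size, rng):
    """Simplified frame: each tag replies in one uniformly chosen slot.

    A slot reads 1 if at least one tag replied in it, otherwise 0.
    (Assumed model for illustration only.)
    """
    bits = [0] * frame_size
    for _ in range(num_tags):
        bits[rng.randrange(frame_size)] = 1
    return bits


def average_run_of_ones(bits):
    """Mean length of the maximal runs of consecutive 1s (0.0 if there are none)."""
    runs = [sum(1 for _ in group) for bit, group in groupby(bits) if bit == 1]
    return sum(runs) / len(runs) if runs else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    for num_tags in (50, 200, 800):
        frames = [simulate_frame(num_tags, 512, rng) for _ in range(20)]
        mean_run = sum(average_run_of_ones(f) for f in frames) / len(frames)
        print(num_tags, round(mean_run, 2))
```

Under this model the statistic tends to grow as more tags share the same frame, which is what makes it usable as an input to a size estimate; turning it into an estimate with a stated confidence interval is the part the abstract attributes to ART and is not reproduced here.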

Publication Stats

628 Citations
49.74 Total Impact Points

Institutions

  • 2011–2014
    • Nanjing University
      • Department of Computer Science & Technology
      Nan-ching, Jiangsu Sheng, China
  • 2007–2014
    • Michigan State University
      • Department of Computer Science and Engineering
      East Lansing, Michigan, United States
  • 2004–2008
    • University of Texas at Austin
      • Department of Computer Science
      Austin, Texas, United States