Gaogang Xie

Chinese Academy of Sciences, Beijing, Beijing Shi, China

Publications (43)

  • Conference Proceeding: Towards Unbiased Sampling of Online Social Networks
    Dong Wang, Zhenyu Li, Gaogang Xie
    ABSTRACT: Unbiased sampling of online social networks (OSNs) makes it possible to obtain accurate statistical properties of large-scale OSNs. However, the most widely used sampling methods, Breadth-First-Search (BFS) and Greedy, are known to be biased towards high-degree nodes, yielding inaccurate statistical results. To give a general requirement for unbiased sampling, we model the crawling process as a Markov chain and deduce a necessary and sufficient condition, which enables us to design various efficient unbiased sampling methods. To the best of our knowledge, we are among the first to give such a condition. Metropolis-Hastings Random Walk (MHRW) is an example that satisfies the condition. However, walkers in MHRW may stay at some low-degree nodes for a long time, resulting in considerable self-loops on these nodes, which adversely affect the crawling efficiency. Based on the condition, a new unbiased sampling method, called USRS, is proposed to reduce the probabilities of self-loops. We use a dataset from Renren, the largest OSN in China, to evaluate the performance of USRS. The results demonstrate that USRS generates unbiased samples with low self-loop probabilities and achieves higher crawling efficiency.
    Communications (ICC), 2011 IEEE International Conference on; 07/2011
  • Conference Proceeding: Measuring and enhancing the social connectivity of UGC video systems: A case study of YouKu
    Zhenyu Li, Rong Gu, Gaogang Xie
    ABSTRACT: The social connections among users have a significant impact on UGC video systems. The goal of this paper is to study the social connectivity of such systems by measuring YouKu, the most popular UGC video system in China. We have collected 627 thousand user profiles, 3 million social connections and information on 13.6 million videos. The analysis results show that the social connectivity is extremely weak and that a considerable proportion of friend pairs share common semantic interests. These facts motivate us to enhance the connectivity by recommending semantically relevant users as friends. We thus propose a friend recommendation algorithm which locates potential friends quickly and accurately through the links to related videos, a unique feature of YouKu and similar sites. We apply the algorithm to our YouKu dataset and evaluate it through one-hop video search. The social connectivity is greatly enhanced and the number of matched videos on friends is greatly increased. To the best of our knowledge, this work is the first to identify the semantic relevance of friend pairs in UGC video systems and to study friend recommendation.
    Quality of Service (IWQoS), 2011 IEEE 19th International Workshop on; 07/2011
  • Conference Proceeding: Mnemonic Lossy Counting: An efficient and accurate heavy-hitters identification algorithm
    ABSTRACT: Identifying heavy-hitter traffic flows efficiently and accurately is essential for Internet security, accounting and traffic engineering. However, finding all heavy-hitters might require a large memory for storing flow information, which is incompatible with the use of fast and small memory. Moreover, upcoming 100 Gbps transmission rates make this recognition more challenging. How to improve the accuracy of heavy-hitter identification with limited memory space has become a critical issue. This paper presents a scalable algorithm named Mnemonic Lossy Counting (MLC) that improves the accuracy of heavy-hitter identification while having a reasonable time and space complexity. The MLC algorithm holds potential candidate heavy-hitters in a historical information table. This table is used to obtain tighter error bounds on the estimated sizes of candidate heavy-hitters. We validate the MLC algorithm using real network traffic traces, and we compare its performance with two state-of-the-art algorithms, namely Lossy Counting (LC) and Probabilistic Lossy Counting (PLC). The results reveal that: 1) with the same set of parameters and memory usage, MLC achieves between 31.5% and 6.67% fewer false positives than LC and PLC. 2) MLC and LC have a zero false negative ratio, whereas in 38% of the cases PLC has a non-zero false negative ratio and can miss up to 4.4% of heavy-hitters. 3) MLC has a slightly lower memory cost than LC during the first few windows and its memory usage decreases with time, while PLC memory usage declines sharply. 4) MLC has a runtime similar to that of LC and shorter than that of PLC.
    Performance Computing and Communications Conference (IPCCC), 2010 IEEE 29th International; 01/2011
  • Article: Churn-Resilient Protocol for Massive Data Dissemination in P2P Networks.
    IEEE Trans. Parallel Distrib. Syst. 01/2011; 22:1342-1349.
  • Conference Proceeding: rSearch: Ring-Based Semantic Overlay for Efficient Recall-Guaranteed Search in P2P Networks.
    Zhenyu Li, Gaogang Xie
    39th International Conference on Parallel Processing, ICPP Workshops 2010, San Diego, California, USA, 13-16 September 2010; 01/2010
  • Conference Proceeding: Enhancing Content Distribution Performance of Locality-Aware BitTorrent Systems.
    Zhenyu Li, Gaogang Xie
    Proceedings of the Global Communications Conference, 2010. GLOBECOM 2010, 6-10 December 2010, Miami, Florida, USA; 01/2010
  • Conference Proceeding: ACNS: Adaptive Complementary Neighbor Selection in Bittorrent-Like Applications.
    Zhenbao Zhou, Zhenyu Li, Gaogang Xie
    Proceedings of IEEE International Conference on Communications, ICC 2009, Dresden, Germany, 14-18 June 2009; 01/2009
  • Conference Proceeding: Characterization of HTTP behavior on access networks in Web 2.0
    Liang Shuai, Gaogang Xie, Jianhua Yang
    ABSTRACT: The traffic generated by the Hypertext Transfer Protocol (HTTP) has a dominant position in current Internet traffic. Analyzing and characterizing HTTP traffic is important for understanding the nature of Internet traffic. Many studies on HTTP traffic were carried out when HTTP 1.1 appeared. With the prevalence of Web 2.0, research on new features of HTTP traffic has become a new issue in network measurement. In this paper, a comprehensive characterization of both user behavior and transport features of HTTP traffic is discussed. HTTP behavior is characterized at the HTTP conversation, TCP connection and Web flow levels, based on data collected online from the access link of an institute's LAN with more than 1500 users. The HTTP request and response lengths, the temporal characteristics of HTTP messages, and the intervals of TCP connections as well as Web flows are discussed in detail. Compared with previous results, the experiment reveals some changes in the investigated characteristics caused by the influence of Web 2.0.
    Telecommunications, 2008. ICT 2008. International Conference on; 07/2008
  • Conference Proceeding: SAP2P: Self-adaptive and Locality-aware P2P Membership Protocol for Heterogeneous Systems
    ABSTRACT: Membership protocols are the basic utilities for P2P multicast applications. Existing membership protocols always assume a homogeneous environment or incur non-negligible overhead to optimize the overlay topology. In this paper, we propose SAP2P, a self-adaptive and locality-aware P2P membership protocol for heterogeneous systems. SAP2P periodically optimizes the overlay in terms of node degree and link latency. In SAP2P, the degrees of member nodes are proportional to their capacities and neighbors are always physically close. To reduce the control overhead, SAP2P uses a landmark-based scheme to evaluate link latencies and takes advantage of a Markov-chain-based method to adaptively vary the optimization periods. A simulation study shows that SAP2P can achieve good quality in terms of load balancing and locality awareness with relatively small overhead, while still responding quickly to node churn. Specifically, compared with randomized membership protocols, SAP2P achieves up to 53.7% latency reduction.
    Parallel, Distributed and Network-Based Processing, 2008. PDP 2008. 16th Euromicro Conference on; 03/2008
  • Conference Proceeding: Accurate Online Traffic Classification with Multi-Phases Identification Methodology
    ABSTRACT: Traffic metrics at the application level are critical for protocol research, anomaly detection, accounting and network operation. Identifying packets at the application level is very challenging because dynamic protocol ports and packet encryption are widely deployed. Several different traffic identification methods have been proposed in recent research for corresponding applications. It is impossible to identify traffic with any one method alone. This paper proposes a methodology for online traffic identification at the application level, named multi-phases identification (MPI), based on packets and flows. The methodology has two stages. Traffic classification is based on packet characteristics in the first stage and on flow features in the second stage, which corrects the results of the first stage. MPI has several advantages: (1) existing traffic identification methods can be easily integrated into MPI to improve the identification accuracy, (2) a new identification method for a new application can be inserted into MPI with scripts of the identification rule, and (3) the efficiency of identification can be improved with the mechanism of adaptive justification for the sequence of methods and implemented on a multi-CPU platform. MPI has been implemented on a general-purpose CPU platform with OC-48 POS and 10 GE network interfaces. An experiment on an OC-48 POS backbone link shows that MPI is accurate and effective for traffic identification.
    Consumer Communications and Networking Conference, 2008. CCNC 2008. 5th IEEE; 02/2008
  • Conference Proceeding: DHT-Aid, Gossip-Based Heterogeneous Peer-to-Peer Membership Management
    ABSTRACT: In P2P multicast applications, membership management protocols are the basic utilities. In this context, gossip-based protocols have emerged as attractive because they are highly reliable, scalable and simple. Existing gossip-based membership protocols either ignore the underlying topology and the heterogeneous nature of peer nodes or incur considerable control overhead. In this paper, we first present a modified scalable membership protocol, MSCAMP, to account for node heterogeneity. Then a DHT-aided, gossip-based heterogeneous peer-to-peer membership protocol, called DIGOM, is proposed. DIGOM groups nearby nodes into clusters and takes advantage of MSCAMP as an intra-cluster membership protocol. A DHT structure is built to aid node subscription and inter-cluster link building. According to the theoretical analysis and simulation results, both the inter- and intra-cluster fanout in DIGOM can satisfy the requirements for reliable dissemination. Specifically, DIGOM achieves good load balance and requires no synchronization.
    Consumer Communications and Networking Conference, 2008. CCNC 2008. 5th IEEE; 02/2008
  • Conference Proceeding: Rogue access point detection using segmental TCP jitter.
    Proceedings of the 17th International Conference on World Wide Web, WWW 2008, Beijing, China, April 21-25, 2008; 01/2008
  • Conference Proceeding: Design and Implementation of a Network Behavior Analysis-Oriented IP Network Measurement System.
    Proceedings of the 9th International Conference for Young Computer Scientists, ICYCS 2008, Zhang Jia Jie, Hunan, China, November 18-21, 2008; 01/2008
  • Article: Efficient and Scalable Consistency Maintenance for Heterogeneous Peer-to-Peer Systems.
    Zhenyu Li, Gaogang Xie, Zhongcheng Li
    IEEE Trans. Parallel Distrib. Syst. 01/2008; 19:1695-1708.
  • Conference Proceeding: Mitigate DDoS attack using TTL buckets and host threatening index.
    Xi Chen, Gaogang Xie, Jianhua Yang
    LCN 2008, The 33rd IEEE Conference on Local Computer Networks, The Conference on Leading Edge and Practical Computer Networking, Hyatt Regency Montreal, Montreal, Quebec, Canada, 14-17 October 2008, Proceedings; 01/2008
  • Conference Proceeding: Efficient Multi-source Data Dissemination in Peer-to-Peer Networks.
    NETWORKING 2008, Ad Hoc and Sensor Networks, Wireless Networks, Next Generation Internet, 7th International IFIP-TC6 Networking Conference, Singapore, May 5-9, 2008, Proceedings; 01/2008
  • Conference Proceeding: Towards reliable and efficient data dissemination in heterogeneous peer-to-peer systems.
    Zhenyu Li, Gaogang Xie, Zhongcheng Li
    22nd IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2008, Miami, Florida USA, April 14-18, 2008; 01/2008
  • Conference Proceeding: RLM: Reliable and Locality-Aware Membership Protocol for Heterogeneous P2P Systems.
    Proceedings of IEEE International Conference on Communications, ICC 2008, Beijing, China, 19-23 May 2008; 01/2008
  • Conference Proceeding: Feedback and Resources Guided Mechanism for Adaptive Packet Sampling
    ABSTRACT: Due to the capacity restrictions of traffic measurement systems, an accurate and efficient sampling method is in high demand. Uniform packet sampling is the simplest technique for reducing the number of packets that the network monitoring system has to process, but it cannot estimate smaller traffic flows, which make up more than 80% of all traffic flows and are important indicators for network anomaly detection. In this paper, we develop a new packet sampling methodology, based on the SGS (sketch guided sampling) method, called "feedback and resources guided sampling" (FRGS). Our FRGS scheme takes the measurement system capacity as an important parameter for adjusting the sampling probability, and uses feedback from the flow size estimation instead of a data streaming algorithm to save resources. Force sampling is also adopted to increase the accuracy of estimation for smaller flows. Experimental results show that the FRGS method is more accurate than the SGS method at estimating flow sizes under limited resources. The accuracy of our scheme improves on that of SGS by nearly a factor of 10.
    Global Telecommunications Conference, 2007. GLOBECOM '07. IEEE; 12/2007
  • Conference Proceeding: A Scalable Bloom Filter for Membership Queries
    ABSTRACT: Bloom filters allow membership queries over sets with allowable errors. They are widely used in databases, networks and distributed systems, and they have great potential for distributed applications where systems need to share information about available data. However, false positive errors are unavoidable, and the false positive rate increases intolerably as the data set expands. To solve the scalability problem of Bloom filters, this paper presents a new design of a scalable Bloom filter (SBF) for an expanding data set. The SBF keeps a low false positive rate by adding Bloom filter vectors with double length when necessary. The paper proposes algorithms for the element insertion and query operations of the SBF by employing the H3 class of universal hash functions. Theoretical and experimental results demonstrate that the new SBF provides a false positive rate as low as 21.3% of that of the previously presented dynamic Bloom filter, with query CPU time growing logarithmically rather than linearly. Therefore, the proposed SBF significantly outperforms other current scalable Bloom filters.
    Global Telecommunications Conference, 2007. GLOBECOM '07. IEEE; 12/2007
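
The last abstract above describes a scalable Bloom filter that adds vectors of double length as the data set grows. The idea can be sketched roughly as follows. This is an illustrative sketch only, not the authors' algorithm: the class names, the `bits_per_item` fill threshold and the default sizes are assumptions, and salted BLAKE2b from Python's hashlib stands in for the H3 hash family named in the abstract.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter using k salted hash functions."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)
        self.count = 0  # number of add() calls (duplicates are counted too)

    def _positions(self, item):
        data = item.encode("utf-8")
        for i in range(self.num_hashes):
            # Assumption: salted BLAKE2b replaces the paper's H3 hash family.
            digest = hashlib.blake2b(
                data, digest_size=8, salt=i.to_bytes(2, "big")
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


class ScalableBloomFilter:
    """Chain of Bloom filters in which each new filter has twice the bits."""

    def __init__(self, initial_bits=1024, num_hashes=4, bits_per_item=10):
        self.num_hashes = num_hashes
        # Hypothetical fill rule: a filter is "full" after num_bits // bits_per_item adds.
        self.bits_per_item = bits_per_item
        self.filters = [BloomFilter(initial_bits, num_hashes)]

    def add(self, item):
        current = self.filters[-1]
        if current.count >= current.num_bits // self.bits_per_item:
            # The newest filter is full: append one with double the length.
            current = BloomFilter(current.num_bits * 2, self.num_hashes)
            self.filters.append(current)
        current.add(item)

    def __contains__(self, item):
        # An item may live in any filter of the chain, so check all of them.
        return any(item in f for f in self.filters)


if __name__ == "__main__":
    sbf = ScalableBloomFilter()
    for n in range(1000):
        sbf.add(f"flow-{n}")
    print(len(sbf.filters), "filters;", "flow-42" in sbf)
```

Because each new filter doubles in length, the number of filters a query has to check grows roughly logarithmically with the number of inserted items in this sketch. Inserted items are never reported as absent, while false positives remain possible.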