Hong Shen

University of Adelaide, Tarndarnya, South Australia, Australia

Publications (294) · 61.2 Total impact

  • Wenhao Shu, Hong Shen
    ABSTRACT: Feature selection plays a vital role in many areas of pattern recognition and data mining. The effective computation of feature selection is important for improving the classification performance. In rough set theory, many feature selection algorithms have been proposed to process static incomplete data. However, feature values in an incomplete data set may vary dynamically in real-world applications. For such dynamic incomplete data, a classic (non-incremental) approach of feature selection is usually computationally time-consuming. To overcome this disadvantage, we propose an incremental approach for feature selection, which can accelerate the feature selection process in dynamic incomplete data. We firstly employ an incremental manner to compute the new positive region when feature values with respect to an object set vary dynamically. Based on the calculated positive region, two efficient incremental feature selection algorithms are developed respectively for single object and multiple objects with varying feature values. Then we conduct a series of experiments with 12 UCI real data sets to evaluate the efficiency and effectiveness of our proposed algorithms. The experimental results show that the proposed algorithms compare favorably with that of applying the existing non-incremental methods.
    Pattern Recognition. 12/2014; 47(12):3890–3906.
  • Hong Shen, Yingpeng Sang, Yidong Li
    Journal of Interconnection Networks 05/2014; 14(03).
  • Hui Tian, Binze Zhong, Hong Shen
    ABSTRACT: Traffic matrix (TM) describes the traffic volumes traversing a network from the input nodes to the output nodes over a measured period. Such a TM contains very useful information for network managers, traffic engineers and users. However, TM is hard to be obtained and analyzed due to its large size, especially for large-scale networks. In this paper, we present a new method based on diffusion wavelets for analyzing the traffic matrix. It is shown that this method can conduct efficient multi-resolution analysis (MRA) on TM. We compare the analysis results by using different diffusion operators. Through reconstructing the original TM from the diffused traffic on a particular level, we show the high efficiency of this MRA tool based on these operators. We then develop an anomaly detection method based on the analysis results and explore the possibilities of other potential applications.
    Computers & Electrical Engineering 01/2014; · 0.93 Impact Factor
  • Ping He, Hong Shen, Hui Tian
    ABSTRACT: Given a set of data items broadcast on multiple parallel channels, where each channel has the same broadcast pattern over a time period, and a set of data items requested by a client, the data retrieval problem requires finding a sequence of channel accesses to retrieve the requested data items among the channels such that the total access latency is minimized, where both channel access (to retrieve a data item) and channel switch are assumed to take a single time slot. As an important problem of information retrieval in wireless networks, this problem arises in many applications such as e-commerce and ubiquitous data sharing, and is known to involve two kinds of conflict: requested data items are broadcast in the same time slot or in adjacent time slots on different channels. Although existing studies focus on this problem with one conflict, there is little work on this problem with two conflicts. This paper therefore proposes efficient algorithms from two views: single antenna and multiple antennae. Our algorithm adopts a novel approach in which the wireless data broadcast system is converted to a DAG, and applies set cover to solve the problem. Experimental results indicate that this is currently the most efficient algorithm for this problem with two conflicts.
    Proceedings of International Conference on Advances in Mobile Computing & Multimedia; 12/2013
  • Kewen Liao, Hong Shen, Longkun Guo
    ABSTRACT: In the Constrained Fault-Tolerant Resource Allocation (FTRA) problem, we are given a set of sites containing facilities as resources and a set of clients accessing these resources. Each site i can open at most R_i facilities with opening cost f_i. Each client j requires an allocation of r_j open facilities and connecting j to any facility at site i incurs a connection cost c_ij. The goal is to minimize the total cost of this resource allocation scenario. FTRA generalizes the Unconstrained Fault-Tolerant Resource Allocation (FTRA∞) [10] and the classical Fault-Tolerant Facility Location (FTFL) [7] problems: for every site i, FTRA∞ does not have the constraint R_i, whereas FTFL sets R_i = 1. These problems are said to be uniform if all r_j's are the same, and general otherwise. For the general metric FTRA, we first give an LP-rounding algorithm achieving an approximation ratio of 4. Then we show the problem reduces to FTFL, implying the ratio of 1.7245 from [2]. For the uniform FTRA, we provide a 1.52-approximation primal-dual algorithm in O(n^4) time, where n is the total number of sites and clients.
    Proceedings of the 19th international conference on Fundamentals of Computation Theory; 08/2013
  • Longkun Guo, Hong Shen, Kewen Liao
    ABSTRACT: For a given graph $G$ with positive integral cost and delay on edges, distinct vertices $s$ and $t$, cost bound $C\in Z^{+}$ and delay bound $D\in Z^{+}$, the $k$ bi-constraint path ($k$BCP) problem is to compute $k$ disjoint $st$-paths subject to $C$ and $D$. This problem is known to be NP-hard, even when $k=1$ [4]. This paper first gives a simple approximation algorithm with factor-$(2,2)$, i.e. the algorithm computes a solution with delay and cost bounded by $2D$ and $2C$ respectively. Later, a novel improved approximation algorithm with ratio $(1+\beta, \max\{2, 1+\ln\frac{1}{\beta}\})$ is developed by constructing interesting auxiliary graphs and employing the cycle cancelation method. As a consequence, we can obtain a factor-$(1.369, 2)$ approximation algorithm by setting $1+\ln\frac{1}{\beta}=2$ and a factor-$(1.567, 1.567)$ algorithm by setting $1+\beta=1+\ln\frac{1}{\beta}$. Besides, when $\beta=0$, by slightly modifying our algorithm, an approximation algorithm with ratio $(1, (1+\epsilon)(\ln n+\ln\frac{1}{\epsilon}))$, i.e. an algorithm with only a single factor ratio $O(\ln n)$ on cost, can be immediately obtained by setting the delay of each edge $e$ to $\lfloor \frac{d(e)}{\frac{\epsilon D}{n}}\rfloor$ for a given fixed $\epsilon>0$. To the best of our knowledge, this is the first non-trivial approximation algorithm for the $k$BCP problem which strictly obeys the delay constraint. Our developed algorithms can be directly used to solve some related problems, in particular, the $k$-disjoint restricted shortest path problem ($k$RSP) [10], resulting in the same ratio $(1+\beta, \max\{2, 1+\ln\frac{1}{\beta}\})$, which improves currently the best result of ratio $(2, 2)$ in [6].
    Journal of Combinatorial Optimization 01/2013; · 0.59 Impact Factor
  • Hong Shen, Longkun Guo
    ABSTRACT: For a given undirected (edge) weighted graph G = (V, E), a terminal set S ⊆ V and a root r ∈ S, the rooted k-vertex connected minimum Steiner network (kVSMN_r) problem requires constructing a minimum-cost subgraph of G such that each terminal in S \ {r} is k-vertex connected to r. As an important problem in survivable network design, the kVSMN_r problem is known to be NP-hard even when k = 1 [14]. For k = 3 this paper presents a simple combinatorial eight-approximation algorithm, improving the known best ratio 14 of Nutov [20]. Our algorithm constructs an approximate 3VSMN_r through augmenting a two-vertex connected counterpart with additional edges of bounded cost to the optimal. We prove that the total cost of the added edges is at most six times the optimal by showing that the edges in a 3VSMN_r compose a subgraph containing our solution in such a way that each edge appears in the subgraph at most six times.
    IEEE Transactions on Computers 01/2013; 62(9):1684-1693. · 1.38 Impact Factor
  • ABSTRACT: In the emerging environment of the Internet of things (IoT), through the connection of billions of radio frequency identification (RFID) tags and sensors to the Internet, applications will generate an unprecedented number of transactions and amount of data that require novel approaches in RFID data stream processing and management. Unfortunately, it is difficult to maintain a distributed model without a shared directory or structured index. In this paper, we propose a fully distributed model for federated RFID data streams. This model combines two techniques, namely, tilted time frame and histogram to represent the patterns of object flows. Our model is efficient in space and can be stored in main memory. The model is built on top of an unstructured P2P overlay. To reduce the overhead of distributed data acquisition, we further propose several algorithms that use a statistically minimum number of network calls to maintain the model. The scalability and efficiency of the proposed model are demonstrated through an extensive set of experiments.
    IEEE Transactions on Parallel and Distributed Systems 01/2013; 24(10):2036-2045. · 1.80 Impact Factor
  • Yidong Li, Hong Shen
    ABSTRACT: Data publishing based on hypergraphs is becoming increasingly popular due to its power in representing multirelations among objects. However, security issues have been little studied on this subject, while most recent work only focuses on the protection of relational data or graphs. As a major privacy breach, identity disclosure reveals the identification of entities with certain background knowledge known by an adversary. In this paper, we first introduce a novel background knowledge attack model based on the property of hyperedge ranks, and formalize the rank-based hypergraph anonymization problem. We then propose a complete solution in a two-step framework: rank anonymization and hypergraph reconstruction. We also take hypergraph clustering (known as community detection) as data utility into consideration, and discuss two metrics to quantify information loss incurred in the perturbation. Our approaches are effective in terms of efficacy, privacy, and utility. The algorithms run in near-quadratic time on hypergraph size, and protect data from rank attacks with almost the same utility preserved. The performances of the methods have been validated by extensive experiments on real-world datasets as well. Our rank-based attack model and algorithms for rank anonymization and hypergraph reconstruction are, to our best knowledge, the first systematic study to privacy preserving for hypergraph-based data publishing.
    IEEE Transactions on Information Forensics and Security 01/2013; 8(8):1384-1396. · 1.90 Impact Factor
  • Longkun Guo, Hong Shen
    ABSTRACT: The Min-Min problem of finding a disjoint-path pair with the length of the shorter path minimized is known to be NP-hard and admits no K-approximation for any K>1 in the general case (Xu et al. in IEEE/ACM Trans. Netw. 14:147–158, 2006). In this paper, we first show that Bhatia et al.’s NP-hardness proof (Bhatia et al. in J. Comb. Optim. 12:83–96, 2006), a claim of correction to Xu et al.’s proof (Xu et al. in IEEE/ACM Trans. Netw. 14:147–158, 2006), for the edge-disjoint Min-Min problem in general undirected graphs is incorrect by giving a counterexample that is an unsatisfiable 3SAT instance but classified as a satisfiable 3SAT instance in the proof of Bhatia et al. (J. Comb. Optim. 12:83–96, 2006). We then give a correct proof of NP-hardness of this problem in undirected graphs. Finally we give a polynomial-time algorithm for the vertex-disjoint Min-Min problem in planar graphs by showing that the vertex-disjoint Min-Min problem is polynomially solvable in an st-planar graph G=(V,E) whose corresponding auxiliary graph G(V,E∪{e(st)}) can be embedded into a plane, and a planar graph can be decomposed into several st-planar graphs whose Min-Min paths collectively contain a Min-Min disjoint-path pair between s and t in the original graph G. To the best of our knowledge, these are the first polynomial algorithms for the Min-Min problems in planar graphs.
    Algorithmica 01/2013; 66(3). · 0.49 Impact Factor
  • ABSTRACT: Randomization methods widely applied for privacy-preserving data mining are generally subject to reconstruction attack, linkage attack, and semantic-related attacks. A probabilistic anonymity definition has been proposed in [1] to defend against the linkage attack in which the attacker links the same randomized record to all of the original records. In this paper we call this type of attack the Multiple (original records) to One (randomized record) attack, and focus on another attack that has not been researched before, i.e. the One (original record) to Multiple (randomized records) attack. The latter is different from the former in that it does not require the attacker to know the distribution and all values of quasi-identifiers in original records, and thus is easier for the attacker to launch. To defend against this attack we propose a novel probabilistic anonymity concept different from [1]. We achieve this anonymity goal on a hybrid model combining random projection and random noise addition. We also analyze the security properties of this model against the other common types of attacks. Compared with existing work in randomization, k-anonymity and differential privacy, our work achieves the holistic aim of higher security, higher efficiency and higher data utility, and demonstrates very promising applications in large-scale and high-dimensional data mining in clouds.
    e-Business Engineering (ICEBE), 2013 IEEE 10th International Conference on; 01/2013
  • Hong Shen, Bo Chen
    ABSTRACT: Watermarking as a powerful technique for copyright protection, content verification, covert communication and so on, has been studied for years, and is drawing more and more attention recently. There are many situations in which embedding multiple watermarks in an image is desired. This paper proposes an effective approach to embed dual watermarks by extending the single watermarking algorithms in Xie and Shen (2005) [1] and Xie and Shen (2006) [2] for numerical and logo watermarking, respectively. Experimental results show that the resulting dual watermarking algorithms have a significantly higher PSNR than existing dual watermarking algorithms and also retain the same robustness as and higher sensitivity than the original single watermarking algorithms on which they are based.
    Computers & Electrical Engineering 09/2012; 38(5):1310–1324. · 0.93 Impact Factor
  • Kewen Liao, Hong Shen, Longkun Guo
    ABSTRACT: In the Constrained Fault-Tolerant Resource Allocation (FTRA) problem, we are given a set of sites containing facilities as resources, and a set of clients accessing these resources. Specifically, each site i is allowed to open at most R_i facilities with cost f_i for each opened facility. Each client j requires an allocation of r_j open facilities and connecting j to any facility at site i incurs a connection cost c_ij. The goal is to minimize the total cost of this resource allocation scenario. FTRA generalizes the Unconstrained Fault-Tolerant Resource Allocation (FTRA_{\infty}) [18] and the classical Fault-Tolerant Facility Location (FTFL) [13] problems: for every site i, FTRA_{\infty} does not have the constraint R_i, whereas FTFL sets R_i=1. These problems are said to be uniform if all r_j's are the same, and general otherwise. For the general metric FTRA, we first give an LP-rounding algorithm achieving the approximation ratio of 4. Then we show the problem reduces to FTFL, implying the ratio of 1.7245 from [3]. For the uniform FTRA, we provide a 1.52-approximation primal-dual algorithm in O(n^4) time, where n is the total number of sites and clients. We also consider the Constrained Fault-Tolerant k-Resource Allocation (k-FTRA) problem where additionally the total number of facilities can be opened across all sites is bounded by k. For the uniform k-FTRA, we give the first constant-factor approximation algorithm with a factor of 4. Note that the above results carry over to FTRA_{\infty} and k-FTRA_{\infty}.
    08/2012;
  • Kewen Liao, Hong Shen
    ABSTRACT: We initiate the study of the Reliable Resource Allocation (RRA) problem. In this problem, we are given a set of sites equipped with an unbounded number of facilities as resources. Each facility has an opening cost and an estimated reliability. There is also a set of clients to be allocated to facilities with corresponding connection costs. Each client has a reliability requirement (RR) for accessing resources. The objective is to open a subset of facilities from sites to satisfy all clients' RRs at a minimum total cost. The Unconstrained Fault-Tolerant Resource Allocation (UFTRA) problem studied in (Liao & Shen 2011) is a special case of RRA. In this paper, we present two equivalent primal-dual algorithms for the RRA problem, where the second one is an acceleration of the first and runs in quasi-linear time. If all clients have the same RR above the threshold that a single facility can provide, our analysis of the algorithm yields an approximation factor of 2+2√2 and later a reduced ratio of 3.722 using a factor revealing program. The analysis further elaborates and generalizes the generic inverse dual fitting technique introduced in (Xu & Shen 2009). As a by-product, we also formalize this technique for the classical minimum set cover problem.
    Proceedings of the Eighteenth Computing: The Australasian Theory Symposium - Volume 128; 01/2012
  • Longkun Guo, Hong Shen
    ABSTRACT: The min–min problem of finding a disjoint path pair with the length of the shorter path minimized is known to be NP-complete (Xu et al., 2006) [1]. In this paper, we prove that in planar digraphs the edge-disjoint min–min problem remains NP-complete and admits no K-approximation for any K>1 unless P=NP. As a by-product, we show that this problem remains NP-complete even when all edge costs are equal (i.e., strongly NP-complete). To our knowledge, this is the first NP-completeness proof for the edge-disjoint min–min problem in planar digraphs.
    Theoretical Computer Science 01/2012; 432:58–63. · 0.49 Impact Factor
  • ABSTRACT: Combining manifold ranking with active learning (MRAL for short) is one popular and successful technique for relevance feedback in content-based image retrieval (CBIR). Despite the success, conventional MRAL has two main drawbacks. First, the performance of manifold ranking is very sensitive to the scale parameter used for calculating the Laplacian matrix. Second, conventional MRAL does not take into account the redundancy among examples and thus could select multiple examples that are similar to each other. In this work, a novel MRAL framework is presented to address the drawbacks. Concretely, we first propose a self-tuning manifold ranking algorithm that can adaptively calculate the Laplacian matrix via a local scaling mechanism, and then develop a hybrid active learning algorithm by integrating three well-known selective sampling criteria, which is able to effectively and efficiently identify the most informative and diversified examples for the user to label. Experiments on 10,000 Corel images show that the proposed method is significantly more effective than some existing approaches.
    Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2012 13th International Conference on; 01/2012
  • ABSTRACT: Many real applications, such as network traffic monitoring, intrusion detection, satellite remote sensing, and electronic business, generate data in the form of a stream arriving continuously at high speed. Clustering is an important data analysis tool for knowledge discovery. Compared with traditional clustering algorithms, clustering stream data is an important and challenging problem which has attracted many researchers. Clustering stream data faces two main challenges. First, as the data arrives continuously at a high rate and the computer storage capacity is limited, raw data can only be scanned in one pass. Second, stream data is always changing with time, so viewing a data stream as a set of static data can deteriorate the clustering quality. In fact, users are more concerned with the evolving behaviors of clusters, which can help people make correct decisions. This paper proposes a density-grid based clustering algorithm, PKS-Stream-I, for stream data. It is an optimization of PKS-Stream in density detection period selection, sporadic grid detection and removal. Empirical results show the proposed method yields better performance.
    Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2012 13th International Conference on; 01/2012
  • Longkun Guo, Hong Shen
    ABSTRACT: For a given graph G with distinct vertices s, t and a given delay constraint D ∈ R^+, the k-disjoint restricted shortest path (kRSP) problem of computing k disjoint minimum-cost st-paths with total delay restrained by D is known to be NP-hard. Bifactor approximation algorithms have been developed for its special case when k = 2, while no approximation algorithm with constant single factor or bifactor ratio has been developed for general k. This paper first presents a (k, (1 + ε)H(k))-approximation algorithm for the kRSP problem by extending Orda's factor-(1.5, 1.5) approximation algorithm [9]. Secondly, this paper gives a novel linear programming (LP) formula for the kRSP problem. Based on LP rounding technology, this paper rounds an optimal solution of this formula and obtains an approximation algorithm within a bifactor ratio of (2, 2). To the best of our knowledge, it is the first approximation algorithm with constant bifactor ratio for the kRSP problem. Our results can be applied to serve applications in networks which require quality of service and robustness simultaneously, and also have broad applications in construction of survivable networks and fault tolerance systems.
    Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2012 13th International Conference on; 01/2012
  • ABSTRACT: The multicast problem in multi-channel multi-radio wireless mesh networks has received much attention recently. Most recent studies on this problem focus on improving the network throughput. However, many real-world applications require routing algorithms to achieve low delay and low loss. In this paper, we tackle the problem of constructing a robust minimum-cost multicast tree that tolerates link interference. To save bandwidth resource and alleviate the interference in the communication, we propose a robust multicast algorithm for multi-channel multi-radio wireless mesh networks. Our experimental results show that our algorithm efficiently achieves better network throughput and end-to-end delay than previous studies.
    Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2012 13th International Conference on; 01/2012
  • Yingpeng Sang, Hong Shen, Hui Tian
    ABSTRACT: Random Projection (RP) has raised great concern among the research community of privacy-preserving data mining, due to its high efficiency and utility, e.g., keeping the Euclidean distances among the data points. It was shown in [33] that, if the original data set composed of m attributes is multiplied by a mixing matrix of k × m (m > k) which is random and orthogonal on expectation, then the k series of perturbed data can be released for mining purposes. Given the data perturbed by RP and some necessary prior knowledge, to our knowledge, little work has been done in reconstructing the original data to recover some sensitive information. In this paper, we choose several typical scenarios in data mining with different assumptions on prior knowledge. For the cases that an attacker has full or zero knowledge of the mixing matrix R, respectively, we propose reconstruction methods based on Underdetermined Independent Component Analysis (UICA) if the attributes of the original data are mutually independent and sparse, and propose reconstruction methods based on Maximum A Posteriori (MAP) if the attributes of the original data are correlated and nonsparse. Simulation results show that our reconstructions achieve high recovery rates, and outperform the reconstructions based on Principal Component Analysis (PCA). Successful reconstructions essentially mean the leakage of privacy, so our work identifies the possible risks of RP when it is used for data perturbations.
    IEEE Transactions on Computers 01/2012; 61:101-117. · 1.38 Impact Factor

Publication Stats

1k Citations
61.20 Total Impact Points

Institutions

  • 2006–2014
    • University of Adelaide
      • School of Computer Science
      Tarndarnya, South Australia, Australia
    • Manchester Metropolitan University
      Manchester, England, United Kingdom
  • 2013
    • Sun Yat-Sen University
      Guangzhou, Guangdong, China
  • 2008–2013
    • Beijing Jiaotong University
      • School of Computer and Information Technology
      • Department of Computer Science
      Beijing, China
  • 2006–2008
    • University of Science and Technology of China
      • School of Computer Science and Technology
      Hefei, Anhui, China
  • 2007
    • University of Texas at Dallas
      • Department of Computer Science
      Dallas, TX, United States
  • 2001–2007
    • Japan Advanced Institute of Science and Technology
      • School of Information Science
      Nomi, Ishikawa, Japan
    • Georgia State University
      • Department of Computer Science
      Atlanta, Georgia, United States
  • 2003–2006
    • Fudan University
      • School of Computer Science
      Shanghai, Shanghai Shi, China
    • The Hong Kong Polytechnic University
      • Department of Computing
      Hong Kong, Hong Kong
  • 2005
    • Texas A&M University
      • Department of Computer Science and Engineering
      College Station, TX, United States
  • 1995–2001
    • Griffith University
      • School of Information and Communication Technology (ICT)
      Southport, Queensland, Australia
    • Australian National University
      • Research School of Computer Science
      Canberra, Australian Capital Territory, Australia
  • 2000
    • University of Dayton
      • Department of Computer Science
      Dayton, Ohio, United States