Hong Shen

University of Adelaide, Tarndarnya, South Australia, Australia

Publications (346) · 123.93 Total impact

  • Yanbo Wu · Hong Shen · Quan Z. Sheng
    ABSTRACT: In the emerging environment of the Internet of Things (IoT), through the connection of billions of radio frequency identification (RFID) tags and sensors to the Internet, applications will generate an unprecedented number of transactions and amount of data that require novel approaches in mining useful information from RFID trajectories. RFID data usually contain a considerable degree of uncertainty caused by various factors such as hardware flaws, transmission faults and environment instability. In this paper, we propose an efficient clustering algorithm that is much less sensitive to noise and outliers than the existing methods. To better facilitate the emerging cloud computing resources, our algorithm is designed cloud-friendly so that it can be easily adopted in a cloud environment. The scalability and efficiency of the proposed algorithm are demonstrated through an extensive set of experimental studies.
    IEEE Transactions on Parallel and Distributed Systems 08/2015; 26(8):2075-2088. DOI:10.1109/TPDS.2014.2347286 · 2.17 Impact Factor
  • Ping He · Hong Shen · Hui Tian
    ABSTRACT: On-demand data broadcast (ODDB) has attracted increasing interest due to its efficiency of disseminating information in many real-world applications such as mobile social services, mobile payment and mobile e-commerce. In an ODDB system, the server places client requested data items received from the uplink to a set of downlink channels for downloading by the clients. Most existing work focused on how to allocate client requested data items to multiple channels for efficient downloading, but did not consider the time constraint of downloading which is critical for many real-world applications. For a set of requests with deadlines for downloading, this paper proposes an effective algorithm to broadcast data items of each request within its specified deadline using multiple channels under the well-known 2-conflict constraint: two data items conflict if they are broadcast in the same time slot or two adjacent time slots in different channels. Our algorithm adopts an approach of allocating most urgent and popular data item first (UPF) for minimizing the overall deadline miss ratio. The performance of the UPF method has been validated by extensive experiments on real-world data sets against three popular on-demand data broadcast schemes.
    Journal of Systems and Software 05/2015; 103. DOI:10.1016/j.jss.2015.01.022 · 1.25 Impact Factor
  • Longkun Guo · Kewen Liao · Hong Shen · Peng Li
    ABSTRACT: Network applications, such as multimedia streaming and video conferencing, impose growing requirements over Quality of Service (QoS), including bandwidth, delay, jitter, etc. Meanwhile, networks are expected to be load-balanced, energy-efficient, and resilient to some degree of failures. It is observed that the above requirements could be better met with multiple disjoint QoS paths than a single one. Let $G=(V,\,E)$ be a digraph with nonnegative integral cost and delay on every edge, $s,\,t\in V$ be two specified vertices, and $D\in\mathbb{Z}_{0}^{+}$ be a delay bound (or some other constraint). The $k$ Disjoint Restricted Shortest Path ($k$RSP) problem is to compute $k$ disjoint paths between $s$ and $t$ with total cost minimized and total delay bounded by $D$. Few efficient algorithms have been developed because of the hardness of the problem. In this paper, we propose efficient algorithms with provable performance guarantees for the $k$RSP problem. We first present a pseudo-polynomial-time approximation algorithm with a bifactor approximation ratio of $(1,\,2)$, then improve the algorithm to polynomial time with a bifactor ratio of $(1+\epsilon,\,2+\epsilon)$ for any fixed $\epsilon>0$, which is better than the current best approximation ratio $(O(1+\gamma),\,O(1+\frac{1}{\gamma}))$ for any fixed $\gamma>0$ \cite{orda2004efficient}. To the best of our knowledge, this is the first constant-factor algorithm that almost strictly obeys the constraint for the $k$RSP problem.
  • Longkun Guo · Wenxing Zhu · Hong Shen
    ABSTRACT: In mobile networks, wireless data broadcast is a powerful approach for disseminating a large number of data copies to a great number of clients. The largest weight data retrieval (LWDR) problem, first addressed by \cite{Infocom12LuEfficient}, is to schedule the download of a specified subset of data items in a given time interval, such that the weight of the downloaded items will be maximized. In this paper, we present an approximation algorithm with ratio $0.632$ for LWDR via approximating the maximum sectionalized coverage (MSC) problem, a generalization of the maximum coverage problem which is one of the most famous $\mathcal{NP}$-complete problems. Let $\mathbb{F}=\{\mathbb{F}_{1},\dots,\mathbb{F}_{N}\}$, in which $\mathbb{F}_{i}=\{S_{i,\,j}\mid j=1,\,2,\,\dots\}$ is a collection of subsets of $S=\{u_{1},\,u_{2},\,\dots,\,u_{n}\}$, and $w(u_{i})$ be the weight of $u_{i}$. MSC is to select at most one $S_{i,\,j_{i}}$ from each $\mathbb{F}_{i}$, such that $\sum_{u_{i}\in S'}w(u_{i})$ is maximized, where $S'=\bigcup_{i=1}^{N}S_{i,\,j_{i}}$. First, the paper presents a factor-$0.632$ approximation algorithm for MSC by giving a novel linear programming (LP) formula and employing the randomized LP rounding technique. By reducing from the maximum 3 dimensional matching problem, the paper then shows that MSC is $\mathcal{NP}$-complete even when every $S\in\mathbb{F}_{i}$ is with cardinality 2, i.e. $|S|=2$. Last but not least, the paper gives a method transforming any instance of LWDR to an instance of MSC, and shows that an approximation for MSC can be applied to LWDR almost preserving the approximation ratio. That is a factor-$0.632$ approximation for LWDR, improving the currently best ratio of $0.5$ in \cite{Infocom12LuEfficient}.
  • Ke Ji · Hong Shen
    ABSTRACT: The cold start problem for new users and new items is a major challenge facing most collaborative filtering systems. Existing collaborative filtering (CF) methods emphasize scaling to large and sparse datasets but lack a scalable approach to dealing with new data. In this paper, we consider a novel method for alleviating the problem by incorporating content-based information about users and items, i.e., tags and keywords. The user-item ratings imply the relevance of users’ tags to items’ keywords, so we convert the direct prediction on the user-item rating matrix into the indirect prediction on the tag-keyword relation matrix, which adapts to the emergence of new data. We first propose a novel neighborhood approach for building the tag-keyword relation matrix based on the statistics of tag-keyword pairs in the ratings. Then, with the relation matrix, we propose a 3-factor matrix factorization model over the rating matrix, for learning every user’s interest vector for selected tags and every item’s correlation vector for extracted keywords. Finally, we integrate the relation matrix with the two kinds of vectors to make recommendations. Experiments on a real dataset demonstrate that our method not only outperforms other state-of-the-art CF algorithms for historical data, but also has good scalability for new data.
    Knowledge-Based Systems 03/2015; 83. DOI:10.1016/j.knosys.2015.03.008 · 3.06 Impact Factor
  • Ke Ji · Hong Shen
    ABSTRACT: Group-aware collaborative filtering (CF) has recently become a hot research topic in recommender systems, which typically divides a large CF task on the entire data (i.e. rating matrix) into some smaller CF tasks on subgroups (i.e., sub-matrices). This leads to an effective way to improve current CF systems in accuracy and efficiency. However, existing approaches consider each subgroup separately, ignoring relationships among subgroups. In this paper, motivated by the intuition that there are similar users or items among different subgroups, we propose an improved group-aware CF algorithm which predicts a rating using a weighted sum of similar ratings from multiple subgroups. Our algorithm is based on Matrix Factorization and CodeBook Transfer (CBT), especially that we construct N matrix approximations based on N best sub-matrices, and then integrate the N approximations via a linear combination. We conduct experiments on real-life data to evaluate the performance of our algorithm in comparison with traditional CF algorithms and other state-of-the-art social and group-aware recommendation models. The empirical result and analysis demonstrate that our algorithm achieves a significant increase in recommendation accuracy.
    Neurocomputing 03/2015; 165. DOI:10.1016/j.neucom.2015.03.013 · 2.01 Impact Factor
  • Qinghai Liu · Hong Shen · Yingpeng Sang
    ABSTRACT: Anonymized data publication has received considerable attention from the research community in recent years. For numerical sensitive attributes, most of the existing privacy-preserving data publishing techniques concentrate on microdata with multiple categorical sensitive attributes or only one numerical sensitive attribute. However, many real-world applications can contain multiple numerical sensitive attributes. Directly applying the existing privacy-preserving techniques for single-numerical-sensitive-attribute and multiple-categorical-sensitive-attributes often causes unexpected disclosure of private information. These techniques are particularly prone to the proximity breach, which is a privacy threat specific to numerical sensitive attributes in data publication. In this paper, we propose a privacy-preserving data publishing method, namely MNSACM, which uses the ideas of clustering and Multi-Sensitive Bucketization (MSB) to publish microdata with multiple numerical sensitive attributes. We use an example to show the effectiveness of this method in privacy protection when using multiple numerical sensitive attributes.
    Tsinghua Science & Technology 01/2015; 20(3):246-254.
  • Wenhao Shu · Hong Shen
    ABSTRACT: Feature selection plays a vital role in many areas of pattern recognition and data mining. The effective computation of feature selection is important for improving the classification performance. In rough set theory, many feature selection algorithms have been proposed to process static incomplete data. However, feature values in an incomplete data set may vary dynamically in real-world applications. For such dynamic incomplete data, a classic (non-incremental) approach of feature selection is usually computationally time-consuming. To overcome this disadvantage, we propose an incremental approach for feature selection, which can accelerate the feature selection process in dynamic incomplete data. We firstly employ an incremental manner to compute the new positive region when feature values with respect to an object set vary dynamically. Based on the calculated positive region, two efficient incremental feature selection algorithms are developed respectively for single object and multiple objects with varying feature values. Then we conduct a series of experiments with 12 UCI real data sets to evaluate the efficiency and effectiveness of our proposed algorithms. The experimental results show that the proposed algorithms compare favorably with that of applying the existing non-incremental methods.
    Pattern Recognition 12/2014; 47(12):3890–3906. DOI:10.1016/j.patcog.2014.06.002 · 2.58 Impact Factor
  • Ping He · Hong Shen · Longkun Guo · Yidong Li
    ABSTRACT: Given a set of multiple channels, a set of multiple requests, where each request contains multiple requested data items and a client equipped with multiple antennae, the multi-antenna-based multirequest data retrieval problem (DRMR-MA) is to find a data retrieval sequence for downloading all data items of the requests allocated to each antenna, such that the maximum access latency of all antennae is minimized. Most existing approaches for the data retrieval problem focus on either single antenna or single request and are hence not directly applicable to DRMR-MA for retrieving multiple requests. This paper proposes two data retrieval algorithms that adopt two different grouping schemes to solve DRMR-MA so that the requests can be suitably allocated to each antenna. To find the data retrieval sequence of each request efficiently, we present a data retrieval scheme that converts a wireless data broadcast system to a special tree. Experimental results show that the proposed scheme is more efficient than other existing schemes. Copyright © 2014 John Wiley & Sons, Ltd.
    International Journal of Communication Systems 12/2014; DOI:10.1002/dac.2917 · 1.11 Impact Factor
  • Hui Tian · Binze Zhong · Hong Shen
    ABSTRACT: Traffic matrix (TM) describes the traffic volumes traversing a network from the input nodes to the output nodes over a measured period. Such a TM contains very useful information for network managers, traffic engineers and users. However, TM is hard to be obtained and analyzed due to its large size, especially for large-scale networks. In this paper, we present a new method based on diffusion wavelets for analyzing the traffic matrix. It is shown that this method can conduct efficient multi-resolution analysis (MRA) on TM. We compare the analysis results by using different diffusion operators. Through reconstructing the original TM from the diffused traffic on a particular level, we show the high efficiency of this MRA tool based on these operators. We then develop an anomaly detection method based on the analysis results and explore the possibilities of other potential applications.
    Computers & Electrical Engineering 08/2014; 40(6). DOI:10.1016/j.compeleceng.2014.04.021 · 0.99 Impact Factor
  • Hong Shen · Shaohua Tang
    The Journal of Supercomputing 08/2014; 69(2):509-511. DOI:10.1007/s11227-014-1244-4 · 0.84 Impact Factor
  • Hong Shen · Yingpeng Sang · Yidong Li
    Journal of Interconnection Networks 05/2014; 14(03). DOI:10.1142/S0219265913020015
  • Wenhao Shu · Hong Shen
    ABSTRACT: In rough set theory, attribute reduction is a challenging problem in applications where data with many attributes are available. Moreover, due to the dynamic characteristics of data collection in decision systems, attribute reduction will change dynamically as the attribute set in a decision system varies over time. Updating attribute reduction by utilizing previous information is an important task that can help to improve the efficiency of knowledge discovery. Since attribute reduction algorithms for incomplete decision systems with a varying attribute set have not yet been discussed, this paper focuses on a positive region-based attribute reduction algorithm to solve the attribute reduction problem efficiently in incomplete decision systems with a dynamically varying attribute set. We first introduce an incremental manner to calculate the new positive region and tolerance classes. Then, based on the calculated positive region and tolerance classes, the corresponding algorithms for computing a new attribute reduct are put forward for the cases in which an attribute set is added to or deleted from the incomplete decision system. Finally, numerical experiments conducted on different data sets from UCI validate the effectiveness and efficiency of the proposed algorithms in incomplete decision systems with the variation of attribute set.
    International Journal of Approximate Reasoning 03/2014; 55(3):867–884. DOI:10.1016/j.ijar.2013.09.015 · 1.98 Impact Factor
  • Hong Shen · Yingpeng Sang
    Computer Science and Information Systems 01/2014; 11(1):VII-VIII. · 0.58 Impact Factor
  • Hui Tian · Yingpeng Sang · Hong Shen · Chunyue Zhou
    Computer Science and Information Systems 01/2014; 11(1):309-320. DOI:10.2298/CSIS130212010T · 0.58 Impact Factor
  • Kewen Liao · Hong Shen
    ABSTRACT: We initiate the study of the reliable resource allocation (RRA) problem. In this problem, we are given a set of sites F each with an unconstrained number of facilities as resources. Every facility at site i ∈ F has an opening cost and a service reliability p(i). There is also a set of clients C to be allocated to facilities. Every client j ∈ C accesses a facility at i with a connection cost and reliability l(ij). In addition, every client j has a minimum reliability requirement (MRR) r(j) for accessing facilities. The objective of the problem is to decide the number of facilities to open at each site and connect these facilities to clients such that all clients' MRRs are satisfied at a minimum total cost. The unconstrained fault-tolerant resource allocation problem studied in Liao and Shen [(2011) Unconstrained and Constrained Fault-Tolerant Resource Allocation. Proceedings of the 17th Annual International Conference on Computing and Combinatorics (COCOON), Dallas, Texas, USA, August 14-16, pp. 555-566. Springer, Berlin] is a special case of RRA. Both of these resource allocation problems are derived from the classical facility location theory. In this paper, for solving the general RRA problem, we develop two equivalent primal-dual algorithms where the second one is an acceleration of the first and runs in quasi-quadratic time. In the algorithm's ratio analysis, we first obtain a constant approximation factor of 2+2√2 and then a reduced ratio of 3.722 using a factor revealing program, when l(ij)'s are uniform on i (partially uniform) and r(j)'s are uniform above the threshold reliability that a single access to a facility is able to provide. The analysis further elaborates and generalizes the inverse dual-fitting technique introduced in Xu and Shen [(2009) The Fault-Tolerant Facility Allocation Problem. Proceedings of the 20th International Symposium on Algorithms and Computation (ISAAC), Honolulu, HI, USA, December 16-18, pp. 689-698. Springer, Berlin]. Moreover, we formalize this technique for analyzing the minimum set cover problem. For a special case of RRA, where all r(j)'s and l(ij)'s are uniform, we derive its approximation ratio through a novel reduction to the uncapacitated facility location problem. The reduction demonstrates some useful and generic linear programming techniques.
    The Computer Journal 12/2013; 57(1):154-164. DOI:10.1093/comjnl/bxs164 · 0.89 Impact Factor
  • Ping He · Hong Shen · Hui Tian
    ABSTRACT: Given a set of data items broadcast on multiple parallel channels, where each channel has the same broadcast pattern over a time period, and a set of client's requested data items, the data retrieval problem is to find a sequence of channel accesses to retrieve the requested data items among the channels such that the total access latency is minimized, where both channel access (to retrieve a data item) and channel switch are assumed to take a single time slot. As an important problem of information retrieval in wireless networks, this problem arises in many applications such as e-commerce and ubiquitous data sharing, and is known to involve two conflicts: requested data items are broadcast in the same time slot or in adjacent time slots on different channels. Although existing studies focus on this problem with one conflict, there is little work on the problem with two conflicts. This paper therefore proposes efficient algorithms from two perspectives: single antenna and multiple antennae. Our algorithm adopts a novel approach in which the wireless data broadcast system is converted to a DAG, and applies set cover to solve the problem. Experiments show that this is currently the most efficient algorithm for the problem with two conflicts.
    Proceedings of International Conference on Advances in Mobile Computing & Multimedia; 12/2013
  • ABSTRACT: Learning similarity measure from relevance feedback has become a promising way to enhance the image retrieval performance. Existing approaches mainly focus on taking short-term learning experience to identify a visual similarity measure within a single query session, or applying long-term learning methodology to infer a semantic similarity measure crossing multiple query sessions. However, there is still considerable room to improve retrieval effectiveness, because little is known about taking the relationship between visual similarity and semantic similarity into account. In this paper, we propose a novel hybrid similarity learning scheme to preserve both visual and semantic resemblance by integrating short-term with long-term learning processes. Concretely, the proposed scheme first learns a semantic similarity from the users' query log, and then, taking this as prior knowledge, learns a visual similarity from a mixture of labeled and unlabeled images. In particular, unlabeled images are exploited for the relevant and irrelevant classes differently and the visual similarity is learned incrementally. Finally, a hybrid similarity measure is produced by fusing the visual and semantic similarities in a nonlinear way for image ranking. An empirical study shows that using hybrid similarity measure for image retrieval is beneficial, and the proposed algorithm achieves better performance than some existing approaches.
    Pattern Recognition 11/2013; 46(11):2927–2939. DOI:10.1016/j.patcog.2013.04.008 · 2.58 Impact Factor
  • Yanbo Wu · Q.Z. Sheng · Hong Shen · Sherali Zeadally
    ABSTRACT: In the emerging environment of the Internet of things (IoT), through the connection of billions of radio frequency identification (RFID) tags and sensors to the Internet, applications will generate an unprecedented number of transactions and amount of data that require novel approaches in RFID data stream processing and management. Unfortunately, it is difficult to maintain a distributed model without a shared directory or structured index. In this paper, we propose a fully distributed model for federated RFID data streams. This model combines two techniques, namely, tilted time frame and histogram to represent the patterns of object flows. Our model is efficient in space and can be stored in main memory. The model is built on top of an unstructured P2P overlay. To reduce the overhead of distributed data acquisition, we further propose several algorithms that use a statistically minimum number of network calls to maintain the model. The scalability and efficiency of the proposed model are demonstrated through an extensive set of experiments.
    IEEE Transactions on Parallel and Distributed Systems 10/2013; 24(10):2036-2045. DOI:10.1109/TPDS.2013.99 · 2.17 Impact Factor
  • Hong Shen · Longkun Guo
    ABSTRACT: For a given undirected (edge) weighted graph G = (V, E), a terminal set S ⊆ V and a root r ∈ S, the rooted k-vertex connected minimum Steiner network (kVSMNr) problem requires to construct a minimum-cost subgraph of G such that each terminal in S \ {r} is k-vertex connected to r. As an important problem in survivable network design, the kVSMNr problem is known to be NP-hard even when k = 1 [14]. For k = 3 this paper presents a simple combinatorial eight-approximation algorithm, improving the known best ratio 14 of Nutov [20]. Our algorithm constructs an approximate 3VSMNr through augmenting a two-vertex connected counterpart with additional edges of bounded cost to the optimal. We prove that the total cost of the added edges is at most six times the optimal by showing that the edges in a 3VSMNr compose a subgraph containing our solution in such a way that each edge appears in the subgraph at most six times.
    IEEE Transactions on Computers 09/2013; 62(9):1684-1693. DOI:10.1109/TC.2012.170 · 1.47 Impact Factor

Publication Stats

2k Citations
123.93 Total Impact Points

Institutions

  • 2006–2015
    • University of Adelaide
      • School of Computer Science
      Tarndarnya, South Australia, Australia
    • University of Waterloo
      • Department of Electrical & Computer Engineering
      Waterloo, Ontario, Canada
    • Manchester Metropolitan University
      Manchester, England, United Kingdom
  • 2014
    • Sun Yat-Sen University
      Shengcheng, Guangdong, China
  • 2013
    • Beijing Jiaotong University
      • School of Computer and Information Technology
      Peping, Beijing, China
  • 2010
    • Hefei University of Technology
      Luchow, Anhui Sheng, China
  • 2006–2010
    • University of Science and Technology of China
      • School of Computer Science and Technology
      Hefei, Anhui Sheng, China
  • 2001–2008
    • Japan Advanced Institute of Science and Technology
      • School of Information Science
      KMQ, Ishikawa, Japan
  • 2007
    • University of Texas at Dallas
      • Department of Computer Science
      Dallas, TX, United States
  • 2006–2007
    • Yunnan Agricultural University
      Panlong, Yunnan, China
  • 2005–2006
    • Fudan University
      • School of Computer Science
      Shanghai, Shanghai Shi, China
  • 1993–2001
    • Griffith University
      • School of Information and Communication Technology (ICT)
      Southport, Queensland, Australia
  • 2000
    • University of Dayton
      • Department of Computer Science
      Dayton, Ohio, United States
  • 1995
    • Australian National University
      • Research School of Computer Science
      Canberra, Australian Capital Territory, Australia