
ABSTRACT: On-demand data broadcast (ODDB) has attracted increasing interest due to its efficiency in disseminating information in many real-world applications such as mobile social services, mobile payment and mobile e-commerce. In an ODDB system, the server places client-requested data items received from the uplink onto a set of downlink channels for downloading by the clients. Most existing work has focused on how to allocate client-requested data items to multiple channels for efficient downloading, but has not considered the time constraint on downloading, which is critical for many real-world applications. For a set of requests with download deadlines, this paper proposes an effective algorithm to broadcast the data items of each request within its specified deadline using multiple channels under the well-known 2-conflict constraint: two data items conflict if they are broadcast in the same time slot or in two adjacent time slots in different channels. Our algorithm allocates the most urgent and popular data item first (UPF) to minimize the overall deadline miss ratio. The performance of the UPF method has been validated by extensive experiments on real-world data sets against three popular on-demand data broadcast schemes. Journal of Systems and Software 05/2015; 103. DOI:10.1016/j.jss.2015.01.022 · 1.25 Impact Factor
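The 2-conflict constraint stated in the abstract can be illustrated with a small predicate. This is only a sketch under assumptions: the `(channel, slot)` tuple representation and the name `in_conflict` are hypothetical, and the rule is read as applying only to items on different channels.

```python
from typing import Tuple

# Hypothetical representation: (channel index, time slot) of one broadcast item.
Placement = Tuple[int, int]


def in_conflict(a: Placement, b: Placement) -> bool:
    """Return True if two placements violate the 2-conflict rule.

    Assumed reading of the abstract: items on different channels conflict
    when they occupy the same time slot or two adjacent time slots.
    """
    (ch_a, slot_a), (ch_b, slot_b) = a, b
    if ch_a == ch_b:
        # Items on the same channel need no channel switch, so no conflict.
        return False
    return abs(slot_a - slot_b) <= 1
```

Under this reading, a client with a single antenna cannot download both items of a conflicting pair within the same broadcast cycle.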

Source Available from: Peng Li
ABSTRACT: Network applications, such as multimedia streaming and video conferencing,
impose growing requirements over Quality of Service (QoS), including bandwidth,
delay, jitter, etc. Meanwhile, networks are expected to be load-balanced,
energy-efficient, and resilient to some degree of failure. It is observed that
the above requirements could be better met with multiple disjoint QoS paths
than a single one. Given a digraph $G=(V,\, E)$ with nonnegative integral cost
and delay on every edge, two specified vertices $s,\, t\in V$, and a delay
bound (or some other constraint) $D\in\mathbb{Z}_{0}^{+}$, the
\emph{$k$ Disjoint Restricted Shortest Path} (\emph{$k$RSP}) \emph{Problem} is
to compute $k$ disjoint paths between $s$ and $t$ with total cost minimized and
total delay bounded by $D$. Few efficient algorithms have been developed
because of the hardness of the problem.
In this paper, we propose efficient algorithms with provable performance
guarantees for the $k$RSP problem. We first present a pseudo-polynomial-time
approximation algorithm with a bifactor approximation ratio of $(1,\,2)$, then
improve the algorithm to polynomial time with a bifactor ratio of
$(1+\epsilon,\,2+\epsilon)$ for any fixed $\epsilon>0$, which is better than
the current best approximation ratio $(O(1+\gamma),\, O(1+\frac{1}{\gamma}))$
for any fixed $\gamma>0$ \cite{orda2004efficient}. To the best of our
knowledge, this is the first constant-factor algorithm that almost strictly
obeys the constraint for the $k$RSP problem.
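
The problem statement above can be written compactly as an optimization problem. The following is a sketch rather than the paper's own formulation: the edge functions $c(\cdot)$ and $d(\cdot)$ and the path names $P_{1},\dots,P_{k}$ are notation assumed here for illustration.

```latex
\begin{align*}
\min_{P_{1},\dots,P_{k}}\quad & \sum_{i=1}^{k}\sum_{e\in P_{i}} c(e) \\
\text{s.t.}\quad & \sum_{i=1}^{k}\sum_{e\in P_{i}} d(e)\le D, \\
 & P_{1},\dots,P_{k}\ \text{are pairwise disjoint paths from } s \text{ to } t \text{ in } G.
\end{align*}
```

Under the usual convention, which the closing sentence of the abstract suggests but does not state, a bifactor ratio $(\alpha,\,\beta)$ means the returned paths have total delay at most $\alpha D$ and total cost at most $\beta$ times the optimum.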

ABSTRACT: In mobile networks, wireless data broadcast is a powerful approach for
disseminating a large number of data copies to a great number of clients. The
largest weight data retrieval (LWDR) problem, first addressed by
\cite{Infocom12LuEfficient}, is to schedule the download of a specified subset
of data items in a given time interval, such that the weight of the downloaded
items will be maximized.
In this paper, we present an approximation algorithm with ratio $0.632$ for
LWDR via approximating the maximum sectionalized coverage (MSC) problem, a
generalization of the maximum coverage problem, which is one of the most famous
${\cal NP}$-complete problems. Let
$\mathbb{F}=\{\mathbb{F}_{1},\dots,\mathbb{F}_{N}\}$, in which
$\mathbb{F}_{i}=\{S_{i,\, j}\mid j=1,\,2,\,\dots\}$ is a collection of subsets
of $S=\{u_{1},\, u_{2},\,\dots,\, u_{n}\}$, and let $w(u_{i})$ be the weight of
$u_{i}$. MSC is to select at most one $S_{i,\, j_{i}}$ from each
$\mathbb{F}_{i}$, such that $\sum_{u_{i}\in S'}w(u_{i})$ is maximized, where
$S'=\bigcup_{i=1}^{N}S_{i,\, j_{i}}$. First, the paper presents
a factor-$0.632$ approximation algorithm for MSC by giving a novel linear
programming (LP) formulation and employing the randomized LP rounding technique. By
reduction from the maximum 3-dimensional matching problem, the paper then shows
that MSC is ${\cal NP}$-complete even when every $S\in\mathbb{F}_{i}$ has
cardinality 2, i.e. $|S|=2$. Last but not least, the paper gives a method for
transforming any instance of LWDR into an instance of MSC, and shows that an
approximation for MSC can be applied to LWDR while almost preserving the
approximation ratio. This yields a factor-$0.632$ approximation for LWDR, improving
the currently best ratio of $0.5$ in \cite{Infocom12LuEfficient}.
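
To make the MSC objective concrete, the sketch below evaluates the weight of one candidate selection. It is not the paper's LP-rounding algorithm; the function name `msc_weight` and the list-of-lists representation of $\mathbb{F}$ are assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Optional, Set


def msc_weight(
    families: List[List[Set[Hashable]]],
    choice: List[Optional[int]],
    weight: Dict[Hashable, int],
) -> int:
    """Total weight of the elements covered by one MSC selection.

    families[i][j] plays the role of S_{i,j}; choice[i] is the index j_i
    picked from family i, or None when nothing is picked from that family.
    """
    covered: Set[Hashable] = set()
    for family, j in zip(families, choice):
        if j is not None:
            covered |= family[j]  # union of the selected subsets
    # Each covered element counts once, however many subsets contain it.
    return sum(weight[u] for u in covered)


families = [[{"u1", "u2"}, {"u3"}], [{"u2", "u3"}]]
weight = {"u1": 1, "u2": 2, "u3": 4}
best = msc_weight(families, [0, 0], weight)  # covers u1, u2 and u3
```

The "at most one subset per family" restriction is what separates MSC from plain maximum coverage, where any collection of subsets up to a size budget may be chosen.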

ABSTRACT: The cold-start problem for new users and new items is a major challenge facing most collaborative filtering systems. Existing collaborative filtering (CF) methods emphasize scaling to large and sparse datasets but lack a scalable approach to dealing with new data. In this paper, we consider a novel method for alleviating the problem by incorporating content-based information about users and items, i.e., tags and keywords. The user-item ratings imply the relevance of users’ tags to items’ keywords, so we convert the direct prediction on the user-item rating matrix into an indirect prediction on the tag-keyword relation matrix, which adapts to the emergence of new data. We first propose a novel neighborhood approach for building the tag-keyword relation matrix based on the statistics of tag-keyword pairs in the ratings. Then, with the relation matrix, we propose a 3-factor matrix factorization model over the rating matrix, for learning every user’s interest vector for selected tags and every item’s correlation vector for extracted keywords. Finally, we integrate the relation matrix with the two kinds of vectors to make recommendations. Experiments on a real dataset demonstrate that our method not only outperforms other state-of-the-art CF algorithms on historical data, but also has good scalability for new data. Knowledge-Based Systems 03/2015; 83. DOI:10.1016/j.knosys.2015.03.008 · 3.06 Impact Factor

ABSTRACT: Group-aware collaborative filtering (CF) has recently become a hot research topic in recommender systems. It typically divides a large CF task on the entire data (i.e., the rating matrix) into smaller CF tasks on subgroups (i.e., submatrices), which offers an effective way to improve the accuracy and efficiency of current CF systems. However, existing approaches consider each subgroup separately, ignoring relationships among subgroups. In this paper, motivated by the intuition that there are similar users or items among different subgroups, we propose an improved group-aware CF algorithm which predicts a rating using a weighted sum of similar ratings from multiple subgroups. Our algorithm is based on Matrix Factorization and CodeBook Transfer (CBT); specifically, we construct N matrix approximations based on the N best submatrices, and then integrate the N approximations via a linear combination. We conduct experiments on real-life data to evaluate the performance of our algorithm in comparison with traditional CF algorithms and other state-of-the-art social and group-aware recommendation models. The empirical results and analysis demonstrate that our algorithm achieves a significant increase in recommendation accuracy. Neurocomputing 03/2015; 165. DOI:10.1016/j.neucom.2015.03.013 · 2.01 Impact Factor

ABSTRACT: Anonymized data publication has received considerable attention from the research community in recent years. For numerical sensitive attributes, most of the existing privacy-preserving data publishing techniques concentrate on microdata with multiple categorical sensitive attributes or only one numerical sensitive attribute. However, many real-world applications can contain multiple numerical sensitive attributes. Directly applying the existing privacy-preserving techniques for a single numerical sensitive attribute or multiple categorical sensitive attributes often causes unexpected disclosure of private information. These techniques are particularly prone to the proximity breach, which is a privacy threat specific to numerical sensitive attributes in data publication. In this paper, we propose a privacy-preserving data publishing method, namely MNSACM, which uses the ideas of clustering and Multi-Sensitive Bucketization (MSB) to publish microdata with multiple numerical sensitive attributes. We use an example to show the effectiveness of this method in privacy protection when using multiple numerical sensitive attributes. Tsinghua Science & Technology 01/2015; 20(3):246–254.

ABSTRACT: Feature selection plays a vital role in many areas of pattern recognition and data mining. The effective computation of feature selection is important for improving the classification performance. In rough set theory, many feature selection algorithms have been proposed to process static incomplete data. However, feature values in an incomplete data set may vary dynamically in real-world applications. For such dynamic incomplete data, a classic (non-incremental) approach to feature selection is usually computationally time-consuming. To overcome this disadvantage, we propose an incremental approach for feature selection, which can accelerate the feature selection process in dynamic incomplete data. We first compute the new positive region incrementally when feature values with respect to an object set vary dynamically. Based on the calculated positive region, two efficient incremental feature selection algorithms are developed, for a single object and for multiple objects with varying feature values, respectively. Then we conduct a series of experiments with 12 UCI real data sets to evaluate the efficiency and effectiveness of our proposed algorithms. The experimental results show that the proposed algorithms compare favorably with the existing non-incremental methods. Pattern Recognition 12/2014; 47(12):3890–3906. DOI:10.1016/j.patcog.2014.06.002 · 2.58 Impact Factor

ABSTRACT: Given a set of multiple channels, a set of multiple requests, where each request contains multiple requested data items, and a client equipped with multiple antennae, the multi-antenna-based multi-request data retrieval problem (DRMRMA) is to find a data retrieval sequence for downloading all data items of the requests allocated to each antenna, such that the maximum access latency of all antennae is minimized. Most existing approaches for the data retrieval problem focus on either a single antenna or a single request and are hence not directly applicable to DRMRMA for retrieving multiple requests. This paper proposes two data retrieval algorithms that adopt two different grouping schemes to solve DRMRMA so that the requests can be suitably allocated to each antenna. To find the data retrieval sequence of each request efficiently, we present a data retrieval scheme that converts a wireless data broadcast system to a special tree. Experimental results show that the proposed scheme is more efficient than other existing schemes. Copyright © 2014 John Wiley & Sons, Ltd. International Journal of Communication Systems 12/2014; DOI:10.1002/dac.2917 · 1.11 Impact Factor

ABSTRACT: A traffic matrix (TM) describes the traffic volumes traversing a network from the input nodes to the output nodes over a measured period. Such a TM contains very useful information for network managers, traffic engineers and users. However, a TM is hard to obtain and analyze due to its large size, especially for large-scale networks. In this paper, we present a new method based on diffusion wavelets for analyzing the traffic matrix. It is shown that this method can conduct efficient multiresolution analysis (MRA) on a TM. We compare the analysis results obtained with different diffusion operators. By reconstructing the original TM from the diffused traffic on a particular level, we show the high efficiency of this MRA tool based on these operators. We then develop an anomaly detection method based on the analysis results and explore the possibilities of other potential applications. Computers & Electrical Engineering 08/2014; 40(6). DOI:10.1016/j.compeleceng.2014.04.021 · 0.99 Impact Factor

Journal of Interconnection Networks 05/2014; 14(03). DOI:10.1142/S0219265913020015

ABSTRACT: In rough set theory, attribute reduction is a challenging problem in applications where data with large numbers of attributes are available. Moreover, due to the dynamic characteristics of data collection in decision systems, the attribute reduction changes dynamically as the attribute set in a decision system varies over time. How to update the attribute reduction by utilizing previous information is an important task that can help to improve the efficiency of knowledge discovery. Since attribute reduction algorithms for incomplete decision systems with a varying attribute set have not yet been discussed, this paper focuses on a positive-region-based attribute reduction algorithm to solve the attribute reduction problem efficiently in incomplete decision systems with a dynamically varying attribute set. We first introduce an incremental way to calculate the new positive region and tolerance classes. Then, based on the calculated positive region and tolerance classes, we put forward the corresponding algorithms for computing a new attribute reduct when an attribute set is added to or deleted from the incomplete decision system, respectively. Finally, numerical experiments conducted on different data sets from UCI validate the effectiveness and efficiency of the proposed algorithms in incomplete decision systems with a varying attribute set. International Journal of Approximate Reasoning 03/2014; 55(3):867–884. DOI:10.1016/j.ijar.2013.09.015 · 1.98 Impact Factor

Computer Science and Information Systems 01/2014; 11(1):VII–VIII. · 0.58 Impact Factor

Computer Science and Information Systems 01/2014; 11(1):309–320. DOI:10.2298/CSIS130212010T · 0.58 Impact Factor

ABSTRACT: We initiate the study of the reliable resource allocation (RRA) problem. In this problem, we are given a set of sites F, each with an unconstrained number of facilities as resources. Every facility at site i ∈ F has an opening cost and a service reliability p(i). There is also a set of clients C to be allocated to facilities. Every client j ∈ C accesses a facility at i with a connection cost and reliability l(ij). In addition, every client j has a minimum reliability requirement (MRR) r(j) for accessing facilities. The objective of the problem is to decide the number of facilities to open at each site and connect these facilities to clients such that all clients' MRRs are satisfied at a minimum total cost. The unconstrained fault-tolerant resource allocation problem studied in Liao and Shen [(2011) Unconstrained and Constrained Fault-Tolerant Resource Allocation. Proceedings of the 17th Annual International Conference on Computing and Combinatorics (COCOON), Dallas, Texas, USA, August 14–16, pp. 555–566. Springer, Berlin] is a special case of RRA. Both of these resource allocation problems are derived from the classical facility location theory. In this paper, for solving the general RRA problem, we develop two equivalent primal-dual algorithms, where the second one is an acceleration of the first and runs in quasi-quadratic time. In the algorithm's ratio analysis, we first obtain a constant approximation factor of 2+2√2 and then a reduced ratio of 3.722 using a factor-revealing program, when the l(ij)'s are uniform on i (partially uniform) and the r(j)'s are uniform above the threshold reliability that a single access to a facility is able to provide. The analysis further elaborates and generalizes the inverse dual-fitting technique introduced in Xu and Shen [(2009) The Fault-Tolerant Facility Allocation Problem. Proceedings of the 20th International Symposium on Algorithms and Computation (ISAAC), Honolulu, HI, USA, December 16–18, pp. 689–698. Springer, Berlin]. Moreover, we formalize this technique for analyzing the minimum set cover problem. For a special case of RRA, where all r(j)'s and l(ij)'s are uniform, we derive its approximation ratio through a novel reduction to the uncapacitated facility location problem. The reduction demonstrates some useful and generic linear programming techniques. The Computer Journal 12/2013; 57(1):154–164. DOI:10.1093/comjnl/bxs164 · 0.89 Impact Factor

ABSTRACT: Given a set of data items broadcast on multiple parallel channels, where each channel has the same broadcast pattern over a time period, and a set of data items requested by a client, the data retrieval problem is to find a sequence of channel accesses that retrieves the requested data items from the channels such that the total access latency is minimized, where both a channel access (to retrieve a data item) and a channel switch are assumed to take a single time slot. As an important problem of information retrieval in wireless networks, this problem arises in many applications such as e-commerce and ubiquitous data sharing, and is known to involve two kinds of conflict: requested data items are broadcast in the same time slot or in adjacent time slots on different channels. Although existing studies focus on this problem with one conflict, there is little work on this problem with two conflicts. This paper therefore proposes efficient algorithms from two views: single antenna and multiple antennae. Our algorithm adopts a novel approach that converts the wireless data broadcast system to a directed acyclic graph (DAG) and applies set cover to solve the problem. Experiments indicate that this is currently the most efficient algorithm for this problem with two conflicts. Proceedings of International Conference on Advances in Mobile Computing & Multimedia; 12/2013

ABSTRACT: Learning a similarity measure from relevance feedback has become a promising way to enhance image retrieval performance. Existing approaches mainly focus on using short-term learning experience to identify a visual similarity measure within a single query session, or on applying long-term learning methodology to infer a semantic similarity measure across multiple query sessions. However, there is still much room to improve retrieval effectiveness, because little is known about how to take the relationship between visual similarity and semantic similarity into account. In this paper, we propose a novel hybrid similarity learning scheme to preserve both visual and semantic resemblance by integrating short-term with long-term learning processes. Concretely, the proposed scheme first learns a semantic similarity from the users' query log, and then, taking this as prior knowledge, learns a visual similarity from a mixture of labeled and unlabeled images. In particular, unlabeled images are exploited for the relevant and irrelevant classes differently and the visual similarity is learned incrementally. Finally, a hybrid similarity measure is produced by fusing the visual and semantic similarities in a nonlinear way for image ranking. An empirical study shows that using the hybrid similarity measure for image retrieval is beneficial, and the proposed algorithm achieves better performance than some existing approaches. Pattern Recognition 11/2013; 46(11):2927–2939. DOI:10.1016/j.patcog.2013.04.008 · 2.58 Impact Factor

ABSTRACT: In the emerging environment of the Internet of things (IoT), through the connection of billions of radio frequency identification (RFID) tags and sensors to the Internet, applications will generate an unprecedented number of transactions and amount of data that require novel approaches in RFID data stream processing and management. Unfortunately, it is difficult to maintain a distributed model without a shared directory or structured index. In this paper, we propose a fully distributed model for federated RFID data streams. This model combines two techniques, namely, tilted time frame and histogram, to represent the patterns of object flows. Our model is efficient in space and can be stored in main memory. The model is built on top of an unstructured P2P overlay. To reduce the overhead of distributed data acquisition, we further propose several algorithms that use a statistically minimum number of network calls to maintain the model. The scalability and efficiency of the proposed model are demonstrated through an extensive set of experiments. IEEE Transactions on Parallel and Distributed Systems 10/2013; 24(10):2036–2045. DOI:10.1109/TPDS.2013.99 · 2.17 Impact Factor

ABSTRACT: For a given undirected (edge) weighted graph G = (V, E), a terminal set S ⊆ V and a root r ∈ S, the rooted k-vertex-connected minimum Steiner network (kVSMNr) problem requires constructing a minimum-cost subgraph of G such that each terminal in S \ {r} is k-vertex-connected to r. As an important problem in survivable network design, the kVSMNr problem is known to be NP-hard even when k = 1 [14]. For k = 3, this paper presents a simple combinatorial 8-approximation algorithm, improving the best known ratio of 14 by Nutov [20]. Our algorithm constructs an approximate 3VSMNr by augmenting a two-vertex-connected counterpart with additional edges whose cost is bounded with respect to the optimal. We prove that the total cost of the added edges is at most six times the optimal by showing that the edges in a 3VSMNr compose a subgraph containing our solution in such a way that each edge appears in the subgraph at most six times. IEEE Transactions on Computers 09/2013; 62(9):1684–1693. DOI:10.1109/TC.2012.170 · 1.47 Impact Factor

ABSTRACT: In the Constrained Fault-Tolerant Resource Allocation (FTRA) problem, we are given a set of sites containing facilities as resources and a set of clients accessing these resources. Each site i can open at most Ri facilities with opening cost fi. Each client j requires an allocation of rj open facilities, and connecting j to any facility at site i incurs a connection cost cij. The goal is to minimize the total cost of this resource allocation scenario. FTRA generalizes the Unconstrained Fault-Tolerant Resource Allocation (FTRA∞) [10] and the classical Fault-Tolerant Facility Location (FTFL) [7] problems: for every site i, FTRA∞ does not have the constraint Ri, whereas FTFL sets Ri=1. These problems are said to be uniform if all rj's are the same, and general otherwise. For the general metric FTRA, we first give an LP-rounding algorithm achieving an approximation ratio of 4. Then we show the problem reduces to FTFL, implying the ratio of 1.7245 from [2]. For the uniform FTRA, we provide a 1.52-approximation primal-dual algorithm in O(n^4) time, where n is the total number of sites and clients. Proceedings of the 19th international conference on Fundamentals of Computation Theory; 08/2013
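The FTRA description above maps naturally onto an integer program. The following is a sketch under assumptions rather than the paper's own formulation: the variables $y_i$ (number of facilities opened at site $i$) and $x_{ij}$ (number of connections between client $j$ and site $i$) are notation introduced here, and each opened facility is assumed to cost $f_i$.

```latex
\begin{align*}
\min\quad & \sum_{i} f_i\, y_i + \sum_{i}\sum_{j} c_{ij}\, x_{ij} \\
\text{s.t.}\quad & \sum_{i} x_{ij} \ge r_j && \forall j, \\
 & x_{ij} \le y_i && \forall i, j, \\
 & y_i \le R_i && \forall i, \\
 & x_{ij},\ y_i \in \mathbb{Z}_{\ge 0} && \forall i, j.
\end{align*}
```

In this form, dropping the constraint $y_i \le R_i$ corresponds to FTRA∞, and fixing $R_i = 1$ for every site corresponds to FTFL, matching the two special cases named in the abstract.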

ABSTRACT: Compressive sensing based in-network compression is an efficient technique to reduce communication cost and accurately recover sensory data at the sink. Existing compressive sensing based data gathering methods require a large number of sensors to participate in each measurement gathering, which wastes a lot of energy. In this paper, we present an energy-efficient clustering routing data gathering scheme for large-scale wireless sensor networks. The main challenges of our scheme are how to obtain the optimal number of clusters and how to keep all cluster heads uniformly distributed. To solve these problems, we first formulate an energy consumption model to obtain the optimal number of clusters. Second, we design an efficient deterministic dynamic clustering scheme to keep all cluster heads approximately uniformly distributed. With extensive simulation, we demonstrate that our scheme not only prolongs the network's lifetime nearly 2x compared with state-of-the-art compressive sensing based data gathering schemes, but also makes the network energy consumption very uniform. Computers & Electrical Engineering 08/2013; 39(6):1935–1946. DOI:10.1016/j.compeleceng.2013.04.009 · 0.99 Impact Factor