
ABSTRACT:
On-demand data broadcast (ODDB) has attracted increasing interest due to its efficiency in disseminating information in many real-world applications such as mobile social services, mobile payment and mobile e-commerce. In an ODDB system, the server places client-requested data items received from the uplink onto a set of downlink channels for downloading by the clients. Most existing work focused on how to allocate client-requested data items to multiple channels for efficient downloading, but did not consider the time constraint of downloading, which is critical for many real-world applications. For a set of requests with download deadlines, this paper proposes an effective algorithm to broadcast the data items of each request within its specified deadline using multiple channels under the well-known 2-conflict constraint: two data items conflict if they are broadcast in the same time slot or in two adjacent time slots on different channels. Our algorithm adopts an approach of allocating the most urgent and popular data item first (UPF) to minimize the overall deadline miss ratio. The performance of the UPF method has been validated by extensive experiments on real-world data sets against three popular on-demand data broadcast schemes.
Journal of Systems and Software 05/2015; 103. DOI:10.1016/j.jss.2015.01.022 · 1.25 Impact Factor
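The "most urgent and popular first" priority rule described above can be sketched as a simple ordering. This is a minimal illustration with assumed `deadline` and `popularity` fields; the paper's UPF algorithm additionally handles channel allocation under the 2-conflict constraint.

```python
from dataclasses import dataclass

@dataclass
class Item:
    item_id: int
    deadline: int      # latest slot by which the item must be broadcast
    popularity: int    # number of pending requests that include this item

def upf_order(items):
    """Order items by urgency first (earlier deadline wins), and by
    popularity second (more requested wins).  A sketch of the UPF
    priority rule only, not the full channel-allocation algorithm."""
    return sorted(items, key=lambda it: (it.deadline, -it.popularity))

# Example: item 3 is as urgent as item 2 but more popular, so it goes first.
items = [Item(1, 5, 3), Item(2, 2, 1), Item(3, 2, 7)]
```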

ABSTRACT:
Network applications, such as multimedia streaming and video conferencing,
impose growing requirements on Quality of Service (QoS), including bandwidth,
delay, jitter, etc. Meanwhile, networks are expected to be load-balanced,
energy-efficient, and resilient to some degree of failure. It is observed that
the above requirements can be better met with multiple disjoint QoS paths
than with a single one. Let $G=(V,\, E)$ be a digraph with nonnegative integral cost
and delay on every edge, $s,\, t\in V$ be two specified vertices, and
$D\in\mathbb{Z}_{0}^{+}$ be a delay bound (or some other constraint). The
\emph{$k$ Disjoint Restricted Shortest Path ($k$-RSP) Problem} is
to compute $k$ disjoint paths between $s$ and $t$ with total cost minimized and
total delay bounded by $D$. Few efficient algorithms have been developed
because of the hardness of the problem.
In this paper, we propose efficient algorithms with provable performance
guarantees for the $k$-RSP problem. We first present a pseudo-polynomial-time
approximation algorithm with a bifactor approximation ratio of $(1,\,2)$, and then
improve the algorithm to polynomial time with a bifactor ratio of
$(1+\epsilon,\,2+\epsilon)$ for any fixed $\epsilon>0$, which is better than
the current best approximation ratio of $(O(1+\gamma),\, O(1+\frac{1}{\gamma}))$
for any fixed $\gamma>0$ \cite{orda2004efficient}. To the best of our
knowledge, this is the first constant-factor algorithm for the $k$-RSP problem
that almost strictly obeys the delay constraint.
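For intuition on the restricted shortest path building block, the single-path special case ($k=1$) admits a textbook pseudo-polynomial dynamic program over delay budgets. The sketch below is that classical DP, not the paper's $k$-RSP algorithm, and assumes strictly positive edge delays (chains of zero-delay edges would need extra relaxation passes).

```python
def restricted_shortest_path(n, edges, s, t, D):
    """Min-cost s->t path with total delay <= D (single-path RSP).

    n: number of vertices 0..n-1; edges: list of (u, v, cost, delay)
    with non-negative integral cost and positive integral delay.
    Returns the minimum cost, or None if no feasible path exists.
    """
    INF = float("inf")
    # best[d][v] = min cost of an s->v walk with total delay exactly d
    best = [[INF] * n for _ in range(D + 1)]
    best[0][s] = 0
    for d in range(D + 1):                 # grow the delay budget layer by layer
        for (u, v, c, dl) in edges:
            if d + dl <= D and best[d][u] + c < best[d + dl][v]:
                best[d + dl][v] = best[d][u] + c
    ans = min(best[d][t] for d in range(D + 1))
    return None if ans == INF else ans
```

With delay bound 4 the two-hop path (cost 2, delay 4) is feasible; tightening the bound to 1 forces the expensive direct edge.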

ABSTRACT:
In mobile networks, wireless data broadcast is a powerful approach for
disseminating a large number of data copies to a great number of clients. The
largest weight data retrieval (LWDR) problem, first addressed by
\cite{Infocom12LuEfficient}, is to schedule the download of a specified subset
of data items in a given time interval, such that the weight of the downloaded
items is maximized.
In this paper, we present an approximation algorithm with ratio $0.632$ for
LWDR via approximating the maximum sectionalized coverage (MSC) problem, a
generalization of the maximum coverage problem, which is one of the most famous
${\cal NP}$-complete problems. Let
$\mathbb{F}=\{\mathbb{F}_{1},\dots,\mathbb{F}_{N}\}$, in which
$\mathbb{F}_{i}=\{S_{i,\, j}\vert j=1,\,2,\,\dots\}$ is a collection of subsets
of $S=\{u_{1},\, u_{2},\,\dots,u_{n}\}$, and let $w(u_{i})$ be the weight of
$u_{i}$. MSC is to select at most one $S_{i,\, j_{i}}$ from each
$\mathbb{F}_{i}$, such that $\sum_{u_{i}\in S'}w(u_{i})$ is maximized, where
$S'=\bigcup_{i=1}^{N}S_{i,\, j_{i}}$. First, the paper presents
a factor-$0.632$ approximation algorithm for MSC by giving a novel linear
programming (LP) formulation and employing the randomized LP rounding technique. By
a reduction from the maximum 3-dimensional matching problem, the paper then shows
that MSC is ${\cal NP}$-complete even when every $S\in\mathbb{F}_{i}$ has
cardinality 2, i.e., $|S|=2$. Last but not least, the paper gives a method
transforming any instance of LWDR into an instance of MSC, and shows that an
approximation for MSC can be applied to LWDR while almost preserving the
approximation ratio. This yields a factor-$0.632$ approximation for LWDR, improving
the current best ratio of $0.5$ in \cite{Infocom12LuEfficient}.
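The MSC problem shape (pick at most one set per family to maximize covered weight) can be illustrated with a simple greedy baseline that takes, from each family in turn, the set with the largest marginal uncovered weight. Note this is only an illustrative heuristic, not the paper's factor-$0.632$ LP-rounding algorithm.

```python
def greedy_msc(families, weight):
    """Greedy baseline for Maximum Sectionalized Coverage.

    families: list of families, each a list of frozensets (one set may
    be chosen per family); weight: dict mapping element -> weight.
    Returns (covered_elements, total_covered_weight).
    """
    covered = set()
    for family in families:
        # pick the set contributing the most weight of still-uncovered elements
        best = max(family,
                   key=lambda s: sum(weight[u] for u in s - covered),
                   default=frozenset())
        covered |= best
    return covered, sum(weight[u] for u in covered)

# Example: the second family skips {2,3} (marginal weight 1) for {4} (weight 5).
families = [[frozenset({1, 2})], [frozenset({2, 3}), frozenset({4})]]
weight = {1: 1, 2: 1, 3: 1, 4: 5}
```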

ABSTRACT:
The cold start problem for new users and new items is a major challenge facing most collaborative filtering (CF) systems. Existing CF methods emphasize scaling well to large and sparse datasets, but lack a scalable approach to dealing with new data. In this paper, we propose a novel method for alleviating the problem by incorporating content-based information about users and items, i.e., tags and keywords. The user-item ratings imply the relevance of users' tags to items' keywords, so we convert the direct prediction on the user-item rating matrix into an indirect prediction on the tag-keyword relation matrix, which adapts to the emergence of new data. We first propose a novel neighborhood approach for building the tag-keyword relation matrix based on the statistics of tag-keyword pairs in the ratings. Then, with the relation matrix, we propose a 3-factor matrix factorization model over the rating matrix for learning every user's interest vector over selected tags and every item's correlation vector over extracted keywords. Finally, we integrate the relation matrix with the two kinds of vectors to make recommendations. Experiments on real datasets demonstrate that our method not only outperforms other state-of-the-art CF algorithms on historical data, but also has good scalability for new data.
Knowledge-Based Systems 03/2015; 83. DOI:10.1016/j.knosys.2015.03.008 · 3.06 Impact Factor
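The 3-factor prediction idea (a user's tag-interest vector $u$, the tag-keyword relation matrix $B$, and an item's keyword-correlation vector $v$ combining into a rating estimate $\hat{r}=u^{T}Bv$) can be sketched as below. The names and the exact prediction rule are illustrative assumptions, not the paper's precise formulation.

```python
def predict_rating(u, B, v):
    """Estimate a rating as u^T B v.

    u: user's interest over tags (length m); B: m x n tag-keyword
    relation matrix; v: item's correlation over keywords (length n).
    """
    # B v: pass each keyword correlation through the tag-keyword relation
    Bv = [sum(B[t][k] * v[k] for k in range(len(v))) for t in range(len(B))]
    # u^T (B v): weight the result by the user's tag interests
    return sum(u[t] * Bv[t] for t in range(len(u)))
```

For instance, a user interested only in tag 0 scores an item purely through row 0 of the relation matrix.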

ABSTRACT:
Group-aware collaborative filtering (CF) has recently become a hot research topic in recommender systems; it typically divides a large CF task on the entire data (i.e., the rating matrix) into smaller CF tasks on subgroups (i.e., submatrices). This is an effective way to improve current CF systems in both accuracy and efficiency. However, existing approaches consider each subgroup separately, ignoring relationships among subgroups. In this paper, motivated by the intuition that similar users or items exist across different subgroups, we propose an improved group-aware CF algorithm that predicts a rating using a weighted sum of similar ratings from multiple subgroups. Our algorithm is based on Matrix Factorization and CodeBook Transfer (CBT); in particular, we construct N matrix approximations based on the N best submatrices, and then integrate the N approximations via a linear combination. We conduct experiments on real-life data to evaluate the performance of our algorithm in comparison with traditional CF algorithms and other state-of-the-art social and group-aware recommendation models. The empirical results and analysis demonstrate that our algorithm achieves a significant increase in recommendation accuracy.
Neurocomputing 03/2015; DOI:10.1016/j.neucom.2015.03.013 · 2.01 Impact Factor

ABSTRACT:
Feature selection plays a vital role in many areas of pattern recognition and data mining. The effective computation of feature selection is important for improving classification performance. In rough set theory, many feature selection algorithms have been proposed to process static incomplete data. However, feature values in an incomplete data set may vary dynamically in real-world applications. For such dynamic incomplete data, a classic (non-incremental) approach to feature selection is usually computationally time-consuming. To overcome this disadvantage, we propose an incremental approach that can accelerate the feature selection process on dynamic incomplete data. We first employ an incremental manner to compute the new positive region when the feature values of an object set vary dynamically. Based on the calculated positive region, two efficient incremental feature selection algorithms are developed, for a single object and for multiple objects with varying feature values, respectively. We then conduct a series of experiments on 12 UCI real data sets to evaluate the efficiency and effectiveness of the proposed algorithms. The experimental results show that the proposed algorithms compare favorably with the existing non-incremental methods.
Pattern Recognition 12/2014; 47(12):3890–3906. DOI:10.1016/j.patcog.2014.06.002 · 2.58 Impact Factor
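The positive region both incremental algorithms build on can be illustrated for a complete decision table: it is the set of objects whose condition-attribute equivalence class is consistent, i.e. all its members share one decision value. This is a sketch for complete data only; the paper itself works with tolerance classes over incomplete data.

```python
from collections import defaultdict

def positive_region(objects, condition_attrs, decision_attr):
    """POS_C(D) of a complete decision table.

    objects: list of dicts mapping attribute name -> value.
    Returns the indices of objects in consistent equivalence classes.
    """
    # group objects into equivalence classes by their condition-attribute values
    classes = defaultdict(list)
    for idx, obj in enumerate(objects):
        key = tuple(obj[a] for a in condition_attrs)
        classes[key].append(idx)
    pos = set()
    for members in classes.values():
        decisions = {objects[i][decision_attr] for i in members}
        if len(decisions) == 1:          # consistent class: one decision value
            pos |= set(members)
    return pos
```

In the test below, the class with a=2 contains conflicting decisions, so its objects fall outside the positive region.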

ABSTRACT:
Given a set of multiple channels, a set of multiple requests, each containing multiple requested data items, and a client equipped with multiple antennae, the multi-antenna-based multi-request data retrieval problem (DRMRMA) is to find a data retrieval sequence for downloading all data items of the requests allocated to each antenna, such that the maximum access latency over all antennae is minimized. Most existing approaches to the data retrieval problem focus on either a single antenna or a single request and are hence not directly applicable to DRMRMA for retrieving multiple requests. This paper proposes two data retrieval algorithms that adopt two different grouping schemes to solve DRMRMA so that the requests can be suitably allocated to each antenna. To find the data retrieval sequence of each request efficiently, we present a data retrieval scheme that converts a wireless data broadcast system into a special tree. Experimental results show that the proposed scheme is more efficient than other existing schemes. Copyright © 2014 John Wiley & Sons, Ltd.
International Journal of Communication Systems 12/2014; DOI:10.1002/dac.2917 · 1.11 Impact Factor

ABSTRACT:
A traffic matrix (TM) describes the traffic volumes traversing a network from the input nodes to the output nodes over a measured period. Such a TM contains very useful information for network managers, traffic engineers and users. However, a TM is hard to obtain and analyze due to its large size, especially for large-scale networks. In this paper, we present a new method based on diffusion wavelets for analyzing the traffic matrix, and show that it can conduct efficient multi-resolution analysis (MRA) on a TM. We compare the analysis results obtained with different diffusion operators. By reconstructing the original TM from the diffused traffic at a particular level, we show the high efficiency of this MRA tool based on these operators. We then develop an anomaly detection method based on the analysis results and explore other potential applications.
Computers & Electrical Engineering 08/2014; 40(6). DOI:10.1016/j.compeleceng.2014.04.021 · 0.99 Impact Factor

Journal of Interconnection Networks 05/2014; 14(03). DOI:10.1142/S0219265913020015

ABSTRACT:
In rough set theory, attribute reduction is a challenging problem in applications where data with large numbers of attributes are available. Moreover, due to the dynamic characteristics of data collection in decision systems, the attribute reduct changes dynamically as the attribute set of a decision system varies over time. Updating the attribute reduct by utilizing previous information is an important task that can help improve the efficiency of knowledge discovery. Attribute reduction algorithms for incomplete decision systems with a varying attribute set have not yet been discussed so far. This paper therefore focuses on a positive region-based attribute reduction algorithm to solve the attribute reduction problem efficiently in incomplete decision systems with a dynamically varying attribute set. We first introduce an incremental manner to calculate the new positive region and tolerance classes. Based on the calculated positive region and tolerance classes, we then put forward the corresponding attribute reduction algorithms for computing the new attribute reduct when an attribute set is added to, or deleted from, an incomplete decision system. Finally, numerical experiments conducted on different UCI data sets validate the effectiveness and efficiency of the proposed algorithms in incomplete decision systems with a varying attribute set.
International Journal of Approximate Reasoning 03/2014; 55(3):867–884. DOI:10.1016/j.ijar.2013.09.015 · 1.98 Impact Factor

Computer Science and Information Systems 01/2014; 11(1):309–320. DOI:10.2298/CSIS130212010T · 0.58 Impact Factor

ABSTRACT:
We initiate the study of the reliable resource allocation (RRA) problem. In this problem, we are given a set of sites $F$, each with an unconstrained number of facilities as resources. Every facility at a site $i\in F$ has an opening cost and a service reliability $p_i$. There is also a set of clients $C$ to be allocated to facilities. Every client $j\in C$ accesses a facility at $i$ with a connection cost and reliability $l_{ij}$. In addition, every client $j$ has a minimum reliability requirement (MRR) $r_j$ for accessing facilities. The objective of the problem is to decide the number of facilities to open at each site and to connect these facilities to clients such that all clients' MRRs are satisfied at minimum total cost. The unconstrained fault-tolerant resource allocation problem studied in Liao and Shen [(2011) Unconstrained and Constrained Fault-Tolerant Resource Allocation. Proceedings of the 17th Annual International Conference on Computing and Combinatorics (COCOON), Dallas, Texas, USA, August 14–16, pp. 555–566. Springer, Berlin] is a special case of RRA. Both of these resource allocation problems are derived from classical facility location theory. In this paper, for solving the general RRA problem, we develop two equivalent primal-dual algorithms, where the second is an acceleration of the first and runs in quasi-quadratic time. In the ratio analysis, we first obtain a constant approximation factor of $2+2\sqrt{2}$ and then a reduced ratio of 3.722 using a factor-revealing program, when the $l_{ij}$'s are uniform in $i$ (partially uniform) and the $r_j$'s are uniform above the threshold reliability that a single access to a facility is able to provide. The analysis further elaborates and generalizes the inverse dual-fitting technique introduced in Xu and Shen [(2009) The Fault-Tolerant Facility Allocation Problem. Proceedings of the 20th International Symposium on Algorithms and Computation (ISAAC), Honolulu, HI, USA, December 16–18, pp. 689–698. Springer, Berlin]. Moreover, we formalize this technique for analyzing the minimum set cover problem. For a special case of RRA, where all $r_j$'s and $l_{ij}$'s are uniform, we derive its approximation ratio through a novel reduction to the uncapacitated facility location problem. The reduction demonstrates some useful and generic linear programming techniques.
The Computer Journal 12/2013; 57(1):154–164. DOI:10.1093/comjnl/bxs164 · 0.89 Impact Factor

ABSTRACT:
Given a set of data items broadcast on multiple parallel channels, where each channel repeats the same broadcast pattern over a time period, and a set of data items requested by a client, the data retrieval problem is to find a sequence of channel accesses that retrieves the requested data items from the channels such that the total access latency is minimized, where both a channel access (to retrieve a data item) and a channel switch are assumed to take a single time slot. As an important information retrieval problem in wireless networks, this problem arises in many applications such as e-commerce and ubiquitous data sharing, and involves two kinds of conflict: requested data items may be broadcast in the same time slot or in adjacent time slots on different channels. Existing studies focus on this problem with one conflict, but there is little work on the problem with two conflicts. This paper therefore proposes efficient algorithms for two settings: a single antenna and multiple antennae. Our algorithms adopt a novel approach that converts the wireless data broadcast system into a DAG and applies set cover to solve the problem. Experiments show that our result is currently the most efficient algorithm for this problem with two conflicts.
Proceedings of International Conference on Advances in Mobile Computing & Multimedia; 12/2013

ABSTRACT:
Learning a similarity measure from relevance feedback has become a promising way to enhance image retrieval performance. Existing approaches mainly focus on exploiting short-term learning experience to identify a visual similarity measure within a single query session, or on applying long-term learning methodology to infer a semantic similarity measure across multiple query sessions. However, there is still considerable room to improve retrieval effectiveness, because little is known about taking the relationship between visual similarity and semantic similarity into account. In this paper, we propose a novel hybrid similarity learning scheme that preserves both visual and semantic resemblance by integrating short-term with long-term learning processes. Concretely, the proposed scheme first learns a semantic similarity from the users' query log, and then, taking this as prior knowledge, learns a visual similarity from a mixture of labeled and unlabeled images. In particular, unlabeled images are exploited differently for the relevant and irrelevant classes, and the visual similarity is learned incrementally. Finally, a hybrid similarity measure is produced by fusing the visual and semantic similarities in a nonlinear way for image ranking. An empirical study shows that using the hybrid similarity measure for image retrieval is beneficial, and the proposed algorithm achieves better performance than some existing approaches.
Pattern Recognition 11/2013; 46(11):2927–2939. DOI:10.1016/j.patcog.2013.04.008 · 2.58 Impact Factor

ABSTRACT:
In the emerging environment of the Internet of Things (IoT), through the connection of billions of radio frequency identification (RFID) tags and sensors to the Internet, applications will generate an unprecedented number of transactions and amount of data, requiring novel approaches to RFID data stream processing and management. Unfortunately, it is difficult to maintain a distributed model without a shared directory or structured index. In this paper, we propose a fully distributed model for federated RFID data streams. This model combines two techniques, namely the tilted time frame and the histogram, to represent the patterns of object flows. Our model is space-efficient and can be stored in main memory. The model is built on top of an unstructured P2P overlay. To reduce the overhead of distributed data acquisition, we further propose several algorithms that use a statistically minimal number of network calls to maintain the model. The scalability and efficiency of the proposed model are demonstrated through an extensive set of experiments.
IEEE Transactions on Parallel and Distributed Systems 10/2013; 24(10):2036–2045. DOI:10.1109/TPDS.2013.99 · 2.17 Impact Factor

ABSTRACT:
For a given undirected edge-weighted graph $G=(V,E)$, a terminal set $S\subseteq V$ and a root $r\in S$, the rooted $k$-vertex connected minimum Steiner network ($k$VSMN$_r$) problem requires constructing a minimum-cost subgraph of $G$ such that each terminal in $S\setminus\{r\}$ is $k$-vertex connected to $r$. As an important problem in survivable network design, the $k$VSMN$_r$ problem is known to be NP-hard even when $k=1$ [14]. For $k=3$ this paper presents a simple combinatorial 8-approximation algorithm, improving the best known ratio of 14 due to Nutov [20]. Our algorithm constructs an approximate 3VSMN$_r$ by augmenting a two-vertex connected counterpart with additional edges whose cost is bounded with respect to the optimum. We prove that the total cost of the added edges is at most six times the optimum by showing that the edges of a 3VSMN$_r$ compose a subgraph containing our solution in such a way that each edge appears in the subgraph at most six times.
IEEE Transactions on Computers 09/2013; 62(9):1684–1693. DOI:10.1109/TC.2012.170 · 1.47 Impact Factor

ABSTRACT:
In the Constrained Fault-Tolerant Resource Allocation (FTRA) problem, we are given a set of sites containing facilities as resources and a set of clients accessing these resources. Each site $i$ can open at most $R_i$ facilities with opening cost $f_i$. Each client $j$ requires an allocation of $r_j$ open facilities, and connecting $j$ to any facility at site $i$ incurs a connection cost $c_{ij}$. The goal is to minimize the total cost of this resource allocation scenario. FTRA generalizes the Unconstrained Fault-Tolerant Resource Allocation (FTRA$_\infty$) [10] and the classical Fault-Tolerant Facility Location (FTFL) [7] problems: for every site $i$, FTRA$_\infty$ does not have the constraint $R_i$, whereas FTFL sets $R_i=1$. These problems are said to be uniform if all $r_j$'s are the same, and general otherwise. For the general metric FTRA, we first give an LP-rounding algorithm achieving an approximation ratio of 4. We then show that the problem reduces to FTFL, implying the ratio of 1.7245 from [2]. For the uniform FTRA, we provide a 1.52-approximation primal-dual algorithm running in $O(n^4)$ time, where $n$ is the total number of sites and clients.
Proceedings of the 19th international conference on Fundamentals of Computation Theory; 08/2013

ABSTRACT:
Compressive-sensing-based in-network compression is an efficient technique to reduce communication cost and accurately recover sensory data at the sink. Existing compressive-sensing-based data gathering methods require a large number of sensors to participate in each measurement gathering, which wastes a lot of energy. In this paper, we present an energy-efficient clustering routing data gathering scheme for large-scale wireless sensor networks. The main challenges of our scheme are how to obtain the optimal number of clusters and how to keep all cluster heads uniformly distributed. To solve these problems, we first formulate an energy consumption model to obtain the optimal number of clusters. Second, we design an efficient deterministic dynamic clustering scheme to keep all cluster heads approximately uniformly distributed. With extensive simulation, we demonstrate that our scheme not only nearly doubles the network lifetime compared with state-of-the-art compressive-sensing-based data gathering schemes, but also makes the network energy consumption very uniform.
Computers & Electrical Engineering 08/2013; 39(6):1935–1946. DOI:10.1016/j.compeleceng.2013.04.009 · 0.99 Impact Factor
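The idea of deriving an optimal cluster count from an energy model can be illustrated with a toy per-round model in which cluster heads pay a fixed long-range cost and members pay an intra-cluster cost that shrinks as clusters multiply. This model and its parameters are hypothetical stand-ins for the paper's detailed radio-energy formulation.

```python
def optimal_clusters(n, intra_cost, head_cost):
    """Pick the cluster count k minimising a toy per-round energy model:

        E(k) = k * head_cost + intra_cost * n / k

    n: number of sensors; head_cost: fixed cost per cluster head;
    intra_cost: per-member intra-cluster cost factor.  A sketch only.
    """
    return min(range(1, n + 1),
               key=lambda k: k * head_cost + intra_cost * n / k)
```

The continuous minimum of this model sits at $k=\sqrt{n\cdot\text{intra\_cost}/\text{head\_cost}}$, so for 100 sensors with costs 1 and 4 the best integer count is 5.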

ABSTRACT:
Data publishing based on hypergraphs is becoming increasingly popular due to its power in representing multi-relations among objects. However, security issues have been little studied on this subject, while most recent work focuses only on the protection of relational data or graphs. As a major privacy breach, identity disclosure reveals the identity of entities given certain background knowledge known by an adversary. In this paper, we first introduce a novel background knowledge attack model based on the property of hyperedge ranks, and formalize the rank-based hypergraph anonymization problem. We then propose a complete solution in a two-step framework: rank anonymization and hypergraph reconstruction. We also take hypergraph clustering (known as community detection) into consideration as a data utility, and discuss two metrics to quantify the information loss incurred in the perturbation. Our approaches are effective in terms of efficacy, privacy and utility. The algorithms run in near-quadratic time in hypergraph size, and protect data from rank attacks with almost the same utility preserved. The performance of the methods has been validated by extensive experiments on real-world datasets as well. Our rank-based attack model and our algorithms for rank anonymization and hypergraph reconstruction are, to the best of our knowledge, the first systematic study of privacy preservation for hypergraph-based data publishing.
IEEE Transactions on Information Forensics and Security 08/2013; 8(8):1384–1396. DOI:10.1109/TIFS.2013.2271425 · 2.07 Impact Factor

ABSTRACT:
Given $n_f$ sites, each equipped with one facility, and $n_c$ cities, fault-tolerant facility location (FTFL) [K. Jain and V. V. Vazirani, Lect. Notes Comput. Sci. 1913, 177–183 (2000; Zbl 0976.90056)] requires computing a minimum-cost connection scheme such that each city connects to a specified number of facilities. When each city connects to exactly one facility, FTFL becomes the classical uncapacitated facility location problem (UFL), which is well known to be NP-hard. The current best solution to FTFL admits an approximation ratio of 1.7245, due to J. Byrka, A. Srinivasan, and Ch. Swamy applying the recently announced dependent rounding technique [Lect. Notes Comput. Sci. 6080, 244–257 (2010; Zbl 1281.90021)], which improves the ratio of 2.076 obtained by Ch. Swamy and D. B. Shmoys based on LP rounding [ACM Trans. Algorithms 4, 1–27 (2008)]. In this paper, we study a variant of the FTFL problem, namely fault-tolerant facility allocation (FTFA), as another generalization of UFL that allows each site to hold multiple facilities, and show that we can obtain better solutions for this problem. We first give two algorithms with approximation ratios 1.81 and 1.61 and time complexities $O(mR\log m)$ and $O(Rn^3)$, respectively, where $R$ is the maximum number of facilities required by any city, $m=n_f n_c$, and $n=\max\{n_f, n_c\}$. Instead of applying the dual-fitting technique that reduces the dual problem's solution to fit the original problem, as used in the literature, we propose a method called inverse dual-fitting that alters the original problem to fit the dual solution, and we show that this method is more effective for obtaining solutions of multifactor approximation. We show that, by applying inverse dual-fitting and factor-revealing techniques, our second algorithm is also simultaneously a $(1.11,1.78)$- and $(1,2)$-approximation.
These results can further be used to achieve a 1.52-approximation to FTFA and a 4-approximation to the fault-tolerant $k$-facility allocation problem, in which the total number of facilities is bounded by $k$. These are currently the best bifactor and single-factor approximation ratios for the problems concerned.
SIAM Journal on Discrete Mathematics 07/2013; 27(3). DOI:10.1137/090781048 · 0.58 Impact Factor