About
292 publications
21,681 reads
3,416 citations
Publications (292)
Motif discovery is a fundamental operation in the analysis of time series data. Existing motif discovery algorithms that support Dynamic Time Warping require manual determination of the exact length of motifs. However, setting an appropriate length for interesting motifs can be challenging, and selecting inappropriate motif lengths may result in valuab...
Reachability query is a fundamental problem and has been well studied on static graphs. However, in the real world, the graphs are not static but always evolving over time. In this paper, we study the problem of historical reachability query on evolving graphs. We propose a novel index, named HR-Index, which integrates complete and correct historic...
Measuring the relationship strength between individuals in a network is an essential task in network analysis. However, existing measures of relationship strength are mostly artificially predefined and can only reflect the relationship strength from a single perspective. To compensate for this, we propose a novel GNN-based model for Mining Relationship...
Age of Information (AoI), which emerged as a new metric to quantify the freshness of information, has attracted increasing interest recently. To optimize the system AoI, most existing works try to compute an efficient schedule from the perspective of data transmission. Unfortunately, at wireless-powered network edge, the charging schedule of the source nodes...
By combining edge computing and parallel computing, distributed edge computing has emerged as a new paradigm to exploit the booming IoT devices at the edge. To accelerate computation at the edge, i.e., the inference tasks for DNN-driven applications, the parallelism of both computation and communication needs to be considered for distributed edg...
In smart phones, vehicles and wearable devices, GPS sensors are ubiquitous and collect a lot of valuable spatial data from the real world. Given a set of weighted points and a rectangle r in the space, a maximizing range sum (MaxRS) query is to find the position of r, so as to maximize the total weight of the points covered by r (i.e., the range su...
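The MaxRS definition in the abstract above can be illustrated with a brute-force sketch. This is not the paper's algorithm, which the truncated abstract does not describe. It assumes non-negative weights and a closed, axis-aligned rectangle, in which case some optimal placement has its lower-left corner at the x-coordinate of one point and the y-coordinate of another. The function name `max_range_sum` and the `(x, y, weight)` tuple layout are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]  # hypothetical layout: (x, y, weight)


def max_range_sum(
    points: List[Point], width: float, height: float
) -> Tuple[Optional[Tuple[float, float]], float]:
    """Brute-force MaxRS: try every (point x, point y) pair as the
    lower-left corner of the rectangle and keep the best range sum.

    Assumes non-negative weights; runs in O(n^3) time.
    """
    best_pos: Optional[Tuple[float, float]] = None
    best_sum = 0.0
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    for x0 in xs:
        for y0 in ys:
            # Range sum of the closed rectangle [x0, x0+width] x [y0, y0+height].
            total = sum(
                w
                for (x, y, w) in points
                if x0 <= x <= x0 + width and y0 <= y <= y0 + height
            )
            if best_pos is None or total > best_sum:
                best_pos, best_sum = (x0, y0), total
    return best_pos, best_sum


pts = [(0, 0, 1), (1, 1, 2), (5, 5, 3), (1.5, 0.5, 1)]
position, range_sum = max_range_sum(pts, width=2, height=2)
```

The cubic cost is what makes this only a reference implementation for checking faster methods on small inputs.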
Age of information (AoI), a metric measuring the information freshness, has drawn increased attention due to its importance in monitoring applications in which nodes send time-stamped status updates to interested recipients, and timely updates about phenomena are important. In this work, we consider the AoI minimization scheduling problem in multi-...
In the social network, each user has attributes for self-description called user attributes which are semantically hierarchical. Attribute inference has become an essential way for social platforms to realize user classifications and targeted recommendations. Most existing approaches mainly focus on the flat inference problem neglecting the semanti...
A graph is a generic model of various networks in real-world applications. Graph embedding aims to represent nodes (edges or graphs) as low-dimensional vectors which can be fed into machine learning algorithms for downstream graph analysis tasks. However, existing random walk-based node embedding methods often map some nodes with (dis)similar lo...
Nowadays, GPS sensors are ubiquitous in various devices, collecting, transmitting and storing tremendous amounts of trajectory data. However, such an unprecedented scale of GPS data has put great pressure on transmitting it over the internet and posed an urgent demand for not only an effective storage mechanism but also an efficient query mechanism....
Subsequence matching is an important and fundamental problem on time series data. This paper studies the inherent time complexity of the subsequence matching problem and designs a more efficient algorithm for solving the problem. Firstly, it is proved that the subsequence matching problem is incomputable in time O(n^{1-δ}) even allowing polynomial...
Multi-access Edge Computing (MEC) is an emerging computing architecture to release the resource burden of the centralized cloud and reduce the mobile application latency. Services management and MEC requests routing is a major problem in MEC systems. Existing works mainly focus on the one-hop centralized request routing strategies. However, the cen...
Due to the large volume of IoT data, conventional sensor-network-based and cloud-based IoT systems cannot handle latency-sensitive and resource-consuming IoT applications. Sensor networks do not have enough computation resources and also suffer from a limited network lifetime. On the other hand, the cloud-based IoT system is far away from the us...
In many practical applications, G-Skyline query is an important operation to return the best tuple groups, which are not g-dominated by other tuple groups of the same size, from a potentially huge data space. It is found that the existing G-Skyline algorithms cannot deal well with massive data due to high I/O cost and high computation cost. This pa...
Approximate top-k query returns a list of k tuples that have approximate largest scores with respect to the user given query. However, existing algorithms cannot effectively process the approximate top-k queries on big data, because they either restrict the class of ranking functions, or fail to take selection conditions into consideration. In this...
Battery-free Wireless Sensor Networks (BF-WSNs) extend the lifetime of wireless sensor networks (WSNs) using ambient energy sources. Thus, they have become an emerging research area of the Internet of Things (IoT) in recent years. Although many existing works in this area study data collection, few of them focus on optimizing the latency of data collection....
Battery-free Wireless Sensor Networks (BF-WSNs) are a promising technology for Wireless Sensor Networks (WSNs). Nodes in BF-WSNs are able to harvest ambient energy, so they have eternal life in theory. Although many works in this field study data collection, few are concerned with latency. Moreover, some of them adopt unrealistic centralized s...
Age of Information (AoI) is a new metric for measuring the freshness of sensory data in wireless sensor networks. The Battery-Free Wireless Sensor Network (BF-WSN) is proposed to break through the lifetime limitation of battery-powered wireless sensor networks. However, the emerging BF-WSN also brings challenges to the minimization of AoI, on accou...
Analyzing dynamic information networks, which contain evolving objects and links, to meet users’ various needs has attracted much attention in recent years. For sales companies, recruiting staff who are socialites would help to increase sales volume since such staff often sell more. For universities, employing active collaborators who are productiv...
The energy limitation of wireless sensors limits the lifetime of the traditional wireless sensor networks. The Battery-Free Sensor Network (BF-WSN) is a new network architecture proposed in recent years to address the limitation of wireless sensor networks. In a BF-WSN, the battery-free node can harvest energy from the ambient environment, and thus...
The emerging energy-harvesting technology enables charging sensor batteries with renewable energy sources, which has been effectively integrated into Wireless Sensor Networks (EH-WSNs). Due to the limited energy-harvesting capacities of tiny sensors, the captured energy remains scarce and differs greatly among nodes, which makes the data aggregatio...
Broadcasting is an essential operation for the source node to disseminate the message to all other nodes in the network. Unfortunately, the problem of Minimum Latency Broadcast Scheduling (MLBS) in duty-cycled wireless networks is not well studied. In existing works, the construction of broadcast tree and the scheduling of transmissions are conduct...
SPARQL 1.1 offers a type of navigational query for RDF systems, called regular path query (RPQ). A regular path query retrieves node pairs such that the paths between them satisfy given regular expressions. Regular path queries are difficult to evaluate efficiently because of the potentially large search space. Thus there has been no sc...
With the popularity of time series analysis, failure during data recording, transmission, and storage makes missing blocks in time series a problem to be solved. Therefore, it is of great significance to study effective methods to recover missing blocks in time series for better analysis and mining. In this paper, we focus on the situation of conti...
Nowadays, the number of GPS-equipped devices is increasing dramatically and they generate raw trajectory data constantly. Many location-based services that use trajectory data are becoming increasingly popular in many fields. However, the amount of raw trajectory data is usually too large. Such a large amount of data is expensive to store, and the...
In smart phones, vehicles and wearable devices, GPS sensors are ubiquitous, which can collect a large amount of valuable trajectory data by tracking moving objects. Analysis of this valuable trajectory data can benefit many practical applications, such as route planning and transportation optimization. However, unprecedented large-scale GPS data po...
In a real social network, each user has attributes for self-description called user attributes which are semantically hierarchical. With these attributes, we can implement personalized services such as user classification and targeted recommendations. Most traditional approaches mainly focus on the flat inference problem without considering the sem...
Dynamic information networks, containing evolving nodes and links, exist in various applications. For example, in a Facebook network, nodes represent users, links represent friendship, and users often form different groups. Over time, some users will leave some groups. Thus, for both users and groups, it’s meaningful to predict which users would le...
With advances in wireless power transfer techniques, Battery-Free Wireless Sensor Networks (BF-WSNs), which can support long-term applications, have been attracting increasing interest in recent years. Unfortunately, the problem of Minimum Latency Aggregation Scheduling (MLAS) is not well studied in BF-WSNs. Existing works always have a rigid assump...
Both the volume and the collection velocity of time series generated by monitoring sensors are increasing in the Internet of Things (IoT). Data management and analysis requires high quality and applicability of the IoT data. However, errors are prevalent in original time series data. Inconsistency in time series is a serious data quality problem ex...
Masses of large-scale knowledge graphs in various domains have sprung up in recent years. They can no longer be managed on a single machine. Distributed RDF systems address the scalability issue using partitioning techniques. However, most of these systems are unaware of query workload and employ static partitioning. As diverse and...
To break through the limitation of battery-powered wireless sensor networks, a novel kind of network, named battery-free wireless sensor network (BF-WSN), is proposed. Battery-free sensor nodes in BF-WSNs harvest energy from power sources in their ambient environment, such as solar power, wind power and radio frequency (RF) signal power, etc., i...
The existing literature on query processing over knowledge graphs focuses on an exhaustive enumeration of all matches, which is time-consuming. Users are often interested in diversified top-k matches, rather than the entire match set. Motivated by this, this paper formalizes the diversified top-k querying (DTQ) problem in the context of RDF/SPARQ...
In traditional facility location recommendations, the objective is to select the best locations which maximize the coverage or convenience of users. However, since users’ behavioral habits are often influenced by time, the temporal impacts should not be neglected in recommendation. In this paper, we study the problem of time-aware facility location...
Battery-Free Wireless Sensor Network (BF-WSN) is a newly proposed network architecture to address the limitation of traditional Wireless Sensor Networks (WSNs). The special features of BF-WSNs make the coverage problem quite different from, and even more challenging than, that in traditional WSNs. This paper defines a new coverage problem in BF-W...
Recently, in the area of big data, some popular applications, such as web search engines and recommendation systems, face the problem of diversifying results during query processing. In this sense, it is both significant and essential to propose methods to deal with big data in order to increase the diversity of the result set. In this paper, we firstl...
In this paper, we study the problem of the SUM query approximation with histograms. We define a new kind of histogram called the SUM-optimal histogram which can provide better estimation result for the SUM queries than the traditional equi-depth and V-optimal histograms. We propose three methods for the histogram construction. The first one is a dy...
In practical applications, top-k high utility itemset mining (top-k HUIM) is an interesting operation to find the k itemsets with the highest utilities. Analysis shows that the existing algorithms can only deal with small and medium-sized data, and their performance degrades significantly on massive data. This paper presents a novel top-k HUIM...
The lifetime of battery-powered Wireless Sensor Networks (WSNs) is limited by the batteries equipped in sensors. The appearance of Battery-free Wireless Sensor Networks (BF-WSNs) breaks through this limitation, in which battery-free sensors harvest energy from sustainable but uncontrollable energy sources in the ambient environment, such as solar powe...
Data quality plays a key role in big data management today. With the explosive growth of data from a variety of sources, the quality of data is faced with multiple problems. Motivated by this, we study the multiple data cleaning on incompleteness and inconsistency with currency reasoning and determination in this paper. We introduce a 4-step framew...
Data aggregation is a fundamental yet popular operation in wireless networks where the sink needs to obtain the combined information of the whole network. However, the problem of minimum latency aggregation scheduling (MLAS) is not well studied in cognitive radio networks. Few studies have addressed this issue and most previous aggregation methods...
With the explosive growth of information, inconsistent data are increasingly common. However, traditional feature selection methods lack efficiency because inconsistent data must be repaired beforehand. Therefore, it is necessary to take inconsistencies into consideration during feature selection to not only reduce time costs but also guarantee accu...
The mobile devices can send jobs to be processed at one of the nearby edge-cloud servers (edge server) rather than the remote cloud server with low latency in edge computing systems. One key problem in such environment is how to assign the jobs to the edge servers so that the completion time is minimized. In this paper, we propose a general model f...
The problem of hyperparameter optimization exists widely in real life and many common tasks can be transformed into it, such as neural architecture search and feature subset selection. Without considering various constraints, the existing hyperparameter tuning techniques can solve these problems effectively by traversing as many hyperparameter...
This paper revisits the set containment join (SCJ) problem, which uses the subset relationship (i.e., ⊆) as the condition to join set-valued attributes of two relations and has many fundamental applications in commercial and scientific fields. Existing in-memory algorithms for SCJ are either signature-based or prefix-tree-based. The former incurs high CPU...
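For orientation, the SCJ condition described above can be written as a naive nested-loop join. This sketch is neither signature-based nor prefix-tree-based and is not the paper's method. The function name `set_containment_join` and the `{tuple_id: set}` representation of each relation are assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Set, Tuple


def set_containment_join(
    r: Dict[Hashable, Set], s: Dict[Hashable, Set]
) -> List[Tuple[Hashable, Hashable]]:
    """Return every (r_id, s_id) pair whose R-side set is a subset of
    the S-side set. Naive O(|R| * |S|) comparison, baseline only."""
    result = []
    for r_id, r_set in r.items():
        for s_id, s_set in s.items():
            if r_set <= s_set:  # subset test: r_set ⊆ s_set
                result.append((r_id, s_id))
    return result


R = {1: {"a"}, 2: {"a", "b"}, 3: {"c"}}
S = {10: {"a", "b"}, 11: {"a"}}
pairs = sorted(set_containment_join(R, S))
```

Here tuple 3 joins with nothing because no set in `S` contains `"c"`.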
Incomplete data has been a longstanding issue in the database community, and the subject is yet poorly handled by both theories and practices. One common way to cope with missing values is to complete their imputation (filling in) as a preprocessing step before analyses. Unfortunately, not a single imputation method could impute all missing values...
In many fields, a mass of algorithms with completely different hyperparameters have been developed to address the same type of problems. Choosing the algorithm and hyperparameter setting correctly can promote the overall performance greatly, but users often fail to do so due to the absence of knowledge. How to help users to effectively and quickly...
Battery-Free Wireless Sensor Networks (BF-WSNs) are newly emerging Wireless Sensor Networks (WSNs) to break through the energy limitations of traditional WSNs. In BF-WSNs, the broadcast scheduling problem is more challenging than that in traditional WSNs. This article investigates the broadcast scheduling problem in BF-WSNs with the purpose of mini...
A new network architecture, named the RF-based battery-free sensor network, was proposed in recent years to overcome the lifetime limitation of traditional wireless sensor networks. In an RF-based battery-free sensor network, the battery-free nodes carry no battery and can be recharged by RF signals. The Dominating Set (DS) is a key method to mainta...
In many applications, range skyline query is an important operation to find the interesting tuples in a potentially huge data space. Given a selection condition, a range skyline query returns the tuples that both satisfy the specified selection condition and are not dominated by other satisfying tuples. It is found that most of the existing skyline algorithms do...
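The range skyline definition above translates into a short quadratic-time sketch. It is a reference implementation of the definition only, not the algorithm proposed in the paper. It assumes the common convention that smaller values are preferred on every attribute; the names `dominates` and `range_skyline` are hypothetical.

```python
from typing import Callable, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if a is no worse on every attribute and strictly
    better on at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(
        x < y for x, y in zip(a, b)
    )


def range_skyline(
    tuples: List[Sequence[float]],
    condition: Callable[[Sequence[float]], bool],
) -> List[Sequence[float]]:
    """Keep the tuples that satisfy the selection condition and are not
    dominated by any other satisfying tuple."""
    candidates = [t for t in tuples if condition(t)]
    return [
        t for t in candidates
        if not any(dominates(other, t) for other in candidates)
    ]


data = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
selected = range_skyline(data, lambda t: t[0] >= 2)
unrestricted = range_skyline(data, lambda t: True)
```

Applying the selection first matters: a tuple that fails the condition, such as `(1, 5)` here, neither appears in the result nor dominates anything in it.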
The Count-Min sketch and its variations are widely used to solve the frequency estimation problem due to their sub-linear space cost. However, the collisions between high-frequency and low-frequency items introduce a significant estimation error. In this paper, we propose two learned sketches called the Learned Count-Min sketch and Learned Augmented...
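The collision error mentioned above is easy to see in a plain Count-Min sketch. The following is a minimal sketch of the classic structure, not of the learned variants the paper proposes. The class name, the `width` and `depth` parameters, and the choice of SHA-256 as a stand-in for a pairwise-independent hash family are all assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """Classic Count-Min sketch: depth rows of width counters each."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One deterministic hash per row, derived by salting with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # The minimum over rows never underestimates the true count.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


sketch = CountMinSketch(width=64, depth=4)
sketch.add("hot", 1000)
sketch.add("cold", 1)

# With a single counter per row every item collides, so the rare item
# inherits the frequent item's count.
tiny = CountMinSketch(width=1, depth=3)
tiny.add("hot", 1000)
tiny.add("cold", 1)
```

The `tiny` instance is a deliberately extreme case: it shows how a low-frequency item's estimate is inflated when it shares counters with a high-frequency item.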
The great amount of time series generated by machines has enormous value in intelligent industry. Knowledge can be discovered from high-quality time series, and used for production optimization and anomaly detection in industry. However, the original sensors data always contain many errors. This requires a sophisticated cleaning strategy and a well...
This paper revisits set containment join (SCJ), which has many fundamental applications in commercial and scientific fields. To improve the performance further, this paper proposes a new adaptive parameter-free in-memory algorithm for SCJ, named FreshJoin. It accomplishes this by exploiting two flat indices, which record three kind...
In many applications, top-k skyline query is an important operation to return k skyline tuples with the highest domination scores in a potentially huge data space. Analysis shows that the existing algorithms cannot process top-k skyline query on massive data efficiently. In this paper, we propose a novel table-scan-based algorithm RSTS to compute t...
Intelligent manufactory is a typical application of big data analysis. Flexible production line is an essential fundamental of intelligent manufactory. Producing different types of similar products alternately in one line with fixed stations but varying parameters is a typical kind of flexibility. In this case, the quality of products is directly d...
Although similar queries keep springing up in real query logs, few RDF systems address this problem. In this paper, we propose Leon, a distributed RDF system, which can also deal with the multi-query problem. First, we apply a characteristic-set-based partitioning scheme. This scheme (i) supports the fully parallel processing of join within characteristic se...
Beaconing is a fundamental networking service where each node broadcasts a packet to all its neighbors locally. Unfortunately, the problem of Minimum Latency Beaconing Schedule (MLBS) in duty-cycled scenarios is not well studied. Existing works always make the rigid assumption that each node is only active once per working cycle. Aiming at making the work...
The skyline query is important in the database community. In recent years, research on incomplete data has received increasing attention, especially for the skyline query. However, the existing skyline definition on incomplete data cannot provide users with valuable references. In this paper, we propose a novel skyline definition utilizing probab...
Frequent itemset mining is an important operation to return all itemsets in the transaction table, which occur as a subset of at least a specified fraction of the transactions. The existing algorithms cannot compute frequent itemsets on massive data efficiently, since they either require multiple-pass scans on the table, or construct complex data s...
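The definition above (itemsets contained in at least a specified fraction of the transactions) can be checked on small inputs with an exhaustive level-wise enumeration. This sketch rescans the transactions for every candidate, which is the multi-pass cost the abstract attributes to some existing algorithms, and it is not the paper's method. The function name `frequent_itemsets` and the `min_fraction` parameter are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_itemsets(
    transactions: List[Set[str]], min_fraction: float
) -> Dict[FrozenSet[str], int]:
    """Return every itemset that is a subset of at least
    min_fraction * len(transactions) transactions, with its count."""
    n = len(transactions)
    items = sorted(set().union(*transactions)) if transactions else []
    result: Dict[FrozenSet[str], int] = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_fraction * n:
                result[itemset] = count
                found_any = True
        # Every subset of a frequent itemset is frequent, so an empty
        # level means no larger itemset can be frequent either.
        if not found_any:
            break
    return result


table = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
frequent = frequent_itemsets(table, min_fraction=0.5)
```

The enumeration is exponential in the number of distinct items, so it is useful only as a correctness oracle for small tables.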
Data in the real world is often dirty. Inconsistency is an important kind of dirty data. Before repairing inconsistencies, we need to detect them. The time complexities of current inconsistency detection algorithms are super-linear in the size of the data and not suitable for big data. For inconsistency detection for big data, we develop an algorit...
Broadcasting is a fundamental function for disseminating messages in multihop wireless networks. The Minimum-Transmission Broadcasting (MTB) problem aims to find a broadcast schedule with the minimum number of transmissions. Previous works on MTB in duty-cycled networks exploit a rigid assumption that nodes have only one active time slot per working cycle. In...
The improvement of energy utilization efficiency has been an important research subject of EH-WSN. The existing studies usually use the energy model to analyze the network. As the harvesting and consumption of energy is affected by the real-time voltage, the energy model is not suitable for the EH-WSN that varies greatly with voltage. Notably, most...
As an important application of smart homes, smart keys, which can record the locking information of users, are quite useful in our daily lives and guarantee the security of our houses and properties. The existing techniques for supporting smart keys either require changing the locks or need a large amount of sensory data to build a complex mode...
In actual applications, aggregation is an important operation to return statistical characterizations of subsets of the data set. On massive data, approximate aggregation is often preferable for its better timeliness and responsiveness. This paper focuses on deterministic approximate aggregation to return running aggregate within progressive determi...
Skyline query retrieves a set of skyline points which are not dominated by any other point and has attracted wide attention in the database community. Recently, an important variant, G-Skyline, was developed. It aims to return optimal groups of points. However, when data dimensionality is high, G-Skyline result has too many groups, which makes that users...
Modern applications and services leveraged by interactive cyberphysical systems (CPS) are providing significant convenience to our daily life in various aspects at present. Clients submit their requests including query contents to CPS servers to enjoy diverse services such as health care, automatic driving, and location-based services. However, pri...