Shaojie Tang

University of Texas at Dallas | UTD · Department of Computer Science

About

278
Publications
20,264
Reads
6,646
Citations
Citations since 2016
112 Research Items
4558 Citations
[Chart: citations per year, 2016 to 2022; vertical axis 0 to 800]


Publications (278)
Preprint
The mainstream workflow of image recognition applications is first training one global model on the cloud for a wide range of classes and then serving numerous clients, each with heterogeneous images from a small subset of classes to be recognized. From the cloud-client discrepancies on the range of image classes, the recognition model is desired t...
Preprint
Full-text available
To meet the practical requirements of low latency, low cost, and good privacy in online intelligent services, more and more deep learning models are offloaded from the cloud to mobile devices. To further deal with cross-device data heterogeneity, the offloaded models normally need to be fine-tuned with each individual user's local samples before be...
Article
To control the rapid spread of COVID-19, we consider deploying a set of UAVs to form a quarantine barrier such that anyone crossing the barrier can be detected. We use a charging pile to recharge UAVs. The problem is scheduling UAVs to cover the barrier, and, for any scheduling strategy, estimating the minimum number of UAVs needed to cover the bar...
Article
System capacity is an important metric for evaluating the performance of edge-caching networks. In most previous studies, the caching content was assumed to remain unchanged in a base station. Hence, the requested probability for each content was treated as stable. In fact, both the number of contents and the number of users can change over time, which impacts the...
Article
Full-text available
In this paper, we first study the problem of Correlation-aware Task computation offloading (CoTask) in mobile edge computing. Specifically, considering the correlation among multiple computation tasks, we study how to determine a joint task offloading decision and resource allocation strategy under the constraints of feasible offloading decisions a...
Preprint
Full-text available
To break the bottlenecks of mainstream cloud-based machine learning (ML) paradigm, we adopt device-cloud collaborative ML and build the first end-to-end and general-purpose system, called Walle, as the foundation. Walle consists of a deployment platform, distributing ML tasks to billion-scale devices in time; a data pipeline, efficiently preparing...
Article
To build a secure wireless networking system, it is essential that the cryptographic key is known only to the two (or more) communicating parties. Existing key extraction schemes put the devices into physical proximity and utilize the common inherent randomness between the devices to agree on a secret key, but they often rely on specialized hardwar...
Article
User profiling refers to inferring people’s attributes of interest (AoIs) like gender and occupation, which enables various applications ranging from personalized services to collective analyses. Massive nonlinguistic audio data brings a novel opportunity for user profiling due to the prevalence of studying spontaneous face-to-face communication....
Preprint
Data heterogeneity is an intrinsic property of recommender systems, making models trained over the global data on the cloud, which is the mainstream in industry, non-optimal to each individual user's local data distribution. To deal with data heterogeneity, model personalization with on-device learning is a potential solution. However, on-device tr...
Article
More and more IoT data is being traded online in cloud-based data marketplaces due to the fast-growing market demand. Within the current data selling mechanisms, data consumers have difficulties in making purchasing decisions due to uncertain data quality and inflexible pricing interface. To resolve them, potential solutions could be to launch data...
Preprint
Full-text available
We study practical data characteristics underlying federated learning, where non-i.i.d. data from clients have sparse features, and a certain client's local data normally involves only a small part of the full model, called a submodel. Due to data sparsity, the classical federated averaging (FedAvg) algorithm or its variants will be severely slowed...
Preprint
Full-text available
Federated learning (FL) trains a machine learning model on mobile devices in a distributed manner using each device's private data and computing resources. A critical issue is to evaluate individual users' contributions so that (1) users' effort in model training can be compensated with proper incentives and (2) malicious and low-quality users can...
Article
We study the problem of Resilient Service Provisioning for Edge computing (RSPE), i.e., how to determine a service placement strategy to maximize the expected overall utility by service provisioning, in the presence of uncertain service failures. RSPE is extremely challenging to tackle, because the explicit expression of its objective function is d...
Article
Federated learning allows mobile clients to jointly train a global model without sending their private data to a central server. Extensive work has studied the performance guarantee of the global model; however, it is still unclear how each individual client influences the collaborative training process. In this work, we defined a new notion, cal...
Preprint
Full-text available
Federated learning allows mobile clients to jointly train a global model without sending their private data to a central server. Although extensive work has studied the performance guarantee of the global model, it is still unclear how each individual client influences the collaborative training process. In this work, we defined a novel notio...
Article
In this paper, we study the problem of Robust offloading schEduling for mobIle edge computiNg (REIN), i.e., in the presence of uncertain offloading failures, how to determine an offloading schedule to minimize the overall latency of all computation-intensive tasks. We mathematically formulate the problem in the form of min-max robust op...
Article
The ubiquitous needs for extracting insights from data are driving the emergence of service providers to offer predictions given the inputs from customers. During this process, it is important and highly nontrivial for the service providers to generate proofs of honest predictions without leaking the key parameters of their trained models. In addit...
Article
Due to the benefits of small latency, low energy consumption and increased data rate, device-to-device (D2D) communication is recognized as one of the promising techniques in the 5G era. However, the distributed nature of D2D communication makes it non-trivial to generate symmetric keys for the involved parties. Many efforts have been devoted to d...
Article
The society's insatiable appetites for personal data are driving the emergence of data markets, allowing data consumers to launch customized queries over the datasets collected by a data broker from data owners. In this paper, we study how the data broker can maximize its cumulative revenue by posting reasonable prices for sequential queries. We th...
Article
The number of Internet of Things devices is growing exponentially, among which ZigBee devices are being widely deployed, incurring severe collision problems in ZigBee networks. Instead of collision avoidance or packet retransmissions which introduce extra time/energy overhead, existing methods try to decompose multi-packet collision directly. For exampl...
Article
In mobile crowdsensing, finding the best match between tasks and users is crucial to ensure both the quality and effectiveness of a crowdsensing system. Existing works usually assume a centralized task assignment by the crowdsensing platform, without addressing the need of fine-grained personalized task matching. In this paper, we argue that it is...
Article
We propose the first budget-limited multi-armed bandit (BMAB) algorithm subject to a union of matroid constraints in arm pulling, while at the same time achieving differential privacy. Our model generalizes the arm-pulling models studied in prior BMAB schemes, and it can be used to address many practical problems such as network backbone constructi...
Preprint
We consider practical data characteristics underlying federated learning, where unbalanced and non-i.i.d. data from clients have a block-cyclic structure: each cycle contains several blocks, and each client's training data follow block-specific and non-i.i.d. distributions. Such a data structure would introduce client and block biases during the co...
Preprint
Federated learning is a new distributed machine learning framework, where a bunch of heterogeneous clients collaboratively train a model without sharing training data. In this work, we consider a practical and ubiquitous issue in federated learning: intermittent client availability, where the set of eligible clients may change during the training p...
Preprint
The society's insatiable appetites for personal data are driving the emergence of data markets, allowing data consumers to launch customized queries over the datasets collected by a data broker from data owners. In this paper, we study how the data broker can maximize her cumulative revenue by posting reasonable prices for sequential queries. We th...
Article
Predicting the popularity of a single tweet is useful for both users and enterprises. However, adopting existing topic or event prediction models cannot obtain satisfactory results. The reason is that one topic or event that consists of multiple tweets, has more features and characteristics than a single tweet. In this article, we propose two varia...
Preprint
Full-text available
Federated learning was proposed with an intriguing vision of achieving collaborative machine learning among numerous clients without uploading their private data to a cloud server. However, the conventional framework requires each client to leverage the full model for learning, which can be prohibitively inefficient for resource-constrained clients...
Article
This paper studies the capacity of edge caching systems. Capacity is analyzed from the perspective of node mobility, caching, content popularity, etc., neglecting the influence of edge node cooperation on system performance. However, cooperation among edge nodes has been shown to substantially improve system performance at the expense of a cooperati...
Article
With the commoditization of personal privacy, pricing private data has become an intriguing problem. In this paper, we study noisy aggregate statistics trading from the perspective of a data broker in data markets. We thus propose ERATO, which enables aggrEgate statistics pRicing over privATe cOrrelated data. On one hand, ERATO guarante...
Conference Paper
To build a secure wireless networking system, it is essential that the cryptographic key is known only to the two (or more) communicating parties. Existing key extraction schemes put the devices into the physical proximity, and utilize the common inherent randomness between the devices to agree on a secret key, but they often rely on custom devices...
Article
We study the min-cost seed selection problem in online social networks for viral marketing, where the goal is to select a set of seed nodes with the minimum total cost such that the expected number of influenced nodes in the network exceeds a predefined threshold. We propose several algorithms that outperform the previous studies both on the theore...
Article
Emerging techniques such as Wi-Fi Direct make the always-on objective achievable. People can easily chat and share files with nearby friends even without AP (Access Point) or cellular coverage. In this paper, we focus on the channel efficiency issue of AP-free Wi-Fi networks, which can be easily constructed in the subway, in a high-speed railway,...
Article
Multi-channel data broadcast has attracted increasing attention in recent years. For complex and large multimedia data such as images, audio, and video, multi-channel data broadcast is a promising approach to mitigate various limitations of data dissemination in mobile environments, such as narrow bandwidth, unreliable connections, and battery limit...
Article
As social networks become a major source of information, predicting the outcome of information diffusion has appeared intriguing to both researchers and practitioners. By organizing and categorizing the joint efforts of numerous studies on popularity prediction, this article presents a hierarchical taxonomy and helps to establish a systematic overv...
Article
Although data has become an important kind of commercial goods, there are few appropriate online platforms to facilitate the trading of mobile crowd-sensed data so far. In this paper, we present the first architecture of mobile crowd-sensed data market, and conduct an in-depth study of the design problem of online data pricing and reward sharing. T...
Article
In this paper, we consider the scenario in which a mobile charger (MC) periodically travels within a sensor network to recharge the sensors wirelessly, and design charging and scheduling schemes to maximize the Quality of Monitoring (QoM) for stochastic events, which arrive and depart according to known distributions that can be modeled as stoch...
Article
Crowdsensing has been well recognized as a promising approach to enable large scale urban data collection. In a typical crowdsensing system, the task owner usually needs to provide incentives to the users (say participants) to encourage their participation. Among existing incentive mechanisms, posted pricing has been widely adopted because it is ea...
Preprint
In mobile crowdsensing, finding the best match between tasks and users is crucial to ensure both the quality and effectiveness of a crowdsensing system. Existing works usually assume a centralized task assignment by the crowdsensing platform, without addressing the need of fine-grained personalized task matching. In this paper, we argue that it is...
Article
Full-text available
Recently, the proliferation of event-based social services has made it possible to organize personalized offline events through the users' information shared online. In this paper, we study the budget-constrained influential social event organization problem, where the goal is to select a group of influential users with required features to orga...
Article
Caching content on the edge of a network can effectively localize traffic, reduce network latency and improve network throughput. In this paper, we propose an a posteriori caching mechanism rather than the mainstream a priori theory, in which the content placement strategy is determined based on the identical distribution of content popularity and...
Article
Auctions are believed to be effective methods to solve the problem of wireless spectrum allocation. Existing spectrum auction mechanisms are all centralized and suffer from several critical drawbacks of the centralized systems, which motivates the design of distributed spectrum auction mechanisms. However, extending a centralized spectrum auction t...
Conference Paper
With the commoditization of personal privacy, pricing private data has become an intriguing problem. In this paper, we study noisy aggregate statistics trading from the perspective of a data broker in data markets. We thus propose ERATO, which enables aggrEgate statistics pRicing over privATe cOrrelated data. On one hand, ERATO guarantees arbitrage...
Conference Paper
Full-text available
Viral marketing through online social networks (OSNs) has aroused great interest in the literature. However, the fundamental problem of how to optimize the "pure gravy" of a marketing strategy through influence propagation in OSNs still remains largely open. In this paper, we consider a practical setting where the "seed nodes" in an OSN can only b...
Article
Due to the simplicity of implementation, user-initiated WiFi offloading becomes more and more popular, and naturally the benefits of users become the main optimization goal. We notice the inter-contact and intra-contact durations could be uncertain in reality because of user mobility and network dynamics. The two uncertain durations can cause...
Article
Full-text available
Detecting shopping groups is gaining popularity as it enables various applications ranging from marketing to advertising. Existing methods exploit WiFi probe requests to detect shopping groups by identifying co-located customers. However, the probe request is prone to suffer from device heterogeneity which might pose a severe data sparseness proble...
Article
Full-text available
The opportunistic data collection paradigm leverages human mobility to improve sensing coverage and data transmission for collecting data from a number of Points of Interest scattered across a large sensing field, enabling many large-scale mobile crowd sensing applications at lower cost. Sensing delay and transmission delay are two critical Quality...
Article
The connected dominating set (CDS) problem has been extensively studied in the literature due to its applications in many domains, including computer science and operations research. For example, CDS has been recommended to serve as a virtual backbone in wireless sensor networks (WSNs). Since sensor nodes in WSNs are prone to failures, it is important...
Article
Finding a connected dominating set (CDS) in a given graph is a fundamental problem and has been studied intensively for a long time because of its application in computer science and operations research, e.g., connected facility location and wireless networks. In some cases, fault-tolerance is desirable. Taking wireless networks as an example, sinc...
Article
Cloud-assisted image services are widely used for various applications. Due to the high computational complexity of existing image encryption techniques, privacy protection becomes extremely challenging for resource-constrained smart devices. We propose eCIS, a cloud-assisted image service where compression and encryption are jointly used for image...
Article
This paper studies approximation algorithms for the degree-balanced spanning tree (DBST) problem. Given a graph $G = (V, E)$, the goal is to find a spanning tree $T$ such that $\sum_{v \in V} \deg_T(v)^2$ is minimized, where $\deg_T(v)$ denotes the degree of node $v$ in tree $T$. The idea of taking squares on node degrees is to manifest the role of...
Article
http://ieeexplore.ieee.org/document/8015142/
To save energy and alleviate interference, connected dominating set (CDS) was proposed to serve as a virtual backbone of wireless sensor networks (WSNs). Because sensor nodes may fail due to accidental damages or energy depletion, it is desirable to construct a fault tolerant virtual backbone with high r...
Article
We study the min-cost seed selection problem in online social networks, where the goal is to select a set of seed nodes with the minimum total cost such that the expected number of influenced nodes in the network exceeds a predefined threshold. We propose several algorithms that outperform the previous studies both on the theoretical approximation...
Article
Full-text available
In recent years, cooperative communication has become a promising technology to improve the spatial diversity for the future mobile network. Under this communication paradigm, both relay assignment and power allocation will greatly impact the network performance. However, since each device may be selfish, a significant challenge is to make the joint r...
Article
Full-text available
Following the trend of data trading and data publishing, many online social networks have enabled potentially sensitive data to be exchanged or shared on the web. As a result, users' privacy could be exposed to malicious third parties since they are extremely vulnerable to de-anonymization attacks, i.e., the attacker links the anonymous nodes in th...
Article
Full-text available
Home owners are typically charged differently when they consume power at different periods within a day. Specifically, they are charged more during peak periods. Thus, in this paper, we explore how scheduling algorithms can be designed to minimize the peak energy consumption of a group of homes served by the same substation. We assume that a set of...
Conference Paper
With the rapid growth of e-commerce and the World Wide Web, internet advertising revenue has surpassed broadcast revenue very recently. As online advertising has become a major source of revenue for online publishers, such as Google and Amazon, one problem facing them is to optimize ad selection and allocation in order to maximize their revenue. A...
Conference Paper
Although the influence maximization problem has been extensively studied over the past ten years, the majority of existing work adopts one of the following models: full-feedback model or zero-feedback model. In the zero-feedback model, we have to commit the seed users all at once in advance; this strategy is also known as a non-adaptive policy. In the full-fe...
Article
In many domain-specific monitoring applications of wireless sensor networks (WSNs), such as structural health monitoring, volcano tomography and machine diagnosis, the raw data in WSNs are required to be losslessly gathered to the sink where a specialized centralized algorithm is then executed to extract some global features or model parameters. To...