Dongsheng Li's research while affiliated with National University of Defense Technology and other places

Publications (48)

Chapter
Face-to-face communication leads to better interactions between speakers than text-to-text conversations, since the speakers can capture both textual and visual signals. The image-grounded emotional response generation (IgERG) task requires chatbots to generate a response with an understanding of both textual contexts and speakers’ emotions in visual...
Article
Relation classification between entities is a fundamental problem in knowledge extraction. It aims at determining if a semantic relation holds between a pair of entities based on textual descriptions. In general, the training data for each relation is limited. Distant supervision has thus been widely used to generate abundant weakly labeled data fo...
Article
Full-text available
With the development of online social media, the fabrication and dissemination of rumors are much easier than before. As a result, automatic rumor detection has become more urgent for internet governors, and has received great interest from the AI community. Even though many initial successes have been achieved in terms of detection accuracy and timel...
Article
Full-text available
Text generation from abstract meaning representation is a fundamental task in natural language generation. An interesting challenge is that distant context can influence the surface realization of each node. In previous encoder-decoder-based approaches, graph neural networks have been commonly used to encode abstract meaning representation g...
Chapter
Full-text available
The outstanding performance of pre-trained language models on natural language processing tasks comes at the cost of millions of parameters and huge computational power consumption. Knowledge distillation is regarded as a compression strategy to address this problem. However, previous works have the following shortcomings: (i) distill partial transfo...
Article
Transcribing structured data into readable text (data-to-text) is a fundamental language generation task. One of its challenges is planning the input records for text realization. Recent works tackle this problem with a static planner, which performs record planning in advance of text realization. However, they cannot revise plans to cope with unex...
Preprint
Full-text available
Heavy ball momentum is crucial in accelerating (stochastic) gradient-based optimization algorithms for machine learning. Existing heavy ball momentum is usually weighted by a uniform hyperparameter, which relies on excessive tuning. Moreover, the calibrated fixed hyperparameter may not lead to optimal performance. In this paper, to eliminate the ef...
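For reference, the classic heavy-ball update that this line of work builds on weights the momentum term (x_k - x_{k-1}) by a single fixed hyperparameter beta, which is what the abstract means by a uniform weight. A minimal sketch follows (plain NumPy; the quadratic objective and the values of alpha and beta are illustrative choices, not taken from the paper):

```python
import numpy as np

def heavy_ball(grad, x0, alpha=0.1, beta=0.9, iters=100):
    """Classic heavy-ball iteration: x_{k+1} = x_k - alpha*grad(x_k) + beta*(x_k - x_{k-1})."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        x_next = x - alpha * grad(x) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x

# Toy strongly convex quadratic f(x) = 0.5*||Ax - b||^2, used only for illustration.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
grad_f = lambda x: A.T @ (A @ x - b)
print(heavy_ball(grad_f, np.zeros(2)))
```

An adaptive variant of the kind the abstract hints at would replace the fixed beta above with a per-iteration weight; the fixed-beta version is shown only to pin down the baseline update.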
Chapter
Graph embedding is a crucial method for producing node features that can be used for various machine learning tasks. Because of the large number of embedding parameters in large graphs, a single machine cannot load the entire graph into GPUs at once, so a partitioning strategy is required. However, there are some problems with partitioning strategies....
Chapter
Document-level relation extraction is a challenging task in Natural Language Processing, which extracts relations expressed with one or multiple sentences. It plays an important role in data mining and information retrieval. The key challenge comes from the indirect relations expressed across sentences. Graph-based neural networks have been proved...
Preprint
Graph pooling, which summarizes the information in a large graph into a compact form, is essential in hierarchical graph representation learning. Existing graph pooling methods either suffer from high computational complexity or cannot capture the global dependencies between graphs before and after pooling. To address the problems of existing graph poo...
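For context on the methods being compared against, here is a generic top-k node-drop pooling sketch (NumPy only; the random projection vector stands in for a learned scoring function, and every name is hypothetical). It illustrates the class of existing approaches the abstract contrasts with, not the pooling operator proposed in this paper.

```python
import numpy as np

def topk_pool(X, A, ratio=0.5):
    """Generic top-k node-drop pooling: score nodes, keep the top fraction,
    gate the kept features by their scores, and take the induced subgraph."""
    n, d = X.shape
    p = np.random.default_rng(0).normal(size=d)        # stand-in for a learned projection vector
    scores = np.tanh(X @ p / np.linalg.norm(p))        # one score per node
    k = max(1, int(np.ceil(ratio * n)))
    keep = np.argsort(scores)[-k:]                     # indices of the k highest-scoring nodes
    X_pool = X[keep] * scores[keep, None]              # gate kept features by their scores
    A_pool = A[np.ix_(keep, keep)]                     # adjacency of the induced subgraph
    return X_pool, A_pool

X = np.random.default_rng(1).normal(size=(6, 4))       # toy node features
A = (np.random.default_rng(2).random((6, 6)) > 0.5).astype(float)
print(topk_pool(X, A)[0].shape)                        # -> (3, 4)
```

Dropping the remaining nodes outright is exactly the source of the lost global dependencies that the abstract points to.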
Preprint
Full-text available
Federated averaging (FedAvg) is a communication-efficient algorithm for distributed training with an enormous number of clients. In FedAvg, clients keep their data locally for privacy protection; a central parameter server is used to communicate between clients. This central server distributes the parameters to each client and collects the upda...
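As a rough illustration of the FedAvg scheme described above, the sketch below (pure NumPy; the least-squares local objective and all client data are made up for the example) shows the central server broadcasting the model, each client running a local update on its private data, and the server averaging the returned models weighted by client data size.

```python
import numpy as np

def local_update(w, X, y, lr=0.01, epochs=5):
    """One client's local training on its private data (least-squares loss, for illustration)."""
    for _ in range(epochs):
        w = w - lr * X.T @ (X @ w - y) / len(y)
    return w

def fedavg_round(w_global, clients):
    """One FedAvg round: broadcast, local updates, data-size-weighted averaging."""
    sizes = [len(y) for _, y in clients]
    locals_ = [local_update(w_global.copy(), X, y) for X, y in clients]
    total = sum(sizes)
    return sum((n / total) * w for n, w in zip(sizes, locals_))

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(5)]
w = np.zeros(3)
for _ in range(10):            # ten communication rounds
    w = fedavg_round(w, clients)
print(w)
```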
Preprint
Full-text available
The stability and generalization of stochastic gradient-based methods provide valuable insights into understanding the algorithmic performance of machine learning models. As the main workhorse for deep learning, stochastic gradient descent has received a considerable amount of studies. Nevertheless, the community paid little attention to its decent...
Preprint
Full-text available
Distant supervision provides a means to create a large amount of weakly labeled data at low cost for relation classification. However, the resulting labeled instances are very noisy and contain data with wrong labels. Many approaches have been proposed to select a subset of reliable instances for neural model training, but they still suffer from no...
Article
Full-text available
Subspace clustering has been widely applied to detect meaningful clusters in high-dimensional data spaces. Sparse subspace clustering (SSC) obtains superior clustering performance by solving a relaxed ℓ0-minimization problem with the ℓ1-norm. Although using the ℓ1-norm instead of the ℓ0-norm makes the objective function convex, it causes large...
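For reference, the ℓ1 relaxation mentioned above replaces the nonconvex ℓ0 objective with its convex surrogate; the textbook noiseless SSC formulation (shown here as background, not as this paper's exact model) is

```latex
\min_{C}\ \|C\|_{1}
\quad \text{subject to} \quad
X = XC, \qquad \operatorname{diag}(C) = 0 ,
```

where the columns of X are the data points and C is the self-representation matrix whose sparsity pattern is then used to build the affinity for spectral clustering.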
Preprint
This paper revisits the celebrated temporal difference (TD) learning algorithm for policy evaluation in reinforcement learning. Typically, the performance of the plain-vanilla TD algorithm is sensitive to the choice of stepsize. Oftentimes, TD suffers from slow convergence. Motivated by the tight connection between the TD learning algorithm an...
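For reference, the plain-vanilla tabular TD(0) update mentioned above is V(s) <- V(s) + alpha*(r + gamma*V(s') - V(s)), whose behavior is governed by the stepsize alpha. A minimal sketch on a toy random-walk chain follows (the environment and all names are hypothetical, chosen only to show the update):

```python
import numpy as np

def td0(env_step, n_states, episodes=500, alpha=0.1, gamma=0.99):
    """Tabular TD(0) policy evaluation: V(s) += alpha * (r + gamma * V(s') - V(s))."""
    V = np.zeros(n_states)
    for _ in range(episodes):
        s = n_states // 2                      # start in the middle of the chain
        done = False
        while not done:
            s_next, r, done = env_step(s)
            target = r + (0.0 if done else gamma * V[s_next])
            V[s] += alpha * (target - V[s])
            s = s_next
    return V

def random_walk(s, n=7):
    """Toy random-walk chain: terminate at either end, reward 1 only at the right end."""
    s_next = s + np.random.choice([-1, 1])
    if s_next == n - 1:
        return s_next, 1.0, True
    if s_next == 0:
        return s_next, 0.0, True
    return s_next, 0.0, False

print(td0(random_walk, n_states=7))
```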
Preprint
It is challenging for weakly supervised object detection networks to precisely predict the positions of objects, since there are no instance-level category annotations. Most existing methods tend to solve this problem with a two-phase learning procedure, i.e., a multiple instance learning detector followed by a fully supervised learning detect...
Preprint
Full-text available
Decentralized stochastic gradient methods emerge as a promising solution for solving large-scale machine learning problems. This paper studies the decentralized Markov chain gradient descent (DMGD) algorithm - a variant of the decentralized stochastic gradient methods where the random samples are taken along the trajectory of a Markov chain. This s...
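To make the setting concrete, here is a rough sketch under assumptions of my own (a 4-node ring topology, least-squares local losses, and a simple random walk over each node's data indices); it is only meant to illustrate "samples taken along a Markov chain trajectory plus gossip averaging with neighbors", not the exact DMGD algorithm analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: 4 nodes on a ring, each with private least-squares data.
n_nodes, dim, n_samples = 4, 3, 50
data = [(rng.normal(size=(n_samples, dim)), rng.normal(size=n_samples)) for _ in range(n_nodes)]

# Doubly stochastic mixing matrix for the ring: each node averages with its two neighbours.
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % n_nodes] = 0.25
    W[i, (i + 1) % n_nodes] = 0.25

def markov_next(idx, n):
    """Samples are drawn along a random-walk Markov chain over data indices, not i.i.d."""
    return (idx + rng.choice([-1, 1])) % n

x = np.zeros((n_nodes, dim))          # local iterates, one row per node
idx = [0] * n_nodes                   # current state of each node's sampling chain
lr = 0.05
for _ in range(200):
    grads = np.zeros_like(x)
    for i, (A, b) in enumerate(data):
        idx[i] = markov_next(idx[i], n_samples)
        a, y = A[idx[i]], b[idx[i]]
        grads[i] = (a @ x[i] - y) * a          # stochastic gradient at the Markov-chain sample
    x = W @ (x - lr * grads)                   # local step followed by gossip averaging
print(x.mean(axis=0))
```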
Article
Full-text available
Taxi demand prediction is one of the key factors in making online taxi hailing services more successful and more popular. Accurate taxi demand prediction can bring various advantages including, but not limited to, enhancing user experience, increasing taxi utilization, and optimizing traffic efficiency. However, the task is challenging because of c...
Article
Full-text available
Person re-identification has become increasingly popular because of its wide application in computer vision. In this paper, we propose a novel, simple and efficient person re-id network called MPLFN. The network combines two tasks: the classification task and the metric learning task. In the classification task, we uniformly partition N feature par...
Chapter
Full-text available
Learning topic information from large-scale unstructured text has attracted extensive attention from both academia and industry. Topic models, such as LDA and its variants, are a popular machine learning technique for discovering such latent structure. Among them, the online variational hierarchical Dirichlet process (onlineHDP) is a promising candidat...
Conference Paper
Nonconvex optimization algorithms with random initialization have attracted increasing attention recently. It has been shown that many first-order methods always avoid saddle points when started from random initial points. In this paper, we answer the question: can nonconvex heavy-ball algorithms with random initialization avoid saddle points? The answer is...
Preprint
Full-text available
Nonconvex optimization algorithms with random initialization have attracted increasing attention recently. It has been shown that many first-order methods always avoid saddle points when started from random initial points. In this paper, we answer the question: can nonconvex heavy-ball algorithms with random initialization avoid saddle points? The answer is...
Article
Temporal action localization is an important task in computer vision. Though many methods have been proposed, how to precisely predict the temporal location of action segments remains an open question. Most state-of-the-art works train action classifiers on video segments pre-determined by action proposals. However, recent work found that a...
Preprint
Effective spatiotemporal feature representation is crucial to the video-based action recognition task. Focusing on discriminative spatiotemporal feature learning, we propose the Information Fused Temporal Transformation Network (IF-TTN) for action recognition on top of the popular Temporal Segment Network (TSN) framework. In the network, Information Fusion M...
Preprint
Temporal action localization is an important task in computer vision. Though many methods have been proposed, how to precisely predict the temporal location of action segments remains an open question. Most state-of-the-art works train action classifiers on video segments pre-determined by action proposals. However, recent work found that a...
Preprint
In this paper, we consider a class of nonconvex problems with linear constraints that appear frequently in image processing. We solve this problem via the penalty method and propose the iteratively reweighted alternating minimization algorithm. To speed up the algorithm, we also apply a continuation strategy to the penalty parameter. A c...
Preprint
Full-text available
In this paper, we revisit the convergence of the Heavy-ball method, and present improved convergence complexity results in the convex setting. We provide the first non-ergodic O(1/k) rate result of the Heavy-ball algorithm with constant step size for coercive objective functions. For objective functions satisfying a relaxed strongly convex conditio...
Chapter
Temporal action localization is an important task in computer vision. Though many methods have been proposed, how to precisely predict the temporal location of action segments remains an open question. Most state-of-the-art works train action classifiers on video segments pre-determined by action proposals. However, recent work found that a...
Preprint
Support vector machines (SVMs) with sparsity-inducing nonconvex penalties have received considerable attention for their automatic classification and variable selection capabilities. However, it is quite challenging to solve nonconvex penalized SVMs due to their nondifferentiability, nonsmoothness and nonconvexity. In this paper, we propos...
Article
User Generated Content (UGC) sites like YouTube nowadays entertain over a billion people. Identifying popular content is essential for these giant UGC sites, as they allow users to request content from a potentially unlimited selection in an asynchronous fashion. In this work, we conduct an analysis of the popularity prediction problem in U...
Article
Temporal action localization is an important task in computer vision. Though a variety of methods have been proposed, how to precisely predict the temporal boundaries of action segments remains an open question. Most works use segment-level classifiers to select video segments pre-determined by action proposals or dense sliding windows. How...
Article
Object detection is an important task in computer vision. A variety of methods have been proposed, but methods using weak labels still do not achieve satisfactory results. In this paper, we propose a new framework that uses the weakly supervised method's output as pseudo-strong labels to train a strongly supervised model. One weakly supervised metho...
Conference Paper
Full-text available
Fingerprints have been widely used in a variety of biometric identification systems. However, with the rapid development of fingerprint identification systems, the amount of fingerprint information stored in these systems has been rising sharply, making it challenging to process and store fingerprints efficiently and robustly with traditional stand-alone...
Article
Data-parallel computing frameworks (DCFs) such as MapReduce, Spark, and Dryad have tremendous applications in big data and cloud computing, and inject huge numbers of flows into data center networks. In this paper, we design and implement FLOW PROPHET, a general framework to predict traffic flows for DCFs. To this end, we analyze and summarize the commo...

Citations

... Most of them utilize the two-stage approach, i.e., they first generate proposals and then classify them. To generate proposals, earlier methods adopt the sliding window technique (Shou, Wang, and Chang 2016; Yuan et al. 2016; Shou et al. 2017; Yang et al. 2018; Xiong et al. 2017; Chao et al. 2018), while recent models predict the start and end frames of actions (Lin et al. 2018, 2019, 2020). Meanwhile, there have been some attempts to leverage graph structural information (Zeng et al. 2019; Xu et al. 2020). ...
... Rumor is defined as unverified information at the time of posting (Qazvinian et al., 2011;Zubiaga et al., 2018;Lu et al., 2022). Malicious rumors that are spread massively on social media have become a threat to mislead the public and cause social panic. ...
... Leaving aside global pooling [2,39,47], we distinguish between two main types of hierarchical approaches. Node drop methods [3,13,19,31,48,50,52] use a learnable scoring function based on message passing representations to assess all nodes and drop the ones with the lowest scores. The drawback is that we lose information during pooling by completely dropping certain nodes. ...
... Replacement Policies for Disk-based Graph Learning To scale training beyond CPU memory, Marius++ supports disk-based training for GNNs. Disk-based training requires that the graph is split into multiple node partitions [22,29,35]. Across training iterations, a subset of partitions is transferred to CPU memory and mixed CPU-GPU training is performed on training data obtained by the induced subgraph. ...
... higher than other models with insignificant margins. However, the performance degradation caused by removing the three components of distance, entity type, and co-reference embeddings is smaller than the sum of removing one of the components alone, indicating that the model may be overfitted and further model improvement is needed [34]. In addition, Guo et al. proposed a conjoined graph neural network BioGraphSAGE model with structured databases as domain knowledge, which combines bio-semantic features and location features to extract biological entity relationships from the literature [35]. ...
... An instance-selector is utilized to pick out trustworthy instances; it is trained via reinforcement learning (Qin et al., 2018b; Feng et al., 2018) and adversarial learning (Qin et al., 2018a). The bootstrapping framework (Jia et al., 2019; Li et al., 2020) is also utilized to gradually improve the classification model from a small seed. As for the influence function, which is commonly used in robust statistics (Cook and Weisberg, 1982), Koh and Liang (2017) introduced it into the machine learning area. ...
... Recent years have seen a noticeable increase in research efforts to alleviate the annotation burden in object detection. Many such efforts have focused on weak supervision settings, such as using only image class labels [6], [7], [8], scribbles [9] and click supervision [10]. These methods still require some form of accurate labeling. ...
... Due to lack of object bounding box position supervision, most current WSOD methods [14][15][16][17][18][19][20][21][22][23][24][25][26][27] use multiple instance learning (MIL) [28] to mine object instances from pregenerated proposals, and treat them as pseudo instance-level annotations to train weakly supervised detectors. However, these methods only focus on a single image to learn object representations without considering the internal relevance of various object instances across images. ...
... The most representative ones are sparse subspace clustering [10] (SSC, pursuing a sparse coefficient matrix) and low-rank subspace clustering [11] (LRR, pursuing a low-rank coefficient matrix). Thereafter, many extensions have been developed to improve these methods, such as a sparse subspace clustering algorithm based on a nonconvex modeling formulation [12] and sketch-based subspace clustering [13]. In each iteration, we stack all the self-representation matrices Z^(v), v = 1, ..., V, into a third-order tensor Z, and then rotate it into an N × V × N tensor. ...
... Such instance-level annotation consumes a lot of manpower and material resources, which greatly reduces the labeling efficiency for the detection task. To reduce the cost of labeling training data, researchers have developed weakly supervised object detection (WSOD) [35][36][37][38][39][40][41][42][43][44][45][46][47] to train an object detector with only image-level annotation (i.e., image tags, indicating whether there are some object categories in an image). ...