Dan Guo

Hefei University of Technology

About

Publications: 34
Reads: 2,639
Citations: 375


Publications (34)
Article
In most E-commerce platforms, whether the displayed items trigger the user’s interest largely depends on their most eye-catching multimodal content. Consequently, increasing efforts focus on modeling multimodal user preference, and the pressing paradigm is to incorporate complete multimodal deep features of the items into the recommendation module....
Article
Lipreading is the task of decoding the movement of the speaker’s lip region into text. In recent years, lipreading methods based on deep neural networks have attracted widespread attention, and their accuracy has far surpassed that of experienced human lipreaders. The visual differences between some phonemes are extremely subtle and pose a great challenge to...
Article
Sign language translation (SLT) is a challenging weakly supervised task without word-level annotations. An effective method of SLT is to leverage multimodal complementarity and to explore implicit temporal cues. In this work, we propose a graph-based multimodal sequential embedding network (MSeqGraph), in which multiple sequential modalities are de...
Chapter
This chapter covers several research works on sign language recognition (SLR), including isolated word recognition and continuous sentence translation. To solve isolated SLR, an Adaptive-HMM (hidden Markov model) framework (Guo et al., TOMCCAP 14(1):1–18, 2017) is proposed. The method explores the intrinsic properties and complementary relationship...
Preprint
Unsupervised image captioning with no annotations is an emerging challenge in computer vision, where existing methods usually adopt GAN (Generative Adversarial Networks) models. In this paper, we propose a novel memory-based network rather than GAN, named Recurrent Relational Memory Network ($R^2M$). Unlike complicated and sensitive adversarial le...
Conference Paper
Unsupervised image captioning with no annotations is an emerging challenge in computer vision, where existing methods usually adopt GAN (Generative Adversarial Networks) models. In this paper, we propose a novel memory-based network rather than GAN, named Recurrent Relational Memory Network (R2M). Unlike complicated and sensitive adversarial learn...
Preprint
Visual dialog is a challenging task that requires the comprehension of the semantic dependencies among implicit visual and textual contexts. This task can refer to the relation inference in a graphical model with sparse contexts and unknown graph structure (relation descriptor), and how to model the underlying context-aware relation inference is cr...
Conference Paper
Most existing CNN-based methods for crowd counting suffer from large scale variation in objects of interest, leading to low-quality density maps. In this paper, we propose a novel deep model called Dilated-Attention-Deformable ConvNet (DADNet), which consists of two schemes: multi-scale dilated attention and deformable convolutional DME (...
Conference Paper
Continuous sign language recognition is challenging because the ordered words have no exact temporal locations in the video. To address this problem, we propose a method based on pseudo-supervised learning. First, we use a 3D residual convolutional network (3D-ResNet) pre-trained on the UCF101 dataset to extract visual features. Sec...
Conference Paper
Video captioning is a challenging problem in neural networks, computer vision, and natural language processing. It aims to translate a given video into a sequence of words that can be understood by humans. The dynamic information in videos and the complexity of language make this task difficult. This paper proposes a semantic enhanced e...
Article
Vision-based sign language translation (SLT) is a challenging task due to the complicated variations of facial expressions, gestures, and articulated poses involved in sign linguistics. As a weakly supervised sequence-to-sequence learning problem, in SLT there are usually no exact temporal boundaries of actions. To adequately explore temporal hints...
Poster
We propose a Connectionist Temporal Modeling (CTM) network for sentence translation and sign labeling. To acquire short-term temporal correlations, a Temporal Convolution Pyramid (TCP) module is applied to convert 2D CNN features to 'pseudo 3D' features. CTM aligns the 'pseudo 3D' features with the original 3D CNN clip features and fuses them. Next, we impl...
Conference Paper
Online sign interpretation suffers from challenges presented by hybrid semantics learning among sequential variations of visual representations, sign linguistics, and textual grammars. This paper proposes a Connectionist Temporal Modeling (CTM) network for sentence translation and sign labeling. To acquire short-term temporal correlations, a Tempor...
Conference Paper
Sign language translation (SLT), which aims to translate a sign language video into natural language, is a weakly supervised task, given that there is no exact mapping between visual actions and textual words in a sentence label. To align the sign language actions and translate them into the respective words automatically, this pap...
Conference Paper
Visual dialog is a challenging task that involves multi-round semantic transformations between vision and language. This paper aims to address cross-modal semantic correlation for visual dialog. Motivated by the observation that Vg (global vision), Vl (local vision), Q (question) and H (history) are inseparably related, the paper proposes a novel Dual Visual...
Article
Mining co-occurrence frequency patterns from multiple sequences is a hot topic in bioinformatics. Many seemingly disorganized constituents repetitively appear under different biological matrices, such as PAM250 and BLOSUM62, which are considered hidden frequent patterns (FPs). A hidden FP with both gap and flexible approximation operations (replace...
Conference Paper
Continuous sign language translation (CSLT) is a weakly supervised problem aiming at translating vision-based videos into natural languages under complicated sign linguistics, where the ordered words in a sentence label have no exact boundary of each sign action in the video. This paper proposes a hybrid deep architecture which consists of a tempor...
Article
Continuous Sign Language Translation (SLT) is a challenging task due to its specific linguistics under sequential gesture variation without word alignment. Current hybrid HMM and CTC (Connectionist temporal classification) based models are proposed to solve frame or word level alignment. They may fail to tackle the cases with messing word order cor...
Article
In sign language recognition (SLR) with multimodal data, a sign word can be represented by multiple features, among which there exist an intrinsic property and a mutually complementary relationship. To fully explore those relationships, we propose an online early-late fusion method based on the adaptive Hidden Markov Model (HMM). In terms...
Article
3D reconstruction systems have been advanced by developments in both computer hardware and computing technologies. However, problems such as high cost, low efficiency and inaccuracy remain. Especially for large-scale scenes, failing to make full use of multi-scale depth information causes blurred and unrealistic reconstruction results. To solve this problem, w...
Article
For approximate nearest neighbor (ANN) search in many vision-based applications, vector quantization (VQ) is an efficient compact encoding technology. A representative approach of VQ is product quantization (PQ) which quantizes subspaces separately by Cartesian product and achieves high accuracy. But its space decomposition still leads to quantizat...
Article
Large amounts of data have been gathered by techniques such as mobile devices, remote sensing and cameras. With the rapid development of high-definition digital TV and multimedia information systems, motion-compensated frame interpolation (MCFI) has become a widely used tactic for frame rate up-conversion (FRUC) to improve the visual property of vi...
Article
Complex queries are widely used in current Web applications. They express highly specific information needs, but simply aggregating the meanings of primitive visual concepts does not perform well. To facilitate image search of complex queries, we propose a new image reranking scheme based on concept relevance estimation, which consists of Concept-Q...
Conference Paper
This paper focuses on pattern matching with wildcard, gap-length and one-off conditions. It is difficult to achieve optimal solutions. We propose an FNP algorithm based on Free-Node Optimum Pruning. Each Free-Node set is a set of nodes labeled by the same number which appear on different layers in a directed graph structure WON-Net. Compared on bio...
Article
Sequential pattern mining is an important research task in many domains, such as biological science. In this paper, we study the problem of mining frequent patterns from sequences with wildcards. The user can specify the gap constraints with flexibility. Given a subject sequence, a minimal support threshold and a gap constraint, we aim to find freq...
Article
Pattern matching with wildcards is a challenging topic in many domains, such as bioinformatics and information retrieval. This paper focuses on the problem with gap-length constraints and the one-off condition (that is, each character can be used at most once in all occurrences of a pattern in the sequence). It is difficult...
Conference Paper
In this paper we present a new algorithm to handle the pattern matching problem where the pattern can contain flexible wildcards. Given a sequence S and a pattern P consisting of subpatterns separated by flexible wildcards, the problem is to find all occurrences of P with exact match positions under the one-off condition and length constraints. We p...
Article
Pattern matching with both gap constraints and the one-off condition is a challenging topic, especially in bioinformatics, information retrieval, and dictionary query. Among the algorithms to solve the problem, the most efficient one is SAIL, which is time consuming, especially when the pattern is long. In addition, existing algorithms based on bit...
Conference Paper
Sequential pattern mining is an important research task in many domains, such as biological science. In this paper, we study the problem of mining frequent patterns from sequences with wildcards. The user can specify the gap constraints with flexibility. Given a subject sequence, a minimal support threshold and a gap constraint, we aim to find freq...
