Yuqi Huo

Renmin University of China | RUC · Department of Computer Science and Technology

About

21 Publications
4,389 Reads
283 Citations

Publications (21)
Preprint
Video-language modeling has attracted much attention with the rapid growth of web videos. Most existing methods assume that the video frames and text description are semantically correlated, and focus on video-language modeling at video level. However, this hypothesis often fails for two reasons: (1) With the rich semantics of video contents, it is...
Article
Full-text available
The fundamental goal of artificial intelligence (AI) is to mimic the core cognitive activities of humans. Despite tremendous success in AI research, most existing methods have only a single cognitive ability. To overcome this limitation and take a solid step towards artificial general intelligence (AGI), we develop a foundation model pre-traine...
Preprint
Large-scale single-stream pre-training has shown dramatic performance in image-text retrieval. Regrettably, it faces low inference efficiency due to heavy attention layers. Recently, two-stream methods like CLIP and ALIGN with high inference efficiency have also shown promising performance; however, they only consider instance-level alignment betwe...
Preprint
Full-text available
The fundamental goal of artificial intelligence (AI) is to mimic the core cognitive activities of humans, including perception, memory, and reasoning. Although tremendous success has been achieved in various AI research fields (e.g., computer vision and natural language processing), the majority of existing works only focus on acquiring single cognit...
Conference Paper
Full-text available
This paper proposes a novel pretext task for self-supervised video representation learning by exploiting spatiotemporal continuity in videos. It is motivated by the fact that videos are spatiotemporal by nature and a representation learned by detecting spatiotemporal continuity/discontinuity is thus beneficial for downstream video content analysis...
Article
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved great success and become a milestone in the field of artificial intelligence (AI). Owing to sophisticated pre-training objectives and huge model parameters, large-scale PTMs can effectively capture knowledge from massive labeled and unlabeled data. By storing knowledg...
Preprint
Full-text available
This work explores how to design a single neural network that is capable of adapting to multiple heterogeneous tasks of computer vision, such as image segmentation, 3D detection, and video recognition. This goal is challenging because network architecture designs in different tasks are inconsistent. We solve this challenge by proposing Network Codi...
Preprint
Full-text available
Multi-modal pre-training models have been intensively explored to bridge vision and language in recent years. However, most of them explicitly model the cross-modal interaction between image-text pairs, by assuming that there exists a strong semantic correlation between the text and image modalities. Since this strong assumption is often invalid in r...
Chapter
Most existing action recognition models are large convolutional neural networks that work only with raw RGB frames as input. However, practical applications require lightweight models that directly process compressed videos. In this work, for the first time, such a model is developed, which is lightweight enough to run in real-time on embedded AI d...
Preprint
Full-text available
3D object detection from a single image without LiDAR is a challenging task due to the lack of accurate depth information. Conventional 2D convolutions are unsuitable for this task because they fail to capture local objects and their scale information, which are vital for 3D object detection. To better represent 3D structure, prior work typically tran...
Preprint
Full-text available
Video action recognition, which is topical in computer vision and video analysis, aims to assign a short video clip to a pre-defined category such as brushing hair or climbing stairs. Recent works focus on action recognition with deep neural networks that achieve state-of-the-art results but require high-performance platforms. Despite the fast dev...
Conference Paper
Full-text available
Fine-grained image classification and retrieval have become topical in both computer vision and information retrieval. In real-life scenarios, fine-grained tasks tend to appear along with coarse-grained tasks when the observed object comes closer. However, in previous works, the combination of fine-grained and coarse-grained tasks was often ignored....
Chapter
Full-text available
Zero-shot learning (ZSL) can be regarded as transfer learning from seen classes to unseen ones so that the latter can be recognized without any training samples. Its main difficulty lies in that there often exists a large domain gap between the seen and unseen class domains. Inspired by the fact that an unseen class is not strictly 'zero-shot' (thus...
Chapter
Full-text available
We present a novel generative adversarial network (GAN) model, called InsightGAN, for drug abuse detection. Our model is inspired by two closely related works on machine learning for healthcare applications: 1) drug abuse detection has been solved by machine learning with plentiful data from social media (where face pictures can be easily obtained)...
Conference Paper
Full-text available
We propose a novel deep learning approach, called DeepInsight, to quick diagnosis of autism spectrum disorder (ASD) and major depressive disorder (MDD). Our approach is motivated by recent advances in artificial intelligence (AI) for healthcare. In particular, researchers have found distinct differences between facial characteristics of children wi...
Article
Context: We observed a special type of bug reopen that has no direct impact on the user experience or the normal operation of the system being developed. We refer to these as non-negative bug reopens. Objective: Non-negative bug reopens are novel and somewhat contradictory to popular conceptions. Therefore, we thoroughly explored these phenomena in...
