Junwei Liang
Carnegie Mellon University | CMU · Language Technologies Institute

Doctor of Philosophy
Researcher at Tencent Youtu Lab

About

33 Publications
3,447 Reads
669 Citations
Introduction
Greetings! I obtained my Ph.D. from Carnegie Mellon University in 2021. My research interests are large-scale computer vision and language processing. I also do research and build tools for reconstructing events from unorganized social media videos for public safety projects. More on my academic website: https://junweiliang.github.io/

Publications (33)
Preprint
Full-text available
This paper studies how to introduce viewpoint-invariant feature representations that can help action recognition and detection. Although we have witnessed great progress in action recognition over the past decade, it remains a challenging yet interesting problem to efficiently model geometric variations in large-scale datasets. This paper proposes a no...
Chapter
This paper studies the problem of predicting future trajectories of people in unseen cameras of novel scenarios and views. We approach this problem through the real-data-free setting in which the model is trained only on 3D simulation data and applied out-of-the-box to a wide variety of real cameras. We propose a novel approach to learn robust repr...
Preprint
Full-text available
In this paper, we study the problem of efficiently assessing building damage after natural disasters like hurricanes, floods or fires, through aerial video analysis. We make two main contributions. The first contribution is a new dataset, consisting of user-generated aerial videos from social media with annotations of instance-level building damage...
Preprint
Full-text available
This paper focuses on the problem of predicting future trajectories of people in unseen scenarios and camera views. We propose a method to efficiently utilize multi-view 3D simulation data for training. Our approach finds the hardest camera view to mix up with adversarial data from the original camera view in training, thus enabling the model to le...
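As a rough illustration of the "hardest camera view" idea above (a minimal, hypothetical sketch, not the paper's actual code): among alternative simulated views of the same trajectories, pick the view the current model is worst at, then mix its features with the original view's features before the gradient step. The names model, loss_fn, views, and alpha are all illustrative assumptions.

    import torch

    def hardest_view_mixup_step(model, loss_fn, views, targets, alpha=0.5):
        # views: list of feature tensors [B, T, D]; views[0] is the original
        # camera view, the rest are alternative simulated views.
        with torch.no_grad():
            # Score every alternative view with the current model.
            losses = [loss_fn(model(v), targets) for v in views[1:]]
        hardest = views[1 + int(torch.stack(losses).argmax())]
        # Blend original-view features with the hardest view's features.
        mixed = alpha * views[0] + (1.0 - alpha) * hardest
        loss = loss_fn(model(mixed), targets)
        loss.backward()  # caller zeroes grads and steps the optimizer
        return loss.item()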
Preprint
Full-text available
This paper studies the problem of predicting the distribution over multiple possible future paths of people as they move through various visual scenes. We make two main contributions. The first contribution is a new dataset, created in a realistic 3D simulator, which is based on real world trajectory data, and then extrapolated by human annotators...
Conference Paper
Full-text available
Nowadays a huge number of user-generated videos are uploaded to social media every second, capturing glimpses of events all over the world. These videos provide important and useful information for reconstructing events like the Las Vegas Shooting in 2017. In this paper, we describe a system that can localize the shooter location only based on a co...
Preprint
Full-text available
Nowadays a huge number of user-generated videos are uploaded to social media every second, capturing glimpses of events all over the world. These videos provide important and useful information for reconstructing the events. In this paper, we describe the DAISY system, enabled by established machine learning techniques and physics models, that can...
Preprint
Full-text available
Deciphering human behaviors to predict their future paths/trajectories and what they would do from videos is important in many applications. Motivated by this idea, this paper studies predicting a pedestrian's future path jointly with future activities. We propose an end-to-end, multi-task learning system utilizing rich visual features about the hu...
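As a minimal sketch of such a joint model (an assumption-laden illustration, not the paper's architecture): a shared recurrent encoder over per-frame visual features feeds two heads, one regressing future (x, y) positions and one classifying the future activity. Layer sizes and names below are made up.

    import torch.nn as nn

    class JointPathActivityNet(nn.Module):
        def __init__(self, feat_dim=256, hidden=128, horizon=12, num_activities=30):
            super().__init__()
            self.horizon = horizon
            self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.traj_head = nn.Linear(hidden, horizon * 2)    # future (x, y) per step
            self.act_head = nn.Linear(hidden, num_activities)  # future activity logits

        def forward(self, feats):  # feats: [B, T, feat_dim] per-frame visual features
            _, (h, _) = self.encoder(feats)
            h = h[-1]  # last hidden state, [B, hidden]
            traj = self.traj_head(h).view(-1, self.horizon, 2)
            return traj, self.act_head(h)

Training would typically sum a regression loss on the trajectory head and a cross-entropy loss on the activity head, the standard way to couple the two tasks.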
Article
Full-text available
Recent insights on language and vision with neural networks have been successfully applied to simple single-image visual question answering. However, to tackle real-life question answering problems on multimedia collections such as personal photo albums, we have to look at whole collections with sequences of photos. This paper proposes a new multim...
Preprint
Full-text available
Recent insights on language and vision with neural networks have been successfully applied to simple single-image visual question answering. However, to tackle real-life question answering problems on multimedia collections such as personal photos, we have to look at whole collections with sequences of photos or videos. When answering questions fro...
Conference Paper
Full-text available
Developing an efficient and effective social media monitoring system has become one of the important steps towards improved public safety. With the explosive availability of user-generated content documenting most conflicts and human rights abuses around the world, analysts and first-responders increasingly find themselves overwhelmed with massive...
Article
Full-text available
We tackle the problem of learning concept classifiers from videos on the web without using manually labeled data. Although metadata attached to videos (e.g., video titles, descriptions) can be of help collecting training data for the target concept, the collected data is often very noisy. The main challenge is therefore how to select good examples...
Preprint
Full-text available
This paper proposes a new task, MemexQA: given a collection of photos or videos from a user, the goal is to automatically answer questions that help users recover their memory about events captured in the collection. Towards solving the task, we 1) present the MemexQA dataset, a large, realistic multimodal dataset consisting of real personal photos...
Preprint
Full-text available
Learning video concept detectors automatically from big but noisy web data with no additional manual annotations is a novel but challenging area in the multimedia and machine learning communities. A considerable number of videos on the web are associated with rich but noisy contextual information, such as the title, which provides weak annota...
Conference Paper
Learning video concept detectors automatically from big but noisy web data with no additional manual annotations is a novel but challenging area in the multimedia and machine learning communities. A considerable number of videos on the web are associated with rich but noisy contextual information, such as the title and other multi-modal inform...
Article
What happened during the Boston Marathon in 2013? Nowadays, at any major event, lots of people take videos and share them on social media. To fully understand exactly what happened in these major events, researchers and analysts often have to examine thousands of these videos manually. To reduce this manual effort, we present an investigative syste...
Article
Given any complicated or specialized video content search query, e.g. “Batkid (a kid in a Batman costume)” or “destroyed buildings”, existing methods require manually labeled data to build detectors for searching. We present a demonstration of an artificial intelligence application, Webly-labeled Learning (WELL), that enables learning of ad-hoc concep...
Conference Paper
The recent advances in image captioning have stimulated research in generating natural language descriptions for visual content, which can be widely applied in many applications such as assisting blind people. Video description generation is a more complex task than image captioning. Most works on video description generation focus on visual informati...
Conference Paper
Full-text available
In this paper we summarize our experiments in the ImageCLEF 2015 Scalable Concept Image Annotation challenge. The RUC-Tencent team participated in all subtasks: concept detection and localization, and image sentence generation. For concept detection, we experiment with automated approaches to gather high-quality training examples from the Web, in...
Conference Paper
With the increasing use of audio sensors in user-generated content (UGC) collections, semantic concept annotation from video soundtracks has become an important research problem. In this paper, we investigate reducing the semantic gap of the traditional data-driven bag-of-audio-words based audio annotation approach by utilizing the large amount of...
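For context, a generic bag-of-audio-words pipeline of the kind this line of work builds on (an illustrative sketch, not the paper's system): cluster frame-level audio features, e.g. MFCCs, into a codebook, then describe each clip as a normalized histogram of its frames' nearest codewords.

    import numpy as np
    from sklearn.cluster import KMeans

    def build_codebook(frame_features, num_words=256):
        # frame_features: [N, D] frame-level audio features pooled over clips.
        return KMeans(n_clusters=num_words, n_init=10).fit(frame_features)

    def bag_of_audio_words(clip_frames, codebook):
        # clip_frames: [T, D] frames of one clip -> normalized word histogram.
        words = codebook.predict(clip_frames)
        hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
        return hist / max(hist.sum(), 1.0)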
Conference Paper
With the increasing use of audio sensors in user-generated content (UGC) collections, semantic concept annotation using audio streams has become an important research problem. Huawei initiated a grand challenge at the International Conference on Multimedia & Expo (ICME) 2014: the Huawei Accurate and Fast Mobile Video Annotation Challenge. In this paper,...

Projects (2)