Thao Minh Le

Thao Minh Le
Deakin University · Applied Artificial Intelligence Institute

Doctor of Philosophy
Research Fellow

About

20
Publications
3,091
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
81
Citations
Additional affiliations
May 2021 - present
Deakin University
Position
  • PostDoc Position
May 2019 - October 2020
Deakin University
Position
  • Research Assistant
August 2017 - April 2018
Yahoo! JAPAN Research
Position
  • Research Intern
Education
November 2018 - May 2021
Deakin University
Field of study
  • Computer Science
September 2016 - September 2018
Tokyo Institute of Technology
Field of study
  • Computer Science
August 2009 - June 2014
Hanoi University of Science and Technology
Field of study
  • Electronics and Communication Engineering

Publications

Publications (20)
Preprint
Full-text available
The current success of modern visual reasoning systems is arguably attributed to cross-modality attention mechanisms. However, in deliberative reasoning such as in VQA, attention is unconstrained at each step, and thus may serve as a statistical pooling mechanism rather than a semantic operation intended to select information relevant to inference....
Article
Full-text available
Video Question Answering (Video QA) challenges modelers in multiple fronts. Modeling video necessitates building not only spatio-temporal models for the dynamic visual channel but also multimodal structures for associated information channels such as subtitles or audio. Video QA adds at least two more layers of complexity – selecting relevant conte...
Article
Full-text available
Video Question Answering (Video QA) is a powerful testbed to develop new AI capabilities. This task necessitates learning to reason about objects, relations, and events across visual and linguistic domains in space-time. High-level reasoning demands lifting from associative visual pattern recognition to symbol like manipulation over objects, their...
Article
Full-text available
Predicting the interaction between a compound and a target is crucial for rapid drug repurposing. Deep learning has been successfully applied in drug-target affinity (DTA) problem. However, previous deep learning-based methods ignore modeling the direct interactions between drug and protein residues. This would lead to inaccurate learning of target...
Preprint
Full-text available
Video Question Answering (Video QA) is a powerful testbed to develop new AI capabilities. This task necessitates learning to reason about objects, relations, and events across visual and linguistic domains in space-time. High-level reasoning demands lifting from associative visual pattern recognition to symbol-like manipulation over objects, their...
Preprint
Full-text available
Video question answering (Video QA) presents a powerful testbed for human-like intelligent behaviors. The task demands new capabilities to integrate video processing, language understanding, binding abstract linguistic concepts to concrete visual artifacts, and deliberative reasoning over spacetime. Neural networks offer a promising approach to rea...
Preprint
Full-text available
Video QA challenges modelers in multiple fronts. Modeling video necessitates building not only spatio-temporal models for the dynamic visual channel but also multimodal structures for associated information channels such as subtitles or audio. Video QA adds at least two more layers of complexity - selecting relevant content for each channel in the...
Preprint
Full-text available
Predicting the interaction between a compound and a target is crucial for rapid drug repurposing. Deep learning has been successfully applied in drug-target affinity (DTA) problem. However, previous deep learning-based methods ignore modeling the direct interactions between drug and protein residues. This would lead to inaccurate learning of target...
Article
Full-text available
What does it take to design a machine that learns to answer natural questions about a video? A Video QA system must simultaneously understand language, represent visual content over space-time, and iteratively transform these representations in response to lingual content in the query, and finally arriving at a sensible answer. While recent advance...
Article
Full-text available
We present Language-binding Object Graph Network, the first neural reasoning method with dynamic relational structures across both visual and textual domains with applications in visual question answering. Relaxing the common assumption made by current models that the object predicates pre-exist and stay static, passive to the reasoning process, we...
Article
Full-text available
Video question answering (VideoQA) is challenging as it requires modeling capacity to distill dynamic visual artifacts and distant relations and to associate them with linguistic concepts. We introduce a general-purpose reusable neural unit called Conditional Relation Network (CRN) that serves as a building block to construct more sophisticated str...
Preprint
Full-text available
We present Language-binding Object Graph Network, the first neural reasoning method with dynamic relational structures across both visual and textual domains with applications in visual question answering. Relaxing the common assumption made by current models that the object predicates pre-exist and stay static, passive to the reasoning process, we...
Article
Full-text available
Video question answering (VideoQA) is challenging as it requires modeling capacity to distill dynamic visual artifacts and distant relations and to associate them with linguistic concepts. We introduce a general-purpose reusable neural unit called Conditional Relation Network (CRN) that serves as a building block to construct more sophisticated str...
Preprint
Full-text available
Video question answering (VideoQA) is challenging as it requires modeling capacity to distill dynamic visual artifacts and distant relations and to associate them with linguistic concepts. We introduce a general-purpose reusable neural unit called Conditional Relation Network (CRN) that serves as a building block to construct more sophisticated str...
Preprint
Full-text available
What does it take to design a machine that learns to answer natural questions about a video? A Video QA system must simultaneously understand language, represent visual content over space-time, and iteratively transform these representations in response to lingual content in the query, and finally arriving at a sensible answer. While recent advance...
Conference Paper
Full-text available
With the widespread use of intelligent systems, such as smart speakers, addressee recognition has become a concern in human-computer interaction, as more and more people expect such systems to understand complicated social scenes, including those outdoors, in cafeterias, and hospitals. Because previous studies typically focused only on pre-specifie...
Preprint
Full-text available
This paper presents a new framework for human action recognition from 3D skeleton sequences. Previous studies do not fully utilize the temporal relationships between video segments in a human action. Some studies successfully used very deep Convolutional Neural Network (CNN) models but often suffer from the data insufficiency problem. In this study...
Conference Paper
Full-text available
Falls and consequent injuries among the elderly population are major public health problems that often require medical attention. Every year, the world has more than 30% of people aged over 65 fall and most of them could not stand up without assistance promptly. The long laying makes the injury even more seriously. A system of monitoring the status...

Network

Cited By