Haibin Yan's research while affiliated with Beijing University of Posts and Telecommunications and other places

Publications (43)

Article
In this paper, we present a dense hybrid proposal modulation (DHPM) method for lane detection. Most existing methods perform sparse supervision on a subset of high-scoring proposals, while other proposals fail to obtain effective shape and location guidance, resulting in poor overall quality. To address this, we densely modulate all proposals to ge...
Chapter
Finding objects in dense clutter and placing them in specific poses play an important role in robot manipulation in fields like warehousing and logistics, and have a significant influence on the automation of these fields. However, most methods that perform well in simple clutter do not hold up well in dense clutter because of severe stacking and o...
Preprint
Equipping embodied agents with commonsense is important for robots to successfully complete complex human instructions in general environments. Recent large language models (LLM) can embed rich semantic knowledge for agents in plan generation of complex tasks, while they lack the information about the realistic world and usually yield infeasible ac...
Preprint
In this paper, we present a dense hybrid proposal modulation (DHPM) method for lane detection. Most existing methods perform sparse supervision on a subset of high-scoring proposals, while other proposals fail to obtain effective shape and location guidance, resulting in poor overall quality. To address this, we densely modulate all proposals to ge...
Preprint
Accurately estimating the shape of objects in dense clutter makes an important contribution to robotic packing, because the optimal object arrangement requires the robot planner to acquire shape information of all existing objects. However, the objects for packing are usually piled in dense clutter with severe occlusion, and the object shape varies s...
Preprint
Recognizing objects in dense clutter accurately plays an important role in a wide variety of robotic manipulation tasks including grasping, packing, rearranging and many others. However, conventional visual recognition models usually miss objects because of the significant occlusion among instances and produce incorrect predictions due to the visual...
Article
In this paper, we propose a Structure-Aware Fusion Network (SAFNet) for 3D scene understanding. As 2D images present more detailed information while 3D point clouds convey more geometric information, fusing the two complementary data can improve the discriminative ability of the model. Fusion is a very challenging task since 2D and 3D data are esse...
Article
In this paper, we propose a deep relational network which exploits multi-scale information of facial images for kinship verification. Unlike most existing deep learning based facial kinship verification methods which employ convolutional neural networks to extract holistic features, we present a deep model to exploit facial kinship relationship fro...
Article
In this paper, we propose a discriminative sampling method to select most effective negative samples via deep reinforcement learning for kinship verification. Unlike most existing facial kinship verification methods which focus on extracting effective features with the random sampling strategy, we develop a deep reinforcement learning method to sel...
Article
In this paper, we propose a semantic three-stream network (STN) for social relation recognition, which learns discriminative features from facial images directly by exploiting semantic information effectively. Specifically, we employ a semantic augmentation structure to extract enriched semantic features from original face images, where a Siamese n...
Article
In this paper, we present a method for facial kinship verification, which uses an attention network to focus on extracting information of local facial parts. Unlike most existing approaches which use low-level features for verification, we introduce an attention mechanism in the deep network to extract high-level features for face representation. W...
Article
In this paper, we propose a weakly-supervised feature learning method called discriminative compact binary face descriptor (D-CBFD) for facial kinship verification. Unlike most existing kinship verification methods where hand-crafted features are used for face representation, our D-CBFD learns discriminative face representation from a set of weakly...
Chapter
In this chapter, we investigate the problem of video-based kinship verification via human face analysis. While several attempts have been made on facial kinship verification from still images, to our best knowledge, the problem of video-based kinship verification has not been formally addressed in the literature. In this chapter, we first present a...
Chapter
In this chapter, we discuss feature learning techniques for facial kinship verification. We first review two well-known hand-crafted facial descriptors including local binary patterns (LBP) and the Gabor feature. Then, we introduce a compact binary face descriptor (CBFD) method which learns face descriptors directly from raw pixels. Unlike LBP whic...
Chapter
In this chapter, we draw some conclusions from the research results of existing facial kinship verification methods. Then, we suggest some possible interesting future directions for facial kinship verification in the next few years.
Chapter
In this chapter, we discuss metric learning techniques for facial kinship verification. We first review several conventional and representative metric learning methods, including principal component analysis (PCA), linear discriminant analysis (LDA), locality preserving projections (LPP), information-theoretic metric learning (ITML), side-informati...
Chapter
In this chapter, we first introduce the background of facial kinship verification and then review the state-of-the-art of facial kinship verification. Lastly, we outline the organization of the book.
Article
Facial expression recognition in video has been an important and relatively new topic in human face analysis and has attracted growing interest in recent years. Unlike conventional image-based facial expression recognition methods which recognize the facial expression category from still images, facial expression recognition in video is more challenging b...
Article
In this paper, we investigate the problem of video-based kinship verification via human face analysis. While several attempts have been made on facial kinship verification from still images, to our knowledge, the problem of video-based kinship verification has not been formally addressed in the literature. In this paper, we make the two contributio...
Book
This book provides the first systematic study of facial kinship verification, a new research topic in biometrics. It presents three key aspects of facial kinship verification: 1) feature learning for kinship verification, 2) metric learning for kinship verification, and 3) video-based kinship verification, and reviews state-of-the-art research find...
Article
Kinship verification is an interesting and challenging problem in human face analysis, which has received increasing interest in computer vision and biometrics in recent years. This paper presents a neighborhood repulsed correlation metric learning (NRCML) method for kinship verification via facial image analysis. Most existing metric learning bas...
Article
In this paper, we propose an activity-based person recognition approach based on discriminative sparse projections (DSP) and ensemble metric learning. Unlike gait recognition where only the walking activity is utilized for human identification, we aim to recognize people from more types of activities such as eating, drinking, running, and so on. Ou...
Article
In this paper, we propose a transfer subspace learning approach for cross-dataset facial expression recognition. To the best of our knowledge, this problem has seldom been addressed in the literature. While many facial expression recognition methods have been proposed in recent years, most of them assume that face images in the training and testing sets are c...
Article
In this paper, we propose a biased subspace learning approach for misalignment-robust facial expression recognition. While a variety of facial expression recognition methods have been proposed in the literature, most of them only work well when face images are well registered and aligned. In many practical applications such as human robot interacti...
Article
Kinship verification using face images (KVFI) is a relatively new and challenging problem in computer vision and biometrics, while kin relationship in psychology has been well studied over the past decades. Recent advances in KVFI have shown that learning an effective similarity metric plays a critical role in the verification problem. However, mos...
Article
Kin relationships have been well investigated in the psychology community over the past decades, while kin verification using facial images is a relatively new and challenging problem in the biometrics community. Recently, it has attracted substantial attention from the biometrics community, mainly motivated by the relative characteristics that children generally rese...
Article
In this paper, we propose a new prototype-based discriminative feature learning (PDFL) method for kinship verification. Unlike most previous kinship verification methods which employ low-level hand-crafted descriptors such as local binary pattern and Gabor features for face representation, this paper aims to learn discriminative mid-level features...
Article
This paper presents a Multi-feature Multi-Manifold Learning (M3L) method for single-sample face recognition (SSFR). While numerous face recognition methods have been proposed over the past two decades, most of them suffer a heavy performance drop or even fail to work for the SSFR problem because there are not enough training samples for discriminat...
Conference Paper
In this paper, we propose an activity-based human identification approach using discriminative sparse projections (DSP) and orthogonal ensemble metric learning (OEML). Unlike gait recognition, which recognizes a person only from his/her walking activity, this study aims to identify people from more general types of human activities such as eating, dri...
Article
In this paper, we propose a new discriminative multimetric learning method for kinship verification via facial image analysis. Given each face image, we first extract multiple features using different face descriptors to characterize face images from different aspects because different feature descriptors can provide complementary information. Then...
Article
In this paper, we propose a new cost-sensitive ordinal regression (CSOR) approach for fully automatic facial beauty assessment. While there have been several facial beauty assessment methods in the literature, most of them require an accurate set of manual landmarks and are not fully automatic. In many real-world applications, face images are usual...
Article
For human–robot interaction (HRI), perception is one of the most important capabilities. This paper reviews several widely used perception methods of HRI in social robots. Specifically, we investigate general perception tasks crucial for HRI, such as where the objects are located in the rooms, what objects are in the scene, and how they interact wi...
Conference Paper
In this paper, we introduce our robotic nanny, Dorothy Robotubby, designed to play with and take care of a child when his/her parent or caregiver is absent. There are two main user interfaces in our robotic system: local control-based and remote control-based. The local control-based interface is developed for a child to control the robot direc...
Conference Paper
This paper investigates the problem of cross-dataset facial expression recognition. To the best of our knowledge, this problem has not been formally addressed in the literature. Conventional facial expression recognition methods assume expression images in the training and testing sets are collected under the same condition such that they are indep...
Conference Paper
We investigate in this paper the problem of misalignment-robust facial expression recognition. To the best of our knowledge, this problem has not been formally addressed in the literature. Most existing facial expression recognition methods, however, can only work well when face images are well-aligned. In many real world applications such as human...

Citations

... Language model grounding for embodied tasks: An embodied agent requires not only active exploration [33], manipulation [34], and scene perception [35,36] but also embodied task planning ability. Embodied task planning aims to generate executable action steps in the given environments, where action plans are generated from grounded LLMs by receiving information from the surrounding environments [37,38,39] or prompt engineering [40]. ...
... They achieved 78.5% and 89.7% verification accuracy on KFW-I and KFW-II datasets, respectively. Yan and Song [50] presented a deep model to verify facial kin relations from local regions. Their approach used two CNNs that shared parameters to extract various feature scales. ...
... Their experimental results on KFW-I and KFW-II datasets were 77.50% and 88.4%, respectively. Song and Yan [49] introduced a data augmentation approach called KinMix for verifying kinship. They produced samples at the feature level rather than the original image level for data augmentation, in contrast to most existing data augmentation approaches. ...
... Wang and Yan [51] presented a kinship verification network (KVN) method for kinship verification using deep CN and reinforcement learning. They aimed to show the importance of negative sample screening in their method for kinship verification. ...
... Following that, kinship research received a wide range of attention. In 2015, applied deep learning approaches in kinship verification, which also brought the FKV into the deep learning era (Dahan and Keller, 2020; Li et al., 2016, 2020; Yan and Wang, 2019). Meanwhile, the emergence of the large-scale kinship dataset FIW promoted the further development of the field. ...
... We communicate interpersonally face-to-face, by phone, text, social media, etc. Recent research in computer vision attempted to understand and recognize human relationships from face-to-face nonverbal communication, where people constantly interact through different attributes such as emotions, head position, gender, age and facial expressions [35,49,52,53]. ...
... It is reliable and accurate, but it is not always feasible or easy to implement due to various reasons. Kinship verification through facial images is a growing new field in computer vision [2] [3] [4] [5]; it moves the kinship verification process out of biochemistry labs and into computer vision, since facial images can be useful for verifying kin relations. ...
... Lu et al. [20,37] proposed a series of metric learning methods [33,18,12] aiming at pulling the features of intraclass samples as close as possible and repulsing the interclass samples as far as possible. Other handcrafted feature-based methods can be found in [32,28,30,38,6,19,33,8]. With the explosion and development of neural networks, the deep learning-based methods [36,34] can make use of the pre-trained neural networks in an off-the-shelf way and enjoy the advantages of deep feature representation. ...
... Indeed, the basic principle of representing the image locally relies upon various strategies: a grid of blocks, different facial parts, and a set of landmarks [39]. For instance, [12,14,16,75,92,93] adopted the grid of non-overlapping blocks strategy. Alternatively, [30,33,61,69,87,94] suggested another strategy based on different local facial parts, while [91,99] followed the strategy of detecting facial landmarks and interest points. ...
... The combination of facial dynamics and static features is able to exploit both the textural and temporal information present in face videos, improving the description of kin-related face pairs. Lately, other works have focused on the use of more advanced features [13,26] or on metric learning methods [75,76] to improve the results obtained by simple facial dynamics. ...