July 2024
·
3 Reads
July 2023
·
13 Reads
November 2021
·
20 Reads
·
17 Citations
IEEE Transactions on Circuits and Systems for Video Technology
Scene graph generation aims to detect the visual entities in an image and the relationships between them. Object-level visual information is vital for predicting relationships accurately. However, most existing methods encode visual information under coarse supervision: by using cross-entropy as the main training loss, they treat different relationships as mutually exclusive semantics with equidistant labels. Intuitively, relationship semantics are similar or dissimilar to one another to varying degrees, i.e., they carry topological information. This information can serve as a hint that helps the model learn to capture the key relevant visual information. Accordingly, we propose a Semantically Similarity-wise Dual-branch Network (SSDN), which introduces the topological information of relationship semantics as extra supervision to help the model learn to extract and encode relationship-related visual information. To avoid possibly chaotic feature learning and to enable the introduced knowledge to be better absorbed during inference, we design a dual-branch framework consisting of an auxiliary branch and an inference branch. The topological information extracted from the ground truth is introduced at the front end of the auxiliary branch, which then generates a soft embedding that is propagated to the inference branch in a knowledge distillation manner. Extensive experiments show that, on average, our model significantly outperforms state-of-the-art approaches on the Visual Genome and VRD benchmarks, which demonstrates its effectiveness and superiority.
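The contrast the abstract draws between equidistant one-hot labels and similarity-aware soft supervision can be illustrated with a minimal sketch. This is not the SSDN architecture: the predicate list, the similarity matrix, the temperature, and the 0.5 loss weight are all hypothetical values chosen for illustration, and the "teacher" distribution here is read directly from a similarity row rather than produced by a learned auxiliary branch.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), the usual distillation loss between teacher p and student q."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical predicate vocabulary and pairwise semantic similarity (assumed values).
PREDICATES = ["on", "standing on", "eating"]
SIMILARITY = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]

def one_hot(index, n):
    """Equidistant label: every wrong class is equally wrong."""
    return [1.0 if i == index else 0.0 for i in range(n)]

def soft_target(index, temperature=0.5):
    """Soft label that gives semantically close predicates more probability mass."""
    return softmax(SIMILARITY[index], temperature)

# Stand-in for the inference branch output on one subject-object pair (assumed logits).
student_probs = softmax([2.0, 1.5, -1.0])
gt = PREDICATES.index("on")

hard_loss = cross_entropy(one_hot(gt, len(PREDICATES)), student_probs)
teacher_probs = soft_target(gt)  # stand-in for the auxiliary branch's soft output
distill_loss = kl_divergence(teacher_probs, student_probs)
total_loss = hard_loss + 0.5 * distill_loss  # 0.5 is an arbitrary illustrative weight
```

With the one-hot target, predicting "standing on" for a ground-truth "on" is penalized exactly as much as predicting "eating"; the soft target penalizes the semantically closer mistake less.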
... Existing works on VRD can be grouped into two-stage methods and one-stage (end-to-end) methods. Two-stage methods [32], [10], [9], [8], [31] first apply an off-the-shelf object detector to detect object bounding boxes, and then exhaustively combine them as subject-object pairs which are fed to a subsequent module for interaction classification. One-stage methods [1], [2], [4], [6], [5] detect subject-object bounding boxes and their visual relations simultaneously, typically in a DETR [7]-based paradigm. ...
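The exhaustive pairing step of the two-stage methods described in this excerpt can be sketched as follows. The detection format (a label plus an `(x1, y1, x2, y2)` box) and the function name `candidate_pairs` are assumptions made for illustration, not taken from any of the cited works.

```python
from itertools import permutations

def candidate_pairs(detections):
    """Return every ordered (subject, object) pair of distinct detections.

    Order matters because "person riding horse" and "horse riding person"
    are different relation candidates, so n detections yield n * (n - 1) pairs.
    """
    return list(permutations(detections, 2))

# Hypothetical detector output: (label, (x1, y1, x2, y2)).
detections = [
    ("person", (10, 20, 110, 220)),
    ("horse", (60, 90, 300, 260)),
    ("hat", (40, 10, 80, 40)),
]

pairs = candidate_pairs(detections)
```

Each pair would then be passed to a separate relation classifier. The quadratic growth in the number of pairs is one reason one-stage methods predict boxes and relations jointly instead.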