Xiaomei Wang's research while affiliated with Fudan University and other places

Publications (4)

Article
This paper studies the task of image captioning with novel objects, which exist only in testing images. Intrinsically, this task reflects the generalization ability of models in understanding and captioning the semantic meanings of visual concepts and objects unseen in the training set, and is thus similar to one/zero-shot learning. The critical...
Article
Despite significant progress in object categorization in recent years, a number of important challenges remain, mainly the ability to learn from limited labeled data and the ability to recognize object classes within a large, potentially open, set of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work wi...

Citations

... Observe that some captions are very informative, Fig. 13(a), whereas others, Fig. 13(b, c), failed to capture the scene context. Considerable efforts have been made in this field to improve such descriptions, such as the proposal of Wang [91], which uses a neuro-symbolic representation of the image in the form of an attributed relational graph to model the object relationships. Furthermore, we could use textual descriptions from distinct machine learning techniques, trained over datasets of different domains of human knowledge, and leverage each individual model's strength through ensemble techniques [92], [93], [94] to improve the prediction process. ...
... A lot of research in the field of computer vision focusing on image captioning has used mixed architectures in which the encoder side of the network is a CNN and the decoder side combines an LSTM and a GAN. The application of FDM-net (feature deformation meta-network), which is trained on source data and can learn from object features detected by auxiliary models, is discussed in [28]. FDM-net is composed of two major components: (1) feature deformation and (2) scene graph sentence reconstruction. ...
... ZSL methods exploit a prior source of knowledge such as attributes [15,39,59], textural features [59,76,77], or other sources of information [50] to recognize unseen classes. A number of prominent methods have been proposed for zero-shot learning [3,4,18,21,37,40,78,85,90,94,106]. ...