Conference Paper

Context-driven Visual Object Recognition based on Knowledge Graphs



Current deep learning methods for object recognition are purely data-driven and require a large number of training samples to achieve good results. Due to their sole dependence on image data, these methods tend to fail when confronted with new environments in which even small deviations occur. Human perception, however, has proven to be significantly more robust to such distribution shifts. It is assumed that humans' ability to deal with unknown scenarios is based on extensive incorporation of contextual knowledge, which can derive either from object co-occurrences in a scene or from memory of prior experience. In accordance with the human visual cortex, which uses context to form different object representations for a seen image, we propose an approach that enhances deep learning methods with external contextual knowledge encoded in a knowledge graph. To this end, we extract different contextual views from a generic knowledge graph, transform each view into vector space, and infuse it into a deep neural network (DNN). We conduct a series of experiments to investigate the impact of different contextual views on the object representations learned from the same image dataset. The experimental results provide evidence that the contextual views influence the image representations in the DNN differently and therefore lead to different predictions for the same images. We also show that context helps to strengthen the robustness of object recognition models for out-of-distribution images, as they typically occur in transfer learning tasks or real-world scenarios.
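As a rough illustration of the proposed pipeline, the sketch below derives class embeddings from a contextual view (here simply a set of knowledge graph edges) and classifies an image feature against them. The graph data, the neighbour-averaging "embedding method", and all names are simplifying assumptions for illustration, not the paper's actual implementation:

```python
import numpy as np

def embed_view(view_edges, classes, dim=4, seed=0):
    """Toy view embedding: each class vector is the average of its own
    random base vector and the base vectors of its neighbours in the view."""
    rng = np.random.default_rng(seed)
    base = {c: rng.normal(size=dim) for c in classes}
    emb = {}
    for c in classes:
        neigh = [base[t] for (s, t) in view_edges if s == c]
        emb[c] = np.mean([base[c]] + neigh, axis=0)
    return emb

def classify(img_feat, emb):
    """Predict the class whose contextual embedding is nearest to the feature."""
    return min(emb, key=lambda c: np.linalg.norm(img_feat - emb[c]))

classes = ["cat", "dog", "car"]
co_occurrence_view = [("cat", "dog"), ("dog", "cat")]  # hypothetical view
emb = embed_view(co_occurrence_view, classes)
```

Swapping in a different view (e.g. taxonomic edges instead of co-occurrence edges) changes the class embeddings and can therefore change the prediction for the very same image feature, which is the effect the experiments investigate.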




Visual object recognition is one of the most fundamental and challenging research topics in the field of computer vision. The research on the neural mechanism of the primates’ recognition function may bring revolutionary breakthroughs in brain-inspired vision. This Review aims to systematically review the recent works on the intersection of computational neuroscience and computer vision. It attempts to investigate the current brain-inspired object recognition models and their underlying visual neural mechanism. According to the technical architecture and exploitation methods, we describe the brain-inspired object recognition models and their advantages and disadvantages in realizing brain-inspired object recognition. We focus on analyzing the similarity between the artificial and biological neural network, and studying the biological credibility of the current popular DNN-based visual benchmark models. The analysis provides a guide for researchers to measure the occasion and condition when conducting visual object recognition research.
The information perceived via visual observations of real-world phenomena is unstructured and complex. Computer vision (CV) is the field of research that attempts to make use of that information. Recent approaches of CV utilize deep learning (DL) methods as they perform quite well if training and testing domains follow the same underlying data distribution. However, it has been shown that minor variations in the images that occur when these methods are used in the real world can lead to unpredictable and catastrophic errors. Transfer learning is the area of machine learning that tries to prevent these errors. Especially, approaches that augment image data using auxiliary knowledge encoded in language embeddings or knowledge graphs (KGs) have achieved promising results in recent years. This survey focuses on visual transfer learning approaches using KGs, as we believe that KGs are well suited to store and represent any kind of auxiliary knowledge. KGs can represent auxiliary knowledge either in an underlying graph-structured schema or in a vector-based knowledge graph embedding. Intending to enable the reader to solve visual transfer learning problems with the help of specific KG-DL configurations we start with a description of relevant modeling structures of a KG of various expressions, such as directed labeled graphs, hypergraphs, and hyper-relational graphs. We explain the notion of feature extractor, while specifically referring to visual and semantic features. We provide a broad overview of knowledge graph embedding methods and describe several joint training objectives suitable to combine them with high dimensional visual embeddings. The main section introduces four different categories on how a KG can be combined with a DL pipeline: 1) Knowledge Graph as a Reviewer; 2) Knowledge Graph as a Trainee; 3) Knowledge Graph as a Trainer; and 4) Knowledge Graph as a Peer. 
To help researchers find meaningful evaluation benchmarks, we provide an overview of generic KGs and a set of image processing datasets and benchmarks that include various types of auxiliary knowledge. Last, we summarize related surveys and give an outlook about challenges and open issues for future research.
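One family of joint training objectives from the survey's scope, projecting visual features into a fixed knowledge graph embedding space (DeViSE-style), can be sketched in a few lines. The data below are synthetic, and a closed-form least-squares fit stands in for gradient-based training:

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.normal(size=(50, 8))         # visual features (n_samples, d_visual)
M_true = rng.normal(size=(8, 4))     # hidden ground-truth projection
S = V @ M_true                       # KG embedding of each sample's class

# Fit a linear map M so that V @ M approximates the KG embeddings S.
M, *_ = np.linalg.lstsq(V, S, rcond=None)
pred = V @ S @ np.zeros((4, 4)) if False else V @ M  # predicted KG-space vectors
```

At inference time, an image would be mapped through M into the KG embedding space and matched against class embeddings by nearest neighbour, which is what makes zero-shot transfer to unseen classes possible in this family of approaches.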
While scene context is known to facilitate object recognition, little is known about which contextual “ingredients” are at the heart of this phenomenon. Here, we address the question of whether the materials that frequently occur in scenes (e.g., tiles in a bathroom) associated with specific objects (e.g., a perfume) are relevant for the processing of that object. To this end, we presented photographs of consistent and inconsistent objects (e.g., perfume vs. pinecone) superimposed on scenes (e.g., a bathroom) and close-ups of materials (e.g., tiles). In Experiment 1, consistent objects on scenes were named more accurately than inconsistent ones, while there was only a marginal consistency effect for objects on materials. Also, we did not find any consistency effect for scrambled materials that served as color control condition. In Experiment 2, we recorded event-related potentials and found N300/N400 responses—markers of semantic violations—for objects on inconsistent relative to consistent scenes. Critically, objects on materials triggered N300/N400 responses of similar magnitudes. Our findings show that contextual materials indeed affect object processing—even in the absence of spatial scene structure and object content—suggesting that material is one of the contextual “ingredients” driving scene context effects.
A central regularity of visual perception is the co-occurrence of objects in the natural environment. Here we use machine learning and fMRI to test the hypothesis that object co-occurrence statistics are encoded in the human visual system and elicited by the perception of individual objects. We identified low-dimensional representations that capture the latent statistical structure of object co-occurrence in real-world scenes, and we mapped these statistical representations onto voxel-wise fMRI responses during object viewing. We found that cortical responses to single objects were predicted by the statistical ensembles in which they typically occur, and that this link between objects and their visual contexts was made most strongly in parahippocampal cortex, overlapping with the anterior portion of scene-selective parahippocampal place area. In contrast, a language-based statistical model of the co-occurrence of object names in written text predicted responses in neighboring regions of object-selective visual cortex. Together, these findings show that the sensory coding of objects in the human brain reflects the latent statistics of object context in visual and linguistic experience.
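The statistical-ensemble idea can be illustrated computationally: build an object co-occurrence matrix over scenes and compress it with a truncated SVD to obtain a low-dimensional "context space". The toy scenes below are invented for illustration; the study's actual stimuli and model are far richer:

```python
import numpy as np

# Invented toy scenes; each scene lists the objects it contains.
scenes = [["sink", "toothbrush", "mirror"],
          ["sink", "mirror", "towel"],
          ["car", "road", "sign"],
          ["car", "road", "sign"]]

objects = sorted({o for s in scenes for o in s})
idx = {o: i for i, o in enumerate(objects)}

# Symmetric object co-occurrence counts across scenes.
C = np.zeros((len(objects), len(objects)))
for s in scenes:
    for a in s:
        for b in s:
            if a != b:
                C[idx[a], idx[b]] += 1

# Truncated SVD gives each object a low-dimensional context representation;
# objects from the same statistical ensemble land close together.
U, S, _ = np.linalg.svd(C)
Z = U[:, :2] * S[:2]
```

In this latent space, bathroom objects ("sink", "mirror") end up far closer to one another than to street objects ("car"), mirroring the finding that cortical responses to a single object reflect the ensemble it typically occurs in.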
Children until the age of five are only able to reverse an ambiguous figure when they are informed about the second interpretation. In two experiments, we examined whether children's difficulties would extend to a continuous version of the ambiguous figures task. Children (Experiment 1: 66 3- to 5-year-olds; Experiment 2: 54 4- to 9-year-olds) and adult controls saw line drawings of animals gradually morph, through well-known ambiguous figures, into other animals. Results show a relatively late developing ability to recognize the target animal, with difficulties extending beyond preschool age. This delay can be explained neither by improvements in theory of mind, nor by inhibitory control, nor by individual differences in eye movements. Even the best achieving children only started to approach adult-level performance at the age of 9, suggesting a fundamentally different processing style in children and adults.
Although the perception of faces depends on low-level neuronal processes, it is also affected by high-level social processes. Faces from a social in-group, such as people of a similar age, receive more in-depth processing and are processed holistically. To explore whether own-age biases affect subconscious face perception, we presented participants with the young/old lady ambiguous figure. Mechanical Turk was used to sample participants of varying ages from the USA. Results demonstrated that younger and older participants estimated the age of the image as younger and older, respectively. This own-age effect ties in with socio-cultural practices, which are less inclusive towards the elderly. Participants were not aware the study was related to ageing and the stimulus was shown briefly. The results therefore demonstrate that high-level social group processes have a subconscious effect on the early stages of face processing. A neural feedback model is used to explain this interaction.
The dorsal, parietal visual stream is activated when seeing objects, but the exact nature of parietal object representations is still under discussion. Here we test 2 specific hypotheses. First, parietal cortex is biased to host some representations more than others, with a different bias compared with ventral areas. A prime example would be object action representations. Second, parietal cortex forms a general multiple-demand network with frontal areas, showing similar task effects and representational content compared with frontal areas. To differentiate between these hypotheses, we implemented a human neuroimaging study with a stimulus set that dissociates associated object action from object category while manipulating task context to be either action- or category-related. Representations in parietal as well as prefrontal areas represented task-relevant object properties (action representations in the action task), with no sign of the irrelevant object property (category representations in the action task). In contrast, irrelevant object properties were represented in ventral areas. These findings emphasize that human parietal cortex does not preferentially represent particular object properties irrespective of task, but together with frontal areas is part of a multiple-demand and content-rich cortical network representing task-relevant object properties.
Conference Paper
We present a coherent, discriminative framework for simultaneously tracking multiple people and estimating their collective activities. Instead of treating the two problems separately, our model is grounded in the intuition that a strong correlation exists between a person's motion, their activity, and the motion and activities of other nearby people. Instead of directly linking the solutions to these two problems, we introduce a hierarchy of activity types that creates a natural progression that leads from a specific person's motion to the activity of the group as a whole. Our model is capable of jointly tracking multiple people, recognizing individual activities (atomic activities), the interactions between pairs of people (interaction activities), and finally the behavior of groups of people (collective activities). We also propose an algorithm for solving this otherwise intractable joint inference problem by combining belief propagation with a version of the branch and bound algorithm equipped with integer programming. Experimental results on challenging video datasets demonstrate our theoretical claims and indicate that our model achieves the best collective activity classification results to date.
Conference Paper
DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against datasets derived from Wikipedia and to link other datasets on the Web to Wikipedia data. We describe the extraction of the DBpedia datasets, and how the resulting information is published on the Web for human- and machine-consumption. We describe some emerging applications from the DBpedia community and show how website authors can facilitate DBpedia content within their sites. Finally, we present the current status of interlinking DBpedia with other open datasets on the Web and outline how DBpedia could serve as a nucleus for an emerging Web of open data.
There is general consensus that context can be a rich source of information about an object's identity, location and scale. In fact, the structure of many real-world scenes is governed by strong configurational rules akin to those that apply to a single object. Here we introduce a simple probabilistic framework for modeling the relationship between context and object properties based on the correlation between the statistics of low-level features across the entire scene and the objects that it contains. The resulting scheme serves as an effective procedure for object priming, context driven focus of attention and automatic scale-selection on real-world scenes.
Conference Paper
Traditional approaches to computer vision tasks based on neural networks are typically trained on a large amount of pure image data. By minimizing the cross-entropy loss between a prediction and a given target class, the network and its visual embedding space are automatically learned to fulfill a given task. However, due to the sole dependence on the image data distribution of the training domain, these models tend to fail when applied to a target domain that differs from their source domain. To learn a network that is more robust to domain shifts, we propose the knowledge graph neural network (KG-NN), a neuro-symbolic approach that supervises the learning of the network by image-data-invariant auxiliary knowledge. The auxiliary knowledge is first encoded in a knowledge graph with respective concepts and their relationships and embedded into a dense vector representation via an embedding method. This invariant embedding space is then used to train the neural network on visual object classification tasks. Using a contrastive loss function, the neural network learns to adapt its visual embedding space and thus its weights according to the knowledge graph embedding space. We evaluate KG-NN on the mini-ImageNet dataset and its derivatives, as well as road sign recognition datasets from Germany and China. The results show that a visual model trained with a knowledge graph as a trainer outperforms a model trained with cross-entropy in all experiments, in particular when the domain gap increases. Besides a better performance and a stronger robustness to domain shifts, these networks can simultaneously adapt to multiple datasets and classes without heavily suffering from catastrophic forgetting.
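The contrastive alignment idea can be illustrated with a simplified stand-in for the loss. The cosine-similarity cross-entropy below is not the paper's exact objective; it merely shows how fixed KG class embeddings can supervise a visual embedding space:

```python
import numpy as np

def kg_contrastive_loss(vis, kg, labels, tau=0.1):
    """Simplified KG-supervised contrastive loss: cross-entropy over the
    cosine similarities between visual embeddings (vis) and fixed knowledge
    graph class embeddings (kg). Lower loss means the visual space is
    better aligned with the KG embedding space."""
    vis = vis / np.linalg.norm(vis, axis=1, keepdims=True)
    kg = kg / np.linalg.norm(kg, axis=1, keepdims=True)
    logits = vis @ kg.T / tau                       # (batch, num_classes)
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.mean(np.log(p[np.arange(len(labels)), labels]))
```

Because the KG embeddings stay fixed, minimising this loss pulls each image's visual embedding toward its class's knowledge-derived anchor, which is the mechanism behind the reported robustness to domain shift.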
Conference Paper
Knowledge graph embeddings (KGE) are vector representations that capture the global distributional semantics of each entity instance and relation type in a static Knowledge Graph (KG). While KGEs have the capability to embed information related to an entity into a single representation, they are not customizable to a specific context. This is fundamentally limiting for many applications, since the latent state of an entity can change depending on the current situation and the entity's history of related observations. Such context-specific roles an entity might play cannot be captured in global KGEs, since this requires generating an embedding unique to each situation. This paper proposes a KG modeling template for temporally contextualized observations and introduces the Recurrent Transformer (RETRA), a neural encoder stack with a feedback loop and constrained multi-headed self-attention layers. RETRA enables the transformation of global KGEs into custom embeddings, given the situation-specific factors of the relation and the subjective history of the entity. This way, entity embeddings for downstream Knowledge Graph Tasks (KGT) can be contextualized, like link prediction for location recommendation, event prediction, or driving-scene classification. Our experimental results demonstrate the benefits standard KGEs can gain if they are customized according to the situational context.
Context plays an important role in visual recognition. Recent studies have shown that visual recognition networks can be fooled by placing objects in inconsistent contexts (e.g. a cow in the ocean). To understand and model the role of contextual information in visual recognition, we systematically and quantitatively investigated ten critical properties of where, when, and how context modulates recognition including amount of context, context and object resolution, geometrical structure of context, context congruence, time required to incorporate contextual information, and temporal dynamics of contextual modulation. The tasks involve recognizing a target object surrounded with context in a natural image. As an essential benchmark, we first describe a series of psychophysics experiments, where we alter one aspect of context at a time, and quantify human recognition accuracy. To computationally assess performance on the same tasks, we propose a biologically inspired context aware object recognition model consisting of a two-stream architecture. The model processes visual information at the fovea and periphery in parallel, dynamically incorporates both object and contextual information, and sequentially reasons about the class label for the target object. Across a wide range of behavioral tasks, the model approximates human level performance without retraining for each task, captures the dependence of context enhancement on image properties, and provides initial steps towards integrating scene and object information for visual recognition.
Human scene categorization is characterized by its remarkable speed. While many visual and conceptual features have been linked to this ability, significant correlations exist between feature spaces, impeding our ability to determine their relative contributions to scene categorization. Here, we employed a whitening transformation to decorrelate a variety of visual and conceptual features and assess the time course of their unique contributions to scene categorization. Participants (both sexes) viewed 2,250 full-color scene images drawn from 30 different scene categories while having their brain activity measured through 256-channel EEG. We examined the variance explained at each electrode and time point of visual event-related potential (vERP) data from nine different whitened encoding models. These ranged from low-level features obtained from filter outputs to high-level conceptual features requiring human annotation. The amount of category information in the vERPs was assessed through multivariate decoding methods. Behavioral similarity measures were obtained in separate crowdsourced experiments. We found that all nine models together contributed 78% of the variance of human scene similarity assessments, which was within the noise ceiling of the vERP data. Low-level models explained earlier vERP variability (88 ms post-image onset), while high-level models explained later variance (169 ms). Critically, only high-level models shared vERP variability with behavior. Taken together, these results suggest that scene categorization is primarily a high-level process, but reliant on previously extracted low-level features. Significance Statement: In a single fixation, we glean enough information to describe a general scene category. Many types of features are associated with scene categories, ranging from low-level properties such as colors and contours, to high-level properties such as objects and attributes.
Because these properties are correlated, it is difficult to understand each property's unique contributions to scene categorization. This work uses a whitening transformation to remove the correlations between features and examines the extent to which each feature contributes to visual event-related potentials (vERPs) over time. We found that low-level visual features contributed first but were not correlated with categorization behavior. High-level features followed 80 ms later, providing key insights into how the brain makes sense of a complex visual world.
Conference Paper
It is desirable for detection and classification algorithms to generalize to unfamiliar environments, but suitable benchmarks for quantitatively studying this phenomenon are not yet available. We present a dataset designed to measure recognition generalization to novel environments. The images in our dataset are harvested from twenty camera traps deployed to monitor animal populations. Camera traps are fixed at one location, hence the background changes little across images; capture is triggered automatically, hence there is no human bias. The challenge is learning recognition in a handful of locations, and generalizing animal detection and classification to new locations where no training data is available. In our experiments state-of-the-art algorithms show excellent performance when tested at the same location where they were trained. However, we find that generalization to new locations is poor, especially for classification systems. (The dataset is publicly available.)
Conference Paper
A great deal of research has focused on algorithms for learning features from unlabeled data. Indeed, much progress has been made on benchmark datasets like NORB and CIFAR by employing increasingly complex unsupervised learning algorithms and deep models. In this paper, however, we show that several very simple factors, such as the number of hidden nodes in the model, may be as important to achieving high performance as the choice of learning algorithm or the depth of the model. Specifically, we will apply several off-the-shelf feature learning algorithms (sparse auto-encoders, sparse RBMs, K-means clustering, Gaussian mixtures) to NORB and CIFAR datasets using only single-layer networks. We then present a detailed analysis of the effect of changes in the model setup: the receptive field size, number of hidden nodes (features), the step-size (stride) between extracted features, and the effect of whitening. Our results show that large numbers of hidden nodes and dense feature extraction are as critical to achieving high performance as the choice of algorithm itself; so critical, in fact, that when these parameters are pushed to their limits, we are able to achieve state-of-the-art performance on both CIFAR and NORB using only a single layer of features. More surprisingly, our best performance is based on K-means clustering, which is extremely fast, has no hyper-parameters to tune beyond the model structure itself, and is very easy to implement. Despite the simplicity of our system, we achieve performance beyond all previously published results on the CIFAR-10 and NORB datasets (79.6% and 97.0% accuracy respectively).
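The core recipe (whitening plus K-means "triangle" features) is easy to sketch. The implementation below is a simplified illustration of those two steps, not the authors' code:

```python
import numpy as np

def whiten(X, eps=1e-5):
    """ZCA-whiten a patch matrix X of shape (n_patches, dim)."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    d, E = np.linalg.eigh(cov)
    W = E @ np.diag(1.0 / np.sqrt(d + eps)) @ E.T
    return Xc @ W

def kmeans_features(X, centroids):
    """'Triangle' activation: f_k = max(0, mu - z_k), where z_k is the
    distance from a patch to centroid k and mu is its mean distance.
    Patches far from all but the nearest centroids produce sparse codes."""
    z = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    mu = z.mean(axis=1, keepdims=True)
    return np.maximum(0.0, mu - z)
```

In the full pipeline these per-patch codes would be pooled over image regions and fed to a linear classifier; the point of the paper is that this single layer, with enough centroids, already rivals deep models.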
The central building block of convolutional neural networks (CNNs) is the convolution operator, which enables networks to construct informative features by fusing both spatial and channel-wise information within local receptive fields at each layer. A broad range of prior research has investigated the spatial component of this relationship, seeking to strengthen the representational power of a CNN by enhancing the quality of spatial encodings throughout its feature hierarchy. In this work, we focus instead on the channel relationship and propose a novel architectural unit, which we term the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. We show that these blocks can be stacked together to form SENet architectures that generalise extremely effectively across different datasets. We further demonstrate that SE blocks bring significant improvements in performance for existing state-of-the-art CNNs at slight additional computational cost. Squeeze-and-Excitation Networks formed the foundation of our ILSVRC 2017 classification submission which won first place and reduced the top-5 error to 2.251 percent, surpassing the winning entry of 2016 by a relative improvement of ~25 percent. Models and code are publicly available.
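The SE block is compact enough to sketch directly. A minimal numpy version follows; weights and shapes are illustrative, and the published block's reduction ratio is realised here implicitly by the shapes of w1 and w2:

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a feature map x of shape (C, H, W).
    w1: (C//r, C) and w2: (C, C//r) are the two FC layers of the
    excitation MLP, with reduction ratio r."""
    s = x.mean(axis=(1, 2))                    # squeeze: global average pool -> (C,)
    e = np.maximum(0.0, w1 @ s)                # excitation: reduction + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ e)))     # sigmoid channel gates in (0, 1)
    return x * gate[:, None, None]             # recalibrate each channel
```

Because the gates lie strictly between 0 and 1, the block can only attenuate channels relative to their input response, letting the network learn which channels matter for the current input.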
We introduce the variational graph auto-encoder (VGAE), a framework for unsupervised learning on graph-structured data based on the variational auto-encoder (VAE). This model makes use of latent variables and is capable of learning interpretable latent representations for undirected graphs. We demonstrate this model using a graph convolutional network (GCN) encoder and a simple inner product decoder. Our model achieves competitive results on a link prediction task in citation networks. In contrast to most existing models for unsupervised learning on graph-structured data and link prediction, our model can naturally incorporate node features, which significantly improves predictive performance on a number of benchmark datasets.
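The model is simple enough to sketch. Below is a minimal numpy version using one linear graph-convolution layer for each of the mean and log-variance (the paper's encoder uses a two-layer GCN; the weights and graph here are illustrative):

```python
import numpy as np

def gcn_layer(A, X, W):
    """One linear graph-convolution layer with symmetric normalisation."""
    A_hat = A + np.eye(len(A))                       # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return A_norm @ X @ W

def vgae_forward(A, X, W_mu, W_logvar, rng):
    """Encode nodes to a Gaussian posterior, sample latents Z with the
    reparameterisation trick, and decode link probabilities with the
    inner-product decoder sigmoid(Z Z^T)."""
    mu = gcn_layer(A, X, W_mu)
    logvar = gcn_layer(A, X, W_logvar)
    Z = mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)
    return 1.0 / (1.0 + np.exp(-(Z @ Z.T)))          # edge probability matrix
```

Training would maximise the variational lower bound (reconstruction of observed edges plus a KL term on the posterior); link prediction then reads candidate edges straight off the decoded probability matrix.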
In this article, I discuss some of the latest functional neuroimaging findings on the organization of object concepts in the human brain. I argue that these data provide strong support for viewing concepts as the products of highly interactive neural circuits grounded in the action, perception, and emotion systems. The nodes of these circuits are defined by regions representing specific object properties (e.g., form, color, and motion) and thus are property-specific, rather than strictly modality-specific. How these circuits are modified by external and internal environmental demands, the distinction between representational content and format, and the grounding of abstract social concepts are also discussed.
Argues that, although much recent research has emphasized the equivalence between imagery and perception, there are critical differences between these activities. Perception, initiated by an external stimulus, is to a large extent concerned with the interpretation of that stimulus; in contrast, images are created as symbols of something and hence need no interpretive process. Without a construal process, images do not allow reconstrual. In support of this argument, a series of 4 experiments with 65 university students was conducted to test whether Ss could reverse an ambiguous figure (e.g., duck/rabbit) in mental imagery. The S population contained many with vivid imagery, as assessed by a visual elaboration scale and the Vividness of Visual Imagery Questionnaire. In all 4 experiments, Ss were unable to reverse a mental image, but all Ss were able, immediately after this failure, to draw a picture from their mental image and then reconstrue the figure in their own drawing. This failure to reverse images occurs despite hints to the S, some coaching, and a moderate amount of training in figural reversal. Findings emphasize the difference between images and percepts.
We seek to both detect and segment objects in images. To exploit both local image data as well as contextual information, we introduce Boosted Random Fields (BRFs), which uses Boosting to learn the graph structure and local evidence of a conditional random field (CRF). The graph structure is learned by assembling graph fragments in an additive model. The connections between individual pixels are not very informative, but by using dense graphs, we can pool information from large regions of the image; dense models also support efficient inference. We show how contextual information from other objects can improve detection performance, both in terms of accuracy and speed, by using a computational cascade. We apply our system to detect stuff and things in office and street scenes.
The perceptual recognition of objects is conceptualized to be a process in which the image of the input is segmented at regions of deep concavity into an arrangement of simple geometric components. The fundamental assumption of the proposed theory, recognition-by-components (RBC), is that a modest set of generalized-cone components, called geons, can be derived from contrasts of five readily detectable properties of edges in a two-dimensional image. The detection of these properties is generally invariant over viewing position and image quality and consequently allows robust object perception when the image is projected from a novel viewpoint or is degraded. RBC thus provides a principled account of the heretofore undecided relation between the classic principles of perceptual organization and pattern recognition. The results from experiments on the perception of briefly presented pictures by human observers provide empirical support for the theory.
To study the influence of motivational expectancy on perception, the ambiguous drawing of a duck/rabbit was shown to 265 subjects on Easter and to 276 subjects in October. The ambiguous drawing, though perceived as a bird by a majority of subjects in October, was most frequently named a bunny on Easter. This biasing effect of expectancy upon perception was observed for young children (2 to 10 years) as well as for older subjects (11 to 93 years).
Despite tremendous variation in the appearance of visual objects, primates can recognize a multitude of objects, each in a fraction of a second, with no apparent effort. However, the brain mechanisms that enable this fundamental ability are not understood. Drawing on ideas from neurophysiology and computation, we present a graphical perspective on the key computational challenges of object recognition, and argue that the format of neuronal population representation and a property that we term 'object tangling' are central. We use this perspective to show that the primate ventral visual processing stream achieves a particularly effective solution in which single-neuron invariance is not the goal. Finally, we speculate on the key neuronal mechanisms that could enable this solution, which, if understood, would have far-reaching implications for cognitive neuroscience.
Conference Paper
Many computer vision algorithms limit their performance by ignoring the underlying 3D geometric structure in the image. We show that we can estimate the coarse geometric properties of a scene by learning appearance-based models of geometric classes, even in cluttered natural scenes. Geometric classes describe the 3D orientation of an image region with respect to the camera. We provide a multiple-hypothesis framework for robustly estimating scene structure from a single image and obtaining confidences for each geometric label. These confidences can then be used to improve the performance of many other applications. We provide a thorough quantitative evaluation of our algorithm on a set of outdoor images and demonstrate its usefulness in two applications: object detection and automatic single-view reconstruction.
The role of context in object recognition. Trends in Cognitive Sciences (2007)
Battaglia, P.W., Pascanu, R., Lai, M., Rezende, D.J., Kavukcuoglu, K.: Interaction networks for learning about objects, relations and physics. In: Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems (2016)
Brendel, W., Bethge, M.: Approximating CNNs with bag-of-local-features models works surprisingly well on ImageNet. In: 7th International Conference on Learning Representations, ICLR (2019)
Gao, P., Lu, J., Li, H., Mottaghi, R., Kembhavi, A.: Container: Context aggregation networks. In: Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems (2021)
Hu, J., Shen, L., Albanie, S., Sun, G., Vedaldi, A.: Gather-excite: Exploiting feature context in convolutional neural networks. In: Advances in Neural Information Processing Systems: Annual Conference on Neural Information Processing Systems (2018)
Khosla, P., Teterwak, P., Wang, C., Sarna, A., Tian, Y., Isola, P., Maschinot, A., Liu, C., Krishnan, D.: Supervised contrastive learning. In: Advances in Neural Information Processing Systems: Annual Conference on Neural Information Processing Systems (2020)
Ning, Z., Qiao, Z., Dong, H., Du, Y., Zhou, Y.: LightCAKE: A lightweight framework for context-aware knowledge graph embedding. In: Advances in Knowledge Discovery and Data Mining - 25th Pacific-Asia Conference, PAKDD 2021, Virtual Event, May 11-14, 2021, Proceedings, Part III (2021)
Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., Bengio, Y.: Graph attention networks. In: International Conference on Learning Representations, ICLR (2018)
Wang, H., Ge, S., Lipton, Z., Xing, E.P.: Learning robust global representations by penalizing local predictive power. In: Advances in Neural Information Processing Systems, NeurIPS (2019)
Wang, H., Kulkarni, V., Wang, W.Y.: Dolores: Deep contextualized knowledge graph embeddings. In: Conference on Automated Knowledge Base Construction, AKBC 2020, Virtual, June 22-24, 2020 (2020)
Zablocki, E., Bordes, P., Soulier, L., Piwowarski, B., Gallinari, P.: Context-aware zero-shot learning for object recognition. In: Proceedings of the 36th International Conference on Machine Learning, ICML (2019)