Article

Poincaré Embeddings for Learning Hierarchical Representations

Authors: Maximilian Nickel, Douwe Kiela

Abstract

Representation learning has become an invaluable approach for learning from symbolic data such as text and graphs. However, while complex symbolic datasets often exhibit a latent hierarchical structure, state-of-the-art methods typically learn embeddings in Euclidean vector spaces, which do not account for this property. For this purpose, we introduce a new approach for learning hierarchical representations of symbolic data by embedding them into hyperbolic space -- or more precisely into an n-dimensional Poincaré ball. Due to the underlying hyperbolic geometry, this allows us to learn parsimonious representations of symbolic data by simultaneously capturing hierarchy and similarity. We introduce an efficient algorithm to learn the embeddings based on Riemannian optimization and show experimentally that Poincaré embeddings outperform Euclidean embeddings significantly on data with latent hierarchies, both in terms of representation capacity and in terms of generalization ability.


... Graph Neural Networks (GNNs) have garnered considerable interest recently [1], [2], [3], [4]. The majority of GNNs learn node representations in Euclidean space, since it is intuition-friendly and has a number of computationally advantageous qualities [5], [6], [7]. Despite the effectiveness of Euclidean models in graph embedding, the ability of Euclidean space to represent complex patterns is fundamentally constrained by its polynomially expanding capacity. ...
... Hyperbolic geometry has also received increasing interest in the machine learning and network science communities given its attractive properties. It has been applied to neural networks for problems in computer vision [24], natural language processing [5], [6], [25], [26], recommender systems [27], [28], [29], [30], [31], and graph embedding tasks [25], [32], [9], [10], [33], [34]. In the graph embedding field, recent works, including HGNN [10], HGCN [9], and HGAT [32], generalize the hyperbolic graph operations on the tangent space. ...
... There exist multiple models of hyperbolic space which show different characteristics but are mathematically equivalent. We here mainly consider two widely studied models: the Poincaré ball model [5] and the Lorentz model (also known as the hyperboloid model) [6]. ...
Preprint
Hyperbolic space is emerging as a promising space for representation learning, owing to its exponentially growing volume. Compared with flat Euclidean space, curved hyperbolic space is far more spacious and embeddable, particularly for datasets with implicit tree-like structure, such as hierarchies and power-law distributions. On the other hand, the structure of a real-world network is usually intricate, with some regions being tree-like, some being flat, and others being circular. Directly embedding heterogeneous structural networks into a homogeneous embedding space unavoidably brings inductive biases and distortions. Encouragingly, discrete curvature can describe the local structure of a node and its surroundings well, which motivates us to explicitly investigate the information conveyed by the network topology to improve geometric learning. To this end, we explore the properties of the local discrete curvature of graph topology and the continuous global curvature of the embedding space. Building on this, a Hyperbolic Curvature-aware Graph Neural Network (HCGNN) is further proposed. In particular, HCGNN utilizes the discrete curvature to guide message passing over the surroundings while adaptively adjusting the continuous curvature. Extensive experiments on node classification and link prediction tasks show that the proposed method outperforms various competitive models by a large margin on graph data of both high and low hyperbolicity. Case studies further illustrate the efficacy of discrete curvature in finding local clusters and alleviating the distortion caused by hyperbolic geometry.
... Hyperbolic space can informally be thought of as the continuous analog of a tree, and so the exponential expansion of hyperbolic spaces allows them to capture hierarchical structure with only a few degrees of freedom. This has spurred a variety of techniques for embedding taxonomies, networks, and continuous datasets in these spaces [14,1,5]. For example, [22] used hyperbolic embeddings to show that volatile metabolites from plants and animals conform to a low-dimensional hyperbolic geometry. It has also been shown that real-world networks such as the internet possess a latent hyperbolic geometry that allows for efficient communication [1], and [10] has proposed a general framework for understanding how scale-free network topologies arise from networks being embedded in hyperbolic spaces. ...
... It has also been shown that real-world networks such as the internet possess a latent hyperbolic geometry that allows for efficient communication [1], and [10] has proposed a general framework for understanding how scale-free network topologies arise from networks being embedded in hyperbolic spaces. Hierarchical structures are typically understood in the form of graphs, so previous representation learning studies in hyperbolic space have focused on embedding explicit networks or taxonomies [14,15,5,3], where links between nodes determine their tree-like graph [20]. Data typically instead have continuous relationships, more akin to a distance or similarity than a binary connection. ...
Preprint
Full-text available
Recent studies have increasingly demonstrated that hyperbolic geometry confers many advantages for analyzing hierarchical structure in complex systems. However, available embedding methods do not give a precise metric for determining the dimensionality of the data, and do not vary curvature. These parameters are important for obtaining accurate, low dimensional, continuous descriptions of the data. To address this we develop a Bayesian formulation of Multi-Dimensional Scaling for embedding data in hyperbolic spaces that can fit for the optimal values of geometric parameters such as curvature and dimension. We propose a novel model of embedding uncertainty within this Bayesian framework which improves both performance and interpretability of the model. Because the method allows for variable curvature, it can also correctly embed Euclidean data using zero curvature, thus subsuming traditional Euclidean MDS models. We demonstrate that only a small amount of data is needed to constrain the geometry in our model and that the model is robust against false minima when scaling to large datasets. We apply our model to real world datasets and uncover new insights into their hierarchical structure derived from our geometric embeddings.
... However, the norm of the representation can also be used to encode useful representational structure. In hyperbolic space, the magnitude of a vector often serves to model hypernymy within the hierarchical structure [50,58,64]. When projecting the representations to the hyperbolic space, the norm information is preserved and used to determine the Riemannian distance, which eventually affects the loss. ...
... However, modeling scenes requires a much larger volume due to the exponential number of possible compositions of objects. Another way to think about the object-centric hierarchy is through generality and specificity, as often discussed in the language literature [47,50]. An object concept is general when standing alone in the visual world, and becomes specific when a certain context is given. ...
... Among these isometric models, there are five common ones that previous studies often work with [6]. In this paper, we choose the Poincaré ball $\mathbb{D}^n := \{p \in \mathbb{R}^n \mid \|p\|^2 < r^2\}$ as our basic model [23,50,64], where $r > 0$ is the radius of the ball. The Poincaré ball is coupled with a Riemannian metric $g^{\mathbb{D}}(p) = \frac{4}{(1 - \|p\|^2/r^2)^2}\, g^{E}$, where $p \in \mathbb{D}^n$ and $g^{E}$ is the canonical metric of the Euclidean space. ...
Preprint
Full-text available
Although self-/un-supervised methods have led to rapid progress in visual representation learning, these methods generally treat objects and scenes using the same lens. In this paper, we focus on learning representations for objects and scenes that preserve the structure among them. Motivated by the observation that visually similar objects are close in the representation space, we argue that the scenes and objects should instead follow a hierarchical structure based on their compositionality. To exploit such a structure, we propose a contrastive learning framework where a Euclidean loss is used to learn object representations and a hyperbolic loss is used to encourage representations of scenes to lie close to representations of their constituent objects in a hyperbolic space. This novel hyperbolic objective encourages the scene-object hypernymy among the representations by optimizing the magnitude of their norms. We show that when pretraining on the COCO and OpenImages datasets, the hyperbolic loss improves downstream performance of several baselines across multiple datasets and tasks, including image classification, object detection, and semantic segmentation. We also show that the properties of the learned representations allow us to solve various vision tasks that involve the interaction between scenes and objects in a zero-shot fashion. Our code can be found at \url{https://github.com/shlokk/HCL/tree/main/HCL}.
... In addition to mapping words and hierarchical topics into a shared embedding space, SawETM has also developed a unique Sawtooth Connection module to capture the dependencies between the topics at different layers, which, in turn, empowers it to support a deep network structure. While achieving promising results, both ETM and SawETM hold the Euclidean embedding space assumption, leading to a fundamental limitation: their ability to model complex patterns (such as social networks, knowledge graphs, and taxonomies) is inherently bounded by the dimensionality of the embedding space [29,30]. As a consequence, the underlying semantic hierarchy among the words and topics can hardly be expressed adequately in a relatively low-dimensional embedding space, as illustrated on the left side of Figure 1. ...
... Mathematically, there exist multiple equivalent models for hyperbolic space with different definitions and metrics. Here, we consider two representative ones in light of optimization simplicity and stability: the Poincaré ball model [29] and the Lorentz model [40]. ...
... Sampling strategy. Inspired by the homophily property (i.e., similar actors tend to associate with each other) in many graph networks [29], we take the one-hop neighbors of each anchor, i.e., its parent node and its child nodes, as positive samples to maintain the hierarchical semantic information. For the negative samples, we select m embeddings from the non-first-order neighbors that have the highest similarity scores with the anchor embedding, as sketched below. ...
Preprint
Full-text available
Embedded topic models are able to learn interpretable topics even with large and heavy-tailed vocabularies. However, they generally hold the Euclidean embedding space assumption, leading to a basic limitation in capturing hierarchical relations. To this end, we present a novel framework that introduces hyperbolic embeddings to represent words and topics. With the tree-likeness property of hyperbolic space, the underlying semantic hierarchy among words and topics can be better exploited to mine more interpretable topics. Furthermore, due to the superiority of hyperbolic geometry in representing hierarchical data, tree-structure knowledge can also be naturally injected to guide the learning of a topic hierarchy. Therefore, we further develop a regularization term based on the idea of contrastive learning to inject prior structural knowledge efficiently. Experiments on both topic taxonomy discovery and document representation demonstrate that the proposed framework achieves improved performance against existing embedded topic models.
... Most models created using word and sentence embeddings are based on the Euclidean space. Though this vector space is commonly used, it poses significant limitations when representing complex structures (Nickel and Kiela, 2017). Using the hyperbolic space provides a plausible solution for such instances. ...
... It is advantageous for embedding trees, as the circumference of a circle grows exponentially with the radius. The usage of hyperbolic embedding is still a novel research area, as it was only introduced recently through the work of Nickel and Kiela (2017); Chamberlain et al. (2017); Sala et al. (2018). The work of Lu et al. (2020) highlights the importance of using the hyperbolic space to improve the quality of embeddings in a practical context within the medical domain. ...
... The work of Nickel and Kiela (2017) introduces and explores the potential of hyperbolic embedding by using an n-dimensional Poincaré ball. The work compares hyperbolic and Euclidean embeddings for a complex latent data structure and concludes that hyperbolic embedding surpasses Euclidean embedding in effectiveness. ...
Preprint
Full-text available
In the process of numerically modeling natural languages, developing language embeddings is a vital step. However, it is challenging to develop functional embeddings for resource-poor languages such as Sinhala, for which sufficiently large corpora, effective language parsers, and other required resources are difficult to find. In such conditions, exploiting existing models to arrive at an efficacious embedding methodology for numerically representing text can be quite fruitful. This paper explores the effectiveness of several one-tiered and two-tiered embedding architectures in representing Sinhala text in the sentiment analysis domain. With our findings, the two-tiered embedding architecture, where the lower tier consists of a word embedding and the upper tier consists of a sentence embedding, has been proven to perform better than one-tier word embeddings, achieving a maximum F1 score of 88.04% in contrast to the 83.76% achieved by word embedding models. Furthermore, embeddings in the hyperbolic space are also developed and compared with Euclidean embeddings in terms of performance. A sentiment dataset consisting of Facebook posts and associated reactions has been used for this research. To effectively compare the performance of different embedding systems, the same deep neural network structure has been trained on sentiment data with each of the embedding systems used to encode the associated text.
... One can thus embed finite trees into $\mathbb{H}^n$ with arbitrarily small distortion (Sarkar, 2011). This motivates the study of representation learning of hierarchical data in hyperbolic space (Nickel and Kiela, 2017) and, moreover, the design of deep neural networks in hyperbolic spaces, with applications in various domains where hierarchical data is abundant, such as NLP (Zhu et al., 2020; López et al., 2019; López and Strube, 2020) and recommendation systems (Chamberlain et al., 2019). ...
... However, this exponential growth property comes at the price of numerical instability, such that training hyperbolic learning models will sometimes lead to catastrophic NaN problems upon encountering unrepresentable values in floating point arithmetic. Sala et al. (2018) proved that in order to represent points in hyperbolic space through the popular Poincaré model (Nickel and Kiela, 2017), one needs a large number of bits to avoid undesirable rounding errors when dealing with small numbers. The Lorentz model, a popular alternative for representing the hyperbolic space (Nickel and Kiela, 2018; Law et al., 2019), suffers the opposite numerical issue in dealing with large numbers. ...
... Together with the optimization superiority of the Lorentz model we mention in the next section, this may account for some empirical observations that the Lorentz model is more stable than the Poincaré ball. We finally comment that many works either explicitly or implicitly impose different thresholds in their implementations to restrict all points within a certain radius ((Skopek et al., 2019), (Nickel and Kiela, 2017), and the popular manifold research toolbox package by (Kochurov et al., 2020)), with limited discussion of the impact of the choice of these thresholds on representation capacity and other aspects of performance. The discussion in this section fills that gap and provides a guide for the choice of thresholds. ...
Preprint
Given the exponential growth of the volume of a ball w.r.t. its radius, hyperbolic space is capable of embedding trees with arbitrarily small distortion and hence has received wide attention for representing hierarchical datasets. However, this exponential growth property comes at the price of numerical instability, such that training hyperbolic learning models will sometimes lead to catastrophic NaN problems upon encountering unrepresentable values in floating point arithmetic. In this work, we carefully analyze the limitations of two popular models for the hyperbolic space, namely, the Poincaré ball and the Lorentz model. We first show that, under the 64-bit arithmetic system, the Poincaré ball has a relatively larger capacity than the Lorentz model for correctly representing points. Then, we theoretically validate the superiority of the Lorentz model over the Poincaré ball from the perspective of optimization. Given the numerical limitations of both models, we identify one Euclidean parametrization of the hyperbolic space which can alleviate these limitations. We further extend this Euclidean parametrization to hyperbolic hyperplanes and exhibit its ability to improve the performance of hyperbolic SVMs.
... For taxonomy embedding, graph embedding methods like spectral embedding (Shi and Malik 2000) are popular among researchers. Recently, Nickel and Kiela (2017) extended taxonomy embedding from Euclidean space to hyperbolic space, which naturally reflects hierarchy through the space's negative curvature. ...
... We implement the measure to evaluate the distance between category nodes in a taxonomy. 2. Poincaré Embedding (poincare) (Nickel and Kiela 2017) is a representation learning method which embeds hierarchical symbolic data into hyperbolic space to capture hierarchy and similarity. We make use of this method to learn the representations of the nodes in a taxonomy. ...
Article
Full-text available
With the emergence of webpage services, huge amounts of customer transaction data flood cyberspace, and they are becoming more and more useful for profiling users and making recommendations. Since web user transaction data are usually multi-modal, heterogeneous, and large-scale, traditional data analysis methods meet new challenges. One of these challenges is the definition of a distance between two transaction records or two web users. The distance definition plays an important role in further analysis, such as cluster analysis or k-nearest neighbor queries. We introduce a category tree distance in this paper, which makes use of product taxonomy information to convert user transaction data to vectors. The similarity between web users can then be evaluated via the vectors derived from their transaction data. Properties of the distance, such as upper and lower bounds, and a complexity analysis are also given in the paper. To investigate the performance of the proposal, we conduct experiments on real web user transaction data. The results show that the proposed distance outperforms the other distances on user transaction analysis.
... The embedding method [13, 23-25] is another representative approximate shortest-path-distance method. In the data preprocessing stage, this method learns a vector embedding of each node through embedding techniques [26-29] so as to preserve the shortest path distance; that is, each node is embedded into a $d$-dimensional mapping space, such as Euclidean space [30] or hyperbolic space [31], in which the shortest path distance between nodes is calculated. Therefore, each node has a corresponding $d$-dimensional embedding vector. ...
... Researchers at the University of Passau proposed a new method [13] for approximating the shortest path distance between two nodes in a social graph based on a landmark approach; they used simple neural networks with node2vec [27] or Poincaré [28] embeddings and obtained better results than Orion and Rigel on a social graph dataset. For convenience, we name this method node2vec-Sg. ...
Article
Full-text available
The ability to quickly calculate or query the shortest path distance between nodes in a road network is essential for many real-world applications. However, traditional graph-traversal shortest path algorithms, such as Dijkstra and Floyd–Warshall, cannot be extended to large-scale road networks, or traverse such networks very slowly, which is computationally and memory intensive. Therefore, researchers have developed many approximate methods, such as the landmark method and the embedding method, to speed up graph processing and shortest path queries. This study proposes a new method based on landmarks and embedding technology, using a multilayer neural network model to solve this problem. On the one hand, we generate a distance-preserving embedding for each node; on the other hand, we predict the shortest path distance between two nodes given their embeddings. Our approach significantly reduces training time costs and is able to approximate the real distance with a relatively low Mean Absolute Error (MAE). The experimental results on a real road network confirm these advantages.
... We tackle this problem from a representation learning perspective by modeling taxonomy data as embeddings that capture the associated hierarchical structure. Inspired by recent advances in word embeddings (Nickel & Kiela, 2017; 2018; Mathieu et al., 2019), we propose to leverage the hyperbolic manifold (Ratcliffe, 2019) to learn such embeddings. An important property of the hyperbolic manifold is that distances grow exponentially when moving away from the origin, and shortest paths between distant points tend to pass through it, resembling a continuous hierarchical structure. ...
... In other words, distances in H d grow exponentially when moving away from the origin, and shortest paths between distant points on the manifold tend to pass through the origin, resembling a continuous hierarchical structure. Because of this, the hyperbolic manifold is often exploited to embed hierarchical data such as trees or graphs (Nickel & Kiela, 2017;Chami et al., 2020). Although its potential to embed discrete data structures into a continuous space is well known in the machine learning community, its application in robotics is presently scarce. ...
Preprint
Robotic taxonomies have appeared as high-level hierarchical abstractions that classify how humans move and interact with their environment. They have proven useful for analysing grasps, manipulation skills, and whole-body support poses. Despite the efforts devoted to designing their hierarchy and underlying categories, their use in application fields remains scarce. This may be attributed to the lack of computational models that fill the gap between the discrete hierarchical structure of the taxonomy and the high-dimensional heterogeneous data associated with its categories. To overcome this problem, we propose to model taxonomy data via hyperbolic embeddings that capture the associated hierarchical structure. To do so, we formulate a Gaussian process hyperbolic latent variable model and enforce the taxonomy structure through graph-based priors on the latent space and distance-preserving back constraints. We test our model on the whole-body support pose taxonomy to learn hyperbolic embeddings that comply with the original graph structure. We show that our model properly encodes unseen poses from existing or new taxonomy categories, that it can be used to generate trajectories between the embeddings, and that it outperforms its Euclidean counterparts.
... Non-Euclidean Riemannian spaces have recently gained extensive attention in representation learning for non-Euclidean data. [36] was the first work to learn hierarchical embeddings in hyperbolic space for link prediction. Following this work, [37] applied hyperbolic embeddings to word embedding. ...
... The objective function is to preserve all pairwise graph distances. Motivated by the fact that most graphs are only partially observable, we minimize an alternative loss function [21,36] that preserves local graph distances, given by: ...
... [52,64,72,80,90]) where data belonging to a fixed label is conceived of as being on a common manifold, as well as both generative and encoding tasks, (see e.g. [12,14,20,30,49,66,67,75,78] and [19,22,24,46]) where the manifold hypothesis is used as an "existence proof" of a low-dimensional parameterization of the data of interest. In the context of inverse problems, the manifold hypothesis can be interpreted as a statement that forward operators map low-dimensional space to the high-dimensional space of all possible measurements [1,2,3,5,44,48,50,65,79,88]. ...
... In a generation problem, the goal is to approximate a probability distribution $\nu$ over some subset $\mathcal{X}$ of $\mathbb{R}^m$ given samples $X$ from $\nu$. This can be solved by fixing a base distribution $q$ over some simpler subset $\mathcal{Z}$ of $\mathbb{R}^n$ and using a neural network to learn $f : \mathbb{R}^n \to \mathbb{R}^m$ so that $f_{\#} q \approx \nu$ [12,14,20,30,49,66,67,75,78]. This leads to the question: how should we choose $\mathcal{Z}$ to allow for maximal flexibility of $\mathcal{X}$? ...
Preprint
Full-text available
How can we design neural networks that allow for stable universal approximation of maps between topologically interesting manifolds? The answer is with a coordinate projection. Neural networks based on topological data analysis (TDA) use tools such as persistent homology to learn topological signatures of data and stabilize training but may not be universal approximators or have stable inverses. Other architectures universally approximate data distributions on submanifolds but only when the latter are given by a single chart, making them unable to learn maps that change topology. By exploiting the topological parallels between locally bilipschitz maps, covering spaces, and local homeomorphisms, and by using universal approximation arguments from machine learning, we find that a novel network of the form $\mathcal{T} \circ p \circ \mathcal{E}$, where $\mathcal{E}$ is an injective network, $p$ a fixed coordinate projection, and $\mathcal{T}$ a bijective network, is a universal approximator of local diffeomorphisms between compact smooth submanifolds embedded in $\mathbb{R}^n$. We emphasize the case when the target map changes topology. Further, we find that by constraining the projection $p$, multivalued inversions of our networks can be computed without sacrificing universality. As an application, we show that learning a group invariant function with unknown group action naturally reduces to the question of learning local diffeomorphisms for finite groups. Our theory permits us to recover orbits of the group action. We also outline possible extensions of our architecture to address molecular imaging of molecules with symmetries. Finally, our analysis informs the choice of topologically expressive starting spaces in generative problems.
... The Poincaré Ball Model. Several models describe hyperbolic space in mathematical language; among them, the Poincaré ball model is popular for graph representation (Nickel and Kiela, 2017; Chami et al., 2020) due to its relatively convenient computations. ...
... This issue has been observed previously (Nickel and Kiela, 2017;Chami et al., 2020), though their comparisons are established between hyperbolic space and Euclidean space. The representation capacity gap between geometric spaces is distinctly revealed in low dimensions. ...
Preprint
Full-text available
The choice of geometric space for knowledge graph (KG) embeddings can have significant effects on the performance of KG completion tasks. The hyperbolic geometry has been shown to capture the hierarchical patterns due to its tree-like metrics, which addressed the limitations of the Euclidean embedding models. Recent explorations of the complex hyperbolic geometry further improved the hyperbolic embeddings for capturing a variety of hierarchical structures. However, the performance of the hyperbolic KG embedding models for non-transitive relations is still unpromising, while the complex hyperbolic embeddings do not deal with multi-relations. This paper aims to utilize the representation capacity of the complex hyperbolic geometry in multi-relational KG embeddings. To apply the geometric transformations which account for different relations and the attention mechanism in the complex hyperbolic space, we propose to use the fast Fourier transform (FFT) as the conversion between the real and complex hyperbolic space. Constructing the attention-based transformations in the complex space is very challenging, while the proposed Fourier transform-based complex hyperbolic approaches provide a simple and effective solution. Experimental results show that our methods outperform the baselines, including the Euclidean and the real hyperbolic embedding models.
... Nodes in KGs represent real-world entities (e.g., names, events, and products) and edges represent the relations between them. In order to effectively capture the hierarchical structures in KGs, ATTH [3] was proposed to embed KGs into hyperbolic space with trainable curvatures, where richer transformations can be used to separate nodes than in Euclidean space [76], while capturing logical patterns simultaneously. ...
... Indeed, different representation spaces have their unique structures and properties, as we show in Section 2. However, in addition to the fundamental mathematical spaces introduced in Section 2, there are more spaces that provide better properties for KGE. For example, in hyperbolic space, the volume of a ball and the circumference of a circle grow exponentially with the radius, which provides more room for the embedding task [3,17,76]. Moreover, in a Lie group, embedding vectors will never diverge unlimitedly, and therefore regularisation of embedding vectors is no longer required for effective learning [26]. ...
Preprint
Knowledge graph embedding (KGE) is an increasingly popular technique that aims to represent the entities and relations of knowledge graphs in low-dimensional semantic spaces for a wide spectrum of applications such as link prediction, knowledge reasoning, and knowledge completion. In this paper, we provide a systematic review of existing KGE techniques based on their representation spaces. In particular, we build a fine-grained classification that categorises the models according to three mathematical perspectives on the representation spaces: (1) the algebraic perspective, (2) the geometric perspective, and (3) the analytical perspective. We introduce rigorous definitions of the fundamental mathematical spaces before diving into KGE models and their mathematical properties. We further discuss different KGE methods over the three categories, and summarise how spatial advantages serve different embedding needs. By collating the experimental results from downstream tasks, we also explore the advantages of each mathematical space in different scenarios and the reasons behind them. Finally, we state some promising research directions from a representation space perspective, with which we hope to inspire researchers to design their KGE models, as well as related applications, with more consideration of the properties of their mathematical spaces.
... Poincaré embeddings (Nickel and Kiela, 2017) is a graph embedding technique which embeds the nodes of a complex network into a Poincaré ball of arbitrary dimensionality. One of its immediate applications was the embedding of the noun hierarchies of WordNet. ...
... In (Nickel and Kiela, 2017) the authors tested Poincaré embeddings on three different datasets (the WordNet noun hierarchy, a social network embedding task, and a lexical entailment dataset). On the WordNet dataset they use average rank (Rank) and mean average precision (MAP) as evaluation metrics. ...
Preprint
Full-text available
A new development in NLP is the construction of hyperbolic word embeddings. As opposed to their Euclidean counterparts, hyperbolic embeddings are represented not by vectors, but by points in hyperbolic space. This makes the most common basic scheme for constructing document representations, namely the averaging of word vectors, meaningless in the hyperbolic setting. We reinterpret the vector mean as the centroid of the points represented by the vectors, and investigate various hyperbolic centroid schemes and their effectiveness at text classification.
... This is due to the fact that the volume of Euclidean space grows only as a power of its radius rather than exponentially, limiting the capacity to represent tree-like data with an exponential number of leaves. This unique characteristic has inspired many researchers to represent hierarchical relations in many domains, from natural language processing [9], [10] to computer vision [11], [12]. However, the use of such principles for point clouds and 3D data is still unexplored. ...
... This inspired several works which investigated how various frameworks of representation learning can be reformulated in non-Euclidean manifolds. In particular, [9], [13], and [10] were some of the first works to explore hyperbolic representation learning by introducing Riemannian adaptive optimization, Poincaré embeddings, and hyperbolic neural networks for natural language processing. The new mathematical formalism introduced by Ganea et al. [10] was decisive in demonstrating the effectiveness of hyperbolic variants of neural network layers compared to their Euclidean counterparts. ...
Preprint
Full-text available
Point clouds of 3D objects exhibit an inherent compositional nature where simple parts can be assembled into progressively more complex shapes to form whole objects. Explicitly capturing such part-whole hierarchy is a long-sought objective for building effective models, but its tree-like nature has made the task elusive. In this paper, we propose to embed the features of a point cloud classifier into the hyperbolic space and explicitly regularize the space to account for the part-whole hierarchy. The hyperbolic space is the only space that can successfully embed the tree-like nature of the hierarchy. This leads to substantial improvements in the performance of state-of-the-art supervised models for point cloud classification.
... Further, each Riemannian manifold $\mathcal{M}$ is associated with a Riemannian metric that defines the geodesic distance between two points on the manifold and the curvature of the space. In spherical space, the curvature is positive, suitable for capturing cyclical structures [31], while in hyperbolic space, the curvature is negative, suitable for capturing hierarchical structures [20]. Widely used models of hyperbolic space include the Poincaré ball model [7], the Lorentz model [3], and the Klein model [3]. ...
... We calculate the hyperbolic geodesic distance [20] between two points on the manifold as follows: ...
Preprint
Full-text available
Two-view knowledge graphs (KGs) jointly represent two components: an ontology view for abstract and commonsense concepts, and an instance view for specific entities that are instantiated from ontological concepts. As such, these KGs contain heterogeneous structures that are hierarchical, from the ontology-view, and cyclical, from the instance-view. Despite these various structures in KGs, most recent works on embedding KGs assume that the entire KG belongs to only one of the two views but not both simultaneously. For works that seek to put both views of the KG together, the instance and ontology views are assumed to belong to the same geometric space, such as all nodes embedded in the same Euclidean space or non-Euclidean product space, an assumption no longer reasonable for two-view KGs where different portions of the graph exhibit different structures. To address this issue, we define and construct a dual-geometric space embedding model (DGS) that models two-view KGs using a complex non-Euclidean geometric space, by embedding different portions of the KG in different geometric spaces. DGS utilizes the spherical space, hyperbolic space, and their intersecting space in a unified framework for learning embeddings. Furthermore, for the spherical space, we propose novel closed spherical space operators that directly operate in the spherical space without the need for mapping to an approximate tangent space. Experiments on public datasets show that DGS significantly outperforms previous state-of-the-art baseline models on KG completion tasks, demonstrating its ability to better model heterogeneous structures in KGs.
... Besides, these models are built in Euclidean space, while study [5] shows that Euclidean space suffers from heavy volume intersection, and points arranged by Euclidean distances would no longer be capable of preserving the structure of the original tree. By contrast, approaches using hyperbolic geometry to model symbolic data have been demonstrated to be more effective at representing hierarchical relations [5,12]. The hyperbolic space can reflect complex structural patterns inherent in taxonomic data with a low-dimensional embedding. ...
... There exist five insightful models of $\mathbb{H}^n$, and they are conformal to the Euclidean space. Following [12], we use the Poincaré ball model because it can be easily optimized with gradient-based methods. The Poincaré ball model $(\mathbb{D}^n, g^{\mathbb{D}})$ is defined by the manifold $\mathbb{D}^n = \{x \in \mathbb{R}^n : c\|x\|^2 < 1,\ c \ge 0\}$ endowed with the Riemannian metric $g^{\mathbb{D}}_x = (\lambda^c_x)^2 g^E$, ...
Preprint
In practice, many medical datasets have an underlying taxonomy defined over the disease label space. However, existing classification algorithms for medical diagnosis often assume semantically independent labels. In this study, we aim to leverage the class hierarchy with deep learning algorithms for more accurate and reliable skin lesion recognition. We propose a hyperbolic network to learn image embeddings and class prototypes jointly. Hyperbolic space provably provides a better space for modeling hierarchical relations than Euclidean geometry. Meanwhile, we restrict the distribution of hyperbolic prototypes with a distance matrix that is encoded from the class hierarchy. Accordingly, the learned prototypes preserve the semantic class relations in the embedding space, and we can predict the label of an image by assigning its feature to the nearest hyperbolic class prototype. We use an in-house skin lesion dataset consisting of around 230k dermoscopic images of 65 skin diseases to verify our method. Extensive experiments provide evidence that our model can achieve higher accuracy with less severe classification errors than models that do not consider class relations.
... However, in continual graph learning, the curvature of a graph remains unknown until its arrival. In particular, the negatively curved Riemannian space, hyperbolic space, is well suited for graphs presenting hierarchical patterns or tree-like structures (Krioukov et al. 2010; Nickel and Kiela 2017). The underlying geometry shifts to the positively curved hyperspherical space when cyclical patterns (e.g., triangles or cliques) become dominant (Bachmann, Bécigneul, and Ganea 2020). ...
... Here, we focus on Riemannian models on graphs. In hyperbolic space, Nickel and Kiela (2017) and Suzuki, Takahama, and Onoda (2019) introduce shallow models, while HGCN (Chami et al. 2019), HGNN (Liu, Nickel, and Kiela 2019), and LGNN generalize convolutional networks with different formalisms under the static setting. Recently, HVGNN (Sun et al. 2021) and HTGN extend hyperbolic graph neural networks to temporal graphs. ...
Preprint
Full-text available
Continual graph learning routinely finds its role in a variety of real-world applications where graph data for different tasks arrive sequentially. Despite the success of prior works, it still faces great challenges. On the one hand, existing methods work in zero-curvature Euclidean space and largely ignore the fact that curvature varies over the incoming graph sequence. On the other hand, continual learners in the literature rely on abundant labels, but labeling graphs in practice is particularly hard, especially for graphs continuously emerging on-the-fly. To address these challenges, we explore a challenging yet practical problem: self-supervised continual graph learning in adaptive Riemannian spaces. In this paper, we propose a novel self-supervised Riemannian Graph Continual Learner (RieGrace). In RieGrace, we first design an Adaptive Riemannian GCN (AdaRGCN), a unified GCN coupled with a neural curvature adapter, so that the Riemannian space is shaped by the learnt curvature adaptive to each graph. Then, we present a label-free Lorentz distillation approach, in which we create a teacher-student AdaRGCN pair for the graph sequence. The student successively performs intra-distillation from itself and inter-distillation from the teacher so as to consolidate knowledge without catastrophic forgetting. In particular, we propose a theoretically grounded Generalized Lorentz Projection for contrastive distillation in Riemannian space. Extensive experiments on benchmark datasets show the superiority of RieGrace; additionally, we investigate how curvature changes over the graph sequence.
... Step 3. Node attribute or link prediction. For link prediction, the probability scores of the edges were calculated using the Fermi-Dirac decoder [58,59] (a generalization of the sigmoid): ...
Article
Full-text available
Proteins are the fundamental biological macromolecules which underlie practically all biological activities. Protein–protein interactions (PPIs) are how proteins interact with other proteins in their environment to perform biological functions. Understanding PPIs reveals how cells behave and operate, for example in antigen recognition and signal transduction in the immune system. In the past decades, many computational methods have been developed to predict PPIs automatically, requiring less time and fewer resources than experimental techniques. In this paper, we present a comparative study of various graph neural networks for protein–protein interaction prediction. Five network models are analyzed and compared: neural networks (NN), graph convolutional neural networks (GCN), graph attention networks (GAT), hyperbolic neural networks (HNN), and hyperbolic graph convolutions (HGCN). By utilizing protein sequence information, all of these models can predict the interaction between proteins. Fourteen PPI datasets are extracted and utilized to compare the prediction performance of all of these methods. The experimental results show that hyperbolic graph neural networks tend to perform better than the other methods on the protein-related datasets.
... Nickel and Kiela [54] is one of the first works proposing to adopt a non-Euclidean latent space for deep representation learning. It shows that hyperbolic space is inherently superior for modeling hierarchically structured data like trees and graphs, and it inspired subsequent studies to choose a latent topology more suitable for the data [55], [56]. ...
Article
Full-text available
Recently, there has been a focus on advancing representation learning to obtain more identifiable and interpretable latent representations for spike trains, which helps analyze neural population activity and understand neural mechanisms. Most existing deep generative models adopt carefully designed constraints to capture meaningful latent representations. For neural data involving navigation in cognitive space, and based on insights from studies on cognitive maps, we argue that good representations should reflect this directional nature. Due to manifold mismatch, models utilizing Euclidean space learn a distorted geometric structure that is difficult to interpret. In the present work, we explore capturing the directional nature in a simpler yet more efficient way by introducing hyperspherical neural latent variable models (SNLVM). SNLVM is an improved deep latent variable model that models neural activity and behavioral variables simultaneously with a hyperspherical latent space. It bridges cognitive maps and latent variable models. We conduct experiments on modeling a static unidirectional task. The results show that while SNLVM has competitive performance, a hyperspherical prior naturally provides more informative and significantly better latent structures that can be interpreted as spatial cognitive maps.
... Examples include clustering (Chami et al., 2020; Dokmanić, 2020, 2021), PCA-type methods (Chami et al., 2021), classification (López and Strube, 2020), hyperbolic analogues of feedforward networks (Ganea et al., 2018; Shimizu et al., 2021), and several Python packages supporting this Riemannian geometry (Miolane et al., 2020; Kochurov et al., 2020). These advances contributed to state-of-the-art performance when learning from natural language (Zipf, 1949; Dhingra et al., 2018; Le et al., 2019), knowledge graphs (Kolyvakis et al., 2020), social networks (Krioukov et al., 2010; Muscoloni et al., 2017), directed graphs (Munzner, 1997), scenario generation for stochastic phenomena (Pflug and Pichler, 2015), and combinatorial trees (Nickel and Kiela, 2017). ...
Preprint
Full-text available
We study representations of data from an arbitrary metric space X in the space of univariate Gaussian mixtures with a transport metric (Delon and Desolneux, 2020). We derive embedding guarantees for feature maps implemented by small neural networks called probabilistic transformers. Our guarantees are of memorization type: we prove that a probabilistic transformer of depth about $n \log(n)$ and width about $n^2$ can bi-Hölder embed any n-point dataset from X with low metric distortion, thus avoiding the curse of dimensionality. We further derive probabilistic bi-Lipschitz guarantees which trade off the amount of distortion and the probability that a randomly chosen pair of points embeds with that distortion. If X's geometry is sufficiently regular, we obtain stronger, bi-Lipschitz guarantees for all points in the dataset. As applications we derive neural embedding guarantees for datasets from Riemannian manifolds, metric trees, and certain types of combinatorial graphs. When instead embedding into multivariate Gaussian mixtures, we show that probabilistic transformers can compute bi-Hölder embeddings with arbitrarily small distortion.
... Graph embedding algorithms can also be clustered into several methodological classes: matrix factorization [9,48], random walk [26], auto-encoder [54], and GCNs [27]. Recent work has also embedded nodes into hyperbolic space, which better represents hierarchical tree-like structures [18,45]. ...
Preprint
Full-text available
The issue of bias (i.e., systematic unfairness) in machine learning models has recently attracted the attention of both researchers and practitioners. For the graph mining community in particular, an important goal toward algorithmic fairness is to detect and mitigate bias incorporated into graph embeddings since they are commonly used in human-centered applications, e.g., social-media recommendations. However, simple analytical methods for detecting bias typically involve aggregate statistics which do not reveal the sources of unfairness. Instead, visual methods can provide a holistic fairness characterization of graph embeddings and help uncover the causes of observed bias. In this work, we present BiaScope, an interactive visualization tool that supports end-to-end visual unfairness diagnosis for graph embeddings. The tool is the product of a design study in collaboration with domain experts. It allows the user to (i) visually compare two embeddings with respect to fairness, (ii) locate nodes or graph communities that are unfairly embedded, and (iii) understand the source of bias by interactively linking the relevant embedding subspace with the corresponding graph topology. Experts' feedback confirms that our tool is effective at detecting and diagnosing unfairness. Thus, we envision our tool both as a companion for researchers in designing their algorithms as well as a guide for practitioners who use off-the-shelf graph embeddings.
... Gromov's δ-hyperbolicity, introduced by Gromov (1987), helps ascertain whether a graph is inherently hyperbolic. Mathematically, hyperbolicity is defined as follows: let {a, b, c, d} be vertices of the graph G(V, E) and let ...
Article
Learning low-dimensional embeddings of graph data in curved Riemannian manifolds has gained traction due to their desirable property of acting as effective geometrical inductive biases. More specifically, models of hyperbolic geometry such as the Poincaré ball and the Lorentz/hyperboloid model have found applications for learning data with hierarchical anatomy. Gromov's hyperbolicity measures whether a graph can be isometrically embedded in hyperbolic space. This paper shows that adversarial attacks that perturb the network structure also affect the hyperbolicity of graphs, rendering hyperbolic space less effective for learning low-dimensional node embeddings of the graph. To circumvent this problem, we introduce learning embeddings in pseudo-Riemannian manifolds such as Lorentzian manifolds and show empirically that they are robust to adversarial perturbations. Despite the recent proliferation of adversarial robustness methods for graph data, this is the first work exploring the relationship between adversarial attacks and hyperbolicity while also providing a resolution to navigate such vulnerabilities.
... Optimization with manifold-based constraints has become increasingly popular and has been employed in various applications such as low-rank matrix completion [22], learning taxonomy embeddings [85,86], neural networks [60,61,62,43,84,90], density estimation [57,52], optimal transport [29,9,99,82,51], shape analysis [103,59], and topological dimension reduction [63], among others. ...
Preprint
We present Rieoptax, an open source Python library for Riemannian optimization in JAX. We show that many differential geometric primitives, such as the Riemannian exponential and logarithm maps, are usually faster in Rieoptax than in existing frameworks in Python, both on CPU and GPU. We support a wide range of basic and advanced stochastic optimization solvers, including Riemannian stochastic gradient, stochastic variance reduction, and adaptive gradient methods. A distinguishing feature of the proposed toolbox is that we also support differentially private optimization on Riemannian manifolds.
... A variety of Riemannian manifolds arise in statistics and machine learning. For example, data on spheres are the object of study of directional statistics (Mardia and Jupp 2009), hyperbolic spaces are routinely deployed to represent hierarchical data (Nickel and Kiela 2017) and complex projective spaces correspond to Kendall shape spaces from computer vision (Klingenberg 2020). Those areas of research can potentially benefit from the geometric characteristics and the computational efficiency of an extension of RVDE to Riemannian manifolds. ...
Preprint
Full-text available
We introduce a non-parametric density estimator termed the Radial Voronoi Density Estimator (RVDE). RVDE is grounded in the geometry of Voronoi tessellations and as such benefits from local geometric adaptiveness and broad convergence properties. Due to its radial definition, RVDE is moreover continuous and computable in linear time with respect to the dataset size. This remedies the main shortcomings of previously studied VDEs, which are highly discontinuous and computationally expensive. We provide a theoretical study of the modes of RVDE as well as an empirical investigation of its performance on high-dimensional data. Results show that RVDE outperforms other non-parametric density estimators, including recently introduced VDEs.
... Such application of manifold geometry has also been explored in substantial depth in works like (Batmanghelich et al., 2016; Reisinger et al., 2010; Gopal and Yang, 2014). There are also other notable Riemannian-optimization-based embedding training models like (Tifrea et al., 2018; Nickel and Kiela, 2017), which train embeddings on the hyperbolic manifold and use its tree-like property for better hierarchical representation of data. Hyperbolic word embeddings are also intrinsically linked with Gaussian word embeddings (Vilnis and McCallum, 2014), which gives a lot more insight into the geometry of word embeddings. ...
Preprint
Full-text available
This paper aims to provide an unsupervised modelling approach that allows for a more flexible representation of text embeddings. It jointly encodes the words and the paragraphs as individual matrices of arbitrary column dimension with unit Frobenius norm. The representation is also linguistically motivated with the introduction of a novel similarity metric. The proposed modelling and the novel similarity metric exploit the matrix structure of embeddings. We then go on to show that the same matrices can be reshaped into vectors of unit norm, transforming our problem into an optimization problem over the spherical manifold. We exploit manifold optimization to efficiently train the matrix embeddings. We also quantitatively verify the quality of our text embeddings by showing that they demonstrate improved results in document classification, document clustering, and semantic textual similarity benchmark tests.
... Expanding the literature search more broadly, we find that there have been very few side-by-side comparisons of Euclidean metrics versus strongly mathematically formulated non-Euclidean metrics for tasks in computational linguistics. (Nickel and Kiela, 2017), (Tifrea et al., 2018), and (Saxena et al., 2022) learned word embeddings under a non-Euclidean metric, choosing a Poincaré hyperbolic space. Calculating derivatives and finding minima of a function in a Poincaré space is substantially more complex, both mathematically and computationally, than in a Euclidean space. ...
Preprint
A simple machine learning model of pluralisation as a linear regression problem minimising a p-adic metric substantially outperforms even the most robust of Euclidean-space regressors on languages in the Indo-European, Austronesian, Trans-New Guinea, Sino-Tibetan, Nilo-Saharan, Oto-Manguean and Atlantic-Congo language families. There is insufficient evidence to support modelling distinct noun declensions as a p-adic neighbourhood even in Indo-European languages.
... where C is the number of class labels. For the link prediction task, following the settings in [2,44], we use the Fermi-Dirac decoder [45] to calculate the probability that an edge exists between two nodes given their embeddings $Z_i$ and $Z_j$, as in formula (14). ...
Article
Full-text available
Graph neural networks (GNNs) have achieved outstanding results in research tasks on graph data. Most existing GNN models are defined in Euclidean space. However, when embedding hierarchical and scale-free graphs, models lying in hyperbolic space attain significant improvements over Euclidean graph convolutional networks (GCNs). To further enhance the performance of hyperbolic graph convolution and expand the applicability of related models to different data, we propose a hyperbolic graph convolution model based on the minimum spanning tree (MST-HGCN). Our method utilizes the minimum spanning tree (MST) algorithm to extract and process the topological structure of the input graph, which yields a more hierarchical topological structure and largely eliminates noisy edges. Then, several different topological structures based on the same spanning tree are produced by randomly re-adding the edges deleted by the MST algorithm; subsequently, a consistency loss is introduced to jointly optimize the different outputs obtained from these topological structures. Experiments on node classification and link prediction tasks for datasets with different degrees of hierarchy show that our method comprehensively outperforms the vanilla hyperbolic GCN model on all the datasets, approaching or even outperforming representative Euclidean comparison methods, which indicates that our method has better performance and broader data applicability.
... Since it was proven that hyperbolic geometry [5] is very well suited for embedding tree graphs with low distortion [58] as hyperbolic Delaunay subgraphs of embedded tree nodes, a recent trend in machine learning and data science is to embed discrete hierarchical graphs into continuous spaces with low distortion for further downstream tasks [43,57,31,42,63,60,61,40,34,24]. There exist many models of hyperbolic geometry [5], such as the Poincaré disk or upper-half plane conformal models, the Klein non-conformal disk model, the Beltrami hemisphere model, the Minkowski or Lorentz hyperboloid model, etc. ...
Preprint
Full-text available
Hyperbolic geometry has become popular in machine learning due to its capacity to embed hierarchical graph structures with low distortion for further downstream processing. It has thus become important to consider statistical models and inference methods for datasets grounded in hyperbolic spaces. In this paper, we study various information-theoretic measures and the information geometry of the Poincaré distributions and the related hyperboloid distributions, and prove that their statistical mixture models are universal density estimators of smooth densities in hyperbolic spaces. The Poincaré and hyperboloid distributions are two types of hyperbolic probability distributions defined using different models of hyperbolic geometry. Namely, the Poincaré distributions form a triparametric bivariate exponential family whose sample space is the hyperbolic Poincaré upper-half plane and whose natural parameter space is the open 3D convex cone of two-by-two positive-definite matrices. The hyperboloid distributions form another exponential family whose sample space is the forward sheet of the two-sheeted unit hyperboloid modeling hyperbolic geometry. In the first part, we prove that all Ali-Silvey-Csiszár f-divergences between Poincaré distributions can be expressed using three canonical terms within Eaton's framework of maximal group invariance. We also show that the f-divergences between any two Poincaré distributions are asymmetric, except when those distributions belong to the same leaf of a particular foliation of the parameter space. We report closed-form formulas for the Fisher information matrix, Shannon's differential entropy, the Kullback-Leibler divergence and the Bhattacharyya distance between such distributions using the framework of exponential families. In the second part, we state the corresponding results for the exponential family of hyperboloid distributions by highlighting a parameter correspondence between the Poincaré and hyperboloid distributions. Finally, we describe a random generator to draw variates and present two Monte Carlo methods to numerically estimate f-divergences between hyperbolic distributions.
... Liu et al. [17] exploit hyperbolic geometry to learn hierarchical representations. Similar to DeViSE [10], they minimize the Poincaré distance between the Poincaré label embeddings [20] and the image feature embeddings. Barz & Denzler [2] map the embeddings onto a unit hypersphere and use the lowest common ancestor (LCA) to encode hierarchical distances. ...
Chapter
Full-text available
Label hierarchies are often available a priori as part of a biological taxonomy or language datasets such as WordNet. Several works exploit these to learn hierarchy-aware features, improving the classifier so that it makes semantically meaningful mistakes while maintaining or reducing the overall error. In this paper, we propose a novel approach for learning Hierarchy Aware Features (HAF) that leverages classifiers at each level of the hierarchy which are constrained to generate predictions consistent with the label hierarchy. The classifiers are trained by minimizing a Jensen-Shannon divergence with target soft labels obtained from the fine-grained classifiers. Additionally, we employ a simple geometric loss that constrains the feature-space geometry to capture the semantic structure of the label space. HAF is a training-time approach that reduces the severity of mistakes while maintaining top-1 error, thereby addressing the problem of the cross-entropy loss treating all mistakes as equal. We evaluate HAF on three hierarchical datasets and achieve state-of-the-art results on the iNaturalist-19 and CIFAR-100 datasets. The source code is available at https://github.com/07Agarg/HAF.
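The soft-label alignment at the heart of HAF uses the Jensen-Shannon divergence between predicted and target distributions. A minimal numpy sketch of that divergence (the full training loss in the paper combines several terms; this is only the JS ingredient):

import numpy as np

def js_divergence(p, q, eps=1e-12):
    # JSD(p, q) = KL(p || m)/2 + KL(q || m)/2 with m = (p + q)/2.
    p, q = p + eps, q + eps
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)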
... Such algorithms can be very effective in embedding hierarchical graphs. Nickel and Kiela [2017] proposed a new algorithm for learning hierarchical representations of symbolic data by embedding them into hyperbolic space using the Poincaré-ball model. Later, they proposed a new optimization approach based on the Lorentz model of hyperbolic space, finding that learning embeddings in the Lorentz model is more efficient than in the Poincaré-ball model [Nickel and Kiela, 2018]. ...
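The Lorentz model mentioned here keeps points on the upper sheet of a hyperboloid, where the distance has a particularly simple closed form. A minimal sketch (coordinate conventions follow the hyperboloid model; the clamp is a numerical safeguard added here):

import numpy as np

def lorentz_dist(x, y):
    # <x, y>_L = -x_0 y_0 + sum_i x_i y_i;  d(x, y) = arccosh(-<x, y>_L).
    # Points on the manifold satisfy <x, x>_L = -1.
    inner = -x[0] * y[0] + np.dot(x[1:], y[1:])
    return np.arccosh(max(-inner, 1.0))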
Preprint
Full-text available
In recent years, graph neural networks (GNNs) have emerged as a promising tool for solving machine learning problems on graphs. Most GNNs are members of the family of message passing neural networks (MPNNs). There is a close connection between these models and the Weisfeiler-Leman (WL) test of isomorphism, an algorithm that can successfully test isomorphism for a broad class of graphs. Recently, much research has focused on measuring the expressive power of GNNs. For instance, it has been shown that standard MPNNs are at most as powerful as WL in terms of distinguishing non-isomorphic graphs. However, these studies have largely ignored the distances between the representations of nodes/graphs which are of paramount importance for learning tasks. In this paper, we define a distance function between nodes which is based on the hierarchy produced by the WL algorithm, and propose a model that learns representations which preserve those distances between nodes. Since the emerging hierarchy corresponds to a tree, to learn these representations, we capitalize on recent advances in the field of hyperbolic neural networks. We empirically evaluate the proposed model on standard node and graph classification datasets where it achieves competitive performance with state-of-the-art models.
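The hierarchy in question comes from iterated Weisfeiler-Leman color refinement, which is compact enough to sketch directly (a toy implementation; the paper builds its distance function and hyperbolic embedding on top of the resulting refinement tree):

def wl_refinement(adj, n_iters=3):
    # 1-WL: repeatedly replace each node's color with a hash of its own
    # color and the multiset of its neighbors' colors.
    colors = {v: 0 for v in adj}
    for _ in range(n_iters):
        colors = {v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
                  for v in adj}
    return colors

adj = {0: [1], 1: [0, 2], 2: [1]}   # a 3-node path
print(wl_refinement(adj))            # the two endpoints share a color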
Chapter
In practice, many medical datasets have an underlying taxonomy defined over the disease label space. However, existing classification algorithms for medical diagnosis often assume semantically independent labels. In this study, we aim to leverage class hierarchy with deep learning algorithms for more accurate and reliable skin lesion recognition. We propose a hyperbolic network to jointly learn image embeddings and class prototypes. Hyperbolic space provably provides a better setting for modeling hierarchical relations than Euclidean geometry. Meanwhile, we restrict the distribution of hyperbolic prototypes with a distance matrix encoded from the class hierarchy. Accordingly, the learned prototypes preserve the semantic class relations in the embedding space, and we can predict the label of an image by assigning its feature to the nearest hyperbolic class prototype. We use an in-house skin lesion dataset consisting of ~230k dermoscopic images of 65 skin diseases to verify our method. Extensive experiments provide evidence that our model can achieve higher accuracy with less severe classification errors than models that do not consider class relations.
Keywords: Skin lesion recognition · Class hierarchy · Deep learning · Hyperbolic geometry
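The decision rule described above is nearest-prototype classification under the Poincaré distance. A minimal sketch, assuming features and prototypes already live inside the unit ball (names are illustrative):

import numpy as np

def poincare_dist(u, v):
    num = 2 * np.sum((u - v) ** 2)
    den = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + num / den)

def predict(feature, prototypes):
    # prototypes: dict mapping class name -> point in the Poincaré ball.
    return min(prototypes, key=lambda c: poincare_dist(feature, prototypes[c]))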
Article
This paper presents the Hager–Zhang (HZ)-type Riemannian conjugate gradient method that uses the exponential retraction. We also present global convergence analyses of our proposed method under two kinds of assumptions. Moreover, we numerically compare our proposed method with existing methods by solving two kinds of Riemannian optimization problems on the unit sphere. The numerical results show that our proposed method performs much better than the existing FR, DY, PRP, and HS methods; in particular, it substantially outperforms existing methods, including hybrid ones, on the problem of computing the stability number of a graph.
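On the unit sphere, the exponential retraction used by this method has a closed form: follow the great circle from x in the tangent direction v. A minimal sketch of that ingredient with one steepest-descent step (the Hager–Zhang direction updates are omitted):

import numpy as np

def sphere_exp(x, v):
    # Exponential map on S^{n-1}: exp_x(v) = cos(||v||) x + sin(||v||) v/||v||.
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return x
    return np.cos(nv) * x + np.sin(nv) * (v / nv)

x = np.array([1.0, 0.0, 0.0])
egrad = np.array([0.0, 1.0, 0.0])
rgrad = egrad - np.dot(egrad, x) * x   # project onto the tangent space at x
x = sphere_exp(x, -0.1 * rgrad)
print(np.linalg.norm(x))               # ~1.0: the iterate stays on the sphere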
Chapter
Object detection, for the most part, has been formulated in Euclidean space, where Euclidean or spherical geodesic distances measure the similarity of an image region to an object class prototype. In this work, we study whether a hyperbolic geometry better matches the underlying structure of the object classification space. We incorporate a hyperbolic classifier into two-stage, keypoint-based, and transformer-based object detection architectures and evaluate them on large-scale, long-tailed, and zero-shot object detection benchmarks. In our extensive experimental evaluations, we observe categorical class hierarchies emerging in the structure of the classification space, resulting in lower classification errors and boosting the overall object detection performance.
Chapter
Temporal heterogeneous information network (temporal HIN) embedding, which aims to represent nodes of various types and timestamps in low-dimensional spaces while preserving structural and semantic information, is of vital importance in diverse real-life tasks. Researchers have made great efforts on temporal HIN embedding in Euclidean spaces, with considerable achievements. However, there is a fundamental conflict: many real-world networks exhibit hierarchical structure and power-law distributions and are not isometric to Euclidean space. Recently, representation learning in hyperbolic spaces has been shown to be effective for data with hierarchical and power-law structure. Inspired by this, we propose a hyperbolic heterogeneous temporal network embedding (H2TNE) model for temporal HINs. Specifically, we leverage a temporally and heterogeneously double-constrained random walk strategy to capture structural and semantic information, and then compute embeddings by exploiting hyperbolic distance in proximity measurement. Experimental results show that our method outperforms SOTA models on temporal link prediction and node classification.
Keywords: Temporal heterogeneous information networks · Hyperbolic geometry · Representation learning
Chapter
A good similarity metric should be consistent with the human perception of similarity: a sparrow is more similar to an owl when compared to a dog, but more similar to a dog when compared to a car. Whether two images count as belonging to the same class depends on the semantic level considered. As most existing metric learning methods push interclass samples apart and pull intraclass samples together, a contradiction arises when labels cross semantic levels: a negative pair at a finer semantic level can be a positive pair at a coarser semantic level, so pushing such a pair apart damages the class structure at the coarser level. We identify negative repulsion as the key obstacle in existing methods, since a positive pair remains positive at coarser semantic levels, while a negative pair need not remain negative. Our solution, cross-level concept distillation (CLCD), is simple in concept: we only pull positive pairs closer. To capture the cross-level semantic structure of image representations, we propose a hierarchical concept refiner that constructs multiple levels of concept embeddings of an image and then pulls the corresponding concepts closer. Extensive experiments demonstrate that the proposed CLCD method outperforms all other competing methods on hierarchically labeled datasets. Code is available at: https://github.com/wzzheng/CLCD.
Thesis
Text has been the dominant way of storing data in computer systems and sending information around the Web. Extracting meaningful representations out of text has been a key element for modelling language in order to tackle NLP tasks like text classification. These representations can then form groups that one can use for supervised learning problems. More specifically, one can utilize these linguistic groups for regularization purposes. Last, these structures can be of help in another important field, distance computation between text documents. The main goal of this thesis is to study the aforementioned problems; first, by examining new graph-based representations of text. Next, we studied how groups of these representations can help regularization in machine learning models for text classification. Last, we dealt with sets and measuring distances between documents, utilizing our proposed linguistic groups, as well as graph-based approaches. In the first part of the thesis, we have studied graph-based representations of text. Turning text to graphs is not trivial and has been around even before word embeddings were introduced to the NLP community. In our work, we show that graph-based representations of text can effectively capture relationships like order and semantic or syntactic structure. Moreover, they can be created fast while offering great versatility for multiple tasks. In the second part, we focused on structured regularization for text. Textual data suffer from the dimensionality problem, creating huge feature spaces. Regularization is critical for any machine learning model, as it can address overfitting. In our work we present novel approaches for text regularization, by introducing new groups of linguistic structures and designing new algorithms. In the last part of the thesis, we study new methods to measure distance in the word embedding space. First, we introduce diverse methods to boost comparison between documents that consist of word vectors. Next, representing the comparison of documents as a weighted bipartite matching, we show how we can learn hidden representations and improve results for the text classification task. Finally, we conclude by summarizing the main points of the total contribution and discuss future directions.
Article
Full-text available
The target of the multi-hop knowledge base question-answering task is to find answers to factoid questions by reasoning across multiple knowledge triples in the knowledge base. Most existing methods for multi-hop knowledge base question answering based on a general knowledge graph ignore the semantic relationship between hops. However, modeling the knowledge base as a directed hypergraph suffers from sparse incidence matrices and asymmetric Laplacian matrices. To make up for this deficiency, we propose a directed hypergraph convolutional network modeled in hyperbolic space, which can better handle the sparse structure and effectively adapt to the asymmetric incidence matrix of a directed hypergraph modeled on a knowledge base. We propose an interpretable KBQA model based on the hyperbolic directed hypergraph convolutional neural network, named HDH-GCN, which can update relation semantic information hop by hop and attends to different relations at different hops. The model improves the accuracy of the multi-hop knowledge base question-answering task and has application value in text question answering, human-computer interaction and other fields. Extensive experiments on the PQL and MetaQA benchmarks demonstrate the effectiveness and universality of our HDH-GCN model, leading to state-of-the-art performance.
Article
Full-text available
We introduce HyperLex - a dataset and evaluation resource that quantifies the extent of the semantic category membership and lexical entailment (LE) relation between 2,616 concept pairs. Cognitive psychology research has established that category/class membership, and hence LE, is computed in human semantic memory as a gradual rather than binary relation. Nevertheless, most NLP research and existing large-scale inventories of concept category membership (WordNet, DBPedia, etc.) treat category membership and LE as binary. To address this, we asked hundreds of native English speakers to indicate the strength of category membership between a diverse range of concept pairs on a crowdsourcing platform. Our results confirm that category membership and LE are indeed more gradual than binary. We then compare these human judgements with the predictions of automatic systems, which reveals a huge gap between human performance and state-of-the-art LE, distributional and representation learning models, and substantial differences between the models themselves. We discuss a pathway for improving semantic models to overcome this discrepancy, and indicate future application areas for improved graded LE systems.
Article
Full-text available
We study optimization of finite sums of geodesically smooth functions on Riemannian manifolds. Although variance reduction techniques for optimizing finite-sum problems have witnessed a huge surge of interest in recent years, all existing work is limited to vector space problems. We introduce Riemannian SVRG, a new variance-reduced Riemannian optimization method. We analyze this method for both geodesically smooth convex and nonconvex functions. Our analysis reveals that Riemannian SVRG comes with the advantages of the usual SVRG method, but with factors depending on manifold curvature that influence its convergence. To the best of our knowledge, ours is the first fast stochastic Riemannian method. Moreover, our work offers the first non-asymptotic complexity analysis for nonconvex Riemannian optimization (even for the batch setting). Our results have several implications; for instance, they offer a Riemannian perspective on variance-reduced PCA, which promises a short, transparent convergence analysis.
Conference Paper
Full-text available
This work is concerned with distinguishing different semantic relations which exist between distributionally similar words. We compare a novel approach based on training a linear Support Vector Machine on pairs of feature vectors with state-of-the-art methods based on distributional similarity. We show that the new supervised approach does better even when there is minimal information about the target words in the training data, giving a 15% reduction in error rate over unsupervised approaches.
Article
Full-text available
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling frequent words we obtain a significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
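The negative-sampling objective is compact enough to state in a few lines. A minimal numpy sketch for a single (word, context) pair with k sampled negatives (vectors are assumed given; this is the loss only, not the training loop):

import numpy as np

def sgns_loss(v_w, v_c, v_negs):
    # -log sigmoid(w . c) - sum over negatives n of log sigmoid(-(w . n))
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    loss = -np.log(sigmoid(np.dot(v_w, v_c)))
    for v_n in v_negs:
        loss -= np.log(sigmoid(-np.dot(v_w, v_n)))
    return loss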
Conference Paper
Full-text available
Relational learning is becoming increasingly important in many areas of application. Here, we present a novel approach to relational learning based on the factorization of a three-way tensor. We show that unlike other tensor approaches, our method is able to perform collective learning via the latent components of the model, and we provide an efficient algorithm to compute the factorization. We substantiate our theoretical considerations regarding the collective learning capabilities of our model by means of experiments on both a new dataset and a dataset commonly used in entity resolution. Furthermore, we show on common benchmark datasets that our approach achieves better or on-par results compared to current state-of-the-art relational learning solutions, while being significantly faster to compute.
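The factorization in question approximates each relation's adjacency slice as X_k ≈ A R_k A^T, with the entity factor matrix A shared across all relations; that sharing is what enables collective learning. A minimal scoring sketch:

import numpy as np

def rescal_scores(A, R_k):
    # A: n_entities x rank latent factors (shared across all relations);
    # R_k: rank x rank interaction matrix for relation k.
    return A @ R_k @ A.T

A = np.random.randn(4, 2)
R_k = np.random.randn(2, 2)
print(rescal_scores(A, R_k).shape)   # (4, 4) predicted slice for relation k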
Article
Full-text available
Stochastic gradient descent is a simple approach to find the local minima of a cost function whose evaluations are corrupted by noise. In this paper, we develop a procedure extending stochastic gradient descent algorithms to the case where the function is defined on a Riemannian manifold. We prove that, as in the Euclidean case, the gradient descent algorithm converges to a critical point of the cost function. The algorithm has numerous potential applications, and is illustrated here by four examples. In particular, a novel gossip algorithm on the set of covariance matrices is derived and tested numerically.
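The procedure itself is a one-line generalization of Euclidean SGD: project the noisy gradient onto the tangent space, step, and map back onto the manifold. A minimal sketch instantiated on the unit sphere (the paper treats general retractions; these particular choices are illustrative):

import numpy as np

def riemannian_sgd_step(x, noisy_egrad, lr, proj_tangent, retract):
    return retract(x - lr * proj_tangent(x, noisy_egrad))

proj = lambda x, g: g - np.dot(g, x) * x     # tangent projection on the sphere
retract = lambda y: y / np.linalg.norm(y)    # metric projection retraction
x = np.array([0.0, 0.0, 1.0])
x = riemannian_sgd_step(x, np.random.randn(3), 0.01, proj, retract)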
Article
Full-text available
Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show using novel theoretical analysis, algorithms, and implementation that SGD can be implemented without any locking. We present an update scheme called HOGWILD! which allows processors access to shared memory with the possibility of overwriting each other's work. We show that when the associated optimization problem is sparse, meaning most gradient updates only modify small parts of the decision variable, then HOGWILD! achieves a nearly optimal rate of convergence. We demonstrate experimentally that HOGWILD! outperforms alternative schemes that use locking by an order of magnitude.
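The scheme is easy to mimic structurally: several workers update a shared parameter vector with sparse gradients and no locks. A toy Python sketch follows; note that CPython's GIL serializes the bytecode, so this only illustrates the update pattern, not the parallel speedups the paper obtains with native code:

import threading
import numpy as np

w = np.zeros(1000)   # shared parameters, no lock anywhere

def worker(samples, lr=0.01):
    # Each sparse sample touches only a few coordinates, so conflicting
    # overwrites are rare: the regime in which HOGWILD! provably converges.
    for idxs, grad in samples:
        w[idxs] -= lr * grad   # unsynchronized sparse update

samples = [(np.random.choice(1000, 5, replace=False), np.random.randn(5))
           for _ in range(100)]
threads = [threading.Thread(target=worker, args=(samples,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()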
Article
Full-text available
We develop a geometric framework to study the structure and function of complex networks. We assume that hyperbolic geometry underlies these networks, and we show that with this assumption, heterogeneous degree distributions and strong clustering in complex networks emerge naturally as simple reflections of the negative curvature and metric property of the underlying hyperbolic geometry. Conversely, we show that if a network has some metric structure, and if the network degree distribution is heterogeneous, then the network has an effective hyperbolic geometry underneath. We then establish a mapping between our geometric framework and statistical mechanics of complex networks. This mapping interprets edges in a network as noninteracting fermions whose energies are hyperbolic distances between nodes, while the auxiliary fields coupled to edges are linear functions of these energies or distances. The geometric network ensemble subsumes the standard configuration model and classical random graphs as two limiting cases with degenerate geometric structures. Finally, we show that targeted transport processes without global topology knowledge, made possible by our geometric framework, are maximally efficient, according to all efficiency measures, in networks with strongest heterogeneity and clustering, and that this efficiency is remarkably robust with respect to even catastrophic disturbances and damages to the network structure.
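The fermionic picture above translates directly into a graph generator: sample points in a hyperbolic disk and connect each pair with Fermi-Dirac probability in their distance. A minimal sketch (parameter conventions vary across the paper's ensembles; R, T, and the radial density below are illustrative):

import numpy as np

def hyperbolic_random_graph(n=200, R=6.0, T=0.5, seed=0):
    rng = np.random.default_rng(seed)
    phi = rng.uniform(0, 2 * np.pi, n)
    # radii distributed quasi-uniformly with respect to hyperbolic area
    r = np.arccosh(1 + rng.uniform(0, 1, n) * (np.cosh(R) - 1))
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # hyperbolic law of cosines for the distance between two points
            arg = (np.cosh(r[i]) * np.cosh(r[j])
                   - np.sinh(r[i]) * np.sinh(r[j]) * np.cos(phi[i] - phi[j]))
            d = np.arccosh(max(arg, 1.0))
            # Fermi-Dirac connection probability with "energy" d
            if rng.random() < 1.0 / (np.exp((d - R) / (2 * T)) + 1.0):
                edges.append((i, j))
    return edges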
Conference Paper
Prediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by learning the features themselves. However, present feature learning approaches are not expressive enough to capture the diversity of connectivity patterns observed in networks. Here we propose node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks. In node2vec, we learn a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods of nodes. We define a flexible notion of a node's network neighborhood and design a biased random walk procedure, which efficiently explores diverse neighborhoods. Our algorithm generalizes prior work which is based on rigid notions of network neighborhoods, and we argue that the added flexibility in exploring neighborhoods is the key to learning richer representations. We demonstrate the efficacy of node2vec over existing state-of-the-art techniques on multi-label classification and link prediction in several real-world networks from diverse domains. Taken together, our work represents a new way for efficiently learning state-of-the-art task-independent representations in complex networks.
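The biased walk is the heart of node2vec: having stepped from t to v, the unnormalized probability of moving to x is 1/p if x = t, 1 if x is a neighbor of t, and 1/q otherwise, interpolating between BFS-like and DFS-like exploration. A minimal sketch (adjacency as a dict of sets; names illustrative):

import random

def node2vec_walk(adj, start, length, p=1.0, q=1.0, seed=0):
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        v = walk[-1]
        nbrs = list(adj[v])
        if not nbrs:
            break
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))
            continue
        t = walk[-2]
        weights = [1.0 / p if x == t else (1.0 if x in adj[t] else 1.0 / q)
                   for x in nbrs]
        walk.append(rng.choices(nbrs, weights=weights)[0])
    return walk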
Article
Continuous word representations, trained on large unlabeled corpora, are useful for many natural language processing tasks. Many popular models that learn such representations ignore the morphology of words by assigning a distinct vector to each word. This is a limitation, especially for morphologically rich languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skip-gram model, where each word is represented as a bag of character n-grams. A vector representation is associated with each character n-gram, and words are represented as the sum of these representations. Our method is fast, allowing models to be trained on large corpora quickly. We evaluate the obtained word representations on five different languages, on word similarity and analogy tasks.
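The n-gram decomposition itself is simple. A minimal sketch of fastText-style subword extraction (the vectors of these n-grams, plus the word itself, are summed to form the word representation; boundary markers distinguish prefixes and suffixes):

def char_ngrams(word, n_min=3, n_max=6):
    w = f"<{word}>"                      # boundary markers
    grams = [w[i:i + n] for n in range(n_min, n_max + 1)
             for i in range(len(w) - n + 1)]
    return grams + [w]                   # the word itself is kept as a unit

print(char_ngrams("where", 3, 4))        # ['<wh', 'whe', ..., '<where>']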
Tensor factorization has become a popular method for learning from multi-relational data. In this context, the rank of the factorization is an important parameter that determines runtime as well as generalization ability. To identify conditions under which factorization is an efficient approach for learning from relational data, we derive upper and lower bounds on the rank required to recover adjacency tensors. Based on our findings, we propose a novel additive tensor factorization model to learn from latent and observable patterns on multi-relational data and present a scalable algorithm for computing the factorization. We show experimentally both that the proposed additive model does improve the predictive performance over pure latent variable methods and that it also reduces the required rank - and therefore runtime and memory complexity - significantly.
Article
We consider the problem of embedding entities and relationships of multi-relational data in low-dimensional vector spaces. Our objective is to propose a canonical model which is easy to train, contains a reduced number of parameters, and can scale up to very large databases. Hence, we propose TransE, a method which models relationships by interpreting them as translations operating on the low-dimensional embeddings of the entities. Despite its simplicity, this assumption proves to be powerful since extensive experiments show that TransE significantly outperforms state-of-the-art methods in link prediction on two knowledge bases. Besides, it can be successfully trained on a large-scale dataset with 1M entities, 25k relationships and more than 17M training samples.
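The translation assumption makes the scoring function a single norm. A minimal numpy sketch of the energy and the margin-based ranking loss used to train it (gamma and the corruption scheme are illustrative):

import numpy as np

def transe_energy(h, r, t):
    # A triple (h, r, t) is plausible when h + r lands close to t.
    return np.linalg.norm(h + r - t)

def margin_loss(pos, neg, gamma=1.0):
    # Rank a true triple above a corrupted one by a margin of gamma.
    return max(0.0, gamma + transe_energy(*pos) - transe_energy(*neg))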
Conference Paper
Although large social and information networks are often thought of as having hierarchical or tree-like structure, this assumption is rarely tested. We have performed a detailed empirical analysis of the tree-like properties of realistic informatics graphs using two very different notions of tree-likeness: Gromov's δ-hyperbolicity, which is a notion from geometric group theory that measures how tree-like a graph is in terms of its metric structure, and tree decompositions, tools from structural graph theory which measure how tree-like a graph is in terms of its cut structure. Although realistic informatics graphs often do not have meaningful tree-like structure when viewed with respect to the simplest and most popular metrics, e.g., the value of δ or the treewidth, we conclude that many such graphs do have meaningful tree-like structure when viewed with respect to more refined metrics, e.g., a size-resolved notion of δ or a closer analysis of the tree decompositions. We also show that, although these two rigorous notions of tree-likeness capture very different tree-like structures in the worst case, for realistic informatics graphs they empirically identify surprisingly similar structure. We interpret this tree-like structure in terms of the recently characterized "nested core-periphery" property of large informatics graphs, and we show that the fast and scalable k-core heuristic can be used to identify this tree-like structure.
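Gromov's δ can be computed from a distance matrix via the four-point condition: for every quadruple of nodes, sort the three pairwise distance sums and take half the gap between the two largest; δ is the maximum over quadruples (0 for trees). A minimal brute-force sketch (O(n^4), for small graphs only):

from itertools import combinations

def gromov_delta(D):
    # D: symmetric matrix of shortest-path distances.
    n = len(D)
    delta = 0.0
    for w, x, y, z in combinations(range(n), 4):
        s = sorted([D[w][x] + D[y][z], D[w][y] + D[x][z], D[w][z] + D[x][y]])
        delta = max(delta, (s[2] - s[1]) / 2.0)
    return delta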
Article
Let us start with three equivalent definitions of hyperbolic groups. First observe that for every finitely presented group Γ there exists a smooth bounded (i.e. bounded by a smooth hypersurface) connected domain V ⊂ ℝⁿ for every n ≥ 5, such that the fundamental group π1(V) is isomorphic to Γ. A standard example of such a V is obtained as follows. Fix a finite presentation of Γ and let P be the 2-dimensional cell complex whose 1-cells correspond in the usual way to the generators and the 2-cells to the relations in Γ, such that π1(P) = Γ. Then embed P into ℝ⁵ and take a regular neighborhood of P ⊂ ℝ⁵ for V.
Article
For the purposes of the present discussion, the term structure will be used in the following non-rigorous sense: A set of phonemes or a set of data is structured in respect to some feature, to the extent that we can form in terms of that feature some organized system of statements which describes the members of the set and their interrelations (at least up to some limit of complexity). In this sense, language can be structured in respect to various independent features. And whether it is structured (to more than a trivial extent) in respect to, say, regular historical change, social intercourse, meaning, or distribution — or to what extent it is structured in any of these respects — is a matter decidable by investigation. Here we will discuss how each language can be described in terms of a distributional structure, i.e. in terms of the occurrence of parts (ultimately sounds) relative to other parts, and how this description is complete without intrusion of other features such as history or meaning. It goes without saying that other studies of language — historical, psychological, etc.—are also possible, both in relation to distributional structure and independently of it.
Conference Paper
We propose a scalable and reliable point-to-point routing algorithm for ad hoc wireless networks and sensor-nets. Our algorithm assigns to each node of the network a virtual coordinate in the hyperbolic plane, and performs greedy geographic routing with respect to these virtual coordinates. Unlike other proposed greedy routing algorithms based on virtual coordinates, our embedding guarantees that the greedy algorithm is always successful in finding a route to the destination, if such a route exists. We describe a distributed algorithm for computing each node's virtual coordinates in the hyperbolic plane, and for greedily routing packets to a destination point in the hyperbolic plane. (This destination may be the address of another node of the network, or it may be an address associated to a piece of content in a Distributed Hash Table. In the latter case we prove that the greedy routing strategy makes a consistent choice of the node responsible for the address, irrespective of the source address of the request.) We evaluate the resulting algorithm in terms of both path stretch and node congestion.
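The greedy rule is easily sketched: forward each packet to the neighbor whose virtual coordinate is hyperbolically closest to the destination. With the paper's embedding this never gets stuck; the sketch below, on arbitrary coordinates in the Poincaré disk, simply stops at a local minimum (names illustrative):

import numpy as np

def poincare_dist(u, v):
    num = 2 * np.sum((u - v) ** 2)
    den = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + num / den)

def greedy_route(adj, coords, src, dst):
    path = [src]
    while path[-1] != dst:
        here = path[-1]
        nxt = min(adj[here], key=lambda u: poincare_dist(coords[u], coords[dst]))
        if poincare_dist(coords[nxt], coords[dst]) >= poincare_dist(coords[here], coords[dst]):
            break   # stuck: these coordinates are not a greedy embedding
        path.append(nxt)
    return path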
Article
We present statistical analyses of the large-scale structure of 3 types of semantic networks: word associations, WordNet, and Roget's Thesaurus. We show that they have a small-world structure, characterized by sparse connectivity, short average path lengths between words, and strong local clustering. In addition, the distributions of the number of connections follow power laws that indicate a scale-free pattern of connectivity, with most nodes having relatively few connections joined together through a small number of hubs with many connections. These regularities have also been found in certain other complex natural networks, such as the World Wide Web, but they are not consistent with many conventional models of semantic organization, based on inheritance hierarchies, arbitrarily structured networks, or high-dimensional vector spaces. We propose that these structures reflect the mechanisms by which semantic networks grow. We describe a simple model for semantic growth, in which each new word or concept is connected to an existing network by differentiating the connectivity pattern of an existing node. This model generates appropriate small-world statistics and power-law connectivity distributions, and it also suggests one possible mechanistic basis for the effects of learning history variables (age of acquisition, usage frequency) on behavioral performance in semantic processing tasks.
Article
The Internet infrastructure is severely stressed. Rapidly growing overheads associated with the primary function of the Internet (routing information packets between any two computers in the world) cause concerns among Internet experts that the existing Internet routing architecture may not sustain even another decade. In this paper, we present a method to map the Internet to a hyperbolic space. Guided by a constructed map, which we release with this paper, Internet routing exhibits scaling properties that are theoretically close to the best possible, thus resolving serious scaling limitations that the Internet faces today. Besides this immediate practical viability, our network mapping method can provide a different perspective on the community structure in complex networks.
Article
Many real networks in nature and society share two generic properties: they are scale-free and they display a high degree of clustering. We show that these two features are the consequence of a hierarchical organization, implying that small groups of nodes organize in a hierarchical manner into increasingly large groups, while maintaining a scale-free topology. In hierarchical networks, the degree of clustering characterizing the different groups follows a strict scaling law, which can be used to identify the presence of a hierarchical organization in real networks. We find that several real networks, such as the World Wide Web, the actor network, the Internet at the domain level, and the semantic web obey this scaling law, indicating that hierarchy is a fundamental characteristic of many complex systems.
Article
Networks have in recent years emerged as an invaluable tool for describing and quantifying complex systems in many branches of science. Recent studies suggest that networks often exhibit hierarchical organization, in which vertices divide into groups that further subdivide into groups of groups, and so forth over multiple scales. In many cases the groups are found to correspond to known functional units, such as ecological niches in food webs, modules in biochemical networks (protein interaction networks, metabolic networks or genetic regulatory networks) or communities in social networks. Here we present a general technique for inferring hierarchical structure from network data and show that the existence of hierarchy can simultaneously explain and quantitatively reproduce many commonly observed topological properties of networks, such as right-skewed degree distributions, high clustering coefficients and short path lengths. We further show that knowledge of hierarchical structure can be used to predict missing connections in partly known networks with high accuracy, and for more general network structures than competing techniques. Taken together, our results suggest that hierarchy is a central organizing principle of complex networks, capable of offering insight into many network phenomena.
Article
Network models are widely used to represent relational information among interacting units. In studies of social networks, recent emphasis has been placed on random graph models where the nodes usually represent individual social actors and the edges represent the presence of a specified relation between actors. We develop a class of models where the probability of a relation between actors depends on the positions of individuals in an unobserved "social space". Inference for the social space is developed within a maximum likelihood and Bayesian framework, and Markov chain Monte Carlo procedures are proposed for making inference on latent positions and the effects of observed covariates. We present analyses of three standard datasets from the social networks literature, and compare the method to an alternative stochastic blockmodeling approach. In addition to improving upon model fit, our method provides a visual and interpretable model-based spatial representation of social relationships, and improves upon existing methods by allowing the statistical uncertainty in the social space to be quantified and graphically represented. Key words: network data; latent position model; conditional independence model.
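The core of the latent position model is a logistic regression in which the log-odds of a tie decay with latent distance. A minimal sketch (the observed-covariate term is reduced to a placeholder scalar here):

import numpy as np

def edge_prob(z_i, z_j, alpha=1.0, covariate_term=0.0):
    # log-odds(tie) = alpha - |z_i - z_j| + (observed covariate effects)
    eta = alpha - np.linalg.norm(z_i - z_j) + covariate_term
    return 1.0 / (1.0 + np.exp(-eta))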
Article
When a parameter space has a certain underlying structure, the ordinary gradient of a function does not represent its steepest direction, but the natural gradient does. Information geometry is used for calculating the natural gradients in the parameter space of perceptrons, the space of matrices (for blind source separation), and the space of linear dynamical systems (for blind source deconvolution). The dynamical behaviour of natural gradient online learning is analyzed and proved to be Fisher efficient, implying that it asymptotically attains the same performance as optimal batch estimation of parameters. This suggests that the plateau phenomenon, which appears in the backpropagation learning algorithm of multilayer perceptrons, might disappear or might not be so serious when the natural gradient is used. An adaptive method of updating the learning rate is proposed and analyzed.
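The update itself preconditions the ordinary gradient by the inverse Fisher information matrix. A minimal sketch (estimating the Fisher matrix is the hard part in practice and is assumed given here):

import numpy as np

def natural_gradient_step(theta, grad, fisher, lr=0.1):
    # theta <- theta - lr * F^{-1} grad: steepest descent under the
    # Riemannian metric induced by the Fisher information.
    return theta - lr * np.linalg.solve(fisher, grad)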
Guillaume Bouchard, Sameer Singh, and Theo Trouillon. On approximate reasoning capabilities of low-rank vector spaces. 2015.
Maximilian Nickel, Lorenzo Rosasco, and Tomaso A. Poggio. Holographic embeddings of knowledge graphs. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA, pages 1955-1961, 2016.