Figure 3 - uploaded by Romain Azaïs
Densities of 2π∆_∞ and Λ_∞
Source publication
Article
Full-text available
Tree-structured data naturally appear in various fields, particularly in biology where plants and blood vessels may be described by trees, but also in computer science because XML documents form a tree structure. This paper is devoted to the estimation of the relative scale of ordered trees that share the same layout. The theoretical study is achie...

Context in source publication

Context 1
... difference in the dispersions is quite apparent in Figure 3, where the densities of 2π∆_∞ and Λ_∞ are displayed. Consequently, one may expect better results in terms of dispersion from our approach. ...

Similar publications

Article
Full-text available
In this paper, we describe the construction of TeKnowbase, a knowledge-base of technical concepts in computer science. Our main information sources are technical websites such as Webopedia and Techtarget as well as Wikipedia and online textbooks. We divide the knowledge-base construction problem into two parts -- the acquisition of entities and the...

Citations

... Interpreting 0 as +1 and 1 as −1, we can read this sequence as an excursion (i.e., a walk that comes back to the origin) in Z, starting at 0. This walk also draws the graph of a function, which is called the Harris path of the tree [50,51] ([50]: Aldous (1993), 'The continuum random tree III'). In the case of unordered trees, this tuple is not unique (except in pathological cases). In the example of Figure 2.3, if we swap the nodes of depth 1 to place the leaf between its two siblings (or after them), we obtain a different tuple for the tree. ...
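The reading of a bit sequence as an excursion can be illustrated with a minimal Python sketch. The snippet above truncates the definition of the sequence, so the encoding used here is an assumption: an ordered tree is given as nested lists of children, a 0 is written when the depth-first traversal descends an edge, and a 1 when it comes back up. The names `dfs_bits` and `harris_path` are hypothetical and do not come from the cited thesis.

```python
from itertools import accumulate


def dfs_bits(tree):
    """Encode an ordered tree (nested lists of children) as a bit list.

    Assumed convention: 0 when descending an edge, 1 when going back up.
    """
    bits = []
    for child in tree:
        bits.append(0)
        bits.extend(dfs_bits(child))
        bits.append(1)
    return bits


def harris_path(bits):
    """Read 0 as +1 and 1 as -1 and return the walk in Z, starting at 0."""
    steps = [1 if b == 0 else -1 for b in bits]
    return [0] + list(accumulate(steps))


# Root with two children: the first has two leaf children, the second is a leaf.
tree = [[[], []], []]
bits = dfs_bits(tree)
path = harris_path(bits)
print(bits)  # [0, 0, 1, 0, 1, 1, 0, 1]
print(path)  # [0, 1, 2, 1, 2, 1, 0, 1, 0]

# Reordering the children of the root gives a different bit sequence,
# which is why the encoding is not unique for unordered trees.
swapped = [[], [[], []]]
print(dfs_bits(swapped) != bits)  # True
```

Under this convention the walk never goes below 0 and returns to 0 at the end, so it is an excursion in the sense described above.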
Thesis
Tree data appear naturally in many scientific domains. Their intrinsically non-Euclidean nature and the combinatorial explosion phenomenon make their analysis delicate. In this thesis, we focus on three approaches to compare trees, notably through the prism of a lossless compression technique of trees into directed acyclic graphs. First, concerning tree isomorphism, we consider an extension of the classical definition to labeled trees, which requires that trees be identical up to label rewriting. This problem is as hard as graph isomorphism, and we have developed an algorithm that drastically reduces the size of the solution search space, which is then explored with a backtracking strategy. When two trees are different, we may try to find common substructures. While this question has already been addressed for subtrees, we are interested in a broader problem, namely finding sets of subtrees that appear simultaneously. This leads us to consider forest enumeration, for which we propose a reverse search algorithm that constructs an enumeration tree whose branching factor is linear. Finally, from a list of common substructures, one can build a convolution kernel that makes it possible to tackle classification problems. We consider the subtree kernel from the literature, and build an algorithm that explicitly enumerates subtrees (unlike the original method). In particular, our approach allows us to parameterize the kernel more finely, significantly improving its classification abilities.
... treex offers converters to the standard encoding of nested brackets (see for instance (Aho, Hopcroft, & Ullman, 1974)) and L-strings as manipulated by L-Py, a simulation framework for modeling plant architectures (Boudon, Pradal, Cokelaer, Prusinkiewicz, & Godin, 2012). Numerical experiments and/or figures of recent publications (Azais, Genadot, & Henry, 2019; Azais, 2017) have been made using the current or previous versions of treex. Furthermore, ongoing academic projects on the development and implementation of supervised classification methods for tree data, the study of lineage trees, as well as investigations on plant modeling, make intensive use of structures and algorithms implemented in treex. ...
... Several approaches have been considered in the literature to deal with this kind of data: edit distances between unordered or ordered trees (see [6] and the references therein), coding processes for ordered trees [24], with a special focus on conditioned Galton-Watson trees [3,5]. One can also mention the approach developed in [29]. ...
... By virtue of the previous lemma, one can derive the following result on the quantity ∆ i x defined by (3). ...
Preprint
Full-text available
Tree data are ubiquitous because they model a large variety of situations, e.g., the architecture of plants, the secondary structure of RNA, or the hierarchy of XML files. Nevertheless, the analysis of these non-Euclidean data is difficult per se. In this paper, we focus on the subtree kernel, a convolution kernel for tree data introduced by Vishwanathan and Smola in the early 2000s. More precisely, we investigate the influence of the weight function from a theoretical perspective and in real data applications. We establish on a two-class stochastic model that the performance of the subtree kernel is improved when the weight of leaves vanishes, which motivates the definition of a new weight function, learned from the data and not fixed by the user as is usually done. To this end, we define a unified framework for computing the subtree kernel from ordered or unordered trees that is particularly suitable for tuning parameters. We show through two real data classification problems the great efficiency of our approach, in particular with respect to those considered in the literature, which also underlines the importance of the weight function. Finally, a visualization tool of the significant features is derived.
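The role of the weight function can be illustrated with a minimal Python sketch of a subtree kernel for ordered trees. It rests on several assumptions that are not stated in the abstract: trees are nested tuples of children, a subtree is a node together with all its descendants, and the kernel is taken in the product form, i.e. the sum over common subtrees of weight × (count in the first tree) × (count in the second tree). The names `subtree_counts`, `height`, and `subtree_kernel` are hypothetical, and the explicit enumeration below is only an illustration, not the authors' framework.

```python
from collections import Counter


def subtree_counts(tree):
    """Count every complete subtree of an ordered tree (nested tuples)."""
    counts = Counter()

    def visit(node):
        counts[node] += 1
        for child in node:
            visit(child)

    visit(tree)
    return counts


def height(node):
    """Height of a subtree; a leaf (empty tuple) has height 0."""
    return 0 if not node else 1 + max(height(child) for child in node)


def subtree_kernel(t1, t2, weight):
    """Assumed product form: sum of weight(s) * N_s(t1) * N_s(t2)."""
    c1, c2 = subtree_counts(t1), subtree_counts(t2)
    return sum(weight(s) * c1[s] * c2[s] for s in c1.keys() & c2.keys())


t1 = (((), ()), ())
t2 = (((), ()), ((), ()))

# Uniform weight: leaves dominate the value of the kernel.
print(subtree_kernel(t1, t2, lambda s: 1))  # 14

# Vanishing weight on leaves: only the shared non-leaf subtree contributes.
print(subtree_kernel(t1, t2, lambda s: 0 if height(s) == 0 else 1))  # 2
```

In this toy example, 12 of the 14 units of the uniform-weight kernel come from matching leaves, which gives some intuition for why a weight that vanishes on leaves may be more discriminative.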