Tree–Tree Matrices and Other Combinatorial Problems from Taxonomy

European Journal of Combinatorics (Impact Factor: 0.65). 02/1996; 17(2-3):191-208. DOI: 10.1006/eujc.1996.0017
Source: DBLP


Let A be a bipartite graph between two sets D and T. Then A defines, by Hamming distance, metrics on both T and D. The first question studied is which pairs of metric spaces can arise in this way. If both spaces are trivial, the matrix A comes from a Hadamard matrix or is a BIBD. The second question studied is in what ways A can be used to transfer (classification) information from one of the two sets to the other. These problems find their origin in mathematical taxonomy.

Mathematics subject classification 1991: 05B20, 05B25, 05C05, 54E35, 62H30, 68T10

Key words & phrases: bipartite graph, Hamming distance, tree metric space, tree, mathematical taxonomy, design, BIBD, generalized projective space, Hausdorff distance, Urysohn distance, Lipshits distance, cocitation analysis, clustering, ultrametric, single link clustering, linked design, balanced design
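The construction in the abstract can be illustrated with a small sketch. It assumes that A is given as a 0/1 incidence matrix with rows indexed by D and columns indexed by T, and that a "trivial" metric space means one in which all pairwise distances are equal; both readings are assumptions, not statements from the paper. The helper names `hamming` and `row_and_column_distances` are hypothetical. The example matrix is the incidence matrix of the Fano plane, a standard BIBD.

```python
from itertools import combinations


def hamming(u, v):
    """Number of positions in which two equal-length 0/1 vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def row_and_column_distances(A):
    """Pairwise Hamming distances between the rows and between the columns of A."""
    rows = [tuple(r) for r in A]
    cols = list(zip(*A))  # transpose: one tuple per column
    d_rows = {(i, j): hamming(rows[i], rows[j])
              for i, j in combinations(range(len(rows)), 2)}
    d_cols = {(i, j): hamming(cols[i], cols[j])
              for i, j in combinations(range(len(cols)), 2)}
    return d_rows, d_cols


# Lines of the Fano plane on the points 0..6 (each pair of points lies on one line).
fano_lines = [{0, 1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 3, 5},
              {1, 4, 6}, {2, 3, 6}, {2, 4, 5}]

# Rows = lines (playing the role of D), columns = points (playing the role of T).
A = [[1 if p in line else 0 for p in range(7)] for line in fano_lines]

d_rows, d_cols = row_and_column_distances(A)

# Under the "all distances equal" reading, both spaces are trivial here.
rows_trivial = len(set(d_rows.values())) == 1
cols_trivial = len(set(d_cols.values())) == 1
```

Two distinct lines share exactly one point and two distinct points share exactly one line, so every pairwise distance on either side is 3 + 3 - 2 = 4.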



Available from: Michiel Hazewinkel, Sep 26, 2014
  • ABSTRACT: This paper is concerned with information retrieval from large scientific data bases of scientific literature. The central idea is to define metrics on the information space of terms (key phrases) and the information space of documents. This leads naturally to the idea of a weak enriched thesaurus and the semiautomatic generation of such tools. Quite a large number of unsolved (mathematical) problems turn up in this context. Some of these are described and discussed. They mostly have to do with classification and clustering issues.
    Mathematics subject classification 1991: 68P20
    Key words & phrases: information space, discrete metric space, Lipshits distance, clustering, single link clustering, information retrieval, data base, local search, neighborhood search, classification schemes, hierarchical schemes, classification trees, key phrases, co-citation analysis, thesaurus, weak thesaurus
    Note. The present text is a write up of a talk presented at the workshop on "Metadata: qualify...
  • ABSTRACT: A new cluster algorithm for graphs called the Markov Cluster algorithm (MCL algorithm) is introduced. The graphs may be both weighted (with nonnegative weight) and directed. Let G be such a graph. The MCL algorithm simulates flow in G by first identifying G in a canonical way with a Markov graph G_1. Flow is then alternatingly expanded and contracted, leading to a row of Markov graphs G_i. The expansion step is done by computing higher step transition probabilities (TPs), the contraction step creates a new Markov graph by favouring high TPs and demoting low TPs in a specific way. The heuristic underlying this approach is the expectation that flow between dense regions which are sparsely connected will evaporate. The stable limits of the process are easily derived and in practice the algorithm converges very fast to such a limit, the structure of which has a generic interpretation as an overlapping clustering of the graph G. Overlap is limited to cases where the input gr...
  • ABSTRACT: In [6] a cluster algorithm for graphs was introduced called the Markov cluster algorithm or MCL algorithm. The algorithm is based on simulation of (stochastic) flow in graphs by means of alternation of two operators, expansion and inflation. The results in [8] establish an intrinsic relationship between the corresponding algebraic process (MCL process) and cluster structure in the iterands and the limits of the process. Several kinds of experiments conducted with the MCL algorithm are described here. Test cases with varying homogeneity characteristics are used to establish some of the particular strengths and weaknesses of the algorithm. In general the algorithm performs well, except for graphs which are very homogeneous (such as weakly connected grids) and for which the natural cluster diameter (i.e. the diameter of a subgraph induced by a natural cluster) is large. This can be understood in terms of the flow characteristics of the MCL algorithm and the heuristic on which the...
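The alternation of expansion and inflation described in the last two abstracts can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstracts only say that high transition probabilities are favoured "in a specific way", and the sketch assumes the commonly described form, an entry-wise power followed by column rescaling. The column-stochastic convention, the added self-loops, the fixed iteration count, and all function names are likewise assumptions made for the example.

```python
def normalize_columns(M):
    """Rescale each column to sum to 1 (column-stochastic convention, assumed)."""
    n = len(M)
    sums = [sum(M[i][j] for i in range(n)) for j in range(n)]
    return [[M[i][j] / sums[j] for j in range(n)] for i in range(n)]


def expand(M):
    """Expansion: two-step transition probabilities, i.e. the matrix square."""
    n = len(M)
    return [[sum(M[i][k] * M[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(M, r):
    """Inflation (assumed form): entry-wise power r, then column rescaling."""
    return normalize_columns([[x ** r for x in row] for row in M])


def mcl(adj, r=2.0, iterations=50):
    """Alternate expansion and inflation for a fixed number of rounds."""
    n = len(adj)
    # Self-loops are added before normalizing; this is an assumption of the sketch.
    M = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    M = normalize_columns(M)
    for _ in range(iterations):
        M = inflate(expand(M), r)
    return M


def clusters(M, tol=1e-6):
    """Read clusters off the rows that still carry flow."""
    n = len(M)
    found = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if M[i][j] > tol)
        if members:
            found.add(members)
    return found


# Two 4-cliques (nodes 0-3 and 4-7) joined by the single edge 3-4.
n = 8
adj = [[0.0] * n for _ in range(n)]
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                adj[i][j] = 1.0
adj[3][4] = adj[4][3] = 1.0

limit = mcl(adj)
result = clusters(limit)
```

Inflation with r > 1 sharpens each column toward its largest entries, which is one way to realise the expectation that flow across sparse connections evaporates; larger r would typically give finer clusterings.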