Tree-tree matrices and other combinatorial problems from taxonomy

European Journal of Combinatorics (Impact Factor: 0.66). 02/1996; 17(2-3):191-208. DOI: 10.1006/eujc.1996.0017
Source: DBLP

ABSTRACT Let A be a bipartite graph between two sets D and T. Then A defines, via Hamming distance, metrics on both T and D. We study which pairs of metric spaces can arise in this way. If both spaces are trivial, the matrix A comes from a Hadamard matrix or is a BIBD. The second question studied is how A can be used to transfer (classification) information from one of the two sets to the other. These problems originate in mathematical taxonomy.

Mathematics subject classification 1991: 05B20, 05B25, 05C05, 54E35, 62H30, 68T10

Key words & phrases: bipartite graph, Hamming distance, tree metric space, tree, mathematical taxonomy, design, BIBD, generalized projective space, Hausdorff distance, Urysohn distance, Lipschitz distance, cocitation analysis, clustering, ultrametric, single link clustering, linked design, balanced design
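The Hamming-distance construction mentioned in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that A is given as a 0/1 incidence matrix whose rows are indexed by D and whose columns are indexed by T; the function names and the example matrix are hypothetical and not taken from the paper. Note that the result is a metric only when the rows (respectively columns) are pairwise distinct; otherwise some distances are zero.

```python
def hamming(u, v):
    """Number of positions in which two equal-length 0/1 vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def row_and_column_distances(A):
    """Pairwise Hamming distances between the rows and between the columns of A.

    Assumption: rows of A correspond to elements of D, columns to elements of T.
    """
    rows = [list(r) for r in A]
    cols = [list(c) for c in zip(*A)]
    dist_rows = [[hamming(u, v) for v in rows] for u in rows]
    dist_cols = [[hamming(u, v) for v in cols] for u in cols]
    return dist_rows, dist_cols


# Hypothetical incidence matrix: 3 elements of D (rows), 4 elements of T (columns).
A = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
]

dist_D, dist_T = row_and_column_distances(A)
for row in dist_D:
    print(row)
for row in dist_T:
    print(row)
```

In this small example every pair of distinct rows, and every pair of distinct columns, is at Hamming distance 2.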

  • ABSTRACT: In [6] a cluster algorithm for graphs was introduced called the Markov cluster algorithm or MCL algorithm. The algorithm is based on simulation of (stochastic) flow in graphs by means of alternation of two operators, expansion and inflation. The results in [8] establish an intrinsic relationship between the corresponding algebraic process (MCL process) and cluster structure in the iterands and the limits of the process. Several kinds of experiments conducted with the MCL algorithm are described here. Test cases with varying homogeneity characteristics are used to establish some of the particular strengths and weaknesses of the algorithm. In general the algorithm performs well, except for graphs which are very homogeneous (such as weakly connected grids) and for which the natural cluster diameter (i.e. the diameter of a subgraph induced by a natural cluster) is large. This can be understood in terms of the flow characteristics of the MCL algorithm and the heuristic on which the...
  • ABSTRACT: A new cluster algorithm for graphs called the Markov Cluster algorithm (MCL algorithm) is introduced. The graphs may be both weighted (with nonnegative weights) and directed. Let G be such a graph. The MCL algorithm simulates flow in G by first identifying G in a canonical way with a Markov graph G_1. Flow is then alternately expanded and contracted, leading to a sequence of Markov graphs G_i. The expansion step is done by computing higher step transition probabilities (TPs); the contraction step creates a new Markov graph by favouring high TPs and demoting low TPs in a specific way. The heuristic underlying this approach is the expectation that flow between dense regions which are sparsely connected will evaporate. The stable limits of the process are easily derived and in practice the algorithm converges very fast to such a limit, the structure of which has a generic interpretation as an overlapping clustering of the graph G. Overlap is limited to cases where the input gr...
  • ABSTRACT: A discrete stochastic uncoupling process for finite spaces is introduced, called the Markov Cluster Process. The process takes a stochastic matrix as input, and then alternates flow expansion and flow inflation, each step defining a stochastic matrix in terms of the previous one. Flow expansion corresponds with taking the kth power of a stochastic matrix, where k ∈ ℕ. Flow inflation corresponds with a parametrized operator Γ_r, r ≥ 0, which maps the set of (column) stochastic matrices onto itself. The image Γ_r M is obtained by raising each entry in M to the rth power and rescaling each column to have sum 1 again. In practice the process converges very fast towards a limit which is idempotent under both matrix multiplication and inflation, with quadratic convergence around the limit points. The limit is in general extremely sparse and the number of components of its associated graph may be larger than the number associated with the input matrix. This uncoupli...
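
The alternation of expansion and inflation described in the abstracts above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the fixed iteration count, the choice k = 2 and r = 2, the added self-loops, and the example graph (two triangles joined by one edge) are all hypothetical choices made here for illustration.

```python
def normalize_columns(M):
    """Rescale every column of a square nonnegative matrix to sum to 1."""
    n = len(M)
    sums = [sum(M[i][j] for i in range(n)) for j in range(n)]
    return [[M[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expand(M, k=2):
    """Flow expansion: the kth matrix power of M (k a positive integer)."""
    result = M
    for _ in range(k - 1):
        result = matmul(result, M)
    return result


def inflate(M, r=2.0):
    """Flow inflation: raise each entry to the rth power, then rescale columns."""
    return normalize_columns([[x ** r for x in row] for row in M])


def mcl_process(M, k=2, r=2.0, iterations=30):
    """Alternate expansion and inflation for a fixed number of rounds.

    Assumption: a fixed iteration count stands in for a convergence test.
    """
    for _ in range(iterations):
        M = inflate(expand(M, k), r)
    return M


# Hypothetical input: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
adjacency = [[0.0] * n for _ in range(n)]
for a, b in edges:
    adjacency[a][b] = adjacency[b][a] = 1.0
for i in range(n):
    adjacency[i][i] = 1.0  # self-loops: an assumption made for this sketch

start = normalize_columns(adjacency)  # column-stochastic input matrix
limit = mcl_process(start)
for row in limit:
    print([round(x, 3) for x in row])
```

Each iterate remains column stochastic, since inflation rescales every column to sum to 1. The rows of the printed matrix that retain nonzero entries can be read as a clustering of the columns.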
