# Tree–Tree Matrices and Other Combinatorial Problems from Taxonomy

### Full-text

Michiel Hazewinkel, Sep 26, 2014

**ABSTRACT:** This paper is concerned with information retrieval from large databases of scientific literature. The central idea is to define metrics on the information space of terms (key phrases) and the information space of documents. This leads naturally to the idea of a weak enriched thesaurus and the semiautomatic generation of such tools. Quite a large number of unsolved (mathematical) problems turn up in this context. Some of these are described and discussed. They mostly have to do with classification and clustering issues.

Mathematics subject classification 1991: 68P20

Key words & phrases: information space, discrete metric space, Lipshits distance, clustering, single link clustering, information retrieval, data base, local search, neighborhood search, classification schemes, hierarchical schemes, classification trees, key phrases, co-citation analysis, thesaurus, weak thesaurus

Note. The present text is a write-up of a talk presented at the workshop on "Metadata: qualify...
##### Article: A New Cluster Algorithm for Graphs

**ABSTRACT:** A new cluster algorithm for graphs called the Markov Cluster algorithm (MCL algorithm) is introduced. The graphs may be both weighted (with nonnegative weight) and directed. Let $G$ be such a graph. The MCL algorithm simulates flow in $G$ by first identifying $G$ in a canonical way with a Markov graph $G_1$. Flow is then alternatingly expanded and contracted, leading to a row of Markov graphs $G_i$. The expansion step is done by computing higher step transition probabilities (TPs); the contraction step creates a new Markov graph by favouring high TPs and demoting low TPs in a specific way. The heuristic underlying this approach is the expectation that flow between dense regions which are sparsely connected will evaporate. The stable limits of the process are easily derived and in practice the algorithm converges very fast to such a limit, the structure of which has a generic interpretation as an overlapping clustering of the graph $G$. Overlap is limited to cases where the input gr...
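The alternation of expansion and contraction described in this abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the author's implementation: the contraction step is taken to be an entrywise power followed by column rescaling, self-loops are added before normalizing, and the function names (`normalize_columns`, `expand`, `inflate`, `mcl`, `clusters`), the default parameters, and the rule for reading clusters off the limit are all hypothetical choices made for this example.

```python
def normalize_columns(m):
    """Rescale each column of a square matrix (list of rows) to sum to 1."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def expand(m, k=2):
    """Expansion: the k-th matrix power, i.e. k-step transition probabilities."""
    n = len(m)
    result = m
    for _ in range(k - 1):
        result = [[sum(result[i][t] * m[t][j] for t in range(n))
                   for j in range(n)] for i in range(n)]
    return result


def inflate(m, r=2.0):
    """Contraction (assumed form): raise entries to the power r, renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adjacency, k=2, r=2.0, max_iter=100, tol=1e-9):
    """Iterate expansion and contraction until the matrix stops changing."""
    n = len(adjacency)
    # Assumption: add a self-loop to every node so that no column is all zero.
    m = normalize_columns([[adjacency[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        nxt = inflate(expand(m, k), r)
        diff = max(abs(nxt[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = nxt
        if diff < tol:
            break
    return m


def clusters(m, eps=1e-6):
    """Assumed reading of the limit: each nonzero row lists the members of one cluster."""
    n = len(m)
    found = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > eps)
        if members:
            found.add(members)
    return sorted(sorted(c) for c in found)
```

Calling `clusters(mcl(adjacency))` on a nonnegative adjacency matrix given as a list of rows returns a list of node-index lists. Because different nonzero rows can share a column, this reading allows overlapping clusters, in line with the abstract's remark about overlap.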

**ABSTRACT:** A discrete stochastic uncoupling process for finite spaces is introduced, called the Markov Cluster Process. The process takes a stochastic matrix as input, and then alternates flow expansion and flow inflation, each step defining a stochastic matrix in terms of the previous one. Flow expansion corresponds with taking the $k$th power of a stochastic matrix, where $k \in \mathbb{N}$. Flow inflation corresponds with a parametrized operator $\Gamma_r$, $r \geq 0$, which maps the set of (column) stochastic matrices onto itself. The image $\Gamma_r M$ is obtained by raising each entry in $M$ to the $r$th power and rescaling each column to have sum 1 again. In practice the process converges very fast towards a limit which is idempotent under both matrix multiplication and inflation, with quadratic convergence around the limit points. The limit is in general extremely sparse and the number of components of its associated graph may be larger than the number associated with the input matrix. This uncoupli...
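The verbal definition of the inflation operator above can be written out entrywise. The index names $p$, $q$, $i$ and the matrix size $n$ are notation assumed here for illustration; the abstract itself only describes the operation in words.

```latex
% Inflation of an n-by-n column stochastic matrix M with parameter r:
(\Gamma_r M)_{pq} \;=\; \frac{(M_{pq})^{r}}{\sum_{i=1}^{n} (M_{iq})^{r}},
\qquad p, q = 1, \dots, n.

% Worked example for a single column with r = 2:
\begin{pmatrix} 3/4 \\ 1/4 \end{pmatrix}
\;\xrightarrow{\ \text{square entries}\ }\;
\begin{pmatrix} 9/16 \\ 1/16 \end{pmatrix}
\;\xrightarrow{\ \text{divide by } 10/16\ }\;
\begin{pmatrix} 9/10 \\ 1/10 \end{pmatrix}.
```

In the worked column the larger entry grows from 3/4 to 9/10 while the smaller one shrinks from 1/4 to 1/10, which illustrates how inflation with $r > 1$ sharpens the contrast between strong and weak transition probabilities within a column.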