# Tree-tree matrices and other combinatorial problems from taxonomy

**ABSTRACT** Let A be a bipartite graph between two sets D and T. Then A defines, by Hamming distance, metrics on both T and D. The question is studied which pairs of metric spaces can arise this way. If both spaces are trivial, the matrix A comes from a Hadamard matrix or is a BIBD. The second question studied is in what ways A can be used to transfer (classification) information from one of the two sets to the other. These problems find their origin in mathematical taxonomy.

Mathematics subject classification 1991: 05B20, 05B25, 05C05, 54E35, 62H30, 68T10

Key words & phrases: bipartite graph, Hamming distance, tree metric space, tree, mathematical taxonomy, design, BIBD, generalized projective space, Hausdorff distance, Urysohn distance, Lipschitz distance, cocitation analysis, clustering, ultrametric, single link clustering, linked design, balanced design


Europ. J. Combinatorics (1996) 17, 191–208

Tree–Tree Matrices and Other Combinatorial Problems from Taxonomy

MICHIEL HAZEWINKEL

Let A be a bipartite graph between two sets D and T. Then A defines, via Hamming distance, metrics on both T and D. The question is studied which pairs of metric spaces can arise this way. If both spaces are trivial, the matrix A comes from a Hadamard matrix or is a BIBD. The second question studied is how A can be used to transfer (classification) information from one of the two sets to the other. These problems find their origin in mathematical taxonomy.

© 1996 Academic Press Limited

1. INTRODUCTION

A great deal of the literature in mathematical taxonomy focuses on clustering, i.e. summarizing the information present in a metric or dissimilarity on a set X by means of a classification tree or something similar.

Here, we focus directly on the situation that one finds in the taxonomic problems of scientific disciplines. Often, the data are in the form of a collection of documents and a collection of key words and key phrases that is supposed to be sufficiently rich to describe (up to a point) the scientific field in question. Here, I am not concerned with how such a control list or thesaurus is generated.

The data are thus in the form of a bipartite graph A (or, equivalently, a relation) between two sets, a set D (of documents) and a set T (of terms). The bipartite graph A tells us which terms occur in which documents.

These data can be used to define a metric space structure on both T and D by means of Hamming distance: the distance between two terms is the number of documents in which one term occurs and the other does not. A first question that arises is what pairs of discrete metric spaces can arise this way. For trivial metric space structures on both T and D it turns out that A must be very regular (a Hadamard matrix, a Hadamard matrix minus one row or column, or a symmetric BIBD). Section 2 below is devoted to some results in this direction.

It arises frequently in practice that on one of the spaces T or D there is available metric information coming from other sources. For instance, in the case of a body of scientific literature, co-citation analysis can be used to define 'research clusters' or 'research fronts' of strongly linked clusters of documents. The question then arises how to transfer such information from one of the sets, in this case D, to the other by means of the bipartite graph between them. This matter is discussed in Section 3.

Finally, in Section 4 some recent ideas and results concerning metrics on the space of all metrics on a given finite set are summarized. These things are fundamental for addressing the question of finding, for instance, the best approximative ultrametric to a given metric or dissimilarity.

0195-6698/96/020191 + 18 $18.00/0


2. THE TREE–TREE PROBLEM

2.1. Definition of the problem. As indicated above, we shall take as the basic available data a bipartite graph A between terms and documents. Or, equivalently, A is a 0–1 matrix with the set of terms as column indices and the set of documents as row indices. A 1 at spot (i, j) means that term j occurs in document i. These data define two metric spaces as follows:
(i) The column space of A, cs(A) = T(A). As a set, this is the set of terms. The distance between two terms t, t′ is the Hamming distance between the corresponding columns, i.e. the number of row indices with different entries at spots t and t′.
(ii) The row space rs(A) = D(A) of A. As a set, this is the set of documents. The distance between two documents d, d′ is the Hamming distance between the corresponding rows, i.e. the number of column indices with different entries in rows d and d′.

This leads immediately to a number of natural basic questions, such as:
(i) Which metric spaces can arise as a T(A) or a D(A)?
(ii) To what extent is A determined by D(A) and T(A)?
(iii) Which pairs of metric spaces D, T can arise from a 0–1 matrix A?

In this paper I concentrate on the last question. Trees and classification schemes (which are special kinds of trees) are ubiquitous in mathematical taxonomy. Thus it is important and natural to start with the question when both the column and row spaces of a 0–1 matrix are trees or related to trees.
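The two Hamming metrics just defined are easy to compute directly. The sketch below is illustrative only; the helper names and the small 3 × 4 example matrix are not from the paper:

```python
# Computing the column space T(A) and row space D(A) metrics of a 0-1 matrix
# by Hamming distance, as in Section 2.1.

def hamming(u, v):
    """Number of positions where the 0-1 vectors u and v differ."""
    return sum(x != y for x, y in zip(u, v))

def row_metric(A):
    """Pairwise Hamming distances between the rows of A (the space D(A))."""
    return {(i, j): hamming(A[i], A[j])
            for i in range(len(A)) for j in range(len(A))}

def col_metric(A):
    """Pairwise Hamming distances between the columns of A (the space T(A))."""
    cols = list(zip(*A))
    return {(i, j): hamming(cols[i], cols[j])
            for i in range(len(cols)) for j in range(len(cols))}

# An arbitrary 3x4 illustration matrix (documents x terms).
A = [[1, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 1, 1]]
```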

2.1.1. DEFINITIONS. A tree is an unoriented connected graph such that there is a unique path between any two given vertices. A leaf of a tree is a vertex with just one edge incident with it. An edge weighted tree is a tree with each edge labelled with a real number ≥ 0. An example is shown in Figure 1. The distance between two vertices of an edge weighted tree is the sum of the weights of the edges occurring in the unique path between those vertices. This defines a metric on the set of vertices (and on any subset, particularly the set of leaves). A rooted tree is a tree with a special, selected vertex called the root. A hierarchical tree is a rooted edge weighted tree such that each leaf has the same distance to the root.

FIGURE 1

Figure 1 is not a hierarchical tree but Figures 2 and 3 are. In these figures and those below, an unlabelled edge is supposed to have weight 1. A hierarchical tree defines an ultrametric on its set of leaves; and, inversely [6, 11], every finite ultrametric space arises that way. By inserting, if necessary, extra vertices of valency two (as was done in Figure 3), each ultrametric space arises as the space of leaves of some 'hierarchically organized' tree like the one in Figure 3, in which, for each vertex, all the edges pointing towards the leaves have the same weight.

FIGURE 2    FIGURE 3

It is rather easy to see that each edge weighted tree with integer weights can be realized as a T(A) (or a D(A)). Things are rather different if both T(A) and D(A) are required to be trees or tree-like (definition below). This appears to be quite difficult to realize. In particular, it seems difficult to realize a pair of spaces that are not (nearly) isomorphic. This is, roughly, what I like to call the tree–tree problem. To make the problem more precise, let us make the following definition.

2.1.2. DEFINITION. A finite metric space (X, m) is tree-like if it is isometric to a subspace of the vertex metric space defined by an edge weighted tree.

2.1.3. TREE–TREE PROBLEM. Which pairs of tree-like spaces can be realized by a 0–1 matrix?

I view these 0–1 matrices as some sort of generalized hierarchical block designs. The reason for that is Theorem 2.2.5 below.

Related to the tree–tree problem is the problem of finding a good characterization of those matrices for which both the column metric space and the row metric space are tree-like.

Of course, tree-like metric spaces are characterized by the so-called four-point condition.

2.1.4. FOUR-POINT CONDITION. A finite metric space (X, m) is tree-like iff, for all not necessarily distinct four points a_1, a_2, b_1, b_2 ∈ X,

m(a_1, a_2) + m(b_1, b_2) ≤ max{m(a_1, b_1) + m(a_2, b_2), m(a_1, b_2) + m(a_2, b_1)}.    (1)

This gives a necessary and sufficient condition for a 0–1 matrix A to yield a pair of tree-like spaces—but certainly a very inelegant and unsatisfying one.
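The four-point condition (1) can be checked by brute force on small spaces. The sketch below (helper names and test metrics assumed for illustration) confirms that a path metric passes it while a 4-cycle metric fails:

```python
# Brute-force check of the four-point condition (1) on a finite metric
# given as a dict m[(x, y)] of pairwise distances.
from itertools import product

def is_tree_like(points, m):
    """True iff m satisfies the four-point condition of 2.1.4."""
    for a1, a2, b1, b2 in product(points, repeat=4):
        lhs = m[(a1, a2)] + m[(b1, b2)]
        rhs = max(m[(a1, b1)] + m[(a2, b2)], m[(a1, b2)] + m[(a2, b1)])
        if lhs > rhs:
            return False
    return True

def path_metric(n):
    """Shortest-path distances on a path with n vertices (a tree metric)."""
    return {(i, j): abs(i - j) for i in range(n) for j in range(n)}

def cycle_metric(n):
    """Shortest-path distances on an n-cycle (not a tree metric for n >= 4)."""
    return {(i, j): min(abs(i - j), n - abs(i - j))
            for i in range(n) for j in range(n)}
```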

2.1.5. ULTRAMETRIC TREE–TREE PROBLEM. Which pairs of ultrametric spaces can be realized by a 0–1 matrix?

2.1.6. COMPLETE TREE–TREE PROBLEM. Which (complete) pairs of edge weighted trees can be realized by a 0–1 matrix?

2.2. Trivial tree–trivial tree matrices and BIBDs. Let us start with some very simple examples in which the column and row metric spaces are 'trivial' in the sense of the definition below.

2.2.1. DEFINITION. A trivial discrete metric space (X, m) is a metric space such that there is a positive number a such that

m(x, y) = a  for all x ≠ y in X

(and of course m(x, x) = 0 for all x ∈ X).

2.2.2. EXAMPLE: HADAMARD MATRICES. A Hadamard matrix is an n × n matrix H with entries 1, −1 such that

HH^T = nI_n.

It follows that also H^T H = nI_n (and that n is even, n = 2k). It is immediate from these two properties that for each two rows there are precisely k entries that are equal and k entries that are unequal—and similarly for the columns. Let A be the matrix obtained from H by replacing each −1 with 0. Then both the column and the row space of A are the trivial metric space of n = 2k points with distance k.
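As an illustrative sketch of Example 2.2.2 (the Sylvester doubling used to build the Hadamard matrix is an assumed convenience, not part of the paper's argument), one can verify the trivial row and column spaces for n = 4:

```python
# Example 2.2.2 for n = 4: a Hadamard matrix, turned into a 0-1 matrix by
# replacing -1 with 0, yields trivial row and column spaces with distance k = 2.

def sylvester(H):
    """Double a Hadamard matrix: H -> [[H, H], [H, -H]]."""
    top = [row + row for row in H]
    bottom = [row + [-x for x in row] for row in H]
    return top + bottom

H4 = sylvester(sylvester([[1]]))  # a 4x4 Hadamard matrix

A = [[1 if x == 1 else 0 for x in row] for row in H4]

def hamming(u, v):
    return sum(x != y for x, y in zip(u, v))

row_dists = {hamming(A[i], A[j]) for i in range(4) for j in range(4) if i != j}
cols = list(zip(*A))
col_dists = {hamming(cols[i], cols[j]) for i in range(4) for j in range(4) if i != j}
```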

2.2.3. EXAMPLE: HADAMARD MATRICES WITH ONE ROW OR COLUMN DELETED. Now let H be a Hadamard matrix for which one row or column consists entirely of +1's or entirely of −1's. Delete that row or column. Again replace −1 with 0 everywhere. The result is a 0–1 matrix with trivial column and trivial row space of sizes n and n − 1 and distance n/2.

Not every Hadamard matrix has such a column or row. However, if D is diagonal with each diagonal element equal to 1 or −1, and if H is a Hadamard matrix, then so are HD and DH. So it is easy to modify a Hadamard matrix so as to obtain one with such a column or row.

2.2.4. EXAMPLE: SYMMETRIC BIBDs. A balanced incomplete block design (BIBD) is a zero–one matrix A such that each row has the same number, r, of 1's, each column has the same number, s, of 1's, and further, for each pair of column indices i ≠ j there are precisely λ rows which have a 1 at both locations i and j. This last condition is the same as saying that each two different columns have λ common 1's.

A BIBD is symmetric if A is square. It then follows that r = s and that each two distinct rows also have λ common 1's (see, e.g., [3]).

It follows immediately that the row space and the column space of a symmetric BIBD are trivial metric spaces with n points and distance 2(r − λ).
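A quick numerical sketch of Example 2.2.4, using the Fano plane (n = 7, r = 3, λ = 1) as the symmetric BIBD; the line set {i, i+1, i+3} mod 7 is one standard realization, assumed here. All pairwise row and column distances should equal 2(r − λ) = 4:

```python
# The Fano plane as a symmetric BIBD: all row and column Hamming
# distances equal 2(r - lambda) = 2(3 - 1) = 4.

n = 7
A = [[1 if j in {i % 7, (i + 1) % 7, (i + 3) % 7} else 0 for j in range(n)]
     for i in range(n)]

def hamming(u, v):
    return sum(x != y for x, y in zip(u, v))

row_dists = {hamming(A[i], A[j]) for i in range(n) for j in range(n) if i != j}
cols = list(zip(*A))
col_dists = {hamming(cols[i], cols[j]) for i in range(n) for j in range(n) if i != j}
```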

2.2.5. THEOREM. Let A be an m × n zero–one matrix such that both the column space and the row space are trivial. Then A is one of the Examples 2.2.2–2.2.4; i.e. A 'is' a Hadamard matrix, a Hadamard matrix with one constant row or column deleted, or it is a symmetric BIBD.

Let B be the matrix obtained from A by replacing each 0 with −1. Then the trivial column and row space condition on A translates for B into the statement that the rows of B form a system of m length-n vectors, all of which make the same angle with one another, and the columns form a system of n vectors of length m that also all make the same angle with one another.

2.2.6. PROOF OF THEOREM 2.2.5. Let B be the m × n matrix obtained from A by replacing each 0 with −1. Let d be the distance between each two distinct rows of B (or A) and e the distance between each two distinct columns. Then

BB^T = \begin{pmatrix} n & p & \cdots & p \\ p & n & \cdots & p \\ \vdots & \vdots & \ddots & \vdots \\ p & p & \cdots & n \end{pmatrix}, \qquad p = n - 2d,    (2)

B^T B = \begin{pmatrix} m & q & \cdots & q \\ q & m & \cdots & q \\ \vdots & \vdots & \ddots & \vdots \\ q & q & \cdots & m \end{pmatrix}, \qquad q = m - 2e.    (3)

Interchanging rows and columns if necessary, we can assume that m ≥ n. By the lemma below, the m × m matrix BB^T is non-singular except when p = n or n = −(m − 1)p. The first case cannot happen because d ≠ 0. The second case can happen. Then, because m ≥ n, n = m − 1 and p = −1. Now, add one column of 1's (or −1's) to B to obtain an m × m matrix B′. It follows that B′ is a Hadamard matrix. Therefore, in this case, we are dealing with an instance of Example 2.2.3.

Continuing, we can assume that BB^T is non-singular, and hence (since then B has rank m, so m ≤ n) that

n = m.    (4)

Let c_1, c_2, ..., c_n be the column sums of B, and let r_1, r_2, ..., r_n be the row sums of B. Multiply (2) with B on the right, to obtain

BB^T B = (n - p)B + p \begin{pmatrix} c_1 & \cdots & c_n \\ \vdots & & \vdots \\ c_1 & \cdots & c_n \end{pmatrix},    (5)

and, using n = m, multiply (3) on the left with B, to obtain

BB^T B = (n - q)B + q \begin{pmatrix} r_1 & \cdots & r_1 \\ \vdots & & \vdots \\ r_n & \cdots & r_n \end{pmatrix}.    (6)

Subtracting (6) from (5), we see that the matrix (q − p)B is equal to a matrix of rank ≤ 2. If n = m ≥ 3 this is only possible if p = q, and hence e = d, because B is invertible. Now, there are two cases:
(i) Case 1: p = q = 0. Then B is a Hadamard matrix by (2).
(ii) Case 2: p = q ≠ 0. Then it follows from (5) and (6) that

c_1 = ··· = c_n = r_1 = ··· = r_n,

so that A is a symmetric BIBD with r = (n + c_1)/2 entries 1 in each column and row, and λ = (n + c_1 − d)/2.


This proves the theorem for n, m ≥ 3; it is trivial to deal with the remaining cases. □

2.2.7. LEMMA. The determinant of the m × m matrix in (2) is equal to

\det \begin{pmatrix} n & p & \cdots & p \\ p & n & \cdots & p \\ \vdots & \vdots & \ddots & \vdots \\ p & p & \cdots & n \end{pmatrix} = (n - p)^{m-1}(n + (m - 1)p).

PROOF. The proof is straightforward. □
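The determinant formula of Lemma 2.2.7 is easy to spot-check numerically; the sketch below (cofactor expansion, illustrative parameter choices) is not from the paper:

```python
# Spot-check of Lemma 2.2.7: det of the m x m matrix with n on the diagonal
# and p elsewhere equals (n - p)**(m-1) * (n + (m-1)*p).

def det(M):
    """Determinant by cofactor expansion (fine for the small m used here)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j in range(len(M)):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def pattern_matrix(m, n, p):
    """m x m matrix with n on the diagonal and p off the diagonal."""
    return [[n if i == j else p for j in range(m)] for i in range(m)]

for m, n, p in [(3, 5, 2), (4, 7, -1), (5, 6, 6)]:
    assert det(pattern_matrix(m, n, p)) == (n - p) ** (m - 1) * (n + (m - 1) * p)
```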

Using similar but more complicated arguments, one can show that if A is an m × n zero–one matrix such that each two distinct rows have exactly λ ones in common and each two distinct columns have exactly μ ones in common, then A is a symmetric BIBD. Interpreting the column indices of A as points and the row indices of A as lines, this gives the following [8].

2.2.8. THEOREM. Let X be a finite set (of points), with a system of subsets called lines. Let there be n points and m lines. Suppose that lines distinguish points (i.e. no two distinct points have the same set of lines through them) and points distinguish lines, and that:
(i) each two distinct lines meet in λ points; and
(ii) through each pair of distinct points there pass μ lines.
Then n = m and λ = μ, each line has r points and through each point there pass r lines (where r(r − 1) = λ(n − 1)).

This is a special case of a more general result of Röhmel [16]; see also [3, p. 102ff.].

2.3. More examples. Using the various symmetric BIBDs as main building blocks, a variety of examples of tree–tree matrices can be constructed. Here is a small selection. In the illustrations below (and above), the black nodes in a tree make up the tree-like space that is being realized.

2.3.1. EXAMPLE.

A = \begin{pmatrix} 1&1&1&1&0&0 \\ 1&1&1&0&1&0 \\ 1&1&1&0&0&1 \\ 1&0&0&1&1&1 \\ 0&1&0&1&1&1 \\ 0&0&1&1&1&1 \end{pmatrix}, \qquad cs(A) ≅ rs(A) ≅ the space of Figure 4.

FIGURE 4


2.3.2. EXAMPLE.

A = \begin{pmatrix} 1&1&1&1&1&1&1 \\ 1&1&1&1&1&0&0 \\ 1&1&1&1&0&1&0 \\ 1&1&1&1&0&0&1 \\ 1&1&0&0&1&1&1 \\ 1&0&1&0&1&1&1 \\ 1&0&0&1&1&1&1 \end{pmatrix}, \qquad cs(A) ≅ rs(A) ≅ the space of Figure 5.

2.3.3. EXAMPLE. Let A′ be the matrix obtained from that of Example 2.3.2 by deleting the top row. Then the row space of A′ is equal to the space of Figure 4, while the column space is that of Figure 5.

2.3.4. EXAMPLE. Let E_n denote the n × n matrix with every entry equal to 1, let I_n denote the n × n unit matrix, and let 0 denote whatever size matrix of zeros is appropriate. Then:

A = \begin{pmatrix} 0 & 0 & 0 \\ 0 & E_3 & I_3 \\ 0 & I_3 & I_3 \end{pmatrix}, \qquad cs(A) ≅ rs(A) ≅ the space of Figure 6.

2.3.5. EXAMPLE.

A = \begin{pmatrix} 0 & E_3 & I_3 & I_3 & E_3 \\ E_3 & I_3 & I_3 & E_3 & 0 \end{pmatrix}, \qquad cs(A) ≅ rs(A) ≅ the space of Figure 7.

2.3.6. EXAMPLE.

A = \begin{pmatrix} 1&0&1&1&0 \\ 1&1&1&0&0 \\ 1&1&0&0&0 \\ 1&0&0&1&0 \end{pmatrix}, \qquad cs(A) ≅ the space of Figure 8, \quad rs(A) ≅ the space of Figure 9.

FIGURE 5    FIGURE 6    FIGURE 7    FIGURE 8    FIGURE 9


2.3.7. REMARK. It is not possible to realize the tree-like space depicted in Figure 10 with a 4 × 4 matrix. Here, as always, unlabelled edges have weight 1.

FIGURE 10

2.3.8. EXAMPLE.

A = \begin{pmatrix} 1&0&0&1&1&0 \\ 1&0&1&1&0&0 \\ 1&1&1&0&0&0 \\ 1&1&0&0&0&0 \\ 1&0&0&0&1&0 \end{pmatrix}, \qquad cs(A) ≅ the space of Figure 11, \quad rs(A) ≅ the space of Figure 12.

2.3.9. EXAMPLE.

A = \begin{pmatrix} I_4 & E_4 & 0 & 0 \\ E_4 & I_4 & 0 & 0 \\ 0 & 0 & I_3 & E_3 \\ 0 & 0 & E_3 & I_3 \end{pmatrix}, \qquad cs(A) ≅ rs(A) ≅ the space of Figure 13.

2.3.10. REMARK. Call a rooted tree for which the number of edges towards any of its leaves is equal to a, a tree of a levels. Using similar techniques as in the proof of Theorem 2.2.5, there is a great deal that one can say about the zero–one matrices that produce tree-like spaces of level ≤ 2 for their row and column spaces. I intend to return to this in a future paper.

2.4. Tree-like spaces of unbounded height. There is a systematic iterative construction that yields trees and tree-like spaces of any number of levels.

2.4.1. THE ZERO CONSTRUCTION. Let A be a zero–one matrix of size m × n, and suppose that:
(i) all columns have distance ≤ d_c to one another;
(ii) all rows have distance ≤ d_r to one another;
(iii) the rows of A all have precisely w_r ones;
(iv) the columns of A all have precisely w_c ones;
(v) 2w_r > d_r, n > w_r > 0, and 2w_c > d_c, 0 < w_c < m; and
(vi) the row space of A and the column space of A are both tree-like.

Now consider the k × k block matrices

A^0_k = \begin{pmatrix} A & 0 & \cdots & 0 \\ 0 & A & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A \end{pmatrix}.    (7)

FIGURE 11    FIGURE 12    FIGURE 13


FIGURE 14 (the row space of A^0_k: k copies of Tr(A) attached by edges of weight w_r − d_r/2)    FIGURE 15 (the column space of A^0_k: k copies of Tc(A) attached by edges of weight w_c − d_c/2)

Then, if Tr(A) denotes the row tree-like space of A, and Tc(A) is the column tree-like space, the row space and column space of A^0_k look like Figures 14 and 15. Note that

d_r(A^0_k) = 2w_r,   d_c(A^0_k) = 2w_c,   w_r(A^0_k) = w_r,   w_c(A^0_k) = w_c.    (8)

As a rule, if A is just an arbitrary 0–1 matrix with tree-like column and row spaces, this construction gives a 0–1 matrix for which neither the row space nor the column space is tree-like.
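The zero construction is easy to carry out mechanically. The sketch below (helper names assumed; E_3 − I_3 chosen as a convenient starting matrix with w_r = 2 and all row distances 2) verifies the row quantities in (8) for k = 3:

```python
# The zero construction 2.4.1: block-diagonal matrix A0_k with copies of A.

def block_diag(A, k):
    """k x k block matrix with A on the diagonal and zero blocks elsewhere."""
    m, n = len(A), len(A[0])
    return [[A[r % m][c % n] if r // m == c // n else 0
             for c in range(k * n)] for r in range(k * m)]

def hamming(u, v):
    return sum(x != y for x, y in zip(u, v))

A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # E3 - I3: w_r = 2, d_r = 2
A0 = block_diag(A, 3)

# Per (8): the maximal row distance becomes 2 * w_r = 4, row weights stay w_r = 2.
max_row_dist = max(hamming(A0[i], A0[j])
                   for i in range(9) for j in range(9) if i != j)
row_weights = {sum(row) for row in A0}
```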

2.4.2. THE ONE CONSTRUCTION. A very similar construction can be carried out with ones instead of zeros in (7). Let A be as before in Section 2.4.1, except that the conditions (v) are replaced by:
(v′) 2(n − w_r) > d_r, n > w_r > 0, and 2(m − w_c) > d_c, 0 < w_c < m.

In this case, consider the k × k block matrices

A^1_k = \begin{pmatrix} A & E & \cdots & E \\ E & A & \cdots & E \\ \vdots & \vdots & \ddots & \vdots \\ E & E & \cdots & A \end{pmatrix},    (9)

where E is the m × n matrix consisting completely of ones. Then the row space and column space of A^1_k look like Figures 14 and 15, except that w_r − d_r/2 and w_c − d_c/2 are replaced by n − w_r − d_r/2 and m − w_c − d_c/2, respectively. Furthermore,

d_r(A^1_k) = 2(n − w_r),   d_c(A^1_k) = 2(m − w_c),   w_r(A^1_k) = (k − 1)n + w_r,   w_c(A^1_k) = (k − 1)m + w_c.    (10)
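The one construction can be verified the same way. This sketch (helper names assumed; E_3 − I_3 again as starting matrix, so m = n = 3 and w_r = 2) checks the row quantities in (10) for k = 3, namely d_r(A^1_k) = 2(n − w_r) = 2 and w_r(A^1_k) = (k − 1)n + w_r = 8:

```python
# The one construction 2.4.2: off-diagonal blocks consist entirely of ones.

def block_ones(A, k):
    """k x k block matrix with A on the diagonal and all-ones blocks elsewhere."""
    m, n = len(A), len(A[0])
    return [[A[r % m][c % n] if r // m == c // n else 1
             for c in range(k * n)] for r in range(k * m)]

def hamming(u, v):
    return sum(x != y for x, y in zip(u, v))

A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
A1 = block_ones(A, 3)

# Distances between rows in different blocks, and the new row weights.
cross_dists = {hamming(A1[i], A1[j])
               for i in range(9) for j in range(9) if i // 3 != j // 3}
row_weights = {sum(row) for row in A1}
```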

2.4.3. ITERATING THE CONSTRUCTIONS. It is now easy to check that if A satisfies the conditions for the zero construction, then A^0_k satisfies the conditions for the one construction, and that if A satisfies the conditions for the one construction, then A^1_k satisfies the conditions for the zero construction.

Indeed, A^0_k is a km × kn matrix (k ≥ 2). So,

2(kn − w_r(A^0_k)) = 2kn − 2w_r > 2w_r = d_r(A^0_k),

because k ≥ 2 and n > w_r. Also, 0 < w_r(A^0_k) = w_r < kn. The column conditions are checked similarly, and it follows that the conditions for the one construction are satisfied for A^0_k.

Analogously, A^1_k is also a km × kn matrix, and

2w_r(A^1_k) = 2(k − 1)n + 2w_r > 2(n − w_r) = d_r(A^1_k),

because k ≥ 2 and n > w_r. Also, 0 < (k − 1)n + w_r = w_r(A^1_k) < kn. The column conditions are checked similarly, and it follows indeed that A^1_k satisfies the conditions for the zero construction.

Thus, provided that a starting A can be found, the two constructions can be applied alternately to yield tree-like spaces with an arbitrary number of levels.

There are many possible starting matrices: e.g. the unit matrix of size 3 or more


satisfies the conditions for the one construction; the matrix E_n − I_n, n ≥ 3, satisfies the conditions for the zero construction; and the incidence matrix M of the projective space P^2(F_2), i.e.

M = \begin{pmatrix} 1&0&1&1&0&0&0 \\ 1&1&0&0&1&0&0 \\ 0&1&1&0&0&1&0 \\ 0&1&0&1&0&0&1 \\ 0&0&1&0&1&0&1 \\ 1&0&0&0&0&1&1 \\ 0&0&0&1&1&1&0 \end{pmatrix},

satisfies the conditions for both the zero construction and the one construction.
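That M qualifies for both constructions amounts to the inequalities 2w_r > d_r and 2(n − w_r) > d_r (and their column analogues), which a short sketch can confirm; the line set {i, i+1, i+3} mod 7 is one standard realization of the Fano incidence matrix, assumed here:

```python
# The Fano incidence matrix has constant row weight w_r = 3 and all pairwise
# row distances d_r = 4, so 2*w_r = 6 > 4 (zero construction) and
# 2*(n - w_r) = 8 > 4 (one construction) both hold.

n = 7
M = [[1 if j in {i % 7, (i + 1) % 7, (i + 3) % 7} else 0 for j in range(n)]
     for i in range(n)]

def hamming(u, v):
    return sum(x != y for x, y in zip(u, v))

w_r = {sum(row) for row in M}
d_r = max(hamming(M[i], M[j]) for i in range(n) for j in range(n) if i != j)

zero_ok = all(2 * w > d_r for w in w_r)
one_ok = all(2 * (n - w) > d_r for w in w_r)
```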

2.5. Complete trees. To conclude this selection of examples, here are some in which both the row and column space are not just tree-like (i.e. isometric to a subspace of the vertex space of an edge labelled tree) but isometric to the full vertex space of an edge labelled tree.

Let T_k be the following k × k matrix, with 1's above the diagonal and 0's elsewhere:

T_k = \begin{pmatrix} 0 & 1 & \cdots & 1 \\ \vdots & \ddots & \ddots & \vdots \\ \vdots & & \ddots & 1 \\ 0 & \cdots & \cdots & 0 \end{pmatrix},

and let E denote matrices consisting entirely of 1's of the appropriate sizes. Consider the block zero–one matrix

A = \begin{pmatrix} 1 & E & E & \cdots & E \\ E & T_{k_1} & E & \cdots & E \\ E & E & T_{k_2} & \cdots & E \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ E & E & E & \cdots & T_{k_m} \end{pmatrix}.

The column and row spaces of A are both complete trees with just one node of valence > 2, as depicted in Figure 16. They consist of one central node of valence m, from which issue m branches with k_i nodes, i = 1, ..., m. These are the only kind of examples I know of for which both the row and column space are complete trees. Modifying the example a bit, the edges can be given arbitrary positive integer weights.

FIGURE 16
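The key property of the building block is that the rows of the strictly upper triangular matrix T_k (the reading assumed here) lie at pairwise Hamming distance |i − j|, i.e. they form a path, which becomes one complete branch of the star-of-paths tree:

```python
# The rows of the strictly upper triangular all-ones matrix T_k form a path
# in the Hamming metric: d(row_i, row_j) = |i - j|.

def T(k):
    """k x k matrix with 1's strictly above the diagonal."""
    return [[1 if i < j else 0 for j in range(k)] for i in range(k)]

def hamming(u, v):
    return sum(x != y for x, y in zip(u, v))

k = 5
Tk = T(k)
dists = {(i, j): hamming(Tk[i], Tk[j]) for i in range(k) for j in range(k)}
```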

3. TRANSFER OF METRICS

As noted in the Introduction, a bipartite graph connecting terms and documents should also permit the transfer of information about one of the two sets to the other. This section is devoted to aspects of that problem.

3.1. The transfer problem. Loosely stated, the transfer problem is concerned with the following situation. Let ρ ⊂ D × T be a bipartite graph (or, equivalently, a relation) between a set D of documents and a set T of terms. Let there be given a metric on D (resp. T). What is the 'best' corresponding metric on T (resp. D)?

This sort of situation frequently arises in practice. In the case of the taxonomy of a scientific field, for instance, the technique of cocitation analysis (cf. e.g. [5, 20]) gives clustering type information on the set D of documents, and the question arises how to transfer this information optimally to classification information on the set of terms.

3.2. The canonical embedding in function space. To discuss various aspects of the transfer problem, we first need to describe a canonical embedding of a (discrete) metric space into the space of functions on it.

3.2.1. DEFINITION. Let (X, m) be a (discrete) metric space. Let F(X) be the space of all real valued functions on X. Give F(X) the max (or sup) norm metric:

m_F(f, g) = \max_{x \in X} |f(x) - g(x)|.    (11)

The canonical embedding of X into F(X) is given by

ι_X : X → F(X),  x ↦ g_x,  g_x(y) = m(x, y).    (12)

3.2.2. LEMMA. The canonical embedding ι_X is an isometry.

The proof of this lemma is a straightforward application of the triangle inequality.
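A quick numerical sketch of Lemma 3.2.2 (helper names assumed; the 4-point path space is an arbitrary test case):

```python
# The canonical embedding x -> g_x with g_x(y) = m(x, y) is an isometry
# for the sup-norm metric (11): m_F(g_x, g_y) = m(x, y).

def m_F(f, g, X):
    """Sup-norm distance between two functions given as dicts over X."""
    return max(abs(f[x] - g[x]) for x in X)

X = [0, 1, 2, 3]
m = {(x, y): abs(x - y) for x in X for y in X}  # path metric

g = {x: {y: m[(x, y)] for y in X} for x in X}   # the embedded points

isometry = all(m_F(g[x], g[y], X) == m[(x, y)] for x in X for y in X)
```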

3.3. The Hausdorff metric. Below, the Hausdorff metric is defined only for finite metric spaces. The definitions extend to more general cases; to do this, replace 'max' by 'sup' and 'min' by 'inf'.

3.3.1. DEFINITION. Let (X, m) be a finite metric space, and let A and B be subsets of X. Then the Hausdorff distance between the sets A and B is defined as

m_{Hd}(A, B) = \max\{\max_{a \in A} \min_{b \in B} m(a, b), \ \max_{b \in B} \min_{a \in A} m(a, b)\}.    (13)

It is well known that the Hausdorff metric is a metric on the set of all subsets of X, i.e. it satisfies m_{Hd}(A, B) ≥ 0, m_{Hd}(A, B) = 0 ⟺ A = B, and the triangle inequality m_{Hd}(A, B) ≤ m_{Hd}(A, C) + m_{Hd}(C, B); cf., e.g., [2, 17].

3.3.2. DEFINITION (extension of the canonical embedding). For a subset A of X, define

g_A : X → ℝ,  g_A(x) = \min_{a \in A} m(a, x).    (14)

3.3.3. PROPOSITION. For all subsets A and B of X:

m_F(g_A, g_B) = m_{Hd}(A, B).    (15)

PROOF. Take x ∈ X. Let a_1 ∈ A be such that g_A(x) = m(a_1, x). Let b_1 ∈ B be such that m(a_1, b_1) ≤ m(a_1, b) for all b ∈ B. We have

m_{Hd}(A, B) ≥ \max_{a \in A} \min_{b \in B} m(a, b) ≥ \min_{b \in B} m(a_1, b) = m(a_1, b_1).

Now,

g_B(x) ≤ m(x, b_1) ≤ m(x, a_1) + m(a_1, b_1) ≤ g_A(x) + m_{Hd}(A, B).

Hence

g_B(x) − g_A(x) ≤ m_{Hd}(A, B),

and, similarly, g_A(x) − g_B(x) ≤ m_{Hd}(A, B), showing that

\max_{x \in X} |g_A(x) − g_B(x)| ≤ m_{Hd}(A, B).

On the other hand, switching A and B if necessary, we can assume that

m_{Hd}(A, B) = \max_{b \in B} \min_{a \in A} m(a, b).

Let this maximum be assumed at b_2 ∈ B. Then g_A(b_2) = m_{Hd}(A, B) and g_B(b_2) = 0. Hence also

m_F(g_A, g_B) ≥ |g_A(b_2) − g_B(b_2)| = m_{Hd}(A, B),

and the proposition is proved. □
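Definition 3.3.1 and Proposition 3.3.3 can be checked directly on a small example; this sketch uses an arbitrary path metric and assumed helper names:

```python
# Hausdorff distance (13) and its equality (15) with the sup-norm distance
# between the distance functions g_A(x) = min_{a in A} m(a, x).

def hausdorff(A, B, m):
    return max(max(min(m[(a, b)] for b in B) for a in A),
               max(min(m[(a, b)] for a in A) for b in B))

def g(A, X, m):
    return {x: min(m[(a, x)] for a in A) for x in X}

X = [0, 1, 2, 3, 4]
m = {(x, y): abs(x - y) for x in X for y in X}  # path metric, for illustration

A, B = [0, 1], [3, 4]
gA, gB = g(A, X, m), g(B, X, m)
sup_dist = max(abs(gA[x] - gB[x]) for x in X)
```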

3.3.4. REMARK. In the literature, one also frequently encounters the following different definition of Hausdorff distance:

m′_{Hd}(A, B) = \max_{a \in A} \min_{b \in B} m(a, b) + \max_{b \in B} \min_{a \in A} m(a, b).    (16)

Proposition 3.3.3 fails for this alternative definition. Instead, one has

m′_{Hd}(A, B) = \max_{x} (g_B(x) − g_A(x)) + \max_{x} (g_A(x) − g_B(x)).    (17)

This is proved in practically the same way.

3.4. Five transfer procedures. Now, let us return to the basic situation in which we have a bipartite graph between two sets D and T and we want to transfer a given metric on D to one on T (or vice versa). In this subsection I describe five potential methods for doing this. They have different background philosophies, and which one (if any of these five) is appropriate in a given situation will probably depend on the particular circumstances. All need further investigation.

3.4.1. HAUSDORFF TRANSFER. Given ρ ⊂ D × T, for each t ∈ T let

D_t = {d ∈ D : (d, t) ∈ ρ}.

Now, given a metric m_D on D, a metric m_T = τ(m_D) on T is defined by

m_T(t, t′) = (m_D)_{Hd}(D_t, D_{t′}).

This transfer method has a number of advantages (and looks very natural). For instance, if D is a trivial metric space (no information) then so is the induced metric on T. Another nice aspect is the following.
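The Hausdorff transfer can be sketched in a few lines; the relation encoding, helper names, and the toy data below are assumptions for illustration only:

```python
# Hausdorff transfer 3.4.1: the transferred distance between terms t, t'
# is the Hausdorff distance between the document sets D_t and D_t'.

def transfer(rel, m_D, T):
    """rel: set of (d, t) pairs; m_D: dict metric on D; returns a metric on T."""
    D_t = {t: [d for (d, s) in rel if s == t] for t in T}
    def hausdorff(A, B):
        return max(max(min(m_D[(a, b)] for b in B) for a in A),
                   max(min(m_D[(a, b)] for a in A) for b in B))
    return {(t, u): 0 if t == u else hausdorff(D_t[t], D_t[u])
            for t in T for u in T}

# Hypothetical toy data: 4 documents on a path metric, 2 terms.
D = [0, 1, 2, 3]
m_D = {(x, y): abs(x - y) for x in D for y in D}
rel = {(0, 'a'), (1, 'a'), (2, 'b'), (3, 'b')}
m_T = transfer(rel, m_D, ['a', 'b'])
```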

3.4.2. PROPOSITION. If the metric m on D is an ultrametric, then so is m_T = τ(m) on T.

PROOF. This is an immediate consequence of the lemma below. □

3 . 4 . 3 . L E M M A .

the subsets of X defined by formulas (13) . Then u ? is an ultrametric .

Let ( X , u ) be an ultrametric space . Let u ? be the Hausdorf f metric on

PROOF. By definition

$$u'(A, C) = \max\Big\{ \max_{a \in A} \min_{c \in C} u(a, c),\ \max_{c \in C} \min_{a \in A} u(a, c) \Big\}.$$

We must establish the ultrametric inequality $u'(A, C) \le \max\{u'(A, B), u'(B, C)\}$ for an arbitrary third subset $B$. Interchanging $A$ and $C$ if necessary, we can assume that

$$u'(A, C) = u(a_1, c_1) = \max_a \min_c u(a, c)$$

for a certain $a_1 \in A$ and $c_1 \in C$. Consider the set $\{ u(a_1, b) : b \in B \}$ and let the minimum be assumed at $b_1 \in B$. If $u(a_1, b_1) \ge u(b_1, c_1)$, then

$$u(a_1, c_1) \le \max\{ u(a_1, b_1), u(b_1, c_1) \} = u(a_1, b_1) = \min_b u(a_1, b) \le \max_a \min_b u(a, b) \le u'(A, B)$$

and we are through. It remains to deal with the case

$$u(a_1, b_1) < u(b_1, c_1). \tag{18}$$

Consider the set $\{ u(b, c_1) : b \in B \}$ and let the minimum be assumed at $b_2$. If $u(b_2, c_1) \ge u(a_1, b_2)$, then we have

$$u(a_1, c_1) \le \max\{ u(a_1, b_2), u(b_2, c_1) \} = u(b_2, c_1) = \min_b u(b, c_1) \le \max_c \min_b u(b, c) \le u'(B, C)$$

and we are through. It remains to deal with the case

$$u(b_2, c_1) < u(a_1, b_2). \tag{19}$$


Thus, in total, it remains to deal with the case in which both (18) and (19) hold. By the ultrametric inequality, we then have:

$$u(a_1, c_1) = u(a_1, b_2) > u(b_2, c_1), \qquad u(a_1, c_1) = u(b_1, c_1) > u(a_1, b_1). \tag{20}$$

Now suppose that $u(b_1, c_1) \le u(b_1, c)$ for all $c \in C$. Then

$$u(a_1, c_1) = u(b_1, c_1) = \min_c u(b_1, c) \le \max_b \min_c u(b, c) \le u'(B, C)$$

and we are done. Thus it remains to deal with the case in which there exists a $c_2 \in C$ such that

$$u(b_1, c_2) < u(b_1, c_1). \tag{21}$$

But then, using (21) and (20),

$$u(a_1, c_2) \le \max\{ u(a_1, b_1), u(b_1, c_2) \} < \max\{ u(a_1, c_1), u(b_1, c_1) \} = u(a_1, c_1),$$

contradicting that

$$u(a_1, c_1) = \min_c u(a_1, c).$$

This finishes the proof. $\square$

3.4.4. REMARK. Proposition 3.4.2 fails if the alternative definition (16) is taken for the Hausdorff distance.

3.4.5. ANOTHER DESCRIPTION OF THE HAUSDORFF METRIC OF AN ULTRAMETRIC. Let $\pi = \{ Y_1, \ldots, Y_n \}$ be a partition of $X$. For each subset $J$ of $\{1, 2, \ldots, n\}$, $J \ne \emptyset$, let

$$P_J = \{ A \subset X : A \cap Y_j \ne \emptyset \text{ for all } j \in J \text{ and } A \cap Y_j = \emptyset \text{ for all } j \notin J \}.$$

Then, as is easily checked, the $P_J$ form a partition $\Pi$ of $\mathcal{P}(X)$, the set of subsets of $X$. Now, an ultrametric $u$ on $X$ is given by a series of coarser and coarser partitions

$$\{\text{singletons}\} = \pi_0 \le \pi_1 \le \cdots \le \pi_k = \{X\},$$

with levels $d_0, d_1, \ldots, d_k$ attached to them. Then $u(x, y) = d_l$ if $l$ is the index of the finest partition of these that does not separate $x$ and $y$. Associated to the sequence of partitions above there is the sequence of partitions

$$\{\text{singletons}\} = \Pi_0 \le \Pi_1 \le \cdots \le \Pi_k = \{\mathcal{P}(X)\}.$$

Then the Hausdorff metric on $\mathcal{P}(X)$ is defined by this series of partitions with the same levels as above, i.e. $u'(A, B) = d_l$ if $l$ is the index of the finest partition from the $\Pi_i$ that does not separate $A$ and $B$.
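The agreement of this partition description of 3.4.5 with the max-min formula for the Hausdorff metric can be checked by brute force. Here is a Python sketch on a toy six-point ultrametric space (the data and all names are illustrative): two subsets are not separated by the partition $\Pi_l$ exactly when they meet the same blocks of $\pi_l$.

```python
import itertools

# a toy ultrametric on X = {0,...,5} given by nested partitions
# with levels 1 < 2 < 3 (level 0 for the singletons)
levels = [
    (1, [{0, 1}, {2, 3}, {4, 5}]),
    (2, [{0, 1, 2, 3}, {4, 5}]),
    (3, [{0, 1, 2, 3, 4, 5}]),
]

def u(x, y):
    """u(x, y) = level of the finest partition not separating x and y."""
    if x == y:
        return 0
    for d, partition in levels:
        if any(x in blk and y in blk for blk in partition):
            return d

def hausdorff(A, B):
    """u'(A, B) via the max of directed max-min distances."""
    d_AB = max(min(u(a, b) for b in B) for a in A)
    d_BA = max(min(u(a, b) for a in A) for b in B)
    return max(d_AB, d_BA)

def partition_description(A, B):
    """u'(A, B) via 3.4.5: the level of the finest partition at which
    A and B meet exactly the same blocks."""
    if A == B:
        return 0
    for d, partition in levels:
        support = lambda S: {i for i, blk in enumerate(partition) if S & blk}
        if support(A) == support(B):
            return d

X = range(6)
subsets = [set(c) for n in (1, 2, 3) for c in itertools.combinations(X, n)]
assert all(hausdorff(A, B) == partition_description(A, B)
           for A in subsets for B in subsets)
```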

3.4.6. AVERAGING TRANSFER. The central idea here is that given two terms $t, t'$ it is unknown which of the documents in $D_t$ and $D_{t'}$ really represent $t$ and $t'$. This leads to the idea that the dissimilarity of $t$ and $t'$ should be measured by the average distance of documents in $D_t$ and $D_{t'}$, i.e.

$$\rho(D_t, D_{t'}) = \frac{1}{\#D_t} \frac{1}{\#D_{t'}} \sum_{d \in D_t,\, d' \in D_{t'}} m(d, d').$$


However, this expression does not define a metric. It does suggest, however, considering the averaging transfer. This transfer method attaches to a metric $m$ on $D$ the metric $\tau_{av}(m)$ on $T$ defined by:

$$m_{av}(A, B) = m_F\Big( \frac{1}{\#A} \sum_{d \in A} g_d,\ \frac{1}{\#B} \sum_{d' \in B} g_{d'} \Big), \qquad \tau_{av}(m)(t, t') = m_{av}(D_t, D_{t'}). \tag{22}$$

Another way to think about this is that $m_{av}$ somehow measures the distance between the (non-existing) centres of $D_t$ and $D_{t'}$. (For a subspace of the line, and non-interlacing subsets of it, this is exactly the case.)

This idea is reinforced by the following observation. For a subset $A$ of $X$ with metric $m$, let

$$h_A = \frac{1}{\#A} \sum_{a \in A} g_a.$$

Then, for any $x \in X$,

$$m_F(h_A, g_x) = \frac{1}{\#A} \sum_{a \in A} m(a, x),$$

as is easily proved.

Note that the metric on $T$ comes again, via $\Gamma$, from a metric on the set of all subsets of $D$, as defined by the first part of (22). Observe that, for all $A, B \subset X$,

$$\rho(A, B) \ge m_F(h_A, h_B) = m_{av}(A, B)$$

and it could well be that $m_{av}$ is the largest metric subordinate to the averaging dissimilarity $\rho$.

Easy examples show that there is no particular relation between the Hausdorff distance, $m_{Hd}$, on the set of all subsets $\mathcal{P}(X)$ of a metric space $(X, m)$ and the averaging distance, $m_{av}$, on $\mathcal{P}(X)$.
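As an illustration of the averaging transfer, here is a Python sketch (all names and data are hypothetical) of $m_{av}$ for finite spaces, side by side with the averaging dissimilarity $\rho$; for two non-interlacing subsets of the line, $m_{av}$ indeed returns the distance between their centres:

```python
def averaging_distance(X, m, A, B):
    """m_av(A, B) = m_F(h_A, h_B): the sup-distance between the
    averaged distance functions h_A and h_B (formula (22))."""
    h = lambda S, x: sum(m(s, x) for s in S) / len(S)
    return max(abs(h(A, x) - h(B, x)) for x in X)

def averaging_dissimilarity(m, A, B):
    """rho(A, B): the average pairwise distance between A and B."""
    return sum(m(a, b) for a in A for b in B) / (len(A) * len(B))

# toy example on the line: non-interlacing subsets with centres 1 and 9
X = range(11)
m = lambda x, y: abs(x - y)
A, B = {0, 2}, {8, 10}
print(averaging_distance(X, m, A, B))      # 8.0: the distance of the centres
print(averaging_dissimilarity(m, A, B))    # 8.0 here, but >= m_av in general
```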

3.4.7. TRANSFER VIA WEIGHTS. Let $t, t' \in T$ be terms, $A = D_t$, $A' = D_{t'}$, and $\chi_A, \chi_{A'}$ be the characteristic functions of these subsets. Then the Hamming distance between $t$ and $t'$ is equal to the sum (or integral)

$$\sum_{d \in D} | \chi_A(d) - \chi_{A'}(d) |.$$

In this formula, all $d \in D$ are given equal weight. Now let there be given a metric $m$ on $D$. This can be used to assign a measure of relative importance to the elements of $D$ in which 'central elements' acquire more weight than 'peripheral' ones. For instance, we could proceed as follows:

$$\omega(y) = \frac{S}{\sum_{x \in D} m(x, y)}, \qquad S = \sum_{x, y \in D} m(x, y).$$

Now, for $t, t' \in T$, define

$$\tau_w(m)(t, t') = \sum_d | \chi_A(d) - \chi_{A'}(d) |\, \omega(d).$$
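A minimal Python sketch of the transfer via weights (all names and the toy data are illustrative; the weight $\omega$ is the one suggested above, so that central documents, with small total distance, weigh more):

```python
def weight_transfer(D, m, gamma, T):
    """tau_w: weighted Hamming distance between terms, with each
    document d weighted by omega(d) = S / sum_x m(x, d)."""
    S = sum(m(x, y) for x in D for y in D)
    omega = {d: S / sum(m(x, d) for x in D) for d in D}
    D_t = {t: {d for (d, s) in gamma if s == t} for t in T}
    def dist(t1, t2):
        # the symmetric difference is where chi_A and chi_A' differ
        return sum(omega[d] for d in D_t[t1] ^ D_t[t2])
    return dist

# toy example: five documents on a line, two overlapping terms
D = [0, 1, 2, 3, 4]
m = lambda x, y: abs(x - y)
gamma = {(0, 'a'), (1, 'a'), (2, 'a'), (2, 'b'), (3, 'b'), (4, 'b')}
tau_w = weight_transfer(D, m, gamma, {'a', 'b'})
```

Here the shared central document $2$ drops out of the symmetric difference, so `tau_w('a', 'b')` is the total weight of the peripheral documents $\{0, 1, 3, 4\}$.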

3.4.8. The last two transfer-of-metrics procedures, 3.4.9 and 3.4.10 below, require

- Available from Michiel Hazewinkel · Sep 26, 2014