Mathematical knowledge management is needed
ABSTRACT In this lecture I discuss some aspects of MKM, Mathematical Knowledge Management, with particular emphasis on information storage and information retrieval.
original version: 12 February 2004
(Keynote speech at the November 2003 MKM meeting in Edinburgh)
MSCS: 68P20, 68T30
Key words and key phrases: mathematical knowledge, knowledge
management, IRIS, information retrieval, information storage, keyphrase
assignment, identification cloud, formal mathematics, classification,
1. Is there a problem?
The issue at hand is that of information storage and information retrieval as regards mathematics.
Or, concentrating on the latter, if one has a mathematical question, can one find out what is
known about it (in reasonable time)? Many say yes; what with friends (= knowledgeable
colleagues), two quite good databases of abstracts (complete with sophisticated search engines),
and, lately, the web with a good many search engines and full text search, of course we can. I am
of a different opinion and in this lecture I will try to indicate why I think that:
“We don’t even know how much we know that we don’t know we know”.
Mostly the reasons why I think this come from my experiences as editor of the
Encyclopaedia of Mathematics and, consequently, from trying to find out what is known in areas
where one is not a (super)specialist (meaning a highly specialized scientist).
Of course this is not unique to mathematics. As far as sheer quantity of published material
goes, physics, chemistry, biology, medicine, ... are far larger. Think of a factor 10 or so. On the
other hand, results in mathematics have a long life. As an example, results from around 1850,
that were all but forgotten, can be very relevant for questions to be solved now (as I can testify
from personal experience; the results in question were some of Kronecker concerning cyclotomic
2. The size of the problem.
To start with let me give you some descriptive numbers of the size of this matter, the sheer
volume of paper (data) involved.
The total number of pages published in mathematics so far makes up a stack about 60 km
high. This excludes college level textbooks such as those voluminous texts on calculus and
Michiel Hazewinkel2MKM is needed
analytic geometry beloved by certain publishers. Or, to put it another way, 60 km of shelf space.
So, taking back-to-back shelves, seven high, separated by the absolute minimum of 80 cm, one
would need a room of 100 m by 120 m, about the size of two football fields.
How much redundancy there is in all that is anybody’s guess. At present there are not even
preliminary ideas of how to estimate it.
Here are some more indicators of size and growth.
The universally used and excellent classification scheme MSCS2000 (Mathematics Subject
Classification Scheme, version of the year 2000) 1 is a tree with some 5500 leaves. That is a lot of
leaves. Still, most of these are large enough to drown a whole (super)specialism in. Here are some
examples.
14L05: Formal groups, p-divisible groups. The 570 page monograph  contains maybe
half of what is necessary to give an adequate treatment of the subject.
16W30: Coalgebras, bialgebras, Hopf algebras. The time is long past when the subject of
Hopf algebras could be dealt with reasonably in one monograph; 2500 pages is my estimate.
20D08: Sporadic groups. Besides a small number of infinite series there are just 26
so-called sporadic simple groups. Describing them can be done in a few hundred pages; proving
that these are all would take well over 5000 pages. Indeed just dealing with the largest of them,
the Fischer-Griess monster, can be the subject of a full length monograph (and in fact such a
monograph is being written).
90C27: Combinatorial optimization. Recently, in 2003, there appeared a three-volume
monograph on this subject totalling 1882 pages. According to the author this represents just
those parts of the subject that were interesting to him (private communication).2
John Ewing in a recent article in the Mathematical Intelligencer, complaining about the
excessive3 (his words) profits made by commercial publishers, noted that there were some 25000
mathematical articles published in 2001.
In connection with the Encyclopaedia of Mathematics, I once calculated, in several different
ways, that to give an adequate description of the more established parts of mathematics one
would need between 120 000 and 200 000 (controlled standardized) key phrases. The first 10
volumes of loc. cit. have about 30 000 (not counting inversions and linguistic variations). So to
do a good job a four- to sixfold increase is needed. Work on that is in progress.
In the 1970s Raoul Bott of Harvard had a graduate student he did not really know what to do
with. So he set him to counting how many new theorems appeared each year (using Math
1 To see the scheme go to www.emis.de/ZMATH and click on the second item under the heading
‘services’ in the left side bar.
2 These considerations suggest that all of mathematics can be dealt with in some 11 000 monographs, thus
reducing the stack of 60 km to something like 550 m, assuming that enough competent people can be induced to do the
job. There are only some 50 000 mathematicians worldwide (of whom far fewer than 50% are active). This gives
maybe a first idea of how much redundancy is involved. However, the idea that one can cover all of mathematics in
some 100 volumes, as the publisher Springer once claimed in an advertisement for the series EMS, is of course utter
3 I tend to disagree. The whole debate sounds to me like the squeaking of a mouse that has seen a particle
of cheese slide away while next door a large rat is gnawing contentedly on a big chunk of meat. Many seem to be
complaining about the price of scientific literature. At the very least the debate should include the price and the
vastly larger profits involved in manufacturing the instruments of scientific research and practice in physics,
astronomy, chemistry, biology, and, especially, medicine.
Reviews). The number was 200 000. These are what their respective authors called theorems; not
lemmas, propositions, scholia, constructions, definitions, ... .
There is a pleasing thing about these various numbers, coming from very different ways of
looking at the field of mathematics. They all, within an order of magnitude, fit with one another.
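The claim that these figures fit together can be checked with a few lines of arithmetic. The following is only a rough sketch in Python: the input numbers are the ones quoted in the text and its footnotes, while the variable names and the derived ratios are illustrative assumptions. In particular, the theorem count dates from the 1970s and the article count from 2001, so their quotient is indicative at best.

```python
# Figures quoted in the text (all of them rough estimates).
MSC_LEAVES = 5_500                      # finest classification numbers in MSCS2000
MONOGRAPHS_FOR_ALL_MATH = 11_000        # footnote 2
REDUCED_STACK_M = 550                   # footnote 2, height in metres
KEYPHRASES_NEEDED = (120_000, 200_000)  # lower and upper estimate
KEYPHRASES_AVAILABLE = 30_000           # first 10 volumes of the encyclopaedia
ARTICLES_PER_YEAR = 25_000              # Ewing's figure for 2001
THEOREMS_PER_YEAR = 200_000             # count made by Bott's student, 1970s

# Derived ratios (illustrative names, not taken from the text).
monographs_per_leaf = MONOGRAPHS_FOR_ALL_MATH / MSC_LEAVES
cm_per_monograph = 100 * REDUCED_STACK_M / MONOGRAPHS_FOR_ALL_MATH
keyphrase_increase = tuple(n / KEYPHRASES_AVAILABLE for n in KEYPHRASES_NEEDED)
theorems_per_article = THEOREMS_PER_YEAR / ARTICLES_PER_YEAR

print(f"monographs per MSC leaf : {monographs_per_leaf:.1f}")
print(f"thickness per monograph : {cm_per_monograph:.1f} cm")
print(f"key phrase increase     : {keyphrase_increase[0]:.1f} to {keyphrase_increase[1]:.1f} fold")
print(f"theorems per article    : {theorems_per_article:.1f}")
```

Under these assumptions one gets two monographs per leaf of the classification tree, about 5 cm of stack per monograph, a key phrase increase of 4 to roughly 6.7, and about 8 theorems per article; none of these ratios is absurd, which is all that "within an order of magnitude" asks for.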
It is a remarkable fact that vast sums, worldwide, are spent on research and very little on making
sure that if and when some known result is needed it can be found again.
This leads to the well known and widespread phenomenon of rediscovery. For instance I
once read a paper from algebraic topology which contained a substantial amount of finite group
theory which was needed; later in the review in Mathematical Reviews the reviewer
complimented the author on the fact that he had not only done important work in algebraic
topology but also in group theory; still later it turned out that the group theory part had been
done long before.
It has been said that for moderately difficult problems it is often less effort to solve them
again than to try to find them in the published literature.4
A famous case (of rediscovery) occurred in the theory of completely integrable dynamical
systems. An important question there is that of commuting differential operators. A lot of work
on that had been done when it was discovered that the question had been settled in 1928 by
Burchnall and Chaundy, see [3, 5]. For another tool in the field of completely integrable systems,
that of what are now called ‘Akhiezer functions’, a similar story is true.5 For a while, indeed, it
was something of a cult thing in integrable system theory to look for and find old forgotten
results that were important for the theory.
3. How old is the problem?
Actually quite old. Burton wrote in his ‘Anatomy of melancholy’, first edition published in
1621, when discussing ways of combating that very ‘melancholy’ (Part 2, Sect. 2, Memb.4; p.
455 in ):
“What vast Tomes are extant in Law, Physick, & Divinity, for profit, pleasure, practice,
speculation, in verse or prose, &c.! Their names alone are subjects of whole Volumes, we have
thousands of Authors of all sorts, many great Libraries full well furnished, like so many dishes of
meat, served out for several palates; & he is a very block that is affected with none of them.”
and, ibid p. 460:
“By this art you may contemplate the variation of the 23 letters, which may be so infinitely
varied, that the words contemplated and deduced thence will not be contained within the
compass of the firmament; ...”
or, quoting a much more recent author:
“It suffices that a book be possible for it to exist.”
J L Borges, The library of Babel
However, reading these old accounts, one does not get the feeling that the abundance of learning
then extant was regarded as a problem. Rather it was regarded with warm feelings, like a treasure
house affectionately viewed with the comfortable sentiment that there will always be enough
Perhaps that was the case because at the time, Burton’s time, scientists mainly relied on
human memory and human information processing. The human brain is a magnificent
4 And, most appropriately, I cannot at the moment find where the algebraic topology example came from.
5 But in this case the results go still further back to the first years of the 20th century.
information processing device (operating in very different ways from computers). Just how good
it is (can be) appears to be largely unknown. Estimates of the memory capability vary wildly
from 10^20 bits (von Neumann) to 10^9 bits (Th K Landauer, 1986). Estimates based on the
number of synapses range from 10^13 to 10^15 bits of memory. Clearly, given the astonishing
feats of some people with an eidetic (photographic) memory the potential is very high.
Personally I am inclined to be very optimistic about the (potential) information processing
capabilities of humans and to think that it was/is a mistake to trust too much to computers (and
before that to card indexing systems).
4. Why MKM?
MKM stands for ‘Mathematical Knowledge Management’. Whence the theme of this
conference. Also the inspiration. Whether it is explicitly realized/acknowledged or not, many
national and international initiatives have to do with MKM, or ?KM. As distinguished from
M(KM), which is the mathematics of knowledge management, a parsing which we owe to Bruno
Buchberger, and a subject which has hardly been touched so far, but see [11, 12] and the
references cited there, and, perhaps, when in a generous mood, the business of citation analysis.
There is much more to MKM than just mathematical information storage and retrieval.
These are by no means the only issues. Others are formalized mathematics, proof checking and
transparent proofs, the quality of proofs, indexes and thesauri, mathematical databases, ... . Some
of these will briefly be discussed below.
5. Tools 1. The MSCS 2000 (Mathematics Subject Classification Scheme, version of 2000).
The field of mathematics is lucky in that it has a very good classification scheme. It is a tree of
four levels (counting the root) consisting of the root (all of mathematics), specialisms within
mathematics (such as 54: General topology), subspecialisms (such as 35L: Partial differential
equations of hyperbolic type) and subsubspecialisms (such as 20C30: Representations of finite
symmetric groups). There are about 5500 of these finest classification numbers (leaves).
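The four-level structure can be read directly off a classification number. As a minimal sketch in Python, assuming the simplified code shapes used in the examples above (two digits, then optionally a capital letter, then optionally two more digits), the function below returns the path from specialism down to the given code. The name `msc_path` is hypothetical, and special forms of the real scheme such as `00-01` or `35-XX` are deliberately not handled.

```python
import re

# Simplified shape of a classification number: "54", "35L" or "20C30".
# Assumption: special codes of the real scheme (e.g. "00-01") are out of scope.
MSC_CODE = re.compile(r"(\d{2})(?:([A-Z])(\d{2})?)?")


def msc_path(code: str) -> list[str]:
    """Return the chain specialism -> subspecialism -> subsubspecialism for a code.

    The root of the tree (all of mathematics) is implicit and not listed.
    """
    match = MSC_CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a classification number of the assumed shape: {code!r}")
    area, letter, leaf = match.groups()
    path = [area]
    if letter is not None:
        path.append(area + letter)
    if leaf is not None:
        path.append(area + letter + leaf)
    return path


for example in ("54", "35L", "20C30"):
    print(example, "->", " / ".join(msc_path(example)))
```

For `20C30` this yields the chain 20, 20C, 20C30, i.e. one node on each of the three levels below the root.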
Perhaps inevitably the thing is organized as a tree in imitation of evolutionary trees. Certainly
mathematics and other fields of knowledge evolve, but they do not evolve (like nature) in ever
finer subdivisions and more specialized species. Other things happen too. Like two
subsubspecialisms merging, or two quite far apart subsubspecialisms turning out to be quite
closely related, or one subspecialism turning out to be a special case of some subsubspecialism
elsewhere. None of these things happen in nature’s evolutionary tree.
A more accurate picture of how the various parts of mathematics interrelate etc. is
probably a directed graph, and with modern computer technology and computer graphics there is
no reason not to display things like that.
Still trees6 are traditional for classification schemes and thesauri and an awful lot of ‘see
also’ links do help a good deal.
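The difference between a tree and a tree with ‘see also’ links is easy to make concrete. The sketch below, in Python, builds the tree edges for four classification numbers and then adds three cross-links of the form ‘is a special kind of’, taken from the relations between formal groups, quantum groups and Hopf algebras described in this section. The function names and the choice of exactly these links are illustrative assumptions, not part of the MSCS itself.

```python
from collections import defaultdict


def tree_edges(codes):
    """Parent -> child edges of the classification tree above the given leaf codes."""
    edges = set()
    for code in codes:
        # Path: root, specialism (2 chars), subspecialism (3 chars), leaf (5 chars).
        path = ["ROOT", code[:2], code[:3], code]
        edges.update(zip(path, path[1:]))
    return edges


def nodes_with_several_parents(edges):
    """Nodes with more than one incoming edge; a tree has none."""
    parents = defaultdict(set)
    for source, target in edges:
        parents[target].add(source)
    return sorted(node for node, sources in parents.items() if len(sources) > 1)


LEAVES = ["14L05", "16W30", "17B37", "20G42"]

# Illustrative 'see also' links: each source is a special kind of Hopf algebra (16W30).
SEE_ALSO = {("14L05", "16W30"), ("17B37", "16W30"), ("20G42", "16W30")}

tree = tree_edges(LEAVES)
graph = tree | SEE_ALSO

print("tree  :", len(tree), "edges, several parents:", nodes_with_several_parents(tree))
print("graph :", len(graph), "edges, several parents:", nodes_with_several_parents(graph))
```

In the pure tree every node has exactly one parent; as soon as the cross-links are added, 16W30 is reached from four directions, and the structure is a directed graph that no longer fits on a tree-shaped page.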
To illustrate that mathematics is not at all like a tree, here is the part of the MSCS that is
relevant for the concept of Hopf algebras.
6 Personally I think trees are an inheritance from the printing age: they are just about the most complicated
directed graphs that can be conveniently printed.
Here are some words on what the diagram above means.
* The exact phrase “Hopf algebras” occurs twice in the MSCS: 16W30 and 57T05.
Basically a Hopf algebra is an algebraic structure; more precisely it is an algebra with additional
structure. This gives the classification 16W30 (= coalgebras, bialgebras, Hopf algebras and
modules on which they act). The reason that there is a classification 57T05: Hopf algebras, is
that historically the first examples of Hopf algebras came in the form of the cohomology or
homology of suitable spaces and manifolds.
* 05 stands for the subfield ‘Combinatorics’. A good many combinatorial identities arise
by looking at representations of Hopf algebras in two ways (often universal enveloping algebras,
a special kind of Hopf algebras), whence lots of applications of Hopf algebras in subsubfield
05A19 (= Combinatorial identities).
Many combinatorial objects naturally form the basis of a Hopf algebra; for instance planar
binary trees. Whence e.g. the role of Hopf algebras in 05C (= Graph theory).
The symmetric functions and several generalizations are best looked at as forming Hopf
algebras. Hence the role of Hopf algebras in subfield 05E (Algebraic combinatorics).
* Many important Hopf algebras are commutative. So one would expect a subfield 13???
(13 = Commutative rings and algebras), for commutative algebras with extra structure and a
place for Hopf algebras there. But there isn’t.
* 14L05 is ‘Formal groups, p-divisible groups’, a subsubfield of 14 (= Algebraic
geometry). Formal groups are a special kind of Hopf algebras which in turn have applications in
number theory (11M, 11R, 11S) and field theory (12F). Another kind of Hopf algebras is formed by
(the coordinate rings of) algebraic groups (14L, 20G, 20H).
* 17B stands for ‘Lie algebras’, and 17B10 for the algebraic approach to representations of
Lie algebras. A representation of a Lie algebra is the same thing as a representation of its
universal enveloping algebra, which is a special kind of Hopf algebra.
17B37 is ‘Quantum groups, quantized enveloping algebras and related deformations’. This
kind of deformation is also a special kind of Hopf algebra.
* 20C is ‘Representations of groups’. A representation of a group is the same thing as a
representation of its group ring which is another special kind of Hopf algebra.
* 20G is the subfield of ‘linear algebraic groups’ and 20G42 is the subfield of ‘quantum
groups’ which are special kinds of Hopf algebras (deformations of the coordinate rings of the
* 46 is the specialism ‘Functional analysis’ with subfield 46L: ‘Selfadjoint operator
algebras’. There are well over 300 published papers dealing with Hopf algebras of this kind,
mostly in subsubfields 46L85 (Noncommutative topology) and 46L87 (Noncommutative geometry).
There are also some in 46M (Methods of category theory in functional analysis).