An Algebraic Topological Method for Feature Identiﬁcation

Erik Carlsson, Gunnar Carlsson∗ and Vin de Silva†

August 12, 2003

Abstract

We develop a mathematical framework for describing local features of a geometric object—

such as the edges of a square or the apex of a cone—in terms of algebraic topological invariants.

The main tool is the construction of a tangent complex for an arbitrary geometrical object,

generalising the usual tangent bundle of a manifold. This framework can be used to develop

algorithms for automatic feature location. We give several examples of applying such algorithms

to geometric objects represented by point-cloud data sets.

1 Introduction

In attempting to recognize geometric objects, it is often very useful to first recognize

identifiable features of the object in question. For example, in correctly identifying a square a

natural ﬁrst step is to locate the corners; this information is enough to determine which

square we are dealing with. Similarly, if the object in question is a convex polyhedron, then

the vertices and edges of the polyhedron are the most important features to identify. In the

case of a cone, one looks for the cone point. It is an interesting problem theoretically and

computationally to construct automatic methods for locating such features.

In order to develop such methods, it is ﬁrst necessary to make mathematical sense of the

notion of “feature”. A reasonable starting point, based on the examples above, is to deﬁne

features as singular points of geometric curves, surfaces, etc. Accordingly, in this paper we

set ourselves the task of developing automatic methods for locating singular points on a

curve, surface, or higher dimensional geometric object.

A desirable feature of such methods is that they should be robust to deformation, to a

certain degree. For example, in optical character recognition, it is important that variously

deformed versions of a given character should be identiﬁed as being equivalent and having

equivalent features. The methods we develop here ought to be able to recognize the apex,

T-junctions and leg-ends of an upper-case letter “A”, even if that letter has been sheared,

or bent, or compressed in the vertical or horizontal direction. Another situation where

robustness is important occurs when an object is viewed in two diﬀerent coordinate systems.

The locus given by $\{(r, \theta) : 1 \le r \le 2,\ 0 \le \theta \le \pi/2\}$ looks like a rectangle in $(r, \theta)$-space,

but it looks like a sector of an annulus when viewed in rectangular coordinates. We develop

methods which detect properties of this locus which are invariant under such coordinate

changes.

A typical method (see [3] or [5]) for dealing with such questions is to develop templates,

equipped with parameters, with the hope that the ﬁgure in question will be very close to a

∗Supported in part by NSF DMS-0101364

†Supported in part by NSF DMS-0101364


template model, for some choice of the parameter values. For example, in the case of the

letter “A” above, one might have a template consisting of a standard letter “A”, together

with two parameters describing vertical and horizontal compression of the letter. This family

of templates may be adequate for a particular class of documents, but it would not be

adequate in documents where a “sheared” letter is permitted. Of course, a new parameter

can be added which describes the shear. To cover an even larger class of documents, perhaps

containing instances of “A” where some of the line segments deﬁning it are in fact curves,

yet more parameters are necessary. Clearly this can become unwieldy quite quickly.

By contrast, our approach uses algebraic topology to locate and identify relevant features

of objects without requiring the choice of templates, or of parametrized families of

deformations. Our method permits us to conclude the existence of a singular point, without

having to match it with any particular model of the particular singularity. For example, a

sharp bend (“corner”) in a curve can be recognized without having to match that region of

the curve to any particular pair of lines locally. The idea is to identify algebraic topological

invariants which can recognize a singular point, and which are by their nature deformation

invariant, rather than trying to match against a larger and larger family of templates.

These invariants automatically distinguish between diﬀerent kinds of singular points. For

example, if our underlying point set is a cube, the set of singular points consists of all the

edges on the cube; this includes the vertices, which are common to multiple edges. However,

we may wish to isolate the vertices directly. This can be done by adjusting a single parameter

in the search. We use a topological invariant referred to as the first Betti number $\beta_1$ in this

case: setting $\beta_1 > 1$ finds all singular points, while setting $\beta_1 > 3$ finds only the

vertices.

A key consideration is in what form the geometric objects are presented. For instance,

if they are presented using ﬁnite systems of algebraic equations and inequalities, then it is

typically feasible to determine the collection of singular points explicitly. In this paper, we

will instead deal with point cloud data, i.e. finite but large sets of points sampled from a

geometric object in Euclidean space. Dealing with spaces presented in this form produces

computational challenges for us, since one must determine how to “estimate” the topological

invariants from a geometric object using only a ﬁnite sample from it.

1.1 Overview of the method

We now give an informal description of our method. An initial observation is that many

singular points are topologically standard. This means that there is a continuous, but not

smooth, change of coordinates which transforms the surface locally into a smooth model.

Since topological invariants are insensitive to such coordinate changes, this means that we

cannot apply topological invariants directly to the spaces in question to detect these features.

We are instead forced to consider constructions on the surface, which are sensitive to the

local smooth structure, and which produce spaces which can be distinguished by topological

methods. In this paper, we will develop an extension of the concept of the tangent bundle

to a smooth submanifold of $\mathbb{R}^n$, which applies to more general subsets. We will refer to

this construction as the tangent complex $T(X)$ of a subset $X \subseteq \mathbb{R}^n$; the tangent complex

is a subset of $X \times \mathbb{R}^n$. It is closely related to the notion of tangent cone used in geometric

measure theory (see [4]).

In many examples, which are topologically standard, T(X) nevertheless produces a space


which is topologically distinct from $T$ applied to a smooth submanifold. This will ultimately

permit us to detect singular points by ﬁnding regions in which the tangent complex is

homotopically non-standard. Here are some examples of how the construction behaves;

we will give a formal deﬁnition in the body of the paper.

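Example 1.1 When $X$ is a smooth submanifold of $\mathbb{R}^n$, $T(X)$ is the usual unit tangent sphere bundle of $X$.

Contrast the following two examples.

Example 1.2 (Straight line.) If $X_1 = \mathbb{R} \times \{0\}$ is the $x$-axis in $\mathbb{R}^2$, then $T(X_1)$ is the union of two components $\mathbb{R} \times \{0\} \times \{e_1\}$ and $\mathbb{R} \times \{0\} \times \{-e_1\}$. Here $e_1$ denotes the standard basis vector $(1,0) \in \mathbb{R}^2$.

Example 1.3 (L-shaped line.) Let $X_2 = \mathbb{R}_+ \times \{0\} \cup \{0\} \times \mathbb{R}_+$, where $\mathbb{R}_+$ denotes the set of nonnegative reals $\{x : x \ge 0\}$. In this case, $T(X_2)$ is a disconnected union of four rays, given by $\mathbb{R}_+ \times \{0\} \times \{e_1\}$, $\mathbb{R}_+ \times \{0\} \times \{-e_1\}$, $\{0\} \times \mathbb{R}_+ \times \{e_2\}$, and $\{0\} \times \mathbb{R}_+ \times \{-e_2\}$.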

The sets in Examples 1.2 and 1.3 are topologically equivalent to the real line, but their tangent complexes fall into two and four connected components respectively. Thus we distinguish $X_1$ and $X_2$ by simple topological invariants of $T(X_1)$ and $T(X_2)$, though the spaces themselves are topologically indistinguishable. In fact for any smooth curve $C \subset \mathbb{R}^n$, the tangent complex $T(C)$ is topologically equivalent to $T(X_1)$. In contrast, for a piecewise smooth curve with $k$ tangent discontinuities, the tangent complex has $2k + 2$ connected components. Example 1.3 simply illustrates the case $k = 1$.

In these simple examples the existence of a corner, or the number of corners, can be

derived from the number of connected components of the tangent complex. This may be

computed exactly or up to some tolerance using a clustering algorithm ([7], pp. 453-480).

In higher-dimensional cases it may not be enough to count connected components, as we

see next.
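As an illustration of the component count for curves, the following is a minimal sketch rather than the implementation used in this paper. It assumes the tangent directions are known exactly along a polyline (the helper `tangent_samples` is hypothetical), embeds the samples $(x, v)$ in $\mathbb{R}^4$, and replaces the clustering algorithm of [7] with plain single-linkage clustering at a hand-picked threshold `eps`.

```python
import math

def count_components(points, eps):
    """Number of single-linkage clusters: points closer than eps are joined."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < eps:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})

def tangent_samples(vertices, n=20):
    """Hypothetical helper: samples (x, y, vx, vy) carrying both unit
    tangent directions at n + 1 points of every segment of a polyline."""
    samples = []
    for (ax, ay), (bx, by) in zip(vertices, vertices[1:]):
        length = math.hypot(bx - ax, by - ay)
        ux, uy = (bx - ax) / length, (by - ay) / length
        for i in range(n + 1):
            t = i / n
            x, y = ax + t * (bx - ax), ay + t * (by - ay)
            samples.append((x, y, ux, uy))
            samples.append((x, y, -ux, -uy))
    return samples

straight = [(0, 0), (1, 0)]                  # no corner, k = 0
l_shape = [(0, 1), (0, 0), (1, 0)]           # one corner, k = 1
zigzag = [(0, 1), (0, 0), (1, 0), (1, -1)]   # two corners, k = 2

component_counts = [count_components(tangent_samples(curve), eps=0.1)
                    for curve in (straight, l_shape, zigzag)]
print(component_counts)  # [2, 4, 6], i.e. 2k + 2 for k = 0, 1, 2
```

The threshold must exceed the sample spacing (here 0.05) and stay well below the distance between distinct tangent directions, which is at least $\sqrt{2}$ for these axis-parallel segments.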

Example 1.4 (One wall.) Let $X_3$ be the set $\{0\} \times \mathbb{R}^2$ in $\mathbb{R}^3$. Then the tangent complex $T(X_3)$ is connected and has the homotopy type of a circle.

Example 1.5 (Two walls meeting at a corner.) Let $X_4$ be the subset of $\mathbb{R}^3$ given by

$$X_4 = X_2 \times \mathbb{R} = \mathbb{R}_+ \times \{0\} \times \mathbb{R} \,\cup\, \{0\} \times \mathbb{R}_+ \times \mathbb{R}$$

In this case the tangent complex is connected, but has the homotopy type of a bouquet of three circles.

Here the presence of singular points in $X_4$ along the subset $\{0\} \times \{0\} \times \mathbb{R}$ can be detected

using one-dimensional homology, which detects loops.

In this paper, we use these ideas as the basis for an algorithm to locate the singular set.

To give an idea of how the algorithm works, we consider the case of a curve in the plane.

The object is to locate any singular points. We suppose that we are dealing with a bounded

part of the curve contained in a square window, as in Figure 1.

The ﬁrst step in the algorithm is to compute the homology of the tangent complex for the

part of the curve contained in the window. If the homology agrees with the standard model


Figure 1: A curve with a singular point

Figure 2: A divide-and-conquer strategy for locating the singular point

of a single smooth curve, then we stop looking for singular points. In this case the tangent

complex has four connected components (as in Example 1.3), which is non-standard.

The next step is to divide the window into four smaller windows and repeat the homology

calculation in each window (Figure 2, left panel). In this case, one of the windows is empty

and two of the windows contain a standard curve, and hence have standard homology. As

indicated by the shading, we discard these three windows and apply the algorithm recursively

on the single remaining non-standard window. Two further iterations of this process are

shown in the last two panels of Figure 2. The result is a nested sequence of windows converging

on the singular point. If there are several singular points, then the process will have several

active branches converging separately to the diﬀerent singular points.
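The recursive window search just described can be sketched as follows. This is an illustrative simplification under stated assumptions, not the algorithm as implemented for the experiments: the tangent complex is supplied as exact samples $(x, y, v_x, v_y)$ of a hypothetical V-shaped test curve, a window is called standard when it is empty or its sample has exactly two single-linkage components (a stand-in for the full homology comparison), and the function names and the threshold `eps` are chosen only for this example.

```python
import math

def count_components(points, eps):
    """Number of single-linkage clusters: points closer than eps are joined."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < eps:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})

def locate_singular_windows(tangent_pts, x0, y0, size, min_size, eps):
    """Return the smallest windows (x0, y0, size) that stay non-standard."""
    inside = [p for p in tangent_pts
              if x0 <= p[0] < x0 + size and y0 <= p[1] < y0 + size]
    # Empty window, or two components as for a single smooth arc: discard.
    if not inside or count_components(inside, eps) == 2:
        return []
    if size <= min_size:
        return [(x0, y0, size)]
    half = size / 2
    found = []
    for dx in (0.0, half):
        for dy in (0.0, half):
            found += locate_singular_windows(inside, x0 + dx, y0 + dy,
                                             half, min_size, eps)
    return found

# Hypothetical test curve: a "V" whose corner sits at (0.3, 0.22).
corner = (0.3, 0.22)
r = 1 / math.sqrt(2)
tangent_pts = []
for k in range(51):
    s = 0.005 * k
    for sign in (1, -1):
        # left branch, with both unit tangent directions
        tangent_pts.append((corner[0] - s, corner[1] + s, -sign * r, sign * r))
        # right branch, with both unit tangent directions
        tangent_pts.append((corner[0] + s, corner[1] + s, sign * r, sign * r))

windows = locate_singular_windows(tangent_pts, 0.0, 0.0, 1.0, 1 / 16, eps=0.02)
print(windows)
```

Intermediate windows that meet both branches without containing the corner are also non-standard, so the search keeps subdividing them; they are discarded at a finer scale once the two branches fall into different windows.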

When implementing this algorithm in practice, we need to take account of the fact that

we are dealing with point cloud data. This presents two challenges:

• How do we recover homology from a space represented as point cloud data?

• How do we reconstruct a discrete tangent complex from point cloud data, when it

depends on limiting information concerning the underlying space?

In this paper we have taken a straightforward approach to the ﬁrst question. Given

a point cloud space we build a simplicial complex approximation called the Rips complex

which depends on a choice of length scale and which has a vertex for every data point

considered. Given a simplicial complex, the homology calculation is straightforward linear

algebra. The Rips complex is simple to implement but not particularly eﬃcient; it suﬃces

for the examples given here. A more sophisticated approach is the synthetic Delaunay

triangulation developed in [1].

We reconstruct the tangent complex by using local principal components analysis at a

small number of base points in the complex to obtain an approximation to the tangent

space at these points; then we sample the unit spheres in these tangent spaces uniformly to


Figure 3: Example spaces with easily-computed homology (panels (a), (b), (c))

obtain a point cloud in $\mathbb{R}^n \times S^{n-1}$. The resulting point cloud space is amenable to the Rips

complex construction, and the homology of the tangent complex can be recovered reliably

given suﬃcient data.
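To make the reconstruction step concrete, here is a minimal sketch for curves in the plane only. It assumes the tangent space is one-dimensional, so that its unit sphere is the pair $\{+v, -v\}$, and it uses the closed-form leading eigenvector of a $2 \times 2$ covariance matrix in place of a general PCA routine. The function names, the neighbourhood `radius`, and the test data are assumptions made for illustration.

```python
import math

def local_tangent_2d(cloud, base, radius):
    """Leading principal direction of the cloud points within radius of base."""
    nbrs = [p for p in cloud if math.dist(p, base) <= radius]
    mx = sum(p[0] for p in nbrs) / len(nbrs)
    my = sum(p[1] for p in nbrs) / len(nbrs)
    cxx = sum((p[0] - mx) ** 2 for p in nbrs)
    cyy = sum((p[1] - my) ** 2 for p in nbrs)
    cxy = sum((p[0] - mx) * (p[1] - my) for p in nbrs)
    # Angle of the leading eigenvector of the symmetric matrix [[cxx, cxy], [cxy, cyy]].
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    return (math.cos(theta), math.sin(theta))

def discrete_tangent_complex(cloud, base_points, radius):
    """Points (x, y, vx, vy) in R^2 x S^1: both unit tangents at each base point."""
    out = []
    for b in base_points:
        vx, vy = local_tangent_2d(cloud, b, radius)
        out.append((b[0], b[1], vx, vy))
        out.append((b[0], b[1], -vx, -vy))
    return out

# Hypothetical data: 200 evenly spaced samples of the unit circle.
cloud = [(math.cos(2 * math.pi * i / 200), math.sin(2 * math.pi * i / 200))
         for i in range(200)]
base_points = cloud[::20]
complex_pts = discrete_tangent_complex(cloud, base_points, radius=0.2)

# On a circle the tangent is perpendicular to the radius, so x . v should vanish.
max_radial = max(abs(x * vx + y * vy) for x, y, vx, vy in complex_pts)
print(len(complex_pts), max_radial)
```

Near a singular point a single principal direction is no longer meaningful; the sketch covers only the smooth case.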

2 Homological Preliminaries

In this section, we will discuss the properties of homology groups we will need. The reader

is encouraged to consult a standard text such as [6] or [8] for a more detailed exposition of

these ideas.

Homology is a technique for assigning, to every topological space $X$ and nonnegative integer $n$, a vector space $H_n(X)$. We will deal exclusively with “mod 2 homology”, in which case these are vector spaces over the finite field $\mathbb{F}_2 = \{0, 1\}$. The dimension of this vector space is referred to as the $n$-th Betti number of $X$ with mod 2 coefficients, and will be written $\beta_n(X)$. In an informal sense, the $n$-th Betti number of $X$ measures the number of $n$-dimensional holes in the space $X$.

Example 2.1 Suppose that $X = S^1$ is the unit circle in the plane. Then $H_1(X) \cong \mathbb{F}_2$, so $\beta_1(X) = 1$. This represents the one dimensional hole “in the middle of the circle”.

Example 2.2 Suppose that $X$ is a bouquet of two circles, as shown in Figure 3(b). In this case, $\beta_1(X) = 2$, representing two distinct one dimensional holes.

Example 2.3 Suppose that $X = S^2$, the unit sphere in 3-space. Then $\beta_2(X) = 1$, measuring the two dimensional hole in the sphere. More generally, we have that $\beta_i(S^n) = 0$ when $i \neq 0, n$ and $\beta_i(S^n) = 1$ for $i = 0, n$.

Example 2.4 Suppose that $X$ consists of $k$ distinct points. Then $\beta_0(X) = k$. In general, $\beta_0$ measures the number of path components of $X$.

The homology groups have the following properties.

• $H_n$ is functorial, i.e. every continuous map $f : X \to Y$ induces a linear transformation $H_n(f) : H_n(X) \to H_n(Y)$ for all $n$.

• $H_n$ is homotopy invariant, i.e. if two maps $f, g : X \to Y$ are homotopic, then the induced linear transformations $H_n(f)$ and $H_n(g)$ are equal. This is an extremely important property of these linear transformations. We say two spaces $X$ and $Y$ are homotopy equivalent if there are maps $f : X \to Y$ and $g : Y \to X$ so that $fg$ is homotopic to $\mathrm{id}_Y$ and $gf$ is homotopic to $\mathrm{id}_X$. The homotopy property for $H_n$ implies that if $X$ and $Y$ are homotopy equivalent, then $H_n(X)$ and $H_n(Y)$ are isomorphic, and in particular $\beta_n(X) = \beta_n(Y)$.


The phrase “can be deformed into” is loosely synonymous with “is homotopy equivalent to”,

and conveys roughly the right idea.

Example 2.5 The circle in Figure 3(a) and the annulus in Figure 3(b) are homotopy equivalent and so have the same Betti numbers.

• When a space is broken up as the union of different pieces, the homology can be computed from the homology of the pieces and all possible overlaps of these pieces, using Mayer–Vietoris techniques ([6], [8]).

When a space is described as a simplicial complex, the computation of homology reduces to straightforward linear algebra over the field $\mathbb{F}_2$. A simplicial complex is a subspace of $\mathbb{R}^n$ expressed as a union of simplices which overlap in faces, i.e. the intersection of any pair of simplices is a face of each of the two simplices. Such a space is determined up to homeomorphism by simple combinatorial data.

Definition 2.6 By an abstract simplicial complex, we will mean a pair $(V, \Sigma)$, where $V$ is a finite set whose objects are referred to as vertices, and where $\Sigma$ is a collection of subsets of $V$, so that if $\tau \in \Sigma$ and $\sigma \subseteq \tau$ is nonempty, then $\sigma \in \Sigma$. The elements of $\Sigma$ are referred to as faces. If a face $\tau \in \Sigma$ consists of exactly $k + 1$ elements of $V$ then we say that $\tau = \{v_0, v_1, \dots, v_k\}$ is a $k$-simplex of $\Sigma$ with vertices $v_0, v_1, \dots, v_k$.

Any simplicial complex $S$ determines an abstract simplicial complex as follows. Let $V$ be the set of vertices of $S$, and let $\Sigma$ consist of those sets of vertices $\tau = \{v_0, v_1, \dots, v_k\}$ which span a simplex in $S$. Conversely, we can recover the topological type of $S$ from the abstract simplicial complex by taking a simplex for each face of $\Sigma$ and gluing these simplices together appropriately.

The homology of a simplicial complex $S$ is computed from the abstract simplicial complex associated to it. The idea is to set up a chain complex, which is a sequence of vector spaces and linear maps between them:

$$0 \xleftarrow{\;\partial_0\;} C_0 \xleftarrow{\;\partial_1\;} C_1 \longleftarrow \cdots \longleftarrow C_{k-1} \xleftarrow{\;\partial_k\;} C_k \longleftarrow \cdots$$

Each $C_k$ is a vector space over the field $\mathbb{F}_2$ with a basis vector $\bar\tau$ for each $k$-simplex $\tau \in \Sigma$. The linear map $\partial_k$ is known as the boundary operator and is defined as follows. First choose an ordering of the vertex set $V$. Writing $\tau = \{v_0, v_1, \dots, v_k\}$ with the vertices listed in increasing order, we define the $j$-th face of $\tau$ to be the $(k-1)$-simplex $\tau_j$ obtained by deleting the vertex $v_j$ from the list. Then $\partial_k$ is the linear map defined by

$$\partial_k \bar\tau = \sum_{j=0}^{k} (-1)^j \bar\tau_j$$

on basis vectors $\bar\tau$, and extended by linearity to all of $C_k$. [Note: the $(-1)^j$ terms shown here are necessary in general, but in our case they happen to be redundant since we are working over $\mathbb{F}_2$.]

If $\sigma \subset \tau$ is a $(k-2)$-simplex, then it is a face of a $(k-1)$-face of $\tau$ in exactly two different ways. Using this observation it can be shown that $\partial_{k-1} \circ \partial_k = 0$ for all $k$. In other words, the boundary of a boundary is always zero. Let $Z_k \subseteq C_k$ denote the null space of the operator $\partial_k$,


Figure 4: Chains and cycles (panels (a), (b), (c))

Figure 5: A boundary cycle (panels (a), (b))

and let $B_k \subseteq C_k$ denote the image of the operator $\partial_{k+1}$. It follows that $B_k \subseteq Z_k$, and we define the $k$-th homology group $H_k$ to be the quotient vector space $Z_k / B_k$. The structure of $H_k$ can therefore be expressed in terms of matrix calculations over the field $\mathbb{F}_2$.
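The matrix calculation can be sketched as follows, assuming the complex is given by a list of generating simplices (vertex tuples) and that chains over $\mathbb{F}_2$ are stored as integer bitmasks. The sketch uses $\beta_k = \dim C_k - \operatorname{rank} \partial_k - \operatorname{rank} \partial_{k+1}$, which follows from $\dim Z_k = \dim C_k - \operatorname{rank} \partial_k$ and $\dim B_k = \operatorname{rank} \partial_{k+1}$. The function names are illustrative and the elimination is not optimised.

```python
from itertools import combinations

def rank_f2(columns):
    """Rank over F2 of vectors encoded as integer bitmasks."""
    pivots = {}  # highest set bit -> reduced vector with that leading bit
    for vec in columns:
        while vec:
            high = vec.bit_length() - 1
            if high not in pivots:
                pivots[high] = vec
                break
            vec ^= pivots[high]
    return len(pivots)

def betti_numbers(generating_simplices, max_dim):
    """Mod 2 Betti numbers beta_0..beta_max_dim of the generated complex."""
    faces = set()
    for s in generating_simplices:
        for r in range(1, len(s) + 1):
            faces.update(combinations(sorted(s), r))
    by_dim = [sorted(f for f in faces if len(f) == d + 1)
              for d in range(max_dim + 2)]
    index = [{f: i for i, f in enumerate(level)} for level in by_dim]

    def boundary_rank(d):
        if d == 0:
            return 0  # the boundary of a vertex is zero
        columns = []
        for s in by_dim[d]:
            col = 0
            for j in range(len(s)):  # delete the j-th vertex; signs vanish mod 2
                col ^= 1 << index[d - 1][s[:j] + s[j + 1:]]
            columns.append(col)
        return rank_f2(columns)

    ranks = [boundary_rank(d) for d in range(max_dim + 2)]
    return [len(by_dim[d]) - ranks[d] - ranks[d + 1] for d in range(max_dim + 1)]

circle = [(0, 1), (1, 2), (0, 2)]                           # hollow triangle
disc = [(0, 1, 2)]                                          # filled triangle
bouquet = [(0, 1), (1, 2), (0, 2), (0, 3), (3, 4), (0, 4)]  # two loops through vertex 0
sphere = list(combinations(range(4), 3))                    # boundary of a tetrahedron

print(betti_numbers(circle, 1))   # [1, 1]
print(betti_numbers(disc, 1))     # [1, 0]
print(betti_numbers(bouquet, 1))  # [1, 2]
print(betti_numbers(sphere, 2))   # [1, 0, 1]
```

These values agree with Examples 2.1 to 2.3.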

2.1 Chains, cycles and boundaries

It may be helpful to give some examples of how the definition $H_k = Z_k / B_k$ works in practice. We introduce the language of chains, cycles and boundaries.

A $k$-chain is an element of the $\mathbb{F}_2$ vector space $C_k$ derived from a simplicial complex $S$. There is a coefficient, 0 or 1, for each $k$-simplex of $S$; thus we can regard a $k$-chain simply as a set of $k$-simplices, by picking out those simplices with coefficient 1. A $k$-cycle is an element of $Z_k$; in other words a $k$-chain whose boundary is zero (empty). Finally a $k$-boundary is a $k$-chain which is the boundary of some $(k+1)$-chain. Every $k$-boundary is automatically a $k$-cycle; this is equivalent to the assertion $\partial \circ \partial = 0$. The homology $H_k$ is defined to be the space of $k$-cycles modulo all the uninteresting $k$-cycles that can be created cheaply by taking the boundary of some $(k+1)$-chain.
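As a small worked check of the assertion $\partial \circ \partial = 0$, consider a single 2-simplex $\tau = \{v_0, v_1, v_2\}$, with the signs kept for clarity. This is an illustrative computation using the boundary formula of Section 2:

$$\partial_2 \bar\tau = \overline{\{v_1, v_2\}} - \overline{\{v_0, v_2\}} + \overline{\{v_0, v_1\}}$$

$$\partial_1 \partial_2 \bar\tau = (\bar v_2 - \bar v_1) - (\bar v_2 - \bar v_0) + (\bar v_1 - \bar v_0) = 0$$

Over $\mathbb{F}_2$ the same cancellation reads: each vertex occurs in exactly two of the three edges, so every coefficient is $1 + 1 = 0$. The three edges therefore form a 1-cycle, and it is a 1-boundary precisely when the 2-simplex $\tau$ belongs to the complex.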

In Figure 4(a), a typical 1-chain is shown highlighted in red. It is not a 1-cycle, since

its boundary 0-chain is nonempty (Figure 4(b)). Figure 4(c) shows a 1-cycle. This is not

the boundary of any 2-chain, so it corresponds to a genuine non-zero element of $H_1$. On

the other hand, the cycle in Figure 5(a) is the boundary of the 2-chain shown in Figure 5(b),

and so it is zero in homology.

3 Point Cloud Data

We have seen that homology is readily computable for spaces which are equipped with a

triangulation, i.e. a homeomorphism to a simplicial complex. The geometric objects we

will deal with will rarely come equipped with such a structure. In fact, we will be trying

to recover topological information about a geometric object from point cloud data obtained

from the space, by which we mean a ﬁnite set of points sampled from the object. In order


to make calculations, this means that we must somehow construct a simplicial complex from

the point cloud data, which we believe approximates the space in question.

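The idea is as follows. Let $X$ be a topological space, and suppose we have a finite covering $\mathcal{U} = \{U_\alpha\}_{\alpha \in A}$ of $X$ indexed by a set $A$.

Definition 3.1 The Cech complex of $\mathcal{U}$, $C(\mathcal{U})$, is the simplicial complex whose vertex set is $A$, and where a subset $\{\alpha_0, \alpha_1, \dots, \alpha_k\}$ is a simplex if and only if

$$U_{\alpha_0} \cap U_{\alpha_1} \cap \dots \cap U_{\alpha_k} \neq \emptyset$$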

It is frequently the case that the Cech complex of the covering $\mathcal{U}$ is homotopy equivalent to $X$, and therefore has homology isomorphic to that of $X$. For example, if all sets of the form

$$U_{\alpha_0} \cap U_{\alpha_1} \cap \dots \cap U_{\alpha_k}$$

are either empty or contractible, then $C(\mathcal{U})$ is homotopy equivalent to $X$. For any Riemannian manifold $M$, there is an $\epsilon$ so that if $\{x_1, \dots, x_N\}$ has the property that the balls $B_\epsilon(x_i)$ cover $M$, then the Cech complex of the covering $\{B_\epsilon(x_1), \dots, B_\epsilon(x_N)\}$ is homotopy equivalent to $M$.

If $S$ is a finite subset of a metric space, we write $C_\epsilon(S)$ to mean $C(\mathcal{B}_\epsilon)$, where $\mathcal{B}_\epsilon$ is the collection of metric balls $\{B_\epsilon(s) : s \in S\}$. In the case of Euclidean data there is the following approximation theorem.

Theorem 3.2 If $S \subset \mathbb{R}^n$ is a finite set of points in Euclidean space, then $C_\epsilon(S)$ is homotopy equivalent to the space:

$$S_\epsilon = \bigcup_{s \in S} B_\epsilon(s)$$

When $S$ is sampled from a space $X \subset \mathbb{R}^n$, it may well be the case that the union of balls $S_\epsilon$ covers and is homotopy equivalent to $X$. If so, then this theorem implies that $C_\epsilon(S)$ has the same homology as $X$.

There is a second complex we can construct to approximate the homotopy type of a space

which is equipped with a metric.

Definition 3.3 Suppose that $X$ is a metric space, with metric $d$. For any finite subset $S$ of $X$, and any $\epsilon > 0$, we define the $\epsilon$-Rips complex of the subset $S$ to be the abstract simplicial complex whose vertex set is $S$, and where a subset $\{s_0, s_1, \dots, s_k\}$ is a simplex if and only if $d(s_i, s_j) \le \epsilon$ for all $i, j$ so that $0 \le i, j \le k$. We write $R_\epsilon(S)$ for this complex.

Suppose again that $X$ is a metric space, and that $S$ is a finite subset so that

$$\bigcup_{s \in S} B_\epsilon(s) = X$$

We have an evident inclusion $C_{\epsilon/2}(S) \to R_\epsilon(S)$: the vertex sets of the two complexes are the same, and it follows from the triangle inequality that if $B_{\epsilon/2}(s_1) \cap B_{\epsilon/2}(s_2) \neq \emptyset$, then $d(s_1, s_2) \le \epsilon$. If we are dealing with points in $\mathbb{R}^n$, there is also an inclusion $R_\epsilon(S) \to C_\epsilon(S)$, as one can readily check. This comparability suggests that both complexes can be useful in approximating homotopy types.
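For the reader's convenience, the two inclusions can be checked as follows. This is a sketch that assumes the balls $B_r(s)$ are closed and that the Cech complexes are formed from balls centred at the points of $S$. If $z \in B_{\epsilon/2}(s_i)$ for every $i$, then for all $i, j$

$$d(s_i, s_j) \le d(s_i, z) + d(z, s_j) \le \tfrac{\epsilon}{2} + \tfrac{\epsilon}{2} = \epsilon,$$

so every simplex of $C_{\epsilon/2}(S)$ is a simplex of $R_\epsilon(S)$. Conversely, if $d(s_i, s_j) \le \epsilon$ for all $i, j$, then

$$s_0 \in \bigcap_{i=0}^{k} B_\epsilon(s_i),$$

so the intersection is nonempty and the simplex lies in $C_\epsilon(S)$. Together these give $C_{\epsilon/2}(S) \subseteq R_\epsilon(S) \subseteq C_\epsilon(S)$.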


Figure 6: ‘Holes’ due to uneven sampling lead to incorrect homology (panels (a), (b)).

Remark. The two complexes have different useful properties. The Cech complex is theoretically amenable in that there are results (such as Theorem 3.2 above) which establish that under certain conditions the homotopy type of the Cech complex of a covering of $X$ is the same as that of $X$. However the Cech complex is computationally more involved, since one needs to determine for every collection of metric balls whether they have a common intersection. This is a slightly awkward calculation even in Euclidean space. The Rips complex, on the other hand, does not have such good theoretical properties, but is computationally more convenient, since one only needs to identify the 1-simplices (edges), which then determine the rest of the complex.
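The last point of the remark can be made concrete with a brute-force sketch. The edges are found from pairwise distances, and a larger vertex set is declared a simplex exactly when all of its pairs are edges. The function name and the `max_dim` cut-off are choices made for this illustration; the enumeration of all subsets is exponential and only suitable for very small inputs.

```python
import math
from itertools import combinations

def rips_complex(points, eps, max_dim=2):
    """Simplices (index tuples, grouped by dimension) of the eps-Rips complex."""
    n = len(points)
    close = {(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= eps}
    simplices = {0: [(i,) for i in range(n)], 1: sorted(close)}
    for d in range(2, max_dim + 1):
        # A set of d + 1 vertices spans a simplex when every pair is an edge.
        simplices[d] = [c for c in combinations(range(n), d + 1)
                        if all(pair in close for pair in combinations(c, 2))]
    return simplices

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

small = rips_complex(square, eps=1.1, max_dim=3)  # sides only: a hollow square
large = rips_complex(square, eps=1.5, max_dim=3)  # diagonals included: a full tetrahedron

print({d: len(s) for d, s in small.items()})  # {0: 4, 1: 4, 2: 0, 3: 0}
print({d: len(s) for d, s in large.items()})  # {0: 4, 1: 6, 2: 4, 3: 1}
```

At the smaller scale the complex is a circle; at the larger scale the loop has been filled in, which is the scale dependence discussed in the next subsection.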

3.1 Uneven sampling, and persistent homology

In spite of the theorems alluded to above, in practice it is unusual for the Cech complex to

exactly recover the homotopy type of the underlying space X. The usual problem is that

our sampling from the geometric object may not be adequate.

To see how this happens, consider Figure 6. Here we suppose that we have obtained

point cloud data by sampling from an annulus, which has the homotopy type of a circle.

However, the sampling is not completely uniform. The blue shaded region in (a) represents

the cloud of sampled points, with the white holes representing subregions where there are no

sample points. Each of the holes which is entirely contained in the shaded region will create

a new generator in homology, so when we compute the homology of the Cech complex, for

a suitable small value of $\epsilon$, we find that rank $H_1(C_\epsilon) = 4$ instead of the desired rank of 1.

The simple solution, making $\epsilon$ bigger, is not always as helpful as it seems; see Figure 6(b).

Here we have thickened the data cloud (green region), to represent the effect of choosing a

larger value $\epsilon' > \epsilon$. Although we have successfully closed the three small holes in (a), a new

hole has formed and this time rank $H_1(C_{\epsilon'}) = 2$, again not the desired value.

This phenomenon suggests that rather than computing homology for a Cech or Rips complex for a single value of $\epsilon$, we compute homology for several values of $\epsilon$, and consider the image of the homomorphism

$$H_i(R_\epsilon(X)) \to H_i(R_{\epsilon'}(X))$$

for $\epsilon < \epsilon'$. This construction is known as persistent homology because it picks out those homology classes already existing in $C_\epsilon$ which persist when we move to the larger complex $C_{\epsilon'}$.


Equivalently, we consider the $k$-cycles of $C_\epsilon$ modulo those which can be expressed as boundaries of $(k+1)$-chains in $C_{\epsilon'}$. In the example of Figure 6, a 1-cycle encircling any of the three small holes in $C_\epsilon$ becomes the boundary of a 2-chain when we move to $C_{\epsilon'}$. On the other hand, the newly-created hole in $C_{\epsilon'}$ does not correspond to any 1-cycle in $C_\epsilon$ itself. Thus the persistent homology with respect to $\epsilon, \epsilon'$ detects only a single nontrivial 1-dimensional homology class, coming from the obvious cycle which encircles the annulus.

This is the approach we adopt. We will select diﬀerent length scales for our complexes,

which we believe will be of the right scale to capture the features we are interested in, and

so that any spurious classes vanish under passage to the longer length scale.
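The rank of the map on homology for one pair of scales can be computed directly from the description above, as the dimension of (cycles of the small complex plus boundaries of the large complex) minus the dimension of (boundaries of the large complex). The following sketch does this over $\mathbb{F}_2$ with bitmask vectors for $k \ge 1$. It is a naive illustration for a single pair of nested complexes, not the algorithm of [2], and the two toy complexes are hypothetical stand-ins for $C_\epsilon \subseteq C_{\epsilon'}$.

```python
from itertools import combinations

def closure(simplices):
    """All nonempty faces of the given simplices, as sorted vertex tuples."""
    faces = set()
    for s in simplices:
        for r in range(1, len(s) + 1):
            faces.update(combinations(sorted(s), r))
    return faces

def reduce_f2(vectors):
    """Eliminate over F2. Returns (rank, kernel), where each kernel entry is
    a bitmask over input positions whose vectors sum to zero."""
    pivots, kernel = {}, []
    for pos, vec in enumerate(vectors):
        combo = 1 << pos
        while vec:
            high = vec.bit_length() - 1
            if high not in pivots:
                pivots[high] = (vec, combo)
                break
            vec ^= pivots[high][0]
            combo ^= pivots[high][1]
        else:
            kernel.append(combo)  # vec was reduced to zero
    return len(pivots), kernel

def boundaries(simplices, face_index):
    """Boundary of each simplex as a bitmask over the indexed faces."""
    cols = []
    for s in simplices:
        col = 0
        for j in range(len(s)):
            col ^= 1 << face_index[s[:j] + s[j + 1:]]
        cols.append(col)
    return cols

def persistent_betti(small, large, k):
    """Rank of H_k(small) -> H_k(large) over F2, for k >= 1, small inside large."""
    small, large = closure(small), closure(large)
    assert small <= large

    def level(cx, d):
        return sorted(s for s in cx if len(s) == d + 1)

    idx_km1 = {s: i for i, s in enumerate(level(large, k - 1))}
    idx_k = {s: i for i, s in enumerate(level(large, k))}
    small_k = level(small, k)
    _, kernel = reduce_f2(boundaries(small_k, idx_km1))
    # Rewrite each cycle of the small complex in the large complex's coordinates.
    cycles = [sum(1 << idx_k[s] for j, s in enumerate(small_k) if combo >> j & 1)
              for combo in kernel]
    bnd = boundaries(level(large, k + 1), idx_k)
    rank_b, _ = reduce_f2(bnd)
    rank_zb, _ = reduce_f2(bnd + cycles)
    return rank_zb - rank_b

# Two loops sharing vertex 2; the second one plays the role of a spurious small hole.
coarse = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
# At the larger scale the small hole is filled and a new loop appears.
fine = coarse + [(2, 3, 4), (4, 5), (5, 6), (4, 6)]

print(persistent_betti(coarse, coarse, 1))  # 2
print(persistent_betti(fine, fine, 1))      # 2
print(persistent_betti(coarse, fine, 1))    # 1
```

Each complex on its own reports two loops, while only one class is present at the small scale and survives to the large one, mirroring the discussion of Figure 6.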

Note: The idea of considering homology for Cech complexes of varying length scales and defining persistent homology groups was introduced by H. Edelsbrunner in [2]. An effective algorithm for simultaneously computing all the persistent homology groups over an interval range of values for $\epsilon, \epsilon'$ is given in [2].

4 The Tangent Complex

In this section, we will consider subsets $X$ of Euclidean space $\mathbb{R}^n$, which in many cases are contractible, but which nevertheless carry features which we would intuitively regard as qualitative. The idea of this section is that it is possible to make a construction on $X$, whose homotopy type is sensitive to non-smooth features in $X$.

Definition 4.1 Let $X \subseteq \mathbb{R}^n$. We define the open tangent complex to $X$, $T^0(X)$, to be the subset of $X \times S^{n-1}$ defined by

$$T^0(X) = \left\{ (x, v) : \lim_{t \to 0^+} \frac{d(x + tv, X)}{t} = 0 \right\}$$

where $d(\xi, X)$ denotes $\inf_{x \in X} d(\xi, x)$. We define the closed tangent complex $T(X)$ to be the closure of $T^0(X)$ in $X \times S^{n-1}$.
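The role of the closure in this definition can be seen numerically on the L-shaped set of Example 1.3. The sketch below is heuristic: it replaces the limit by the ratio $d(x + tv, X)/t$ at a few small values of $t$, with a tolerance chosen for this example, and uses an exact distance function for this particular $X$. All names are illustrative.

```python
import math

def dist_to_L(p):
    """Exact distance from p to X = (R+ x {0}) u ({0} x R+)."""
    x, y = p
    to_x_ray = math.hypot(x - max(x, 0.0), y)  # nearest point is (max(x, 0), 0)
    to_y_ray = math.hypot(x, y - max(y, 0.0))  # nearest point is (0, max(y, 0))
    return min(to_x_ray, to_y_ray)

def in_open_tangent_complex(x, v, dist, scales=(1e-2, 1e-3, 1e-4), tol=1e-6):
    """Heuristic stand-in for: lim_{t -> 0+} d(x + t v, X) / t = 0."""
    return all(dist((x[0] + t * v[0], x[1] + t * v[1])) / t < tol for t in scales)

directions = {"+e1": (1, 0), "-e1": (-1, 0), "+e2": (0, 1), "-e2": (0, -1)}

smooth_pt = {name for name, v in directions.items()
             if in_open_tangent_complex((0.5, 0.0), v, dist_to_L)}
at_corner = {name for name, v in directions.items()
             if in_open_tangent_complex((0.0, 0.0), v, dist_to_L)}

print(sorted(smooth_pt))  # ['+e1', '-e1']
print(sorted(at_corner))  # ['+e1', '+e2']
```

At the corner only $e_1$ and $e_2$ pass the limit test, so the open tangent complex has a two-point fiber there. The directions $-e_1$ and $-e_2$ enter the closed tangent complex as limits of $(x, -e_1)$ and $(x, -e_2)$ at nearby smooth points, which is how the fiber at the origin acquires four points.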

Note first that $T(X)$ comes equipped with a projection $p : T(X) \to X$. For any $x \in X$, we will denote by $T_x(X)$ the fiber at $x$, i.e. $p^{-1}(x)$. There is also the projection $q : T(X) \to S^{n-1}$. We have the following two useful propositions concerning this construction.

Proposition 4.2 Suppose that $x \in X$ is a smooth point of $X$, i.e. so that there is a neighborhood $U$ of $x$ in $\mathbb{R}^n$, and a smooth function $f : U \to \mathbb{R}^m$, so that

• $U \cap X = f^{-1}(0)$

• $Df(\xi)$ has rank $m$ for every $\xi$ in $U$

Then $T_x(X) \cong S^{n-m-1}$.

Example 4.3 Let $L$ be a line in the $xy$-plane, given by the equation $\xi \cdot (x - x_0) = 0$, for vectors $\xi$ and $x_0$. Then we have $q(T(L)) = \{\pm\eta\}$, where $\eta$ is a unit vector perpendicular to $\xi$, and

$$T(L) \cong L \times \{\pm\eta\}$$

More generally, let $W \subseteq \mathbb{R}^n$ be the hyperplane determined by the equation $\xi \cdot (x - x_0) = 0$, where $\xi$ and $x_0$ are $n$-vectors. Then $T(W) \cong W \times S(\xi)$, where $S(\xi)$ denotes the unit sphere


in the plane of vectors perpendicular to $\xi$. This result holds with $W$ replaced by any halfplane

or quadrant in $W$.

It is typically easy to work directly with the deﬁnition of the tangent complex in the case

of one-dimensional objects in the plane.

Example 4.4 Consider the example in the introduction, with $X \subseteq \mathbb{R}^2$, $X = \mathbb{R}_+ \times \{0\} \cup \{0\} \times \mathbb{R}_+$.

We evaluate the fibers $T_x(X)$ directly. For any smooth point $x$, the fiber will consist of two distinct points, i.e. a zero dimensional sphere $S^0$. For points along the $x$-axis, the two points will be $(x, (1,0))$ and $(x, (-1,0))$, and along the $y$-axis, they will be $(x, (0,1))$ and $(x, (0,-1))$. At the origin, though, the fiber $T_{(0,0)}(X)$ consists of four points, namely $((0,0), (\pm 1, 0))$ and $((0,0), (0, \pm 1))$. We can easily verify that the tangent complex is actually the union of two pieces, one from the tangent complex of $\mathbb{R}_+ \times \{0\}$ and the other from the tangent complex of $\{0\} \times \mathbb{R}_+$:

$$T(\mathbb{R}_+ \times \{0\}) = (\mathbb{R}_+ \times \{0\}) \times \{\pm e_1\}$$

and

$$T(\{0\} \times \mathbb{R}_+) = (\{0\} \times \mathbb{R}_+) \times \{\pm e_2\}$$

Thus $T(X)$ is equal to:

$$(\mathbb{R}_+ \times \{0\}) \times \{\pm e_1\} \,\cup\, (\{0\} \times \mathbb{R}_+) \times \{\pm e_2\}$$

It is easy to see that this space is a disjoint union of four distinct half lines.

Example 4.5 Let $X \subseteq \mathbb{R}^3$ be the boundary of the positive octant, i.e.

$$X = \{(x, y, z) : x, y, z \ge 0, \text{ and one of } x, y, z \text{ is equal to zero}\}$$

In this case, $X$ is a union of three pieces, namely the intersections of $X$ with the three coordinate planes. Denote the intersection of $X$ with the $xy$-plane by $X_{xy}$, and let $X_{yz}$ and $X_{xz}$ be the other intersections. Each of these intersections is a quadrant in the corresponding coordinate plane. From the previous example, we find that $T(X_{xy}) \cong X_{xy} \times S^1_{xy}$, where $S^1_{xy}$ denotes the unit circle in the $xy$-plane. There are similar descriptions for each of the other coordinate planes. If we now examine the fibers of the projection $T(X) \to X$, we find the following.

• For any smooth point $v$ (i.e. any point of $X$ which does not lie on a coordinate axis), the fiber $T_v(X)$ is a circle.

• For any point $v$ which lies on a coordinate axis, but which is not equal to the “cone point” $(0,0,0)$, $T_v(X)$ is the union of two circles which overlap at a pair of antipodal points.

• For the cone point, $T_{(0,0,0)}(X)$ is homeomorphic to the union of three circles which pairwise overlap at pairs of antipodal points.

In order to analyze some higher dimensional examples, we will give a result which analyzes the effect of taking the product of a set in $\mathbb{R}^n$ with a copy of $\mathbb{R}$. We first recall the notion of the join.


Definition 4.6 Let $X \subseteq S^{n-1} \subseteq \mathbb{R}^n$. By the join of $X$ with $S^{k-1} \subseteq \mathbb{R}^k$, we will mean all points $(x, v)$ in $S^{n+k-1} \subseteq \mathbb{R}^{n+k}$ so that $\frac{x}{\|x\|} \in X$ whenever $x \neq 0$.

The join has an intrinsic meaning in terms of $X$ without reference to the embedding. The join of $X$ and $Y$ is denoted by $X * Y$, and is defined to be the quotient $X \times Y \times [0,1] / \simeq$, where $\simeq$ is the equivalence relation generated by the equivalences $(x, y, 0) \simeq (x', y, 0)$ for all $x, x'$, and $(x, y, 1) \simeq (x, y', 1)$ for all $y, y'$. The join of any space $X$ with $S^k$ is homeomorphic to the $(k+1)$-fold suspension of the space. In particular, we have $S^n * S^m \cong S^{n+m+1}$.

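Proposition 4.7 Let $X \subseteq \mathbb{R}^n$, and let $Y = X \times \mathbb{R} \subseteq \mathbb{R}^{n+1}$. Then the fiber $T_{(x,t)}(Y)$ is equal to the join of the fiber $T_x(X)$ with $S^0 \subseteq \mathbb{R}$. Informally we say that $T_{(x,t)}(Y)$ is the fiberwise join of $T(X)$ with the 0-sphere.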

To illustrate the application of this idea, suppose that $X$ is obtained by folding a plane in $\mathbb{R}^3$ along a line in the plane. For example, consider the set $X = \{(x, y, z) \mid x \ge 0,\ y \ge 0, \text{ and } x = 0 \text{ or } y = 0\}$. This set is the product of the set $Y$ in $\mathbb{R}^2$ given by $Y = \{(x, y) \mid x \ge 0,\ y \ge 0, \text{ and } x = 0 \text{ or } y = 0\}$ and $\mathbb{R}$. We analyzed the tangent complex for $Y$ in Example 4.4 above, and found that the fiber $T_{(0,0)}(Y)$ consisted of four distinct points. Proposition 4.7 now tells us that the fiber $T_{(0,0,0)}(X)$ is the join of these four distinct points with $S^0$.
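As a quick check on this fiber, offered as a sketch, the join of four points $\{p_1, \dots, p_4\}$ with $S^0 = \{a, b\}$ can be viewed as a connected graph with vertices $p_1, \dots, p_4, a, b$ and one edge from each $p_i$ to each of $a$ and $b$. Counting vertices and edges gives

$$V = 4 + 2 = 6, \qquad E = 4 \cdot 2 = 8, \qquad \beta_1 = E - V + \beta_0 = 8 - 6 + 1 = 3.$$

The value $\beta_1 = 3$ exceeds the value $\beta_1 = 1$ of the circle fiber at a smooth point of a surface, and it matches the three circles appearing in Example 1.5.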

Note that this ﬁber is homeomorphic to the union of two circles along two points, as in

In general, it is possible to give an explicit description of the ﬁbers Tx(X) in the case

when xis a conelike singular point.


Definition 4.8 For any subset $L$ of $S^{n-1} \subseteq \mathbb{R}^n$, we define the cone on $L$, $cL$, to be the set $cL = \{rv \mid r \in [0,1], \text{ and } v \in L\}$. Let $X \subseteq \mathbb{R}^n$. We say $x \in X$ is a conelike point in $X$ if there is a neighborhood $U$ containing $x$ in $\mathbb{R}^n$, with boundary $\partial U$, so that there is a map $f : U \to D^n$, which is smooth and has a smooth inverse, so that $f(X \cap U) = c(f(X \cap \partial U))$. In other words, the singularity is locally diffeomorphic to the cone on the space $f(X \cap \partial U)$.

Remark: Conelike singularities are common. For instance, if $X$ is an algebraic variety, and $x$ is an isolated singular point, then $x$ is conelike in the above described sense.

It is possible to analyze the fiber $T_x(X)$ in the case of a conelike singularity. Since the topological type of the tangent complex is unchanged by smooth changes of coordinates, it is enough to study the case of $cL$, where $L \subseteq S^{n-1} \subseteq \mathbb{R}^n$. $L$ is a subset of $\mathbb{R}^n$, and as such we may study its tangent complex $T(L)$. For each $x \in L$, we have the fiber $T_x(L) \subseteq \mathbb{R}^n \times S^{n-1}$. If we let $q : \mathbb{R}^n \times S^{n-1} \to S^{n-1}$ denote the projection, we obtain the subset $q(T_x(L)) \subseteq S^{n-1}$. In order to describe $T(cL)$, we coordinatize the cone $cL$ via coordinates $(t, \lambda)$, where $0 \le t \le 1$, and $\lambda \in L$, with all points with $t = 0$ being identified with the single cone point. Here $t$ is the parameter describing the line segment from a point $x \in L$ to the cone point.

Proposition 4.9 $T(cL)$ is described as follows.

• For $t > 0$, $T_{(t,\lambda)}(cL)$ is the join of $T_\lambda(L)$ with $S^0$, so is
homeomorphic to the suspension of $T_\lambda(L)$.

• Let $p$ denote the cone point, i.e. the origin. Then
$$q(T_p(cL)) = \bigcup_{\lambda \in L} S^0 * q(T_\lambda(L)).$$

Example 4.10 Consider the cone singularity, which occurs at the point $(0,0,0)$ of the
subset $X$ of $\mathbb{R}^3$ defined by
$$x^2 + y^2 = z^2 \quad \text{and} \quad z \ge 0.$$
In this case the fiber $T_v(X)$ consists of a circle for all points $v$ away from the origin,
since these points are all smooth. However, the fiber at the origin is given by
$$T_{(0,0,0)}(X) \cong \{\xi \in S^2 \text{ such that } |\xi \cdot (0,0,1)| \le 1/2\}.$$
This space is homeomorphic to an annulus $S^1 \times [0,1]$.

5 Homology detection of singular points

In this section, we will show that in many cases, homology groups can be used to detect and
distinguish between singular points. Let $X \subseteq \mathbb{R}^n$ be a subset. What we will
show is that for many choices of $X$ and $x \in X$, the Betti numbers $\beta_k$ of the fiber
$T_x(X)$ will provide useful information about the nature of the point $x$.


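Example 5.1 Suppose that $x$ is a smooth point in $X$, i.e. a point for which there is a
neighborhood $U \subseteq X$ of $x$, so that $U$ is diffeomorphic to a Euclidean disc $D^k$
for some $k$. Then $T_x(X)$ is a sphere $S^{k-1}$, so for $k \ge 2$ we have
$\beta_j(T_x(X)) = 0$ for $j \ne 0, k-1$, and $\beta_j(T_x(X)) = 1$ for $j = 0, k-1$
(for $k = 1$ the fiber is $S^0$ and $\beta_0 = 2$).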

Example 5.2 Suppose that $X$ is a union of $l$ lines in $\mathbb{R}^2$, intersecting in a
single point $p$. Then $\beta_0(T_p(X)) = 2l$, and $\beta_i(T_p(X)) = 0$ otherwise.

Example 5.3 We consider the case where $X$ is the surface of a polyhedron. There are now
three distinct possibilities for a point $P \in X$, namely

1. $P$ is in the interior of a face of $X$.

2. $P$ lies on an edge of $X$, but is not a vertex.

3. $P$ is a vertex.

It turns out that in all cases, $\beta_0(T_P(X)) = 1$, and that $\beta_i(T_P(X)) = 0$ for
$i \ge 2$. We examine the behavior of $\beta_1$.

1. $P$ is in the interior of a face of the polyhedron. In this case, $P$ is a smooth point
of $X$, so $T_P(X)$ is a circle. This tells us that $\beta_1(T_P(X)) = 1$.

2. In this case, a local smooth model for the space $X$ near $P$ is the product of a line
with the space $Y \subseteq \mathbb{R}^2$ which is the union of the non-negative $x$- and
$y$-axes. It now follows from Proposition 4.7 that $T_P(X)$ is the join of $S^0$ with
$T_{(0,0)}(Y)$, and this join is the union of two circles with intersection a pair of
distinct points. It is now readily verified that $\beta_1(T_P(X)) = 3$.

3. In this case, we must count the number $N$ of faces containing $P$, or equivalently the
number of edges containing $P$. $T_P(X)$ is a union of $N$ circles, with each pair of circles
intersecting in a pair of distinct points, and where all of the pairs of points are disjoint.
One finds that $\beta_1(T_P(X)) = 1 + \sum_{i=0}^{N-1} 2i = 1 + N(N-1)$. Note that in this
case $N \ge 3$.

Observe that all the different cases are distinguished by the value of $\beta_1$ on $T_P(X)$.
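The count in case 3 can be checked with an Euler characteristic argument. The following is only a sketch, assuming (as stated in the example) that the $N$ circles meet pairwise in two points and that all of these intersection points are distinct. The symbols $\Gamma$, $V$ and $E$ are introduced here for illustration and do not appear in the original text.

```latex
% Regard T_P(X) as a connected graph \Gamma whose vertices are the
% intersection points and whose edges are the arcs between them.
% Each circle carries 2(N-1) intersection points, hence 2(N-1) arcs.
\begin{align*}
V &= 2\binom{N}{2} = N(N-1), \\
E &= N \cdot 2(N-1) = 2N(N-1), \\
\beta_1(\Gamma) &= \beta_0(\Gamma) - \chi(\Gamma) = 1 - (V - E) = 1 + N(N-1).
\end{align*}
```

For $N = 2$ this recovers the value 3 found for a point on an edge, and for $N = 3$ it gives 7.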

6 Locating singular points

In the last section, we have shown how to use homology to determine whether or not a given

point is a singular point, and what type it is. An important question, though, is whether

one can use homological methods to locate singular points without prior knowledge of where

they might be. The key idea is the following.


Proposition 6.1 Let $X \subseteq S^{n-1} \subseteq \mathbb{R}^n$, and as before let
$CX \subseteq \mathbb{R}^n$ denote the cone on $X$. Let $p$ denote the cone point. Then the
inclusion $T_p(CX) \to T(CX)$ is a homotopy equivalence, and hence induces an isomorphism on
homology. More generally, let $C_R X$ denote $\{z \in CX : \|z\| \le R\}$. Then
$T_p(C_R X) \to T(C_R X)$ is also a homotopy equivalence.

Proof. There is a smooth deformation retraction of $CX$ into the single point $p$. It is
covered by a deformation retraction of $T(CX)$ into $T_p(CX)$.

This means that if we have found a conelike neighborhood of a conelike singular point,
we can compute the homology of the fiber over the singular point. This fact suggests the
existence of an algorithm for location of singular points in that portion of a set $X$ which
is contained in a rectangular subset $U \subseteq \mathbb{R}^n$, consisting of the following
steps.

1. Compute $H_*(T(X \cap U))$. If the homology is that of a smooth subset, i.e.
$H_*(T(X \cap U)) \cong H_*(S^k)$ for some $k$, then we assume that the rectangular region in
question does not contain any singular points, and we remove this rectangular region from
consideration.

2. Divide the rectangular region into a family of smaller rectangular regions
$\{U_\alpha\}_{\alpha \in A}$, say by bisecting or trisecting in each of the coordinate
directions.

3. Apply step 1 to each of the smaller windows, retaining only those rectangular regions
$U_\alpha$ in which $H_*(T(X \cap U_\alpha))$ is not that of a sphere.

4. Repeat steps 2 and 3 on the retained regions until one arrives at a sufficiently good
approximation to the singular set.
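The steps above can be sketched as a subdivision loop. This is a minimal illustration and not the authors' implementation. The function names, the fixed number of rounds `depth`, and the callback `looks_singular` are all assumptions made here. In particular `looks_singular` stands in for the homological test (for example, a Betti number threshold on the tangent complex of the sample inside a box), which the sketch does not compute.

```python
from itertools import product

import numpy as np


def subdivide(lo, hi, parts=2):
    """Step 2: split the box [lo, hi] into parts**n equal sub-boxes."""
    step = (hi - lo) / parts
    return [
        (lo + np.array(idx) * step, lo + (np.array(idx) + 1) * step)
        for idx in product(range(parts), repeat=len(lo))
    ]


def locate_singular_regions(points, lo, hi, looks_singular, depth, parts=2):
    """Return the sub-boxes still retained after `depth` rounds of subdivision.

    `looks_singular(sample)` is a hypothetical callback standing in for the
    homological test; it receives the sample points lying inside one box.
    """

    def retained(box):
        b_lo, b_hi = box
        inside = points[np.all((points >= b_lo) & (points <= b_hi), axis=1)]
        return len(inside) > 0 and bool(looks_singular(inside))

    # Step 1: test the initial region.
    active = [(lo, hi)] if retained((lo, hi)) else []
    # Step 4: repeat subdivision (step 2) and testing (step 3).
    for _ in range(depth):
        children = [c for box in active for c in subdivide(*box, parts)]
        active = [c for c in children if retained(c)]
    return active
```

Passing the test as a callback keeps the search loop independent of how the homology is computed, so a criterion that targets one type of singular point can be swapped in without changing the loop.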

Remark. The assumption that the “homological standardness” of the intersection
$X \cap U_\alpha$ implies that there are no singular points of $X$ in $U_\alpha$ is not a
rigorous one. It is surely possible to construct situations where $H_*(T(X \cap U_\alpha))$
is isomorphic to $H_*(S^k)$ for some $k$, but where $X \cap U_\alpha$ does contain singular
points. However, one generally expects that homological complexity of $T_x(X)$ will carry
into homological complexity of $T(X \cap U_\alpha)$. If one suspects that one has missed a
singular point, though, one can subdivide the region more finely, and begin at a finer level
of subdivision.

Remark. As we have described the algorithm above, it is designed to search for all possible

singular points. However, it is possible to modify it to search for singular points of a particular

type. For instance, if one is searching for the vertices of a cube, and is not interested in the

edges, one can use the calculations in Example 5.3 to see that if one’s criterion for retaining

a rectangular region is that $\beta_1(T(X \cap U_\alpha)) \ge 7$, one will locate the vertices.

7 Point cloud approximation to T(X)

In order to apply the ideas described above to point cloud data, an attractive option is to
find a method for associating to a set of point cloud data
$\mathcal{D} \subseteq \mathbb{R}^n$, obtained by sampling from a geometric object $X$, a new
set of point cloud data $T(\mathcal{D})$ which one believes is what one might obtain by
sampling a finite set of points from $T(X)$. There are many subtle and

interesting issues regarding such constructions, and many natural ways in which one might

proceed. One problem with all these methods is that they construct very large complexes.

We plan to discuss these issues in a systematic way in a future paper, but for the present we

will restrict ourselves to an ad hoc construction of a simplicial complex which is well related


to the tangent complex T(X), and for which the algorithm described above successfully

locates the singular set in a number of examples. The goal throughout the construction is

to make sure that not only is the vertex set as small as possible, but that the collections

of simplices should also be as small as possible. Therefore, in addition to choosing a small

vertex set, we use a criterion described below to “prune” edges. Our construction proceeds

as follows.

We suppose that we know the dimension of the original subset $X$, say $l$. The construction
begins by selecting a set $\mathcal{B} = \{\beta_1, \beta_2, \ldots, \beta_N\}$ of base points
from $\mathcal{D}$. In order to maximize coverage of the space by these points, one chooses
them in a way which is biased in favor of large interpoint distances. Specifically, a
relatively large set $R$ is sampled from $\mathcal{D}$, then the sequence of points
$\{\beta_i\}$ is chosen from $R$ in such a way that $\beta_i$ is the furthest point in $R$
from the collection $\{\beta_1, \beta_2, \ldots, \beta_{i-1}\}$. The number $N$ of base points
is set in advance.
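The selection of base points can be sketched as a greedy furthest-point procedure. This is an illustrative sketch under two assumptions that the text does not spell out: the distance from a point to the collection is taken to be the distance to its nearest member, and the first base point is chosen arbitrarily (index `first`). The function name `maxmin_sample` is hypothetical.

```python
import numpy as np


def maxmin_sample(R, n_base, first=0):
    """Greedily pick n_base row indices of R, each furthest from those before."""
    chosen = [first]
    # dist[j] = distance from R[j] to the nearest base point chosen so far
    dist = np.linalg.norm(R - R[first], axis=1)
    while len(chosen) < n_base:
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        # Update nearest-base-point distances with the newly chosen point.
        dist = np.minimum(dist, np.linalg.norm(R - R[nxt], axis=1))
    return np.array(chosen)
```

Keeping a running array of nearest distances means each new base point costs one pass over $R$ rather than a comparison against every earlier base point.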

At each base point $\beta$, we find the $k$ nearest neighbors
$\{\beta_1, \beta_2, \ldots, \beta_k\}$ to $\beta$ in the set $\mathcal{D}$ (here the symbols
$\beta_i$ denote neighbors of $\beta$, not base points), where $k$ is a parameter we choose
beforehand. We then perform local principal component analysis [9] to obtain the best linear
subspace approximation to $\mathcal{D}$ near $\beta$, and we write $L_\beta$ for this
subspace. For us, this means that we form the $n \times k$ matrix $A$ whose columns are the
differences $\{\beta_1 - \beta, \beta_2 - \beta, \ldots, \beta_k - \beta\}$, then construct
the covariance matrix $C = AA^T$.
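A minimal sketch of this local PCA step follows, including the diagonalization and the eigenvalue-ratio test that belong to the same step of the construction. It is not the authors' code. The function name `local_pca` and the parameter `max_ratio` (the threshold on $\lambda_l/\lambda_1$) are hypothetical, and skipping points at distance zero from $\beta$ is an assumption made here so that $\beta$ is not counted as its own neighbor.

```python
import numpy as np


def local_pca(D, beta, k, l, max_ratio):
    """Return an n x l orthonormal basis of L_beta, or None if beta is omitted."""
    # k nearest neighbours of beta in D (assumption: zero-distance points skipped)
    d = np.linalg.norm(D - beta, axis=1)
    order = np.argsort(d)
    order = order[d[order] > 0][:k]
    A = (D[order] - beta).T           # n x k matrix of differences
    C = A @ A.T                       # n x n matrix C = A A^T
    evals, evecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    top = evals[-l:]                  # lambda_1 <= ... <= lambda_l
    # Omit beta unless lambda_l / lambda_1 is below the threshold.
    if top[0] <= 0 or top[-1] / top[0] >= max_ratio:
        return None
    return evecs[:, -l:]
```

`numpy.linalg.eigh` is used because $C$ is symmetric; it returns eigenvalues in ascending order, which matches the ordering $\lambda_1 \le \cdots \le \lambda_l$ used in the criterion.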

We then diagonalize this matrix, and let $L_\beta$ be the span of the eigenvectors
corresponding to the $l$ largest eigenvalues. If the set of $l$ largest eigenvalues doesn’t
“stand out”, we assume that there is not a natural best fitting $l$-dimensional linear
subspace, and we omit the base point $\beta$. Our criterion for “standing out” is as follows.
We let $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_l$ denote the $l$ largest eigenvalues
of the matrix $C$, and our criterion for inclusion is that $\lambda_l/\lambda_1$ should be
less than a fixed threshold, which is a parameter in the algorithm. We also choose parameters

$\delta$, $\rho$, and a parameter $\nu$. We next build a small simplicial complex whose vertex
set is $\mathcal{B}$ by considering the Rips complex on $\mathcal{B}$ for a suitable value of
$\nu$, and then removing edges in a way which is biased in favor of short edges and against
2-simplices with small angles. This is done as follows. For each edge
$e = \{\beta_1, \beta_2\}$, we let $L(e)$ denote the length of $e$. For any other edge
$e' = \{\beta_1, \beta'\}$ which contains $\beta_1$ as a vertex, we define $\sigma(e, e')$ to
be the length of the vector
$$\frac{\beta_2 - \beta_1}{\|\beta_2 - \beta_1\|} - \frac{\beta' - \beta_1}{\|\beta' - \beta_1\|},$$
and similarly for edges $e'' = \{\beta', \beta_2\}$. We let $\theta(e)$ denote the minimum
value of $\sigma(e, e')$, as $e'$ varies over all edges which share a vertex with $e$. We now
assign to the edge $e$ the score
$$\frac{L(e)}{\theta(e)^{1.5}}.$$
We let $S$ denote the subcomplex of the Rips complex obtained by removing all edges whose
score is greater than a certain threshold. This threshold is also a parameter in the
algorithm. We have constructed a small complex modelling the base space, i.e. the original
data set. In order to build a complex $T$ for the tangent complex, we proceed as follows. For

each $\beta \in \mathcal{B}$, we now sample a fixed number $t$ of points
$\{v^\beta_1, v^\beta_2, \ldots, v^\beta_t\}$ uniformly from the unit sphere in $L_\beta$. The
vertex set of $T$ is the set $\{(\beta, v^\beta_i)\}_{\beta \in \mathcal{B},\, 1 \le i \le t}$.
We define a graph structure on this set as follows. For every pair of points
$\{(\beta, v^\beta_i), (\beta, v^\beta_j)\}$, we insert this potential edge if and only if
$d(v^\beta_i, v^\beta_j) \le \delta$.

For $\beta$ and $\beta'$ which are adjacent in $S$, we define a bipartite graph structure on
the set
$V(\beta, \beta') = \{v^\beta_1, v^\beta_2, \ldots, v^\beta_t\} \cup \{v^{\beta'}_1, v^{\beta'}_2, \ldots, v^{\beta'}_t\}$
as the intersection of two bipartite graph structures $\Gamma_1$ and $\Gamma_2$ on
$V(\beta, \beta')$. A pair $\{(\beta, v^\beta_i), (\beta', v^{\beta'}_j)\}$ spans an edge in
$\Gamma_1$ if and only if $d(v^\beta_i, v^{\beta'}_j) \le \sqrt{\delta^2 + \rho^2}$. In
$\Gamma_2$, we say $\{(\beta, v^\beta_i), (\beta', v^{\beta'}_j)\}$ spans an edge if and only
if $v^\beta_i$ is among the $m$ closest points to $v^{\beta'}_j$ in the set
$\{v^\beta_1, v^\beta_2, \ldots, v^\beta_t\}$ and $v^{\beta'}_j$ is among the $m$ closest
points to $v^\beta_i$ in $\{v^{\beta'}_1, v^{\beta'}_2, \ldots, v^{\beta'}_t\}$. Here $m$ is
again a parameter. We insert the edge $\{(\beta, v^\beta_i), (\beta', v^{\beta'}_j)\}$ if and
only if it is an edge both in $\Gamma_1$ and $\Gamma_2$. If $\beta, \beta' \in \mathcal{B}$
are not adjacent in the complex $S$, we do not insert any edges of the form
$\{(\beta, v^\beta), (\beta', v^{\beta'})\}$. This completes the construction of the complex
$T$. The rationale for this complicated construction is that in practice it succeeds in
removing small loops which otherwise distort the calculation.
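The rule for edges between the tangent samples at two adjacent base points can be sketched as follows. This is an illustration only. The function name `cross_edges` is hypothetical, and the distance $d$ is assumed here to be the Euclidean distance between the sampled unit vectors, which the text does not state explicitly. Ties in the nearest-neighbor ranking are broken arbitrarily by the sort.

```python
import numpy as np


def cross_edges(Vb, Vb2, delta, rho, m):
    """Return the pairs (i, j) that are edges in both Gamma_1 and Gamma_2.

    Vb and Vb2 are t x n arrays holding the tangent samples at beta and beta'.
    """
    # dist[i, j] = Euclidean distance between Vb[i] and Vb2[j] (assumed metric)
    dist = np.linalg.norm(Vb[:, None, :] - Vb2[None, :, :], axis=2)
    # Gamma_1: distance at most sqrt(delta^2 + rho^2)
    gamma1 = dist <= np.sqrt(delta**2 + rho**2)
    # rank_in_b[i, j]: rank of Vb[i] among the rows of Vb by distance to Vb2[j]
    rank_in_b = np.argsort(np.argsort(dist, axis=0), axis=0)
    # rank_in_b2[i, j]: rank of Vb2[j] among the rows of Vb2 by distance to Vb[i]
    rank_in_b2 = np.argsort(np.argsort(dist, axis=1), axis=1)
    # Gamma_2: each point is among the m closest to the other
    gamma2 = (rank_in_b < m) & (rank_in_b2 < m)
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(gamma1 & gamma2))]
```

The function would be called only for pairs $\beta$, $\beta'$ that are adjacent in $S$; for non-adjacent pairs no edges are inserted.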

8 Sample Results

We show the results of running our algorithm on various example point sets. The reader

will notice that in some cases, the singular set we obtain is “chunky”, i.e. that we have only

obtained a neighborhood of the singular set. This performance can certainly be improved

with more sampling. The purpose of this paper is to show the validity of the concept, rather
than to demonstrate a fully optimized algorithm.

Figure 7: This ﬁgure shows the results of applying the algorithm to a curve with intersections in the plane.

Increasing redness indicates longer survival under the algorithm, and so the “reddest points” are those found

by the algorithm to be singular points. In this case, the algorithm searches for small sets for which the

tangent complex has more than two components, i.e. for which $\beta_0 > 2$. In this example,
5000 points were used, and the algorithm had a running time of approximately 10 seconds.

References

[1] Carlsson, Gunnar and de Silva, Vin, Synthetic Delaunay triangulations, (in preparation).

[2] Edelsbrunner, Herbert, Letscher, David and Zomorodian, Afra, Topological persistence

and simpliﬁcation, Discrete Comput. Geom. 28 (2002), 511-533.

[3] Fan, Ting-Jun, Describing and Recognizing 3D Objects Using Surface Properties,

Springer Verlag, Berlin–New York, 1990.


Figure 8: These ﬁgures show the results of applying the algorithm to two curved surfaces which meet

transversely in a curve, which becomes the singular locus of the union of the two surfaces. This is obtained

by searching for small sets for which the tangent complex has $\beta_1 > 1$. In both cases,
point clouds of 20,000 points were used, with a running time of approximately 2 minutes.

[4] Federer, Herbert, Geometric measure theory, Die Grundlehren der mathematischen
Wissenschaften, Band 153, Springer-Verlag New York Inc., New York, 1969.

[5] Fisher, Robert B., From Surfaces to Objects: Computer Vision and Three-Dimensional

Scene Analysis, John Wiley and Sons, New York, 1989.

[6] Greenberg, Marvin J. and Harper, John R., Algebraic topology. A first course,
Mathematics Lecture Note Series, 58, Benjamin/Cummings Publishing Co., Inc., Advanced
Book Program, Reading, Mass., 1981.

[7] Hastie, Trevor, Tibshirani, Robert, and Friedman, Jerome, The elements of statistical
learning. Data mining, inference, and prediction, Springer Series in Statistics,
Springer-Verlag, New York, 2001.

[8] Hatcher, Allen, Algebraic topology, Cambridge University Press, Cambridge, 2002.

[9] Jolliﬀe, I.T., Principal component analysis, Springer Series in Statistics. Springer-Verlag,

New York, 1986.


Figure 9: This ﬁgure shows the results for a search for a vertex in a portion of the surface of a cube. In

this case, the search is for small sets for which the tangent complex has $\beta_1 > 3$.
Sample size: 20,000 points, running time: approximately 1.5 minutes.


Figure 10: This figure shows the results for a search for the edges in a 3-simplex. Sample
size: 10,000 points, running time: approximately 2 minutes.

Figure 11: Results of a search for the edges in an icosahedron. Sample size: 20,000 points,
running time: approximately 3 minutes.


Figure 12: A two-dimensional projection of the results of searching for the vertices in a
4-simplex. Sample size: 80,000 points, running time: approximately 5 minutes.
