A Generic Scheme for the Design of Efficient On-Line Algorithms for Lattices.
Petko Valtchev¹, Mohamed Rouane Hacene¹, and Rokia Missaoui²
¹ DIRO, Université de Montréal, C.P. 6128, Succ. “Centre-Ville”,
Montréal, Québec, Canada, H3C 3J7
² Département d’informatique et d’ingénierie, UQO, C.P. 1250, succursale B
Gatineau, Québec, Canada, J8X 3X7
Abstract. A major issue with large dynamic datasets is the process-
ing of small changes in the input through correspondingly small re-
arrangements of the output. This was the motivation behind the de-
sign of incremental or on-line algorithms for lattice maintenance, whose
work amounts to a gradual construction of the final lattice by repeat-
edly adding rows/columns to the data table. As an attempt to put the
incremental trend on strong theoretical grounds, we present a generic
algorithmic scheme that is based on a detailed analysis of the lattice
transformation triggered by a row/column addition and of the underly-
ing sub-structure. For each task from the scheme we suggest an efficient
implementation strategy and put a lower bound on its worst-case complexity.
Moreover, an instantiation of the incremental scheme is presented
whose worst-case complexity matches that of the best batch algorithm.
1 Introduction
Formal concept analysis (FCA) [5] studies the lattice structures built on top
of binary relations (called concept lattices or Galois lattices as in [1]). As a
matter of fact, the underlying algorithmic techniques are increasingly used in the
resolution of practical problems from software engineering [6], data mining [7]
and information retrieval [3].
Our study investigates the new algorithmic problems related to the analysis of
volatile data sets. As a particular case, on-line or incremental lattice algorithms,
as described in [8, 3], basically maintain lattice structures upon the insertion of
a new row/column into the binary table. Thus, given a binary relation K and its
corresponding lattice L, and a new row/column o, the lattice L+ corresponding
to the augmented relation K+ = K ∪ {o} is computed. Most of the existing on-line
algorithms have been designed with practical concerns in mind, e.g., efficient
handling of large but sparse binary tables [8] and therefore prove inefficient
whenever data sets get denser [9].
Here, we explore the suborder of L+ made up of all new nodes with respect
to L and use an isomorphic suborder of L (the generators of the new nodes) that
works as a guideline for the completion of L to L+. Structural properties of the
latter suborder underlie the design of a generic completion scheme, i.e., a sequence
of steps that can be separately examined for efficient implementations. As a first
offspring of the scheme, we describe a novel on-line algorithm that relies both on
insights on the generator suborder and on some cardinality-based reasoning while
bringing down the overall cost of lattice construction by subsequent completions
to the current lower bound for batch construction.
The paper starts by recalling some basic FCA results (Section 2) and funda-
mentals of lattice construction (Section 3). The structure of the generator/new
suborders in the initial/target lattice, respectively, is then examined (Section 4).
Next, a generic scheme for lattice completion is sketched and, for each task of the
scheme, implementation directions are discussed (Section 5). Finally, the paper
presents an effective algorithm for lattice maintenance and clarifies its worst-case
complexity (Section 6).
2 Formal concept analysis background
FCA [5] studies the partially ordered structure, known under the names of Galois
lattice [1] or concept lattice, which is induced by a binary relation over a pair of
sets O (objects) and A (attributes).
Definition 1. A formal context is a triple K = (O,A,I) where O and A are
sets and I is a binary (incidence) relation, i.e., I ⊆ O × A.
Within a context (see Figure 1 on the left), objects are denoted by numbers and
attributes by lowercase letters. Two functions, f and g, summarize the context-related
links between objects and attributes.
Definition 2. The function f maps a set of objects into the set of common
attributes, whereas g³ is the dual for attribute sets:
– f : P(O) → P(A), f(X) = X′ = {a ∈ A | ∀o ∈ X, oIa}
– g : P(A) → P(O), g(Y) = Y′ = {o ∈ O | ∀a ∈ Y, oIa}
For example, f(14) = fgh⁴. Furthermore, the compound operators g ◦ f(X)
and f ◦ g(Y) are closure operators over P(O) and P(A) respectively. Thus, each
of them induces a family of closed subsets, called C^o and C^a respectively, with f
and g as bijective mappings between both families. A couple (X, Y) of mutually
corresponding closed subsets is called a (formal) concept.
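The two derivation operators and the closures they induce can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy context `CONTEXT`, the attribute set `ATTRIBUTES` and the helper names are hypothetical and do not reproduce the table K of Figure 1.

```python
# Hypothetical toy context (NOT the table K of Figure 1): object -> its attributes.
CONTEXT = {1: {"a", "b"}, 2: {"a", "c"}, 3: {"a", "b", "c"}}
ATTRIBUTES = frozenset("abcd")


def f(objects):
    """X -> X': attributes shared by all objects of X (all of A when X is empty)."""
    common = set(ATTRIBUTES)
    for o in objects:
        common &= CONTEXT[o]
    return frozenset(common)


def g(attrs):
    """Y -> Y': objects that own every attribute of Y."""
    return frozenset(o for o, row in CONTEXT.items() if set(attrs) <= row)


def closure_on_objects(objects):
    """g after f: the smallest closed object set containing `objects`."""
    return g(f(objects))


def closure_on_attributes(attrs):
    """f after g: the smallest closed attribute set containing `attrs`."""
    return f(g(attrs))
```

In this toy context, the attribute set {b} is not closed: every object that owns b also owns a, so its closure is {a, b}.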
Definition 3. A formal concept is a couple (X, Y) where X ∈ P(O), Y ∈ P(A),
X = Y′ and Y = X′. X is called the extent and Y the intent of the concept.
Thus, (178, bcd) is a concept, but (16, efh) is not. Moreover, the set C_K of
all concepts of the context K = (O, A, I) is partially ordered by intent/extent
inclusion: (X1, Y1) ≤_K (X2, Y2) ⇔ X1 ⊆ X2 (equivalently, Y2 ⊆ Y1).
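Definition 3 and the order on concepts translate directly into two small predicates. The following is a minimal sketch on a hypothetical toy context (not the table K of Figure 1); the names `intent_of`, `extent_of`, `is_concept` and `leq` are assumptions made for illustration.

```python
# Hypothetical toy context (NOT the table K of Figure 1).
CONTEXT = {1: {"a", "b"}, 2: {"a", "c"}, 3: {"a", "b", "c"}}
ATTRIBUTES = frozenset("abcd")


def intent_of(objects):
    """X -> X': attributes common to all objects of X."""
    common = set(ATTRIBUTES)
    for o in objects:
        common &= CONTEXT[o]
    return frozenset(common)


def extent_of(attrs):
    """Y -> Y': objects owning all attributes of Y."""
    return frozenset(o for o, row in CONTEXT.items() if set(attrs) <= row)


def is_concept(extent, intent):
    """(X, Y) is a concept iff X' = Y and Y' = X."""
    extent, intent = frozenset(extent), frozenset(intent)
    return intent_of(extent) == intent and extent_of(intent) == extent


def leq(c1, c2):
    """(X1, Y1) <= (X2, Y2) iff X1 is included in X2."""
    return frozenset(c1[0]) <= frozenset(c2[0])
```

Here `is_concept({1}, "ab")` fails because object 3 also owns both a and b, so the extent of {a, b} is {1, 3} rather than {1}.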
³ Hereafter, both f and g are denoted by ′.
⁴ We use a separator-free form for sets, e.g. 127 stands for {1,2,7} and g(abc) = 127
w.r.t. the table K in Figure 1, on the left, and ab for {a,b}.
[Figure 1: binary table K over attributes a to h (left) and Hasse diagram of its lattice with concepts #1 to #13 (right)]
Fig. 1. Left: Binary table K = (O = {1,2,...,8}, A = {a,b,...,h}, R) and the object 9.
Right: The Hasse diagram of the lattice derived from K.
Theorem 1. The partial order L = ⟨C_K, ≤_K⟩ is a complete lattice with joins
and meets as follows:
– ⋁_{i=1}^{k} (X_i, Y_i) = ((⋃_{i=1}^{k} X_i)′′, ⋂_{i=1}^{k} Y_i),
– ⋀_{i=1}^{k} (X_i, Y_i) = (⋂_{i=1}^{k} X_i, (⋃_{i=1}^{k} Y_i)′′).
The Hasse diagram of the lattice L drawn from K = ({1,2,...,8}, A, R) is
shown on the right-hand side of Figure 1 where intents and extents are drawn
on both sides of a node representing a concept. For example, the join and the
meet of c#6 = (26, ac) and c#3 = (13678, d) are (12345678, ∅) and (6, abcd)
respectively.
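The join and meet formulas of Theorem 1 can be exercised on a small example. The sketch below restricts itself to binary joins and meets and uses a hypothetical toy context (not the table K of Figure 1); concepts are represented as pairs of frozensets, which is an assumption of this illustration.

```python
# Hypothetical toy context (NOT the table K of Figure 1).
CONTEXT = {1: {"a", "b"}, 2: {"a", "c"}, 3: {"a", "b", "c"}}
ATTRIBUTES = frozenset("abcd")


def intent_of(objects):
    common = set(ATTRIBUTES)
    for o in objects:
        common &= CONTEXT[o]
    return frozenset(common)


def extent_of(attrs):
    return frozenset(o for o, row in CONTEXT.items() if set(attrs) <= row)


def join(c1, c2):
    """Join: close the union of the extents, intersect the intents."""
    (x1, y1), (x2, y2) = c1, c2
    return extent_of(intent_of(x1 | x2)), y1 & y2


def meet(c1, c2):
    """Meet: intersect the extents, close the union of the intents."""
    (x1, y1), (x2, y2) = c1, c2
    return x1 & x2, intent_of(extent_of(y1 | y2))


c_ab = (frozenset({1, 3}), frozenset("ab"))
c_ac = (frozenset({2, 3}), frozenset("ac"))
```

With these two concepts, the join is the top concept (123, a) and the meet is (3, abc).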
3 Constructing the lattice efficiently
A variety of efficient algorithms exists for constructing the concept set or the
entire concept lattice of a context (see [9] for a detailed study). As we are in-
terested in incremental algorithms as opposed to batch ones, we consider those
two groups separately.
3.1 Batch approaches
The construction of a Galois lattice may be carried out with two different levels
of output structuring. Indeed, one may look only for the set C of all concepts of
a given context K without any hierarchical organization:
Problem Compute-Concepts
Given : a context K = (O,A,I),
Find : the set C of all concepts from K.
An early FCA algorithm has been suggested by Ganter [4] based on a particular
order among concepts that helps avoid computing a given concept more than
once.
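For intuition about Compute-Concepts, a naive batch enumeration can be sketched as follows. It is not Ganter's algorithm: it simply closes the object rows under intersection, relying on the fact that intents are exactly the intersections of object rows (with A as the empty intersection). The function name `all_concepts` and the toy data are assumptions of this illustration.

```python
def all_concepts(context, attributes):
    """Return the set of concepts (extent, intent) of `context`.

    `context` maps each object to its attribute set. Intents are obtained by
    closing {A} under intersection with every object row.
    """
    intents = {frozenset(attributes)}
    for row in context.values():
        # The comprehension is fully evaluated before the in-place union.
        intents |= {frozenset(row) & y for y in intents}
    return {
        (frozenset(o for o, r in context.items() if y <= r), y)
        for y in intents
    }


# Hypothetical toy context (NOT the table K of Figure 1).
TOY = {1: {"a", "b"}, 2: {"a", "c"}, 3: {"a", "b", "c"}}
CONCEPTS = all_concepts(TOY, "abcd")
```

The number of intermediate intersections can be exponential in the size of the table, which is why the size of the result is usually included in the complexity estimates discussed in this section.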
However, of greater interest to us are algorithms that not only discover C,
but also infer the lattice order ≤, i.e., construct the entire lattice L. This more
complex problem may be formalized as follows:
Problem Compute-Lattice
Given : a context K = (O,A,I),
Find : the lattice L = ⟨C, ≤⟩ corresponding to K.
Batch algorithms for the Compute-Lattice problem have been proposed first
by Bordat [2] and later on by Nourine and Raynaud [10]. The former algorithm
relies on structural properties of the precedence relation in L to generate the
concepts in an appropriate order. Thus, from each concept the algorithm gener-
ates its upper covers which means that a concept will be generated a number of
times that corresponds to the number of its lower covers. Recently, Nourine and
Raynaud suggested an efficient procedure for constructing a family of open sets
and showed how it may be used to construct the lattice (see Section 5.4).
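Once the concept set is known, the lattice order can be materialized as a covering relation. The brute-force sketch below is neither Bordat's nor Nourine and Raynaud's procedure; it only illustrates what the output of Compute-Lattice looks like. The helper names `concept` and `upper_covers` and the toy concept list are assumptions.

```python
def concept(extent, intent):
    """Build a hashable concept from any two iterables."""
    return (frozenset(extent), frozenset(intent))


def upper_covers(concepts):
    """Map every concept to the set of its upper covers (immediate successors)."""
    covers = {}
    for c in concepts:
        # Strict successors of c: concepts with a strictly larger extent.
        above = [d for d in concepts if c[0] < d[0]]
        # Upper covers are the minimal elements among the strict successors.
        covers[c] = {d for d in above if not any(e[0] < d[0] for e in above)}
    return covers


# Concepts of a hypothetical toy context (NOT the lattice of Figure 1).
TOY_CONCEPTS = [
    concept({1, 2, 3}, "a"),
    concept({1, 3}, "ab"),
    concept({2, 3}, "ac"),
    concept({3}, "abc"),
    concept((), "abcd"),
]
COVERS = upper_covers(TOY_CONCEPTS)
```

This takes a cubic number of extent comparisons in the number of concepts, which is exactly the kind of cost the dedicated algorithms cited above avoid.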
There is a known difficulty in estimating the complexity of lattice construction
algorithms solely with respect to the size of the input data. Actually,
there is no known bound (other than the trivial one, i.e., the number of all sub-
sets of O or A) of the number of concepts depending on the dimensions of the
binary relation, i.e., the size of the object set, of the attribute set, or of the binary
relation. Even worse, it has been recently proven that the problem of estimating
the size of L from K is #P-complete. For the above reasons, it is customary to
include the size of the result, i.e., the number of the concepts, in the complexity
estimation. Thus, with |L| as a factor, the worst-case complexity expression of
the classical algorithms solving Compute-Concepts is O((k + m)lkm), where
l = |L|, k = |O|, and m = |A|. The algorithm of Bordat can be assessed to be of
complexity O((k+m)l|I|) where the size of the binary relation (i.e., the number
of positive entries in K) is taken into account. Finally, the work of Nourine and
Raynaud has helped reduce the complexity order of the problem to O((k+m)lk).
3.2 Incremental approaches
On-line or incremental algorithms do not actually construct the lattice, but
rather maintain its integrity upon the insertion of a new object/attribute into
the context:
Problem Compute-Lattice-Inc
Given : a context K = (O,A,I) with its lattice L and an object o,
Find : the lattice L+ corresponding to K+ = (O ∪ {o}, A, I ∪ {o} × o′).
Obviously, the problem Compute-Lattice may be polynomially reduced to
Compute-Lattice-Inc by iterating Compute-Lattice-Inc on the entire set
O (A). In other words, an (extended) incremental method can construct the lattice
L starting from a single object o_1 and gradually incorporating any new object
o_i (on its arrival) into the lattice L_{i−1} (over a context K = ({o_1, ..., o_{i−1}}, A, I)),
each time carrying out a set of structural updates.
Godin et al. [8] suggested an incremental procedure which locally modifies
the lattice structure (insertion of new concepts, completion of existing ones,
deletion of redundant links, etc.) while keeping large parts of the lattice untouched.
The basic approach follows a fundamental property of the Galois connection
established by f and g on (P(O), P(A)): both families C^o and C^a are closed under
intersection [1]. Thus, the whole insertion process is aimed at the integration
into L_{i−1} of all concepts whose intents correspond to intersections of {o_i}′ with
intents from C^a_{i−1}, which are not themselves in C^a_{i−1}. These additional concepts
(further called new concepts in N+(o)), are inserted into the lattice at a particular
place, i.e., each new concept is preceded by a specific counterpart from
the initial lattice, called its generator (the set of generators is denoted G(o)).
Two other categories of concepts in L = L_{i−1} are distinguished: modified (M(o))
concepts correspond to intersections of {o_i}′ with members of C^a_{i−1} that already
exist in C^a_{i−1}, while the remaining set of concepts in the initial lattice are called
old or unchanged. In the final lattice L+ = L_i, the old concepts preserve all their
characteristics, i.e., intent, extent as well as upper and lower covers. Generators
do not experience changes in their information content, i.e., intent and extent,
but a new concept is added to their upper covers. In a modified concept, the
extent is augmented by the new object o while in the set of its lower covers, any
generator is replaced by the corresponding new concept. In the next sections, we
shall stick to this intuitive terminology, but we shall put it on a formal ground
while distinguishing the sets of concepts in the initial lattice (M(o) and G(o))
from their counterparts in the final one (M+(o) and G+(o), respectively).
Example 1 (Insertion of object 9). Assume L is the lattice induced by the object
set 12345678 (see Figure 1 on the right) and consider 9 as the new object.
The set of unchanged concepts has two elements, {c#6, c#10}, whereas
the sets of modified concepts and generators are M(o) = {c#1, c#2, c#3, c#4, c#5, c#8} and
G(o) = {c#7, c#9, c#11, c#12, c#13} respectively. The result of the whole operation
is the lattice L+ in Figure 2. Thus, the set of the new concept intents is:
{cd, fh, cdgh, dfgh, cdfgh}.
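The three categories can be computed mechanically from the intersections of the intents with the intent of the new object. The sketch below does so on a hypothetical five-concept lattice (not the lattice of Figure 1, whose table is not reproduced here); the function name `classify` and the convention that a class maximum is the member with the largest extent are assumptions of this illustration.

```python
def classify(concepts, o_intent):
    """Label each concept (extent, intent) as 'modified', 'generator' or 'old'."""
    o_intent = frozenset(o_intent)
    classes = {}
    for extent, intent in concepts:
        # Concepts sharing the same intersection with o' form one class.
        classes.setdefault(intent & o_intent, []).append((extent, intent))
    status = {}
    for members in classes.values():
        top = max(members, key=lambda c: len(c[0]))  # class maximum: largest extent
        for c in members:
            if c != top:
                status[c] = "old"
            elif c[1] <= o_intent:
                status[c] = "modified"   # its intent already is the intersection
            else:
                status[c] = "generator"  # the intersection is a new intent
    return status


def concept(extent, intent):
    return (frozenset(extent), frozenset(intent))


# Hypothetical lattice of the context 1:ab, 2:ac, 3:abc over A = abcd.
TOY = [concept({1, 2, 3}, "a"), concept({1, 3}, "ab"), concept({2, 3}, "ac"),
       concept({3}, "abc"), concept((), "abcd")]
STATUS = classify(TOY, "abd")  # new object with intent abd
```

In this toy run the only generator is the bottom concept, whose intersection abd is the single new intent.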
Another incremental algorithm for lattice construction has been suggested
by Carpineto and Romano [3].
In a recent paper [11], we generalized the incremental approach of Godin
et al. For this purpose, we applied some structural results from the lattice
assembly framework defined in [14]. In particular, we showed that the incremental
problem Compute-Lattice-Inc is a special case of the more general lattice
assembly problem Assembly-Lattice. More recently, we have presented a theoretical
framework that clarifies the restructuring involved in the resolution of
Compute-Lattice-Inc [13] and further enables the design of procedures that
explore only a part of the lattice L (see Section 6).
[Figure 2: Hasse diagram with concepts #1 to #18]
Fig. 2. The Hasse diagram of the concept (Galois) lattice derived from K with O =
{1,2,3,...,9}.
In the next section, we recall the basic results from our framework.
4 Theoretical foundations
For space limitation reasons, only key definitions and results that help the un-
derstanding of the more topical developments are provided in this section.
First, a set of mappings is given linking the lattices L and L+.⁵ The mapping
σ sends a concept from L to the concept in L+ with the same intent whereas
γ works the other way round, but respects extent preservation (modulo o). The
mappings χ and χ+ send a concept in L to the maximal element of its class [ ]_Q
in L and L+, respectively.
Definition 1 Assume the following mappings:
– γ : C+ → C with γ(X, Y) = (X_1, X_1′), where X_1 = X − {o},
– σ : C → C+ with σ(X, Y) = (Y′, Y), where Y′ is computed in K+,
– χ : C → C with χ(X, Y) = (Y_1′, Y_1′′), where Y_1 = Y ∩ {o}′,
– χ+ : C → C+ with χ+(X, Y) = (Y_1′, Y_1), where Y_1 = Y ∩ {o}′ (′ over K+).
The above mappings are depicted in Figure 3. Observe that σ is a join-preserving
order embedding, whereas γ is a meet-preserving function with γ ◦ σ = id_C.
Moreover, both mappings underlie the necessary definitions (skipped here) for
the sets G(o) and M(o) in L and their counterparts G+(o) and M+(o) in L+
to replace the intuitive descriptions we used so far.
⁵ In the following, the correspondence operator ′ is computed in the respective context
of the application co-domain (i.e. K or K+).
[Figure 3: diagram of the sets G(o) and M(o) in L and of G+(o), N(o) and M+(o) in L+, linked by the mappings γ, σ, χ and χ+]
Fig. 3. The lattices L, L+ and 2^A related by the mappings χ, χ+, σ, γ and Q.
A first key result states that G(o) and M(o) are exactly the maximal concepts
in the equivalence classes induced by the function Q : C → 2^A defined as Q(c) =
Y ∩ {o}′ where c = (X, Y). Moreover, the suborder of L made up of G(o) and
M(o) is isomorphic, via χ+, to ↑ν(o), i.e., the prime filter of L+ generated by
the minimal concept including o. Consequently, (G(o) ∪ M(o), ≤) is a meet-semilattice.
Finally, the precedence order in L+ evolves from the precedence in L as
follows. Given a new concept c, its generator γ(c) (more precisely, its image
σ(γ(c)) in L+) is a lower cover of c while the possible other lower covers of c
(Cov^l(c)) lie in N+(o). The upper covers of
c are the concepts from M+(o) ∪ N+(o), that correspond, via σ, to the upper
covers of the generator γ(c) in the semi-lattice (G(o) ∪ M(o), ≤). The latter set
may be extracted from the set of actual upper covers of γ(c) in L, Cov^u(γ(c)),
by considering the maxima of their respective classes for Q, i.e., the values of χ
on Cov^u(γ(c)), and keeping only the minimal ones among those values. With a
modified concept c in M+(o), its lower covers in L+ differ from the lower covers
of γ(c) in L by (i) the (possible) inclusion of concepts from N+(o), and (ii) the
removal of all members of G+(o). These facts are summarized as follows:
Property 1 The relation ≺+ is obtained from ≺ as follows:
≺+ = {(σ(γ(c)), c) | c ∈ N+(o)}
   ∪ {(c, c̄) | c ∈ N+(o), c̄ ∈ Min({χ(ĉ) | γ(c) ≺ ĉ})}
   ∪ {(c1, c2) | (γ(c1), γ(c2)) ∈ (≺ − G(o) × M(o))}
5 A generic scheme for incremental lattice construction
The structural results from the previous paragraphs underlie a generic procedure
that, given an object o, transforms L into L+.
5.1 Principles of the method
A generic procedure solving Compute-Lattice-Inc may be sketched out of the
following main tasks: (i) partition of the concepts in L into classes (by computing
intent intersections), (ii) detection of maxima for every class [ ]_Q and test
of its status, i.e., modified or generator, (iii) update of modified concepts, (iv)
creation of new elements and computation of their intent and extent, (v) com-
putation of lower and upper covers for each new element, and (vi) elimination
of obsolete links for each generator. These tasks, when executed in the previ-
ously indicated order, complete a data structure representing the lattice L into
a structure representing L+as shown in Algorithm 1 hereafter.
procedure Compute-Lattice-Inc(In/Out: L = ⟨C, ≤⟩ a lattice; In: o an object)
  for all c in C do
    Put c in its class in L/Q w.r.t. Q(c)
  for all [ ]_Q in L/Q do
    Find c = max([ ]_Q)
    if Intent(c) ⊆ o′ then
      Put c in M(o)
    else
      Put c in G(o)
  for all c in M(o) do
    Extent(c) ← Extent(c) ∪ {o}
  for all c in G(o) do
    ĉ ← New-Concept(Extent(c) ∪ {o}, Q(c))
    Put ĉ in N(o)
  for all ĉ in N(o) do
    Connect ĉ as an upper cover of its generator c
    Compute-Upper-Covers(ĉ, c)
  for all c in G(o) do
    for all c̄ in Cov^u(c) ∩ M(o) do
      Disconnect c and c̄
Algorithm 1: Generic scheme for the insertion of a new object into a concept
(Galois) lattice.
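Tasks (i) to (iv) of the scheme can be sketched in Python on a plain list of concepts. This is a simplified illustration, not the authors' implementation: it ignores the cover relation (tasks (v) and (vi)), the function name `add_object` is hypothetical, and it assumes the input list contains the bottom concept whose intent is the whole attribute set A.

```python
def add_object(concepts, o, o_intent):
    """Return the concept list of L+ from that of L (tasks (i) to (iv) only)."""
    o_intent = frozenset(o_intent)
    classes = {}
    for extent, intent in concepts:                      # (i) partition by Q(c)
        classes.setdefault(intent & o_intent, []).append((extent, intent))
    result = []
    for q, members in classes.items():
        top = max(members, key=lambda c: len(c[0]))      # (ii) class maximum
        for extent, intent in members:
            if (extent, intent) == top and intent <= o_intent:
                result.append((extent | {o}, intent))    # (iii) modified concept
            else:
                result.append((extent, intent))          # old concept or generator
        if not top[1] <= o_intent:
            result.append((top[0] | {o}, q))             # (iv) new concept
    return result


def build(rows, attributes):
    """Iterate add_object over all rows, starting from the bottom concept."""
    concepts = [(frozenset(), frozenset(attributes))]
    for o, row in rows.items():
        concepts = add_object(concepts, o, row)
    return concepts


# Hypothetical toy context (NOT the table K of Figure 1).
LATTICE = build({1: "ab", 2: "ac", 3: "abc", 4: "abd"}, "abcd")
```

The `build` helper mirrors the reduction of Compute-Lattice to repeated calls of Compute-Lattice-Inc mentioned in Section 3.2.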
The above procedure is an algorithmic scheme that generalizes the existing
incremental algorithms in the sense of specifying the full scope of the work to
be done and the order of the tasks to be carried out. However, the exact way
a particular algorithm might instantiate the scheme deserves a further clarifi-
cation. On one hand, some of the tasks might remain implicit in a particular
method. Thus, the task (i) is not explicitly described in most of the methods
from the literature, except in some recent work on lattice-based association rule
mining [13, 12]. However, all incremental methods do compute the values of the
function Q for every concept in L, as a preliminary step in the detection of class
maxima. On the other hand, there is a large space for combining subtasks into
larger steps, as major existing algorithms actually do. For example, the algo-
rithms in [8, 3] perform all the sub-tasks simultaneously, whereas Algorithm 7
in [13] separates the problem into two stages: tasks (i−iii) are first carried out,
followed by tasks (iv−vi). In the next paragraphs, we discuss various realizations
of the above subtasks.
5.2 Partitioning of C into classes [ ]_Q
All incremental algorithms explore the lattice, most of the time in a top-down
breadth-first traversal of the lattice graph. Classes are usually not directly ma-
nipulated. Instead, at each lattice node, the status of the corresponding concept
within its class is considered. Classes are explicitly considered in the methods
described in [13, 12], which, although designed for a simpler problem, i.e., update
of (C^a, ⊆) and C^a, respectively, can be easily extended to first-class methods
for Compute-Lattice-Inc. Both methods apply advanced techniques in order
to avoid the traversal of the entire lattice when looking for class maxima. The
method in [13] skips the entire class induced by the empty intersection, i.e.,
Q^{−1}(∅). Except for small and very dense contexts where it can even be void,
Q^{−1}(∅) is by far the largest class, and skipping it should result in substantial
performance gains. An alternative strategy consists in exploiting class convexity
(see Property 2 below) in order to only partially examine each class [12]. For
this purpose, a bottom-up (partial) traversal of the lattice is implemented: when-
ever a non-maximal member of a class is examined, the method “jumps” straight
to the maximum of that class.
5.3 Detection of class maxima
Top-down breadth-first traversal of the lattice eases the direct computation of
each class maximum, i.e., without constructing the class explicitly. The whole
traversal may be summarized as a gradual computation of the function Q. Thus,
it is enough to detect each concept c that produces a particular intersection Int =
Intent(c) ∩ o′ for the first time. For this task, the method of Godin et al. relies
on a global memory for intersections that have already been met. This approach
could be efficiently implemented with a trie structure which helps speed up the
lookups for a particular intersection (see Algorithms 3 and 4 in [13]). However, we
suggest here another technique, based exclusively on locally available information
about a lattice node. The technique takes advantage of the convexity of the
classes [ ]_Q:
Property 2 All classes [ ]_Q in L are convex sets:
∀ c, c̄, c̲ ∈ C, c̲ ≤ c ≤ c̄ and [c̄]_Q = [c̲]_Q ⇒ [c̄]_Q = [c]_Q.
In short, for a non-maximal element c, there is always an upper cover of c, say c̄,
which is in [c]_Q. Thus, the status of c in [c]_Q can be established by only looking
at its upper covers. Moreover, as Q is a monotone function (c ≤ c̄ entails
Q(c̄) ⊆ Q(c)), the equality Q(c̄) = Q(c) can be tested by comparing set sizes alone.
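The local test suggested by Property 2 fits in a few lines. The sketch below assumes intents are given as collections of attribute names and that the upper covers of the concept are known; `is_class_maximum` is a hypothetical helper name.

```python
def is_class_maximum(intent, upper_cover_intents, o_intent):
    """True iff no upper cover yields the same intersection with o'.

    Since Q(upper) is always included in Q(c), comparing the sizes of the two
    intersections is enough to decide whether they are equal.
    """
    o_intent = frozenset(o_intent)
    q_size = len(frozenset(intent) & o_intent)
    return all(len(frozenset(up) & o_intent) != q_size
               for up in upper_cover_intents)
```

For instance, with o′ = abd, a concept of intent abc whose upper covers have intents ab and ac is not a class maximum, because the cover with intent ab produces the same intersection ab.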
5.4 Computation of the upper covers of a new concept
Given a generator c, “connecting” the new concept ĉ = χ+(c) in the lattice
requires the upper and lower covers of ĉ. A top-down breadth-first traversal of
L allows the focus to be limited to upper covers while the work on lower covers
is done for free. Moreover, at the time ĉ is created, all its upper covers in
L+ are already processed so they are available for lookup and link creation.
In [8], a straightforward technique for upper cover computation is presented
which amounts to looking for all successors of c that are not preceded by another
successor. A more sophisticated technique as in [10] uses a property of
the set difference between extents of two concepts (sometimes called the face
between the concepts in the literature). The property states that a concept c
precedes another concept c̄ in the lattice, iff for any object ō in the set difference
Extent(c̄) − Extent(c), the closure of the set {ō} ∪ Extent(c) is Extent(c̄):
Property 3 For any c = (X, Y), c̄ = (X̄, Ȳ) ∈ L, c ≺ c̄ iff X̄ − X = {ō ∈
O | ({ō} ∪ X)′′ = X̄}.
This is easily checked through intersections of concept intents and a subsequent
comparison of set cardinalities. To detect all upper covers of a concept c =
(X, Y), one needs to check the closures of {ō} ∪ X for every ō ∈ O − X and
select successors of c that satisfy the above property. This leads to a complexity
of k(k + m) per concept, where k comes from the factor O − X and m is the cost
of set-theoretic operations on intents.
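Property 3 suggests the following procedure for the upper covers of a single concept. The sketch works on extents only and on a hypothetical toy context (not the table K of Figure 1); the names `closure` and `upper_covers` are assumptions, and no attempt is made to reach the k(k + m) bound.

```python
# Hypothetical toy context (NOT the table K of Figure 1).
CONTEXT = {1: {"a", "b"}, 2: {"a", "c"}, 3: {"a", "b", "c"}}
ATTRIBUTES = frozenset("abcd")


def closure(objects):
    """X -> X'': objects owning all attributes common to X."""
    intent = set(ATTRIBUTES)
    for o in objects:
        intent &= CONTEXT[o]
    return frozenset(o for o, row in CONTEXT.items() if intent <= row)


def upper_covers(extent):
    """Extents of the upper covers of the concept whose extent is `extent`."""
    extent = frozenset(extent)
    producers = {}
    for o in CONTEXT.keys() - extent:
        # Group the outside objects by the closure they generate with X.
        producers.setdefault(closure(extent | {o}), set()).add(o)
    # Property 3: a closure is an upper cover iff its face equals its producers.
    return {closed for closed, objs in producers.items()
            if closed - extent == objs}
```

Starting from the empty extent, objects 1 and 2 generate the closures 13 and 23, whose faces also contain object 3; only the closure {3} passes the test.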
To further cut the complexity of the task, we suggest a method that should
at least improve the practical performance. It can be summarized as follows
(see [14] for details). First, instead of considering all the potential successors
of a new concept c, we select a subset of them, Candidates = {χ+(c̄) | c̄ ∈
Cov^u(γ(c))}, i.e., the images by χ+ of all upper covers of the generator γ(c).
Candidates is a (not necessarily strict) subset of ↑c − {c}, whereby the convexity
of the classes [ ]_Q and the monotonicity of Q ensure the inclusion of all
upper covers Cov^u(c) = min(↑c − {c}) in the former set. Since the concepts
in Cov^u(c) coincide with the minima of Candidates, the former set can be computed
through a direct application of a basic property of formal concepts stating
that extent faces between c and the members of Cov^u(c) are pairwise disjoint.
Property 4 For any c = (X, Y) ∈ L, and c̄1 = (X̄1, Ȳ1), c̄2 = (X̄2, Ȳ2) ∈
Cov^u(c), X̄1 ∩ X̄2 = X.
For any ĉ = (X̂, Ŷ) from Candidates − Cov^u(c) there is an upper cover c̄ =
(X̄, Ȳ) such that c̄ ≤ ĉ whence X̂ ∩ X̄ = X̄ ⊇ X, where X is the extent of
c. The elements of Candidates − Cov^u(c) can therefore be filtered by a set of
inclusion tests on Candidates. To do this efficiently and avoid testing of all
possible couples, a buffer of attributes can be used to cumulate all the faces of
valid upper covers of c that are met so far. Provided that candidates are listed
in an order compatible with ≤ (so that smaller candidates are met before larger
ones), a simple intersection with the buffer is enough to test whether a candidate
is an upper cover or not. The above filtering strategy eliminates non-minimal
candidates while also discarding copies of the same concept (as several upper
covers of c may belong to the same class). Finally, the computation of χ+, which
is essential for the upward detection of class maxima, is straightforward: while
modified concepts in L take their own σ values for χ+ (same intent), generators
take the respective new concept, and unchanged concepts simply “inherit” the
appropriate value from an upper cover that belongs to the same class [ ]_Q.
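The buffer-based filtering of Candidates can be sketched as follows. The function name `min_closed` echoes the primitive used later in Algorithm 2, but the code is only an illustration under assumptions: candidates are given by their extents, all of them are strictly above the concept, and sorting by extent size stands in for "an order compatible with ≤".

```python
def min_closed(extent, candidate_extents):
    """Keep the candidates that are upper covers of the concept with `extent`.

    The buffer accumulates the faces (extent differences) of the covers
    accepted so far; by Property 4 a genuine cover has a face disjoint from it.
    """
    extent = frozenset(extent)
    covers, buffer = [], set()
    for cand in sorted(map(frozenset, candidate_extents), key=len):
        face = cand - extent
        if face and buffer.isdisjoint(face):
            covers.append(cand)
            buffer |= face
    return covers


# Extents taken from a hypothetical six-concept lattice; {3} is listed twice
# to mimic several upper covers of the generator falling in the same class.
CANDIDATES = [{1, 3, 4}, {3}, {4}, {2, 3}, {3}, {1, 2, 3, 4}]
COVERS = min_closed((), CANDIDATES)
```

The duplicate of {3} is rejected by the same intersection test that rejects the non-minimal candidates, so no separate deduplication step is needed.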
To assess the cost of the operation, one may observe that |Cov^u(γ(c))| operations
are needed, which is at most d(L), i.e., the (outer) degree of the lattice
taken as an oriented graph. Moreover, the operations of extent intersection and
union, with ordered sets of objects in concept extents, take linear time in the
size of the arguments, i.e., no more than k = |O|. Only a fixed number of such
operations are executed per member of Candidates, so the total cost is in the
order of O(k d(L)). Although the complexity order remains comparable to O(k²),
the factor d(L) will be most of the time strictly smaller than k, and, in sparse
datasets, the difference could be significant.
5.5 Obsolete link elimination
Any modified ĉ which is an immediate successor of a generator c̄ in L should
be disconnected from c̄ in L+ since χ+(ĉ) is necessarily an upper cover of the
corresponding new element c = χ+(c̄):
Property 5 For any c̄ ∈ G(o), ĉ ∈ M(o) : c̄ ≺ ĉ ⇒ ĉ ∈ min({χ+(ĉ) | ĉ ∈
Cov^u(c̄)}).
As the set Cov^u(c̄) is required in the computation of Cov^u(c), there is no additional
cost in eliminating ĉ from the list of the upper covers of c̄. This is done
during the computation of Candidates. Conversely, deleting c̄ from the list of
the lower covers of ĉ (if such list is used), is done free of extra effort, i.e., by
replacing c̄ with c = χ+(c̄).
6 An efficient instantiation of the scheme
The algorithm takes a lattice and a new object⁶ and outputs the updated lattice
using the same data structure L to represent both the initial and the resulting
lattices. The values of Q and χ+ are supposed to be stored in a generic structure
allowing indexing on concept identifiers (structure ChiPlus).
First, the concept set is sorted to a linear extension of the order ≤ required
for the top-down traversal of L (primitive Sort on line 3). The overall loop
(lines 4 to 20) examines every concept c in L and establishes its status in [c]_Q
by comparing |Q(c)| to the maximal |Q(c̄)| where c̄ is an upper cover of c (line
6). To this end, the variable new-max is used. Initialized with the upper cover
maximizing |Q| (line 5), new-max eventually points to the concept in L+ whose
intent equals Q(c), i.e., χ+(c). Class maxima are further divided into modified
and generators (line 7). A modified concept c (lines 8 to 10) has its extent
updated. Then, such a c is set as its own value for χ+, χ+(c) = c (via new-max).
Generators, first, give rise to a new concept (line 12). Then, the values of χ+ for
their upper covers are picked up (in the Candidates list, line 13) to be further
filtered for minimal concepts (Min-Closed, line 14). Minima are connected to
the new concept and those of them which are modified in L are disconnected
from the generator c (lines 15 to 17). Finally, the correct maximum of the class
[c]_Q in L+, i.e., χ+(c) is set (line 18) and the new concept is added to the lattice
(line 19). At the end of the loop, the value of χ+ is stored for further use.
⁶ The set A is assumed to be known from the beginning, i.e., {o}′ ⊆ A.
 1: procedure Add-Object(In/Out: L = ⟨C, ≤⟩ a lattice; In: o an object)
 2:
 3: Sort(C)
 4: for all c in C do
 5:     new-max ← argmax({|Q(c̄)| | c̄ ∈ Cov^u(c)})
 6:     if |Q(c)| ≠ |Q(new-max)| then
 7:         if |Q(c)| = |Intent(c)| then                  {c is modified}
 8:             Extent(c) ← Extent(c) ∪ {o}
 9:             M(o) ← M(o) ∪ {c}
10:             new-max ← c
11:         else                                          {c is generator}
12:             ĉ ← New-Concept(Extent(c) ∪ {o}, Q(c))
13:             Candidates ← {ChiPlus(c̄) | c̄ ∈ Cov^u(c)}
14:             for all c̄ in Min-Closed(Candidates) do
15:                 New-Link(ĉ, c̄)
16:                 if c̄ ∈ M(o) then
17:                     Drop-Link(c, c̄)
18:             new-max ← ĉ
19:             L ← L ∪ {ĉ}
20:     ChiPlus(c) ← new-max
Algorithm 2: Insertion of a new object into a Galois lattice.
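The procedure above can be sketched in Python as follows. This is only an illustrative reading of Algorithm 2, not the authors' implementation: the `Concept` class, the `link`/`unlink` helpers and the `chi_plus` dictionary are hypothetical names, the attribute set of the new object is passed explicitly as `o_attrs`, and the link from a generator to its new concept (which the pseudocode does not show and which is assumed here to be part of New-Concept) is created explicitly. A concept without upper covers, i.e., the top, is always treated as a class maximum.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity-based hashing, so concepts can be stored in sets
class Concept:
    extent: set
    intent: frozenset
    parents: set = field(default_factory=set, repr=False)   # upper covers
    children: set = field(default_factory=set, repr=False)  # lower covers

def link(lower, upper):
    lower.parents.add(upper)
    upper.children.add(lower)

def unlink(lower, upper):
    lower.parents.discard(upper)
    upper.children.discard(lower)

def add_object(concepts, o, o_attrs):
    """Insert object o (with attributes o_attrs) into the lattice, in place."""
    o_attrs = frozenset(o_attrs)
    q = lambda c: c.intent & o_attrs                 # Q(c) = Intent(c) ∩ {o}'
    chi_plus, modified, created = {}, set(), []
    # Line 3: increasing intent size is a linear extension for a top-down pass.
    for c in sorted(concepts, key=lambda c: len(c.intent)):
        covers = list(c.parents)                     # upper covers of c in L
        best = max(covers, key=lambda p: len(q(p)), default=None)   # line 5
        if best is not None and len(q(best)) == len(q(c)):
            chi_plus[c] = chi_plus[best]             # old concept: inherit χ+
        elif q(c) == c.intent:                       # lines 8-10: modified
            c.extent.add(o)
            modified.add(c)
            chi_plus[c] = c
        else:                                        # lines 12-19: generator
            new = Concept(c.extent | {o}, q(c))
            candidates = {chi_plus[p] for p in covers}
            for m in candidates:
                # Minimal for ≤ means: no other candidate has a larger intent.
                if not any(m.intent < n.intent for n in candidates):
                    link(new, m)
                    if m in modified:
                        unlink(c, m)
            link(c, new)                             # assumption, see lead-in
            created.append(new)
            chi_plus[c] = new
    concepts.extend(created)
    return chi_plus
```

Calling `add_object(lattice, o, attrs)` on a list of linked `Concept` instances updates extents and covers in place, appends the new concepts to the list, and returns the χ+ mapping for later use.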
Example 2. Consider the same situation as in Example 1. The trace of the
algorithm is given in the following table, which provides the intent intersection
and the χ+ image for each concept (the χ+ column lists concepts of L+, which
should not be confused with their counterparts in L).
c       Q(c)     χ+(c)    Cat.
c#1     ∅        c#1      mod.
c#2     c        c#2      mod.
c#3     d        c#3      mod.
c#4     g        c#4      mod.
c#5     h        c#5      mod.
c#6     c        c#2      old
c#7     cd       c#14     gen.
c#8     dgh      c#8      mod.
c#9     fh       c#15     gen.
c#10    cd       c#14     old
c#11    cdgh     c#16     gen.
c#12    dfgh     c#17     gen.
c#13    cdfgh    c#18     gen.
To illustrate the way our algorithm proceeds, consider the processing of concept
c#12 = (3, defgh). The value of Q(c#12) is dfgh whereas Candidates contains
the images by χ+ of the upper covers of c#12, i.e., c#8 and c#9: Candidates =
{c#8 = (139, dgh), c#15 = (359, fh)}. Obviously, neither of the intents is as big
as Q(c#12), so c#12 is a maximum, more precisely a generator. The new concept,
c#17, is (39, dfgh) and its upper covers are both concepts in Candidates (since
both are incomparable). Finally, as c#8 is in M(o), its link to c#12 is removed.
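The filtering step of this example can be replayed with a few lines of Python. This is a minimal sketch in which concepts are plain (extent, intent) pairs and `min_closed` is a hypothetical stand-in for the Min-Closed primitive: it keeps the candidates that are minimal for ≤, i.e., those whose intent is not strictly contained in the intent of another candidate.

```python
def min_closed(candidates):
    """Keep the candidates that are minimal in the lattice order."""
    return [c for c in candidates
            if not any(c[1] < other[1] for other in candidates)]

# Candidates for c#12 in Example 2, written as (extent, intent) pairs.
c8 = (frozenset({1, 3, 9}), frozenset("dgh"))
c15 = (frozenset({3, 5, 9}), frozenset("fh"))

# The intents dgh and fh are incomparable, so both candidates are kept.
upper_covers = min_closed([c8, c15])
```

Since neither intent contains the other, both c#8 and c#15 pass the filter, which matches the statement that both become upper covers of c#17.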
6.1 Complexity issues
Let ∆(l) = |C+| − |C| and let us split the cost of a single object addition into two
factors: the cost of the traversal of L (lines 3−7 and 20 of Algorithm 2) and the
cost of the restructuring of L, i.e., the processing of class maxima (lines 8−19).
First, as sorting concepts to a linear extension of ≤ only requires comparison of
intent sizes, which are bounded by m, it can be done in O(l). Moreover, the
traversal proper takes O(l) concept examinations, each of which is in O(k + m).
Thus, the first factor is in O(l(k + m)). The second factor is further split into
modified and generator costs, whereby the first cost is linear in the size of M(o)
(since lines 8−10 may be executed in constant time even with sorted extents) and
can therefore be ignored. The generator-related cost has a factor ∆(l), whereas
the remaining factor is the cost of creating and properly connecting a single new
concept. The dominant component of the latter is the cost of the lattice order
update (lines 14−17), which is in O(k^2) as we mentioned earlier. Consequently,
the global restructuring overhead is in O(∆(l)k^2). This leads to a worst-case
complexity of O(∆(l)k^2 + l(k + m)) for a single insertion, which is an upper bound
for the complexity of Compute-Lattice-Inc (see also [11]).
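The linear-time sorting argument can be illustrated by a bucket sort on intent sizes. The sketch below is an assumption about how the Sort primitive might be realized, not a description of the authors' code: concepts are (extent, intent) pairs, `sort_top_down` is a hypothetical name, and m is the number of attributes.

```python
def sort_top_down(concepts, m):
    """Order concepts by increasing intent size using m + 1 buckets.

    If c < c' in the lattice, the intent of c strictly contains that of c',
    so increasing intent size yields a linear extension suited to a
    top-down traversal. The cost is O(l + m) for l concepts.
    """
    buckets = [[] for _ in range(m + 1)]
    for concept in concepts:
        buckets[len(concept[1])].append(concept)
    return [concept for bucket in buckets for concept in bucket]
```

No pairwise comparison of concepts is needed, which is why the sorting step does not dominate the O(l(k + m)) traversal cost.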
The assessment of the entire lattice construction via incremental updates is
delicate since it requires summing over all k insertions, whereas the cost of steps 1
to k − 1 depends on parameters of the intermediate structures. Once again, we
sum the above high-level complexity factors separately. Thus, the total cost
of the k lattice traversals is bounded by k times the cost of the most expensive
traversal (the last one), i.e., it is in O(kl(k + m)). The total cost of lattice
restructuring is in turn bounded by the number of all new concepts (the sum of
∆(l_i)) times the maximal cost of processing a new concept. The first factor is
exactly l = |C+| since each concept in the final lattice is created exactly once,
which means the restructuring factor of the construction is in O(l(k + m)k),
thus leading to a global complexity in the same class, O(l(k + m)k). The above
figures indicate that the complexity of Compute-Lattice, whenever reduced
to a series of Compute-Lattice-Inc, remains in the same class as the best
known upper bound for batch methods [10].
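The summation above can be written out as follows. The notation is an assumption introduced for illustration: l_i denotes the number of concepts of the lattice before the i-th insertion, and the T symbols name the two cost factors; the per-concept restructuring cost O(k^2) is the one used for a single insertion.

```latex
\begin{align*}
T_{\mathrm{traversal}}
  &\le \sum_{i=1}^{k} O\bigl(l_i\,(k + m)\bigr)
   \le k \cdot O\bigl(l\,(k + m)\bigr)
   = O\bigl(k\,l\,(k + m)\bigr), \\
T_{\mathrm{restructuring}}
  &\le \Bigl(\sum_{i=1}^{k} \Delta(l_i)\Bigr) \cdot O(k^2)
   \le l \cdot O(k^2)
   \subseteq O\bigl(l\,(k + m)\,k\bigr), \\
T_{\mathrm{total}}
  &= T_{\mathrm{traversal}} + T_{\mathrm{restructuring}}
   \in O\bigl(l\,(k + m)\,k\bigr).
\end{align*}
```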
7 Conclusion
The present study is motivated by the need for both efficient and
theoretically-grounded algorithms for incremental lattice construction. In this
paper, we complete our own characterization of the substructure that should be
integrated into the initial lattice upon each insertion of an object/attribute
into the context. Moreover, we show how the relevant structural properties
support the design of effective maintenance methods which, unlike previous
algorithms, avoid redundant computations. As guidelines for such design, we
provide a generic algorithmic scheme that states the limits of the minimal work
that needs to be done in the restructuring. A concrete method that instantiates
the scheme is proposed whose worst-case complexity is O(ml + ∆(l)k^2), i.e., a
function which puts a new and smaller upper bound on the cost of the problem
Compute-Lattice-Inc. Surprisingly enough, when applied as a batch method for
lattice construction, the new algorithm shows the best known theoretical
complexity, O((k + m)lk), which is only achieved by one algorithm. As a next
stage of our study, we are currently examining the pragmatic benefits of the
scheme, i.e., the practical performance of specific scheme instantiations.
References
[1] M. Barbut and B. Monjardet. Ordre et Classification: Algèbre et combinatoire.
Hachette, 1970.
[2] J.-P. Bordat. Calcul pratique du treillis de Galois d’une correspondance.
Mathématiques et Sciences Humaines, 96:31–47, 1986.
[3] C. Carpineto and G. Romano. A Lattice Conceptual Clustering System and Its
Application to Browsing Retrieval. Machine Learning, 24(2):95–122, 1996.
[4] B. Ganter. Two basic algorithms in concept analysis. preprint 831, Technische
Hochschule, Darmstadt, 1984.
[5] B. Ganter and R. Wille. Formal Concept Analysis, Mathematical Foundations.
Springer-Verlag, 1999.
[6] R. Godin and H. Mili. Building and maintaining analysis-level class hierarchies
using Galois lattices. In Proceedings of OOPSLA’93, Washington (DC), USA,
special issue of ACM SIGPLAN Notices, 28(10), pages 394–410, 1993.
[7] R. Godin and R. Missaoui. An Incremental Concept Formation Approach for
Learning from Databases. Theoretical Computer Science, 133:387–419, 1994.
[8] R. Godin, R. Missaoui, and H. Alaoui. Incremental concept formation algorithms
based on Galois (concept) lattices. Computational Intelligence, 11(2):246–267,
1995.
[9] S. Kuznetsov and S. Ob’edkov. Algorithms for the Construction of the Set of
All Concepts and Their Line Diagram. preprint MATH-AL-05-2000, Technische
Universität, Dresden, June 2000.
[10] L. Nourine and O. Raynaud. A Fast Algorithm for Building Lattices. Information
Processing Letters, 71:199–204, 1999.
[11] P. Valtchev and R. Missaoui. Building concept (Galois) lattices from parts:
generalizing the incremental methods. In H. Delugach and G. Stumme, editors,
Proceedings, ICCS-01, volume 2120 of Lecture Notes in Computer Science, pages
290–303, Stanford (CA), USA, 2001. Springer-Verlag.
[12] P. Valtchev and R. Missaoui. A Framework for Incremental Generation of Frequent
Closed Itemsets. Discrete Applied Mathematics, submitted.
[13] P. Valtchev, R. Missaoui, R. Godin, and M. Meridji. Generating Frequent Itemsets
Incrementally: Two Novel Approaches Based On Galois Lattice Theory. Journal
of Experimental & Theoretical Artificial Intelligence, 14(2-3):115–142, 2002.
[14] P. Valtchev, R. Missaoui, and P. Lebrun. A partition-based approach towards
building Galois (concept) lattices. Discrete Mathematics, 256(3):801–829, 2002.