A Fast Algorithm for Finding Matching Responses in a Survey Data Table
Mathematical Social Sciences 30 (1995) 195-205
North-Holland, Elsevier. We alert readers to their obligations with respect to copyrighted material.
A fast algorithm for finding matching responses in a
survey data table *
Joseph E. Mullat
Byvej 269, 2650 Hvidovre, Copenhagen, Denmark **
Abstract
The paper presents an algorithm for analysing survey data tables that contain some unreliable entries. The algorithm has almost linear complexity in the number of elements in the table. The proposed technique is based on a monotonicity property. The implementation procedure of the algorithm contains a recommendation that may help clarify the analysis results.
Keywords: Survey; Boolean; Data Table; Matrix.
1. Introduction
Situations in which the customer responses being studied are measured by means of survey data arise in market investigations. They present problems for producing long-term forecasts, because the traditional methods based on counting the matching responses in a survey over a large customer population are hampered by unreliable human nature in the answering and recording process. Analysis institutes make considerable and expensive efforts to overcome this uncertainty by using different questioning techniques, including private interviews, special arrangements, logical tests, "random" data collection, preparatory spot tests of the questionnaire scheme, etc. However, the percentages of responses representing the statistical parameters rely on misleading human nature and not on a normal distribution. It thereby appears impossible to exploit even the simplest null-hypothesis technique, because the distributions of similar answers are unknown. The solution developed in this paper to overcome the hesitation effect, and sometimes unwillingness, of the respondent rests on the idea of searching for so-called "agreement lists" of different questions. In an agreement list, a significant number of respondents do not hesitate in choosing identical answer options, thereby expressing their willingness to answer. These respondents and the agreement lists are classified into two-dimensional lists, the "highly reliable blocks".
* The idea, explained also in http://www.datalaundering.com/download/cleaning.pdf, appears to be clear for those indifferent to a higher level of abstraction.
** Residence in Denmark since 1980; Ph.D. in computer science; assoc. prof., economics division, Tallinn Technical University, Estonia (1979-1980).
For survey analysts with different levels of research experience, for people mostly interested in obtaining results by their own methods, or merely for those who are familiar with only one, "the best survey analysis technique", our approach has some advantages. Indeed, in the survey, data are collected in such a way that they can be regarded as respondents answering a series of questions. A specific answer is an option such as displeased, satisfied, well contented, etc. Suppose that all respondents participating in the survey have been interviewed using the same questionnaire scheme. The resulting survey data can then be arranged in a table $X = \langle x_i^q \rangle$, where $x_i^q$ is a Boolean vector of the options available while respondent i is answering question q. In this respect, the primary table X is a collection of Boolean columns, where each column in the collection is filled with Boolean elements from only one particular answer option. Our algorithm will always try to detect some highly reliable blocks in the table X by bringing together similar columns, where only some trustworthy respondents answer identically. Detecting these blocks, we can separate the survey data. Then, we can reconstruct the data back from those blocks into the primary survey data table format $X' = \langle x_i'^q \rangle$, where some "non-matching/doubtful" answers are removed. Such a "data-switch" is not intended to replace the researchers' own methods, but may be used as a complementary "preliminary data filter", a separator. The analysts' conclusions will be more accurate after the data-switch has been done, because each filtered data item is a representative of some "well-known subtables".
Our algorithm in an ordinary form dates back to Mullat (1971). At first glance, the ordinary form seems similar to the greedy heuristic (Edmonds, 1971), but this is not the case. The starting point for the ordinary version of the algorithm is the entire table, from which elements are removed. The greedy heuristic instead starts with the empty set, and elements are added until some stopping criterion is fulfilled. However, the algorithm developed in the present paper is quite different. The key to our paper is that the properties of the algorithm remain unchanged under the current construction. For matching responses in the Boolean table, it has a lower complexity.
The monotone property of the proposed technique, the "monotone systems idea", is a common basis for all theoretical results. It is exactly the same as property (iii) of submodular functions brought up by Nemhauser et al. (1978, p. 269). Nevertheless, the similarity does not itself diminish the fact that we are studying an independent object, while property (iii) of submodular set functions is necessary, but not sufficient.

From the very start, the theoretical apparatus called the "monotone system" has been devoted to the problem of finding some parts in a graph that are more "saturated" than any other part with "small" graphs of the same type (see Mullat, 1976). Later, the graph presentation form was replaced by a Markov chain, where the rows-columns may be split, implementing the proposed technique, into some sequence of submatrices (see Mullat, 1979). There are numerous applications exploiting the monotone systems ideas; see Ojaveer et al. (1975). Many authors have developed a thorough theoretical basis extending the original conception of the algorithm; see Libkin et al. (1990) and Genkin and Muchnik (1993).

The rest of the paper is organized as follows. In Section 2, a reliability criterion will be defined for blocks in the Boolean table B. This criterion guarantees that the shape of the top set of our theoretical construction is a submatrix, a block; see Proposition 1. However, the point of the whole monotone system idea is not limited to our specific criterion as described in Section 2. This idea addresses the question: how to synthesize an analysis model for a data matrix using quite simple rules? In order to obtain a new analysis model, the researcher has only to find a family of $\pi$ functions suitable for the particular data. The shape of the top sets for each particular choice of the family of $\pi$ functions might be different; see the note prior to our formal construction. For practical reasons, especially in order to help the process of interpretation of the analysis results, Section 3 gives some recommendations on how to use the algorithm on the somewhat extended Boolean tables $B^\pm$. Section 4 is devoted to an exposition of the algorithm and its formal mathematical properties, which are not yet utilized widely by other authors.
2. Reliability Criterion
In this Section we deal with the criterion of reliability for blocks in the Boolean tables originating from the survey data. In our case we analyze the Boolean table $B = \langle b_{ij} \rangle$ representing all respondents $\langle 1, \dots, i, \dots, n \rangle$, but including only some columns $\langle 1, \dots, j, \dots, m \rangle$ from the primary survey data table $X = \langle x_i^q \rangle$; see above. The resulting data of each table B can be arranged in an $n \times m$ matrix. Those Boolean tables are then subjected to our algorithm separately, for which reason there is no difference between any subtable in the primary survey data and a Boolean table. A typical example is respondent satisfaction with the services offered, where $b_{ij} = 1$ if respondent i is satisfied with a particular service j level, and $b_{ij} = 0$ if he is unsatisfied. Thus, we analyze any Boolean table of the survey data independently.

Let us find a column j with the most significant frequency F of 1-elements among all columns and throughout all rows in table B. Such rows arrange a $g = 1$ one-column subtable, pointing out only those respondents who prefer the one specific most significant column j. We will treat, however, a more general criterion. We suggest looking at some significant number of respondents, where at least F of them are granting at least g Boolean 1-elements in each single row within the range of a particular number of columns. Those columns arrange what we call an agreement list, $g = 2, 3, \dots$; g is the agreement level.
The problem of how to find such a significant number of respondents, where the F criterion reaches its global maximum, is solved in Section 4. An optimum table $S^*$, which represents the outcome of the search among all "subsets" H in the Boolean table B, is the solution; see Theorem 1. The main result of Theorem 1 ensures that there are at least F positive responses in each column of table $S^*$. No superior subtable can be found where the number of positive responses in each column is greater than F. Beyond that, the agreement level is at least equal to $g = 2, 3, \dots$ in each row belonging to the best subtable $S^*$; g is the number of positive responses within the agreement list represented by the columns of subtable $S^*$. In the case of an agreement level $g = 1$, our algorithm in Section 4 will find only the one column j with the most significant positive frequency F among all columns in table B and throughout all respondents; see above. Needless to say, it is worthless to apply our algorithm in that particular case $g = 1$, but the problem becomes fundamental as soon as $g = 2, 3, \dots$.
Let us look at the problem more closely. The typical attitude of respondent i towards the entire list of options, the columns in table B, can easily be "accumulated" by the total number of positive hits (options selected):

$r_i = \sum_{j=1,\dots,m} b_{ij}$.

Similarly, each column (option) can be measured by means of the entire Boolean table B as

$c_j = \sum_{i=1,\dots,n} b_{ij}$.

It might appear that it should be sufficient to choose the whole table B to solve our problem, provided that $r_i \ge g$, $i = 1, \dots, n$. Nevertheless, let us look throughout the whole table and find the worst case, where the number $c_j$, $j = 1, \dots, m$, reaches its minimum F. Strictly speaking, it does not mean that the whole table B is the best solution, just because some "poor" columns (options with rare responses, i.e. hits) may be removed in order to raise the worst-case criterion F on the remaining columns. On the other hand, it is obvious that while removing "poor" columns, we are going to decrease some $r_i$ numbers, and then it is not clear whether each row still has at least $g = 2, 3, \dots$ positive responses. Trying to proceed further and removing those "poor" rows, we must take into account that some of the $c_j$ numbers decrease and, consequently, the F criterion decreases as well. This leads to the problem of how to find the optimum subtable $S^*$, where the worst-case F criterion reaches its global maximum. The solution is in Section 4.
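The two counters and the cascade effect described above can be made concrete in a few lines of plain Python. This is only an illustration: the 4 x 3 table and the agreement level g = 2 are invented example data.

```python
# Toy Boolean table: 4 respondents x 3 options (invented example data).
B = [[1, 1, 0],
     [1, 1, 0],
     [1, 1, 1],
     [0, 0, 1]]
g = 2  # agreement level

r = [sum(row) for row in B]                    # r_i: positive hits per respondent
c = [sum(B[i][j] for i in range(len(B)))       # c_j: positive hits per option
     for j in range(len(B[0]))]
F = min(c)                                     # worst-case criterion on the whole table

# Removing the "poorest" column (one attaining F) decreases some r_i numbers,
# so a row may drop below the agreement level g and must be removed in turn.
j_poor = c.index(F)
r_after = [r_i - B[i][j_poor] for i, r_i in enumerate(r)]
rows_below_g = [i for i, v in enumerate(r_after) if v < g]
```

Here F = 2 is attained by the last column; deleting it pushes the last respondent below g, which in turn changes some c_j, exactly the cascade that motivates the algorithm of Section 4.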
Finally, we argue that the intuitively well-adapted model of 100% matching 1-blocks is ruled out by any approach trying to qualify the real structure of the survey data. It is well known that the survey data matrices arising from questionnaires are fairly empty. Those matrices contain plenty of small 100% matching 1-blocks, whose individual selection makes no sense. We believe that the local worst-case criterion F top set, found by the algorithm, is a reasonable compromise. Instead of 100% matching 1-blocks, we detect blocks somewhat less than 100% filled with 1-elements, but larger in size.
3. Recommendations
We consider the interpretation of the survey analysis results as an essential part of the research. This Section is designed to give guidance on how to make the interpretation process easier. In each survey data set it is possible to conditionally select two different types of questions: (1) the answer option is a fact, event, happening, issue, etc.; (2) the answer is an opinion, namely displeased, satisfied, well contented, etc.; see above. It does not appear from the answer options of type 1 which of them is positive or negative, whereas type 2 allows us to separate them. The goal behind this splitting of type 2 opinions is to extract from the primary survey data table two Boolean subtables: table $B^+$, which includes type 1 options mixed with the positive options from type 2 questions, and table $B^-$, where type 1 options are mixed together with the negative type 2 options (opinions). It should be noticed that by doing it this way, we are replacing the analysis of the primary survey data by two Boolean tables where each option is represented by one column. Tables $B^+$ and $B^-$ are then subjected to the algorithm separately.
To initiate our procedure, we construct a subtable $K_1^+$ by implementing the algorithm on table $B^+$. Then, we replace subtable $K_1^+$ in $B^+$ by zeros, constructing a restriction of table $B^+$. Next, we implement the algorithm on this restriction and find a subtable $K_2^+$, after which the process of restrictions and subtables sought by the algorithm may be continued. For practical purposes we suggest stopping the extraction with three subtables: $K_1^+$, $K_2^+$ and $K_3^+$. We can use the same procedure on the table $B^-$, extracting subtables $K_1^-$, $K_2^-$ and $K_3^-$.
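The extraction loop just described can be sketched in plain Python. This is a minimal illustration under stated assumptions: the block finder passed in is a stand-in (here the trivial g = 1 finder of Section 2, not the full algorithm of Section 4), and the demo table is invented.

```python
def extract_blocks(T, find_block, k=3):
    """Section 3 procedure sketch: run a block-finding routine, replace the
    found block by zeros (the restriction), and repeat, keeping k blocks."""
    T = [row[:] for row in T]          # work on a copy of the table
    blocks = []
    for _ in range(k):
        rows, cols = find_block(T)
        if not rows or not cols:       # nothing left to extract
            break
        blocks.append((rows, cols))
        for i in rows:                 # zero the block out of the table
            for j in cols:
                T[i][j] = 0
    return blocks

def best_column_block(T):
    """Stand-in finder: the g = 1 case from Section 2, i.e. the single column
    with the highest frequency of 1-elements, with its supporting rows."""
    n, m = len(T), len(T[0])
    sums = [sum(T[i][j] for i in range(n)) for j in range(m)]
    j = max(range(m), key=lambda col: sums[col])
    rows = [i for i in range(n) if T[i][j] == 1]
    return rows, [j]

# Invented toy stand-in for a table B+ (3 respondents x 3 options).
Bp = [[1, 1, 0],
      [1, 0, 1],
      [1, 1, 0]]
blocks = extract_blocks(Bp, best_column_block, k=2)
```

With the full algorithm of Section 4 substituted for the stand-in finder, the same loop produces the subtables $K_1^+$, $K_2^+$, $K_3^+$.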
The number of option columns in the survey Boolean tables $B^\pm$ is quite significant. Even a simple questionnaire scheme might have hundreds of options (the total number of options over all questions). It is difficult, perhaps almost impossible, within a short time to observe those options among thousands of respondents. Unlike the Boolean tables $B^\pm$, the subtables $K_{1,2,3}^\pm$ have reasonable dimensions. This leads to the following interpretation opportunity: the positive options in the $K_{1,2,3}^+$ tables indicate some of the most successful phenomena in the research, while the negative options in $K_{1,2,3}^-$ point in the opposite direction. Moreover, the positive and negative subtables $K_{1,2,3}^\pm$ enable the researcher in a short time to "catch" the "sense" of the relations between the survey options of type 1 and the positive/negative options of type 2. For instance, to observe all Pearson's r correlations a calculator has to perform $O(n \cdot m^2)$ operations, depending on the $n \times m$ table dimension, n rows and m columns. The reasonable dimensions of the subtables $K_{1,2,3}^\pm$ can reduce the amount of calculation drastically. Those subtables (blocks) $K_{1,2,3}^\pm$, which we recommend to select in the next Section as index-function $F(H)$ top sets found via the algorithm, are not embedded and may not have intersections; see Proposition 1. Concerning the interpretation, it is hoped that this simple approach can be of some use to researchers in elaborating their reports with regard to the analysis of results.
4. Definitions and Formal Mathematical Properties of the Algorithm
In this Section, our basic approach is formalized to deal with the analysis of the Boolean $n \times m$ table B, n rows and m columns. Henceforth, the table B will be the Boolean table $B^\pm$ (see above) representing certain option columns in the survey data table. Let us consider the problem of how to find a subtable consisting of a subset S of the rows and columns of the original table B with the properties: (1) that $r_i = \sum_j b_{ij} \ge g$, and (2) the minimum over j of $c_j = \sum_i b_{ij}$ is as large as possible, precisely, the global maximum. The following algorithm solves the problem.
Algorithm.

Step I. To set the initial values.
1i. Set minimum and maximum bounds a, b on the threshold u for the $c_j$ values.

Step A. To find that the next step B produces a nonempty subtable.
1a. Test $u = (a+b)/2$ using step B. If it succeeds, replace a by u. If it fails, replace b by u.
2a. Go to 1a.

Step B. To test whether the minimum over j can be at least u.
1b. Delete all rows whose sums $r_i < g$. This step B fails if all must be deleted; return to step A.
2b. Delete all columns whose sums $c_j \le u$. This step B fails if all must be deleted; return to step A.
3b. Perform step T if none were deleted in 1b and 2b; otherwise go to 1b.

Step T. To test that the global maximum is found.
1t. Among the numbers $c_j$ find the minimum. With this new value as u, test by performing step B. If it succeeds, return to step A. If it fails, final stop.

Step B, performed through the step T, tests correctly whether a submatrix of B can have the row sums at least g and the column sums at least u. Removing row i, we need to perform no more than m operations to recalculate the $c_j$ values; removing column j, we need no more than n operations. We can proceed through 1b no more than n times and through 2b, m times. Thus, the total number of operations in step B is $O(nm)$. The step A tests the step B no more than $\log_2 n$ times. Thus, the total complexity of the algorithm is $O(\log_2 n \times nm)$ operations.
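The stepwise procedure above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the toy table is invented, the row and column sums are recomputed from scratch on each pass (so the $O(\log_2 n \times nm)$ bound is not preserved), and step A's bisection is replaced by directly re-testing the new minimum produced by step T, which reaches the same global maximum F.

```python
def find_best_block(B, g):
    """Find row/column subsets of the Boolean table B in which every remaining
    row has at least g ones (step 1b) and the minimum column sum F is as large
    as possible. Returns (rows, cols, F), or None if no nonempty subtable with
    row sums >= g exists."""
    n, m = len(B), len(B[0])

    def peel(u):
        # Step B: repeatedly delete rows with r_i < g and columns with c_j <= u.
        rows, cols = list(range(n)), list(range(m))
        changed = True
        while changed:
            changed = False
            keep = [i for i in rows if sum(B[i][j] for j in cols) >= g]
            changed |= len(keep) < len(rows)
            rows = keep
            if not rows:
                return None            # step B fails: all rows deleted
            keep = [j for j in cols if sum(B[i][j] for i in rows) > u]
            changed |= len(keep) < len(cols)
            cols = keep
            if not cols:
                return None            # step B fails: all columns deleted
        return rows, cols

    best, res = None, peel(-1)         # start with no column threshold at all
    while res is not None:
        rows, cols = res
        F = min(sum(B[i][j] for i in rows) for j in cols)  # step T: new minimum
        best = (rows, cols, F)
        res = peel(F)                  # can the minimum column sum exceed F?
    return best
```

On the 4 x 3 toy table [[1,1,0],[1,1,0],[1,1,1],[0,0,1]] with agreement level g = 2, the sketch keeps rows 0-2 and columns 0-1 with F = 3, the dense block that the worst-case criterion singles out.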
Note. It is important to keep in mind that the algorithm itself is a particular case of our theoretical construction. As one can see, we are deleting rows and columns together with all their elements, thereby ensuring that the outcome of the algorithm is a submatrix. But, in order to expose the properties of the algorithm, we look at the Boolean elements separately. However, in our particular case of $\pi$ functions it makes no difference. The difference will be evident if we utilize some other family of $\pi$ functions, for instance $\pi = \max(r_i, c_j)$. We may then detect top binary relations, which we call kernels, different from submatrices. It may happen that some kernel includes two blocks, one quite long in the vertical direction and the other in the horizontal. All elements in the empty area between these blocks in some cases cannot be added to the kernel. In general, we cannot guarantee the above low complexity of the algorithm for all families of $\pi$ functions either, but the complexity still remains within reasonable limits.
We now consider the properties of the algorithm in a rigorous mathematical form. Below we use the notation $H \subseteq B$. The notation H contained in B will be understood in the ordinary set-theoretical vocabulary, where the Boolean table B is the set of its Boolean 1-elements. All 0-elements will be dismissed from consideration. Thus, H as a binary relation is also a subset of the binary relation B. However, we shall soon see that the top binary relations, the kernels, from the theoretical point of view are also submatrices for our specific choice of $\pi$ functions. Below, when we refer to an element, we assume that it is a Boolean 1-element.

For an element $\alpha \in B$ in row i and column j we use the similarity index $\pi = c_j$ if $r_i \ge g$ and $\pi = 0$ if $r_i < g$, counting only the Boolean elements belonging to H. The value of $\pi$ depends on each subset $H \subseteq B$, and we may thereby write $\pi \equiv \pi(\alpha, H)$: the set H is called the $\pi$ function parameter. The $\pi$ function values are real numbers, the similarity indices. In Section 2 we have already introduced these indices on the entire table B. Similarity indices, as one can see, may only concurrently increase with the "expansion" and decrease with the "shrinking" of the parameter H. This leads us to the fundamental definition.
Definition 1. Basic monotone property. By a monotone system will be understood a family $\{\pi(\alpha, H) : H \subseteq B\}$ of $\pi$ functions, such that the set H is to be considered as a parameter with the following monotone property: for any two subsets $L \subset G$, representing two particular values of the parameter H, the inequality $\pi(\alpha, L) \le \pi(\alpha, G)$ holds for all elements $\alpha \in B$.
We note that this definition requires the fulfilment of the inequality for all elements $\alpha \in B$. However, in order to prove Theorems 1, 2 and Proposition 1, it is sufficient to demand the fulfilment of the inequality only for elements $\alpha \in L$; the numbers $\pi$ themselves may not even be defined for $\alpha \notin L$. On the other hand, the fulfilment of the inequality is necessary to prove the argument of Theorem 3 and Proposition 2. It is obvious that the similarity indices $\pi = c_j$ comply with the monotone system requirements.
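The compliance of $\pi = c_j$ with Definition 1 can also be checked numerically. The following sketch is only an illustration: the toy relation B, the agreement level g and the randomized subset check are invented for the example.

```python
import random

def pi(alpha, H, g):
    """Similarity index from the text: for a 1-element alpha = (i, j), pi equals
    the column sum c_j counted inside H when the row sum r_i inside H reaches
    the agreement level g, and 0 otherwise."""
    i, j = alpha
    r_i = sum(1 for (a, b) in H if a == i)
    c_j = sum(1 for (a, b) in H if b == j)
    return c_j if r_i >= g else 0

# Empirical check of Definition 1 on random nested subsets L of G of a toy
# binary relation B (set of positions of 1-elements).
random.seed(1)
B = [(i, j) for i in range(5) for j in range(5) if (i + j) % 2 == 0 or i == j]
g = 2
for _ in range(300):
    G = set(random.sample(B, random.randint(2, len(B))))
    L = set(random.sample(sorted(G), random.randint(1, len(G) - 1)))
    # Monotone property restricted to elements of L, as the text allows:
    assert all(pi(a, L, g) <= pi(a, G, g) for a in L)
```

The check passes because both $r_i$ and $c_j$, counted inside H, can only grow when H expands.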
Definition 2. Let $V(H)$ for a nonempty subset $H \subseteq B$ be the subset $V(H) = \{\alpha \in B : \pi(\alpha, H) \ge u^\circ\}$ indicated by means of a given arbitrary threshold $u^\circ$. The nonempty H set $S^\circ$ is called a stable point with reference to the threshold $u^\circ$ if $S^\circ = V(S^\circ)$ and there exists an element $\xi \in S^\circ$ where $\pi(\xi, S^\circ) = u^\circ$. See Mullat (1981, p. 991) for a similar concept.
Definition 3. By a monotone system kernel will be understood a stable set $S^*$ with the maximum possible threshold value $u^* = u_{\max}$.

We will prove later that the very last pass through the step T detects the largest kernel $\Gamma_p = S^*$. Below we are using the set function notation $F(X) = \min_{\alpha \in X} \pi(\alpha, X)$.
Definition 4. An ordered sequence $\alpha_0, \alpha_1, \dots, \alpha_{d-1}$ of distinct elements of the table B, which exhausts the whole table, $d = \sum_{i,j} b_{ij}$, is called a defining sequence if there exists a sequence of sets $\Gamma_0 \supset \Gamma_1 \supset \dots \supset \Gamma_p$ such that:

A. Let the set $H_k = \{\alpha_k, \alpha_{k+1}, \dots, \alpha_{d-1}\}$. The value $\pi_k = \pi(\alpha_k, H_k)$ of an arbitrary element $\alpha_k \in \Gamma_j$, but $\alpha_k \notin \Gamma_{j+1}$, is strictly less than $F(\Gamma_{j+1})$, $j = 0, 1, \dots, p-1$.

B. In the set $\Gamma_p$ there does not exist a proper subset L which satisfies the strict inequality $F(\Gamma_p) < F(L)$.
Definition 5. A subset $D^*$ of the set B is called definable if there exists a defining sequence $\alpha_0, \alpha_1, \dots, \alpha_{d-1}$ such that $\Gamma_p = D^*$.
Theorem 1. For the subset $S^*$ of B to be the largest kernel of the monotone system, i.e. to contain all other kernels, it is necessary and sufficient that this set is definable: $S^* = D^*$. The definable set $D^*$ is unique.

We note that the existence of the largest kernel will be established later by Theorem 3.
Proof.

Necessity. If the set $S^*$ is the largest kernel, let us look at the following sequence of only two sets $\Gamma_0 = B \supset \Gamma_1 = S^*$. Suppose we have found elements $\alpha_0, \alpha_1, \dots, \alpha_k$ in $B \setminus S^*$ such that for each $i = 1, \dots, k$ the value $\pi(\alpha_i, B \setminus \{\alpha_0, \dots, \alpha_{i-1}\})$ is less than $u^\circ = u_{\max}$, and $\{\alpha_0, \dots, \alpha_k\}$ does not exhaust $B \setminus S^*$. Then, some $\alpha_{k+1}$ exists in $(B \setminus S^*) \setminus \{\alpha_0, \dots, \alpha_k\}$ such that $\pi(\alpha_{k+1}, B \setminus \{\alpha_0, \dots, \alpha_k\}) < u^*$. For if not, then the set $B \setminus \{\alpha_0, \dots, \alpha_k\}$ is a kernel larger than $S^*$ with the same value $u^*$. Thus the induction is complete. This gives the ordering with the property A. If the property B failed, then $u^*$ would not be a maximum, contradicting the definition of the kernel. This proves the necessity.
Sufficiency. Note that each time the algorithm (see above) passes the step T, some stable point $S^\circ$ is established as a set $S^\circ = \Gamma_j$, $j = 0, 1, \dots, p-1$, where $u_j = \min_{\alpha \in S^\circ} \pi(\alpha, S^\circ)$. Obviously, these stable points arrange an embedded chain of sets $B = \Gamma_0 \supset \Gamma_1 \supset \dots \supset \Gamma_p = D^*$. Let a set $L \subseteq B$ be the largest kernel. Suppose that L is a proper subset of $D^*$; then by property B, $F(D^*) \ge F(L)$, and so $D^*$ is also a kernel. The set L, as the largest kernel, cannot be a proper subset of $D^*$ and must therefore be equal to $D^*$. Suppose now that L is not a subset of $D^*$. Let $H_s = \{\alpha_s, \alpha_{s+1}, \dots, \alpha_{d-1}\}$ be the smallest set which includes L. The value $\pi(\alpha_s, H_s)$, by our basic monotone property, must be greater than, or at least equal to, $u^*$, since $\alpha_s$ is an element of $H_s$ and it is also an element of the kernel L and $L \subseteq H_s$. By property A this value is strictly less than $F(\Gamma_{j+1})$ for some $j = 0, 1, \dots, p-1$. But that contradicts the maximality of $u^*$. This proves the sufficiency. Moreover, it proves that any largest kernel equals $D^*$, so that it is the unique largest kernel. This concludes the proof. ∎
Proposition 1. The largest kernel is a submatrix of the table B.

Proof. Let $S^*$ be the largest kernel. If we add to $S^*$ any element lying in a row and a column where $S^*$ has existing elements, then the threshold value $u^*$ cannot decrease. So by the maximality of the set $S^*$ this element must already be in $S^*$. ∎
Now we need to focus on the individual properties of the sets $\Gamma_0 \supset \Gamma_1 \supset \dots \supset \Gamma_p$, which have a close relation to the case $u < u_{\max}$, a subject for a separate inquiry. Let us look at the step T of the algorithm originating the series of mappings, initiated from the whole table B, in the form $V(B), V(V(B)), \dots$ with some particular threshold u. We denote $V(V(B))$ by $V^2(B)$, etc.

Definition 6. The chain of sets $B, V(B), V^2(B), \dots$ with some particular threshold u is called the central series of the monotone system; see Mullat (1981) for exactly the same notion.
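Definition 6 lends itself directly to computation. The sketch below iterates the mapping V until the chain stabilizes; the toy relation, the agreement level g = 2 and the threshold u = 3 are invented for the illustration, and $\pi$ is the paper's $c_j$-based index.

```python
def pi(alpha, H, g):
    # Similarity index pi(alpha, H): c_j counted inside H if r_i >= g, else 0.
    i, j = alpha
    r_i = sum(1 for (a, b) in H if a == i)
    c_j = sum(1 for (a, b) in H if b == j)
    return c_j if r_i >= g else 0

def V(H, B, g, u):
    """One mapping step of Definition 2: the elements of B whose similarity
    index, measured on the current parameter set H, reaches the threshold u."""
    return {alpha for alpha in B if pi(alpha, H, g) >= u}

def central_series(B, g, u):
    """Build the chain B, V(B), V^2(B), ... of Definition 6 until it
    stabilizes; the limit, when nonempty, is a stable point for threshold u."""
    chain = [set(B)]
    while True:
        nxt = V(chain[-1], B, g, u)
        if nxt == chain[-1]:
            return chain
        chain.append(nxt)

# Toy 4 x 3 table (invented): positions of its 1-elements.
B = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2), (3, 2)}
chain = central_series(B, g=2, u=3)
W = chain[-1]   # convergence point of the central series
```

Here the chain stabilizes after one application of V: the convergence point W is the dense 3 x 2 block of ones, a stable point, since V(W) = W and every element of W attains $\pi = u$.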
Theorem 2. Each set $\Gamma_j$ in the chain $\Gamma_0 \supset \Gamma_1 \supset \dots \supset \Gamma_p$ arranged by the defining sequence $\alpha_0, \alpha_1, \dots, \alpha_{d-1}$ is the central series convergence point $W = \lim V^k(B)$, $k = 2, 3, \dots$, for some particular threshold values $u_0 < u_1 < \dots < u_p$. Each $\Gamma_j$ is the largest stable point, including all the others, for the threshold value $u = F(\Gamma_j)$.

It is not our intention to prove the statement of Theorem 2, since the proof is similar to that of Theorem 1. Theorem 1 is a particular case of the Theorem 2 statement regarding the threshold value $u = u_p$.
Next, let us look at the formal properties of all kernels, and not only the largest one found by the algorithm. It can easily be proved that with respect to the threshold $u = u_{\max} = u_p$ the subsystem of all kernels forms a structure which is known as an upper semilattice in lattice theory.
Theorem 3. The set of all kernels (stable points) for $u_{\max}$ is a full semilattice.

Proof. Let $\Omega$ be the set of kernels and let $K_1 \in \Omega$ and $K_2 \in \Omega$. Since the inequalities $\pi(\alpha, K_1) \ge u$ and $\pi(\alpha, K_2) \ge u$ are true for all $K_1, K_2$ elements on each $K_1, K_2$ separately, they are also true for the union set $K_1 \cup K_2$ due to the basic monotone property. Moreover, since $u = u_{\max}$, we can always find an element $\xi \in K_1 \cup K_2$ where $\pi(\xi, K_1 \cup K_2) = u$. Otherwise, the set $K_1 \cup K_2$ is some H set for some $u'$ greater than $u_{\max}$. Now, let us look at the sequence of sets $V^k(K_1 \cup K_2)$, $k = 2, 3, \dots$, which certainly converges to some nonempty set, a stable point K. If there exists any other kernel $K' \supset K_1 \cup K_2$, it is obvious that, applying the basic monotone property, we get that $K' \supseteq K$. ∎
With reference to the highest-ranking possible threshold value $u_p = u_{\max}$, the statement of Theorem 3 guarantees the existence of the largest stable point and the largest kernel $S^*$ (compare this with the equivalent statement of Theorem 1).

Proposition 2. Kernels of the monotone system are submatrices of the table B.

Proof. The proof is similar to that of Proposition 1. However, we intend to repeat it. In the monotone system, all elements outside a particular kernel, lying in a row and a column where the kernel has existing elements, belong to the kernel. Otherwise, the kernel is not a stable point, because these elements may be added to it without decreasing the threshold value $u_{\max}$. ∎
Note that Propositions 1, 2 are valid for our specific choice of the similarity indices $\pi = c_j$. A point of interest might be to verify which $\pi$ function properties guarantee that the shape of the kernels is still a submatrix.

The defining sequence of table B elements constructed by the algorithm represents only some part $u_0 < u_1 < u_2 < \dots < u_p$ of the threshold values existing for central series in the monotone system. On the other hand, the original algorithm, Mullat (1971), similar to an inverse greedy heuristic, produces the entire set of all possible threshold values u for all possible central series, which is sometimes unnecessary from a practical point of view. Therefore, the original algorithm always has the higher complexity.
Acknowledgments
The author is grateful to an anonymous referee for useful comments, style corrections and especially for the suggestion regarding the induction mechanism in the proof of the necessity of the main theorem argument.
References
J. Edmonds, Matroids and the Greedy Algorithm, Mathematical Programming 1 (1971) 127-136.
A.V. Genkin and I.B. Muchnik, Fixed Points Approach to Clustering, Journal of Classification 10 (1993) 219-240, http://www.datalaundering.com/download/fixed.pdf.
L.O. Libkin, I.B. Muchnik and L.V. Shvartser, Quasilinear Monotone Systems, Automation and Remote Control 50 (1990) 1249-1259, http://www.datalaundering.com/download/quasil.pdf.
J.E. Mullat, On the Maximum Principle for Some Set Functions, Tallinn Technical University Proceedings, Ser. A, No. 313 (1971) 37-44, http://www.datalaundering.com/download/modular.pdf.
J.E. Mullat, Extremal Subsystems of Monotonic Systems, I, II, III, Automation and Remote Control 37 (1976) 758-766, 1286-1294; 38 (1977) 89-96, http://www.datalaundering.com/mono/extremal.htm.
J.E. Mullat, Application of Monotonic Systems to the Study of the Structure of Markov Chains, Tallinn Technical University Proceedings, No. 464 (1979) 71, http://www.datalaundering.com/download/markov.pdf.
J.E. Mullat, Contramonotonic Systems in the Analysis of the Structure of Multivariate Distributions, Automation and Remote Control 42 (1981) 986-993, http://www.datalaundering.com/download/contra.pdf.
G.L. Nemhauser, L.A. Wolsey and M.L. Fisher, An Analysis of Approximations for Maximizing Submodular Set Functions, Mathematical Programming 14 (1978) 265-294.
E. Ojaveer, J. Mullat and L. Võhandu, A Study of Infraspecific Groups of the Baltic East Coast Autumn Herring by Two New Methods Based on Cluster Analysis, Estonian Contributions to the International Biological Program 6, Tartu (1975) 28-50, http://www.datalaundering.com/download/herring.pdf.