Mining Strong Associations and Exceptions in
the STULONG Data Set
Eduardo Corrêa Gonçalves and Alexandre Plastino*
Universidade Federal Fluminense, Department of Computer Science,
Rua Passo da Pátria, 156 - Bloco E - 3º andar - Boa Viagem
24210-240, Niterói, RJ, Brazil
{egoncalves, plastino}@ic.uff.br
http://www.ic.uff.br
Abstract. Multidimensional association rules represent an important
type of knowledge that can be mined from large relational databases or
data warehouses. These rules describe combinations of attribute values
that often occur together in a database and can reveal hidden and useful
patterns. This paper presents both strong multidimensional association
rules and exceptions mined from the STULONG data set, prepared for
the Discovery Challenge of ECML/PKDD-2004. The STULONG data
set keeps information about risk factors of atherosclerosis in patients
from the Czech Republic. We adopted an approach that aims at finding
exceptions, which are represented by association rules that become much
weaker in some specific subsets of the database. The results found are
reported and discussed.
1 Introduction
The STULONG¹ data set is a real database that keeps information about the
study of the development of atherosclerosis risk factors in a population of
middle-aged men. This study lasted for more than 20 years. As a first step,
entry examinations were performed on 1417 patients from 1975 to 1979. These
patients were requested to fill in a form with their personal information and
general habits. They also underwent physical and biochemical examinations.
The following aspects were defined by the specialists as risk factors: arterial
hypertension, high level of total or LDL cholesterol, low level of HDL
cholesterol, glycemia, high level of uric acid, hypertriglyceridemia, obesity,
positive family history and the habit of smoking many cigarettes. According to
these risk factors and to the results of the entry examinations, the patients
were classified into three groups:
A. Normal Group. Men without the presence of any risk factor.
B. Risk Group. Men with the presence of one or more risk factors, but without
the manifestation of any cardiovascular disease.
C. Pathologic Group. Men with either an identified cardiovascular disease or
other serious disease.
* Work sponsored by CNPq research grant 300879/00-8.
¹ The study (STULONG) was carried out at the 2nd Department of Medicine, 1st Faculty
of Medicine of Charles University and Charles University Hospital, U nemocnice 2,
Prague 2 (head. Prof. M. Aschermann, MD, SDr, FESC), under the supervision
of Prof. F. Boudík, MD, ScD, with collaboration of M. Tomečková, MD, PhD and
Ass. Prof. J. Bultas, MD, PhD. The data were transferred to the electronic form by
the European Centre of Medical Informatics, Statistics and Epidemiology of Charles
University and Academy of Sciences (head. Prof. RNDr. J. Zvárová, DrSc). The
data resource is on the web pages http://euromise.vse.cz/STULONG. At present
the data analysis is supported by the grant of the Ministry of Education CR
Nr LN 00B 107.
The data collected by the STULONG project were prepared for the Discovery
Challenge of ECML/PKDD-2004. Four tables were made available:
1. Entry: stores data related to the entry examinations.
2. Control: stores data related to long-term observations performed on patients.
3. Letter: stores additional information about the health status of 403 men.
4. Death: stores data related to the patients who died.
This paper aims at presenting strong association rules and exceptions mined
from the Entry table. We focused the mining process on finding answers to some
of the proposed analytic questions [3]. The rest of this paper is organized as
follows. An overview of multidimensional association rules and their interest
measures is given in Sect. 2. In Sect. 3 we present the adopted approach to mine
exceptions in databases. The data preparation process is described in Sect. 4
and the associations and exceptions mined from the STULONG data set are
presented in Sect. 5. Some concluding remarks are made in Sect. 6.
2 Multidimensional Association Rules
Multidimensional association rules [4] represent combinations of attribute values
that often occur together in a database, revealing hidden and useful patterns.
These rules can be mined from data warehouses or relational databases, where
attributes can be categorical or quantitative. An example of a multidimensional
association rule mined from the Entry table is: (DailyBeerCons = “>1l”) ⇒
(Smoking = “>20 cig/day”). This rule indicates that men who are heavy beer
consumers (the ones who drink more than a liter of beer per day) are more likely
to also be heavy smokers (they smoke more than 20 cigarettes per day). This
example involves two attributes (or dimensions, following the terminology used
in multidimensional databases): DailyBeerCons and Smoking.
A multidimensional association rule can be formally defined as follows:
$$A_1 = a_1, \ldots, A_n = a_n \Rightarrow B_1 = b_1, \ldots, B_m = b_m,$$
where $A_i$ ($1 \le i \le n$) and $B_j$ ($1 \le j \le m$) represent distinct database
attributes and $a_i$ and $b_j$ are values from the domains of $A_i$ and $B_j$,
respectively. To simplify the notation, in the remainder of this section we will
represent a generic multidimensional association rule as an expression of the
form A ⇒ B, where A and B are sets of conditions over different attributes. We
say that A is the antecedent and B is the consequent of the rule. A
multidimensional association rule can involve several attributes in both the
antecedent and the consequent.
The support of a rule A ⇒ B in a relational database is the probability that a
tuple matches all conditions in A ∪ B. The confidence of A ⇒ B is the probability
that a tuple matches B, given that it matches A. Typically, the problem of
mining association rules from databases consists in finding all rules that match
user-provided minimum support and minimum confidence. However, this model
presents some problems, as pointed out in [2]. The “support/confidence framework”
often generates a huge number of association rules that are obvious or even
misleading. The following example demonstrates this fact. Consider two association
rules extracted from the Entry table, which are shown in Table 1. The values in
the third and fourth columns (SupA and SupB) represent the probability that a
tuple matches all conditions in the antecedent and the consequent, respectively.
The values in the fifth and sixth columns (Sup and Conf) represent the support
and the confidence values for each association rule, respectively.
Table 1. Example of support and confidence indices
Id  Association Rule                                     SupA    SupB    Sup     Conf
R1  (DailyBeerCons = “>1l”) ⇒ (Smoking = “>20 cig/day”)  0.1193  0.2602  0.0448  0.3758
R2  (DailyBeerCons = “>1l”) ⇒ (Married = “yes”)          0.1193  0.8487  0.0905  0.7584
The rule R2 seems to imply that men who are heavy beer consumers tend to
be married. The support and confidence values of R2 are higher than those of
R1. This fact could lead to the conclusion that R2 is more interesting than R1.
However, note that the confidence for the rule R2 indicates that 75.84% of heavy
beer consumers are married. Observing the column SupB, we can see that 84.87%
of men in the Entry table are married. Therefore, we can conclude that married
men are less likely to be heavy beer consumers. There is a negative dependence
between being married and being a heavy beer consumer. On the other hand,
the confidence value for the rule R1, which represents the probability for a man
to be a heavy smoker given that he is a heavy beer consumer, is 37.58%. Once
again, we can see in the fourth column (SupB) that 26.02% of men are heavy
smokers. Then, in fact, heavy beer consumers are more likely to smoke a lot.
There is a positive dependence between these attributes.
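The support and confidence definitions given above can be sketched in a few lines of Python. This is only an illustration: the helper names (matches, support, confidence) and the five toy tuples are assumptions introduced here, and they are taken neither from the Entry table nor from the programs used in this work.

```python
def matches(row, conditions):
    """True if the tuple satisfies every attribute=value condition."""
    return all(row.get(attr) == value for attr, value in conditions.items())


def support(rows, conditions):
    """Fraction of tuples that match all conditions."""
    return sum(matches(row, conditions) for row in rows) / len(rows)


def confidence(rows, antecedent, consequent):
    """Sup(A ∪ B) / Sup(A); assumes at least one tuple matches A."""
    return support(rows, {**antecedent, **consequent}) / support(rows, antecedent)


# Hypothetical toy relation, NOT data from the Entry table.
rows = [
    {"DailyBeerCons": ">1l", "Smoking": ">20 cig/day"},
    {"DailyBeerCons": ">1l", "Smoking": "no"},
    {"DailyBeerCons": "no", "Smoking": "no"},
    {"DailyBeerCons": "no", "Smoking": ">20 cig/day"},
    {"DailyBeerCons": "no", "Smoking": "no"},
]
A = {"DailyBeerCons": ">1l"}
B = {"Smoking": ">20 cig/day"}

sup_rule = support(rows, {**A, **B})   # one of five tuples matches A and B
conf_rule = confidence(rows, A, B)     # one of the two tuples matching A also matches B
```

Representing A and B as dictionaries keeps each attribute to a single value, which mirrors the requirement that the conditions of a rule are defined over distinct attributes.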
In order to find interesting relationships, we consider that support and
confidence measures should be used along with other statistical indices that are
able to capture the type of dependence between the antecedent and the consequent
of the rules. We consider that a rule is interesting if it holds with a support
value greater than its expected support value. This expected support is computed
based on the support of the conditions that compose the rule:
$$\mathit{ExpSup}(A \Rightarrow B) = \mathit{ExpSup}(A \cup B) = \mathit{Sup}(A) \times \mathit{Sup}(B). \tag{1}$$
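The lift index [2] (also known as interest) can be used to evaluate
dependencies. Given an association rule A ⇒ B, this measure computes how much
more frequent B is when A occurs:
$$\mathit{Lift}(A \Rightarrow B) = \frac{\mathit{Conf}(A \Rightarrow B)}{\mathit{Sup}(B)} = \frac{\mathit{Sup}(A \cup B)}{\mathit{Sup}(A) \times \mathit{Sup}(B)} = \frac{\mathit{Sup}(A \cup B)}{\mathit{ExpSup}(A \cup B)}. \tag{2}$$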
If Lift(A ⇒ B) = 1, then A and B are independent. If Lift(A ⇒ B) > 1,
then A and B are positively dependent. Otherwise, A and B are negatively dependent.
The rule interest index (RI) [5] computes the fraction of tuples matched by
an association rule in excess of the expected fraction:
$$\mathit{RI}(A \Rightarrow B) = \mathit{Sup}(A \cup B) - \mathit{ExpSup}(A \cup B). \tag{3}$$
If RI(A ⇒ B) = 0, we say that A and B are independent. If RI(A ⇒ B) > 0,
then A and B are positively dependent. Otherwise, A and B are negatively dependent.
Returning to the example shown in Table 1, the lift and RI values for R1
are given by 0.3758 ÷ 0.2602 = 1.44 and 0.0448 − (0.1193 × 0.2602) = 0.014,
respectively. Therefore, R1 is an interesting association rule. The lift and RI
values for R2 are given by 0.7584 ÷ 0.8487 = 0.89 and 0.0905 − (0.1193 × 0.8487) =
−0.011, respectively. Therefore, R2 is, indeed, an uninteresting association rule.
We believe that the use of different interest measures provides alternative
analyses of the same data, giving a better understanding of the associations.
Section 5 presents strong rules mined from the Entry table.
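The lift and RI computations for the two rules of Table 1 can be reproduced with the short Python sketch below. The function names lift and rule_interest are assumptions chosen for this illustration; the three numbers per rule are the SupA, SupB, and Sup values listed in Table 1.

```python
def lift(sup_a, sup_b, sup_ab):
    """Lift(A => B) = Sup(A ∪ B) / (Sup(A) * Sup(B)), as in Eq. (2)."""
    return sup_ab / (sup_a * sup_b)


def rule_interest(sup_a, sup_b, sup_ab):
    """RI(A => B) = Sup(A ∪ B) - Sup(A) * Sup(B), as in Eq. (3)."""
    return sup_ab - sup_a * sup_b


# (SupA, SupB, Sup) for each rule, copied from Table 1.
table1 = {
    "R1": (0.1193, 0.2602, 0.0448),
    "R2": (0.1193, 0.8487, 0.0905),
}

for rule_id, (sup_a, sup_b, sup_ab) in table1.items():
    l = lift(sup_a, sup_b, sup_ab)
    ri = rule_interest(sup_a, sup_b, sup_ab)
    kind = "positive" if l > 1 else "negative" if l < 1 else "independent"
    print(f"{rule_id}: lift={l:.2f} RI={ri:.3f} ({kind} dependence)")
```

Both indices always agree on the sign of the dependence, since lift is the ratio and RI the difference of the same two quantities, the actual and the expected support.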
3 Mining Exceptions in the STULONG Data Set
In this section we present the adopted approach to mine exceptions in the
STULONG data set. The following example motivates our approach. Consider, again,
the rule R1: (DailyBeerCons = “>1l”) ⇒ (Smoking = “>20 cig/day”).
Suppose we are interested in discovering if this rule becomes weaker in some
sub-population of men stratified by the attribute Group. Then a strategy to
mine exceptions should be able to find the rule:
R3: (DailyBeerCons = “>1l”) ∧ (Group = “A”) ⇏ (Smoking = “>20 cig/day”)
This negative pattern indicates that among the men who belong to the group
A, the support value of the association between being a heavy beer consumer
and being a heavy smoker is surprisingly smaller than what is expected. This
situation evidences an exception associated with a previously mined rule. The
exception was obtained because the association (DailyBeerCons = “>1l”) ∧
(Group = “A”) ⇒ (Smoking = “>20 cig/day”) did not achieve its expected
support. The expected support is evaluated from the support of the original rule
R1 and the support of the condition (Group = “A”).
Let D be a relational database. Let R: A ⇒ B be an association rule
extracted from D. Let $Z = \{Z_1 = z_1, \ldots, Z_k = z_k\}$ be a set of conditions
defined over attributes from D, where
$\{Z_1 = z_1, \ldots, Z_k = z_k\} \cap \{A_1 = a_1, \ldots, A_n = a_n, B_1 = b_1, \ldots, B_m = b_m\} = \emptyset$.
Z is named the probe set. An exception related to
the positive rule R is an implication of the form A ∧ Z ⇏ B.
Exceptions are extracted only if they do not achieve an expected support.
This expectation is evaluated based on the support of the original rule A ⇒ B
and the support of the conditions in the probe set Z. The expected support for
A ∧ Z ⇒ B can be computed as:
$$\mathit{ExpSup}(A \wedge Z \Rightarrow B) = \mathit{Sup}(A \cup B) \times \mathit{Sup}(Z). \tag{4}$$
To evaluate if an exception is interesting, we use two interest measures based
on the lift measure. The first one, called IM (interest measure), considers that
an exception E: A ∧ Z ⇏ B is potentially interesting if the actual support
value for the rule A ∧ Z ⇒ B is much lower than its expected support value:
$$\mathit{IM}(E) = 1 - \left( \frac{\mathit{Sup}(A \wedge Z \Rightarrow B)}{\mathit{ExpSup}(A \wedge Z \Rightarrow B)} \right). \tag{5}$$
This measure captures the type of dependence between Z and the conditions
that form A ⇒ B. It grows as the actual support value falls further below
the expected support value, indicating a negative dependence.
The closer the value is to 1 (which is the highest value for this measure),
the stronger the negative dependence is. Consider the example presented at the
beginning of this section. The rule C1: (DailyBeerCons = “>1l”) ∧ (Group =
“A”) ⇒ (Smoking = “>20 cig/day”) was generated by combining the rule R1
with the probe set Z = {(Group = “A”)}. The support of Z in the Entry table is
22.10%. The support of the rule R1 is 4.48% (as shown in Table 1). The expected
support for C1 can be computed as 22.10% × 4.48% = 0.99%. The actual support
of C1 in the Entry table is equal to 0.08%. We say that the exception E1:
(DailyBeerCons = “>1l”) ∧ (Group = “A”) ⇏ (Smoking = “>20 cig/day”)
is potentially interesting because IM(E1) = 1 − (0.08 ÷ 0.99) = 0.92.
However, observe that the IM measure does not take into consideration the
type of dependence between Z and A, and between Z and B. The measure DU
(degree of unexpectedness) is used to address this issue:
$$\mathit{DU}(E) = \mathit{IM}(E) - \max\left( 1 - \frac{\mathit{Sup}(A \cup Z)}{\mathit{ExpSup}(A \cup Z)},\; 1 - \frac{\mathit{Sup}(B \cup Z)}{\mathit{ExpSup}(B \cup Z)} \right). \tag{6}$$
This measure captures how much higher the negative dependence between a probe
set Z and a rule A ⇒ B is than the negative dependence between Z and
either A or B. The further the value is above 0, the more interesting the exception
will be. If DU(E) ≤ 0, the exception is uninteresting. Returning to the previous
example, the support of the condition A = {(DailyBeerCons = “>1l”)} is
11.93%. The support value of the set A ∪ Z is 1.52%. The negative dependence
between A and Z can be computed as 1 − (1.52% ÷ (11.93% × 22.10%)) = 0.42.
The support of the condition B = {(Smoking = “>20 cig/day”)} is 26.02%.
The actual support value of the set B ∪ Z is 1.52%. The negative dependence
between B and Z can be computed as 1 − (1.52% ÷ (26.02% × 22.10%)) =
0.73. The exception E1: (DailyBeerCons = “>1l”) ∧ (Group = “A”) ⇏
(Smoking = “>20 cig/day”) is, in fact, interesting because DU(E1) = 0.92 −
max(0.42, 0.73) = 0.19.
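The worked example for E1 can be checked with the following Python sketch of Eqs. (4) to (6). The function and variable names are assumptions introduced for this illustration, and the support values are the rounded percentages quoted in the text, so the results only approximate the IM and DU values reported for this exception. The thresholds in the last line are the ones stated in Sect. 5.

```python
def negative_dependence(actual_sup, expected_sup):
    """1 - actual/expected; positive when the actual support is below the expected."""
    return 1.0 - actual_sup / expected_sup


def interest_measure(sup_ab, sup_z, sup_abz):
    """IM(E), Eq. (5), with ExpSup(A ∧ Z => B) = Sup(A ∪ B) * Sup(Z), Eq. (4)."""
    return negative_dependence(sup_abz, sup_ab * sup_z)


def degree_of_unexpectedness(sup_a, sup_b, sup_z, sup_ab, sup_az, sup_bz, sup_abz):
    """DU(E), Eq. (6): IM minus the stronger of the Z-A and Z-B negative dependences."""
    dep_az = negative_dependence(sup_az, sup_a * sup_z)
    dep_bz = negative_dependence(sup_bz, sup_b * sup_z)
    return interest_measure(sup_ab, sup_z, sup_abz) - max(dep_az, dep_bz)


# Rounded supports quoted in the text for E1 (A: heavy beer, B: heavy smoking, Z: group A).
sup_a, sup_b, sup_z = 0.1193, 0.2602, 0.2210
sup_ab = 0.0448    # support of R1 (Table 1)
sup_az = 0.0152    # support of A ∪ Z
sup_bz = 0.0152    # support of B ∪ Z
sup_abz = 0.0008   # actual support of C1

im = interest_measure(sup_ab, sup_z, sup_abz)
du = degree_of_unexpectedness(sup_a, sup_b, sup_z, sup_ab, sup_az, sup_bz, sup_abz)
is_interesting = im >= 0.30 and du >= 0.05   # minimum IM and DU used in Sect. 5
```

Because the inputs are rounded to two decimal places of a percentage, small differences in the last digit with respect to the values in the text are to be expected.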
The adopted approach to mine exceptions was motivated by the concept of
negative association rules presented in [6], where a negative pattern represents a
large deviation between the expected support and the actual support of a rule.
In [7] a proposal for representing and extracting different categories of exceptions
can be found. However, we adopted an alternative approach, which allowed us to
characterize an exception as a rule that, unexpectedly, becomes much weaker in
some specific subsets of the database. We consider that exceptions are interesting
if they hold with high IM and DU values. Exceptions mined from the Entry
table are presented in Sect. 5.
4 Data Preparation
Some data transformations were necessary before the mining process. All field
names and values were translated into English. We enriched the data with
new fields, derived from the original ones, such as the field Age, which was derived
from rokvstup (year of the examination) and roknar (year of birth). Numeric
fields, such as Cholesterol, were discretized into ranges. Table 2 shows
a summary of the data preparation process. Additional explanation is needed
for the fields Skin Folds and Blood Pressure. The Skin Folds field is the
sum of the fields tric and subsc. To generate the Blood Pressure field, we first
picked the minimum value between the fields syst1 and syst2, denoted as s. Then
we picked the minimum value between the fields diast1 and diast2, denoted as
d. If s ≤ 129 and d ≤ 84, the blood pressure was categorized as “normal”. If
s > 139 or d > 89, the category was denoted as “high”. Otherwise, the category
was denoted as “normal/high”.
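The Blood Pressure derivation described above can be written as a small Python function. This is a sketch only: the function name is an assumption, and the parameter names follow the field names listed in Table 2.

```python
def blood_pressure_category(syst1, syst2, diast1, diast2):
    """Categorize blood pressure from two systolic and two diastolic readings."""
    s = min(syst1, syst2)    # lower of the two systolic readings
    d = min(diast1, diast2)  # lower of the two diastolic readings
    if s <= 129 and d <= 84:
        return "normal"
    if s > 139 or d > 89:
        return "high"
    return "normal/high"


# Hypothetical readings, used only to exercise the three branches.
examples = [
    blood_pressure_category(120, 125, 80, 82),   # both minima within the normal limits
    blood_pressure_category(150, 145, 80, 80),   # systolic minimum above 139
    blood_pressure_category(135, 140, 80, 85),   # in between the two thresholds
]
```

The two tests cannot both succeed for the same patient, so the order in which they are evaluated does not change the result.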
We developed two programs in C++, which were compiled with the g++
compiler. The first one is an implementation based on the classical Apriori
algorithm [1], which is used to mine strong associations. The second program is an
implementation of the adopted approach to mine exceptions. Both require the
data set in the ARFF format, specified in [8]. We generated a relation in the
ARFF format that we named ENTRY_TOT. This relation contains 1249 tuples,
regarding the men classified into the groups A, B, and C. We excluded from
this table the men who, originally, were not allocated to any of these groups
(attribute konskup = 6). From the ENTRY_TOT relation, we also generated three
separate relations with the same attributes. We denoted these three derived
relations as EntryA (276 tuples, containing only patients from the group A),
EntryB (859 tuples, patients from the group B), and EntryC (114 tuples,
patients from the group C). We mined rules in these three tables to compare the
associations between the characteristics of men in the respective groups.
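To give an idea of the level-wise search on which Apriori [1] is based, the Python sketch below finds all frequent sets of attribute=value conditions in a small relation. It is an assumption-laden illustration and not the C++ program mentioned above: the function name, the candidate generation by set union, and the toy tuples are all introduced here for the example.

```python
from itertools import combinations


def frequent_itemsets(rows, min_support):
    """Apriori-style search for frequent sets of (attribute, value) conditions."""
    transactions = [frozenset(row.items()) for row in rows]
    n = len(transactions)

    def keep_frequent(candidates):
        supports = {c: sum(c <= t for t in transactions) / n for c in candidates}
        return {c: s for c, s in supports.items() if s >= min_support}

    # Level 1: single conditions.
    level = keep_frequent({frozenset([item]) for t in transactions for item in t})
    frequent = dict(level)
    k = 2
    while level:
        # Join: unions of two frequent (k-1)-sets that contain exactly k conditions.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: one value per attribute, and every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if len({attr for attr, _ in c}) == k
            and all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = keep_frequent(candidates)
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical toy relation, NOT data from the Entry table.
rows = [
    {"DailyBeerCons": ">1l", "Smoking": ">20 cig/day"},
    {"DailyBeerCons": ">1l", "Smoking": ">20 cig/day"},
    {"DailyBeerCons": "no", "Smoking": "no"},
    {"DailyBeerCons": "no", "Smoking": ">20 cig/day"},
]
result = frequent_itemsets(rows, min_support=0.5)
```

Rules A ⇒ B would then be obtained by splitting each frequent set into an antecedent and a consequent and computing the interest measures of Sect. 2 from the stored supports.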
Table 2. Data transformation
Field           Original Field / Derivation   Possible Values
Group           konskup                       “A”, “B”, “C”.
Cholesterol     chlst                         “desirable” (chlst < 200),
                                              “bordering” (200 ≤ chlst < 240),
                                              “high” (chlst ≥ 240).
Triglycerides   trigl                         “desirable” (trigl < 150),
                                              “bordering” (150 ≤ trigl ≤ 200),
                                              “high” (201 ≤ trigl ≤ 499),
                                              “very high” (trigl ≥ 500).
Age             (rokvstup − roknar)           “38-39”, “40-44”, “45-49”, “≥50”.
BMI             (vaha) ÷ (vyska²)             “underweight” (BMI ≤ 20),
                                              “normal” (20 ≤ BMI < 25),
                                              “overweight” (25 ≤ BMI < 30),
                                              “obese” (30 ≤ BMI < 40),
                                              “morbidly obese” (BMI ≥ 40).
Blood Pressure  min(syst1, syst2),            “normal”, “normal/high”, “high”.
                min(diast1, diast2).
Skin Folds      (tric + subsc)                “8-20”, “21-30”, “31-40”, “>40”.
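The discretization of the Cholesterol and Triglycerides fields in Table 2 can be sketched in Python as follows. The function names are assumptions made for this illustration. Since Table 2 lists the triglycerides ranges with integer limits (200 and 201, 499 and 500), the sketch assumes integer measurements; a non-integer value between two listed limits would fall into the higher category here.

```python
def cholesterol_category(chlst):
    """Discretize total cholesterol following the ranges of Table 2."""
    if chlst < 200:
        return "desirable"
    if chlst < 240:
        return "bordering"
    return "high"


def triglycerides_category(trigl):
    """Discretize triglycerides following the ranges of Table 2 (integer values assumed)."""
    if trigl < 150:
        return "desirable"
    if trigl <= 200:
        return "bordering"
    if trigl <= 499:
        return "high"
    return "very high"


# Hypothetical measurements chosen at the category boundaries.
chol_labels = [cholesterol_category(v) for v in (199, 200, 239, 240)]
trig_labels = [triglycerides_category(v) for v in (149, 150, 200, 201, 499, 500)]
```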
5 Results
At first, the relation ENTRY_TOT was mined for interesting associations regarding
the basic groups, with the minimum support threshold set to 1%. Hundreds
of associations were obtained. We selected some of them to illustrate what we
learned from the ENTRY_TOT table. These selected results are shown in Table 3
along with different interest measures. From the rule R4 we were able to observe
that there is a strong correlation (R4.lift = 1.430) between belonging to the
Normal Group (group A) and having reached university education. In fact,
group A is the only one that is predominantly composed of men with a university
degree (R4.conf = 39.49%). In contrast, the Pathologic Group (group C) is
predominantly formed by men with apprentice school education (R6.conf
= 35.09%). R7 shows a strong correlation (R7.lift = 1.692) between belonging
to the group A and practicing physical activities intensely in free time. R8
indicates that the percentage of heavy beer consumers is slightly greater than
expected in the Risk Group (group B, R8.RI = 0.0116). Finally, from the rule
R9, we could discover a strong positive dependence between being 50 years old
or above and belonging to the group C (R9.lift = 1.768).
The results shown in Tables 4, 5, and 6 present rules regarding alcohol
consumption, regarding social factors, and relating skin folds and BMI, respectively.
These rules were mined from the tables EntryA, EntryB, and EntryC. The
objective is to observe the differences of the interest measures in the respective groups.
The third column (G) specifies the mined table (EntryA, EntryB, or EntryC).
Columns 4 to 9 show the values of the interest measures. The minimum support
threshold was set to 1%. Due to lack of space, we will not comment on all results.
Table 3. Association rules - [ENTRY_TOT]
Id  Association Rule                                      SupA    SupB    Sup     Conf    Lift   RI
R4  (Group = “A”) ⇒ (Education = “university”)            0.2210  0.2762  0.0873  0.3949  1.430  0.0262
R5  (Group = “B”) ⇒ (Education = “apprentice sch.”)       0.6877  0.2866  0.2090  0.3038  1.060  0.0118
R6  (Group = “C”) ⇒ (Education = “apprentice sch.”)       0.0913  0.2866  0.0320  0.3509  1.224  0.0058
R7  (Group = “A”) ⇒ (PhysActAfterJob = “great activity”)  0.2210  0.0857  0.0320  0.1449  1.692  0.0131
R8  (Group = “B”) ⇒ (DailyBeerCons = “>1l”)               0.6877  0.1193  0.0937  0.1362  1.142  0.0116
R9  (Group = “C”) ⇒ (Age = “≥50”)                         0.0913  0.2282  0.0368  0.4035  1.768  0.0160
Table 4 shows strong associations regarding alcohol consumption in the
respective basic groups. Rules R10 to R13 show that both heavy beer consumers
and heavy liquor consumers tend to smoke more, independently of the group
(see the lift and RI values of these rules). However, it is important to observe
that there are far fewer smokers in group A (observe the SupB column). It is
also noticeable that men from group B tend to smoke and drink more (observe the
SupA, SupB, and Sup columns). The rule R10 has a support value below 1%
(the minimum value) in the EntryA table. Therefore, it could not be extracted
from this table. Rule R14 indicates that those who do not drink alcohol are more
likely to have a BMI in the normal range in the three groups. Rule R17 shows
that drinking wine moderately and having normal blood pressure are positively
dependent in groups A and B and negatively dependent in group C (observe the
lift and RI values). Rules R18, R19, R21, and R22 indicate positive correlations
found in the three groups. Rule R20 indicates that patients who are heavy liquor
consumers are more likely to have a high level of total cholesterol in groups B and
C, but not in group A (observe the lift and RI values).
Table 5 shows associations regarding social factors. We found
that people with a higher educational degree tend to smoke and drink less,
independently of the group (R23 and R26). On the other hand, people with a lower
educational degree tend to smoke more (R24) and are more likely to be heavy
beer consumers (R27). The percentage of men who drink alcohol occasionally
is almost the same in the three groups (see the SupB column, R26). Rule R25
evidences that there is a strong positive dependence between being an ex-smoker
and being 50 years old or above, especially in group C (R25.RI = 0.0326). Rules
R29 and R30 examine the correlation between the education and BMI of men.
Rules R34 and R35 indicate that blood pressure is dependent on the age of the
patient, independently of the group.
Table 6 shows the relations of skin folds and BMI in the particular basic
groups. Note that some rules in group A could not be extracted, because
there are no obese men in this group (since obesity is a risk factor).
Exceptions mined from the ENTRY_TOT table are shown in Table 7. Let
us explain the intuitive meaning of exceptions using the rule R42. One of the
strongest correlations in the database is that patients whose education
degree is “apprentice school” tend to smoke a lot (15-20 cig/day). Rule R24 in
Table 5 shows that this rule is valid for the three groups of patients. We use the
exception illustrated in rule R42 to indicate that the presence of the condition
(PhysActAfterJob = “great activity”) reduces the probability of occurrence
for the rule (Education = “apprentice sch.”) ⇒ (Smoking = “15-20 cig/day”).
The exception illustrated in rule R42 can be interpreted as “among the men
who practice physical activities intensely in free time, the rule (Education =
“apprentice sch.”) ⇒ (Smoking = “15-20 cig/day”) is much weaker”. The value
in the IM column indicates that the actual support of the rule (Education =
“apprentice sch.”) ∧ (PhysActAfterJob = “great activity”) ⇒ (Smoking =
“15-20 cig/day”) is 47.55% below the expected support. The value shown in the
DU column indicates the strength (degree of unexpectedness) of the exception.
The same interpretation can be given to the remainder of the rules presented in
Table 7. We represented all probe sets in italic characters. We used the following
thresholds in the experiments: minimum IM = 0.30 and minimum DU = 0.05.
6 Conclusions
In this work we presented 50 strong association rules and exceptions mined from
the STULONG data set, concerning the Entry examinations. Strong association
rules were used to analyze the differences of the correlations concerning the
characteristics of the patients from the three basic groups. Exceptions were used
to illustrate negative patterns associated with previously known strong positive
rules. As future work we intend to apply the same approach to the evaluation
of the Control, Letter, and Death tables.
References
1. Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules. In 20th
VLDB Intl. Conf. (1994).
2. Brin, S., Motwani, R., Ullman, J. D., Tsur, S.: Dynamic Itemset Counting and
Implication Rules for Market Basket Data. In ACM SIGMOD Intl. Conf. (1997).
3. ECML/PKDD2004 Discovery Challenge homepage
[http://lisp.vse.cz/challenge/ecmlpkdd2004/] (2004).
4. Han, J., Kamber, M.: Data Mining: Concepts and Techniques. 2nd edn. Morgan
Kaufmann (2001).
5. Piatetsky-Shapiro, G.: Discovery, Analysis and Presentation of Strong Rules.
Knowledge Discovery in Databases. AAAI/MIT Press. (1991).
6. Savasere, A., Omiecinski, E., Navathe, S.: Mining for Strong Negative Associations
in a Large Database of Customer Transactions. In 14th ICDE Intl. Conf. (1998).
7. Suzuki, E., Zytkow, J. M.: Unified Algorithm for Undirected Discovery of Exception
Rules. In 4th PKDD Intl. Conf. (2000).
8. Witten, I. H., Frank, E.: Data Mining: Practical Machine Learning Tools and
Techniques with Java Implementations. Morgan Kaufmann (2000).
Table 4. Association rules in the basic groups - [alcohol consumption]
Id   Association Rule
     G  SupA    SupB    Sup     Conf    Lift   RI
R10  (DailyBeerCons = “>1l”) ⇒ (Smoking = “>21 cig/day”)
     A  -       -       -       -       -      -
     B  0.1362  0.3190  0.0559  0.4103  1.286  0.0124
     C  0.1140  0.2807  0.0614  0.5385  1.918  0.0294
R11  (DailyBeerCons = “>1l”) ⇒ (SmokingDuration = “>20 years”)
     A  0.0688  0.1667  0.0145  0.2105  1.263  0.0030
     B  0.1362  0.5751  0.0908  0.6667  1.159  0.0125
     C  0.1140  0.4737  0.0789  0.6923  1.461  0.0249
R12  (DailyLiquorCons = “>100cc”) ⇒ (SmokingDuration = “>20 years”)
     A  0.0471  0.1667  0.0181  0.3846  2.308  0.0103
     B  0.0652  0.5751  0.0419  0.6429  1.118  0.0044
     C  0.0351  0.4737  0.0263  0.7500  1.583  0.0097
R13  (Liquor = “no”) ⇒ (Smoking = “no”)
     A  0.5507  0.5109  0.3043  0.5526  1.082  0.0230
     B  0.5204  0.1793  0.0990  0.1902  1.061  0.0057
     C  0.5439  0.2018  0.1316  0.2419  1.199  0.0218
R14  (Alcohol = “no”) ⇒ (BMI = “normal”)
     A  0.0870  0.5326  0.0543  0.6250  1.173  0.0080
     B  0.0861  0.3586  0.0384  0.4459  1.244  0.0075
     C  0.1316  0.2632  0.0526  0.4000  1.520  0.0180
R15  (DailyBeerCons = “≤1l”) ⇒ (BMI = “overweight”)
     A  0.5616  0.4601  0.2609  0.4645  1.001  0.0024
     B  0.5576  0.5157  0.3108  0.5574  1.081  0.0232
     C  0.4211  0.5439  0.2456  0.5833  1.073  0.0166
R16  (DailyBeerCons = “>1l”) ⇒ (BMI = “obese”)
     A  -       -       -       -       -      -
     B  0.1362  0.1071  0.0210  0.1538  1.436  0.0064
     C  0.1140  0.1667  0.0351  0.3077  1.846  0.0161
R17  (DailyWineCons = “≤500ml”) ⇒ (Blood Pressure = “normal”)
     A  0.5181  0.5761  0.3188  0.6154  1.068  0.0204
     B  0.4936  0.3958  0.2200  0.4458  1.126  0.0246
     C  0.4386  0.3333  0.1404  0.3200  0.960 -0.0058
R18  (DailyBeerCons = “>1l”) ⇒ (Blood Pressure = “high”)
     A  0.0688  0.1812  0.0145  0.2105  1.162  0.1186
     B  0.1362  0.4342  0.0710  0.5214  1.201  0.0119
     C  0.1140  0.5263  0.0702  0.6154  1.169  0.0101
R19  (Alcohol = “no”) ⇒ (Cholesterol = “desirable”)
     A  0.0870  0.3370  0.0507  0.5833  1.731  0.0214
     B  0.0861  0.1828  0.0186  0.2162  1.183  0.0029
     C  0.1316  0.1316  0.0263  0.2000  1.520  0.0090
R20  (DailyLiquorCons = “>100cc”) ⇒ (Cholesterol = “high”)
     A  0.0471  0.2065  0.0072  0.1538  0.745 -0.0025
     B  0.0652  0.4854  0.0361  0.5536  1.140  0.0044
     C  0.0351  0.5175  0.0263  0.7500  1.449  0.0082
R21  (DailyBeerCons = “>1l”) ⇒ (Triglycerides = “high”)
     A  0.0688  0.1159  0.0109  0.1579  1.362  0.0029
     B  0.1362  0.1839  0.0338  0.2479  1.348  0.0087
     C  0.1140  0.2193  0.0351  0.3077  1.403  0.0101
R22  (Alcohol = “occasionally”) ⇒ (Triglycerides = “bordering”)
     A  0.5217  0.1812  0.1123  0.2153  1.188  0.0178
     B  0.5378  0.2002  0.1106  0.2056  1.027  0.0029
     C  0.5175  0.1491  0.0789  0.1525  1.023  0.0018
Table 5. Association rules in the basic groups - [social factors]
Id   Association Rule
     G  SupA    SupB    Sup     Conf    Lift   RI
R23  (Education = “university”) ⇒ (Smoking = “no”)
     A  0.3949  0.5109  0.2210  0.5596  1.095  0.0193
     B  0.2526  0.1793  0.0664  0.2627  1.465  0.0211
     C  0.1667  0.2018  0.0877  0.5263  2.608  0.0541
R24  (Education = “apprentice sch.”) ⇒ (Smoking = “15-20 cig/day”)
     A  0.2065  0.0580  0.0217  0.1053  1.816  0.0098
     B  0.3038  0.3655  0.1211  0.3985  1.090  0.0100
     C  0.3509  0.2719  0.1140  0.3250  1.119  0.0186
R25  (Age = “≥50”) ⇒ (Ex-Smoker = “yes, >1 year”)
     A  0.2246  0.2138  0.0543  0.2419  1.132  0.0063
     B  0.2061  0.0920  0.0268  0.1299  1.413  0.0078
     C  0.4035  0.2018  0.1140  0.2826  1.401  0.0326
R26  (Education = “university”) ⇒ (Alcohol = “occasionally”)
     A  0.3949  0.5217  0.2319  0.5872  1.125  0.0258
     B  0.2526  0.5378  0.1583  0.6267  1.165  0.0225
     C  0.1667  0.5175  0.1053  0.6316  1.220  0.0190
R27  (Education = “basic school”) ⇒ (DailyBeerCons = “>1l”)
     A  0.0580  0.0688  0.0109  0.1875  2.724  0.0689
     B  0.1234  0.1362  0.0384  0.3113  2.286  0.0216
     C  0.1316  0.1140  0.0263  0.2000  1.754  0.0113
R28  (JobRespons. = “managerial”) ⇒ (Liquor = “yes”)
     A  0.1920  0.4493  0.1014  0.5283  1.176  0.0152
     B  0.2177  0.4796  0.1199  0.5508  1.148  0.0155
     C  0.1667  0.4561  0.0877  0.5263  1.154  0.0117
R29  (Education = “apprentice sch.”) ⇒ (BMI = “obese”)
     A  -       -       -       -       -      -
     B  0.3038  0.1071  0.0501  0.1648  1.538  0.0175
     C  0.3509  0.1667  0.0789  0.2250  1.350  0.0205
R30  (Education = “university”) ⇒ (BMI = “normal”)
     A  0.3949  0.5326  0.2101  0.5321  0.999 -0.0002
     B  0.2526  0.3586  0.0990  0.3917  1.092  0.0084
     C  -       -       -       -       -      -
R31  (Education = “university”) ⇒ (PhysActInJob = “mainly sits”)
     A  0.3949  0.5797  0.3116  0.7890  1.361  0.0826
     B  0.2526  0.5122  0.2084  0.8249  1.610  0.0790
     C  0.1667  0.4211  0.0877  0.5263  1.250  0.0175
R32  (Education = “university”) ⇒ (PhysActAfterJob = “moderate”)
     A  0.3949  0.6957  0.2826  0.7156  1.028  0.0078
     B  0.2526  0.7276  0.1863  0.7373  1.013  0.0025
     C  0.1667  0.7456  0.1053  0.6316  0.847 -0.0190
R33  (Education = “apprentice sch.”) ⇒ (PhysActAfterJob = “mainly sits”)
     A  0.2065  0.1594  0.0399  0.1930  1.210  0.0069
     B  0.3038  0.1956  0.0722  0.2375  1.215  0.0127
     C  0.3509  0.2193  0.0702  0.2000  0.912 -0.0068
R34  (Age = “40-44”) ⇒ (Blood Pressure = “normal”)
     A  0.3297  0.5761  0.1920  0.5824  1.011  0.0021
     B  0.3015  0.3958  0.1292  0.4286  1.083  0.0099
     C  0.1667  0.3333  0.0965  0.5789  1.737  0.0409
R35  (Age = “≥50”) ⇒ (Blood Pressure = “high”)
     A  0.2246  0.1812  0.0435  0.1935  1.068  0.0028
     B  0.2061  0.4342  0.1048  0.5085  1.171  0.0153
     C  0.4035  0.5263  0.2281  0.5652  1.074  0.0157
Table 6. Association rules in the basic groups - [skin folds] x [BMI]
Id   Association Rule
     G  SupA    SupB    Sup     Conf    Lift   RI
R36  (Skin Folds = “≤20”) ⇒ (BMI = “normal”)
     A  0.2319  0.5326  0.1558  0.6719  1.261  0.0323
     B  0.2154  0.3586  0.1478  0.6865  1.914  0.0706
     C  0.1140  0.2632  0.0789  0.6923  2.631  0.0489
R37  (Skin Folds = “21-31”) ⇒ (BMI = “overweight”)
     A  0.4565  0.4601  0.2355  0.5159  1.121  0.0254
     B  0.3632  0.5157  0.2095  0.5769  1.119  0.0222
     C  0.3421  0.5439  0.2368  0.6923  1.273  0.0501
R38  (Skin Folds = “31-40”) ⇒ (BMI = “overweight”)
     A  0.1159  0.4601  0.0471  0.4063  0.883 -0.0062
     B  0.2305  0.5157  0.1362  0.5909  1.146  0.0173
     C  0.1842  0.5439  0.1140  0.6190  1.138  0.0138
R39  (Skin Folds = “31-40”) ⇒ (BMI = “obese”)
     A  -       -       -       -       -      -
     B  0.2305  0.1071  0.0442  0.1919  1.792  0.0195
     C  0.1842  0.1667  0.0351  0.1905  1.143  0.0044
R40  (Skin Folds = “≥40”) ⇒ (BMI = “overweight”)
     A  0.0507  0.4601  0.0362  0.7143  1.552  0.0130
     B  0.1362  0.5157  0.0827  0.6068  1.177  0.0124
     C  0.1667  0.5439  0.0702  0.4211  0.774 -0.0205
R41  (Skin Folds = “≥40”) ⇒ (BMI = “obese”)
     A  -       -       -       -       -      -
     B  0.1362  0.1071  0.0349  0.2564  2.394  0.0203
     C  0.1667  0.1667  0.0877  0.5263  3.159  0.0599
Table 7. Exceptions
Id   IM      DU      Exception
R42  0.4755  0.2069  (Education = “apprentice sch.”) ∧ (PhysActAfterJob = “great activity”) ⇏ (Smoking = “15-20 cig/day”)
R43  0.5035  0.1689  (Education = “apprentice sch.”) ∧ (DailyBeerCons = “does not drink beer”) ⇏ (Smoking = “15-20 cig/day”)
R44  0.8011  0.1515  (DailyBeerCons = “>1l”) ∧ (Group = “A”) ⇏ (SmokingDuration = “>20 years”)
R45  0.3586  0.1054  (DailyBeerCons = “>1l”) ∧ (PhysActAfterJob = “great activity”) ⇏ (SmokingDuration = “>20 years”)
R46  0.9192  0.1837  (DailyBeerCons = “>1l”) ∧ (Group = “A”) ⇏ (Smoking = “>20 cig/day”)
R47  0.5304  0.2358  (DailyBeerCons = “>1l”) ∧ (Age = “≥50”) ⇏ (Smoking = “>20 cig/day”)
R48  0.7018  0.3052  (Education = “university”) ∧ (Group = “C”) ⇏ (BMI = “normal”)
R49  0.4017  0.1770  (DailyWineCons = “≤500ml”) ∧ (Group = “C”) ⇏ (Blood Pressure = “normal”)
R50  0.3442  0.0577  (Age = “≥50”) ∧ (Group = “B”) ⇏ (Ex-Smoker = “yes, >1 year”)