Case retrieval in medical databases by fusing heterogeneous information.
ABSTRACT A novel content-based heterogeneous information retrieval framework, particularly well suited to browse medical databases and support new generation computer aided diagnosis (CADx) systems, is presented in this paper. It was designed to retrieve possibly incomplete documents, consisting of several images and semantic information, from a database; more complex data types such as videos can also be included in the framework. The proposed retrieval method relies on image processing, in order to characterize each individual image in a document by its digital content, and on information fusion. Once the available images in a query document are characterized, a degree of match between the query document and each reference document stored in the database is defined for each attribute (an image feature or a metadata field). A Bayesian network is used to recover missing information if need be. Finally, two novel information fusion methods are proposed to combine these degrees of match, in order to rank the reference documents by decreasing relevance for the query. In the first method, the degrees of match are fused by the Bayesian network itself. In the second method, they are fused by the Dezert-Smarandache theory: the second approach lets us model our confidence in each source of information (i.e., each attribute) and take it into account in the fusion process for a better retrieval performance. The proposed methods were applied to two heterogeneous medical databases, a diabetic retinopathy database and a mammography screening database, for computer aided diagnosis. Precisions at five of 0.809 ± 0.158 and 0.821 ± 0.177, respectively, were obtained for these two databases, which is very promising.

Conference Paper: Graph-based retrieval of multimodality medical images: A comparison of representations using simulated images
ABSTRACT: Content-based image retrieval (CBIR) is an image search technique that utilises visual features as search criteria; it has potential clinical applications in evidence-based diagnosis, physician training, and biomedical research. Graph-based CBIR techniques have high accuracy when retrieving images by the similarity of the spatial arrangement of their constituent objects, but these techniques were initially designed for single-modality images and have limited retrieval capabilities when multimodality images, such as combined positron emission tomography and computed tomography (PET-CT), are considered. In this paper, we present a graph-based CBIR approach for multimodality images that integrates modality-specific features on graph vertices and adapts a well-established graph similarity scheme to account for varying vertex feature sets. Furthermore, we propose a graph pruning method that removes redundant edges using the spatial proximity of image regions. We evaluated our work using two simulated data sets, consisting of 2D liver shapes and 3D whole-body lymphoma images. In our experiments we achieved a higher level of retrieval precision using our graph method when compared to conventional graph-based retrieval, demonstrating that our proposed method enabled new capabilities and improved multimodality CBIR. Computer-Based Medical Systems (CBMS), 2012 25th International Symposium on; 01/2012
Conference Paper: A semantic annotation approach for calcifications in mammogram using Bayesian network model
ABSTRACT: To realize the medical semantic annotation of mammograms, a semantic modeling approach for calcifications in mammograms based on a hierarchical Bayesian network was proposed. First, support vector machines were used to map low-level image features into feature semantics; then high-level semantics were captured through feature semantic fusion using a Bayesian network; finally, the semantic model was established. To validate the method, the model was applied to annotate the semantic information of mammograms. In this experiment, we chose 142 images as the training set and 50 images as the testing set; the results showed that the precision ratio is 81.48% for malignant samples and 73.91% for benign samples. Strategic Technology (IFOST), 2011 6th International Forum on; 01/2011
Giuseppe Acri, Barbara Testagrossa, Giuseppe Totaro, Sabina Strocchi, Raffaele Novario, Giuseppe Vermiglio. THE INTERNATIONAL JOURNAL OF SCIENCE & TECHNOLEDGE. 05/2014; 2(5):2321919.
IEEE TRANSACTIONS ON MEDICAL IMAGING
Gwénolé Quellec, Mathieu Lamard, Guy Cazuguel, Member, IEEE, Christian Roux, Fellow Member, IEEE and Béatrice Cochener
Index Terms: Medical databases, Heterogeneous information retrieval, Information fusion, Diabetic retinopathy, Mammography
Copyright (c) 2010 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.
G. Quellec, G. Cazuguel, and C. Roux are with the INSTITUT TELECOM/TELECOM Bretagne, Dpt ITI, Brest, F-29200 France, and also with the Institut National de la Santé et de la Recherche Médicale (INSERM), U650, Brest, F-29200 France (e-mail: gwenole.quellec@telecom-bretagne.eu; guy.cazuguel@telecom-bretagne.eu; christian.roux@telecom-bretagne.eu).
M. Lamard is with the University of Bretagne Occidentale, Brest, F-29200 France, and also with the Institut National de la Santé et de la Recherche Médicale (INSERM), U650, Brest, F-29200 France (e-mail: mathieu.lamard@univ-brest.fr).
B. Cochener is with the Centre Hospitalier Universitaire de Brest, Service d’Ophtalmologie, Brest, F-29200 France, also with the University of Bretagne Occidentale, Brest, F-29200 France, and also with the Institut National de la Santé et de la Recherche Médicale (INSERM), U650, Brest, F-29200 France (e-mail: Beatrice.Cochener-lamard@chu-brest.fr).
I. INTRODUCTION
Two main tasks in Computer Aided Diagnosis (CADx) using medical images are the extraction of relevant information from images and the combination of the extracted features with other sources of information to automatically or semi-automatically generate a reliable diagnosis. One promising way to achieve the second goal is to take advantage of the growing number of digital medical databases, either for heterogeneous data mining, i.e. for extracting new knowledge, or for heterogeneous information retrieval, i.e. for finding similar heterogeneous medical records (e.g. consisting of digital images and metadata). This paper presents a generic solution for using digital medical databases for heterogeneous information retrieval, and for solving CADx problems using Case-Based Reasoning (CBR) [1].
CBR was introduced in the early 1980s as a new decision support tool. It relies on the idea that analogous problems have similar solutions. In CBR, interpreting a new situation revolves around the retrieval of relevant documents in a case database. The knowledge of medical experts is a mixture of textbook knowledge and experience gained through real-life clinical cases, so the assumption that analogous problems have similar solutions makes sense to them. This is why there is a growing interest in CBR for the development of medical decision support systems [2]. Medical CBR systems are intended to be used as follows: should a physician be doubtful about his/her diagnosis, he/she can send the available data about the patient to the system; the system selects and displays the most similar documents, along with their associated medical interpretations, which may help him/her confirm or invalidate his/her diagnosis by analogy. Therefore, the purpose of such a system is not to replace physicians' diagnosis, but rather to aid it. Medical documents often consist of digital information such as images and symbolic information such as clinical annotations. In the case of Diabetic Retinopathy, for instance, physicians analyze heterogeneous series of images together with contextual information such as the age, sex and medical history of the patient. Moreover, medical information is sometimes incomplete and uncertain, two problems that require particular attention. As a consequence, original CBR systems, designed to process simple documents such as homogeneous and comprehensive attribute vectors, are clearly unsuited to complex CADx applications. On the one hand, some CBR systems have been designed to manage symbolic information [3]. On the other hand, others, based on Content-Based Image Retrieval [4], have been designed to manage digital images [5]. However, few attempts have been made to merge the two kinds of approaches. In this paper, we consider a larger class of problems: CBR in heterogeneous databases.
To retrieve heterogeneous information, some simple approaches, based on early fusion (i.e. attributes are fused in feature space) [6], [7] or late fusion (i.e. attributes are fused in semantic space) [8], [9], [10], have been presented in the literature. A few application-specific approaches [11], [12], [13], [14], [15], as well as a generic retrieval system based on dissimilarity spaces and relevance feedback [16], have also been presented. In this paper, we introduce a novel generic approach that does not require relevance feedback from the user. The proposed system is able to manage incomplete information and the aggregation of heterogeneous attributes: symbolic and multidimensional digital information (we focus on digital images, but the same principle can be applied to any n-dimensional signal). The proposed approach is based on a Bayesian network and the Dezert-Smarandache theory (DSmT) [17]. Bayesian networks have been used previously in retrieval systems, either for keyword-based retrieval [18], [19] or for content-based image or video retrieval [20], [21]. The Dezert-Smarandache theory is increasingly used in remote sensing applications [17]; however, to our knowledge, this is its first medical application. In our approach, a Bayesian network is used to model the relationships between the different attributes (the extracted features of each digital image and each contextual information field): we associate each attribute with a variable in the Bayesian network. This lets us compare incomplete documents: the Bayesian network is used to estimate the probability of unknown variables (associated with missing attributes) given the values of other variables (associated with available attributes). Information coming from each attribute is then used to derive an estimate of the degree of match between a query document and a reference document in the database. These estimates are then fused; two fusion operators are introduced in this paper for this purpose. The first fusion operator is incorporated in the Bayesian network: the computation of the degree of match, with respect to a given attribute, relies on the design of conditional probabilities relating this attribute to the overall degree of match. An extension of this fusion operator that models our confidence in each source of information (i.e. each attribute) is then introduced; it is based on the Dezert-Smarandache theory. In order to model our confidence in each source of information within this second fusion operator, an uncertainty component is included in the belief mass function characterizing the evidence coming from this source of information.
The main advantage of the proposed approach over standard feature selection / feature classification approaches is that a retrieval model is trained separately for each attribute. This is useful to process incomplete documents: in the proposed approach, we simply combine the models associated with all available attributes; by contrast, a standard classifier relies on feature combinations and may therefore become invalid when input feature vectors are incomplete. Also, because each attribute is processed separately, the curse of dimensionality is avoided. Therefore, it is not necessary to select the most relevant features: instead, we simply weight each feature by a confidence measure.
The paper is organized as follows. Section II presents the proposed Bayesian network based retrieval. Section III presents the retrieval based on Bayesian networks and the Dezert-Smarandache theory. These methods are applied in section IV to CADx in two heterogeneous databases: a diabetic retinopathy database and a mammography database. We end with a discussion and a conclusion in section V.
II. BAYESIAN NETWORK BASED RETRIEVAL
A. Description of Bayesian Networks
A Bayesian network [22] is a probabilistic graphical model that represents a set of variables and their probabilistic dependencies. It is a directed acyclic graph whose nodes represent variables and whose edges encode probabilistic dependencies between the variables; the missing edges encode conditional independencies. Examples of Bayesian networks are given in Fig. 1.
Fig. 1. Examples of Bayesian networks. Fig. (a) shows a chain. Fig. (b) shows a polytree, i.e. a network in which there is at most one (undirected) path between two nodes. Fig. (c) shows a network containing an (undirected) cycle: <A, D, E, C, A>.
In the example of Fig. 1 (b), the edge from the parent node A to its child node D indicates that variable A has a direct influence on variable D. Each edge in the graph is associated with a conditional probability matrix expressing the probability of a child variable given one of its parent variables. For instance, if A = {a0, a1} and D = {d0, d1, d2}, then A → D is assigned the following (3 × 2) conditional probability matrix P(D|A):

$$P(D|A) = \begin{pmatrix} P(D=d_0|A=a_0) & P(D=d_0|A=a_1) \\ P(D=d_1|A=a_0) & P(D=d_1|A=a_1) \\ P(D=d_2|A=a_0) & P(D=d_2|A=a_1) \end{pmatrix} \tag{1}$$

A directed acyclic graph is a Bayesian network relative to a set of variables {X1, ..., Xn} if the joint distribution P(X1, ..., Xn) can be expressed as in equation (2):

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\left(X_i \mid \mathrm{parents}(X_i)\right) \tag{2}$$
where parents(X) is the set of nodes Y such that the edge Y → X is in the graph. Because a Bayesian network can completely model the variables and their relationships, it can be used to answer queries about them. Typically, it is used to estimate unknown probabilities for a subset of variables when other variables (the evidence variables) are observed. This process of computing the posterior distribution of variables, given evidence, is called probabilistic inference. In Bayesian networks containing (undirected) cycles, exact inference is an NP-hard problem. Approximate inference algorithms have been proposed, but their accuracy depends on the network's structure; therefore, they are not general. By transforming the network into a cycle-free hypergraph and performing inference in this hypergraph, Lauritzen and Spiegelhalter proposed an exact inference algorithm with relatively low complexity [23]; this algorithm was used in the proposed system.
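Equation (2) and probabilistic inference can be illustrated together on a toy chain A → B → C (cf. Fig. 1 (a)). The following is a minimal Python sketch with made-up binary probabilities. It builds the joint distribution from the factorization, then computes a posterior by brute-force enumeration. Enumeration is exponential in the number of variables and is not the Lauritzen and Spiegelhalter algorithm used in the proposed system; it only shows what "posterior distribution given evidence" means.

```python
import numpy as np

# Hypothetical chain A -> B -> C (cf. Fig. 1 (a)) with binary variables.
p_a = np.array([0.3, 0.7])              # P(A)
p_b_given_a = np.array([[0.9, 0.1],     # row a, column b: P(B = b | A = a)
                        [0.2, 0.8]])
p_c_given_b = np.array([[0.75, 0.25],   # row b, column c: P(C = c | B = b)
                        [0.10, 0.90]])

# Equation (2): P(A, B, C) = P(A) P(B | A) P(C | B), stored as a (2, 2, 2) array.
joint = np.einsum("a,ab,bc->abc", p_a, p_b_given_a, p_c_given_b)

def posterior_a_given_c(joint, c):
    """P(A | C = c): sum out B, then renormalize (brute-force exact inference)."""
    p_a_and_c = joint[:, :, c].sum(axis=1)   # P(A, C = c)
    return p_a_and_c / p_a_and_c.sum()       # divide by P(C = c)

print(joint.sum())                    # a valid factorization sums to 1
print(posterior_a_given_c(joint, c=1))
```

Observing C = 1 shifts the belief about A through the intermediate variable B. Here, summing out B stands in for the message passing that a junction-tree algorithm would perform on larger networks.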
B. Learning a Bayesian Network from Data
A Bayesian network is defined by a structure and by the conditional probability of each node given its parents in that structure (or its prior probability if it does not have any parent). These parameters can be learned automatically from data. Defining the structure consists of finding the pairs of nodes (X, Y) that are directly dependent, i.e. such that:
• X and Y are not independent (P(X, Y) ≠ P(X)P(Y))
• There is no node set Z such that X and Y are independent given Z (P(X, Y|Z) ≠ P(X|Z)P(Y|Z) for every node set Z)
Independence and conditional independence can be assessed by mutual information (equation (3)) and conditional mutual information (equation (4)), respectively:

$$I(X,Y) = \sum_{x,y} P(x,y) \log \frac{P(x,y)}{P(x)P(y)} \tag{3}$$

$$I(X,Y|Z) = \sum_{x,y,z} P(x,y,z) \log \frac{P(x,y|z)}{P(x|z)P(y|z)} \tag{4}$$

Two nodes are considered independent (resp. conditionally independent) if their mutual information (resp. conditional mutual information) is smaller than a given threshold ε, 0 ≤ ε < 1. Ideally, ε should be equal to 0. However, in the presence of noise, some meaningless edges (links) can appear, and these edges can also unnecessarily increase the computation time. To avoid this, ε was set in advance to 0.1 in this study; this value is independent of dataset cardinality [24].
The structure of the Bayesian network, including edge orientation, was obtained with Cheng's algorithm [24]. This algorithm was chosen for its low complexity, which is polynomial in the number of variables, as opposed to exponential in competing algorithms.
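The independence tests above can be sketched with plug-in (frequency-count) estimates of equations (3) and (4). This is a hedged illustration, not Cheng's full structure-learning algorithm. It assumes natural logarithms, because the log base is not stated in the text and the meaning of the 0.1 threshold depends on it. The toy samples are made up.

```python
import math
from collections import Counter

EPSILON = 0.1  # independence threshold used in this study

def mutual_information(x, y):
    """Plug-in estimate of I(X, Y), equation (3), from paired discrete samples."""
    n = len(x)
    n_xy, n_x, n_y = Counter(zip(x, y)), Counter(x), Counter(y)
    # With frequencies, P(x,y) / (P(x) P(y)) = n * n_xy / (n_x * n_y).
    return sum((k / n) * math.log(n * k / (n_x[a] * n_y[b]))
               for (a, b), k in n_xy.items())

def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X, Y | Z), equation (4)."""
    n = len(x)
    n_xyz = Counter(zip(x, y, z))
    n_xz, n_yz, n_z = Counter(zip(x, z)), Counter(zip(y, z)), Counter(z)
    # P(x,y|z) / (P(x|z) P(y|z)) = n_xyz * n_z / (n_xz * n_yz).
    return sum((k / n) * math.log(k * n_z[c] / (n_xz[(a, c)] * n_yz[(b, c)]))
               for (a, b, c), k in n_xyz.items())

def independent(x, y, eps=EPSILON):
    """X and Y are treated as independent if I(X, Y) < eps."""
    return mutual_information(x, y) < eps

def conditionally_independent(x, y, z, eps=EPSILON):
    """X and Y are treated as independent given Z if I(X, Y | Z) < eps."""
    return conditional_mutual_information(x, y, z) < eps

x = [0, 0, 0, 0, 1, 1, 1, 1]
y = list(x)                        # fully dependent on x
w = [0, 1, 0, 1, 0, 1, 0, 1]       # balanced with respect to x
print(independent(x, y), independent(x, w), conditionally_independent(x, y, x))
```

The pair (x, y) keeps an edge, whereas (x, w) is declared independent. Once x itself is observed, x and y carry no further information about each other.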
C. Including Images in a Bayesian Network
Contextual information is included in a Bayesian network as usual: a variable with a finite set of states, one for each possible attribute value, is defined for each field.
To include images in a Bayesian network, we first define a variable for each image in the sequence. For each “image variable”, we follow the usual steps of Content-Based Image Retrieval (CBIR) [4]: 1) building a signature for each image (i.e. extracting a feature vector summarizing its digital content), and 2) defining a distance measure between two signatures (see section II-C1). Thus, measuring the distance between two images comes down to measuring the distance between their signatures. Similarly, in a Bayesian network, defining states for an “image variable” comes down to defining states for the signature of the corresponding images. To this end, similar image signatures are clustered, as described below, and each cluster is associated with a state. Thanks to this process, image signatures can be included in a Bayesian network like any other variable.
1) Image Signature and Distance Measure: in previous work on CBIR, we proposed to extract image signatures from the wavelet transform [25]. These signatures model the distribution of the wavelet coefficients in each subband of the decomposition; as a consequence, they provide a multiscale description of images. To characterize the wavelet coefficient distribution in a given subband, Wouwer's work was applied [26]: Wouwer has shown that this distribution can be modeled by a generalized Gaussian function. The maximum likelihood estimators of the wavelet coefficient distribution in each subband are used as a signature. These estimators can be computed directly from wavelet-based compressed images (such as JPEG2000 compressed images), which can be useful when a large number of images has to be processed. A simplified version of Do's generalized Gaussian parameter estimation method [27], [25] is proposed in appendix A to reduce computation times. Any wavelet basis can be used to decompose images. However, the effectiveness of the extracted signatures largely depends on the choice of this basis. For this reason, we proposed to search for an optimal wavelet basis [25] within the lifting scheme framework, which is implemented in the compression standards. To compare two signatures, Do proposed the use of the Kullback-Leibler divergence between the wavelet coefficient distributions P and Q in two subbands [27]:

$$D(P\|Q) = \int_{\mathbb{R}} p(x) \log \frac{p(x)}{q(x)} \, dx \tag{5}$$

where p and q are the densities of P and Q, respectively. A symmetric version of the Kullback-Leibler divergence was used, since clustering algorithms require (symmetric) distance measures:

$$\frac{1}{2}\left(D(P\|Q) + D(Q\|P)\right) \tag{6}$$

Finally, the distance between two images is defined as a weighted sum of these distances over the subbands, denoted WSD; the weights are tuned by a genetic algorithm to maximize retrieval performance on the training set [25]. The ability to select a weight vector and a wavelet basis makes this image representation highly tunable. We have shown in previous work the superiority of the proposed image signature, in terms of retrieval performance, over several well-known image signatures [25].
2) Signature Clustering: in order to define several states for an “image variable”, similar images are clustered with an unsupervised classification algorithm, using the image signatures and the associated distance measure above. Any algorithm can be used, provided that the distance measure can be specified. We chose the well-known Fuzzy C-Means algorithm (FCM) [28] and replaced the Euclidean distance by the WSD described above. In this algorithm, each document is assigned to each cluster k = 1..K with a fuzzy membership u_k, 0 ≤ u_k ≤ 1, such that Σ_{k=1}^{K} u_k = 1, which can be interpreted as a probability. Finding the right number of clusters is generally a difficult problem. However, when each sample has been assigned a class label, the mutual information between clusters and class labels can be used to determine the optimal number of clusters K̂ [29] (see equation (7)):

$$\hat{K} = \arg\max_{K} \sum_{c=1}^{C} \sum_{k=1}^{K} P(c,k) \log_{C+K} \frac{P(c,k)}{P(c)P(k)} \tag{7}$$

where c = 1..C are the class labels, P(c,k) is the joint probability distribution function of the class and cluster labels, and P(c) and P(k) are the marginal probability distribution functions.
Other continuous variables can be discretized similarly: the age of a person, one-dimensional signals, videos, etc.
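The signature distance of section II-C1 and the choice of K̂ in equation (7) can be chained in a short Python sketch. It rests on several assumptions that the text does not spell out:
- Each subband signature is the pair (α, β) of a zero-mean generalized Gaussian density p(x; α, β) = β / (2αΓ(1/β)) exp(−(|x|/α)^β). For this model, equation (5) has a closed form.
- The subband weights and cluster prototypes are placeholders, whereas the paper tunes the weights with a genetic algorithm.
- Memberships use the standard FCM update with fuzzifier m = 2 applied to WSD values.
- P(c, k) is estimated by summing each class's fuzzy memberships.

```python
import math
import numpy as np

def kl_ggd(alpha1, beta1, alpha2, beta2):
    """Closed-form D(P||Q) between zero-mean generalized Gaussian densities
    p(x; alpha, beta) = beta / (2 alpha Gamma(1/beta)) * exp(-(|x| / alpha) ** beta)."""
    log_ratio = math.log((beta1 * alpha2 * math.gamma(1.0 / beta2))
                         / (beta2 * alpha1 * math.gamma(1.0 / beta1)))
    moment = ((alpha1 / alpha2) ** beta2
              * math.gamma((beta2 + 1.0) / beta1) / math.gamma(1.0 / beta1))
    return log_ratio + moment - 1.0 / beta1

def symmetric_kl(p, q):
    """Equation (6): 0.5 * (D(P||Q) + D(Q||P)); p and q are (alpha, beta) pairs."""
    return 0.5 * (kl_ggd(*p, *q) + kl_ggd(*q, *p))

def wsd(sig1, sig2, weights):
    """Weighted sum of per-subband symmetric divergences between two signatures."""
    return sum(w * symmetric_kl(p, q) for w, p, q in zip(weights, sig1, sig2))

def fcm_memberships(dist, m=2.0):
    """Standard FCM membership update from an (N x K) sample-to-prototype distance
    matrix: u[n, k] = 1 / sum_j (dist[n, k] / dist[n, j]) ** (2 / (m - 1))."""
    dist = np.maximum(np.asarray(dist, dtype=float), 1e-12)  # avoid division by zero
    ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)                            # each row sums to 1

def cluster_class_score(labels, u):
    """Objective of equation (7) for one K (logarithm in base C + K).
    P(c, k) is estimated by summing the fuzzy memberships of each class."""
    labels = np.asarray(labels)
    n, k = u.shape
    c = int(labels.max()) + 1
    p_ck = np.stack([u[labels == cls].sum(axis=0) for cls in range(c)]) / n
    expected = np.outer(p_ck.sum(axis=1), p_ck.sum(axis=0))   # P(c) P(k)
    mask = p_ck > 0
    mi = np.sum(p_ck[mask] * np.log(p_ck[mask] / expected[mask]))
    return float(mi / np.log(c + k))

def best_k(labels, memberships_by_k):
    """K-hat: the candidate K whose (N x K) membership matrix maximizes the score."""
    return max(memberships_by_k,
               key=lambda k: cluster_class_score(labels, memberships_by_k[k]))

# Hypothetical signatures: one (alpha, beta) pair per subband, two subbands each.
signatures = [[(1.0, 1.2), (0.5, 0.8)],
              [(1.1, 1.2), (0.5, 0.9)],
              [(2.0, 0.7), (0.9, 1.5)]]
weights = [0.7, 0.3]                          # placeholder subband weights
prototypes = [signatures[0], signatures[2]]   # placeholder cluster prototypes
dist = [[wsd(s, p, weights) for p in prototypes] for s in signatures]
print(np.round(fcm_memberships(dist), 3))

# Choosing K from class labels and candidate membership matrices.
labels = [0, 0, 1, 1]
candidates = {1: np.ones((4, 1)),
              2: np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])}
print(best_k(labels, candidates))
```

In practice each candidate matrix would come from running FCM once per value of K. The hard 0/1 matrices above only make the comparison easy to follow: a single cluster carries no class information, so its score is 0.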
D. System Design
Fig. 2. Bayesian Network based Retrieval. Solid-lined arrows mean “leads to” or “is followed by” and dashed-lined arrows mean “is used by”. [Flowchart. OFFLINE (on the training set): learn the probabilistic relationships between variables (section II-B), giving the intermediate network (Fig. 3(a)); compute the correlation between two states of a variable (section II-E2), giving the correlations. QUERY (case in the testing set): add a query node Q to the intermediate network (section II-D); compute the probabilistic relationships between the variables and the query node (section II-E), giving the query-specific network (Fig. 3(b)); for each case x in the training set, run probabilistic inference on the query-specific network using x as evidence (sections II-A, II-F); output: the cases in the training set ranked in decreasing order of P(Q|x).]
Let x_q be a query document and M be the number of attributes.
Definition: A document x is said to be relevant for x_q if x and x_q belong to the same class.
To assess the relevance of each reference document in a database for x_q, we define a Bayesian network with the following variables:
• a set of variables {A_i, i = 1..M}, where A_i represents the i-th attribute of x,
• a Boolean variable Q = “x is relevant for x_q” (Q̄ = “x is not relevant for x_q”).
The design of the system is described hereafter and illustrated in Fig. 2. To build the network, the first step is to learn the different relationships between the attributes {A_i, i = 1..M}. An intermediate network is therefore built from data using Cheng's algorithm (see section II-B). For that purpose, the studied database is divided into a training dataset and a test dataset, and Cheng's algorithm is applied to the training dataset. In our experiments, the query document x_q belongs to the test dataset and x belongs to the training dataset. To build this Bayesian network, a finite number of states a_ij is defined for each variable A_i, i = 1..M. To learn the relationships between these variables, we use the membership degree of any document y in the training dataset to each state a_ij of each variable A_i, denoted α_ij(y). If A_i is a nominal variable, α_ij(y) is Boolean; for instance, if y is a male, then α_{“sex”,“male”}(y) = 1 and α_{“sex”,“female”}(y) = 0. If A_i is a continuous variable (such as an image-based feature), α_ik(y) is the fuzzy membership of y to each cluster k = 1..K (see section II-C2). An example of an intermediate network is given in Fig. 3 (a).
Fig. 3. Retrieval Bayesian Network (built for the database presented in section IV-A): (a) intermediate network; (b) query-specific network. In the example of Fig. (b), attributes A1, ..., A6, A8, A10, A13, A14, A15, A17, A18, A22 and A23 are available for the query document x_q, so the associated nodes are then connected to node Q.
The query node Q is then integrated in the network. For retrieval, the attributes of x are observed evidence for Q; as a consequence, the associated variables should be descendants of Q. In the retrieval network, the probabilistic dependencies between Q and each variable A_i depend on x_q. In fact, x_q specifies which attributes should be found in the retrieved documents in order to meet the user's needs. So, when the i-th attribute of x_q is available, we connect the two nodes Q and A_i, and we estimate the associated conditional probability matrix P_q(A_i = a_ij|Q) according to x_q (see Fig. 3 (b)). The index q denotes that the probability depends on x_q. A query-specific network is obtained: its structure depends on which attributes are available for the query document, and its conditional probability matrices depend on the values taken by the query document for these available attributes. This network is used to assess the relevance of any reference document for x_q.
E. Computing the Conditional Probabilities P_q(A_i = a_ij|Q)
To compute P_q(A_i = a_ij|Q), we first estimate P_q(Q|A_i = a_ij): the probability that a reference document x, with full membership to the state a_ij of attribute A_i, is relevant. P_q(A_i = a_ij|Q) can then be computed using Bayes' theorem (see equation (8)). The prior probability P_q(Q) is required. It can be estimated by the probability that two documents, drawn independently, belong to the same class, i.e. the probability that both documents belong to class 1, or that both belong to class 2, etc., hence equation (9):

$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)} \tag{8}$$

$$P_q(Q) = \sum_{c=1}^{C} \left(P(c)\right)^2 \tag{9}$$

where c = 1..C are the class labels (as a consequence, the prior probability P_q(Q) is actually independent of x_q).
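Equations (8) and (9) can be turned into the conditional probability matrix of an edge Q → A_i with a short Python sketch. The inputs are made up. In the paper, P_q(Q|A_i = a_ij) comes from appendix B, which is not reproduced here. The Q̄ column is obtained with the same theorem, using 1 − P_q(Q|A_i = a_ij) and 1 − P_q(Q). Both columns are proper distributions only if the inputs satisfy Σ_j P_q(Q|A_i = a_ij) P(A_i = a_ij) = P_q(Q); the sketch does not enforce this.

```python
import numpy as np

def prior_relevance(class_priors):
    """Equation (9): P_q(Q) = sum_c P(c)^2, the chance that two documents share a class."""
    p = np.asarray(class_priors, dtype=float)
    return float(np.sum(p ** 2))

def edge_cpt(p_q_given_state, p_state, p_q):
    """Bayes' theorem (equation (8)) applied to each state a_ij of A_i.

    Returns a (J x 2) matrix: column 0 holds P_q(A_i = a_ij | Q),
    column 1 holds P_q(A_i = a_ij | not Q).
    """
    r = np.asarray(p_q_given_state, dtype=float)
    a = np.asarray(p_state, dtype=float)
    given_q = r * a / p_q
    given_not_q = (1.0 - r) * a / (1.0 - p_q)
    return np.column_stack([given_q, given_not_q])

p_q = prior_relevance([0.5, 0.5])         # two equiprobable classes
cpt = edge_cpt([0.8, 0.2], [0.5, 0.5], p_q)
print(p_q)
print(cpt)
```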
1) Objectives: we want to define P_q(Q|A_i = a_ij) such that the posterior probability P_q(Q|x) is as close to 1 as possible if x and x_q belong to the same class, and as close to 0 as possible otherwise (note that the class label of x_q is unknown). We define the semantic similarity between documents x and x_q, with respect to A_i, as follows:

$$\mathrm{sim}_i(x, x_q) = \sum_{j} \sum_{k} \alpha_{ij}(x)\, S_{ijk}\, \alpha_{ik}(x_q) \tag{10}$$

where S_ikl is the correlation between two states a_ik and a_il of A_i, regarding the class of the documents at these states.
2) Correlation Between Two States of a Variable: to compute S_ikl, we first compute the mean membership D_ikc (resp. D_ilc) of documents y in a given class c to the state a_ik (resp. a_il), where y belongs to the training dataset:

$$\begin{cases} D_{ijc} = \beta \dfrac{\sum_{y} \delta(y,c)\, \alpha_{ij}(y)}{\sum_{y} \delta(y,c)} \\[2ex] \sum_{c=1}^{C} \left(D_{ijc}\right)^2 = 1, \quad \forall (i,j) \end{cases} \tag{11}$$

where δ(y,c) = 1 if y is in class c and δ(y,c) = 0 otherwise, and β is a normalizing factor chosen to satisfy the second relation. S_ikl is given by equation (12):

$$S_{ikl} = \sum_{c=1}^{C} D_{ikc}\, D_{ilc} \tag{12}$$
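Equations (10) to (12) can be sketched for a single attribute A_i with NumPy. The memberships and labels are hypothetical one-hot examples, and the sketch assumes every class occurs in the training data. Because each state's class profile is normalized to unit length, S_ikl is the cosine similarity between the class profiles of states a_ik and a_il, and S_ikk = 1.

```python
import numpy as np

def class_profiles(alpha, labels, n_classes):
    """Equation (11) for one attribute A_i.

    alpha: (N x J) membership degrees alpha_ij(y) of N training documents to J states.
    labels: (N,) class labels in 0..n_classes-1 (every class assumed present).
    Returns D as a (J x C) array with sum_c D[j, c] ** 2 == 1 for each state j.
    """
    alpha = np.asarray(alpha, dtype=float)
    labels = np.asarray(labels)
    means = np.stack([alpha[labels == c].mean(axis=0) for c in range(n_classes)], axis=1)
    norms = np.linalg.norm(means, axis=1, keepdims=True)   # plays the role of 1 / beta
    return means / np.where(norms > 0, norms, 1.0)

def state_correlations(d):
    """Equation (12): S[k, l] = sum_c D[k, c] D[l, c]."""
    return d @ d.T

def semantic_similarity(alpha_x, alpha_q, s):
    """Equation (10): sum_j sum_k alpha_ij(x) S_ijk alpha_ik(x_q)."""
    return float(np.asarray(alpha_x, dtype=float) @ s @ np.asarray(alpha_q, dtype=float))

alpha = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]   # hypothetical one-hot memberships
labels = [0, 0, 1, 1]
S = state_correlations(class_profiles(alpha, labels, n_classes=2))
print(np.round(S, 3))
print(semantic_similarity([1, 0, 0], [0, 0, 1], S))
```

In this toy example, states 0 and 2 are only ever seen in different classes, so their correlation, and the semantic similarity between documents in those states, is 0.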
3) Degree of Match Between x and x_q With Respect to A_i: when computing the posterior probability P_q(Q|x), the Bayesian inference algorithm fuses the probabilities P_q(Q|A_i)P(A_i) coming from each node A_i connected to Q (see Fig. 3 (b)). In the remainder of this paper, the probability dm_i(x, x_q) = P_q(Q|A_i)P(A_i) is referred to as the degree of match between x and x_q with respect to attribute A_i. This degree of match can be rewritten as follows:

$$dm_i(x, x_q) = \sum_{j} P_q(Q \mid A_i = a_{ij})\, \alpha_{ij}(x) \tag{13}$$

where α_ij(x), the membership degree of x to the state a_ij of A_i, is known or computed by the Bayesian network. P_q(Q|A_i = a_ij) is chosen proportional to r_ij = Σ_k α_ik(x_q) S_ijk. Substituting r_ij into equation (13) gives Σ_j Σ_k α_ij(x) S_ijk α_ik(x_q), so dm_i(x, x_q) is proportional to the semantic similarity between x and x_q defined in equation (10). As a consequence, the reference documents maximizing the semantic similarity with x_q will maximize P_q(Q|x), which was our objective. Computation details for P_q(Q|A_i = a_ij) are given in appendix B.
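The proportionality argument above can be checked numerically with a small Python sketch. The state correlations and memberships are made up. The `scale` parameter is a hypothetical stand-in for the normalization detailed in appendix B, which is not reproduced here. The check relies on S being symmetric, which equation (12) guarantees.

```python
import numpy as np

def relevance_given_states(alpha_q, s, scale=1.0):
    """P_q(Q | A_i = a_ij), chosen proportional to r_ij = sum_k alpha_ik(x_q) S_ijk.

    `scale` is a placeholder for the normalization of appendix B (not reproduced).
    """
    return scale * (np.asarray(s) @ np.asarray(alpha_q, dtype=float))

def degree_of_match(alpha_x, p_q_given_state):
    """Equation (13): dm_i(x, x_q) = sum_j P_q(Q | A_i = a_ij) alpha_ij(x)."""
    return float(np.asarray(alpha_x, dtype=float) @ p_q_given_state)

# Hypothetical state correlations S for an attribute with three states.
s = np.array([[1.0, 0.7, 0.0],
              [0.7, 1.0, 0.7],
              [0.0, 0.7, 1.0]])
alpha_q = [0.0, 0.2, 0.8]          # fuzzy memberships of the query
alpha_x = [0.1, 0.6, 0.3]          # fuzzy memberships of a reference document
p = relevance_given_states(alpha_q, s, scale=0.5)
print(degree_of_match(alpha_x, p))
```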
F. Retrieval Process
The reference documents in the database are then processed sequentially. To process a document x, every attribute available for x is used as evidence, and Lauritzen and Spiegelhalter's inference algorithm is used to compute the posterior probability of each variable, in particular the posterior probability of Q, P_q(Q|x) (see Fig. 4 (a)). The reference documents in the database are then ranked in decreasing order of the computed posterior probability P_q(Q|x).
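The ranking step can be illustrated with a deliberately simplified Python sketch. It is not the paper's inference procedure. It assumes that the attributes connected to Q are conditionally independent given Q, ignores the other edges of the query-specific network, and uses the memberships α_ij(x) as soft (virtual) evidence. A missing attribute simply contributes no evidence, mirroring how incomplete documents are handled. The conditional probability matrix and documents are hypothetical.

```python
import numpy as np

def posterior_relevance(p_q, cpt, alphas):
    """Simplified P_q(Q | x).

    cpt[i] is the (J_i x 2) matrix [P_q(A_i | Q), P_q(A_i | not Q)] of an attribute
    connected to Q; alphas[i] holds the memberships alpha_ij(x) of document x.
    Attributes missing for x are skipped.
    """
    like = np.array([p_q, 1.0 - p_q])
    for i, matrix in cpt.items():
        if i in alphas:
            like *= np.asarray(alphas[i], dtype=float) @ matrix
    return float(like[0] / like.sum())

def rank_documents(p_q, cpt, documents):
    """Ids of the reference documents sorted by decreasing P_q(Q | x)."""
    scores = {doc_id: posterior_relevance(p_q, cpt, alphas)
              for doc_id, alphas in documents.items()}
    return sorted(scores, key=scores.get, reverse=True)

cpt = {0: np.array([[0.8, 0.2], [0.2, 0.8]])}     # one hypothetical attribute
documents = {"a": {0: [1.0, 0.0]}, "b": {0: [0.0, 1.0]}, "c": {}}
print(rank_documents(0.5, cpt, documents))
```

Document "c" lacks the attribute and keeps the prior 0.5, so it is ranked between the document that matches the query ("a") and the one that does not ("b").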
Fig. 4. Assessing the relevance of a reference document x for the query by the proposed methods: (a) Bayesian network based method; (b) Bayesian network + DSmT based method. In this example, attributes A6, A7, A14, A15, A16, A20, A22 and A23 are available for x_q. Evidence nodes are colored in gray and target nodes are brightly encircled. In Fig. (b), the fusion system is colored in gray (⊕).
III. BAYESIAN NETWORK AND DEZERT-SMARANDACHE BASED RETRIEVAL
A. Description of the Dezert-Smarandache Theory
The Dezert-Smarandache Theory (DSmT) of plausible and paradoxical reasoning, proposed in recent years [17], lets us combine any types of independent sources of information represented in terms of belief functions. It generalizes the theory of belief functions (Dempster-Shafer Theory, DST) [30], which itself generalizes the Bayesian theory used in the system above. DSmT is mainly focused on the fusion of uncertain, highly conflicting and imprecise sources of evidence.
Let θ = {θ1, θ2, ...} be a set of hypotheses under consideration for the fusion problem; θ is called the frame of discernment. For our problem, θ = {Q̄, Q}. In Bayesian theory, a probability P(θi) is assigned to each element θi of the frame, such that Σ_{θi∈θ} P(θi) = 1. More generally, in DST, a belief mass m(A) is assigned to each element A of the power set 2^θ = {∅, Q, Q̄, Q ∪ Q̄}, i.e. the set of all composite propositions built from elements of θ with the ∪ operator, such that m(∅) = 0 and Σ_{A∈2^θ} m(A) = 1. Belief masses let us express our uncertainty. For instance, they can define confidence intervals on probabilities: depending on external circumstances, the probability of Q can range from m(Q) to m(Q) + m(Q ∪ Q̄). DSmT goes one step further: a (generalized) belief mass m(A) is assigned to each element A of the hyper-power set D(θ) = {∅, Q, Q̄, Q ∩ Q̄, Q ∪ Q̄}, i.e. the set of all composite propositions built from elements of θ with the ∩ and ∪ operators, such that m(∅) = 0 and Σ_{A∈D(θ)} m(A) = 1.
The belief mass functions m_i must first be specified by the user for each source of information, i = 1..M (the m_i functions used in our system are described below, in paragraph III-C). Then, the mass functions m_i are fused into the global mass function m_f according to a given rule of combination. Another difference between DST and DSmT lies in the underlying rules of combination. Several rules designed to better manage conflicts between sources were proposed in DSmT, including the hybrid rule of combination [17] and the Proportional Conflict Redistribution (PCR) rules [31]. It is also possible to introduce constraints in the model [17]: we can specify pairs of incompatible hypotheses (θa, θb), i.e. each subset A of θa ∩ θb must have a null mass, which is denoted A ∈ C(θ).
Once the fused mass function $m_f$ has been computed, we can compute the belief (credibility) and the plausibility of each hypothesis $A$ (or any other element of $D^\theta$) as follows:

$$\mathrm{Bel}(A) = \sum_{B_i \subseteq A,\ B_i \in D^\theta} m_f(B_i) \tag{14}$$

$$\mathrm{Pl}(A) = \sum_{B_i \cap A \notin C(\theta) \cup \{\emptyset\},\ B_i \in D^\theta} m_f(B_i) = 1 - \mathrm{Bel}(\bar{A}) \tag{15}$$

Belief and plausibility are respectively pessimistic and optimistic. Pignistic probability [32], a possible compromise, is used instead (see below, in paragraph III-D); other probabilistic transformations are available [33].
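For the two-hypothesis frame used in this paper, equations 14 and 15 reduce to simple sums over focal elements. The Python sketch below is only an illustration under stated assumptions: it uses Shafer's model (no constraints, so the hyper-power set reduces to $2^\theta$, as adopted in paragraph III-C), represents propositions as frozensets of the hypothetical labels "Q" and "notQ", and uses a made-up fused mass function.

```python
from typing import Dict, FrozenSet

Prop = FrozenSet[str]
Q, NOT_Q = "Q", "notQ"  # hypothetical labels for Q and its complement
THETA: Prop = frozenset({Q, NOT_Q})  # Q u notQ, i.e. total ignorance


def bel(m: Dict[Prop, float], a: Prop) -> float:
    """Belief (equation 14): total mass of the non-empty focal elements included in a."""
    return sum(v for b, v in m.items() if b and b <= a)


def pl(m: Dict[Prop, float], a: Prop) -> float:
    """Plausibility (equation 15, no constraints): total mass of focal elements meeting a."""
    return sum(v for b, v in m.items() if b & a)


# Hypothetical fused mass function m_f on 2^theta.
m_f = {frozenset({Q}): 0.5, frozenset({NOT_Q}): 0.2, THETA: 0.3}
a = frozenset({Q})
print(bel(m_f, a), pl(m_f, a), 1 - bel(m_f, THETA - a))
```

With these illustrative masses, $\mathrm{Bel}(Q) = 0.5$ and $\mathrm{Pl}(Q) = 1 - \mathrm{Bel}(\bar{Q}) = 0.8$, so the probability of $Q$ is only known to lie in $[0.5, 0.8]$, which is the confidence interval described above.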
B. Link with Bayesian Network based Retrieval
Our motivation for using the theory of belief functions,
instead of the Bayesian theory, is that the former lets us model
our confidence in each source of information, instead of taking
each piece of information at face value. This property is particularly attractive for a medical decision support system where
heterogeneous sources of information, with varying reliability,
are combined. Because its fusion operators better manage
conflicting sources of information, a common occurrence when
these sources are unreliable, DSmT was used instead of the
original theory of belief functions.
In the Bayesian network based method (see section II), the relevance of a reference document for the query, according to a given attribute $A_i$, was estimated through the design of conditional probabilities $P_q(Q \mid A_i = a_{ij})$. The $M$ sources of information (represented by the network variables $A_i$, $i = 1..M$) were then fused by the Bayesian network inference algorithm (see Fig. 3 (b)) to compute the posterior probability of $Q$, $P_q(Q \mid x)$, for a document $x$ in the database. We can translate this Bayesian fusion problem into the framework of belief functions. Let $\theta = \{\bar{Q}, Q\}$ be the frame of discernment. For each source $i$ ($A_i$), we defined (equation 13) a degree of match $dm_i(x, x_q)$ between $x$ and the query $x_q$, which may be viewed as the belief mass $m_i(Q)$ assigned to hypothesis $Q$; consequently, $m_i(\bar{Q}) = 1 - m_i(Q)$ is assigned to $\bar{Q}$.
In that first approach, we did not model our confidence in the estimation of the relevance provided by each source of evidence (through the design of conditional probabilities). Poor estimates of the relevance provided by some sources might therefore mislead the fused estimate, so we would like to give more importance, in the fusion process, to the trusted sources of evidence. We propose to use DSmT to model our confidence in each source of evidence, as explained below.
C. System Design
[Fig. 5 flowchart. Offline, on the training set: find the optimal test $T_i$ on each attribute $A_i$ (threshold, sensitivity and specificity of each test, section III-C). For a query (case in the testing set) and each case $x$ in the training set: membership degree of $x$ to each state of each variable (intermediate network, Fig. 2 / Fig. 3(a)); degree of match between $x$ and the query for each attribute $A_i$ (equation 13); belief mass function for $A_i$ (section III-C); fusion of the belief masses according to the PCR rule (section III-A); ranking of the training cases in decreasing order of $betP(Q)$.]
Fig. 5. Bayesian Network and Dezert-Smarandache based Retrieval.
To extend the previous method in the DSmT framework, we assign a mass not only to $Q$ and $\bar{Q}$, but to each element in $D^\theta = \{\emptyset, Q, \bar{Q}, Q \cap \bar{Q}, Q \cup \bar{Q}\}$. Assigning a mass to $Q \cap \bar{Q}$ is meaningless, so we only assign a mass to elements in $D^\theta \setminus \{Q \cap \bar{Q}\} = \{\emptyset, Q, \bar{Q}, Q \cup \bar{Q}\} = 2^\theta$ (it is actually Shafer's model [30]).
To compute the belief masses for a given source of information $i$, we defined a test $T_i$ on the degree of match $dm_i$: $T_i(x, x_q)$ is true if $dm_i(x, x_q) \geq \tau_i$, $0 \leq \tau_i \leq 1$, and false otherwise. The mass functions are then assigned according to $T_i(x, x_q)$:
• if $T_i(x, x_q)$ is true:
– $m_i(Q) = P(T_i(x, x_q) \mid x \text{ is relevant for } x_q)$ (the sensitivity of $T_i$)
– $m_i(Q \cup \bar{Q}) = 1 - m_i(Q)$
– $m_i(\bar{Q}) = 0$
• else:
– $m_i(\bar{Q}) = P(\neg T_i(x, x_q) \mid x \text{ is not relevant for } x_q)$ (the specificity of $T_i$)
– $m_i(Q \cup \bar{Q}) = 1 - m_i(\bar{Q})$
– $m_i(Q) = 0$
The sensitivity (resp. the specificity) represents the degree of confidence in a positive (resp. negative) answer to test $T_i$; $m_i(Q \cup \bar{Q})$ is assigned the degree of uncertainty. The sensitivity of $T_i$, for a given threshold $\tau_i$, is defined as the percentage of pairs of training documents $(y_1, y_2)$ from the same class such that $T_i(y_1, y_2)$ is true. Similarly, the specificity of $T_i$ is defined as the percentage of pairs of training documents $(z_1, z_2)$ from different classes such that $T_i(z_1, z_2)$ is false. Test $T_i$ is relevant if it is both sensitive and specific. As $\tau_i$ increases, fewer pairs pass the test, so sensitivity decreases and specificity increases. We therefore set $\tau_i$ at the intersection of the two curves "sensitivity according to $\tau_i$" and "specificity according to $\tau_i$". A binary search is used to find the optimal $\tau_i$.

TABLE I
STRUCTURED CONTEXTUAL INFORMATION FOR DIABETIC RETINOPATHY PATIENTS
category | attributes | possible values
general clinical context | family clinical context | diabetes, glaucoma, blindness, misc.
general clinical context | medical clinical context | arterial hypertension, dyslipidemia, proteinuria, renal dialysis, allergy, misc.
general clinical context | surgical clinical context | cardiovascular, pancreas transplant, renal transplant, misc.
general clinical context | ophthalmologic clinical context | cataract, myopia, AMD, glaucoma, unclear medium, cataract surgery, glaucoma surgery, misc.
examination and diabetes context | diabetes type | none, type I, type II
examination and diabetes context | diabetes duration | < 1 year, 1 to 5 years, 5 to 10 years, > 10 years
examination and diabetes context | diabetes stability | good, bad, fast modifications, glycosylated hemoglobin
examination and diabetes context | treatments | insulin injection, insulin pump, antidiabetic drug + insulin, antidiabetic drug, pancreas transplant
eye symptoms reported before the angiography test | ophthalmologically asymptomatic | none, systematic ophthalmologic screening - known diabetes, recently diagnosed diabetes by check-up, diabetic diseases other than ophthalmic ones
eye symptoms reported before the angiography test | ophthalmologically symptomatic | none, infection, unilateral decreased visual acuity (DVA), bilateral DVA, neovascular glaucoma, intraretinal hemorrhage, retinal detachment, misc.
maculopathy | maculopathy | focal edema, diffuse edema, none, ischemic
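The threshold selection and mass assignment of paragraph III-C can be sketched as follows. This is a minimal Python illustration, not the authors' code: it assumes the degrees of match of same-class and different-class training pairs are available as plain lists, runs 50 bisection steps on $[0, 1]$, and the function names and the "Q" / "notQ" / "Theta" keys ("Theta" standing for $Q \cup \bar{Q}$) are hypothetical.

```python
from typing import Dict, Sequence


def sensitivity(tau: float, same_class_dm: Sequence[float]) -> float:
    """Fraction of same-class training pairs for which T_i (dm >= tau) is true."""
    return sum(d >= tau for d in same_class_dm) / len(same_class_dm)


def specificity(tau: float, diff_class_dm: Sequence[float]) -> float:
    """Fraction of different-class training pairs for which T_i is false."""
    return sum(d < tau for d in diff_class_dm) / len(diff_class_dm)


def optimal_threshold(same_class_dm: Sequence[float],
                      diff_class_dm: Sequence[float], steps: int = 50) -> float:
    """Bisection on [0, 1] for the crossing of the sensitivity and specificity curves.

    Sensitivity never increases and specificity never decreases with tau, so the
    sign of (sensitivity - specificity) tells on which side the crossing lies.
    """
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if sensitivity(mid, same_class_dm) > specificity(mid, diff_class_dm):
            lo = mid  # crossing lies above mid
        else:
            hi = mid  # crossing lies at or below mid
    return (lo + hi) / 2


def source_masses(dm: float, tau: float, sens: float, spec: float) -> Dict[str, float]:
    """Belief masses of one source for a pair (x, x_q); 'Theta' stands for Q u notQ."""
    if dm >= tau:  # positive answer, trusted as much as the test's sensitivity
        return {"Q": sens, "notQ": 0.0, "Theta": 1.0 - sens}
    return {"Q": 0.0, "notQ": spec, "Theta": 1.0 - spec}  # negative answer
```

Because each source only commits the mass its test has earned on the training set, a weakly discriminant attribute leaves most of its mass on $Q \cup \bar{Q}$ and has little influence on the fusion.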
D. Retrieval Process
To process a reference document $x$, every available attribute for $x$ is processed as evidence, and Lauritzen and Spiegelhalter's inference algorithm is used to estimate $\alpha_{ij}(x)$, $\forall j$, $i = 1..M$. If the $i$-th attribute of $x_q$ is available, the degree of match $dm_i(x, x_q)$ is computed according to $\alpha_{ij}(x)$ (see equation 13) and the belief masses are computed according to test $T_i(x, x_q)$. The sources available for $x_q$ are then fused. Usual rules of combination have a time complexity exponential in $M$, which might be a limitation. We therefore proposed a rule of combination for two-hypothesis problems ($Q$ and $\bar{Q}$ in our application), adapted from the PCR rules, with a time complexity polynomial in $M$ [34]. Once the sources available for $x_q$ are fused by the proposed rule of combination, the pignistic probability $betP(Q)$ is computed following equation 16.

$$betP(Q) = m_f(Q) + \frac{m_f(Q \cup \bar{Q})}{2} \tag{16}$$

The process is illustrated in Fig. 4 (b) and Fig. 5. The reference documents are then ranked in decreasing order of $betP(Q)$.
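The rule actually used in the system is the polynomial-time rule of [34], which is not detailed here. As a stand-in, the following Python sketch combines two sources with the standard PCR5 rule on Shafer's model $\theta = \{Q, \bar{Q}\}$, folds further sources in pairwise (only an approximation, since PCR5 is not associative), and ranks documents by $betP(Q)$ from equation 16. The key names and the input layout are assumptions made for illustration.

```python
from typing import Dict, List

Mass = Dict[str, float]  # keys: "Q", "notQ", "Theta" (Theta = Q u notQ)


def pcr5(m1: Mass, m2: Mass) -> Mass:
    """Standard PCR5 combination of two sources on Shafer's model with two hypotheses."""
    # Conjunctive consensus on the non-conflicting intersections.
    q = m1["Q"] * m2["Q"] + m1["Q"] * m2["Theta"] + m1["Theta"] * m2["Q"]
    nq = m1["notQ"] * m2["notQ"] + m1["notQ"] * m2["Theta"] + m1["Theta"] * m2["notQ"]
    theta = m1["Theta"] * m2["Theta"]
    # Each partial conflict m(Q) * m'(notQ) is sent back to Q and notQ,
    # proportionally to the two masses that produced it.
    for a, b in ((m1["Q"], m2["notQ"]), (m2["Q"], m1["notQ"])):
        if a + b > 0:
            q += a * a * b / (a + b)
            nq += b * b * a / (a + b)
    return {"Q": q, "notQ": nq, "Theta": theta}


def bet_p(m: Mass) -> float:
    """Pignistic probability of Q (equation 16)."""
    return m["Q"] + m["Theta"] / 2


def rank(candidates: Dict[str, List[Mass]]) -> List[str]:
    """Rank reference documents by decreasing betP(Q) of their fused masses."""
    scores = {}
    for doc, sources in candidates.items():
        fused = sources[0]
        for m in sources[1:]:
            fused = pcr5(fused, m)  # pairwise folding: an approximation for M > 2
        scores[doc] = bet_p(fused)
    return sorted(scores, key=scores.get, reverse=True)
```

For example, with $m_1 = (Q: 0.6, \bar{Q}: 0, \Theta: 0.4)$ and $m_2 = (Q: 0, \bar{Q}: 0.7, \Theta: 0.3)$, the partial conflict $0.6 \times 0.7 = 0.42$ is split between $Q$ and $\bar{Q}$ instead of being normalized away.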
IV. APPLICATION TO MEDICAL IMAGE DATABASES
The proposed method has been applied to CADx on two heterogeneous databases. First, it has been applied to diabetic retinopathy severity assessment on a dataset (DRD) built at the Inserm U650 laboratory, in collaboration with ophthalmologists of Brest University Hospital. Then, it has been applied to breast cancer screening on a public access database (DDSM).
A. Diabetic Retinopathy Database (DRD)
The diabetic retinopathy database contains retinal images of
diabetic patients, with associated anonymized information on
the pathology. Diabetes is a metabolic disorder characterized
by sustained inappropriately high blood sugar levels. This
progressively affects blood vessels in many organs, which may
lead to serious renal, cardiovascular, cerebral and also retinal
complications. The latter case, namely diabetic retinopathy,
can lead to blindness. The database consists of 67 patient files containing 1112 photographs altogether. Images have a definition of 1280 pixels/line for 1008 lines/image. They are losslessly compressed. Patients have been recruited at Brest University Hospital (France) since June 2003, and images were acquired by experts using a Topcon Retinal Digital Camera (TRC-50IA) connected to a computer. An image series is given in Fig. 6.
Fig. 6. Photograph sequence of a patient eye. Images (a), (b) and (c) are photographs obtained with different color filters. Images (d) to (j) constitute a temporal angiographic series: a contrast agent (fluorescein) is injected and photographs are taken at different stages (early (d), intermediate (e)-(i), late (j)). At the intermediate stage, photographs from the periphery of the retina are available.
The contextual information available is the age and sex of the patient, as well as structured medical information (see table I). Patient records consist of at most 10 images per eye (see Fig. 6) and 13 contextual attributes; 12.1% of these images and 40.5% of these contextual attribute values are missing. The disease severity level, according to the ICDRS classification [35], was assessed by a single expert for all 67 patients: because of intra-observer variability, the reference standard is imperfect. The distribution of the disease severity among the above-mentioned 67 patients is given in table II.
B. Digital Database for Screening Mammography (DDSM)
The DDSM project [36], involving the Massachusetts General Hospital, the University of South Florida and the Sandia National Laboratories, has built a mammographic image database for research on breast cancer screening. It consists of 2277 patient files. Each of them includes two images of each breast, associated with patient information (age at time of study, subtlety rating for abnormalities, American College of Radiology breast density rating and keyword description of abnormalities) and image information (scanner, spatial resolution, ...). The following contextual attributes are used in this study:
• the age at time of study
• the breast density rating
Images have a varying definition, of about 2000 pixels/line for 5000 lines/image. An example image sequence is given in Fig. 7. There is no missing information in DDSM.

TABLE II
PATIENT DISEASE SEVERITY DISTRIBUTION
database | disease severity | number of patients
DRD | no apparent diabetic retinopathy | 7
DRD | mild non-proliferative | 9
DRD | moderate non-proliferative | 22
DRD | severe non-proliferative | 9
DRD | proliferative | 9
DRD | treated/non-active diabetic retinopathy | 11
DDSM | normal | 695
DDSM | benign | 669
DDSM | cancer | 913
Fig. 7. Mammographic image sequence of the same patient. (a) and (b) are two views of the left breast, (c) and (d) are two views of the right one.
Each patient file has been graded by a physician. Patients are then classified into three groups: normal, benign and cancer. The distribution of grades among the patients is given in table II. The reference standard is also affected by intra- and inter-observer variability in this dataset.
C. Objective of the System
Definition: let $x_q$ be a query document, and $x_1, x_2, \ldots, x_K$ be its $K$ most similar documents within the training set. The precision at $K$ for $x_q$ is the fraction of documents, among $\{x_1, x_2, \ldots, x_K\}$, that belong to the same class as $x_q$.
For each query document, we want to retrieve the most similar reference documents in a given database. Satisfaction of the user's needs can thus be assessed by the precision at $K$. The precision at $K$, averaged over all query documents, measures how well a fusion method combines feature-specific distance measures into a semantically meaningful distance measure.
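As a small illustration of this metric, the Python sketch below computes the precision at $K$ for one query from the classes of its ranked results, then averages it over queries. The input format (class labels listed in ranking order) and the function names are assumptions.

```python
from statistics import mean
from typing import Sequence, Tuple


def precision_at_k(query_class: str, ranked_classes: Sequence[str], k: int) -> float:
    """Fraction of the k top-ranked documents that share the query's class."""
    return sum(c == query_class for c in ranked_classes[:k]) / k


def mean_precision_at_k(results: Sequence[Tuple[str, Sequence[str]]], k: int) -> float:
    """Precision at k averaged over (query class, ranked result classes) pairs."""
    return mean(precision_at_k(q, ranked, k) for q, ranked in results)
```

For instance, if three of the five top-ranked documents share the query's severity grade, the precision at five for that query is 0.6.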
D. Patient File Features
In those databases, each patient file consists of both digital images and contextual information. Contextual attributes (13 in DRD, 2 in DDSM) are processed as is in the CBR system. Images need to be processed in order to extract relevant digital features. A possible solution is to segment these images and extract domain-specific information (such as the number of lesions); for DRD, the number of automatically detected microaneurysms (the most frequent lesion of diabetic retinopathy) [37] is used. However, this kind of approach requires expert knowledge and a robust segmentation of images, which is not always possible because of acquisition variability. So, an additional solution to characterize images by their digital content, without segmenting images, is proposed: a feature vector is extracted from the wavelet decomposition of the image [25]. An image signature is computed for each image field in a document (4 in DDSM: RCC, RMLO, LCC, LMLO; 10 in DRD); each image signature is associated with an attribute (see section II-C). In total, there are 24 attributes in DRD (13 contextual attributes, 10 image signatures and the microaneurysm count) and 6 attributes in DDSM (2 contextual attributes and 4 image signatures).
E. Training and Test Sets
Retrieval performance is assessed as follows. Both datasets are randomly divided into five subsets $V_1, V_2, \ldots, V_5$ of equal size. Each subset $V_i$, $i = 1..5$, is used in turn as the test set while the remaining four subsets are used for training the retrieval system. Note that the test set is completely independent from the training process.
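A minimal Python sketch of this protocol, assuming documents are identified by integers; the fixed seed and the function name are illustrative only. When the number of documents is not a multiple of five (67 in DRD, for instance), the subsets can only be of nearly equal size.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def five_fold_splits(doc_ids: Sequence[int],
                     seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Randomly partition doc_ids into V_1..V_5 and yield (training set, test set) pairs."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::5] for i in range(5)]  # fold sizes differ by at most one
    for i, test in enumerate(folds):
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        yield train, test
```

Each document appears in exactly one test set, and the training set of each fold never contains its test documents.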
F. Results
The number of documents proposed by the system is typically set to $K \in \{5, 10, 20\}$. Precisions obtained with each fusion method are reported in table III. Because the cardinality of each class is small in DRD, performance was expected to decrease as $K$ increases. For both databases, at $K = 5$, the average precision of the best method is greater than 0.8; it means that, on average, more than 80% of the selected documents are relevant for a query. On DRD, the use of DSmT increases the average precision at $K = 5$ by about 0.1 (from 0.704 to 0.809), but not on DDSM (0.803, compared to 0.821 without DSmT). This can be explained by the fact that, on DRD, many sources of information are contextual: less reliable similarity measures are derived from these contextual sources (the sensitivity/specificity values of the corresponding tests $T_i$ are lower), hence the interest of DSmT for this database.
To assess the performance of the proposed fusion framework independently of the underlying image signatures (described in II-C1), it was compared to an early fusion method [6] and a late fusion method [8] based on the same image signatures. The results we obtained for these methods are also summarized in table III.
The average computation time to retrieve the five closest documents with the second method is given in table IV (computation times are similar with the first method). Clearly, most of the time is spent computing image signatures. All experiments were conducted using an AMD Athlon 64-bit based computer running at 2 GHz.
TABLE III
PRECISION OBTAINED WITH DIFFERENT METHODS
method | DRD, K = 5 | DRD, K = 10 | DRD, K = 20 | DDSM, K = 5 | DDSM, K = 10 | DDSM, K = 20
Bayesian network (see section II) | 0.704±0.168 | 0.654±0.174 | 0.551±0.191 | 0.821±0.177 | 0.813±0.179 | 0.798±0.191
Bayesian network + DSmT (see section III) | 0.809±0.158 | 0.693±0.165 | 0.590±0.180 | 0.803±0.182 | 0.801±0.185 | 0.787±0.188
Bayesian network + DSmT (simplified signature computation) | 0.806±0.158 | 0.693±0.165 | 0.587±0.180 | 0.800±0.184 | 0.799±0.186 | 0.787±0.189
Bayesian network + DSmT (images only) | 0.704±0.176 | 0.640±0.181 | 0.529±0.200 | 0.759±0.192 | 0.740±0.194 | 0.725±0.194
Early fusion [6] | 0.430±0.207 | 0.448±0.203 | 0.432±0.212 | 0.714±0.193 | 0.731±0.192 | 0.718±0.196
Late fusion [8] | 0.394±0.210 | 0.431±0.194 | 0.427±0.204 | 0.703±0.192 | 0.717±0.191 | 0.700±0.200

TABLE IV
COMPUTATION TIMES FOR THE DSMT BASED METHOD
step | DRD | DDSM
retrieval (once signatures are computed) | 0.37 s | 4.67 s
Do's generalized Gaussian estimation method: computing the signatures (for 1 image) | 4.57 s | 35.89 s
Do's generalized Gaussian estimation method: average retrieval time | 40.58 s | 148.27 s
Simplified generalized Gaussian estimation method (see appendix A): computing the signatures (for 1 image) | 0.25 s | 2.23 s
Simplified generalized Gaussian estimation method (see appendix A): average retrieval time | 2.58 s | 13.59 s
(The average number of images per document is ≈ 9 for DRD and 4 for DDSM.)

To study the robustness of the method with respect to missing values, the following test was carried out:
• for each document $x_i$ in the database, 100 new documents were generated as follows. Let $n_i$ be the number of attributes available for $x_i$; each new example was obtained by removing a number of attribute values randomly selected in $\{0, 1, \ldots, n_i\}$.
• the precision at five obtained for these generated documents, with respect to the number of available attributes, was plotted in Fig. 8.
Fig. 8. Robustness with respect to missing values. Note that documents are returned at random when no attributes are available (0 on the x-axis).
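The generation of incomplete documents can be sketched in Python as follows. This assumes a document is a dictionary mapping attribute names to values, with None marking a missing value; the function name and the fixed seed are illustrative.

```python
import random
from typing import Dict, List, Optional

Document = Dict[str, Optional[object]]  # attribute name -> value, None = missing


def degraded_copies(doc: Document, n_copies: int = 100, seed: int = 0) -> List[Document]:
    """Generate n_copies variants of doc, each with r of its n_i available values
    removed, r being drawn uniformly in {0, 1, ..., n_i}."""
    rng = random.Random(seed)
    available = [a for a, v in doc.items() if v is not None]
    copies = []
    for _ in range(n_copies):
        r = rng.randint(0, len(available))  # randint includes both bounds
        removed = set(rng.sample(available, r))
        copies.append({a: (None if a in removed else v) for a, v in doc.items()})
    return copies
```

Each copy would then be submitted as a query, and its precision at five recorded against its number of available attributes to obtain a curve such as the one in Fig. 8.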
Finally, for comparison purposes, the proposed system was applied to abnormal (‘benign’ or ‘cancer’) versus ‘normal’ document classification on DDSM:
• for each document $x_i$ in the database (1364 abnormal and 695 normal), an abnormality index $a(x_i)$ was defined; $a(x_i)$ is the percentage of abnormal documents among the topmost $K$ results (if $x_i$ belongs to $V_j$, then the results are selected within the database minus $V_j$),
• the Receiver Operating Characteristic (ROC) curve [38] of $a(\cdot)$ was plotted and the area under this curve, denoted $A_z$, was computed.
Areas under the ROC curve of $A_z = 0.921$, $A_z = 0.917$ and $A_z = 0.914$ were obtained for $K = 5$, $K = 10$ and $K = 20$, respectively. In comparison, for the task of classifying regions of interest of 512 × 512 pixels (489 malignant masses, 412 benign masses and 919 normal breasts), Mazurowski et al. obtained an area under the ROC curve of $A_z = 0.907 \pm 0.024$ using mutual information [38].
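The two steps of this classification experiment can be sketched in Python as below. This is an illustration under assumptions: the retrieved documents are given by their labels in ranking order, $a(x_i)$ is returned as a fraction rather than a percentage (which does not change the ROC curve), and $A_z$ is computed with the Mann-Whitney statistic, which equals the area under the empirical ROC curve; the ROC estimation method used in the experiments is not specified here.

```python
from typing import Sequence

ABNORMAL = ("benign", "cancer")


def abnormality_index(ranked_labels: Sequence[str], k: int) -> float:
    """a(x): fraction of abnormal documents among the top-k retrieved documents."""
    return sum(label in ABNORMAL for label in ranked_labels[:k]) / k


def area_under_roc(abnormal_scores: Sequence[float], normal_scores: Sequence[float]) -> float:
    """A_z as the probability that an abnormal document outscores a normal one (ties count 1/2)."""
    wins = 0.0
    for s_a in abnormal_scores:
        for s_n in normal_scores:
            wins += 1.0 if s_a > s_n else 0.5 if s_a == s_n else 0.0
    return wins / (len(abnormal_scores) * len(normal_scores))
```

The pairwise loop is quadratic, which remains cheap for a few thousand documents; a rank-based formula would be preferable for much larger test sets.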
V. DISCUSSION AND CONCLUSIONS
In this paper, we introduced two methods to include image series and their signatures, with contextual information, in a CBR system. The first method uses a Bayesian network to model the relationships between attributes. It allows us to manage missing information and to fuse several sources of information. In particular, a method to include image signatures in a Bayesian network was proposed. In this first method, we modeled the relevance of a reference document in the database for the query, according to a given attribute $A_i$, through the design of conditional probabilities $P_q(A_i = a_{ij} \mid Q)$. The second method, based on the Dezert-Smarandache theory, extends the first one by improving the fusion operator: we modeled our confidence in each estimation of the relevance through the design of belief mass functions. These methods have been successfully applied to two medical image databases. These methods are generic: they can be extended to databases containing sound, video, etc. The wavelet transform based signature, presented in section II-C, can be applied to any $n$-dimensional digital signal, using its $n$-dimensional wavelet transform ($n = 1$ for sound, $n = 3$ for video, etc.) [39]. Extending the proposed image signature to $n$-dimensional wavelet transforms is straightforward: characterizing the distribution of wavelet coefficients simply implies iterating over rows, columns, depth (or time), etc., instead of rows and columns for a 2D image (see appendix A). The proposed methods are also convenient in the sense that they do not need to be retrained each time a new document is included in the database.
The precision at five obtained for DRD (0.809±0.158) is particularly interesting, considering the few examples available, the large number of missing values and the large number of classes taken into account. On this database, the methods outperform usual methods by almost a factor of 2 in terms of precision at 5 (0.809, compared to 0.430 for early fusion and 0.394 for late fusion). The improvement is also noticeable on DDSM (0.821±0.177 compared to 0.714±0.193). The proposed retrieval methods are fast once image signatures are computed: most of the computation time is spent during the image processing steps. The code may be parallelized to decrease computation times further. Moreover, sufficient precision can be reached before all the attributes are provided by the user. As a consequence, the user can stop formulating the query when the returned results are satisfactory. On DRD, for instance, a precision at five of 0.6 can be reached by providing less than 30% of the attributes (see Fig. 8): with this precision, the majority of the retrieved documents (3 out of 5) belong to the right class. Table III shows that the difference, in terms of retrieval performance, between single image retrieval [25] and heterogeneous document retrieval comes from the combination of image features extracted from several images more than from the inclusion of contextual attributes.
This study has three limitations. First, only one type of image feature [25] has been included in the retrieval system (two for DRD [25], [37]). In particular, the inclusion of application-specific image features will have to be validated on several medical image databases. Second, the reference standards are affected by inter- and intra-observer variability; further validation and observer studies are needed. Finally, as shown by Cheng et al. [24], the size of the dataset has an influence on the correctness of the generated Bayesian networks. DRD, in particular, is small compared to the datasets used to validate Bayesian network generation methods [24]. The limited size of the dataset may also impact the performance on the test set, especially if $K$ is larger than (or of the order of) the number of cases belonging to some of the classes within the dataset.
As a conclusion, using appropriate information fusion operators, heterogeneous case retrieval in medical digital databases is a powerful tool to build reliable CADx systems. In future work, we will try to improve retrieval performance further through the use of relevance feedback [4] and through the inclusion of localized image features. A web interface that will permit relevance feedback is being developed to allow assessment of clinical usefulness by physicians.
APPENDIX A
FAST PARAMETER ESTIMATION FOR GENERALIZED
GAUSSIAN DISTRIBUTIONS
In Do's parameter estimation method [27], the parameters of the wavelet coefficient distribution in an $M \times N$ subband $X = \{x_{i,j}, i = 1..M, j = 1..N\}$, namely $\hat{\alpha}$ and $\hat{\beta}$, are obtained by iterating over all coefficients in this subband. For instance, $\hat{\alpha}$ is obtained as follows:

$$\hat{\alpha} = \left( \frac{\beta}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} |x_{i,j}|^{\beta} \right)^{\frac{1}{\beta}} \tag{17}$$

where $\beta$ is an approximation of $\hat{\beta}$, which is iteratively refined using the Newton-Raphson procedure [27]. The computation of $\beta$ relies, for each wavelet coefficient, on multiple evaluations of the logarithm and the digamma function, which implies slow computations.
We propose to significantly reduce the number of such evaluations by applying Do's estimation method, not directly to $X$, but to a histogram of $X$:
1) the standard deviation $\sigma$ of $X$ is computed,
2) a $B$-bin histogram of $X$, restricted to the $[-n\sigma; n\sigma]$ interval, is computed (we used $B = 64$ and $n = 5$; these numbers were chosen to reduce the approximation error on an independent dataset¹),
3) let $h_k$ be the number of coefficients assigned to the $k$-th bin, and $v_k$ the centroid of that bin:

$$v_k = -n\sigma + \left(k - \frac{1}{2}\right) \frac{2n\sigma}{B} \tag{18}$$

Equation 17 becomes

$$\hat{\alpha} = \left( \frac{\beta}{MN} \sum_{k=1}^{B} h_k |v_k|^{\beta} \right)^{\frac{1}{\beta}} \tag{19}$$

All other equations in [27] are modified similarly.
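To make equations 17 to 19 concrete, the following NumPy sketch computes $\hat{\alpha}$ for a fixed $\beta$ both directly (equation 17) and from the $B$-bin histogram (equations 18 and 19). It only covers the $\hat{\alpha}$ step; the Newton-Raphson refinement of $\beta$ from [27] is not reproduced, and the synthetic Laplacian subband (a generalized Gaussian with $\beta = 1$ and $\alpha = 2$) is an assumption used for checking.

```python
import numpy as np


def alpha_direct(x: np.ndarray, beta: float) -> float:
    """Equation 17: alpha = (beta / (M N) * sum |x_ij|^beta)^(1 / beta)."""
    x = np.asarray(x, dtype=float)
    return float((beta / x.size * np.sum(np.abs(x) ** beta)) ** (1.0 / beta))


def alpha_histogram(x: np.ndarray, beta: float, n_bins: int = 64,
                    n_sigma: float = 5.0) -> float:
    """Equation 19: same estimate from a B-bin histogram restricted to [-n sigma, n sigma].

    Coefficients outside that interval are dropped, while the normalization keeps M N.
    """
    x = np.asarray(x, dtype=float)
    sigma = x.std()
    h, edges = np.histogram(x, bins=n_bins, range=(-n_sigma * sigma, n_sigma * sigma))
    v = (edges[:-1] + edges[1:]) / 2  # bin centres v_k, equation 18
    return float((beta / x.size * np.sum(h * np.abs(v) ** beta)) ** (1.0 / beta))


rng = np.random.default_rng(0)
subband = rng.laplace(scale=2.0, size=(256, 256))  # synthetic subband: alpha = 2, beta = 1
print(alpha_direct(subband, 1.0), alpha_histogram(subband, 1.0))
```

Each evaluation of equation 19 needs only $B = 64$ powers $|v_k|^\beta$ instead of $MN$; for the Laplacian example, both estimates should be close to the true scale of 2.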
APPENDIX B
$P_q(A_i = a_{ij} \mid Q)$: COMPUTATION DETAILS
For each attribute $A_i$, $i = 1..M$, we want $P_q(Q \mid A_i = a_{ij})$ to be proportional to $r_{ij} = \sum_{k=1}^{M} \alpha_{ik}(x_q) S_{ijk}$ (see section II-E). To that end, we first determine $p_i = P_q(Q \mid A_i = a_{i,\arg\max_j(r_{ij})})$. Let $\tilde{r}_{ij} = \frac{r_{ij}}{\max_k(r_{ik})}$. The following constraints have to be satisfied:

$$P_q(Q \mid A_i = a_{ij}) + P_q(\bar{Q} \mid A_i = a_{ij}) = 1 \tag{20}$$

$$\sum_j P_q(Q \mid A_i = a_{ij}) P(A_i = a_{ij}) = P_q(Q) \tag{21}$$

$$\sum_j P_q(\bar{Q} \mid A_i = a_{ij}) P(A_i = a_{ij}) = P_q(\bar{Q}) \tag{22}$$

where $P_q(Q)$, $P_q(\bar{Q})$ and $P(A_i = a_{ij})$ are prior probabilities. Injecting $p_i$ and $\tilde{r}_{ij}$ in equation 21, we obtain equation 23.

$$\sum_j p_i \tilde{r}_{ij} P(A_i = a_{ij}) = P_q(Q), \quad i = 1..M \tag{23}$$

$p_i$ is then extracted from equation 23:

$$p_i = \frac{P_q(Q)}{\sum_j \tilde{r}_{ij} P(A_i = a_{ij})}, \quad i = 1..M \tag{24}$$

Once $p_i$ is computed, $P_q(\bar{Q} \mid A_i = a_{i,\arg\max_j(r_{ij})}) = 1 - p_i$ can be computed (see equation 20). Other conditional probabilities are deduced from the definition of $\tilde{r}_{ij}$: $P_q(Q \mid A_i = a_{ij}) = p_i \tilde{r}_{ij}$.
If the most desirable state for attribute $A_i$ ($\arg\max_j(r_{ij})$) is a rare state, it is possible that $p_i > 1$. Indeed, in constraint 21, $P_q(Q \mid A_i = a_{i,\arg\max_k(r_{ik})})$ is multiplied by a small value ($P(A_i = a_{i,\arg\max_k(r_{ik})})$), so the result of this product is small, and the other terms of the sum (with a value $P_q(Q \mid A_i = a_{ij})$ smaller than $P_q(Q \mid A_i = a_{i,\arg\max_j(r_{ij})})$ by definition) might be too small for the sum to reach $P_q(Q)$. In that case, the conditional probabilities should be changed as follows:
• we set $p_i = 1$,
• each $\tilde{r}_{ij}$, $j \neq \arg\max_k(r_{ik})$, is multiplied by a constant $\gamma > 0$.
¹ http://vismod.media.mit.edu/vismod/imagery/VisionTexture/vistex.html
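With this setup, constraint 21 becomes equation 25.

$$P(A_i = a_{i,\arg\max_k(r_{ik})}) + \sum_{j \neq \arg\max_k(r_{ik})} \gamma \tilde{r}_{ij} P(A_i = a_{ij}) = P_q(Q) \tag{25}$$

Finally, $\gamma$ is extracted from equation 25, which gives equation 26, and the conditional probabilities are given by equation 27.

$$\gamma = \frac{P_q(Q) - P(A_i = a_{i,\arg\max_j(r_{ij})})}{\sum_{j \neq \arg\max_k(r_{ik})} \tilde{r}_{ij} P(A_i = a_{ij})} \tag{26}$$

$$P_q(Q \mid A_i = a_{ij}) = \gamma \tilde{r}_{ij}, \quad j \neq \arg\max_j(r_{ij}) \tag{27}$$

The inequality $P_q(Q) \geq P(A_i = a_{i,\arg\max_k(r_{ik})})$ always holds; as a consequence, $\gamma > 0$. Indeed, $P_q(Q) \geq P_q(Q \mid A_i = a_{i,\arg\max_k(r_{ik})}) P(A_i = a_{i,\arg\max_k(r_{ik})})$ (according to constraint 21), i.e. $P_q(Q) \geq p_i P(A_i = a_{i,\arg\max_k(r_{ik})})$; given that $p_i = 1$, the following inequality holds: $P_q(Q) \geq P(A_i = a_{i,\arg\max_k(r_{ik})})$. The inequality is in fact strict in this case: it only arises when equation 24 yields $p_i > 1$, i.e. $P_q(Q) > \sum_j \tilde{r}_{ij} P(A_i = a_{ij}) \geq P(A_i = a_{i,\arg\max_k(r_{ik})})$, since $\tilde{r}_{i,\arg\max_k(r_{ik})} = 1$ and the $\tilde{r}_{ij}$ are non-negative (assuming non-negative $r_{ij}$).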
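The computation of appendix B for one attribute can be sketched in Python as follows. The function name and argument layout are hypothetical; the sketch assumes non-negative relevance scores $r_{ij}$ with a positive maximum, applies equation 24, and falls back on equations 26 and 27 when $p_i > 1$. It does not check that the rescaled values $\gamma \tilde{r}_{ij}$ stay below 1, a case this appendix does not discuss.

```python
from typing import List, Sequence


def conditional_probabilities(r: Sequence[float], prior_states: Sequence[float],
                              prior_q: float) -> List[float]:
    """Return P_q(Q | A_i = a_ij) for every state j of one attribute A_i.

    r[j]: relevance score r_ij; prior_states[j]: P(A_i = a_ij); prior_q: P_q(Q).
    """
    j_max = max(range(len(r)), key=lambda j: r[j])
    r_tilde = [rj / r[j_max] for rj in r]  # r~_ij = r_ij / max_k r_ik
    p = prior_q / sum(rt * pa for rt, pa in zip(r_tilde, prior_states))  # equation 24
    if p <= 1.0:
        return [p * rt for rt in r_tilde]
    # Rare most desirable state: set p_i = 1 and rescale the other states by gamma.
    rest = sum(rt * pa for j, (rt, pa) in enumerate(zip(r_tilde, prior_states)) if j != j_max)
    gamma = (prior_q - prior_states[j_max]) / rest  # equation 26
    return [1.0 if j == j_max else gamma * rt for j, rt in enumerate(r_tilde)]  # equation 27
```

In both branches the returned values satisfy constraint 21. For instance, with scores $(2, 1, 0.5)$, state priors $(0.05, 0.5, 0.45)$ and $P_q(Q) = 0.5$, equation 24 gives $p_i \approx 1.21$, so the fallback is used.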
REFERENCES
[1] A. Aamodt, “Case-based reasoning: Foundational issues, methodological variations, and system approaches,” AI Communications, vol. 7, no. 1, pp. 39–59, 1994.
[2] I. Bichindaritz and C. Marling, “Case-based reasoning in the health sciences: What’s next?” Artif Intell Med, vol. 36, no. 2, pp. 127–135, 2006.
[3] J.-M. Cauvin, C. le Guillou, B. Solaiman, M. Robaszkiewicz, P. le Beux, and C. Roux, “Computer-assisted diagnosis system in digestive endoscopy,” IEEE Trans Inf Technol Biomed, vol. 7, no. 4, pp. 256–262, 2003.
[4] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, “Content-based image retrieval at the end of the early years,” IEEE Trans Pattern Anal Mach Intell, vol. 22, no. 12, pp. 1349–1380, 2000.
[5] H. Müller, N. Michoux, D. Bandon, and A. Geissbuhler, “A review of content-based image retrieval systems in medical applications - clinical benefits and future directions,” Int J Med Inform, vol. 73, no. 1, pp. 1–23, 2004.
[6] D. R. Wilson and T. R. Martinez, “Improved heterogeneous distance functions,” J Artif Intell Res, vol. 6, pp. 1–34, 1997.
[7] R. K. Srihari, A. Rao, B. Han, S. Munirathnam, and X. Wu, “A model for multimodal information retrieval,” in IEEE International Conference on Multimedia and Expo, New York City, NY, USA, 2000, pp. 701–704.
[8] R. Nuray and F. Can, “Automatic ranking of information retrieval systems using data fusion,” Inf Process Manag, vol. 42, no. 3, pp. 595–614, 2006.
[9] S. Wu and S. McClean, “Performance prediction of data fusion for information retrieval,” Inf Process Manag, vol. 42, no. 4, pp. 899–915, 2006.
[10] H. Shao, W.-C. Cui, and H. Zhao, “Medical image retrieval based on visual contents and text information,” in IEEE International Conference on Systems, Man and Cybernetics, The Hague, The Netherlands, 2004, pp. 1098–1103.
[11] Z. Zhang, R. Zhang, and J. Ohya, “Exploiting the cognitive synergy between different media modalities in multimodal information retrieval,” in IEEE International Conference on Multimedia and Expo, Taipei, Taiwan, 2004, pp. 2227–2230.
[12] P. Buitelaar, P. Cimiano, A. Frank, M. Hartung, and S. Racioppa, “Ontology-based information extraction and integration from heterogeneous data sources,” Int J Hum Comput Stud, vol. 66, no. 11, pp. 759–788, 2008.
[13] S. Antani, L. R. Long, and G. R. Thoma, “A biomedical information system for combined content-based retrieval of spine x-ray images and associated text information,” in Proceedings of the Indian Conference on Computer Vision, Graphics, and Image Processing, Ahmadabad, India, 2002, pp. 242–247.
[14] C. le Bozec, E. Zapletal, M. C. Jaulent, D. Heudes, and P. Degoulet, “Towards content-based image retrieval in a HIS-integrated PACS,” in Proceedings of the Annual Symposium of the American Society for Medical Informatics, Los Angeles, CA, USA, 2000, pp. 477–481.
[15] E. Chaum, T. P. Karnowski, V. P. Govindasamy, M. Abdelrahman, and K. W. Tobin, “Automated diagnosis of retinopathy by content-based image retrieval,” Retina, vol. 28, no. 10, pp. 1463–1477, 2008.
[16] E. Bruno, N. Moenne-Loccoz, and S. Marchand-Maillet, “Design of multimodal dissimilarity spaces for retrieval of video documents,” IEEE Trans Pattern Anal Mach Intell, vol. 30, no. 9, pp. 1520–1533, 2008.
[17] F. Smarandache and J. Dezert, Advances and Applications of DSmT for Information Fusion I. American Research Press Rehoboth, 2004, http://fs.gallup.unm.edu/DSmTbook1.pdf.
[18] H. Turtle, “Inference networks for document retrieval,” Ph.D. dissertation, University of Massachusetts, MA, USA, 1991.
[19] M. Indrawan, “A framework for information retrieval based on Bayesian networks,” Ph.D. dissertation, Monash University, Australia, 1998.
[20] C. Wilson, B. Srinivasan, and M. Indrawan, “A general inference network based architecture for multimedia information retrieval,” in IEEE International Conference on Multimedia and Expo, New York City, NY, USA, 2000, pp. 347–350.
[21] H. Ghosh, P. Poornachander, A. Mallik, and S. Chaudhury, “Learning ontology for personalized video retrieval,” in International Multimedia Conference, Augsburg, Germany, 2007, pp. 39–46.
[22] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
[23] S. L. Lauritzen and D. J. Spiegelhalter, “Local computations with probabilities on graphical structures and their application to expert systems,” J R Stat Soc, vol. 50, no. 2, pp. 157–224, 1988.
[24] J. Cheng, R. Greiner, J. Kelly, D. Bell, and W. Liu, “Learning Bayesian networks from data: An information-theory based approach,” Artif Intell, vol. 137, no. 1, pp. 43–90, 2002.
[25] G. Quellec, M. Lamard, G. Cazuguel, B. Cochener, and C. Roux, “Wavelet optimization for content-based image retrieval in medical databases,” Med Image Anal, vol. 14, no. 2, pp. 227–241, 2010.
[26] G. van de Wouwer, P. Scheunders, and D. van Dyck, “Statistical texture characterization from discrete wavelet representations,” IEEE Trans Image Process, vol. 8, no. 4, pp. 592–598, 1999.
[27] M. N. Do and M. Vetterli, “Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance,” IEEE Trans Image Process, vol. 11, no. 2, pp. 146–158, 2002.
[28] J. C. Bezdek, “Fuzzy mathematics in pattern classification,” Ph.D. dissertation, Cornell University, NY, USA, 1973.
[29] A. Strehl, “Relationship-based clustering and cluster ensembles for high-dimensional data mining,” Ph.D. dissertation, The University of Texas, TX, USA, 2002.
[30] G. Shafer, A Mathematical Theory of Evidence. Princeton University Press, 1976.
[31] F. Smarandache and J. Dezert, Advances and Applications of DSmT for Information Fusion II. American Research Press Rehoboth, 2006, http://fs.gallup.unm.edu/DSmTbook2.pdf.
[32] P. Smets, “Constructing the pignistic probability function in a context of uncertainty,” in Proceedings of the Fifth Annual Conference on Uncertainty in Artificial Intelligence, North-Holland, 1990, pp. 29–40.
[33] J. Dezert and F. Smarandache, “An introduction to DSmT,” French Aerospace Research Lab. & University of New Mexico, Tech. Rep., 2009, http://fs.gallup.unm.edu/IntroductionToDSmT.pdf.
[34] G. Quellec, “Indexation et fusion multimodale pour la recherche d’information par le contenu. Application aux bases de données d’images médicales,” Ph.D. dissertation, TELECOM Bretagne, France, 2008.
[35] C. Wilkinson, F. Ferris, R. Klein, et al., “Proposed international clinical diabetic retinopathy and diabetic macular edema disease severity scales,” Ophthalmology, vol. 110, no. 9, pp. 1677–1682, 2003.
[36] M. Heath, K. Bowyer, D. Kopans, R. Moore, and W. P. Kegelmeyer, “The digital database for screening mammography,” in Proceedings of the Fifth International Workshop on Digital Mammography, Toronto, Canada, 2000, pp. 212–218.
[37] G. Quellec, M. Lamard, P. M. Josselin, G. Cazuguel, B. Cochener, and C. Roux, “Optimal wavelet transform for the detection of microaneurysms in retina photographs,” IEEE Trans Med Imaging, vol. 27, no. 9, pp. 1230–1241, 2008.
[38] M. A. Mazurowski, P. A. Habas, J. M. Zurada, and G. D. Tourassi, “Decision optimization of case-based computer-aided decision systems using genetic algorithms with application to mammography,” Phys Med Biol, vol. 53, no. 4, pp. 895–908, February 2008.
[39] G. Quellec, M. Lamard, G. Cazuguel, B. Cochener, and C. Roux, “Adaptive nonseparable wavelet transform via lifting and its application to content-based image retrieval,” IEEE Trans Image Process, vol. 19, no. 1, pp. 25–35, January 2010.