
A Review of Data Fusion Techniques



The integration of data and knowledge from several sources is known as data fusion. This paper summarizes the state of the data fusion field and describes the most relevant studies. We first enumerate and explain different classification schemes for data fusion. Then, the most common algorithms are reviewed. These methods and algorithms are presented using three different categories: (i) data association, (ii) state estimation, and (iii) decision fusion.
Hindawi Publishing Corporation
e Scientic World Journal
Volume , Article ID ,  pages.//
Review Article
Federico Castanedo
Deusto Institute of Technology, DeustoTech, University of Deusto, Avenida de las Universidades 24, 48007 Bilbao, Spain
Correspondence should be addressed to Federico Castanedo;
Received  August ; Accepted  September 
Academic Editors: Y. Takama and D. Ursino
Copyright © Federico Castanedo. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
In general, all tasks that demand any type of parameter estimation from multiple sources can benefit from the use of data/information fusion methods. The terms information fusion and data fusion are typically employed as synonyms; but in some scenarios, the term data fusion is used for raw data (obtained directly from the sensors) and the term information fusion is employed to define already processed data. In this sense, the term information fusion implies a higher semantic level than data fusion. Other terms associated with data fusion that typically appear in the literature include decision fusion, data combination, data aggregation, multisensor data fusion, and sensor fusion.
Researchers in this field agree that the most accepted definition of data fusion was provided by the Joint Directors of Laboratories (JDL) workshop []: “A multi-level process dealing with the association, correlation, combination of data and information from single and multiple sources to achieve refined position, identify estimates and complete and timely assessments of situations, threats and their significance.”
Hall and Llinas [] provided the following well-known definition of data fusion: “data fusion techniques combine data from multiple sensors and related information from associated databases to achieve improved accuracy and more specific inferences than could be achieved by the use of a single sensor alone.”
Briey, we can dene data fusion as a combination of
multiple sources to obtain improved information; in this
context, improved information means less expensive, higher
quality, or more relevant information.
Data fusion techniques have been extensively employed on multisensor environments with the aim of fusing and aggregating data from different sensors; however, these techniques can also be applied to other domains, such as text processing. The goal of using data fusion in multisensor environments is to obtain a lower detection error probability and a higher reliability by using data from multiple distributed sources.
The available data fusion techniques can be classified into three nonexclusive categories: (i) data association, (ii) state estimation, and (iii) decision fusion. Because of the large number of published papers on data fusion, this paper does not aim to provide an exhaustive review of all of the studies; instead, the objective is to highlight the main steps that are involved in the data fusion framework and to review the most common techniques for each step.
The remainder of this paper continues as follows. The next section provides various classification categories for data fusion techniques. Then, Section 3 describes the most common methods for data association tasks. The section after that reviews techniques under the state estimation category. Next, the most common techniques for decision fusion are enumerated. Finally, the conclusions obtained from reviewing the different methods are highlighted in the last section.
2. Classification of Data Fusion Techniques
Data fusion is a multidisciplinary area that involves several fields, and it is difficult to establish a clear and strict classification. The employed methods and techniques can be divided according to the following criteria:
(1) according to the relations between the input data sources, as proposed by Durrant-Whyte []. These relations can be defined as (a) complementary, (b) redundant, or (c) cooperative data;
(2) according to the input/output data types and their nature, as proposed by Dasarathy [];
(3) following an abstraction level of the employed data: (a) raw measurement, (b) signals, and (c) characteristics or decisions;
(4) based on the different data fusion levels defined by the JDL;
(5) depending on the architecture type: (a) centralized, (b) decentralized, or (c) distributed.
2.1. Classication Based on the Relations between the Data
Sources. Based on the relations of the sources (see Figure ),
Durrant-Whyte [] proposed the following classication
() complementary: when the information provided by
the input sources represents dierent parts of the
global information. For example, in the case of visual
sensor networks, the information on the same target
provided by two cameras with dierent elds of view
is considered complementary;
() redundant: when two or more input sources provide
information about the same target and could thus be
fused to increment the condence. For example, the
data coming from overlapped areas in visual sensor
networks are considered redundant;
() cooperative: when the provided information is com-
bined into new information that is typically more
complex than the original information. For example,
multi-modal (audio and video) data fusion is consid-
ered cooperative.
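The redundant case can be illustrated with a minimal sketch. It assumes two independent, unbiased readings of the same scalar quantity with known noise variances and combines them by inverse-variance weighting, a standard estimator that the review does not prescribe; the function name and the numbers are hypothetical.

```python
def fuse_redundant(z1, var1, z2, var2):
    """Inverse-variance weighted fusion of two independent readings."""
    w1 = 1.0 / var1  # weight grows as the reading becomes more reliable
    w2 = 1.0 / var2
    fused = (w1 * z1 + w2 * z2) / (w1 + w2)
    fused_var = 1.0 / (w1 + w2)  # never larger than the smaller input variance
    return fused, fused_var


# Two sensors observe the same target with the same noise level (made-up values).
estimate, variance = fuse_redundant(10.0, 4.0, 12.0, 4.0)
```

Under these assumptions the fused variance is smaller than either input variance, which is the sense in which redundant sources increase confidence.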
2.2. Dasarathy’s Classification. One of the most well-known data fusion classification systems was provided by Dasarathy [] and is composed of the following five categories (see Figure 2):
(1) data in-data out (DAI-DAO): this type is the most basic data fusion method that is considered in the classification. This type of data fusion process inputs and outputs raw data; the results are typically more reliable or accurate. Data fusion at this level is conducted immediately after the data are gathered from the sensors. The algorithms employed at this level are based on signal and image processing algorithms;
(2) data in-feature out (DAI-FEO): at this level, the data fusion process employs raw data from the sources to extract features or characteristics that describe an entity in the environment;
(3) feature in-feature out (FEI-FEO): at this level, both the input and output of the data fusion process are features. Thus, the data fusion process addresses a set of features to improve, refine, or obtain new features. This process is also known as feature fusion, symbolic fusion, information fusion, or intermediate-level fusion;
(4) feature in-decision out (FEI-DEO): this level obtains a set of features as input and provides a set of decisions as output. Most of the classification systems that perform a decision based on a sensor’s inputs fall into this category of classification;
(5) decision in-decision out (DEI-DEO): this type of classification is also known as decision fusion. It fuses input decisions to obtain better or new decisions.
The main contribution of Dasarathy’s classification is the specification of the abstraction level either as an input or an output, providing a framework to classify different methods or techniques.
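Because each category is identified by its input and output abstraction levels, the taxonomy can be written as a small lookup table. The sketch below is only an illustration; the `Level` enum, the `DASARATHY` table, and the `classify` helper are hypothetical names, not part of Dasarathy's work.

```python
from enum import Enum


class Level(Enum):
    DATA = "data"
    FEATURE = "feature"
    DECISION = "decision"


# (input level, output level) -> category, as listed in the five categories.
DASARATHY = {
    (Level.DATA, Level.DATA): "DAI-DAO",
    (Level.DATA, Level.FEATURE): "DAI-FEO",
    (Level.FEATURE, Level.FEATURE): "FEI-FEO",
    (Level.FEATURE, Level.DECISION): "FEI-DEO",
    (Level.DECISION, Level.DECISION): "DEI-DEO",
}


def classify(input_level, output_level):
    """Return the category name, or None if the pair is not one of the five."""
    return DASARATHY.get((input_level, output_level))


category = classify(Level.FEATURE, Level.DECISION)
```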
2.3. Classication Based on the Abstraction Levels. Luo et al.
[] provided the following four abstraction levels:
() signal level: directly addresses the signals that are
acquired from the sensors;
() pixel level: operates at the image level and could be
used to improve image processing tasks;
() characteristic: employs features that are extracted
from the images or signals (i.e., shape or velocity),
() symbol: at this level, information is represented as
symbols; this level is also known as the decision level.
Information fusion typically addresses three levels of
abstraction: () measurements, () characteristics, and ()
decisions. Other possible classications of data fusion based
on the abstraction levels are as follows:
(1) low level fusion: the raw data are directly provided as an input to the data fusion process, which provides more accurate data (a higher signal-to-noise ratio) than the individual sources;
(2) medium level fusion: characteristics or features (shape, texture, and position) are fused to obtain features that could be employed for other tasks. This level is also known as the feature or characteristic level;
(3) high level fusion: this level, which is also known as decision fusion, takes symbolic representations as sources and combines them to obtain a more accurate decision. Bayesian methods are typically employed at this level;
(4) multiple level fusion: this level addresses data provided from different levels of abstraction (e.g., when a measurement is combined with a feature to obtain a decision).
Figure 1: Whyte’s classification based on the relations between the data sources.
Figure 2: Dasarathy’s classification (data in-data out, data in-feature out, feature in-feature out, feature in-decision out, and decision in-decision out).
2.4. JDL Data Fusion Classification. This classification is the most popular conceptual model in the data fusion community. It was originally proposed by JDL and the American Department of Defense (DoD) []. These organizations classified the data fusion process into five processing levels, an associated database, and an information bus that connects the five components (see Figure 3). The five levels could be grouped into two groups, low-level fusion and high-level fusion, which comprise the following components:
(i) sources: the sources are in charge of providing the input data. Different types of sources can be employed, such as sensors, a priori information (references or geographic data), databases, and human inputs;
(ii) human-computer interaction (HCI): HCI is an interface that allows inputs to the system from the operators and produces outputs to the operators. HCI includes queries, commands, and information on the obtained results and alarms;
(iii) database management system: the database management system stores the provided information and the fused results. This system is a critical component because of the large amount of highly diverse information that is stored.
In contrast, the five levels of data processing are defined as follows:
(1) level 0 (source preprocessing): source preprocessing is the lowest level of the data fusion process, and it includes fusion at the signal and pixel levels. In the case of text sources, this level also includes the information extraction process. This level reduces the amount of data and maintains useful information for the high-level processes;
(2) level 1 (object refinement): object refinement employs the processed data from the previous level. Common procedures of this level include spatio-temporal alignment, association, correlation, clustering or grouping techniques, state estimation, the removal of false positives, identity fusion, and the combining of features that were extracted from images. The output of this stage is the object discrimination (classification and identification) and object tracking (state of the object and orientation). This stage transforms the input information into consistent data structures;
(3) level 2 (situation assessment): this level focuses on a higher level of inference than level 1. Situation assessment aims to identify the likely situations given the observed events and obtained data. It establishes relationships between the objects. Relations (e.g., proximity, communication) are valued to determine the significance of the entities or objects in a specific environment. The aim of this level includes performing high-level inferences and identifying significant activities and events (patterns in general). The output is a set of high-level inferences;
(4) level 3 (impact assessment): this level evaluates the impact of the activities detected in level 2 to obtain a proper perspective. The current situation is evaluated, and a future projection is performed to identify possible risks, vulnerabilities, and operational opportunities. This level includes (1) an evaluation of the risk or threat and (2) a prediction of the logical outcome;
(5) level 4 (process refinement): this level improves the process from level 0 to level 3 and provides resource and sensor management. The aim is to achieve efficient resource management while accounting for task priorities, scheduling, and the control of available resources.
Figure 3: JDL data fusion model. Diagram labels: fusion domain, Level 0, Level 1, Level 2, Level 3, Level 4, database, and information bus.
High-level fusion typically starts at level 2 because the type, localization, movement, and quantity of the objects are known at that level. One of the limitations of the JDL model is that it does not specify how the uncertainty about previous or subsequent results could be employed to enhance the fusion process (feedback loop). Llinas et al. [] propose several refinements and extensions to the JDL model. Blasch and Plano [] proposed to add a new level (user refinement) to support a human user in the data fusion loop. The JDL model also established a common terminology for the data fusion domain. However, because its roots originate in the military domain, the employed terms are oriented to the risks that commonly occur in these scenarios. The Dasarathy model differs from the JDL model with regard to the adopted terminology and employed approach. The former is oriented toward the differences among the input and output results, independent of the employed fusion method. In summary, the Dasarathy model provides a method for understanding the relations between the fusion tasks and employed data, whereas the JDL model presents an appropriate fusion perspective to design data fusion systems.
2.5. Classication Based on the Type of Architecture. One of
the main questions that arise when designing a data fusion
system is where the data fusion process will be performed.
Based on this criterion, the following types of architectures
could be identied:
() centralized architecture: in a centralized architecture,
the fusion node resides in the central processor that
receives the information from all of the input sources.
erefore, all of the fusion processes are executed
in a central processor that uses the provided raw
measurements from the sources. In this schema, the
sources obtain only the observationas measurements
data fusion process is performed. If we assume that
data alignment and data association are performed
correctly and that the required time to transfer the
data is not signicant, then the centralized scheme is
theoretically optimal. However, the previous assump-
tions typically do not hold for real systems. Moreover,
the large amount of bandwidth that is required to send
raw data through the network is another disadvantage
for the centralized approach. is issue becomes a
bottleneck when this type of architecture is employed
the time delays when transferring the information
between the dierent sources are variable and aect
e Scientic World Journal
the results in the centralized scheme to a greater
degree than in other schemes;
(2) decentralized architecture: a decentralized architecture is composed of a network of nodes in which each node has its own processing capabilities and there is no single point of data fusion. Therefore, each node fuses its local information with the information that is received from its peers. Data fusion is performed autonomously, with each node accounting for its local information and the information received from its peers. Decentralized data fusion algorithms typically communicate information using the Fisher and Shannon measurements instead of the object’s state []. The main disadvantage of this architecture is the communication cost, which is $O(n^2)$ at each communication step, where $n$ is the number of nodes; this bound corresponds to the extreme case in which each node communicates with all of its peers. Thus, this type of architecture could suffer from scalability problems when the number of nodes is increased;
(3) distributed architecture: in a distributed architecture, measurements from each source node are processed independently before the information is sent to the fusion node; the fusion node accounts for the information that is received from the other nodes. In other words, the data association and state estimation are performed in the source node before the information is communicated to the fusion node. Therefore, each node provides an estimation of the object state based on only its local view, and this information is the input to the fusion process, which provides a fused global view. This type of architecture provides different options and variations that range from only one fusion node to several intermediate fusion nodes;
(4) hierarchical architecture: other architectures comprise a combination of decentralized and distributed nodes, generating hierarchical schemes in which the data fusion process is performed at different levels in the hierarchy.
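The quadratic communication cost of the fully connected decentralized case can be checked with a short count. This is a sketch under the assumption that every node sends one message to every other node at each step, whereas in a distributed layout each source node sends one message to a single fusion node; both function names are hypothetical.

```python
def decentralized_messages(n_nodes):
    """Directed messages per step when every node sends to every peer."""
    return n_nodes * (n_nodes - 1)


def distributed_messages(n_nodes):
    """Messages per step when each source node reports to one fusion node."""
    return n_nodes


# (number of nodes, decentralized count, distributed count)
counts = [(n, decentralized_messages(n), distributed_messages(n)) for n in (4, 16, 64)]
```

Going from 4 to 64 nodes multiplies the fully connected count by several hundred, while the single-fusion-node count grows only linearly.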
In principle, a decentralized data fusion system is more difficult to implement because of the computation and communication requirements. However, in practice, there is no single best architecture, and the selection of the most appropriate architecture should be made depending on the requirements, demand, existing networks, data availability, node processing capabilities, and organization of the data fusion system.
The reader might think that the decentralized and distributed architectures are similar; however, they have meaningful differences (see Figure 4). First, in a distributed architecture, a preprocessing of the obtained measurements is performed, which provides a vector of features as a result (the features are fused thereafter). In contrast, in the decentralized architecture, the complete data fusion process is conducted in each node, and each of the nodes provides a globally fused result. Second, the decentralized fusion algorithms typically communicate information, employing the Fisher and Shannon measurements. In contrast, distributed algorithms typically share a common notion of state (position, velocity, and identity) with their associated probabilities, which are used to perform the fusion process []. Third, because the decentralized data fusion algorithms exchange information instead of states and probabilities, they have the advantage of easily separating old knowledge from new knowledge. Thus, the process is additive, and the order in which the information is received and fused is not relevant. However, in the distributed data fusion algorithms (e.g., the distributed Kalman filter), the state that is going to be fused is not associative, and when and how the fused estimates are computed is relevant. Nevertheless, in contrast to the centralized architectures, the distributed algorithms reduce the necessary communication and computational costs because some tasks are computed in the distributed nodes before data fusion is performed in the fusion node.
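The additive property of information-based fusion can be illustrated with a scalar sketch. It assumes independent Gaussian measurements of a single scalar state observed directly, so that each node contributes a Fisher information term 1/r and an information-vector term z/r; all function names and numbers are hypothetical and are not taken from the review.

```python
def to_information(value, variance):
    """Convert a scalar estimate (or measurement) and its variance to information form."""
    return value / variance, 1.0 / variance


def from_information(info_vector, info_matrix):
    """Recover the estimate and its variance from information form."""
    return info_vector / info_matrix, 1.0 / info_matrix


def fuse_information(prior, contributions):
    """Fusion in information form is a plain sum of the contributions."""
    info_vector, info_matrix = prior
    for i_k, big_i_k in contributions:
        info_vector += i_k
        info_matrix += big_i_k
    return info_vector, info_matrix


prior = to_information(0.0, 100.0)  # vague prior knowledge
readings = [(5.0, 1.0), (6.0, 2.0), (4.0, 4.0)]  # (measurement, noise variance)
contributions = [to_information(z, r) for z, r in readings]

forward = from_information(*fuse_information(prior, contributions))
backward = from_information(*fuse_information(prior, list(reversed(contributions))))
```

Both orders give the same fused estimate (up to floating-point rounding), which is why a node can add new information whenever it arrives.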
3. Data Association Techniques
The data association problem consists of determining the set of measurements that correspond to each target (see Figure 5). Let us suppose that there are multiple targets that are being tracked by only one sensor in a cluttered environment (by a cluttered environment, we refer to an environment that has several targets that are too close to each other). Then, the data association problem can be defined as follows:
(i) each sensor’s observation is received in the fusion node at discrete time intervals;
(ii) the sensor might not provide observations at a specific interval;
(iii) some observations are noise, and other observations originate from the detected target;
(iv) for any specific target and in every time interval, we do not know (a priori) the observations that will be generated by that target.
Therefore, the goal of data association is to establish the set of observations or measurements that are generated by the same target over time. Hall and Llinas [] provided the following definition of data association: “The process of assign and compute the weights that relates the observations or tracks from one set to the observation of tracks of another set.” A track can be defined as an ordered set of points that follow a path and are generated by the same target.
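As an example of the complexity of the data association problem, if we take a frame-to-frame association and assume that $M$ possible points could be detected in all $n$ frames, then the number of possible sets is $(M!)^{n-1}$. Note that from all of these possible solutions, only one set establishes the true movement of the points.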
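The size of this search space can be made concrete with a short count. The sketch assumes that each pair of consecutive frames is linked by a one-to-one matching of M points, so that there are M! matchings per pair and n - 1 pairs in n frames; the function name is hypothetical.

```python
from math import factorial


def association_sets(m_points, n_frames):
    """Number of ways to chain one-to-one matchings across consecutive frames."""
    return factorial(m_points) ** (n_frames - 1)


small = association_sets(3, 2)    # a single frame pair with 3 points
larger = association_sets(10, 5)  # 10 points tracked over 5 frames
```

Even the second case, which is modest by tracking standards, yields a number with more than twenty digits, which is why exhaustive search is impractical.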
Data association is oen performed before the state
estimation of the detected targets. Moreover, it is a key
step because the estimation or classication will behave
incorrectly if the data association phase does not work
coherently. e data association process could also appear in
all of the fusion levels, but the granularity varies depending
on the objective of each level.
e Scientic World Journal
Alignment Association Estimation
of the
Centralized architecture
Decentralized architecture
Distributed architecture
Fusion node
of the
of the
of the
of the
F : Classication based on the type of architecture.
In general, an exhaustive search of all possible combinations grows exponentially with the number of targets; thus, the data association problem becomes NP-complete. The most common techniques that are employed to solve the data association problem are presented in the following subsections.
3.1. Nearest Neighbors and K-Means. Nearest neighbor (NN) is the simplest data association technique. NN is a well-known clustering algorithm that selects or groups the most similar values. How close one measurement is to another depends on the employed distance metric, and whether two measurements are associated typically depends on a threshold that is established by the designer. In general, the employed criteria could be based on (1) an absolute distance, (2) the Euclidean distance, or (3) a statistical function of the distance.
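A minimal sketch of nearest neighbor association with a designer-chosen threshold is shown next. It assumes the Euclidean distance and a single predicted target position; the function name, the `gate` parameter, and the coordinates are hypothetical.

```python
import math


def nearest_neighbor(prediction, observations, gate):
    """Return the index of the closest observation within the gate, or None."""
    best_index, best_distance = None, gate
    for index, observation in enumerate(observations):
        distance = math.dist(prediction, observation)  # Euclidean distance
        if distance <= best_distance:
            best_index, best_distance = index, distance
    return best_index


observations = [(0.0, 5.0), (1.2, 0.9), (4.0, 4.0)]
match = nearest_neighbor((1.0, 1.0), observations, gate=1.0)
no_match = nearest_neighbor((10.0, 10.0), observations, gate=1.0)
```

Replacing `math.dist` with a Mahalanobis distance would give the statistical variant of the criterion; the threshold then plays the role of a validation gate.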
NN is a simple algorithm that can find a feasible (approximate) solution in a small amount of time. However, in a cluttered environment, it could provide many pairs that have the same probability and could thus produce undesirable error propagation []. Moreover, this algorithm has poor performance in environments in which false measurements are frequent, that is, in highly noisy environments.
Figure 5: Conceptual overview of the data association process from multiple sensors and multiple targets. It is necessary to establish the set of observations over time from the same object that forms a track. Diagram labels: targets, sensors, observations, tracks (Track 1, Track 2, ..., Track n), and false alarms.
All neighbors use a similar technique, in which all of the measurements inside a region are included in the tracks.
The K-Means [] method is a well-known modification of the NN algorithm. K-Means divides the dataset values into K different clusters. The K-Means algorithm finds the best localization of the cluster centroids, where best means a centroid that is in the center of the data cluster. K-Means is an iterative algorithm that can be divided into the following steps:
(1) obtain the input data and the number of desired clusters (K);
(2) randomly assign the centroid of each cluster;
(3) match each data point with the centroid of the closest cluster;
(4) move the cluster centers to the centroid of the cluster;
(5) if the algorithm does not converge, return to step (3).
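The steps above can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation: it assumes that the initial centroids are drawn from the data points, that the Euclidean distance is used, and that convergence means the centroids stop changing; the function name, the `seed` parameter, and the sample points are hypothetical.

```python
import math
import random


def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 2: random initial centroids
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for point in points:  # step 3: assign each point to its nearest centroid
            nearest = min(range(k), key=lambda c: math.dist(point, centroids[c]))
            clusters[nearest].append(point)
        new_centroids = []
        for c, members in enumerate(clusters):  # step 4: move the centroids
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:  # step 5: stop when nothing moves
            break
        centroids = new_centroids
    return centroids, clusters


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = k_means(points, k=2)
```

On this well-separated toy dataset the two groups are recovered; on harder data the result depends on the random initialization, which is the first of the disadvantages listed next.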
K-Means is a popular algorithm that has been widely employed; however, it has the following disadvantages:
(i) the algorithm does not always find the optimal solution for the cluster centers;
(ii) the number of clusters must be known a priori, and one must assume that this number is the optimum;
(iii) the algorithm assumes that the covariance of the dataset is irrelevant or that it has been normalized already.
There are several options for overcoming these limitations. For the first one, it is possible to execute the algorithm several times and obtain the solution that has less variance. For the second one, it is possible to start with a low value of K and increment the values of K until an adequate result is obtained. The third limitation can be easily overcome by multiplying the data with the inverse of the covariance matrix.
Many variations have been proposed to Lloyd’s basic K-Means algorithm [], which has a computational upper bound cost of $O(Kn)$ per iteration, where $n$ is the number of input points and $K$ is the number of desired clusters. Some algorithms modify the initial cluster assignments to improve the separations and reduce the number of iterations. Others introduce soft or multinomial clustering assignments using fuzzy logic, probabilistic, or Bayesian techniques. However, most of these variations still require several passes through the data space to converge to a reasonable solution. This issue becomes a major disadvantage in several real-time applications. A new approach that is based on having a large (but still affordable) number of cluster candidates compared to the desired K clusters is currently gaining attention. The idea behind this computational model is that the algorithm builds a good sketch of the original data while reducing the dimensionality of the input space significantly. In this manner, a weighted K-Means can be applied to the large candidate clusters to derive a good clustering of the original data. Using this idea, [] presented an efficient and scalable K-Means algorithm that is based on random projections. This algorithm requires only one pass through the input data to build the clusters. More specifically, if the input data distribution holds some separability requirements, then the number of required candidate clusters grows only according to $O(\log n)$, where $n$ is the number of observations in the original data. This salient feature makes the algorithm scalable in terms of both the memory and computational cost.
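3.2. Probabilistic Data Association. The probabilistic data association (PDA) algorithm was proposed by Bar-Shalom and Tse [] and is also known as the modified filter of all neighbors. This algorithm assigns an association probability to each hypothesis from a valid measurement of a target. A valid measurement refers to the observation that falls in the validation gate of the target at that time instant. The validation gate, which is centered on the predicted measurement of the target and whose size is set by the threshold $\gamma$, is used to select the set of valid measurements and is defined as
$$\gamma \geq (z(k) - \hat{z}(k|k-1))^{T} S^{-1}(k) (z(k) - \hat{z}(k|k-1)),$$
where $k$ is the temporal index, $S(k)$ is the innovation covariance, and $\gamma$ determines the gating or window size. The set of valid measurements at time instant $k$ is defined as
$$Z(k) = \{z_i(k)\}, \quad i = 1, \ldots, m_k,$$
where $z_i(k)$ is the $i$th measurement in the validation region at time instant $k$. We give the standard equations of the PDA algorithm next. For the state prediction, consider
$$\hat{x}(k|k-1) = F(k-1)\,\hat{x}(k-1|k-1),$$
where $F(k-1)$ is the transition matrix at time instant $k-1$. To calculate the measurement prediction, consider
$$\hat{z}(k|k-1) = H(k)\,\hat{x}(k|k-1),$$
where $H(k)$ is the linearization measurement matrix. To compute the innovation of the $i$th measurement, consider
$$v_i(k) = z_i(k) - \hat{z}(k|k-1).$$
To calculate the covariance prediction, consider
$$\hat{P}(k|k-1) = F(k-1)\,\hat{P}(k-1|k-1)\,F(k-1)^{T} + Q(k),$$
where $Q(k)$ is the process noise covariance matrix. To compute the innovation covariance $S(k)$ and the Kalman gain $K(k)$, consider
$$S(k) = H(k)\,\hat{P}(k|k-1)\,H(k)^{T} + R, \qquad K(k) = \hat{P}(k|k-1)\,H(k)^{T} S(k)^{-1},$$
where $R$ is the measurement noise covariance matrix. To obtain the covariance update in the case in which the measurements originated by the target are known, consider
$$P^{0}(k|k) = \hat{P}(k|k-1) - K(k)\,S(k)\,K(k)^{T}.$$
The total update of the covariance is computed from the combined innovation and the spread of the innovations as
$$v(k) = \sum_{i=1}^{m_k} \beta_i(k)\,v_i(k),$$
$$P(k) = K(k)\left[\sum_{i=1}^{m_k} \beta_i(k)\,v_i(k)\,v_i^{T}(k) - v(k)\,v^{T}(k)\right]K^{T}(k),$$
where $m_k$ is the number of valid measurements in the instant $k$. The equation to update the estimated state, which is formed by the position and velocity, is given by
$$\hat{x}(k|k) = \hat{x}(k|k-1) + K(k)\,v(k).$$
Finally, the association probabilities of PDA are as follows:
$$\beta_i(k) = \frac{p_i(k)}{\sum_{j=0}^{m_k} p_j(k)},$$
where
$$p_i(k) = \begin{cases} (2\pi)^{M/2}\,\lambda\,\sqrt{|S(k)|}\,\dfrac{1 - P_d P_g}{P_d} & \text{if } i = 0, \\ \exp\left[-\dfrac{1}{2}\,v_i^{T}(k)\,S^{-1}(k)\,v_i(k)\right] & \text{if } i \neq 0, \\ 0 & \text{in other cases,} \end{cases}$$
where $M$ is the dimension of the measurement vector, $\lambda$ is the density of the clutter environment, $P_d$ is the detection probability of the correct measurement, and $P_g$ is the validation probability of a detected value.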
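A scalar sketch of one PDA update step may help to connect these quantities. It assumes a one-dimensional state that is observed directly (measurement matrix equal to 1), so every matrix reduces to a number. The final covariance line combines the predicted covariance, the known-origin covariance, and the spread-of-innovations term in the form commonly given in the PDA filter literature, which is an assumption here rather than a formula stated in this review. The function name, the argument names, and all numbers are hypothetical.

```python
import math


def pda_update(x_pred, p_pred, r, measurements, gamma, lam, p_d, p_g):
    s = p_pred + r        # innovation covariance (measurement matrix = 1)
    k_gain = p_pred / s   # Kalman gain
    # Validation gate: keep innovations whose normalized squared distance is <= gamma.
    innovations = [z - x_pred for z in measurements if (z - x_pred) ** 2 / s <= gamma]
    # Unnormalized weights; b is the "no valid measurement is correct" hypothesis (i = 0).
    b = lam * math.sqrt(2.0 * math.pi * s) * (1.0 - p_d * p_g) / p_d
    e = [math.exp(-0.5 * v * v / s) for v in innovations]
    total = b + sum(e)
    beta0 = b / total
    betas = [e_i / total for e_i in e]
    # Combined innovation and state update.
    v_comb = sum(beta * v for beta, v in zip(betas, innovations))
    x_upd = x_pred + k_gain * v_comb
    # Covariance: known-origin update plus the spread-of-innovations term.
    p_known = p_pred - k_gain * s * k_gain
    second_moment = sum(beta * v * v for beta, v in zip(betas, innovations))
    spread = k_gain * (second_moment - v_comb ** 2) * k_gain
    p_upd = beta0 * p_pred + (1.0 - beta0) * p_known + spread
    return x_upd, p_upd, beta0, betas


x_upd, p_upd, beta0, betas = pda_update(
    x_pred=0.0, p_pred=1.0, r=1.0, measurements=[0.5, -1.0, 8.0],
    gamma=9.0, lam=0.1, p_d=0.9, p_g=0.99)
```

In this example the measurement at 8.0 falls outside the gate and is ignored, and the two remaining measurements pull the estimate in opposite directions in proportion to their association probabilities.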
In the PDA algorithm, the state estimation of the target is computed as a weighted sum of the estimated state under all of the hypotheses. The algorithm can associate different measurements to one specific target. Thus, the association of the different measurements to a specific target helps PDA to estimate the target state, and the association probabilities are used as weights. The main disadvantages of the PDA algorithm are the following:
(i) loss of tracks: because PDA ignores the interference with other targets, it sometimes could wrongly classify the closest tracks. Therefore, it provides a poor performance when the targets are close to each other or crossed;
(ii) the suboptimal Bayesian approximation: when the source of information is uncertain, PDA is the suboptimal Bayesian approximation to the association problem;
(iii) one target: PDA was initially designed for the association of one target in a low-cluttered environment. The number of false alarms is typically modeled with the Poisson distribution, and they are assumed to be distributed uniformly in space. PDA behaves incorrectly when there are multiple targets because the false alarm model does not work well;
(iv) track management: because PDA assumes that the track is already established, algorithms must be provided for track initialization and track deletion.
PDA is mainly good for tracking targets that do not make abrupt changes in their movement patterns. PDA will most likely lose the target if it makes abrupt changes in its movement patterns.
3.3. Joint Probabilistic Data Association. Joint probabilistic data association (JPDA) is a suboptimal approach for tracking multiple targets in cluttered environments []. JPDA is similar to PDA, with the difference that the association probabilities are computed using all of the observations and all of the targets. Thus, in contrast to PDA, JPDA considers various hypotheses together and combines them. JPDA determines the probability $\beta_i^t(k)$ that measurement $i$ is originated from target $t$, accounting for the fact that under this hypothesis, the measurement cannot be generated by other targets. Therefore, for a known number of targets, it evaluates the different options of the measurement-target association (for the most recent set of measurements) and combines them into the corresponding state estimation. If the association probability is known, then the Kalman filter updating equation of the track $t$ can be written as
$$\hat{x}^t(k|k) = \hat{x}^t(k|k-1) + K(k)\,v^t(k),$$
where $\hat{x}^t(k|k)$ and $\hat{x}^t(k|k-1)$ are the estimation and prediction of target $t$, and $K(k)$ is the filter gain. The weighted sum of the residuals associated with the $m(k)$ observations of target $t$ is as follows:
$$v^t(k) = \sum_{i=1}^{m(k)} \beta_i^t(k)\,v_i^t(k),$$
where $v_i^t(k) = z_i(k) - H\hat{x}^t(k|k-1)$. Therefore, this method incorporates all of the observations (inside the neighborhood of the target’s predicted position) to update the estimated position by using a posterior probability that is a weighted sum of residuals.
The main restrictions of JPDA are the following:
(i) a measurement cannot come from more than one target;
(ii) two measurements cannot be originated by the same target (at one time instant);
(iii) the sum of all of the measurements’ probabilities that are assigned to one target must be 1: $\sum_{i=0}^{m(k)} \beta_i^t(k) = 1$.
The main disadvantages of JPDA are the following:
(i) it requires an explicit mechanism for track initialization. Similar to PDA, JPDA cannot initialize new tracks or remove tracks that are out of the observation area;
(ii) JPDA is a computationally expensive algorithm when it is applied in environments that have multiple targets because the number of hypotheses is incremented exponentially with the number of targets.
In general, JPDA is more appropriate than MHT in situations in which the density of false measurements is high (e.g., sonar applications).
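The joint reasoning of JPDA can be sketched by brute-force enumeration of the feasible joint events for a very small problem. The sketch assumes that the measurement likelihoods under each target are already computed, that every target is detected with the same probability, and that unassigned measurements are explained by clutter with a constant spatial density; the function name, its parameters, and the numbers are hypothetical.

```python
from itertools import product


def jpda_marginals(likelihood, p_d, clutter_density):
    """likelihood[t][i]: likelihood of measurement i under target t."""
    n_targets = len(likelihood)
    n_meas = len(likelihood[0])
    options = [None] + list(range(n_meas))  # None: the target is not detected
    # beta[t][i] for measurement i; the last column holds the missed-detection case.
    beta = [[0.0] * (n_meas + 1) for _ in range(n_targets)]
    total = 0.0
    for event in product(options, repeat=n_targets):
        used = [i for i in event if i is not None]
        if len(used) != len(set(used)):
            continue  # a measurement cannot come from more than one target
        weight = clutter_density ** (n_meas - len(used))  # leftover measurements are clutter
        for t, i in enumerate(event):
            weight *= (1.0 - p_d) if i is None else p_d * likelihood[t][i]
        total += weight
        for t, i in enumerate(event):
            beta[t][n_meas if i is None else i] += weight
    return [[value / total for value in row] for row in beta]


likelihood = [[0.30, 0.05],   # target 0 under measurements 0 and 1
              [0.25, 0.02]]   # target 1 under measurements 0 and 1
beta = jpda_marginals(likelihood, p_d=0.9, clutter_density=0.01)
```

In this example measurement 0 is individually the better match for target 0, yet the joint events favor giving it to target 1, because target 1 has almost no support from measurement 1. The loop over `product` also shows why the cost grows exponentially with the number of targets.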
3.4. Multiple Hypothesis Test. The underlying idea of the multiple hypothesis test (MHT) is based on using more than two consecutive observations to make an association with better results. Other algorithms that use only two consecutive observations have a higher probability of generating an error. In contrast to PDA and JPDA, MHT estimates all of the possible hypotheses and maintains new hypotheses in each iteration.
MHT was developed to track multiple targets in cluttered environments; as a result, it combines the data association problem and tracking into a unified framework, becoming an estimation technique as well. The Bayes rule or Bayesian networks are commonly employed to calculate the MHT hypothesis. In general, researchers have claimed that MHT outperforms JPDA for the lower densities of false positives. However, the main disadvantage of MHT is the computational cost when the number of tracks or false positives is incremented. Pruning the hypothesis tree using a window could solve this limitation.
e Reid [] tracking algorithm is considered the stan-
dard MHT algorithm, but the initial integer programming
formulation of the problem is due to Moreeld []. MHT is
an iterative algorithm in which each iteration starts with a set
of correspondence hypotheses. Each hypothesis is a collec-
tion of disjoint tracks, and the prediction of the target in the
next time instant is computed for each hypothesis. Next, the
predictions are compared with the new observations by using
a distance metric. e set of associations established in each
hypothesis (based on a distance) introduces new hypotheses
in the next iteration. Each new hypothesis represents a new
Note that each new measurement could come from (i) a
new target in the visual eld of view, (ii) a target being tracked,
or (iii) noise in the measurement process. It is also possible
that a measurement is not assigned to a target because the
target disappears, or because it is not possible to obtain a
target measurement at that time instant.
MHT maintains several correspondence hypotheses for each target in each frame. If the hypothesis in the instant $k$ is represented by $H(k) = [h_l(k),\ l = 1, \ldots, n]$, then the probability of the hypothesis $h_l(k)$ could be represented recursively using the Bayes rule as follows:

$$P(h_l(k) \mid Z(k)) = P(h_g(k-1), a_i(k) \mid Z(k)) = \frac{1}{c}\, P(Z(k) \mid h_g(k-1), a_i(k))\, P(a_i(k) \mid h_g(k-1))\, P(h_g(k-1)),$$

where $h_g(k-1)$ is the hypothesis $g$ of the complete set until the time instant $k-1$; $a_i(k)$ is the $i$th possible association of the track to the object; $Z(k)$ is the set of detections of the current frame, and $c$ is a normalization constant.
The first term on the right side of the previous equation is the likelihood function of the measurement set $Z(k)$ given the previous hypothesis and the current association. The second term is the probability of the association hypothesis of the current data given the previous hypothesis $h_g(k-1)$. The third term is the probability of the previous hypothesis from which the current hypothesis is calculated.
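The three-term recursion above can be illustrated with a small numerical sketch. The function name `hypothesis_posteriors` and the example numbers are assumptions made for illustration only; they are not taken from the reviewed algorithms. Each candidate hypothesis is described by its measurement likelihood, its association probability, and the probability of its parent hypothesis, and the normalization constant $c$ is simply the sum of the unnormalized products.

```python
def hypothesis_posteriors(candidates):
    """Normalize the product likelihood * association prob * parent prob.

    candidates: list of (likelihood, assoc_prob, parent_prob) triples, one per
    candidate hypothesis h_l(k). All values here are illustrative assumptions.
    """
    unnormalized = [lik * assoc * parent for lik, assoc, parent in candidates]
    c = sum(unnormalized)  # normalization constant
    if c == 0.0:
        raise ValueError("all candidate hypotheses have zero probability")
    return [value / c for value in unnormalized]


# Three hypothetical child hypotheses spawned from two parent hypotheses.
candidates = [
    (0.8, 0.5, 0.6),  # measurement explained by an existing track
    (0.1, 0.3, 0.6),  # measurement explained as a false alarm
    (0.4, 0.2, 0.4),  # measurement explained as a new target
]
posteriors = hypothesis_posteriors(candidates)
```

In a full MHT implementation the list of candidates grows at every frame, which is why pruning low-probability entries is normally required.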
The MHT algorithm has the ability to detect a new track while maintaining the hypothesis tree structure. The probability of a true track is given by the Bayes decision model as

$$P(\lambda \mid Z) = \frac{P(Z \mid \lambda)\, P^{\circ}(\lambda)}{P(Z)},$$

where $P(Z \mid \lambda)$ is the probability of obtaining the set of measurements $Z$ given $\lambda$, $P^{\circ}(\lambda)$ is the a priori probability of the source signal, and $P(Z)$ is the probability of obtaining the set of detections $Z$.
MHT considers all of the possibilities, including both the track maintenance and the initialization and removal of tracks in an integrated framework. MHT calculates the possibility of having an object after the generation of a set of measurements using an exhaustive approach, and the algorithm does not assume a fixed number of targets. The key challenge of MHT is the effective hypothesis management. The baseline MHT algorithm can be extended as follows: (i) use the hypothesis aggregation for missed targets, births, cardinality tracking, and closely spaced objects; (ii) apply a multistage MHT for improving the performance and robustness in challenging settings; and (iii) use a feature-aided MHT for extended object surveillance.
The main disadvantage of this algorithm is the computational cost, which grows exponentially with the number of tracks and measurements. Therefore, the practical implementation of this algorithm is limited because it is exponential in both time and memory.
With the aim of reducing the computational cost, [] presented a probabilistic MHT algorithm in which the associations are considered to be random variables that are statistically independent and in which performing an exhaustive search enumeration is avoided. This algorithm is known as PMHT. The PMHT algorithm assumes that the number of targets and measurements is known. With the same goal of reducing the computational cost, [] presented an efficient implementation of the MHT algorithm. This implementation was the first version to be applied to perform tracking in visual environments. They employed the Murty [] algorithm to determine the best set of $k$ hypotheses in polynomial time, with the goal of tracking the points of interest.
MHT typically performs the tracking process by employing only one characteristic, commonly the position. The Bayesian combination to use multiple characteristics was proposed by Liggins II et al. [].
A linear-programming-based relaxation approach to the optimization problem in MHT tracking was proposed independently by Coraluppi et al. [] and Storms and Spieksma []. Joo and Chellappa [] proposed an association algorithm for tracking multiple targets in visual environments. Their algorithm is based on an MHT modification in which a measurement can be associated with more than one target, and several targets can be associated with one measurement. They also proposed a combinatorial optimization algorithm to generate the best set of association hypotheses. Their algorithm always finds the best hypothesis, in contrast to other models, which are approximate. Coraluppi and Carthel [] presented a generalization of the MHT algorithm using a recursion over hypothesis classes rather than over a single hypothesis. This work has been applied in a special case of the multi-target tracking problem, called cardinality tracking, in which they observed the number of sensor measurements instead of the target states.
3.5. Distributed Joint Probabilistic Data Association. The distributed version of the joint probabilistic data association (JPDA-D) was presented by Chang et al. []. In this technique, the estimated state of the target (using two sensors) after being associated is given by
|1,2, ()
where ,=1,2, is the last set of measurements of
sensor  and , ,=1,2,isthesetofaccumulativedata,
and is the association hypothesis. e rst term of the right
side of the equation is calculated from the associations that
were made earlier. e second term is computed from the
individual association probabilities as follows:
where are the joint hypotheses involving all of the
measurements and all of the objectives, and
()are the
binary indicators of the measurement-target association. e
additional term (1,2)depends on the correlation of the
individual hypothesis and reects the localization inuence
of the current measurements in the joint hypotheses.
These equations are obtained assuming that communication exists after every observation, and there are only approximations in the case in which communication is sporadic and when a substantial amount of noise occurs. Therefore, this algorithm is a theoretical model that has some limitations in practical applications.
3.6. Distributed Multiple Hypothesis Test. The distributed version of the MHT algorithm (MHT-D) [,] follows a similar structure as the JPDA-D algorithm. Let us assume the case in which one node must fuse two sets of hypotheses and tracks. If the hypotheses and track sets are represented by $H^i(Z^i)$ and $T^i(Z^i)$ with $i = 1, 2$, the hypothesis probabilities are represented by $P(\lambda_j^i \mid Z^i)$ for each hypothesis $\lambda_j^i \in H^i$, and the state distribution of each track $\tau_j^i$ is represented by $p(x \mid Z^i, \tau_j^i)$; then, the maximum available information in the fusion node is $Z = Z^1 \cup Z^2$. The data fusion objective of the MHT-D is to obtain the set of hypotheses $H(Z)$, the set of tracks $T(Z)$, the hypothesis probabilities $P(\lambda \mid Z)$, and the state distribution $p(x \mid Z, \tau)$ for the observed data.
The MHT-D algorithm is composed of the following steps:
(1) hypothesis formation: for each hypothesis pair $\lambda_j^1$ and $\lambda_k^2$, which could be fused, a track $\tau$ is formed by associating the pair of tracks $\tau_j^1$ and $\tau_k^2$, where each track of the pair comes from one node and both could originate from the same target. The final result of this stage is a set of hypotheses denoted by $H(Z)$ and the fused tracks $T(Z)$;
(2) hypothesis evaluation: in this stage, the association probability of each hypothesis and the estimated state of each fused track are obtained. The distributed estimation algorithm is employed to calculate the likelihood of the possible associations and the obtained estimations at each specific association.
Using the information model, the probability of each fused hypothesis is given by

$$P(\lambda \mid Z) = C^{-1} \prod_{j \in J} P(\lambda^{(j)} \mid Z^{(j)})^{\alpha(j)} \prod_{\tau \in \lambda} L(\tau \mid Z),$$

where $C$ is a normalizing constant, and $L(\tau \mid Z)$ is the likelihood of each hypothesis pair.
The main disadvantage of the MHT-D is the high computational cost, which is in the order of $O(n^M)$, where $n$ is the number of possible associations and $M$ is the number of variables to be estimated.
3.7. Graphical Models. Graphical models are a formalism for representing and reasoning with probabilities and independence. A graphical model represents a conditional decomposition of the joint probability. A graphical model can be represented as a graph in which the nodes denote random variables; the edges denote the possible dependence between the random variables, and the plates denote the replication of a substructure, with the appropriate indexing of the relevant variables. The graph captures the joint distribution over the random variables, which can be decomposed into a product of factors that each depend on only a subset of variables. There are two major classes of graphical models: (i) the Bayesian networks [], which are also known as the directed graphical models, and (ii) the Markov random fields, which are also known as undirected graphical models. The directed graphical models are useful for expressing causal relationships between random variables, whereas undirected models are better suited for expressing soft constraints between random variables. We refer the reader to the book of Koller and Friedman [] for more information on graphical models.
A framework based on graphical models can solve the problem of distributed data association in synchronized sensor networks with overlapped areas and where each sensor receives noisy measurements; this solution was proposed by Chen et al. [,]. Their work is based on graphical models that are used to represent the statistical dependence between random variables. The data association problem is treated as an inference problem and solved by using the max-product algorithm []. Graphical models represent statistical dependencies between variables as graphs, and the max-product algorithm converges when the graph is a tree structure. Moreover, the employed algorithm could be implemented in a distributed manner by exchanging messages between the source nodes in parallel. With this algorithm, if each sensor has $n$ possible combinations of associations and there are $M$ variables to be estimated, it has a complexity of $O(n^2 M)$, which is reasonable and less than the $O(n^M)$ complexity of the MHT-D algorithm. However, special attention must be given to the correlated variables when building the graphical model.
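The complexity gap between max-product and exhaustive enumeration can be seen on the simplest tree, a chain. The following is a minimal sketch under stated assumptions: the chain structure, the potential tables, and the function names `max_product_chain` and `brute_force_map` are illustrative and are not the formulation of Chen et al. The message-passing routine touches each edge once with an $n \times n$ maximization, whereas the brute-force check enumerates all $n^M$ joint assignments.

```python
from itertools import product


def max_product_chain(node_pot, edge_pot):
    """Most probable assignment on a chain x_0 - x_1 - ... - x_{M-1}.

    node_pot[t][s]    : potential of variable t taking state s
    edge_pot[t][s][r] : potential of (x_t = s, x_{t+1} = r)
    Cost is O(n^2 * M) for n states and M variables.
    """
    M, n = len(node_pot), len(node_pot[0])
    msg = list(node_pot[0])  # best score of any prefix ending in each state
    back = []                # back-pointers used to recover the assignment
    for t in range(1, M):
        new_msg, ptr = [], []
        for r in range(n):
            scores = [msg[s] * edge_pot[t - 1][s][r] for s in range(n)]
            best_s = max(range(n), key=lambda s: scores[s])
            new_msg.append(scores[best_s] * node_pot[t][r])
            ptr.append(best_s)
        msg = new_msg
        back.append(ptr)
    state = max(range(n), key=lambda s: msg[s])
    assignment = [state]
    for ptr in reversed(back):
        state = ptr[state]
        assignment.append(state)
    assignment.reverse()
    return assignment


def brute_force_map(node_pot, edge_pot):
    """Reference answer by enumerating all n^M assignments."""
    M, n = len(node_pot), len(node_pot[0])

    def score(x):
        p = 1.0
        for t in range(M):
            p *= node_pot[t][x[t]]
        for t in range(M - 1):
            p *= edge_pot[t][x[t]][x[t + 1]]
        return p

    return list(max(product(range(n), repeat=M), key=score))


# Hypothetical potentials: three binary association variables.
node_pot = [[0.9, 0.1], [0.5, 0.5], [0.3, 0.7]]
agree = [[0.8, 0.2], [0.2, 0.8]]  # neighbors prefer the same association
edge_pot = [agree, agree]
map_assignment = max_product_chain(node_pot, edge_pot)
```

On general trees the same idea applies with messages passed from the leaves to a root; on graphs with cycles the algorithm is no longer guaranteed to converge.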
4. State Estimation Methods
State estimation techniques aim to determine the state of the target under movement (typically the position) given the observation or measurements. State estimation techniques are also known as tracking techniques. In their general form, it is not guaranteed that the target observations are relevant, which means that some of the observations could actually come from the target and others could be only noise. The state estimation phase is a common stage in data fusion algorithms because the target's observation could come from different sensors or sources, and the final goal is to obtain a global target state from the observations.
The estimation problem involves finding the values of the vector state (e.g., position, velocity, and size) that fit as much as possible with the observed data. From a mathematical perspective, we have a set of redundant observations, and the goal is to find the set of parameters that provides the best fit to the observed data. In general, these observations are corrupted by errors and the propagation of noise in the measurement process. State estimation methods fall under level 1 of the JDL classification and could be divided into two broader groups:
(1) linear dynamics and measurements: here, the estimation problem has a standard solution. Specifically, when the equations of the object state and the measurements are linear, the noise follows the Gaussian distribution, and the environment is not cluttered, the optimal theoretical solution is based on the Kalman filter;
(2) nonlinear dynamics: the state estimation problem becomes difficult, and there is not an analytical solution to solve the problem in a general manner. In principle, there are no practical algorithms available to solve this problem satisfactorily.
Most of the state estimation methods are based on control theory and employ the laws of probability to compute a vector state from a vector measurement or a stream of vector measurements. Next, the most common estimation methods are presented, including maximum likelihood and maximum posterior (Section 4.1), the Kalman filter (Section 4.2), particle filter (Section 4.3), the distributed Kalman filter (Section 4.4), distributed particle filter (Section 4.5), and covariance consistency methods (Section 4.6).
4.1. Maximum Likelihood and Maximum Posterior. The maximum likelihood (ML) technique is an estimation method that is based on probabilistic theory. Probabilistic estimation methods are appropriate when the state variable follows an unknown probability distribution []. In the context of data fusion, $x$ is the state that is being estimated, and $z = (z(1), \ldots, z(k))$ is a sequence of $k$ previous observations of $x$. The likelihood function $\lambda(x)$ is defined as a probability density function of the sequence of observations $z$ given the true value of the state $x$. Consider

$$\lambda(x) = p(z \mid x).$$

The ML estimator finds the value of $x$ that maximizes the likelihood function:

$$\hat{x}(k) = \arg\max_{x}\, p(z \mid x),$$

which can be obtained from the analytical or empirical models of the sensors. This function expresses the probability of the observed data. The main disadvantage of this method in practice is that it requires the analytical or empirical model of the sensor to be known in order to compute the likelihood function. This method can also systematically underestimate the variance of the distribution, which leads to a bias problem. However, the bias of the ML solution becomes less significant as the number $N$ of data points increases, and the estimated variance equals the true variance of the distribution that generated the data in the limit $N \to \infty$.
The maximum posterior (MAP) method is based on the Bayesian theory. It is employed when the parameter $x$ to be estimated is the output of a random variable that has a known probability density function $p(x)$. In the context of data fusion, $x$ is the state that is being estimated and $z = (z(1), \ldots, z(k))$ is a sequence of $k$ previous observations of $x$. The MAP estimator finds the value of $x$ that maximizes the posterior probability distribution as follows:

$$\hat{x}(k) = \arg\max_{x}\, p(x \mid z).$$

Both methods (ML and MAP) aim to find the most likely value for the state $x$. However, ML assumes that $x$ is a fixed but unknown point from the parameter space, whereas MAP considers $x$ to be the output of a random variable with a known a priori probability density function. Both of these methods are equivalent when there is no a priori information about $x$, that is, when there are only observations.
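The relation between the two estimators can be made concrete for the simplest case, which is an assumption of this sketch rather than something stated in the text: a scalar state observed with Gaussian noise of known variance, and (for MAP) a Gaussian prior. Under those assumptions the ML estimate is the sample mean, the ML variance estimate divides by $N$ (and is therefore biased low), and the MAP estimate is a precision-weighted blend of the prior mean and the data. The function names are illustrative.

```python
def ml_mean(z):
    """ML estimate of a Gaussian mean: the sample mean."""
    return sum(z) / len(z)


def ml_variance(z):
    """ML estimate of the variance: divides by N, so it is biased low."""
    m = ml_mean(z)
    return sum((v - m) ** 2 for v in z) / len(z)


def map_mean(z, noise_var, prior_mean, prior_var):
    """MAP estimate of the mean under a Gaussian prior N(prior_mean, prior_var)."""
    precision = len(z) / noise_var + 1.0 / prior_var
    return (sum(z) / noise_var + prior_mean / prior_var) / precision


observations = [4.8, 5.1, 5.3, 4.9, 5.4]
x_ml = ml_mean(observations)
var_ml = ml_variance(observations)
x_map = map_mean(observations, noise_var=0.25, prior_mean=0.0, prior_var=1.0)
# A nearly flat prior carries no information, so MAP approaches ML.
x_map_flat = map_mean(observations, noise_var=0.25, prior_mean=0.0, prior_var=1e12)
```

With an informative prior centered at zero the MAP estimate is pulled below the sample mean; as the prior variance grows the two estimates coincide, which matches the equivalence noted above.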
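4.2. The Kalman Filter. The Kalman filter is the most popular estimation technique. It was originally proposed by Kalman [] and has been widely studied and applied since then. The Kalman filter estimates the state $x$ of a discrete time process governed by the following state-space model:

$$x(k+1) = \Phi(k)\, x(k) + G(k)\, u(k) + w(k),$$

with the observations or measurements $z$ at time $k$ of the state $x$ represented by

$$z(k) = H(k)\, x(k) + v(k),$$

where $\Phi(k)$ is the state transition matrix, $G(k)$ is the input transition matrix, $u(k)$ is the input vector, $H(k)$ is the measurement matrix, and $w$ and $v$ are random Gaussian variables with zero mean and covariance matrices $Q(k)$ and $R(k)$, respectively. Based on the measurements and on the system parameters, the estimation of $x(k)$, which is represented by $\hat{x}(k)$, and the prediction of $x(k+1)$, which is represented by $\hat{x}(k+1 \mid k)$, are given by the following:

$$\hat{x}(k) = \hat{x}(k \mid k-1) + K(k)\,[z(k) - H(k)\, \hat{x}(k \mid k-1)],$$

$$\hat{x}(k+1 \mid k) = \Phi(k)\, \hat{x}(k) + G(k)\, u(k),$$

respectively, where $K$ is the filter gain determined by

$$K(k) = P(k \mid k-1)\, H^T(k)\, [H(k)\, P(k \mid k-1)\, H^T(k) + R(k)]^{-1},$$

where $P(k \mid k-1)$ is the prediction covariance matrix and can be determined by

$$P(k+1 \mid k) = \Phi(k)\, P(k)\, \Phi^T(k) + Q(k),$$

with $P(k) = P(k \mid k-1) - K(k)\, H(k)\, P(k \mid k-1)$.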
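The update and prediction recursion of the Kalman filter can be sketched for a scalar state, where every matrix reduces to a number. This is a minimal illustration under that assumption; the function name `kalman_step`, its default parameters, and the measurement values are hypothetical.

```python
def kalman_step(x_pred, P_pred, z, Phi=1.0, H=1.0, Q=0.0, R=1.0, G=0.0, u=0.0):
    """One scalar Kalman cycle: update with measurement z, then predict ahead."""
    K = P_pred * H / (H * P_pred * H + R)   # filter gain K(k)
    x_est = x_pred + K * (z - H * x_pred)   # state estimate at time k
    P_est = P_pred - K * H * P_pred         # updated covariance P(k)
    x_next = Phi * x_est + G * u            # prediction for time k+1
    P_next = Phi * P_est * Phi + Q          # prediction covariance P(k+1|k)
    return x_est, P_est, x_next, P_next


# Static state observed three times; initial prediction 0 with variance 1.
x_pred, P_pred = 0.0, 1.0
history = []
for z in [2.0, 2.0, 2.0]:
    x_est, P_est, x_pred, P_pred = kalman_step(x_pred, P_pred, z)
    history.append((x_est, P_est))
```

With no process noise the estimate moves toward the repeated measurement while the variance shrinks at every step, which is the expected behavior for a static state.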
e Kalman lter is mainly employed to fuse low-level
data. If the system could be described as a linear model and
recursive Kalman lter obtains optimal statistical estimations
[]. However, other methods are required to address nonlin-
ear dynamic models and nonlinear measurements. e modi-
ed Kalman lter known as the extended Kalman lter (EKF)
is an optimal approach for implementing nonlinear recursive
lters []. e EKF is one of the most oen employed
methods for fusing data in robotic applications. However,
it has some disadvantages because the computations of the
Jacobians are extremely expensive. Some attempts have been
made to reduce the computational cost, such as linearization,
but these attempts introduce errors in the lter and make it
e unscented Kalman lter (UKF) []hasgained
popularity, because it does not have the linearization step and
the associated errors of the EKF []. e UKF employs a
deterministic sampling strategy to establish the minimum set
of points around the mean. is set of points captures the
true mean and covariance completely. en, these points are
propagated through nonlinear functions, and the covariance
of the estimations can be recuperated. Another advantage of
the UKF is its ability to be employed in parallel implementa-
4.3. Particle Filter. Particle filters are recursive implementations of the sequential Monte Carlo methods []. This method builds the posterior density function using several random samples called particles. Particles are propagated over time with a combination of sampling and resampling steps. At each iteration, the resampling step is employed to discard some particles, increasing the relevance of regions with a higher posterior probability. In the filtering process, several particles of the same state variable are employed, and each particle has an associated weight that indicates the quality of the particle. Therefore, the estimation is the result of a weighted sum of all of the particles. The standard particle filter algorithm has two phases: (1) the predicting phase and (2) the updating phase. In the predicting phase, each particle is modified according to the existing model and accounts for the sum of the random noise to simulate the noise effect. Then, in the updating phase, the weight of each particle is reevaluated using the last available sensor observation, and particles with lower weights are removed. Specifically, a generic particle filter comprises the following steps.
(1) Initialization of the particles:
(i) let $N$ be equal to the number of particles;
(ii) $X^{(i)}(1) = [x(1), y(1), 0, 0]^T$ for $i = 1, \ldots, N$.
(2) Prediction step:
(i) for each particle $i = 1, \ldots, N$, evaluate the state $(k+1 \mid k)$ of the system using the state at time instant $k$ with the noise of the system at time $k$. Consider
$$\hat{X}^{(i)}(k+1 \mid k) = F(k)\, \hat{X}^{(i)}(k) + w^{(i)}(k),$$
where $F(k)$ is the transition matrix of the system and $w^{(i)}(k)$ is the system noise.
(3) Evaluate the particle weight. For each particle $i = 1, \ldots, N$:
(i) compute the predicted observation state of the system using the current predicted state and the noise at instant $k$. Consider
$$\hat{z}^{(i)}(k+1 \mid k) = H(k+1)\, \hat{X}^{(i)}(k+1 \mid k) + v^{(i)}(k+1);$$
(ii) compute the likelihood (weights) according to the given distribution. Consider
$$\text{likelihood}^{(i)} = \mathcal{N}\big(\hat{z}^{(i)}(k+1 \mid k);\ z(k+1),\ \mathrm{var}\big);$$
(iii) normalize the weights as follows:
$$\tilde{w}^{(i)} = \frac{\text{likelihood}^{(i)}}{\sum_{j=1}^{N} \text{likelihood}^{(j)}}.$$
(4) Resampling/Selection: multiply particles with higher weights and remove those with lower weights. The current state must be adjusted using the computed weights of the new particles.
(i) Compute the cumulative weights. Consider
$$\text{CumWt}^{(i)} = \sum_{j=1}^{i} \tilde{w}^{(j)}.$$
(ii) Generate uniformly distributed random variables $U^{(i)} \sim U(0, 1)$ with the number of steps equal to the number of particles.
(iii) Determine which particles should be multiplied and which ones removed.
(5) Propagation phase:
(i) incorporate the new values of the state after the resampling of instant $k$ to calculate the value at instant $k+1$; that is, the resampled particles become $\hat{X}^{(1:N)}(k+1 \mid k+1)$;
(ii) compute the posterior mean. Consider
$$\hat{X}(k+1) = \operatorname{mean}\big[\hat{X}^{(i)}(k+1 \mid k+1)\big], \quad i = 1, \ldots, N;$$
(iii) repeat steps 2 to 5 for each time instant.
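The steps above can be sketched as a small bootstrap particle filter. To keep the example short it makes several assumptions that are not in the text: the state is one-dimensional (rather than the four-component vector used in the initialization step), the transition and measurement models are both the identity, both noise sources are Gaussian, and resampling is plain multinomial resampling on the cumulative weights. All function names and parameter values are illustrative.

```python
import bisect
import math
import random


def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def particle_filter_step(particles, z, rng, process_std=0.1, meas_var=0.25):
    """One prediction / weighting / resampling cycle for a scalar state."""
    n = len(particles)
    # (2) Prediction: random-walk model (F = 1) plus system noise.
    predicted = [p + rng.gauss(0.0, process_std) for p in particles]
    # (3) Weights: likelihood of measurement z for each particle (H = 1).
    likelihood = [gaussian_pdf(z, p, meas_var) for p in predicted]
    total = sum(likelihood)
    if total == 0.0:
        weights = [1.0 / n] * n  # degenerate case: fall back to uniform weights
    else:
        weights = [value / total for value in likelihood]
    # (4) Resampling: cumulative weights and one uniform draw per particle.
    cumulative, acc = [], 0.0
    for w in weights:
        acc += w
        cumulative.append(acc)
    resampled = []
    for _ in range(n):
        idx = bisect.bisect_left(cumulative, rng.random())
        resampled.append(predicted[min(idx, n - 1)])
    # (5) Posterior mean of the resampled particle set.
    estimate = sum(resampled) / n
    return resampled, weights, estimate


def run_demo(seed=0, n_particles=500, n_steps=30, true_state=5.0):
    rng = random.Random(seed)
    # (1) Initialization: particles spread uniformly over the search interval.
    particles = [rng.uniform(0.0, 10.0) for _ in range(n_particles)]
    weights, estimate = [], None
    for _ in range(n_steps):
        z = true_state + rng.gauss(0.0, 0.5)  # simulated noisy measurement
        particles, weights, estimate = particle_filter_step(particles, z, rng)
    return estimate, particles, weights
```

Calling `run_demo()` returns the final posterior mean together with the particle set and the last normalized weights. Resampling at every step, as done here, is the simplest choice; practical implementations often resample only when the effective number of particles drops below a threshold.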
Particle lters are more exible than the Kalman lters
and can cope with nonlinear dependencies and non-Gaussian
densities in the dynamic model and in the noise error.
However, they have some disadvantages. A large number
of particles are required to obtain a small variance in the
estimator. It is also dicult to establish the optimal number of
particles in advance, and the number of particles aects the
computational cost signicantly. Earlier versions of particle
lters employed a xed number of particles, but recent studies
have started to use a dynamic number of particles [].
4.4. The Distributed Kalman Filter. The distributed Kalman filter requires a correct clock synchronization between each source, as demonstrated in []. In other words, to correctly use the distributed Kalman filter, the clocks from all of the sources must be synchronized. This synchronization is typically achieved by using protocols that employ a shared global clock, such as the network time protocol (NTP). Synchronization problems between clocks have been shown to affect the accuracy of the Kalman filter, producing inaccurate estimations [].
If the estimations are consistent and the cross covariance is known (or the estimations are uncorrelated), then it is possible to use the distributed Kalman filters []. However, the cross covariance must be determined exactly, or the observations must be consistent.
We refer the reader to Liggins II et al. [] for more details about the Kalman filter in a distributed and hierarchical architecture.
4.5. Distributed Particle Filter. Distributed particle filters have gained attention recently []. Coates [] used a distributed particle filter to monitor an environment that could be captured by a Markovian state-space model, involving nonlinear dynamics and observations and non-Gaussian noise.
In contrast, earlier attempts to solve out-of-sequence measurements using particle filters are based on regenerating the probability density function to the time instant of the out-of-sequence measurement []. In a particle filter, this step requires a large computational cost, in addition to the necessary space to store the previous particles. To avoid this problem, Orton and Marrs [] proposed to store the information on the particles at each time instant, saving the cost of recalculating this information. This technique is close to optimal, and when the delay increases, the result is only slightly affected []. However, it requires a very large amount of space to store the state of the particles at each time instant.
4.6. Covariance Consistency Methods: Covariance Intersection/Union. Covariance consistency methods (intersection and union) were proposed by Uhlmann [] and are general and fault-tolerant frameworks for maintaining mean and covariance estimations in a distributed network. These methods do not comprise estimation techniques; instead, they are similar to an estimation fusion technique. The distributed Kalman filter requirement of independent measurements or known cross-covariances is not a constraint with this method.
4.6.1. Covariance Intersection. If the Kalman filter is employed to combine two estimations, $(a_1, A_1)$ and $(a_2, A_2)$, then it is assumed that the joint covariance is in the following form:

$$\begin{bmatrix} A_1 & X \\ X^T & A_2 \end{bmatrix},$$

where the cross-covariance $X$ should be known exactly so that the Kalman filter can be applied without difficulty. Because the computation of the cross-covariances is computationally intensive, Uhlmann [] proposed the covariance intersection (CI) algorithm.
Let us assume that a joint covariance $M$ can be defined with the diagonal blocks $M_{A_1} > A_1$ and $M_{A_2} > A_2$. Consider

$$M \geq \begin{bmatrix} A_1 & X \\ X^T & A_2 \end{bmatrix}$$

for every possible instance of the unknown cross-covariance $X$; then, the components of the matrix $M$ could be employed in the Kalman filter equations to provide a fused estimation $(c, C)$ that is considered consistent. The key point of this method relies on generating a joint covariance matrix $M$ that can represent a useful fused estimation (in this context, useful refers to something with a lower associated uncertainty). In summary, the CI algorithm computes the joint covariance matrix $M$, where the Kalman filter provides the best fused estimation $(c, C)$ with respect to a fixed measurement of the covariance matrix (i.e., the minimum determinant).
Specic covariance criteria must be established because
there is not a specic minimum joint covariance in the
order of the positive semidenite matrices. Moreover, the
joint covariance is the basis of the formal analysis of the
CI algorithm; the actual result is a nonlinear mixture of the
information stored on the estimations being fused, following
the following equation.
where is the transformation of the fused state-space
estimation to the space of the estimated state .evalues
of canbecalculatedtominimizethecovariancedetermi-
nant using convex optimization packages and semipositive
matrix programming. e result of the CI algorithm has
dierent characteristics compared to the Kalman lter. For
example, if two estimations are provided (,) and (,)
and their covariances are equal =,sincetheKalman
lter is based on the statistical independence assumption, it
produces a fused estimation with covariance  = (1/2).
In contrast, the CI method does not assume independence
and, thus, must be consistent even in the case in which
the estimations are completely correlated, with the estimated
fused covariance =. In the case of estimations where
<, the CI algorithm does not provide information about
the estimation (,);thus,thefusedresultis(,).
Every joint-consistent covariance is sufficient to produce a fused estimation, which guarantees consistency. However, it is also necessary to guarantee a lack of divergence. Divergence is avoided in the CI algorithm by choosing a specific measurement (i.e., the determinant), which is minimized in each fusion operation. This measurement represents a nondivergence criterion, because the size of the estimated covariance according to this criterion would not be incremented.
The application of the CI method guarantees consistency and nondivergence for every sequence of mean and covariance-consistent estimations. However, this method does not work well when the measurements to be fused are inconsistent.
4.6.2. Covariance Union. CI solves the problem of correlated inputs but not the problem of inconsistent inputs (inconsistent inputs refer to different estimations, each of which has a high accuracy (small variance) but also a large difference from the states of the others); thus, the covariance union (CU) algorithm was proposed to solve the latter []. CU addresses the following problem: two estimations $(a_1, A_1)$ and $(a_2, A_2)$ relate to the state of an object and are mutually inconsistent. This issue arises when the difference between the average estimations is larger than the provided covariance. Inconsistent inputs can be detected using the Mahalanobis distance [] between them, which is defined as

$$M_d = (a_1 - a_2)^T (A_1 + A_2)^{-1} (a_1 - a_2),$$

and detecting whether this distance is larger than a given threshold.
The Mahalanobis distance accounts for the covariance information to obtain the distance. If the difference between the estimations is high but their covariance is also high, the Mahalanobis distance yields a small value. In contrast, if the difference between the estimations is small and the covariances are small, it could produce a larger distance value. A high Mahalanobis distance could indicate that the estimations are inconsistent; however, it is necessary to have a specific threshold established by the user or learned automatically.
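The effect of the covariances on the distance can be illustrated with two-dimensional estimates. This is a minimal sketch: the function name `mahalanobis_sq`, the restriction to 2x2 covariances (inverted in closed form), and the example values are assumptions made for illustration. The quantity computed is the quadratic form given in the definition above, without a square root.

```python
def mahalanobis_sq(a1, A1, a2, A2):
    """(a1 - a2)^T (A1 + A2)^{-1} (a1 - a2) for 2D means and 2x2 covariances."""
    d0, d1 = a1[0] - a2[0], a1[1] - a2[1]
    s00, s01 = A1[0][0] + A2[0][0], A1[0][1] + A2[0][1]
    s10, s11 = A1[1][0] + A2[1][0], A1[1][1] + A2[1][1]
    det = s00 * s11 - s01 * s10
    if det == 0.0:
        raise ValueError("summed covariance is singular")
    # Closed-form 2x2 inverse: (1 / det) * [[s11, -s01], [-s10, s00]].
    return (d0 * (s11 * d0 - s01 * d1) + d1 * (-s10 * d0 + s00 * d1)) / det


identity = [[1.0, 0.0], [0.0, 1.0]]
wide = [[50.0, 0.0], [0.0, 50.0]]

# Same separation between the means, very different stated uncertainty.
d_confident = mahalanobis_sq((0.0, 0.0), identity, (3.0, 4.0), identity)
d_uncertain = mahalanobis_sq((0.0, 0.0), wide, (3.0, 4.0), wide)
```

With small covariances the two estimates are far apart in Mahalanobis terms and would likely be flagged as inconsistent under any reasonable user-chosen threshold; with large covariances the same separation is unremarkable.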
The CU algorithm aims to solve the following problem: let us suppose that a filtering algorithm provides two observations with mean and covariance $(a_1, A_1)$ and $(a_2, A_2)$, respectively. It is known that one of the observations is correct and the other is erroneous. However, the identity of the correct estimation is unknown and cannot be determined. In this situation, if both estimations are employed as an input to the Kalman filter, there will be a problem, because the Kalman filter only guarantees a consistent output if the observation is updated with a measurement consistent with both of them. In the specific case, in which the measurements correspond to the same object but are acquired from two different sensors, the Kalman filter can only guarantee that the output is consistent if it is consistent with both separately. Because it is not possible to know which estimation is correct, the only way to combine the two estimations rigorously is to provide an estimation $(u, U)$ that is consistent with both estimations and to obey the following properties:

$$U \geq A_1 + (u - a_1)(u - a_1)^T,$$

$$U \geq A_2 + (u - a_2)(u - a_2)^T,$$

where some measurement of the matrix size $U$ (i.e., the determinant) is minimized.
In other words, the previous equations indicate that if the estimation $(a_1, A_1)$ is consistent, then the translation of the vector $a_1$ to $u$ requires increasing the covariance by a matrix at least as big as the outer product of $(u - a_1)$ with itself in order to be consistent. The same situation applies to the measurement $(a_2, A_2)$ in order to be consistent.
A simple strategy is to choose the mean of the estimation as the input value of one of the measurements ($u = a_1$). In this case, the value of $U$ must be chosen, such that the estimation is consistent with the worst case (the correct measurement is $a_2$). However, it is possible to assign an intermediate value between $a_1$ and $a_2$ to decrease the value of $U$. Therefore, the CU algorithm establishes the mean fused value $u$ that has the least covariance $U$ but is sufficiently large for the two measurements ($a_1$ and $a_2$) for consistency.
Because the matrix inequalities presented in previous equations are convex, convex optimization algorithms can be employed to solve them. The value of $U$ can be computed with the iterative method described by Julier et al. []. The obtained covariance could be significantly larger than any of the initial covariances and is an indicator of the existing uncertainty between the initial estimations. One of the advantages of the CU method arises from the fact that the same process could be easily extended to $N$ inputs.
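For scalar estimates the two CU inequalities become $U \geq A_1 + (u - a_1)^2$ and $U \geq A_2 + (u - a_2)^2$, and the smallest feasible $U$ can be found in closed form. The sketch below works under that scalar assumption and is not the iterative matrix method of Julier et al.; the function name and the example values are illustrative. The two bounds are parabolas in $u$ with equal curvature, so the minimum of their maximum lies at their crossing point, clamped to the interval between the two means.

```python
def covariance_union_1d(a1, A1, a2, A2):
    """Smallest (u, U) with U >= A_i + (u - a_i)^2 for both scalar estimates."""
    if a1 == a2:
        return a1, max(A1, A2)
    # Point where the two lower bounds on U are equal.
    crossing = 0.5 * (a1 + a2) + (A2 - A1) / (2.0 * (a2 - a1))
    lo, hi = min(a1, a2), max(a1, a2)
    u = min(max(crossing, lo), hi)
    U = max(A1 + (u - a1) ** 2, A2 + (u - a2) ** 2)
    return u, U


# Two confident but mutually inconsistent estimates.
u_sym, U_sym = covariance_union_1d(0.0, 1.0, 4.0, 1.0)
# Unequal variances shift the fused mean toward the less certain estimate.
u_asym, U_asym = covariance_union_1d(0.0, 1.0, 4.0, 9.0)
# One estimate already covers the other: its own variance suffices.
u_cov, U_cov = covariance_union_1d(0.0, 100.0, 1.0, 1.0)
```

In the first case the fused variance is five times either input variance, which illustrates how the CU output reports the disagreement between the inputs rather than hiding it.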
5. Decision Fusion Methods
A decision is typically taken based on the knowledge of the perceived situation, which is provided by many sources in the data fusion domain. These techniques aim to make a high-level inference about the events and activities that are produced from the detected targets. These techniques often use symbolic information, and the fusion process requires reasoning while accounting for the uncertainties and constraints. These methods fall under level 2 (situation assessment) and level 3 (impact assessment) of the JDL data fusion model.
5.1. The Bayesian Methods. Information fusion based on the Bayesian inference provides a formalism for combining evidence according to the probability theory rules. Uncertainty is represented using the conditional probability terms that describe beliefs and take on values in the interval [0, 1], where zero indicates a complete lack of belief and one indicates an absolute belief. The Bayesian inference is based on the Bayes rule as follows:

$$P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)},$$

where the posterior probability, $P(Y \mid X)$, represents the belief in the hypothesis $Y$ given the information $X$. This probability is obtained by multiplying the a priori probability of the hypothesis $P(Y)$ by the probability of having $X$ given that $Y$ is true, $P(X \mid Y)$. The value $P(X)$ is used as a normalizing constant. The main disadvantage of the Bayesian inference is that the probabilities $P(X)$ and $P(X \mid Y)$ must be known. To estimate the conditional probabilities, Pan et al. [] proposed the use of neural networks (NNs), whereas Coué et al. [] proposed the Bayesian programming.
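The Bayes rule can be applied sequentially to fuse reports from several sources. The sketch below is illustrative only: the hypothesis names, the prior, and the sensor likelihoods are made-up numbers, and fusing the two sensors by repeated updates assumes that their reports are conditionally independent given the hypothesis.

```python
def bayes_posterior(prior, likelihood):
    """Posterior P(Y | X) for each hypothesis Y from P(Y) and P(X | Y)."""
    joint = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(joint.values())  # P(X), the normalizing constant
    if evidence == 0.0:
        raise ValueError("observation has zero probability under every hypothesis")
    return {h: value / evidence for h, value in joint.items()}


prior = {"target": 0.2, "clutter": 0.8}
sensor_a = {"target": 0.9, "clutter": 0.3}  # P(report from A | hypothesis)
sensor_b = {"target": 0.8, "clutter": 0.1}  # P(report from B | hypothesis)

after_a = bayes_posterior(prior, sensor_a)
after_both = bayes_posterior(after_a, sensor_b)  # posterior reused as the prior
```

The example also shows the practical difficulty noted above: every entry of the prior and of the likelihood tables has to be supplied before any inference can be made.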
Hall and Llinas [] described the following problems
associated with Bayesian inference.
(i) Diculty in establishing the value of a priori proba-
(ii) Complexity when there are multiple potential hypo-
theses and a substantial number of events that depend
on the conditions.
(iii) e hypothesis should be mutually exclusive.
(iv) Diculty in describing the uncertainty of the deci-
5.2. The Dempster-Shafer Inference. The Dempster-Shafer inference is based on the mathematical theory introduced by Dempster [] and Shafer [], which generalizes the Bayesian theory. The Dempster-Shafer theory provides a formalism that could be used to represent incomplete knowledge, update beliefs, and combine evidence, and it allows us to represent the uncertainty explicitly [].
A fundamental concept in the Dempster-Shafer reasoning is the frame of discernment, which is defined as follows. Let $\Theta = \{\theta_1, \theta_2, \ldots, \theta_N\}$ be the set of all possible states that define the system, and let $\Theta$ be exhaustive and mutually exclusive due to the system being only in one state $\theta_i \in \Theta$, where $1 \leq i \leq N$. The set $\Theta$ is called a frame of discernment, because its elements are employed to discern the current state of the system.
e elements of the set 2Θare called hypotheses. In
the Dempster-Shafer theory, based on the evidence ,a
to the basic assignment of probabilities or the mass function
:2Θ→ [0.1], which satises
()=0. ()
 e Scientic World Journal
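Thus, the mass function of the empty set is zero. Furthermore,
the mass function of a hypothesis is larger than or equal to
zero for all of the hypotheses:
$$m(H) \ge 0, \quad \forall H \in 2^\Theta.$$
The sum of the mass function of all the hypotheses is one:
$$\sum_{H \in 2^\Theta} m(H) = 1.$$
To express incomplete beliefs in a hypothesis $H$, the Dempster-Shafer
theory defines the belief function $\mathrm{bel} : 2^\Theta \to [0, 1]$ over $\Theta$ as
$$\mathrm{bel}(H) = \sum_{A \subseteq H} m(A),$$
where $\mathrm{bel}(\emptyset) = 0$ and $\mathrm{bel}(\Theta) = 1$. The doubt level in $H$ can be
expressed in terms of the belief function by
$$\mathrm{dou}(H) = \mathrm{bel}(\neg H) = \sum_{A \subseteq \neg H} m(A).$$
To express the plausibility of each hypothesis, the function
$\mathrm{pl} : 2^\Theta \to [0, 1]$ over $\Theta$ is defined as
$$\mathrm{pl}(H) = 1 - \mathrm{dou}(H) = \sum_{A \cap H \ne \emptyset} m(A).$$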
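The belief, doubt, and plausibility functions defined above can be sketched in a few lines of Python. In this illustration, hypotheses are represented as `frozenset` objects over a hypothetical three-state frame of discernment, and the mass values are invented for the example; none of these names or numbers come from the reviewed studies.

```python
def belief(mass, hypothesis):
    """bel(H): total mass of the non-empty sets contained in H."""
    return sum(m for a, m in mass.items() if a and a <= hypothesis)


def plausibility(mass, hypothesis):
    """pl(H): total mass of the sets that intersect H."""
    return sum(m for a, m in mass.items() if a & hypothesis)


def doubt(mass, hypothesis, frame):
    """dou(H) = bel(not H), with not H the complement of H in the frame."""
    return belief(mass, frame - hypothesis)


# Hypothetical frame of discernment and mass function (masses sum to one).
frame = frozenset({"car", "truck", "bus"})
mass = {
    frozenset({"car"}): 0.5,
    frozenset({"car", "truck"}): 0.3,
    frame: 0.2,  # mass left on the whole frame expresses ignorance
}

H = frozenset({"car"})
interval = (belief(mass, H), plausibility(mass, H))
```

For this assignment, the interval for the hypothesis `{"car"}` runs from the mass committed exactly to it up to the total mass that does not contradict it, and `plausibility` agrees with `1 - doubt` for every hypothesis.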
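Intuitively, plausibility indicates that there is less uncertainty
in hypothesis $H$ if it is more plausible. The confidence
interval $[\mathrm{bel}(H), \mathrm{pl}(H)]$ defines the true belief in hypothesis
$H$. To combine the effects of two mass functions $m_1$ and
$m_2$, the Dempster-Shafer theory defines a rule $m_1 \oplus m_2$ as
$$(m_1 \oplus m_2)(H) = \frac{\sum_{A \cap B = H} m_1(A)\, m_2(B)}{1 - \sum_{A \cap B = \emptyset} m_1(A)\, m_2(B)}$$
for every nonempty hypothesis $H$.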
In contrast to the Bayesian inference, a priori probabilities
are not required in the Dempster-Shafer inference, because
they are assigned at the instant that the information is provided.
Several studies in the literature have compared the use
of the Bayesian inference and the Dempster-Shafer inference,
such as []. Wu et al. [] used the Dempster-Shafer
theory to fuse information in context-aware environments.
This work was extended in [] to dynamically modify the
weights associated with the sensor measurements. Therefore,
the fusion mechanism is calibrated according to the recent
measurements of the sensors (in cases in which the ground
truth is available). In the military domain [], the Dempster-Shafer
reasoning is used with the a priori information stored
in a database for classifying military ships. Morbee et al. []
described the use of the Dempster-Shafer theory to build 2D
occupancy maps from several cameras and to evaluate the
contribution of subsets of cameras to a specific task. Each task
is the observation of an event of interest, and the goal is to
assess the validity of a set of hypotheses that are fused using
the Dempster-Shafer theory.
5.3. Abductive Reasoning. Abductive reasoning, or inferring
the best explanation, is a reasoning method in which a
hypothesis is chosen under the assumption that, if it
is true, it explains the observed event most accurately [].
In other words, when an event is observed, the abduction
method attempts to find the best explanation.
In the context of probabilistic reasoning, abductive inference
finds the posterior ML of the system variables given
some observed variables. Abductive reasoning is more a
reasoning pattern than a data fusion technique. Therefore,
different inference methods, such as NNs [] or fuzzy logic
[], can be employed.
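One simple probabilistic reading of "finding the best explanation" is to select the hypothesis under which the observation is most likely. The sketch below illustrates only that reading; the likelihood table, the hypothesis names, and the function `best_explanation` are assumptions introduced for the example and are not taken from the cited works.

```python
def best_explanation(likelihood, observation):
    """Return the hypothesis h that maximizes P(observation | h)."""
    return max(likelihood, key=lambda h: likelihood[h].get(observation, 0.0))


# Hypothetical likelihood table: P(observation | hypothesis).
likelihood = {
    "rain": {"wet_grass": 0.90, "dry_grass": 0.10},
    "sprinkler": {"wet_grass": 0.70, "dry_grass": 0.30},
    "no_cause": {"wet_grass": 0.05, "dry_grass": 0.95},
}

explanation = best_explanation(likelihood, "wet_grass")
```

Any other inference engine that scores hypotheses, such as a neural network or a fuzzy rule base, could replace the lookup table without changing the overall pattern.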
5.4. Semantic Methods. Decision fusion techniques that
employ semantic data from different sources as an input could
provide more accurate results than those that rely on only
single sources. There is a growing interest in techniques that
automatically determine the presence of semantic features in
videos to solve the semantic gap [].
Semantic information fusion is essentially a scheme in
which raw sensor data are processed such that the nodes
exchange only the resultant semantic information. Semantic
information fusion typically covers two phases: (i) building
the knowledge and (ii) pattern matching (inference).
The first phase (typically offline) incorporates the most
appropriate knowledge into semantic information. Then, the
second phase (typically online or in real time) fuses relevant
attributes and provides a semantic interpretation of the
sensor data [].
Semantic fusion could be viewed as an idea for integrating
and translating sensor data into formal languages. Therefore,
the language obtained from the observations of
the environment is compared with similar languages that
are stored in a database. The underlying assumption is that
similar behaviors represented by formal languages are also
semantically similar. This type of method provides savings
in the cost of transmission, because the nodes need only
transmit the formal language structure instead of the raw
data. However, a known set of behaviors must be stored
in a database in advance, which might be difficult in some
scenarios.
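The two phases of semantic fusion can be sketched as follows. This is only an illustration under strong assumptions: raw speed samples are mapped to a three-symbol alphabet, the stored behaviors and thresholds are invented, and string similarity from the Python standard library (`difflib.SequenceMatcher`) stands in for whatever language-matching method a real system would use.

```python
from difflib import SequenceMatcher

# Phase (i), offline: hypothetical database of known behaviors, each stored
# as a sentence over the alphabet S (stationary), L (slow), F (fast).
KNOWN_BEHAVIORS = {
    "loitering": "SSSSLSSSS",
    "passing_through": "FFFFFFFF",
    "patrolling": "FFLFFLFFL",
}


def symbolize(speeds, slow=0.5, fast=1.5):
    """Translate raw speed samples (m/s) into a sentence of symbols."""
    return "".join("S" if v < slow else "L" if v < fast else "F" for v in speeds)


def classify(sentence):
    """Phase (ii), online: match the sentence against the stored behaviors."""
    scores = {
        name: SequenceMatcher(None, sentence, reference).ratio()
        for name, reference in KNOWN_BEHAVIORS.items()
    }
    return max(scores, key=scores.get), scores


raw = [2.1, 1.9, 2.3, 2.0, 1.8, 2.2, 2.4, 2.0]  # hypothetical sensor readings
sentence = symbolize(raw)   # only this short string needs to be transmitted
label, scores = classify(sentence)
```

A node would transmit the eight-character sentence rather than the raw samples, which is where the saving in transmission cost comes from.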
6. Conclusions
This paper reviews the most common methods and techniques
for performing data/information fusion. To determine
whether the application of data/information fusion methods
is feasible, we must evaluate the computational cost of the
process and the delay introduced in the communication.
A centralized data fusion approach is theoretically optimal
when there is no cost of transmission and there are sufficient
computational resources. However, this situation typically
does not hold in practical applications.
The selection of the most appropriate technique depends
on the type of the problem and the established assumptions
of each technique. Statistical data fusion methods (e.g., PDA,
JPDA, MHT, and Kalman) are optimal under specific conditions
[]. First, the assumption that the targets are moving
e Scientic World Journal 
independently and the measurements are normally distributed
around the predicted position typically does not
hold. Second, because the statistical techniques model all
of the events as probabilities, they typically have several
parameters and a priori probabilities for false measurements
and detection errors that are often difficult to obtain (at
least in an optimal sense). For example, in the case of the
MHT algorithm, specific parameters must be established that
are nontrivial to determine and are very sensitive []. In
contrast, statistical methods that optimize over several frames
are computationally intensive, and their complexity typically
grows exponentially with the number of targets. For example,
in the case of particle filters, tracking several targets can be
accomplished jointly as a group or individually. If several
targets are tracked jointly, the necessary number of particles
grows exponentially. Therefore, in practice, it is better to
track them individually, under the assumption
that the targets do not interact with one another.
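The growth in the number of particles can be made concrete with a rough counting argument. The sketch below assumes that covering the joint state space at the same per-target resolution requires the product of the per-target particle counts, whereas independent filters require only their sum; the function names and the figure of 100 particles per target are hypothetical.

```python
def joint_particles(per_target: int, n_targets: int) -> int:
    """Particles for one filter over the joint state of all targets."""
    return per_target ** n_targets


def independent_particles(per_target: int, n_targets: int) -> int:
    """Particles for one separate filter per target."""
    return per_target * n_targets


# Compare both strategies for one to four targets.
for n in range(1, 5):
    print(n, joint_particles(100, n), independent_particles(100, n))
```

Under these assumptions, four targets already require on the order of 10^8 particles when tracked jointly, compared with 400 when tracked individually.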
In contrast to centralized systems, the distributed data
fusion methods introduce some challenges in the data fusion
process, such as (i) spatial and temporal alignments of the
information, (ii) out-of-sequence measurements, and (iii)
data correlation, as reported by Castanedo et al. [,]. The
inherent redundancy of the distributed systems could be
exploited with distributed reasoning techniques and cooperative
algorithms to improve the individual node estimations,
as reported by Castanedo et al. []. In addition to the previous
studies, a new trend based on the geometric notion of a
low-dimensional manifold is gaining attention in the data fusion
community. An example is the work of Davenport et al. [],
which proposes a simple model that captures the correlation
between the sensor observations by matching the parameter
values for the different obtained manifolds.
Acknowledgments
The author would like to thank Jesús García, Miguel A.
Patricio, and James Llinas for their interesting and related
discussions on several topics that were presented in this
paper.
References
[] JDL, Data Fusion Lexicon. Technical Panel For C3, F. E. White,
San Diego, Calif, USA, Code 20, .
[] D. L. Hall and J. Llinas, “An introduction to multisensor data
fusion,” Proceedings of the IEEE, vol. , no. , pp. –, .
[] H. F. Durrant-Whyte, “Sensor models and multisensor integration,”
International Journal of Robotics Research, vol. , no. , pp.
–, .
[] B. V. Dasarathy, “Sensor fusion potential exploitation-innovative
architectures and illustrative applications,” Proceedings of
the IEEE, vol. , no. , pp. –, .
integration: approaches, applications, and future research directions,”
IEEE Sensors Journal, vol. , no. , pp. –, .
[] J. Llinas, C. Bowman, G. Rogova, A. Steinberg, E. Waltz, and
F. White, “Revisiting the JDL data fusion model II,” Technical
Report, DTIC Document, .
[] E. P. Blasch and S. Plano, “JDL level 5 fusion model ‘user refinement’
issues and applications in group tracking,” in Proceedings
of the Signal Processing, Sensor Fusion, and Target Recognition
XI, pp. –, April .
[] H. F. Durrant-Whyte and M. Stevens, “Data fusion in decentralized
sensing networks,” in Proceedings of the 4th International
Conference on Information Fusion, pp. –, Montreal,
Canada, .
[] J. Manyika and H. Durrant-Whyte, Data Fusion and Sensor
Management: A Decentralized Information-Theoretic Approach,
Prentice Hall, Upper Saddle River, NJ, USA, .
[] S. S. Blackman, “Association and fusion of multiple sensor data,”
in Multitarget-Multisensor: Tracking Advanced Applications, pp.
–, Artech House, .
[] S. Lloyd, “Least squares quantization in PCM,” IEEE Transactions
on Information Theory, vol. , no. , pp. –, .
[] M. Shindler, A. Wong, and A. Meyerson, “Fast and accurate
k-means for large datasets,” in Proceedings of the 25th Annual
Conference on Neural Information Processing Systems (NIPS ’11),
pp. –, December .
[] Y. Bar-Shalom and E. Tse, “Tracking in a cluttered environment
with probabilistic data association,” Automatica, vol. , no. ,
[] T. E. Fortmann, Y. Bar-Shalom, and M. Scheffe, “Multi-target
tracking using joint probabilistic data association,” in Proceedings
of the 19th IEEE Conference on Decision and Control
including the Symposium on Adaptive Processes, vol. , pp.
–, December .
[] D. B. Reid, “An algorithm for tracking multiple targets,” IEEE
Transactions on Automatic Control, vol. , no. , pp. –,
[] C. L. Morefield, “Application of 0-1 integer programming to
multitarget tracking problems,” IEEE Transactions on Automatic
Control, vol. , no. , pp. –, .
[] R. L. Streit and T. E. Luginbuhl, “Maximum likelihood method
for probabilistic multihypothesis tracking,” in Proceedings of the
Signal and Data Processing of Small Targets, vol. of Proceedings
of SPIE, p. , .
[] I. J. Cox and S. L. Hingorani, “Efficient implementation of Reid’s
multiple hypothesis tracking algorithm and its evaluation for
the purpose of visual tracking,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. , no. , pp. –,
[] K. G. Murty, “An algorithm for ranking all the assignments in
order of increasing cost,” Operations Research, vol. , no. , pp.
–, .
[] M. E. Liggins II, C.-Y. Chong, I. Kadar et al., “Distributed fusion
architectures and algorithms for target tracking,” Proceedings of
the IEEE, vol. , no. , pp. –, .
source track and identity fusion,” in Proceedings of the National
Symposium on Sensor and Data Fusion, .
[] P. Storms and F. Spieksma, “An LP-based algorithm for the data
association problem in multitarget tracking,” in Proceedings of
the 3rd IEEE International Conference on Information Fusion,
vol. , .
[] S.-W. Joo and R. Chellappa, “A multiple-hypothesis approach
for multiobject visual tracking,” IEEE Transactions on Image
[] S. Coraluppi and C. Carthel, “Aggregate surveillance: a cardinality
tracking approach,” in Proceedings of the 14th International
Conference on Information Fusion (FUSION ’11), July .
 e Scientic World Journal
[] K. C. Chang, C. Y. Chong, and Y. Bar-Shalom, “Joint probabilistic
data association in distributed sensor networks,” IEEE
Transactions on Automatic Control, vol. , no. , pp. –,
[] C. Y. Chong, S. Mori, and K. C. Chang, “Information fusion in
distributed sensor networks,” in Proceedings of the 4th American
Control Conference, Boston, Mass, USA, June .
[] C. Y. Chong, S. Mori, and K. C. Chang, “Distributed multitarget
multisensor tracking,” in Multitarget-Multisensor Tracking:
Advanced Applications, vol. , pp. –, .
[] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks
of Plausible Inference, Morgan Kaufmann, San Mateo, Calif,
USA, .
[] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles
and Techniques, MIT Press, .
[] L. Chen, M. Çetin, and A. S. Willsky, “Distributed data association
for multi-target tracking in sensor networks,” in Proceedings
of the 7th International Conference on Information Fusion
(FUSION ’05), pp. –, July .
[] L. Chen, M. J. Wainwright, M. Çetin, and A. S. Willsky, “Data
association based on optimization in graphical models with
application to sensor networks,” Mathematical and Computer
Modelling, vol. , no. -, pp. –, .
[] Y. Weiss and W. T. Freeman, “On the optimality of solutions
of the max-product belief-propagation algorithm in arbitrary
graphs,” IEEE Transactions on Information Theory, vol. , no. ,
[] C. Brown, H. Durrant-Whyte, J. Leonard, B. Rao, and B. Steer,
“Distributed data fusion using Kalman filtering: a robotics
application,” in Data Fusion in Robotics and Machine Intelligence,
[] R. E. Kalman, “A new approach to linear filtering and prediction
problems,” Journal of Basic Engineering, vol. , no. , pp. –,
[] R. C. Luo and M. G. Kay, “Data fusion and sensor integration:
state-of-the-art 1990s,” in Data Fusion in Robotics and Machine
[] G. Welch and G. Bishop, An Introduction to the Kalman Filter,
ACM SIGGRAPH, Course Notes, .
[] S. J. Julier and J. K. Uhlmann, “A new extension of the Kalman
filter to nonlinear systems,” in Proceedings of the International
Symposium on Aerospace/Defense Sensing, Simulation and Con-
[] E. A. Wan and R. Van Der Merwe, “The unscented Kalman filter
for nonlinear estimation,” in Proceedings of the Adaptive Systems
for Signal Processing, Communications, and Control Symposium
(AS-SPCC ’00), pp. –, .
[] D. Crisan and A. Doucet, “A survey of convergence results on
particle filtering methods for practitioners,” IEEE Transactions
on Signal Processing, vol. , no. , pp. –, .
[] J. Martinez-del Rincon, C. Orrite-Urunuela, and J. E. Herrero-Jaraba,
“An efficient particle filter for color-based tracking in
complex scenes,” in Proceedings of the IEEE Conference on
Advanced Video and Signal Based Surveillance, pp. –, .
[] S. Ganeriwal, R. Kumar, and M. B. Srivastava, “Timing-sync
protocol for sensor networks,” in Proceedings of the 1st International
Conference on Embedded Networked Sensor Systems
(SenSys ’03), pp. –, November .
[] M. Manzo, T. Roosta, and S. Sastry, “Time synchronization in
networks,” in Proceedings of the 3rd ACM Workshop on Security
of Ad Hoc and Sensor Networks (SASN ’05), pp. –,
November .
[] J. K. Uhlmann, “Covariance consistency methods for fault-tolerant
distributed data fusion,” Information Fusion, vol. , no.
[] S. Bashi, V. P. Jilkov, X. R. Li, and H. Chen, “Distributed implementations
of particle filters,” in Proceedings of the 6th International
Conference of Information Fusion, pp. –, .
[] M. Coates, “Distributed particle filters for sensor networks,” in
Proceedings of the 3rd International Symposium on Information
Processing in Sensor Networks (ACM ’04), pp. –, New York,
NY, USA, .
[] D. Gu, “Distributed particle filter for target tracking,” in Proceedings
of the IEEE International Conference on Robotics and
Automation (ICRA ’07), pp. –, April .
[] Y. Bar-Shalom, “Update with out-of-sequence measurements in
tracking: exact solution,” IEEE Transactions on Aerospace and
Electronic Systems, vol. , no. , pp. –, .
[] M. Orton and A. Marrs, “A Bayesian approach to multi-target
tracking and data fusion with out-of-sequence measurements,”
IEE Colloquium, no. , pp. //, .
[] M. L. Hernandez, A. D. Marrs, S. Maskell, and M. R. Orton,
“Tracking and fusion for wireless sensor networks,” in Proceedings
of the 5th International Conference on Information Fusion,
[] P. C. Mahalanobis, “On the generalized distance in statistics,”
Proceedings National Institute of Science, India, vol. , no. , pp.
–, .
for dealing with assignment ambiguity,” in Proceedings of the
American Control Conference (ACC ’04), vol. , pp. –,
July .
[] H. Pan, Z.-P. Liang, T. J. Anastasio, and T. S. Huang, “Hybrid
NN-Bayesian architecture for information fusion,” in Proceedings
of the International Conference on Image Processing (ICIP
’98), pp. –, October .
[] C. Coué, T. Fraichard, P. Bessière, and E. Mazer, “Multi-sensor
data fusion using Bayesian programming: an automotive application,”
in Proceedings of the IEEE/RSJ International Conference
on Intelligent Robots and Systems, pp. –, October .
[] D. L. Hall and J. Llinas, Handbook of Multisensor Data Fusion,
[] A. P. Dempster, “A generalization of Bayesian inference,” Journal
of the Royal Statistical Society B, vol. , no. , pp. –, .
[] G. Shafer, A Mathematical Theory of Evidence, Princeton University
Press, Princeton, NJ, USA, .
[] G. M. Provan, “The validity of Dempster-Shafer belief functions,”
International Journal of Approximate Reasoning, vol. ,
[] D. M. Buede, “Shafer-Dempster and Bayesian reasoning: a
response to ‘Shafer-Dempster reasoning with applications to
multisensor target identification systems’,” IEEE Transactions on
Systems, Man and Cybernetics, vol. , no. , pp. –, .
[] Y. Cheng and R. L. Kashyap, “Comparison of Bayesian and
Dempster’s rules in evidence combination,” in Maximum-Entropy
and Bayesian Methods in Science and Engineering, .
[] B. R. Cobb and P. P. Shenoy, “A comparison of Bayesian and
belief function reasoning,” Information Systems Frontiers, vol. ,
no. , pp. –, .
[] H. Wu, M. Siegel, R. Stiefelhagen, and J. Yang, “Sensor fusion
using Dempster-Shafer theory,” in Proceedings of the 19th
IEEE Instrumentation and Measurement Technology Conference
(IMTC ’02), pp. –, May .
e Scientic World Journal 
[] H. Wu, M. Siegel, and S. Ablay, “Sensor fusion using Dempster-Shafer
theory II: static weighting and Kalman filter-like dynamic
weighting,” in Proceedings of the 20th IEEE Instrumentation and
Measurement Technology Conference (IMTC ’03), pp. –,
May .
[] É. Bossé, P. Valin, A.-C. Boury-Brisset, and D. Grenier, “Exploitation
of a priori knowledge for information fusion,” Information
Fusion, vol. , no. , pp. –, .
[] M. Morbee, L. Tessens, H. Aghajan, and W. Philips, “Dempster-Shafer
based multi-view occupancy maps,” Electronics Letters,
vol. , no. , pp. –, .
[] C. S. Peirce, Abduction and Induction. Philosophical Writings of
[] A. M. Abdelbar, E. A. M. Andrews, and D. C. Wunsch II,
“Abductive reasoning with recurrent neural networks,” Neural
[] J. R. Agüero and A. Vargas, “Inference of operative configuration
of distribution networks using fuzzy logic techniques.
Part II: extended real-time model,” IEEE Transactions on Power
[] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R.
Jain, “Content-based image retrieval at the end of the early
years,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. , no. , pp. –, .
[] D. S. Friedlander and S. Phoha, “Semantic information fusion
for coordinated signal processing in mobile sensor networks,”
International Journal of High Performance Computing Applications,
[] D. S. Friedlander, “Semantic information extraction,” in Distributed
Sensor Networks, .
[] K. Whitehouse, J. Liu, and F. Zhao, “Semantic Streams: a framework
for composable inference over sensor data,” in Proceedings
of the 3rd European Workshop on Wireless Sensor Networks,
Lecture Notes in Computer Science, Springer, February .
[] I. J. Cox, “A review of statistical data association techniques for
motion correspondence,” International Journal of Computer
[] C. J. Veenman, M. J. T. Reinders, and E. Backer, “Resolving
motion correspondence for densely moving points,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol.
[] F. Castanedo, M. A. Patricio, J. García, and J. M. Molina,
“Bottom-up/top-down coordination in a multiagent visual
sensor network,” in Proceedings of the IEEE Conference on
Advanced Video and Signal Based Surveillance (AVSS ’07), pp.
–, September .
[] F. Castanedo, J. García et al.,
“Analysis of distributed fusion alternatives in coordinated vision
agents,” in Proceedings of the 11th International Conference on
Information Fusion (FUSION ’08), July .
[] F. Castanedo, J. García et al.,
“… fusion to improve trajectory tracking in a cooperative surveillance
multi-agent architecture,” Information Fusion, vol. , no.
, pp. –, .
[] M. A. Davenport, C. Hegde, M. F. Duarte, and R. G. Baraniuk,
“Joint manifolds for data fusion,” IEEE Transactions on Image