
A Review of Data Fusion Techniques


Abstract

The integration of data and knowledge from several sources is known as data fusion. This paper summarizes the state of the data fusion field and describes the most relevant studies. We first enumerate and explain different classification schemes for data fusion. Then, the most common algorithms are reviewed. These methods and algorithms are presented using three different categories: (i) data association, (ii) state estimation, and (iii) decision fusion.
Hindawi Publishing Corporation
e Scientic World Journal
Volume , Article ID ,  pages.//
Review Article
Federico Castanedo
Deusto Institute of Technology, DeustoTech, University of Deusto, Avenida de las Universidades 24, 48007 Bilbao, Spain
Correspondence should be addressed to Federico Castanedo;
Received August ; Accepted  September 
Academic Editors: Y. Takama and D. Ursino
Copyright © Federico Castanedo. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
In general, all tasks that demand any type of parameter estimation from multiple sources can benefit from the use of data/information fusion methods. The terms information fusion and data fusion are typically employed as synonyms, but in some scenarios the term data fusion is used for raw data (obtained directly from the sensors) and the term information fusion is employed to define already processed data. In this sense, the term information fusion implies a higher semantic level than data fusion. Other terms associated with data fusion that typically appear in the literature include decision fusion, data combination, data aggregation, multisensor data fusion, and sensor fusion.
Researchers in this eld agree that the most accepted
denition of data fusion was provided by the Joint Directors
of Laboratories (JDL) workshop []: “A m u l t i - l e v e l p r o c e s s
dealing with the association, correlation, combination of data
and information from single and multiple sources to achieve
rened position, identify estimates and complete and timely
assessments of situations, threats and their signicance.
Hall and Llinas [] provided the following well-known definition of data fusion: “data fusion techniques combine data from multiple sensors and related information from associated databases to achieve improved accuracy and more specific inferences than could be achieved by the use of a single sensor.”
Briey, we can dene data fusion as a combination of
multiple sources to obtain improved information; in this
context, improved information means less expensive, higher
quality, or more relevant information.
Data fusion techniques have been extensively employed in multisensor environments with the aim of fusing and aggregating data from different sensors; however, these techniques can also be applied to other domains, such as text processing. The goal of using data fusion in multisensor environments is to obtain a lower detection error probability and a higher reliability by using data from multiple distributed sources.
The available data fusion techniques can be classified into three nonexclusive categories: (i) data association, (ii) state estimation, and (iii) decision fusion. Because of the large number of published papers on data fusion, this paper does not aim to provide an exhaustive review of all of the studies; instead, the objective is to highlight the main steps that are involved in the data fusion framework and to review the most common techniques for each step.
The remainder of this paper continues as follows. The next section provides various classification categories for data fusion techniques. Then, Section 3 describes the most common methods for data association tasks. The following section provides a review of techniques under the state estimation category. Next, the most common techniques for decision fusion are enumerated. Finally, the conclusions obtained
e Scientic World Journal
from reviewing the dierent methods are highlighted in
Section .
2. Classification of Data Fusion Techniques
Data fusion is a multidisciplinary area that involves several fields, and it is difficult to establish a clear and strict classification. The employed methods and techniques can be divided according to the following criteria:
(1) attending to the relations between the input data sources, as proposed by Durrant-Whyte []. These relations can be defined as (a) complementary, (b) redundant, or (c) cooperative data;
(2) according to the input/output data types and their nature, as proposed by Dasarathy [];
(3) following an abstraction level of the employed data: (a) raw measurement, (b) signals, and (c) characteristics or decisions;
(4) based on the different data fusion levels defined by the JDL;
(5) depending on the architecture type: (a) centralized, (b) decentralized, or (c) distributed.
2.1. Classication Based on the Relations between the Data
Sources. Based on the relations of the sources (see Figure ),
Durrant-Whyte [] proposed the following classication
() complementary: when the information provided by
the input sources represents dierent parts of the
global information. For example, in the case of visual
sensor networks, the information on the same target
provided by two cameras with dierent elds of view
is considered complementary;
() redundant: when two or more input sources provide
information about the same target and could thus be
fused to increment the condence. For example, the
data coming from overlapped areas in visual sensor
networks are considered redundant;
() cooperative: when the provided information is com-
bined into new information that is typically more
complex than the original information. For example,
multi-modal (audio and video) data fusion is consid-
ered cooperative.
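To make the redundant case concrete, the sketch below fuses redundant scalar readings of the same quantity by inverse-variance weighting. This is an illustration under assumptions not stated in the text (unbiased sensors, independent Gaussian errors, known variances), and the function name `fuse_redundant` is hypothetical. It shows why redundancy "increases confidence": the fused variance is smaller than that of any single input.

```python
def fuse_redundant(measurements):
    """Fuse redundant scalar measurements of the same quantity.

    Each item is a (value, variance) pair. Assumes unbiased sensors with
    independent Gaussian errors. Returns the inverse-variance weighted
    mean and its variance.
    """
    if not measurements:
        raise ValueError("need at least one measurement")
    # Information (inverse variance) of independent sensors adds up.
    total_info = sum(1.0 / var for _, var in measurements)
    fused_value = sum(value / var for value, var in measurements) / total_info
    fused_variance = 1.0 / total_info
    return fused_value, fused_variance


# Two overlapping cameras report the same target coordinate (hypothetical values).
value, variance = fuse_redundant([(10.0, 4.0), (12.0, 4.0)])
print(value, variance)  # 11.0 2.0
```

With two equally reliable sensors, the fused variance is half of either input variance.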
2.2. Dasarathy’s Classification. One of the most well-known data fusion classification systems was provided by Dasarathy [] and is composed of the following five categories (see Figure ):
(1) data in-data out (DAI-DAO): this type is the most basic data fusion method considered in this classification. This type of data fusion process inputs and outputs raw data; the results are typically more reliable or accurate. Data fusion at this level is conducted immediately after the data are gathered from the sensors;
(2) data in-feature out (DAI-FEO): at this level, the data fusion process employs raw data from the sources to extract features or characteristics that describe an entity in the environment;
(3) feature in-feature out (FEI-FEO): at this level, both the input and the output of the data fusion process are features. Thus, the data fusion process addresses a set of features to improve, refine, or obtain new features. This process is also known as feature fusion, symbolic fusion, information fusion, or intermediate-level fusion;
(4) feature in-decision out (FEI-DEO): this level obtains a set of features as input and provides a set of decisions as output. Most of the classification systems that perform a decision based on a sensor’s inputs fall into this category of classification;
(5) decision in-decision out (DEI-DEO): this type of classification is also known as decision fusion. It fuses input decisions to obtain better or new decisions.
The main contribution of Dasarathy’s classification is the specification of the abstraction level either as an input or an output, providing a framework to classify different methods or techniques.
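Because Dasarathy's scheme is keyed only on the abstraction level of a fusion step's input and output, it can be expressed as a small lookup table. The following is a minimal illustrative sketch; the names `LEVELS`, `DASARATHY`, and `dasarathy_category` are hypothetical. Pairs outside the five categories listed above (for example, decision in, data out) return `None`.

```python
# Abstraction levels used by Dasarathy's scheme, from lowest to highest.
LEVELS = ("data", "feature", "decision")

# The five (input, output) pairs and their acronyms, as listed in the text.
DASARATHY = {
    ("data", "data"): "DAI-DAO",
    ("data", "feature"): "DAI-FEO",
    ("feature", "feature"): "FEI-FEO",
    ("feature", "decision"): "FEI-DEO",
    ("decision", "decision"): "DEI-DEO",
}


def dasarathy_category(input_level, output_level):
    """Return the Dasarathy category of a fusion step, or None when the
    input/output pair is not one of the five categories."""
    if input_level not in LEVELS or output_level not in LEVELS:
        raise ValueError("levels must be one of %s" % (LEVELS,))
    return DASARATHY.get((input_level, output_level))


print(dasarathy_category("feature", "decision"))  # FEI-DEO
print(dasarathy_category("decision", "data"))     # None
```

The classification does not depend on which fusion algorithm is used inside the step, which matches the independence from the fusion method noted in the paragraph above.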
2.3. Classication Based on the Abstraction Levels. Luo et al.
[] provided the following four abstraction levels:
() signal level: directly addresses the signals that are
acquired from the sensors;
() pixel level: operates at the image level and could be
used to improve image processing tasks;
() characteristic: employs features that are extracted
from the images or signals (i.e., shape or velocity),
() symbol: at this level, information is represented as
symbols; this level is also known as the decision level.
Information fusion typically addresses three levels of
abstraction: () measurements, () characteristics, and ()
decisions. Other possible classications of data fusion based
on the abstraction levels are as follows:
(1) low level fusion: the raw data are directly provided as an input to the data fusion process, which provides more accurate data (a higher signal-to-noise ratio) than the individual sources;
(2) medium level fusion: characteristics or features (shape, texture, and position) are fused to obtain features that could be employed for other tasks. This level is also known as the feature or characteristic level;
(3) high level fusion: this level, which is also known as decision fusion, takes symbolic representations as sources and combines them to obtain a more accurate decision. Bayesian methods are typically employed at this level;
(4) multiple level fusion: this level addresses data provided from different levels of abstraction (e.g., when a measurement is combined with a feature to obtain a decision).
Figure : Whyte’s classification based on the relations between the data sources.
Figure : Dasarathy’s classification (data in-data out, data in-feature out, feature in-feature out, feature in-decision out, and decision in-decision out).
2.4. JDL Data Fusion Classification. This classification is the most popular conceptual model in the data fusion community. It was originally proposed by the JDL and the US Department of Defense (DoD) []. These organizations classified the data fusion process into five processing levels, an associated database, and an information bus that connects the five components (see Figure ). The five levels could be grouped into two groups, low-level fusion and high-level fusion. In addition to the processing levels, the model comprises the following components:
(i) sources: the sources are in charge of providing the input data. Different types of sources can be employed, such as sensors, a priori information (references or geographic data), databases, and human inputs;
(ii) human-computer interaction (HCI): HCI is an interface that allows inputs to the system from the operators and produces outputs to the operators. HCI includes queries, commands, and information on the obtained results and alarms;
(iii) database management system: the database management system stores the provided information and the fused results. This system is a critical component because of the large amount of highly diverse information that is stored.
In contrast, the ve levels of data processing are dened as
() level —source preprocessing: source preprocessing
is the lowest level of the data fusion process, and
it includes fusion at the signal and pixel levels. In
the case of text sources, this level also includes the
information extraction process. is level reduces the
amount of data and maintains useful information for
the high-level processes;
(2) level 1 (object refinement): object refinement employs the processed data from the previous level. Common procedures of this level include spatio-temporal alignment, association, correlation, clustering or grouping techniques, state estimation, the removal of false positives, identity fusion, and the combining of features that were extracted from images. The output consists of the object identity (classification and identification) and the object track (state of the object and orientation). This stage transforms the input information into consistent data structures;
Figure : JDL data fusion framework (fusion domain with levels 0 to 4, database, and information bus).
(3) level 2 (situation assessment): this level focuses on a higher level of inference than level 1. Situation assessment aims to identify the likely situations given the observed events and obtained data. It establishes relationships between the objects. Relations (e.g., proximity, communication) are evaluated to determine the significance of the entities or objects in a specific environment. The aim of this level includes performing high-level inferences and identifying significant activities and events (patterns in general). The output is a set of high-level inferences;
(4) level 3 (impact assessment): this level evaluates the impact of the activities detected at level 2 to obtain a proper perspective. The current situation is evaluated, and a future projection is performed to identify possible risks, vulnerabilities, and operational opportunities. This level includes (1) an evaluation of the risk or threat and (2) a prediction of the logical outcome;
(5) level 4 (process refinement): this level improves the process from level 0 to level 3 and provides resource and sensor management. The aim is to achieve efficient resource management while accounting for task priorities, scheduling, and the control of available resources.
High-level fusion typically starts at level 2 because the type, localization, movement, and quantity of the objects are known at that level. One of the limitations of the JDL model is that it does not make explicit how the uncertainty about previous or subsequent results could be employed to enhance the fusion process (feedback loop). Llinas et al. [] propose several refinements and extensions to the JDL model. Blasch and Plano [] proposed to add a new level (user refinement) to support a human user in the data fusion loop. The JDL model provides a common terminology for the data fusion domain. However, because its roots originate in the military domain, the employed terms are oriented to the risks that commonly occur in these scenarios. The Dasarathy model differs from the JDL model with regard to the adopted terminology and employed approach. The former is oriented toward the differences among the input and output results, independent of the employed fusion method. In summary, the Dasarathy model provides a method for understanding the relations between the fusion tasks and employed data, whereas the JDL model presents an appropriate fusion perspective to design data fusion systems.
2.5. Classication Based on the Type of Architecture. One of
the main questions that arise when designing a data fusion
system is where the data fusion process will be performed.
Based on this criterion, the following types of architectures
could be identied:
(1) centralized architecture: in a centralized architecture, the fusion node resides in the central processor that receives the information from all of the input sources. Therefore, all of the fusion processes are executed in a central processor that uses the provided raw measurements from the sources. In this schema, the sources only obtain the observations as measurements and transmit them to the central processor, where the data fusion process is performed. If we assume that data alignment and data association are performed correctly and that the required time to transfer the data is not significant, then the centralized scheme is theoretically optimal. However, the previous assumptions typically do not hold for real systems. Moreover, the large amount of bandwidth that is required to send raw data through the network is another disadvantage for the centralized approach. This issue becomes a bottleneck when this type of architecture is employed with a large number of sources. In addition, the time delays when transferring the information between the different sources are variable and affect the results in the centralized scheme to a greater degree than in other schemes;
(2) decentralized architecture: a decentralized architecture is composed of a network of nodes in which each node has its own processing capabilities and there is no single point of data fusion. Therefore, each node fuses its local information with the information that is received from its peers. Data fusion is performed autonomously, with each node accounting for its local information and the information received from its peers. Decentralized data fusion algorithms typically communicate information using the Fisher and Shannon information measures instead of the object’s state []. The main disadvantage of this architecture is the communication cost, which is $O(n^2)$ at each communication step, where $n$ is the number of nodes, in the extreme case in which each node communicates with all of its peers. Thus, this type of architecture could suffer from scalability problems when the number of nodes is increased;
(3) distributed architecture: in a distributed architecture, measurements from each source node are processed independently before the information is sent to the fusion node; the fusion node accounts for the information that is received from the other nodes. In other words, the data association and state estimation are performed in the source node before the information is communicated to the fusion node. Therefore, each node provides an estimation of the object state based only on its local view, and this information is the input to the fusion process, which provides a fused global view. This type of architecture provides different options and variations that range from only one fusion node to several intermediate fusion nodes;
(4) hierarchical architecture: other architectures comprise a combination of decentralized and distributed nodes, generating hierarchical schemes in which the data fusion process is performed at different levels in the hierarchy.
In principle, a decentralized data fusion system is more difficult to implement because of its computation and communication requirements. However, in practice, there is no single best architecture, and the selection of the most appropriate architecture should be made depending on the requirements, demand, existing networks, data availability, node processing capabilities, and organization of the data fusion system.
The reader might think that the decentralized and distributed architectures are similar; however, they have meaningful differences (see Figure ). First, in a distributed architecture, a preprocessing of the obtained measurements is performed, which provides a vector of features as a result (the features are fused thereafter). In contrast, in the decentralized architecture, the complete data fusion process is conducted in each node, and each of the nodes provides a globally fused result. Second, the decentralized fusion algorithms typically communicate information, employing the Fisher and Shannon information measures. In contrast, distributed algorithms typically share a common notion of state (position, velocity, and identity) with their associated probabilities, which are used to perform the fusion process []. Third, because the decentralized data fusion algorithms exchange information instead of states and probabilities, they have the advantage of easily separating old knowledge from new knowledge. Thus, the process is additive, and the order in which information is received and fused does not matter. However, in the distributed data fusion algorithms (e.g., the distributed Kalman filter), the fusion of states is not associative, so when and how the fused estimates are computed matters. Nevertheless, in contrast to the centralized architectures, the distributed algorithms reduce the necessary communication and computational costs because some tasks are computed in the distributed nodes before data fusion is performed in the fusion node.
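The additive property of exchanging information can be illustrated with the information form of the Kalman filter, a standard technique used here as an assumed example. In the linear-Gaussian case the information matrix equals the Fisher information. The sketch assumes a scalar state, linear measurements with independent Gaussian noise, and that each node's contribution is counted exactly once. A real decentralized system must also avoid double counting information that reaches a node through several peers, which this sketch ignores. All names are hypothetical, and `Fraction` is used only so that order independence can be checked exactly.

```python
from fractions import Fraction


def local_information(z, h, r):
    """Information-form contribution of one scalar measurement z = h*x + v,
    where v has variance r. Returns (information vector i, information matrix I)."""
    return h * z / r, h * h / r


def fuse_information(prior, contributions):
    """Fuse node contributions into a prior (y, Y) given in information form.

    Fusion is a plain sum, so the result does not depend on the order in
    which contributions arrive."""
    y, Y = prior
    for i, I in contributions:
        y += i
        Y += I
    return y, Y


# Prior: mean 0, variance 10, so Y = 1/10 and y = Y * mean = 0.
prior = (Fraction(0), Fraction(1, 10))
nodes = [
    local_information(Fraction(5), 1, Fraction(2)),
    local_information(Fraction(6), 1, Fraction(2)),
    local_information(Fraction(4), 1, Fraction(1)),
]

y, Y = fuse_information(prior, nodes)
y_rev, Y_rev = fuse_information(prior, list(reversed(nodes)))
print(y / Y, 1 / Y)              # 95/21 10/21  (fused mean and variance)
print((y, Y) == (y_rev, Y_rev))  # True
```

Fusing states and covariances directly (for example, averaging two Kalman estimates) does not have this simple additive structure, which is why order and timing matter in the distributed case.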
3. Data Association Techniques
The data association problem must determine the set of measurements that correspond to each target (see Figure ). Let us suppose that there are several targets that are being tracked by only one sensor in a cluttered environment (by a cluttered environment, we refer to an environment that has several targets that are too close to each other). Then, the data association problem can be defined as follows:
(i) each sensor’s observation is received in the fusion node at discrete time intervals;
(ii) the sensor might not provide observations at a specific time interval;
(iii) some observations are noise, and other observations originate from the detected targets;
(iv) for any specific target and in every time interval, we do not know (a priori) the observations that will be generated by that target.
Therefore, the goal of data association is to establish the set of observations or measurements that are generated by the same target over time. Hall and Llinas [] provided the following definition of data association: “The process of assign and compute the weights that relates the observations or tracks (A track can be defined as an ordered set of points that follow a path and are generated by the same target.) from one set to the observation of tracks of another set.”
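As an example of the complexity of the data association problem, if we take a frame-to-frame association and assume that $M$ possible points could be detected in all $n$ frames, then the number of possible association sets is $(M!)^{n-1}$, and only one of these sets describes the true movement of the points.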
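The count can be checked directly. If the same m points are detected in every one of n frames and matching between consecutive frames is one-to-one, each of the n − 1 frame pairs admits m! matchings, which gives (m!)^(n−1) candidate association sets. The sketch below assumes no clutter and no missed detections. It compares the closed form with brute-force enumeration for a tiny case, and the function names are hypothetical.

```python
import math
from itertools import permutations, product


def association_count(m_points, n_frames):
    """Frame-to-frame association sets when the same m_points are detected in
    all n_frames and matching between consecutive frames is one-to-one."""
    return math.factorial(m_points) ** (n_frames - 1)


def brute_force_count(m_points, n_frames):
    """Enumerate one permutation (matching) per consecutive frame pair."""
    per_pair = list(permutations(range(m_points)))
    return sum(1 for _ in product(per_pair, repeat=n_frames - 1))


print(association_count(3, 4))   # 216
print(brute_force_count(3, 4))   # 216
print(association_count(10, 5))  # (10!)^4, about 1.7e26
```

Even ten points over five frames give a search space far too large for exhaustive search, which motivates the approximate techniques below.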
Data association is oen performed before the state
estimation of the detected targets. Moreover, it is a key
step because the estimation or classication will behave
incorrectly if the data association phase does not work
coherently. e data association process could also appear in
all of the fusion levels, but the granularity varies depending
on the objective of each level.
e Scientic World Journal
Figure : Classification based on the type of architecture (panels: centralized, decentralized, and distributed architectures; stages: alignment, association, and estimation; fusion node).
In general, an exhaustive search of all possible combinations grows exponentially with the number of targets; in its general (multidimensional) form, the data association problem is NP-hard. The most common techniques that are employed to solve the data association problem are presented in the following subsections.
3.1. Nearest Neighbors and K-Means. Nearest neighbor (NN) is the simplest data association technique. NN is a well-known clustering algorithm that selects or groups the most similar values. Whether one measurement is considered close to another depends on the employed distance metric and, typically, on a threshold established by the designer. In general, the employed criteria could be based on (1) an absolute distance, (2) the Euclidean distance, or (3) a statistical function of the distance.
NN is a simple algorithm that can find a feasible (approximate) solution in a small amount of time. However, in a cluttered environment, it could provide many pairs that have the same probability and could thus produce undesirable error propagation []. Moreover, this algorithm performs poorly in environments in which false measurements are frequent, such as highly noisy environments.
Figure : Conceptual overview of the data association process from multiple sensors and multiple targets (targets, sensors, observations, tracks 1 to n, and false alarms). It is necessary to establish the set of observations over time from the same object that forms a track.
The all-neighbors approach uses a similar technique, in which all of the measurements inside a region are included in the tracks.
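A minimal sketch of gated nearest-neighbor association using a statistical distance criterion (the squared Mahalanobis distance) follows. It assumes 2-D measurements, a single known innovation covariance shared by all tracks, and greedy track-by-track assignment in which each measurement is used at most once. These are simplifying choices, and other NN variants exist. The names are hypothetical. The gate value 9.21 is the 99% chi-square threshold for two degrees of freedom.

```python
def mahalanobis_sq(z, z_pred, S):
    """Squared Mahalanobis distance between 2-D vectors for a 2x2 covariance S."""
    dx, dy = z[0] - z_pred[0], z[1] - z_pred[1]
    (a, b), (c, d) = S
    det = a * d - b * c
    # S^-1 = [[d, -b], [-c, a]] / det, so v^T S^-1 v expands as below.
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def nearest_neighbor_associate(predictions, measurements, S, gate):
    """Assign to each track the closest unused measurement inside its gate.

    predictions:  dict track_id -> predicted 2-D measurement
    measurements: list of 2-D measurements
    Returns dict track_id -> index of the chosen measurement, or None.
    """
    used = set()
    result = {}
    for track_id, z_pred in predictions.items():
        best, best_d2 = None, gate
        for j, z in enumerate(measurements):
            if j in used:
                continue
            d2 = mahalanobis_sq(z, z_pred, S)
            if d2 <= best_d2:  # inside the gate and closer than the current best
                best, best_d2 = j, d2
        if best is not None:
            used.add(best)
        result[track_id] = best
    return result


S = ((1.0, 0.0), (0.0, 1.0))
tracks = {"A": (0.0, 0.0), "B": (10.0, 10.0), "C": (50.0, 50.0)}
detections = [(0.5, 0.2), (10.4, 9.7), (30.0, 30.0)]
print(nearest_neighbor_associate(tracks, detections, S, gate=9.21))
# {'A': 0, 'B': 1, 'C': None}
```

Detection 2 falls outside every gate and is treated as clutter. Because the assignment is greedy, the result can depend on the order in which tracks are processed when gates overlap, which is one source of the error propagation mentioned above.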
-Means [] method is a well-known modication of
the NN algorithm. -Means divides the dataset values into
dierent clusters. -Means algorithm nds the best local-
ization of the cluster centroids, where best means a centroid
that is in the center of the data cluster. -Means is an iterative
algorithm that can be divided into the following steps:
() obtain the input data and the number of desired
clusters ();
() randomly assign the centroid of each cluster;
() match each data point with the centroid of each
() move the cluster centers to the centroid of the cluster;
() if the algorithm does not converge, return to step ().
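The steps above can be sketched as Lloyd's iteration. This version assumes Euclidean distance, initialises the centroids by sampling K distinct input points, and declares convergence when the assignments stop changing. `kmeans_restarts` implements the multiple-restart remedy discussed in the text by keeping the run with the lowest within-cluster sum of squared errors (SSE). All names are hypothetical.

```python
import random


def _sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=None):
    """Lloyd's K-Means on a list of tuples. Returns (centroids, labels, sse)."""
    rng = random.Random(seed)
    # Step 2: initialise centroids with k distinct input points chosen at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 3: assign each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # Step 5: converged, assignments unchanged.
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    sse = sum(_sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


def kmeans_restarts(points, k, restarts=10, seed=0):
    """Run K-Means several times and keep the run with the lowest SSE."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, seed=rng.randrange(2 ** 32)) for _ in range(restarts)]
    return min(runs, key=lambda run: run[2])


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels, sse = kmeans_restarts(data, k=2)
print(labels, round(sse, 3))
```

Each iteration costs one distance computation per point and centroid.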
K-Means is a popular algorithm that has been widely employed; however, it has the following disadvantages:
(i) the algorithm does not always find the optimal solution for the cluster centers;
(ii) the number of clusters must be known a priori, and one must assume that this number is the optimum;
(iii) the algorithm assumes that the covariance of the dataset is irrelevant or that it has already been normalized.
There are several options for overcoming these limitations. For the first one, it is possible to execute the algorithm several times and keep the solution that has the lowest within-cluster variance. For the second one, it is possible to start with a low value of K and increment the value of K until an adequate result is obtained. The third limitation can be overcome by whitening the data, that is, by multiplying the data by the inverse square root of the covariance matrix.
Many variations have been proposed to Lloyd’s basic K-Means algorithm [], which has a computational cost of $O(Kn)$ per iteration, where $n$ is the number of input points and $K$ is the number of desired clusters. Some algorithms modify the initial cluster assignments to improve the separations and reduce the number of iterations. Others introduce soft or multinomial clustering assignments using fuzzy logic, probabilistic, or Bayesian techniques. However, most of the previous variations still require several passes through the data space to converge to a reasonable solution. This issue becomes a major disadvantage in several real-time applications. A new approach that is based on having a large (but still affordable) number of cluster candidates compared to the desired K clusters is currently gaining attention. The idea behind this computational model is that the algorithm builds a good sketch of the original data while reducing the dimensionality of the input space significantly. In this manner, a weighted K-Means can be applied to the large candidate clusters to derive a good clustering of the original data. Using this idea, [] presented an efficient and scalable K-Means algorithm that is based on random projections. This algorithm requires only one pass through the input data to build the clusters. More specifically, if the input data distribution holds some separability requirements, then the number of required candidate clusters grows only according to $O(\log n)$, where $n$ is the number of observations in the original data. This salient feature makes the algorithm scalable in terms of both the memory and computational requirements.
3.2. Probabilistic Data Association. The probabilistic data association (PDA) algorithm was proposed by Bar-Shalom and Tse [] and is also known as the modified filter of all neighbors. This algorithm assigns an association probability to each hypothesis from a valid measurement of a target. A valid measurement refers to the observation that falls in the validation gate of the target at that time instant. The validation gate, which is centered around the predicted measurement of the target, is used to select the set of basic measurements and is defined as
$$\gamma \geq \left(z(k) - \hat{z}(k \mid k-1)\right)^{T} S^{-1}(k) \left(z(k) - \hat{z}(k \mid k-1)\right),$$
where $k$ is the temporal index, $S(k)$ is the innovation covariance, and $\gamma$ determines the gating or window size. The set of valid measurements at time instant $k$ is defined as
$$Z(k) = \{ z_i(k),\ i = 1, \dots, m_k \},$$
e Scientic World Journal
where ()is the -measurement in the validation region at
time instant .WegivethestandardequationsofthePDA
algorithm next. For the state prediction, consider
where (−1)is the transition matrix at time instant −1.
To calculate the measurement prediction, consider
where () is the linearization measurement matrix. To
To calculate the covariance prediction, consider
where ()is the process noise covariance matrix. To com-
pute the innovation covariance ()andtheKalmangain()
To obtain the covariance update in the case in which the mea-
surements originated by the target are known, consider
e total update of the covariance is computed as
=1 ()V()V()−V()V()(),
where is the number of valid measurements in the instant
by the position and velocity, is given by
Finally, the association probabilities of PDA are as follows:
=0 (),()
if =0
exp −1
2V()−1 ()V()if =0
0in other cases,
where is the dimension of the measurement vector, is the
density of the clutter environment, is the detection prob-
ability of the correct measurement, and is the validation
probability of a detected value.
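A compact way to see how the PDA quantities in this section fit together is a one-dimensional update step. The sketch assumes a scalar state observed through z = h·x + noise (so M = 1 and all matrices are scalars), known noise variances, a clutter density greater than zero, and a gate γ consistent with P_g. For one degree of freedom, γ ≈ 6.63 corresponds to P_g = 0.99. The prediction step is assumed to have been done already, and `pda_update` is a hypothetical name.

```python
import math


def pda_update(x_pred, P_pred, h, r, measurements, clutter_density, p_d, p_g, gamma):
    """One PDA update for a scalar state observed through z = h*x + noise.

    Returns (x_updated, P_updated, betas); betas[0] is the probability that
    none of the validated measurements originated from the target.
    """
    z_pred = h * x_pred                      # measurement prediction
    S = h * P_pred * h + r                   # innovation covariance
    K = P_pred * h / S                       # Kalman gain

    # Validation gate: keep innovations with v^2 / S <= gamma.
    innovations = [z - z_pred for z in measurements]
    innovations = [v for v in innovations if v * v / S <= gamma]

    # Unnormalised association weights; with M = 1, (2*pi)**(M/2) = sqrt(2*pi).
    p0 = math.sqrt(2 * math.pi) * clutter_density * math.sqrt(S) * (1 - p_d * p_g) / p_d
    weights = [p0] + [math.exp(-0.5 * v * v / S) for v in innovations]
    total = sum(weights)
    betas = [w / total for w in weights]

    # Combined innovation and state update.
    v_comb = sum(b * v for b, v in zip(betas[1:], innovations))
    x_upd = x_pred + K * v_comb

    # Covariance: mix of "no valid measurement" and "correct measurement known",
    # plus a spread term reflecting uncertainty about which measurement is correct.
    P_correct = P_pred - K * S * K
    spread = K * (sum(b * v * v for b, v in zip(betas[1:], innovations)) - v_comb ** 2) * K
    P_upd = betas[0] * P_pred + (1 - betas[0]) * P_correct + spread
    return x_upd, P_upd, betas


x, P, betas = pda_update(0.0, 1.0, 1.0, 1.0, [0.5, 3.0],
                         clutter_density=0.1, p_d=0.9, p_g=0.99, gamma=6.63)
print(round(x, 3), round(P, 3), [round(b, 3) for b in betas])
```

When no measurement falls inside the gate, β₀ = 1 and the prediction is returned unchanged. This matches the weighted-sum interpretation described in the next paragraph.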
In the PDA algorithm, the state estimation of the target is computed as a weighted sum of the estimated state under all of the hypotheses. The algorithm can associate different measurements to one specific target. Thus, the association of the different measurements to a specific target helps PDA to estimate the target state, and the association probabilities are used as weights. The main disadvantages of the PDA algorithm are the following:
(i) loss of tracks: because PDA ignores the interference with other targets, it sometimes could wrongly classify the closest tracks. Therefore, it provides a poor performance when the targets are close to each other or cross;
(ii) the suboptimal Bayesian approximation: when the source of information is uncertain, PDA is the suboptimal Bayesian approximation to the association problem;
(iii) one target: PDA was initially designed for the association of one target in a low-cluttered environment. The number of false alarms is typically modeled with the Poisson distribution, and they are assumed to be distributed uniformly in space. PDA behaves incorrectly when there are multiple targets because the false alarm model does not work well;
(iv) track management: because PDA assumes that the track is already established, algorithms must be provided for track initialization and track deletion.
PDA is mainly good for tracking targets that do not make abrupt changes in their movement patterns. PDA will most likely lose a target that makes abrupt changes in its movement patterns.
3.3. Joint Probabilistic Data Association. Joint probabilistic data association (JPDA) is a suboptimal approach for tracking multiple targets in cluttered environments []. JPDA is similar to PDA, with the difference that the association probabilities are computed using all of the observations and all of the targets. Thus, in contrast to PDA, JPDA considers various hypotheses together and combines them. JPDA determines the probability $\beta_i^t(k)$ that measurement $i$ originated from target $t$, accounting for the fact that under this hypothesis, the measurement cannot be generated by other targets. Therefore, JPDA evaluates the different options of the measurement-target association (for the most recent set of measurements) and combines them into the corresponding state estimation. If the association probability is known, then the Kalman filter updating equation of the track can be written as
$$\hat{x}^t(k \mid k) = \hat{x}^t(k \mid k-1) + K(k)\, v^t(k),$$
where $\hat{x}^t(k \mid k)$ and $\hat{x}^t(k \mid k-1)$ are the estimation and prediction of target $t$, and $K(k)$ is the filter gain. The weighted sum of the residuals associated with the $m(k)$ observations of target $t$ is as follows:
$$v^t(k) = \sum_{i=1}^{m(k)} \beta_i^t(k)\, v_i^t(k),$$
where $v_i^t(k) = z_i(k) - H\, \hat{x}^t(k \mid k-1)$. Therefore, this method incorporates all of the observations (inside the neighborhood of the target’s predicted position) to update the estimated position by using a posterior probability that is a weighted sum of residuals.
The main restrictions of JPDA are the following:
(i) a measurement cannot come from more than one target;
(ii) two measurements cannot be originated by the same target (at one time instant);
(iii) the sum of all of the measurements’ probabilities that are assigned to one target must be 1.
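For small problems, the JPDA association probabilities can be computed by explicitly enumerating every feasible joint association event and then marginalising. The sketch below enforces restrictions (i) and (ii) listed above, and the normalisation makes each target's probabilities, including the missed-detection entry, sum to one. It assumes that per-pair Gaussian likelihoods are already computed (0.0 outside the gate), a known clutter density, and a gate probability of 1. The function name is hypothetical. The number of joint events grows exponentially with the number of targets, which is the cost issue noted below.

```python
from itertools import product


def jpda_marginals(likelihood, clutter_density, p_d):
    """Brute-force JPDA association probabilities.

    likelihood[t][j]: Gaussian likelihood of measurement j under target t's
    prediction (0.0 for pairs outside the validation gate).
    Returns beta[t]: entry 0 is "no measurement from target t", and entry
    j + 1 is "measurement j originated from target t".
    """
    n_targets = len(likelihood)
    n_meas = len(likelihood[0]) if n_targets else 0
    beta = [[0.0] * (n_meas + 1) for _ in range(n_targets)]
    total = 0.0
    # Each target takes one measurement index or None (missed detection),
    # so one target never receives two measurements.
    for event in product([None] + list(range(n_meas)), repeat=n_targets):
        used = [j for j in event if j is not None]
        if len(used) != len(set(used)):
            continue  # a measurement cannot come from more than one target
        weight = 1.0
        for t, j in enumerate(event):
            if j is None:
                weight *= 1.0 - p_d
            else:
                weight *= p_d * likelihood[t][j] / clutter_density
        if weight == 0.0:
            continue
        total += weight
        for t, j in enumerate(event):
            beta[t][0 if j is None else j + 1] += weight
    return [[b / total for b in row] for row in beta]


L = [[0.4, 0.05],
     [0.3, 0.3]]
for t, row in enumerate(jpda_marginals(L, clutter_density=0.1, p_d=0.9)):
    print(t, [round(b, 3) for b in row])
```

With a single target, the enumeration reduces to PDA-style weights proportional to (1 − P_d) for the missed detection and P_d·g_j/λ for each measurement.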
The main disadvantages of JPDA are the following:
(i) it requires an explicit mechanism for track initialization. Similar to PDA, JPDA cannot initialize new tracks or remove tracks that are out of the observation area;
(ii) JPDA is a computationally expensive algorithm when it is applied in environments that have multiple targets because the number of hypotheses increases exponentially with the number of targets.
In general, JPDA is more appropriate than MHT in situations in which the density of false measurements is high (e.g., sonar applications).
3.4. Multiple Hypothesis Test. The underlying idea of the multiple hypothesis test (MHT) is based on using more than two consecutive observations to make an association with better results. Other algorithms that use only two consecutive observations have a higher probability of generating an error. In contrast to PDA and JPDA, MHT estimates all of the possible hypotheses and maintains new hypotheses in each iteration.
MHT was developed to track multiple targets in cluttered environments; as a result, it combines the data association problem and tracking into a unified framework, becoming an estimation technique as well. The Bayes rule or Bayesian networks are commonly employed to calculate the MHT hypothesis. In general, researchers have claimed that MHT outperforms JPDA for lower densities of false positives. However, the main disadvantage of MHT is the computational cost when the number of tracks or false positives increases. Pruning the hypothesis tree using a window could mitigate this limitation.
e Reid [] tracking algorithm is considered the stan-
dard MHT algorithm, but the initial integer programming
formulation of the problem is due to Moreeld []. MHT is
an iterative algorithm in which each iteration starts with a set
of correspondence hypotheses. Each hypothesis is a collec-
tion of disjoint tracks, and the prediction of the target in the
next time instant is computed for each hypothesis. Next, the
predictions are compared with the new observations by using
a distance metric. e set of associations established in each
hypothesis (based on a distance) introduces new hypotheses
in the next iteration. Each new hypothesis represents a new
Note that each new measurement could come from (i) a
new target in the visual eld of view, (ii) a target being tracked,
or (iii) noise in the measurement process. It is also possible
that a measurement is not assigned to a target because the
target disappears, or because it is not possible to obtain a
target measurement at that time instant.
MHT maintains several correspondence hypotheses for each target in each frame. If the set of hypotheses at instant $k$ is represented by $H(k) = [h_l(k),\ l = 1, \ldots, n]$, then the probability of the hypothesis $h_l(k)$ can be represented recursively using the Bayes rule as follows:

$$P(h_l(k) \mid Z(k)) = P(h_g(k-1), a_i(k) \mid Z(k)) = \frac{1}{c}\, P(Z(k) \mid h_g(k-1), a_i(k))\, P(a_i(k) \mid h_g(k-1))\, P(h_g(k-1)),$$

where $h_g(k-1)$ is the hypothesis $g$ of the complete set up to time instant $k-1$; $a_i(k)$ is the $i$th possible association of the track to the object; $Z(k)$ is the set of detections of the current frame; and $c$ is a normalizing constant.

The first term on the right side of the previous equation is the likelihood function of the measurement set $Z(k)$ given the previous hypothesis and the current association. The second term is the probability of the association hypothesis of the current data given the previous hypothesis $h_g(k-1)$. The third term is the probability of the previous hypothesis from which the current hypothesis is calculated.
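The recursion can be sketched as a scoring step that multiplies the three factors for each child hypothesis and normalizes over all children, so that the normalization plays the role of the normalizing constant. This is a minimal, hypothetical illustration, not Reid's full algorithm: the factor values are assumed to be supplied by the caller, and hypothesis generation, gating, and pruning are omitted.

```python
import math


def child_hypothesis_probs(children):
    """Posterior probabilities of MHT child hypotheses.

    Each child is a tuple (parent_prob, assoc_prob, meas_likelihood): the
    probability of the parent hypothesis at k-1, the probability of the
    current association given that parent, and the likelihood of the current
    detections given both. Normalizing over all children replaces 1/c.
    """
    # Sum logs instead of multiplying to avoid underflow in long hypothesis chains.
    logs = [math.log(p) + math.log(a) + math.log(lik) for p, a, lik in children]
    top = max(logs)
    scores = [math.exp(v - top) for v in logs]
    total = sum(scores)
    return [s / total for s in scores]


# Hypothetical factors for one parent hypothesis and three ways to explain a
# single new detection: existing track, new track, or clutter.
posteriors = child_hypothesis_probs([
    (1.0, 0.7, 0.30),
    (1.0, 0.2, 0.05),
    (1.0, 0.1, 0.02),
])
print(posteriors)
```

Working in log space is a common design choice here, because products of many small probabilities quickly underflow as the hypothesis tree deepens.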
The MHT algorithm has the ability to detect a new track while maintaining the hypothesis tree structure. The probability of a true track is given by the Bayes decision model as

$$P(\lambda \mid Z) = \frac{P(Z \mid \lambda)\, P^{\circ}(\lambda)}{P(Z)},$$

where $\lambda$ denotes the source signal (the hypothesized true track), $P(Z \mid \lambda)$ is the probability of obtaining the set of measurements $Z$ given $\lambda$, $P^{\circ}(\lambda)$ is the a priori probability of the source signal, and $P(Z)$ is the probability of obtaining the set of detections $Z$.

MHT considers all of the possibilities, including track maintenance and the initialization and removal of tracks, in an integrated framework. MHT calculates the probability of having an object after the generation of a set of measurements using an exhaustive approach, and the algorithm does not assume a fixed number of targets.
The baseline MHT algorithm can be extended as follows: (i) use hypothesis aggregation for missed targets, target births, cardinality tracking, and closely spaced objects; (ii) apply a multistage MHT to improve performance and robustness in challenging settings; and (iii) use a feature-aided MHT for extended object surveillance.
The main disadvantage of this algorithm is its computational cost, which grows exponentially with the number of tracks and measurements. Therefore, the practical implementation of this algorithm is limited because it is exponential in both time and memory.
With the aim of reducing the computational cost, []
presented a probabilistic MHT algorithm in which the
associations are considered to be random variables that
are statistically independent and in which performing an
exhaustive search enumeration is avoided. is algorithm is
known as PMHT. e PMHT algorithm assumes that the
number of targets and measurements is known. With the
same goal of reducing the computational cost, []presented
an ecient implementation of the MHT algorithm. is
implementation was the rst version to be applied to perform
tracking in visual environments. ey employed the Murty
[] algorithm to determine the best set of hypotheses
in polynomial time, with the goal of tracking the points of
MHT typically performs the tracking process by employ-
ing only one characteristic, commonly the position. e
Bayesian combination to use multiple characteristics was
proposed by Liggins II et al. [].
A linear-programming-based relaxation approach to the optimization problem in MHT tracking was proposed independently by Coraluppi et al. [] and Storms and Spieksma []. Joo and Chellappa [] proposed an association algorithm for tracking multiple targets in visual environments. Their algorithm is based on an MHT modification in which a measurement can be associated with more than one target, and several targets can be associated with one measurement. They also proposed a combinatorial optimization algorithm to generate the best set of association hypotheses. Their algorithm is exact, in contrast to other models, which are approximate. Coraluppi and Carthel [] presented a generalization of the MHT algorithm using a recursion over hypothesis classes rather than over a single hypothesis. This work has been applied in a special case of the multitarget tracking problem in which they observed the number of sensor measurements instead of the target states.
3.5. Distributed Joint Probabilistic Data Association. e dis-
tributed version of the joint probabilistic data association
(JPDA-D) was presented by Chang et al. []. In this tech-
nique, the estimated state of the target (using two sensors)
aer being associated is given by
|1,2, ()
where ,=1,2, is the last set of measurements of
sensor and , ,=1,2,isthesetofaccumulativedata,
and is the association hypothesis. e rst term of the right
side of the equation is calculated from the associations that
were made earlier. e second term is computed from the
individual association probabilities as follows:
$$P(\chi \mid Z^1, Z^2) = \frac{1}{c}\, P(\chi^1 \mid Z^1)\, P(\chi^2 \mid Z^2)\, \gamma(\chi^1, \chi^2),$$

where $\chi^i$, $i = 1, 2$, are the joint hypotheses of sensor $i$ (consistent with $\chi$) involving all of the measurements and all of the targets, which are expressed through binary indicators of the measurement-target associations, and $c$ is a normalizing constant. The additional term $\gamma(\chi^1, \chi^2)$ depends on the correlation of the individual hypotheses and reflects the localization influence of the current measurements in the joint hypotheses.

These equations are obtained assuming that the sensors communicate after every observation; only approximations are available when communication is sporadic or when a substantial amount of noise occurs. Therefore, this algorithm is a theoretical model that has some limitations in practical applications.
3.6. Distributed Multiple Hypothesis Test. e distributed
version of the MHT algorithm (MHT-D) [,] follows a
similar structure as the JPDA-D algorithm. Let us assume the
case in which one node must fuse two sets of hypotheses and
tracks. If the hypotheses and track sets are represented by
()and ()with =1,2,thehypothesisprobabilities
are represented by
; and the state distribution of the tracks
)and ( | ,
maximum available information in the fusion node is =
obtain the set of hypotheses (),thesetoftracks(),the
hypothesis probabilities ( | ), and the state distribution
(|,)for the observed data.
The MHT-D algorithm is composed of the following steps:
(1) hypothesis formation: for each pair of hypotheses $\lambda^1_j$ and $\lambda^2_k$ that could be fused, a track $\tau$ is formed by associating the pair of tracks $\tau^1_j$ and $\tau^2_k$, where each track of the pair comes from a different node and both could originate from the same target. The result of this stage is a set of hypotheses denoted by $H(Z)$ and the fused tracks $T(Z)$;
(2) hypothesis evaluation: in this stage, the association probability of each hypothesis and the estimated state of each fused track are obtained. The distributed estimation algorithm is employed to calculate the likelihood of the possible associations and the estimations obtained for each specific association.
e Scientic World Journal 
Using the information model, the probability of each
fused hypothesis is given by
∈() |()()
where is a normalizing constant, and ( | ) is the
likelihood of each hypothesis pair.
The main disadvantage of MHT-D is its high computational cost, which is of the order of $O(n^M)$, where $n$ is the number of possible associations and $M$ is the number of variables to be estimated.
3.7. Graphical Models. Graphical models are a formalism for representing and reasoning with probabilities and independence. A graphical model represents a conditional decomposition of the joint probability. A graphical model can be represented as a graph in which the nodes denote random variables, the edges denote the possible dependence between the random variables, and the plates denote the replication of a substructure, with the appropriate indexing of the relevant variables. The graph captures the joint distribution over the random variables, which can be decomposed into a product of factors that each depend on only a subset of variables. There are two major classes of graphical models: (i) Bayesian networks [], which are also known as directed graphical models, and (ii) Markov random fields, which are also known as undirected graphical models. Directed graphical models are useful for expressing causal relationships between random variables, whereas undirected models are better suited for expressing soft constraints between random variables. We refer the reader to the book of Koller and Friedman [] for more information on graphical models.
A framework based on graphical models can solve the problem of distributed data association in synchronized sensor networks with overlapping areas in which each sensor receives noisy measurements; this solution was proposed by Chen et al. [,]. Their work is based on graphical models that are used to represent the statistical dependence between random variables. The data association problem is treated as an inference problem and solved by using the max-product algorithm []. Graphical models represent statistical dependencies between variables as graphs, and the max-product algorithm is guaranteed to converge to the exact solution when the graph is a tree. Moreover, the employed algorithm can be implemented in a distributed manner by exchanging messages between the source nodes in parallel. With this algorithm, if each sensor has $n$ possible combinations of associations and there are $M$ variables to be estimated, the complexity is $O(n^2 M)$, which is reasonable and less than the $O(n^M)$ complexity of the MHT-D algorithm. However, special attention must be given to correlated variables when building the graphical model.
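To make the max-product idea concrete, the following sketch computes the exact maximizing assignment of a chain-structured model (the simplest tree), for which max-product message passing is exact. It is not Chen et al.'s distributed data-association framework; the function names are hypothetical, and the potentials in the usage lines are made-up values in which three neighboring sensors each choose between two association hypotheses and the pairwise potential rewards agreement. The brute-force search is included only to check the result on small models.

```python
from itertools import product


def max_product_chain(unary, pairwise):
    """Exact MAP assignment of a chain x_1 - x_2 - ... - x_T by max-product.

    unary[t][s]    : nonnegative potential of variable t in state s
    pairwise[s][r] : potential for neighboring states (x_t = s, x_{t+1} = r)
    Returns (assignment, score), where score is the product of all potentials.
    """
    n = len(unary[0])
    msg = list(unary[0])          # best partial score ending in each state
    backptr = []                  # argmax of each message, for backtracking
    for t in range(1, len(unary)):
        new_msg, arg = [], []
        for r in range(n):
            cands = [msg[s] * pairwise[s][r] for s in range(n)]
            s_best = max(range(n), key=cands.__getitem__)
            new_msg.append(cands[s_best] * unary[t][r])
            arg.append(s_best)
        msg = new_msg
        backptr.append(arg)
    last = max(range(n), key=msg.__getitem__)
    path = [last]
    for arg in reversed(backptr):
        path.append(arg[path[-1]])
    path.reverse()
    return path, msg[last]


def brute_force_map(unary, pairwise):
    """Exhaustive search over all n**T assignments, only to check small cases."""
    best, best_score = None, -1.0
    for states in product(range(len(unary[0])), repeat=len(unary)):
        score = 1.0
        for t, s in enumerate(states):
            score *= unary[t][s]
            if t > 0:
                score *= pairwise[states[t - 1]][s]
        if score > best_score:
            best, best_score = list(states), score
    return best, best_score


# Hypothetical example: three neighboring sensors, each choosing between two
# association hypotheses (0 or 1); the pairwise potential rewards agreement.
unary = [[0.6, 0.4], [0.3, 0.7], [0.55, 0.45]]
agree = [[0.9, 0.1], [0.1, 0.9]]
print(max_product_chain(unary, agree))
print(brute_force_map(unary, agree))
```

Here sensors 1 and 3 individually prefer hypothesis 0, but the agreement potential together with sensor 2's strong preference makes [1, 1, 1] the joint optimum. Each message step costs $n^2$ operations for $n$ states, so a chain of $M$ variables costs $O(n^2 M)$, in line with the complexity stated for this approach, whereas the brute-force check enumerates all $n^M$ assignments.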
4. State Estimation Methods
State estimation techniques aim to determine the state of the target under movement (typically the position) given the observations or measurements. State estimation techniques are also known as tracking techniques. In their general form, it is not guaranteed that the target observations are relevant, which means that some of the observations could actually come from the target and others could be only noise. The state estimation phase is a common stage in data fusion algorithms because the target's observations could come from different sensors or sources, and the final goal is to obtain a global target state from the observations.
e estimation problem involves nding the values of the
vector state (e.g., position, velocity, and size) that ts as much
as possible with the observed data. From a mathematical
perspective, we have a set of redundant observations, and
the goal is to nd the set of parameters that provides the
best t to the observed data. In general, these observations
are corrupted by errors and the propagation of noise in the
measurement process. State estimation methods fall under
level of the JDL classication and could be divided into two
broader groups:
() linear dynamics and measurements: here, the esti-
mation problem has a standard solution. Specically,
when the equations of the object state and the mea-
surements are linear, the noise follows the Gaussian
distribution, and we do not refer to it as a clutter
environment; in this case, the optimal theoretical
solution is based on the Kalman lter;
() nonlinear dynamics: the state estimation problem
becomes dicult, and there is not an analytical solu-
tion to solve the problem in a general manner. In prin-
ciple, there are no practical algorithms available to
solve this problem satisfactorily.
Most of the state estimation methods are based on control theory and employ the laws of probability to compute a state vector from a measurement vector or a stream of measurement vectors. Next, the most common estimation methods are presented, including maximum likelihood and maximum posterior (Section 4.1), the Kalman filter (Section 4.2), the particle filter (Section 4.3), the distributed Kalman filter (Section 4.4), the distributed particle filter (Section 4.5), and covariance consistency methods (Section 4.6).
4.1. Maximum Likelihood and Maximum Posterior. e max-
imum likelihood (ML) technique is an estimation method
that is based on probabilistic theory. Probabilistic estimation
methods are appropriate when the state variable follows an
unknown probability distribution []. In the context of
data fusion, is the state that is being estimated, and =
((1),...,()) is a sequence of previous observations of
. e likelihood function () is dened as a probability
density function of the sequence of observations given the
true value of the state . Consider
e ML estimator nds the value of that maximizes the
likelihood function:
()=arg max
 e Scientic World Journal
which can be obtained from the analytical or empirical
models of the sensors. is function expresses the probability
of the observed data. e main disadvantage of this method
in practice is that it requires the analytical or empirical model
of the sensor to be known to provide the prior distribution
and compute the likelihood function. is method can also
systematically underestimate the variance of the distribution,
which leads to a bias problem. However, the bias of the ML
solution becomes less signicant as the number of data
points increases and is equal to the true variance of the
distribution that generated the data at the limit →∞.
The maximum posterior (MAP) method is based on Bayesian theory. It is employed when the parameter $x$ to be estimated is the output of a random variable that has a known probability density function $p(x)$. In the context of data fusion, $x$ is the state that is being estimated and $z = (z(1), \ldots, z(k))$ is a sequence of $k$ previous observations of $x$. The MAP estimator finds the value of $x$ that maximizes the posterior probability distribution as follows:

$$\hat{x}(k) = \arg\max_{x}\, p(x \mid z).$$

Both methods (ML and MAP) aim to find the most likely value for the state $x$. However, ML assumes that $x$ is a fixed but unknown point of the parameter space, whereas MAP considers $x$ to be the output of a random variable with a known a priori probability density function. Because $p(x \mid z) \propto p(z \mid x)\, p(x)$, the two methods are equivalent when the prior $p(x)$ is uniform, that is, when there is no a priori information about $x$ and only the observations are available.
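For a Gaussian likelihood with a Gaussian prior, both estimators have closed forms, which makes the difference easy to see. In the following sketch (the Gaussian assumptions and all names are illustrative, not from the text), the ML estimate of a constant state is the sample mean, while the MAP estimate is a precision-weighted average of the prior mean and the sample mean; as the prior variance grows, the prior becomes nearly uninformative and the MAP estimate approaches the ML estimate.

```python
def ml_mean(obs):
    """ML estimate of a constant state observed with Gaussian noise."""
    return sum(obs) / len(obs)


def map_mean(obs, noise_var, prior_mean, prior_var):
    """MAP estimate with Gaussian noise and a Gaussian prior N(prior_mean, prior_var).

    The posterior is Gaussian, so its mode is the precision-weighted
    average of the prior mean and the sample mean.
    """
    data_prec = len(obs) / noise_var
    prior_prec = 1.0 / prior_var
    return (prior_prec * prior_mean + data_prec * ml_mean(obs)) / (prior_prec + data_prec)


obs = [5.2, 4.8, 5.5]
print(ml_mean(obs))
print(map_mean(obs, noise_var=1.0, prior_mean=3.0, prior_var=0.5))
print(map_mean(obs, noise_var=1.0, prior_mean=3.0, prior_var=1e12))  # nearly flat prior
```

With these observations, unit noise variance, and a prior $N(3, 0.5)$, the ML estimate is about 5.17 and the MAP estimate is 4.3; with the nearly flat prior, the MAP estimate coincides with the ML estimate to within rounding.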
4.2. e Kalman Filter. e Kalman lter is the most popular
estimation technique. It was originally proposed by Kalman
[] and has been widely studied and applied since then. e
Kalman lter estimates the state of a discrete time process
governed by the following space-time model:
with the observations or measurements at time of the state
represented by ()=()()+V(),()
where Φ()is the state transition matrix, ()is the input
matrix transition, () is the input vector, () is the
measurement matrix, and and Vare the random Gaussian
variables with zero mean and covariance matrices of ()
and (), respectively. Based on the measurements and on
the system parameters, the estimation of (),whichis
represented by
(),andthepredictionof(+ 1),which
is represented by
(+1|), are given by the following:
respectively, where is the lter gain determined by
where ( | 1)is the prediction covariance matrix and
can be determined by
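The recursion can be sketched in pure Python for a small constant-velocity model. This is a minimal sketch that assumes no control input ($G(k)u(k) = 0$), a scalar position measurement so that the innovation covariance is a scalar, and hand-written matrix helpers (in practice a linear-algebra library would be used). `kf_step` is a hypothetical name, and the model matrices in the usage lines are illustrative choices.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def madd(a, b, sign=1.0):
    """Element-wise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def kf_step(x_pred, P_pred, z, Phi, H, Q, r):
    """Kalman update with a scalar measurement z, followed by a prediction.

    x_pred, P_pred: x(k|k-1) as a column vector and P(k|k-1).
    Returns x(k|k), P(k|k), x(k+1|k), P(k+1|k). No control input is used.
    """
    Ht = transpose(H)
    S = matmul(matmul(H, P_pred), Ht)[0][0] + r            # innovation variance
    K = [[row[0] / S] for row in matmul(P_pred, Ht)]        # K = P H^T S^-1
    innovation = z - matmul(H, x_pred)[0][0]
    x_filt = [[xi[0] + ki[0] * innovation] for xi, ki in zip(x_pred, K)]
    P_filt = madd(P_pred, matmul(matmul(K, H), P_pred), sign=-1.0)  # P - K H P
    x_next = matmul(Phi, x_filt)
    P_next = madd(matmul(matmul(Phi, P_filt), transpose(Phi)), Q)
    return x_filt, P_filt, x_next, P_next


# Illustrative constant-velocity model: state [position, velocity], unit time step.
Phi = [[1.0, 1.0], [0.0, 1.0]]
H = [[1.0, 0.0]]
Q = [[1e-4, 0.0], [0.0, 1e-4]]
x, P = [[0.0], [0.0]], [[10.0, 0.0], [0.0, 10.0]]   # vague initial guess
for z in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]:
    x_filt, P_filt, x, P = kf_step(x, P, z, Phi, H, Q, r=0.1)
print("position", x_filt[0][0], "velocity", x_filt[1][0])
```

After the six noise-free position measurements, the filtered velocity is close to 1 even though velocity is never measured directly; the filter infers it through the transition matrix.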
e Kalman lter is mainly employed to fuse low-level
data. If the system could be described as a linear model and
recursive Kalman lter obtains optimal statistical estimations
[]. However, other methods are required to address nonlin-
ear dynamic models and nonlinear measurements. e modi-
ed Kalman lter known as the extended Kalman lter (EKF)
is an optimal approach for implementing nonlinear recursive
lters []. e EKF is one of the most oen employed
methods for fusing data in robotic applications. However,
it has some disadvantages because the computations of the
Jacobians are extremely expensive. Some attempts have been
made to reduce the computational cost, such as linearization,
but these attempts introduce errors in the lter and make it
e unscented Kalman lter (UKF) []hasgained
popularity, because it does not have the linearization step and
the associated errors of the EKF []. e UKF employs a
deterministic sampling strategy to establish the minimum set
of points around the mean. is set of points captures the
true mean and covariance completely. en, these points are
propagated through nonlinear functions, and the covariance
of the estimations can be recuperated. Another advantage of
the UKF is its ability to be employed in parallel implementa-
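The following sketch contrasts the unscented transform with EKF-style linearization for a scalar Gaussian passed through $f(x) = x^2$. It uses the basic symmetric sigma-point set with $2n + 1 = 3$ points and a tuning parameter $\kappa = 2$ (so that $n + \kappa = 3$, a common choice for Gaussian inputs); scaled variants with additional parameters also exist, and the function names are hypothetical. For $x \sim N(1, 0.5)$ the true mean and variance of $x^2$ are 1.5 and 2.5; the three sigma points reproduce both in this case, while linearization gives 1.0 and 2.0.

```python
import math


def unscented_transform_1d(f, mean, var, kappa=2.0):
    """Propagate N(mean, var) through f with 2n + 1 = 3 sigma points (n = 1)."""
    n = 1
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    ys = [f(p) for p in points]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var


def linearized_transform_1d(f, dfdx, mean, var):
    """EKF-style first-order propagation: f(mean) and f'(mean)^2 * var."""
    return f(mean), dfdx(mean) ** 2 * var


def square(x):
    return x * x


print(unscented_transform_1d(square, 1.0, 0.5))
print(linearized_transform_1d(square, lambda x: 2.0 * x, 1.0, 0.5))
```

In the $n$-dimensional case, the spread term becomes a matrix square root (for example, a Cholesky factor) of $(n + \kappa)P$, and each column gives one pair of sigma points.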
4.3. Particle Filter. Particle lters are recursive implemen-
tations of the sequential Monte Carlo methods []. is
method builds the posterior density function using several
random samples called particles. Particles are propagated
over time with a combination of sampling and resampling
steps. At each iteration, the sampling step is employed to
discard some particles, increasing the relevance of regions
with a higher posterior probability. In the ltering process,
several particles of the same state variable are employed,
and each particle has an associated weight that indicates
the quality of the particle. erefore, the estimation is the
result of a weighted sum of all of the particles. e standard
particle lter algorithm has two phases: () the predicting
phase and () the updating phase. In the predicting phase,
each particle is modied according to the existing model
and accounts for the sum of the random noise to simulate
the noise eect. en, in the updating phase, the weight of
observation, and particles with lower weights are removed.
Specically, a generic particle lter comprises the following
e Scientic World Journal 
() Initialization of the particles:
(i) let be equal to the number of particles;
(ii) ()(1)=[(1),(1),0,0]for =1,...,.
() Prediction step:
(i) for each particle =1,...,, evaluate the state
(+1| )of the system using the state at time
instant with the noise of the system at time .
() (+1|)=()
() ()
where ()is the transition matrix of the sys-
() Evaluate the particle weight. For each particle =
(i) compute the predicted observation state of the
system using the current predicted state and the
noise at instant . Consider
() (+1|)=(+1)
() (+1|)
(ii) compute the likelihood (weights) according to
the given distribution. Consider
likelihood() =
() (+1|);() (+1),var;()
(iii) normalize the weights as follows
() =likelihood()
=1 likelihood() .()
() Resampling/Selection: multiply particles with higher
weights and remove those with lower weights. e
current state must be adjusted using the computed
(i) Compute the cumulative weights. Consider
Cum Wt() =
(ii) Generate uniform distributed random variables
from () (0,1)with the number of steps
equal to the number of particles.
(iii) Determine which particles should be multiplied
and which ones removed.
() Propagation phase:
(i) incorporate the new values of the state aer the
resampling of instant to calculate the value at
instant +1. Consider
(1:) (+1|+1)=
(ii) compute the posterior mean. Consider
(+1)=mean (+1|+1), =1,...,; ()
(iii) repeat steps to for each time instant.
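The steps above can be sketched as a bootstrap particle filter for a hypothetical one-dimensional random-walk model, $x(k+1) = x(k) + w(k)$ and $z(k) = x(k) + v(k)$ with Gaussian noise. This is a minimal sketch under those assumptions, not a general implementation; the names are illustrative, and the resampling follows the cumulative-weight scheme described above, with one uniform draw per particle.

```python
import bisect
import itertools
import math
import random


def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def particle_filter(measurements, n_particles=500, q=0.1, r=0.5, seed=1):
    """Bootstrap particle filter for x(k+1) = x(k) + w(k), z(k) = x(k) + v(k).

    q and r are the variances of the process noise w and measurement noise v
    (a hypothetical 1-D random-walk model). Returns the posterior mean after
    each measurement.
    """
    rng = random.Random(seed)
    # Initialization: spread the particles around the first measurement.
    particles = [measurements[0] + rng.gauss(0.0, math.sqrt(r)) for _ in range(n_particles)]
    estimates = []
    for z in measurements:
        # Prediction: propagate every particle through the model plus process noise.
        particles = [p + rng.gauss(0.0, math.sqrt(q)) for p in particles]
        # Weights: likelihood of the measurement under each particle, normalized.
        lik = [gaussian_pdf(z, p, r) for p in particles]
        total = sum(lik)
        if total == 0.0:                      # all likelihoods underflowed
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in lik]
        # Resampling: one uniform draw per particle, located in the cumulative
        # weights, so high-weight particles are duplicated and low-weight ones dropped.
        cum = list(itertools.accumulate(weights))
        particles = [particles[min(bisect.bisect_left(cum, rng.random()), n_particles - 1)]
                     for _ in range(n_particles)]
        # Estimate: posterior mean of the resampled particle set.
        estimates.append(sum(particles) / n_particles)
    return estimates


rng = random.Random(0)
zs = [2.0 + rng.gauss(0.0, math.sqrt(0.5)) for _ in range(30)]   # constant true state 2.0
print(particle_filter(zs)[-5:])
```

Resampling at every step keeps the sketch short; many implementations resample only when the effective number of particles drops below a threshold, which reduces the loss of particle diversity.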
Particle lters are more exible than the Kalman lters
and can cope with nonlinear dependencies and non-Gaussian
densities in the dynamic model and in the noise error.
However, they have some disadvantages. A large number
of particles are required to obtain a small variance in the
estimator. It is also dicult to establish the optimal number of
particles in advance, and the number of particles aects the
computational cost signicantly. Earlier versions of particle
lters employed a xed number of particles, but recent studies
have started to use a dynamic number of particles [].
4.4. e Distributed Kalman Filter. e distributed Kalman
lter requires a correct clock synchronization between each
source, as demonstrated in []. In other words, to correctly
use the distributed Kalman lter, the clocks from all of
the sources must be synchronized. is synchronization is
typically achieved through using protocols that employ a
shared global clock, such as the network time protocol (NTP).
Synchronization problems between clocks have been shown
producing inaccurate estimations [].
is known (or the estimations are uncorrelated), then it is
possible to use the distributed Kalman lters []. However,
the cross covariance must be determined exactly, or the
observations must be consistent.
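When the estimation errors are known to be uncorrelated, two estimates can be fused in information (inverse-covariance) form, as in the following scalar sketch (the function name is illustrative). The second usage line shows why the cross covariance matters: fusing an estimate with a copy of itself halves the reported variance even though no new information has been added, which is the overconfidence that motivates the covariance consistency methods.

```python
def fuse_independent(x1, p1, x2, p2):
    """Fuse two scalar estimates whose errors are assumed uncorrelated.

    Information form: P = (P1^-1 + P2^-1)^-1 and x = P (P1^-1 x1 + P2^-1 x2).
    """
    info = 1.0 / p1 + 1.0 / p2
    p = 1.0 / info
    x = p * (x1 / p1 + x2 / p2)
    return x, p


print(fuse_independent(1.0, 2.0, 3.0, 2.0))  # -> (2.0, 1.0): two independent estimates
print(fuse_independent(1.0, 2.0, 1.0, 2.0))  # -> (1.0, 1.0): same estimate twice, overconfident
```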
We refer the reader to Liggins II et al. [] for more details
about the Kalman lter in a distributed and hierarchical
4.5. Distributed Particle Filter. Distributed particle lters
have gained attention recently []. Coates []useda
distributed particle lter to monitor an environment that
involving nonlinear dynamics and observations and non-
Gaussian noise.
In contrast, earlier attempts to solve out-of-sequence measurements using particle filters are based on regenerating the probability density function back to the time instant of the out-of-sequence measurement []. In a particle filter, this step requires a large computational cost, in addition to the space necessary to store the previous particles. To avoid this problem, Orton and Marrs [] proposed to store the information on the particles at each time instant, saving the cost of recalculating this information. This technique is close to optimal, and when the delay increases, the result is only slightly affected []. However, it requires a very large amount of space to store the state of the particles at each time instant.
4.6. Covariance Consistency Methods: Covariance Intersection/Union. Covariance consistency methods (intersection and union) were proposed by Uhlmann [] and are general and fault-tolerant frameworks for maintaining consistent mean and covariance estimates in a distributed network. These methods are not estimation techniques themselves; instead, they are closer to estimation fusion techniques. The requirement of the distributed Kalman filter for independent measurements or known cross-covariances is not a constraint with these methods.
4.6.1. Covariance Intersection. If the Kalman lter is employ-
ed to combine two estimations, (1,1)and (2,2),thenit
is assumed that the joint covariance is in the following form:
where the cross-covariance should be known exactly so
that the Kalman lter can be applied without diculty.
Because the computation of the cross-covariances is compu-
tationally intensive, Uhlmann [] proposed the covariance
intersection (CI) algorithm.
Let us assume that a joint covariance can be dened
with the diagonal blocks 1>1and 2>2. Consider
for every possible instance of the unknown cross-covariance
; then, the components of the matrix could be employed
in the Kalman lter equations to provide a fused estimation
(,) that is considered consistent. e key point of this
method relies on generating a joint covariance matrix that
can represent a useful fused estimation (in this context, useful
refers to something with a lower associated uncertainty). In
summary, the CI algorithm computes the joint covariance
matrix , where the Kalman lter provides the best fused
estimation (,)with respect to a xed measurement of the
covariance matrix (i.e., the minimum determinant).
Specic covariance criteria must be established because
there is not a specic minimum joint covariance in the
order of the positive semidenite matrices. Moreover, the
joint covariance is the basis of the formal analysis of the
CI algorithm; the actual result is a nonlinear mixture of the
information stored on the estimations being fused, following
the following equation.
where is the transformation of the fused state-space
estimation to the space of the estimated state .evalues
of canbecalculatedtominimizethecovariancedetermi-
nant using convex optimization packages and semipositive
matrix programming. e result of the CI algorithm has
dierent characteristics compared to the Kalman lter. For
example, if two estimations are provided (,) and (,)
and their covariances are equal =,sincetheKalman
lter is based on the statistical independence assumption, it
produces a fused estimation with covariance = (1/2).
In contrast, the CI method does not assume independence
and, thus, must be consistent even in the case in which
the estimations are completely correlated, with the estimated
fused covariance =. In the case of estimations where
<, the CI algorithm does not provide information about
the estimation (,);thus,thefusedresultis(,).
Every joint-consistent covariance is sucient to produce
a fused estimation, which guarantees consistency. However,
it is also necessary to guarantee a lack of divergence. Diver-
gence is avoided in the CI algorithm by choosing a specic
measurement (i.e., the determinant), which is minimized in
each fusion operation. is measurement represents a non-
divergence criterion, because the size of the estimated covari-
ance according to this criterion would not be incremented.
e application of the CI method guarantees consis-
tency and nondivergence for every sequence of mean and
covariance-consistent estimations. However, this method
does not work well when the measurements to be fused are
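A minimal two-dimensional sketch of covariance intersection for two estimates expressed in the same state space follows; the fused information matrix is $\omega A^{-1} + (1-\omega) B^{-1}$. Instead of the convex-optimization packages mentioned in the text, it uses a simple golden-section search over $\omega \in [0, 1]$ to minimize the determinant of the fused covariance, which works here because that determinant is a unimodal function of $\omega$. All names are hypothetical.

```python
import math


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def ci_fuse(a, A, b, B, omega):
    """CI fusion of (a, A) and (b, B) for a fixed weight omega in [0, 1]."""
    Ai, Bi = inv2(A), inv2(B)
    info = [[omega * Ai[i][j] + (1.0 - omega) * Bi[i][j] for j in range(2)]
            for i in range(2)]
    C = inv2(info)
    # Weighted information vector omega * A^-1 a + (1 - omega) * B^-1 b.
    y = [omega * (Ai[i][0] * a[0] + Ai[i][1] * a[1])
         + (1.0 - omega) * (Bi[i][0] * b[0] + Bi[i][1] * b[1]) for i in range(2)]
    c = [C[i][0] * y[0] + C[i][1] * y[1] for i in range(2)]
    return c, C


def covariance_intersection(a, A, b, B, tol=1e-8):
    """Pick omega by golden-section search so that det(C) is minimal."""
    def cost(w):
        return det2(ci_fuse(a, A, b, B, w)[1])

    g = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        m1, m2 = hi - g * (hi - lo), lo + g * (hi - lo)
        if cost(m1) < cost(m2):
            hi = m2
        else:
            lo = m1
    omega = 0.5 * (lo + hi)
    c, C = ci_fuse(a, A, b, B, omega)
    return c, C, omega


A = [[1.0, 0.0], [0.0, 4.0]]
B = [[4.0, 0.0], [0.0, 1.0]]
print(covariance_intersection([0.0, 0.0], A, [2.0, 2.0], B))
```

With $A = \mathrm{diag}(1, 4)$ and $B = \mathrm{diag}(4, 1)$, the search returns $\omega \approx 0.5$ and $C \approx \mathrm{diag}(1.6, 1.6)$, whereas the independence-based Kalman fusion would report the smaller, possibly overconfident, $\mathrm{diag}(0.8, 0.8)$.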
4.6.2. Covariance Union. CI solves the problem of correlated inputs but not the problem of inconsistent inputs (inconsistent inputs refer to different estimations, each of which has a high accuracy (small variance) but also a large difference from the states of the others); thus, the covariance union (CU) algorithm was proposed to solve the latter []. CU addresses the following problem: two estimations $(a_1, A_1)$ and $(a_2, A_2)$ relate to the state of an object and are mutually inconsistent. This issue arises when the difference between the mean estimations is larger than the provided covariance. Inconsistent inputs can be detected using the Mahalanobis distance [] between them, which is defined as

$$M = (a_1 - a_2)^T (A_1 + A_2)^{-1} (a_1 - a_2),$$

and detecting whether this distance is larger than a given threshold.
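The consistency test can be sketched for two-dimensional estimates. The text only specifies "a given threshold"; this sketch assumes Gaussian errors and independent estimates, so that the squared distance follows a chi-square distribution with two degrees of freedom, and uses its 95% quantile, $-2\ln(0.05) \approx 5.99$, as a hypothetical threshold. The function names are illustrative, and the usage lines show the same separation between means flagged as inconsistent with small covariances but not with large ones.

```python
import math


def mahalanobis_sq(a1, A1, a2, A2):
    """(a1 - a2)^T (A1 + A2)^-1 (a1 - a2) for 2-D estimates."""
    s = [[A1[i][j] + A2[i][j] for j in range(2)] for i in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    d = [a1[0] - a2[0], a1[1] - a2[1]]
    return sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))


def chi2_threshold_2dof(p=0.95):
    """For 2 degrees of freedom the chi-square CDF is 1 - exp(-x / 2)."""
    return -2.0 * math.log(1.0 - p)


def inconsistent(a1, A1, a2, A2, p=0.95):
    return mahalanobis_sq(a1, A1, a2, A2) > chi2_threshold_2dof(p)


small = [[0.5, 0.0], [0.0, 0.5]]
large = [[5.0, 0.0], [0.0, 5.0]]
print(inconsistent([0.0, 0.0], small, [3.0, 0.0], small))  # squared distance 9 > 5.99
print(inconsistent([0.0, 0.0], large, [3.0, 0.0], large))  # squared distance 0.9 < 5.99
```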
The Mahalanobis distance accounts for the covariance
information to obtain the distance. If the dierence between
the estimations is high but their covariance is also high,
the Mahalanobis distance yields a small value. In contrast,
if the dierence between the