
Hindawi Publishing Corporation

The Scientific World Journal

Volume , Article ID , pages

http://dx.doi.org/.//

Review Article

A Review of Data Fusion Techniques

Federico Castanedo

Deusto Institute of Technology, DeustoTech, University of Deusto, Avenida de las Universidades 24, 48007 Bilbao, Spain

Correspondence should be addressed to Federico Castanedo; castanedofede@gmail.com

Received August ; Accepted September

Academic Editors: Y. Takama and D. Ursino

Copyright © Federico Castanedo. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The integration of data and knowledge from several sources is known as data fusion. This paper summarizes the state of the data fusion field and describes the most relevant studies. We first enumerate and explain different classification schemes for data fusion. Then, the most common algorithms are reviewed. These methods and algorithms are presented using three different categories: (i) data association, (ii) state estimation, and (iii) decision fusion.

1. Introduction

In general, all tasks that demand any type of parameter estimation from multiple sources can benefit from the use of data/information fusion methods. The terms information fusion and data fusion are typically employed as synonyms, but in some scenarios the term data fusion is used for raw data (obtained directly from the sensors) and the term information fusion is employed for data that have already been processed. In this sense, the term information fusion implies a higher semantic level than data fusion. Other terms associated with data fusion that typically appear in the literature include decision fusion, data combination, data aggregation, multisensor data fusion, and sensor fusion.

Researchers in this field agree that the most accepted definition of data fusion was provided by the Joint Directors of Laboratories (JDL) workshop []: "A multi-level process dealing with the association, correlation, combination of data and information from single and multiple sources to achieve refined position, identity estimates and complete and timely assessments of situations, threats and their significance."

Hall and Llinas [] provided the following well-known definition of data fusion: "data fusion techniques combine data from multiple sensors and related information from associated databases to achieve improved accuracy and more specific inferences than could be achieved by the use of a single sensor alone."

Briefly, we can define data fusion as the combination of multiple sources to obtain improved information; in this context, improved information means less expensive, higher quality, or more relevant information.

Data fusion techniques have been extensively employed in multisensor environments with the aim of fusing and aggregating data from different sensors; however, these techniques can also be applied to other domains, such as text processing. The goal of using data fusion in multisensor environments is to obtain a lower detection error probability and a higher reliability by using data from multiple distributed sources.
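As a toy illustration of this goal, the following sketch (hypothetical sensor values and noise levels, not from the paper) shows how averaging redundant readings from several distributed sensors reduces the measurement error:

```python
import random
import statistics

random.seed(0)

TRUE_VALUE = 20.0      # hypothetical quantity being measured
SENSOR_NOISE = 2.0     # per-sensor standard deviation (assumed)

def measure():
    """One noisy sensor reading."""
    return random.gauss(TRUE_VALUE, SENSOR_NOISE)

def fused_error(n_sensors, trials=2000):
    """Mean absolute error of the average of n_sensors readings."""
    errors = []
    for _ in range(trials):
        fused = statistics.mean(measure() for _ in range(n_sensors))
        errors.append(abs(fused - TRUE_VALUE))
    return statistics.mean(errors)

single = fused_error(1)
five = fused_error(5)
print(f"1 sensor:  {single:.3f}")
print(f"5 sensors: {five:.3f}")
```

With independent noise, the error of the averaged estimate shrinks roughly as the square root of the number of sensors, which is the statistical basis for the reliability gain described above.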

The available data fusion techniques can be classified into three nonexclusive categories: (i) data association, (ii) state estimation, and (iii) decision fusion. Because of the large number of published papers on data fusion, this paper does not aim to provide an exhaustive review of all of the studies; instead, the objective is to highlight the main steps that are involved in the data fusion framework and to review the most common techniques for each step.

The remainder of this paper continues as follows. The next section provides various classification categories for data fusion techniques. Then, Section 3 describes the most common methods for data association tasks. Section 4 provides a review of techniques under the state estimation category. Next, the most common techniques for decision fusion are enumerated in Section 5. Finally, the conclusions obtained from reviewing the different methods are highlighted in Section 6.

2. Classification of Data Fusion Techniques

Data fusion is a multidisciplinary area that involves several fields, and it is difficult to establish a clear and strict classification. The employed methods and techniques can be divided according to the following criteria:

(1) attending to the relations between the input data sources, as proposed by Durrant-Whyte []. These relations can be defined as (a) complementary, (b) redundant, or (c) cooperative data;

(2) according to the input/output data types and their nature, as proposed by Dasarathy [];

(3) following the abstraction level of the employed data: (a) raw measurements, (b) signals, and (c) characteristics or decisions;

(4) based on the different data fusion levels defined by the JDL;

(5) depending on the architecture type: (a) centralized, (b) decentralized, or (c) distributed.

2.1. Classification Based on the Relations between the Data Sources. Based on the relations of the sources (see Figure 1), Durrant-Whyte [] proposed the following classification criteria:

(1) complementary: when the information provided by the input sources represents different parts of the scene and could thus be used to obtain more complete global information. For example, in the case of visual sensor networks, the information on the same target provided by two cameras with different fields of view is considered complementary;

(2) redundant: when two or more input sources provide information about the same target and could thus be fused to increase the confidence. For example, the data coming from overlapped areas in visual sensor networks are considered redundant;

(3) cooperative: when the provided information is combined into new information that is typically more complex than the original information. For example, multi-modal (audio and video) data fusion is considered cooperative.
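The first two relations can be sketched in code. The following toy example (illustrative camera positions, not from the paper) fuses redundant observations by averaging and complementary observations by union; cooperative fusion, which would derive new information such as depth from two views, is omitted for brevity:

```python
# Two cameras observe a scene; positions are hypothetical (x, y) values.
cam1 = {"A": (2.0, 3.1), "B": (5.0, 1.0)}   # targets seen by camera 1
cam2 = {"A": (2.2, 2.9), "C": (7.5, 4.0)}   # targets seen by camera 2

# Redundant fusion: average the positions of commonly observed targets.
redundant = {
    t: tuple((a + b) / 2 for a, b in zip(cam1[t], cam2[t]))
    for t in cam1.keys() & cam2.keys()
}

# Complementary fusion: union of targets only one camera can see.
complementary = {t: cam1.get(t, cam2.get(t))
                 for t in cam1.keys() ^ cam2.keys()}

scene = {**redundant, **complementary}
print(scene)   # a more complete and more confident global picture
```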

2.2. Dasarathy's Classification. One of the most well-known data fusion classification systems was provided by Dasarathy [] and is composed of the following five categories (see Figure 2):

(1) data in-data out (DAI-DAO): this type is the most basic or elementary data fusion method that is considered in the classification. This type of data fusion process inputs and outputs raw data; the results are typically more reliable or accurate. Data fusion at this level is conducted immediately after the data are gathered from the sensors. The algorithms employed at this level are based on signal and image processing algorithms;

(2) data in-feature out (DAI-FEO): at this level, the data fusion process employs raw data from the sources to extract features or characteristics that describe an entity in the environment;

(3) feature in-feature out (FEI-FEO): at this level, both the input and output of the data fusion process are features. Thus, the data fusion process addresses a set of features to improve, refine, or obtain new features. This process is also known as feature fusion, symbolic fusion, information fusion, or intermediate-level fusion;

(4) feature in-decision out (FEI-DEO): this level obtains a set of features as input and provides a set of decisions as output. Most of the classification systems that perform a decision based on a sensor's inputs fall into this category of classification;

(5) decision in-decision out (DEI-DEO): this type of classification is also known as decision fusion. It fuses input decisions to obtain better or new decisions.

The main contribution of Dasarathy's classification is the specification of the abstraction level either as an input or an output, providing a framework to classify different methods or techniques.
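A minimal sketch of how these categories chain together in practice (function names, data values, and the threshold are hypothetical):

```python
def dai_dao(raw_a, raw_b):
    """Data in-data out: fuse two raw signals into a cleaner raw signal."""
    return [(x + y) / 2 for x, y in zip(raw_a, raw_b)]

def dai_feo(raw):
    """Data in-feature out: extract simple features from raw data."""
    return {"mean": sum(raw) / len(raw), "peak": max(raw)}

def fei_deo(features, threshold=5.0):
    """Feature in-decision out: decide based on the extracted features."""
    return "alarm" if features["peak"] > threshold else "normal"

sensor_a = [1.0, 2.0, 9.0, 2.0]
sensor_b = [1.2, 1.8, 8.0, 2.2]

fused_raw = dai_dao(sensor_a, sensor_b)   # DAI-DAO
features = dai_feo(fused_raw)             # DAI-FEO
decision = fei_deo(features)              # FEI-DEO
print(decision)
```

Each stage raises the abstraction level of its output, exactly the property Dasarathy's categories make explicit.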

2.3. Classification Based on the Abstraction Levels. Luo et al. [] provided the following four abstraction levels:

(1) signal level: directly addresses the signals that are acquired from the sensors;

(2) pixel level: operates at the image level and could be used to improve image processing tasks;

(3) characteristic level: employs features that are extracted from the images or signals (i.e., shape or velocity);

(4) symbol level: at this level, information is represented as symbols; this level is also known as the decision level.

Information fusion typically addresses three levels of abstraction: (1) measurements, (2) characteristics, and (3) decisions. Other possible classifications of data fusion based on the abstraction levels are as follows:

(1) low-level fusion: the raw data are directly provided as an input to the data fusion process, which provides more accurate data (a lower signal-to-noise ratio) than the individual sources;

(2) medium-level fusion: characteristics or features (shape, texture, and position) are fused to obtain features that could be employed for other tasks. This level is also known as the feature or characteristic level;

Figure 1: Whyte's classification based on the relations between the data sources.

Figure 2: Dasarathy's classification.

(3) high-level fusion: this level, which is also known as decision fusion, takes symbolic representations as sources and combines them to obtain a more accurate decision. Bayesian methods are typically employed at this level;

(4) multiple-level fusion: this level addresses data provided from different levels of abstraction (i.e., when a measurement is combined with a feature to obtain a decision).

2.4. JDL Data Fusion Classification. This classification is the most popular conceptual model in the data fusion community. It was originally proposed by JDL and the American Department of Defense (DoD) []. These organizations classified the data fusion process into five processing levels, an associated database, and an information bus that connects the five components (see Figure 3). The five levels can be grouped into low-level fusion and high-level fusion, which comprise the following components:

(i) sources: the sources are in charge of providing the input data. Different types of sources can be employed, such as sensors, a priori information (references or geographic data), databases, and human inputs;

(ii) human-computer interaction (HCI): HCI is an interface that allows inputs to the system from the operators and produces outputs to the operators. HCI includes queries, commands, and information on the obtained results and alarms;

(iii) database management system: the database management system stores the provided information and the fused results. This system is a critical component because of the large amount of highly diverse information that is stored.

In contrast, the five levels of data processing are defined as follows:

(1) level 0 (source preprocessing): source preprocessing is the lowest level of the data fusion process, and it includes fusion at the signal and pixel levels. In the case of text sources, this level also includes the information extraction process. This level reduces the amount of data and maintains useful information for the high-level processes;

(2) level 1 (object refinement): object refinement employs the processed data from the previous level. Common procedures of this level include spatio-temporal alignment, association, correlation, clustering or grouping techniques, state estimation, the removal of false positives, identity fusion, and the combining of features that were extracted from images. The output

Figure 3: The JDL data fusion framework.

results of this stage are the object discrimination (classification and identification) and object tracking (state of the object and orientation). This stage transforms the input information into consistent data structures;

(3) level 2 (situation assessment): this level focuses on a higher level of inference than level 1. Situation assessment aims to identify the likely situations given the observed events and obtained data. It establishes relationships between the objects. Relations (i.e., proximity, communication) are valued to determine the significance of the entities or objects in a specific environment. The aim of this level includes performing high-level inferences and identifying significant activities and events (patterns in general). The output is a set of high-level inferences;

(4) level 3 (impact assessment): this level evaluates the impact of the detected activities in level 2 to obtain a proper perspective. The current situation is evaluated, and a future projection is performed to identify possible risks, vulnerabilities, and operational opportunities. This level includes (1) an evaluation of the risk or threat and (2) a prediction of the logical outcome;

(5) level 4 (process refinement): this level improves the process from level 0 to level 3 and provides resource and sensor management. The aim is to achieve efficient resource management while accounting for task priorities, scheduling, and the control of available resources.

High-level fusion typically starts at level 2 because the type, localization, movement, and quantity of the objects are known at that level. One of the limitations of the JDL method is how the uncertainty about previous or subsequent results could be employed to enhance the fusion process (feedback loop). Llinas et al. [] proposed several refinements and extensions to the JDL model. Blasch and Plano [] proposed to add a new level (user refinement) to support a human user in the data fusion loop. The JDL model represents the first effort to provide a detailed model and a common terminology for the data fusion domain. However, because its roots originate in the military domain, the employed terms are oriented to the risks that commonly occur in these scenarios. The Dasarathy model differs from the JDL model with regard to the adopted terminology and employed approach. The former is oriented toward the differences among the input and output results, independent of the employed fusion method. In summary, the Dasarathy model provides a method for understanding the relations between the fusion tasks and employed data, whereas the JDL model presents an appropriate fusion perspective to design data fusion systems.

2.5. Classification Based on the Type of Architecture. One of the main questions that arise when designing a data fusion system is where the data fusion process will be performed. Based on this criterion, the following types of architectures can be identified:

(1) centralized architecture: in a centralized architecture, the fusion node resides in the central processor that receives the information from all of the input sources. Therefore, all of the fusion processes are executed in a central processor that uses the provided raw measurements from the sources. In this schema, the sources obtain only the observations as measurements and transmit them to a central processor, where the data fusion process is performed. If we assume that data alignment and data association are performed correctly and that the required time to transfer the data is not significant, then the centralized scheme is theoretically optimal. However, the previous assumptions typically do not hold for real systems. Moreover, the large amount of bandwidth that is required to send raw data through the network is another disadvantage for the centralized approach. This issue becomes a bottleneck when this type of architecture is employed for fusing data in visual sensor networks. Finally, the time delays when transferring the information between the different sources are variable and affect the results in the centralized scheme to a greater degree than in other schemes;

(2) decentralized architecture: a decentralized architecture is composed of a network of nodes in which each node has its own processing capabilities and there is no single point of data fusion. Therefore, each node fuses its local information with the information that is received from its peers. Data fusion is performed autonomously, with each node accounting for its local information and the information received from its peers. Decentralized data fusion algorithms typically communicate information using the Fisher and Shannon measurements instead of the object's state []. The main disadvantage of this architecture is the communication cost, which is O(n^2) at each communication step, where n is the number of nodes, in the extreme case in which each node communicates with all of its peers. Thus, this type of architecture could suffer from scalability problems when the number of nodes is increased;

(3) distributed architecture: in a distributed architecture, measurements from each source node are processed independently before the information is sent to the fusion node; the fusion node accounts for the information that is received from the other nodes. In other words, the data association and state estimation are performed in the source node before the information is communicated to the fusion node. Therefore, each node provides an estimation of the object state based only on its local views, and this information is the input to the fusion process, which provides a fused global view. This type of architecture provides different options and variations that range from only one fusion node to several intermediate fusion nodes;

(4) hierarchical architecture: other architectures comprise a combination of decentralized and distributed nodes, generating hierarchical schemes in which the data fusion process is performed at different levels in the hierarchy.

In principle, a decentralized data fusion system is more difficult to implement because of the computation and communication requirements. However, in practice, there is no single best architecture, and the selection of the most appropriate architecture should be made depending on the requirements, demand, existing networks, data availability, node processing capabilities, and organization of the data fusion system.

The reader might think that the decentralized and distributed architectures are similar; however, they have meaningful differences (see Figure 4). First, in a distributed architecture, a preprocessing of the obtained measurements is performed, which provides a vector of features as a result (the features are fused thereafter). In contrast, in the decentralized architecture, the complete data fusion process is conducted in each node, and each of the nodes provides a globally fused result. Second, the decentralized fusion algorithms typically communicate information, employing the Fisher and Shannon measurements. In contrast, distributed algorithms typically share a common notion of state (position, velocity, and identity) with their associated probabilities, which are used to perform the fusion process []. Third, because the decentralized data fusion algorithms exchange information instead of states and probabilities, they have the advantage of easily separating old knowledge from new knowledge. Thus, the process is additive, and the order in which the information is received and fused is not relevant. However, in the distributed data fusion algorithms (e.g., those based on a distributed Kalman filter), the state that is going to be fused is not associative, and when and how the fused estimates are computed is relevant. Nevertheless, in contrast to the centralized architectures, the distributed algorithms reduce the necessary communication and computational costs because some tasks are computed in the distributed nodes before data fusion is performed in the fusion node.
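As a hedged sketch of the distributed scheme, the following example fuses the local estimates produced by two source nodes using inverse-variance weighting, assuming the local estimates are independent (an assumption real trackers often violate):

```python
def fuse_estimates(estimates):
    """estimates: list of (state, variance) pairs from the source nodes.
    Returns the inverse-variance weighted fused state and its variance."""
    inv_vars = [1.0 / var for _, var in estimates]
    total = sum(inv_vars)
    fused_state = sum(x / var for x, var in estimates) / total
    fused_var = 1.0 / total
    return fused_state, fused_var

# Two source nodes report slightly different positions for the same object
# (illustrative scalar values).
node_a = (10.2, 4.0)   # less certain local estimate
node_b = (9.8, 1.0)    # more certain, so it dominates the fused result
state, var = fuse_estimates([node_a, node_b])
print(state, var)
```

Note that the fused variance is smaller than either input variance: combining the local views at the fusion node yields a more confident global estimate.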

3. Data Association Techniques

The data association problem must determine the set of measurements that correspond to each target (see Figure 5). Let us suppose that there are several targets being tracked by only one sensor in a cluttered environment (by a cluttered environment, we refer to an environment that has several targets that are too close to each other). Then, the data association problem can be defined as follows:

(i) each sensor's observation is received in the fusion node at discrete time intervals;

(ii) the sensor might not provide observations at a specific interval;

(iii) some observations are noise, and other observations originate from the detected targets;

(iv) for any specific target and in every time interval, we do not know (a priori) the observations that will be generated by that target.

Therefore, the goal of data association is to establish the set of observations or measurements that are generated by the same target over time. Hall and Llinas [] provided the following definition of data association: "the process of assigning and computing the weights that relate the observations or tracks (a track can be defined as an ordered set of points that follow a path and are generated by the same target) from one set to the observations or tracks of another set."

As an example of the complexity of the data association problem, if we take a frame-to-frame association and assume that n possible points could be detected in each of m frames, then the number of possible association sets is (n!)^(m-1). Note that from all of these possible solutions, only one set establishes the true movement of the points.
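This combinatorial growth can be sketched as follows, assuming n points detected in each of m frames and n! possible matchings per consecutive frame pair:

```python
from math import factorial

def association_hypotheses(n_points, n_frames):
    """Number of frame-to-frame association sets when the same n_points
    appear in every frame: each consecutive frame pair admits n! matchings."""
    return factorial(n_points) ** (n_frames - 1)

print(association_hypotheses(3, 2))   # 3! = 6 possible matchings
print(association_hypotheses(5, 4))   # (5!)**3 = 1_728_000
```

Even for modest numbers of points and frames, the hypothesis space is far too large to enumerate, which motivates the approximate techniques reviewed below.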

Data association is often performed before the state estimation of the detected targets. Moreover, it is a key step because the estimation or classification will behave incorrectly if the data association phase does not work coherently. The data association process could also appear in all of the fusion levels, but the granularity varies depending on the objective of each level.

Figure 4: Classification based on the type of architecture.

In general, an exhaustive search of all possible combinations grows exponentially with the number of targets; thus, the data association problem becomes NP-complete. The most common techniques that are employed to solve the data association problem are presented in the following subsections.

3.1. Nearest Neighbors and K-Means. Nearest neighbor (NN) is the simplest data association technique. NN is a well-known clustering algorithm that selects or groups the most similar values. How close one measurement is to another depends on the employed distance metric and typically on the threshold that is established by the designer. In general, the employed criteria could be based on (1) an absolute distance, (2) the Euclidean distance, or (3) a statistical function of the distance.

NN is a simple algorithm that can find a feasible (approximate) solution in a small amount of time. However, in a cluttered environment, it could provide many pairs that have the same probability and could thus produce undesirable

Figure 5: Conceptual overview of the data association process from multiple sensors and multiple targets. It is necessary to establish the set of observations over time from the same object that forms a track.

error propagation []. Moreover, this algorithm has poor performance in environments in which false measurements are frequent, that is, in highly noisy environments. The all-neighbors approach uses a similar technique, in which all of the measurements inside a region are included in the tracks.
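A minimal greedy nearest-neighbor association sketch, with an illustrative Euclidean gate (the track positions, observations, and threshold are hypothetical):

```python
import math

def nearest_neighbor_associate(tracks, observations, gate=2.0):
    """Greedy NN association: each track takes the closest unassigned
    observation within the gating threshold (Euclidean distance).
    A sketch only; greedy NN can mis-assign when targets are close."""
    assignments = {}
    free = set(range(len(observations)))
    for t_id, t_pos in tracks.items():
        best, best_d = None, gate
        for i in free:
            d = math.dist(t_pos, observations[i])
            if d < best_d:
                best, best_d = i, d
        if best is not None:
            assignments[t_id] = best
            free.remove(best)
    return assignments

tracks = {"T1": (0.0, 0.0), "T2": (10.0, 0.0)}
observations = [(9.5, 0.2), (0.3, -0.1), (50.0, 50.0)]  # last one is clutter
print(nearest_neighbor_associate(tracks, observations))
```

The clutter point far from both tracks falls outside every gate and stays unassigned, illustrating how gating filters false measurements.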

The K-Means method [] is a well-known modification of the NN algorithm. K-Means divides the dataset values into k different clusters. The K-Means algorithm finds the best localization of the cluster centroids, where best means a centroid that is in the center of the data cluster. K-Means is an iterative algorithm that can be divided into the following steps:

(1) obtain the input data and the number of desired clusters (k);

(2) randomly assign the centroid of each cluster;

(3) match each data point with the centroid of the nearest cluster;

(4) move each cluster center to the centroid of its assigned points;

(5) if the algorithm has not converged, return to step (3).
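The steps above can be sketched as a plain Lloyd-style implementation (toy data and the fixed seed are illustrative):

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(pts):
    """Component-wise mean of a nonempty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def kmeans(points, k, iters=100, seed=0):
    """Lloyd-style K-Means following the steps above."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)              # step 2: random centroids
    for _ in range(iters):
        # step 3: match each point with its nearest centroid
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        # step 4: move each centroid to the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            new_centroids.append(mean(members) if members else centroids[c])
        if new_centroids == centroids:             # step 5: converged
            break
        centroids = new_centroids
    return centroids, labels

points = [(0.0, 0.1), (0.2, 0.0), (9.0, 9.1), (9.2, 8.9)]
centroids, labels = kmeans(points, k=2)
print(sorted(centroids))
```

On this toy dataset the two centroids settle near (0.1, 0.05) and (9.1, 9.0), the centers of the two point groups.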

K-Means is a popular algorithm that has been widely employed; however, it has the following disadvantages:

(i) the algorithm does not always find the optimal solution for the cluster centers;

(ii) the number of clusters must be known a priori, and one must assume that this number is the optimum;

(iii) the algorithm assumes that the covariance of the dataset is irrelevant or that it has been normalized already.

There are several options for overcoming these limitations. For the first one, it is possible to execute the algorithm several times and keep the solution that has less variance. For the second one, it is possible to start with a low value of k and increment the value of k until an adequate result is obtained. The third limitation can be easily overcome by multiplying the data with the inverse of the covariance matrix.

Many variations have been proposed to Lloyd's basic K-Means algorithm [], which has a computational upper bound cost of O(nk), where n is the number of input points and k is the number of desired clusters. Some algorithms modify the initial cluster assignments to improve the separations and reduce the number of iterations. Others introduce soft or multinomial clustering assignments using fuzzy logic, probabilistic, or Bayesian techniques. However, most of the previous variations still must perform several iterations through the data space to converge to a reasonable solution. This issue becomes a major disadvantage in several real-time applications. A new approach that is based on having a large (but still affordable) number of cluster candidates compared to the k desired clusters is currently gaining attention. The idea behind this computational model is that the algorithm builds a good sketch of the original data while reducing the dimensionality of the input space significantly. In this manner, a weighted K-Means can be applied to the large candidate clusters to derive a good clustering of the original data. Using this idea, [] presented an efficient and scalable K-Means algorithm that is based on random projections. This algorithm requires only one pass through the input data to build the clusters. More specifically, if the input data distribution holds some separability requirements, then the number of required candidate clusters grows only according to O(k log n), where n is the number of observations in the original data. This salient feature makes the algorithm scalable in terms of both the memory and computational requirements.

3.2. Probabilistic Data Association. The probabilistic data association (PDA) algorithm was proposed by Bar-Shalom and Tse [] and is also known as the modified filter of all neighbors. This algorithm assigns an association probability to each hypothesis from a valid measurement of a target. A valid measurement refers to an observation that falls in the validation gate of the target at that time instant. The validation gate, \gamma, which is centered around the predicted measurement of the target, is used to select the set of valid measurements and is defined as

\gamma \geq (z(k) - \hat{z}(k|k-1))^T S(k)^{-1} (z(k) - \hat{z}(k|k-1)),

where k is the temporal index, S(k) is the innovation covariance, and \gamma determines the gating or window size. The set of valid measurements at time instant k is defined as

Z(k) = \{ z_i(k), i = 1, ..., m_k \},

where z_i(k) is the i-th measurement in the validation region at time instant k. We give the standard equations of the PDA algorithm next. For the state prediction, consider

\hat{x}(k|k-1) = F(k-1) \hat{x}(k-1|k-1),

where F(k-1) is the transition matrix at time instant k-1. To calculate the measurement prediction, consider

\hat{z}(k|k-1) = H(k) \hat{x}(k|k-1),

where H(k) is the linearized measurement matrix. To compute the innovation of the i-th measurement, consider

v_i(k) = z_i(k) - \hat{z}(k|k-1).

To calculate the covariance prediction, consider

P(k|k-1) = F(k-1) P(k-1|k-1) F(k-1)^T + Q(k),

where Q(k) is the process noise covariance matrix. The innovation covariance S(k) and the Kalman gain K(k) are computed as

S(k) = H(k) P(k|k-1) H(k)^T + R,
K(k) = P(k|k-1) H(k)^T S(k)^{-1}.

To obtain the covariance update in the case in which the measurement originated by the target is known, consider

P_0(k|k) = P(k|k-1) - K(k) S(k) K(k)^T.

The total update of the covariance is computed as

v(k) = \sum_{i=1}^{m_k} \beta_i(k) v_i(k),
\tilde{P}(k) = K(k) [ \sum_{i=1}^{m_k} \beta_i(k) v_i(k) v_i(k)^T - v(k) v(k)^T ] K(k)^T,

where m_k is the number of valid measurements at instant k. The equation to update the estimated state, which is formed by the position and velocity, is given by

\hat{x}(k|k) = \hat{x}(k|k-1) + K(k) v(k).

Finally, the association probabilities of PDA are as follows:

\beta_i(k) = e_i(k) / \sum_{j=0}^{m_k} e_j(k),

where

e_i(k) = \lambda (2\pi)^{M/2} |S(k)|^{1/2} (1 - P_D P_G) / P_D   if i = 0,
e_i(k) = exp( -(1/2) v_i(k)^T S(k)^{-1} v_i(k) )                  if 1 <= i <= m_k,
e_i(k) = 0                                                        in other cases,

where M is the dimension of the measurement vector, \lambda is the density of the clutter environment, P_D is the detection probability of the correct measurement, and P_G is the validation probability of a detected value.

In the PDA algorithm, the state estimation of the target is computed as a weighted sum of the estimated states under all of the hypotheses. The algorithm can associate different measurements to one specific target. Thus, the association of the different measurements to a specific target helps PDA to estimate the target state, and the association probabilities are used as weights. The main disadvantages of the PDA algorithm are the following:

(i) loss of tracks: because PDA ignores the interference with other targets, it sometimes could wrongly classify the closest tracks. Therefore, it provides a poor performance when the targets are close to each other or crossed;

(ii) the suboptimal Bayesian approximation: when the source of information is uncertain, PDA is the suboptimal Bayesian approximation to the association problem;

(iii) one target: PDA was initially designed for the association of one target in a low-cluttered environment. The number of false alarms is typically modeled with the Poisson distribution, and they are assumed to be distributed uniformly in space. PDA behaves incorrectly when there are multiple targets because the false alarm model does not work well;

(iv) track management: because PDA assumes that the track is already established, algorithms must be provided for track initialization and track deletion.

PDA is mainly good for tracking targets that do not make abrupt changes in their movement patterns. PDA will most likely lose the target if it makes abrupt changes in its movement patterns.
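A one-step numeric sketch of a simplified, scalar PDA update (the detection probability, clutter density, and noise values are illustrative, and the likelihood terms are a simplified variant of the expressions above):

```python
import math

# Scalar state (position only), H = 1, so the innovation covariance is
# simply the sum of the prediction and measurement noise variances.
x_pred = 10.0      # predicted state \hat{x}(k|k-1)
P_pred = 1.0       # prediction covariance P(k|k-1)
R = 0.5            # measurement noise variance
S = P_pred + R     # innovation covariance
K = P_pred / S     # Kalman gain

P_D = 0.9          # detection probability (assumed)
lam = 0.1          # clutter density (assumed)
measurements = [9.6, 10.5, 14.0]   # the last one lies far from the prediction

# Gating: keep measurements whose normalized innovation is inside the gate.
gate = 9.0
valid = [z for z in measurements if (z - x_pred) ** 2 / S <= gate]

def likelihood(z):
    """Gaussian likelihood of a validated measurement's innovation."""
    v = z - x_pred
    return (P_D / math.sqrt(2 * math.pi * S)) * math.exp(-0.5 * v * v / S)

e = [likelihood(z) for z in valid]
b = lam * (1 - P_D)                # weight of the "all clutter" hypothesis
total = b + sum(e)
beta0 = b / total                  # probability that no measurement is correct
betas = [ei / total for ei in e]

# Combined innovation and state update: a weighted sum over the hypotheses.
v_comb = sum(bi * (z - x_pred) for bi, z in zip(betas, valid))
x_upd = x_pred + K * v_comb
print(round(x_upd, 3))
```

The far-away measurement is rejected by the gate, and the updated state is pulled only slightly toward the two validated measurements, weighted by their association probabilities.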

3.3. Joint Probabilistic Data Association. Joint probabilistic data association (JPDA) is a suboptimal approach for tracking multiple targets in cluttered environments []. JPDA is similar to PDA, with the difference that the association probabilities are computed using all of the observations and all of the targets. Thus, in contrast to PDA, JPDA considers various hypotheses together and combines them. JPDA determines the probability \beta_i^t(k) that measurement i originated from target t, accounting for the fact that, under this hypothesis, the measurement cannot be generated by other targets. Therefore, for a known number of targets, it evaluates the different options of the measurement-target association (for the most recent set of measurements) and combines them into the corresponding state estimation. If the association probability is known, then the Kalman filter updating equation of the track can be written as

\hat{x}^t(k|k) = \hat{x}^t(k|k-1) + K^t(k) v^t(k),

where \hat{x}^t(k|k) and \hat{x}^t(k|k-1) are the estimation and prediction of target t, and K^t(k) is the filter gain. The weighted

e Scientic World Journal

sum of the residuals associated with the observation ()of

target is as follows:

V()=()

=1

()V

(),()

where V

=

()−( | −1). erefore, this method

incorporates all of the observations (inside the neighborhood

of the target’s predicted position) to update the estimated

position by using a posterior probability that is a weighted

sumofresiduals.

The main restrictions of JPDA are the following:

(i) a measurement cannot come from more than one target;

(ii) two measurements cannot originate from the same target (at one time instant);

(iii) the probabilities of all of the measurements assigned to one target must sum to one: $\sum_{i=0}^{m(k)} \beta_i^t(k) = 1$.

The main disadvantages of JPDA are the following:

(i) it requires an explicit mechanism for track initialization. Similar to PDA, JPDA cannot initialize new tracks or remove tracks that leave the observation area;

(ii) JPDA is a computationally expensive algorithm when it is applied in environments that have multiple targets because the number of hypotheses grows exponentially with the number of targets.

In general, JPDA is more appropriate than MHT in situations in which the density of false measurements is high (e.g., sonar applications).
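A toy enumeration makes the restrictions concrete. The sketch below uses an invented likelihood table `g` and an arbitrary missed-detection mass `miss=0.2`; it keeps only the joint events that satisfy restrictions (i) and (ii), then the resulting probabilities for each target sum to one, as restriction (iii) requires:

```python
from itertools import product

def feasible_joint_events(n_targets, n_measurements):
    """Joint events: each target picks a measurement index (1..m) or 0
    ("no measurement"), and no measurement feeds two targets."""
    events = []
    for choice in product(range(n_measurements + 1), repeat=n_targets):
        used = [c for c in choice if c != 0]
        if len(used) == len(set(used)):   # restriction (i)/(ii)
            events.append(choice)
    return events

def jpda_probabilities(g, miss=0.2):
    """beta[t][i]: probability that measurement i comes from target t
    (index 0 = no measurement), summed over feasible joint events."""
    n_t, n_m = len(g), len(g[0])
    events = feasible_joint_events(n_t, n_m)
    scores = []
    for ev in events:                     # score = product of likelihoods
        s = 1.0
        for t, c in enumerate(ev):
            s *= miss if c == 0 else g[t][c - 1]
        scores.append(s)
    total = sum(scores)
    beta = [[0.0] * (n_m + 1) for _ in range(n_t)]
    for ev, s in zip(events, scores):
        for t, c in enumerate(ev):
            beta[t][c] += s / total
    return beta

# two targets, two measurements: target 0 favors z1, target 1 favors z2
beta = jpda_probabilities([[0.9, 0.1], [0.2, 0.8]])
```

Exhaustive enumeration is exactly what makes JPDA expensive: the number of feasible joint events grows combinatorially with targets and measurements.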

3.4. Multiple Hypothesis Test. The underlying idea of the multiple hypothesis test (MHT) is to use more than two consecutive observations to make an association with better results. Other algorithms that use only two consecutive observations have a higher probability of generating an error. In contrast to PDA and JPDA, MHT estimates all of the possible hypotheses and maintains new hypotheses in each iteration.

MHT was developed to track multiple targets in cluttered environments; as a result, it combines the data association problem and tracking into a unified framework, becoming an estimation technique as well. The Bayes rule or Bayesian networks are commonly employed to calculate the MHT hypotheses. In general, researchers have claimed that MHT outperforms JPDA for lower densities of false positives. However, the main disadvantage of MHT is its computational cost when the number of tracks or false positives increases. Pruning the hypothesis tree using a window could mitigate this limitation.

The Reid [] tracking algorithm is considered the standard MHT algorithm, but the initial integer programming formulation of the problem is due to Morefield []. MHT is an iterative algorithm in which each iteration starts with a set of correspondence hypotheses. Each hypothesis is a collection of disjoint tracks, and the prediction of the target in the next time instant is computed for each hypothesis. Next, the predictions are compared with the new observations by using a distance metric. The set of associations established in each hypothesis (based on a distance) introduces new hypotheses in the next iteration. Each new hypothesis represents a new set of tracks that is based on the current observations.

Note that each new measurement could come from (i) a new target in the visual field of view, (ii) a target being tracked, or (iii) noise in the measurement process. It is also possible that a measurement is not assigned to any target because the target disappears, or because it is not possible to obtain a target measurement at that time instant.

MHT maintains several correspondence hypotheses for each target in each frame. If the set of hypotheses at instant $k$ is represented by $\Theta(k) = \{\Theta_l(k),\ l = 1, \dots, n\}$, then the probability of hypothesis $\Theta_l(k)$ can be represented recursively using the Bayes rule as follows:

$$P(\Theta_l(k) \mid Z(k)) = P(\theta_l(k), \Theta(k-1) \mid Z(k)) = \frac{1}{c}\, P(Z(k) \mid \theta_l(k), \Theta(k-1))\, P(\theta_l(k) \mid \Theta(k-1))\, P(\Theta(k-1)),$$

where $\Theta(k-1)$ is the hypothesis of the complete set up until time instant $k-1$; $\theta_l(k)$ is the $l$th possible association of the tracks to the objects; $Z(k)$ is the set of detections of the current frame, and $c$ is a normalization constant.

The first term on the right side of the previous equation is the likelihood function of the measurement set $Z(k)$ given the joint likelihood and current hypothesis. The second term is the probability of the association hypothesis of the current data given the previous hypothesis $\Theta(k-1)$. The third term is the probability of the previous hypothesis from which the current hypothesis is calculated.

The MHT algorithm has the ability to detect a new track while maintaining the hypothesis tree structure. The probability of a true track is given by the Bayes decision model as

$$P(T \mid D) = \frac{P(D \mid T)\, P(T)}{P(D)},$$

where $P(D \mid T)$ is the probability of obtaining the set of measurements $D$ given $T$, $P(T)$ is the a priori probability of the source signal, and $P(D)$ is the probability of obtaining the set of detections $D$.

MHT considers all of the possibilities, including both track maintenance and the initialization and removal of tracks, in an integrated framework. MHT calculates the probability of having an object after the generation of a set of measurements using an exhaustive approach, and the algorithm does not assume a fixed number of targets. The key challenge of MHT is effective hypothesis management.

The baseline MHT algorithm can be extended as follows: (i) use hypothesis aggregation for missed target births, cardinality tracking, and closely spaced objects; (ii) apply a multistage MHT to improve the performance and robustness in challenging settings; and (iii) use a feature-aided MHT for extended object surveillance.

The main disadvantage of this algorithm is its computational cost, which grows exponentially with the number of tracks and measurements. Therefore, the practical implementation of this algorithm is limited because it is exponential in both time and memory.

With the aim of reducing the computational cost, [] presented a probabilistic MHT algorithm in which the associations are considered to be random variables that are statistically independent and in which an exhaustive search enumeration is avoided. This algorithm is known as PMHT. The PMHT algorithm assumes that the number of targets and measurements is known. With the same goal of reducing the computational cost, [] presented an efficient implementation of the MHT algorithm. This implementation was the first version to be applied to perform tracking in visual environments. They employed the Murty [] algorithm to determine the best set of hypotheses in polynomial time, with the goal of tracking the points of interest.

MHT typically performs the tracking process by employing only one characteristic, commonly the position. The Bayesian combination of multiple characteristics was proposed by Liggins II et al. [].

A linear-programming-based relaxation approach to the optimization problem in MHT tracking was proposed independently by Coraluppi et al. [] and Storms and Spieksma []. Joo and Chellappa [] proposed an association algorithm for tracking multiple targets in visual environments. Their algorithm is based on an MHT modification in which a measurement can be associated with more than one target, and several targets can be associated with one measurement. They also proposed a combinatorial optimization algorithm to generate the best set of association hypotheses. Their algorithm always finds the best hypothesis, in contrast to other models, which are approximate. Coraluppi and Carthel [] presented a generalization of the MHT algorithm using a recursion over hypothesis classes rather than over a single hypothesis. This work has been applied to a special case of the multi-target tracking problem, called cardinality tracking, in which the number of sensor measurements is observed instead of the target states.
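The hypothesis recursion and window-based pruning described above can be sketched in a few lines. The association labels and likelihood values below are invented for illustration:

```python
import heapq

def extend_hypotheses(hypotheses, options, keep=3):
    """One MHT iteration: branch every parent hypothesis into the
    association options of the new scan, score each child with
    likelihood * parent probability (the Bayes recursion), normalize,
    and prune the tree back to the `keep` most probable leaves."""
    children = []
    for assoc, p_parent in hypotheses:
        for label, likelihood in options:
            children.append((assoc + [label], likelihood * p_parent))
    c = sum(p for _, p in children)            # normalization constant
    children = [(a, p / c) for a, p in children]
    return heapq.nlargest(keep, children, key=lambda h: h[1])

# each scan: the measurement may extend track T1, start a track, or be clutter
options = [("z->T1", 0.7), ("z->new", 0.2), ("z->clutter", 0.1)]
hyps = [([], 1.0)]
for _ in range(2):
    hyps = extend_hypotheses(hyps, options, keep=3)
```

Without the `keep` cutoff the leaf count would be 3^k after k scans, which is exactly the exponential growth that makes hypothesis management the key practical challenge.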

3.5. Distributed Joint Probabilistic Data Association. The distributed version of the joint probabilistic data association (JPDA-D) was presented by Chang et al. []. In this technique, the estimated state of the target (using two sensors) after being associated is given by

$$\hat{x}(k \mid Z^{k,1}, Z^{k,2}) = \sum_{j_1=0}^{m_1} \sum_{j_2=0}^{m_2} \hat{x}(k \mid \theta_{j_1}^1, \theta_{j_2}^2, Z^{k,1}, Z^{k,2})\, P(\theta_{j_1}^1, \theta_{j_2}^2 \mid Z^{k,1}, Z^{k,2}),$$

where $Z^i$, $i = 1, 2$, is the last set of measurements of sensor $i$; $Z^{k,i}$, $i = 1, 2$, is the set of accumulated data, and $\theta_j^i$ is the association hypothesis. The first term on the right side of the equation is calculated from the associations that were made earlier. The second term is computed from the individual association probabilities as follows:

$$P(\theta_{j_1}^1, \theta_{j_2}^2 \mid Z^{k,1}, Z^{k,2}) = \sum_{\lambda^1} \sum_{\lambda^2} P(\lambda^1, \lambda^2 \mid Z^{k,1}, Z^{k,2})\, \omega_{j_1}^1(\lambda^1)\, \omega_{j_2}^2(\lambda^2),$$

$$P(\lambda^1, \lambda^2 \mid Z^{k,1}, Z^{k,2}) = \frac{1}{c}\, P(\lambda^1 \mid Z^{k,1})\, P(\lambda^2 \mid Z^{k,2})\, \gamma(\lambda^1, \lambda^2),$$

where $\lambda^i$ are the joint hypotheses involving all of the measurements and all of the targets, and $\omega_j^i(\lambda^i)$ are the binary indicators of the measurement-target association. The additional term $\gamma(\lambda^1, \lambda^2)$ depends on the correlation of the individual hypotheses and reflects the localization influence of the current measurements on the joint hypotheses.

These equations are obtained assuming that communication exists after every observation; there are only approximations for the case in which communication is sporadic and when a substantial amount of noise occurs. Therefore, this algorithm is a theoretical model that has some limitations in practical applications.

3.6. Distributed Multiple Hypothesis Test. The distributed version of the MHT algorithm (MHT-D) [,] follows a similar structure to the JPDA-D algorithm. Let us assume the case in which one node must fuse two sets of hypotheses and tracks. The hypothesis and track sets are represented by $\Lambda^i(k)$ and $T^i(k)$ with $i = 1, 2$; the hypothesis probabilities are represented by $P_{\lambda^i}$; and the state distribution of the tracks $\tau^i$ is represented by $P(\tau^i)$ and $p(x \mid \tau^i, Z^i)$; then, the maximum available information in the fusion node is $Z = Z^1 \cup Z^2$. The data fusion objective of the MHT-D is to obtain the set of hypotheses $\Lambda(k)$, the set of tracks $T(k)$, the hypothesis probabilities $P(\lambda \mid Z)$, and the state distribution $p(x \mid \tau, Z)$ for the observed data.

The MHT-D algorithm is composed of the following steps:

(1) hypothesis formation: for each hypothesis pair $\lambda^1$ and $\lambda^2$ that could be fused, a track $\tau$ is formed by associating the pair of tracks $\tau^1$ and $\tau^2$, where each track comes from one node and could originate from the same target. The final result of this stage is a set of hypotheses denoted by $\Lambda(k)$ and the fused tracks $T(k)$;

(2) hypothesis evaluation: in this stage, the association probability of each hypothesis and the estimated state of each fused track are obtained. The distributed estimation algorithm is employed to calculate the likelihood of the possible associations and the obtained estimations for each specific association. Using the information model, the probability of each fused hypothesis is given by

$$P(\lambda \mid Z) = c^{-1} \prod_{\tau \in T(\lambda)} L(\tau \mid \lambda) \prod_{i=1,2} P(\lambda^i \mid Z^i),$$

where $c$ is a normalizing constant and $L(\tau \mid \lambda)$ is the likelihood of each hypothesis pair.

The main disadvantage of the MHT-D is its high computational cost, which is on the order of $O(M^N)$, where $M$ is the number of possible associations and $N$ is the number of variables to be estimated.

3.7. Graphical Models. Graphical models are a formalism for representing and reasoning with probabilities and independence. A graphical model represents a conditional decomposition of the joint probability. A graphical model can be represented as a graph in which the nodes denote random variables; the edges denote the possible dependence between the random variables, and the plates denote the replication of a substructure, with the appropriate indexing of the relevant variables. The graph captures the joint distribution over the random variables, which can be decomposed into a product of factors that each depend on only a subset of the variables. There are two major classes of graphical models: (i) Bayesian networks [], which are also known as directed graphical models, and (ii) Markov random fields, which are also known as undirected graphical models. Directed graphical models are useful for expressing causal relationships between random variables, whereas undirected models are better suited for expressing soft constraints between random variables. We refer the reader to the book by Koller and Friedman [] for more information on graphical models.

A framework based on graphical models can solve the problem of distributed data association in synchronized sensor networks with overlapping areas in which each sensor receives noisy measurements; this solution was proposed by Chen et al. [,]. Their work is based on graphical models that are used to represent the statistical dependence between random variables. The data association problem is treated as an inference problem and solved by using the max-product algorithm []. Graphical models represent statistical dependencies between variables as graphs, and the max-product algorithm converges when the graph is a tree structure. Moreover, the employed algorithm can be implemented in a distributed manner by exchanging messages between the source nodes in parallel. With this algorithm, if each sensor has $M$ possible combinations of associations and there are $N$ variables to be estimated, the complexity is $O(NM^2)$, which is reasonable and less than the $O(M^N)$ complexity of the MHT-D algorithm. However, special attention must be paid to the correlated variables when building the graphical model.
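As a minimal illustration of max-product message passing, consider two binary variables on a two-node tree (all potential values below are invented): each node sends the other a message holding, for every value of the receiver, the best compatible configuration of the sender, and each node's belief is its unary potential times the incoming message:

```python
import numpy as np

phi1 = np.array([0.9, 0.1])          # unary potential of x1
phi2 = np.array([0.4, 0.6])          # unary potential of x2
psi = np.array([[0.8, 0.2],
                [0.2, 0.8]])         # pairwise potential, favors x1 == x2

# message x2 -> x1: for each value of x1, max over x2 of psi * phi2
m21 = (psi * phi2[None, :]).max(axis=1)
# message x1 -> x2: for each value of x2, max over x1 of psi * phi1
m12 = (psi * phi1[:, None]).max(axis=0)

x1_star = int(np.argmax(phi1 * m21))   # max-marginal argmax for x1
x2_star = int(np.argmax(phi2 * m12))

# brute-force MAP over the joint distribution, for comparison
joint = phi1[:, None] * phi2[None, :] * psi
map_config = np.unravel_index(np.argmax(joint), joint.shape)
```

On a tree the message-passing answer matches the exhaustive MAP search, which is the convergence property exploited by the distributed data association framework above.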

4. State Estimation Methods

State estimation techniques aim to determine the state of the target under movement (typically the position) given the observations or measurements. State estimation techniques are also known as tracking techniques. In their general form, it is not guaranteed that the target observations are relevant, which means that some of the observations could actually come from the target and others could be only noise. The state estimation phase is a common stage in data fusion algorithms because the target's observations could come from different sensors or sources, and the final goal is to obtain a global target state from the observations.

The estimation problem involves finding the values of the vector state (e.g., position, velocity, and size) that fit as much as possible with the observed data. From a mathematical perspective, we have a set of redundant observations, and the goal is to find the set of parameters that provides the best fit to the observed data. In general, these observations are corrupted by errors and by the propagation of noise in the measurement process. State estimation methods fall under level 1 of the JDL classification and can be divided into two broader groups:

(1) linear dynamics and measurements: here, the estimation problem has a standard solution. Specifically, when the equations of the object state and the measurements are linear, the noise follows the Gaussian distribution, and the environment is not cluttered, the optimal theoretical solution is based on the Kalman filter;

(2) nonlinear dynamics: the state estimation problem becomes difficult, and there is no analytical solution to the problem in a general manner. In principle, there are no practical algorithms available to solve this problem satisfactorily.

Most of the state estimation methods are based on control theory and employ the laws of probability to compute a vector state from a vector measurement or a stream of vector measurements. Next, the most common estimation methods are presented, including maximum likelihood and maximum posterior (Section 4.1), the Kalman filter (Section 4.2), the particle filter (Section 4.3), the distributed Kalman filter (Section 4.4), the distributed particle filter (Section 4.5), and covariance consistency methods (Section 4.6).

4.1. Maximum Likelihood and Maximum Posterior. The maximum likelihood (ML) technique is an estimation method that is based on probabilistic theory. Probabilistic estimation methods are appropriate when the state variable follows an unknown probability distribution []. In the context of data fusion, $x$ is the state that is being estimated, and $z = (z(1), \dots, z(k))$ is a sequence of $k$ previous observations of $x$. The likelihood function $\lambda(x)$ is defined as a probability density function of the sequence of observations given the true value of the state $x$. Consider

$$\lambda(x) = p(z \mid x).$$

The ML estimator finds the value of $x$ that maximizes the likelihood function:

$$\hat{x}(k) = \arg\max_x\, p(z \mid x),$$

which can be obtained from the analytical or empirical models of the sensors. This function expresses the probability of the observed data. The main disadvantage of this method in practice is that it requires the analytical or empirical model of the sensor to be known in order to provide the prior distribution and compute the likelihood function. This method can also systematically underestimate the variance of the distribution, which leads to a bias problem. However, the bias of the ML solution becomes less significant as the number of data points increases, and the estimate equals the true variance of the distribution that generated the data in the limit of infinitely many observations.

The maximum posterior (MAP) method is based on Bayesian theory. It is employed when the parameter $x$ to be estimated is the output of a random variable that has a known probability density function $p(x)$. In the context of data fusion, $x$ is the state that is being estimated and $z = (z(1), \dots, z(k))$ is a sequence of $k$ previous observations of $x$. The MAP estimator finds the value of $x$ that maximizes the posterior probability distribution as follows:

$$\hat{x}(k) = \arg\max_x\, p(x \mid z).$$

Both methods (ML and MAP) aim to find the most likely value for the state $x$. However, ML assumes that $x$ is a fixed but unknown point of the parameter space, whereas MAP considers $x$ to be the output of a random variable with a known a priori probability density function. Both of these methods are equivalent when there is no a priori information about $x$, that is, when there are only observations.
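For Gaussian noise both estimators have closed forms, which makes the relationship easy to see in code (all numbers below are invented for illustration): the ML estimate of a constant state is the sample mean, while the MAP estimate with a Gaussian prior is a precision-weighted blend of the prior mean and the data that collapses to the ML solution as observations accumulate:

```python
import numpy as np

rng = np.random.default_rng(0)
true_x = 2.0
sigma = 0.5                                   # known measurement noise
z = true_x + sigma * rng.standard_normal(100)

# ML: argmax_x p(z | x) under Gaussian noise is the sample mean
x_ml = z.mean()

# MAP: with prior x ~ N(mu0, sigma0^2), the posterior mode weights the
# prior and the data by their precisions (inverse variances)
mu0, sigma0 = 0.0, 1.0
n = len(z)
x_map = (mu0 / sigma0**2 + z.sum() / sigma**2) / (1 / sigma0**2 + n / sigma**2)
```

With 100 observations the data precision dominates the prior, so `x_map` sits very close to `x_ml`; with only a handful of observations it would be pulled noticeably toward `mu0`.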

4.2. The Kalman Filter. The Kalman filter is the most popular estimation technique. It was originally proposed by Kalman [] and has been widely studied and applied since then. The Kalman filter estimates the state $x$ of a discrete-time process governed by the following space-time model:

$$x(k+1) = \Phi(k)\,x(k) + G(k)\,u(k) + w(k),$$

with the observations or measurements at time $k$ of the state represented by

$$z(k) = H(k)\,x(k) + v(k),$$

where $\Phi(k)$ is the state transition matrix, $G(k)$ is the input transition matrix, $u(k)$ is the input vector, $H(k)$ is the measurement matrix, and $w$ and $v$ are random Gaussian variables with zero mean and covariance matrices $Q(k)$ and $R(k)$, respectively. Based on the measurements and on the system parameters, the estimation of $x(k)$, which is represented by $\hat{x}(k \mid k)$, and the prediction of $x(k+1)$, which is represented by $\hat{x}(k+1 \mid k)$, are given by

$$\hat{x}(k \mid k) = \hat{x}(k \mid k-1) + K(k)\,[z(k) - H(k)\,\hat{x}(k \mid k-1)],$$

$$\hat{x}(k+1 \mid k) = \Phi(k)\,\hat{x}(k \mid k) + G(k)\,u(k),$$

respectively, where $K$ is the filter gain determined by

$$K(k) = P(k \mid k-1)\,H^T(k)\,\big[H(k)\,P(k \mid k-1)\,H^T(k) + R(k)\big]^{-1},$$

where $P(k \mid k-1)$ is the prediction covariance matrix and can be determined by

$$P(k+1 \mid k) = \Phi(k)\,P(k \mid k)\,\Phi^T(k) + Q(k),$$

with

$$P(k \mid k) = P(k \mid k-1) - K(k)\,H(k)\,P(k \mid k-1).$$

The Kalman filter is mainly employed to fuse low-level data. If the system can be described by a linear model and the error can be modeled as Gaussian noise, then the recursive Kalman filter obtains statistically optimal estimations []. However, other methods are required to address nonlinear dynamic models and nonlinear measurements. The modified Kalman filter known as the extended Kalman filter (EKF) is a widely used approach for implementing nonlinear recursive filters []. The EKF is one of the most often employed methods for fusing data in robotic applications. However, it has some disadvantages because the computations of the Jacobians are extremely expensive. Some attempts have been made to reduce the computational cost, such as linearization, but these attempts introduce errors in the filter and can make it unstable.

The unscented Kalman filter (UKF) [] has gained popularity because it does not have the linearization step and the associated errors of the EKF []. The UKF employs a deterministic sampling strategy to establish a minimum set of points around the mean. This set of points captures the true mean and covariance completely. Then, these points are propagated through the nonlinear functions, and the covariance of the estimations can be recovered. Another advantage of the UKF is its suitability for parallel implementations.
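The prediction and update equations above translate directly into code. A minimal sketch for a 1D constant-velocity target follows (all noise levels are invented, and the control term $G(k)u(k)$ is dropped since there is no input):

```python
import numpy as np

dt = 1.0
Phi = np.array([[1.0, dt], [0.0, 1.0]])   # state transition (position, velocity)
H = np.array([[1.0, 0.0]])                # only position is observed
Q = 0.01 * np.eye(2)                      # process noise covariance
R = np.array([[0.25]])                    # measurement noise covariance

x = np.array([0.0, 0.0])                  # initial state estimate
P = np.eye(2)                             # initial covariance

rng = np.random.default_rng(1)
for k in range(20):
    z = np.array([float(k) + 0.5 * rng.standard_normal()])  # noisy position
    # update: K(k) = P H^T [H P H^T + R]^-1, then correct with the innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = P - K @ H @ P
    # predict: x(k+1|k) = Phi x(k|k), P(k+1|k) = Phi P Phi^T + Q
    x = Phi @ x
    P = Phi @ P @ Phi.T + Q
```

After twenty scans the filter has learned both the position (about 20 at the next prediction) and the unobserved velocity (about 1) from position-only measurements, which is the redundancy-exploiting behavior that makes it useful for low-level fusion.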

4.3. Particle Filter. Particle filters are recursive implementations of sequential Monte Carlo methods []. This method builds the posterior density function using several random samples called particles. Particles are propagated over time with a combination of sampling and resampling steps. At each iteration, the resampling step is employed to discard some particles, increasing the relevance of regions with a higher posterior probability. In the filtering process, several particles of the same state variable are employed, and each particle has an associated weight that indicates the quality of the particle. Therefore, the estimation is the result of a weighted sum of all of the particles. The standard particle filter algorithm has two phases: (1) the predicting phase and (2) the updating phase. In the predicting phase, each particle is modified according to the existing model, including the addition of random noise to simulate the noise effect. Then, in the updating phase, the weight of each particle is reevaluated using the last available sensor observation, and particles with lower weights are removed. Specifically, a generic particle filter comprises the following steps.

(1) Initialization of the particles:

(i) let $N$ be equal to the number of particles;

(ii) $x^{(i)}(1) = [z_x(1), z_y(1), 0, 0]$ for $i = 1, \dots, N$.

(2) Prediction step:

(i) for each particle $i = 1, \dots, N$, evaluate the state $x(k+1 \mid k)$ of the system using the state at time instant $k$ with the noise of the system at time $k$. Consider

$$x^{(i)}(k+1 \mid k) = F(k)\,x^{(i)}(k) + \text{cauchy-distribution-noise}(k),$$

where $F(k)$ is the transition matrix of the system.

(3) Evaluate the particle weights. For each particle $i = 1, \dots, N$:

(i) compute the predicted observation of the system using the current predicted state and the noise at instant $k$. Consider

$$\hat{z}^{(i)}(k+1 \mid k) = H(k+1)\,x^{(i)}(k+1 \mid k) + \text{gaussian-measurement-noise}(k+1);$$

(ii) compute the likelihoods (weights) according to the given distribution. Consider

$$\text{likelihood}(i) = \mathcal{N}\big(\hat{z}^{(i)}(k+1 \mid k);\ z(k+1),\ \text{var}\big);$$

(iii) normalize the weights as follows:

$$w^{(i)} = \frac{\text{likelihood}(i)}{\sum_{j=1}^{N} \text{likelihood}(j)}.$$

(4) Resampling/selection: multiply particles with higher weights and remove those with lower weights. The current state must be adjusted using the computed weights of the new particles.

(i) Compute the cumulative weights. Consider

$$\text{CumWt}(i) = \sum_{j=1}^{i} w^{(j)}.$$

(ii) Generate uniformly distributed random variables $u^{(i)} \sim U(0, 1)$, with the number of draws equal to the number of particles.

(iii) Determine which particles should be multiplied and which ones removed.

(5) Propagation phase:

(i) incorporate the new values of the state after the resampling at instant $k$ to calculate the value at instant $k+1$. Consider

$$x^{(1:N)}(k+1 \mid k+1) = \tilde{x}^{(1:N)}(k+1 \mid k);$$

(ii) compute the posterior mean. Consider

$$\hat{x}(k+1) = \text{mean}\big(x^{(i)}(k+1 \mid k+1)\big), \quad i = 1, \dots, N;$$

(iii) repeat steps (2) to (5) for each time instant.

Particle filters are more flexible than Kalman filters and can cope with nonlinear dependencies and non-Gaussian densities in the dynamic model and in the noise error. However, they have some disadvantages. A large number of particles is required to obtain a small variance in the estimator. It is also difficult to establish the optimal number of particles in advance, and the number of particles affects the computational cost significantly. Earlier versions of particle filters employed a fixed number of particles, but recent studies have started to use a dynamic number of particles [].
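The five steps above can be sketched as a bootstrap particle filter for a 1D target moving one unit per step. All noise levels are invented, and Gaussian system noise is substituted for the Cauchy noise of the listing to keep the example short:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 2000                                  # number of particles
sigma_sys, sigma_meas = 0.3, 0.5

particles = rng.normal(0.0, 1.0, N)       # (1) initialization around x = 0
true_x = 0.0
estimates = []
for _ in range(30):
    true_x += 1.0                         # true target moves +1 per step
    z = true_x + rng.normal(0.0, sigma_meas)
    # (2) prediction: propagate each particle through the motion model + noise
    particles = particles + 1.0 + rng.normal(0.0, sigma_sys, N)
    # (3) weights: Gaussian likelihood of the measurement, then normalize
    w = np.exp(-0.5 * ((z - particles) / sigma_meas) ** 2)
    w /= w.sum()
    # (4) resampling: multiply high-weight particles, drop low-weight ones
    idx = rng.choice(N, size=N, p=w)
    particles = particles[idx]
    # (5) propagation/estimate: posterior mean of the resampled particles
    estimates.append(particles.mean())
```

`rng.choice` with probabilities `w` implements the cumulative-weight selection of step (4); swapping the Gaussian prediction noise for a heavier-tailed distribution is a one-line change, which is exactly the flexibility the text attributes to particle filters.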

4.4. The Distributed Kalman Filter. The distributed Kalman filter requires correct clock synchronization between each source, as demonstrated in []. In other words, to correctly use the distributed Kalman filter, the clocks of all of the sources must be synchronized. This synchronization is typically achieved by using protocols that employ a shared global clock, such as the network time protocol (NTP). Synchronization problems between clocks have been shown to affect the accuracy of the Kalman filter, producing inaccurate estimations [].

If the estimations are consistent and the cross covariance is known (or the estimations are uncorrelated), then it is possible to use distributed Kalman filters []. However, the cross covariance must be determined exactly, or the observations must be consistent.

We refer the reader to Liggins II et al. [] for more details about the Kalman filter in a distributed and hierarchical architecture.

4.5. Distributed Particle Filter. Distributed particle filters have gained attention recently [–]. Coates [] used a distributed particle filter to monitor an environment that could be captured by a Markovian state-space model, involving nonlinear dynamics and observations and non-Gaussian noise.

In contrast, earlier attempts to solve out-of-sequence measurements using particle filters are based on regenerating the probability density function at the time instant of the out-of-sequence measurement []. In a particle filter, this step requires a large computational cost, in addition to the space necessary to store the previous particles. To avoid this problem, Orton and Marrs [] proposed storing the information on the particles at each time instant, saving the cost of recalculating this information. This technique is close to optimal, and when the delay increases, the result is only slightly affected []. However, it requires a very large amount of space to store the state of the particles at each time instant.

4.6. Covariance Consistency Methods: Covariance Intersection/Union. Covariance consistency methods (intersection and union) were proposed by Uhlmann [] and are general and fault-tolerant frameworks for maintaining covariance means and estimations in a distributed network. These methods are not estimation techniques as such; instead, they resemble an estimation fusion technique. The distributed Kalman filter requirement of independent measurements or known cross-covariances is not a constraint with this method.

4.6.1. Covariance Intersection. If the Kalman filter is employed to combine two estimations, $(a_1, A_1)$ and $(a_2, A_2)$, then it is assumed that the joint covariance has the following form:

$$\begin{pmatrix} A_1 & A_{12} \\ A_{12}^T & A_2 \end{pmatrix},$$

where the cross-covariance $A_{12}$ should be known exactly so that the Kalman filter can be applied without difficulty. Because the computation of the cross-covariances is computationally intensive, Uhlmann [] proposed the covariance intersection (CI) algorithm.

Let us assume that a joint covariance can be defined with the diagonal blocks $\tilde{A}_1 \geq A_1$ and $\tilde{A}_2 \geq A_2$. Consider

$$\begin{pmatrix} \tilde{A}_1 & A_{12} \\ A_{12}^T & \tilde{A}_2 \end{pmatrix}$$

for every possible instance of the unknown cross-covariance $A_{12}$; then, the components of this matrix can be employed in the Kalman filter equations to provide a fused estimation $(c, C)$ that is considered consistent. The key point of this method relies on generating a joint covariance matrix that can represent a useful fused estimation (in this context, useful refers to something with a lower associated uncertainty). In summary, the CI algorithm computes the joint covariance matrix for which the Kalman filter provides the best fused estimation $(c, C)$ with respect to a fixed measure of the covariance matrix (e.g., the minimum determinant).

Specific covariance criteria must be established because there is no unique minimum joint covariance in the order of the positive semidefinite matrices. Moreover, the joint covariance is the basis of the formal analysis of the CI algorithm; the actual result is a nonlinear mixture of the information stored in the estimations being fused, following these equations:

$$C^{-1} = \omega_1 H_1^T A_1^{-1} H_1 + \omega_2 H_2^T A_2^{-1} H_2 + \cdots + \omega_n H_n^T A_n^{-1} H_n,$$

$$C^{-1} c = \omega_1 H_1^T A_1^{-1} a_1 + \omega_2 H_2^T A_2^{-1} a_2 + \cdots + \omega_n H_n^T A_n^{-1} a_n,$$

where $H_i$ is the transformation of the fused state-space estimation to the space of the estimated state $a_i$. The values of $\omega_i$ can be calculated to minimize the covariance determinant using convex optimization packages and semidefinite matrix programming. The result of the CI algorithm has different characteristics compared to the Kalman filter. For example, if two estimations $(a, A)$ and $(b, B)$ are provided and their covariances are equal, $A = B$, then, since the Kalman filter is based on the statistical independence assumption, it produces a fused estimation with covariance $C = (1/2)A$. In contrast, the CI method does not assume independence and, thus, must be consistent even in the case in which the estimations are completely correlated, yielding the estimated fused covariance $C = A$. In the case of estimations where $A < B$, the CI algorithm does not gain information from the estimation $(b, B)$; thus, the fused result is $(a, A)$.

Every jointly consistent covariance is sufficient to produce a fused estimation that guarantees consistency. However, it is also necessary to guarantee a lack of divergence. Divergence is avoided in the CI algorithm by choosing a specific measure (e.g., the determinant), which is minimized in each fusion operation. This measure represents a nondivergence criterion, because the size of the estimated covariance according to this criterion will not increase.

The application of the CI method guarantees consistency and nondivergence for every sequence of mean- and covariance-consistent estimations. However, this method does not work well when the measurements to be fused are inconsistent.
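A sketch of the two-estimate case follows; the grid search over $\omega$ is a simple stand-in for the convex optimization packages mentioned above, and $H_i = I$ is assumed since both estimates live directly in the state space (the example values are invented):

```python
import numpy as np

def covariance_intersection(a1, A1, a2, A2, n_grid=101):
    """Fuse (a1, A1) and (a2, A2) without knowing their cross-covariance:
    C^-1 = w A1^-1 + (1-w) A2^-1, choosing w to minimize det(C)."""
    I1, I2 = np.linalg.inv(A1), np.linalg.inv(A2)
    best = None
    for w in np.linspace(0.0, 1.0, n_grid):
        Cinv = w * I1 + (1 - w) * I2
        C = np.linalg.inv(Cinv)
        if best is None or np.linalg.det(C) < best[0]:
            c = C @ (w * I1 @ a1 + (1 - w) * I2 @ a2)
            best = (np.linalg.det(C), c, C)
    return best[1], best[2]

# complementary estimates: each is precise along a different axis
a1, A1 = np.zeros(2), np.diag([4.0, 1.0])
a2, A2 = np.ones(2), np.diag([1.0, 4.0])
c, C = covariance_intersection(a1, A1, a2, A2)

# equal covariances: CI must return the common covariance, not half of it
c_eq, C_eq = covariance_intersection(np.zeros(2), np.eye(2),
                                     np.ones(2), np.eye(2))
```

The first case shows the determinant shrinking below either input; the second reproduces the property discussed above, where CI keeps $C = A$ for equal covariances instead of the Kalman filter's $C = (1/2)A$.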

4.6.2. Covariance Union. CI solves the problem of correlated inputs but not the problem of inconsistent inputs (inconsistent inputs refer to different estimations, each of which has a high accuracy (small variance) but also a large difference from the states of the others); thus, the covariance union (CU) algorithm was proposed to solve the latter []. CU addresses the following problem: two estimations $(a_1, A_1)$ and $(a_2, A_2)$ relate to the state of an object and are mutually inconsistent with one another. This issue arises when the difference between the average estimations is larger than the provided covariance. Inconsistent inputs can be detected using the Mahalanobis distance [] between them, which is defined as

$$d = (a_1 - a_2)^T (A_1 + A_2)^{-1} (a_1 - a_2),$$

and detecting whether this distance is larger than a given threshold.
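This consistency test takes only a few lines; the covariance values below are invented, and the same 3-unit disagreement is flagged when the covariances are small but tolerated when they are large:

```python
import numpy as np

def mahalanobis(a1, A1, a2, A2):
    """Mahalanobis distance between two estimates (as defined above),
    used to flag mutually inconsistent inputs before applying CU."""
    d = a1 - a2
    return float(d @ np.linalg.inv(A1 + A2) @ d)

# two precise estimates that disagree strongly -> large distance
d_far = mahalanobis(np.array([0.0]), np.array([[0.1]]),
                    np.array([3.0]), np.array([[0.1]]))
# same disagreement under large covariances -> small distance
d_ok = mahalanobis(np.array([0.0]), np.array([[10.0]]),
                   np.array([3.0]), np.array([[10.0]]))
```

Only `d_far` would exceed a typical threshold, marking that pair as a candidate for the covariance union treatment rather than ordinary fusion.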

The Mahalanobis distance accounts for the covariance information to obtain the distance. If the difference between the estimations is high but their covariance is also high, the Mahalanobis distance yields a small value. In contrast, if the difference between the