ISSN 1064-2269, Journal of Communications Technology and Electronics, 2015, Vol. 60, No. 12, pp. 1348–1355. © Pleiades Publishing, Inc., 2015.
Original Russian Text © E.V. Burnaev, A.A. Zaytsev, 2015, published in Informatsionnye Protsessy, 2015, Vol. 15, No. 1, pp. 97–109.

MATHEMATICAL MODELS AND COMPUTATIONAL METHODS

Surrogate Modeling of Multifidelity Data for Large Samples

E. V. Burnaev^{a, b, c} and A. A. Zaytsev^{a, b}

^a Kharkevich Institute for Information Transmission Problems (IITP), Russian Academy of Sciences, Bol'shoi Karetnyi per. 19, str. 1, Moscow, 127051 Russia
^b OOO Datadvance, Pokrovskii bul'v. 3, str. 1B, Moscow, 109028 Russia
^c Laboratory of Predictive Modeling and Data Analysis, Moscow Institute of Physics and Technology (State University), Institutskii per. 9, Dolgoprudnyi, Moscow oblast, 141700 Russia
e-mail: burnaev@iitp.ru

Received March 27, 2015

Abstract—The problem of construction of a surrogate model based on available low- and high-fidelity data is considered. The low-fidelity data can be obtained, e.g., by performing computer simulations, and the high-fidelity data can be obtained by performing experiments in a wind tunnel. A regression model based on Gaussian processes proves to be convenient for modeling variable-fidelity data. Using this model, one can efficiently reconstruct nonlinear dependences and estimate the prediction accuracy at a specified point. However, if the sample size exceeds several thousand points, direct use of Gaussian process regression becomes impossible due to the high computational complexity of the algorithm. We develop new algorithms for processing multifidelity data based on the Gaussian process model, which are efficient even for large samples. We illustrate application of the developed algorithms by constructing surrogate models of a complex engineering system.

Keywords: multifidelity data, uncertainty estimate, Gaussian processes, covariance matrix approximation, cokriging

DOI: 10.1134/S1064226915120037
1. INTRODUCTION

Very often in engineering practice we have a sample of high fidelity function values and additionally a larger sample of low fidelity function values (being some approximation of the high fidelity function) [1]. Therefore the problem of multifidelity data modeling arises: we have to construct a surrogate model of the high fidelity function using both of these samples.

For example, the high fidelity function represents an aircraft wing lift coefficient measured in a wind tunnel, and the low fidelity function represents an aircraft wing lift coefficient calculated using computer simulations. Experiments in the wind tunnel allow one to obtain exact lift coefficient values, but these experiments are time and resource intensive in contrast to virtual computer simulation experiments. Therefore, when constructing a surrogate model of the dependence of the lift coefficient on the geometrical parameters of the aircraft wing, we have to use a small sample of high fidelity function values obtained from the wind tunnel experiments and a large sample of less accurate computer simulation data.

For modeling multifidelity data, the Gaussian process regression model [1–3] proves to be efficient. Using this model one can efficiently reconstruct nonlinear dependences and evaluate the prediction accuracy at specified points [4–6].

The maximal sample size that can be used for the construction of Gaussian process regression is limited by several thousands of points, since when estimating parameters it is necessary to invert a sample covariance matrix [7]. Therefore, if there is a large sample of equal fidelity data, a special approximation of the initial covariance matrix is used to construct the Gaussian process regression [7–9] (the Nystrom approximation [10]). This approximation is based on some subset of base points and allows one to substantially reduce the computational complexity. However, up to now there are no approaches for the construction of Gaussian process regression using large (more than several thousands of points) multifidelity data samples. At the same time, large multifidelity data samples often arise in engineering practice. For example, since the computational cost of the low fidelity function is usually very low compared to the computational cost of the high fidelity function [4], the low fidelity data sample can have a very large size.

In this paper we develop an approach to Gaussian process regression for large multifidelity data samples. The basic idea is to use a subset of the initial multifidelity sample for the approximation of the sample covariance matrix [9]. We obtain estimates of the posterior regression mean, used as a prediction of the high fidelity function value, of the posterior variance, used for the prediction uncertainty quantification, and of the computational complexities of the proposed algorithms.

This paper contains the following sections: Gaussian process regression is described in Section 2; Section 3 shows how one should use Gaussian process regression for modeling multifidelity data; the proposed algorithm for constructing the surrogate model based on large multifidelity data samples is described in Section 4; Section 5 contains the results of experiments with artificial and real data; and conclusions are presented in Section 6.
2. REGRESSION BASED ON GAUSSIAN PROCESSES

Let us consider the learning sample D = (X, y) = \{(x_i, y_i = y(x_i))\}_{i=1}^{n}, where the points x_i \in \mathbb{R}^d and the function values y(x) \in \mathbb{R}. We assume that

y(x) = f(x) + \varepsilon,

where the function f(x) is a realization of a Gaussian process and \varepsilon is Gaussian white noise with variance \sigma^2. It is necessary to construct a surrogate model for the objective function f(x).

The mean value and the covariance function of the Gaussian process,

k(x, x') = \mathrm{cov}(f(x), f(x')) = \mathbb{E}\bigl[(f(x) - \mathbb{E} f(x))(f(x') - \mathbb{E} f(x'))\bigr],

completely determine the Gaussian process f(x). To simplify notation we assume that its mean value is equal to zero. We assume that the covariance function belongs to the parametric family \{k_\theta(x, x'), \theta \in \Theta \subset \mathbb{R}^p\}, i.e., k(x, x') = k_\theta(x, x') for some \theta \in \Theta. Then y(x) is a Gaussian process with zero mean and covariance function \mathrm{cov}(y(x), y(x')) = k_\theta(x, x') + \sigma^2 \delta(x - x'), where \delta(x - x') is the delta function. A widely used class of covariance functions is, e.g., the squared exponential covariance function [3]

k_\theta(x, x') = \theta_0^2 \exp\Bigl( -\sum_{k=1}^{d} \theta_k^2 (x_k - x_k')^2 \Bigr).

The parameters \theta of the covariance function and the noise variance \sigma^2 specify the regression model. We use the maximum likelihood method for the estimation of \theta and \sigma^2 [3]:

\log p(y \mid X, \theta, \sigma^2) = -\tfrac{1}{2}\bigl( n \log 2\pi + \log |K| + y^T K^{-1} y \bigr) \to \max_{\theta, \sigma^2},    (1)

where K = \{k_\theta(x_i, x_j) + \sigma^2 \delta(x_i - x_j)\}_{i,j=1}^{n} is the matrix of covariances between the function values y(X) from the learning sample and |K| is the determinant of the matrix K. In Gaussian process regression \sigma^2 plays the role of a regularization parameter for the covariance matrix of the values of f(X).

Theoretical results obtained in [11] and more applied studies [12] show that the obtained parameter estimates are accurate even if the size of the sample is small and the model determined by the covariance function is incorrectly specified.

Using the estimates \hat\theta and \hat\sigma^2 of the parameters \theta and \sigma^2, it is possible to calculate the posterior mean and the posterior variance of y(x) at a new point, which are used for predicting the function value and evaluating the prediction uncertainty, respectively. The posterior mean \mathbb{E}(y(X^*) \mid y(X)) at the new points X^* = \{x_i^*\}_{i=1}^{n^*} can be written as

\hat y(X^*) = K(X^*, X) K^{-1} y,    (2)

where K(X^*, X) = \{k(x_i^*, x_j)\}_{i=1,\dots,n^*,\ j=1,\dots,n} are the covariances between y(X^*) at the new points and the values y(X) at the points from the learning sample. The posterior covariance matrix \mathbb{V}(X^*) = \mathbb{E}\bigl[ (y(X^*) - \mathbb{E} y(X^*))^T (y(X^*) - \mathbb{E} y(X^*)) \mid y(X) \bigr] at the new points can be written as

\mathbb{V}(X^*) = K(X^*, X^*) - K(X^*, X) K^{-1} (K(X^*, X))^T,    (3)

where K(X^*, X^*) = \{k(x_i^*, x_j^*) + \sigma^2 \delta(x_i^* - x_j^*)\}_{i,j=1}^{n^*} is the matrix of covariances between the values y(X^*).
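To make the above procedure concrete, the following is a minimal NumPy/SciPy sketch of Gaussian process regression with the squared exponential covariance function: the negative of the log likelihood (1) is minimized over (\theta, \sigma), and predictions are then computed with (2) and (3). The function and variable names here are illustrative only and are not taken from the paper or from any particular library.

    import numpy as np
    from scipy.optimize import minimize

    def sq_exp_cov(A, B, theta):
        # k_theta(x, x') = theta_0^2 * exp(-sum_k theta_k^2 * (x_k - x'_k)^2)
        diff = A[:, None, :] - B[None, :, :]
        return theta[0] ** 2 * np.exp(-np.sum((theta[1:] ** 2) * diff ** 2, axis=2))

    def neg_log_likelihood(params, X, y):
        # params = (theta_0, ..., theta_d, sigma); negative of Eq. (1)
        theta, sigma = params[:-1], params[-1]
        n = X.shape[0]
        K = sq_exp_cov(X, X, theta) + (sigma ** 2 + 1e-10) * np.eye(n)  # small jitter
        L = np.linalg.cholesky(K)                      # K = L L^T
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        return 0.5 * (n * np.log(2 * np.pi) + 2 * np.sum(np.log(np.diag(L))) + y @ alpha)

    def fit_predict(X, y, X_new):
        d = X.shape[1]
        x0 = np.ones(d + 2)                            # initial guess for (theta, sigma)
        res = minimize(neg_log_likelihood, x0, args=(X, y), method="L-BFGS-B")
        theta, sigma = res.x[:-1], res.x[-1]
        K = sq_exp_cov(X, X, theta) + sigma ** 2 * np.eye(X.shape[0])
        K_star = sq_exp_cov(X_new, X, theta)           # K(X*, X)
        mean = K_star @ np.linalg.solve(K, y)          # Eq. (2)
        K_ss = sq_exp_cov(X_new, X_new, theta) + sigma ** 2 * np.eye(X_new.shape[0])
        cov = K_ss - K_star @ np.linalg.solve(K, K_star.T)   # Eq. (3)
        return mean, cov

This direct implementation makes the bottleneck explicit: each likelihood evaluation and each prediction solves a linear system with the dense n x n matrix K, which is exactly the O(n^3) cost addressed in Section 4.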
3. REGRESSION BASED ON GAUSSIAN PROCESSES FOR MULTIFIDELITY DATA

Let us now consider the case of multifidelity data. Let the sample of low fidelity function values be specified as D_l = (X_l, y_l) = \{(x_i^l, y_l(x_i^l))\}_{i=1}^{n_l} and the sample of high fidelity function values be D_h = (X_h, y_h) = \{(x_i^h, y_h(x_i^h))\}_{i=1}^{n_h}, with x_i^l, x_i^h \in \mathbb{R}^d and y_l(x), y_h(x) \in \mathbb{R}. The low fidelity function y_l(x) and the high fidelity function y_h(x) model the same physical phenomenon but with different accuracy.

Using the samples of values of the low fidelity and the high fidelity functions, it is necessary to construct a surrogate model \hat y_h(x) \approx y_h(x) of the high fidelity function that is as accurate as possible. In addition, we want to obtain prediction uncertainty estimates for the high fidelity function at new points.

A special model is necessary for modeling multifidelity data. We use the widespread cokriging model [4]:

y_l(x) = f_l(x) + \varepsilon_l,  \qquad  y_h(x) = \rho\, y_l(x) + y_d(x),

where y_d(x) = f_d(x) + \varepsilon_d; f_l(x) and f_d(x) are independent Gaussian processes with zero means and covariance functions k_l(x, x') and k_d(x, x'), respectively; and \varepsilon_l and \varepsilon_d are Gaussian white noise with variances \sigma_l^2 and \sigma_d^2, respectively. Let us use the notations

X = \begin{pmatrix} X_l \\ X_h \end{pmatrix},  \qquad  y = \begin{pmatrix} y_l \\ y_h \end{pmatrix}.

Then the posterior mean for the high fidelity function values at the new points can be written as

\hat y_h(X^*) = K(X^*, X) K^{-1} y,    (4)

where

K(X^*, X) = \begin{pmatrix} \rho K_l(X^*, X_l) & \rho^2 K_l(X^*, X_h) + K_d(X^*, X_h) \end{pmatrix},

K = K(X, X) = \begin{pmatrix} K_l(X_l, X_l) & \rho K_l(X_l, X_h) \\ \rho K_l(X_h, X_l) & \rho^2 K_l(X_h, X_h) + K_d(X_h, X_h) \end{pmatrix},

and K_l(X_a, X_b) and K_d(X_a, X_b) are the matrices of pairwise covariances of the Gaussian processes y_l(x) and y_d(x) at the points from some sets X_a and X_b, respectively. The posterior covariance matrix can be written as

\mathbb{V}(X^*) = \rho^2 K_l(X^*, X^*) + K_d(X^*, X^*) - K(X^*, X) K^{-1} (K(X^*, X))^T.    (5)

To estimate the parameters of the covariance functions of the Gaussian processes f_l(x) and f_d(x), the following algorithm is used [1]:

1. Estimate the parameters of the covariance function k_l(x, x') using the algorithm for standard Gaussian process regression described in Section 2 and the sample D = D_l.

2. Calculate the values of the posterior mean \hat y_l(x) of the Gaussian process y_l(x) at the points x \in X_h.

3. Estimate the parameters of the Gaussian process y_d(x) with the covariance function k_d(x, x') and the parameter \rho by maximizing the likelihood (1) with D = D_{diff} = (X_h, y_d = y_h - \rho \hat y_l(X_h)) and k(x, x') = k_d(x, x').
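As an illustration of how the blocks in (4) fit together, here is a small NumPy sketch that assembles the cokriging covariance matrix and computes the posterior mean. It assumes the covariance functions k_l and k_d and the parameters (\rho, \sigma_l^2, \sigma_d^2) have already been estimated by the three-step procedure above, and it adds the observation-noise variances on the diagonal blocks; all helper names are ours, not the paper's.

    import numpy as np

    def vfgp_posterior_mean(X_l, y_l, X_h, y_h, X_new, k_l, k_d, rho,
                            sigma_l2, sigma_d2):
        """Posterior mean (4) of the cokriging model y_h = rho*y_l + y_d."""
        n_l, n_h = X_l.shape[0], X_h.shape[0]
        # Block covariance matrix K(X, X) of the joint observation vector (y_l, y_h).
        K = np.block([
            [k_l(X_l, X_l) + sigma_l2 * np.eye(n_l),  rho * k_l(X_l, X_h)],
            [rho * k_l(X_h, X_l),
             rho ** 2 * k_l(X_h, X_h) + k_d(X_h, X_h)
             + (rho ** 2 * sigma_l2 + sigma_d2) * np.eye(n_h)],
        ])
        # Cross covariances K(X*, X) between y_h(X*) and (y_l, y_h).
        K_star = np.hstack([rho * k_l(X_new, X_l),
                            rho ** 2 * k_l(X_new, X_h) + k_d(X_new, X_h)])
        y = np.concatenate([y_l, y_h])
        return K_star @ np.linalg.solve(K, y)

Here k_l and k_d would be covariance functions such as sq_exp_cov from the previous sketch, each with its own fixed parameters. The joint matrix K is of size (n_l + n_h) x (n_l + n_h), which is what makes the direct approach infeasible for large low-fidelity samples.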
4. SPARSE GAUSSIAN PROCESS REGRESSION FOR MULTIFIDELITY DATA

The efficient use of Gaussian process regression for multifidelity data is possible only for samples with a size of no more than several thousand points: when learning the model and using it for predictions, one should invert an n x n covariance matrix, where n = n_h + n_l, and the computational complexity of this procedure is O(n^3).

We propose a new approach to the construction of Gaussian process regression using large multifidelity data samples. The approach is based on the Nystrom approximation of the matrices K(X^*, X), K and K(X^*, X^*) using a subsample of base points from the initial sample. The presented results generalize the results of the paper [9] to the multifidelity data case.
Let us specify a subsample of base points

D_1 = (X_1, y_1),  \qquad  X_1 = \begin{pmatrix} X_l^1 \\ X_h^1 \end{pmatrix},  \qquad  y_1 = \begin{pmatrix} y_l(X_l^1) \\ y_h(X_h^1) \end{pmatrix},

from the initial sample, of size n_1 = n_l^1 + n_h^1, such that for this subsample the available computational resources allow one to invert the corresponding covariance matrices and to estimate the parameters of the Gaussian process in a reasonable time. A sufficiently reliable method of choosing the base point subsample is random selection without repetitions of points from the initial sample.

Let us introduce the covariance matrices of the base points with each other, with the full sample and with the new points X^* = \{x_i^*\}_{i=1}^{n^*}:

K_{11} = \begin{pmatrix} K_l(X_l^1, X_l^1) & \rho K_l(X_l^1, X_h^1) \\ \rho K_l(X_h^1, X_l^1) & \rho^2 K_l(X_h^1, X_h^1) + K_d(X_h^1, X_h^1) \end{pmatrix},

K_1 = \begin{pmatrix} K_l(X_l^1, X_l) & \rho K_l(X_l^1, X_h) \\ \rho K_l(X_h^1, X_l) & \rho^2 K_l(X_h^1, X_h) + K_d(X_h^1, X_h) \end{pmatrix},

K_1^* = \begin{pmatrix} \rho K_l(X^*, X_l^1) & \rho^2 K_l(X^*, X_h^1) + K_d(X^*, X_h^1) \end{pmatrix}.

Then, using the base point subsample, we obtain the following approximations of the matrices K(X^*, X), K and K(X^*, X^*), respectively:

\hat K(X^*, X) = K_1^* K_{11}^{-1} K_1,  \qquad  \hat K = K_1^T K_{11}^{-1} K_1,  \qquad  \hat K(X^*, X^*) = K_1^* K_{11}^{-1} (K_1^*)^T.

Let us define

R = \begin{pmatrix} \dfrac{1}{\sigma_l} I_{n_l} & 0 \\ 0 & \dfrac{1}{\sqrt{\rho^2 \sigma_l^2 + \sigma_d^2}} I_{n_h} \end{pmatrix},

where I_k is the identity matrix of size k, C_1 = R K_1^T and V = C_1 V_{11}^{-1}, where V_{11} is the Cholesky factor [13] of the matrix K_{11}, i.e., K_{11} = V_{11}^T V_{11}.
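A minimal NumPy/SciPy sketch of these precomputations is given below. It assumes the covariance blocks K_11 (base points against base points) and K_1 (base points against the full sample) have already been evaluated with the cokriging covariance, e.g., using the block construction from the previous sketch; the variable names are illustrative rather than the paper's.

    import numpy as np
    from scipy.linalg import solve_triangular

    def sparse_vfgp_factors(K_11, K_1, sigma_l, sigma_d, rho, n_l, n_h):
        """Precompute R, C_1, V_11 and V for the Nystrom-based sparse model."""
        # Diagonal of R: 1/sigma_l for the n_l low fidelity points and
        # 1/sqrt(rho^2 sigma_l^2 + sigma_d^2) for the n_h high fidelity points.
        r_diag = np.concatenate([
            np.full(n_l, 1.0 / sigma_l),
            np.full(n_h, 1.0 / np.sqrt(rho ** 2 * sigma_l ** 2 + sigma_d ** 2)),
        ])
        C_1 = r_diag[:, None] * K_1.T                 # C_1 = R K_1^T, cost O(n n_1)
        V_11 = np.linalg.cholesky(K_11).T             # upper factor, K_11 = V_11^T V_11
        # V = C_1 V_11^{-1}: solve V_11^T Z = C_1^T and transpose, cost O(n n_1^2)
        V = solve_triangular(V_11.T, C_1.T, lower=True).T
        return r_diag, C_1, V_11, V

Note that no n x n matrix is ever formed or inverted: the largest objects are the n x n_1 matrices C_1 and V, which is the source of the complexity reduction analyzed below.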
Statement 1. The Nystrom approximation of the posterior mean can be written as

\hat y_h(X^*) = K_1^* V_{11}^{-1} (I_{n_1} + V^T V)^{-1} V^T R y.    (6)

Proof. Indeed,

\hat y_h(X^*) = \hat K(X^*, X) (\hat K + R^{-2})^{-1} y
 = K_1^* K_{11}^{-1} K_1 (K_1^T K_{11}^{-1} K_1 + R^{-2})^{-1} y
 = K_1^* K_{11}^{-1} K_1 R (R K_1^T K_{11}^{-1} K_1 R + I_n)^{-1} R y
 = K_1^* K_{11}^{-1} C_1^T (C_1 K_{11}^{-1} C_1^T + I_n)^{-1} R y
 = K_1^* K_{11}^{-1} (C_1^T C_1 K_{11}^{-1} + I_{n_1})^{-1} C_1^T R y
 = K_1^* (C_1^T C_1 + K_{11})^{-1} C_1^T R y
 = K_1^* (C_1^T C_1 + V_{11}^T V_{11})^{-1} C_1^T R y
 = K_1^* V_{11}^{-1} (V^T V + I_{n_1})^{-1} V_{11}^{-T} C_1^T R y
 = K_1^* V_{11}^{-1} (I_{n_1} + V^T V)^{-1} V^T R y.

When estimating the prediction uncertainty, the result depends on which of the covariance matrices are approximated. Let us consider three possible variants of the approximation, listed in Table 1.

Table 1. Approximation of covariance matrices for different estimates of the posterior prediction variance

Approximation variant | k(x^*, x^*)                  | k(x^*, X)              | k(X, X)
1                     | K_1^* K_{11}^{-1} (K_1^*)^T  | K_1^* K_{11}^{-1} K_1  | K_1^T K_{11}^{-1} K_1 + R^{-2}
2                     | k(x^*, x^*)                  | K_1^* K_{11}^{-1} K_1  | K_1^T K_{11}^{-1} K_1
3                     | k(x^*, x^*)                  | K_1^* K_{11}^{-1} K_1  | K_1^T K_{11}^{-1} K_1 + R^{-2}

Statement 2. When variants 1, 2 and 3 of the approximation from Table 1 are used, the Nystrom approximations of the posterior variance can be written as

\hat\sigma_1^2(x^*) = K_1^* V_{11}^{-1} (I + V^T V)^{-1} V_{11}^{-T} (K_1^*)^T,    (7)

\hat\sigma_2^2(x^*) = k(x^*, x^*) - K_1^* V_{11}^{-1} V_{11}^{-T} (K_1^*)^T,    (8)

\hat\sigma_3^2(x^*) = k(x^*, x^*) - K_1^* V_{11}^{-1} (I + V^T V)^{-1} (V^T V) V_{11}^{-T} (K_1^*)^T.    (9)

Proof. Using the first approximation variant, we obtain the following expression for the variance:

\hat\sigma_1^2(x^*) = K_1^* K_{11}^{-1} (K_1^*)^T - K_1^* K_{11}^{-1} K_1 (R^{-2} + K_1^T K_{11}^{-1} K_1)^{-1} K_1^T K_{11}^{-1} (K_1^*)^T
 = K_1^* (K_{11} + K_1 R^2 K_1^T)^{-1} (K_1^*)^T
 = K_1^* (V_{11}^T V_{11} + C_1^T C_1)^{-1} (K_1^*)^T
 = K_1^* V_{11}^{-1} (I + V^T V)^{-1} V_{11}^{-T} (K_1^*)^T.

The above expression is similar to that obtained in [9].

We now obtain the result for the second variant of the covariance matrices approximation:

\hat\sigma_2^2(x^*) = k(x^*, x^*) - \hat k(x^*, X) \hat k(X, X)^{-1} \hat k(x^*, X)^T
 = k(x^*, x^*) - K_1^* K_{11}^{-1} K_1 (K_1^T K_{11}^{-1} K_1)^{-1} K_1^T K_{11}^{-1} (K_1^*)^T
 = k(x^*, x^*) - K_1^* K_{11}^{-1} (K_1^*)^T
 = k(x^*, x^*) - K_1^* V_{11}^{-1} V_{11}^{-T} (K_1^*)^T.

The obtained variance estimate coincides with the variance estimate for the case when standard Gaussian process regression and the sample D_1 are used. It is therefore clear that this estimate has sufficiently good numerical properties, but it does not take into account the information that the additional sample D \ D_1 was used for the prediction calculation.

Let us obtain the expression for the third variant of the covariance matrices approximation:

\hat\sigma_3^2(x^*) = k(x^*, x^*) - K_1^* K_{11}^{-1} K_1 (R^{-2} + K_1^T K_{11}^{-1} K_1)^{-1} K_1^T K_{11}^{-1} (K_1^*)^T
 = k(x^*, x^*) - K_1^* \bigl( K_{11}^{-1} - (K_{11} + C_1^T C_1)^{-1} \bigr) (K_1^*)^T
 = k(x^*, x^*) - K_1^* V_{11}^{-1} \bigl( I - (I + V^T V)^{-1} \bigr) V_{11}^{-T} (K_1^*)^T
 = k(x^*, x^*) - K_1^* V_{11}^{-1} (I + V^T V)^{-1} (V^T V) V_{11}^{-T} (K_1^*)^T.
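Continuing the previous sketch, the prediction (6) and the three variance estimates (7)-(9) can be evaluated as follows. Here K_1_star is the n^* x n_1 matrix K_1^*, k_star_star holds the exact prior variances k(x^*, x^*) at the new points, and the remaining quantities come from sparse_vfgp_factors above; the names are again illustrative.

    import numpy as np
    from scipy.linalg import solve_triangular

    def sparse_vfgp_predict(K_1_star, k_star_star, V_11, V, r_diag, y):
        """Posterior mean (6) and variance estimates (7)-(9) of the sparse model."""
        n_1 = V_11.shape[0]
        A = np.eye(n_1) + V.T @ V                          # I_{n_1} + V^T V
        # Mean (6): K_1^* V_11^{-1} (I + V^T V)^{-1} V^T R y
        rhs = V.T @ (r_diag * y)
        w = solve_triangular(V_11, np.linalg.solve(A, rhs))
        mean = K_1_star @ w
        # B = V_11^{-T} (K_1^*)^T is shared by all three variance estimates.
        B = solve_triangular(V_11.T, K_1_star.T, lower=True)
        var_1 = np.sum(B * np.linalg.solve(A, B), axis=0)             # Eq. (7)
        var_2 = k_star_star - np.sum(B * B, axis=0)                   # Eq. (8)
        var_3 = k_star_star - np.sum(B * np.linalg.solve(A, V.T @ V @ B), axis=0)  # Eq. (9)
        return mean, var_1, var_2, var_3

Only n_1 x n_1 systems are solved here, so the per-prediction cost after the precomputations is small, in line with Statement 3 below.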
Note that if we want to avoid the inversion of the matrix (V^T V)^{-1}, whose condition number we cannot directly control, it is necessary to use an expression that is asymmetrical with respect to the involved matrices. One should also note that if the scale of the matrix V^T V is significantly larger than that of the identity matrix, then (I + V^T V)^{-1} (V^T V) \approx I and \hat\sigma_3^2(x^*) \approx \hat\sigma_2^2(x^*), i.e., in this case these two variants of the uncertainty estimates almost coincide.
Statement 3. The computational complexity of the calculation of the posterior mean using (6) and of the posterior variance using (7), (8) or (9) at one point is O(n n_1^2).

Proof. At first it is necessary to calculate the matrices V_{11} and V = R K_1^T V_{11}^{-1}. The size of the matrix V_{11} is n_1 x n_1 and O(n_1^3) operations are necessary to calculate its inverse. O(n_1^2 n) operations are required for calculating K_1^T V_{11}^{-1}, and O(n_1 n) operations are then necessary to calculate V, since the matrix R is diagonal.

For n^* = 1, the calculation of the posterior mean reduces to calculating V_{11}^{-1} (I_{n_1} + V^T V)^{-1} V^T y. We use O(n_1^2 n) operations for calculating V^T V, O(n_1^3) operations to invert I_{n_1} + V^T V, and O(n_1^2 n) operations to calculate V_{11}^{-1} (I_{n_1} + V^T V)^{-1} V^T. Finally, O(n_1 n) operations are required to obtain the posterior mean. Hence, the computational complexity of the calculation of the Nystrom approximation of the posterior mean is O(n_1^2 n) operations.

To calculate V_{11}^{-1} (I_{n_1} + V^T V)^{-1}, O(n_1^3) operations are necessary for calculating (I_{n_1} + V^T V)^{-1} and O(n_1^3) operations for obtaining the final product. Hence, the posterior variance approximation using (7) requires O(n_1^2 n) operations. Similarly, we obtain the computational complexity of the posterior variance approximation using (8) and (9).

Thus, it is required to perform O(n_1^2 n) operations for calculating the required matrices and O(n_1^2 n) operations for calculating the posterior mean and the posterior variance using these matrices. Hence, the total computational complexity is O(n_1^2 n).
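As a rough illustration of this gain (our own arithmetic, with sizes close to those used in the experiments of Section 5, n = n_l + n_h = 5100 and n_1 = 1000):

n^3 \approx 1.3 \times 10^{11},  \qquad  n_1^2 n \approx 5.1 \times 10^{9},

i.e., the dominant term is reduced by a factor of about 25. Moreover, for a fixed n_1 the cost of the sparse model grows only linearly in n, whereas the cost of the exact model grows cubically, which is consistent with the rapidly diverging learning times reported in Table 3 below.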
5. COMPUTATIONAL EXPERIMENTS

In this section we consider the solution of several artificial test problems and one real problem using the proposed approach to sparse Gaussian process regression for multifidelity data (Sparse Variable Fidelity Gaussian Process (SVFGP) regression). The proposed approach is compared with Gaussian process (GP) regression for single-fidelity data and with Gaussian process regression for multifidelity data (Variable Fidelity Gaussian Process (VFGP) regression), for which the Nystrom approximation is not used. The squared exponential covariance function is used in the experiments [3].

As a measure of accuracy of the obtained surrogate models, we use the RRMS error, which is estimated using cross-validation. For a test sample D_{test} = \{(x_i^{test}, y_i^{test} = f_h(x_i^{test}))\}_{i=1}^{n_t}, the RRMS error of the surrogate model \hat y(x) is

RRMS(D_{test}, \hat y) = \sqrt{ \dfrac{\sum_{i=1}^{n_t} \bigl( \hat y_h(x_i^{test}) - y_i^{test} \bigr)^2}{\sum_{i=1}^{n_t} \bigl( \bar y - y_i^{test} \bigr)^2} },  \qquad  \bar y = \dfrac{1}{n_t} \sum_{i=1}^{n_t} y_i^{test}.

Usually, the values of the RRMS error lie between 0 and 1. The RRMS errors of accurate surrogate models are close to zero, whereas those of inaccurate surrogate models are close to or exceed 1.
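A short NumPy version of this metric (illustrative, not taken from the paper):

    import numpy as np

    def rrms(y_true, y_pred):
        """Relative root-mean-square error of a surrogate model on a test sample."""
        return np.sqrt(np.sum((y_pred - y_true) ** 2)
                       / np.sum((y_true.mean() - y_true) ** 2))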
5.1. Artificial Data

For testing the proposed SVFGP approach, we use an artificial function with a large number of local singularities and input dimensionality d = 5; thus, to construct an accurate surrogate model, a big sample is needed. As the high-fidelity function y_h(x) and the low-fidelity function y_l(x) we use

y_h(x) = 20 + \sum_{i=1}^{d} \bigl( x_i^2 - 10 \cos(2\pi x_i) \bigr) + \varepsilon_h,  \quad  x \in [0, 1]^d,

y_l(x) = y_h(x) + 0.2 \sum_{i=1}^{d} (x_i + 1)^2 + \varepsilon_l,  \quad  x \in [0, 1]^d.

The high-fidelity function was corrupted by Gaussian white noise \varepsilon_h with variance equal to 0.001, and the low-fidelity function was corrupted by Gaussian white noise \varepsilon_l with variance equal to 0.002. We generate points from the hypercube [0, 1]^d using optimal Latin hypercubes (OLHS, [14]). The size of the high-fidelity data sample was n_h = 100, and the size of the subsample of base points for the SVFGP was n_l^1 = 1000 in all experiments of this section. The sizes n_l = 1000, 2000, 3000, 4000, 5000 of the low-fidelity data sample were considered. The results were averaged over 50 random initializations for each value of n_l.

A personal computer with the Ubuntu operating system, an Intel Core i7 processor (4 cores, up to 3.4 GHz) and 8 GB of RAM was used for the calculations. The obtained results are summarized in the tables below:

(i) the RRMS errors for the VFGP and SVFGP are summarized in Table 2;

(ii) the learning times of the surrogate models for the VFGP and SVFGP are summarized in Table 3.

Table 2. Comparison of the RRMS errors for the VFGP and SVFGP on artificial data

n_l    | 1000   | 2000   | 3000   | 4000   | 5000
VFGP   | 0.0100 | 0.0086 | 0.0028 | 0.0031 | 0.0024
SVFGP  | 0.0100 | 0.0067 | 0.0049 | 0.0044 | 0.0044

Table 3. Comparison of the model learning times (in seconds) for the VFGP and SVFGP on artificial data

n_l    | 1000  | 2000  | 3000  | 4000  | 5000
VFGP   | 23.83 | 254.4 | 758.2 | 2334  | 4496
SVFGP  | 23.36 | 26.07 | 28.89 | 29.49 | 35.33

The RRMS errors for the SVFGP are comparable with the RRMS errors for the VFGP for the same size of the learning sample, but the learning time of the model is substantially smaller for the SVFGP, especially for sample sizes close to 5000.
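The following short sketch shows how such a training set can be generated, using the test functions in the form reconstructed above and an ordinary random Latin hypercube in place of the optimal one used in the paper; the names and the sampling routine are illustrative.

    import numpy as np

    def make_multifidelity_sample(n_l, n_h, d=5, seed=0):
        rng = np.random.default_rng(seed)

        def latin_hypercube(n):
            # plain (non-optimized) Latin hypercube on [0, 1]^d
            strata = np.stack([rng.permutation(n) for _ in range(d)], axis=1)
            return (strata + rng.random((n, d))) / n

        def y_high(X):
            return 20 + np.sum(X ** 2 - 10 * np.cos(2 * np.pi * X), axis=1) \
                   + rng.normal(0, np.sqrt(0.001), X.shape[0])

        def y_low(X):
            # low fidelity = noisy high fidelity plus a smooth systematic shift
            return y_high(X) + 0.2 * np.sum((X + 1) ** 2, axis=1) \
                   + rng.normal(0, np.sqrt(0.002), X.shape[0])

        X_l, X_h = latin_hypercube(n_l), latin_hypercube(n_h)
        return X_l, y_low(X_l), X_h, y_high(X_h)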
5.2. Prediction Uncertainty Estimates for Artificial Data

Let us consider the Ackley function [15] for a three-dimensional input space:

f(x) = \exp\bigl( -0.2 \sqrt{x_1^2 + x_2^2} \bigr) + 3\bigl( \cos(2x_1) + \sin(2x_2) \bigr) + \exp\bigl( -0.2 \sqrt{x_2^2 + x_3^2} \bigr) + 3\bigl( \cos(2x_2) + \sin(2x_3) \bigr).

To learn the model, the following high fidelity and low fidelity functions, which differ in their noise variances, were used:

y_h(x) = f(x)\bigl( 1 + 2\sigma_h^2 \varepsilon \bigr),  \qquad  y_l(x) = f(x)\bigl( 1 + 2\sigma_l^2 \varepsilon \bigr),

where \varepsilon is Gaussian white noise with unit variance. For the high fidelity function the variance \sigma_h^2 was equal to 0.1 and the sample size was equal to 60; for the low fidelity function the variance \sigma_l^2 was equal to 0.4 and the sample size was equal to 160.

We compare the accuracy of the uncertainty estimates on an independent test sample. Figure 1 shows the distribution function of the real prediction errors and the distribution functions of the prediction uncertainty estimates obtained using formulas (7), (8) and (9), respectively. One can see that for the uncertainty estimates (8) and (9) the distribution functions almost coincide and are closer to the true error distribution function than the distribution function of the uncertainty estimates obtained using formula (7); the use of this formula leads to underestimation of the error values.

Fig. 1. Distribution function of the logarithmic transformation of errors for the real errors (True errors) and for the uncertainty estimates obtained using formula (7) (First variant), formula (8) (Second variant) and formula (9) (Third variant). Horizontal axis: log sigma(x); vertical axis: CDF.

The information on the accuracy of the error estimates is summarized in Table 4 and allows one to come to similar conclusions, i.e., formulas (8) and (9) ensure more accurate uncertainty estimates from the viewpoint of the correlation and the RRMS criterion calculated from the true error values and their predictions.

Table 4. Quality of the uncertainty estimates obtained using different approximations

Formula for the approximation | Correlation | RRMS error
(7)                           | -0.0274     | 1.2730
(8)                           | 0.4756      | 0.6769
(9)                           | 0.4757      | 0.6769
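A small sketch of how such an uncertainty-quality comparison can be made is given below. The exact pairing used in the paper is not spelled out, so the sketch reflects our reading of the criterion: the predicted standard deviations are treated as predictions of the absolute errors, and the correlation and RRMS between the two are reported, as in Table 4.

    import numpy as np

    def uncertainty_quality(y_true, y_pred, sigma_pred):
        """Correlation and RRMS between actual absolute errors and predicted
        standard deviations (our interpretation of the criterion in Table 4)."""
        err = np.abs(y_true - y_pred)
        corr = np.corrcoef(err, sigma_pred)[0, 1]
        rrms = np.sqrt(np.sum((sigma_pred - err) ** 2)
                       / np.sum((err.mean() - err) ** 2))
        return corr, rrms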
5.3. Rotating Disk Problem

A rotating disk is an important element of an aircraft engine. It is necessary to construct accurate models for predicting the maximum radial displacement u_max and the maximum load s_max, which determine the disk reliability [16].
We parameterize the geometry of the rotating disk using 8 parameters: the radii r_i, i = 1, ..., 6, which determine where the disk thickness changes, and the parameters t_1, t_3, t_5, which determine the disk thickness itself. In the considered problem we fix the radii r_4 and r_5 and the thickness t_3 of the rotating disk. Therefore, the dimensionality of the disk parameter space is equal to 6. The geometrical parameters of the rotating disk are shown in Fig. 2.

Fig. 2. Parameterization of the rotating disk.

We consider the following high-fidelity and low-fidelity functions for calculating the objective values u_max and s_max. As the low-fidelity function we use a solver based on ordinary differential equations implementing the Runge–Kutta method [17], and as the high-fidelity function we use a solver based on the finite-element method. One calculation of the low-fidelity function takes about 0.01 s, and one calculation of the high-fidelity function requires about 300 s.

Examples of slices of the low-fidelity and high-fidelity functions are shown in Figs. 3, 4 and 5 for the output s_max. The low-fidelity and high-fidelity functions are similar, but in some cases the low-fidelity function does not represent some nonlinear effects.

Fig. 3. Slice of s_max along r_1 for the high-fidelity and low-fidelity functions.

Fig. 4. Slice of s_max along r_4 for the high-fidelity and low-fidelity functions.

Fig. 5. Slice of s_max along t_3 for the high-fidelity and low-fidelity functions.

In this section we compare the SVFGP with the two basic methods (GP and VFGP). For the generation of sample points the optimal Latin hypercube method was used. For the construction of the surrogate models we used n_h calculated values of the high fidelity function, 1000 values of the low fidelity function for the VFGP, and 5000 values of the low fidelity function for the SVFGP; n_l^1 = 1000 base points were randomly selected from the low fidelity sample, and the value of n_h was varied from 20 to 100.

To evaluate the accuracy of the models, cross-validation over a sample consisting of 140 values of the high-fidelity function was used (this sample contained the n_h points used for the construction of the surrogate models).

The results are summarized in Table 5 for the u_max output and in Table 6 for the s_max output. From the RRMS error values we can see that the SVFGP allows one to obtain more accurate results than the VFGP and GP.

Table 5. RRMS errors for the rotating disk problem. Output u_max

n_h    | 20     | 40     | 60     | 80     | 100
GP     | 0.3368 | 0.1826 | 0.1305 | 0.1091 | 0.0756
VFGP   | 0.1679 | 0.0998 | 0.0822 | 0.0564 | 0.0435
SVFGP  | 0.1018 | 0.0658 | 0.0494 | 0.0427 | 0.0339

Table 6. RRMS errors for the rotating disk problem. Output s_max

n_h    | 20     | 40     | 60     | 80     | 100
GP     | 0.5261 | 0.3181 | 0.2164 | 0.2095 | 0.1643
VFGP   | 0.2336 | 0.2326 | 0.2058 | 0.1321 | 0.1088
SVFGP  | 0.1674 | 0.1095 | 0.1023 | 0.0939 | 0.0812
6. CONCLUSIONS

We proposed a new approach to surrogate modeling of multifidelity data, which allows one to process samples with sizes of up to several thousands of points. The approach is based on the Nystrom approximation of the initial covariance matrices by products of matrices of smaller sizes. Closed-form expressions for the prediction of the high fidelity function and its uncertainty were obtained. The proposed approach and other widely used methods were compared on real and artificial data. The results of the experiments allow us to conclude that the proposed approach makes it possible to construct more accurate surrogate models, while the model construction process has a significantly lower computational complexity.

ACKNOWLEDGMENTS

This work was carried out at the IITP and supported by the Russian Science Foundation, project no. 14-50-00150.
REFERENCES

1. A. I. J. Forrester, A. Sóbester, and A. J. Keane, Engineering Design via Surrogate Modelling: A Practical Guide. Progress in Astronautics and Aeronautics (Wiley, New York, 2008).
2. N. A. C. Cressie and N. A. Cassie, Statistics for Spatial Data (Wiley, New York, 1993), Vol. 900.
3. C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning (MIT Press, Cambridge, MA, 2006).
4. A. I. J. Forrester, A. Sóbester, and A. J. Keane, "Multi-fidelity optimization via surrogate modelling," Proc. Royal Soc. A: Math., Phys. Eng. Sci. 463, 3251–3269 (2007).
5. M. C. Kennedy and A. O'Hagan, "Predicting the output from a complex computer code when fast approximations are available," Biometrika 87, 1–13 (2000).
6. S. Koziel, S. Ogurtsov, I. Couckuyt, and T. Dhaene, "Cost-efficient electromagnetic-simulation-driven antenna design using co-kriging," IET Microwaves, Antennas & Propagation 6, 1521–1528 (2012).
7. E. Snelson and Z. Ghahramani, "Sparse Gaussian processes using pseudo-inputs," Adv. Neural Inf. Process. Syst. 18, 1257–1264 (2006).
8. J. Quiñonero-Candela and C. E. Rasmussen, "A unifying view of sparse approximate Gaussian process regression," J. Mach. Learn. Res. 6, 1939–1959 (2005).
9. L. Foster, A. Waagen, N. Aijaz, M. Hurley, A. Luis, J. Rinsky, C. Satyavolu, M. J. Way, P. Gazis, and A. Srivastava, "Stable and efficient Gaussian process calculations," J. Mach. Learn. Res. 10, 857–882 (2009).
10. P. Drineas and M. W. Mahoney, "On the Nyström method for approximating a Gram matrix for improved kernel-based learning," J. Mach. Learn. Res. 6, 2153–2175 (2005).
11. A. A. Zaytsev, E. V. Burnaev, and V. G. Spokoiny, "Properties of the Bayesian parameter estimation of a regression based on Gaussian processes," J. Math. Sci. 203, 789–798 (2014).
12. F. Bachoc, "Cross validation and maximum likelihood estimations of hyper-parameters of Gaussian processes with model misspecification," Comput. Stat. Data Analysis 66, 55–69 (2013).
13. G. H. Golub and Ch. F. Van Loan, Matrix Computations (Johns Hopkins Univ., London, 2012), Vol. 3.
14. J.-S. Park, "Optimal Latin-hypercube designs for computer experiments," J. Stat. Plann. Infer. 39, 95–111 (1994).
15. D. Karaboga and B. Basturk, "A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm," J. Global Optim. 39, 459–471 (2007).
16. S. C. Armand, Structural Optimization Methodology for Rotating Disks of Aircraft Engines, Technical Report (National Aeronautics and Space Administration, Office of Management, Scientific and Technical Information Program, 1995).
17. J. Ch. Butcher, Numerical Methods for Ordinary Differential Equations (Wiley Online Library, 2005).
Translated by N. Pakhomova