ISSN 1064-2269, Journal of Communications Technology and Electronics, 2015, Vol. 60, No. 12, pp. 1348–1355. © Pleiades Publishing, Inc., 2015.
Original Russian Text © E.V. Burnaev, A.A. Zaytsev, 2015, published in Informatsionnye Protsessy, 2015, Vol. 15, No. 1, pp. 97–109.

MATHEMATICAL MODELS AND COMPUTATIONAL METHODS

Surrogate Modeling of Multifidelity Data for Large Samples

E. V. Burnaev (a, b, c) and A. A. Zaytsev (a, b)

(a) Kharkevich Institute for Information Transmission Problems (IITP), Russian Academy of Sciences, Bol'shoi Karetnyi per. 19, str. 1, Moscow, 127051 Russia
(b) OOO Datadvans, Pokrovskii bul'v. 3, str. 1B, Moscow, 109028 Russia
(c) Laboratory of Predictive Modeling and Data Analysis, Moscow Institute of Physics and Technology (State University), Institutskii per. 9, Dolgoprudnyi, Moscow oblast, 141700 Russia
e-mail: burnaev@iitp.ru

Received March 27, 2015

Abstract—The problem of constructing a surrogate model from available low- and high-fidelity data is considered. The low-fidelity data can be obtained, e.g., by performing computer simulations, and the high-fidelity data can be obtained by performing experiments in a wind tunnel. A regression model based on Gaussian processes proves to be convenient for modeling variable-fidelity data. Using this model, one can efficiently reconstruct nonlinear dependences and estimate the prediction accuracy at a specified point. However, if the sample size exceeds several thousand points, direct use of Gaussian process regression becomes impossible due to the high computational complexity of the algorithm. We develop new algorithms for processing multifidelity data based on the Gaussian process model that are efficient even for large samples. We illustrate the application of the developed algorithms by constructing surrogate models of a complex engineering system.

Keywords: multifidelity data, uncertainty estimate, Gaussian processes, covariance matrix approximation, cokriging

DOI: 10.1134/S1064226915120037
1. INTRODUCTION

Very often in engineering practice we have a sample of high-fidelity function values and, in addition, a larger sample of low-fidelity function values, the low-fidelity function being some approximation of the high-fidelity function [1]. This gives rise to the problem of multifidelity data modeling: we have to construct a surrogate model of the high-fidelity function using both of these samples.
For example, the high-fidelity function may represent an aircraft wing lift coefficient measured in a wind tunnel, while the low-fidelity function represents the same lift coefficient calculated using computer simulations. Experiments in the wind tunnel yield exact lift coefficient values, but they are time and resource intensive in contrast to virtual computer simulation experiments. Therefore, when constructing a surrogate model of the dependence of the lift coefficient on the geometrical parameters of the aircraft wing, we have to use a small sample of high-fidelity function values obtained from wind tunnel experiments and a large sample of less accurate computer simulation data.
The Gaussian process regression model [1–3] proves to be efficient for modeling multifidelity data. Using this model, one can efficiently reconstruct nonlinear dependences and evaluate their prediction accuracy at specified points [4–6].
The maximal sample size that can be used to construct a Gaussian process regression is limited to several thousand points, since parameter estimation requires inverting a sample covariance matrix [7]. Therefore, if a large sample of single-fidelity data is available, a special approximation of the initial covariance matrix, the Nyström approximation [10], is used to construct the Gaussian process regression [7–9]. This approximation is based on a subset of base points and allows one to substantially reduce the computational complexity. However, up to now there have been no approaches for constructing Gaussian process regression on large (more than several thousand points) multifidelity data samples. At the same time, large multifidelity data samples often arise in engineering practice: since the computational cost of the low-fidelity function is usually very low compared to that of the high-fidelity function [4], the low-fidelity data sample can be very large.
In this paper we develop an approach to Gaussian process regression for large multifidelity data samples. The basic idea is to use a subset of the initial multifidelity sample to approximate the sample covariance matrix [9]. We obtain estimates of the posterior regression mean, used as a prediction of the high-fidelity function value; of the posterior variance, used for prediction uncertainty quantification; and of the computational complexities of the proposed algorithms.
The paper is organized as follows. Gaussian process regression is described in Section 2; Section 3 shows how Gaussian process regression is used for modeling multifidelity data; the proposed algorithm for constructing a surrogate model from large multifidelity data samples is described in Section 4; Section 5 contains the results of experiments with artificial and real data; and conclusions are presented in Section 6.
2. REGRESSION BASED ON GAUSSIAN PROCESSES

Let us consider the learning sample $D = (X, y) = \{(x_i, y_i = y(x_i))\}_{i=1}^{n}$, where the points $x_i \in \mathbb{X} \subseteq \mathbb{R}^d$ and the function values $y(x) \in \mathbb{R}$. We assume that

$$y(x) = f(x) + \varepsilon,$$

where the function $f(x)$ is a realization of a Gaussian process and $\varepsilon$ is Gaussian white noise with variance $\sigma^2$. It is necessary to construct a surrogate model of the objective function $f(x)$.

The mean value and the covariance function $k(x, x') = \mathrm{cov}(f(x), f(x')) = \mathbb{E}\bigl[(f(x) - \mathbb{E}f(x))(f(x') - \mathbb{E}f(x'))\bigr]$ completely determine the Gaussian process $f(x)$. To simplify the notation, we assume that its mean value is equal to zero. We also assume that the covariance function belongs to a parametric family $\{k_\theta(x, x'),\ \theta \in \Theta \subseteq \mathbb{R}^p\}$, i.e., $k(x, x') = k_\theta(x, x')$ for some $\theta \in \Theta$. Then $y(x)$ is a Gaussian process with zero mean and covariance function $\mathrm{cov}(y(x), y(x')) = k_\theta(x, x') + \sigma^2 \delta(x - x')$, where $\delta(x - x')$ is the delta function. A widely used class of covariance functions is, e.g., the squared exponential covariance function [3]

$$k_\theta(x, x') = \theta_0^2 \exp\Bigl(-\sum_{k=1}^{d} \theta_k^2 (x_k - x_k')^2\Bigr).$$

The parameters $\theta$ of the covariance function and the noise variance $\sigma^2$ specify the regression model. We use the maximum likelihood method to estimate $\theta$ and $\sigma^2$ [3]:

$$\log p(y \,|\, X, \theta, \sigma^2) = -\frac{1}{2}\bigl(n \log 2\pi + \log|K| + y^T K^{-1} y\bigr) \to \max_{\theta,\,\sigma^2}, \qquad (1)$$

where $K = \{k_\theta(x_i, x_j) + \sigma^2 \delta(x_i - x_j)\}_{i,j=1}^{n}$ is the matrix of covariances between the function values $y(X)$ from the learning sample and $|K|$ is the determinant of $K$. In Gaussian process regression, $\sigma^2$ plays the role of a regularization parameter for the covariance matrix $\{k_\theta(x_i, x_j)\}_{i,j=1}^{n}$ of the values of $f(X)$.
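For illustration, the maximum likelihood estimation (1) can be written down in a few lines of Python. This is a minimal sketch under the squared exponential covariance; all names (sq_exp_cov, fit_gp, etc.) are ours and do not come from the authors' implementation.

```python
# A minimal sketch of maximum likelihood estimation (1); illustrative only.
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.optimize import minimize

def sq_exp_cov(X1, X2, theta):
    """Squared exponential covariance:
    k_theta(x, x') = theta_0^2 * exp(-sum_k theta_k^2 (x_k - x'_k)^2)."""
    diff = X1[:, None, :] - X2[None, :, :]
    return theta[0] ** 2 * np.exp(-np.sum((diff * theta[1:]) ** 2, axis=2))

def neg_log_likelihood(params, X, y):
    """Minus the log likelihood (1); params = (theta_0, ..., theta_d, sigma)."""
    theta, sigma = params[:-1], params[-1]
    n = len(y)
    # Small jitter added for numerical stability of the Cholesky factorization.
    K = sq_exp_cov(X, X, theta) + (sigma ** 2 + 1e-10) * np.eye(n)
    L = cho_factor(K, lower=True)
    log_det = 2.0 * np.sum(np.log(np.diag(L[0])))
    return 0.5 * (n * np.log(2.0 * np.pi) + log_det + y @ cho_solve(L, y))

def fit_gp(X, y):
    """Estimate theta and sigma by maximizing the likelihood (1)."""
    x0 = np.ones(X.shape[1] + 2)  # initial guess for (theta_0..theta_d, sigma)
    res = minimize(neg_log_likelihood, x0, args=(X, y), method="L-BFGS-B")
    return res.x
```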
Theoretical results obtained in [11] and more applied studies [12] show that the obtained parameter estimates are accurate even if the sample size is small and the model determined by the covariance function is misspecified.
Using the estimates of the parameters $\theta$ and $\sigma^2$, one can calculate the posterior mean and the posterior variance of $y(x)$ at new points, which are used for predicting the function value and evaluating the prediction uncertainty, respectively. The posterior mean $\mathbb{E}(y(X^*) \,|\, y(X))$ at the new points $X^* = \{x_i^*\}_{i=1}^{n^*}$ can be written as

$$\hat{y}(X^*) = K(X^*, X) K^{-1} y, \qquad (2)$$

where $K(X^*, X) = \{k(x_i^*, x_j)\}_{i=1,\dots,n^*,\ j=1,\dots,n}$ is the matrix of covariances between $y(X^*)$ at the new points and the values $y(X)$ at the points from the learning sample. The posterior covariance matrix $\mathbb{V}(X^*) = \mathbb{E}[(y(X^*) - \mathbb{E}y(X^*))^T (y(X^*) - \mathbb{E}y(X^*)) \,|\, y(X)]$ at the new points can be written as

$$\mathbb{V}(X^*) = K(X^*, X^*) - K(X^*, X) K^{-1} (K(X^*, X))^T, \qquad (3)$$

where $K(X^*, X^*) = \{k(x_i^*, x_j^*) + \sigma^2 \delta(x_i^* - x_j^*)\}_{i,j=1}^{n^*}$ is the matrix of covariances between the values $y(X^*)$.
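Given the fitted parameters, formulas (2) and (3) translate directly into code. The following sketch reuses sq_exp_cov and the Cholesky helpers from the previous illustrative snippet.

```python
# A sketch of the posterior mean (2) and posterior covariance (3);
# reuses sq_exp_cov, cho_factor, cho_solve from the previous snippet.
def gp_predict(X_new, X, y, theta, sigma):
    n, n_new = len(X), len(X_new)
    K = sq_exp_cov(X, X, theta) + sigma ** 2 * np.eye(n)
    K_star = sq_exp_cov(X_new, X, theta)                       # K(X*, X)
    K_ss = sq_exp_cov(X_new, X_new, theta) + sigma ** 2 * np.eye(n_new)
    L = cho_factor(K, lower=True)
    mean = K_star @ cho_solve(L, y)                            # formula (2)
    cov = K_ss - K_star @ cho_solve(L, K_star.T)               # formula (3)
    return mean, cov
```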
3. REGRESSION BASED ON GAUSSIAN PROCESSES FOR MULTIFIDELITY DATA

Let us now consider the case of multifidelity data. Let the sample of low-fidelity function values be specified as $D_l = (X_l, y_l) = \{(x_i^l, y_l(x_i^l))\}_{i=1}^{n_l}$ and the sample of high-fidelity function values as $D_h = (X_h, y_h) = \{(x_i^h, y_h(x_i^h))\}_{i=1}^{n_h}$, with $x_i^l, x_i^h \in \mathbb{R}^d$ and $y_l(x), y_h(x) \in \mathbb{R}$. The low-fidelity function $y_l(x)$ and the high-fidelity function $y_h(x)$ model the same physical phenomenon but with different accuracy.

Using the samples of low-fidelity and high-fidelity function values, it is necessary to construct a surrogate model $\hat{y}_h(x) \approx y_h(x)$ of the high-fidelity function that is as accurate as possible. In addition, we want to obtain prediction uncertainty estimates for the high-fidelity function at new points.

A special model is necessary for modeling multifidelity data. We use the widespread cokriging model [4]:

$$y_l(x) = f_l(x) + \varepsilon_l, \qquad y_h(x) = \rho y_l(x) + y_d(x),$$

where $y_d(x) = f_d(x) + \varepsilon_d$; $f_l(x)$ and $f_d(x)$ are independent Gaussian processes with zero means and covariance functions $k_l(x, x')$ and $k_d(x, x')$, respectively; and $\varepsilon_l$ and $\varepsilon_d$ are Gaussian white noise with variances $\sigma_l^2$ and $\sigma_d^2$, respectively.
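To make the structure of the model concrete, the following snippet simulates data from it; the sample sizes, parameter values, and covariance settings are arbitrary illustrative choices, not values used in the paper.

```python
# Illustrative simulation of the cokriging model: y_l = f_l + eps_l and
# y_h = rho * y_l + f_d + eps_d, with f_l and f_d drawn from GP priors.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 2
X = rng.random((n, d))
K_l = sq_exp_cov(X, X, np.array([1.0, 2.0, 2.0]))   # covariance k_l (arbitrary)
K_d = sq_exp_cov(X, X, np.array([0.3, 4.0, 4.0]))   # covariance k_d (arbitrary)
f_l = rng.multivariate_normal(np.zeros(n), K_l)
f_d = rng.multivariate_normal(np.zeros(n), K_d)
rho, sigma_l, sigma_d = 1.5, 0.05, 0.02             # arbitrary model parameters
y_l = f_l + sigma_l * rng.standard_normal(n)
y_h = rho * y_l + f_d + sigma_d * rng.standard_normal(n)
```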
Let us use the notations

$$X = \begin{pmatrix} X_l \\ X_h \end{pmatrix}, \qquad y = \begin{pmatrix} y_l \\ y_h \end{pmatrix}.$$

Then the posterior mean of the high-fidelity function values at new points can be written as

$$\hat{y}_h(X^*) = K(X^*, X) K^{-1} y, \qquad (4)$$

where

$$K(X^*, X) = \bigl(\rho K_l(X^*, X_l), \; \rho^2 K_l(X^*, X_h) + K_d(X^*, X_h)\bigr),$$

$$K = K(X, X) = \begin{pmatrix} K_l(X_l, X_l) & \rho K_l(X_l, X_h) \\ \rho K_l(X_h, X_l) & \rho^2 K_l(X_h, X_h) + K_d(X_h, X_h) \end{pmatrix},$$

and $K_l(X_a, X_b)$ and $K_d(X_a, X_b)$ are the matrices of pairwise covariances of the Gaussian processes $y_l(x)$ and $y_d(x)$ at the points of the sets $X_a$ and $X_b$, respectively. The posterior covariance matrix can be written as

$$\mathbb{V}(X^*) = \rho^2 K_l(X^*, X^*) + K_d(X^*, X^*) - K(X^*, X) K^{-1} (K(X^*, X))^T. \qquad (5)$$

To estimate the parameters of the covariance functions of the Gaussian processes $f_l(x)$ and $f_d(x)$, the following algorithm is used [1] (a code sketch follows the list):

1. Estimate the parameters of the covariance function $k_l(x, x')$ using the algorithm for standard Gaussian process regression described in Section 2 and the sample $D = D_l$.

2. Calculate the values $\hat{y}_l(x)$ of the posterior mean of the Gaussian process $y_l(x)$ at the points $x \in X_h$.

3. Estimate the parameters of the Gaussian process $y_d(x)$ with the covariance function $k_d(x, x')$ and the parameter $\rho$ by maximizing the likelihood (1) with $D = D_{diff} = (X_h, y_d = y_h - \rho \hat{y}_l(X_h))$ and $k(x, x') = k_d(x, x')$.
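The three-step procedure can be sketched as follows; fit_gp and gp_predict are the illustrative helpers from Section 2 and, for brevity, the sketch estimates $\rho$ by a simple grid search over an assumed range instead of maximizing the likelihood jointly over $\rho$ and the parameters of $k_d$.

```python
# A sketch of the three-step cokriging estimation procedure (illustrative).
def fit_cokriging(X_l, y_l, X_h, y_h, rho_grid=np.linspace(0.5, 2.0, 16)):
    # Step 1: fit the low-fidelity GP on D_l.
    params_l = fit_gp(X_l, y_l)
    theta_l, sigma_l = params_l[:-1], params_l[-1]
    # Step 2: posterior mean of y_l(x) at the high-fidelity points X_h.
    y_l_hat, _ = gp_predict(X_h, X_l, y_l, theta_l, sigma_l)
    # Step 3: fit the difference GP on D_diff = (X_h, y_h - rho * y_l_hat);
    # rho is chosen by grid search, a simplification of the joint maximization.
    best = None
    for rho in rho_grid:
        y_diff = y_h - rho * y_l_hat
        params_d = fit_gp(X_h, y_diff)
        nll = neg_log_likelihood(params_d, X_h, y_diff)
        if best is None or nll < best[0]:
            best = (nll, rho, params_d)
    return params_l, best[1], best[2]
```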
4. SPARSE GAUSSIAN PROCESS REGRESSION FOR MULTIFIDELITY DATA

Efficient use of Gaussian process regression for multifidelity data is possible only for samples with sizes of no more than several thousand points: when learning the model and using it for predictions, one has to invert an $n \times n$ covariance matrix, where $n = n_h + n_l$, and the computational complexity of this procedure is $O(n^3)$.

We propose a new approach to constructing Gaussian process regression on large multifidelity data samples. The approach is based on the Nyström approximation of the matrices $K(X^*, X)$, $K$, and $K(X^*, X^*)$ using a subsample of base points from the initial sample. The presented results generalize the results of [9] to the multifidelity data case.
Let us specify a subsample of base points $D_1 = (X_1, y_1)$ from the initial sample, where

$$X_1 = \begin{pmatrix} X_l^1 \\ X_h^1 \end{pmatrix}, \qquad y_1 = \begin{pmatrix} y_l(X_l^1) \\ y_h(X_h^1) \end{pmatrix},$$

of size $n_1 = n_l^1 + n_h^1$ such that, for this subsample, the available computational resources allow one to invert the corresponding covariance matrices and to estimate the parameters of the Gaussian processes in a reasonable time. A sufficiently reliable way of specifying the base point subsample is random selection without repetitions from the initial sample.

Then, using the base point subsample and setting $X^* = \{x_i^*\}_{i=1}^{n^*}$ for the new points, we obtain the following approximations of the matrices $K(X^*, X)$, $K$, and $K(X^*, X^*)$, respectively:

$$\hat{K}(X^*, X) = K_1^* K_{11}^{-1} K_1, \qquad \hat{K} = K_1^T K_{11}^{-1} K_1, \qquad \hat{K}(X^*, X^*) = K_1^* K_{11}^{-1} (K_1^*)^T,$$

where

$$K_{11} = \begin{pmatrix} K_l(X_l^1, X_l^1) & \rho K_l(X_l^1, X_h^1) \\ \rho K_l(X_h^1, X_l^1) & \rho^2 K_l(X_h^1, X_h^1) + K_d(X_h^1, X_h^1) \end{pmatrix},$$

$$K_1 = \begin{pmatrix} K_l(X_l^1, X_l) & \rho K_l(X_l^1, X_h) \\ \rho K_l(X_h^1, X_l) & \rho^2 K_l(X_h^1, X_h) + K_d(X_h^1, X_h) \end{pmatrix},$$

$$K_1^* = \bigl(\rho K_l(X^*, X_l^1), \; \rho^2 K_l(X^*, X_h^1) + K_d(X^*, X_h^1)\bigr).$$

Let us also define

$$R = \begin{pmatrix} \dfrac{1}{\sigma_l} I_{n_l} & 0 \\ 0 & \dfrac{1}{\sqrt{\rho^2 \sigma_l^2 + \sigma_d^2}} I_{n_h} \end{pmatrix}, \qquad C_1 = R K_1^T, \qquad V = C_1 V_{11}^{-1},$$

where $I_k$ is the identity matrix of size $k$ and $V_{11}$ is the Cholesky factor [13] of the matrix $K_{11}$, i.e., $K_{11} = V_{11}^T V_{11}$.
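The base-point machinery above can be assembled as follows. In this sketch, cov_l and cov_d stand for the fitted covariance functions $k_l$ and $k_d$; all function and variable names are illustrative.

```python
# A sketch of the blocks K11, K1, R (as a diagonal vector r), V11, and V.
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def nystrom_blocks(X_l, X_h, n1_l, n1_h, cov_l, cov_d, rho,
                   sigma_l, sigma_d, rng):
    # Base points: random selection without repetitions from each sample.
    X1_l = X_l[rng.choice(len(X_l), size=n1_l, replace=False)]
    X1_h = X_h[rng.choice(len(X_h), size=n1_h, replace=False)]

    def block_cov(A_l, A_h, B_l, B_h):
        """Covariance of (y_l(A_l), y_h(A_h)) with (y_l(B_l), y_h(B_h))."""
        top = np.hstack([cov_l(A_l, B_l), rho * cov_l(A_l, B_h)])
        bottom = np.hstack([rho * cov_l(A_h, B_l),
                            rho ** 2 * cov_l(A_h, B_h) + cov_d(A_h, B_h)])
        return np.vstack([top, bottom])

    K11 = block_cov(X1_l, X1_h, X1_l, X1_h)                    # n1 x n1
    K1 = block_cov(X1_l, X1_h, X_l, X_h)                       # n1 x n
    r = np.concatenate([                                       # diagonal of R
        np.full(len(X_l), 1.0 / sigma_l),
        np.full(len(X_h), 1.0 / np.sqrt(rho ** 2 * sigma_l ** 2
                                        + sigma_d ** 2))])
    V11 = cholesky(K11, lower=False)                           # K11 = V11^T V11
    C1 = r[:, None] * K1.T                                     # C1 = R K1^T
    V = solve_triangular(V11, C1.T, trans='T', lower=False).T  # V = C1 V11^{-1}
    return K1, r, V11, V
```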
Statement 1. The Nyström approximation of the posterior mean can be written as

$$\hat{y}_h(X^*) \approx K_1^* V_{11}^{-1} (I_{n_1} + V^T V)^{-1} V^T R y. \qquad (6)$$

Proof. Indeed,

$$\hat{y}_h(X^*) \approx K_1^* K_{11}^{-1} K_1 \bigl(K_1^T K_{11}^{-1} K_1 + R^{-2}\bigr)^{-1} y = K_1^* K_{11}^{-1} K_1 R \bigl(R K_1^T K_{11}^{-1} K_1 R + I_n\bigr)^{-1} R y$$
$$= K_1^* K_{11}^{-1} C_1^T \bigl(C_1 K_{11}^{-1} C_1^T + I_n\bigr)^{-1} R y = K_1^* K_{11}^{-1} \bigl(C_1^T C_1 K_{11}^{-1} + I_{n_1}\bigr)^{-1} C_1^T R y$$
$$= K_1^* \bigl(C_1^T C_1 + K_{11}\bigr)^{-1} C_1^T R y = K_1^* \bigl(C_1^T C_1 + V_{11}^T V_{11}\bigr)^{-1} C_1^T R y$$
$$= K_1^* V_{11}^{-1} \bigl(V_{11}^{-T} C_1^T C_1 V_{11}^{-1} + I_{n_1}\bigr)^{-1} V_{11}^{-T} C_1^T R y = K_1^* V_{11}^{-1} \bigl(I_{n_1} + V^T V\bigr)^{-1} V^T R y.$$

When estimating the prediction uncertainty, the result depends on which approximation of the covariance matrices is used. Let us consider the three possible variants of the approximation listed in Table 1.

Statement 2. When variants 1, 2, and 3 of the approximation from Table 1 are used, the Nyström approximations of the posterior variance can be written as

$$\hat{\sigma}_1^2(x^*) = K_1^* V_{11}^{-1} (I + V^T V)^{-1} V_{11}^{-T} K_1^{*T}, \qquad (7)$$

$$\hat{\sigma}_2^2(x^*) = k(x^*, x^*) - K_1^* V_{11}^{-1} V_{11}^{-T} K_1^{*T}, \qquad (8)$$

$$\hat{\sigma}_3^2(x^*) = k(x^*, x^*) - K_1^* V_{11}^{-1} (I + V^T V)^{-1} V^T V V_{11}^{-T} K_1^{*T}. \qquad (9)$$

Proof. Using the first variant of the approximation, we obtain the following expression for the variance:

$$\hat{\sigma}_1^2(x^*) \approx K_1^* K_{11}^{-1} K_1^{*T} - K_1^* K_{11}^{-1} K_1 \bigl(R^{-2} + K_1^T K_{11}^{-1} K_1\bigr)^{-1} K_1^T K_{11}^{-1} K_1^{*T}$$
$$= K_1^* \bigl(K_{11} + K_1 R^2 K_1^T\bigr)^{-1} K_1^{*T} = K_1^* \bigl(V_{11}^T V_{11} + C_1^T C_1\bigr)^{-1} K_1^{*T} = K_1^* V_{11}^{-1} \bigl(I + V^T V\bigr)^{-1} V_{11}^{-T} K_1^{*T},$$

where the second equality follows from the matrix inversion lemma. The above expression is similar to that obtained in [9].

We now obtain the result for the second variant of the covariance matrix approximation:

$$\hat{\sigma}_2^2(x^*) = k(x^*, x^*) - k(x^*, X) k(X, X)^{-1} k(x^*, X)^T$$
$$\approx k(x^*, x^*) - K_1^* K_{11}^{-1} K_1 \bigl(K_1^T K_{11}^{-1} K_1\bigr)^{-1} K_1^T K_{11}^{-1} K_1^{*T} = k(x^*, x^*) - K_1^* K_{11}^{-1} K_1^{*T}.$$

The obtained variance estimate coincides with the variance estimate obtained when standard Gaussian process regression with the sample $D_1$ is used. Hence this estimate has sufficiently good numerical properties, but it does not take into account the fact that the additional sample $D \setminus D_1$ was used to calculate the prediction.

Finally, let us obtain the expression for the third variant of the covariance matrix approximation:

$$\hat{\sigma}_3^2(x^*) \approx k(x^*, x^*) - K_1^* K_{11}^{-1} K_1 \bigl(R^{-2} + K_1^T K_{11}^{-1} K_1\bigr)^{-1} K_1^T K_{11}^{-1} K_1^{*T}$$
$$= k(x^*, x^*) - K_1^* K_{11}^{-1} K_1 R \bigl(I + R K_1^T K_{11}^{-1} K_1 R\bigr)^{-1} R K_1^T K_{11}^{-1} K_1^{*T}$$
$$= k(x^*, x^*) - K_1^* K_{11}^{-1} C_1^T \bigl(I + C_1 K_{11}^{-1} C_1^T\bigr)^{-1} C_1 K_{11}^{-1} K_1^{*T}$$
$$= k(x^*, x^*) - K_1^* K_{11}^{-1} \bigl(I + C_1^T C_1 K_{11}^{-1}\bigr)^{-1} C_1^T C_1 K_{11}^{-1} K_1^{*T}$$
$$= k(x^*, x^*) - K_1^* \bigl(K_{11} + C_1^T C_1\bigr)^{-1} C_1^T C_1 K_{11}^{-1} K_1^{*T}.$$

Transforming the obtained expression with the use of the notation introduced earlier and the identity $C_1^T C_1 = V_{11}^T V^T V V_{11}$, we get

$$\hat{\sigma}_3^2(x^*) \approx k(x^*, x^*) - K_1^* \bigl(K_{11} (C_1^T C_1)^{-1} K_{11} + K_{11}\bigr)^{-1} K_1^{*T}$$
$$= k(x^*, x^*) - K_1^* \bigl(V_{11}^T (V^T V)^{-1} V_{11} + V_{11}^T V_{11}\bigr)^{-1} K_1^{*T}$$
$$= k(x^*, x^*) - K_1^* V_{11}^{-1} \bigl(I + (V^T V)^{-1}\bigr)^{-1} V_{11}^{-T} K_1^{*T}$$
$$= k(x^*, x^*) - K_1^* V_{11}^{-1} \bigl(I + V^T V\bigr)^{-1} V^T V V_{11}^{-T} K_1^{*T}.$$
Table 1. Approximation of the covariance matrices for different estimates of the posterior prediction variance

Variant | $k(x^*, x^*)$                  | $k(x^*, X)$             | $k(X, X)$
1       | $K_1^* K_{11}^{-1} K_1^{*T}$   | $K_1^* K_{11}^{-1} K_1$ | $R^{-2} + K_1^T K_{11}^{-1} K_1$
2       | $k(x^*, x^*)$                  | $K_1^* K_{11}^{-1} K_1$ | $K_1^T K_{11}^{-1} K_1$
3       | $k(x^*, x^*)$                  | $K_1^* K_{11}^{-1} K_1$ | $R^{-2} + K_1^T K_{11}^{-1} K_1$
Note that if we want to avoid inverting the matrix $V^T V$, whose condition number we cannot directly control, it is necessary to use an expression that is asymmetric with respect to the matrices involved. Note also that if the scale of the matrix $V^T V$ is significantly larger than that of the identity matrix, then $(I + V^T V)^{-1}(V^T V) \approx I$ and $\hat{\sigma}_3^2(x^*) \approx \hat{\sigma}_2^2(x^*)$, i.e., in this case these two variants of the uncertainty estimate almost coincide.

Statement 3. The computational complexity of calculating the posterior mean using (6) and the posterior variance using (7), (8), or (9) at one point is $O(n n_1^2)$.
Proof. First, it is necessary to calculate the matrices $V_{11}$ and $V = R K_1^T V_{11}^{-1}$. The size of $V_{11}$ is $n_1 \times n_1$, and $O(n_1^3)$ operations are necessary to compute its inverse. $O(n_1^2 n)$ operations are required to calculate $K_1^T V_{11}^{-1}$, and $O(n_1 n)$ operations are then needed to obtain $V$, since the matrix $R$ is diagonal.

For $n^* = 1$, calculation of the posterior mean reduces to calculating $K_1^* V_{11}^{-1}(I_{n_1} + V^T V)^{-1} V^T R y$. We use $O(n_1^2 n)$ operations to calculate $V^T V$, $O(n_1^3)$ operations to invert $I_{n_1} + V^T V$, and $O(n_1^2 n)$ operations to calculate $V_{11}^{-1}(I_{n_1} + V^T V)^{-1} V^T$. Finally, $O(n_1 n)$ operations are required to obtain the posterior mean itself. Hence, the computational complexity of the Nyström approximation of the posterior mean is $O(n_1^2 n)$ operations.

To calculate $V_{11}^{-1}(I_{n_1} + V^T V)^{-1} V_{11}^{-T}$, $O(n_1^3)$ operations are necessary for inverting $I_{n_1} + V^T V$ and $O(n_1^3)$ operations for obtaining the final result. Hence, the posterior variance approximation using (7) requires $O(n_1^2 n)$ operations. Similarly, we obtain the computational complexity of the posterior variance approximations using (8) and (9).

Thus, $O(n_1^2 n)$ operations are required to calculate the necessary matrices and $O(n_1^2)$ operations to calculate the posterior mean and the posterior variance at one point using these matrices. Hence, the total computational complexity is $O(n_1^2 n)$.
5. COMPUTATIONAL EXPERIMENTS

In this section we apply the proposed approach to sparse Gaussian process regression for multifidelity data, referred to as Sparse Variable Fidelity Gaussian Process (SVFGP) regression, to several artificial test problems and one real problem. The proposed approach is compared with Gaussian process (GP) regression for single-fidelity data and with Gaussian process regression for multifidelity data without the Nyström approximation, referred to as Variable Fidelity Gaussian Process (VFGP) regression. The squared exponential covariance function is used in the experiments [3].
As a measure of accuracy of the obtained surrogate models, we use the RRMS error, estimated using cross-validation. For a test sample $D_{test} = \{x_i^{test}, y_i^{test} = f_h(x_i^{test})\}_{i=1}^{n_t}$, the RRMS error of a surrogate model $\hat{y}(x)$ is

$$\mathrm{RRMS}(D_{test}, \hat{y}) = \sqrt{\frac{\sum_{i=1}^{n_t} \bigl(\hat{y}_h(x_i^{test}) - y_i^{test}\bigr)^2}{\sum_{i=1}^{n_t} \bigl(\bar{y} - y_i^{test}\bigr)^2}}, \qquad \bar{y} = \frac{1}{n_t}\sum_{i=1}^{n_t} y_i^{test}.$$

The values of the RRMS error usually lie between 0 and 1: for accurate surrogate models they are close to zero, whereas for inaccurate models they are close to or exceed 1.
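In code, the RRMS error is a direct transcription of the formula above:

```python
# Relative root mean square (RRMS) error of predictions on a test sample.
import numpy as np

def rrms(y_pred, y_test):
    y_bar = np.mean(y_test)
    return np.sqrt(np.sum((y_pred - y_test) ** 2)
                   / np.sum((y_bar - y_test) ** 2))
```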
5.1. Artificial Data

To test the proposed SVFGP approach, we use an artificial function with a large number of local singularities and input dimensionality $d = 5$, so that constructing an accurate surrogate model requires a big sample. As the high-fidelity function $y_h(x)$ and the low-fidelity function $y_l(x)$ we use

$$y_h(x) = 20 + \sum_{i=1}^{d} \bigl(x_i^2 - 10\cos(2\pi x_i)\bigr) + \varepsilon_h, \qquad x \in [0, 1]^d,$$

$$y_l(x) = y_h(x) + 0.2\sum_{i=1}^{d} (x_i + 1)^2 + \varepsilon_l, \qquad x \in [0, 1]^d.$$

The high-fidelity function was corrupted by Gaussian white noise $\varepsilon_h$ with variance 0.001, and the low-fidelity function was corrupted by Gaussian white noise $\varepsilon_l$ with variance 0.002. We generate points from the hypercube $[0, 1]^d$ using optimal Latin hypercubes (OLHS) [14]. The size of the high-fidelity data sample was $n_h = 100$, and the size of the base point subsample for the SVFGP was $n_l^1 = 1000$ in all experiments in this section. Low-fidelity sample sizes $n_l$ = 1000, 2000, 3000, 4000, and 5000 were considered. The results were averaged over 50 random initializations for each value of $n_l$.
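The test functions can be reproduced as follows (our transcription of the formulas above; the square roots convert the stated noise variances into standard deviations):

```python
# The artificial high- and low-fidelity test functions for d = 5.
import numpy as np

def y_high(X, rng):
    """High-fidelity function with noise variance 0.001, X in [0, 1]^d."""
    base = 20.0 + np.sum(X ** 2 - 10.0 * np.cos(2.0 * np.pi * X), axis=1)
    return base + np.sqrt(0.001) * rng.standard_normal(len(X))

def y_low(X, rng):
    """Low-fidelity function with noise variance 0.002."""
    return (y_high(X, rng) + 0.2 * np.sum((X + 1.0) ** 2, axis=1)
            + np.sqrt(0.002) * rng.standard_normal(len(X)))
```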
A personal computer running Ubuntu with an Intel Core i7 processor (4 cores, up to 3.4 GHz) and 8 GB of RAM was used for the calculations. The obtained results are summarized in the tables below:

(i) the RRMS errors for the VFGP and SVFGP are given in Table 2;

(ii) the learning times of the surrogate models for the VFGP and SVFGP are given in Table 3.

Table 2. Comparison of the RRMS errors for the VFGP and SVFGP on artificial data

$n_l$  | 1000   | 2000   | 3000   | 4000   | 5000
VFGP  | 0.0100 | 0.0086 | 0.0028 | 0.0031 | 0.0024
SVFGP | 0.0100 | 0.0067 | 0.0049 | 0.0044 | 0.0044

Table 3. Comparison of the model learning times (in seconds) for the VFGP and SVFGP on artificial data

$n_l$  | 1000  | 2000  | 3000  | 4000  | 5000
VFGP  | 23.83 | 254.4 | 758.2 | 2334  | 4496
SVFGP | 23.36 | 26.07 | 28.89 | 29.49 | 35.33

The RRMS errors for the SVFGP are comparable with those for the VFGP for the same learning sample size, but the learning time of the model is substantially smaller for the SVFGP, especially for sample sizes close to 5000.
5.2. Prediction Uncertainty Estimates for Artificial Data

Let us consider the Ackley function [15] for a three-dimensional input space:

$$f(x) = \exp\bigl(-0.2(x_1^2 + x_2^2)\bigr) + 3\bigl(\cos(2x_1) + \sin(2x_2)\bigr) + \exp\bigl(-0.2(x_2^2 + x_3^2)\bigr) + 3\bigl(\cos(2x_2) + \sin(2x_3)\bigr).$$

To learn the model, the following high-fidelity and low-fidelity functions, which differ in their noise variances, were used:

$$y_h(x) = f(x)\bigl(1 + 2\sigma_h^2 \varepsilon\bigr), \qquad y_l(x) = f(x)\bigl(1 + 2\sigma_l^2 \varepsilon\bigr),$$

where $\varepsilon$ is Gaussian white noise with unit variance. For the high-fidelity function, the variance $\sigma_h^2$ was equal to 0.1 and the sample size was 60; for the low-fidelity function, the variance $\sigma_l^2$ was equal to 0.4 and the sample size was 160.

We compare the accuracy of the uncertainty estimates on an independent test sample. Figure 1 shows the distribution function of the real prediction errors and the distribution functions of the prediction uncertainty estimates obtained using formulas (7), (8), and (9). One can see that, for the uncertainty estimates (8) and (9), the distribution functions almost coincide and are closer to the true error distribution function than that of the uncertainty estimates obtained using formula (7); the use of formula (7) leads to underestimation of the error values.

The information on the accuracy of the error estimates, summarized in Table 4, leads to similar conclusions: formulas (8) and (9) ensure more accurate uncertainty estimates in terms of the correlation and the RRMS criterion calculated between the true error values and their predictions.
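For completeness, the empirical-CDF comparison underlying Fig. 1 can be sketched as follows; this is our illustration, assuming predictions and variance estimates such as those returned by the svfgp_predict sketch above.

```python
# A sketch of the CDF comparison behind Fig. 1 (illustrative names).
import numpy as np

def empirical_cdf(values):
    """Sorted values and empirical CDF levels, ready for plotting."""
    v = np.sort(np.asarray(values))
    return v, np.arange(1, len(v) + 1) / len(v)

def error_cdfs(y_pred, y_test, named_variances):
    """CDF of the log of true absolute errors and of each predicted
    uncertainty; 0.5 * log(var) is the log of the predicted sigma."""
    curves = {"true errors": empirical_cdf(np.log(np.abs(y_pred - y_test)))}
    for name, var in named_variances.items():
        curves[name] = empirical_cdf(0.5 * np.log(var))
    return curves
```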
Fig. 1. Distribution functions of the logarithmically transformed errors: real errors (True errors) and the uncertainty estimates obtained using formula (7) (First variant), formula (8) (Second variant), and formula (9) (Third variant). Horizontal axis: $\log\sigma(x)$; vertical axis: CDF.

Table 4. Quality of the uncertainty estimates obtained using different approximations

Formula | Correlation | RRMS error
(7)     | -0.0274     | 1.2730
(8)     | 0.4756      | 0.6769
(9)     | 0.4757      | 0.6769

5.3. Rotating Disk Problem

The rotating disk is an important element of an aircraft engine. It is necessary to construct accurate models for predicting the maximum radial displacement $u_{max}$ and the maximum load $s_{max}$, which determine the disk reliability [16].
We parameterize the geometry of the rotating disk using 8 parameters: the radii $r_i$, $i = 1, \dots, 6$, which determine where the disk thickness changes, and the parameters $t_1$, $t_3$, $t_5$, which determine the disk thickness itself. In the considered problem we fix the radii $r_4$ and $r_5$ and the thickness $t_3$ of the rotating disk; therefore, the dimensionality of the disk parameter space is equal to 6. The geometrical parameters of the rotating disk are shown in Fig. 2.
We consider the following high-fidelity and low-fidelity functions for calculating the objective values $u_{max}$ and $s_{max}$. As the low-fidelity function we use a solver based on ordinary differential equations implementing the Runge–Kutta method [17], and as the high-fidelity function we use a solver based on the finite element method. One calculation of the low-fidelity function takes about 0.01 s, and one calculation of the high-fidelity function requires about 300 s.

Examples of slices of the low-fidelity and high-fidelity functions for the output $s_{max}$ are shown in Figs. 3, 4, and 5. The low-fidelity and high-fidelity functions are similar, but in some cases the low-fidelity function does not reproduce certain nonlinear effects.

In this section we compare the SVFGP with the two basic methods (GP and VFGP). The optimal Latin hypercube method was used to generate the sample points.
To construct the surrogate models, we used $n_h$ calculated values of the high-fidelity function, 1000 values of the low-fidelity function for the VFGP, and 5000 values of the low-fidelity function for the SVFGP; $n_l^1 = 1000$ base points were randomly selected from the low-fidelity sample. We varied the value of $n_h$ from 20 to 100.

Fig. 2. Parameterization of the rotating disk (radii $r_1, \dots, r_6$ and thicknesses $t_1$, $t_3$, $t_5$).

Fig. 3. Slice of $s_{max}$ along $r_1$ for the high-fidelity and low-fidelity functions.

Fig. 4. Slice of $s_{max}$ along $r_4$ for the high-fidelity and low-fidelity functions.

Fig. 5. Slice of $s_{max}$ along $t_3$ for the high-fidelity and low-fidelity functions.
To evaluate the accuracy of the models, cross-validation over a sample consisting of 140 values of the high-fidelity function was used (this sample contained the $n_h$ points used for constructing the surrogate models). The results are summarized in Table 5 for the $u_{max}$ output and in Table 6 for the $s_{max}$ output. The RRMS error values show that the SVFGP allows one to obtain more accurate results than the VFGP and GP.

Table 5. RRMS errors for the rotating disk problem, output $u_{max}$

$n_h$  | 20     | 40     | 60     | 80     | 100
GP    | 0.3368 | 0.1826 | 0.1305 | 0.1091 | 0.0756
VFGP  | 0.1679 | 0.0998 | 0.0822 | 0.0564 | 0.0435
SVFGP | 0.1018 | 0.0658 | 0.0494 | 0.0427 | 0.0339

Table 6. RRMS errors for the rotating disk problem, output $s_{max}$

$n_h$  | 20     | 40     | 60     | 80     | 100
GP    | 0.5261 | 0.3181 | 0.2164 | 0.2095 | 0.1643
VFGP  | 0.2336 | 0.2326 | 0.2058 | 0.1321 | 0.1088
SVFGP | 0.1674 | 0.1095 | 0.1023 | 0.0939 | 0.0812
6. CONCLUSIONS

We have proposed a new approach to surrogate modeling of multifidelity data that makes it possible to process samples with sizes up to several thousand points. The approach is based on the Nyström approximation of the initial covariance matrices by products of matrices of smaller size. Closed-form expressions for the prediction of the high-fidelity function and for its uncertainty were obtained. The proposed approach and other widely used methods were compared on real and artificial data. The results of the experiments allow us to conclude that the proposed approach yields more accurate surrogate models while having a significantly lower computational complexity of model construction.
ACKNOWLEDGMENTS

This work was carried out at the IITP and supported by the Russian Science Foundation, project no. 14-50-00150.
REFERENCES

1. A. I. J. Forrester, A. Sóbester, and A. J. Keane, Engineering Design via Surrogate Modelling: A Practical Guide, Progress in Astronautics and Aeronautics (Wiley, New York, 2008).

2. N. A. C. Cressie, Statistics for Spatial Data (Wiley, New York, 1993).

3. C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning (MIT Press, Cambridge, MA, 2006).

4. A. I. J. Forrester, A. Sóbester, and A. J. Keane, "Multi-fidelity optimization via surrogate modelling," Proc. R. Soc. A: Math., Phys. Eng. Sci. 463, 3251–3269 (2007).

5. M. C. Kennedy and A. O'Hagan, "Predicting the output from a complex computer code when fast approximations are available," Biometrika 87, 1–13 (2000).

6. S. Koziel, S. Ogurtsov, I. Couckuyt, and T. Dhaene, "Cost-efficient electromagnetic-simulation-driven antenna design using co-kriging," IET Microwaves Antennas Propag. 6, 1521–1528 (2012).

7. E. Snelson and Z. Ghahramani, "Sparse Gaussian processes using pseudo-inputs," Adv. Neural Inf. Process. Syst. 18, 1257–1264 (2006).

8. J. Quiñonero-Candela and C. E. Rasmussen, "A unifying view of sparse approximate Gaussian process regression," J. Mach. Learn. Res. 6, 1939–1959 (2005).

9. L. Foster, A. Waagen, N. Aijaz, M. Hurley, A. Luis, J. Rinsky, C. Satyavolu, M. J. Way, P. Gazis, and A. Srivastava, "Stable and efficient Gaussian process calculations," J. Mach. Learn. Res. 10, 857–882 (2009).

10. P. Drineas and M. W. Mahoney, "On the Nyström method for approximating a Gram matrix for improved kernel-based learning," J. Mach. Learn. Res. 6, 2153–2175 (2005).

11. A. A. Zaytsev, E. V. Burnaev, and V. G. Spokoiny, "Properties of the Bayesian parameter estimation of a regression based on Gaussian processes," J. Math. Sci. 203, 789–798 (2014).

12. F. Bachoc, "Cross validation and maximum likelihood estimations of hyper-parameters of Gaussian processes with model misspecification," Comput. Stat. Data Anal. 66, 55–69 (2013).

13. G. H. Golub and C. F. Van Loan, Matrix Computations (Johns Hopkins Univ., London, 2012).

14. J.-S. Park, "Optimal Latin-hypercube designs for computer experiments," J. Stat. Plann. Inference 39, 95–111 (1994).

15. D. Karaboga and B. Basturk, "A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm," J. Global Optim. 39, 459–471 (2007).

16. S. C. Armand, Structural Optimization Methodology for Rotating Disks of Aircraft Engines, Technical Report (NASA, Office of Management, Scientific and Technical Information Program, 1995).

17. J. C. Butcher, Numerical Methods for Ordinary Differential Equations (Wiley, 2005).
Translated by N. Pakhomova