Formulating Large-Scale Quantity-Quality Bilinear Data Reconciliation Problems
Jeffrey Dean Kelly1
Keywords: Data reconciliation, multi-linear, equality constrained optimization, analytical derivatives and quantity-quality
problem.
1 Industrial Algorithms, 15 St. Andrews Road, Toronto, Ontario, Canada, M1P 4C3
E-mail: jdkelly@industrialgorithms.ca
Abstract
This short note describes the relevant details of formulating and implementing general bilinear quantity-quality balances found
in industrial processes when data reconciliation is applied. The modeling also allows for the straightforward generation of
analytical first-order derivatives. Quantity-quality balance problems are those that involve both extensive and intensive stream
variables such as flows and compositions respectively and are related through laws of conservation of material, energy and
momentum. The balance equations involve both linear and bilinear terms (multi-linear) of quantity and quality where quantity
times quantity and quality times quality are not germane although they can be included easily. Two numerical examples are
provided to demonstrate the new formulation technique.
Introduction
The process industries that typically require some form of data reconciliation of their aggregated and real-time process data
include mineral processing, oil and gas recovery, petroleum refining and petrochemical manufacturing. These industries use
data reconciliation to verify custody transfer and field flowmeters, field analyzers and laboratory instruments including automatic
tank gauging systems. The resulting data from data reconciliation are then used to monitor the effectiveness and efficiency of the
manufacturing process and to benchmark its performance against other plants consuming and producing similar feedstocks and
productstocks.
Obviously in order to have data reconciliation we need some level of redundancy in the system or network. This is achieved
through the existence of independent measuring devices and laws of conservation of material, energy and momentum which
allow us to relate one measurement to another through the topology of the integrated process. Unfortunately, these relationships
are nonlinear in the sense that there may exist a product of one variable times another, as we shall see further, thus requiring
iterative or successive substitution solving techniques. Fortunately, for the types of data reconciliation problems found in the
process industries, these problems are relatively well studied with some recent and relevant literature available in Crowe (1986),
Stephenson and Shewchuk (1986), MacDonald and Howat (1988), Pai and Fisher (1988), Swartz (1989), Tjoa and Biegler
(1991), Crowe (1996), Sanchez and Romagnoli (1996), Weiss et al. (1996), Chen et al. (1998), Kelly (1998), Schraa and Crowe
(1998) and Puming and Gang (2002). There is also an important structure that can be exploited in terms of presenting the
problem to the solver and this is what we term the quantity-quality problem which is also found at the core of the models that
plan and schedule these processes.
In the quantity-quality problem, we hierarchically class the variables into two sub-classes called quantity variables and quality
variables. Quantity variables are flows and inventories of mass, moles, volume, energy and momentum. Quality variables are
compositions, properties and conditions such as concentrations, densities, temperatures, pressures and velocity. Quantities are
known as extensive variables and qualities are intensive variables. Extensive variables scale with the system whereas intensive
variables do not. If the size of a reactor doubles then the throughput will double but the temperature and pressure at which the
reactor is operated will still be according to the process design and operating procedure. These variables are then related through
the quantity-quality balances of the system, where for the purposes of data reconciliation, there is no distinction being made
between differential (continuous or rate-based) and integral (batch or amount-based) balances. Moreover, it is implied that these
are steady-state balances with no accumulation. These variables are further classed into sub-categories specific to the problem of
data reconciliation defined shortly. It should also be noted that a possible advantage of this formulation is that it allows and
supports the reconciliation and calculation of both the measured and unmeasured variables explicitly using one optimization
solution. In contrast, the nonlinear matrix projection method of Crowe (1986) for example aggregates measured flows with
measured concentrations (i.e., category 1 in Crowe’s terminology). After a solution has been obtained by matrix projection a post
one-stream-at-a-time optimization must be performed to disaggregate a stream’s measured flow and concentration. The
formulation presented here does not require aggregating a stream’s extensive and intensive variables together and hence can be
used to provide consistent covariance information necessary for gross error detection and identification on the variables
explicitly.
Using the well-known definition of a process, which is an operation (or series of operations) that causes a physical or chemical
change in a substance, it is possible to explain a little further the relevance of the quantity-quality balances. We know that matter
can neither be created nor destroyed and consequently all processes must conserve mass and therefore a quantity-quality balance
can be used to perform a mass (or weight) balance involving flows in volume units times densities. Moles always conserve
except when a chemical reaction occurs (e.g., combusting natural gas with air) and hence a quantity-quality balance can be
formulated for a mole balance involving flows in mass units times the reciprocal molecular weights. Volume always conserves
except when either a chemical or physical reaction occurs (e.g. mixing alcohol and water) and thus a quantity-quality balance can
be modeled for a volume balance involving flows in mass units times specific volumes (or the reciprocal of density). Other
notable quantity-quality balances that would be found in the process industries mentioned are component balances such as
hydrogen, sulfur and propylene, property balances such as flash point, octane and vapor pressure and process condition balances
such as heat, work and entropy.
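As a worked illustration of the first of these balances, a steady-state mass balance around a single node can be written with volume flows and densities. The symbols $V_i$ (volume flow of stream $i$) and $\rho_i$ (its density) are notation assumed here for illustration only and do not come from the nomenclature used later in this note.

```latex
\sum_{i \in \mathrm{in}} V_i \, \rho_i \;-\; \sum_{j \in \mathrm{out}} V_j \, \rho_j \;=\; 0
```

Each product $V_i \, \rho_i$ is one quantity (extensive) times one quality (intensive), which is the bilinear structure referred to throughout.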
The next two sections that follow describe in more detail how to formulate both the functions and the first-order derivatives of the
quantity-quality balances but first it is important to highlight the mathematical description of the data reconciliation problem to
show their relevance. Adopting the nomenclature found in Kelly (1998), it is possible to pose the data reconciliation as a local,
equality constrained optimization problem with a quadratic objective function and general nonlinear constraints as
$$
\min_{\mathbf{x}} \; J = (\mathbf{x}_m - \mathbf{x})^{T} \cdot \mathbf{Q}^{-1} \cdot (\mathbf{x}_m - \mathbf{x})
\quad \text{subject to} \quad
\mathbf{f}(\mathbf{x}, \mathbf{y}, \mathbf{z}) = \mathbf{0}
\tag{1}
$$
where $J$ is the objective function, to be compared against a theoretical lower bound to detect the presence of gross errors (Crowe (1996)), $\mathbf{x}_m$ is the vector of measurements for quantity and quality measured variables, $\mathbf{x}$ is the vector of reconciled quantity and quality variables, $\mathbf{y}$ is the vector of calculated quantity and quality variables (unmeasured variables), $\mathbf{z}$ is the vector of fixed or constant quantity and quality variables (more appropriately parameters), $\mathbf{Q}$ is the matrix of weights for quantity and quality measured variables (i.e., proportional to the uncertainty in the measurements) and $\mathbf{f}$ is the vector of quantity-quality balances or functions. The notation used in equation (1) for the functions describes them as general nonlinear equality constraints with ad hoc structure, however for our purposes they are shown to be described in a specific quantity-quality bilinear framework. The linear data reconciliation sub-problem solved at each cycle or iteration $s$ of the solution method is
$$
\min_{\mathbf{x}_s} \; J_s = (\mathbf{x}_m - \mathbf{x}_s)^{T} \cdot \mathbf{Q}^{-1} \cdot (\mathbf{x}_m - \mathbf{x}_s)
\quad \text{subject to} \quad
\mathbf{A}_s \cdot \mathbf{x}_s + \mathbf{B}_s \cdot \mathbf{y}_s + \mathbf{f}_s = \mathbf{0}
\tag{2}
$$
where $\mathbf{x}_s$ and $\mathbf{y}_s$ are the vectors of reconciled and calculated variables, and $\mathbf{A}_s$ and $\mathbf{B}_s$ are the matrices of first-order derivatives (Jacobian) re-computed at each iteration. There is also the matrix $\mathbf{C}_s$, described in the section to follow, which is required to compute $\mathbf{f}_s$.
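For intuition about the linear sub-problem, consider its simplest special case: a single linear balance in which every variable is measured (no calculated variables) and the weight matrix is diagonal. The classical closed-form solution is then x = x_m - Q a^T (a Q a^T)^-1 (a x_m + f0). The sketch below is only that textbook special case, not the solution method of Kelly (1998); the function name, the splitter and all numbers are hypothetical.

```python
def reconcile_single_balance(a, x_m, q_diag, f0=0.0):
    """Reconcile measurements against one linear balance a.x + f0 = 0.

    Assumes every variable is measured and Q is diagonal (q_diag holds
    the measurement variances). All names here are illustrative only.
    """
    # Balance residual evaluated at the raw measurements.
    r = sum(ai * xi for ai, xi in zip(a, x_m)) + f0
    # Scalar a * Q * a^T for a single constraint row.
    s = sum(ai * ai * qi for ai, qi in zip(a, q_diag))
    # x = x_m - Q a^T (a Q a^T)^-1 (a x_m + f0)
    x = [xi - qi * ai * r / s for ai, xi, qi in zip(a, x_m, q_diag)]
    # Objective J = (x_m - x)^T Q^-1 (x_m - x)
    J = sum((mi - xi) ** 2 / qi for mi, xi, qi in zip(x_m, x, q_diag))
    return x, J


# Hypothetical splitter: stream 1 flows in, streams 2 and 3 flow out.
a = [1.0, -1.0, -1.0]
x_m = [100.0, 60.0, 45.0]    # raw flow measurements (imbalance of -5)
q_diag = [4.0, 4.0, 1.0]     # assumed measurement variances
x, J = reconcile_single_balance(a, x_m, q_diag)
```

The adjustment made to each measurement is proportional to its variance, so the least certain flows absorb most of the imbalance while the reconciled values close the balance.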
Formulating the Functions (Vector of Equations)
In this section we show how the quantity-quality bilinear problem can be formulated generically. The nomenclature may be
somewhat daunting at first however it can be easily programmed directly into commercial matrix programming languages such
as MATLAB and MATHEMATICA for example. Our objective is to replace the general function of equation (1) with a more
structured version as follows.
$$
\mathbf{f}_s(\mathbf{xE}_s, \mathbf{xI}_s, \mathbf{yE}_s, \mathbf{yI}_s, \mathbf{zE}, \mathbf{zI}) = \mathbf{0}
\tag{3}
$$
where the reconciled ($\mathbf{xE}_s, \mathbf{xI}_s$), calculated ($\mathbf{yE}_s, \mathbf{yI}_s$) and fixed ($\mathbf{zE}, \mathbf{zI}$) variables are classed into the extensive and intensive categories. Equation (3) shows the sub-division of the variables but we also require a decomposition of the functions or equality constraints as well
$$
\mathbf{f}_s = \mathbf{f}_s^{L} + \mathbf{f}_s^{A} + \mathbf{f}_s^{B} + \mathbf{f}_s^{C}
\tag{4}
$$
where $\mathbf{f}_s^{L}$ is the vector of functions for the extensive and intensive linear constraints (see equation (5)), $\mathbf{f}_s^{A}$ is the vector of functions evaluated for the constraints involving the bilinear terms of extensive reconciled variables times intensive reconciled, calculated and fixed variables (see equation (6)), $\mathbf{f}_s^{B}$ is the vector of functions involving extensive calculated variables with intensive variables (see equation (7)) and $\mathbf{f}_s^{C}$ is the vector of functions involving extensive fixed variables with intensive variables (see equation (8)).
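$$
\mathbf{f}_s^{L} = \mathbf{AE} \cdot \mathbf{xE}_s + \mathbf{AI} \cdot \mathbf{xI}_s + \mathbf{BE} \cdot \mathbf{yE}_s + \mathbf{BI} \cdot \mathbf{yI}_s + \mathbf{CE} \cdot \mathbf{zE} + \mathbf{CI} \cdot \mathbf{zI}
\tag{5}
$$
$$
\mathbf{f}_s^{A} = \mathbf{AEAI} \cdot \mathbf{xExI}_s + \mathbf{AEBI} \cdot \mathbf{xEyI}_s + \mathbf{AECI} \cdot \mathbf{xEzI}_s
\tag{6}
$$
$$
\mathbf{f}_s^{B} = \mathbf{BEAI} \cdot \mathbf{yExI}_s + \mathbf{BEBI} \cdot \mathbf{yEyI}_s + \mathbf{BECI} \cdot \mathbf{yEzI}_s
\tag{7}
$$
$$
\mathbf{f}_s^{C} = \mathbf{CEAI} \cdot \mathbf{zExI}_s + \mathbf{CEBI} \cdot \mathbf{zEyI}_s + \mathbf{CECI} \cdot \mathbf{zEzI}
\tag{8}
$$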
The six incidence matrices found in equation (5) correspond to the linear relationships for all of the variables, where all of the matrices shown in equations (5) to (8) have the same number of rows (i.e., the number of functions $NF$). As an example, $\mathbf{AE}$ would contain the 1's and –1's in the appropriate elements for a mass balance involving mass reconciled extensive variables for a reactor, and $\mathbf{AI}$ would contain the 1's and –1's for a splitter involving density reconciled intensive variables requiring the density in to be equal to the densities out. The bilinear variable terms found in, for example, the vector $\mathbf{xExI}_s$ represent all of the bilinear products in the quantity-quality system balance for extensive reconciled variables times intensive reconciled variables. In essence, we simply create extra variables for the bilinear terms, which in the planning and scheduling terminology of the quantity-quality problem is known as cascading. The associated incidence matrix $\mathbf{AEAI}$ will also contain 1's and –1's indicating where the bilinear term is found in the balances; note that by convention a +1 means a fluxion or movement of substance into a piece of equipment or balance node and –1 means a fluxion of substance out of the node. None of these incidence matrices carries the subscript $s$ because they do not change during the path to the reconciliation solution. It should also be mentioned that we have neglected extensive variables times other extensive variables and intensive variables times other intensive variables, given that these are not necessarily germane to the quantity-quality problem; however, they could easily be added in a similar way.
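The role of the incidence matrices can be sketched on a small mixer. The example below is hypothetical (the unit, the stream numbering and all values are assumptions made for illustration) and uses plain Python lists in place of the matrix language mentioned in the text. It evaluates only the linear part of equation (5) and the first bilinear part of equation (6).

```python
def matvec(matrix, vector):
    """Dense matrix-vector product using plain lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Hypothetical mixer: streams 1 and 2 flow in (+1), stream 3 flows out (-1).
# Function 1 is the total mass balance (linear in the flows).
# Function 2 is a component balance (bilinear: flow times composition).
AE = [[1, 1, -1],
      [0, 0, 0]]
AEAI = [[0, 0, 0],
        [1, 1, -1]]

xE = [10.0, 30.0, 40.0]      # reconciled flows (extensive)
xI = [0.25, 0.5, 0.4375]     # reconciled compositions (intensive)

# One "cascaded" bilinear variable per stream: flow times composition.
xExI = [flow * comp for flow, comp in zip(xE, xI)]

f_L = matvec(AE, xE)         # linear part of the balances
f_A = matvec(AEAI, xExI)     # bilinear part of the balances
f = [lin + bil for lin, bil in zip(f_L, f_A)]  # both residuals are zero here
```

Because the values were chosen to satisfy both balances, every entry of `f` is zero; perturbing any flow or composition produces a non-zero residual in the corresponding row.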
In order to update the bilinear vectors such as $\mathbf{xExI}_s$ at each iteration with the new estimates of the variables, there must be a mechanism to do this. This can be easily accomplished with the use of two index sets $XEXI^{XE}$ and $XEXI^{XI}$ (there will be two index sets for every bilinear term vector). These contain the extensive and intensive variable indices respectively and $\mathbf{xExI}_s$ can be computed as follows.
$$
\mathbf{xExI}_s(i) = \mathbf{xE}_s(j) \cdot \mathbf{xI}_s(k)
\quad \forall \; i = 1..NXEXI, \; j = 1..NXE, \; k = 1..NXI, \; j = XEXI^{XE}(i), \; k = XEXI^{XI}(i)
\tag{9}
$$
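The index-set update can be sketched in a few lines. The function name and the numbers are hypothetical; the index sets are kept 1-based, as in the text and tables, and shifted by one inside the function because Python lists are 0-based.

```python
def bilinear_terms(ext, inten, idx_ext, idx_int):
    """Evaluate bilinear term i as ext[j] * inten[k].

    idx_ext and idx_int hold 1-based indices j and k for each term i,
    so they are shifted by one for Python's 0-based lists.
    """
    if len(idx_ext) != len(idx_int):
        raise ValueError("index sets must have the same length")
    return [ext[j - 1] * inten[k - 1] for j, k in zip(idx_ext, idx_int)]


# Hypothetical iterates: two extensive and three intensive variables.
xE_s = [10.0, 30.0]
xI_s = [0.5, 0.25, 2.0]

# Three bilinear terms: xE(1)*xI(1), xE(2)*xI(2) and xE(2)*xI(3).
XEXI_XE = [1, 2, 2]
XEXI_XI = [1, 2, 3]

xExI_s = bilinear_terms(xE_s, xI_s, XEXI_XE, XEXI_XI)
```

The same helper would serve the other eight bilinear vectors by passing the matching pair of variable vectors and index sets.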
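where $NXEXI$ specifies the number of reconciled extensive times intensive variable bilinear terms, $NXE$ is the number of reconciled extensive variables and $NXI$ is the number of reconciled intensive variables. From the perspective of generating the linear sub-problem information for equation (2), the vectors of reconciled and calculated variables ($\mathbf{x}_s$, $\mathbf{y}_s$) are formed as
$$
\mathbf{x}_s = \begin{bmatrix} \mathbf{xE}_s \\ \mathbf{xI}_s \end{bmatrix}
\quad \text{and} \quad
\mathbf{y}_s = \begin{bmatrix} \mathbf{yE}_s \\ \mathbf{yI}_s \end{bmatrix}
$$
where the parameters of the optimization are formed before the first iteration as
$$
\mathbf{x}_m = \begin{bmatrix} \mathbf{xE}_m \\ \mathbf{xI}_m \end{bmatrix},
\quad
\mathbf{z} = \begin{bmatrix} \mathbf{zE} \\ \mathbf{zI} \end{bmatrix}
\quad \text{and} \quad
\mathbf{Q} = \begin{bmatrix} \mathbf{QE} & \mathbf{0} \\ \mathbf{0} & \mathbf{QI} \end{bmatrix}
$$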
Before we show how to form the Jacobian analytically, there are a couple of modeling details specific to the quantity-quality problem structure that need to be discussed. The first detail is that a variable, either extensive or intensive, cannot exist in the same function more than once in either a linear or bilinear term. This was originally violated in the heat balance example of Swartz (1989) found in Kelly (1998) and discussed in our numerical example 2. This means that every stream or fluxion of material must have a unique set of variables created. For example, the original formulation of example 2 had the two terms $\mathbf{xE}(1) * \mathbf{yI}(2) - \mathbf{xE}(1) * \mathbf{xI}(2)$ in one of the heat balance function definitions, which violates our bilinear quantity-quality requirement because $\mathbf{xE}(1)$ appears twice in the same function. The correct way to model this is to use the known and generated stream flow variable $\mathbf{xE}(2)$ in place of $\mathbf{xE}(1)$ in the second bilinear term. The second detail is that the number of non-zeros in the bilinear incidence matrices ($\mathbf{AEAI}$, etc.) must be greater than or equal to the number of bilinear terms found in the index sets ($XEXI^{XE}$ and $XEXI^{XI}$, etc.) and less than or equal to two times the number of bilinear terms ($NXEXI$, etc.). This means that a bilinear term must appear in at least one function and cannot appear more than twice, in keeping with the first detail.
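The second detail lends itself to an automatic consistency check on each bilinear incidence matrix before solving. The sketch below is a hypothetical helper, assuming a dense list-of-rows matrix with one column per bilinear term; it tests both the overall non-zero count and the per-term reading of the rule (one or two appearances per term).

```python
def check_bilinear_incidence(matrix, n_terms):
    """Check the non-zero count rule for one bilinear incidence matrix.

    Each column corresponds to one bilinear term and must hold one or two
    non-zero entries, so n_terms <= nnz <= 2 * n_terms overall.
    """
    col_counts = [0] * n_terms
    for row in matrix:
        if len(row) != n_terms:
            raise ValueError("each row needs one entry per bilinear term")
        for k, value in enumerate(row):
            if value != 0:
                col_counts[k] += 1
    nnz = sum(col_counts)
    total_ok = n_terms <= nnz <= 2 * n_terms
    columns_ok = all(1 <= count <= 2 for count in col_counts)
    return total_ok and columns_ok


# Hypothetical matrices with two functions and three bilinear terms.
good = [[1, -1, 0],
        [0, 1, -1]]          # every term appears once or twice
unused = [[1, 0, 0],
          [0, 1, 0]]         # the third term never appears
# One term appearing in three functions also breaks the rule.
overused = [[1], [1], [-1]]

good_ok = check_bilinear_incidence(good, 3)
unused_ok = check_bilinear_incidence(unused, 3)
overused_ok = check_bilinear_incidence(overused, 1)
```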
Formulating the First-Order Derivatives (Jacobian Matrix)
A major driving force to model the process industries' data reconciliation problem in a structured quantity-quality framework is the ability to easily generate the analytical Jacobian, or matrix of first-order derivatives. Knowing that the underlying data reconciliation is a bilinear problem, the Jacobian can be easily evaluated at each iteration simply by inserting one of the extensive or intensive variable iterates into the appropriate location. For our purposes, the matrix definitions of the derivatives for the reconciled variables are
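$$
\mathbf{A}_s^{E} = \frac{\partial \mathbf{f}_s}{\partial \mathbf{xE}_s} = \left[ \mathbf{AE} + \mathbf{AEAI}_s^{XE} + \mathbf{AEBI}_s^{XE} + \mathbf{AECI}_s^{XE} \right]
\tag{10a}
$$
$$
\mathbf{A}_s^{I} = \frac{\partial \mathbf{f}_s}{\partial \mathbf{xI}_s} = \left[ \mathbf{AI} + \mathbf{AEAI}_s^{XI} + \mathbf{BEAI}_s^{XI} + \mathbf{CEAI}_s^{XI} \right]
\tag{10b}
$$
$$
\mathbf{A}_s = \frac{\partial \mathbf{f}_s}{\partial \left[ \mathbf{xE}_s^{T} \;\; \mathbf{xI}_s^{T} \right]} = \left[ \mathbf{A}_s^{E} \;\; \mathbf{A}_s^{I} \right]
\tag{10c}
$$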
where $\mathbf{A}_s$ is the same matrix found in equation (2) and is formed by the matrix augmentation in equation (10c). Similarly, the matrices of derivatives $\mathbf{B}_s$ and $\mathbf{C}_s$ are found in equations (11) and (12).
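$$
\mathbf{B}_s^{E} = \left[ \mathbf{BE} + \mathbf{BEAI}_s^{YE} + \mathbf{BEBI}_s^{YE} + \mathbf{BECI}_s^{YE} \right]
\tag{11a}
$$
$$
\mathbf{B}_s^{I} = \left[ \mathbf{BI} + \mathbf{AEBI}_s^{YI} + \mathbf{BEBI}_s^{YI} + \mathbf{CEBI}_s^{YI} \right]
\tag{11b}
$$
$$
\mathbf{B}_s = \left[ \mathbf{B}_s^{E} \;\; \mathbf{B}_s^{I} \right]
\tag{11c}
$$
and
$$
\mathbf{C}_s^{E} = \left[ \mathbf{CE} + \mathbf{CEAI}_s^{ZE} + \mathbf{CEBI}_s^{ZE} + \mathbf{CECI}^{ZE} \right]
\tag{12a}
$$
$$
\mathbf{C}_s^{I} = \left[ \mathbf{CI} + \mathbf{AECI}_s^{ZI} + \mathbf{BECI}_s^{ZI} + \mathbf{CECI}^{ZI} \right]
\tag{12b}
$$
$$
\mathbf{C}_s = \left[ \mathbf{C}_s^{E} \;\; \mathbf{C}_s^{I} \right]
\tag{12c}
$$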
The new matrices, $\mathbf{AEAI}_s^{XE}$ for example, symbolize the derivatives of the bilinear terms with respect to the extensive and intensive variables. These are easily generated at iteration $s$, for example, as
$$
\mathbf{AEAI}_s^{XE}(i,j) = sign(\mathbf{AEAI}(i,k)) \cdot \mathbf{xI}_s(m)
\quad \forall \; i = 1..NF, \; j = 1..NXE, \; k = 1..NXEXI, \; j = XEXI^{XE}(k), \; m = XEXI^{XI}(k)
\tag{13}
$$
where $sign(\cdot)$ is the sign of the 1 or –1 for the matrix element in question and $XEXI^{XE}$ and $XEXI^{XI}$ are the same index sets as defined previously to generate the bilinear terms at each iteration used in the function evaluations. The last section delineates the bilinear quantity-quality formulation and implementation for two numerical examples solved previously in the literature.
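The derivative matrices for one bilinear vector can be generated with the same index sets used for the function evaluations. The sketch below is a hypothetical dense-list implementation: it fills the extensive derivative with the sign times the intensive iterate and, by the product rule, the intensive derivative with the sign times the extensive iterate. It accumulates with `+=`, which equals plain assignment whenever a variable appears at most once per function, as the formulation requires. The mixer data are assumed for illustration.

```python
def bilinear_jacobians(incidence, idx_ext, idx_int, ext, inten):
    """Derivatives of incidence * (ext x inten) for one bilinear vector.

    Returns (d_ext, d_int): NF x len(ext) and NF x len(inten) matrices.
    idx_ext and idx_int are 1-based index sets, one entry per bilinear term.
    """
    nf = len(incidence)
    d_ext = [[0.0] * len(ext) for _ in range(nf)]
    d_int = [[0.0] * len(inten) for _ in range(nf)]
    for k, (j, m) in enumerate(zip(idx_ext, idx_int)):
        for i in range(nf):
            sign = incidence[i][k]           # +1, -1 or 0
            if sign != 0:
                # d(ext_j * inten_m)/d ext_j = inten_m
                d_ext[i][j - 1] += sign * inten[m - 1]
                # d(ext_j * inten_m)/d inten_m = ext_j
                d_int[i][m - 1] += sign * ext[j - 1]
    return d_ext, d_int


# Hypothetical mixer component balance: two streams in, one stream out.
AEAI = [[1, 1, -1]]
XEXI_XE = [1, 2, 3]
XEXI_XI = [1, 2, 3]
xE_s = [10.0, 30.0, 40.0]
xI_s = [0.25, 0.5, 0.4375]

AEAI_XE, AEAI_XI = bilinear_jacobians(AEAI, XEXI_XE, XEXI_XI, xE_s, xI_s)
```

Since the incidence matrices are fixed, only the multiplications by the current iterates are repeated at each iteration.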
Numerical Examples
The two numerical examples described below have been hitherto solved in Kelly (1998) and can also be found in other studies.
The examples are formulated and solved using MATLAB Release 13 (The Mathworks Inc. (2002)). The first example is a small mining flotation circuit problem with copper (Cu) and zinc (Zn) concentrations in each of the eight streams. The dimension of the problem is provided in Table 1 along with the statistics for example 2.
Table 1. Numbers of functions, variables and bilinear terms for examples 1 and 2 respectively.
Example  NF  NXE  NXI  NYE  NYI  NZE  NZI  NXEXI  NXEYI  NXEZI  NYEXI  NYEYI  NYEZI  NZEXI  NZEYI  NZEZI
1        12    0   14    7    2    1    0      0      0      0     12      2      0      2      0      0
2        17    6   10    9    5    0    0      5      1      0      5      4      0      0      0      0
Of the nine sets of possible bilinear term combinations, only three have any elements in example 1, which is consistent with the observation that the number of extensive measured variables and the number of intensive fixed variables are both zero. Table 2 shows the division of the variables into the designation of reconciled, calculated and fixed. Table 3 shows the index sets $YEXI^{YE}$, $YEXI^{XI}$, $YEYI^{YE}$, $YEYI^{YI}$, $ZEXI^{ZE}$ and $ZEXI^{XI}$. The index set pairs $YEXI^{YE}$ and $YEXI^{XI}$, for example, indicate the $\mathbf{yE} \cdot \mathbf{xI}$ products found in the system; hence bilinear term number seven specifies the product of $\mathbf{yE}(1)$ and $\mathbf{xI}(9)$.
Table 2. Classification of variables for example 1.
No.  xI               yE                 yI               zE
1    Cu in Stream 1   Flow of Stream 2   Cu in Stream 8   Flow of Stream 1
2    Cu in Stream 2   Flow of Stream 3   Zn in Stream 8
3    Cu in Stream 3   Flow of Stream 4
4    Cu in Stream 4   Flow of Stream 5
5    Cu in Stream 5   Flow of Stream 6
6    Cu in Stream 6   Flow of Stream 7
7    Cu in Stream 7   Flow of Stream 8
8    Zn in Stream 1
9    Zn in Stream 2
10   Zn in Stream 3
11   Zn in Stream 4
12   Zn in Stream 5
13   Zn in Stream 6
14   Zn in Stream 7
Table 3. Index sets for bilinear terms in example 1.
Term      1   2   3   4   5   6   7   8   9  10  11  12
YEXI^YE   1   2   3   4   5   6   1   2   3   4   5   6
YEXI^XI   2   3   4   5   6   7   9  10  11  12  13  14
YEYI^YE   7   7
YEYI^YI   1   2
ZEXI^ZE   1   1
ZEXI^XI   1   8
Due to space limitations we do not show the fifteen possible incidence matrices (i.e., $\mathbf{AE}$, $\mathbf{AI}$, $\mathbf{AEAI}$, etc.); however, after converging the model in 6 iterations using the solution method of Kelly (1998), a final objective function value of 17.9615269 was found compared to 17.9616400 found after 22 iterations previously. The discrepancy arises because the Kelly (1998) problem instance was missing an element in the calculated variables Jacobian matrix $\mathbf{B}_s$ for function ten.
The second example’s problem size can be found in Table 1. This problem is a heat balance with four heat exchangers, one
splitter and one mixer. Table 4 provides the variable classification and Table 5 specifies the appropriate index sets.
Table 4. Classification of variables for example 2 (F = Flow and T = Temperature).
No.  xE               xI               yE               yI
1    F of Stream A1   T of Stream A1   F of Stream A2   T of Stream A2
2    F of Stream A3   T of Stream A3   F of Stream A4   T of Stream A6
3    F of Stream A6   T of Stream A4   F of Stream A5   T of Stream B2
4    F of Stream B1   T of Stream A5   F of Stream A7   T of Stream B3
5    F of Stream C1   T of Stream A7   F of Stream A8   T of Stream C2
6    F of Stream D2   T of Stream A8   F of Stream B2
7                     T of Stream B1   F of Stream B3
8                     T of Stream C1   F of Stream C2
9                     T of Stream D1   F of Stream D1
10                    T of Stream D2
Table 5. Index sets for bilinear terms in example 2.
Term      1   2   3   4   5
XEXI^XE   1   2   4   5   6
XEXI^XI   1   2   7   8  10
XEYI^XE   3
XEYI^YI   2
YEXI^YE   2   3   4   5   9
YEXI^XI   3   4   5   6   9
YEYI^YE   1   6   7   8
YEYI^YI   1   3   4   5
Notice from Table 5 that there are as many bilinear terms as there are products of stream flow times stream temperature, which is identical to the number of streams in the problem (A1…A8, B1...B3, C1, C2, D1, D2). There can never be more bilinear terms than the number of streams or fluxions times the number of qualities for a stream; however, there can be fewer bilinear terms if a quantity-quality balance involving a particular stream is not required. This is also a consistent observation for example 1, given that there are eight streams with two metal concentrations requiring sixteen bilinear terms, which is exactly the number of bilinear terms found in Table 3. Finally, when example 2 was re-solved using the new quantity-quality bilinear formulation an objective function of 14.5861664 was found in 4 iterations. Previously it took 5 iterations with an objective function value of 14.5861656 using the same settings. Unfortunately, there is no obvious explanation for the slight difference in the iteration count.
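The counting argument above can be checked mechanically from the index sets of Tables 3 and 5. The sketch below copies those index sets into plain dictionaries (the dictionary layout and function name are assumptions made for illustration) and confirms the totals of sixteen and fifteen bilinear terms.

```python
# Index sets transcribed from Table 3 (example 1): (extensive, intensive).
EXAMPLE_1 = {
    "YEXI": ([1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
             [2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14]),
    "YEYI": ([7, 7], [1, 2]),
    "ZEXI": ([1, 1], [1, 8]),
}

# Index sets transcribed from Table 5 (example 2).
EXAMPLE_2 = {
    "XEXI": ([1, 2, 4, 5, 6], [1, 2, 7, 8, 10]),
    "XEYI": ([3], [2]),
    "YEXI": ([2, 3, 4, 5, 9], [3, 4, 5, 6, 9]),
    "YEYI": ([1, 6, 7, 8], [1, 3, 4, 5]),
}


def count_bilinear_terms(index_sets):
    """Total number of bilinear terms across all index set pairs."""
    total = 0
    for name, (ext_idx, int_idx) in index_sets.items():
        if len(ext_idx) != len(int_idx):
            raise ValueError(f"index set pair {name} is unbalanced")
        total += len(ext_idx)
    return total


n1 = count_bilinear_terms(EXAMPLE_1)   # 8 streams x 2 metal concentrations
n2 = count_bilinear_terms(EXAMPLE_2)   # 15 streams x 1 temperature
```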
Conclusions
Presented in this short note are the fine points of formulating and implementing quantity-quality bilinear data reconciliation
problems found in the process industries. The contribution of this work is that the formulation is completely extensible to both
small and large bilinear quantity-quality problems given that either dense or sparse matrix implementations can be employed and
that first-order derivative information can be easily calculated in the form of the Jacobian matrix explicitly. The formulation can
also be straightforwardly modified to handle other multi-linear terms (i.e., trilinear and quadlinear) either by creating extra
variables and constraints or by extending the formulation with new incidence matrices and index sets. For example, adding a
trilinear product term for a volume flow times a density times a sulfur concentration can be achieved by creating a calculated
extensive variable to represent the bilinear product of volume flow times the density and an extra function to enforce that the
calculated extensive variable equals the volume flow times the density.
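The trilinear example can be written out as follows. The symbols $V$ (volume flow), $\rho$ (density), $c$ (sulfur concentration) and $M$ (the extra calculated extensive variable) are notation assumed here for illustration and are not part of the nomenclature above.

```latex
% Trilinear term appearing in a sulfur balance:
V \cdot \rho \cdot c
% Step 1: introduce a calculated extensive variable M with one extra function
M - V \cdot \rho = 0
% Step 2: the original term becomes an ordinary bilinear term
M \cdot c
```

Both the extra function and the rewritten term are extensive times intensive products, so they fit the existing incidence matrices and index sets without modification.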
References
Chen, X., Pike, R.W., Hertwig, T.A. and Hopper, J.R. “Optimal Implementation of On-Line Optimization”, Computers chem.
Engng., 22, S435-S442, (1998).
Crowe, C.M., “Reconciliation of Process Flow Rates by Matrix Projection, Part II: The Nonlinear Case”, AIChE Journal, 32,
616-623, (1986).
Crowe, C.M., “Data Reconciliation - Progress and Challenges”, J. Proc. Cont., 6, 89-98, (1996).
Kelly, J.D., “A Regularized Solution to the Reconciliation of Constrained Data Sets”, Computers chem. Engng., 22, 1771-1788,
(1998).
MacDonald, R.J. and Howat, C.S., “Data Reconciliation and Parameter Estimation in Plant Performance Analysis”, AIChE
Journal, 34, 1-8, (1988).
The Mathworks Inc., MATLAB Release 13, June, (2002).
Pai, C.C.D. and Fisher, G.D., “Application of Broyden’s Method to Reconciliation of Nonlinearly Constrained Data”, AIChE
Journal, 34, 873-876, (1988).
Puming, Z. and Gang, R., “Steady-State Bilinear Data Reconciliation Dealing with Scheduling”, IFAC 15th Triennial World
Congress, Barcelona, Spain, (2002).
Sanchez, M. and Romagnoli, J., “Use of Orthogonal Transformations in Data Classification-Reconciliation”, Computers chem.
Engng., 20, 483-493, (1996).
Schraa, O.J. and Crowe, C.M., “The Numerical Solution of Bilinear Data Reconciliation Problems Using Constrained
Optimization Methods”, Computers chem. Engng., 22, 1215, (1998).
Stephenson, G.R., and Shewchuk, C.F., “Reconciliation of Process Data with Process Simulation”, AIChE Journal, 32, 247-254,
(1986).
Swartz, C.L.E., “Data Reconciliation for Generalized Flowsheet Applications”, Amer. Chem. Society National Meeting, Dallas,
Texas, (1989).
Tjoa, I.B., and Biegler, L.T., “Simultaneous Strategies for Data Reconciliation and Gross Error Detection of Nonlinear Systems”,
Computers chem. Engng, 15, 679-690, (1991).
Weiss, G.H., Romagnoli, J.A., and Islam, K.A., “Data Reconciliation – An Industrial Case Study”, Computers chem. Engng.,
20, 1441-1449, (1996).