Polynomial Methods for Ensuring Data Integrity in Financial
Systems
Ignacio Brasca
September 13, 2024
1 Introduction
In any sufficiently complex system, such as a large financial system with a practical user base, ensuring data integrity across $K$ data points is crucial. Furthermore, to guarantee correctness during usage, we should ensure that $k_n$ data points are verified and correct for consumption while $K$ is correct as a whole, without any discrepancies in expected values.

Values falling outside valid ranges can cause distress among consumers of this data, leading to questions directed at platform designers and to time wasted recovering the original data points, mitigating data loss, and validating inputs across $I$ indicators, where $I$ is the total number of indicators used across the platform. (An indicator is a set of $k_n$ configured so that an operation produces a result consumed by the end consumer.)

To address these issues, we propose an algorithm that relies on classical Lagrange Interpolation to maintain correctness in our set of $I$ by fitting polynomials through known data points.
2 Background
Indicators ($I_n$) play a fundamental role in any financial system where, for each specific $I_n$, a set $\{k_1, k_2, \ldots, k_n\}$ could be in use:

$$I_n = \sum_{i=0}^{k_n} f(k_i) \tag{1}$$

where $f(k_i)$ is a function that takes a data point and produces a result. This is called an operation and is used to calculate any measure required to obtain $I_n$.
In summary, an indicator is a function that takes a set of data points and produces a result that is either used in the calculation of other indicators or consumed directly as an individual indicator.
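As an illustrative sketch (the function name indicator and the squaring operation are our own choices, not the paper's), Equation (1) amounts to folding an operation over an indicator's data points:

fn indicator(data_points: &[f64], f: impl Fn(f64) -> f64) -> f64 {
    // Equation (1): I_n is the sum of the operation f
    // applied to each data point k_i.
    data_points.iter().map(|&k| f(k)).sum()
}

// Example: an indicator whose operation squares each data point.
// indicator(&[1.0, 2.0, 3.0], |k| k * k) evaluates to 14.0.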
Polynomials
The usage of polynomials to maintain consistency is
well-discussed in the literature. Algorithms such as
Reed-Solomon codes, used in error detection and cor-
rection, rely on polynomials to ensure data integrity
[2], [6], [4].
This, together with the simplicity of polynomial representation, makes polynomials good candidates for ensuring data integrity.
Additionally, polynomials are key in solving com-
plex problems in mathematics by breaking them
down into simpler functions that can be recombined
to form the original problem. Given any polynomial $P(x)$, there exists a unique function $y = \phi(t)$ that satisfies the differential equation:

$$y' + p(t)y = g(t), \quad y(t_0) = y_0 \tag{2}$$
This guarantees the existence and uniqueness of
the polynomial for given points [10].
Knowing that a unique polynomial exists for a given set of points, we can rely on this property to ensure the integrity of our data points.
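For interpolation specifically, the property we rely on can be stated directly: $n+1$ points with pairwise distinct abscissas determine exactly one polynomial of degree at most $n$,

$$x_i \neq x_j \ \text{for} \ i \neq j \implies \exists!\ P,\ \deg P \le n,\ P(x_i) = y_i, \quad i = 0, 1, \ldots, n.$$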
Shamir Secret Sharing [12] is another example of a cryptographic algorithm that relies on polynomial interpolation to recover a secret from a set of shares, leveraging the same properties of polynomials described later in this document.
Lagrange Interpolation
Lagrange Interpolation [1] is a method used to find a polynomial $P(x)$ that passes through a set of given points $(x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n)$. The polynomial $P(x)$ is constructed as a linear combination of basis polynomials:

$$P(x) = \sum_{i=0}^{n} y_i L_i(x) \tag{3}$$

where $L_i(x)$ are the Lagrange basis polynomials defined as:

$$L_i(x) = \prod_{\substack{0 \le j \le n \\ j \ne i}} \frac{x - x_j}{x_i - x_j} \tag{4}$$
Lagrange Interpolation: An example
Consider the points (1,2), (2,3), and (3,5). Using
Lagrange Interpolation, the basis polynomials are:
$$L_0(x) = \frac{(x-2)(x-3)}{(1-2)(1-3)} = \frac{(x-2)(x-3)}{2} \tag{5}$$

$$L_1(x) = \frac{(x-1)(x-3)}{(2-1)(2-3)} = -(x-1)(x-3) \tag{6}$$

$$L_2(x) = \frac{(x-1)(x-2)}{(3-1)(3-2)} = \frac{(x-1)(x-2)}{2} \tag{7}$$

The interpolating polynomial $P(x)$ is given by:

$$P(x) = \sum_{i=0}^{2} y_i L_i(x) \tag{8}$$

Simplifying, we obtain:

$$P(x) = \frac{1}{2}x^2 - \frac{1}{2}x + 2 \tag{9}$$
This polynomial passes through the points (1,2),
(2,3), and (3,5).
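As a concrete sketch of Equations (3) and (4) (the helper name lagrange_eval is our own, chosen for illustration), the following Rust function evaluates the interpolating polynomial at a point and reproduces this example:

fn lagrange_eval(points: &[(f64, f64)], x: f64) -> f64 {
    points.iter().enumerate().map(|(i, &(xi, yi))| {
        // Basis polynomial L_i(x) = prod_{j != i} (x - x_j) / (x_i - x_j),
        // as in Equation (4).
        let li: f64 = points.iter().enumerate()
            .filter(|&(j, _)| j != i)
            .map(|(_, &(xj, _))| (x - xj) / (xi - xj))
            .product();
        yi * li
    }).sum()
}

fn main() {
    let points = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0)];
    // Matches P(x) = x^2/2 - x/2 + 2 from Equation (9):
    assert!((lagrange_eval(&points, 1.0) - 2.0).abs() < 1e-9);
    assert!((lagrange_eval(&points, 2.0) - 3.0).abs() < 1e-9);
    assert!((lagrange_eval(&points, 3.0) - 5.0).abs() < 1e-9);
}

Evaluating outside the original abscissas, e.g. lagrange_eval(&points, 4.0) = 8.0, is exactly how parity points are sampled in the next section.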
3 Application
The application of polynomials in our context allows us to recover and validate data points, ensuring continuity and uniqueness of the results across all inputs.

We describe a set of data points $k_1, k_2, \ldots, k_n$ used to generate a polynomial from the list of arguments used in the indicator $I_n$.

Suppose we have a set of data points $(x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n)$ with which to compute an indicator $I_n$. Furthermore, we take a subset of these original data points and assume $x_0 = 0, x_1 = 1, x_2 = 2, \ldots, x_n = n$, where the value of $f(x)$ at each $x_i$ corresponds to the original data point's value. For example, $f(0) = X$, $f(1) = Y$, $f(2) = Z$, and so on up to $f(n)$.

From there, we can describe a polynomial $f(x)$ through the $k_n$ sample data points defined above. Once this set of points is fixed, we interpolate the polynomial through those exact points.
This gives us a list $(x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n)$, defined across $(0, f(0)), (1, f(1)), \ldots, (n, f(n))$, from which we obtain a polynomial that describes the function $f(x)$:

$$f(x) = \sum_{i=0}^{n} y_i L_i(x) \tag{10}$$

where $L_i(x)$ is the Lagrange basis polynomial defined as:

$$L_i(x) = \prod_{\substack{0 \le j \le n \\ j \ne i}} \frac{x - x_j}{x_i - x_j} \tag{11}$$
Using $f(x)$, we can now generate a set of $m$ parity blocks that can be used to recover the original data points in case of data loss or corruption (as long as we store those blocks in a different place; see Section 5).

Ideally, one generates $k$ parity blocks (where $k$ is the number of points used in the first place); in general, however, we can generate as many blocks $m$ as we want and store them across different data storages.

One caveat is that we can reconstruct the original value only as long as we have at least $k$ points, where $k$ is the threshold number of points required to reconstruct the polynomial and $n = \deg(P(x)) = k - 1$.

This ensures the original $k$ data points can be reconstructed from the parity blocks, maintaining data integrity and consistency without the need to store the original data points in multiple locations.
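As a sketch of this parity scheme, building on the illustrative lagrange_eval helper above (floating point is used for clarity; a finite field, as in Reed-Solomon codes, would be preferable in practice), parity blocks are simply extra samples of $f(x)$ beyond the original domain:

fn parity_blocks(data: &[f64], m: usize) -> Vec<(f64, f64)> {
    // Treat data point i as the point (i, data[i]), as in this section,
    // and sample m additional points at x = n+1, ..., n+m.
    let points: Vec<(f64, f64)> = data.iter().enumerate()
        .map(|(i, &y)| (i as f64, y))
        .collect();
    (1..=m)
        .map(|j| {
            let x = (data.len() - 1 + j) as f64;
            (x, lagrange_eval(&points, x))
        })
        .collect()
}

// Any k surviving points (original or parity) determine the unique
// degree-(k-1) polynomial, so a lost value at position x is recovered
// as lagrange_eval(&surviving_points, x).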
Example: Carbon footprint calculation
One notable application is the calculation of the car-
bon footprint from the total investments of a portfo-
lio [3]:
$$\text{carbon footprint} = \frac{\text{total scope emissions}}{\text{total value of investments}} \tag{12}$$
Breaking this down, the total scope emissions can
be derived from individual companies’ emissions. By
using Lagrange Interpolation, we can ensure that the
calculated carbon footprint is consistent and accu-
rate.
For three companies with emissions:
Company A: 300 tonnes (13)
Company B: 400 tonnes (14)
Company C: 300 tonnes (15)
Total Value: 3000 EUR (16)
We can define the polynomial $P(x)$ determined by the set of points: $(1, 300)$, $(2, 400)$, $(3, 300)$, $(4, 3000)$.
The Lagrange Interpolation polynomial through these points is:

$$P(x) = 500x^3 - 3100x^2 + 5900x - 3000 \tag{17}$$
By interpolating these values, we can recover the
original carbon footprint even if some data points are
lost.
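Continuing the illustrative sketch (lagrange_eval is the hypothetical helper from the Lagrange Interpolation example; the choice of parity abscissa and tolerance are our own), losing one of the four points is tolerated with a single stored parity point:

fn main() {
    let original = [(1.0, 300.0), (2.0, 400.0), (3.0, 300.0), (4.0, 3000.0)];
    // One parity point sampled ahead of time at x = 5:
    let parity = (5.0, lagrange_eval(&original, 5.0));
    // Suppose Company B's point (2, 400) is lost; any four of the
    // remaining points still determine the cubic P(x) of Equation (17).
    let surviving = [original[0], original[2], original[3], parity];
    let recovered = lagrange_eval(&surviving, 2.0);
    assert!((recovered - 400.0).abs() < 1e-6);
}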
4 Concerns and Data Recovery
4.1 Concerns
One primary concern is the possibility of data corrup-
tion or loss. To mitigate this, we store data points in
multiple locations and use redundancy to recover lost
data. The method also ensures that any interpolation
is reversible, allowing us to verify the integrity of the
data. This redundancy is analogous to techniques
used in Reed-Solomon error correction codes, which
can correct multiple errors in data blocks [9], [5].
4.2 Data Recovery Example
To demonstrate data recovery, consider the earlier ex-
ample of the carbon footprint calculation. If we lose
some data points, we can use the remaining points
and the interpolating polynomial to reconstruct the
missing values. This is done using the Lagrange In-
terpolation formula and the stored data points.
5 Storing Data Points
To further enhance data integrity, we store data
points in a secondary data storage. This ensures that
even in the case of primary storage failure, the data
can be recovered from the secondary storage. The
steps include:
1. Prepare data parts for $k$ data points.
2. Construct the polynomial using Lagrange Inter-
polation.
3. Sample additional points (parity blocks) and
store them in secondary storage.
4. Use the stored data and parity blocks to recover
the original values if needed.
This approach is similar to RAID 6 storage sys-
tems, which use Reed-Solomon codes to provide fault
tolerance and data recovery capabilities [8], [13].
5.1 Example
fn store_data_points(data_points: Vec<f64>) {
    // Construct the interpolating polynomial from the data points.
    let polynomial = construct(&data_points);

    // Sample additional points of the polynomial as parity blocks.
    let parity_blocks = sample_parity(&polynomial);

    // Store the original points and the parity blocks separately.
    store_in_primary_storage(&data_points);
    store_in_secondary_storage(&parity_blocks);
}

fn recover_data_points() -> Vec<f64> {
    // Check whether the data points exist in primary storage;
    // if not, recover them from the backup.
    let data_points = if !in_primary_storage() {
        retrieve_from_backup()
    } else {
        retrieve_from_primary()
    };

    // Construct the polynomial from the available points and
    // interpolate the original values.
    let polynomial = construct(&data_points);
    polynomial.interpolate(&data_points)
}

Listing 1: Pseudocode for storing and recovering data points
6 Use Cases in Other Fields
6.1 Medical Imaging
In medical imaging, accurate data reconstruction is
crucial. Lagrange Interpolation can be used to fill
in missing or corrupted pixel values in MRI and CT
scans, ensuring accurate and reliable images for diag-
nosis [7].
6.2 Climate Modeling
Climate models rely on vast amounts of data from
various sources. Lagrange Interpolation can help in
interpolating missing data points from temperature,
humidity, and other climatic variables, ensuring the
models are accurate and robust [2].
6.3 Engineering Design
In engineering design, particularly in finite element
analysis, interpolating values at various points in a
mesh is essential. Lagrange Interpolation can provide
accurate approximations of physical properties across
the mesh [6].
7 Secondary Database for Data
Recovery
Consider a scenario where we have a critical equation
used in financial forecasting:
$$F(t) = a \cdot e^{bt} + c \cdot \sin(dt) \tag{18}$$
The coefficients $a$, $b$, $c$, $d$ are stored in a primary
database. In the event of a database failure, we can
recover these coefficients using Lagrange Interpola-
tion from a secondary database where interpolated
parity blocks are stored. If the primary database fails
or is corrupted, the following steps can be used to re-
cover the coefficients:
7.1 Recovery Procedure
Step 1: Check Primary Database Accessibility
First, determine whether the primary database is
accessible. This involves checking the network con-
nectivity, server status, and database health. En-
suring the primary database is online and function-
ing correctly is crucial before proceeding to data re-
trieval.
Step 2: Retrieve Parity Blocks from Secondary Database
If the primary database is not accessible, initiate
the retrieval process for parity blocks stored in the
secondary database. These parity blocks are essential
for reconstructing the missing data and are typically
stored in a fault-tolerant manner.
Step 3: Reconstruct Missing Coefficients Using Lagrange Interpolation
Utilize the retrieved parity blocks and the known
data points to perform Lagrange Interpolation. This
mathematical technique will help in reconstructing
the missing coefficients, ensuring that the recovered
data is accurate and reliable.
Step 4: Validate Recovered Coefficients
Finally, validate the reconstructed coefficients by
comparing them against known data points. This
step is crucial for verifying the accuracy and integrity
of the recovered data, ensuring consistency across the
system.
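The following minimal sketch ties the four steps together; the function signature simulates database access with in-memory values, and the coefficient encoding (coefficient $i$ stored as the polynomial value at $x = i$) is our assumption, not a prescribed layout. lagrange_eval is the helper sketched earlier:

fn recover_coefficients(
    primary: Option<Vec<f64>>,    // Step 1: primary DB contents, if reachable
    parity_points: &[(f64, f64)], // Step 2: parity blocks from the secondary DB
    known_points: &[(f64, f64)],  // Step 4: reference points for validation
) -> Option<Vec<f64>> {
    // Step 1: use the primary database whenever it is accessible.
    if let Some(coefficients) = primary {
        return Some(coefficients);
    }
    // Steps 2-3: reconstruct coefficient i of Equation (18) as the
    // interpolated value at x = i, using the retrieved parity blocks.
    let recovered: Vec<f64> =
        (0..4).map(|i| lagrange_eval(parity_points, i as f64)).collect();
    // Step 4: validate the reconstruction against known data points.
    let valid = known_points.iter()
        .all(|&(x, y)| (lagrange_eval(parity_points, x) - y).abs() < 1e-6);
    valid.then_some(recovered)
}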
8 Conclusion
The technique described here can be used to ensure data integrity across a set of $I$ indicators and maintain consistency in a system where traceability of the $I_n$ in use is critical.

This technique can be applied in any critical system where robustness is mandatory. By relying on the mathematical properties of Lagrange Interpolation, we can be sure our function is well-defined over the domain $\mathbb{R}$.

Storing parity blocks in secondary databases allows us to maintain $k$-point parity from them and discard corrupted (or untrusted) data points. These parity blocks enable us to trace back across the domain and recover the set $\{(0, f(0)), (1, f(1)), \ldots, (n, f(n))\}$ of data points used to generate the polynomial $f(x)$ in the first place. This remains true even if all the original data points are lost or corrupted, thanks to extrapolation and reliance on the existence and uniqueness of the polynomial $f(x)$ over $\mathbb{R}$ [11].

This approach provides fault tolerance (up to $m$ parity blocks) and data recovery capabilities. By implementing this technique, we can enhance data integrity and ensure the continuity of critical operations in various domains.
9 Future Work
Future work includes the implementation of the de-
scribed technique in real-world scenarios and the eval-
uation of its performance in terms of data recovery,
accuracy, and computational efficiency.
10 References
[1] Burden, R. L., and Faires, J. D. Numerical
Analysis, 7th ed. Brooks/Cole, 2001.
[2] Cormen, T. H., Leiserson, C. E., Rivest,
R. L., and Stein, C. Introduction to Algo-
rithms. MIT Press, 2009.
[3] ESMA. ESMA technical advice: Final report on draft regulatory technical standards, 2023. Carbon Footprint Calculation.
[4] Lidl, R., and Niederreiter, H. Introduction
to Finite Fields and Their Applications. Cam-
bridge University Press, 1986.
[5] Lin, S., and Costello, D. J. Error Control
Coding: Fundamentals and Applications. Pren-
tice Hall, 1983.
[6] MacWilliams, F. J., and Sloane, N. J. A.
The Theory of Error-Correcting Codes. North-
Holland, 1977.
[7] McEliece, R. J. The Theory of Information
and Coding. Addison-Wesley, 1977.
[8] Patterson, D. A., Gibson, G., and Katz, R. H. A case for redundant arrays of inexpensive disks (RAID). In ACM SIGMOD Conference (1988).
[9] Peterson, W. W., and Weldon, E. J.
Error-Correcting Codes. MIT Press, 1972.
[10] Rana, I. K. An Introduction to Measure and
Integration. Springer, 2002.
[11] Rudin, W. Principles of Mathematical Analy-
sis, 3rd ed. McGraw-Hill, 1976.
[12] Shamir, A. How to share a secret. Communi-
cations of the ACM 22, 11 (1979), 612–613.
[13] Sudan, M. Decoding of Reed-Solomon codes beyond the error-correction bound. Journal of Complexity 13, 1 (1997), 180–193.