Formal Data Validation in the Railways
Thierry Lecomte, Erwan Mottin
ClearSy
Aix en Provence, France
Abstract Safety-critical systems and software require particular care when their parameters have to be verified and validated, as any mistake may lead to a catastrophic scenario during operation. A recent technique, called formal data validation, improves the level of confidence in the verification/validation process by associating a formal data model with the parameters and by formally checking that these parameters fit within the model. This paper reports on the development and use of such tools for industrial railway applications.
1 Introduction
Historically, the B Method (Abrial 1996) was introduced in the late 1980s to design safety-critical software correctly. With B, a software model is built before its implementation; the model is proved to be consistent with static and dynamic properties; the implementation is proved to refine the model. The approach is entirely aimed at modelling and proving software behaviour. Promoted and supported by RATP (Régie Autonome des Transports Parisiens, the operator of bus and metro public transport in Paris), the B method and Atelier B, the tool implementing it, have been successfully applied in the transportation industry. Today, Alstom Transport Information Solutions and Siemens Transportation Systems are the main actors in the development of B safety-critical software. They share a product-based strategy and reuse existing B models as much as possible to develop future metros.
In the mid-1990s, Event-B (Abrial 2005) enlarged the scope of B to analyse, study and specify not only software but whole systems. Event-B was influenced by the earlier work on Action Systems (Back 1991) by the Finnish school; it is a synthesis of B and Action Systems. It extends the usage of B to systems that may contain software but also hardware and pieces of equipment. In that respect, one of the outcomes of Event-B is the proved definition of system architectures and, more generally, the proved development of so-called “system studies” (Sabatier et al 2006, Sabatier et al 2008, Hoffmann et al 2007, Lecomte et al 2007, Lecomte 2008), which are performed before the specification and design of the software. This enlargement allows failure studies to be performed right from the beginning of a large system development. Event-B has been used to perform system-level safety studies in the railways (Sabatier 2012), allowing part of the overall system specification to be formally verified and thus contributing to improving the level of confidence in the railway system being built.
The verification of behaviour, based on an Event-B system specification or a B software specification, is quite easily achievable by semi-automated proof. However, the verification of parameters (that tune the system or the software) against properties may turn out to be a nightmare in the case of large data sets. For the Meteor metro (line 14, Paris), software and data were kept together in a single B project (Behm et al 1999). Demonstrating data correctness with respect to the expected properties was very difficult because of the required iterations over large sets of variables and constants (and their domains). Indeed, the main Atelier B theorem prover is not designed for this activity (Milonnet 1999), which calls for a model checker or constraint solver rather than a theorem prover. Later on, software and data started to be developed and validated within two different processes, in order to avoid recompilation when the data are modified but not the software. Data validation then became an entirely human activity: painful, error-prone and long (usually more than six months to manually check 100,000 items of data against 1,000 rules).
In this article, we present a formal approach, based on the B/Event-B mathematical language and the ProB model checker and constraint solver, designed and experimented with on several projects for the validation of safety-critical railway data.
2 Data validation and formalism
Verifying railways systems covers many aspects and requires a large number of
cross-verifications, performed by a wide range of actors including the designer of
the system, the company in charge of its operation, the certification body, etc.
Even if complete automation is not possible, any automatic verification is
welcome as it helps to improve the overall level of confidence. Indeed a railway
system is a collection of highly dependent sub-system specifications and these
dependencies need to be checked. They may be based on railway signalling rules (which are specific to each country, or even to each company within a country), on rolling stock features (constant or variable train size or configuration) and on operating conditions.
By data validation, we mean the validation of the parameters (i.e. constants)
that determine a specific behaviour of a software/system over a wide range of pos-
sible sets of values. Microsoft Excel defines data validation in terms of type
checking: a cell may contain a date, an integer, a string or a floating point number.
In our case, the data to validate are not only scalar but also represent more com-
plex structures like graphs. A metro line is seen as a graph, made of connected
tracks with distributed signals and switches implementing signalling rules. Graphs
are encoded through a large number of tables.
In figure 1, as an example we consider a set of track circuits {t1, t2, t3, t4, t5}
defining a simple line L1 (no switches). In the raw data entering the validation
process, this set is encoded as a table TrackCircuit where TrackCircuit[0]=t1,
TrackCircuit[1]=t2, etc. The function Next models the connectivity between track
circuits: as a partial function, it associates zero or one track circuit to any track
circuit of the set TrackCircuit. Next(t1)=t2 means that the successor of t1 is t2.
The first track circuit of line L1 may be defined as the track circuit that cannot be reached through the Next function (i.e. that does not belong to the range of Next). In our case, it is t1. If the track circuits form a loop (i.e. Next(t5)=t1), FirstTrackCircuit cannot be defined, as the set of track circuits outside the range of Next is then empty. If we consider the abscissa of the beginning of each track circuit (function KpAbs), we can also assert that the abscissae of successive track circuits are ordered, that is, for any track circuit tt, KpAbs(Next(tt)) > KpAbs(tt). The KpAbs function may be provided as an input table or may have to be calculated from the length of each track circuit, assuming that FirstTrackCircuit has a null abscissa and summing the lengths of the successive track circuits. In this case, the verification is composed of several sequential steps; it is complete only if all steps are successful.
Fig. 1. Modelling track circuits and related properties such as connectivity (the Next function) and abscissa (kilometre point or Kp). The set-based representation of the Next function is shown on the right.
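To make the example concrete, the fragment below is a minimal sketch of a B machine gathering both the data of figure 1 (valued in the PROPERTIES clause) and the expected properties (stated as ASSERTIONS); the abscissa values are illustrative only.

    MACHINE TrackDataL1
    SETS
        TC = {t1, t2, t3, t4, t5}                /* track circuits of line L1 */
    CONSTANTS Next, KpAbs, FirstTrackCircuit
    PROPERTIES
        /* connectivity: zero or one successor per track circuit */
        Next : TC +-> TC &
        Next = {t1|->t2, t2|->t3, t3|->t4, t4|->t5} &
        /* abscissa (kilometre point) of the beginning of each track circuit */
        KpAbs : TC --> NATURAL &
        KpAbs = {t1|->0, t2|->300, t3|->650, t4|->900, t5|->1200} &
        /* first track circuits: those never reached through Next */
        FirstTrackCircuit = TC - ran(Next)
    ASSERTIONS
        FirstTrackCircuit = {t1};
        /* abscissae of successive track circuits are strictly increasing */
        !tt.(tt : dom(Next) => KpAbs(Next(tt)) > KpAbs(tt))
    END

Such a machine can be loaded into an animator or model checker such as ProB, which values the constants and evaluates the assertions, reporting any violated predicate.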
We could also express more complex properties like “a path exists (made of successive track circuits) between any two different track circuits”, i.e. all track circuits are connected to the signal network. The notation used, set theory, allows properties to be written elegantly and concisely, and fits well with the graph-based track topology. A railway network is made of a collection of sub-networks where different signalling rules apply, and for these the formalism appears adequate: properties are specified to apply only to subsets that are identified through a mathematical predicate.
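As a sketch, and assuming that the transitive closure operator closure1 supported by ProB is available, this connectivity property could be written over the symmetric closure of Next (the identifiers are those of figure 1):

    /* every pair of distinct track circuits is linked by a path of Next steps,
       followed in either direction (Next~ is the inverse relation) */
    !(aa, bb).(aa : TC & bb : TC & aa /= bb
               => (aa |-> bb) : closure1(Next \/ Next~))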
The modelling part of this approach consists in translating natural language
sentences into mathematical predicates expressing typing information and con-
straints over several mathematical entities. Some intermediate constants need to be computed before the verification can be completed; if these intermediate constants are used in several verification rules, their definition is localized in a specific file containing the so-called “intermediate constructs”: they are computed once and their value is reused when necessary, in order to save computation time. Even if natural language is often subject to interpretation, inconsistent, wrong or incomplete, the formalization activity allows the knowledge of expert engineers to be captured in a persistent, sound way. The verification of the track data, usually amounting to 50,000 to 100,000 Excel cells (see figure 2), against the signalling rules constituting the data model requires a validation engineer to spend several months on error-prone manual checks.
Fig. 2. Raw input data containing respectively equipment ID, IP address, associated routes,
length and GPS position. Data may be missing, incorrect, IP addresses duplicated, etc.
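As an illustrative sketch, and reusing the TrackDataL1 fragment above together with the closure1 operator, such intermediate constructs could simply be additional constants defined once from the raw data and then referenced by several rules; the names below are hypothetical:

    CONSTANTS ReachableFrom, LoopedTC
    PROPERTIES
        /* all pairs (a,b) such that b can be reached from a through Next */
        ReachableFrom = closure1(Next) &
        /* track circuits that lie on a loop of the track layout */
        LoopedTC = {tt | tt : TC & (tt |-> tt) : ReachableFrom}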
This activity is really difficult and demanding, as the data evolve during the whole period (CAD data is replaced by real plant data, the topology is modified after in situ testing, etc.) and several interleaved iterations are required. The challenge here is to design a tool, or several diverse tools, able to automate these verifications while improving the level of confidence in the verification (the tools have to be generic and not specifically designed for this activity).
3 Tools are everything
As shown in figure 3, our experience is linked to multiple tools developed and applied to several cases during both research and industrial experiments. The tools, the formalism and the method have been improved over the last 12 years to reach a stable state and are applicable to industrial-strength systems.
In France, the RATP laboratory “Atelier de Qualification Logiciel” is in charge of qualifying railway applications before their installation. A specific tool, initially developed for validating Paris metro line 14 data and representing more than 300,000 lines of C++ code, proved too difficult to maintain and adapt, and hence was not reused for other lines. RATP then initiated the development of a generic tool to verify the trackside data of Paris metro line 1, which was being automated. Initially tested on Paris metro line 13 configuration data, the tool was able to check 400 definitions and 125 rules in 5 minutes.
The approach that has been retained over these 12 years consists of formalizing the properties with the B mathematical language (set theory, first-order logic) and generating a B machine containing both the properties (the data model) and the data to verify. Compliance is then checked by a generic tool. For this first application,
the PredicateB predicate evaluator, developed by ClearSy, was able to parse data
(XML, csv or text-based formats), load rules and verify that data complies with
rules. The PredicateB tool is a symbolic calculator able to manipulate B mathematical language predicates in order to animate a B formal model: the initial values of constants and variables are calculated, then operations are executed depending on their guards (enabling conditions) and their substitutions (variable modifications). Symbolic values are scalars, sets, functions, etc. However, PredicateB has limited capabilities for non-deterministic computations (“find an element such that ….”): it is not able to find all possible values of a non-deterministic substitution or to find all counter-examples. Moreover, the way errors are displayed may make analysis difficult when the faulty predicate is complex, as it requires injecting the data into the predicate.
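As a minimal illustration (not taken from the industrial projects discussed here) of what animating a B model involves, the machine below has one variable and two guarded, non-deterministic operations; an animator evaluates each guard on the current state and, when it holds, applies the corresponding substitution:

    MACHINE Occupancy
    SETS TC = {t1, t2, t3}                           /* track circuits */
    VARIABLES occupied
    INVARIANT occupied <: TC
    INITIALISATION occupied := {}
    OPERATIONS
        enter =
            ANY tt WHERE tt : TC & tt /: occupied    /* guard: a free track circuit exists */
            THEN occupied := occupied \/ {tt}        /* substitution: mark it occupied */
            END;
        leave =
            ANY tt WHERE tt : occupied               /* guard: an occupied track circuit exists */
            THEN occupied := occupied - {tt}         /* substitution: free it */
            END
    END

The ANY ... WHERE construct is precisely the kind of non-deterministic substitution (“find an element such that …”) that PredicateB handles only partially and that ProB, discussed below, handles much better.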
Fig. 3. Formal Data Validation History reported in this article. Formal data validation technology is today used by all the companies mentioned.
During the DEPLOY project, the University of Düsseldorf and Siemens Transportation Systems developed a new approach, based on the ProB model checker. ProB embeds several well-performing heuristics for reducing the search space (symmetry detection, for example), handles non-deterministic substitutions better and provides a more complete set of counter-examples. The major outcome of this decision was a dramatic reduction of the validation duration, from about six months of human verification to a few minutes of computation (setting aside the time needed to formalize the verification rules), while taking all properties into account. The data provided by Siemens contained a number of deliberately added, identified errors, but the verification also uncovered a previously undetected error. The discovery of this error, which had remained unnoticed by validation engineers while the target metro was in active use, clearly demonstrated the added value of this approach and its ability to reproduce results in minutes. The data was extracted from Ada source code and the properties came from the B models used for the software development. In the case of the San Juan metro (Puerto Rico), 79 files with a total of 23,000 lines of B were parsed to extract 226 properties and 147 assertions. The verification took 1017 seconds and led to the discovery of 4 false formulas (one of which was not expected by Siemens).
ProB was then applied with great success on several projects: the Roissy Charles de Gaulle airport shuttle, Barcelona line 9, São Paulo line 4, Paris line 1 and Algiers line 1. On these occasions, ProB was slightly improved in order to deal with large-scale problems and thoroughly validated in order to ease its acceptance by a certification body. However, analysing false properties remained difficult, as it requires browsing the complete model valued with the complete set of data.
Alstom Transport Information Solutions decided to experiment with a new ap-
proach by reusing successful features of previous experiments. A new tool was
designed and implemented, still based on ProB. The verification rules are ex-
pressed using the B mathematical language and structured as B operations. Instead
of having to deal with large, quantified predicates, a verification rule is decomposed into small steps, which allows accurate error messages to be displayed, helping to determine the source of the error. Figure 4 shows a rule searching for all signals associated with an interlocking territory. The WHERE clause filters the data: the element should be a signal, with an ID, and with a geographical position (geopoint) included in the interlocking zone. The VERIFY clause specifies the conditions expected for all filtered signals. If the predicates of this clause are not verified, an error message is displayed for each faulty signal found.
Fig. 4. Example of a verification rule. Signals belonging to an interlocking territory are searched for (WHERE clause); such signals have to be linked to this interlocking (VERIFY clause). If not, an error message is displayed for each faulty signal found (MESSAGE clause).
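The concrete grammar of this tool is not reproduced here; a purely schematic sketch of such a rule, reusing only the clause names mentioned above together with hypothetical identifiers (Signals, SignalId, GeoPoint, InterlockingZone, InterlockingSignals), might look as follows, with the predicates written in the B mathematical language:

    RULE signal_belongs_to_its_interlocking
    WHERE                            /* filter the data */
        sig : Signals &
        sig : dom(SignalId) &
        GeoPoint(sig) : InterlockingZone
    VERIFY                           /* expected condition for every filtered signal */
        sig : InterlockingSignals
    MESSAGE                          /* displayed once per faulty signal */
        "signal", SignalId(sig),
        "is located inside the interlocking territory but is not associated with it"
    END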
ProB is the central tool for the verification. It has been modified in order to
produce a file containing all the counter-examples detected, and slightly improved to better support some B keywords. The resulting tool has been used successfully on several ongoing developments (Mexico, Toronto, São Paulo and Panama) to verify up to 50,000 Excel cells against up to 200 rules. A first round allowed the required concepts and intermediate constructs (predicates used by several rules) to be defined, and a set of generic rules shared by all projects to be formalized. During
the next stages, specific project rules and data files were added. A complete verifi-
cation is performed in about 10 minutes, including the verification report. The
process is completely automatic and can be replayed without any human interven-
tion when data values are modified.
For certification purposes (the overall process should be SIL4 compliant), a diverse tool, PredicateB++ (a newer version of the first tool developed in 2003), has been added to the toolchain in order to provide a confirmation of the ProB verdict: it reuses the values computed by ProB (especially those coming from non-deterministic substitutions) and performs symbolic calculation. The weak point of PredicateB++ regarding non-determinism is thus circumvented, and joint positive verdicts lead to a final agreement on the rule verification. This tool has been used for the verification of data for several metro lines (not only the data for the automatic pilot, as was the case in 2003). A methodology and a process were defined to handle up to 2,000 different rules. Some of these (800) are generic, checked for consistency once and then reused from project to project. Others are specific to lines or stations, and require the specialization of existing rules. Defining entirely new rules is now exceptional.
Fig. 5. A tool aimed at graphically specifying non-trivial test cases that validate the formal data model.
Most of the rules fit in one page, but some rules are really large, up to 10 pages,
as they embed several small steps or they contain a lot of implicit information. To
validate these particular rules, a specific process was devised: rules have to be
cross-read and tested by an independent engineer. While the reading activity is quite straightforward, testing a rule requires fully understanding its objective. Identifying the different kinds of situations in which the rule may not be verified by the data then enables the design of a non-trivial network configuration, by inventing raw input data from scratch (csv, json, XML, RailML, etc.). In this case, the risk of mistake is high: a badly defined configuration could wrongly validate a rule by not exercising an error condition adequately. A specific tool was then designed to fulfil this need (see figure 5): it allows the tester to design a specific network configuration graphically and to generate track data in the same format as the one provided to the verification tool. For complex rules, many different network configurations may be designed, in order to reduce to a negligible level the probability that a software error exactly masks a rule design error.
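For instance, reusing the TrackDataL1 sketch given earlier, a deliberately faulty test configuration for the abscissa-ordering rule could simply swap two values (the figures are hypothetical), so that the tool is expected to report exactly one violation:

    /* injected fault: KpAbs(t3) < KpAbs(t2), so the assertion
       !tt.(tt : dom(Next) => KpAbs(Next(tt)) > KpAbs(tt))
       must be reported as false, with tt = t2 as the counter-example */
    KpAbs = {t1|->0, t2|->650, t3|->300, t4|->900, t5|->1200}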
4 Conclusion and perspectives
Formal data validation appears to be of paramount importance in safety-critical
systems. Formal modelling adds semantics that allow the definition of more con-
sistent verification rules, by removing ambiguities and most errors. These rules
encode the knowledge and know-how of a company in a domain: knowledge is
incrementally built-up, made permanent and more easily applicable to new sys-
tems. The rules can be read and checked independently, so data model verification
may be easily distributed.
Using generic symbolic verification tools increases the level of confidence in the verification results, as the tools are not specifically built for one project or system and benefit from the considerable return of experience of a large user base. Diverse tools, using different technologies and developed by different teams, can be used for the highest safety-critical levels. The overall verification time is then reduced from several months to a few minutes or hours, and the verification may be replayed at will.
However, some verification rules involve large quantities of implicit knowledge that have to be made explicit in the rules. In that case the predicates in the verification rules can be lengthy, difficult to read and difficult to validate. Manual testing of the verification rules is therefore required, checking that, for every category of error a rule is designed to detect, the tool does trigger an error message. Interactive test configuration tools have been designed to build test scenarios and reduce the risk of human mistake.
Finally, introducing formal modelling and symbolic tools has improved the level of confidence in the overall verification process, but at the cost of requiring higher qualifications (not only signalling engineers) for formal modelling and verification. At the same time, the time needed to complete the verification has been shortened by several orders of magnitude on real-size systems. Even if a human is still required to test the validity of the modelling, formal data validation is now used by most major railway players, which is definitely a good signal for the formal methods community.
Acknowledgments Work reported in this article has been partly funded by the European Union (FP7 projects Rodin and DEPLOY) and by the Alstom and General Electric companies.
References
Abrial J.R. (1996) The B-Book: Assigning Programs to Meanings, Cambridge University Press
Abrial J.R. (2005) Rigorous Open Development Environment for Complex Systems: Event-B Language
Back R.J. (1991) Stepwise Refinement of Action Systems, Structured Programming 12:17-30, Springer-Verlag
Behm P, Benoit P, Faivre A, Meynadier J.M. (1999) Météor: A Successful Application of B in a Large Project. In: Proceedings of FM'99, World Congress on Formal Methods, Springer
Hoffmann S (2007) The B Method for the Construction of Micro-Kernel Based Systems, ZB
2007
Lecomte T (2007) Formal Methods in Safety Critical Railway Systems, SBMF 2007
Lecomte T (2008) Safe and Reliable Metro Platform Screen Doors Control/Command Systems, FM 2008
Lecomte T, Burdy L, Leuschel M (2012) Formally Checking Data Sets in the Railways, S-Event-
B 2012: Workshop on the experience of and advances in developing dependable systems in
Event-B, in conjunction with ICFEM 2012 - Kyoto, Japan, November 13, 2012
Lecomte T (2015) Formal Virtual Modelling and Data Verification for Supervision Systems, FM
2015: Formal Methods, Volume 9109 of the series LNCS pp 597-600
Leuschel M, Falampin J, Fritz F, Plagge D (2009) Automated Property Verification for Large Scale B Models. In: Proceedings FM 2009. Springer-Verlag
Leuschel M (2012) ProB, ProR and Data Validation with B, FM 2012 Industry Day
Milonnet C. (1999) B Validation Book, internal document (Matra Transport International)
Sabatier D (2006) Use of the Formal B Method for a SIL3 System: Landing Door Commands for Line 13 of the Paris Subway, Lambda Mu 15
Sabatier D (2008) FDIR Strategy Validation with the B method, DASIA 2008
Sabatier D (2012) Formal proofs for the NYCT line 7 (Flushing) modernization project, DE-
PLOY Industry Day, Fontainebleau