Indexation of Numeric Bench Test Records
A Big Data Vision
Jérôme Lacaille
Snecma (Safran Group)
Rond-Point René Ravaud, Réau
77550 Moissy-Cramayel, France
+33 1 60 59 70 24
jerome.lacaille@snecma.fr

William Bense
Snecma (Safran Group)
Rond-Point René Ravaud, Réau
77550 Moissy-Cramayel, France
+33 1 60 59 72 67
william.bense@snecma.fr

Ion & Stefan Berechet
SISPIA
18 allée Henri Dunant
94300 Vincennes, France
+33 1 6 86 80 21 63
ion.berechet@sispia.fr

Cynthia Faure
PhD Student, SAMM
Univ. Panthéon Sorbonne
90 rue de Tolbiac, 75634 Paris, France
+33 1 44 07 89 22
cynthia.faure@univ-paris1.fr
Abstract: Every day, new engine configurations or engine parts are tested in Snecma's test benches. During each test, up to two thousand sensors capture every bit of information generated by the engine or the bench cell itself. It is extremely difficult to analyze all this data manually. Because of the huge amount of data and their diversity, the specialists who analyze them can miss interesting information that could save hours of development. One of the challenges of Snecma's data laboratory is to find a way to automatically provide relevant information to the adequate experts. Numeric records are analyzed and coded as successions of labels, each representing a different transient or stabilized phase. The labels are issued from the classification of local mathematical model parameters, with (and without) their initial conditions, and are stored in a distributed database allowing parallel search with a classic map/reduce scheme. It then becomes a lot easier to look for a specific pattern in tens of years of numerical records. Similarity distances are built to compare labels or label sequences. For example, if an engine shows a specific vibration pattern after the test pilot changes the shaft rotation speed from one level to another, one may ask whether this behavior is usual. If not, it is very useful to recall whether such a pattern happened in the past on other engines, and to dig out of the database the old documents related to those rare events and possibly the people concerned. The documents related to each test are the test specification and many expert reports stored in our knowledge base. These documents are also classified with a tf/idf-like algorithm that links each of them to specific topics. This way a query may start from a test specification to fetch similar tests, but also from numerical records, and may lead to topics related to specific events known by company experts. Our road map is to progressively grow the distributed database of labels and topics (with links to original documents and numeric records). The first step was to identify the different phases extracted from small subsets of temporal measurements and to build local models for given patterns. In parallel, our knowledge base is classified into topics and a prototype query system is implemented. This incremental process allows us to build our database progressively, adding new patterns when experts ask for them, without any need to redesign the system.
TABLE OF CONTENTS
1. INTRODUCTION
2. PROPOSED METHODOLOGY
3. RESEARCH AND CURRENT WORK
4. CONCLUSION
REFERENCES
BIOGRAPHY
1. INTRODUCTION
Snecma’s Datalab is a small experimental team focused on
evaluating new data technologies (like those described under
the “Big-Data” symbol). The different solutions are
challenged by specific studies producing analytic reports or
effective prototypes. We address three domains in the
company beyond prognostic and health monitoring (PHM)
which are:
Industry: optimize the design of the engine and the
fabrication process.
Operation: identify the usage of the engine during
flights, link to wear, then maintenance and finally the
possession cost.
Development: better understand the development
process and optimize our knowledge base.
The present paper describes our road map to build a methodology and tools that help interpret the data generated by development tests. Today's development ground tests are specified by the design offices according to their specific needs: dynamic behavior, performance, acoustics, aerodynamics, etc. Each test is designed by the requesting party to identify specific behaviors, but the number of sensors is so large that it is difficult to analyze all the generated data and it requires many people to do the job. We often lose any chance to look at patterns that were not in the immediate interest of the engineers. We estimate that less than 10% of the recorded data is observed. A challenge is to automatically analyze all those records, identify known patterns and detect unusual behaviors. Even if this goal can be reached, another problem is to efficiently process the results. For this purpose, our datalab has been asked to find a way to automatically send messages to the people concerned (see Figure 1).
The next section (2. Proposed methodology) describes the
methodology and the tools we will try to develop to solve this
challenge in the long term. The following one (3. Research
and current work) summarizes our present work and results.
Snecma is an engine manufacturer, definitely not a software vendor, and our capacity for data analysis remains limited. However, we have just built new entities in the Safran Group to address this specific domain of data analysis: the research center Safran Tech [1] and a new company named Safran Analytics. Safran Tech helps drive PhD programs and interactions with academic laboratories, while Safran Analytics focuses more on business solutions. The datalab in Snecma, like similar entities in each of our industrial companies, exists to identify the analytic approaches leading to technology breakthroughs, investigate the engineering needs and prepare the context (data, prototypes and process changes). We actively look for good ideas in this new domain and build a business environment able to mature them.
Figure 1. Our challenge: find a way to systematically
scan data, build statistics such that experts may
automatically be informed about patterns of their
concern or about unusual behaviors.
2. PROPOSED METHODOLOGY
Problem summary
Development bench tests are specified in a document giving, for each design office, the goal of the test, the execution pattern, the expected results and the requesting engineers. At the end of the test, each numeric record corresponds to a specified demand and, when a visual expertise leads to original annotations, the results are stored in logbook documents written by experts and linked to the numeric records.
When specialists ask for a test and the associated measurements, they want to check whether the behavior they expect will happen. Therefore, when they analyze the results they focus on explaining this behavior and expect particular patterns. Unexpected patterns or trends are more difficult to detect and analyze. Now, experts want the datalab to search the history databases for similar or recurrent patterns.
In summary, we need statistics on temporal patterns in a huge database of engine tests where each test produces numerous high-frequency records (from 1 Hz to 100 Hz to begin with) of hundreds to thousands of measurements. These measurements come from different sets of sensors that differ from engine to engine or even from test to test. Moreover, as we also want to compare different types of engines, we have to manage equivalent sensors with different names, positions or units. This may be seen as a problem similar to searching images or documents after a general indexation of patterns. The difference here resides in the diversity of dimensions and corresponding units per record, but also in the physical relations between those dimensions, which are specific to the aerodynamic process.
Coding transient phases
The idea is to analyze all measurements, identify and code the transient patterns (where something happens; PHM already deals with stabilized observations [2][6]) so as to replace the numeric records by sequences of parameterized labels. Transforming multivariate numeric temporal records into sequences of labels clearly helps searching for the appearance of a given label in a record that belongs to a given test of a given engine. A direct search in the temporal numeric records is almost impossible: it implies building a metric in that numeric space, and such a metric needs a drastic reduction of the input dimension to be effective, hence the construction of labels.
We replace the task of systematically coding all unknown patterns with an incremental solution. The expert defines a specific pattern and inserts it in an analytic tool that fits a model whose parameters code the label. He then runs a corresponding matching algorithm and adds the results to a new label base. This iterative mode builds sequences of labels for each test record (Figure 2). This approach is not exhaustive, but it lets us focus first on physically interesting patterns. Unidentified temporal behaviors may also be detected as "unknown" and draw our attention in a general way if they are recurrent. Moreover, if an expert creates a label, it is easy to register his name as a person of interest when the matching algorithm detects the corresponding pattern.
Figure 2. We progressively increment the code of the
record sequence by adding new labels. Labels of
temporal patterns may overlap, for example when
looking at different sets of measurements.
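To make the coding concrete, a label in this scheme can be thought of as a small record holding the pattern name, the temporal interval it covers, the model parameters that identify it and the expert who registered it. The sketch below is only an illustrative data structure under these assumptions (the field names are hypothetical, not the actual schema); it shows how overlapping labels can coexist in one record sequence.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Label:
    """One detected occurrence of a registered pattern (hypothetical schema)."""
    pattern: str                  # e.g. "thrust_increase"
    t_start: float                # interval start (seconds in the record)
    t_end: float                  # interval end
    parameters: Dict[str, float]  # local model parameters coding the label
    expert: str                   # person who registered the pattern

@dataclass
class RecordSequence:
    """Coded version of one numeric test record: a list of (possibly overlapping) labels."""
    record_id: str
    labels: List[Label] = field(default_factory=list)

    def add(self, label: Label) -> None:
        # Labels are appended incrementally as new patterns are defined by experts.
        self.labels.append(label)

    def overlapping(self, t: float) -> List[Label]:
        # All labels covering instant t; overlaps are allowed because labels
        # may be defined on different sets of measurements.
        return [l for l in self.labels if l.t_start <= t <= l.t_end]

seq = RecordSequence("engine_A_test_042")
seq.add(Label("thrust_increase", 10.0, 25.0, {"delta_lever": 12.0}, "expert_A"))
seq.add(Label("vibration_drift", 20.0, 90.0, {"slope": 0.8}, "expert_A"))
print([l.pattern for l in seq.overlapping(22.0)])  # both labels overlap at t = 22 s
```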
Distributed storage
The bench test history is very large: we execute many tests per day, each one corresponding to hundreds of records over several hours. Reading directly from the database used to store all these data may not be the best solution: this history database is adapted to high-velocity storage, not to data analysis, except for on-the-fly computation. Using distributed hardware with an indexation process (building the label sequences) and running the search/statistics on the labels is preferred. It can be implemented as described below (Figure 3):
(1) Historic data will be accessed periodically ("at leisure time") and temporarily transferred to a cluster.
(2) Online matching algorithms working automatically and in parallel on the stored records will detect registered patterns and compute their characteristics. The detection results (labels) will update the sequences corresponding to each record, improving the indexation of our history database.
(3) The last step is to build statistics on those sequences, hence to be able to search for the appearance of labels or of successive labels. The label sequences are also stored in a distributed environment that allows parallel execution of the search algorithms.
Figure 3. Illustration of the distributed process of
indexation and query.
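As an illustration of step (3), the search over label sequences fits a classic map/reduce scheme: each mapper scans a subset of coded records and emits (label, record id) pairs, and the reducer groups record ids by label so that queries such as "which tests contain this pattern?" become simple set operations. The sketch below uses Python multiprocessing as a stand-in for the distributed environment; the record format and function names are illustrative assumptions, not the production implementation.

```python
from collections import defaultdict
from multiprocessing import Pool

# Hypothetical record format: each indexed record is a pair
# (record_id, [(label, start_time, end_time), ...]) produced by the matching filters.

def map_labels(record):
    """Mapper: emit (label, record_id) for every label in one coded sequence."""
    record_id, sequence = record
    return [(label, record_id) for (label, _start, _end) in sequence]

def reduce_labels(mapped):
    """Reducer: group record ids by label to answer 'where did label X appear?'."""
    index = defaultdict(set)
    for pairs in mapped:
        for label, record_id in pairs:
            index[label].add(record_id)
    return index

def build_label_index(records, workers=4):
    with Pool(workers) as pool:
        mapped = pool.map(map_labels, records)
    return reduce_labels(mapped)

# Example query: records in which a thrust increase and an unusual vibration label co-occur.
if __name__ == "__main__":
    records = [
        ("test_001", [("thrust_up", 10.0, 25.0), ("vib_drift", 25.0, 80.0)]),
        ("test_002", [("thrust_up", 5.0, 18.0)]),
    ]
    index = build_label_index(records)
    print(index["thrust_up"] & index["vib_drift"])   # -> {'test_001'}
```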
Analysis of a new test
During a new test (Figure 4), the numeric records will automatically go through the matching filters and produce a label sequence, each label being detected according to a likelihood threshold chosen by the experts. The search algorithm will check parts of this sequence, including the label singletons themselves, hence producing a rank output within our data history. Thus, new records will generate a set of reports associated with each detected label. An alert is generated when the observation of the label behaves like an outlier; otherwise, statistics presenting the rank position as a quartile (or p-value) will be given for information.
As the label base improves, it will also become possible to check for successive patterns acting like a sequence of events instead of just one pattern.
Figure 4. Matching algorithms index the database of
records. Search algorithm finds similar patterns in the
indexed database.
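The ranking step described above can be illustrated with a simple empirical computation: given the historical scores of a label (for example residual variances of past matches), the rank of a new observation is converted into an empirical p-value, and an alert is raised when the new score behaves like an outlier with respect to that history. This is only a schematic sketch of the idea, with hypothetical variable names and an arbitrary alert threshold.

```python
import numpy as np

def rank_new_observation(history_scores, new_score, alert_quantile=0.99):
    """Return the empirical p-value of a new label score against its history
    and a flag telling whether it should raise an alert (outlier behavior)."""
    history = np.asarray(history_scores, dtype=float)
    # Fraction of past observations at least as extreme as the new one.
    p_value = np.mean(history >= new_score)
    # Alert if the new score exceeds the chosen high quantile of the history.
    alert = new_score > np.quantile(history, alert_quantile)
    return p_value, alert

history = np.random.lognormal(mean=0.0, sigma=0.5, size=5000)   # past residual scores
p, is_alert = rank_new_observation(history, new_score=6.0)
print(f"empirical p-value = {p:.4f}, alert = {is_alert}")
```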
Identification of the experts
The ranking of a pattern identifies a list of similar
observations that appears on past test records. Those records
are linked to specification documents and some patterns may
be linked to visual analysis and stored in the knowledge base
(Figure 5). The frequently referenced authors and related
services at the origin of those documents are automatically
informed by the system. Topic classification of documents is
also used to generate the report.
Figure 5. Meta data such as names and services at the
origin of the test as well as specific analysis reports
written by experts are linked to each test and each
pattern.
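As stated in the abstract, the documents of the knowledge base are classified with a tf/idf-like algorithm so that a query (a test specification, or the documents attached to similar past tests) can be routed to the relevant topics and authors. Below is a minimal sketch of such a similarity query, assuming scikit-learn's TfidfVectorizer and cosine similarity as stand-ins for the actual in-house classifier; the document contents and author names are invented placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical knowledge-base excerpts (test specifications and expert logbooks).
documents = [
    "vibration level increase after low pressure shaft acceleration",
    "compressor stall avoided by transient bleed valve opening",
    "temperature stabilization after thrust lever change",
]
authors = ["expert_A", "expert_B", "expert_C"]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)

# Query built from a new test specification or a detected pattern description.
query = ["unusual vibration after shaft speed change"]
scores = cosine_similarity(vectorizer.transform(query), doc_matrix).ravel()

# Rank documents (and hence their authors) by similarity to the query.
for idx in scores.argsort()[::-1]:
    print(f"{authors[idx]:>9}  score={scores[idx]:.3f}  ->  {documents[idx]}")
```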
3. RESEARCH AND CURRENT WORK
We organize our work around two directions: first, the definition of what exactly a pattern is in the aeronautic expert's mind, and then how to automatically index our whole database with such patterns.
How to define a transient pattern
A transient phase is a temporal interval during which
something is observed. It is defined by the context that
identifies a temporal interval and the behavior of the
observations in that same interval.
For example, a transient interval may begin with an increase of the thrust level and end when the engine shaft reaches a stabilized speed and the engine temperatures also stabilize (Figure 6). In this specific case the pattern is defined by "increase of thrust"; it is parameterized by the amount of increase, the lever position and probably the initial values of the engine shaft speed and temperatures.
Figure 6. The thrust level changes at 20:17:35; then the shaft speed reaches a new stabilized value with a concave pattern, and the compressor temperature increases in a different way until it also reaches a new stabilized value.
Another example of a pattern also starts with an increase of thrust, but this time, after stabilization of the shaft speed, the vibration energy excited by this same shaft continues to change (Figure 7). In that case, the expert was positive that this problem had already appeared on old engine configurations but was not able to retrieve the information from the knowledge base.
Figure 7. After engine acceleration, even when the speed
is stabilized, the vibration filtered around the speed shaft
frequency continues to change.
To code this specific pattern we identify the exogenous measurements (the shaft speed increase) and the endogenous observations (the way the filtered vibration energy changes). An easy method to analyze this pattern is to compare the observations with a physical model driven by the context. However, even if such a model exists, it is very slow and does not simulate all possible positions and types of sensors. Instead, we search for local mathematical models. Each model should be local because, while we cannot hope for a general solution, it is easier to build approximate estimations when the context is almost stable or evolves in a stabilized way (constant acceleration for example).
To help the expert define such a pattern, we built an interactive tool able to load a set of engine tests and the corresponding numeric records, and to select sensors with constraints on type, units and corresponding ranges. Once the user has selected a satisfying representation of exogenous and endogenous data, he selects one representative record. The tool shows the temporal curves, letting the user place the curves into different frames and apply a common temporal zoom to select the interval of interest. The application uses a windowing process to search for good local representations as regressions of the endogenous observation on the exogenous context. The user may also choose to smooth and normalize some variables to build a model independent of their ranges. The windowed local regressions test linear and autoregressive models with delays, but also known dynamic features such as linear trends and exponential damping. Positive delays help identify causes; negative ones may be obtained if consequence observations are inadvertently selected as exogenous variables. The user enters the window sizes and delays to check. The application outputs are the piecewise regression estimations and the residuals. Finally, as local models may overlap because of the generic windowing search, the user selects a minimal set of models that represents his wishes.
Figure 8. Frame of our tool that helps selecting
characteristic instants for the pattern description.
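The windowing process described above can be sketched as follows: over each sliding window, the endogenous observation is regressed on the exogenous context (here an ordinary least-squares fit with an optional delay), and windows with small residual variance provide the local models whose parameters will code the label. This is a simplified illustration under those assumptions, not the actual tool; variable names, window sizes and signal shapes are arbitrary.

```python
import numpy as np

def windowed_local_regressions(exog, endog, window, delay=0):
    """Fit y[t] ~ a * x[t - delay] + b on sliding windows and return, for each
    window start, the regression coefficients and the residual variance."""
    n = len(endog)
    results = []
    for start in range(delay, n - window + 1):
        x = exog[start - delay:start - delay + window]
        y = endog[start:start + window]
        X = np.column_stack([x, np.ones(window)])       # design matrix [x, 1]
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)    # least-squares fit
        residuals = y - X @ coef
        results.append((start, coef, residuals.var()))
    return results

# Toy example: a vibration level lagging the shaft speed by 5 samples, plus noise.
rng = np.random.default_rng(0)
speed = np.concatenate([np.full(100, 80.0), np.linspace(80, 95, 50), np.full(150, 95.0)])
vibration = 0.4 * np.roll(speed, 5) + rng.normal(0, 0.2, speed.size)
fits = windowed_local_regressions(speed, vibration, window=40, delay=5)
best_start, best_coef, best_var = min(fits, key=lambda r: r[2])
print(best_start, best_coef, best_var)   # window with the smallest residual variance
```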
To better identify a pattern, the application shows the relations between the model pattern and other variables that the user may eventually add to the model (Figure 9). The goal is to register those other relations in the pattern: the more elements are used in the definition of the label (the description of a pattern), the more specific the matching query will be. This relation measurement is computed as the amount of information added to the model when each new variable is considered. It is currently computed by a forward stepwise regression method with a wrapper approach (the cost is the increase of the R² determination coefficient), but we will also evaluate a filter approach using mRMR (which computes mutual information to select variables [7]). The label parameters are the exogenous variables and the selected models with the regression parameters that produce the smallest residuals. The empirical local distribution of the residuals is used to implement a matching test.
Figure 9. View of the matching frame with selection of
group of correlated variables.
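The relation measurement mentioned above, i.e. the amount of information a candidate variable adds to the model, can be illustrated with a plain forward stepwise selection where the wrapper cost is the increase of the R² determination coefficient. The sketch below, using ordinary least squares from NumPy, is only an illustration of that wrapper approach under simplified assumptions (no cross-validation, hypothetical stopping threshold), not the tool itself.

```python
import numpy as np

def r2_of_fit(X, y):
    """R² of an OLS fit of y on X (with intercept)."""
    Xd = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ coef
    return 1.0 - resid.var() / y.var()

def forward_stepwise(X, y, min_gain=0.01):
    """Greedily add the column that most increases R²; stop when the gain is small."""
    selected, remaining, best_r2 = [], list(range(X.shape[1])), 0.0
    while remaining:
        gains = [(r2_of_fit(X[:, selected + [j]], y) - best_r2, j) for j in remaining]
        gain, j = max(gains)
        if gain < min_gain:
            break
        selected.append(j)
        remaining.remove(j)
        best_r2 += gain
    return selected, best_r2

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(0, 0.5, 500)   # only columns 0 and 3 matter
print(forward_stepwise(X, y))   # expected to select columns 0 and 3
```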
Once a label is defined on the representative record, the matching algorithm scans other records to find corresponding matches. A match may be detected if the application of the set of piecewise local regressions on a signal gives a small residual variance. The intervals whose residuals pass the preceding matching test, evaluated with a reject probability given by the user, are sorted by decreasing likelihood. Finally, the user selects a subset of best-matching intervals, builds a database, learns a more representative statistical distribution of the residual scores and estimates confidence intervals (confidence tubes, see [3], [8], [9]) to establish a reliable maximum score bound for future pattern detections.
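A matching scan can then be sketched as follows: the local model learned on the representative record is applied along a candidate record, the residual variance is computed on each window, and intervals whose residual variance stays under a bound derived from the empirical distribution of residual scores (via the reject probability chosen by the user) are reported as candidate matches, sorted by decreasing likelihood. The code below is a minimal illustration under these assumptions, with a single linear local model instead of the full piecewise set and invented historical scores.

```python
import numpy as np

def matching_scan(exog, endog, coef, window, score_history, reject_prob=0.05):
    """Slide the learned local model (coef = [a, b] for y ~ a*x + b) over a record,
    keep intervals whose residual variance is below the empirical bound derived
    from past matching scores, and return them sorted by increasing residual variance."""
    bound = np.quantile(score_history, 1.0 - reject_prob)   # maximum acceptable score
    matches = []
    for start in range(len(endog) - window + 1):
        x = exog[start:start + window]
        y = endog[start:start + window]
        resid = y - (coef[0] * x + coef[1])
        score = resid.var()
        if score <= bound:
            matches.append((score, start, start + window))
    return sorted(matches)   # best matches (smallest residual variance) first

# Usage sketch with a previously fitted local model and hypothetical historical scores.
rng = np.random.default_rng(2)
x = rng.normal(10.0, 1.0, 2000)
y = 0.4 * x + 1.0 + rng.normal(0, 0.2, 2000)
history = rng.lognormal(-3.0, 0.5, 1000)          # hypothetical past residual scores
print(matching_scan(x, y, coef=(0.4, 1.0), window=50, score_history=history)[:3])
```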
A by-product of this tool is the relation analysis between a first model identified by the user, which describes the interesting pattern, and other observations of the same record (Figure 9). This simple feature by itself is already a great analytic tool for the experts. For example, in the second pattern described above, the dynamics expert observed an increase of the vibration level driven by the low-pressure shaft rotation speed. With the tools existing until now, he was unable to detect the small variation in the high-pressure rotation speed. This impact is probably a consequence of the engine excitation, but it is interesting to observe an effect on the engine core.
Automatic search for temporal changes
We now have a tool that can manually define a pattern on one multidimensional record and test it on a set of other records. The matching test computation is very slow, but this is not really a problem because it can run in the background, and the test part may be coded specifically to be launched directly on the distributed database.
However, one may also want to find other kinds of patterns, such as recurrent ones or new behaviors, and build statistics. This job includes the definition part of the pattern, which should be automated. The knowledge learnt while defining the interactive tool helps us identify the main point to address: the temporal intervals of interest. Some instants are easily identified by an expert (such as a change in the command of a valve opening), but others may be difficult to find, especially considering the high number of dimensions to analyze. For example, when thrust increases, the pattern ends when the shaft speed and temperatures stabilize or when another event appears. On the other hand, some patterns may be easier to detect through their consequences: the engine regulation commands valve openings (Transient Bleed Valves) to avoid compressor stall. The regulation uses denser online data than the stored measurements; hence, it is easier to detect the valve opening than the premises of a blade stall. The real pattern begins before the TBV opens and should be estimated by detecting a change in the compressor behavior, probably caused by another action, for example an increase of the electric power need, which is not a priori linked to this dramatic stall effect.
Our proposition is to apply change-detection algorithms that segment each univariate signal into parts corresponding to simple mathematical models (for example stepwise linear trends, Figure 10). The multivariate signal is coded from the conjunction of all change points (with some freedom to regroup nearby positions). Matching the changes present in the multivariate system with a delay shows the causality between the variables and focuses attention on interesting and recurrent patterns. The temporal signal is compressed into sequences of small intervals (which we call events) where each dimension is modeled by a mathematical function with a small number of parameters.
Figure 10. A model of the pattern built from successive
local approximations.
We choose to use an online version of the PELT (Pruned Exact Linear Time) algorithm [10] because of the large amount of data in each record. This method does not require any assumption on the number of break points and is very fast because of its linear complexity. PELT is also very flexible: the computations are based on linear regressions, but they are easily modified to use auto-regressive models, for example.
PELT is an offline change-point detection strategy that
accelerates the standard optimal partitioning algorithm. It is
based on the minimization of the cost function

$$\min_{m,\;\tau_1<\dots<\tau_m}\;\sum_{i=1}^{m+1}\mathcal{C}\big(y_{(\tau_{i-1}+1):\tau_i}\big)\;+\;\beta m$$

where y is the observed data, the τ_i are the change-points (with τ_0 = 0 and τ_{m+1} = n, the record length) and C is the cost of the model fitted between change-points (a linear model to start with). The minimization is carried out over the change-point positions but also over their number m. Hence, we add a compensation term βm (following the Bayesian Information Criterion) to avoid overfitting, and this choice seems effective on our data. This algorithm gives 83% of good classification in less than 0.4 seconds on 1000-point simulated data (where the change-points are known in advance).
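For reference, a compact, self-contained version of the PELT dynamic program for a piecewise-constant (mean-shift) cost is sketched below; the per-change-point penalty β plays the role of the BIC-like compensation term mentioned above. This is only a didactic illustration (the version described in the text is online and based on regression models), so the segment cost and penalty choices here are assumptions made for the example.

```python
import numpy as np

def pelt_mean_shift(y, beta):
    """Change-point detection with PELT for a piecewise-constant signal.
    Segment cost = sum of squared deviations from the segment mean;
    beta = penalty added for each change-point (BIC-like)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    cs = np.concatenate(([0.0], np.cumsum(y)))          # cumulative sums
    cs2 = np.concatenate(([0.0], np.cumsum(y ** 2)))    # cumulative squared sums

    def cost(s, t):                                      # cost of segment y[s:t]
        return cs2[t] - cs2[s] - (cs[t] - cs[s]) ** 2 / (t - s)

    F = np.full(n + 1, np.inf)                           # F[t] = best cost of y[0:t]
    F[0] = -beta
    prev = np.zeros(n + 1, dtype=int)                    # last change-point before t
    candidates = [0]                                     # pruned admissible points

    for t in range(1, n + 1):
        vals = [F[s] + cost(s, t) + beta for s in candidates]
        best = int(np.argmin(vals))
        F[t], prev[t] = vals[best], candidates[best]
        # PELT pruning: keep only points that can still lead to an optimum.
        candidates = [s for s, v in zip(candidates, vals) if v - beta <= F[t]]
        candidates.append(t)

    cps, t = [], n                                       # backtrack the change-points
    while t > 0:
        cps.append(t)
        t = prev[t]
    return sorted(cps)[:-1]                              # drop the final index n

# Toy check on a signal with two known change-points.
rng = np.random.default_rng(3)
signal = np.concatenate([rng.normal(0, 1, 300), rng.normal(4, 1, 200), rng.normal(1, 1, 300)])
print(pelt_mean_shift(signal, beta=3 * np.log(len(signal))))   # expected near [300, 500]
```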
The first patterns we identify from those sequences of events are those linked to a change of the thrust lever. We concatenate all the following steps until stabilization is reached on the temperature or the shaft speed, or until another lever change is detected (see Figure 11). Our next steps will be to base other patterns on binary commands such as valve openings, to regroup nearby events (possibly previous ones) and to match these patterns in other samples of data. We also look at the automatic identification of recurrent sequences, for example using the minimum description length principle (MDL) to compress our database [11], like the KRIMP algorithm [12] used to detect interesting sequences in a base of recorded events.
Figure 11. A model of the pattern built from a detection in the lever and the nearby detection in the shaft speed. A causality delay is easy to observe between the change of lever position and the decrease of the shaft speed.
4. CONCLUSION
We have just begun a wide indexation program of the bench test records for development engines. The first step was the definition of patterns, then the storage of labels in a distributed database. We began with expert-made patterns using a graphic tool and matching filters. Then we wrote an algorithm to automatically detect changes in sets of parameters, opening a way to mechanically encode new patterns discovered in the signals.
The next step, once we have an interesting number of labels and have indexed a representative set of records, will be to implement the search algorithm. We will then produce label statistics and execute ranking computations by similarity on new records.
In the meantime, we defined a PhD program to formalize the match/search algorithm methodology with a map/reduce coding scheme for our two main algorithms and their implementation on distributed environments (Hadoop, Spark, or any other platform to be tested).
A fablab in Snecma (actually belonging to Safran Analytics) now implements a Hadoop cluster to test this technology. The Dassault Systèmes Exalead™ application is currently running on the cluster and may be an effective solution for an implementation of our search algorithms as well as an adequate platform for visualization and restitution.
REFERENCES
[1] Safran, "The New Safran R&T Center," www.safran-group.com, 2014. [Online]. Available: http://www.safran-group.com/site-safran-en/press-media/media-section/article/the-new-safran-r-t-center.
[2] J. Lacaille and V. Gerez, "A Batch Detection Algorithm Installed on a Test Bench," in PHM, 2012, pp. 1–7.
[3] J. Lacaille, "Robust Monitoring of Turbofan Sensors," in IEEE Aerospace Conference, 2014.
[4] A. Bellas, C. Bouveyron, M. Cottrell, and J. Lacaille, "Anomaly detection based on confidence intervals using SOM with an application to Health Monitoring," in WSOM, 2014.
[5] J. Lacaille and V. Gerez, "System for Monitoring an Engine Test Bed," WO2012049396A1, 2012.
[6] J. Lacaille and V. Gerez, "Online Abnormality Diagnosis for real-time Implementation on Turbofan Engines and Test Cells," in PHM, 2011.
[7] H. Peng, F. Long, and C. Ding, "Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 8, pp. 1226–1238, 2005.
[8] R. Azencott, "Procédé de surveillance d'un système," EP1170650B1, 2008.
[9] R. Azencott, "Procédé de détection d'anomalies dans un signal," EP1122646B1, 2003.
[10] R. Killick, P. Fearnhead, and I. A. Eckley, "Optimal detection of changepoints with a linear computational cost," p. 37, 2011.
[11] A. Barron, J. Rissanen, and B. Yu, "The Minimum Description Length Principle in Coding and Modeling," IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2743–2760, 1998.
[12] J. Vreeken, M. van Leeuwen, and A. Siebes, "Krimp: mining itemsets that compress," Data Min. Knowl. Discov., vol. 23, no. 1, pp. 169–214, 2010.
BIOGRAPHY
Jérôme Lacaille is an emeritus expert in algorithms for the Safran international aeronautic group. He joined the Snecma company in 2007 with responsibility for developing a health monitoring solution for jet engines. Jérôme has a PhD in Mathematics from the Ecole Normale Supérieure (France). He has held several positions including scientific consultant and professor. He also co-founded the Miriad Technologies company, then entered the semiconductor business, taking charge of the Innovation Department for Si Automation (Montpellier, France) and PDF Solutions (San Jose, CA). He developed specific mathematical algorithms that were integrated in industrial processes. Over the course of his work, Jérôme has published several papers on integrating data analysis into industry infrastructure, including neural methodologies and stochastic modeling, as well as some industrial patented applications.
William Bense is the instrumentation & measurement R&T leader at the Snecma test division. He joined the company in 2010 as a system engineer in health monitoring and worked on a new business jet engine. In 2012, he became health monitoring R&T project leader and worked on innovative health monitoring systems and new embedded sensor developments. He now manages R&T for bench test instrumentation & measurement. He is the inventor of several patents.
Ion Berechet is an engineer and physicist, developer of several innovative methods and advanced software used to improve performance and process risk management. His creativity and capacity for multidisciplinary team management have led him to be in charge of several R&D units. In 2003, he founded the SISPIA company, oriented toward algorithmic development, data mining and knowledge treatment.
Stefan Berechet works on algorithms and data mining solutions for risk management and performance of complex processes and systems. Since 2004, he has been responsible at SISPIA for research and development in the field of cause-effect relationship extraction from databases.
Cynthia Faure is a PhD student in statistics at the Statistique, Analyse, Modélisation Multidisciplinaire laboratory (SAMM) at Université Paris 1 Panthéon-Sorbonne.