International Journal of Computer Science and Business Informatics
IJCSBI.ORG
ISSN: 1694-2108 | Vol. 17, No. 1. JANUARY-JUNE 2017
28
Designing Condition-based Maintenance Management Systems for High-Speed Fleet
Gopalakrishna Palem
Cenacle Research, 520012 IN
ABSTRACT
Advances in big-data technologies, combined with machine-to-machine
(M2M) interconnectivity and predictive analytics, are creating new possibilities for real-time
analysis of machine components, so that impending breakdowns can be identified and
avoided in their early stages. Designing such a condition-based maintenance system for a
high-speed fleet requires special attention to the design methodologies used in collecting the
operating requirements from the users and translating them into big-data parallel
architectures that are capable of fault-tolerant behavior and load balancing to sustain the
real-time data processing demands. This paper discusses the M2M approach for a big-data
condition-based maintenance system and the requirement specification steps involved
in building such a system, along with the cost savings realized from the system.
Keywords
Condition-based maintenance, Fleet-management, M2M Telematics, Predictive Analytics
1. INTRODUCTION
Approximately 30% of the life-cycle costs of a high-speed vehicle are spent
on the maintenance of the vehicle, the largest spend besides energy [1]. The
overall life-cycle cost distribution for a high-speed fleet is shown in Figure 1.
Figure 1. Life-cycle costs of high-speed fleet
The pain points that customers usually raise about these life-cycle costs are:
– Maintenance is the highest cost factor in the operation of high-speed
  vehicles, besides energy and depreciation.
– Over a period of time, maintenance costs exceed the depreciation.
– Approximately 40% of the maintenance cost goes to material / spare
  parts, while the remaining 60% goes to personnel.
– For an operational fleet, the depreciation and energy costs stay
  constant during the fleet’s life-cycle, leaving the maintenance cost as
  the only major cost position available for optimization [1][2].
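As a quick check of what these shares imply for the overall budget, the arithmetic can be sketched as below. The 30%, 40% and 60% figures are the approximate values quoted above; the budget of 100 units is an arbitrary normalization assumed here so that the results read as percentages of the life-cycle cost.

```python
# Approximate shares quoted in the text. The budget of 100 units is an
# arbitrary normalization (assumption), so results read as percentages.
LIFE_CYCLE_BUDGET = 100.0
MAINTENANCE_SHARE = 0.30   # ~30% of life-cycle cost goes to maintenance
MATERIAL_SHARE = 0.40      # ~40% of maintenance: material / spare parts
PERSONNEL_SHARE = 0.60     # ~60% of maintenance: personnel

maintenance = LIFE_CYCLE_BUDGET * MAINTENANCE_SHARE
material = maintenance * MATERIAL_SHARE
personnel = maintenance * PERSONNEL_SHARE

print(f"maintenance: {maintenance:.0f}, "
      f"material: {material:.0f}, personnel: {personnel:.0f}")
```

Under these figures, roughly 12% of the total life-cycle cost goes to spare parts and roughly 18% to maintenance personnel.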
Reducing the maintenance costs therefore significantly improves the profit
margins for operators. The different maintenance strategies followed by
manufacturers and operators in this regard are as follows:
Corrective Maintenance: This is a run-till-failure methodology without
any specific maintenance plan in place. The vehicle is considered to be
functional and fit until it breaks down.
o Cons:
  – Unexpected and uncontrolled production downtimes.
  – Risk of secondary failures and collateral damage.
  – Uncontrolled costs of spare parts and overtime labor.
o Pros:
  – No overhead from planning or condition monitoring costs.
  – Machines are not over-maintained.
Preventive Maintenance: A periodic maintenance strategy popular with
current manufacturers and vehicle service operators. Based on the
asset design parameters, a potential breakdown period is pre-calculated
and a preventive maintenance schedule is determined in advance. The
vehicle is subjected to maintenance at those intervals,
irrespective of the usage pattern or the condition of the asset, assuming
that the vehicle is going to break down otherwise.
o Cons:
  – A time-driven procedure. Assets are subjected to repair
    even in the absence of any faults.
  – Unscheduled breakdowns can still happen.
o Pros:
  – Maintenance cost estimates are known beforehand.
  – Inventory control and spare-parts planning are possible.
  – Fewer catastrophic failures and less collateral damage.
Predictive Maintenance (PdM): This is an emerging strategy that applies
predictive analytics to the real-time data gathered from the vehicles with
the aim of detecting any deviations in the functional and behavioral
parameters that can lead to vehicle breakdowns. Such anomaly detection
procedures help identify impending breakdowns in real-time, as soon as
their potential cause arises and long before the breakdown happens.
o Cons:
  – Additional investment is needed for the monitoring system.
  – Skilled labor specially trained to use the system
    effectively may be required.
o Pros:
  – Parts are ordered on an as-needed basis and maintenance is
    performed during convenient schedules.
  – Unexpected breakdowns are eliminated.
  – Reduced breakdowns result in maximum asset utilization.
Predictive maintenance is also commonly referred to as
Condition-based Maintenance (CBM), as it avoids unnecessary
inspection and repair costs by recommending a maintenance schedule that is
based on the prevailing condition of the machine under real-world
operating conditions [3].
Figure 2. Predictive Maintenance reduces costs by detecting failures in early stages
To understand this, consider a typical periodic maintenance scenario
for a vehicle. In a normal periodic maintenance mode, vehicle owners
are expected to change the engine oil at regular intervals, such as
after every 4,000 or 5,000 kilometers traveled. In such cases, neither the
real condition of the vehicle nor the remaining performance capability of
the engine oil is taken into consideration. Maintenance is carried out
purely because the schedule calls for it. If the owner had a way to know
the underlying vehicle condition (the remaining useful life, RUL), or the
engine-oil contamination level at that instant, he or she could
either postpone the oil change to a later point where the change is really
needed, or bring it forward, as the prevailing conditions dictate. CBM
provides this capability to gain insight into the actual operating
conditions of the vehicle and use them to accurately predict the
maintenance requirements.
Our earlier paper [3] presented an in-depth review on the inner workings of
CBM systems and how in conjunction with sensor arrays and telematics
they facilitate predictive maintenance.
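The oil-change example can be sketched as a comparison of a time-driven rule with a condition-driven rule. This is a minimal illustration only: the contamination index, both thresholds and the sample readings are hypothetical values, not measurements or limits taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class OilReading:
    km_since_change: float     # distance traveled since the last oil change
    contamination_pct: float   # hypothetical contamination index, 0-100


# Hypothetical thresholds, for illustration only.
PERIODIC_INTERVAL_KM = 5000.0
CONTAMINATION_LIMIT_PCT = 70.0


def periodic_due(reading: OilReading) -> bool:
    """Time-driven rule: change the oil purely on distance traveled."""
    return reading.km_since_change >= PERIODIC_INTERVAL_KM


def condition_due(reading: OilReading) -> bool:
    """Condition-based rule: change the oil when its measured state calls for it."""
    return reading.contamination_pct >= CONTAMINATION_LIMIT_PCT


light_use = OilReading(km_since_change=5200, contamination_pct=35)
harsh_use = OilReading(km_since_change=3100, contamination_pct=82)
for reading in (light_use, harsh_use):
    print(reading, "periodic:", periodic_due(reading),
          "condition-based:", condition_due(reading))
```

In the first reading the schedule would force a change that the oil condition does not yet require (a postponement opportunity); in the second, the condition-based rule brings the change forward although the scheduled distance has not been reached.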
Increased component availability, better worker safety and improved asset
usage are some of the compelling reasons why more and more operators
and manufacturers are actively embracing CBM-based fleet management
solutions.
Benefits for workers:
– Work-life balance with predictable schedules
– Turn-key solutions with zero paperwork
– Increased on-road safety
– Navigation helpers and landmark guides
Benefits for Management:
– Reduced maintenance costs with Predictive Maintenance
– Increased asset usage with zero unplanned downtime
– Operational costs are reduced and idle times are eliminated
with smart scheduling
– Improved customer loyalty with always on-time deliveries
– Theft and misuse prevention with real-time asset tracking
In the following sections, we present the methodology involved in designing
such a condition-based maintenance management system using the
machine-to-machine (M2M) approach, and showcase the architectural
outline of one of our recently built systems, along with the open-source
tools and frameworks used in building the system and the cost savings
reported by the customers using it.
2. M2M APPROACH TO THE CBM
A Condition-based Maintenance Management (CBMM) solution designed
around M2M operates on three major technology directives:
1. Remote Sensor Monitoring & Data Capturing.
2. Real-time Stream Processing of Sensor Data.
3. Predictive Analytics.
Sensors are attached to the remote assets to collect various data about the
assets’ operating behavior and send it in real-time to a centralized
monitoring station. The data arrives as continuous streams at the monitoring
station, and is analyzed using mathematical anomaly detection
models to identify patterns of deviation from the expected functionality. Once
any such anomaly is identified by the algorithms, owners are immediately
notified of the potential failure, along with a suggestion for the appropriate
corrective action. Handling such anomalies in a timely manner prevents
further functional degradation of the vehicle, thus avoiding potentially costly
breakdowns down the line. Often the centralized monitoring station
resides on the same network as the sensors (such as a controller area
network), or it may be in a distant remote location connected through
satellite networks or a WAN.
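The monitor-detect-notify loop described above can be sketched with a simple rolling z-score detector. This is only one possible anomaly model, swapped in here for illustration; the window size, the z-limit, the asset name and the sample readings are all assumptions, and `notify_owner` is a hypothetical placeholder for whatever alerting channel a deployment uses.

```python
from collections import deque
from statistics import mean, stdev


class RollingAnomalyDetector:
    """Flags readings that deviate strongly from a window of recent readings."""

    def __init__(self, window=20, z_limit=3.0, min_samples=5):
        self.history = deque(maxlen=window)
        self.z_limit = z_limit
        self.min_samples = min_samples

    def observe(self, value):
        """Return True if value is anomalous relative to the recent window."""
        is_anomaly = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = stdev(self.history)
            if sigma > 0 and abs(value - mu) / sigma > self.z_limit:
                is_anomaly = True
        if not is_anomaly:
            # Anomalous readings are kept out of the baseline window.
            self.history.append(value)
        return is_anomaly


def notify_owner(asset_id, value):
    # Hypothetical placeholder: a real system would send SMS/e-mail here.
    return f"ALERT {asset_id}: reading {value} deviates from expected behavior"


detector = RollingAnomalyDetector()
stream = [70.1, 70.4, 69.8, 70.0, 70.3, 70.2, 69.9, 95.0, 70.1]
alerts = [notify_owner("coach-7", v) for v in stream if detector.observe(v)]
print(alerts)
```

With this sample stream only the 95.0 reading is reported, since the remaining readings stay close to the rolling baseline.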
Figure 3. M2M facilitates real-time failure detection and prediction
During operation, devices such as the On-Train Monitoring Recorder
(OTMR) for trains and the Flight Data Recorder for aircraft record events in
real-time from their connected vehicles. They either store the events on-board
for later processing when the vehicle reaches its destination, or relay them
en route to the centralized processing system using machine-to-machine (M2M)
telematics procedures, where they are processed on the fly to detect
any current anomalies and predict future failures [4]. Some of the
data collected and analyzed for this purpose is as follows:
– On-board Diagnostics (OBD) data: Vehicle speed, RPM, fuel etc.
– Driving Patterns: Acceleration patterns, braking patterns etc.
– GPS data: Locations, routing, length of stay of vehicle etc.
– OTMR data: Door close status, Air suspension pressure, Brake
  dragging, HVAC failure etc.
In a nutshell, the concept of CBM is centered on detecting failures in their
early stages so that they can be prevented from developing into breakdowns
in the later stages. At a minimum, one can expect the functionality listed
below from a well-designed CBMM system [7][8][9]:
Find the Remaining Useful Life of assets
Estimate the Failure Rate for assets
Design a Predictive Maintenance Schedule
Maintain right levels of Inventory for spare parts
Schedule right skilled and sized workforce
Optimize Inspection routines
Decide right Warranty period at design time
Evaluate What If alternate scenarios
Compare different designs for reliability evaluation
A major challenge in implementing a CBMM system for a high-speed fleet,
however, is processing the enormous volumes of data streamed in real-time
from the sensors attached to the high-speed vehicles. This requires:
– Parallel architectures capable of handling large volumes of data,
– Low-payload data structures that optimize sensor data bandwidth,
– Fault-tolerance capabilities that can deal with packet drops and
  fragile networks for real-time data streaming,
– Adaptable ontologies capable of supporting varied data types and
  protocols in parallel,
– Proof-based security to ensure data privacy and anonymity.
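To make the low-payload requirement concrete, the sketch below packs one telemetry sample into a fixed-width binary record and compares its size with a JSON rendering of the same values. The field layout, scaling and sample values are hypothetical and chosen only for illustration; they are not the wire format of the system described in this paper.

```python
import json
import struct

# Hypothetical wire format (assumption): little-endian, no padding.
#   I = asset id (uint32)        I = unix timestamp (uint32)
#   H = speed in 0.1 km/h units  H = engine RPM (uint16)
#   B = fuel level in percent (uint8)
RECORD = struct.Struct("<IIHHB")


def encode(asset_id, timestamp, speed_kmh, rpm, fuel_pct):
    """Pack one sample into 13 bytes; speed is stored as a scaled integer."""
    return RECORD.pack(asset_id, timestamp, round(speed_kmh * 10), rpm, fuel_pct)


def decode(payload):
    """Inverse of encode(); restores speed to km/h."""
    asset_id, timestamp, speed_x10, rpm, fuel_pct = RECORD.unpack(payload)
    return asset_id, timestamp, speed_x10 / 10, rpm, fuel_pct


sample = (4711, 1500000000, 287.5, 1850, 64)
binary = encode(*sample)
fields = ("asset_id", "ts", "speed_kmh", "rpm", "fuel_pct")
as_json = json.dumps(dict(zip(fields, sample))).encode()
print(len(binary), "bytes packed vs", len(as_json), "bytes as JSON")
```

The trade-off is that a fixed binary layout is compact but rigid, so a versioning or schema field would normally be needed once varied data types and protocols have to be supported side by side.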
Recent advancements in the big-data open-source family of technologies
offer viable solutions for the above requirements [5][6]. However, before
one can design such a big-data solution for the CBMM, the design process
has to go through the requirement gathering and specification mapping
stages in order to accurately capture the customer requirements and realize
them in software. The following section elaborates on this.
3. THE CBMM SYSTEM DESIGN PROCESS
The design process starts with requirement gathering, which can be
organized around the three solution-enabler stages indicated below:
Stage 1: Sensor data capturing stage
Stage 2: Real-time stream processing stage
Stage 3: Predictive failure-detection stage
The requirement gathering for stage 1 encompasses collecting information
from the customer on the requirements of data capturing and real-time
monitoring. Some of the questions that help gather information from the
customers at this stage are:
– What data should be collected and which sensors should be used?
  E.g. thermal imagery, audio signals, etc.
– What are the components and parts that need monitoring? E.g.
  Engine Oil, Train brakes, Engine Crank Time, etc.
– How frequently should the data be collected? Hourly, daily etc.
– How should faulty sensors be identified and handled?
In the requirement gathering for stage 2, the focus is on real-time processing
of the collected data and some of the questions that customers need to
answer in this stage are:
– What is the expected data processing latency?
– What should happen to the collected data post processing?
– How to address missing data points and inaccuracies? For example,
  a faulty sensor sending incorrect data.
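One possible answer to the last question can be sketched as follows: missing or implausible readings are replaced by the last good value, and the sensor is flagged once too many consecutive readings have had to be patched. The plausibility range, the gap limit and the sample pressure values are assumptions for illustration; the actual policy would come from the customer requirements.

```python
def clean_series(readings, low, high, max_gap=2):
    """Patch missing (None) or out-of-range readings with the last good value.

    If more than max_gap consecutive readings need patching, the sensor is
    flagged as suspect instead of being silently corrected.
    Returns (cleaned_values, sensor_suspect).
    """
    cleaned, last_good, gap, suspect = [], None, 0, False
    for value in readings:
        valid = value is not None and low <= value <= high
        if valid:
            last_good, gap = value, 0
            cleaned.append(value)
        else:
            gap += 1
            if gap > max_gap:
                suspect = True
            # Stays None until a first good value has been seen.
            cleaned.append(last_good)
    return cleaned, suspect


# Hypothetical air-suspension pressure readings in bar; 999.0 is a glitch.
values, suspect = clean_series([5.1, None, 5.2, 999.0, 5.0], low=0.0, high=10.0)
print(values, suspect)
```

Holding the last good value is the simplest choice; interpolation or model-based estimates are alternatives when the processing latency budget allows them.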
For the final stage, the emphasis is on the analytical-subsystem. Customer
requirements for this stage are collected through questions such as:
– What is the acceptable behavior, and what constitutes an anomaly?
– What are the response actions for each anomaly class?
– What is the maximum acceptable time lag between the detection of an
  anomaly and the corresponding corrective action?
– How should multiple anomalies detected at the same time be handled?
Once complete, the gathered requirements are formulated into a system
specification that gives a formal outline of what is expected from the
CBMM. For example, for the stage 1 requirements, the specifications outline
which operational-level notifications should be raised when sensors become
unreachable over the network during the data capturing stage.
Similarly, stage 2 requirement specifications formalize the data-processing
functionality. The specifications for this stage result in a matrix-like
structure, as shown in the table below, where each monitored component
is listed alongside the events it can generate and the
criticality of each event, along with the action, if any, that should be
carried out by the ground/operating crew monitoring that event.
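Component | Event                                 | Source           | Event Criticality | Control Center Alert        | Event reaction
----------|---------------------------------------|------------------|-------------------|-----------------------------|---------------------------------
Door      | Closed after the train started moving | Door side camera | Low               | -                           | -
Brake     | Emergency brake tripped               | OTMR             | Critical          | SMS/Email/Escalation Matrix | Check power supply, air pressure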
For example, in the table above, the door component is monitored
for the close event, with a low criticality attributed to it, while the
emergency brake event is monitored with a critical attribution.
Also, for the emergency brake event, the event reactions list possible
courses of action, such as checking the power supply and air-brake pressure,
which act as resolution guidelines for the crew and/or an automated
resolution solver system.
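A specification matrix of this kind maps naturally onto a lookup table in software. The sketch below transcribes the two rows of the table above (using the "brake" spelling of the surrounding prose) and orders simultaneous events by criticality. The `EventSpec` type, the `PRIORITY` ordering and the `dispatch` function are hypothetical names introduced for illustration, not part of the system described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventSpec:
    source: str        # where the event is observed
    criticality: str   # "Low" or "Critical", as in the table
    alerts: tuple      # control-center alert channels
    reaction: str      # resolution guideline for the crew


# Rows transcribed from the table above; keys are (component, event).
SPEC = {
    ("Door", "Closed after the train started moving"):
        EventSpec("Door side camera", "Low", (), ""),
    ("Brake", "Emergency brake tripped"):
        EventSpec("OTMR", "Critical", ("SMS", "Email", "Escalation Matrix"),
                  "Check power supply, air pressure"),
}

# Assumed ordering: simultaneous events are handled most critical first.
PRIORITY = {"Critical": 0, "Low": 1}


def dispatch(events):
    """Return (component, event, alerts, reaction) tuples, most critical first."""
    known = [(c, e, SPEC[(c, e)]) for c, e in events if (c, e) in SPEC]
    known.sort(key=lambda item: PRIORITY[item[2].criticality])
    return [(c, e, spec.alerts, spec.reaction) for c, e, spec in known]


incoming = [("Door", "Closed after the train started moving"),
            ("Brake", "Emergency brake tripped")]
for row in dispatch(incoming):
    print(row)
```

Keeping the matrix as data rather than code means that a changed criticality or alert channel is a specification update, not a software change.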
The specifications for the final stage revolve around failure prediction.
Formal guidelines are established as to how a failure should be predicted
and which data source and event should be used in the process. For
example, the table below lists trend analysis criteria and pattern matching
criteria as the stipulated methods for the door and brake failures respectively.
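Component | Event                                 | Failure Indication
----------|---------------------------------------|------------------------------------------------------------------------------
Door      | Closed after the train started moving | 1. Delay increasing, or 2. Happening for the last n observations (n > threshold)
Brake     | Abnormal brake pressure patterns      | Pattern matches with historical failure data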
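The two door criteria in the table can be sketched as simple checks over recent observations. The least-squares slope used for "delay increasing", the minimum slope, the threshold and the sample values are assumptions chosen for illustration; the specification itself only names the criteria.

```python
def delay_increasing(delays, min_slope=0.05):
    """Trend rule: least-squares slope of delay versus observation index.

    Returns True when the slope exceeds min_slope (a hypothetical tolerance,
    in seconds per observation).
    """
    n = len(delays)
    if n < 2:
        return False
    x_mean = (n - 1) / 2
    y_mean = sum(delays) / n
    num = sum((i - x_mean) * (d - y_mean) for i, d in enumerate(delays))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den > min_slope


def persistent(event_flags, threshold):
    """Persistence rule: the event occurred in each of the last n
    observations, with n > threshold."""
    run = 0
    for flag in reversed(event_flags):
        if not flag:
            break
        run += 1
    return run > threshold


def door_failure_indicated(delays, event_flags, threshold=3):
    """Either criterion from the table is sufficient to indicate a failure."""
    return delay_increasing(delays) or persistent(event_flags, threshold)


# Hypothetical door-close delays in seconds after the train started moving.
print(door_failure_indicated([0.8, 0.9, 1.1, 1.4, 1.8], [True] * 5))
print(door_failure_indicated([1.0, 1.0, 1.0, 1.0], [True, False, True]))
```

The brake criterion (matching against historical failure patterns) needs archived failure data and is therefore not covered by this sketch.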
Based on these specifications, the CBMM system collects the data at the
specified intervals from the sensors and utilizes the methodologies below to
assess the asset’s condition:
– Critical range and limits: Various statistical tests are performed to
  determine whether the captured sensor data falls inside a critical failure
  range set by the expert and the requirement specifications [10].
– Trend Analysis: Verifies whether the vehicle condition is deteriorating,
  with an immediate downward trend towards breakdown [11].
– Pattern recognition: Establishes the causal relations between the
  events and the vehicle breakdowns [12].
– Statistical process analysis: Historical failure record data, collected
  through case-study histories, warranty claims and data archives, is
  processed with statistical procedures to find a suitable analytical
  model for the failure curves. As new data is gathered from the
  sensors, it is compared against those statistical models to predict
  future breakdowns [13].
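As an illustration of the statistical process analysis step, the sketch below evaluates a two-parameter Weibull model, the distribution family of reference [13], once its parameters are known. The shape and scale values are hypothetical; in practice they would be fitted to the historical failure records mentioned above.

```python
import math


def reliability(t, shape, scale):
    """Weibull survival probability R(t) = exp(-(t/scale)^shape)."""
    return math.exp(-((t / scale) ** shape))


def hazard(t, shape, scale):
    """Weibull failure rate h(t) = (shape/scale) * (t/scale)^(shape-1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def conditional_reliability(t, extra, shape, scale):
    """Probability of surviving `extra` more hours, given survival up to t."""
    return reliability(t + extra, shape, scale) / reliability(t, shape, scale)


# Hypothetical parameters: shape > 1 models wear-out failures;
# the scale is expressed in operating hours.
SHAPE, SCALE = 2.0, 10_000.0
for hours in (2_000, 6_000, 10_000):
    print(hours,
          round(reliability(hours, SHAPE, SCALE), 3),
          hazard(hours, SHAPE, SCALE))
```

With a shape parameter above 1 the failure rate grows with operating hours, which is the behavior a wear-out component is expected to show; the conditional reliability then indicates how likely an asset that has survived so far is to last through the next maintenance interval.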
Trend analysis and critical range limit violations can be detected with
real-time monitoring and stream processing of data. However, pattern
recognition and statistical process analysis require historical data to be
analyzed and compared against the real-time live data for insights. Usually
such historical data is gathered through warranty-claims and maintenance
records.
Advancements in big-data technologies and predictive analytics are
enabling the stream processing of high-volume live data in real-time and
matching it against the voluminous historical data held offline. A reference
architecture that was created for one of our large high-speed fleet
management clients, using the aforementioned design methodology on
big data with M2M, is shown below:
Figure 4. Reference architecture for condition-based maintenance mgmt. system
The layered architecture enables one to easily customize or upgrade only
a particular part of the system without replacing the whole system.
The XML schemas used as the base to store and operate on the operating
design specifications allow cross-platform compatibility and open-systems
interoperability. Sensors communicate with the data acquisition and
manipulation layers using the M2M framework, while the
condition-detection, prognosis and health-assessment layers are implemented
using big-data parallelism. The maintenance support layers take care of the
required notifications for the administrators and operating crew using the
report dashboards and HMI visualizations, along with security restrictions.
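The benefit of the layering, replacing one layer without touching the others, can be sketched as a chain of independent functions. The layer names follow the text above (the prognosis layer is omitted for brevity); the message fields, the limits and the sample values are hypothetical.

```python
from typing import Callable, Dict, List

Layer = Callable[[Dict], Dict]


def acquire(msg: Dict) -> Dict:
    # Data acquisition: accept a raw sensor message (hypothetical fields).
    return {"asset": msg["asset"], "raw": msg["values"]}


def manipulate(msg: Dict) -> Dict:
    # Data manipulation: drop missing samples.
    return {**msg, "clean": [v for v in msg["raw"] if v is not None]}


def detect_condition(msg: Dict, limit: float = 8.0) -> Dict:
    # Condition detection: flag samples above a hypothetical limit.
    return {**msg, "violations": [v for v in msg["clean"] if v > limit]}


def assess_health(msg: Dict) -> Dict:
    # Health assessment: a deliberately crude two-state grading.
    return {**msg, "health": "degraded" if msg["violations"] else "ok"}


def support_maintenance(msg: Dict) -> Dict:
    # Maintenance support: text that a dashboard or alert could display.
    return {**msg, "notice": f"{msg['asset']}: {msg['health']}"}


def run(layers: List[Layer], msg: Dict) -> Dict:
    """Pass the message through each layer in order."""
    for layer in layers:
        msg = layer(msg)
    return msg


def stricter_detection(msg: Dict) -> Dict:
    # A replacement condition-detection layer with a different limit.
    return detect_condition(msg, limit=9.0)


PIPELINE = [acquire, manipulate, detect_condition, assess_health, support_maintenance]
CUSTOM = [stricter_detection if layer is detect_condition else layer
          for layer in PIPELINE]

reading = {"asset": "bogie-3", "values": [7.1, None, 8.4]}
print(run(PIPELINE, reading)["notice"])
print(run(CUSTOM, reading)["notice"])
```

Only the condition-detection entry differs between the two pipelines; every other layer is reused unchanged, which is the property the layered design is meant to provide.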
To achieve this level of sophistication, we integrated and customized
multiple open-source frameworks to our requirements, some of which are
listed below.
Remote sensor monitoring & data capturing: OpenXC
Real-time stream processing: Storm, Kestrel, ZMQ, MQTT
Predictive analytics: R
Real-time anomaly detection: Esper, CEP
Distributed fault-tolerant storage: Hadoop, HBase
Failure report dashboards: HTML 5
Control center visualization: OpenGL, VTK, Qt, HMI
The value-add in integrating and customizing these frameworks lies in
achieving the required level of functionality with commodity hardware,
enabling the system to handle large volumes of data with adaptable
ontologies, all the while reducing the sensor data bandwidth. In their
native form, individually, these open-source frameworks cannot achieve the
aforementioned objectives in a manner suitable for enterprise customers [14].
The integration and interconnection of different technologies used for
implementing this solution is as shown below:
Figure 5. Technology stack integration for our condition-based maintenance
management solution
After the rollout of a fully functional CBMM system, our customer
reports have indicated the following average yearly savings
across their business units:
Reduction in maintenance costs: 25% to 30%
Spare parts inventories reduced: 20% to 30%
Reduction in equipment downtime: 35% to 45%
Elimination of breakdowns: 70% to 75%
Overtime expenses reduced: 20% to 50%
Asset life increased: 20% to 40%
Increase in production: 20% to 25%
While the predictive technology reduced the unexpected breakdowns, the
collateral benefits, such as work-life balance (with no unexpected
breakdown calls), reduction of overtime expenses and improved asset
availability, contributed to the increase in production.
4. CONCLUSIONS
Advancement in the big-data technologies in combination with M2M and
predictive analytics is creating new possibilities for real-time analysis of
machine components for detecting failures in the early stages and avoiding
them ahead of time. Increased component availability, improved worker
and environment safety, better asset usage etc. are some of the reasons that
are attracting more operators and manufacturers to embrace condition-based
maintenance strategy in their operations. Designing such a system for high-
speed fleet, however, requires special attention to the design methodologies
used for collecting the operating requirements from the users and translating
them into big-data parallel architectures that are capable of exhibiting fault-
tolerant behavior and load-balancing possibilities to sustain the real-time
data processing demands. This paper presented a reference architecture for
one of the big-data M2M systems we designed as a large fleet-management
solution for a customer, and showcased the technology framework
interconnects used in that system. With more and more customers
becoming interested in these solutions, one can expect more solutions to be
built on these architectures using the listed frameworks and suggested
design methodologies in the future.
REFERENCES
[1] Romain Bosquet, Pierre-Olivier Vandanjon, Alex Coiret, and Tristan Lorino,
Model of High-Speed Train Energy Consumption, World Academy of Science,
Engineering and Technology, 2013.
[2] Jui-Sheng Chou, Changwan Kim, Yao-Chen Kuo, Nai-Chi Ou, Deploying
effective service strategy in the operations stage of high-speed rail, Transportation
Research Part E: Logistics and Transportation Review, 47(4):507-519, July 2011.
[3] Gopalakrishna Palem, Condition-Based Maintenance using Sensor Arrays and
Telematics, International Journal of Mobile Network Communications &
Telematics, 3(3):19-28, 2013.
[4] Gopalakrishna Palem, M2M Telematics & Predictive Analytics, Technical Report,
Symphony Teleca Corp., 2013.
[5] Dino Citraro, Expanding Real-Time Data Insight at PARC, Big Data, 1(2):78-81,
2013.
[6] Okuya Shigeru, M2M and Big Data to Realize the Smart City, NEC Technical
Journal, 7(2), 2012.
[7] John Moubray, Reliability-Centered Maintenance, Industrial Press, New York,
NY, 1997.
[8] F. Stanley Nowlan and Howard F. Heap, Reliability-Centered Maintenance,
Department of Defense, Washington, D.C., Report Number AD-A066579, 1978.
[9] NFPA 1911, Standard for the Inspection, Maintenance, Testing, and Retirement of
In-Service Automotive Fire Apparatus, 2007 Edition, 6.1.5.1, p1911-14.
[10] V.J. Hodge and J. Austin, A survey of outlier detection methodologies, Artificial
Intelligence Review, 22(2):85-126, 2004.
[11] Sematech, Failure Reporting, Analysis and Corrective Action System, 1993.
[12] Felix Salfner, Predicting Failures with Hidden Markov Models, In Proceedings of
the 5th European Dependable Computing Conference (EDCC-5), 2005.
[13] W. Weibull, A statistical distribution function of wide applicability, Journal of
Applied Mechanics-Trans. ASME, 18(3):293-297, 1951.
[14] Tristan Müller, How to choose a free and open source integrated library system,
OCLC Systems & Services, 27(1):57-78, 2011.
This paper may be cited as:
Gopalakrishna, P. 2017. Designing Condition-based Maintenance
Management Systems for High-Speed Fleet. International Journal of
Computer Science and Business Informatics, Vol. 17, No. 1, pp. 28-40.