European Journal of Scientific Research
ISSN 1450-216X Vol.33 No.4 (2009), pp.692-702
© EuroJournals Publishing, Inc. 2009
http://www.eurojournals.com/ejsr.htm
Reliability Analysis of Repair Time Data Using
Semi-Parametric Measures
Burhanuddin Mohd. Aboobaider
Faculty of Information and Communication Technology
Universiti Teknikal Malaysia Melaka, Locked Bag 1200
Ayer Keroh, 75450 Melaka, Malaysia
E-mail: burhanuddin@utem.edu.my
A. Rahman Ahmad
Faculty of Computer Science and Information System
Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
E-mail: ahmadar@utm.my
M. Ataharul Islam
School of Mathematical Sciences, Universiti Sains Malaysia
11800 USM Penang, Malaysia
E-mail: mataharul@yahoo.com
Anton Satria Prabuwono
Faculty of Information Science and Technology
Universiti Kebangsaan Malaysia, 43600 UKM Bangi
Selangor D.E., Malaysia
E-mail: antonsatria@ftsm.ukm.my
Abstract
Downtime affects the production capability of physical assets by reducing output,
increasing operating costs and interfering with customer service. Once a unit experiences a
service downtime or downgrade, covariates or risk factors can directly affect the delay in
repair activities. This study identifies the risk factors that either delay or accelerate
repair times, and it quantifies the extent of the delay attributable to specific risk
factors. These risk factors provide the inputs needed to improve operational performance.
Once risk factors are detected, maintenance planners and supervisors can better anticipate
the starting and finishing points of each repair job, because they know the potential
barriers and facilitators in advance. The study applies a semi-parametric approach, the
proportional hazards model, in a different way: to examine the relationship between repair
time and various risk factors of interest. The properties of the hazard function for the
repair time problem are critically examined and the major findings are highlighted. The
paper demonstrates the estimation of the effects of technicians' underlying characteristics
using the proportional hazards model. A case study was performed on a data set of 1169
air-conditioner maintenance records collected in 2001 at one of the universities in
Malaysia. The sample consists of repair time data and background characteristics of the
technicians. The estimates can be used as a benchmark for developing quality services and
products and for enhancing competitiveness among service providers in the maintenance field.
Keywords: Delay in Repairing, Proportional Hazards, Corrective Maintenance, Risk
Factor Analysis
1. Introduction
Downtime affects the production capability of physical assets by reducing output, increasing operating
costs and interfering with customer service. Once a unit experiences a service downtime or downgrade,
covariates or risk factors can directly affect the delay in repair activities. Repair time varies
across items that range from relatively simple products, such as light bulbs, tires, toasters, or
articles of clothing, to complex systems such as aircraft, centralized air-conditioning systems, power
generating systems, or communications devices. These items are engineered and manufactured to perform
in some specified manner when operated under normal operating conditions. However, internal factors
may be easier to tackle because they lie within management control.
This paper demonstrates the estimation of the effects of technicians' underlying characteristics
using the proportional hazards model. The paper extends the work of [1] and [2] to analyze delays in
failure-based maintenance. A case study was performed on a data set of 1169 air-conditioner
maintenance records collected in 2001 at one of the universities in Malaysia. The sample
consists of repair time data and background characteristics of the technicians.
2. Related Work
Failure-based maintenance is unscheduled maintenance or repair that returns the machine to a defined
state. Ref. [3] integrated some covariates into failure-based maintenance. Ref. [4] discussed
systematic measurements on covariates, illustrated with a multistate hazards model for transitions
and reverse transitions among more than one transient state arising in follow-up studies. That
paper extended the use of the multistate proportional hazards model (PHM) to the analysis of
transitions in contraceptive use over time and illustrated the score test for the equality
of parameters in models of transitions and repeated transitions. Ref. [5] describes the underlying
failure-causing mechanisms, such as degradation and wear, together with assessments of item
survivability. A more appealing approach would be to choose a model based on the physics of failure
and the characteristics of the operating environment.
Ref. [6] gives a few reasons why Taguchi methods have not been commonly employed
in repairing failures. The performance of a service process is very difficult to
measure accurately, and the outcome of a service process is inherently much more inconsistent in
quality than that of its manufacturing counterparts. Furthermore, a service process has more ‘noise
factors’ that are difficult, expensive or impossible to control. Ref. [7] modified the model of
operation in the service or technical center and discussed a Six Sigma methodology for measuring
machine downtime in order to produce products with zero defects. The 80-20 rule, which states that
20% of the defects account for 80% of the quality loss, was applied to minimize the number of
experiments necessary to study the effects of various factors on the performance of a process.
Ref. [8] develops a robust regression metamodel for a maintenance float policy, a problem
originally referred to as the “repairman’s problem”. All the factors affecting the system were
identified and classified into design factors (controllable variables) and noise factors
(uncontrollable variables). Ref. [9] discussed the three-level departure points in quality
management. Maintenance is defined there as a control system in itself, using a matrix
representation. The columns of the matrix show the differentiation between structural and social
aspects on a self-organizing layer, an adaptive layer and a control layer, and the rows show the
various levels of control. By handling both aspects, a better failure
recovery model can be formed. However, the models of both [8] and [9] analyze data by a
linear approach, which may not provide an accurate computational procedure for estimating the
reliability of repair time. Ref. [10] introduced a better model for evaluating potential causes of
failures and their effects using multi-attribute failure mode analysis, in which the ranking of
failure causes is obtained by qualitative pairwise comparison. Ref. [11] discussed a new technique
for prioritizing failures for corrective action in failure mode and effects analysis by setting the
Risk Priority Number and the Risk Priority Rank (RPR). Failure modes with a higher RPR are
considered more important and are given higher priority. Ref. [12] distinguishes failure modes and
their basic maintenance policies in failure-based and preventive maintenance.
Ref. [1] defines the functional requirements of a decision support system that serves as a
working tool to assist the reliability engineer in the repair or replacement management of
material handling equipment. They address the statistical techniques, within the functional
requirements of a decision support system, that apply to reliability modeling of repairable and
non-repairable systems. Ref. [2] extends the study using the Proportional Hazards Model (PHM) for a
renewal process and for homogeneous and non-homogeneous repairable systems, with the application of
a Poisson process. Their study assigns a few types of failure as covariates and estimates the
corresponding hazard ratios using the proportional hazards technique, to assess failure behavior
under different operating conditions. Ref. [13] used a decision support approach to solve problems
in concrete bridge deck maintenance activities. They proposed different decisions for different
types of repair, i.e. shallow repair, deep repair, protective repair, non-protective repair and deck
replacement. All decisions must consider the cost of repair when making any recommendation. Ref.
[13] also used the decision support system to model the human reasoning and decision-making process
for concrete bridge decks. At the end of the study, they concluded that buying sophisticated
hardware or software is not the complete answer. Justifying middleware and object-oriented systems
that integrate maintenance techniques into the decision support system is another potential area to
consider.
There are two main objectives in the present study. The first objective is to extend the
complex system failure analysis suggested by [1] and [2] to the analysis of delay in repair time.
The second objective is to find ways to estimate the significance of the risk factors at a specific
time using a semi-parametric approach.
3. Delay in Repair Time: Some Critical Issues
Total Planned Quality Management systematizes all preventive, predictive and corrective maintenance,
together with the control of maintenance quality. In Total Planned Quality Management, maintenance
strategies should consider the following elements [14]:
(i) Maintenance organization and management;
(ii) Measures of effectiveness;
(iii) Work control;
(iv) Maintenance management information system;
(v) Personnel records regarding competences;
(vi) Technical documentation;
(vii) Maintenance tasks; and
(viii) Maintenance engineering.
The operational unit may be in one of three status conditions at a particular time, as
shown in Figure 1. When the unit experiences a failure, it joins the queue to be repaired. The
service provider should offer quality service by providing the shortest response time to the plant,
which is the sum of the technician's travelling time and the repair time. Once a unit is brought in
for repair, there is a chance of delay due to many varying factors. Each factor has a different
kind of solution with a different cost. It is essential to spend properly at the right time to
optimize the budget. According to
the 80-20 rule, 80% of the delay arises from the same 20% of the possible causes. Our study targets
those 20% of the causes by estimating the most significant risk factors using the PHM.
Figure 1: Operation Cycle of the Repairable Unit (states: Operational, Failure, Being repaired)
Usually, a service provider sets a Service Level Agreement (SLA) within the organization on
the repair time for specific major components, as a benchmark or goal for becoming a vendor of
choice in a competitive market. A standard maintenance procedure is followed to ensure that the task
is completed within the SLA. Figure 2 shows the progress of troubleshooting on a repair, represented
by a line, L1. In most cases, the technical staff troubleshoot as shown by L2 and update the
documentation accordingly, while still meeting the goal, G, through line L3. The organization sets
the standard maintenance procedure so that the goal, G, is met at all times. However, there
are always special cases in which the troubleshooting or repair time misses the target and causes a
delay along line L4. At this stage, the delays are due to some risk factors or covariates. The
present study focuses on improving the performance of troubleshooting activities by using
statistical reliability theory to measure the significance of those risk factors or covariates.
Figure 2: Progress of Troubleshooting on Repair
4. Components of Delay in Repair Time
Traditionally, many companies have employed a reactive strategy, corrective or Failure-Based
Maintenance (FBM), in which machines are repaired only when they fail. FBM is conducted in the
following ways [15]:
(i) Perfect repair: a maintenance action that restores the system operating condition to be as good
as new;
(ii) Minimal repair: a maintenance action that restores the system operating state to be as bad as
old;
(iii) Imperfect repair: a maintenance action that restores the system operating state to be somewhere
between as good as new and as bad as old;
(iv) Worse repair: a maintenance action that makes the operating condition worse than that just
prior to the failure; and
(v) Worst repair: a maintenance action that unintentionally makes the system fail or break down.
The present study shows that the repair time can be a function of several factors, such as the
number of repairmen, the number of machines queuing for repair, the characteristics of the repairman,
the characteristics of the machines, the nature of the problems, the availability of tools, the
availability of procedures and any other obstacles. These factors can be treated as risk factors or
covariates. Most of the time, eliminating these major covariates requires either redesigning a
process or retraining the people who face the covariates that cause delays. In practice, both the
repair and the failure times follow arbitrary or general distributions. As shown in Figure 3,
customers have a clearer idea of the operating unit at position a. Once a machine is down, along
position b, the customer does not know much about the repair details because the work takes place in
the background. Responding to the problem, step-by-step troubleshooting, diagnosis and root cause
finding, which are very technical, are handled by the technicians. Since repair time is a
significant part of the machine downtime, delay at this stage affects the total downtime of the
unit. The factors that affect the delay in repair time can be explored as more risk factors or
covariates are included in the model.
Figure 3: Operation Unit Status (time axis t with positions a, b and c; unit status down or up)
5. Troubleshooting Time Delay Using PHM
Repair data can be analyzed for patterns in a variety of ways. Such patterns help identify
the primary sources of repeated problems. It is important to discover exactly how much time is spent
on repair tasks. Overtime spent on emergency maintenance affects the cost of the process, and data
about it can indicate the economic and other benefits of improving the process. Repair and
problem-solving activities in the maintenance field are categorized as repetitive events, since a
machine may become faulty again in the future. The status of machines in operation can be
represented using a state diagram with a few symbols. The system may be in one of the states at any
time, where the first letter of the symbol denotes the mode of the unit and the second corresponds
to the progressing state of the device. Our study focuses on the delay in bringing up a machine that
has already experienced failure. The state transition diagram can be simplified to show the
interference of external factors or covariates that might arise at a few of the states, as shown in
Figure 4.
Normally, a failed unit is attended to by the first-level support team, which performs regular
breakdown maintenance at state S1 and returns the unit to normal operation at state S2 without
facing any problem. However, once the unit experiences a transition to state S3, S4 or S5, there is
a high possibility of a delay, which varies with the significant risk factors that exist prior to
the occurrence of an event. These unresolved problems are the special cases on which our study
focuses. Risk factors may cause a transition from S1 to either S3 or S4 due to such
factors as the technical competency, experience, age or training background of the technicians. The
transition could also be due to background characteristics of the machine, such as aging or poor
preventive maintenance records. There could also be other obstacles, such as delays in ordering
replacement parts.
Figure 4: Troubleshooting State Diagram
The present research uses a hypothetical project to allow robust statistical analysis, using the
Proportional Hazards Model (PHM) to estimate the reliability while considering several explanatory
variables simultaneously. The PHM is well suited to analyzing repair times because it can estimate
the effects of several explanatory variables or risk factors at once. By fitting this model to the
breakdown data, estimates of the impact of a few risk factors can be obtained. Even if the breakdown
teams are similar with respect to the variables known to affect reliability, using the PHM with
these prognostic variables may produce a more precise estimate of troubleshooting performance, for
example by narrowing the confidence intervals of the estimates. This gives guidelines for
adjusting certain risk factors as a milestone to improve the reliability of the
maintenance operation.
The variables can be represented as a vector of covariates, $Z = (Z_1, Z_2, \ldots, Z_p)$. The
corresponding vector of regression parameters can be represented as $\beta = (\beta_1, \beta_2,
\ldots, \beta_p)$. The general form of the proportional hazards function is $h(t, z) = h_0(t)\,
g(z)$, where $h_0(t)$ is the baseline hazard function and $g(z)$ is the exponential of the weighted
sum of the corresponding explanatory variables, which can be either continuous or discrete [16]. The
reliability function, $R(t)$, can be obtained as $R(t; Z) = P(T > t \mid Z = z)$, where $T$ is the
associated failure time. Let $Z$ denote the regression vector of explanatory variables
$(Z_1, Z_2, \ldots, Z_p)$ with $t$ the associated failure time. Let $\beta$ denote the vector of
unknown regression parameters associated with the explanatory variables, $\beta = (\beta_1, \beta_2,
\ldots, \beta_p)'$; then the hazards relationship is given by [17] as
$h_i(t) = h(t \mid z) = h_{0i}(t) \exp(Z\beta)$, $i = 1, 2, \ldots$, or
$\log[h_i(t)] = \log[h_{0i}(t)] + Z\beta$, $i = 1, 2, \ldots$
The set of parameters $h_{0i}(t)$ is called the baseline hazard function. Its purpose is merely to
account for changes in the hazards over time that are not explained by the explanatory variables of
interest and their coefficients $\beta$. The reliability function is given by [18] as:
$$R(t; z) = \exp\left[-\int_0^t h_0(\tau)\, e^{z\beta}\, d\tau\right].$$
Ref. [16] defined the partial likelihood as the product of several likelihoods, one for each of,
say, $n$ failure times. Thus, at the $i$th failure time, $L_i$ denotes the likelihood of failing at
this time, given that the machine has already survived up to this time:
$$L = L_1 \times L_2 \times L_3 \times \cdots \times L_n = \prod_{i=1}^{n} L_i,$$
where $L_i$ is the portion of $L$ for the $i$th failure time.
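As an illustration of the hazard and reliability definitions above, the following minimal Python sketch evaluates both functions for a single binary covariate. The constant baseline hazard of 0.005 per hour and the coefficient 0.8 are hypothetical values chosen only for the example; they are not estimates from this study. In the repair time setting, R(t; z) can be read as the probability that a repair is still unfinished after t hours.

```python
import math

def linear_predictor(z, beta):
    """Return z*beta, the weighted sum of the covariates."""
    return sum(z_k * b_k for z_k, b_k in zip(z, beta))

def hazard(t, z, beta, h0):
    """h(t | z) = h0(t) * exp(z*beta)."""
    return h0(t) * math.exp(linear_predictor(z, beta))

def reliability(t, z, beta, H0):
    """R(t; z) = exp(-H0(t) * exp(z*beta)), with H0 the cumulative baseline hazard."""
    return math.exp(-H0(t) * math.exp(linear_predictor(z, beta)))

# ASSUMPTION: constant baseline hazard of 0.005 per hour (hypothetical value).
RATE = 0.005
h0 = lambda t: RATE        # baseline hazard h0(t)
H0 = lambda t: RATE * t    # its integral from 0 to t

beta = [0.8]               # hypothetical coefficient for one binary covariate
z_a, z_b = [1], [0]        # two covariate settings to compare

# Under proportional hazards the ratio of the two hazards does not depend on t.
ratio_at_10 = hazard(10, z_a, beta, h0) / hazard(10, z_b, beta, h0)
ratio_at_500 = hazard(500, z_a, beta, h0) / hazard(500, z_b, beta, h0)

print(ratio_at_10, ratio_at_500)
print(reliability(100, z_a, beta, H0), reliability(100, z_b, beta, H0))
```

The hazard ratio equals exp(0.8) at both time points, while the group with the higher hazard has the lower probability of an unfinished repair at 100 hours.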
Let $z_l$ denote the vector of explanatory variables for the $l$th individual. Let
$t_1 < t_2 < \cdots < t_k$ denote the $k$ distinct, ordered event times. Let $f_i$ denote the
multiplicity of failures at event time $t_i$. That is, $f_i$ is the size of the set $F_i$ of
individuals that fail at $t_i$. Let $S_i$ be the sum of the vectors $z_l$ over the
individuals who fail at $t_i$, that is, $S_i = \sum_{l \in F_i} z_l$. Let $R_i$ denote the risk set
just before the $i$th ordered event time $t_i$. Let $R_i^*$ denote the set of individuals whose
event or censored times exceed $t_i$ or whose censored times are equal to $t_i$. Then the exact type
of partial likelihood function is:
$$L(\beta) = \prod_{i=1}^{k} \left\{ \int_0^{\infty} \prod_{j=1}^{f_i} \left[ 1 - \exp\left( -\frac{e^{z_j \beta}}{\sum_{l \in R_i^*} e^{z_l \beta}}\, t \right) \right] e^{-t}\, dt \right\}$$
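The exact partial likelihood involves an integral for every event time and is costly to evaluate by hand. As a simpler illustration of the same idea, the sketch below computes the Breslow approximation to the log partial likelihood. This is a swapped-in, commonly used approximation for tied event times, not the exact method described above, and the four-record data set is hypothetical.

```python
import math

def breslow_log_partial_likelihood(times, events, covariates, beta):
    """Breslow approximation to the log partial likelihood.

    times      : observed durations (e.g. repair times in hours)
    events     : 1 if the event was observed, 0 if the record is censored
    covariates : one covariate vector z_l per record
    beta       : regression coefficients
    """
    # Linear predictor z_l * beta for every record.
    eta = [sum(z_k * b_k for z_k, b_k in zip(z, beta)) for z in covariates]
    loglik = 0.0
    # Loop over the distinct, ordered event times t_1 < ... < t_k.
    for t_i in sorted({t for t, e in zip(times, events) if e == 1}):
        # F_i: records whose event occurs at t_i.
        failing = [l for l, (t, e) in enumerate(zip(times, events)) if t == t_i and e == 1]
        # Risk set just before t_i: records not yet finished or censored.
        at_risk = [l for l, t in enumerate(times) if t >= t_i]
        s_i_beta = sum(eta[l] for l in failing)               # S_i * beta
        denom = sum(math.exp(eta[l]) for l in at_risk)
        loglik += s_i_beta - len(failing) * math.log(denom)
    return loglik

# Hypothetical toy data: four records, one binary covariate, one censored record.
times = [5, 8, 8, 12]
events = [1, 1, 0, 1]
covariates = [[1], [0], [1], [0]]

ll_null = breslow_log_partial_likelihood(times, events, covariates, [0.0])
ll_half = breslow_log_partial_likelihood(times, events, covariates, [0.5])
print(ll_null, ll_half)
```

Maximizing this function over beta yields the coefficient estimates; statistical packages do this numerically and offer the exact form as an option when ties are frequent.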
The coefficients for each covariate can be examined. For an explanatory variable, a positive
regression coefficient means that the hazard is higher. This implies that the prognosis is worse for
higher values. Conversely, a negative regression coefficient implies a better prognosis for higher
values of the variable. The PHM requires that, for any two covariate sets $z_1$ and $z_2$, the hazard
functions are related as $h(t; z_1) \propto h(t; z_2)$, $0 < t < \infty$.
6. Empirical Results
Air conditioning corrective maintenance records were collected at one of the public universities in
Malaysia for the year 2001. They consist of 1169 records of repair time data for split units and
window units. Most of the components are renewable, with non-homogeneous involvement of the risk
factors. The technicians performing the troubleshooting have different backgrounds in terms of
qualification, competency, experience and other underlying characteristics. They are expected to
handle various types of repairable and renewable air conditioning units. The variable TYPE takes the
value 1 for a split unit and 2 for a window unit. The variable AGE takes the value 0 for technicians
below 25 years old and 1 for technicians aged 25 years and above. The variable EXP takes the value 0
for technicians who have less than 5 years of working experience and 1 for those with 5 or more
years of working experience. QLF takes the value 1 for technicians whose academic qualification is
the Malaysian Lower Certificate of Education, 2 for technicians whose academic qualification is the
Malaysian Certificate of Education and 3 for technicians who have technical certificates. All the
tests were conducted with repair time, measured in hours, as the dependent variable.
A frequency table of repair times (in percentage) is given in Table 1. Less than 35% of the
repairs are completed within 48 hours. Hence, more than 65% of the repair activities take more than 2
days. This justifies further analysis to investigate the delay in repair time.
Table 1: Relative Frequency of Repair Times
Repair Time Class (Hour) Frequency Percentage
0-48 408 34.90%
49-96 91 7.78%
97-144 102 8.73%
145-192 99 8.47%
193-240 92 7.87%
241-288 61 5.22%
289-336 39 3.34%
337-384 46 3.93%
385-432 50 4.28%
433-480 50 4.28%
481-528 28 2.40%
529-576 26 2.22%
577-624 28 2.40%
625-672 14 1.20%
673-720 18 1.54%
721-768 17 1.45%
Total 1169 100.00%
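The percentages in Table 1 can be recomputed from the frequencies. The short Python sketch below does so and also derives the share of repairs that take longer than 48 hours. The variable names are illustrative only.

```python
# Frequencies per repair time class (hours), copied from Table 1.
frequency = {
    "0-48": 408, "49-96": 91, "97-144": 102, "145-192": 99,
    "193-240": 92, "241-288": 61, "289-336": 39, "337-384": 46,
    "385-432": 50, "433-480": 50, "481-528": 28, "529-576": 26,
    "577-624": 28, "625-672": 14, "673-720": 18, "721-768": 17,
}

total = sum(frequency.values())

# Relative frequency of each class, in percent.
percentage = {cls: 100 * n / total for cls, n in frequency.items()}

# Share of repairs that are not completed within 48 hours.
share_over_48h = 100 - percentage["0-48"]

for cls, pct in percentage.items():
    print(f"{cls:>8s} {frequency[cls]:5d} {pct:6.2f}%")
print(f"Total {total}, over 48 hours: {share_over_48h:.2f}%")
```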
Table 2 gives summary statistics of the repair times by class. The mean repair time increases with
the delay class, as expected, while the standard deviations vary inconsistently. This indicates that
there is variability in repair activities, which might be influenced by some risk factors.
Table 2: Statistical Analysis on Repair Times
Class (Hour) Freq Sum (Hour) Mean (Hour) StdDev (Hour)
0-48 408 3,178 7.79 11.24
49-96 91 6,351 69.79 15.91
97-144 102 12,026 117.90 16.31
145-192 99 16,336 165.01 15.88
193-240 92 19,623 213.30 15.29
241-288 61 15,772 258.55 15.20
289-336 39 12,222 313.38 13.32
337-384 46 16,417 356.90 15.81
385-432 50 20,527 410.54 12.52
433-480 50 22,444 448.88 14.32
481-528 28 14,088 503.15 18.13
529-576 26 14,177 545.28 12.55
577-624 28 16,722 597.21 16.43
625-672 14 9,105 650.36 16.99
673-720 18 12,522 695.66 19.26
721-768 17 12,450 732.35 13.04
Table 3 shows the analysis of the risk factors. Technicians aged 25 years and above have a lower
mean repair time (193.06 hours) than their younger counterparts
(253.31 hours). Academic qualification is divided into three groups, i.e., QLF1 (QLF = 1), QLF2 (QLF
= 2) and QLF3 (QLF = 3). Technicians with technical qualifications, QLF3, have the smallest standard
deviation (102.46 hours). This indicates that this group has less variability in handling repair
activities. Technicians with 5 or more years of working experience repair the machines faster, with
a mean repair time of 110.33 hours compared with 281.04 hours for their less experienced
counterparts.
Table 3: Analysis on Risk Factors
Variable Value Frequency Total (Hour) Mean (Hour) StdDev (Hour)
AGE 0 364 92,205 253.31 225.45
AGE 1 609 117,573 193.06 183.07
QLF 1 293 56,895 194.18 202.94
QLF 2 356 110,698 310.95 228.31
QLF 3 324 42,186 130.20 102.46
EXP 0 600 168,625 281.04 215.47
EXP 1 373 41,153 110.33 117.56
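The group means in Table 3 follow from the totals and frequencies. As a quick consistency check, the Python sketch below recomputes each mean and the ratio of mean repair times between the two experience groups. The data structure is illustrative; the figures are those reported in Table 3.

```python
# (variable, value): (frequency, total repair hours, reported mean) from Table 3.
table3 = {
    ("AGE", 0): (364, 92205, 253.31),
    ("AGE", 1): (609, 117573, 193.06),
    ("QLF", 1): (293, 56895, 194.18),
    ("QLF", 2): (356, 110698, 310.95),
    ("QLF", 3): (324, 42186, 130.20),
    ("EXP", 0): (600, 168625, 281.04),
    ("EXP", 1): (373, 41153, 110.33),
}

# Mean repair time per group = total hours / number of records.
recomputed_mean = {key: total / freq for key, (freq, total, _) in table3.items()}

# How many times longer the mean repair time is for less experienced technicians.
exp_mean_ratio = recomputed_mean[("EXP", 0)] / recomputed_mean[("EXP", 1)]

for key, mean in recomputed_mean.items():
    print(key, round(mean, 2), "reported:", table3[key][2])
print("EXP mean ratio:", round(exp_mean_ratio, 2))
```

These raw means do not adjust for the other covariates, which is why the proportional hazards estimates are examined next.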
Reliability analysis of real data can be carried out using the PHM introduced by [19]. The
present work estimated the parameters using the PHREG procedure in the SAS software. The
maximum partial likelihood estimates obtained with the PHREG procedure are given in Table 4. The
academic qualifications QLF1 and QLF2 are combined as QLF12, which is then compared with the
technical qualification predictor, QLF3 (QLF = 3). The estimates are highly significant (p-value
< .0001) for the qualification factors QLF12 and QLF3 and for the experience (EXP) factor. The PHM
estimates show that the technician experience (EXP) factor has a risk ratio of 2.346, which is
greater than 1. Because the event here is the completion of a repair, this means that technicians
with 5 or more years of experience complete repairs at about 2.35 times the rate of their less
experienced counterparts, which reduces the delay in repair time. In contrast, technicians with less
experience take longer in troubleshooting.
The parameter estimate for academic qualification, QLF12, is negative (-0.52962), whereas the
estimate for technical qualification, QLF3, is positive and of the same magnitude. This indicates
that the hazard is lower for technicians with only academic qualifications and higher for those with
technical qualifications. Technicians who possess technical
qualifications are therefore able to solve problems faster than their counterparts with academic
qualifications. Providing more technical training to the technicians can enhance their effectiveness
in handling troubleshooting activities.
The PHM estimate for technicians aged 25 years and above (AGE) is positive
but very small, with a coefficient of 0.01639. The risk ratio is 1.017, which is close to one. This
means that older and younger technicians are equally competent in repairing the equipment.
Furthermore, the estimate for the AGE factor is not statistically significant (p-value = 0.8282).
Thus, there is no evidence that the AGE factor raises or lowers the hazard.
Note that we interpret the results differently from a conventional survival analysis: a negative
association with reliability or survival (a higher hazard) means a shorter repair time or failure
recovery interval.
Table 4: Proportional Hazards Estimates
Variable Parameter Estimation Std Err Chi-Sq Pr>Chi-Sq Risk Ratio
AGE 0.01639 0.07553 0.0471 0.8282 1.017
QLF12 -0.52962 0.08393 39.8155 <.0001 0.589
QLF3 0.52962 0.08393 39.8155 <.0001 1.698
EXP 0.85263 0.0872 95.6082 <.0001 2.346
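The Risk Ratio column of Table 4 is the exponential of the parameter estimate. The sketch below recomputes it in Python, together with a Wald chi-square, (estimate / standard error) squared. Treating the Chi-Sq column as a Wald statistic is an assumption made for this illustration; the recomputed values agree closely with the table, up to rounding of the inputs.

```python
import math

# (parameter estimate, standard error) as reported in Table 4.
table4 = {
    "AGE":   (0.01639, 0.07553),
    "QLF12": (-0.52962, 0.08393),
    "QLF3":  (0.52962, 0.08393),
    "EXP":   (0.85263, 0.0872),
}

# Risk (hazard) ratio: exp(beta).
risk_ratio = {name: math.exp(b) for name, (b, se) in table4.items()}

# ASSUMPTION: the reported chi-square is a Wald statistic, (beta / se)^2.
wald_chi_sq = {name: (b / se) ** 2 for name, (b, se) in table4.items()}

for name in table4:
    print(f"{name:6s} risk ratio = {risk_ratio[name]:.3f}  "
          f"Wald chi-sq = {wald_chi_sq[name]:.4f}")
```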
Based on the proportional hazards estimates as shown in Table 4, the estimated parameters of
the regression for the risk ratio can be measured as:
$$\frac{\lambda(t, z)}{\lambda_0(t)} = \exp(0.01639\,\mathrm{AGE} - 0.52962\,\mathrm{QLF12} + 0.52962\,\mathrm{QLF3} + 0.85263\,\mathrm{EXP})$$
Generally, reliability analysis studies how to ensure a longer survival time for the
equipment. In contrast, in repair time analysis, the shorter the troubleshooting period, the
better. This is our main contribution in this paper: we use the hazard function differently, to
estimate risk factors during failure recovery intervals. The above regression can be used to measure
and estimate the risk ratios for stratification by different categories of risk factors. The risk
factors can then be adjusted accordingly, shifting the hazard ratio in the direction that shortens
the troubleshooting time.
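As a sketch of how the fitted regression might be used, the Python code below evaluates the relative hazard for two technician profiles and compares them. It assumes that AGE, EXP, QLF12 and QLF3 all enter as 0/1 indicators, exactly as the terms appear in the regression above; the profile names are hypothetical.

```python
import math

# Coefficients from Table 4.
COEF = {"AGE": 0.01639, "QLF12": -0.52962, "QLF3": 0.52962, "EXP": 0.85263}

def relative_hazard(profile):
    """Hazard relative to the baseline: exp(sum of coefficient * covariate)."""
    return math.exp(sum(COEF[name] * profile[name] for name in COEF))

# ASSUMPTION: every covariate is coded as a 0/1 indicator.
experienced = {"AGE": 1, "QLF12": 0, "QLF3": 1, "EXP": 1}
inexperienced = {"AGE": 1, "QLF12": 0, "QLF3": 1, "EXP": 0}

# Ratio of the two relative hazards; the baseline hazard cancels out.
ratio = relative_hazard(experienced) / relative_hazard(inexperienced)
print(round(ratio, 3))
```

Because the two profiles differ only in EXP, the ratio reduces to the risk ratio reported for EXP in Table 4.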
7. Discussion and Conclusion
The PHM is applied in the present study to evaluate failure behavior at different operating conditions.
The PHM is robust in that estimates of air conditioning troubleshooting performance can be computed
even with an unspecified baseline function. The model is able to analyze multivariate data and allows
the effects of air conditioning failures to be isolated from the effects of the technicians'
underlying characteristics. From the estimates, technicians with five or more years of experience
have a coefficient of 0.85 (risk ratio 2.346), that is, they complete repair activities at more than
twice the rate of their less
experienced counterparts. This is one of the key results, which shows that the less experienced
technicians contribute to major delays in troubleshooting activities. It is evident that the
involvement of experienced technicians in troubleshooting activities reduces the delay.
Similarly, if the repair activities are handled by technicians who have a technical
qualification background, the hazard is about 1.7 times that of technicians who have
only academic qualifications (risk ratio 1.698). The estimates can be used as guidelines for
decision making, such as setting hiring targets and choosing the type of training for the technical
staff. For instance, maintenance management should hire new workers who are technically qualified
and have five or more years of experience. The organization should also try to keep all current
experienced technicians in the department by providing fair assessments, reviews and benefits.
Another useful finding is that the coefficient for technicians with only academic qualifications is
negative. The estimates demonstrate that
the technicians who possess technical qualifications are able to solve problems faster. This is
an important element, which implies that upgrading academic qualifications does not have much impact
on reducing the delay in repair times. The proportional hazards estimate for technically qualified
technicians is a coefficient of 0.53, corresponding to a risk ratio of 1.698. Maintenance management
personnel should send their technicians for more technical training, as this can enhance their
efficiency in handling troubleshooting activities.
Acknowledgement
The authors are grateful to the manager of the food processing company for allowing us to conduct the
case study. Thanks are also due to the Dean of the Faculty of Information and Communications
Technology, Universiti Teknikal Malaysia Melaka, for providing facilities and support in the
completion of the paper.
References
[1] S.Y. Choy, J.R. English, T.L. Landers and L. Yan, 1996. Collective Approach for Modeling
Complex System Failures, Proceedings of IEEE Annual Reliability and Maintainability
Symposium, pp. 282-286.
[2] P.V. Prasad and K.R. Rao, 2002. Reliability Models of Repairable Systems Considering the
Effect of Operating Conditions, Proceedings of IEEE Annual Reliability and Maintainability
Symposium, pp. 503-510.
[3] Y. Lam and Y.Z. Lin, 2004. A Shock Model for the Maintenance Problem of a Repairable
System, Computers and Operations Research, Vol. 31, pp. 1807-1820.
[4] M.A. Islam, 1994. Multistate Survival Models for Transitions and Reverse Transitions: an
Application to Contraceptive Use Data, Journal of the Royal Statistical Society A, Vol. 157,
Part 3, pp. 441-455.
[5] N.D. Singpurwalla, 1995. Survival in Dynamic Environments, Statistical Science, Vol. 10,
No. 1, pp. 86-103.
[6] A. Kumar, J. Motwani and L. Otero, 1996. An Application of Taguchi’s Robust Experimental
Design Technique to Improve Service Performance, International Journal of Quality and
Reliability Management, Vol. 13, No. 4, pp. 85-98.
[7] R.S. Behara and J. Lemmink, 1997. Benchmarking Field Services Using a Zero Defects
Approach, International Journal of Quality and Reliability Management, Vol. 14, No. 5, pp.
512-526.
[8] I.E. Madu, 1999. Robust Regression Metamodel for a Maintenance Float Policy, International
Journal of Quality and Reliability Management, Vol. 16, No. 5, pp. 433-456.
[9] H.V. Water, 2000. A Maintenance Model for Quality Management, International Journal of
Quality and Reliability Management, Vol. 17, No. 7, pp. 756-770.
[10] M. Braglia, 2000. MAFMA: Multi-Attribute Failure Mode Analysis, International Journal of
Quality and Reliability Management, Vol. 17, No. 9, pp. 1017-1033.
[11] N.R. Sankar and S.P. Bantwal, 2001. Modified Approach for Prioritization of Failures in a
System Failure Mode and Effects Analysis, International Journal of Quality and Reliability
Management, Vol. 18, No. 3, pp. 324-335.
[12] W. Geert and P. Liliane, 2004. Maintenance Concept Development: A Case Study,
International Journal of Production Economics, Vol. 89, pp. 395-405.
[13] Y. Sherif, A. Osama, F. Imran and R. Dennis, 2008. A Decision Support System for Concrete
Bridge Deck Maintenance, Advances in Engineering Software, Vol. 39, pp. 202-210.
[14] M.C. Eti, S.O.T. Ogaji and S.D. Probert, 2006. Strategic Maintenance-Management in Nigerian
Industries, Applied Energy, Vol. 83, pp. 211-227.
[15] M. Adamantios and W. Zhao, 2005. Modeling and Analysis of Repairable Systems with
General Repair, Proceedings of IEEE Annual Reliability and Maintainability Symposium,
pp. 1-8.
[16] D.G. Kleinbaum, 1996. Survival Analysis: a Self-Learning Text, Springer-Verlag, New York
Inc., USA.
[17] J.F. Lawless, 1982. Statistical Models and Methods for Lifetime Data, John Wiley and Sons,
USA.
[18] J.D. Kalbfleisch and R.L. Prentice, 1980. The Statistical Analysis of Failure Time Data, John
Wiley and Sons, Inc, USA.
[19] D.R. Cox, 1972. Regression Models and Life-Tables (with Discussion), Journal of the Royal
Statistical Society B, Vol. 34, No. 2, pp. 187-220.