Journal of Facilities Management
Improving service management in outsourced IT operations
Stewart H.C. Wan and Yuk-Hee Chan
Article information:
To cite this document:
Stewart H.C. Wan, Yuk-Hee Chan, (2007), "Improving service management in outsourced IT operations",
Journal of Facilities Management, Vol. 5 Iss 3 pp. 188 - 204
Permanent link to this document:
http://dx.doi.org/10.1108/14725960710775072
Downloaded on: 11 May 2015, At: 21:09 (PT)
References: this document contains references to 21 other documents.
To copy this document: permissions@emeraldinsight.com
The fulltext of this document has been downloaded 1600 times since 2007*
Downloaded by Hong Kong Polytechnic University At 21:09 11 May 2015 (PT)
Improving service management
in outsourced IT operations
Stewart H.C. Wan
Projects and Facilities Division,
Hong Kong Science and Technology Parks Corporation, Hong Kong, and
Yuk-Hee Chan
Department of Electronics and Information Engineering,
Hong Kong Polytechnic University, Hong Kong
Abstract
Purpose – The purpose of this paper is first to evaluate the effect of IT service management (ITSM)
tools in a practical environment and then to share experience in developing management process
modules under a service outsourcing model. To improve fault correlation from the business and user
perspectives, the paper also proposes a framework that automatically relates network and system alerts to
their business service impact and user impact, for proactive notification to IT operations management.
Design/methodology/approach – Quantitative analysis of three years of real operational data
is used to present the effect of ITSM tool adoption. The proposed framework consists of a
hybrid case- and rule-based reasoning module and a new approach for mapping faults to business
criticality and user activities.
Findings – Over the past decade, ITSM has received significant attention in the IT
service operations industry. In the market for ITSM software tools, customer and operational
processes are neither sufficiently developed nor integrated with other management applications that follow
daily IT service processes, which makes it difficult to correlate faults with business service impacts and
user impacts. For faults of the same severity level, traditional fault discovery and notification tools
give equal weighting from the business and user points of view.
Research limitations/implications – Most related works address individual processes rather than
the ITSM processes as a whole. Moreover, some works present the enabling technology for outsourced
facilities management rather than IT operations management. A lack of research activity was noted in
the areas of user and business impact correlation with service management.
Practical implications – This paper outlines the implications of implementing ITSM tools in
outsourced IT operation. The business continuity planning also forms one of the critical factors to
improve responsiveness in service management.
Originality/value – This paper illustrates the effect of ITSM tools adoption by analyzing real
operation data. Central to the service-oriented philosophy in ITSM, we introduce a framework to
correlate with user and business elements. Inclusion of the dimensions of business and user impact in
the fault correlation process could further improve service efficiency and user satisfaction.
Keywords Communication technologies, Outsourcing, Universities, Hong Kong
Paper type Research paper
1. Introduction
Improving delivery of IT services through the use of structured processes has been an
active research topic over the last few years (Mayerl et al., 2005; Bartolini and Salle,
2004; Keller, 2005; Hanemann et al., 2005; Stanley et al., 2005; Jakobson et al., 2004).
The current issue and full text archive of this journal is available at
www.emeraldinsight.com/1472-5967.htm
This paper was written with the support offered by the Hong Kong Science and Technology
Parks Corporation, Hong Kong.
JFM
5,3
188
Journal of Facilities Management
Vol. 5 No. 3, 2007
pp. 188-204
© Emerald Group Publishing Limited
1472-5967
DOI 10.1108/14725960710775072
Historically, IT support played a best-effort role in helping businesses function within an
organization, and service level agreements (SLAs) and their management were regarded as a luxury.
This mindset and practice have changed significantly over the last decade with
the introduction of service outsourcing and service management frameworks such as the
information technology infrastructure library (ITIL, 2000).
The introduction of IT into building operations has brought a substantial change
from traditional manual processes for managing a campus environment to an era
in which business operations are IT-enabled. IT failures can be disastrous for critical
business functions, rather than merely inconvenient or annoying to general users.
The situation becomes worse when there are multiple linkages or interfaces between the
underlying infrastructure and the IT-enabled business functions. Incident realization
by manual processes can be time-consuming and reactive rather than proactive, i.e. the
operator reacts to incidents only after receiving users’ complaints.
The service impact of a fault incident is the primary concern of business operation units.
However, IT operation units may not convey incidents in such language: they use terms such as
“node down” or “network/server utilization”, arcane terms that mean nothing to business operation
units and say nothing about the type of impact on their business operations.
Prioritization by business importance is another shortcoming of traditional notification
tools. Although software tools with respective management modules are available in the
market for ITSM, solutions for managing IT services, customers and operational processes
are not sufficiently developed nor integrated with other management applications
following IT services daily processes (Mayerl et al., 2005). A campus built as a hub for
technology companies emphasizes the role of IT in enabling and delivering better service
to tenants for innovation and technology development in the focused clusters and the
upgrading of manufacturing and service industry capabilities. To cope with the dynamic
management processes, management platforms should be designed not only to provide
homogeneous views of heterogeneous IT components but also to be flexible enough to align
with daily management workflow processes and critical business functions.
The aim of this paper is twofold. Firstly, it is to show that implementing
appropriate ITSM processes can improve the service performance in campus IT
operations. We took a real campus site for technology industries as a case study to
present the effect of adopting ITSM process tools in supporting various service
categories in a campus environment. In view of the growing trend towards outsourced
service management in the commercial market, we then share our experience and
discuss the specific concerns in developing management process modules under
the service outsourcing model. Secondly, traditional fault discovery and notification
tools weight system/network alerts according to their severity level with respect to
the entire operation and so cannot take account of business criticality and user
importance. To improve the situation, we propose a framework and implementation
architecture to correlate these alerts with two other dimensions: a business impact
knowledge-base, which is derived from business continuity aspects; and user
activities, to provide earlier service impact notifications.
2. Related works
To our knowledge, no approach can be found in the literature that addresses a
framework completely suitable to the requirements. Most of the related works were
conducted on individual processes rather than on the ITSM processes as a whole. Mayerl et al. (2005) present a
service-oriented architecture approach to integrating service management applications,
but without ITIL process modeling. Bartolini and Salle (2004) aim at better
incident prioritization, while Keller (2005) addresses automating change management
activities. The enhanced telecom operations map (eTOM, 2006) is a business process
framework outlined by the TeleManagement Forum (TM Forum) to guide the development and
management of key processes within a telecommunications service provider. eTOM
provides less useful information than ITIL about each specific service management
sub-process, including benefits and best practices. Brooks and Lilley (2006) present
the enabling technology for outsourced facilities management; however, the cases discussed
concern building environment control rather than IT operations management.
Iwhiwhu (2005) suggests the use of management information systems (MIS) to improve
decision-making based on available data and proper management of records.
Adeoti-Adekeye (1997) finds that a successful MIS must be designed and
operated with due regard to organizational and behavioral principles as well as
technical factors. In the area of improving the responsiveness to network/system alerts
in IT operations, Hanemann et al. (2005) present a service fault management
framework, which identifies the relevant components to provide service-quality-based
fault management. Similar work appears in Hochstein et al. (2005a, b), which
presents an approach to incident prioritization by business objectives. Jakobson and
Weissman (1993) and Lewis (1999) present a set of rules and a modeling approach to
perform event correlation in rule- and model-based reasoning, respectively. Lewis
(1993, 1999) also presents the case-based reasoning approach in the area of event
correlation. In this paper, ITIL was adopted as the reference model for formulating
ITSM process modules with the considerations of operational environment. The
proposed framework for fault correlation makes use of a hybrid case- and rule-based
reasoning (RBR) module together with a new approach in fault mapping with business
criticality and user activities.
3. The campus – Hong Kong Science Park
We took a real case to illustrate the effects and concerns of ITSM process
implementation in a campus environment with outsourced IT services support. The
campus is located in Hong Kong and built as a hub providing not only rentable floor
space for research and development (R&D) offices and laboratories, but also a variety
of supporting facilities/services for tenants to foster their R&D work in innovation
and technology clusters of IT and telecommunications, electronics, bio-technology, and
precision engineering. It is a non-labor-intensive campus for research and development
industries which comprises three development phases. Phase 1 was completed in 2002;
there are ten buildings with occupancy of 155 companies and over 4,500 employees as
of the submission date of this paper.
The campus IT service support was outsourced by the campus management to a
technology service operator while the building operation support was outsourced to
another facility management service operator. The daily IT service support covers all
aspects of the information and communications technology (ICT) infrastructure, which
are grouped into ten categories:
(1) server farm;
(2) office automation (OA) and desktop applications;
(3) public facilities – kiosks and displays;
(4) campus network;
(5) web portal and user applications;
(6) main IT room/control centre;
(7) wireless communications;
(8) campus cabling;
(9) IT security; and
(10) telephony and unified messaging system.
Compared with the average enterprise with satellite offices or subsidiary companies,
the distinct features of campus IT operations are the ad hoc and daily service offerings
for tenants, visitors and the public, such as scheduled and ad hoc virtual local area
network setup for tenants and conference events, publicly accessible wireless network
services, and real-time multimedia broadcast for conference events and guidance
information. Operational knowledge management for handling the different setup
environments in each tenant office is another key issue in campus operations.
4. IT service management processes
Several ITSM process frameworks such as ITIL (2000) and eTOM (2006) were
developed in the IT/Telecom industries. ITIL provides a comprehensive, consistent
and coherent set of best practices for ITSM processes, promoting a quality approach to
achieving business effectiveness and efficiency in the use of information systems. It is
not a process model but a description of activities, documents, roles, success factors,
and key performance indicators (KPIs) (Hochstein et al., 2005a, b). eTOM is a business
process framework to guide the development and management of key processes within
a telecommunications enterprise. ITIL, however, provides a more complete
business-aligned process framework which includes all the functional elements
needed for IT support processes.
In the ITSM hierarchy, service support and service delivery form the basis for service
management. Service support aims to deal with day-to-day operational support of IT
services while service delivery provides long-term planning and improvement of IT
service provision. The adoption of process elements and their downstream components
depends on the organization’s business needs and resource allocation. In practice,
not all of the process activities in service support and delivery can be automated or
aided by software management tools; some require manual processes for administrative
determinations instead of bridging over to the other process activities. Once an
organization understands the required process activities, the relevant process modules can
be formed accordingly as a standardized framework for ITSM.
5. Implementation in practice
ITSM shall be built according to the operational environment with focused and
strategic direction. To avoid chaos in operations management and to instead achieve a
systematic process, ITSM workflow processes were adopted in stages for the IT
operation team since the campus was put into operation in 2002. Figure 1 shows the
mapping of ITIL activities to a campus ITSM solution. The process modules combined
together some of the activities in ITIL for better efficiency in workflow arrangement in
practical adoption, in which the service helpdesk and incident management were aided
by proprietary management software tools in 2005. The software tools were
customized according to the management workflow with knowledge repository to
handle requests and incidents efficiently.
5.1 Implications on outsourcing model
It has become common in the commercial market for building and
IT services to be managed under outsourcing models, which has implications for ITSM
implementation. In a service outsourcing model, the service operator is selected either
through strategic partnership or competitive tender bidding. Unlike the management
of directly employed staff, the relationship between the service provider and the employer
organization is built on the contract itself. A service provider may take up such a service
contract not on a long-term basis but on a yearly contract term. Some implications
were observed during the planning, design and implementation stages of ITSM.
5.1.1 Conflict of interest. Conflict of interest refers to a situation where the
personal/organizational interests of the appointed service provider compete or conflict
with the interests of the employer. The financial element in ITSM raises
concerns about conflict of interest. It is an especially sensitive subject for government and
public agencies and requires serious attention. Unless there is a plentiful supply of
vendor-neutral service provider companies in the regional market, which is
not the case in reality, the procurement of new equipment to satisfy user
requests should be done by a third party so as to avoid conflict of interest during the
course of equipment vendor selection. The service providers in IT and building facility
management are therefore required to take up inventory control (asset management
(ASM)) and financial control, respectively, as shown in Figure 1, in order to match
the financial management activity in ITIL.
5.1.2 Standardization of processes and measurement. Implementation of ITSM with
KPI and SLA measures is crucial in the service outsourcing environment for achieving
success in service support and delivery. Standardization of process activities,
[Figure 1. Campus ITSM solution. The diagram maps ITIL activities in service support and service delivery (Service/Helpdesk Management, Incident Management, Problem Management, Change and Configuration Management, Release Management, Service Level Management, Availability Management, Capacity Management, Contingency Planning, Financial Management) to the campus ITSM solution modules (Service/Helpdesk Management, Incident Management, Problem Management, Change and Configuration Management, Asset Management, Performance Monitoring and Measurement, User Reporting), with Financial Control handled by Building Facility Management.]
documents, procedures, roles, measurement/performance metrics, etc. is inherent
in the establishment of ITSM. With knowledge-base processing and a
repository built into the management tools, the service impact of
human resource turnover or skill transfer due to a change of service provider
contract can be minimized. Moreover, areas for process refinement can easily be identified from
continuous measurement.
5.1.3 Owner of the tools. Although the ITSM tools could be provided by subscription
under the service outsourcing contract, in practical operations it is recommended that
the employer organization build its own ITSM tools. Such an arrangement retains all
historical details and know-how for service management, which minimizes any
disruption or deterioration of service delivery during, and as a result of, the hand-over
from the existing service operator to its successor, and minimizes data conversion from one
tool to another. All relevant historical intelligence therefore remains retrievable.
5.2 Performance monitoring and measurement
ITIL is not a process model but a description of activities, documents, roles, and
success factors (Hochstein et al., 2005a, b). The activities cannot be fully automated in
reality, especially the processes in ITIL service delivery. The management of
availability, service level, capacity and contingency planning requires manual
administrative processes and continuous monitoring of system/network performance
by real-time measurement in practice. Most of these components therefore
call for manual, analytical management rather than full automation.
The performance monitoring and measurement service module captures and provides
historical and real-time performance data in two major categories:
(1) system performance; and
(2) service performance.
It quantifies the specifics of KPIs, service level, capacity, availability, and response
time. It works together with the reporting tools in the user reporting service module and
leverages data from the ASM module to help the operation team ensure that business
demand is met by adequate capacity.
The metrics used for evaluating ITSM comprise five KPIs: helpdesk services,
problem management and bug fixing, change request/enhancement, system
monitoring and optimization, and production support and request. The
indicator for helpdesk services is derived from the performance attainment in the SLA.
Once developed administratively, the SLA, KPI criteria and contingency
plans are placed in a knowledge-base to interact with the real-time
measurements in the system and with incident event records to provide useful results for
service management. However, some ITSM activities that involve human
processes in service management and availability management, such as service level
improvement plans and availability improvement plans, are crucial yet cannot be
replaced by functions provided by this service module.
5.3 User reporting
This module is an underlying service that cannot be omitted from the entire ITSM
solution. It is not a service management process but offers an effective means of
documentation for bringing specific matters to the attention of campus management.
By sharing the backend database and the data stored in it, the user reporting
service integrates with the other ITSM modules for report generation. Authorized users
can generate management reports from selected information in the
captured data.
5.4 The analysis results
We collected data in two stages: the first stage, before deployment of the management
software tools, from January 2004 to January 2005; and the second stage, after
deployment, from February 2005 to December 2006. The purpose of this analysis is to
illustrate the effect of adopting ITSM process tools in supporting various service
categories. Benefits such as quantified improvement and identification of
the users and services with the most requests were obtained.
5.4.1 Service target improvement. The level of achieving predetermined service
targets is one of the primary concerns in service-oriented IT support. It is especially
important when the service is provided through outsourcing. For the IT operations in
this campus environment, there are four pre-defined severity levels for
incident/problem cases. Performance targets, in terms of the relevant
service level requirements, are also defined for each severity level, ranging from a 48-hour
recovery period for low impact to a 2-hour recovery period for very high impact.
Incident cases that cannot be resolved within the SLA are classified as
non-compliant cases. Therefore, service performance in response to
incidents = [1 − (non-compliant cases / total no. of cases)] × 100 percent.
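The service performance measure above can be illustrated with a short Python sketch. The function name and the sample figures are hypothetical and are not taken from the campus data; the sketch only restates the formula.

```python
def service_performance(non_compliant_cases: int, total_cases: int) -> float:
    """Return SLA compliance in percent: (1 - non_compliant / total) * 100."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    if not 0 <= non_compliant_cases <= total_cases:
        raise ValueError("non_compliant_cases must lie between 0 and total_cases")
    return (1 - non_compliant_cases / total_cases) * 100


# Hypothetical month: 200 incident cases, 10 of them resolved outside the SLA.
monthly_performance = service_performance(10, 200)
print(f"{monthly_performance:.1f} percent")
```

With these illustrative figures, 10 non-compliant cases out of 200 correspond to a service performance of 95 percent.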
Figure 2 shows the service performance before and after the deployment of
management tools, based on 36 months of data. The sampled data showed that the
service target was improved by 13.4 percent on average and 25 percent at maximum.
There was a nine-month learning period and a three-month optimization period after
deployment. More stable performance, attaining 95 percent, was observed afterwards.
This result delivered a positive message to the campus management. The
intangible cost of coordinating disruptions in several locations can be drastically
reduced. ITSM, by means of tools and process automation, created value for
the operation and thereby increased the efficiency of processes and workflows.
[Figure 2. Service target improvement. Line chart of monthly service performance (vertical axis, 50.0% to 100.0%) against month/year (January 2004 to December 2006), comparing the period without management tools (January 2004 to January 2005) with the period with management tools (February 2005 to December 2006); the learning period and the optimizing period are marked.]
5.4.2 The most requested users. Knowing which users make the most requests on the campus can
help the management to understand the current situation and plan ahead
in supporting these users. Such information helps to classify user
importance so as to provide early detection of impact on particular user groups after the
mapping process in Section 6. Using the aggregated system log data from
the service helpdesk, we identified, as shown in Figure 3, that the user
groups making the most requests were campus facility management and campus helpdesk rather
than tenants and the remote operation centre. The tenants comprise incubation
tenants and office/laboratory tenants, while campus facility management
is the outsourced service operator that manages building facilities and
operations.
Considering the business nature and operation model of the campus, we offer the
following interpretation. There is a stringent leasing requirement for the campus to
promote innovation and technology development. To be eligible for leasing, a company
must devote not less than 40 percent of its business activities to R&D.
Tenants are therefore technology companies with internal IT service support.
The IT problems in their daily business are mainly tackled by their internal
support rather than by the campus IT support. Their service requests to campus IT
support mainly concern internet connectivity, intranet, and public
facilities, which explains their low participation in service requests.
Conversely, the user group with the most requests, campus facility management, which
accounted for 71 percent of user requests, is an outsourced service provider that provides
manpower resources to manage the campus facilities excluding IT facilities. This
service support outsourcing structure makes it an extensive user of the campus IT
service support for its business operation. In addition, visitors and new tenants put in
IT service requests via campus facility management and the second largest user
group, campus helpdesk, to realize the concept of
one-stop-shop service, which creates high participation by these two groups in service requests.
5.4.3 The most requested services. As shown in Figure 4, the two most requested
service categories were server farm and OA & desktop applications, which involve
user account creation, server configuration, housekeeping work, e-mail and electronic
fax, printer support, PC desktop setup and computing services. This result is not
surprising in a campus environment, as most users require access to campus
systems, PC applications, and e-mail and file services. The moderate staff turnover
rate at the outsourced facility management service operator and the continuous move-in of
[Figure 3. Service request user type. Pie chart: Campus Facility Management 71%; Campus Helpdesk 25%; Tenant 3%; Remote Centre 1%.]
tenants to new buildings also generate service requests to suspend and create user
accounts, together with a proportionate amount of PC service support.
Unlike the IT service environment in average enterprise companies, this campus
environment has considerable public facilities for visitors and tour guides in the
technology clusters. Public facilities involve multimedia content scheduling and
publishing to interactive kiosks and large-format display panels. Information such
as the latest news feed, weather reports, and video content for event marketing and
advertisement is shown on public display screens, while navigation of
in-campus information is supported by the interactive kiosks. The result also
provides useful input in determining the critical business service operation in Section 6.
6. Campus service impact analysis architecture (CSIA)
The effects of campus service interruption are not limited to financial losses such as revenue or
investment loss, overtime, or extra expenses (renting or replacing equipment or staff).
Other consequences can involve public image, legal matters and customer service. In order to
achieve strategic alignment between business and IT, it is essential that the
management system respond promptly to discover and prioritize incidents according to
business objectives and user importance for service continuity.
The target achievement in the SLA is one of the measurement criteria in
service performance. The SLA becomes a basic measure for gauging the performance
of the operation team in customer services. In most instances, as shown in Figure 2, the SLA
could be achieved within a service target of 95 percent. However, while meeting
the SLA is the primary target, prioritizing alerts with respect to business
objectives and user importance could further improve customer satisfaction. In the
real world of IT operation in a campus environment, the severity level of an alert normally reflects
its impact on the entire operation of the IT infrastructure. Consider a scenario in which ten alert
cases arrive, eight of which exhibit a higher severity level than the other two. However,
the latter two alerts carry higher business criticality and user importance than the
others. The traditional process will weight these ten alerts according to their severity
level, and the latter two alerts will obviously be handled with lower priority. From the
business and customer points of view, minimizing the response and turnaround
time for these two alerts could in fact raise the appreciation level of the customer
and business operation management.
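The ten-alert scenario can be made concrete with a minimal Python sketch. The numeric scales, the alert names and the additive impact score are assumptions introduced purely for illustration; they are not the weighting used by the proposed framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    name: str
    severity: int              # 1 (low) to 4 (very high), matching four severity levels
    business_criticality: int  # hypothetical scale, 1 (low) to 4 (very high)
    user_importance: int       # hypothetical scale, 1 (low) to 4 (very high)


def by_severity(alerts):
    """Traditional ordering: highest severity first, nothing else considered."""
    return sorted(alerts, key=lambda a: a.severity, reverse=True)


def by_impact(alerts):
    """Illustrative ordering: business and user impact first, severity as tie-breaker."""
    return sorted(
        alerts,
        key=lambda a: (a.business_criticality + a.user_importance, a.severity),
        reverse=True,
    )


# Eight alerts of higher severity but low business/user impact ...
alerts = [Alert(f"infra-{i}", 3, 1, 1) for i in range(1, 9)]
# ... and two alerts of lower severity but high business/user impact.
alerts += [Alert(f"service-{i}", 2, 4, 4) for i in (1, 2)]

print([a.name for a in by_severity(alerts)][-2:])  # ['service-1', 'service-2'] come last
print([a.name for a in by_impact(alerts)][:2])     # ['service-1', 'service-2'] come first
```

Under the severity-only ordering the two business-critical alerts are handled last, whereas an ordering that takes business criticality and user importance into account moves them to the front of the queue.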
The CSIA provides service impact notification to the service management team in
reaction to network/system alerts. By using the notified information, the service
[Figure 4. User request service category. Pie chart: Server Farm 27.82%; OA & Desktop Applications 26.00%; Public facilities (Kiosk/Display) 16.97%; Campus Network 10.23%; Web portal and User Applications 9.51%; Main IT room/Control Centre 3.92%; Wireless Communications System 2.49%; Campus Cabling System 2.34%; IT Security 0.67%; Telephony and UMS 0.05%.]
management team knows which “in use” or “provisioned” services are affected, together with
their respective prioritized impacts. The architecture is formed by correlating
information among business continuity aspects, ICT infrastructure configuration,
user application activities, provisioned service requests, and incident alerts. It provides
impact notification not only from the perspective of business services but also from that of
user/user group importance, which delivers value to customers/users.
6.1 Review of industrial reasoning approaches
In the area of network and systems management, event correlation techniques have
proven to be useful in industry for root cause analysis. These techniques are of interest
for the development of CSIA.
6.1.1 Rule-based reasoning. The rule-based reasoning (RBR) approach stores a set of
rules in a rules database (or knowledge base). Rules take an “if-then” structure: if certain
criteria are met, then the system takes certain action(s). Adding rules enables
the system to solve more, and more complicated, problems.
However, RBR is of limited use in enterprise-wide networks, which require complex
decisions and a wide range of monitoring parameters. It works well only in a
well-understood, stable, narrow problem area, where it justifies its conclusions through a
rule-trace mechanism. Examples of RBR systems are IBM Tivoli, HP OpenView, CA
Unicenter, BMC Patrol, Microsoft Application Center, IBM Netview with simple
network management protocol (SNMP) and SpectroWatch.
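As a minimal sketch of the “if-then” structure and the rule trace described above, the following keeps rules as condition/action pairs and reports which rules fired. The rule names, event fields and actions are hypothetical and do not reflect the rule language of any of the products listed.

```python
from typing import Callable, NamedTuple

class Rule(NamedTuple):
    name: str
    condition: Callable[[dict], bool]   # the "if" part
    action: str                         # the "then" part

# Hypothetical rules database.
RULES = [
    Rule("link-down",
         lambda e: e.get("type") == "link" and e.get("status") == "down",
         "page network operator"),
    Rule("cpu-high",
         lambda e: e.get("type") == "cpu" and e.get("value", 0) > 90,
         "open performance ticket"),
]

def evaluate(event, rules=RULES):
    """Return (actions, trace): the actions to take and the names of the rules that fired."""
    fired = [rule for rule in rules if rule.condition(event)]
    return [rule.action for rule in fired], [rule.name for rule in fired]
```

An event that matches no rule yields no action, which illustrates why such a system only covers the problem area its rules anticipate.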
6.1.2 Case-based reasoning. The case-based reasoning (CBR) approach recalls,
adapts and executes episodes of former problem solving and past experience in an
attempt to deal with a current problem and isolate its root cause. Under the CBR approach,
IT operators perform four steps when a fault is reported or found:
(1) retrieve the most similar case from the Case Library;
(2) reuse the solution if the fault symptoms match;
(3) revise and update the proposed solution if necessary; and then
(4) retain the new case and its solution for future use.
Maintenance of the Case Library is a critical task. The relevance criteria of each case
must be well defined in order to avoid misguided solutions. System adaptability
and learning capability can be achieved only by CBR. Examples of the CBR approach in
commercial products are SpectroRx and CiscoWorks.
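The four steps above can be sketched as a small case library. The similarity measure (Jaccard overlap of symptom sets) and all class and method names are assumptions chosen for illustration; the source does not specify how similarity is computed.

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    symptoms: frozenset
    solution: str

@dataclass
class CaseLibrary:
    cases: list = field(default_factory=list)

    def retrieve(self, symptoms):
        """Step 1: return the most similar stored case, or None if the library is empty."""
        def similarity(case):
            union = case.symptoms | symptoms
            return len(case.symptoms & symptoms) / len(union) if union else 0.0
        return max(self.cases, key=similarity, default=None)

    def solve(self, symptoms, revise=None):
        symptoms = frozenset(symptoms)
        best = self.retrieve(symptoms)
        if best is not None and best.symptoms == symptoms:
            return best.solution                                 # step 2: reuse
        proposed = best.solution if best is not None else None
        solution = revise(proposed) if revise else proposed      # step 3: revise
        if solution is not None:
            self.cases.append(Case(symptoms, solution))          # step 4: retain
        return solution
```

Here `revise` stands in for the operator who adapts the proposed solution; each retained case makes later retrievals more specific, which is the learning behavior the text attributes to CBR.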
6.1.3 Model-based reasoning. The model-based reasoning (MBR) approach is
object-oriented: an MBR system represents each component in an enterprise as
a model. A model represents either a physical entity (e.g. a hub, switch,
router or port) or a logical entity (e.g. a WAN, LAN, domain or service). The description
of a model includes three categories of information:
(1) attributes, e.g. IP address, MAC address, alarm status;
(2) relations to other models, e.g. connects to, depends on, standby, part-of, is-a,
has-a; and
(3) behaviors, e.g. no response after three tries.
Fault diagnosis with MBR requires a complete description of the network: the
topology, types of equipment, routing tables, and so on. A real environment is never
easily modeled because of the challenges of software complexity, database technology
and network scalability. An example of the MBR approach is SPECTRUM.
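The three categories of model information (attributes, relations, behaviors) can be illustrated with a short sketch. The class layout, the `depends_on` relation key, the device names and the root-cause rule are assumptions for illustration only and are not taken from SPECTRUM.

```python
from dataclasses import dataclass, field

@dataclass
class Model:
    name: str
    attributes: dict = field(default_factory=dict)   # e.g. IP address, alarm status
    relations: dict = field(default_factory=dict)    # relation name -> list of model names
    max_tries: int = 3                                # behavior: no response after three tries
    failed_polls: int = 0

    def poll(self, responded: bool) -> None:
        """Behavior: mark the model down after max_tries consecutive missed polls."""
        self.failed_polls = 0 if responded else self.failed_polls + 1
        is_down = self.failed_polls >= self.max_tries
        self.attributes["alarm_status"] = "down" if is_down else "ok"

def root_causes(models: dict) -> list:
    """A down model is a root cause unless something it depends on is also down."""
    down = {n for n, m in models.items() if m.attributes.get("alarm_status") == "down"}
    return sorted(
        n for n in down
        if not any(dep in down for dep in models[n].relations.get("depends_on", []))
    )
```

For example, if a switch depends on a router and both stop responding, only the router is reported as the root cause; this is only possible because the dependency is part of the model, which is why MBR needs a complete network description.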
6.2 Components in CSIA architecture
It is not necessary to select just one reasoning technique for a correlation task.
Jakobson et al. (2004) proposed a hybrid approach combining RBR and CBR
to deal with highly dynamic situations. In the proposed CSIA architecture, as shown
in Figure 5, the CBR and RBR components run in parallel. The CBR engine makes use of
the case library of prior situations, while the RBR engine uses temporal and spatial
dependencies to correlate reported alerts. There are three kinds of fault reporting
mechanisms:
(1) Network and system management component. Servers and network equipment
are monitored by IBM Tivoli Monitoring (ITM, 2006) and IBM NetView
(Netview, 2006), respectively. Both monitoring systems send alerts to IBM Tivoli
Enterprise Console (TEC, 2006), which offers a variety of rules and is flexible
enough for the definition of service-related rules.
(2) Performance monitoring and measurement component. It performs basic
network monitoring tasks with respect to KPI on a periodic basis.
(3) Case library component. The helpdesk management (HDM), incident
management and problem management processes handle incidents and service
requests and record them in the case library.
The configuration management database (CMDB), service continuity factor (SCF), and
applications database (AppsDB) are the main elements that this CSIA correlates with
the fault reporting inputs. Upon completion of the analysis, the output, in the form of a
textual/graphical representation with its respective ID code, is published in real time
through the management interface. Simultaneously, the respective ID and analysis results
are passed to the service helpdesk for reporting.
The critical success factors for this architecture are how well the SCF for
business functions and user activities can be derived from the entire ICT infrastructure,
and the proper management and maintenance of the CMDB. To predict the
impact of incident alerts on the business functions effectively and efficiently, IT services
and the business functions they support must be aligned (Stanley et al., 2005). With this
knowledge, IT personnel can deliver meaningful performance information to the campus
management. The business operations supported by IT services therefore need to be
identified and their criticality prioritized.
Figure 5. Hybrid CBR and RBR approach in ITSM - CSIA (diagram. RBR approach: IBM NetView
and IBM ITM, IBM TEC, rules database. CBR approach: case library with retrieve, reuse,
revise and retain steps over new, previous, solved, tested and learnt cases and suggested
and confirmed solutions; helpdesk, incident and problem management with escalation;
change and configuration management with schedule, perform, verify and release. Other
elements: service impact analysis, SCF mapping, BCP, CMDB, asset management,
application DB user activities, performance monitoring and measurement with KPI,
service impact notifications)
6.2.1 Derivation of critical business operations. Business continuity planning (BCP) is a
methodology used to develop a plan to maintain or restore business operations in the
required time scales following interruption to, or failure of, critical business processes
(BSI, 2001). Having the BCP in place before a business interruption occurs is critical;
otherwise the organization may not be able to respond quickly enough to the service
interruption.
The development of a BCP has five main phases: analysis, solution design,
implementation, testing and organization acceptance, and maintenance. BCP serves as
a preventive and corrective control measure, whereas the proposed CSIA serves as the
detective measure for the business functions supported by IT services.
In this CSIA, we employed the BCP development process to provide critical business
service correlation. As shown in Figure 6, the first step in assuring the continued
delivery of mission-critical services in the event of an ICT infrastructure interruption is
to identify which services are delivered and to whom, and to
rank each service in terms of its priority/severity. After identifying the mission-critical
services, consider which types of ICT infrastructure interruption are likely to affect
these services and which are unlikely to affect them. The information created is
stored in a knowledge base, where it is used and reviewed from time to time
until it is retired. The business criticality level is one of the dimensions in mapping the
service impact.
Figure 6. Derivation of service continuity factors (diagram: the BCP development phases of
analysis, solution design, implementation, testing and acceptance, and maintenance drive
four steps: identify service delivery; prioritize critical services; identify the links
with IT services in the ICT infrastructure, drawing on the CMDB; classify the respective
level, giving the business criticality level)
6.2.2 User activities monitoring. The proposed CSIA architecture can show
the respective user impact according to user activities in web applications such as
web-based video conferencing. When a user successfully logs on to the intranet, the
AppsDB keeps the login record, with the login time as the last-accessed time stamp. During
the user's subsequent web portal activities, the last-accessed time stamp is updated by
the individual functions in the web portal. By referencing this time stamp, a list of “active”
users, those whose last-accessed time falls within the past 60 minutes, can be maintained.
In this way, the CSIA can estimate all users potentially affected by the alerts reported
from systems and network. Prioritization based on user privilege level settings further
allows the system to report impact with an importance level for different users.
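The “active” user list described above can be sketched as a query over a last-accessed time stamp. The table name, column names, integer epoch time stamps and the in-memory SQLite database are assumptions for illustration; the source does not describe the AppsDB schema or the database product used.

```python
import sqlite3

ACTIVE_WINDOW_SECONDS = 60 * 60  # "active" means last accessed within 60 minutes

def active_users(conn, now):
    """Return (user_id, privilege_level) rows for active users, most privileged first."""
    cutoff = now - ACTIVE_WINDOW_SECONDS
    return conn.execute(
        "SELECT user_id, privilege_level FROM user_activity "
        "WHERE last_accessed >= ? ORDER BY privilege_level DESC, user_id",
        (cutoff,),
    ).fetchall()

# Hypothetical AppsDB content: one row per logged-in user.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_activity ("
    "user_id TEXT PRIMARY KEY, privilege_level INTEGER, last_accessed INTEGER)"
)
now = 1_000_000  # arbitrary reference time in seconds
conn.executemany(
    "INSERT INTO user_activity VALUES (?, ?, ?)",
    [("alice", 4, now - 120), ("bob", 1, now - 7200), ("carol", 2, now - 3000)],
)
```

Ordering by privilege level puts the most important users first, so that impact on them can be reported ahead of the rest.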
6.2.3 Three dimensions mapping. The difficulty with service impact mapping is to
achieve an understanding of the business processes and of the heterogeneity of the ICT
components used to support these business processes in the organization (Wade
and Lewis, 1999). In this paper, we propose service impact mapping from three
dimensions. As shown in Figure 7, the first dimension is the criticality level, which is
derived from BCP; the second dimension is the severity level from probed alerts; and the
third dimension is user activity, as stored in AppsDB, which is used to analyze how the
resulting service impact affects service delivery to users. As mentioned in Section 5.4.2, the
importance of a specific user or user group may vary from one to another; granting
different user privilege levels allows early detection of impact on particular VIP
users/user groups. Initially, we propose to use two stages of 4 × 4 matrix mapping for
the three dimensions.
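A two-stage lookup of this kind might be sketched as follows. The first stage uses only the BCP level 3 and 4 rows published in Table I; rows for levels 1 and 2 are not given in the paper and are left out. The second-stage matrix is an invented placeholder, since the paper does not publish it, and the function and variable names are hypothetical.

```python
# Stage one: (BCP criticality level, alert severity) -> SCF.
# Values are the BCP level 3 and 4 rows of Table I; other levels are unknown.
SEVERITIES = ("WARNING", "MINOR", "CRITICAL", "FATAL")
STAGE_ONE = {
    3: (1, 2, 3, 4),
    4: (3, 3, 4, 4),
}

# Stage two: (SCF, user level) -> impact level.
# These numbers are invented placeholders for illustration only.
STAGE_TWO = (
    (1, 1, 2, 2),
    (1, 2, 2, 3),
    (2, 2, 3, 4),
    (3, 3, 4, 4),
)

def service_impact(criticality, severity, user_level):
    """Map the three dimensions to (SCF, impact level) through two matrix lookups."""
    scf = STAGE_ONE[criticality][SEVERITIES.index(severity)]
    impact = STAGE_TWO[scf - 1][user_level - 1]
    return scf, impact
```

Splitting the mapping into two stages keeps each table small (4 × 4) instead of requiring one table over all three dimensions.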
6.3 Example of CSIA operation
Figure 8 shows the input/output information flow between components in the proposed
CSIA of the entire ITSM solution.
6.3.1 Configuration and usage snap-shot. Once a network or system device is
attached to the ICT infrastructure, its configuration information is updated in
the CMDB, either automatically by discovery tools or manually by operators. The AppsDB
holds authentication and user registration activities during log-in to and
log-out from systems or user applications. When an incident alert arises, the SIA can
obtain configuration information from the CMDB on the one hand and, on the other,
retrieve the user activities in systems by SQL query to the AppsDB. Hence, the
ITSM can obtain the most current snap-shot of the ICT infrastructure, covering not only its
configuration but also its usage details, for better service management.
The CMDB and the application database are kept apart because it is generally easy to
estimate the loading on ITSM, since the number of concurrent users and their behavior are
well known. If the CMDB were required to serve other applications as well, such as web
applications, it would be difficult to estimate the loading of those applications, which
serve a large group of users; a sudden increase in their loading could well degrade the
performance of the whole ITSM. To strike a balance between performance and
expandability, the CMDB runs in one database instance, while the application
database runs as a separate database instance. This allows the ITSM application to
access data quickly without making, and hence waiting for, too many unnecessary
connections, and also allows easy expansion of the CMDB in future. To better utilize the
allocated computing and storage resources, a relevant administrative process is required
to control the use, management and retirement of information in the knowledge base.
Figure 7. Three dimensions mapping (diagram: before mapping, business criticality from BCP,
network/system alerts from IBM TEC and user activities from the application DB enter
service impact analysis; after mapping, notifications are issued with consideration of
business, user and alert severity, along the criticality level, severity level and user
level dimensions)
6.3.2 Network/system status probing. Managing network/system alerts from a
heterogeneous environment under a homogeneous platform is an implementation
challenge. Before the three dimensions mapping process in the proposed CSIA
architecture can derive service impact notifications from the various knowledge
repositories, alerts from the heterogeneous environment must be mapped onto a
homogeneous platform; the resulting fault reports are then mapped against the
other two dimensions.
The probing of network/system status is performed by monitoring agents,
system and network management information base (MIB) browsing, SNMP queries,
and purpose-built interfacing middleware. When any node of the network or any service
on a system server fails, a shell script is triggered in the network and
server monitoring applications in real time to send the respective alert information, with
alert ID, time stamp, hostname and protocol address, to the SIA module via the RBR
module (i.e. the monitoring console, IBM TEC). This information is embedded in XML
format and the whole piece of XML is sent via industry-standard
HTTP. To avoid an information storm in the notifications, the monitoring console provides
options to filter out alerts that are not important from the BCP point of view before
they are input to the SIA module. The case-based reasoner can be seen as a backup
in case incorrect modeling causes the rule-based reasoner to fail: unlike the
rule-based reasoner, it tries to match the current situation onto situations seen before.
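The alert hand-off described above (alert ID, time stamp, hostname and protocol address wrapped in XML, with BCP-based filtering) might look like the following sketch. The element names, the function names and the host-based filter are hypothetical, since the paper does not give the XML schema or the filtering criteria; the HTTP transport itself is omitted so that the sketch runs offline.

```python
import xml.etree.ElementTree as ET

def build_alert_xml(alert_id, timestamp, hostname, address):
    """Serialize one alert as XML; the element names are hypothetical."""
    root = ET.Element("alert", id=str(alert_id))
    ET.SubElement(root, "timestamp").text = timestamp
    ET.SubElement(root, "hostname").text = hostname
    ET.SubElement(root, "address").text = address
    return ET.tostring(root, encoding="unicode")

def parse_alert_xml(payload):
    """Recover the alert fields on the receiving (SIA) side."""
    root = ET.fromstring(payload)
    return {"id": root.get("id"), **{child.tag: child.text for child in root}}

def filter_alerts(alerts, bcp_hosts):
    """Drop alerts whose host is not tied to any business-critical service."""
    return [alert for alert in alerts if alert["hostname"] in bcp_hosts]
```

In a deployment the XML string would be the body of an HTTP request to the SIA module; filtering before that point keeps unimportant alerts from reaching the analysis at all.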
Figure 8. Information flow between components in CSIA of ITSM (diagram: network and
system/server equipment feed network and system/server monitoring and the monitoring
console, which passes filtered alerts to SIA; SIA also takes service continuity mapping
from BCP, configurations from the CMDB and user activities from the application database,
and sends the service impact analysis result to the other ITSM modules. Legend: HDM,
helpdesk management; IDM, incident management; PBM, problem management; PMM,
performance monitoring and measurement; ASM, asset management; URS, user reporting
service; CCM, change and configuration management)
6.3.3 Service impact correlation and notification. The monitoring console IBM TEC
works with BCP to deduce the SCF; an example is shown in Table I. If the SCF reaches
a pre-defined noticeable level, the SIA retrieves the machines and services
associated with the fault from ASM and AppsDB in ITSM in order to predict the affected
services and users. A predefined service impact list is stored in the CMDB for
this mapping. After receiving an alert from TEC, the SIA looks up the service impact list
and the SCF in the CMDB using information from the alert, and then concludes
the service impact. In this lookup, the hostname reported by IBM TEC is
used as the key to find the potential service impact when a server or network equipment
is down. This proactive approach tries to satisfy the customers/users before
any complaints are received and to avoid any unexpected larger problem.
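The lookup described above can be sketched as a hostname-keyed dictionary with an SCF threshold. The SCF colors follow Table I; the threshold value, the hostnames, the service names and the shape of the notification are hypothetical, as the paper only speaks of a “pre-defined noticeable level” and does not list the service impact list.

```python
SCF_COLOR = {1: "BLUE", 2: "YELLOW", 3: "ORANGE", 4: "RED"}  # colors as in Table I
NOTICE_LEVEL = 3  # assumed threshold for the "pre-defined noticeable level"

# Hypothetical service impact list keyed by hostname, as it might be held in the CMDB.
SERVICE_IMPACT_LIST = {
    "mail01": ["E-mail"],
    "core-sw1": ["Campus Network", "Web portal"],
}

def notify(hostname, scf, active_users):
    """Return a notification dict, or None when the SCF is below the notice level."""
    if scf < NOTICE_LEVEL:
        return None
    return {
        "hostname": hostname,
        "scf": scf,
        "color": SCF_COLOR[scf],
        "services": SERVICE_IMPACT_LIST.get(hostname, []),
        "users": sorted(active_users),
    }
```

Using the hostname as the only key keeps the lookup simple, at the cost of requiring the service impact list to be kept in step with the CMDB.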
When an unexpected scenario is reported to HDM, it is first
checked against the case library to find whether any similar case has been reported within
a short period of time. If a similar case is found and the correct solution is provided,
the scenario is closed. Conversely, if the scenario is new or still in “OPEN” status,
it is reported as a fault, and the fault is escalated to IDM or PBM
according to the escalation policies. When a confirmed solution is found, the remedial
actions are performed according to the well-defined procedures in CCM. The
case solutions and new rules are also recorded in the case library and rules database,
respectively.
7. Further work and conclusions
7.1 Further work
IT services must align with the mission services they support (Stanley et al., 2005). The
analysis provides useful information for the campus management to align upcoming IT
services in respect of service level management and service continuity management.
As mentioned previously, the service helpdesk and incident management processes
were aided by tools in 2005. To further increase the efficiency of service processes and
workflows in IT management, the entire ITSM implementation, together with the proposed
CSIA, is currently being carried out. Its performance and practical concerns
in the operational environment will be addressed.
7.2 Conclusions
ITSM is a tool to facilitate the achievement of service-oriented IT management goals.
ITSM is not a one-off activity or short-term strategy: the transformation might take
years for a single process, and only with the support of employees and
an understanding of ITSM processes can such an initiative be successful
(Hochstein et al., 2005a, b). Depending on the organizational culture, service focus,
management decisions, and manual operation practices, the ITSM process modules
could be set up differently from one organization to another. The experience obtained in
this R&D campus environment could serve as a reference for similar operating entities.
Table I. Example mapping table for SCF
BCP level   IBM TEC severity   SCF   Color
3           30 (WARNING)       1     BLUE
3           40 (MINOR)         2     YELLOW
3           50 (CRITICAL)      3     ORANGE
3           60 (FATAL)         4     RED
4           30 (WARNING)       3     ORANGE
4           40 (MINOR)         3     ORANGE
4           50 (CRITICAL)      4     RED
4           60 (FATAL)         4     RED
Operating IT services is dynamic in nature, which calls for continuous
performance monitoring in ITSM to check that it remains valid as changes
arise. To minimize the disturbance to users of the service offerings, we proposed the CSIA
architecture, which responds proactively to incident alerts from the perspective of business
services and user importance. While maintaining the predetermined SLA, the CSIA
could make IT service support more dynamic and earn more appreciation from the
customer and business operation.
References
Adeoti-Adekeye, W.B. (1997), “The importance of management information systems”, Library
Review, Vol. 46 No. 5, pp. 318-27.
Bartolini, C. and Salle, M. (2004), “Business driven prioritization of service incidents”,
Proceedings of the 15th IFIP/IEEE International Workshop on Distributed Systems:
Operations and Management (DSOM 2004), IFIP/IEEE.
Brooks, A. and Lilley, G. (2006), “Enabling technology for outsourced facilities management”,
Journal of Information Technology in Construction, Vol. 11, pp. 685-95.
BSI (2001), Information Technology – Code of Practice for Information Security Management BS
ISO/IEC 17799:2000, BSI, London, pp. 56-60.
eTOM (2006), “Enhanced telecom operations map (eTOM)”, paper presented at
Telemanagement-Forum, available at: www.tmforum.org
Hanemann, A. et al. (2005), “Towards a framework for IT service fault management”,
Proceedings of the European University Information Systems Conference (EUNIS 2005).
Hochstein, A. et al. (2005a), “Evaluation of service-oriented IT management in practice”,
Proceedings of the International Conference on Services Systems and Services
Management, IEEE, Vol. 1, pp. 80-4.
Hochstein, A. et al. (2005b), “ITIL as common practice reference model for IT service
management: formal assessment and implication for practice”, Proceedings of the
International Conference on e-Technology, e-Commerce and e-Service (EEE’05),
IEEE, pp. 704-10.
ITIL (2000), IT Infrastructure Library (ITIL) – Service Support, Office of Government Commerce
(OGC), London.
ITM (2006), IBM Tivoli Monitoring, IBM, available at: www-306.ibm.com/software/tivoli/
products/monitor/
Iwhiwhu, E. (2005), “Management of records in Nigerian universities - problem and prospects”,
The Electronic Library, Vol. 23 No. 3, pp. 345-55.
Jakobson, G. and Weissman, M. (1993), “Alarm correlation”, IEEE Network, pp. 52-9.
Jakobson, G. et al. (2004), “Towards an architecture for reasoning about complex event-based
dynamic situations”, Proceedings of the Third International Workshop on Distributed
Event-based Systems (DEBS 2004), IEE.
Keller, A. (2005), “Automating the change management process with electronic contracts”,
Proceedings of the Seventh IEEE International Conference on E-Commerce Technology
Workshops (CECW 05), p. 99.
Lewis, L. (1993), “A case-based reasoning approach for the resolution of faults in communication
networks”, Proceedings of the 3rd IFIP/IEEE Symposium on Integrated Network
Management, pp. 114-20.
Lewis, L. (1999), Service Level Management for Enterprise Networks, Artech House, London,
pp. 165-90.
Mayerl, C. et al. (2005), “SOA-based integration of IT service management applications”,
Proceedings of the IEEE International Conference on Web Services (ICWS’05), pp. 785-6.
Netview (2006), IBM Netview, IBM, available at: www-306.ibm.com/software/tivoli/products/
netview/
Stanley, J.E. et al. (2005), “Correlating network services with operational mission impact”, paper
presented at Military Communications Conference, MILCOM 2005, IEEE, Vol. 1, pp. 162-8.
TEC (2006), IBM Tivoli Enterprise Console, IBM, available at: www-306.ibm.com/software/tivoli/
products/enterpriseconsole/
Wade, V. and Lewis, D. (1999), “Three keys to developing and integrating telecommunications
service management systems”, IEEE Communications Magazine, pp. 140-6.
About the authors
Stewart H.C. Wan received his BE with honors in Electrical and Electronic Engineering from the
University of Brighton, UK in 1993 and his MSc degree in Building Service Engineering from the
University of Hong Kong in 1998. Between 1994 and 2001, he worked as a Consulting Engineer at
Parsons Brinckerhoff (Asia) Ltd. He currently works as a Project Manager for the Projects and
Facilities Division of Hong Kong Science and Technology Parks Corporation in Hong Kong. His
research interests include service management and system performance improvement in
building and IT operations. Wan is a Chartered Engineer. He holds memberships in the
Institution of Engineering and Technology (IET), Chartered Institution of Building Services
Engineering (CIBSE), Institute of Electrical and Electronics Engineers (IEEE), and Hong Kong
Institution of Engineers (HKIE). Stewart H.C. Wan is the corresponding author and can be
contacted at: stewart.wan@hkstp.org
Yuk-Hee Chan received his BSc degree with honors in electronics from Chinese University of
Hong Kong in 1987, and his PhD degree in signal processing from The Hong Kong Polytechnic
University in 1992. Between 1987 and 1989, he worked as an R&D engineer at Elec & Eltek
Group, Hong Kong. He joined this University in 1992 and is now an Associate Professor in the
Department of Electronic and Information Engineering. Chan has published over 110 research
papers in various international journals and conferences. His research interests include image
processing, IT management and digital signal processing. Chan is a member of IEEE and IET.
He was the Chairman of the IEEE Hong Kong Joint Chapter of CAS and COM in 2003-2004.
E-mail: enyhchan@eie.polyu.edu.hk