SYMIAN: Analysis and performance improvement of the IT incident management process

HP Labs, Palo Alto, CA, USA
IEEE Transactions on Network and Service Management 10/2010; DOI: 10.1109/TNSM.2010.1009.I9P0321
Source: IEEE Xplore

ABSTRACT: Incident management is the process through which IT support organizations restore normal service operation after a service disruption. The complexity of real-life enterprise-class IT support organizations makes it extremely hard to understand how organizational, structural, and behavioral components affect the performance of the currently adopted incident management strategy and, consequently, which actions could improve it. This paper presents SYMIAN, a decision support tool for improving the performance of the incident management function in IT support organizations. SYMIAN simulates the effect of corrective measures before their actual implementation, saving time, effort, and cost. To this end, SYMIAN models the IT support organization as an open queuing network, which enables the evaluation of both the system-wide dynamics and the behavior of the individual organization components and their interactions. Experimental results demonstrate SYMIAN's effectiveness in the performance analysis and tuning of the incident management process for real-life IT support organizations.
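
To make the open queuing network idea concrete, the following is a minimal sketch and not SYMIAN's implementation. SYMIAN is a simulator, whereas this example uses the standard analytical traffic equations for an open network. It assumes a hypothetical organization with two support levels; the arrival rate, escalation probabilities, operator counts, and service rates are all invented for illustration.

```python
def solve_traffic_equations(external, routing, iterations=200):
    """Solve lam[j] = external[j] + sum_i lam[i] * routing[i][j] by fixed-point iteration.

    external[j]   : incidents/hour arriving at group j from outside (assumed values)
    routing[i][j] : probability that group i forwards an incident to group j;
                    the remaining probability mass means the incident is closed.
    """
    n = len(external)
    rates = list(external)
    for _ in range(iterations):
        rates = [
            external[j] + sum(rates[i] * routing[i][j] for i in range(n))
            for j in range(n)
        ]
    return rates


def utilizations(rates, operators, service_rate):
    """Per-group utilization: total arrival rate divided by total service capacity."""
    return [lam / (c * mu) for lam, c, mu in zip(rates, operators, service_rate)]


# Hypothetical two-level support organization (all numbers are assumptions).
external = [10.0, 0.0]   # all new incidents enter at level 1
routing = [
    [0.0, 0.3],          # level 1 escalates 30% of incidents to level 2
    [0.1, 0.0],          # level 2 sends 10% back to level 1
]

rates = solve_traffic_equations(external, routing)
load = utilizations(rates, operators=[4, 2], service_rate=[3.0, 2.0])

for name, lam, rho in zip(["level 1", "level 2"], rates, load):
    print(f"{name}: {lam:.2f} incidents/hour, utilization {rho:.0%}")
```

A what-if analysis in this style would change one input, for example the number of level 2 operators, and compare the resulting utilizations before committing to the change in the real organization.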

  • ABSTRACT: The success of businesses in modern organizations heavily depends on the high availability of information technology (IT) infrastructures. To prevent business disruption, IT operators have worked hard to ensure that any changes to this infrastructure are properly and efficiently deployed. Change management, a discipline of the Information Technology Infrastructure Library (ITIL), provides important guidance to help achieve this end. As IT infrastructures grow larger, however, ensuring that changes are harmless to business continuity becomes increasingly complex. In fact, previous research has shown that existing approaches for verifying changes suffer from severe scalability issues. This problem can become a serious threat to most organizations, as it can lead, for example, to customer dissatisfaction due to missed deadlines in service change deployment. To bridge this gap, we propose a partial-order reduction model checking paradigm and algorithm for efficiently detecting harmful change operations. Our model improves the complexity of verifying a set of concurrent change activities against safety constraints by reducing the verification scope without losing effectiveness. To demonstrate the concept and its technical feasibility, we carried out an extensive performance evaluation of our algorithm considering a variety of change activities, safety constraints, and configuration scenarios. The results obtained from 32 benchmarks show that our algorithm significantly outperformed state-of-the-art, general-purpose model checkers, improving the runtime complexity from polynomial/exponential to linear. In summary, the results provide evidence that change verification has become feasible and efficient for larger IT infrastructures.
    IEEE Transactions on Network and Service Management 01/2014; 11(3):292-306.
  • ABSTRACT: In IT service management, IT support organizations are the entities in charge of restoring normal service operation after a disruption. Building accurate models of IT support organizations is useful for several purposes, such as optimal workforce allocation and what-if scenario analysis. However, the complexity of real-life IT support organizations makes it extremely hard to model their organizational structure and their behavior with stochastic processes. A particularly interesting process to model when reenacting an IT support organization is incident arrival. The paper presents three synthetic incident generator models, based on advanced statistical methods, that are capable of reenacting, with different levels of accuracy, the incident arrival process of real-life IT support organizations. The methods were developed from the experience the authors gained in the experimental analysis of transaction logs from a real-life IT support organization, provided by the Outsourcing Services Division of HP.
    Integrated Network Management (IM 2013), 2013 IFIP/IEEE International Symposium on; 01/2013
  • ABSTRACT: Change Management, a core process of the Information Technology Infrastructure Library (ITIL), is concerned with the management of changes to networks and services to minimize costly disruptions to the business. As part of Change Management, IT changes need to be planned. Previous approaches to automatically generating IT change plans struggle, in terms of scalability, to deal properly with large Configuration Management Databases (CMDBs). To enable IT change planning in the large, in this paper we discuss and analyze optimizations for refinement-based IT change planning over object-oriented CMDBs. Our optimizations reduce the runtime complexity of several key operations that are part of refinement-based IT change planning algorithms. A sensitivity analysis shows that our optimizations outperform SHOP2, the winner of a previous comparison among IT change planners, in terms of runtime complexity for several important characteristics of IT changes and CMDBs. A cloud deployment case study of a three-tier application and a virtual network configuration case study demonstrate the feasibility of our approach and confirm the results from the sensitivity analysis: IT change planning has evolved from planning in the small to planning in the large.
    Network and Service Management (CNSM), 2012 8th International Conference and 2012 Workshop on Systems Virtualization Management (SVM); 01/2012