Submitted for CSI Communications, October 2007 issue Page 1 of 7
Autonomic Computing: Can Computers Heal Itself?
Debasish Jana¹ and Soumya Maitra²
“The natural healing force within each of us is the greatest force in getting well”. ~ Hippocrates
Abstract
This paper showcases new trends in self-managed computing systems that automatically configure, heal, and protect themselves and adapt to the user’s needs. The idea derives from the autonomic nervous system of the human body, which helps control the entire system. Building self-managing and self-healing systems on the way to autonomic computing is a grand challenge of today’s complex computing world. With the growing complexity of hardware and software and the growing ease of computing, reliable and self-healing system administration has become a pressing need. We evaluate that need and focus on the challenges of implanting a system thinker that helps heal the system at least partially, if not in its entirety.
KEYWORDS: Autonomic Computing, Self-CHOP
¹ Simplex Infrastructures Limited, Kolkata, E-Mail: debasishj@gmail.com
² Cognizant Technology Solutions, Kolkata, E-Mail: soumya.maitra@cognizant.com
GENESIS
The idea behind autonomic computing derives from the autonomic nervous system of the human body, which controls important bodily functions without any conscious intervention. Drawing an analogy to this phenomenon, IBM proposed in 2001 to create self-managing computer systems that could automatically configure, heal, optimize and protect themselves. Sun Microsystems, Hewlett-Packard, and Microsoft followed suit, and their products are leading the development and implementation of autonomic computing.
IBM calls this seminal field of research a grand challenge: "A problem that by virtue of its degree of difficulty and the importance of its solution, both from a technical and societal point of view, becomes the focus of interest to a specific scientific community." [1]
WHAT IS AUTONOMIC COMPUTING?
Autonomic Computing introduces a new buzzword, Self-CHOP: systems that self-configure, self-heal, self-optimize, and self-protect without human intervention, thereby delivering the benefits of self-managed systems.
Self-Configure
Let’s try picturing this: An organization is
running an Enterprise Application Integration
(EAI) application in a clustered environment
with failover mechanisms and other distributed
and object-oriented features built into it. It
includes a set of Java EE servers deployed on a
cluster of nodes, each tier of the system being
replicated for better performance and
availability. Daily this application caters to
thousands of requests over the Internet, and
carries out hundreds of transactions per
minute across locations in five continents.
Given the complexity and size of this application and its environment, it takes a group of systems administrators to install the system in all the physical locations and to configure the application and its associated resources, such as JMS queues, mail servers, application servers, workflow servers, and many more. This entire process of installation and configuration consumes a few person-days, multiplied by the number of such installations.
Wouldn’t it be nice if the application, once set up in one location, could install and configure itself based on predefined organizational policies, even under varying and unpredictable conditions? That is what Autonomic Computing has to offer: a self-configuring system that “knows itself” and the context surrounding its activity.
Self-Heal
Another scenario: in the dead of night, one arm of the system in Brazil suffers a failure at a crucial point of operation, triggered by client requests originating from as far away as, say, China. What would we do? Call the system administrators the following day to pore over the logs and trace the root cause of the failure. That is not only a very cumbersome task; by one estimate, companies spend one-third to one-half of their total IT budget on preventing or recovering from such crashes [14].
Autonomic Computing suggests building software systems with self-healing features that monitor the application and server logs periodically for system failures, and that have the intelligence built in to diagnose the root cause, determine the problem, and finally recover the system from the show-stopper. In more down-to-earth terms, this means first detecting the failed component, taking it offline, fixing or isolating it, rebuilding the system automatically, and redeploying the application, all without human intervention. For instance, an autonomic system will counter a failed database index by re-indexing the files, and subsequently testing and loading them back into production. If the issue lies with storage constraints, the self-healing manager will automatically extend file space and database storage according to previous data on growth and expansion.
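The storage case above can be sketched in a few lines. This is only an illustration under stated assumptions: the class `StoragePlanner`, its method names, and the linear-growth estimate are hypothetical and do not come from any particular product. The sketch estimates average growth from past usage samples and computes how much extra space would be needed to cover a given number of future periods.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical capacity planner for a self-healing storage manager (illustrative only). */
final class StoragePlanner {

    /** Average growth per period in MB, estimated from consecutive usage samples. */
    static double averageGrowth(List<Long> usedMbHistory) {
        if (usedMbHistory.size() < 2) {
            return 0.0; // not enough history to estimate a trend
        }
        long first = usedMbHistory.get(0);
        long last = usedMbHistory.get(usedMbHistory.size() - 1);
        return (double) (last - first) / (usedMbHistory.size() - 1);
    }

    /**
     * Extra space in MB needed so that the projected usage after periodsAhead
     * periods still fits. Returns 0 when the current capacity is sufficient.
     */
    static long extensionNeeded(List<Long> usedMbHistory, long capacityMb, int periodsAhead) {
        if (usedMbHistory.isEmpty()) {
            return 0L; // nothing is known about usage, so nothing is extended
        }
        long current = usedMbHistory.get(usedMbHistory.size() - 1);
        // Assumption: shrinking usage is treated as zero growth, not as negative growth.
        double growth = Math.max(0.0, averageGrowth(usedMbHistory));
        long projected = (long) Math.ceil(current + growth * periodsAhead);
        return Math.max(0L, projected - capacityMb);
    }

    public static void main(String[] args) {
        List<Long> history = Arrays.asList(700L, 760L, 820L, 880L); // MB used, one sample per day
        System.out.println("Extend by (MB): " + extensionNeeded(history, 1000L, 3));
    }
}
```

A real self-healing manager would feed the result to whatever mechanism actually grows the file system or tablespace; that step is deliberately left out here.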
Figure 1: Impact of downtime by industry sector owing
to system failure [16]
Self-Optimize
Let’s consider our enterprise-wide application to be an extremely resource-intensive system whose performance depends directly on how efficiently it uses its resources. Autonomic Computing systems are equipped with a self-optimizing workload manager that is capable of logical partitioning and dynamic server clustering, extended across multiple heterogeneous systems, to provide a single collection of computing resources across the enterprise. Whether the issue lies with storage, databases, networks, or other resources, the workload manager continually monitors and tunes the available resources for optimal usage [14]. Formulating new algorithms for this self-optimizing design pattern is an open area of research that calls upon advanced data management techniques and feedback mechanisms from control theory.
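To make the idea of a feedback mechanism concrete, here is a minimal sketch of a proportional controller that resizes a worker pool toward a target utilization. The class `PoolSizeController`, the gain, and the bounds are assumptions chosen for illustration; the workload managers discussed above may use very different control laws.

```java
/** Hypothetical proportional controller for a self-optimizing workload manager. */
final class PoolSizeController {
    private final double targetUtilization; // desired fraction of busy workers, e.g. 0.70
    private final double gain;              // how strongly the controller reacts to the error
    private final int minSize;
    private final int maxSize;

    PoolSizeController(double targetUtilization, double gain, int minSize, int maxSize) {
        this.targetUtilization = targetUtilization;
        this.gain = gain;
        this.minSize = minSize;
        this.maxSize = maxSize;
    }

    /** One feedback step: takes the measured utilization, returns the new pool size. */
    int nextSize(int currentSize, double measuredUtilization) {
        // Positive error means the pool is busier than desired, so it should grow.
        double error = measuredUtilization - targetUtilization;
        int proposed = (int) Math.round(currentSize * (1.0 + gain * error));
        // Clamp to the configured bounds so one noisy reading cannot run away.
        return Math.max(minSize, Math.min(maxSize, proposed));
    }

    public static void main(String[] args) {
        PoolSizeController controller = new PoolSizeController(0.70, 1.0, 2, 64);
        System.out.println("Overloaded pool of 10 becomes: " + controller.nextSize(10, 0.90));
        System.out.println("Idle pool of 10 becomes: " + controller.nextSize(10, 0.40));
    }
}
```

Calling `nextSize` once per monitoring interval closes the loop: measure, compare with the target, adjust, and measure again.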
Self-Protect
We live in uncertain times, in which every possible software and hardware vulnerability is exploited with malicious intent. For instance, an unethical hacker might exploit a memory leak in the printer spooler of a particular unit to flood the entire LAN and jam the network. Self-protecting autonomic systems will be able to diagnose the attack,
isolate the component in question, and redirect
the printer usage to some alternate location
without human intervention. Self-protection is
also used as an early warning to anticipate and
prevent system failures.
BACKGROUND AND RELATED WORK
Neti et al [10] stressed the need for self-healing systems to adapt and change over time. They developed an analysis and reasoning framework based on Attribute-Based Architectural Style (ABAS) to evaluate the architecture of a self-healing system. Quality standards such as ISO or CMMI levels need to address acceptable quality standards for self-healing systems as well. Gabriel et al [4] focused on conscientious software that can adapt itself to changing circumstances and saw a need to separate the software that does the work from the software that keeps the system alive. Kephart [5] presented autonomic computing as a grand-challenge vision of the future for taming the colossal complexity of heterogeneous systems, and stressed the need for the best minds of academia and industry to collaborate to meet the upcoming challenges of autonomic computing. Engel et al [7] suggested a dynamic Aspect-Oriented Programming (AOP) approach that meets self-managing, self-adapting, self-configuring and, finally, self-healing needs through dynamic operating system kernel aspects. Ahmed et al [8] proposed self-healing as a service that uses fault detection and notification to isolate a faulty device in order to achieve self-healing in autonomic pervasive computing. Fleissner et al [12] suggested a commensalistic software system and a language to implement host components and reflexes. Many modern operating systems, for example Sun Solaris 10 [15], have features in place to diagnose failed components automatically and to restart failed services automatically, using predictive self-healing concepts. Kramer et al [11] defined a three-layer reference model (component control, change management and goal management) to shape self-managed systems. Martin et al [3] put forward the OASIS Web Services Distributed Management (WSDM) standard as a standard for autonomic computing services based on web services, and shared their experiences.
Litoiu [2] investigated performance analysis techniques used by the underlying autonomic manager. Salehie et al [6] highlighted open problem areas, such as how to analyze and manage the dependencies between autonomous components to address business policies and how to inject autonomicity into non-autonomous or semi-autonomous systems, along with the challenges of building more open and extensible autonomic tools. Shapiro [13] of Sun Microsystems shared Sun’s experience in advancing the state of the art in self-healing by appropriately leveraging the underlying operating system. Tesauro et al [9] of the IBM T.J. Watson Research Center suggested a multi-agent systems approach to meet the needs of autonomic computing, and proposed utility functions as a powerful and flexible way to allow systems to manage themselves. Patterson et al [14] presented Recovery-Oriented Computing (ROC) for coping with hardware faults, software bugs, and operator errors. White et al [19] took advantage of the uniform representation and composition of components in Service-Oriented Architectures (SOA) and the autonomy of components in agent-oriented programming.
AUTONOMIC SYSTEM ARCHITECTURE
White et al [19] described an architectural approach to achieving the goals of autonomic computing. They suggested making each autonomic computing element responsible for monitoring its input services and determining whether they are performing according to the agreements and contracts covering them. In case of a failure, partial or complete, caused by wrong results, out-of-bounds values, or something else,
the requesting autonomic computing element may react by cutting off its relationship with the problem-creating process and requesting a fresh one.
Figure 2: Control Loop in Autonomic Architecture (components: Sensor, Monitor, Analyze, Plan, Execute, Effector, Knowledge Base)
In an architecture that supports autonomic computing, a control loop operates over a knowledge base, which may be centralized or distributed and horizontally partitioned. The loop picks up problems through Sensors, then monitors, analyzes, plans, and executes an action plan, and finally triggers the Effector to apply the solution and heal the system. Problems may be reported through a system log, through exceptions that occurred without being handled and were written to log files, or through in-memory processes or agents implanted on client machines. The knowledge base holds rules that map the symptoms of such incidents to possible execution plans. When no solution exists, the knowledge base is analyzed periodically for possible solution schemes and action plans, which are then used for later occurrences.
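The control loop described above can be sketched as follows. This is a toy illustration, not the architecture of any specific product: the class `ControlLoop`, the symptom names, and the matched log fragments are all hypothetical, and the Effector merely records actions instead of carrying them out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical monitor-analyze-plan-execute loop over a shared knowledge base. */
final class ControlLoop {
    /** Knowledge base: symptom mapped to an ordered action plan. */
    private final Map<String, List<String>> knowledgeBase = new HashMap<>();
    /** Symptoms without a known plan, kept for later periodic analysis. */
    private final List<String> unresolved = new ArrayList<>();
    /** Actions handed to the Effector; recorded here instead of being executed. */
    private final List<String> executed = new ArrayList<>();

    void learn(String symptom, List<String> plan) {
        knowledgeBase.put(symptom, plan);
    }

    /** Monitor: turns a raw sensor reading (a log line) into a symptom, or null if healthy. */
    String monitor(String logLine) {
        if (logLine.contains("OutOfMemoryError")) {
            return "memory-exhausted";
        }
        if (logLine.contains("Connection refused")) {
            return "dependency-unreachable";
        }
        if (logLine.contains("ERROR")) {
            return "unknown-error";
        }
        return null;
    }

    /** Analyze and plan by knowledge-base lookup, then execute through the Effector. */
    boolean handle(String logLine) {
        String symptom = monitor(logLine);
        if (symptom == null) {
            return true; // nothing to heal
        }
        List<String> plan = knowledgeBase.get(symptom);
        if (plan == null) {
            unresolved.add(symptom); // no solution yet: keep it for later analysis
            return false;
        }
        for (String step : plan) {
            effector(step);
        }
        return true;
    }

    private void effector(String action) {
        executed.add(action);
    }

    List<String> executed() {
        return executed;
    }

    List<String> unresolved() {
        return unresolved;
    }

    public static void main(String[] args) {
        ControlLoop loop = new ControlLoop();
        List<String> plan = new ArrayList<>();
        plan.add("quarantine-component");
        plan.add("restart-component");
        loop.learn("dependency-unreachable", plan);
        loop.handle("ERROR db: Connection refused");
        System.out.println(loop.executed());
    }
}
```

The `unresolved` list stands in for the periodic analysis mentioned above: symptoms that had no plan are retained so that a plan can be added with `learn` for later occurrences.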
OBJECT-ORIENTED AUTONOMIC COMPUTING
Object-oriented software systems [4] are based on hiding the implementation and exposing a public interface through which outside clients control behavior and state changes. If an object fails to restore its state, or enters an inconsistent state as the result of some transaction, an autonomic engine implemented at the object level could bring the object back to a consistent state based on a consistency profile check.
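One way to picture an object-level consistency check is an object that validates an invariant after every state change and falls back to its last consistent snapshot when the check fails. The class `SelfHealingAccount` and its single invariant (a non-negative balance) are assumptions made purely for illustration.

```java
/** Hypothetical object that heals itself by restoring its last consistent state. */
final class SelfHealingAccount {
    private long balance;
    private long lastConsistentBalance;

    SelfHealingAccount(long openingBalance) {
        if (openingBalance < 0) {
            throw new IllegalArgumentException("opening balance must not be negative");
        }
        this.balance = openingBalance;
        this.lastConsistentBalance = openingBalance;
    }

    /** Consistency profile: the invariant every valid state must satisfy. */
    private boolean isConsistent() {
        return balance >= 0;
    }

    /** Applies a change, checks the invariant, and rolls back if it is violated. */
    boolean apply(long delta) {
        balance += delta;
        if (isConsistent()) {
            lastConsistentBalance = balance; // new snapshot of a known-good state
            return true;
        }
        balance = lastConsistentBalance;     // heal: restore the last known-good state
        return false;
    }

    long balance() {
        return balance;
    }

    public static void main(String[] args) {
        SelfHealingAccount account = new SelfHealingAccount(100);
        System.out.println(account.apply(-30) + " " + account.balance());
        System.out.println(account.apply(-100) + " " + account.balance());
    }
}
```

The public interface stays small (`apply` and `balance`), so outside clients never see the inconsistent intermediate state.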
Service-oriented architecture (SOA) provides a loosely coupled composition of services. With heterogeneous service offerings on varied platforms, web services [18] build on distributed middleware technology that uses open standards and achieves interoperability through XML; services can be described as exchanging messages in XML [17]. With SOA, in conjunction with web services, application integration with cross-platform interoperability, scalability, and availability is achievable. Loose coupling and asynchronous linkage by messaging are important aspects of SOA: when a component causes problems, it can be quarantined, and a fresh copy of the component can be instantiated to serve the same purpose, resuming from the last consistent state after rollback.
In a loosely coupled SOA, when an attempt to respond to a request for a designated service fails, the log of failures can be analyzed to detect the cause, and corrective action can be taken. For example, in a Java EE application, if a JDBC connection fails to connect to a designated database, the cause of failure can be determined from the nature of the exception. If the exception stems from a class-not-found exception, the classpath can be searched automatically for the related driver and, depending on the cause of the failure, the Effector can carry out a proper action plan drawn from the rule-based knowledge base.
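The exception-driven diagnosis above might look like the following sketch, which walks the cause chain of a failure and maps the exception type to a corrective action. The class `FailureClassifier` and the `Action` names are hypothetical; only `ClassNotFoundException`, `java.net.ConnectException`, and `java.sql.SQLException` are real JDK types.

```java
import java.net.ConnectException;
import java.sql.SQLException;

/** Hypothetical mapping from the nature of an exception to a corrective action. */
final class FailureClassifier {

    /** Illustrative action names that a knowledge base could map to concrete plans. */
    enum Action {
        CHECK_CLASSPATH_FOR_DRIVER,
        RETRY_LATER,
        INSPECT_SQL_STATE,
        ESCALATE_TO_OPERATOR
    }

    static Action classify(Throwable failure) {
        // Walk the cause chain, because the root cause is often wrapped in other exceptions.
        for (Throwable t = failure; t != null; t = t.getCause()) {
            if (t instanceof ClassNotFoundException) {
                return Action.CHECK_CLASSPATH_FOR_DRIVER; // driver class is missing
            }
            if (t instanceof ConnectException) {
                return Action.RETRY_LATER;                // database host refused the connection
            }
        }
        if (failure instanceof SQLException) {
            return Action.INSPECT_SQL_STATE;              // database answered, but reported an error
        }
        return Action.ESCALATE_TO_OPERATOR;               // outside the known set of problems
    }

    public static void main(String[] args) {
        Throwable wrapped = new SQLException("cannot connect", new ConnectException("refused"));
        System.out.println(classify(wrapped));
        System.out.println(classify(new ClassNotFoundException("driver class missing")));
    }
}
```

Failures outside the known set are escalated rather than guessed at, which keeps the automatic actions limited to cases the knowledge base actually covers.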
SUGGESTED APPROACH
We can have multiple agents running as daemon processes on client machines to act as sniffers of problems. Each agent keeps track of the system and application logs of the applications that are registered and configured for autonomic computing. When a running Java-based web application, say, raises an exception that falls within a known set of problems, the agent
tries to rectify the failure automatically by carrying out predefined steps, such as repairing the database or running an auto-correct code snippet at the server end, invoked through a remote procedure call (RPC) or Java RMI (Remote Method Invocation), to rectify the problem or to prevent it from recurring.
In case a Java-based application fails with PermGen errors (exhaustion of the permanent generation, the memory region in which older HotSpot JVMs keep class metadata, rather than an overflow of the ordinary object heap), the healing agent could temporarily stall the application, request a garbage collection and resume the application, or automatically restart a fresh copy of the application after reporting the problem.
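Recovery of this kind depends on noticing memory pressure before the application fails. A minimal sketch, assuming the healing agent runs inside the monitored JVM, uses the standard `java.lang.management` API to list the memory pools whose usage exceeds a threshold. The class `MemoryPoolWatcher` and the 90% threshold are illustrative assumptions. Pool names differ between JVMs: the permanent generation exists only in HotSpot JVMs before Java 8, and later versions use Metaspace instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sensor that reports JVM memory pools that are close to their limit. */
final class MemoryPoolWatcher {

    /** Fraction of the maximum in use, or -1.0 when the pool has no defined maximum. */
    static double usedFraction(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return -1.0; // getMax() returns -1 when the maximum is undefined
        }
        return (double) usedBytes / maxBytes;
    }

    /** Names of the memory pools whose used fraction is at or above the threshold. */
    static List<String> poolsAbove(double threshold) {
        List<String> result = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            if (usedFraction(usage.getUsed(), usage.getMax()) >= threshold) {
                result.add(pool.getName());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: 90% usage is treated as the point at which the agent should react.
        System.out.println("Pools under pressure: " + poolsAbove(0.90));
    }
}
```

Note that `System.gc()` is only a request that the JVM is free to ignore, so an agent that relies on forcing a collection should be prepared to fall back to restarting the application.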
We suggest that the problem determination algorithms, which are crucial to any autonomic system, be rule-based, with an RBE (Rules Based Engine) applying fuzzy logic to the set of exceptions that the log tracer/analyzer extracts from a system log after a failure. The RBE can be made easily configurable by using XML as the underlying knowledge repository. On encountering multiple candidate causes for a system failure, the self-healing manager can use fuzzy logic to determine the root cause. For instance, if the log analyzer/tracer reveals a scenario where “Application server can't connect to the database” but the “Application server can ping database server machine”, the RBE can infer that the database server process, rather than its machine or the network, is down, and consequently the Workload Manager may take appropriate action to recover from the system crash.
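The database example can be written as two fuzzy rules. In this sketch, each observation is a degree of truth between 0 and 1, AND is taken as the minimum and NOT as the complement (the usual Zadeh operators), and the hypothesis with the highest support wins. The class `RootCauseRules`, the hypothesis names, and the 0.5 cut-off are assumptions for illustration; a real RBE would load its rules from the XML repository rather than hard-code them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical fuzzy rule evaluation for root-cause determination. */
final class RootCauseRules {

    /** Fuzzy AND: the minimum of the two degrees of truth. */
    static double and(double a, double b) {
        return Math.min(a, b);
    }

    /** Fuzzy NOT: the complement of the degree of truth. */
    static double not(double a) {
        return 1.0 - a;
    }

    /** Evaluates each rule and returns the degree of support for every hypothesis. */
    static Map<String, Double> evaluate(double cannotConnectToDb, double canPingDbHost) {
        Map<String, Double> support = new LinkedHashMap<>();
        // Rule 1: cannot connect AND host answers ping, so the database server process is down.
        support.put("database-server-down", and(cannotConnectToDb, canPingDbHost));
        // Rule 2: cannot connect AND host does not answer, so the host or network is down.
        support.put("host-or-network-down", and(cannotConnectToDb, not(canPingDbHost)));
        return support;
    }

    /** Picks the best-supported hypothesis above a cut-off of 0.5 (an assumed threshold). */
    static String mostLikely(Map<String, Double> support) {
        String best = "no-fault-detected";
        double bestDegree = 0.5;
        for (Map.Entry<String, Double> entry : support.entrySet()) {
            if (entry.getValue() > bestDegree) {
                best = entry.getKey();
                bestDegree = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // 90% of connection attempts failed while every ping succeeded.
        System.out.println(mostLikely(evaluate(0.9, 1.0)));
    }
}
```

Degrees of truth let the engine rank competing causes when the evidence is partial, for example when only some connection attempts fail.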
TRENDS IN AUTONOMIC COMPUTING
The increasing heterogeneity, dynamism and interconnectivity of software applications, services and networks have led to complex, unmanageable and insecure systems. Coping with such complexity calls for investigation of this new paradigm, in which computers heal themselves, and opens two broad areas of research: technologies related to autonomic computing [2, 5], and development of autonomic computing products [3, 6]. Open areas of research include Peer-to-Peer and Grid Computing as a means of implementing autonomic systems, and the design of autonomic managers in multi-layer P2P form [5], so that autonomic behavior and the underlying RBE knowledge base are stored in separate layers.
The scope of autonomic computing covers not only the rule-based enterprise-wide applications we build, but also the underlying operating systems [13], middleware, database systems, server/network systems and shared services. This will be evident in B2B and B2C collaboration. We also envisage that, in the coming years, consumer electronics will be affected by this paradigm shift as autonomic computing enters the embedded software arena, with Java ME as the object-oriented technology of choice for implementing embedded autonomic systems.
In the words of Dr. Paul Horn, senior vice-
president and director, IBM Research, we are
looking at the next era of computing:
“Computer systems that can regulate
themselves much in the same way as our
autonomic nervous system regulates and
protects our bodies.”
REFERENCES
[1] Richard Murch, “Autonomic Computing”, IBM Press, March 2004.
[2] Litoiu, M. “A performance analysis method for autonomic computing systems”, ACM Transactions on Autonomous and Adaptive Systems, Vol. 2, No. 1, Article 3, March 2007.
[3] Martin, P., Powley, W., Wilson, K., Tian, W., Xu, T., and Zebedee, J. 2007. The WSDM of Autonomic Computing: Experiences in Implementing Autonomic Web Services. In Proceedings of the 2007 International Workshop on Software Engineering for Adaptive and Self-Managing Systems (May 20 - 26, 2007). International Conference on Software Engineering. IEEE Computer Society, Washington, DC, 9.
[4] Gabriel, R. P. and Goldman, R. 2006. Conscientious software. In Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications (Portland, Oregon, USA, October 22 - 26, 2006). OOPSLA '06. ACM Press, New York, NY, 433-450.
[5] Kephart, J. O. 2005. Research challenges of autonomic computing. In Proceedings of the 27th International Conference on Software Engineering (St. Louis, MO, USA, May 15 - 21, 2005). ICSE '05. ACM Press, New York, NY, 15-22.
[6] Salehie, M. and Tahvildari, L. 2005. Autonomic computing: emerging trends and open problems. In Proceedings of the 2005 Workshop on Design and Evolution of Autonomic Application Software (St. Louis, Missouri, May 21, 2005). DEAS '05. ACM Press, New York, NY, 1-7.
[7] Engel, M. and Freisleben, B. 2005. Supporting autonomic computing functionality via dynamic operating system kernel aspects. In Proceedings of the 4th International Conference on Aspect-Oriented Software Development (Chicago, Illinois, March 14 - 18, 2005). AOSD '05. ACM Press, New York, NY, 51-62.
[8] Ahmed, S., Ahamed, S. I., Sharmin, M., and Haque, M. M. 2007. Self-healing for autonomic pervasive computing. In Proceedings of the 2007 ACM Symposium on Applied Computing (Seoul, Korea, March 11 - 15, 2007). SAC '07. ACM Press, New York, NY, 110-111.
[9] Tesauro, G., Chess, D. M., Walsh, W. E., Das, R., Segal, A., Whalley, I., Kephart, J. O., and White, S. R. 2004. A Multi-Agent Systems Approach to Autonomic Computing. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 1 (New York, New York, July 19 - 23, 2004). International Conference on Autonomous Agents. IEEE Computer Society, Washington, DC, 464-471.
[10] Neti, S. and Muller, H. A. 2007. Quality Criteria and an Analysis Framework for Self-Healing Systems. In Proceedings of the 2007 International Workshop on Software Engineering for Adaptive and Self-Managing Systems (May 20 - 26, 2007). International Conference on Software Engineering. IEEE Computer Society, Washington, DC, 6.
[11] Kramer, J. and Magee, J. 2007. Self-Managed Systems: an Architectural Challenge. In 2007 Future of Software Engineering (May 23 - 25, 2007). International Conference on Software Engineering. IEEE Computer Society, Washington, DC, 259-268.
[12] Fleissner, S. and Baniassad, E. 2006. A commensalistic software system. In Companion to the 21st ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications (Portland, Oregon, USA, October 22 - 26, 2006). OOPSLA '06. ACM Press, New York, NY, 560-573.
[13] Shapiro, M. W. 2004. Self-healing in modern operating systems. Queue 2, 9 (Dec. 2004), 66-75.
[14] Patterson, D. A., A. Brown, P. Broadwell, G. Candea, M. Chen, J. Cutler, P. Enriquez, A. Fox, E. Kiciman, M. Merzbacher, D. Oppenheimer, N. Sastry, W. Tetzlaff, J. Traupman, N. Treuhaft. Recovery-Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies. UC Berkeley Computer Science Technical Report UCB//CSD-02-1175, March 15, 2002.
[15] Predictive Self-Healing in the Solaris 10 Operating System, White Paper, Sun Microsystems Inc., http://www.sun.com/software/solaris/ds/self_healing.jsp
[16] Merit Project, Computer Associates International, http://www.meritproject.com/it_survey_results.htm.
[17] Jana D, “Service Oriented Architectures – A New Paradigm”, CSI Communications, March 2006, pp. 12-14.
[18] Fujita Satoru, Dynamic Collaboration of Businesses using Web Services, NEC Journal of Advanced Technology, Vol. 1, No. 1, Jan 2004, pp. 36-42.
[19] White, S.R., Hanson, J.E., Whalley, I., Chess, D.M., Kephart, J.O. “An architectural approach to autonomic computing”. In Proceedings of International Conference on Autonomic Computing, 2004 (ICAC’04), 17-18 May, 2004, pp. 2-9.
About the Authors
Debasish Jana
FIETE, FIE(I), SMIEEE, SMACM, SMCSI
MBA (Finance) (IGNOU), MMATH (Computer Science) (UW, Canada), BE (Computer Science) (Jadavpur University)
Debasish is presently working as a Principal Consultant with the Information Technology Department of Simplex Infrastructures Ltd, Kolkata. He has more than twenty years of extensive experience in the IT industry in various stages of the software development lifecycle. He has worked in various capacities at HP Division of Blue Star Ltd, Techna Digital Services Pvt. Ltd., BFL Software Ltd., Millenium Information Systems Ltd., PriceWaterhouseCoopers Pvt. Ltd., and Anshin Software Pvt Ltd, Kolkata. He has authored two practice-oriented textbooks, C++ and Object Oriented Programming Paradigm and Java and Object Oriented Programming Paradigm, published by Prentice
Hall of India. He has also served as visiting faculty for core Computer Science subjects at Jadavpur University, BIT Mesra and Army Institute of Management. He has also authored several papers and technical articles for national and international conferences and prominent national magazines. He is a Fellow of IE (I) and IETE, a Senior Member of IEEE and ACM, and a Senior Life Member of CSI.
Soumya Maitra
MCA(BIT Mesra), BSc Physics (University of Calcutta)
Soumya is presently engaged as an Associate with Cognizant Technology Solutions, Kolkata. He has also served as Visiting Faculty at the Kolkata Extension Centre of Birla Institute of Technology – Mesra. His interest lies in theoretical Computer Science, and he is also a Bronze Medalist at the National Computing Contest 2002, held by Nalini Foundation of Symbolic Logic, Pune. He has authored several technical and popular science articles in various journals, magazines, and national dailies in India, including Science Reporter, Bioinformatics India Journal, PC Quest, The Statesman, The Telegraph, Wisdom, etc.