Creating Abstract Superclasses by Refactoring
William F. Opdyke
AT&T Bell Laboratories
Naperville, Illinois 60566

Ralph E. Johnson
Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

Abstract: This paper focuses on object-oriented programming and one kind of structure-improving transformation (refactoring) that is unique to object-oriented programming: finding abstract superclasses. We decompose the operation of finding an abstract superclass into a set of refactoring steps, and provide examples. We discuss techniques that can automate or automatically support these steps. We also consider some of the conditions that must be satisfied to perform a refactoring safely; sometimes to satisfy these conditions other refactorings must first be applied.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1993 ACM 0-89791-558-5/93/0200/0066 $1.50

1 Introduction
Software systems tend to grow with age as new features are added and old features are changed. It is not just the increased number of features that makes the system grow; most of us have rewritten an old system and seen its size shrink dramatically. The real problem is that the original design was not suited for the subsequent purposes of the system. Also, programs are often extended defensively, by copying code instead of changing the original version. For example, instead of making a subroutine more general, a programmer might make a new version of it and call that version, and so avoid any chance of affecting other parts of the system that called the original subroutine. The result is that a program's structure deteriorates over time.

The only way to prevent the structure of programs under maintenance from decaying is to rewrite them. This paper is about techniques for incrementally rewriting programs and improving their structure. It focuses on object-oriented programming and one kind of structure-improving transformation that is unique to object-oriented programming: finding abstract superclasses.
Object-oriented programming is touted as being
more reusable and extensible than conventional pro-
gramming [19].
Nonetheless, the structure of an
object-oriented program still deteriorates as features
are added. As an object-oriented program grows,
class hierarchies get larger and less rational, code is
duplicated, and individual classes get larger and hard-
er to understand. A common practice in the object-
oriented community is to interleave periods of growth
with consolidation periods in which the program is
refactored (restructured) to make it smaller and eas-
ier to understand [18].
Refactoring is also important in developing reusable software, especially frameworks [14, 20, 21]. Frameworks are program skeletons that can be
“fleshed out” to construct a complete program [9,30].
Frameworks are usually developed by generalizing a
set of concrete applications. Sometimes frameworks
are developed by careful planning, starting with a do-
main analysis and a study of several applications in
the problem domain. In these cases refactoring is less
important. However, if existing software is going to
reflect the new abstraction then it has to be refac-
tored. Also, it often is more economical to refactor
an existing application to extract a framework that
could have been used to create it than to start over.
Finally, an important part of building a framework is
testing it for reusability by building applications that
use it, and these test cases often point out the need
for changing the framework. In these cases, under-
standing refactoring is crucial.
Refactoring has always been carried out manually,
but there have been several studies of how to auto-
mate or partially automate it. Casais [7] and Berg-
stein [5] have both invented algorithms to create ab-
stract classes from a set of concrete classes. These
algorithms do the easy part of abstraction, which
is moving common features to a single class. The
hard part of the process is deciding whether two features
are the same, and making similar features be
the same. Casais has done some work in this latter
area; we consider several issues outside the scope of
his work.
This paper looks at one particular refactoring that
is unique to object-oriented programming: making
an abstract superclass of a set of concrete classes. It
should be noted that any kind of program can be
refactored, not just object-oriented ones, though the
set of possible refactorings depends on the style of the
program. For example, Griswold refactored Scheme
programs [12]. In fact, though he didn’t study the
object-oriented transformations that are the subject
of this paper, he showed how to implement some of
the simpler refactorings that we will use, such as con-
verting code segments into procedures.
Refactoring to create an abstract superclass is more
complicated than has been recognized in the past, and
depends as much on the purpose of the abstraction as
on the structure of the program. Thus, it is unreasonable
to try to automate it completely. Our interest in
this problem is to provide tools for making refactor-
ing easier, so we want to automate as many steps in
the refactoring as possible. Below we describe what
can be automated, what cannot be automated, and
which questions are still open.
2 Examples of Finding Abstract Classes
An abstract class is a class designed to be used only
as a superclass [30]. This is in contrast to the normal
way that a class is used, which is both by making
instances of it and by using it as a superclass. An
abstract class usually defers the implementation of
some of its operations to its subclasses, so it is on-
ly a partial specification of an object and cannot be
directly used to make instances.
Abstract classes are an important design technique,
but are not always directly supported by object-
oriented programming languages. Although statically
typed languages such as C++ often have a way to give
a signature for an operation without giving an implementation
(e.g., pure virtual functions in C++ [10]),
untyped languages such as Smalltalk specify abstract
classes only by convention. In Smalltalk, for example,
documentation usually specifies which classes are ab-
stract, the names of abstract classes sometimes start
with “Abstract”,
and the operations in an abstract
class that are left to subclasses are supposed to be
implemented to generate a “subclass responsibility”
error message.
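The C++ mechanism mentioned above can be illustrated with a minimal sketch; the class names Shape and Square and the operations area and isLargerThan are hypothetical, chosen only for this illustration. Declaring area as a pure virtual function makes Shape abstract, so the compiler itself rejects any attempt to instantiate it, which Smalltalk leaves to convention and the "subclass responsibility" error.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical abstract class: area() is declared but not implemented,
// so Shape is only a partial specification of an object.
class Shape {
public:
    virtual ~Shape() = default;
    virtual double area() const = 0;  // pure virtual: deferred to subclasses

    // A concrete operation written in terms of the deferred one.
    bool isLargerThan(const Shape& other) const { return area() > other.area(); }
};

// Hypothetical concrete subclass that supplies the deferred operation.
class Square : public Shape {
public:
    explicit Square(double side) : side_(side) { assert(side >= 0.0); }
    double area() const override { return side_ * side_; }

private:
    double side_;
};

// The compiler enforces what Smalltalk leaves to convention:
// a declaration such as `Shape s;` would not compile.
static_assert(std::is_abstract<Shape>::value, "Shape is abstract");
static_assert(!std::is_abstract<Square>::value, "Square is concrete");
```

For example, Square(3.0).isLargerThan(Square(2.0)) is true: the algorithm lives in the abstract class, while the representation-specific part comes from the subclass.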
Abstract classes are always invented by generaliz-
ing from concrete subclasses. Once an abstract class
is found, many more concrete subclasses can be made
from it in a top down fashion, but the original discov-
ery of an abstract class is bottom up. This bottom up
discovery can happen early in the life-cycle of a system:
Wirfs-Brock et al. show how to find abstract
classes during an early design phase before any al-
gorithms have been specified [29]. However, abstract
classes can be discovered anywhere in the life-cycle
of a system, though pulling abstract classes out of a
set of concrete classes is harder once the classes have
been implemented, as the next two examples (using
C++ syntax) show.
2.1 Matrix Example
This first example shows what happens when a Matrix class is generalized to support sparse arrays. We will start with a concrete Matrix class that is not sparse, then build a sparse version, and then capture their commonalities in an abstract Matrix class. Consider this initial implementation of the class Matrix:

    class Matrix {
        int elements[10000];
        int columns, rows;
    public:
        int get(int rowNum, int colNum) { /* ... */ }
        void put(int newVal, int rowNum, int colNum) { /* ... */ }
        Matrix(int numRows, int numCols) { /* ... */ }
        Matrix matrixMultiply(Matrix m2) { /* ... */ }
        void rotate() { /* ... */ }
        void matrixInverse() { /* ... */ }  // return type not shown in the original; void assumed
    };

A reference to, for example, the matrix element in row x, column y might be coded as:¹

    ... = elements[(x * columns) + y];

¹ In this example, the elements of a matrix are always …

The current implementation is fine for dense matrices, but a different representation would be more space efficient for sparse matrices. Such a representation would store only the non-zero values, along with their locations. Retrieving and storing elements would be different with this representation. The following steps could be applied to support both types of matrices, while capturing their commonalities in an abstract superclass:
1. rename the Matrix class to be DenseMatrix.
2. in functions other than get and put, replace references to elements with calls to get and put.
3. define a new class SparseMatrix and copy the members of DenseMatrix into it.
4. define a new type (SparseElement) to represent an element of a sparse matrix. Each SparseElement stores its location along with its value.
5. in the class SparseMatrix:
   (a) change the type of the variable elements:
           SparseElement elements[50];
   (b) change the get & put functions. For a sparse matrix, the function get will return 0 if no element is defined in elements for the specified location. The put function will remove the old value (if any) from elements and write a new value there only if the value is non-zero. Since the other functions are written in terms of the get & put functions, they need not be changed.
6. define an abstract superclass Matrix for classes DenseMatrix and SparseMatrix:
   (a) add to the Matrix class the function signatures for matrixMultiply, rotate, and matrixInverse. These define the protocol for the class; that is, the set of messages that an instance of the class will accept.
   (b) move the variables columns and rows to the superclass.
   (c) add the function signatures for get & put to the superclass.
   (d) add the function bodies for matrixMultiply, rotate, and matrixInverse to the superclass, and delete the redundant definitions from the subclasses.
After these changes, class Matrix is an abstract superclass. Its only data are variables to store the number of columns and rows; the storage for the matrix elements is defined by the subclasses. Matrix defines get and put as pure virtual functions, so their implementation is left to subclasses, too. However, it can define operations such as matrixMultiply, rotate, and matrixInverse in terms of get and put. DenseMatrix and SparseMatrix are concrete subclasses of Matrix that define how the data for the matrix is stored and implement the get and put functions. These classes will also have to implement constructor and destructor functions. In practice, some of the algorithms inherited from class Matrix might be too inefficient and will probably be reimplemented, but the original algorithms are correct. For example, most representations of sparse matrices permit a more efficient way of rotating the matrix than just iterating over all the elements and relocating them, which is the natural algorithm to be defined in class Matrix.
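The hierarchy produced by steps 1 through 6 can be sketched in compilable C++. This is a minimal sketch under stated assumptions rather than the paper's code: it stores elements in std::vector and std::map instead of fixed-size arrays, keys each non-zero sparse element by its (row, column) pair instead of defining a separate SparseElement type, and uses a hypothetical scale operation as a stand-in for matrixMultiply, rotate and matrixInverse, whose bodies are not given above. The names scale and storedCount are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Abstract superclass (step 6): holds the dimensions and declares get/put
// as pure virtual; shared algorithms use only get and put.
class Matrix {
public:
    Matrix(int numRows, int numCols) : rows(numRows), columns(numCols) {
        assert(numRows > 0 && numCols > 0);
    }
    virtual ~Matrix() = default;

    virtual int get(int rowNum, int colNum) const = 0;
    virtual void put(int newVal, int rowNum, int colNum) = 0;

    // Hypothetical shared operation standing in for rotate, matrixInverse, ...:
    // visits every element, correct for any representation but wasteful
    // for a sparse one.
    void scale(int factor) {
        for (int r = 0; r < rows; ++r)
            for (int c = 0; c < columns; ++c)
                put(get(r, c) * factor, r, c);
    }

protected:
    int rows, columns;
};

// Concrete subclass: stores every element, row by row.
class DenseMatrix : public Matrix {
public:
    DenseMatrix(int numRows, int numCols)
        : Matrix(numRows, numCols),
          elements(static_cast<std::size_t>(numRows) * numCols, 0) {}

    int get(int rowNum, int colNum) const override {
        return elements[rowNum * columns + colNum];
    }
    void put(int newVal, int rowNum, int colNum) override {
        elements[rowNum * columns + colNum] = newVal;
    }

private:
    std::vector<int> elements;
};

// Concrete subclass: stores only non-zero elements, each with its location.
class SparseMatrix : public Matrix {
public:
    using Matrix::Matrix;  // reuse the (numRows, numCols) constructor

    // Returns 0 when no element is stored for the location.
    int get(int rowNum, int colNum) const override {
        auto it = elements.find(std::make_pair(rowNum, colNum));
        return it == elements.end() ? 0 : it->second;
    }
    // Removes any old value, then stores the new value only if it is non-zero.
    void put(int newVal, int rowNum, int colNum) override {
        const auto key = std::make_pair(rowNum, colNum);
        elements.erase(key);
        if (newVal != 0) elements[key] = newVal;
    }
    std::size_t storedCount() const { return elements.size(); }

private:
    std::map<std::pair<int, int>, int> elements;
};
```

Because scale touches only get and put, it works unchanged for both representations; as the text notes, a more efficient sparse-specific version could later be introduced by making such an operation virtual and reimplementing it in SparseMatrix.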
In summary, as a result of these refactorings the
program defines two types of Matrices, making ex-
plicit their common features. This structure would
make it easier to extend the program to support ad-
ditional matrix representations in the future.
This example assumes that the programmer recog-
nized early that the storage and retrieval operations
would differ for dense and sparse matrices, and re-
placed direct references to elements with calls to the
functions get and put. The next example shows a case
where code replacement is done later in the refactor-
ing process.
2.2 Inode Example
The second example describes how refactorings were applied to improve the Inode² class during the design of the Choices file system framework [17]. An Inode contains a description of the disk layout of a file and other information such as the file owner, access permissions and access times. The Choices object-oriented operating system project at the University of Illinois has defined an operating system framework consisting of interlocking frameworks for file systems [17], virtual memory [26], communication [31], and process scheduling [25]. An early version of the Choices file system framework supported only the BSD UNIX file format. Then, it was extended to handle both BSD UNIX and UNIX System V [1] file formats. To support both formats, the Inode class was changed as follows:

1. the Inode class was renamed to BSDInode,
2. a SystemVInode class was added as a sibling of BSDInode; its variables and functions were copied from the BSDInode class, and modified,
3. a new class Inode was added as the superclass of BSDInode and SystemVInode,
4. the members common to BSDInode and SystemVInode were migrated up to their common superclass.

² Inode is standard UNIX® operating system terminology; it is a contraction of the term index node. UNIX is a registered trademark of UNIX Systems Laboratories, Inc.

Figure 1: Creating An Abstract Superclass (panels: Inode (BSD); Steps 1 & 2: BSDInode, SystemVInode; Steps 3 & 4: BSDInode, SystemVInode)
The fourth step had to make structural modifications to the subclasses before some of the common members could be moved. For example, while most of the code in the BSDInode implementation of the mapunit function was the same as in the SystemVInode class, there were a few minor differences. Getting and setting logical block numbers was handled differently for the two file formats. To handle this, the differing code was first split off into separate functions; then, when the implementations of the mapunit function in both subclasses matched, it was moved to the superclass.
3 Steps In Creating An Abstract Superclass
This section shows how to create an abstract superclass for a pair of classes C1 and C2. It is easy to generalize this to more than two classes. C1 and C2 must already have a common superclass, or no superclasses. If necessary, one of them can be moved in the superclass graph so that they become sibling classes³.

The first important step in creating an abstract superclass S of a pair of classes C1 and C2 is to create an empty class with a unique name that is a sibling of C1 and C2; then C1 and C2 are given S as a superclass.

³ Sibling classes are classes that either share a common direct superclass or are both topmost classes in their inheritance hierarchies.
The prior examples (in particular, step 6 of the Matrix example and step 4 of the Inode example) illustrate the subsequent refactoring steps involved in creating an abstract superclass, which are defined below [20]:
• adding function signatures to the superclass protocol (after making them compatible in both subclasses),
• making function bodies (and the variables referenced by them) compatible in both subclasses,
• migrating common variables to the superclass,
• migrating common code to the superclass.
3.1 Adding Function Signatures To The Superclass
Functions belong in the superclass protocol if they are part of the common abstraction represented by the superclass. Sometimes, function signatures in one or both of the classes need to be changed before the signature can be added to the superclass. Suppose, for example, that an abstract superclass is created for two existing classes. One class has a function whose signature is:

    void shiftDirection(direction newDirection, int newSpeed)

while the other class has a function whose signature is:

    void redirect(int newSpeed, direction newDirection)
For the function signatures to match, one of the
function names needs to be changed, and function
arguments reordered. There is a range of assistance
that we could expect from a tool. Heuristics could be
applied to determine, based on structural attributes
of the function signatures, what refactorings would
be needed to make them match [20]. In the above
example, the tool could prompt the user that a re-
naming was needed, and provide a menu of choices:
one of the functions could be renamed to match the
other function, or both could be given an (identical)
new name. Similar support could be provided for ar-
gument reordering.
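A minimal sketch of the outcome, assuming the user chose to rename shiftDirection to redirect and to reorder its arguments. The class names Car, Boat and Vehicle and the enumerators of direction are hypothetical. Once both subclasses declare the same signature, it can be added to the superclass as a pure virtual function.

```cpp
#include <cassert>

// Hypothetical type for the direction argument used in both signatures.
enum class direction { north, east, south, west };

// New abstract superclass: the matched signature joins its protocol
// as a pure virtual function ("= 0").
class Vehicle {
public:
    virtual ~Vehicle() = default;
    virtual void redirect(int newSpeed, direction newDirection) = 0;
};

// Formerly: void shiftDirection(direction newDirection, int newSpeed);
// renamed to redirect, with its arguments reordered to match Boat.
class Car : public Vehicle {
public:
    void redirect(int newSpeed, direction newDirection) override {
        speed_ = newSpeed;
        heading_ = newDirection;
    }
    int speed() const { return speed_; }
    direction heading() const { return heading_; }

private:
    int speed_ = 0;
    direction heading_ = direction::north;
};

// Already had the chosen signature; it only gains the superclass.
class Boat : public Vehicle {
public:
    void redirect(int newSpeed, direction newDirection) override {
        speed_ = newSpeed;
        heading_ = newDirection;
    }
    int speed() const { return speed_; }
    direction heading() const { return heading_; }

private:
    int speed_ = 0;
    direction heading_ = direction::north;
};
```

In a real program the rename and the argument reordering must also be applied at every call site; the sketch shows only the declarations. The duplicated bodies are left in place on purpose, since migrating common code is a later step.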
A more powerful form of automated support would be to
determine, given two classes, which functions have structurally
similar signatures. However, heuristics based on structural
similarities are not foolproof. Suppose that the class Automobile and a class representing a submarine each contained a function with a single integer argument. In the Automobile class, the function is called changeOil, whose argument is the number of quarts of oil needed; in the submarine class, the function is called submerge, whose argument is the depth to which to descend. These functions clearly don't share
a common abstraction, but automatically matching
on attributes of the signature won’t detect this.
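One possible structural heuristic can be sketched as follows, assuming a signature is reduced to its name, return type and argument types, with argument names ignored. Signature, structurallySimilar, needsRenaming and needsReordering are hypothetical names, not part of any tool described here. Two signatures are treated as candidates when they have the same return type and the same multiset of argument types.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified view of a member function signature.
struct Signature {
    std::string name;
    std::string returnType;
    std::vector<std::string> argTypes;  // in declaration order
};

// Structural heuristic: same return type and the same argument types,
// ignoring the function name and the order of the arguments.
bool structurallySimilar(const Signature& a, const Signature& b) {
    if (a.returnType != b.returnType || a.argTypes.size() != b.argTypes.size())
        return false;
    std::vector<std::string> x = a.argTypes, y = b.argTypes;
    std::sort(x.begin(), x.end());
    std::sort(y.begin(), y.end());
    return x == y;
}

// Refactorings the tool could suggest for a structurally similar pair.
bool needsRenaming(const Signature& a, const Signature& b) {
    return structurallySimilar(a, b) && a.name != b.name;
}
bool needsReordering(const Signature& a, const Signature& b) {
    return structurallySimilar(a, b) && a.argTypes != b.argTypes;
}
```

For shiftDirection (direction, int) and redirect (int, direction), all three predicates hold, so the tool would propose both a rename and a reordering. But changeOil(int) and submerge(int), both assumed here to return void, are also reported as structurally similar, which is exactly the false match described above.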
Such heuristics, despite their shortcomings, may be powerful enough to support practical refactoring tasks. More powerful similarity detection is possible in some cases [8, 11].
Once the signatures of the function in both subclasses match, the function signature can be added to the superclass protocol⁴.
3.2 Making Function Bodies Compatible
As with the mapunit functions described earlier in the Inode example, the function bodies in the subclasses may be similar but not identical. Before the function body can be migrated to the superclass, differences need to be separated from the common code.
The approaches for detecting program differences involve string comparison, tree comparison, or a combination of these techniques [4]. The approaches represent the differences between programs as a set of edit operations (code insertion, replacement and deletion) needed to get from one program to the other. Program differences have been studied in regard to spelling correction [13, 27], parsing error correction [28], version storage [24] and other uses. String comparison finds the minimum-cost sequence of edit operations to convert one string into another. Tree comparison algorithms detect syntactic differences between programs by building syntax trees and comparing the trees. Tree comparison algorithms are more expensive than string comparison approaches, but are not as sensitive to minor differences in coding style (for example, extra spaces or blank lines).
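To make the string-comparison idea concrete, here is a minimal sketch of the standard dynamic-programming edit-distance computation, applied to sequences of tokens rather than characters and extended to recover one minimum-cost script of insertions, replacements and deletions. It is not the algorithm of [4]; unit costs and the names EditOp, Edit and editScript are assumptions for illustration, and a real tool would first tokenize the two function bodies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class EditOp { Insert, Replace, Delete };

struct Edit {
    EditOp op;
    std::size_t posA;  // index in a; for Insert, the position before which to insert
    std::size_t posB;  // index in b; unused (0) for Delete
};

// Minimum-cost edit script (unit costs) that turns token sequence a into b.
std::vector<Edit> editScript(const std::vector<std::string>& a,
                             const std::vector<std::string>& b) {
    const std::size_t n = a.size(), m = b.size();
    // d[i][j] = minimum number of edits turning a[0..i) into b[0..j)
    std::vector<std::vector<std::size_t>> d(n + 1, std::vector<std::size_t>(m + 1, 0));
    for (std::size_t i = 0; i <= n; ++i) d[i][0] = i;
    for (std::size_t j = 0; j <= m; ++j) d[0][j] = j;
    for (std::size_t i = 1; i <= n; ++i)
        for (std::size_t j = 1; j <= m; ++j) {
            const std::size_t diff = (a[i - 1] != b[j - 1]) ? 1 : 0;
            d[i][j] = std::min({d[i - 1][j] + 1,           // delete a[i-1]
                                d[i][j - 1] + 1,           // insert b[j-1]
                                d[i - 1][j - 1] + diff});  // keep or replace
        }

    // Walk back from d[n][m] to recover one optimal script.
    std::vector<Edit> script;
    std::size_t i = n, j = m;
    while (i > 0 || j > 0) {
        if (i > 0 && j > 0 &&
            d[i][j] == d[i - 1][j - 1] + ((a[i - 1] != b[j - 1]) ? 1 : 0)) {
            if (a[i - 1] != b[j - 1]) script.push_back({EditOp::Replace, i - 1, j - 1});
            --i;
            --j;
        } else if (i > 0 && d[i][j] == d[i - 1][j] + 1) {
            script.push_back({EditOp::Delete, i - 1, 0});
            --i;
        } else {
            script.push_back({EditOp::Insert, i, j - 1});
            --j;
        }
    }
    std::reverse(script.begin(), script.end());
    assert(script.size() == d[n][m]);  // one edit per unit of cost
    return script;
}
```

For the token sequences {a, b, c} and {a, x, c, d}, the script is a replacement of b by x followed by an insertion of d; each such operation is a candidate code segment for the function splitting described next.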
These techniques can be used in refactoring as follows [20]: Suppose the function commonFunction is defined in classes C1 and C2. For each edit operation, define a new function in both classes:
• for each insertion, define in C2 a new function whose body contains the inserted code; define in C1 a new function with the same name whose body is null. In C2, convert the inserted code to a call to the new function; at the corresponding location within commonFunction in C1, add a call to the new function.
• for each replacement, define in C1 a new function whose body contains the replaced code; define in C2 a new function with the same name whose body contains the replacing code. In C1, convert the replaced code to a call to the new function; in C2, convert the replacing code to a call to the new function.
• for each deletion, define in C1 a new function whose body contains the deleted code; define in C2 a new function with the same name whose body is null. In C1, convert the deleted code to a call to the new function; at the corresponding location within commonFunction in C2, add a call to the new function.

⁴ In C++, the function can be defined as a pure virtual function by assigning its value to be '0' in the superclass.
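The replacement rule can be illustrated with a minimal C++ sketch. The members log and count, the statements being compared, and the generated function name segment1 are all hypothetical. Before the split, commonFunction differs between C1 and C2 in a single statement (shown in the comment); after the split, each class keeps its own version of that statement in segment1, and the two bodies of commonFunction are identical and ready to be migrated.

```cpp
#include <cassert>
#include <string>

// Before the split (a replacement edit in the middle statement):
//   C1::commonFunction() { log += "open;"; count += 1;  log += "close;"; }
//   C2::commonFunction() { log += "open;"; count += 10; log += "close;"; }

class C1 {
public:
    // After the split: textually identical to C2::commonFunction.
    void commonFunction() { log += "open;"; segment1(); log += "close;"; }
    // New function whose body is the replaced code.
    void segment1() { count += 1; }

    std::string log;
    int count = 0;
};

class C2 {
public:
    void commonFunction() { log += "open;"; segment1(); log += "close;"; }
    // New function with the same name whose body is the replacing code.
    void segment1() { count += 10; }

    std::string log;
    int count = 0;
};
```

An insertion or a deletion is handled the same way, except that one of the two new functions has an empty body.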
These operations are safe if the code segments be-
ing inserted, replaced and deleted make syntactic
sense as the bodies of new functions. This can be
more easily realized by using tree analysis approaches.
While such function splitting can be safely applied
to a program, the resultant new functions will not
necessarily correspond to meaningful concepts in the
application domain. For example, when two functions
are compared, segments of code that differ between
them may be preceded or followed by segments of re-
lated code that (coincidentally) are the same in both
functions. In order for the new functions to represent
meaningful abstractions, this “common” code might
really belong together with the differing segments in
those new functions.
This suggests that automated analysis should be
combined with user interaction, such as the approach
Rak [23] describes for abstracting a function (Smalltalk
method) into a superclass from its subclass implementations.
3.3 Moving Variables
Having created the abstract superclass and determined the function signatures, it is sometimes necessary to add member variables to the abstract superclass. The most common reason is that they are referenced by common code that belongs in the superclass.
As was the case with function signatures, variables
defined in one subclass may be structurally similar
to, but not exactly match, conceptually equivalent
variables in the other subclass. As with function sig-
natures, structural heuristics could detect structural
similarities (in name, access control mode and type).
In cases where the attributes of the variables differ, refactorings can be applied to make them conform [20].
Once the attributes of a variable in both subclasses match, the variable can be moved to the abstract superclass.
3.4 Migrating Common Code to the Abstract Superclass
Before migrating the function body to the superclass,
any differences between the functions need to be de-
termined. Variables and functions referenced by the
common code must be visible from the superclass be-
fore the common code can be moved there.
The preconditions for this refactoring are:
1. the function signature, but not the function body, is already defined in the superclass⁵,
2. each differing code segment can be converted to a legal function,
3. the scope of all variables and functions referenced by the common code includes the superclass and both subclasses.
After checking its preconditions, this refactoring:
1. for each differing code segment:
   (a) creates a new function in each subclass. The name of the new function is automatically generated and is distinct from the name of any existing member⁶.
   (b) adds the signature of the new function to the superclass protocol,
2. adds a function body to the member function signature in the superclass,
3. deletes the member function from the subclasses.
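The preconditions and steps above can be modeled on a toy representation of classes. The sketch below is hypothetical and much simplified: ClassModel, visibleFrom, canMigrate and migrate are illustrative names, function bodies are opaque strings, and the names each body references are supplied by hand rather than computed by program analysis. It assumes step 1 has already split the differing segments into new functions, so the two subclass bodies are identical. Precondition 2 would require parsing, so it is passed in as a flag.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <set>
#include <string>

// Hypothetical, much-simplified model of a class for refactoring purposes.
struct ClassModel {
    const ClassModel* superclass = nullptr;
    std::set<std::string> variables;                          // member variables
    std::set<std::string> signatures;                         // declared member functions
    std::map<std::string, std::string> bodies;                // function name -> body text
    std::map<std::string, std::set<std::string>> references;  // function name -> names its body uses
};

// True if `name` is declared in c or one of c's superclasses,
// i.e. it is in scope for c and for every subclass of c.
bool visibleFrom(const ClassModel& c, const std::string& name) {
    for (const ClassModel* k = &c; k != nullptr; k = k->superclass)
        if (k->variables.count(name) > 0 || k->signatures.count(name) > 0) return true;
    return false;
}

// Checks the preconditions for moving fn's common body from sub into superclass s.
bool canMigrate(const ClassModel& s, const ClassModel& sub, const std::string& fn,
                bool segmentsAreLegalFunctions) {
    if (s.signatures.count(fn) == 0 || s.bodies.count(fn) > 0) return false;  // precondition 1
    if (!segmentsAreLegalFunctions) return false;                              // precondition 2
    auto refs = sub.references.find(fn);
    if (refs != sub.references.end())
        for (const std::string& name : refs->second)                           // precondition 3
            if (!visibleFrom(s, name)) return false;
    return true;
}

// Steps 2 and 3: give the superclass the body and delete fn from both subclasses.
void migrate(ClassModel& s, ClassModel& sub1, ClassModel& sub2, const std::string& fn) {
    assert(sub1.bodies.at(fn) == sub2.bodies.at(fn));  // common code must already match
    s.bodies[fn] = sub1.bodies.at(fn);
    s.references[fn] = sub1.references.at(fn);
    for (ClassModel* sub : {&sub1, &sub2}) {
        sub->signatures.erase(fn);
        sub->bodies.erase(fn);
        sub->references.erase(fn);
    }
}
```

With the Matrix example in mind, a body that uses rows, columns, get and put can migrate once those names are declared in the superclass; a body that still uses a variable declared only in a subclass fails precondition 3 until that variable has itself been moved up.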
3.5 Summary
Given two classes, this section defines an approach for
creating a common abstract superclass that contains
a set of member function signatures, and possibly a
set of member variables and the partial implementa-
tions of some functions. After refactoring, the defi-
nitions in the subclasses are streamlined, as some of
the behavior that had been locally defined is now in-
herited. The commonalities and differences between
the subclasses are made more explicit.
⁵ This is satisfied by the results of a prior step.
⁶ A refactoring could later be applied to make the name more meaningful.
4 Conclusions
Although refactoring is common in the Smalltalk
community, it seems to be practiced less often by the
C++ community and does not seem to be recognized
as an important part of the software life-cycle of C++
programs. We conjecture that one important reason
is that refactoring has been easier to do for Small-
talk programs than for C++ programs. Smalltalk is
a simpler, more compact language than C++. The
browsing and cross-reference tools that make refactoring
easier are more a part of Smalltalk programming environments
than of some currently used C++ programming environments.
However, as C++ programming environments become more powerful,
refactoring is becoming easier to realize for C++ programs.
Refactoring seems to be considered even less im-
portant outside the object-oriented community. This
is probably because its relative cost and benefit dif-
fer from one group to another. Reducing the cost
of refactoring should also encourage these groups to
consider refactoring their programs to keep the pro-
grams well-structured and make it easier to produce
reusable software.
There are several reasons why refactoring is hard.
The first is that it isn’t recognized as significant. We
find that simply having names for the different refac-
torings makes it easier to notice when they are need-
ed, to plan for them, and to carry them out. Another
reason is that refactoring takes time. It requires analyzing
the program to find all the places that have
to be changed. Refactorings that require many simple
changes can still take a long time to carry out by
hand. A third reason is that any change to a pro-
gram, including a refactoring, can introduce defects
into it.
We have addressed the first problem by specifying a set of refactorings that are commonly used in the object-oriented community [20, 21]. These refactorings include low-level transformations such as changing the name of a function or variable, moving a function or variable from one class to another, and breaking a function into smaller pieces, as well as higher-level refactorings such as dividing a class into several smaller classes and finding abstract superclasses. We have prototyped most of these refactorings for a constrained set of C++ programs, and are in the process of building a refactoring tool for C++ programs.
The best way to solve the last two problems is to
provide tools to carry out refactoring automatically.
Unfortunately, it is probably impossible to complete-
ly automate refactoring. The purpose of a refactoring
is to improve the design of a system, but a refactoring
that can be applied safely to a program will not nec-
essarily improve its design. On the contrary, apply-
ing arbitrary refactorings to a program is more likely
to corrupt the design rather than improve it, even
though the behavior of the program is unchanged. A
refactoring improves design if the resultant code units
correspond to meaningful abstractions that make it
easier to refine or extend the program. What abstrac-
tions are meaningful depends on the application and
on the designer. This implies that refactoring tasks,
especially the more complex tasks, require some inter-
action with the designer. Nevertheless, much support
can be provided by a refactoring tool [20].
We have algorithms for all of our refactorings,
though many of these algorithms require several in-
puts from a user. Each algorithm has a precondition;
if the precondition is met then the algorithm is behav-
ior preserving and will not introduce any defects into
the program. Since testing for program equivalence is
undecidable, the preconditions are often conservative.
Several of them are based on dataflow techniques, and
can almost certainly be improved upon. However, the
undecidability of the basic problem implies that any
algorithm for checking preconditions will sometimes be too conservative.
Refactoring is similar to the schema modification
in databases [3, 16, 22]. The main difference is that
schema modification is concerned only with data,
while refactorings are concerned with both data and
program. On the other hand, the work on schema
modification is concerned with updating the existing
objects in the database. Our work has ignored the
problem of changing existing objects, since most im-
plementations of languages such as C++ do not allow
programs to be modified in the middle of their exe-
cution. An OODBMS, which unifies programs and
persistent data, would have to deal with both.
Refactoring is also similar to the more traditional
program transformation work, which usually has the
goal of improving efficiency, of converting abstract
program schemas into code, or of transforming an abstract
design into a concrete program [2, 6, 15]. These
systems often perform the inverse transformations to
ours, since refactorings often are used to make pro-
grams more abstract and are not usually concerned
with efficiency.
Refactoring is a practical problem that needs bet-
ter support. The examples in this paper show that
even relatively straightforward refactorings such as
finding a common superclass are more complicated
than they appear at first. Although this seems to
eliminate chances for refactoring programs complete-
ly automatically, it should be possible to build tools
that make refactoring easier. In the long run, this
will help make our programs easier to extend and will
make it easier to develop reusable software.
5 Acknowledgements
Peter Madany provided helpful input regarding the
evolution of the Choices file system framework. Janet
Coleman, Warren Montgomery and Ed Rak reviewed
drafts of this paper. The conference reviewers also
provided helpful comments.
AT&T Bell Laboratories has supported William F.
Opdyke’s research at the University of Illinois under
the full-time doctoral support program.
[l] AT&T. UNIX System V User Reference Manual.
AT&T, 1984.
[2] Robert Balzer. A fifteen-year perspective on au-
tomatic programming. In Software Reusability
- Volume II: Applications and Experience, pages
289-311, 1989.
[3] Jay Banerjee and Won Kim. Semantics and
implementation of schema evolution in object-
oriented databases. In Proceedings of the ACM
SIGMOD Conference, 1987.
[4] Carol Sue Beckman-Davies. Finding Program
Diflerences Based on Syntactic Dee Structure.
PhD thesis, University of Illinois at Urbana
Champaign, 1989.
[5] Paul L. Bergstein. Object-preserving class trans-
formations. In Proceedings of OOPSLA ‘91,
[6] R. M. Burstall and J. Darlington. A transforma-
tion system for developing recursive programs.
Journal of the ACM, 24(1):44-67, 1977.
[7] Eduardo Casais.
Reorganizing an Object Sys-
tem, pages 161-189. Centre Universitair
d’Informatique, Universite de Geneve, 1989.
[8] N. Dershowitz. Programming by analogy. Ma-
chine Learning: An Artificial Intelligence Ap-
proach (R.S. Michalski, J. G. Carbonell and T.
M. Mitchell, eds), 2:395-424, 1986.
[9] L. Peter Deutsch. Design reuse and frameworks
in the Smalltalk-80system. In Software Reusabil-
ity - Volume II: Applications and Experience,
pages 57-72, 1989.
[lo] Margaret A. Ellis and Bjarne Stroustrup. The
Annotated C++ Reference Manual. Addison-
Wesley Publishing Co., Reading, MA, 1990.
[ll] R. Greiner. Learning by understanding analo-
gies. Artificial Intelligence, 35:81-125, 1988.
[12] William G. Griswold. Program Restructuring as
an Aid in Software Maintenance. PhD thesis,
University of Washington, 1991.
[13] Patrick A. V. Hall and Geoff R. Dowling. Ap-
proximate string matching. Computing Surveys,
12(4):381-402, December 1980.
[14] Ralph E. Johnson and Brian Foote. Designing
reusable classes. Journal of Object-Oriented Pro-
gramming, 1(2):22-35, 1988.
[15] W. Lewis Johnson and Martin Feather. Build-
ing an evolution transformation library. In Pro-
ceedings of the 12th International Conference on
Software Engineering, pages 238-247, 1990.
[16] Won Kim.
Introduction to Object-Oriented
Databases. MIT Press, 1990.
[17] Peter W. Madany. An Object-Oriented Frame-
work for Filesystems. PhD thesis, Universi-
ty of Illinois at UrbanaChampaign, 1992. Al-
so Technical Report No. UIUCDCS-R-92-1751,
Department of Computer Science, University of
Illinois at UrbanaChampaign.
[18] Jeff McKenna. A proposal for change manage-
ment for smalltalk. Smalltalk Report, 1(5):1-3,
[19] Bertrand Meyer. Object-oriented Software Con-
struction. Prentice Hall, 1988.
[20] William F. Opdyke. Refactoring Object-Oriented
Frameworks. PhD thesis, University of Illinois
at UrbanaChampaign, 1992. Also Technical
Report No. UIUCDCS-R-92-1759, Department
of Computer Science, University of Illinois at
[21] William F. Opdyke and Ralph E. Johnson. Refactoring: An aid in designing application frameworks and evolving object-oriented systems. In Proceedings of Symposium on Object-Oriented Programming Emphasizing Practical Applications (SOOPPA), September 1990.
[22] D. Jason Penney and Jacob Stein. Class modification in the GemStone object-oriented DBMS. In Proceedings of OOPSLA ’87, 1987.
[23] Edward J. Rak. Two redesign tools for Smalltalk. Master’s thesis, University of Illinois at Urbana-Champaign, 1990.
[24] Marc J. Rochkind. The source code control system. IEEE Transactions on Software Engineering, SE-1(4):364-370, December 1975.
[25] Vince Russo, Gary Johnston, and Roy H. Campbell. Process Management in Multiprocessor Operating Systems using Class Hierarchical Design. In Proceedings of OOPSLA ’88, San Diego, Ca., September 1988.
[26] Vincent Russo and Roy H. Campbell. Virtual Memory and Backing Storage Management in Multiprocessor Operating Systems using Class Hierarchical Design. Submitted to OOPSLA ’89, 1989. Also available as University of Illinois Technical Report.
[27] David Sankoff and Joseph B. Kruskal. Macromolecular sequences. In Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison (D. Sankoff and J. Kruskal, eds), pages 45-53, 1983.
[28] Robert A. Wagner. Order-n correction for regular languages. Communications of the ACM, 17(5):265-268, 1974.
[29] Rebecca Wirfs-Brock, Brian Wilkerson, and Lauren Wiener. Designing Object-Oriented Software. Prentice-Hall, 1990.
[30] Rebecca J. Wirfs-Brock and Ralph E. Johnson. A survey of current research in object-oriented design. Communications of the ACM, September
[31] Jonathan Zweig and Ralph Johnson. Conduits: A communication abstraction in C++. In Proceedings of the USENIX C++ Workshop, pages