A Study of the Documentation Essential to Software Maintenance

Sergio Cozzetti B. de Souza
Universidade Católica de Brasília
Brasilia, DF, BRAZIL
Nicolas Anquetil
Universidade Católica de Brasília
Brasilia, DF, BRAZIL
anquetil@ucb.br
Káthia M. de Oliveira
Universidade Católica de Brasília
Brasilia, DF, BRAZIL
kathia@ucb.br
ABSTRACT
Software engineering has been striving for years to improve
the practice of software development and maintenance. Doc-
umentation has long been prominent on the list of recom-
mended practices to improve development and help main-
tenance. Recently however, agile methods started to shake
this view, arguing that the goal of the game is to produce
software and that documentation is only useful as long as it
helps to reach this goal.
On the other hand, in the re-engineering field, people wish
they could re-document useful legacy software so that they
may continue to maintain it or migrate it to new platforms.
In these two cases, a crucial question arises: “How much
documentation is enough?” In this article, we present the
results of a survey of software maintainers to try to establish
what documentation artifacts are the most useful to them.
Categories and Subject Descriptors
D.2.0 [Software Engineering]: General; D.2.7 [Software
Engineering]: Distribution, Maintenance, and Enhance-
ment—Documentation
General Terms
software system documentation, empirical study, software
maintenance, program understanding
1. INTRODUCTION
Among all the recommended practices in software
engineering, software documentation has a special place. It is
one of the oldest recommended practices and yet has been,
and continues to be, renowned for its absence (e.g. [4]).
There is no end to the stories of software systems
(particularly legacy software) lacking documentation or with
out-dated documentation. For years, the importance of
documentation has been stressed by educators, processes,
quality models, etc., and despite this we are still discussing
why it is not generally created and maintained (e.g. [11]).
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
SIGDOC’05, September 21–23, 2005, Coventry, United Kingdom.
Copyright 2005 ACM 1-59593-175-9/05/0009 ...$5.00.
The topic gained renewed interest with two recent trends:
• Agile methods question the importance of documentation
  as a development aid;
• The growing gap between “traditional” (e.g. COBOL)
  and up-to-date technologies (e.g. OO or web-oriented)
  increased the pressure to re-document legacy software.
Both issues raise a similar question: What documentation
would be most useful to software maintenance?
Although they propose a renewed development paradigm, agile
methods do not bring significant changes for software
maintenance. They do claim that permanent re-factoring turns
maintenance into the normal state of development under these
methods. However, they do not explain how such methods would
work over extended periods of time, when a development team
is sure to disperse, taking with it the knowledge it has of
the implementation details. Documentation is therefore still
a highly relevant artifact of software maintenance.
Legacy software re-documentation tries to remedy the de-
ficiencies of the past in terms of up-to-date documentation.
However, it is a costly activity, difficult to justify to users
because it does not bring any visible change for them (at
least in the short term).
In this paper we present a survey of software maintainers
trying to establish the importance of various documenta-
tion artifacts for maintenance. The paper is divided as fol-
lows: In Section 2, we review some basic facts about software
maintenance and its needs; in Section 3, we summarize the
relevant literature on software documentation; in Section 4,
we present the survey we conducted; and in Section 5, we
comment on the results of the survey.
2. SOFTWARE MAINTENANCE
Maintenance is traditionally defined as any modification
made on a system after its delivery. Studies show that
software maintenance is, by far, the predominant activity in
software engineering (90% of the total cost of a typical
software system [17, 21]). It is needed to keep software
systems up-to-date and useful: any software system reflects
the world within which it operates; when this world changes,
the software needs to change accordingly. Lehman’s first law
of software evolution (law of continuing change, [13]) is that
“a program that is used undergoes continual change or
becomes progressively less useful”. Maintenance is mandatory:
one simply cannot ignore new laws or new functionality
introduced by a competitor. Programs must also be adapted
to new computers (with better performance) or new
operating systems.
One of the main problems that affect software mainte-
nance is the lack of up-to-date documentation. Because of
this, maintainers must often work from the source code to
the exclusion of any other source of information. For ex-
ample, a study (see [15, p.475], [17, p.35]) reports that from
40% to 60% of the maintenance activity is spent on studying
the software to understand it and how the planned modifi-
cation may be implemented.
To lessen this problem, organizations try to re-document
their software systems, but this is a costly operation that
would benefit from a clear indication of the software docu-
ments to focus on.
3. SOFTWARE DOCUMENTATION
3.1 Documentation needs
A software document may be described as any artifact in-
tended to communicate information on the software system
[7]. This communication is aimed at human readers.
According to Ambler [2], software documentation responds
to three necessities: (i) contractual; (ii) support a software
development project by allowing team members to gradually
conceive the solution to be implemented, and (iii) allow a
software development team to communicate implementation
details across time to the maintenance team.
Documentation typically suffers from the following prob-
lems: nonexistent or of poor quality [19, 5, 10]; out-dated
[25, 24, 18, 22, 7]; over abundant and without a definite
objective [7, 14]; difficult to access (for example when the
documents are scattered on various computers or in differ-
ent formats: text, diagrams) [24]; lack of interest from the
programmers [17, p.45], [9, 23]; and, difficult to standardize,
due, for example, to project specificities [16, 9].
Recently, agile methods proposed an approach to software
development that mostly eliminated the necessity of
documentation as a helper for software development. Using
informal communication (between developers and with users),
code standardization, or collectivization of the code, agile
methods propose to carry out the communication necessary to
a software development project on an informal level. This
ultimately can greatly reduce the need for documentation
in the development of software. However, agile methods do
not remove the need for documentation as a communication
tool through time, one that allows developers to communicate
important information on a system to future maintainers.
3.2 Documentation for Maintenance
Better defining what document(s) software maintainers
need has already been considered in other studies.
• Tilley [25, 24] stresses the importance of a document
  describing the hierarchical architecture of the system.
• Cioch et al. [6] differentiate four stages of experience
  (from newcomer, the first day of work, to expert, after
  some years of work on a system). For each stage, they
  propose different documents: newcomers need a short
  general view of the system; apprentices need the system
  architecture; interns need task-oriented documents such
  as requirement description, process description,
  examples, and step-by-step instructions; finally, experts
  need low-level documentation as well as requirement
  description and design specification.
• Rajlich [20] proposes a re-documentation tool that makes
  it possible to gather the following information: notes on
  the application domain, dependencies among classes, and
  detailed descriptions of a class’ methods.
• Ambler [2] recommends documenting the design decisions
  and a general view of the design: requirements, business
  rules, architecture, etc.
• In a workshop they organized at SIGDoc 2001, Thomas and
  Tilley state that “no one really knows what sort of
  documentation is truly useful to software engineers to
  aid system understanding” [22].
• Forward and Lethbridge [7], in their survey of managers
  and developers, found the specification documents to be
  the most consulted, whereas quality and low-level
  documents are the least consulted.
• Grubb and Takang [8, pp.103-106] identify some
  information needs of maintainers according to their
  activities. Few specific documents are listed. Managers
  need decision support information such as the size of
  the system and the cost of the modification. Analysts
  need to understand the application domain and the
  requirements, and to have a global view of the system.
  Designers need architectural understanding (functional
  components and how they interact) and detailed design
  information (algorithms, data structures). Finally,
  programmers need a detailed understanding of the source
  code as well as a higher-level view (similar to the
  architectural view).
• Anquetil et al. [3] present a re-documentation tool to
  partially automate a re-documentation process focusing
  on the following information: high-level view (with
  description of requirements), data model, cross
  references between functionalities, functions, and data,
  business rules, subsystems description and interaction,
  and comments.
• Finally, according to Teles [1, p.212], the documents
  that should be generated at the end of an XP project
  are: user stories, tests, data model, class model,
  business process description, user manual, and project
  minutes.
One conclusion that we may draw from this short review is
that system architecture is an important document for
software maintenance (cited by Tilley, Cioch, Ambler, Grubb
and Takang, and Anquetil et al.). We will see that the
results of our survey do not concur with this view.
4. SURVEYS
To establish the relative importance of various
documentation artifacts in helping software maintainers
understand a system, we conducted two surveys: first, we asked
software maintainers to rate the importance of the
documentation artifacts in helping them understand a system;
second, we asked software maintainers to indicate what
documentation artifact(s) they had used to gain an
understanding of a software system they had just finished
maintaining.
The two surveys give slightly different results. The first
one is based on the subjective opinion of the maintainers,
whereas the second, although objective, is limited to the
scope of the specific maintenances performed on specific
systems.
4.1 Survey on opinion of maintainers
In this first survey, we distributed a questionnaire to
software maintainers asking them to rate the importance of
various documentation artifacts in helping them understand a
system under maintenance. The questionnaire was available on
paper and on the Internet (see footnote 1). The selection of
the subjects was done by convenience (one of the authors is a
working software engineer and used his list of contacts) on a
voluntary and anonymous basis. Subjects are from various
parts of Brazil. Seventy-six (76) software maintainers
answered the questionnaire over a period of 6 months (July to
December 2004). The web page remains available and answers
continue to come in (at a low rate, however).
The questionnaire is composed of two parts. The first
part is a characterization of the subject: Position (manager,
analyst, or programmer); Experience in maintenance (1-3
years, 3-5 years, 5-10 years, >10 years); Experience in
number of systems already maintained (1 to 5, 6 to 10, 11 to
20, more than 20); and Known approaches (structured analysis
(see footnote 2), object-orientation). The subject could also
indicate his or her email (optional) in case he or she wished
to receive the results of the survey.
The second part asked the subject to answer the following
question for a list of documentation artifacts: “Based on
your practical experience, indicate what importance each
documentation artifact has in the activity of understanding
a software system to be maintained”. Four levels of
importance were proposed: 1=“no importance”, 2=“little
importance”, 3=“important”, and 4=“very important”. The
subjects could also indicate that they did not know the
artifact.
The documentation artifacts were divided by activities of
a typical development process, discriminating for each activ-
ity artifacts specific to the structured analysis (e.g. context
diagram), object-orientation (based on the RUP, e.g. use
case diagram), or both (e.g. Entity-Relationship Model).
The complete list of 34 artifacts, as they were presented in
the questionnaire is the following:
Requirement elicitation:
  Structured analysis: (1) requirements list, (2) context
    diagram, (3) requirement description.
  Object-oriented: (4) vision document, (5) use case
    diagram.
  Structured and OO: (6) conceptual data model, (7)
    glossary.
Analysis:
  Structured analysis: (8) functions derived from the
    requirements, (9) hierarchical function diagram,
    (10) data flow diagram.
  Object-oriented: (11) use cases specifications, (12)
    class diagram, (13) activity diagram, (14) sequence
    diagram, (15) state diagram.
  Structured and OO: (16) non functional prototype,
    (17) logical data diagram (MER), (18) data
    dictionary.
Design:
  Structured analysis: (19) architectural model, (20)
    general transaction diagram, (21) components
    specification.
  Object-oriented: (22) collaboration diagram, (23)
    components diagram, (24) distribution diagram.
  Structured and OO: (25) physical data model, (26)
    functional prototype.
Coding:
  Structured and OO: (27) comments in source code,
    (28) source code.
Test:
  Structured and OO: (29) unitary test plan, (30)
    system test plan, (31) acceptance test plan.
Transition:
  Structured and OO: (32) data migration plan, (33)
    transition plan, (34) user manual.
Footnote 1: http://www.clb.triall.com.br/sergio.cozzetti/ (in
Portuguese)
Footnote 2: It is our experience that, prior to the USDP and
the UML, different countries had different “universal”
development methods. In Brazil, all software engineers usually
know a technique called “structured analysis”, in France it
would be the “Merise method”, etc.
The results of this survey are presented and commented
in Section 5.
4.2 Survey on the use of documentation
In this second survey, we distributed a questionnaire to
software maintainers asking them to indicate, at the end of
a maintenance project, what documentation artifacts they
had used to understand the system maintained. The ques-
tionnaire was distributed on paper to allow the same person
to answer it for various maintenance projects (doing so on
the Internet would have required an identification scheme
to relate together the various answers of a given subject).
The questionnaire allowed one subject to answer up to eight
times (eight different maintenance projects). The selection
of the subjects was done by convenience on a voluntary and
anonymous basis. No effort was applied to select the same
subjects as in the first survey: some maintainers actually
answered both surveys, others only the first or only the second.
Subjects are from various parts of Brazil. We obtained 237
answers (maintenance projects) from 52 subjects.
Again, the questionnaire has a first, characterization part,
and a second part with the documentation artifacts.
The first part included a characterization of the subject and
of the system maintained. For the subject characterization,
the questions were: Email to send the result of the
survey (optional); Experience in maintenance (1-3 years,
3-5 years, 5-10 years, >10 years); Experience with the
system maintained (1-3 months, 3-6 months, 6-12 months, >12
months). For the system characterization, we used the work
of Kitchenham et al. [12]: Application domain (e.g. bank,
teaching, ...); Size of the maintenance team (1 person, one
team, several teams); System’s age (<1 year, 1-2 years,
3-5 years, >5 years); System’s maturity (recently developed:
mostly corrective maintenance; growing: corrective and
evolutive maintenance; stabilized: few corrective and mostly
evolutive maintenance; phasing out: new product being
developed); Documentation quality (complete? yes/no;
up-to-date? yes/no; readable? yes/no); Approach (structured
analysis, object-orientation); Maintenance type performed
during the survey (corrective, evolutive); Is the system
maintained using a defined maintenance process? (yes/no).
The second part of the questionnaire listed the same doc-
umentation artifacts as in the first survey, asking whether
the artifact existed (for the system maintained) and if so,
whether the maintainer used it to understand the system
during the maintenance project. One difference with the
first survey is that we designed two questionnaires, one for
the structured analysis (24 documentation artifacts), and
one for object-orientation (25 documentation artifacts).
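The sizes of the two questionnaires are consistent with the artifact list of Section 4.1 if each questionnaire keeps the artifacts specific to its approach plus those common to both. The paper does not state how the counts were obtained, so this reading is an assumption; the Python sketch below merely checks it. The tuple layout and the names ARTIFACTS and questionnaire are illustrative and not part of the survey material.

```python
# Artifact catalogue transcribed from the list in Section 4.1.
# Each entry: (number, artifact, activity, approach).
# Constant and function names are illustrative assumptions.
STRUCTURED, OO, BOTH = "structured", "oo", "both"

ARTIFACTS = [
    (1, "requirements list", "requirement elicitation", STRUCTURED),
    (2, "context diagram", "requirement elicitation", STRUCTURED),
    (3, "requirement description", "requirement elicitation", STRUCTURED),
    (4, "vision document", "requirement elicitation", OO),
    (5, "use case diagram", "requirement elicitation", OO),
    (6, "conceptual data model", "requirement elicitation", BOTH),
    (7, "glossary", "requirement elicitation", BOTH),
    (8, "functions derived from the requirements", "analysis", STRUCTURED),
    (9, "hierarchical function diagram", "analysis", STRUCTURED),
    (10, "data flow diagram", "analysis", STRUCTURED),
    (11, "use cases specifications", "analysis", OO),
    (12, "class diagram", "analysis", OO),
    (13, "activity diagram", "analysis", OO),
    (14, "sequence diagram", "analysis", OO),
    (15, "state diagram", "analysis", OO),
    (16, "non functional prototype", "analysis", BOTH),
    (17, "logical data diagram (MER)", "analysis", BOTH),
    (18, "data dictionary", "analysis", BOTH),
    (19, "architectural model", "design", STRUCTURED),
    (20, "general transaction diagram", "design", STRUCTURED),
    (21, "components specification", "design", STRUCTURED),
    (22, "collaboration diagram", "design", OO),
    (23, "components diagram", "design", OO),
    (24, "distribution diagram", "design", OO),
    (25, "physical data model", "design", BOTH),
    (26, "functional prototype", "design", BOTH),
    (27, "comments in source code", "coding", BOTH),
    (28, "source code", "coding", BOTH),
    (29, "unitary test plan", "test", BOTH),
    (30, "system test plan", "test", BOTH),
    (31, "acceptance test plan", "test", BOTH),
    (32, "data migration plan", "transition", BOTH),
    (33, "transition plan", "transition", BOTH),
    (34, "user manual", "transition", BOTH),
]


def questionnaire(approach):
    """Artifacts for one approach: its own plus those shared by both."""
    return [a for a in ARTIFACTS if a[3] in (approach, BOTH)]


structured_count = len(questionnaire(STRUCTURED))
oo_count = len(questionnaire(OO))
print(structured_count, oo_count)
```

Under this reading the counts come out to 24 and 25, matching the numbers given in the text.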
The documentation artifacts were divided by activities of
a typical development process as already explained in the
preceding section.
5. SURVEYS RESULTS
5.1 Survey on opinion of maintainers
As explained in the preceding section, the first survey
tried to establish which documents software maintainers
judged more important to help understand a system. The
subjects of the first survey proved to be heterogeneous. We
have no data on the characterization of the total population
of software maintainers (neither at large nor in Brazil) to
compare it with the observed population.
• Position: total=76, manager=20 (26%), analyst=48 (63%),
  programmer=5 (7%), and consultant=3 (4%). This shows a
  population biased toward the “higher levels” of the
  profession. However, one must also consider that many
  programmers are actually called analyst/programmer, or
  sometimes junior analyst, which may distort the
  characterization.
• Known approach: total=76, structured=22 (29%), OO=6
  (8%), both=48 (63%). We were surprised by the high
  number of software engineers who declared knowing
  object-orientation (63+8=71%), but it may only reflect
  the popularity of the RUP and UML rather than actual
  OO programming.
• Experience (years): total=76, 1-3 years=17 (22%), 3-5
  years=19 (25%), 5-10 years=17 (22%), >10 years=23
  (31%). The experience is evenly distributed.
• Experience (number of systems maintained): total=76,
  1-5=26 (34%), 6-10=15 (20%), 11-20=15 (20%), >20=20
  (26%). Again, the experience in number of systems
  maintained is evenly distributed.
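The percentages in the characterization above are each category's share of the 76 respondents. As a minimal sketch, assuming plain rounding to the nearest integer (the paper does not say how the figures were rounded), the Position figures can be recomputed as follows. The function name shares is illustrative.

```python
def shares(counts):
    """Return each category's share of the total as a rounded percentage."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


# Position counts reported for the 76 respondents of the first survey.
position = {"manager": 20, "analyst": 48, "programmer": 5, "consultant": 3}
position_shares = shares(position)
print(position_shares)
```

Plain rounding does not reproduce every reported figure: 23 respondents out of 76 is 30.3%, reported as 31% for the ">10 years" category, presumably so that the percentages sum to 100.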
Table 1 gives the results of the first survey for structured
analysis and object-orientation. The results are ranked in
decreasing order of importance (percentage of “very
important” among all those who know the artifact). For each
approach, the number of subjects considered includes only
those that answered for this approach (70 for structured
and 54 for object-oriented).
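The ranking criterion can be written out explicitly: the number of “very important” answers divided by the number of respondents for the approach minus those who did not know the artifact. The sketch below, in Python, applies it to three rows of the structured analysis part of Table 1. The function and variable names are illustrative assumptions, and only the counts are taken from the table.

```python
def pct_very(very, respondents, unknown):
    """Share of 'very important' ratings among respondents who know the artifact."""
    known = respondents - unknown
    if known <= 0:
        raise ValueError("no respondent knows this artifact")
    return 100.0 * very / known


RESPONDENTS = 70  # respondents who answered for structured analysis

# (artifact, 'very important' count, 'does not know' count), from Table 1
rows = [
    ("Acceptance test plan", 34, 6),
    ("Source code", 63, 2),
    ("Logical data model (MER)", 50, 3),
]

# Sort by decreasing percentage, as in the table.
ranking = sorted(
    ((name, pct_very(very, RESPONDENTS, unknown)) for name, very, unknown in rows),
    key=lambda item: item[1],
    reverse=True,
)
for rank, (name, pct) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {pct:.1f}%")
```

For these three rows the computed values agree with the table (92.6%, 74.6% and 53.1%); for other rows the last digit may differ slightly depending on whether values are rounded or truncated.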
The results seem to indicate that, in the opinion of
software maintainers:
• In general, the two approaches give coherent results.
• The two most important artifacts are the source code
  and the comments it contains. This result was expected.
• Data models ranked high in both approaches, with a
  preference for the logical data model (structured
  analysis: 3rd, 4th, and 9th; object-orientation: 3rd,
  4th, 5th, and 12th).
• Non-technical, user-point-of-view artifacts such as the
  requirements (or use cases), acceptance test plan, or
  user manual fared well too (structured analysis: 5th,
  6th, 7th, and 11th; object-orientation: 6th, 7th, 8th,
  and 11th).
• The various test plans also received relatively good
  marks, although less so in object-orientation
  (structured analysis: 6th, 10th, and 13th;
  object-orientation: 8th, 13th, and 15th).
• A big surprise to us was that general views of the
  system (e.g. architectural model, vision document) are
  not highly praised (structured analysis: 18th, 19th,
  and 23rd; object-orientation: 18th, 22nd, and 24th).
Overall, merging the two approaches, the documentation
artifacts that software maintainers consider most important
for understanding a system prior to maintaining it seem to
be: source code and comments, a data model (whether logical,
physical, or class diagram), and information about the
requirements (requirement list/description, use case
diagram/description, acceptance tests).
5.2 Survey on the use of documentation
The second survey studied the actual use of documenta-
tion artifacts to understand a system under maintenance.
One goal of this second survey was to get more objective
results not merely based on maintainers’ opinion.
As already noted, we got answers from 52 maintainers
performing 237 maintenances, an average of 4.5 maintenances
per maintainer. The maximum was 8 maintenances for a
given maintainer (maximum allowed by the questionnaire)
and the minimum 1 maintenance for a given maintainer.
For the characterization of the subject maintainers, we
have the following results:
• Maintenance experience: total=52, 1-3 years=17 (33%),
  3-5 years=8 (15%), 5-10 years=11 (21%), >10 years=16
  (31%). The experience in maintenance is evenly
  distributed.
• System experience: total=54, 0-3 months=14 (26%), 3-6
  months=8 (15%), 6-12 months=11 (20%), >12 months=21
  (39%). Again, the experience in the system maintained
  is evenly distributed.
Table 1: Importance of documentation artifacts for the structured analysis paradigm (according to 70 software
maintainers) and for object-orientation (according to 54 software maintainers). The second to last column
gives the percentage of “very important” ratings among maintainers that knew the documentation artifact (i.e.
total minus last column).

Structured analysis
                                           Importance rating
Rank  Artifact                             No  Little  Yes  Very  % very  Not known
   1  Source code                           0       0    5    63   92.6%          2
   2  Comments                              0       4   11    54   78.2%          1
   3  Logical data model (MER)              0       3   14    50   74.6%          3
   4  Physical data model                   0       1   24    42   62.6%          3
   5  Requirement description               3       7   28    41   59.4%          1
   6  Acceptance test plan                  6       8   16    34   53.1%          6
   7  Requirement list                      6       9   17    36   52.9%          2
   8  Data dictionary                       1      10   24    31   46.9%          4
   9  Conceptual data model                 4       7   25    29   44.6%          5
  10  System test plan                      4      10   23    29   43.9%          4
  11  User manual                           6       8   23    29   43.9%          4
  12  Implantation plan                     5       7   27    28   41.7%          3
  13  Unitary test plan                     5      12   22    25   39.0%          6
  14  Data migration plan                   6      10   25    23   35.9%          6
  15  Data flow diagram                     5      11   29    23   33.8%          2
  16  Functional prototype                  7      12   24    21   32.8%          6
  17  Component specification               4      10   32    19   29.2%          5
  18  Architectural model                   5      15   26    18   28.1%          6
  19  Context diagram                       4      25   21    17   25.3%          3
  20  Hierarchical function diagram         5      15   30    15   23.0%          5
  21  Glossary                              4      21   26    15   22.7%          4
  22  Functions derived from requirements   4      17   21    12   22.2%         16
  23  General transaction diagram           5      14   25    11   20.0%         15
  24  Non functional prototype              8      13   29    12   19.3%          8

Object-orientation
                                           Importance rating
Rank  Artifact                             No  Little  Yes  Very  % very  Not known
   1  Source code                           0       0    3    50   94.3%          1
   2  Comments                              0       4    9    41   75.9%          0
   3  Logical data model (MER)              0       3   12    38   71.6%          1
   4  Class diagram                         0       2   18    33   62.6%          1
   5  Physical data model                   0       2   19    32   60.3%          1
   6  Use case diagram                      0       5   17    31   58.4%          1
   7  Use case specification                1       3   21    26   50.9%          3
   8  Acceptance test plan                  2       8   15    25   50.0%          4
   9  Data dictionary                       1       8   19    24   46.1%          2
  10  Implantation plan                     2       9   18    23   44.2%          2
  11  User manual                           3       8   19    23   43.3%          1
  12  Conceptual data model                 2       5   22    22   43.1%          3
  13  Unitary test plan                     1      10   19    19   38.7%          5
  14  Data migration plan                   3       9   18    19   38.7%          5
  15  System test plan                      1       9   22    20   38.4%          2
  16  Sequence diagram                      0       5   30    18   33.9%          1
  17  Activity diagram                      3       7   27    15   28.8%          2
  18  Vision document                       2      10   23    13   27.0%          6
  19  Functional prototype                  6      13   19    14   26.9%          2
  20  Non functional prototype              6      10   22    11   22.4%          5
  21  Glossary                              3      13   25    11   21.1%          2
  22  Component diagram                     5       9   28     8   16.0%          4
  23  State diagram                         6      14   23     7   14.0%          4
  24  Distribution diagram                  5      15   26     3    6.1%          5
  25  Collaboration diagram                 6      14   28     1    2.0%          5

The characterization of the subject systems may be divided
into responses per questionnaire (or per maintainer) and
responses per maintenance. In the first group, we have a
maximum of 52 responses:
• Team size: total=51, one person=5 (10%), one team=40
  (78%), several teams=6 (12%). Not all respondents
  answered this question. A large majority of those who
  responded indicate a medium-sized team (one team).
• System age: total=50, <1 year=3 (6%), 1-2 years=12
  (24%), 3-5 years=24 (48%), >5 years=11 (22%). The
  majority of systems are relatively old (3 years and
  more) and should qualify as legacy systems.
• System maturity: total=51, recently developed=3 (6%),
  growing=32 (64%), stabilized=14 (27%), phase out=2
  (4%). Most of the systems are still growing, which
  indicates systems still under regular use for which many
  new requirements need to be implemented (and
  corrected).
• Documentation quality:
    Completeness: total=51, complete=14 (27%), not
      complete=37 (73%).
    Actuality: total=51, up-to-date=17 (33%),
      out-of-date=34 (67%).
    Readability: total=51, readable=41 (80%),
      unreadable=10 (20%).
  The quality of the documentation is as could be
  expected: generally incomplete and out-of-date.
• Defined process: total=50, no process=15 (30%), with
  process=35 (70%). There are more maintenances
  performed with a defined process than we expected, but
  4 respondents did not answer this question.
For the per-maintenance characterization of the systems, we
have:
• Used approach: total=237, structured analysis=142
  (60%), object-orientation=95 (40%). These numbers
  correspond better than those of the first survey to what
  we expected, with a larger number of structured systems.
• Maintenance type: total=237, corrective=64 (27%),
  evolutive=125 (53%), both=48 (20%). Since one
  maintainer could record several maintenances in the same
  questionnaire, there are 48 answers for which we do not
  know if they were corrective or evolutive. Note that a
  higher number of evolutive maintenances was expected,
  since the literature (e.g. [17]) indicates an average of
  only 20% of corrective maintenance.
Table 2 gives the results of the second survey for both
approaches:
• The two approaches seem to differ more from each other
  than in the first survey.
• Again, there is no surprise with the presence of source
  code and comments among the most used artifacts (both
  approaches: 1st and 3rd). However, it is interesting to
  see that, although still the most used, they are much
  less used in object-orientation (66% and 61%) than
  in structured analysis (95% and 71%).
• As for the first survey, the test plans are well used,
  mainly for the structured analysis paradigm (structured
  analysis: 2nd, 5th, and 10th; object-orientation:
  6th, 10th, and 16th).
• The data models lost the importance they had in the
  first survey (structured analysis: 6th, 8th, and 20th;
  object-orientation: 4th, 9th, 11th, and 22nd).
• The user-point-of-view artifacts are more scattered this
  time, with a preference for the detailed specification of
  the requirements (or use cases) (structured analysis:
  4th, 10th, 14th, and 17th; object-orientation: 5th, 8th,
  16th, and 20th).
• Finally, the little necessity of a general view of the
  system seems partly confirmed (structured analysis:
  13th, 22nd, and 23rd; object-orientation: 15th, 18th,
  and 25th).
A summary of both approaches is more difficult in this
second survey since they differ more. We will nevertheless
point out the presence among the most used artifacts of:
source code and comments, unitary test plan, requirement
(or use case) description, and data model (logical or
physical). Among the differences, we may cite the importance
of the system test plan and components specification only
for the structured analysis paradigm, and of the functional
and non functional prototypes only for object-orientation.
Overall, the two surveys confirm the overwhelming
importance of the source code and comments as documentation
artifacts to help understand a system under maintenance. Data
models (mainly logical and physical) and a description of the
requirements (or use cases) are also two items that were
highly rated and used. To our surprise, documentation
artifacts offering a general view of a system (prominently the
architectural model) were neither well rated nor much used.
This goes against common belief and the other studies
presented in Section 3. We have no definite answer yet on why
our survey differs from others (e.g. Forward and Lethbridge).
A possible explanation would be the difference in population
characterization.
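The percentage of use reported in Table 2 is, for each artifact, the number of maintenances in which it was used divided by the number of maintenances in which it existed. A minimal Python sketch of that computation follows, using four rows of the structured analysis part of the table. The names pct_used, usage_rows and usage_ranking are illustrative assumptions.

```python
def pct_used(exist, used):
    """Percentage of maintenances using an artifact, among those where it existed."""
    if exist == 0:
        return None  # the artifact never existed, so no rate can be computed
    return 100.0 * used / exist


# (artifact, maintenances where it existed, maintenances where it was used)
usage_rows = [
    ("Architectural model", 72, 8),
    ("Comments", 140, 100),
    ("Source code", 142, 135),
    ("Unitary test plan", 96, 72),
]

# Sort by decreasing percentage of use, as in the table.
usage_ranking = sorted(
    ((name, pct_used(exist, used)) for name, exist, used in usage_rows),
    key=lambda item: item[1],
    reverse=True,
)
for name, pct in usage_ranking:
    print(f"{name}: {pct:.1f}%")
```

Dividing by the number of maintenances where the artifact existed, rather than by all 142 maintenances, keeps a rarely produced artifact from looking unused merely because it was unavailable.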
6. CONCLUSION
Software documentation has long been a much discussed
topic in software engineering. Although it has always been
heralded as an important aid to software development and
maintenance, it is notoriously absent or out-dated in many
legacy systems.
Recently, agile methods have somewhat shaken the traditional
view of software documentation, proposing a development
model that relies more on informal communication than on
documentation. We explained, however, that this model
does not suit software maintenance, which still has great
need for documentation.
An old question then arose again: identifying the
documentation artifacts most important to help software
maintainers. We conducted two surveys among software
maintainers to try to settle this issue. In the first survey,
the maintainers were asked what artifacts they thought
important or not; in the second, they were asked what
artifacts they had actually used. In both surveys the subject
population was heterogeneous and, although we have no data to
prove it, seems representative of the broad community of
software maintainers.
Table 2: Actual use of documentation artifacts for 142
maintenances with the structured analysis paradigm and 95
maintenances with object-orientation. The documentation
artifacts are sorted in decreasing order of percentage of use.

Structured analysis
Rank  Artifact                        Exist  Used (#)  Used (%)
   1  Source code                       142       135     95.1%
   2  Unitary test plan                  96        72     75.0%
   3  Comments                          140       100     71.4%
   4  Requirement description           111        66     59.5%
   5  System test plan                   59        35     59.3%
   6  Logical data model (MER)          128        72     56.3%
   7  Component specification            67        35     52.2%
   8  Physical data model               127        61     48.0%
   9  Implantation plan                  82        37     45.1%
  10  Acceptance test plan               77        34     44.2%
  11  Data dictionary                    85        37     43.5%
  12  Functions derived from requir.     29        11     37.9%
  13  General transaction diagram        32        12     37.5%
  14  User manual                       131        48     36.6%
  15  Hierarchical functions diagram     39        14     35.9%
  16  Data flow diagram                  51        17     33.3%
  17  Requirement list                   78        25     32.1%
  18  Functional prototype               51        13     25.5%
  19  Data migration plan                49        12     24.5%
  20  Conceptual data model              82        19     23.2%
  21  Non functional prototype           56        10     17.9%
  22  Architectural model                72         8     11.1%
  23  Context diagram                    68         7     10.3%
  24  Glossary                           50         5     10.0%

Object-orientation
Rank  Artifact                        Exist  Used  Used %
   1  Source code                        95    63   66.3%
   2  Non functional prototype           43    27   62.8%
   3  Comments                           81    50   61.7%
   4  Logical data model (MER)           85    45   52.9%
   5  Use case specification             87    43   49.4%
   6  Unitary test plan                  72    33   45.8%
   7  Functional prototype               59    27   45.8%
   8  Use case diagram                   87    38   43.7%
   9  Physical data model                83    35   42.2%
  10  System test plan                   67    26   38.8%
  11  Class diagram                      80    31   35.4%
  12  Implantation plan                  48    17   33.3%
  13  Activity diagram                   57    19   28.9%
  14  Data dictionary                    76    22   28.6%
  15  Component diagram                  28     8   28.4%
  16  Acceptance test plan               67    19   27.3%
  17  Sequence diagram                   77    21   27.1%
  18  Vision document                    59    16   26.7%
  19  Glossary                           60    16   21.9%
  20  User manual                        64    14   21.6%
  21  State diagram                      37     8   17.1%
  22  Conceptual data model              41     7   11.1%
  23  Data migration plan                27     3    9.5%
  24  Collaboration diagram              50     4    8.0%
  25  Distribution diagram               18     1    5.6%
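
The percentage column of Table 2 can be read as the number of maintenances in which an artifact was used divided by the number in which it existed. The sketch below illustrates that reading on four rows copied from the structured-analysis half of the table. The formula is an assumption inferred from the table's values, and the function names are hypothetical, not part of the survey tooling.

```python
# Rows copied from the structured-analysis half of Table 2:
# (artifact, maintenances where it existed, maintenances where it was used).
ROWS = [
    ("Comments", 140, 100),
    ("Source code", 142, 135),
    ("Glossary", 50, 5),
    ("Unitary test plan", 96, 72),
]


def usage_percent(exist: int, used: int) -> float:
    """Assumed formula: share of use among maintenances where the artifact existed."""
    return round(100.0 * used / exist, 1)


def rank_by_use(rows):
    """Sort rows by decreasing percentage of use, as Table 2 does."""
    scored = [(name, exist, used, usage_percent(exist, used))
              for name, exist, used in rows]
    return sorted(scored, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    for rank, (name, exist, used, pct) in enumerate(rank_by_use(ROWS), start=1):
        print(f"{rank:2d}  {name:<20} {exist:4d} {used:4d} {pct:5.1f}%")
```

Normalizing by the "Exist" count rather than by the total number of maintenances keeps an artifact that is rarely produced from being penalized for its absence.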
The surveys confirmed that source code and comments
are the most important artifacts for understanding a system
to be maintained. Data models and requirement descriptions
were other important artifacts. Surprisingly, and contrary
to what we found in the literature, architectural models and
other general views of the system appear not to be very
important. This could simply indicate that such documentation
artifacts are used once, to gain a global understanding of the
system, and never consulted again afterwards. This explanation
rests on the difference between quantity (how often an
artifact is used) and quality (how much it matters): the
architectural model is little used, but could nevertheless be
important.
Further research is needed to establish, among other things,
whether the use of source code and comments depends on the
existence of other documentation artifacts, and whether
experience in maintenance or with the system maintained
has an impact on the documentation used.
7. REFERENCES
[1] V. Manhães Teles. Extreme Programming. Novatec
Editora Ltda, Rua cons. Moreira de Barros, 1084,
conj. 01, São Paulo, SP, 02018-012, Brazil, 2004.
ISBN: 85-7522-047-0.
[2] S. W. Ambler. Agile documentation. Available on the
Internet at: http://www.agilemodeling.com/essays/agileDocumentation.htm,
2001-2005. Last accessed on May 27, 2005.
[3] N. Anquetil, K. M. Oliveira, A. G. dos Santos, P. C.
da Silva jr., L. C. de Araujo jr., and S. D. Vieira. A
tool to automate re-documentation. In Forum of the
CAISE, Conference on Advanced Information Systems
Engineering (CAiSE’05), jun. 15 2005. accepted for
publication.
[4] M. João Sousa. A survey on the software maintenance
process. In International Conference on Software
Maintenance, ICSM’98, pages 265–74. IEEE, IEEE
Comp. Soc. Press, Mar. 1998.
[5] L. C. Briand. Software documentation: How much is
enough. In Proceedings of the Seventh European
Conference on Software Maintenance and
Reengineering (CSMR’03), pages 13–17. IEEE, IEEE
Comp. Soc. Press, March 26 - 28 2003.
[6] F. A. Cioch and M. Palazzolo. A documentation suite
for maintenance programmers. In Proceedings of the
1996 International Conference on Software
Maintenance (ICSM’96), pages 286–95. IEEE, IEEE
Comp. Soc. Press, Nov 1996.
[7] A. Forward and T. C. Lethbridge. The relevance of
software documentation, tools and technologies: a
survey. In DocEng ’02: Proceedings of the 2002 ACM
symposium on Document engineering, pages 26–33,
New York, NY, USA, 2002. ACM Press.
[8] P. Grubb and A. Takang. Software Maintenance:
Concepts and Practice. World Scientific Publishing
Co., Singapore, 2nd edition, 2003.
[9] HCI. What to put in software maintenance
documentation. Available on the Internet at:
http://www.hci.com.au/hcisite2/journal/
What to put in software maintenance documenta-
tion.htm, 2001–2002. Last accessed on May 27,
2005.
[10] S. Huang and S. Tilley. Towards a documentation
maturity model. In SIGDOC ’03: Proceedings of the
21st annual international conference on
Documentation, pages 93–99, New York, NY, USA,
2003. ACM Press.
[11] M. Kajko-Mattsson. The state of documentation
practice within corrective maintenance. In Proceedings
of the International Conference on Software
Maintenance (ICSM’01), pages 354–363. IEEE, IEEE
Comp. Soc. Press, Nov. 07-09 2001.
[12] B. A. Kitchenham, G. H. Travassos, A. von
Mayrhauser, F. Niessink, N. F. Schneidewind,
J. Singer, S. Takada, R. Vehvilainen, and H. Yang.
Towards an ontology of software maintenance. Journal
of Software Maintenance: Research and Practice,
11:365–389, 1999.
[13] M. Lehman. Programs, life cycles and the laws of
software evolution. Proceedings of the IEEE,
68(9):1060–76, sept. 1980.
[14] M. Lindvall, V. R. Basili, B. W. Boehm, P. Costa,
K. Dangle, F. Shull, R. Tesoriero, L. A. Williams, and
M. V. Zelkowitz. Empirical findings in agile methods.
In Proceedings of the Second XP Universe and First
Agile Universe Conference on Extreme Programming
and Agile Methods - XP/Agile Universe 2002, pages
197–207, London, UK, 2002. Springer-Verlag.
[15] S. L. Pfleeger. Software Engineering: Theory and
Practice. Prentice Hall, 2nd edition, 2001.
[16] V. Phoha. A standard for software documentation.
Computer, 30(10):97–98, Oct. 1997.
[17] T. M. Pigoski. Practical Software Maintenance: Best
Practices for Software Investment. John Wiley &
Sons, Inc., 1996.
[18] C. J. Poole, T. Murphy, J. W. Huisman, and
A. Higgins. Extreme maintenance. In International
Conference on Software Maintenance, ICSM’01, pages
301–10. IEEE, IEEE Comp. Soc. Press, Nov. 2001.
[19] R. S. Pressman. Software Engineering: A
Practitioner’s Approach. McGraw-Hill, 5th edition,
2001.
[20] V. Rajlich. Incremental redocumentation using the
web. IEEE Software, 17(5):102–6, Sep 2000.
[21] R. C. Seacord, D. Plakosh, and G. A. Lewis.
Modernizing Legacy Systems – Software technologies,
engineering processes, and business practices.
Addison-Wesley, 2003.
[22] B. Thomas and S. Tilley. Documentation for software
engineers: what is needed to aid system
understanding? In SIGDOC ’01: Proceedings of the
19th annual international conference on Computer
documentation, pages 235–236, New York, NY, USA,
2001. ACM Press.
[23] S. Tilley and H. Müller. Info: a simple document
annotation facility. In SIGDOC ’91: Proceedings of the
9th annual international conference on Systems
documentation, pages 30–36, New York, NY, USA,
1991. ACM Press.
[24] S. R. Tilley. Documenting-in-the-large vs.
documenting-in-the-small. In Proceedings of
CASCON’93, pages 1083–90. IBM Centre for
Advanced Studies, Oct. 1993.
[25] S. R. Tilley, H. A. Müller, and M. A. Orgun.
Documenting software systems with views. In
Proceedings of the 10th International Conference on
Systems Documentation, SIGDOC’92, pages 211–19.
ACM, ACM Press, Oct 1992.