Proposal for a Scientific Software Lifecycle Model
Mathematics and Computer Science Division
Argonne National Laboratory
Lemont, IL 60439
Flash Center for Computational Science
University of Chicago
Chicago, IL 60637
Lois Curfman McInnes
Mathematics and Computer Science Division
Argonne National Laboratory
Lemont, IL 60439
Abstract—Improvements in computational capabilities have
led to rising complexity in scientific modeling, simulation, and
analytics and thus the software implementing them. In addition,
a paradigm shift in platform architectures has added another
dimension to complexity, to the point where software produc-
tivity (or the time, effort, and cost for software development,
maintenance, and support) has emerged as a growing concern for
computational science and engineering. Clearly communicating
about the lifecycle of scientiﬁc software provides a foundation
for community dialogue about processes and practices for various
lifecycle phases that can improve developer productivity and soft-
ware sustainability—key aspects of overall scientiﬁc productivity.
While the mainstream software engineering community has
produced lifecycle models that meet the needs of software projects
in business and industry, none of the available models adequately
describes the lifecycle of scientiﬁc computing software. In par-
ticular, software for end-to-end computations for obtaining sci-
entiﬁc results has no formalized development model. Examining
development approaches employed by teams implementing large
multicomponent codes reveals a great deal of similarity in their
strategies. In earlier work, we organized related approaches into
workﬂow schematics, with loose coupling between submodels for
development of scientiﬁc capabilities and reusable infrastructure.
Here we consider an orthogonal approach, formulating models
that capture the workﬂow of software development in slightly
different scenarios, and we propose a scientiﬁc software lifecycle
model based on agile principles.
Index Terms—scientiﬁc computing, software engineering, soft-
The topic of software lifecycles for business and commercial
software is well researched with many models that meet the
needs of different types of projects. A lifecycle model decom-
poses software development into distinct phases, where each
phase has its own requirements, speciﬁcations, and methods.
Many reasons make such decomposition into phases desirable.
For example, each phase can control its own quality and result
in higher quality software overall. Similarly, in larger projects,
phases can help deﬁne roles for developers and bring clarity
to the development process. Some standard lifecycle models
are waterfall, where each stage is completed before the
next stage can begin; V-shaped, which is an extension
of the waterfall model that also incorporates testing phases
for each development stage; iterative, where development
stages are cycled over for subsets of requirements until the
final product is obtained; spiral, where iterations occur for
ongoing and new requirements; big bang, where development
occurs without a defined process; and agile, which allows
cycling through any group of phases and emphasizes incremental
changes. (See ,  for a general description of various
software lifecycle models.)
Because of the unique requirements of scientiﬁc software,
a mismatch exists between the needs of scientiﬁc software
developers and the theory of mainstream software engineering.
Aspects of some lifecycle models apply; for example, the gen-
eral philosophy of the agile approach ﬁts well. But the methods
used in nonscientiﬁc software under the agile approach do not
ﬁt nearly as well for scientiﬁc code. The biggest challenge
in having well-deﬁned methods and timelines for scientiﬁc
software development is that often the numerical methods and
abstractions being used in implementations are themselves
the subject of research and are therefore not fully specified ahead
of time. There have been efforts to adopt the agile model
for research-driven software. For example, the TriBITS 
effort has produced a package that also incorporates an agile
lifecycle model for research-driven software development.
The model addresses the concerns of software that later
becomes a component in a larger software collection downstream. The
Blue Brain Project is adapting this model for its
own computational needs. In general, this model is suitable
for software that implements research ideas and becomes a
reusable component in other larger collections of interoperable
However, many projects exist in scientiﬁc domains where
the primary software objective is to be the means for con-
ducting research instead of being the product or the goal
of the research. End-to-end simulation codes fall into this
category. They may use libraries and other third-party software
as components, but the codes have different usage models
and user expectations. In most successful scientiﬁc software
development projects, there is an implicit understanding of
the software lifecycle, even if it is not articulated. In earlier
work  presented as an idea paper at the WSSSPE work-
shop in 2016, we devised schematics of scientiﬁc software
development workﬂows with a view toward engaging the
community in examining this important aspect of software
productivity. Here we reﬁne ideas from the earlier work and
take the next step toward devising a lifecycle model applicable
to simulation software and data analytics associated with
scientiﬁc simulations. Our methodology follows a three-step
process. Requirement gathering, design, implementation, and
veriﬁcation & validation are four distinct phases in a typical
development cycle for all kinds of software. In the ﬁrst step
we map the activities during the development of scientiﬁc
software to these well known and well understood phases. In
the second step we examine the existing lifecycle models and
evaluate their applicability to the conceptualized workﬂows.
In the third step we use the insights from the ﬁrst two steps
to propose a lifecycle model that covers important aspects of
scientiﬁc software development and maintenance.
II. WORKFLOW FOR SCIENTIFIC SOFTWARE DEVELOPMENT
We begin by looking at the workﬂow for typical multi-
physics scientiﬁc software development projects. Examples of
such development projects include FLASH, Uintah,
Pluto, Ramses, Cactus, and many more. All of
these codes use high performance computing (HPC) platforms
for running simulations. Figure 1 captures essential features
of the workﬂow that has been implicit during development
in many such projects. All boxes in the ﬁgure can be, and
usually are, under research. The research topic may be of
general interest going beyond the project, such as numerical
methods being applied, or it may be driven by the needs of the
project itself. Many feedback loops in the workﬂow indicate
ongoing research and reﬁnement in corresponding sections of
the workﬂow based on insights gained during the project.
The process starts with devising a mathematical model for
the phenomena of interest. The equations are discretized, and
numerical methods are devised and implemented for solving
the equations. Here the workﬂow for scientiﬁc software de-
velopment begins to diverge from that of mainstream software
development. The veriﬁcation of scientiﬁc software addresses
not only expected behavior, but also convergence and stability
of the numerical solvers. A failure in either takes the workﬂow
back to numerical solvers, which may need to be revised or
redesigned. Similarly, validation of output against observations
may reveal that the discretization or approximations used in
the mathematical model are inadequate, which in turn can
completely reset the workﬂow to the ﬁrst step.
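The convergence part of verification can be illustrated with a small, self-contained sketch. The forward Euler integrator and the test problem y' = -y used below are illustrative assumptions, not taken from any of the codes named above. The point is the check itself: the observed order of accuracy is estimated from the errors at successive step sizes, and a mismatch with the expected order is the kind of failure that sends the workflow back to the numerical solver.

```python
import math


def forward_euler(f, y0, t_end, n_steps):
    """Integrate y' = f(t, y) from t = 0 to t_end with n_steps Euler steps."""
    h = t_end / n_steps
    t, y = 0.0, y0
    for _ in range(n_steps):
        y += h * f(t, y)
        t += h
    return y


def observed_orders(solver, exact, step_counts):
    """Estimate the order of accuracy from errors at successive refinements."""
    errors = [abs(solver(n) - exact) for n in step_counts]
    orders = []
    for i in range(len(step_counts) - 1):
        refinement = step_counts[i + 1] / step_counts[i]
        orders.append(math.log(errors[i] / errors[i + 1]) / math.log(refinement))
    return orders


def verify_convergence(orders, expected_order, tol=0.1):
    """Return True when every observed order is within tol of the expected one."""
    return all(abs(p - expected_order) <= tol for p in orders)


# Hypothetical test problem: y' = -y, y(0) = 1, with exact solution exp(-t).
step_counts = [100, 200, 400, 800]
orders = observed_orders(
    lambda n: forward_euler(lambda t, y: -y, 1.0, 1.0, n),
    math.exp(-1.0),
    step_counts,
)
converges_as_expected = verify_convergence(orders, expected_order=1.0)
```

Forward Euler is first-order accurate, so the check passes for an expected order of 1 and fails if second-order behavior is demanded. In a real project the same pattern would wrap the production solver and a problem with a known or manufactured solution.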
Figure 1 illustrates a rough mapping of the workﬂow for
software development of scientiﬁc capabilities onto four basic
phases: requirements gathering, design, implementation, and
veriﬁcation. Setting aside issues such as release, maintenance,
and user support, these phases apply to any standard software
process. What differs in the realm of scientiﬁc software is the
feedback among various phases. From the perspective of these
distinct phases of development, the workﬂow can be simpliﬁed
into interactions among the phases as shown in Figure 2. Each
phase in the ﬁgure shows entities that are resolved in the
Fig. 1. Workﬂow for developing multiphysics software: overall perspective.
Fig. 2. Interaction among development phases for multiphysics software:
Another important aspect of scientiﬁc software functionality
is reusable infrastructure, or using a loose deﬁnition of the
word framework (e.g., see ), the entity that provides basic
services (such as data structures related to discretizations and
data layout, parallelization, and I/O), enables composability,
and allows orchestration of calculations. A ﬂexible and exten-
sible framework is a critical component of scientiﬁc software,
with unique requirements in its development cycle, therefore
deserving its own separate workﬂow and design space. Frame-
work development comes closest to other general business
and commercial software, in that the control ﬂow from one
phase to another is linear, as shown in Figure 3. This reusable
infrastructure is the most stable part of the resulting scientiﬁc
software, and once it has been implemented, a change to
the framework is a major undertaking. Modiﬁcations to a
framework would normally require starting at the requirement
gathering phase. Not surprisingly, the diagram of interaction
among phases is also linear, as shown in Figure 4.
Fig. 3. Workflow for developing multiphysics software: infrastructure.
Fig. 4. Interaction among development phases for multiphysics software:
We also examine the scientific process workflow from the
perspective of data used in computational science, which
may be data generated by or input to simulations, or ob-
servational data used for validation, or all used together for
advancing scientific understanding. Some of the processes
are similar to computations, e.g., much of the analysis starts
with a hypothesis, models are mathematical, and analysis
may involve numerical methods also used in simulations.
Some other processes are different, for example archiving
and retrieval, which have no equivalent in simulations. The
schematic in Figure 5 captures the workﬂow characteristics of
scientiﬁc data management and analysis, and Figure 6 shows
the corresponding phase interactions. More feedback loops
exist for data analytics than in framework design because
insights and inferences can lead to modifying or replacing
algorithms used in analysis (similar to the feedback loops for
scientiﬁc capabilities, shown in Figure 2).
III. EXISTING SOFTWARE LIFECYCLE MODELS
We now examine existing software lifecycle models to see
what, if any, mapping is possible between the models and the
development workﬂow of scientiﬁc software.
Fig. 5. Workflow for developing multiphysics software: data management
Fig. 6. Interaction among development phases for multiphysics software: data
management and analytics.
a) Waterfall model: is the simplest of the software
lifecycle models, and it is also the least applicable to scientific
software. The main reason is that it relies on a relationship
between phases in which the next phase cannot begin until
the previous phase is complete. Because scientific software is by
design meant for exploration of new ideas and insights, linear
progression is incompatible with its goals. Framework design
and development comes the closest to being able to follow this
model.
b) V-shaped model: differs from the waterfall model
in having testing phases corresponding to each development
phase. Because it also requires all requirements to be known up front,
it has similar drawbacks for adoption by scientiﬁc software as
the waterfall model.
c) Iterative model: operates by allowing the waterfall
model to proceed for a subset of requirements, and then going
back to the beginning for the next set of requirements. It
overcomes one problem of the other two models discussed
so far in that it allows going back to the ﬁrst phase. However,
it still lacks the flexibility to accommodate the evolving requirements
that can arise in scientific software.
d) Spiral model: is a reﬁnement of the iterative model
where phases are repeated for previously implemented require-
ments as well as new requirements over and over until the
project objectives are met. However, this model is still not
adequate for scientific software, because, as seen in Figure 2,
feedback loops exist among multiple phases in the workflow,
so the spiral may end up folding back on itself.
Fig. 7. Overall lifecycle model for scientific software derived by analyzing
workflow in Figure 1 and mapping it to lifecycle phases in Figure 2. Figures 4
and 6 show details of interactions among development phases of infrastructure
e) Big bang model: is the model in which the vast
majority of scientiﬁc software development projects have
operated until recently. This model does not have a well
deﬁned process or requirements and thus is inherently risky.
This model can be acceptable for small projects with just a
few developers; however, as demonstrated by the many failed
large projects in the scientiﬁc world, it clearly does not apply
to any moderate to large project.
f) Agile model: comes closest to being applicable to
scientiﬁc software development because it allows cycling
through any group of phases and emphasizes incremental
changes. Its philosophy applies, though many of the methods
that implement the philosophy do not. For example, sprints
have very little use in software that is used for research and
is being researched.
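The argument of this section can be made concrete with a small sketch. Below, three of the models are encoded as rules on which phase-to-phase moves they permit, and a development trace containing a feedback loop is checked against each. The encoding is a deliberate simplification introduced here for illustration: the V-shaped, spiral, and big bang models are omitted, and the phase names, function names, and trace are hypothetical.

```python
PHASES = ["requirements", "design", "implementation", "verification"]


def waterfall_allows(src, dst):
    """Waterfall: only a step to the immediately following phase is allowed."""
    return PHASES.index(dst) == PHASES.index(src) + 1


def iterative_allows(src, dst):
    """Iterative: waterfall steps, plus a restart from the last phase to the first."""
    return waterfall_allows(src, dst) or (src == PHASES[-1] and dst == PHASES[0])


def agile_allows(src, dst):
    """Agile: cycling through any group of phases, so any move between distinct phases."""
    return src != dst


def admits(allows, trace):
    """Check whether every consecutive pair of phases in a trace is an allowed move."""
    return all(allows(a, b) for a, b in zip(trace, trace[1:]))


# Hypothetical trace: a failed convergence check sends the work back to design.
trace = [
    "requirements", "design", "implementation", "verification",
    "design", "implementation", "verification",
]
results = {
    "waterfall": admits(waterfall_allows, trace),
    "iterative": admits(iterative_allows, trace),
    "agile": admits(agile_allows, trace),
}
```

Under these assumptions only the agile rule admits the trace, because the move from verification back to design is neither a forward step nor a restart from the first phase.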
IV. PROPOSED LIFECYCLE MODEL
The proposed lifecycle model for scientiﬁc software, shown
in Figure 7, is derived from agile methodology and includes
steps beyond the initial development cycle discussed in Section
II. The initial development phase is taken from Figure 2, since
the scientiﬁc capabilities of multiphysics software have the
most demanding interaction among phases. The other aspects
of scientific software (infrastructure, shown in Figure 4, and
analysis, shown in Figure 6) involve only a subset of these
interactions among phases. Two-way arrows represent tight
coupling and feedback loops that exist among various phases
in the development cycle. Note that any of the arrows or phases
can be nulliﬁed in a traversal of the cycle depending upon the
need of the project.
Boxes outside the development phase represent later stages
in the software lifecycle, with their arrows pointing to the
phase where they are more likely to plug into the development
cycle. For example, the two-way arrow between the maintenance
and release boxes indicates user interactions, with issues and
bugs reported back. Sometimes the issues may be resolved
with discussion; at other times a new implementation may
be needed, hence the arrow from the maintenance box to
the implementation box. Normally the implementation phase will
resolve most reported issues and bugs; occasionally, however,
the severity of an issue may require going back earlier in
the development cycle, to the design or even the requirements
gathering phase. The diagonal arrows among the boxes permit
escalation of development complexity as needed. Similarly,
capability addition is normally expected to plug into the
development cycle at the design phase, while integration of
new research is likely to require going back all the way to
requirements gathering. Because this model permits nullifying
arrows and phases as needed, it provides the ﬂexibility of
bypassing one or more phases for either capability addition or
integrating new research if needed. Therefore, for any stage in
software development, the cycle can be made simpler or more
complex depending upon the needs of the moment. Similar to
the agile methodology, our model supports frequent releases
whenever there is a stable code version.
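The entry points and the nullification of phases described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a definitive encoding of Figure 7: the event names, the `plan_traversal` function, and its parameters are hypothetical, and the sketch plans only a single forward pass, leaving out the two-way feedback arrows inside the development cycle.

```python
DEVELOPMENT_CYCLE = ["requirements", "design", "implementation", "verification"]

# Typical entry points as described in the text; the event names are hypothetical.
DEFAULT_ENTRY = {
    "bug_report": "implementation",
    "capability_addition": "design",
    "new_research": "requirements",
}


def plan_traversal(event, escalate_to=None, nullified=()):
    """List the phases visited in one pass through the development cycle.

    event       -- key of DEFAULT_ENTRY that triggers the pass
    escalate_to -- phase to enter at instead of the default, for example when
                   a bug is severe enough to require redesign; it takes effect
                   only if it is earlier in the cycle than the default entry
    nullified   -- phases skipped in this pass
    """
    start = DEVELOPMENT_CYCLE.index(DEFAULT_ENTRY[event])
    if escalate_to is not None:
        # Escalation only ever moves the entry point earlier in the cycle.
        start = min(start, DEVELOPMENT_CYCLE.index(escalate_to))
    return [p for p in DEVELOPMENT_CYCLE[start:] if p not in nullified]


routine_fix = plan_traversal("bug_report")
severe_fix = plan_traversal("bug_report", escalate_to="design")
research_shortcut = plan_traversal("new_research", nullified={"design"})
```

A routine bug report enters at implementation, a severe one is escalated to design, and a new research idea that needs no redesign bypasses the design phase while still starting from requirements gathering.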
V. CONCLUSIONS AND FUTURE WORK
Through the process of mapping typical workﬂows for
development of scientiﬁc software, especially as it applies
to the most complex multiphysics codes, we have unraveled
the dependencies and feedback loops within the lifecycle of
such software. We have synthesized a lifecycle model that,
by permitting null instances of phases and connecting arrows,
uniﬁes many complex workﬂows into a simple schematic. This
lifecycle model captures the essential features and phases of
the most complex scientiﬁc software development.
One aspect of scientiﬁc software that we have not addressed
in our current model is that of refactoring existing software.
The software in question could be legacy code or well-constructed
software that nevertheless has to be refactored because
of the exigencies of platform requirements. An important
common feature of such development is that the new structure
needs to be built while retaining large chunks of original
code. This approach provides a path to incremental adoption,
necessary in most scientiﬁc refactoring projects. Considering
a lifecycle model for refactoring will be a next step.
This work was supported by the U.S. Department of
Energy Ofﬁce of Science Ofﬁce of Advanced Scientiﬁc
Computing Research. The submitted manuscript has been
created by UChicago Argonne, LLC, Operator of Argonne
National Laboratory (Argonne). Argonne, a U.S. Department
of Energy Ofﬁce of Science laboratory, is operated under
Contract No. DE-AC02-06CH11357. The U.S. Government
retains for itself, and others acting on its behalf, a paid-up
nonexclusive, irrevocable worldwide license in said article to
reproduce, prepare derivative works, distribute copies to the
public, and perform publicly and display publicly, by or on
behalf of the Government. The Department of Energy will
provide public access to these results of federally sponsored
research in accordance with the DOE Public Access Plan.
 Agile methodology. http://agilemethodology.org/.
 R. A. Bartlett, M. A. Heroux, and J. M. Willenbring. Overview of
the TriBITS lifecycle model: A lean/agile software lifecycle model for
research-based computational science and engineering software. In E-
Science (e-Science), 2012 IEEE 8th International Conference on, pages
1–8. IEEE, 2012.
 M. Berzins, J. Luitjens, Q. Meng, T. Harman, C. Wight, and J. Peterson.
Uintah - a scalable framework for hazard analysis. In TG ’10: Proc. of
2010 TeraGrid Conference, New York, NY, USA, 2010. ACM.
 B. W. Boehm. A spiral model of software development and enhance-
ment. Computer, 21(5):61–72, 1988.
 Cactus Computational Toolkit, 2013.
 A. Dubey, K. Antypas, M. Ganapathy, L. Reid, K. Riley, D. Sheeler,
A. Siegel, and K. Weide. Extensible component-based architecture for
FLASH, a massively parallel, multiphysics simulation code. Parallel
Computing, 35(10-11):512–522, 2009.
 A. Dubey and L. McInnes. Idea paper: Software lifecycle for scientiﬁc
simulation software. Working Towards Sustainable Software for Science:
Practice and Experience (WSSSPE4),
http://wssspe.researchcomputing.org.uk/wp-content/uploads/2016/06/WSSSPE4_paper_16.pdf.
 M.-O. Gewaltig and R. Cannon. Current practice in software devel-
opment for computational neuroscience and how to improve it. PLoS
Comput Biol, 10(1):e1003376, 2014.
 D. E. Keyes, L. C. McInnes, C. Woodward, W. Gropp, E. Myra, M. Per-
nice, et al. Multiphysics simulations: Challenges and opportunities.
The International Journal of High Performance Computing Applications,
 H. Markram. The blue brain project. Nature Reviews Neuroscience,
 A. Mignone, C. Zanni, P. Tzeferacos, B. van Straalen, P. Colella, and
G. Bodo. The PLUTO Code for Adaptive Mesh Computations in
Astrophysical Fluid Dynamics. The Astrophysical Journal Supplement
Series, 198:7, Jan. 2012.
 K. Petersen, C. Wohlin, and D. Baca. The waterfall model in large-scale
development. In International Conference on Product-Focused Software
Process Improvement, pages 386–400. Springer, 2009.
 Robert Half International. 6 basic SDLC methodologies:
Which one is best?
https://www.roberthalf.com/technology/blog/6-basic-sdlc-methodologies-the-pros-and-cons.
 I. Spence and K. Bittner. What is iterative development? https://www.
 R. Teyssier. Cosmological hydrodynamics with adaptive mesh reﬁne-
ment. A new high resolution code called RAMSES. Astronomy and
Astrophysics, 385:337–364, Apr. 2002.
 Tutorials Point. SDLC: Big bang model.
https://www.tutorialspoint.com/sdlc/sdlc_bigbang_model.htm.
 Tutorials Point. SDLC: V-model.
https://www.tutorialspoint.com/sdlc/sdlc_v_model.htm.
 Tutorials Point. Software development life cycle tutorial. http://www.