How Do Scientists Develop and Use Scientific Software?
Jo Erskine Hannay
Dept. of Software Engineering
Simula Research Laboratory
Dept. of Informatics, Univ. of Oslo
johannay@simula.no
Hans Petter Langtangen
Center for Biomedical Computing
Simula Research Laboratory
Dept. of Informatics, Univ. of Oslo
hpl@simula.no
Carolyn MacLeod
Dept. of Computer Science
University of Toronto
cmacleod@cs.utoronto.ca
Dietmar Pfahl
Dept. of Software Engineering
Simula Research Laboratory
Dept. of Informatics, Univ. of Oslo
dietmarp@simula.no
Janice Singer
Software Engineering Group
National Research Council of Canada
janice.singer@nrc-cnrc.gc.ca
Greg Wilson
Dept. of Computer Science
University of Toronto
gvwilson@cs.utoronto.ca
Abstract

New knowledge in science and engineering relies increasingly on results produced by scientific software. Therefore, knowing how scientists develop and use software in their research is critical to assessing the necessity for improving current development practices and to making decisions about the future allocation of resources. To that end, this paper presents the results of a survey conducted online in October–December 2008 which received almost 2000 responses. Our main conclusions are that (1) the knowledge required to develop and use scientific software is primarily acquired from peers and through self-study, rather than from formal education and training; (2) the number of scientists using supercomputers is small compared to the number using desktop or intermediate computers; (3) most scientists rely primarily on software with a large user base; (4) while many scientists believe that software testing is important, a smaller number believe they have sufficient understanding of testing concepts; and (5) scientists tend to rank standard software engineering concepts higher if they work in large software development projects and teams, but there is no uniform trend of association between the rank of importance of software engineering concepts and project/team size.
1. Motivation

There is probably not a single scientist who has not, at some point in time, used a software system to analyze, visualize, or simulate processes or data. Many scientists use such software daily, while others develop it for their own use or for a wider community.

As many researchers have pointed out [2, 3, 6, 8], there is a wide chasm between the general computing community and the scientific computing community. As a result, there has been little exchange of ideas relevant to scientific application software.

One reason for this is that in scientific computing, a developer must have intimate knowledge of the application domain (i.e., the science), whereas in “regular” software development (of, say, an enterprise resource planning system), developers are much less likely to need to be domain experts. It follows that a scientific software developer is likely to be among the end-users, whereas a developer in “traditional” software engineering is most likely not. Scientific software is also often explorative: the purpose of the software is usually to help in understanding a new problem, implying that up-front specification of software requirements is difficult or impossible [9]. This may inhibit the initiation of fruitful collaboration between software engineers and scientists.

Whether these facts are the cause of differences in the development processes of scientific software versus those of other software, and whether one should apply software engineering processes to the development of scientific software, is the subject of active investigation. However, it is obvious that there are differences in development process, and in the roles of those who develop and use different kinds of research software.

The aim of this study was therefore to investigate how the majority of working scientists develop and use scientific software in their day-to-day work. Our overall research questions are listed below.
RQ1. How did scientists learn what they know about developing/using scientific software?

RQ2. When did scientists learn what they know about developing/using scientific software?

RQ3. How important is developing/using scientific software to scientists?

RQ4. How much of their working time do scientists spend on developing/using scientific software?

RQ5. Do scientists spend more time developing/using scientific software than in the past?

RQ6. On what scale of hardware do scientists develop/use scientific software?

RQ7. What are the sizes of the user communities of scientific software?

RQ8. How familiar are scientists with standard concepts of software engineering?

RQ9. Does program size, time spent on programming, or team size influence scientists’ opinions about the importance of good software development practices?
2. Research Method

From the research questions, we formulated our expectations with regard to each question as a set of hypotheses. These hypotheses, in turn, gave rise to questionnaire items that were given in an online survey, which we advertised through mailing lists, bulletin boards, word of mouth, and advertisements in both the online and print editions of American Scientist magazine [1].
3. Results

3.1. Demographics

1972 usable responses were collected between October and December 2008. Most scientists classified themselves into the age ranges 18 to 30 years (649 responses—33%) or 30 to 40 years (681 responses—34%). The remaining age ranges (40 to 50 years, 50 to 60 years, over 60 years) received 343, 187, and 97 responses, respectively. Fifteen respondents did not disclose their age.

Valid responses were received from a total of 40 countries. More than 50% of all responses came from five countries: United States, Canada, United Kingdom, Germany, and Norway, with 579, 136, 136, 117, and 99 responses, respectively. Grouped by continent, most responses came from Europe and North America with 725 responses (including Russia and Turkey) and 715 responses, respectively, followed by Australia/New Zealand and Asia with 66 and 57 responses, respectively. The number of responses from South America, Central America, and Africa was below 50 for all three geographic regions taken together.

About two thirds of the respondents stated that their highest academic degree is a Ph.D. (or equivalent). About 20% stated they have an M.Sc. degree (or equivalent), and about 10% stated they have a B.Sc. degree (or equivalent).

About 50% of the respondents stated that they are academic researchers (professors, post-docs, or similar), followed by graduate students (25%), programmers (20%), government research scientists (16%), engineers (15%), software engineers (13%), teachers (12%), managers/supervisors (8%), industrial research scientists (7%), system administrators (7%), laboratory technicians (3%), and clinicians (1%). The percentages do not add up to 100% because scientists were given the opportunity to check more than one option to describe their current occupation.
3.2. Main Findings

For each research question, we present our associated expectations in the form of hypotheses, and then give the relevant survey results.

We refined RQ1 into the following two hypotheses:

H1a. Most scientists learn most of what they know about developing software on their own or informally from their peers, rather than through formal training.

H1b. Most scientists learn most of what they know about using software on their own or informally from their peers, rather than through formal training.

Survey results:

H1a: Using a five-point scale (‘not at all important’, ‘not important’, ‘somewhat important’, ‘important’, ‘very important’), 96.9% of the responses state that informal self-study is important or very important for developing software. 60.1% state that informal learning from peers is important or very important. Only 34.4% state that formal education at an educational institution is important or very important, and only 13.1% state that formal training at work is important or very important.

H1b: 96.5% of the responses state that informal self-study is important or very important for using software. 69.4% state that informal learning from peers is important or very important. Only 26.6% state that formal education at an educational institution is important or very important, and only 17.1% state that formal training at work is important or very important.
We refined RQ2 into the following two hypotheses:

H2a. Most scientists learn what they know about developing software early in their careers (undergraduate and graduate degrees, or equivalent years in industry).

H2b. Most scientists learn what they know about using software early in their careers (undergraduate and graduate degrees, or equivalent years in industry).

Survey results:

H2a: One finding related to hypothesis H1a was that formal education is important, or very important, for only 34.4% of respondents. On a more detailed level, the survey results indicate that the importance of graduate studies is clearly greater than that of undergraduate studies which, in turn, is clearly greater than that of high school studies. Another finding related to hypothesis H1a was that formal training at work was considered important or very important by only 13.1%. Independently of whether scientists received formal training at the workplace or learned informally through self-study or from peers, there is the following trend: the last five years of professional work are more important for learning about developing scientific software than the periods six to ten years ago, eleven to 15 years ago, or longer than 15 years ago. However, this trend is not as pronounced as the effect of recency (and thus higher level) of formal education.

H2b: Similar to the results for H2a, the importance of graduate studies is clearly greater than that of undergraduate studies which, in turn, is clearly greater than that of high school studies. Also, the importance of learning about software development during professional work increases slightly for more recent professional work.
We refined RQ3 into the following two hypotheses:

H3a. Developing scientific software is important to the majority of scientists.

H3b. Using scientific software is important to the majority of scientists.

Survey results:

H3a: 84.3% of the responses state that developing scientific software is important or very important for their own research. 46.4% state that developing scientific software is important or very important for the research of others.

H3b: 91.2% of the responses state that using scientific software is important or very important for their own research. (We did not ask whether the use of scientific software might be important for others.)
We refined RQ4 into the following two hypotheses:

H4a. Most scientists spend less than ten hours (or one day) per week (or less than 20% of their working hours) developing software.

H4b. Most scientists spend less than ten hours (or one day) per week (or less than 20% of their working hours) using software for research purposes.

Survey results:

H4a: On average, scientists spend approximately 30% of their work time developing scientific software.

H4b: On average, scientists spend approximately 40% of their work time using scientific software.
We refined RQ5 into the following two hypotheses:

H5a. In the past, scientists and engineers used to spend less time developing software than today.

H5b. In the past, scientists and engineers used to spend less time using software than today.

Survey results:

H5a: Using a five-point scale (‘much less time’, ‘less time’, ‘same amount of time’, ‘more time’, ‘much more time’), 53.5% of the responses state that scientists spend more or much more time developing scientific software than they did 10 years ago. 44.7% state that scientists spend more or much more time developing scientific software than they did 5 years ago. 14.5% state that scientists spend more or much more time developing scientific software than they did 1 year ago.

H5b: 85.9% of the responses state that scientists spend more or much more time using scientific software than they did 10 years ago. 69.5% state that scientists spend more or much more time using scientific software than they did 5 years ago. 19.8% state that scientists spend more or much more time using scientific software than they did 1 year ago.
We refined RQ6 into the following two hypotheses:

H6a. Most scientists use a desktop computer (or laptop) for their software development work, very few develop scientific software on intermediate computers (or parallel small-size clusters), and less than 10% ever develop scientific software on a supercomputer.

H6b. Most scientists use a desktop computer (or laptop) to run scientific software, very few use scientific software on intermediate computers (or parallel small-size clusters), and less than 10% ever use scientific software on a supercomputer.
Survey results:

H6a: Desktop computers: 42.9% of the scientists develop scientific software using exclusively desktop computers, and 77.9% of the scientists spend 60.0% or more of their time developing scientific software using desktop computers. Only 4.3% of the scientists never develop scientific software on a desktop computer. Intermediate computers: 55.2% of the scientists never use an intermediate computer to develop scientific software, and 81.3% of the scientists spend 20.0% or less of their time developing scientific software using intermediate computers. Only 0.7% of the scientists always develop scientific software on an intermediate computer. Supercomputers: 75.2% of the scientists never use a supercomputer to develop scientific software, and 91.6% of the scientists spend 20.0% or less of their time developing scientific software using a supercomputer. Only 0.3% of the scientists always develop scientific software on a supercomputer, and only 2.3% spend 50.0% or more of their time developing scientific software on a supercomputer. Average time: Scientists state that they spend on average 5.6% of their time developing scientific software using supercomputers, 12.8% of their time developing scientific software using intermediate computers, and 79.1% of their time developing scientific software using desktop computers. Note that the averages do not add up to 100% because some respondents either did not provide data that adds up to 100% or did not provide responses to all three hardware categories. In the latter case, we assumed that not providing data for a hardware category implies that 0% of the time developing scientific software is spent on hardware of that category.

H6b: Desktop computers: 48.5% of the scientists use scientific software exclusively on desktop computers, and 81.7% of the scientists spend 60.0% or more of their time using scientific software on desktop computers. Only 2.3% of the scientists never use scientific software on a desktop computer. Intermediate computers: 58.0% of the scientists never use scientific software on an intermediate computer, and 85.5% of the scientists spend 20.0% or less of their time using scientific software on intermediate computers. Only 0.2% of the scientists always use scientific software on an intermediate computer. Supercomputers: 79.9% of the scientists never use scientific software on a supercomputer, and 93.2% of the scientists spend 20.0% or less of their time using software on a supercomputer. Only 0.2% of the scientists always use scientific software on a supercomputer, and only 1.7% spend 50.0% or more of their time using scientific software on a supercomputer. Average time: Scientists state that they spend on average 4.5% of their time using scientific software on supercomputers, 10.4% of their time using scientific software on intermediate computers, and 83.1% of their time using scientific software on desktop computers. As for H6a, the averages do not add up to 100% because some respondents either did not provide data that adds up to 100% or did not provide responses to all three hardware categories; in the latter case, we assumed that a missing response for a hardware category means that 0% of the time using scientific software is spent on hardware of that category. The standard deviations were larger for H6a than those for H6b. More scientists use, than develop, scientific software.
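The cleaning rule just stated (a missing answer for a hardware category counts as 0% of the time) is a simple imputation step. Below is a minimal sketch of how such a rule might look, assuming hypothetical response data held in a pandas DataFrame; the column names and values are ours, not from the survey:

```python
import pandas as pd

# Hypothetical raw survey data: percentage of development time per
# hardware category; a missing value means the respondent left the
# category blank.
responses = pd.DataFrame({
    "desktop":       [100.0, 80.0, None, 60.0],
    "intermediate":  [None,  20.0, 50.0, None],
    "supercomputer": [None,  None, 50.0, 20.0],
})

# Apply the rule: no answer for a category means 0% of the time
# was spent on hardware of that category.
cleaned = responses.fillna(0.0)

# Per-category averages are taken over all respondents. Rows whose
# entries do not sum to 100% are kept as-is, which is why the reported
# averages need not add up to 100%.
print(cleaned.mean())
print(cleaned.sum(axis=1))  # row sums; not forced to 100
```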
We refined RQ7 into the following hypothesis:

H7. Most scientific software is used by either a very small number of people or a very large number.

Survey results: 56.2% of the respondents believe that the most important scientific software they use has more than 5000 users worldwide. In contrast, 10.7% of the respondents believe that the most important software they use has fewer than 3 users worldwide. Similar patterns apply to the second, third, and fourth most important scientific software used.
We refined RQ8 into the following two hypotheses:

H8a. Most scientists are not familiar with standard software engineering concepts.

H8b. Most scientists do not consider standard software engineering concepts as important for their work.

Survey results: Scientists were asked to rank their understanding of the following software engineering concepts:

- software requirements (e.g., eliciting, analyzing, specifying, and prioritizing functional and non-functional requirements);
- software design (e.g., specifying architecture and detailed design using design by contract, design patterns, pseudo-code, or UML);
- software construction (e.g., coding, compiling, defensive programming);
- software verification (e.g., correctness proofs, model-checking, static analysis, inspections);
- software testing (e.g., unit testing, integration testing, acceptance testing, regression testing, code coverage, convergence analysis);
- software maintenance (e.g., correcting a defect, porting to new platforms, refactoring);
- software product management (e.g., configuration management, version control, release planning);
- software project management (e.g., cost/effort estimation, task planning, personnel allocation).

Scientists were then asked to assign an importance rank to each of these software engineering concepts. The question about the scientists’ understanding of software engineering concepts used the following five-point scale: ‘No idea what it means’, ‘Vague understanding’, ‘Novice understanding’, ‘Understand for the most part’, ‘Expert-level understanding’. The question about the scientists’ judgment of the importance of each of these concepts for their work used the following five-point scale: ‘Not at all important’, ‘Not important’, ‘Somewhat important’, ‘Important’, ‘Very important’. The related question had the following formulation: ‘How well do you think you understand each of the following software engineering concepts, and how important is each to your work?’.
Figure 1 summarizes the results related to H8a and H8b; its data are reproduced in the table below. For each concept, the first value shows the relative frequency (percentage) of answers in the two top categories for understanding, and the second shows the same for importance. (The label ‘good/expert understanding’ represents the answer categories ‘Understand for the most part’ and ‘Expert-level understanding’.) The ‘gap’ value represents the difference between the combined percentages of the two top categories of assumed understanding of a software engineering concept and the perceived importance of that concept.

Figure 1. Perceived Importance versus Understanding of Standard Software Engineering Concepts (percent of responses):

  Concept               Good/expert      Important/        Gap
                        understanding    very important
  SW requirements           52.0             46.4           5.6
  SW design                 49.7             44.1           5.6
  SW construction           79.3             81.6          -2.3
  SW verification           45.4             52.4          -7.0
  SW testing                46.6             60.2         -13.6
  SW maintenance            59.4             45.0          14.4
  SW product mgmt.          46.1             31.1          15.0
  SW project mgmt.          30.0             22.9           7.1

We used only the responses in the two top categories because we wanted to check whether scientists have at least as much good or expert-level understanding of a standard software engineering concept as they believe this concept is important or very important for their work. The survey results show that this is not the case for the concepts ‘software construction’, ‘software verification’, and in particular ‘software testing’.
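The aggregation behind these numbers is straightforward to reproduce. Below is a minimal sketch of the top-two-box computation, assuming (hypothetically) that the raw responses are available as pandas Series of scale labels; the function and variable names are ours, not from the survey instrument:

```python
import pandas as pd

def top_two_box(ratings: pd.Series, top_two: tuple) -> float:
    """Percentage of responses that fall in the two top scale categories."""
    return 100.0 * ratings.isin(top_two).mean()

# Hypothetical responses for one concept, e.g. 'software testing'.
understanding = pd.Series(["Vague understanding", "Expert-level understanding",
                           "Understand for the most part", "Novice understanding"])
importance = pd.Series(["Very important", "Important",
                        "Somewhat important", "Very important"])

u = top_two_box(understanding, ("Understand for the most part",
                                "Expert-level understanding"))
i = top_two_box(importance, ("Important", "Very important"))
print(u, i, u - i)  # a negative gap: rated more important than understood
```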
We refined RQ9 into the following hypothesis:

H9. How scientists rank the importance of standard software development practices for their own software development is independent of the size of the program they are working on and the amount of time they spend developing scientific software, but the importance rank increases with the size of the team they are part of.

Survey results: We used one-way analyses of variance (ANOVA) to compare average rank-of-importance scores between groups defined by project size and groups defined by team size. Since we were interested in all pairwise comparisons between groups, we employed Tukey’s HSD method for post-hoc comparisons when the ANOVA null hypotheses of no differences between groups were rejected at α = .05.
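As an illustration of the analysis pipeline just described, the following is a minimal sketch of a one-way ANOVA followed by Tukey's HSD post-hoc comparisons, using scipy and statsmodels. The data, group labels, and column names are hypothetical; the paper does not publish its analysis scripts:

```python
import pandas as pd
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical data: importance rank (1-5) of one SE concept,
# grouped by team-size category.
df = pd.DataFrame({
    "importance": [2, 3, 3, 4, 4, 5, 4, 5, 3, 2, 4, 5],
    "team_size":  ["1-2", "1-2", "1-2", "3-5", "3-5", "3-5",
                   "6-10", "6-10", "6-10", "1-2", "3-5", "6-10"],
})

# One-way ANOVA across the team-size groups.
groups = [g["importance"].values for _, g in df.groupby("team_size")]
f_stat, p_value = stats.f_oneway(*groups)

# Post-hoc pairwise comparisons only if the ANOVA rejects at alpha = .05.
if p_value < 0.05:
    print(pairwise_tukeyhsd(df["importance"], df["team_size"], alpha=0.05))
else:
    print(f"No significant group differences (F={f_stat:.2f}, p={p_value:.3f})")
```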
Importance of software requirements: Respondents who are involved in small projects and who work with small teams are more likely to rank software requirements as somewhat less important than respondents who are involved with larger projects and larger teams. The one exception to this trend is that respondents working with very large teams also rank software requirements as less important. The relationship between time spent working on software development and rank of importance of software requirements is less clear. Nevertheless, the data show that people who reported the lowest rank for software requirements spend significantly less time developing software than do people who reported the highest rank.
Importance of software design: Results of the ANOVA show that the mean rank of the importance of software design differed between respondents engaged in different project sizes and in different team sizes. In general, scientists working on small projects with less than 5000 LOC and in teams of one or two people ranked software design as less important than did people working on larger projects in bigger teams. Scientists who ranked the importance of software design highest spend significantly more time on developing software than those who reported lower ranks of importance.
Importance of software construction: The ranking of this concept was generally high overall. Compared to large and small projects, scientists working on middle-sized projects ranked this concept higher (rank ‘very important’), but there was no difference in rank by team size. Time spent developing software tended to increase with the importance rank of this concept—in particular, scientists who ranked this concept ‘very important’ spend significantly more time developing software than do other scientists.
Importance of software verification: Most scientists ranked software verification as somewhat important, and this did not differ significantly by either project size or team size. Differences in time spent developing software between rank scores were only marginally significant. There is some evidence that those who ranked verification as ‘important’ or ‘very important’ report spending a higher proportion of their time developing software than those who ranked verification as ‘very unimportant’.
Importance of software testing: Testing is generally considered important, and differences across groups are statistically significant but small. Importance ranks are significantly lower (but only slightly) among scientists working on smaller projects (less than 5000 LOC), and ranks are slightly higher among scientists in mid-sized teams than in larger or smaller teams. Scientists who gave testing a rank of ‘not important’ (the second lowest rank) spend significantly less time developing software than those who gave testing the highest ranks (‘important’ and ‘very important’).
Importance of software maintenance: Software maintenance was generally ranked as moderately important. The relationship with project size is somewhat unclear, but the data seem to suggest that rank is lower among scientists who are involved with smaller, rather than larger, projects. Similarly, higher importance ranks are associated with scientists who work with larger, rather than smaller, teams. There is a general, but small, trend whereby an increase in time spent developing software is associated with an increase in rank of importance.
Importance of software product management: Although not all pairwise comparisons are statistically significant and they are difficult to summarize, the analysis shows a fairly clear trend whereby an increase in importance of this concept is associated with an increase in both project size and team size. Likewise, the time spent on developing software generally increases with increasing rank of importance for this concept; i.e., scientists who spend more time developing software ranked this concept higher than those who spend less time developing software.

Importance of software project management: Software project management was generally ranked as being of moderate to low importance. The relationships between the ranking and project and team size are very clear. Scientists working on small projects (less than 5000 LOC) and with teams of less than three people ranked this concept low (‘not important’), while others ranked this concept slightly higher (‘somewhat important’). The relationship between the rank of importance and the time spent developing software is statistically significant, but a clear trend is not evident. While the ranked importance of the concept increases with the time spent on developing software for the ranks ‘not at all important’, ‘not important’, ‘somewhat important’, and ‘very important’, the average time spent on developing software of those who assigned the rank ‘important’ to this concept is almost as low as that of those who assigned the lowest rank (‘not at all important’).
4. Discussion

We here discuss what we find to be the most relevant implications of our findings. We view the implications from the perspectives of both the scientific computing community and the software engineering community. We also discuss the most pressing threats to the validity of our survey.
4.1 Importance of Scientific Software

The results for RQ3 show that both developing and using scientific software are of very high importance for scientists’ own research. Almost half of the respondents stated that they believe developing scientific software is important for other scientists. While the questions we asked do not allow us to draw direct conclusions about the hypothesis that the importance of developing and using scientific software has been increasing over time, the stated belief of scientists that much larger amounts of data are generated and archived today than five and ten years ago may be interpreted as indirect support for this hypothesis.

Corroborating this evidence, our findings for RQ4 indicate that scientists spend on average about 50% more of their total work time on developing scientific software, and 100% more on using scientific software, than we expected. Further, our findings for RQ5 clearly support the hypothesis that scientists today spend more or much more time on both developing and using scientific software than they did five and ten years ago.
4.2 Software Engineering Practices

The results for RQ8 indicate that there is a great deal of variation in scientists’ level of understanding of standard software engineering concepts. The level of importance that scientists assign to a standard software engineering concept is mostly consistent with their understanding of that concept. We found, however, that in particular for the concepts ‘software testing’ and ‘software verification’, scientists assign on average a higher level of importance to these concepts than they judge their level of understanding of them to be.

Software testing is particularly challenging for scientific software because the answers are known to contain mathematical approximation errors of unknown size. More specifically, the challenge consists in separating software bugs from model errors and approximation errors. Moreover, the correct output from running a simulation code is seldom known. This makes standard testing procedures in software engineering (e.g., regression testing and black-box testing) less appropriate for scientific software in many situations. These facts may be the reason why scientists have instead focused on testing techniques that are based on mathematical insight into the scientific problem being solved. It is quite common, as soon as evidence for a correct code is provided, to use the verified output in regression tests. However, we postulate that scientists have mostly reinvented this concept rather than having imported the technique from software engineering. Thus, although it should be beneficial to teach scientists about software engineering testing techniques, one must be aware that scientific software testing raises issues that have not yet been addressed sufficiently by the software engineering community.
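To make the contrast concrete, here is a minimal sketch of a test based on mathematical insight (a convergence-rate check), followed by a regression test against verified output. The solver is a stand-in of our own devising (a midpoint-rule integrator with a known exact answer), not an example from the survey:

```python
import math

def solve(n: int) -> float:
    """Stand-in numerical code: composite midpoint rule for the
    integral of sin(x) over [0, pi]; the exact answer is 2."""
    h = math.pi / n
    return h * sum(math.sin((i + 0.5) * h) for i in range(n))

def observed_convergence_rate() -> float:
    """Estimate r in error ~ C * h**r by halving the grid spacing."""
    e_coarse = abs(solve(100) - 2.0)
    e_fine = abs(solve(200) - 2.0)
    return math.log(e_coarse / e_fine, 2)

# Mathematical insight: the midpoint rule is second-order accurate, so a
# measured rate near 2 is evidence of a correct implementation. A bug
# typically destroys the rate even when individual outputs look plausible.
assert abs(observed_convergence_rate() - 2.0) < 0.1

# Once the code is trusted, its verified output can be captured and reused
# as a baseline in ordinary regression tests, with a tolerance for
# floating-point noise rather than exact equality.
baseline = solve(100)  # in practice: loaded from a stored, verified run
assert abs(solve(100) - baseline) < 1e-12
```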
The rationale behind RQ9 was that larger projects and larger development teams might increase the perceived importance of various software engineering practices. The results support RQ9 to some extent (large projects and large teams are associated with higher perceived importance of certain software engineering concepts than are small projects and small teams), but there is no consistent trend of association that links an increase of project or team size to perceived importance of software engineering concepts.
4.3 Education and Training

The results for RQ1 and RQ2 support the hypotheses that, for both developing and using scientific software, informal self-study and informal learning from peers are clearly more important than formal education at an academic institution or formal training at the workplace. Our findings also suggest that both learning at an educational institution and learning during professional work become more important the more recently they occurred.

We postulate that these observations are due to the following: First, there is a general lack of formal training in programming and software development among scientists. Second, the training that scientists do receive is often supplied by a computer science department in the form of general software courses whose relevance scientists might not see. A third aspect is that scientists may not see the need for more formal training in programming or software development. Codes often start out small and only grow large with time as the software proves its usefulness in scientific investigations. The demand for proper software engineering is therefore seldom visible until it is “too late”. As modern scientific software tends to be more complex, there is an increasing awareness among scientists of the need for better development tools and more formal training. However, as suggested by our findings, this awareness arises primarily in larger projects and is unlikely to be experienced in the kinds of projects met in basic education. Training targeted at practitioners in science and engineering is therefore important.

However, many engineering programs have, in fact, removed programming courses and formal training in, e.g., numerical methods from the curriculum. Since our findings clearly suggest that scientific software is becoming increasingly important and that an increasing amount of time is spent on both developing and using scientific software, one can ask whether such moves are in the right direction. The answer lies in what further in-depth studies can tell us about how scientists and engineers use and develop software.
4.4 Scale

Our findings for RQ7 partly support the hypothesis that most scientific software is either used by a very large number of people (more than 5000 users) or by a very small number of people (fewer than three). The size category ‘more than 5000 users’ clearly received the largest number of responses (i.e., consistently greater than 50% of all responses for the four most important pieces of software used).

That scientists rely mostly on software with a large user base may be a result of an increasing number of (1) commercial packages, (2) open source projects, and (3) community efforts in establishing common software bases. Scientists’ need for programming in such contexts is often restricted to smaller problem-dependent code for problem specification. The mix of personal, problem-dependent code interacting with a larger, more general software framework brings forward some challenges with testing and debugging (see above), since the scientist does not have complete control of all details. This contrasts with the past, when scientists often had complete control because they wrote complete codes themselves. This, by the way, is still a dominating principle in university education.

Regarding RQ6, the results of the study support the hypothesis that the majority of scientists use desktop computers most of the time for developing and using scientific software. The results did not confirm the hypotheses that less than 10% of scientists ever use a supercomputer for either developing or using scientific software. In fact, almost a quarter of scientists sometimes develop scientific software on supercomputers, and about a fifth sometimes use scientific software on a supercomputer. However, very few spend more than a fifth of their time developing or using scientific software on a supercomputer.

Many supercomputer centers have a significant staff for helping scientists with software issues, and the competence transfer from such help should not be underestimated. Nevertheless, taking into account the importance of desktop and small-cluster computing indicated by this survey, one should consider allocating more resources to improving scientists’ competence in development techniques and tools related to desktop and small-cluster computing.
4.5 Threats to Validity

Every empirical study has shortcomings. Here, we discuss the most pressing issues for our study.

Construct validity pertains to how well the measures in an empirical study reflect the concepts under investigation, and also to how well-defined the concepts are. In our study, this translates to how meaningful our research questions are, how appropriate the derived hypotheses are, and to what extent the survey questionnaire items were appropriate for giving answers to the hypotheses and research questions. We made efforts to follow standard guidelines for designing survey questionnaires, e.g., [4], and we ran the questionnaire as a pilot study in the field as a means to validate the questionnaire items. Nevertheless, since this was a first attempt, there are several items that may be improved for future applications of this survey.

External validity concerns the extent to which conclusions drawn from the study’s specific operationalizations transfer to variations of these operationalizations [10]. Here, this pertains to how well our conclusions transfer to other prospective respondents. As indicated in Section 3.1, scientists from certain regions were not represented in any great numbers. It is therefore unclear how well our conclusions transfer to scientists in these, and other, regions.

Statistical conclusion validity pertains to the conclusions drawn from the statistical analyses, and the appropriateness of the statistical methods used in the analyses. With regard to the latter, the assumption of ANOVA is that the population distribution is normal, with the same standard deviation, for each group. In our case, we have no a priori knowledge of the subgroup population distributions. Nevertheless, our large sample size justifies the use of ANOVA in estimating the means of whichever population distribution the groups may have, and at the present state of knowledge, means seem to be a sensible summary expression of a group’s response on a variable. With regard to the conclusions drawn, our large sample size gives the statistical analyses adequate power to show effects. However, it should be noted that highly significant effects are not necessarily large effects. We took this into account in our reporting in Section 3.2.
5. Related Work

To our knowledge, there are few publications in software engineering that focus on the development of scientific software. Smith, who is a computational scientist, has addressed software engineering topics in several works, e.g., [11]. Basili et al. [2] discussed the potential interplay of scientists and software engineering in the context of codes for high-performance computing on supercomputers. Sanders and Kelly [7] interviewed scientists at two universities to better understand development practices. Greenough and Worth [5] reviewed scientific software development practices and based much of their information on an extensive questionnaire. Tang [12] conducted a survey quite similar to the present one to investigate factors such as educational background, working experience, group size, software size, development practices, and software quality.
6. Future Plans

Science is very diverse, and different fields of science use and develop software in different ways. A more refined analysis, in which various subgroups of scientists are addressed separately, constitutes an obvious and necessary improvement of the present analysis. In several instances, the underlying rationale for responses is not always clear, and we are considering follow-up interviews with selected respondents to clarify such issues. For example, the interpretation of what a standard software engineering concept actually is might vary among respondents. Furthermore, once we know the answers to our research questions, it becomes possible to change practices and migrate the relevant software engineering knowledge to the science field.
Acknowledgments

The authors wish to thank Laurel Duquette, Statistical Consulting Service, University of Toronto, for support in doing the statistical analyses. This work was funded in part by a grant from The MathWorks.
References

[1] Advertisement for survey. American Scientist, page 445, November/December 2008.

[2] V. R. Basili, D. Cruzes, J. C. Carver, L. M. Hochstein, J. K. Hollingsworth, and M. V. Zelkowitz. Understanding the high-performance computing community: A software engineer’s perspective. IEEE Software, 25(4):29–36, July/August 2008.

[3] J. C. Carver, R. P. Kendall, S. E. Squires, and D. E. Post. Software development environments for scientific and engineering software: A series of case studies. In Proc. International Conference on Software Engineering, pages 550–559, 2007.

[4] F. J. Fowler, Jr. Survey Research Methods. Sage, third edition, 2002.

[5] C. Greenough and D. J. Worth. Computational science and engineering department software development best practice. Technical Report RAL-TR-2008-022, STFC Rutherford Appleton Laboratory, 2008.
[6] D. Kelly. A software chasm: Software engineering and scientific computing. IEEE Software, 24(6):118–120, November/December 2007.
[7] R. Sanders and D. Kelly. Dealing with risk in scientific software development. IEEE Software, 25(4):21–28, July/August 2008.

[8] J. Segal. When software engineers met research scientists. Empirical Software Engineering, 10(4):517–536, 2005.

[9] J. Segal and C. Morris. Developing scientific software. IEEE Software, 25(4):18–20, July/August 2008.

[10] W. R. Shadish, T. D. Cook, and D. T. Campbell. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton Mifflin, 2002.

[11] W. S. Smith, L. Lai, and R. Khedri. Requirements analysis for engineering computation: A systematic approach for improving software reliability. Reliable Computing (Special Issue on Reliable Engineering Computation), 13:83–107, 2007.

[12] J. Tang. Developing scientific computing software: Current processes and future directions. Master’s thesis, Department of Computing and Software, Faculty of Engineering, McMaster University, 2008.
... We wish to understand the impact 1 INTRODUCTION 1.1 Research Questions of the often cited gap, or chasm, between software engineering and research software (Kelly, 2007;Storer, 2017). Although scientists spend a substantial proportion of their working hours on software development (Hannay et al., 2009a;Prabhu et al., 2011), many developers learn software engineering skills by themselves or from their peers, instead of from proper training (Hannay et al., 2009a). Hannay et al. (2009a) observe that many scientists showed ignorance and indifference to standard software engineering concepts. ...
... We wish to understand the impact 1 INTRODUCTION 1.1 Research Questions of the often cited gap, or chasm, between software engineering and research software (Kelly, 2007;Storer, 2017). Although scientists spend a substantial proportion of their working hours on software development (Hannay et al., 2009a;Prabhu et al., 2011), many developers learn software engineering skills by themselves or from their peers, instead of from proper training (Hannay et al., 2009a). Hannay et al. (2009a) observe that many scientists showed ignorance and indifference to standard software engineering concepts. ...
... Although scientists spend a substantial proportion of their working hours on software development (Hannay et al., 2009a;Prabhu et al., 2011), many developers learn software engineering skills by themselves or from their peers, instead of from proper training (Hannay et al., 2009a). Hannay et al. (2009a) observe that many scientists showed ignorance and indifference to standard software engineering concepts. For instance, according to a survey by Prabhu et al. (2011), more than half of their 114 subjects did not use a proper debugger when coding. ...
Preprint
Full-text available
We selected 29 medical imaging projects from 48 candidates, assessed 10 software qualities by answering 108 questions for each software project, and interviewed 8 of the 29 development teams. Based on the quantitative data, we ranked the MI software with the Analytic Hierarchy Process (AHP). The four top-ranked software products are 3D Slicer, ImageJ, Fiji, and OHIF Viewer. Generally, MI software is in a healthy state as shown by the following: we observed 88% of the documentation artifacts recommended by research software development guidelines, 100% of MI projects use version control tools, and developers appear to use the common quasi-agile research software development process. However, the current state of the practice deviates from the existing guidelines because of the rarity of some recommended artifacts, low usage of continuous integration (17% of the projects), low use of unit testing (about 50% of projects), and room for improvement with documentation (six of nine developers felt their documentation was not clear enough). From interviewing the developers, we identified five pain points and two qualities of potential concern: lack of development time, lack of funding, technology hurdles, ensuring correctness, usability, maintainability, and reproducibility. The interviewees proposed strategies to improve the state of the practice, to address the identified pain points, and to improve software quality. Combining their ideas with ours, we have the following list of recommendations: increase documentation, increase testing by enriching datasets, increase continuous integration usage, move to web applications, employ linters, use peer reviews, design for change, add assurance cases, and incorporate a "Generate All Things" approach.
... In order to achieve correct and reproducible code, programming knowledge is needed and knowledge gaps among CSS researchers clearly exist, already due to the high interdisciplinary of the field [23]. Beyond disciplinary boundaries, however, the gap also stems from the fact that the vast majority of researchers, even those who develop research software, are primarily self-taught and not equipped with any formal training in software development [24]. Researcher can probably write executable code but have varying skills in standard software development practices such as using unit tests and continuous integration [25] to show correctness. ...
... The diversity of computational environments overshadows the human component. Previous studies showed that most researchers run their analyses on their desktop or laptop computers rather than standardized computing environments [24]. Worse, the details of the computational environment used in the analysis is usually underdescribed. ...
Article
Full-text available
Open science practices have been widely discussed and have been implemented with varying success in different disciplines. We argue that computational-x disciplines such as computational social science, are also susceptible to the symptoms of the crises, but in terms of reproducibility. We expand the binary definition of reproducibility into a tier system which allows increasing levels of reproducibility based on external verifiability to counteract the practice of open-washing. We provide solutions for barriers in Computational Social Science that hinder researchers from obtaining the highest level of reproducibility, including the use of alternate data sources and considering reproducibility proactively.
... The software for these impact models is categorized as research software, which includes "source code files, algorithms, 50 computational workflows, and executables developed during the research process or for a research objective" (Barker et al., 2022). Impact modelling research software is predominantly developed and maintained by scientists without formal training in software engineering (Hannay et al., 2009;Barton et al., 2022;Carver et al., 2022;Reinecke et al., 2022). Most of these researchers are self-taught software developers with little knowledge of software requirements (specifications and features of software), industry-standard software design patterns (Gamma et al., 1994), good coding practices (e.g., using descriptive 55 variable names), version control, software documentation, automated testing and project management practice (e.g. ...
... Most of these researchers are self-taught software developers with little knowledge of software requirements (specifications and features of software), industry-standard software design patterns (Gamma et al., 1994), good coding practices (e.g., using descriptive 55 variable names), version control, software documentation, automated testing and project management practice (e.g. agile) (Carver et al., 2013(Carver et al., , 2022Hannay et al., 2009;Reinecke et al., 2022). We hypothesize that this leads to the creation of source code that is not well-structured, not easily (re)usable, difficult to modify and maintain, has scarce internal documentation (code comments) and external documentation (e.g. ...
Preprint
Full-text available
Research software for simulating Earth processes enables estimating past, current, and future world states and guides policy. However, this modelling software is often developed by scientists with limited training, time, and funding, leading to software that is hard to understand, (re)use, modify, and maintain, and is, in this sense, non-sustainable. Here we evaluate the 10 sustainability of global-scale impact models across ten research fields. We use nine sustainability indicators for our assessment. Five of these indicators-documentation, version control, open-source license, provision of software in containers, and the number of active developers-are related to best practices in software engineering and characterize overall software sustainability. The remaining four-comment density, modularity, automated testing, and adherence to coding standards-contribute to code quality, an important factor in software sustainability. We found that 29% (32 out of 112) of the global 15 impact models (GIMs) participating in the Inter-Sectoral Impact Model Intercomparison Project were accessible without contacting the developers. Regarding best practices in software engineering, 75% of the 32 GIMs have some kind of documentation, 81% use version control, and 69% have open-source license. Only 16% provide the software in containerized form which can potentially limit result reproducibility. Four models had no active development after 2020. Regarding code quality, we found that models suffer from low code quality, which impedes model improvement, maintenance, reusability, and 20 reliability. Key issues include a non-optimal comment density in 75%, insufficient modularity in 88%, and the absence of a testing suite in 72% of the GIMs. Furthermore, only 5 out of 10 models for which the source code, either in part or in its entirety, is written in Python show good compliance with PEP 8 coding standards, with the rest showing low compliance. To improve the sustainability of GIM and other research software, we recommend best practices for sustainable software development to the scientific community. As an example of implementing these best practices, we show how reprogramming 25 a legacy model using best practices has improved software sustainability.
... More than 90% of scientists view software as an important part of their research (Hannay et al., 2009). We began the development of agcounts out of necessity, as a way to meet the needs of our research team. ...
... Scientist developers also need to make their code usable for users which has been a longstanding barrier in accelerometry (Pfeiffer et al., 2022). Accessible and transparent software enhances the learning process for both developers and users while also increasing reproducibility and driving innovation in physical behavior research (Hannay et al., 2009;Jiménez et al., 2017). ...
Article
Portable accelerometers are used to capture physical activity in free-living individuals with the ActiGraph being one of the most widely used device brands in physical activity and health research. Recently, in February 2022, ActiGraph published their activity count algorithm and released a Python package for generating activity counts from raw acceleration data for five generations of ActiGraph devices. The nonproprietary derivation of the ActiGraph count improved the transparency and interpretation of accelerometer device-measured physical activity, but the Python release of the count algorithm does not integrate with packages developed by the physical activity research community using the R Statistical Programming Language. In this technical note, we describe our efforts to create an R-based translation of ActiGraph’s Python package with additional extensions to make data processing easier and faster for end users. We call the resulting R package agcounts and provide an inside look at its key functionalities and extensions while discussing its prospective impacts on collaborative open-source software development in physical behavior research. We recommend that device manufacturers follow ActiGraph’s lead by providing open-source access to their data processing algorithms and encourage physical activity researchers to contribute to the further development and refinement of agcounts and other open-source software.
... However, the vast majority of neuroscience researchers lack formal programming education, and as such, knowledge is variable across the field (Hannay et al., 2009). This is where fMROI stands out. ...
Preprint
Full-text available
This study introduces fMROI, an open-source software designed for creating regions-of-interest (ROIs) and visualizing magnetic resonance imaging data. fMROI offers a user-friendly graphical interface that simplifies the creation of complex ROIs. It is compatible with various operating systems and enables the integration of user-specified algorithms. Comparative analysis against popular neuroimaging software demonstrates the feasibility, applicability, and ease of use of fMROI. Notably, fMROI's interactive graphical interface with a real-time viewer allows users to identify inconsistencies and design more accurate ROIs, saving significant time by avoiding errors before storing ROIs as NIfTI files. Additionally, fMROI supports automation through command-line accessibility, making it ideal for large-scale analyses. As an open-source platform, fMROI provides a valuable resource for researchers in the neuroimaging community, facilitating efficient ROI creation and streamlining neuroimage analysis.
Article
In today's scientific landscape, research software has evolved from being a supportive tool to becoming a fundamental driver of discovery, particularly in life sciences. Beyond its roots in software engineering, research software now plays a crucial role in facilitating efficient data analysis and enabling the exploration of complex natural phenomena. The advancements in simulations and modeling through research software have significantly accelerated the pace of scientific research while reducing associated costs. This growing reliance underscores the importance of software in ensuring reproducibility – a cornerstone of scientific rigor and trustworthiness. Although verifying reproducibility presents challenges, well-developed and openly accessible research software enhances transparency and aids in the early detection of errors. Although verifying reproducibility can be challenging, well-developed and accessible research software improves transparency and facilitates error detection. This mini-review examines the characteristics of research software and summarizes the key events that have shaped its development, alongside changes in requirements and guidelines. Moreover, we propose two additional principles – reviewability and supportability – complementing the widely accepted FAIR principles (Findability, Accessibility, Interoperability, and Reusability). These new principles aim to improve the efficiency and effectiveness of software evaluation during the peer review process. Through this review, we aim to assist scientists, especially those without extensive software development expertise, in understanding best practices for developing research software and the underlying motivations driving these practices.
Article
Since high-throughput techniques became a staple in biological science laboratories, computational algorithms, and scientific software have boomed. However, the development of bioinformatics software usually lacks software development quality standards. The resulting software code is hard to test, reuse, and maintain. We believe that the root of inefficiency in implementing the best software development practices in academic settings is the individualistic approach, which has traditionally been the norm for recognizing scientific achievements and, by extension, for developing specialized software. Software development is a collective effort in most software-heavy endeavors. Indeed, the literature suggests teamwork directly impacts code quality through knowledge sharing, collective software development, and established coding standards. In our computational biology research groups, we sustainably involve all group members in learning, sharing, and discussing software development while maintaining the personal ownership of research projects and related software products. We found that group members involved in this endeavor improved their coding skills, became more efficient bioinformaticians, and obtained detailed knowledge about their peers’ work, triggering new collaborative projects. We strongly advocate for improving software development culture within bioinformatics through collective effort in computational biology groups or institutes with three or more bioinformaticians. Availability and implementation Additional information and guidance on how to get started is available at https://ferenckata.github.io/ImprovingSoftwareTogether.github.io/.
Article
Computer code plays a vital role in modern science, from the conception and design of experiments through to final data analyses. Open sharing of code has been widely discussed as advantageous to the scientific process: it allows experiments to be replicated more easily, helps with error detection, and reduces wasted effort and resources. In the case of psychology, the code used to present stimuli is a fundamental component of many experiments. The degree to which researchers share this type of code, however, is not known. To estimate it, we conducted a survey of 400 psychology papers published between 2016 and 2021, identifying those that used the open-source tools Psychtoolbox and PsychoPy and openly shared their stimulus presentation code. For those that did, we established whether the code would run following download and also appraised its usability in terms of style and documentation. Only 8.4% of papers shared stimulus code, compared to 17.9% sharing analysis code and 31.7% sharing data. Of the shared code, 70% ran directly or after minor corrections. For code that did not run, the main error was missing dependencies (66.7%). The usability of the code was moderate, with low levels of code annotation and minimal documentation. These results suggest that stimulus presentation code sharing lags behind other forms of code and data sharing, potentially due to less emphasis on such code in open-science discussions and in journal policies. The results also highlight a need for improved documentation to maximize code utility.
Conference Paper
Modern research depends on software, which is often developed specifically for and during research. Moreover, a growing number of researchers develop and use software to conduct or support their research. However, an overall understanding of research software is still needed. Our goal is to characterize the publication landscape on research software use and development, outlining research contributions and contemporary gaps for future research. We conducted a systematic mapping study and identified 20 studies published in the last 15 years. We found several types of contributions, with an emphasis on analysis, and some trends in the type of research evaluation; the main type of empirical evaluation is the survey. Our results suggest that there are opportunities for solution proposals to address gaps concerning the factors that influence research software success in an academic context, providing guidelines, checklists, and models.
Article
Full-text available
This paper argues that the reliability of engineering computation can be significantly improved by adopting software engineering methodologies for requirements analysis and specification. The argument centers on the fact that the only way to judge the reliability of a system is by comparison to a specification of its requirements. The paper also points to methods for documenting the requirements. In particular, a requirements template is proposed for specifying engineering computation software. To make the mathematical specification easily understandable by all stakeholders, the requirements documentation employs the technique of tabular expressions. To clarify the presentation, the paper includes a case study of the documentation for a system for analyzing statically determinate beams.
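For readers unfamiliar with tabular expressions, the following is a minimal sketch, written for this summary rather than taken from the paper's case study, of how such a table might specify the bending moment M(x) of a simply supported beam of length L carrying a point load P at distance a from the left support (with b = L - a, so the left reaction is R_1 = Pb/L):

    % Hypothetical tabular expression for the bending moment M(x);
    % the condition column partitions the domain, the result column gives the value.
    \[
    M(x) =
    \begin{array}{l|l}
    \text{Condition} & \text{Result} \\ \hline
    0 \le x \le a & R_1\,x \\
    a < x \le L   & R_1\,x - P\,(x - a)
    \end{array}
    \qquad \text{where } R_1 = \frac{P\,b}{L},\quad b = L - a.
    \]

The appeal of the tabular form is that the conditions visibly cover the whole domain without overlap, which is exactly the kind of completeness check a reviewer of a requirements document needs.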
Article
Full-text available
Studies of computational scientists developing software for high-performance computing systems indicate that these scientists face unique software engineering issues. Previous failed attempts to transfer SE technologies to this domain haven't always taken these issues into account. To support scientific-software development, the SE community can disseminate appropriate practices and processes, develop educational materials specifically for computational scientists, and investigate the large-scale reuse of development frameworks.
Article
Full-text available
This paper describes a case study of software engineers developing a library of software components for a group of research scientists, using a traditional, staged, document-led methodology. The case study reveals two problems with the use of the methodology. The first is that it demands an upfront articulation of requirements, whereas the scientists had experience, and hence expectations, of emergent requirements; the second is that the project documentation does not suffice to construct a shared understanding. Reflecting on our case study, we discuss whether combining agile elements with a traditional methodology might have alleviated these problems. We then argue that the rich picture painted by the case study, and the reflections on methodology that it inspires, have a relevance that reaches beyond the original context of the study.
Conference Paper
Full-text available
The need for high performance computing applications for computational science and engineering projects is growing rapidly, yet there have been few detailed studies of the software engineering process used for these applications. The DARPA High Productivity Computing Systems Program has sponsored a series of case studies of representative computational science and engineering projects to identify the steps involved in developing such applications (i.e., the life cycle, the workflows, technical challenges, and organizational challenges). Secondary goals were to characterize tool usage and identify enhancements that would increase the programmers' productivity. Finally, these studies were designed to develop a set of lessons learned that can be transferred to the general computational science and engineering community to improve the software engineering process used for their applications. Nine lessons learned from five representative projects are presented, along with their software engineering implications, to provide insight into the software development environments in this domain.
Article
Considerable emphasis in scientific computing (SC) software development has been placed on the software qualities of performance and correctness. However, other software qualities have received less attention, such as usability, maintainability, testability, and reusability. Presented in this work is a survey titled "Survey on Developing Scientific Computing Software", which is apparently the first conducted to explore current approaches to SC software development and to determine which qualities of SC software are in most need of improvement. From the survey we found that a systematic development process is frequently not adopted in the SC software community: 58% of respondents reported that their entire development process potentially consists only of coding and debugging. Moreover, semi-formal and formal specification is rarely used when developing SC software: 70% of respondents indicated that they use only informal specification. To address the problems in SC software development discovered by analyzing the survey results, a solution is proposed to improve the quality of SC software using SE methodologies, concretely, a modified Parnas' Rational Design Process (PRDP) and the Unified Software Development Process (USDP). A comparison of the two candidate processes is provided to help SC software practitioners determine which of the two processes fits their particular situation. To clarify the discussion of PRDP and USDP for SC software and to help practitioners better understand how to use them, a completely documented one-dimensional numerical integration solver (ONIS) example is presented for both PRDP and USDP.
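To give a flavor of the kind of solver the ONIS example documents, the following is a minimal, generic one-dimensional numerical integrator using composite Simpson's rule. It is a sketch written for this summary, not the paper's actual ONIS code:

    # Composite Simpson's rule: a generic 1-D numerical integration solver.
    # Illustrative sketch only; not the ONIS code documented in the paper.
    def simpson(f, a, b, n=100):
        """Approximate the integral of f over [a, b] using n subintervals (n even)."""
        if n % 2:
            raise ValueError("n must be even for Simpson's rule")
        h = (b - a) / n
        total = f(a) + f(b)
        for i in range(1, n):
            weight = 4 if i % 2 else 2   # odd interior points weigh 4, even weigh 2
            total += weight * f(a + i * h)
        return total * h / 3

    # Example: integrate x**2 over [0, 1]; the exact answer is 1/3.
    print(simpson(lambda x: x**2, 0.0, 1.0))

Even a solver this small raises the qualities the survey asks about: the input check is a testability hook, and the docstring is the seed of a specification.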
Article
Not all scientific computing is high-performance computing: the variety of scientific software is huge. Such software might be complex simulation software developed and running on a high-performance computer, or software developed on a PC for embedding into instruments, for manipulating, analyzing, or visualizing data, or for orchestrating workflows. This special issue provides some flavor of that variety. It also explores the question of how the development of scientific software can be improved.
Article
The development of scientific software involves risk in the underlying theory, its implementation, and its use. Through a series of interviews, the authors explored how research scientists at two Canadian universities developed their software. These interviews indicated that the scientists used a set of strategies to address risk. They also suggested where the software engineering community could perform research focused on specific problems faced by scientific software developers.
Article
Comparing the improvement of existing software development processes to fixing up an old house, the author argues that working on one item at a time can be more practical than starting from the ground up. The author works at a software company that he imagines resembles many others. The company started out with five employees, including one programmer, and has grown to about 170 employees and $35 million a year in sales. Naturally, the organization has experienced growing pains, and it is currently redefining what development means at the company. The article presents the author's view of the company's struggle to improve.