
Verification and validation in industry—A qualitative survey on the state of practice

Authors: Carina Andersson and Per Runeson

Abstract
Verification and validation activities take a substantial
share of project budgets and need improvements. This is
an accepted truth, but the current practices are seldom
assessed and analyzed. In this paper, we present a qualita-
tive survey of the verification and validation processes at
11 Swedish companies. The purpose was to exchange
experience between the companies and to lay out a foun-
dation for further research on the topic. The survey is con-
ducted through workshop and interview sessions, loosely
guided by a questionnaire scheme. It is concluded from the
survey that there are substantial differences between small
and large companies. In large companies, the documented
process is emphasized while in small companies, single
key persons have a dominating impact on the procedures.
Large companies use commercial tools while small companies
make in-house tools or use shareware. Common to all the
surveyed companies is that verification and validation is
considered important and thus has rather high status. An
experience from the survey itself is that the information
exchange between the companies during the survey was
considered very valuable by the participants.
1. Introduction
Verification and validation (V&V) are the activities
performed during a software development project to ensure
that the right system is developed and meets the
expectations of its customers (validation) and that the
developed system is correct and conforms to its
specifications (verification) [Boehm79]. The verification
and validation of software systems take a substantial share
of project budgets. Classical rules of thumb indicating how
much time is spent on V&V still seem to be valid. Brooks,
for example, devotes 50% of the budget to testing in his
classic book “The Mythical Man-Month” [Brooks75]. However,
very little effort is spent on actually surveying the
current status of V&V activities. Other parts of the
development process have been surveyed, for example
requirements engineering [Lubars92, Groves00], and
particular attention has been paid to the application of
scenarios [Weidenhaupt98]. A recently conducted
quantitative survey, focusing on lead-time consumption,
identifies the test process as consuming a large share of
the lead time when developing distributed real-time systems
[Bratthall00].
In order to get a better picture of the current practice in
industry, a qualitative survey has been conducted and is
reported in this paper. The purpose is to learn how V&V is
conducted in a set of companies, to investigate variations
between the companies, and to identify patterns in the
observations. The survey serves different purposes for its
two groups of stakeholders. For the researchers, the survey
acts as a starting point for further research on improvement
of V&V processes; they want to identify issues that are
relevant to industry. For the industry participants, the
survey acts as an exchange of experience between peers in
different non-competing companies.
The paper is structured as follows. In Section 2, the
methodology used in the survey is presented. Section 3
discusses the trustworthiness of the study. Section 4
presents the surveyed companies and their basic character-
istics. In Section 5 we report the observations made, the
analyses conducted and the findings made during the sur-
vey, and finally, a summary is presented in Section 6.
2. Methodology
The empirical research conducted within software
engineering has so far primarily been of a quantitative
nature. However, software engineering empiricists should,
as Robson does for social science research, play down the
differences between qualitative and quantitative research
and regard them as primarily technical [Robson93].
Carina Andersson and Per Runeson
Software Engineering Research Group
Department of Communication Systems
Lund University, Box 118, SE-221 00 Lund, Sweden
{carina.andersson,per.runeson}@telecom.lth.se
A quantitative study can provide exact answers to research
questions, including statistical analysis with certain
confidence levels [Wohlin00]. However, for practical reasons
the questions studied are of a detailed nature, for example,
“is inspection technique A more efficient than technique
B?” [Thelin01], or “are inspection meetings worth spending
effort on?” [Porter97]. A qualitative study is less exact by
its nature [Robson93]. No statistical significance is
achieved, and no exact figures are reported. On the other
hand, broader questions can be researched, like “why do or
do not people use inspections?” or the question of this
study, “how are companies conducting V&V?”. Both
quantitative and qualitative research should be conducted in
order to explore, explain and improve software engineering
practice.
This study is a qualitative study, based on data collection
in workshops and interview sessions, using the focused
interview technique [Robson93]. The procedures followed are
presented in the next sub-section.
2.1. Procedures
The survey was conducted in two cycles. First, five
companies were surveyed using the workshop format. The
companies attended a workshop series, organized within a
local software process improvement network (SPIN, see
footnote 1). The attendees assigned themselves to the
workshop based on their interests. The workshop cycle was
conducted as follows:
1. The workshop host presented the company V&V activi-
ties. The hosts were free to present any topic, as long as
they included a list of strong and weak issues regarding
their V&V process.
2. The researchers checked that a list of questions was cov-
ered (see Appendix A) and raised questions if not.
3. The researchers summarized the meeting in a report,
which was proofread by the company representative.
4. The findings were analyzed, compiled into a joint report
and fed back to the companies.
In the second cycle another six companies were surveyed
using a more direct interview format. Those companies were
approached by the researchers, and selected to achieve diver-
sity with respect to size, age of company, and application
domain. This cycle was conducted in a similar manner, except
that the only participants at the meeting were the interviewees
and the researchers. Finally, within the second cycle, one of
the first five organizations was interviewed again, since they
were at a point of major process change at the time of the first
interview. The second interview was conducted six months
later, when some of the changes had taken place.
2.2. Researcher View
A qualitative survey is not an objective study, but a view
of the world seen from the researchers’ viewpoint. In order
to enable critical review of the observations and
conclusions of the study, the viewpoints of the researchers
are reported here, leading to the research questions
investigated. Below, a few statements summarize the values
of the researchers who performed this investigation.
Process focus. The documented development process is a
means for communicating within an organization and for
capturing experience. A process focus, i.e. having a defined
process, may help to manage and control the organization’s
activities.
Balance between process and people. All knowledge and
experience cannot be elicited in a documented process.
Software engineering is hence dependent on individuals.
The process is assumed to be more important for large
companies, while individuals may be more important for
smaller companies.
Inspections. It is assumed, and to some extent empirically
shown [Basili87], that inspections are an efficient means
for early defect removal. Inspections are also assumed to
contribute to spreading information within a project or an
organization.
Structured V&V. It is assumed that a structured approach
to V&V, for example statistical testing and model-based
development, would help many companies, and would
improve their efficiency and effectiveness.
The view is further defined in terms of the set of
questions raised, and might unconsciously affect the
observations and their interpretation. Hence this open
presentation of the view provides the readers with the
means to make their own interpretation. The research
questions derived from the researcher viewpoints are more
specifically:
RQ1. How much is the documented process emphasized
by the organizations?
RQ2. Are there any relations between the process empha-
sis and characteristics of the organizations or their products?
RQ3. Which criteria are ruling the selection and improve-
ment of the process?
3. Trustworthiness
In a quantitative study, the concept of validity is central
when judging the result of an empirical study [Wohlin00]. In
qualitative studies, the concept of trustworthiness is in focus
instead [Lincoln85]. Different aspects of trustworthiness are
credibility, transferability, dependability and confirmability.
They are presented below together with comparisons to their
validity counterparts, and an analysis of the trustworthiness of
the current study.
Credibility concerns the identification and description of
the subjects in the study. The parallel to quantitative studies is
internal validity, in particular instrumentation, while for
example, the validity threat of statistical regression is not
applicable.
The primary techniques used for improving credibility are
triangulation and peer debriefing. Triangulation involves
the use of multiple sources to enhance the rigour of the
research, and peer debriefing contributes to guarding
against researcher bias. One measure taken towards
triangulation is to have two kinds of questions: direct
questions where the interviewees are asked how they perform
their V&V, and indirect questions where they are asked what
is good and bad in their V&V process. Peer debriefing means
that the researchers performed the analysis of the current
study as a pair, and that the results are reported back to
the subjects of the study in writing and in seminars.
1. http://www.spin-syd.org
Transferability is the counterpart to external validity. The
purpose of qualitative studies is not to make generalizations,
but to explore the variation between and within the different
cases. We have tried to vary parameters, such as age of sub-
ject organizations, size and application domain, to get a suffi-
ciently varying foundation for the study.
Dependability issues are highly related to credibility
issues. The dependability is related to how the study is con-
ducted, i.e. investigation procedures are systematic, well doc-
umented and clear, with the purpose to provide safeguards
against researcher bias. The validity counterpart in quantita-
tive studies is reliability [Robson93] or conclusion validity
[Wohlin00].
In this study, the investigation procedures are openly
reported (see Section 2.1), the researcher view is presented
(see Section 2.2), and data collection is documented,
although not publicly available for confidentiality reasons.
The presented observations reflect the survey participants’
answers, i.e. it is the company representatives’ own picture
that is presented, with a risk that it is polished. On the
other hand, the participants had nothing to gain by giving
false information. Moreover, in any improvement program, the
starting point is to elucidate the existing situation with
the help of the personnel involved.
Confirmability has to some extent its counterpart in con-
struct validity. Is the study designed to show what it is
intended to show? In qualitative studies, the means for evalu-
ating confirmability are to assess the process and to assess the
collected data. One option is to assign an auditor to assure the
quality of the study. This is not conducted within the current
study.
In summary, the study is performed according to a struc-
tured method and reported as openly as confidentiality issues
allow. Additional measures could have been taken, such as
triangulation using different methods or involving external
survey process auditors.
4. The Surveyed Companies
Eleven companies operating in the southern part of Sweden
took part in the survey. The companies are listed in
Table 1, and are referred to by pseudonyms for
confidentiality reasons.
4.1. Products
The participating organizations are characterized in terms
of their developed products, and the products are
categorized by application domain or type. Radar develops
large real-time systems, including image analysis. Phone
develops consumer products in the area of communication.
Network develops software, primarily embedded in their own
products for communication and storage. The core product of
Automation is a stand-alone software product, used to
develop applications in the domain of control and
communication. Read, Security and Monitor all develop
products with embedded software for image processing, but
also co-operate with industrial partners who use the
products as components in their systems. The surveyed parts
of Product, Sales and SubSales develop support systems for
internal or semi-internal use. The core products of Product
and Sales are not in the software development domain, but
they still need additional software development to meet
their internal needs for sales support systems and product
information systems. SubSales is a subcontractor to Sales,
and develops the software framework, to which Sales adds the
domain-specific parts.
4.2. Size
The variation in size among the participating companies is
considerable, ranging from small companies like SubSales,
with 6 developers, to Radar with approximately two hundred
developers, see Table 1. The sizes of the organizations
refer to the development staff on one site. If other
departments exist elsewhere, these are not included in the
presented figures.
Table 1. Participating organizations.
Company     Products                Customer       Business model   # Developers  Age (years)
Radar       Radar image processing  Defence        Contract         200           >20
Phone       Communication           Private        Market           150           ~15
Network     Communication           Business       Market           120           18
Automation  CASE tools and control  Business       Market           80            ~25
CASE        CASE tools              Business       Market           50            11
Read        Image processing        Private        Contract/Market  30            6
Security    Image processing        Business       Market           20            5
Monitor     Image processing        Business       Market           20            3
Product     Support systems         Internal       Contract         7             4
Sales       Support systems         Internal       Contract         7             10
SubSales    Support systems         Subcontractor  Contract         6             11
The history of the companies also differs; some of them
were established many years ago, whereas others were
started recently. The ages in Table 1 reflect the number of
years the surveyed activities in each company have been
active.
4.3. Product Values
The developed products are also categorized according to
whether they are focused on features or characteristics.
Different products may have their richness in the features
or in the characteristics of the functionality. For a
scanner, for example, the characteristics may be of most
importance: the quality of the image or the speed of the
image transfer. The features, such as the possibility to
manipulate the image by rotating it, may be of minor
interest to the user.
Radar, Read, Security and Monitor are companies that
develop products that depend on the characteristics of the
functionality but comprise a smaller number of features,
which requires a different testing focus, see Figure 1. The
quality of the output is often of most concern in a product
that is dominated by characteristics. For example, in image
processing, algorithms may be changed, and the testing has
to verify that the change has improved the characteristics
for a given set of situations.
5. Observations
This section describes the observations that were made
during the workshops and interviews. It is, as mentioned
before, the researchers’ viewpoints that are reported here.
The observations are grouped into different categories. In
Section 5.1, observations related to the development process
are reported. Section 5.2 reports on the development
environment. The more subjective values within the companies
are reported upon in Section 5.3. Hence, the first two
subsections contain issues that are more objectively
observable, while the last depends on the interviewees’ view
of their companies.
5.1. Process
The observations made regarding the V&V process are
focused on the documented process models, the documenta-
tion used in the project, the specific test methods used, the
organization and finally issues on communication about the
V&V activities.
5.1.1. Process Models. The discussions concerning the V&V
process proceeded from a general test process, according to
Figure 2, since all of the companies had some kind of model
to follow, though not always documented. Everybody was able
to map his or her activities into this general model, though
naturally with different degrees of formalism. The survey
displayed a spectrum of process definition, ranging from a
very well defined process at Radar to a very informal and
unemphasized one at SubSales. Descriptions of each company
process are available in Appendix B. Table 2 shows the
relationship between company age and the degree of emphasis
on the process. Table 3 instead focuses on company size.
A common opinion among the companies is that it is
important to increase the visibility of the verification and val-
idation process, and allow the members of the organizations
to understand the importance of the verification and valida-
tion. Communicating the process is of high priority, and often
a part of the ongoing improvement work.
Worth noticing is the overall awareness of the need for
continuous improvement, though the approach taken to
improvements varies a lot. Some of the companies have a
history without any formal process or existing
documentation, but today they realize the need for a
defined, visible process. Training and motivating the
organization is important, especially for the developers,
who need to understand the verification and validation
process since they perform the early tests.
5.1.2. Documentation. The type of produced artifacts and the
consistencies between them often depend on how well the
companies’ processes are defined. Among the recently started
companies, the quality of the produced documents often var-
ies, without company standards and at varying levels of
detail. However, as the organizations are small, this is not
considered very harmful.
Radar works in the defence domain, where strict and
detailed documentation is taken for granted and remains a
necessity. The process defines very clearly which documents
should be produced in each phase. For example, already
during the module tests, the developers should document
formal test results. The developers at Phone also have to
document their results from the module tests before
delivering to the next phase. Extensive documentation is
also required in the other process phases at Phone.
Automation has mentioned that the process of changing
project documents is slow, with several mandatory steps to
go through: to perform the change, to convene a group of
reviewers, to review, to update, and to sanction the
proposal. This is considered inefficient, and is partly a
consequence of the size of the company. Some of the other
surveyed companies have the advantage of being able to
introduce changes more quickly. One example is Security,
which emphasized its positive attitude to change and the
ease of introducing new routines, such as new templates.
They do not need to go through several instances when making
decisions.
Figure 1. Features vs. characteristics. [Figure placeholder.
Axis: primarily features to primarily characteristics.
Company labels: Radar, Read, Security, Monitor; Phone,
Network, Automation, CASE, Product, Sales/SubSales.]
Figure 2. The general test process. [Figure placeholder.
Phases: Module testing, Integration testing, System testing,
Acceptance testing.]
Read has not had any prescribed documentation, but this is
changing. This is a part of their new process, which also
points out that the test strategy will be decided early in the
projects. Monitor recently started organizing its documenta-
tion and has noticed quality variations, depending on who has
written the documents. This is however an area that is recog-
nized as a future improvement area in this company, and the
goal is to make the documentation more uniform. Sales is
another company with informal documentation, including a few
rudimentary test specifications.
5.1.3. Methods. None of the companies in the survey use any
statistical methods, like reliability growth models, to decide
when to stop the testing. A few examples of used V&V meth-
ods were presented during the survey.
Radar and Automation are experimenting with the use of
factorial design methods to decide which test cases to run
[Berling00, Cohen96]. Monitor has evaluated factorial design
methods, but concluded that they are too extensive to apply
in their context.
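To illustrate why a factorial design can become extensive, the following minimal sketch compares a full factorial design with a greedy pairwise selection, one common way of reducing the number of test cases while still covering every pair of factor levels. It is not the procedure used by any of the surveyed companies; the factor names and levels are hypothetical.

```python
from itertools import combinations, product

# Hypothetical test factors and levels, chosen only for illustration.
factors = {
    "resolution": ["low", "high"],
    "lighting": ["dark", "bright"],
    "format": ["still", "movie"],
}


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels))
            for levels in product(*(factors[n] for n in names))]


def pairs_of(case):
    """Return the set of two-factor level pairs exercised by one test case."""
    return set(combinations(sorted(case.items()), 2))


def greedy_pairwise(factors):
    """Pick test cases until every pair of factor levels is covered.

    Greedy heuristic: repeatedly take the candidate covering the most
    pairs that are still uncovered. The result is not guaranteed minimal.
    """
    candidates = full_factorial(factors)
    uncovered = set().union(*(pairs_of(c) for c in candidates))
    selected = []
    while uncovered:
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        selected.append(best)
        uncovered -= pairs_of(best)
    return selected


if __name__ == "__main__":
    print("full factorial:", len(full_factorial(factors)), "test cases")
    print("pairwise subset:", len(greedy_pairwise(factors)), "test cases")
```

The full design grows multiplicatively with each added factor, while the pairwise subset grows much more slowly, at the price of not exercising higher-order interactions.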
Security and Monitor are developing test matrices, which
visualize the selection of combinations that will be covered
by the test cases and document that the chosen test cases
cover all requirements. This has improved the traceability
between requirements and test cases, which is often
mentioned in the survey as a difficulty.
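A test matrix of this kind can be sketched in a few lines. The example below is an assumption about what such a matrix might look like, not a description of the matrices used at Security or Monitor; the requirement and test case identifiers are hypothetical.

```python
# Hypothetical requirements and test cases, for illustration only.
requirements = ["R1", "R2", "R3", "R4"]
test_cases = {
    "TC1": {"R1", "R2"},
    "TC2": {"R2", "R3"},
}


def build_matrix(requirements, test_cases):
    """Map each requirement to the test cases that claim to cover it."""
    return {req: [tc for tc, covered in test_cases.items() if req in covered]
            for req in requirements}


def uncovered(matrix):
    """Return the requirements that no test case covers."""
    return [req for req, tcs in matrix.items() if not tcs]


def render(matrix, test_ids):
    """Render the matrix as plain text, one row per requirement."""
    lines = ["req  " + " ".join(f"{t:>4}" for t in test_ids)]
    for req, tcs in matrix.items():
        marks = []
        for t in test_ids:
            mark = "x" if t in tcs else "."
            marks.append(f"{mark:>4}")
        lines.append(f"{req:<4} " + " ".join(marks))
    return "\n".join(lines)


if __name__ == "__main__":
    matrix = build_matrix(requirements, test_cases)
    print(render(matrix, list(test_cases)))
    print("uncovered requirements:", uncovered(matrix))
```

Keeping the mapping in one place makes gaps visible: any requirement with an empty row has no test case tracing back to it.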
Automation has informally tried pair specification in a
smaller team, with good results. Pair specification is an anal-
ogy to pair programming, which means that developers work
operatively in pairs in front of the computer [Beck99]. How-
ever, it is not written down as a method and there are no
instructions for this.
Among the more recently started companies the spirits and
ideas from the first employees are still clearly visible. These
ideas have affected the companies’ present strategies. One
example is Security, which from the start has had an
employee that considered code inspections as important and
valuable to the company. His interest in this area has affected
the other employees and the importance of code inspections
has been emphasized from the beginning, and thereby all of
the code is thoroughly reviewed. Both Automation and Secu-
rity have mentioned their inspections as actions they are con-
sidering as strengths in their companies. At Automation it is
part of the well-defined process to review produced docu-
ments, however, they also mention that the inspection of code
is an action that needs improvement to be more efficient.
5.1.4. Organization. The larger companies have separate test
teams, while the smaller companies usually have one respon-
sible person for verification and validation. However, at the
small companies the rest of the employees are often involved
part-time in the testing. The developers generally perform the
unit and module testing. The various phases of a life-cycle
model, see Figure 2, may overlap to a considerable extent,
and many test tasks may actually span multiple phases
[Kit95]. As a result, the point at which a delivery is
handed over to the testers is indistinct and may be hard to
define.
Radar has a separate organizational unit for system verifi-
cation. Depending on the customer requirements, software
modules may be tested by the developer or by the system ver-
ification unit as an independent V&V procedure. As the sys-
tem is growing larger and larger, the development unit now
performs some integration tests that were previously con-
ducted by the system verification unit. This implies that sub-
systems are delivered to the system verification unit instead
of modules.
Developers at Phone have to sign a test report when they
have performed the module tests, consisting of a specified
number of test cases. Thereafter the code is delivered to
the integration test group at an acceptance meeting. After
the integration tests another acceptance meeting takes place
before delivery to the system tests.
Table 2. Process emphasis versus company age (years).
                 0-6        7-12       13-19     20-
Emphasized       Security   CASE       Phone     Radar
                 Monitor               Network   Automation
                 Read
Not emphasized   Product    Sales
                            SubSales
Network has a quality assurance (QA) group, which is
responsible for the system testing and is also engaged in
reviews. In the opinion of the QA-group, it is better to do all
the testing after the development is finished, otherwise there
is a risk that the developers reintroduce defects in the product
and the need of regression tests increases. The QA-group
receives the product after the integration tests are performed
with the criterion that all functionality should have been
implemented.
At Automation the separation between module and integration
tests of the components is difficult, since its development
is based on daily builds, followed by the testing of the
whole system. The testers are responsible for the system
tests, which are performed after the integration of the
product. The acceptance tests are carried out as separate
projects, without influence from the earlier test teams.
The test group at CASE has recently grown noticeably as a
result of reorganizations. From having a test group that was
small and primarily consisted of students, having a job on the
side, the company now has a test group consisting of profes-
sional testers. The project staff is almost equally divided into
developers and testers.
The subcontractor SubSales performs most of the programming
of Sales’ product, while the major part of the testing is
performed by Sales themselves. This influences Sales’ test
organization; the majority of the project members are
involved in the test activities. At Product, a similar
organization is used, but in this case there are two
sub-departments, one responsible for development and one for
test.
5.1.5. Communication and Experience. The small compa-
nies in the survey emphasize the possibilities for informal
communication. In companies with only a few employees, it
is easy to start communicating, for example around the water
tank, about things that are important to know in order to do
the job properly, and to spread understanding.
Automation openly discusses the problems of communicating
and establishing the processes and instructions among the
employees. They would also like to see improvements in the
use of experience that is available in the company but not
accessible in a useful way. For example, it is desirable to
use existing test experience to make better test
specifications, but they have no implemented method for
this.
Even though passing test experience on to junior employees
is seen as desirable, the companies also point to the
experience that is already available, which often results
in good quality products anyway. At several companies, a
major part of the personnel has worked with the systems for
many years. This has made the organizations flexible when it
comes to testing, and their problem-solving ability is
noticeable. Most of the training is performed internally at
each company, covering, for example, test tools and
environments, processes and methodology. Other skills are
obtained through experience.
5.1.6. Interpretations. Our interpretation of the observations
concerning the documented V&V process is the following:
The process is more emphasized in large companies.
Smaller companies have started to identify inconsistencies
in the documentation standard, but have not considered
these to be too harmful. This difference is not related to
the age of the companies, but age and size are correlated
in most of the cases observed.
The approach taken to improvement is very much
dependent on the people in the company. Read, Security
and Monitor have a joint background and some core
technologies in common, but Monitor is focusing on
hands-on activities, Read is defining process models and
Security is working with an assessment model. The
choice of improvement model is to a large degree influ-
enced by key persons making the improvement deci-
sions.
Table 3. Process emphasis versus # developers.
                 -9         10-49      50-100       101-
Emphasized                  Security   Automation   Radar
                            Monitor    CASE         Phone
                            Read                    Network
Not emphasized   Product
                 Sales
                 SubSales
The use of structured approaches to V&V is sparse.
Instead the selection of test cases is very much based on
the experience of the staff. The unit and module tests are
mostly performed informally by the developers. This is
an area often mentioned as a candidate for improvement.
On the other hand, no one has reported particular prob-
lems that can be traced back to the lack of structured
methods specifically.
5.2. Tools and Environment
The observations within tools and environments for V&V
are made concerning defect reporting systems used, test auto-
mation and configuration management tools and procedures.
5.2.1. Defect Reporting Systems. The defect reporting
systems at the companies are generally used when someone
other than the developers is testing the product. These
systems are sometimes built at the companies, sometimes
bought externally. The larger companies use a commercial
tool, while the smaller ones use in-house tools, freeware or
simply e-mail communication.
In one small company, Monitor, an alarm e-mail is broadcast
to all the developers, and the responsible developer acts on
it. At Product and Sales, with their small project sizes,
the problems are often handled informally and reported over
the telephone. At larger companies a change control board,
more or less formally, appoints a developer to correct a
found defect.
Phone mentioned that defect reports lack information on
whether the fault was reported during function or system
testing, which would be desirable for fault monitoring.
5.2.2. Test Automation. Practically everyone has mentioned
a wish to increase the use of automatic tests. However, the
lack of time for introducing and implementing automation
conflicts with this wish. Read, which has recently
restructured its V&V process, has also thought of test
automation, but considers this to be of less importance.
There are other areas that have higher priorities at the
moment.
Not all testing is done manually at the companies, however.
Radar uses simulated environments in its verification, to
reduce the costs of testing the product in the operational
environment. Phone has tools to ensure that memory leaks are
found during the module tests, and to measure code coverage.
During the function and system test phases more automation
is desirable. CASE has changed its software architecture to
support the use of test scripts. Read has a robot to support
some testing of its products. Security and Monitor have
recorded material in their databases, series of images and
movies, to test their products with, in order to be able to
verify adjustments in the image analysis algorithms. Sales
uses automatic test programs combined with manual tests when
performing regression tests. Its intention is to have
SubSales, its subcontractor, also perform automatic tests
before delivery.
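The use of recorded material for verifying algorithm adjustments can be sketched as a small regression check: each recorded input is stored together with a result accepted in an earlier run, and an adjusted algorithm is re-run against all of them. The analysis function, the tiny "images" (nested lists of pixel values) and the expected results below are invented stand-ins, not the companies' actual algorithms or data.

```python
def count_bright_pixels(image, threshold=128):
    """Stand-in analysis step: count pixels at or above a threshold."""
    return sum(1 for row in image for pixel in row if pixel >= threshold)


# Hypothetical recorded material: (name, image, previously accepted result).
recorded_cases = [
    ("empty_scene", [[0, 10], [20, 30]], 0),
    ("one_object", [[0, 200], [20, 30]], 1),
    ("bright_scene", [[255, 200], [180, 130]], 4),
]


def run_regression(analyse, cases):
    """Return the names of recorded cases whose result has changed."""
    return [name for name, image, expected in cases if analyse(image) != expected]


# The unchanged algorithm reproduces all accepted results.
failures = run_regression(count_bright_pixels, recorded_cases)

# An adjusted threshold changes the result for one recorded case.
adjusted_failures = run_regression(
    lambda image: count_bright_pixels(image, threshold=150), recorded_cases
)
```

A changed result is not necessarily a fault; it flags the recorded cases that need to be reviewed after an adjustment.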
The problem with distributed systems was mentioned during
the survey. Different configurations of third-party products
imply that a lot of test time is required. At Sales the lack
of routines for this in the installation tests is noticeable,
and is considered to be an area in need of improvement.
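To indicate why configurations of third-party products drive test time, the sketch below enumerates every combination of a few components and estimates the effort of one installation test per combination. The component names, the versions and the 30 minutes per test are assumptions made for illustration; none of these figures come from the survey.

```python
from itertools import product

# Hypothetical third-party components and versions (not from the survey).
components = {
    "operating_system": ["os_a", "os_b", "os_c"],
    "database": ["db_x", "db_y"],
    "browser": ["browser_1", "browser_2", "browser_3", "browser_4"],
}


def all_configurations(options):
    """Return every combination of one version per component."""
    names = list(options)
    return [dict(zip(names, combo)) for combo in product(*options.values())]


configs = all_configurations(components)

# Assumed effort per installation test, in minutes.
minutes_per_installation_test = 30
total_hours = len(configs) * minutes_per_installation_test / 60
```

Here three components with 3, 2 and 4 versions already give 24 configurations, or 12 hours at the assumed rate, and each added component multiplies that number.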
5.2.3. Configuration Management. The documents and
other artifacts produced are handled with different degrees of
formalism in the companies. The larger companies have well
defined processes with routines and rules that imply a docu-
mentation that is formally reviewed and version managed.
Commercial tools are used for the configuration management.
Automation and Security both consider their configuration
management as working well. Monitor has recently taken the
step towards a more organized configuration management,
while at Product the configuration management is handled
very informally, without any possibility to fall back to docu-
ment versions between baselines.
5.2.4. Interpretation. Our interpretation of the observations
regarding tools and environment falls back on the differences
between small and large companies.
Small companies have in-house or freeware tools, large
companies have externally developed tools. This is, of
course, a matter of the ability to invest in powerful tools,
related to the less visible cost of spending working hours
to develop and maintain internal solutions. When it
comes to configuration management, again the large
companies are more organized, but also smaller compa-
nies may have a well organized configuration manage-
ment.
Regarding automation, the companies seem to have a
balanced view between the vision of saving test execu-
tion time and the awareness that test automation requires
extensive investment in test scripts and environments.
The result so far is that little test automation is con-
ducted.
5.3. Corporate Values
A third area of observations is softer issues related to atti-
tudes towards the V&V activities, key priorities for the com-
pany and their relations to business partners. Observations
related to these issues are here gathered under the headline of
corporate values.
5.3.1. Attitudes. Among the small companies all the employ-
ees usually take joint responsibility for the developed prod-
uct. Everyone is doing one’s share to get a complete product
according to the specifications. Hence the developers usually
give positive response when the testers find faults in the prod-
uct.
This attitude seems harder to achieve in a larger company,
where it is harder to grasp the final product due to its size and
complexity, and where processes, roles and responsibilities
are defined to ensure that the right person takes action. The
general positive will is present in these companies as well,
but there is a need for more structure. Although there is a
strong will, the attitude towards the testers is not always
entirely positive, for example when it comes to requests for a
more thorough documentation. The attitudes may also change
during the project, varying from positive to more negative
when the pressure increases in the final phases of the project.
5.3.2. Key Priorities. All participants were asked about their
priorities between costs, quality, and time; which factor they
considered to be of most importance in their projects. The
most common answer was quality, while time had second pri-
ority, but these answers are not completely unambiguous.
Radar, CASE, Monitor and Sales consider the time schedule
to have highest priority; under time pressure, they instead
compromise on and reprioritize the functionality.
Phone is currently restructuring their development process
to be more incremental. Considerable delays in former
projects have forced this change, since release dates are of
high priority. The new process makes it possible to release a
well-tested product that omits a subset of the planned
features if the project is delayed for any reason.
At Network they consider it more acceptable with a greater
number of faults in a new product, which implies that the pri-
orities are different in different projects. Security also men-
tions that the process conformance is dependent on which
kind of project is running. If it is, for example, a pilot project
with the purpose of showing proof of a concept, more
shortcuts may be taken. Product instead has the choice of delaying
its releases until they consider themselves to be ready or post-
poning some implementations to later releases.
When considering the priorities of time, the issue of plan-
ning was mentioned as a problem in most of the surveyed
companies. In Monitor, it is the goal of their new process that
the tests should be planned at an early stage, as early as the
start of the project planning activities. All of the surveyed
companies agree with Monitor that this is of importance and
their ambition as well. Read's proposed process also states
that plans for activities throughout the product's life cycle
should be established at project start, rather than letting
the test work start at the end of the projects. Planning for
changes, e.g. change requests, is especially mentioned as an
area that needs improvement. Time plans are often changed
because many participants are involved in the projects, which
leads to unexpected occurrences.
5.3.3. Relations to Business Partners. Half of the inter-
viewed companies have their business relations to a market,
rather than to specific customers. Hence some of these issues
do not apply to their situation.
Some of Radar's verification activities are performed in
cooperation with their customers. It is therefore important
to have communicated the agreements on the acceptance
activities, which vary from customer to customer, before
these are supposed to be performed.
Sales mentioned their attempt to be proactive by distributing
lists of known problems in the software to their customers.
However, this gave negative feedback. The customers instead
became disappointed at the number of existing bugs, even
though the number was not higher than in earlier releases.
Like Sales, Product has only internal customers, which
allows them to be conservative in the growth of
functionality. Implementations can easily be postponed to
later releases, and the developers can be fastidious when it
comes to “flashy” features.
Sales is an evident example of organizations having very
close relationships with some of their subcontractors. It is a
part of the agreement that SubSales delivers semi-manufac-
tured products, very much like an internal development
group, while other subcontractors to Sales deliver a final
product. The relations to SubSales are very informal and per-
sonal; contacts are handled by telephone, even when it comes
to changes in the specifications. This relation, which occa-
sionally is considered not to be businesslike enough, and its
difference from relations with the other subcontractors, has
been pointed out as a culture problem.
5.3.4. Interpretation. Our interpretation of the observations
regarding corporate values and attitudes is the following:
V&V is not a discipline where inexperienced staff is
considered to be suitable. Instead most of the companies
agree on the need for skilled software testing profession-
als. In the surveyed companies the testers achieve a fair
share of recognition and rewards for contribution made
to product quality and the status of being part of a test
team has improved.
Regarding key priorities, the answers are ambiguous,
though the choice is often between time and quality,
which results in compromises in functionality. Differ-
ences can be seen by observing the companies in view of
their customers. The companies with internal customers
are not exposed to competition, and hence can compro-
mise more freely about functionality and occasionally
release dates.
6. Summary
All of the companies have problems and difficulties in
their V&V processes, which might be improved by new
approaches and ideas. Several similar experiences have been
recognized, even though they were gained under different
circumstances in different environments.
The attempts of finding some patterns in making compari-
sons between the surveyed companies resulted in an almost
linear dependency between the level of emphasized process
and the size of the companies. A question asked at the
moment of the analysis was when a company grows over the
threshold, and when the need of more defined procedures gets
significant. No conclusions on this matter were however
drawn. Noticeable was how one key person in the small com-
panies could have considerable impact, while this was a little
bit harder to discover at the larger companies.
The general opinion among the participants of the survey
is that this has been valuable to all involved parties. The com-
prehensive discussions, between the participants during the
workshop, of general V&V process issues have shown new
possible approaches, both for the recently started companies
and the more established ones. During the meetings the dis-
cussions have mostly concerned generally applicable situa-
tions and ideas such as methods, artifacts, and tools, while
specific techniques, test selection procedures, and detailed
levels of test environment have been left out. Thus, the dis-
cussions have been of interest to everybody and the exchange
of information has been successful, despite the participants’
different basic conditions.
Regarding research methodology, the survey is conducted
using a structured procedure, reported as openly as possible.
Qualitative methods are relatively new to the software engi-
neering domain; hence a practice has to evolve on how to per-
form such studies. This study represents an initial attempt,
aiming at both characterization of current practice in industry
as a starting point for further research and for experience
exchange between peers in the surveyed companies.
7. Acknowledgement
The researchers are thankful to the participating compa-
nies within SPIN-syd for their contribution. Thanks also to
Daniel Karlström, Thomas Olsson and Dr. Martin Höst at the
Department of Communication Systems at Lund University
for reviewing an earlier draft of this paper and being involved
in discussions on the topic. This study was partially funded
by The Swedish Agency for Innovation Systems (VIN-
NOVA) under grant for The Center for Applied Software
Research at Lund University (LUCAS).
8. References
Basili87 V. R. Basili and R. W. Selby, “Comparing the
Effectiveness of Software Testing Strategies”,
IEEE Transactions on Software Engineering, Vol.
13, No. 12, pp. 1278–1298, 1987.
Beck99 K. Beck, “Embracing Change with Extreme
Programming”, IEEE Computer, Vol. 32, No. 10, pp.
70-77, 1999.
Berling00 T. Berling and P. Runeson, “Application of
Factorial Design to Validation of System Performance”,
Proceedings 7th Annual IEEE International Conference
and Workshop on the Engineering of Computer Based
Systems (ECBS), Edinburgh, Scotland,
UK, pp. 318-326, 2000.
Boehm79 B. Boehm, “Software Engineering: R & D Trends
and Defence Needs”, Chapter 19 in Research
Direction in Software Technology (P. Wegner ed.),
Cambridge, MA, MIT Press, 1979.
Bratthall00 L. Bratthall, P. Runeson, K. Adelswärd-Bruck and
W. Eriksson, “A Survey of Lead-Time Challenges
in the Development and Evolution of Distributed
Real-Time Systems”, Information and Software
Technology, Vol. 42, No. 13, pp. 947-958, 2000.
Brooks75 F. P. Brooks, The Mythical Man-Month,
Addison-Wesley, 1975.
Cohen96 D. M. Cohen, S. R. Dalal, J. Parelius and G. C.
Patton, “The Combinatorial Design Approach to
Automatic Test Generation”, IEEE Software, pp.
83-88, September 1996.
Groves00 L. Groves, R. Nickson, G. Reeve, S. Reeves and M.
Utting, “A Survey of the Software Development
Practices in the New Zealand Software Industry”,
Proceedings 12th Australian Software Engineering
Conference, Canberra, Australia, pp. 189-201,
2000.
Kit95 E. Kit, Software Testing in the Real World,
Addison-Wesley, 1995.
Koomen99 T. Koomen and M. Pol, Test Process Improvement,
A practical step-by-step guide to structured testing,
Addison-Wesley, 1999.
Lincoln85 Y. S. Lincoln and E. G. Guba, Naturalistic Inquiry,
Newbury Park and London, 1985.
Lubars92 M. Lubars, C. Potts and C. Richter, “A Review of
the State of the Practice in Requirements Modeling”,
Proceedings 9th IEEE International Symposium
of Requirements Engineering, San Diego,
CA, USA, pp. 2-14, 1992.
Porter97 A. A. Porter, H. P. Siy, C. A. Toman and L. G.
Votta, “An Experiment to Assess the Cost-Benefits
of Code Inspections in Large-Scale Software
Development”, IEEE Transactions on Software
Engineering, 23(6):329-346, 1997.
Robson93 C. Robson, Real World Research, Blackwell, 1993.
Thelin01 T. Thelin, P. Runeson and B. Regnell, “Usage-
Based Reading - An Experiment to Guide Review-
ers with Use Cases”, Information and Software
Technology, 43(15):925-938, 2001.
Weidenhaupt98 K. Weidenhaupt, K. Pohl, M. Jarke, and P. Haumer,
“Scenarios in System Development: Current
Practice”, IEEE Software, Vol. 15, No. 2, pp. 34-45,
1998.
Wohlin00 C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B.
Regnell and A. Wesslén, Experimentation in Software
Engineering: An Introduction, Kluwer Academic
Publishers, 2000.
Appendix A
Questionnaire
Test phases
Describe the different phases of the verification and valida-
tion process.
Describe the artifacts belonging to each phase.
Is the process iterative or like the waterfall model?
Does the project concern a new product or evolutionary
development?
When is the testing finished?
Who decides when the testing is finished?
Scope of the verification and validation process, what is
the goal?
Rank the following parameters:
Cost, Time, Quality (few faults, depending on the func-
tionality, etc.)
Consequences
Do they find the faults they want to?
Are they finished on time?
Roles
How is the company organized?
Who is doing the testing? Developers, Testers, Customers,
Externally,...
Training
What kind of training and experience do the testers have?
Tools
Which are used?
For what purpose?
Level of details
How much of the product is tested?
Characteristics.
The functionality that is used the most, or the functionality
that probably has the most faults, or all functionality.
Available data
Are there any data collected and recorded as faults are
reported and controlled?
Number of found faults, consumed time.
Purpose/Usage?
Attitudes
From the test team’s perspective?
Status
What is a successful test, (creative destruction?).
From the developers’ perspective?
Unusual ideas
Are the products tested in any specific way?
Personal assessment of the verification and validation
process
Mention three well-performed and three less well-performed
characteristics.
Appendix B
Detailed Description of the Process models
Radar: The process is very formal and well emphasized, to
ensure that the right thing is done at the right time. It is con-
sidered a necessity to be able to sort out the created complex-
ity, derived from large projects, with many participants. The
procedures are quite well followed since the customer
requires it in many cases.
Phone: Has recently formally introduced an incremental
process model, which includes design and implementation,
function test, and system test. The incremental process has
forced more detailed planning, which turned out to reduce the
amount of errors that were introduced earlier due to insuffi-
cient planning.
Network: Claims its process to be quite mature, and one of
the company’s strengths. The process is well defined in their
development handbook, and is continuously improved by
input from developers, testers and managers with assistance
of the quality assurance engineer, whose main responsibility
is project follow-up and process assessment.
Automation: Is dependent on corporate level decisions,
e.g. a joint high-level project management model. They have
a well-defined process, developed at corporate level. At lower
levels the activities are defined in terms of templates and
guidelines. However, as the process documentation is very
extensive, there is a tendency to find discrepancies between
the documented process and the actual work.
CASE: The process has been changed recently, as a result
of a merge of two companies. Each product release consists
of several increments, which are tested separately. The proc-
ess is now more formal and the activities are defined in terms
of templates.
Read: Is currently changing its process model radically.
They have recently made an assessment of the former V&V
activities in order to draw conclusions from the findings and
define a new V&V process. The assessment reflected an
inconsistent life cycle model without a testing strategy, and
therefore also several improvement possibilities. After the
second interview with Read parts of the new process model
had been put into practice.
Security: Has recently evaluated its V&V process accord-
ing to the Test Process Improvement (TPI) model
[Koomen99]. The history shows the need of improvements in
the areas of documenting the test process, templates for the
test plan, test specification, and test reports. However, formal
reviews of the documents and the code were made, and both
configuration management and a defect tracking system
existed. Through this evaluation a controlled improvement
of the test process has been started, with assistance of
the TPI model, which shows the company where and how to
introduce improvements.
Monitor: Is a young organization and is presently evolving
a model for their projects to follow. At start-ups, with only a
few employees, the need of a well-defined process model,
with pronounced V&V activities, is less evident. Neverthe-
less, when a small company is growing, there is a risk that the
people-focused way to run the development continues.
Monitor is aware of this risk, however, and is currently
improving their V&V process. Their approach to improvement
is to introduce several hands-on activities, such as code
inspections, configuration management, and the use of a
database for defect reporting. Monitor is connected to Read, as
a subsidiary, but is not dependent on the parent company's
process models in any way.
Product: The V&V steps are defined in terms of different
test environments. It is required to go through all of them,
although it is not defined to what extent. The developed sys-
tem interfaces towards the corporate management informa-
tion system, and hence the financial consequences of software
failures can be very large. Documents, such as test plans, are
produced but not considered to be of any major use as the
V&V depends on very experienced people.
Sales: Has a well-defined release process, with release
dates set on a yearly basis and gradual freezing points
before these dates. Two weeks before release, the system is
put in a “deep freeze” state, meaning that only minor
adjustments that correct the most critical parts are permitted.
The company considers this a condition for maintaining high
quality and reducing the costs of continuous changes.
SubSales: Has a very informal development process and,
after unit testing, delivers the products to Sales, which han-
dles the rest of the testing. The informality is also reflected in
the contacts between the two business partners.