Useful Statistical Methods for Human Factors Research in Software Engineering: A Discussion on Validation with Quantitative Data

Lucas Gren
Chalmers and University of Gothenburg, Gothenburg, Sweden 412–92
and University of São Paulo, São Paulo, Brazil 05508–090
lucas.gren@cse.gu.se

Alfredo Goldman
University of São Paulo, São Paulo, Brazil 05508–090
gold@ime.usp.br
ABSTRACT
In this paper we describe the usefulness of statistical val-
idation techniques for human factors survey research. We
need to investigate a diversity of validity aspects when cre-
ating metrics in human factors research, and we argue that
the statistical tests used in other fields to get support for
reliability and construct validity in surveys, should also be
applied to human factors research in software engineering
more often. We also show briefly how such methods can be
applied (Test-Retest, Cronbach’s α, and Exploratory Factor
Analysis).
CCS Concepts
• Applied computing → Mathematics and statistics; Law, social and behavioral sciences;
Keywords
Human factors; Psychology; Quantitative data; Statistical
tests; Validation
1. INTRODUCTION
Science has contributed enormously to the development of mankind from the beginning. We have successfully observed the world and created models that help us understand and predict a diversity of events, such as describing waves or the photoelectric effect. However, it is important to note that our predictive models are only models. As the famous statistician George E. P. Box said:
“Essentially, all models are wrong, but some are
useful.” p. 424 [3].
The problem is that in more complex systems, deterministic models are no longer useful in the same way. This is
when the mathematical models can be extended with probabilities. Stochastic models that express how likely an event is to occur then make far more sense than setting out to describe all variables deterministically (which is often not feasible) [1].
The human mind is excellent at seeing patterns in a huge
number of variables [15]. Therefore, when investigating hu-
man factors, it often makes sense to collect qualitative data
and let the researchers (preferably, independently of each
other) systematically look for patterns in the data set (e.g.
a grounded theory approach) [9]. However, since it is always good to look at a phenomenon (or construct) from different perspectives, triangulation is preferred. It therefore makes sense to collect quantitative as well as qualitative data and to use statistical methods to analyze the former, i.e. to use both words and numbers in the analysis [18].
Empirical software engineering (ESE) is a quite new re-
search field compared to, for example, psychology. ESE has
come a long way and made great advances, and some re-
searchers have stressed the importance of evidence-based
software engineering (see e.g. [14]). They highlight an example where the same data was used for validation as for factor extraction, which of course should never be allowed. However, we believe the software engineering field is ready for more scientific and precise guidelines for quantitative survey data in human factors research. There seems to be a gap in the usage of statistical validation between the more technical aspects of software engineering and the human factors aspects.
When building prediction models for software, for exam-
ple, a factor analysis seems to be recommended [11], but in
softer software engineering aspects it is not standard prac-
tice and many authors get survey studies published without
such validation of scales (see e.g. [4, 2, 7]). When it comes to human factors research in software engineering, whether such validation is done seems to depend more on the statistical knowledge of the authors than on common practice for publication, i.e. many human factors survey studies in software engineering include such tests (see e.g. [27]), but far from all.
Such methods are well-used in high impact management
journals as well (see e.g. [21, 8]). The problem is that if one skips this part and directly runs statistical tests on, for example, a measurement's correlation with another, the results make little sense, since we do not know whether we managed to measure the intended construct. Skipping such validation would make editors in more mature fields reject the manuscript [23]. A validation process is of course not only about statistical validation tests, but such tests should be conducted as an important step in the validation process [13].
This paper presents some of these methods, which have been used frequently in e.g. psychology for almost a century and are directly applicable to human factors research in software engineering. The paper is organized as follows: Section 2 describes the similarities between human factors research in software engineering and other, more mature fields, Section 3 presents statistical tests often used in such fields, and Section 4 discusses the previous sections and their implications for software engineering research.
2. SOCIAL SCIENCE RESEARCH
Many subareas within software engineering have social sci-
ence aspects. Many researchers have stated, for example,
that agility is undeniably a soft as well as a hard issue [19,
25]. However, we prefer to call these types of issues simple (hard) and complex (soft), since these terms describe them better. The hard issue is a simple one with few, clear variables, and the soft one is a complex adaptive social system.
In order to clearly explain how we believe this is appli-
cable to software engineering we will use agile development
processes as an example. Studies have been conducted that
set out to investigate the social or cultural aspects of ag-
ile development. Whitworth and Biddle [26], for example, verify that agile teams need to look at social-psychological aspects to fully understand how they function. There is also a set of studies connecting agile methods to organizational culture. These connect the agile adoption process to culture to see if there are cultural factors that could jeopardize the agile implementation, which there are [12, 24]. One study divides culture into different layers depending on their visibility, following Schein [20]. That paper shows that an understanding of cultural layers increases the understanding of how an agile culture could be established [25].
If we want to investigate these types of issues we can look
at other fields that have been dealing with social science
for more than half a century. If we use humans and their
opinions in research we only investigate their perception of
what is happening in the organization. Even so, we still need to check that the items used to measure a construct manage to capture it somehow, i.e. that the items are different but still correlated in relation to the construct under investigation. In order to do this we need two things: first, a reasonably large sample representative of the population (and large enough to remove individual and cohort bias), and second, assurance that our items are correlated and pinpoint our construct of interest. The latter is often forgotten in software engineering research.
To simplify this explanation, let us look at a simple example. If a test (e.g. a survey) gives the correlation matrix in Table 1, the corresponding factor loadings would be the ones shown in Table 2.
How the factors in a factor analysis are obtained is an advanced mathematical procedure, and we will not go into the details of the calculations. However, the main reasoning behind the technique is as follows: let $p$ be the number of observed variables $(X_1, X_2, \ldots, X_p)$ and $m$ the number of underlying factors $F_1, F_2, \ldots, F_m$ (the model presumes that such underlying factors exist).
Table 1: A correlation matrix.

                     A     B     C     D     E     F
A (reading)         1.00  .60   .50   .15   .20   .10
B (vocabulary)            1.00  .50   .15   .10   .10
C (spelling)                    1.00  .10   .20   .15
D (addition)                          1.00  .60   .60
E (subtraction)                             1.00  .60
F (multiplication)                                1.00
Table 2: The corresponding factor matrix with factor loadings.

                     Factor 1  Factor 2
A (reading)             .70       .15
B (vocabulary)          .60       .15
C (spelling)            .60       .10
D (addition)            .10       .70
E (subtraction)         .10       .65
F (multiplication)      .05       .60
Each measured (or observed) variable is then a linear combination of the latent factors that reproduces the maximum correlations: $X_j = a_{j1}F_1 + a_{j2}F_2 + \cdots + a_{jm}F_m + e_j$, where $j = 1, 2, \ldots, p$ and $a_{j1}$ is the factor loading of the $j$th variable on the first factor, and so on. The factor loadings can be seen as weights (or contrasts) in a linear regression model and tell us how much each variable contributes to a factor. There are different extraction techniques for factor analysis: if the error variance is included, it is called a Principal Component Analysis, since all variance is used to find the factors; if the error variance is excluded, it is often called a Principal Axis Factor extraction, but the output pattern matrices are very similar [6].
In this case we probably had reason to believe that there
were two factors (or constructs) based on the variables in the
study. If, for example, addition and subtraction were uncorrelated in our results, we would have reason to doubt our measurement of the construct of mathematical skill. Even if a construct like "mathematical skill" is ambiguous, we still need to make sure our subset of items (like the mathematical operations addition, subtraction, and multiplication) is valid. In
this simple example we would have empirical support that
items A, B, and C describe one construct and D, E, and F
another, which also makes sense (i.e. high face validity). In
Section 3 we explain this method in more detail.
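To make the example concrete, the sketch below (in Python, using only numpy; not part of the original study) shows how loadings of the kind listed in Table 2 can be extracted from the correlation matrix in Table 1 with a principal-component-style eigendecomposition. The unrotated loadings will only roughly resemble Table 2, since published loadings are normally rotated first.

import numpy as np

# Correlation matrix from Table 1 (upper triangle mirrored to a full symmetric matrix).
R = np.array([
    [1.00, 0.60, 0.50, 0.15, 0.20, 0.10],  # A (reading)
    [0.60, 1.00, 0.50, 0.15, 0.10, 0.10],  # B (vocabulary)
    [0.50, 0.50, 1.00, 0.10, 0.20, 0.15],  # C (spelling)
    [0.15, 0.15, 0.10, 1.00, 0.60, 0.60],  # D (addition)
    [0.20, 0.10, 0.20, 0.60, 1.00, 0.60],  # E (subtraction)
    [0.10, 0.10, 0.15, 0.60, 0.60, 1.00],  # F (multiplication)
])

# Eigendecomposition of the correlation matrix (eigh handles symmetric matrices).
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]                 # largest eigenvalues first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Retain factors with eigenvalues > 1 and scale the eigenvectors into loadings.
m = int(np.sum(eigenvalues > 1))                      # two factors for this matrix
loadings = eigenvectors[:, :m] * np.sqrt(eigenvalues[:m])

print("Eigenvalues:", np.round(eigenvalues, 2))
print("Unrotated loadings (rows correspond to variables A-F):")
print(np.round(loadings, 2))

With the data in Table 1, two eigenvalues exceed 1, supporting the two-factor structure; a subsequent rotation (see Section 3) would make the loadings line up with the verbal/arithmetic grouping shown in Table 2.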
We can extend the same reasoning to that of “agility”.
Even if we do not have a clear definition of this term we can
still research agile practices or behavior. For example, if we
want to research Integration Testing or Retrospectives, we
must use items that are different but correlate in a satis-
factory way. The only more scientifically validated scale for agile practices that we found was the article "Perceptive Agile Measurement: New Instruments for Quantitative Studies in the Pursuit of the Social-Psychological Effect of Agile Practices" by So and Scholl [22], where they use the method presented in this paper. As an example, their scale for "Retrospectives" (assessed on a Likert scale from 1 = Never to 7 = Always) contains the following items:
(1) How often did you apply retrospectives?
(2) All team members actively participated in gathering lessons learned in the retrospectives.
(3) The retrospectives helped us become aware of what we did well in the past iteration/s.
(4) The retrospectives helped us become aware of what we should improve in the upcoming iteration/s.
(5) In the retrospectives we systematically assigned all important points for improvement to responsible individuals.
(6) Our team followed up intensively on the progress of each improvement point elaborated in a retrospective.
These items were developed using two pretests and a validation study with a sample of N = 227. When such an analysis is run on other agile measurement tools, they often show problems with validity, since the quantitative validation step was disregarded in their development [10].
The whole discussion of whether "we measure what we think we measure" is dealt with in most papers in the Validity Threats section. But what is validity? When it comes to tests in human systems we cannot just look at the measurement tool itself but must also consider the context and the interpretation of test scores, as in the definition of validity by Messick [17]:
“Validity is not a property of the test or assess-
ment as such, but rather of the meaning of the
test scores. These scores are a function not only
of the items or stimulus conditions, but also of
the persons responding as well as the context
of the assessment. In particular, what needs to
be valid is the meaning or interpretation of the
score; as well as any implications for action that
this meaning entails.”
This means we always validate the usage of a test, and never
the test itself.
When soft issues are investigated in Psychology they are,
of course, analyzed using both quantitative and qualitative
data. However, after a more exploratory investigation (usu-
ally through qualitative case studies) we need to proceed
and collect empirical evidence of the phenomenon (or con-
struct) we found and see if the numbers support our ideas. Of course we need a holistic view of validity, and studies using quantitative data sometimes get undeserved credibility since the mathematical methods alone can seem advanced and serious. However, this is also what we see as a danger
in software engineering survey research. If we create a sur-
vey and skip the statistical validation procedures we do not
know what we measure. The statistical validation is only
one aspect of the validation needed, but we should at least
make sure that part supports our hypotheses.
3. USEFUL STATISTICAL METHODS FOR
VALIDATION
There are many ways to categorize and list validity threats
in different fields. In this paper we only present a few aspects of validity where statistical tests can help us. These are: (1) Reliability: Stability, (2) Reliability: Internal Consistency, and (3) some aspects of Construct Validity.
The first in the list can quite easily be assessed by conducting a test and a retest in the same context, and then calculating Pearson's correlation coefficient for the repeated measurements (a measure of statistical dependence between two random variables or two data sets). If the first test is strongly correlated with the second (a coefficient close to 1), we have some evidence that our test is stable.
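As a minimal illustration (in Python with scipy; the score vectors below are invented for illustration only), a test-retest check amounts to a single correlation:

from scipy.stats import pearsonr

# Hypothetical total scores for the same ten respondents at two points in time.
test   = [23, 31, 28, 35, 19, 27, 30, 33, 22, 25]
retest = [24, 30, 29, 34, 21, 26, 31, 32, 23, 27]

r, p_value = pearsonr(test, retest)
print(f"Test-retest correlation: r = {r:.2f} (p = {p_value:.3f})")
# A coefficient close to 1 gives some evidence that the measurement is stable.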
For testing the internal consistency of a scale, the most commonly used method is Cronbach's α [5]. Without providing a mathematical definition, α can be seen as an overall correlation coefficient for a set of items. It is a function of the number of items in a test, the mean covariance between item pairs, and the variance of the overall total score. The idea is that if a set of items is meant to measure a certain construct, the included items must be correlated. Cronbach's α can also be used as a step in an Exploratory Factor Analysis (EFA), which is a way to test aspects of Construct Validity.
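A minimal sketch of computing Cronbach's α directly from this description (number of items, item variances, and total-score variance) is shown below; the Likert responses are invented for illustration.

import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x k_items) matrix of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical Likert responses: rows are respondents, columns are the items of one scale.
scores = [
    [5, 4, 5, 4],
    [3, 3, 4, 3],
    [4, 4, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
]
print(f"Cronbach's alpha: {cronbach_alpha(scores):.2f}")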
The first step of an EFA is meant to investigate underlying variables in the data (i.e. what factors explain most of the variance orthogonally). The next step is to rotate the factors and group them if they correlate and explain much of the same variance (i.e. the factors in a scale should not correlate too much or too little if they are considered to explain and measure a construct). A factor analysis is thus a statistical aid for finding groups of variables that explain certain constructs in a data set. For more details, see e.g. [6].
The first thing to do when conducting a factor analysis is to make sure the items fulfill the assumptions of such a method, i.e. they need to be correlated with each other in such a way that they can measure the same concept. Testing the Kaiser-Meyer-Olkin Measure of Sampling Adequacy and Bartlett's Test of Sphericity is a way to do this. Sphericity concerns how much the items (as an overall test) depend on each other; the null hypothesis is that the items form an identity correlation matrix (no correlations between items, and therefore not suitable for a factor analysis). If this test is significant we can proceed with our analysis. The Kaiser-Meyer-Olkin Measure of Sampling Adequacy tests how correlated the items are, and, as a rule of thumb, values below .5 are unacceptable, meaning that items with low correlations to the others need to be removed. Suitable items to remove can be selected based on such correlations in an Anti-Image table. If such items are removed and an acceptable value is obtained, we can create a Factor Matrix with our newly suggested factors (again, for more details see e.g. [6]). The extraction is often based on eigenvalues > 1, and the first factor will usually explain more variance than the following ones. This is because the algorithm tries to find a set of items that explains as much of the variance as possible, then the second factor does the same thing for the variance not explained by the first factor, and so on and so forth.
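As an illustration of how these checks are typically scripted, the sketch below assumes the third-party Python package factor_analyzer (its calculate_bartlett_sphericity, calculate_kmo, and FactorAnalyzer interfaces) and a pandas data frame df with one numeric column per survey item; the file name is hypothetical.

import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

# df is assumed to hold one numeric column per survey item (the file name is hypothetical).
df = pd.read_csv("survey_items.csv")

# 1. Bartlett's test of sphericity: the null hypothesis is an identity correlation matrix.
chi_square, p_value = calculate_bartlett_sphericity(df)
print(f"Bartlett's test: chi2 = {chi_square:.1f}, p = {p_value:.4f}")  # proceed if significant

# 2. Kaiser-Meyer-Olkin measure of sampling adequacy: overall values below .5 are unacceptable.
kmo_per_item, kmo_overall = calculate_kmo(df)
print(f"Overall KMO = {kmo_overall:.2f}")
print(pd.Series(kmo_per_item, index=df.columns).round(2))  # low items are removal candidates

# 3. Extraction: inspect the eigenvalues and keep factors with eigenvalues > 1.
fa = FactorAnalyzer(rotation=None)
fa.fit(df)
eigenvalues, _ = fa.get_eigenvalues()
n_factors = int((eigenvalues > 1).sum())
print(f"Suggested number of factors: {n_factors}")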
We would also like to note that the factor rotation can be done under the assumption that the factors are not dependent on each other (an orthogonal rotation) or that they may be dependent (an oblique rotation). In an oblique rotation we also get information about how the factors (and thereby the groups of items in them) correlate with each other. If the results are unsatisfactory, they can of course be used to reorganize the factors and improve or rethink the included items.
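Continuing the previous sketch (and reusing df and n_factors from it), rotation can then be applied; "varimax" (orthogonal) and "promax" (oblique) are rotation options we assume the factor_analyzer package provides, and the factor correlation matrix (phi_) is, to our understanding, only produced for oblique rotations.

import pandas as pd
from factor_analyzer import FactorAnalyzer

# df and n_factors are assumed to come from the previous sketch.

# Orthogonal rotation: the factors are kept uncorrelated.
fa_varimax = FactorAnalyzer(n_factors=n_factors, rotation="varimax")
fa_varimax.fit(df)
print(pd.DataFrame(fa_varimax.loadings_, index=df.columns).round(2))

# Oblique rotation: the factors are allowed to correlate.
fa_promax = FactorAnalyzer(n_factors=n_factors, rotation="promax")
fa_promax.fit(df)
print(pd.DataFrame(fa_promax.loadings_, index=df.columns).round(2))

# Factor correlation matrix; treat the attribute as rotation- and version-dependent.
phi = getattr(fa_promax, "phi_", None)
if phi is not None:
    print(pd.DataFrame(phi).round(2))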
Getting "enough" data is always a tricky aspect of statistical tests, and maybe more so in software engineering than in management or psychology. When it comes to factor analysis, the sample size needed depends on e.g. the communalities between, and the over-determination of, factors. Communality (or Internal Consistency) is the joint ability of the variables to explain variance in a factor, which can be assessed with the Cronbach's α mentioned above. Over-determination of factors refers to how many variables represent each factor [16].
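As a small follow-up to the sketches above, the communality of each variable (in the factor-analytic sense) can be read off as the sum of its squared loadings over the retained factors; get_communalities is assumed to be provided by the factor_analyzer package.

import numpy as np
import pandas as pd

# Communality of a variable = the sum of its squared loadings over the retained factors
# (fa_varimax and df come from the previous sketches).
communalities = np.square(fa_varimax.loadings_).sum(axis=1)
print(pd.Series(communalities, index=df.columns).round(2))

# The assumed factor_analyzer package exposes the same quantity directly.
print(np.round(fa_varimax.get_communalities(), 2))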
4. DISCUSSION
To summarize, the method presented above is one way
of checking if the numbers support the idea that a test re-
ally measures what we hope it does. This will not replace
other aspects of validity but we do not see any disadvan-
tages with collecting empirical evidence for surveys used in
research. In a field such as psychology, researchers need to be very careful about stating that surveys that have not been scientifically validated provide any kind of evidence for a certain research hypothesis. We believe the field of software engineering should be just as careful when using poorly validated tools, both in research and in practice.
Of course, we realize that getting a lot of data can be cumbersome. Sometimes we need to do as well as we can given small samples and scarce information. However, the
research community should be aware of the drawbacks and,
at least, use statistical validation techniques when applicable
to a data set.
The main hindrance to using such statistical methods is that knowledge of statistics is generally lower than what is needed to understand the methods presented here. This is a dilemma in all social science research and we understand its implications, but we believe that some more training in applied statistics and methods (i.e. psychometrics) in software engineering education could somewhat reduce this knowledge gap. In addition, all tests we have presented here are implemented in most statistical software (including open source alternatives), and what is needed is to interpret the output.
5. REFERENCES
[1] D. J. Bartholomew. Stochastic models for social
processes. Wiley, London, 1967.
[2] A. Bosu, J. Carver, R. Guadagno, B. Bassett,
D. McCallum, and L. Hochstein. Peer impressions in
open source organizations: A survey. Journal of
Systems and Software, 94:4–15, 2014.
[3] G. E. P. Box and N. R. Draper. Empirical Model-Building and Response Surfaces. John Wiley & Sons, Inc., New York, NY, USA, 1986.
[4] J. Chen, J. Xiao, Q. Wang, L. J. Osterweil, and M. Li.
Perspectives on refactoring planning and practice: an
empirical study. Empirical Software Engineering,
pages 1–40, 2015.
[5] L. Cronbach. Coefficient alpha and the internal
structure of tests. Psychometrika, 16(3):297–334, 1951.
[6] L. Fabrigar and D. Wegener. Exploratory Factor
Analysis. Series in understanding statistics. OUP
USA, 2012.
[7] D. M. Fernández and S. Wagner. Naming the pain in requirements engineering: A design for a global family of surveys and first results from Germany. Information and Software Technology, 57:616–643, 2015.
[8] M. T. Frohlich and R. Westbrook. Arcs of integration:
an international study of supply chain strategies.
Journal of operations management, 19(2):185–200,
2001.
[9] B. Glaser and A. Strauss. The discovery of grounded
theory: Strategies for qualitative research. Aldine
Transaction (a division of Transaction Publishers),
New Brunswick, N.J., 2006.
[10] L. Gren, R. Torkar, and R. Feldt. The prospects of a
quantitative measurement of agility: A validation
study on an agile maturity model. Journal of Systems
and Software, 107:38–49, 2015.
[11] N. Hanebutte, C. S. Taylor, and R. R. Dumke.
Techniques of successful application of factor analysis
in software measurement. Empirical Software
Engineering, 8(1):43–57, 2003.
[12] J. Iivari and N. Iivari. The relationship between
organizational culture and the deployment of agile
methods. Information and Software Technology,
53(5):509–520, 2011.
[13] I. Izquierdo, J. Olea, and F. J. Abad. Exploratory
factor analysis in validation studies: Uses and
recommendations. Psicothema, 26(3):395–400, 2014.
[14] B. A. Kitchenham, S. L. Pfleeger, L. M. Pickard,
P. W. Jones, D. C. Hoaglin, K. El Emam, and
J. Rosenberg. Preliminary guidelines for empirical
research in software engineering. IEEE Transactions
on Software Engineering, 28(8):721–734, 2002.
[15] R. Kurzweil. How to create a mind: the secret of
human thought revealed. Penguin Books, New York,
N.Y., 2013.
[16] R. C. MacCallum, K. F. Widaman, S. Zhang, and
S. Hong. Sample size in factor analysis. Psychological
methods, 4:84–99, 1999.
[17] S. Messick. Validity of psychological assessment:
validation of inferences from persons’ responses and
performances as scientific inquiry into score meaning.
American psychologist, 50(9):741, 1995.
[18] M. B. Miles and A. M. Huberman. Qualitative data
analysis: a sourcebook of new methods. Sage, Beverly
Hills, 1984.
[19] P. Ranganath. Elevating teams from ’doing’ agile to
’being’ and ’living’ agile. In Agile Conference
(AGILE), 2011, pages 187–194, Aug 2011.
[20] E. Schein. Organizational culture and leadership.
Jossey-Bass, San Francisco, 4 edition, 2010.
[21] P. Serrador and J. K. Pinto. Does Agile work? – A
quantitative analysis of agile project success.
International Journal of Project Management,
33(5):1040–1051, July 2015.
[22] C. So and W. Scholl. Perceptive agile measurement:
New instruments for quantitative studies in the
pursuit of the social-psychological effect of agile
practices. In Agile Processes in Software Engineering
and Extreme Programming, pages 83–93. Springer,
2009.
[23] B. Thompson and L. G. Daniel. Factor analytic
evidence for the construct validity of scores: A
historical overview and some guidelines. Educational
and psychological measurement, 56(2):197–208, 1996.
[24] C. Tolfo and R. Wazlawick. The influence of
organizational culture on the adoption of extreme
programming. Journal of systems and software,
81(11):1955–1967, 2008.
[25] C. Tolfo, R. Wazlawick, M. Ferreira, and F. Forcellini.
Agile methods and organizational culture: Reflections
about cultural levels. Journal of Software Maintenance
and Evolution: Research and Practice, 23(6):423–441,
2011.
[26] E. Whitworth and R. Biddle. The social nature of
agile teams. In Agile Conference (AGILE), 2007,
pages 26–36. IEEE, 2007.
[27] C. Wohlin. Are individual differences in software
development performance possible to capture using a
quantitative survey? Empirical Software Engineering,
9(3):211–228, 2004.