RANKING-TYPE DELPHI STUDIES IN IS RESEARCH:
STEP-BY-STEP GUIDE AND ANALYTICAL EXTENSION
Jörn Kobus
TU Dresden, Germany
joern.kobus@mailbox.tu-dresden.de
Markus Westner
OTH Regensburg, Germany
markus.westner@oth-regensburg.de
ABSTRACT
Ranking-type Delphi is a frequently used method in IS research. However, although several studies investigate the rigorous application of ranking-type Delphi as a research method, a comprehensive and precise step-by-step guide on how to conduct a rigorous ranking-type Delphi study in IS research is currently missing. In addition, a common criticism of Delphi studies in general is that it is unclear whether the panelists' consensus is authentic, or whether panelists only agree for other reasons (e.g., acquiescence bias or fatigue with disagreeing after several rounds). This also applies to ranking-type Delphi studies. Therefore, this study aims to (1) Provide a rigorous step-by-step guide to conducting ranking-type Delphi studies by synthesizing the results of existing research and (2) Offer an analytical extension of the ranking-type Delphi method by introducing Best/Worst Scaling, which originated in Marketing and Consumer Behavior research. A guiding example is introduced to increase the comprehensibility of the proposals. Future research needs to validate the step-by-step guide in an empirical setting as well as test the suitability of Best/Worst Scaling within the described research contexts.
KEYWORDS
Delphi, Best/Worst Scaling, MaxDiff, Maximum-Difference-Scaling
1. INTRODUCTION
In Information Systems (IS) research, the Delphi method has been used for almost three decades and enjoys
increasing popularity (Paré et al., 2013, p. 207). Research using Delphi covers a wide range of IS topics.
Recent publications include, e.g., complexity in IS programs (Piccinini et al., 2014), critical skills for
managing IT projects (Keil et al., 2013), or investigations of key issues in IS security management (Polónia
and Sá-Soares, 2013). In addition, the adaptation and evolution of the Delphi method is of research interest, e.g., in order to explore the application of Delphi as a forecasting tool in IS research (Gallego and Bueno, 2014), assess rigor (Paré et al., 2013), or identify possibilities to build theory (Päivärinta et al., 2011).
While different types of Delphi studies exist (explained later in this paper), ranking-type Delphi can be considered the most relevant for IS research. It focuses on classical IS research topics, e.g., the identification and ranking of Critical Success Factors (CSFs), the identification of components of research frameworks, or the prioritization of selection criteria. In addition, its use is indicated “[i]n business to guide future management action […]” (Paré et al., 2013, p. 208).
Thus, this paper contributes to IS research by: (1) Providing a rigorous step-by-step guide for conducting
ranking-type Delphi studies through synthesizing results of existing research; and (2) Offering an analytical
extension in order to increase the likelihood for authentic panelist consensus through introducing Best/Worst
Scaling. Best/Worst Scaling is a method originating from Marketing and Consumer Behavior research but is
relatively new to IS research.
Regarding (1), the paper synthesizes existing research on the rigorous application of ranking-type Delphi (Okoli and Pawlowski, 2004; Paré et al., 2013; Schmidt, 1997). Although these papers address several main criticisms of ranking-type Delphi studies, e.g., regarding (a) The selection of appropriate experts (Okoli and Pawlowski, 2004, p. 16); (b) The inappropriate use of statistics (Schmidt, 1997, pp. 764–768); or (c) The missing reporting of response and retention rates (Paré et al., 2013, p. 207), none of them by itself provides comprehensive and precise instructions. Therefore, we consolidate the main contributions of these papers and suggest a step-by-step guide that is, as a result, denser and more complete.
Regarding (2), the paper proposes an analytical extension of the ranking-type Delphi method to mitigate a general problem of (ranking-type) Delphi studies: response style biases (Paulhus, 1991, p. 17). An example of a response style bias is the pressure to conform with group ratings (Witkin and Altschuld, 1995, p. 188). This could happen, for example, if participants agree to a consensus only because they are tired of arguing and not because they have been convinced.
The paper is structured as follows. Section 2 explains and defines relevant terms. Section 3 proposes a
step-by-step guide for ranking-type Delphi studies which consolidates existing research. Section 4 extends
this guide by adding Best/Worst Scaling to it. The final section concludes the paper and briefly discusses the
proposals.
2. CONCEPTS AND DEFINITIONS
2.1 Delphi Method
The objective of the Delphi method is to achieve the most reliable consensus in a group of experts. This is
done by questioning individual experts during several rounds. In between rounds, feedback on the other
experts’ opinions is provided. Direct confrontation of the experts is avoided. (Dalkey and Helmer, 1963,
p. 458)
Originally, Delphi was used for forecasting. However, it has continuously evolved over the last decades and is used today in a variety of research types. Paré et al. (2013, p. 208) – based on Okoli and Pawlowski (2004),
Schmidt (1997), and Rauch (1979) – distinguish four types of Delphi studies: (1) Classical Delphi focusing
on facts to create a consensus; (2) Decision Delphi focusing on preparation and decision for future directions;
(3) Policy Delphi focusing on ideas to define and differentiate views; and (4) Ranking-type Delphi focusing
on identification and ranking of key factors, items, or other types of issues.
In the paper at hand we focus especially on type (4), ranking-type Delphi, as (a) It is widely used – Paré et al. (2013, p. 209) found that 93% of the Delphi papers they investigated from 1984–2010 used this type – and (b) It best fulfills the requirements of IS research (see above).
2.2 Best/Worst Scaling
Best/Worst Scaling (also referred to as Maximum Difference Scaling or MaxDiff) is based upon random
utility theory (Louviere et al., 2013, pp. 293–300). It is defined as “[…] a choice-based measurement
approach that reconciles the need for question parsimony with the advantage of choice tasks that force
individuals to make choices (as in real life)” (ibid, p. 292). Best/Worst Scaling can be seen as a way to
overcome some major shortcomings of common rating approaches (e.g. ties among items, response style
bias, and standardization difficulties (Cohen and Orme, 2004, p. 32)).
Best/Worst Scaling builds on a body of items. A set consists of a number of items from the body. A respondent is presented with a series of sets and is asked to choose one best item and one worst item in each set (Lee et al., 2008).
Compared to the Paired Comparison method, which can also overcome the above-mentioned shortcomings, Best/Worst Scaling is more efficient (Cohen and Orme, 2004, p. 34), as respondents provide more statistically relevant information in each comparison round. To ensure the validity of the Best/Worst Scaling approach, a careful design is necessary to decide which items are shown in which sets.
This especially includes (1) Frequency balance, meaning that each item appears an equal number of times across all sets; (2) Orthogonality, meaning that each item is paired with each other item an equal number of times across all sets; (3) Connectivity, meaning that the sets are designed in a way that makes it possible to infer the relative order of preference over all items; and (4) Positional balance, meaning that each item appears an equal number of times on the left and on the right side (Sawtooth, 2013, p. 7).
Because of the previously mentioned benefits regarding preference mapping, Best/Worst Scaling is a frequently used method in Marketing and Consumer Behavior research (e.g., Louviere et al., 2013; Cohen, 2009; Cohen and Orme, 2004). However, it does not seem to be commonly used in IS research and hence constitutes an interesting and original method that can contribute to the methodological development of the research field. Applied to the given research context, Best/Worst Scaling is proposed as a ranking mechanism to enrich the data analysis phase of a ranking-type Delphi study in an IS research setting, as described later.
2.3 Guiding Example
In order to make our proposal easier to follow, we use a running example of a ranking-type Delphi study to illustrate each phase. The guiding example takes a strategic IS perspective and investigates the identification and ranking of Critical Success Factors (CSFs) for the implementation of a process improvement method in an IT organization. The example was inspired by our own research but was simplified and complemented with fictional data where appropriate.
3. STEP-BY-STEP GUIDE FOR RANKING-TYPE DELPHI STUDIES
An overview of our proposed guideline for conducting rigorous ranking-type Delphi studies can be found in Figure 1. Phase 1 is based on Okoli and Pawlowski (2004, pp. 20–23), who themselves built on Delbecq et al. (1975). Phases 2 to 4 are based on Paré et al. (2013, p. 210) and Schmidt (1997, pp. 768–771). Schmidt (1997) in particular notes that ranking-type Delphi has become very popular in IS research and introduces a consistent approach for data collection (phase 2), analysis (phase 3), and presentation (phase 4).
Figure 1. Proposed guideline for ranking-type Delphi study based on (Okoli and Pawlowski, 2004; Paré et al., 2013;
Schmidt, 1997).
Figure 2 provides a detailed overview of which elements of the ranking-type Delphi guideline are described by which authors. We decided to build on these papers as they recently provided an overview of rigor in IS ranking-type Delphi studies (Paré et al., 2013), are highly cited (Okoli and Pawlowski, 2004; as of September 2015, 366 citations in Web of Science and 1,238 in Google Scholar), or were the first to introduce a structured approach to ranking-type Delphi studies (Schmidt, 1997).
Figure 2. Sources for elements of ranking-type Delphi guideline.
3.1 Phase 1 – Choosing the right experts
The choice of the right experts for Delphi studies is described as “[…] perhaps the most important yet most neglected aspect” (Okoli and Pawlowski, 2004, p. 16). Since Delphi study results depend mainly on the answers of the chosen experts, it is necessary to define a thorough process for their appropriate selection. Adapting Okoli and Pawlowski (2004, pp. 20–23), we propose a five-step approach to choosing appropriate experts as the initial phase of the study: (1.1) Identify expert categories; (1.2) Identify expert names; (1.3) Nominate additional experts; (1.4) Rank experts; and (1.5) Invite experts.
Step (1.1) aims at developing selection criteria for experts, e.g., regarding disciplines or skills, organizations, or literature (academic or practitioner authors). Step (1.2) then identifies experts meeting those selection criteria. A personal list of experts can serve as an initial starting point. Step (1.3) sends a brief description of the Delphi study to the already identified experts and asks them to nominate further experts in the field. Additionally, as much biographical information as possible about all (already identified and nominated) experts' demographics and profiles is documented. Step (1.4) then ranks the experts in priority for invitation to the Delphi study based on their qualification. Step (1.5) invites the experts to the study in descending order of the ranking. The subject, the required procedures, and the type and extent of the experts' commitments are explained. This step is repeated until an appropriate number of experts have agreed to participate. The anonymity of expert participants has to be ensured at all times.
Guiding example: (1.1) A success factor study typically aims to incorporate the perspective of experts with considerable experience on the topic of interest. In our case – CSFs for the implementation of a process improvement method in an IT organization – emerging expert categories could be (line) managers and consultants with relevant expertise and experience. We decided on consultants.
(1.2) Since researchers' personal networks can be a source to recruit experts (Paré et al., 2013, p. 210), we use our network to contact a renowned global IT consulting company. We set the participation threshold for consultants at having supported at least seven implementations of the process improvement method. Together with the main contact at the consulting firm, a list of 10 potential panelists is created.
(1.3) Through referrals from four members of the established list, seven additional possible panelists are identified. In total, therefore, 17 possible panelists are identified.
(1.4) We rank the panelists in descending order by the number of implementations they have supported.
(1.5) While there seems to be no agreement on an optimal number of panelists for a ranking-type Delphi study, the number of participating experts "[…] should not be too large (in order to facilitate consensus)" (Paré et al., 2013, p. 208). We therefore decide to invite the 12 highest-ranked experts. If one of these experts is unable to take part, we invite the highest-ranked expert from the remaining list instead, until we reach 12 committed participants.
3.2 Phase 2 – Data collection & Phase 3 – Data analysis
In the Delphi method, data is repeatedly collected, analyzed, and reconciled with the experts. We therefore describe data collection (phase 2) and data analysis (phase 3) together, as they cannot be clearly separated.
Before the iterative data collection (phase 2) can start, an initial pre-test of the instrument (i.e., instructions and questionnaire) is conducted to ensure that all experts understand the tasks and objectives (Paré et al., 2013, p. 210). The data collection phase itself consists of three steps (Schmidt, 1997, pp. 768–771): (2.1) Discover issues; (2.2) Determine most important issues; and (2.3) Rank issues.
An issue could be, for example, an item or a factor. (2.1) To discover the most important issues, first and foremost as many issues as possible have to be identified. Clear instructions are provided to the experts, and there is no restriction on the number of answers experts can give. After the initial data collection, the researchers consolidate and group similar answers through content analysis (Mayring, 2000, pp. 4–5). The consolidated results then need to be verified by the experts again to ensure a correct understanding of the intended meaning and the appropriateness of the grouping.
(2.2) To avoid overwhelming the experts with the number of issues they are asked to rank in step (2.3), a further focus on the most important issues might be necessary (as a rule of thumb, the list should comprise approximately 20 issues or fewer (Schmidt, 1997, p. 769)). For this, the consolidated and validated list of issues is randomly ordered and sent together with clear selection instructions to the experts. The researchers then delete all issues that were not selected. In case too many issues remain, step (2.2) can be repeated.
In step (2.3) the experts are asked to rank the issues in descending order, from most important to least important. As the Delphi method is an iterative approach, step (2.3) is repeated until an appropriate trade-off between the level of consensus and feasibility (defined by the patience of respondents as well as the researchers' resources and additional time requirements) is reached. In each new ranking round, respondents can revise their ranking decision, supported by controlled feedback based on (3.1) Mean rank; (3.2) Kendall's W – a coefficient of concordance (Kendall and Gibbons, 1990); (3.3) Top-half rank (the percentage of experts who ranked the respective item in their top half); and (3.4) Relevant comments/justifications by respondents.
Stopping criteria for the Delphi data collection are either a strong consensus or a clear indication that no further changes in the answers can be expected. Kendall's W, which assumes values between 0 and 1, can serve as a quantitative measure for this purpose. Values around .1 indicate very weak agreement, values around .5 moderate agreement, and values around .9 very strong agreement (Schmidt, 1997, p. 767).
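To illustrate the controlled feedback, the following minimal sketch (Python with NumPy; the function name is our own) computes (3.1) to (3.3) from a complete, tie-free matrix of rankings, using Kendall's W in its untied form W = 12S / (m^2(n^3 - n)) for m experts and n items.

```python
import numpy as np

def ranking_round_feedback(rankings):
    """Controlled feedback for one Delphi ranking round (illustrative sketch).

    rankings: (n_experts, n_items) array; rankings[e, i] is the rank
    expert e gave item i (1 = most important), complete and without ties.
    """
    n_experts, n_items = rankings.shape
    mean_rank = rankings.mean(axis=0)                       # (3.1)

    # (3.2) Kendall's W = 12*S / (m^2 * (n^3 - n)), where S is the sum of
    # squared deviations of the items' rank sums from their mean.
    rank_sums = rankings.sum(axis=0)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    kendalls_w = 12 * s / (n_experts ** 2 * (n_items ** 3 - n_items))

    # (3.3) Top-half rank: share of experts ranking the item in their top half.
    top_half = (rankings <= n_items / 2).mean(axis=0)

    return mean_rank, kendalls_w, top_half
```

A round yielding a W around .5 would, as described above, typically trigger another iteration.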
Guiding example: Before data collection starts, the instrument is pre-tested with two consultants whose experience (five projects) is not sufficient for inclusion in the final panel.
(2.1) In order to discover as many implementation success factors as possible, it seems necessary to provide an efficient and convenient way for the experts to take part in the Delphi study. For this, we offer the panelists the choice to e-mail their input, provide it in a call, or use a web survey with personalized hyperlinks. In case of ambiguity, the respective panelist is asked to clarify his or her input. There is no limit to the number of success factors an expert can mention. In total, 47 success factors are mentioned. Once the initial gathering is finished, these 47 success factors are qualitatively investigated (Mayring, 2000, pp. 4–5) to check for duplicates and grouping possibilities (e.g., ‘Leadership needs to role model the change’ and ‘Active leadership’ could possibly be merged into one group of success factors named ‘Leadership involvement’). Additional researchers review the results independently to ensure consistency. A description of each category is created. After this, every expert verifies that their mentioned success factors are correctly reflected in the grouping logic.
(2.2) After consolidation and grouping, 12 CSFs remain. This means that there is no need to further reduce the number of success factors for the upcoming ranking round.
(2.3) In the next round, the panelists are asked to rank the list of 12 CSFs, starting with the one they believe is most important (rank 1) down to the one they believe is least important (rank 12). In addition, the experts can justify their decisions. After a ranking round, the experts are provided with (3.1) Mean rank; (3.2) Kendall's W; (3.3) Top-half rank; and (3.4) Relevant comments/justifications.
We conduct two rounds of ranking. After the first round, the level of consensus is moderate (W1 = 0.51). Therefore, we provide the experts with the results (3.1–3.4) of the first round and conduct another ranking round. It results in a strong consensus (W2 = 0.73). As several reminders were already necessary in round 2 to keep the experts motivated, and some experts expressed impatience, we decide against a third round; the results are deemed a satisfactory compromise between consensus and the patience of the respondents.
3.3 Phase 4 – Data presentation
In the data presentation phase, the final study results are presented. Regarding the choice of experts, this includes (4.1.1) The response rate for the initial call for participation (as an indication of whether experts consider the exercise relevant/important); (4.1.2) The number of participants in each Delphi round (as an indication of flagging interest and to enable replicable calculations); and (4.1.3) The documentation of the profiles of participating experts.
Regarding results, sub-results, and calculations, it is necessary to provide sufficient raw data to make the statistics traceable. At least the (4.2.1) Final overall rank; (4.2.2) Mean ranks for each round; (4.2.3) Evolution of the ranks of an item across rounds; and (4.2.4) Kendall's W for each round should be reported. Additionally, the total number of issues generated in the first step of data collection (2.1) and transparency on the consensus level for the pared-down list at the end of the second step (2.2) need to be reported.
Guiding example: (4.1.1) The response rate for the initial participation call was around 83% (of the first 12 experts asked, 10 took part; the two experts who did not take part were replaced by two experts from the remaining list). (4.1.2) In total, two ranking-type Delphi rounds were necessary. All 12 experts took part in both Delphi rounds. (4.1.3) Table 1 illustrates how the profiles of the experts could be depicted for this example.
Table 1. Template to illustrate expert profiles of Delphi panel.

Expert ID | Role/Position | Main country of involvement | Experience (# of projects)
Expert 1  | Partner       | Germany        | 11
Expert 2  | Partner       | Sweden         | 10
Expert 3  | Senior expert | UK             | 20+
Expert 4  | Senior expert | Germany        | 30+
Expert 5  | Partner       | Spain          | 12
Expert 6  | Partner       | Norway         | 10
Expert 7  | Partner       | Czech Republic | 30+
Expert 8  | Partner       | Germany        | 9
Expert 9  | Partner       | France         | 10
Expert 10 | Partner       | Sweden         | 20+
Expert 11 | Partner       | UK             | 8
Expert 12 | Partner       | Denmark        | 7
(4.2.1–4.2.4) The number of CSFs generated in the first step of data collection (step 2.1) is 47. After consolidation and grouping (step 2.2), all 47 CSFs can be assigned to 12 groups. This means there was full consensus on the list shown in Table 2. The remaining study results can be obtained from Table 3. As the order of the mean ranks (4.2.2) did not change between the two rounds, we omit the information on the evolution of ranks (4.2.3).
Table 2. Remaining CSFs after consolidation (step 2.2): number (∑) and share (%) of the 12 experts selecting each CSF.

CSF    | ∑  | %
CSF 1  | 10 | 83%
CSF 4  | 10 | 83%
CSF 6  | 8  | 67%
CSF 2  | 7  | 58%
CSF 9  | 7  | 58%
CSF 12 | 6  | 50%
CSF 8  | 6  | 50%
CSF 3  | 6  | 50%
CSF 10 | 5  | 42%
CSF 11 | 5  | 42%
CSF 5  | 5  | 42%
CSF 7  | 4  | 33%

Selections per expert (Experts 1–12): 6, 5, 9, 4, 8, 7, 7, 8, 5, 9, 5, 6.
Table 3. Results of ranking (step 2.3) rounds 1 and 2 of the Delphi study.

Success factor | Mean rank (round 1) | Mean rank (round 2) | Final rank
CSF 6          | 4.38                | 2.45                | 1
CSF 9          | 5.87                | 3.81                | 2
CSF 1          | 6.74                | 4.56                | 3
CSF 2          | 6.98                | 5.03                | 4
CSF 4          | 7.23                | 5.56                | 5
CSF 12         | 7.56                | 6.32                | 6
CSF 8          | 8.12                | 6.89                | 7
CSF 11         | 8.89                | 7.66                | 8
CSF 10         | 9.10                | 9.33                | 9
CSF 3          | 9.23                | 10.11               | 10
CSF 7          | 10.11               | 10.89               | 11
CSF 5          | 10.21               | 11.01               | 12
Kendall's W    | 0.51                | 0.73                |
Based on these results, the discussion of the findings would take place. However, as this is highly content-related and less process-related, it is out of scope for the paper at hand.
4. ANALYTICAL EXTENSION FOR RANKING-TYPE DELPHI STUDIES
USING BEST/WORST SCALING
IS ranking-type Delphi studies use several ranking mechanisms. These include the direct ranking of items (Kasi et al., 2008); ranking based on ratings on a predefined scale – for example, a Likert scale (Liu et al., 2010; Nakatsu and Iacovou, 2009); or ranking based on experts allocating points from a predefined pool (Nevo and Chan, 2007). However, all these mechanisms have several well-known and documented disadvantages related to response style biases. Paulhus (1991, p. 17) enumerates the three most prominent response style biases as (1) Social desirability bias (the tendency to lie or fake); (2) Acquiescence bias (the tendency to agree); and (3) Extreme response bias (the tendency to use extreme ratings). A way to overcome these biases in ranking-type Delphi studies is the introduction of Best/Worst Scaling as a ranking mechanism (Lee et al., 2008, p. 335). Since a subjective preference order can be calculated from numerous smaller decisions, it becomes much harder for a panelist to predict or deliberately influence the final ranking list. In order to apply Best/Worst Scaling within the introduced step-by-step guide, an extension of the guide is proposed as shown in Figure 3.
Figure 3. Extended guideline for ranking-type Delphi study featuring Best/Worst Scaling for data analysis.
Extended Phase 3 – Data analysis: (3.1 – new) In order to use Best/Worst Scaling as the ranking mechanism in ranking-type Delphi studies, a proper (a) Design, (b) Execution, and (c) Analysis need to be defined.
(a) Regarding design, the list of all remaining issues (the result of step 2.2) serves as the body of items. In addition, decisions are needed on the number of sets (questions) the experts are asked, the number of items per set, and the appearance of items in the sets with respect to frequency balance, orthogonality, connectivity, and positional balance (compare Section 2.2).
(b) Regarding execution, it should be easy for the experts to complete the questionnaire. While it would be possible to use paper and pen or regular e-mail communication for this, a web-based version of the questionnaire seems to be the most appropriate form for taking the survey.
(c) Regarding analysis, several options exist to transform the results of Best/Worst Scaling at the individual level into a ranking. The simplest option is to calculate ‘best minus worst’ scores (the number of times an issue was selected as best minus the number of times it was selected as worst). More sophisticated options include, for example, the application of linear probability models, conditional logit models, or rank-ordered logit models (Louviere et al., 2013, pp. 295–296).
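As an illustration of the simplest option, the following sketch (Python; the names are our own) computes ‘best minus worst’ counts from the recorded choices; the model-based options mentioned above would replace this scoring step.

```python
from collections import Counter

def best_minus_worst(choices, items):
    """Score each item as (# times chosen best) - (# times chosen worst).

    choices: one (best_item, worst_item) tuple per answered set.
    items: the full body of items, so unchosen items also receive a score.
    """
    best = Counter(b for b, _ in choices)
    worst = Counter(w for _, w in choices)
    scores = {item: best[item] - worst[item] for item in items}
    # Higher score = more preferred; note that ties remain possible,
    # which is one reason to prefer the model-based options named above.
    return sorted(scores.items(), key=lambda kv: -kv[1])
```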
(3.2–3.5) The steps of Mean rank, Top-half rank, Kendall's W, and comments by participants are identical to those in the previously introduced step-by-step guide.
Guiding Example: (3.1 – new) While it is possible to design and conduct Best/Worst Scaling manually, we follow Louviere et al. (2013, p. 295) and use more sophisticated statistical software in order to prepare for non-trivial analyses. We decided to use a web-based solution from Sawtooth Software (Sawtooth, 2015) to design and analyze our proposed ranking. We did so because the accompanying technical papers (Sawtooth, 2013) offer transparent and sufficient information on the functionality and capabilities of the software.
(a) Design: As the body of items consists of 12 CSFs, we decide to present the experts with 12 sets (Sawtooth, 2013). In each set, four items are shown, from which the experts select the best and the worst CSF. The appropriate appearance (frequency balance, orthogonality, connectivity, and positional balance) of the CSFs in the sets was ensured by the software. Figure 4 provides an example of a set; a sketch of a simple design-generation approach follows it.
Figure 4. Example of a set in the web survey.
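The cited software generates the design internally. Purely for illustration, the following naive sketch (Python; hypothetical names) produces a frequency-balanced design for the running example by chunking shuffled copies of the item list; unlike dedicated tools, it does not optimize orthogonality or positional balance.

```python
import random

def naive_bws_design(items, n_sets, set_size, seed=7):
    """Naively generate a frequency-balanced Best/Worst Scaling design.

    Requires n_sets * set_size to be a multiple of len(items); each item
    then appears exactly n_sets * set_size / len(items) times.
    """
    rng = random.Random(seed)
    copies = n_sets * set_size // len(items)
    while True:  # retry until no set contains the same item twice
        pool = [i for _ in range(copies) for i in rng.sample(items, len(items))]
        sets = [tuple(pool[k * set_size:(k + 1) * set_size]) for k in range(n_sets)]
        if all(len(set(s)) == set_size for s in sets):
            return sets

# Running example: 12 CSFs shown in 12 sets of four items each,
# so every CSF appears exactly four times across the sets.
design = naive_bws_design([f"CSF {i}" for i in range(1, 13)], n_sets=12, set_size=4)
```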
(b) Execution: We decided on a web survey as this gives the participants the freedom to take the survey whenever and wherever they want. In addition, it turned out that a smartphone-compatible version is important.
(c) Analysis: Using statistical software, we calculated the results for each individual expert based on conditional logit models, which can be used to investigate choice behavior (McFadden, 1974, pp. 105–106). Table 4 provides a sample result for one expert; a minimal estimation sketch follows the table. The row ‘Value’ provides the zero-centered interval scores derived from the conditional logit model. A higher value indicates a higher importance of the CSF. In our example, a table like Table 4 is calculated for each expert and used to obtain the respective ranking of the CSFs.
Table 4. Result for one expert's ranking derived using Best/Worst Scaling.

Rank  | 1    | 2    | 3    | 4   | 5   | 6    | 7    | 8    | 9    | 10   | 11    | 12
CSF   | 3    | 1    | 8    | 11  | 10  | 9    | 6    | 4    | 2    | 5    | 7     | 12
Value | 49.0 | 23.4 | 23.4 | 3.0 | 0.1 | -2.1 | -2.7 | -3.6 | -4.0 | -5.2 | -30.3 | -51.0
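As a rough indication of how such zero-centered scores can be estimated, the following sketch (Python with NumPy and SciPy) fits a simplified best/worst logit by maximum likelihood; this is an illustrative stand-in under our own simplifying assumptions, not the exact model or scaling used by the cited software.

```python
import numpy as np
from scipy.optimize import minimize

def fit_maxdiff_utilities(answers, n_items):
    """Fit individual-level utilities for a simple MaxDiff logit (sketch).

    answers: list of (shown, best, worst) per set, where `shown` is a tuple
    of item indices and `best`/`worst` are the chosen item indices.
    Returns zero-centered utilities; higher = more important.
    """
    def neg_log_likelihood(u):
        ll = 0.0
        for shown, best, worst in answers:
            v = u[list(shown)]
            # Best chosen from the set; worst modeled with negated utilities
            # (simplification: the worst choice set still includes the best item).
            ll += u[best] - np.log(np.exp(v).sum())
            ll += -u[worst] - np.log(np.exp(-v).sum())
        # Tiny ridge term pins down the arbitrary additive constant.
        return -ll + 1e-6 * (u ** 2).sum()

    result = minimize(neg_log_likelihood, np.zeros(n_items), method="BFGS")
    return result.x - result.x.mean()
```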
(3.2–3.5) After having obtained the ranking of each individual expert with the help of a conditional logit model, we can obtain the mean rank, Top-half rank, Kendall's W, and comments by participants as described in the step-by-step guide above.
5. CONCLUSION
The study at hand contributes to IS research in two ways.
First, it introduces a rigorous step-by-step approach to conducting ranking-type Delphi studies, which are widely used in IS research. In addition, a guiding example is introduced which provides possible visualizations and helps IS scholars to think and act in an end-product-oriented way while conducting a ranking-type Delphi study.
Second, it introduces an analytical extension for ranking-type Delphi studies which can further increase rigor. The level of consensus should be influenced neither by response style biases nor by the researchers' persistence (i.e., by asking experts again and again until some of them give in to achieve consensus), but only by the real opinions of the experts. Through the introduction of Best/Worst Scaling as a ranking mechanism, the problem of illusory consensus among Delphi panelists can be reduced. In Best/Worst Scaling, the experts are forced to deliberately choose a CSF as most/least important compared to other CSFs. In this way, a subjective preference order can be calculated, and it becomes much harder for a panelist to predict or deliberately influence his or her final ranking list. In addition, the guiding example introduces an easy-to-conduct, time-efficient, and cognitively undemanding way for panelists to produce a ranking based on Best/Worst Scaling.
While the study was inspired by a real-world research example, its main limitation is that it has not yet been applied in full to a real-world research question. While we are confident that the described process would yield the expected results, future research still needs to demonstrate this.
In addition, future research needs to investigate in detail whether Best/Worst Scaling could replace even more of phase 3 (data analysis) as the sole ranking mechanism. While Kendall's W is a well-established and widely used measure of consensus in ranking-type Delphi studies, future research should investigate which other measures would be suitable given the additional preference data gained through the described ranking mechanism.
REFERENCES
Cohen, E. (2009) ‘Applying best‐worst scaling to wine marketing’, International Journal of Wine Business Research,
vol. 21, no. 1, pp. 8–23 [Online]. DOI: 10.1108/17511060910948008.
Cohen, S. and Orme, B. (2004) ‘What's your preference? Asking survey respondents about their preferences creates new
scaling decisions’, Marketing Research, vol. 16, pp. 32–37.
Dalkey, N. and Helmer, O. (1963) ‘An experimental application of the Delphi method to the use of experts’, Management
Science, vol. 9, no. 3, pp. 458–467.
Delbecq, A. L., Van de Ven, A. H. and Gustafson, D. H. (1975) Group techniques for program planning: A guide to nominal group and Delphi processes, Glenview, IL, Scott, Foresman.
Gallego, D. and Bueno, S. (2014) ‘Exploring the application of the Delphi method as a forecasting tool in Information
Systems and Technologies research’, Technology Analysis & Strategic Management, vol. 26, no. 9, pp. 987–999.
Kasi, V., Keil, M., Mathiassen, L. and Pedersen, K. (2008) ‘The post mortem paradox: A Delphi study of IT specialist perceptions’, European Journal of Information Systems, vol. 17, no. 1, pp. 62–78 [Online]. DOI: 10.1057/palgrave.ejis.3000727.
Keil, M., Lee, H. K. and Deng, T. (2013) ‘Understanding the most critical skills for managing IT projects: A Delphi study of IT project managers’, Information & Management, vol. 50, no. 7, pp. 398–414.
Kendall, M. G. and Gibbons, J. D. (1990) Rank correlation methods, 5th edn, London, New York, NY, E. Arnold;
Oxford University Press.
Lee, J. A., Soutar, G. and Louviere, J. (2008) ‘The best-worst scaling approach: An alternative to Schwartz's values survey’, Journal of Personality Assessment, vol. 90, no. 4, pp. 335–347.
Liu, S., Zhang, J., Keil, M. and Chen, T. (2010) ‘Comparing senior executive and project manager perceptions of IT
project risk: a Chinese Delphi study’, Information Systems Journal, vol. 20, no. 4, pp. 319–355 [Online].
DOI: 10.1111/j.1365-2575.2009.00333.x.
Louviere, J., Lings, I., Islam, T., Gudergan, S. and Flynn, T. (2013) ‘An introduction to the application of (case 1) best–
worst scaling in marketing research’, International Journal of Research in Marketing, vol. 30, no. 3, pp. 292–303.
Mayring, P. (2000) ‘Qualitative Content Analysis’, Forum: Qualitative Social Research, vol. 1, no. 2, pp. 1–10.
McFadden, D. (1974) ‘Conditional logit analysis of qualitative choice behavior’, in Zarembka, P. (ed) Frontiers in
econometrics, New York, NY: Academic Press, pp. 105–142.
Nakatsu, R. T. and Iacovou, C. L. (2009) ‘A comparative study of important risk factors involved in offshore and
domestic outsourcing of software development projects: A two-panel Delphi study’, Information & Management,
vol. 46, no. 1, pp. 57–68 [Online]. DOI: 10.1016/j.im.2008.11.005.
Nevo, D. and Chan, Y. E. (2007) ‘A Delphi study of knowledge management systems: Scope and requirements’,
Information & Management, vol. 44, no. 6, pp. 583–597 [Online]. DOI: 10.1016/j.im.2007.06.001.
Okoli, C. and Pawlowski, S. D. (2004) ‘The Delphi method as a research tool: An example, design considerations and
applications’, Information & Management, vol. 42, no. 1, pp. 15–29 [Online]. DOI: 10.1016/j.im.2003.11.002.
Päivärinta, T., Pekkola, S. and Moe, C. E. (2011) ‘Grounding theory from Delphi studies’, ICIS 2011 Proceedings.
Paré, G., Cameron, A.-F., Poba-Nzaou, P. and Templier, M. (2013) ‘A systematic assessment of rigor in information systems ranking-type Delphi studies’, Information & Management, vol. 50, no. 5, pp. 207–217.
Paulhus, D. L. (1991) ‘Measurement and control of response bias’, in Robinson, J. P., Shaver, P. R. and Wrightsman, L.
S. (eds) Measures of personality and social psychological attitudes, New York: Academic, pp. 17–59.
Piccinini, E., Gregory, R. and Muntermann, J. (2014) ‘Complexity in IS programs: A Delphi study’, ECIS 2014 Proceedings.
Polónia, F. and Sá-Soares, F. de (2013) ‘Key issues in Information Systems security management’, ICIS 2013 Proceedings.
Rauch, W. (1979) ‘The decision Delphi’, Technological Forecasting and Social Change, vol. 15, no. 3, pp. 159–169
[Online]. DOI: 10.1016/0040-1625(79)90011-8.
Sawtooth (2013) The MaxDiff system technical paper [Online]. Available at https://www.sawtoothsoftware.com/download/techpap/maxdifftech.pdf (Accessed 15 September 2015).
Sawtooth (2015) SSI Web: MaxDiff [Online]. Available at https://www.sawtoothsoftware.com/products/maxdiff-software/maxdiff (Accessed 15 September 2015).
Schmidt, R. C. (1997) ‘Managing Delphi surveys using nonparametric statistical techniques’, Decision Sciences, vol. 28,
no. 3, pp. 763–774 [Online]. DOI: 10.1111/j.1540-5915.1997.tb01330.x.
Witkin, B. R. and Altschuld, J. W. (1995) Planning and conducting needs assessments: A practical guide, Thousand
Oaks, CA, Sage Publications.