How Similarity Helps to Efficiently Compute
Kemeny Rankings
∗
Nadja Betzler
Institut für Informatik
Friedrich-Schiller-Universität
Jena
Ernst-Abbe-Platz 2
D-07743 Jena, Germany
betzler@minet.uni-jena.de
Jiong Guo
Institut für Informatik
Friedrich-Schiller-Universität
Jena
Ernst-Abbe-Platz 2
D-07743 Jena, Germany
guo@minet.uni-jena.de
Michael R. Fellows
PC Research Unit
Office of DVC (Research)
University of Newcastle
Callaghan, NSW 2308,
Australia
michael.fellows@newcastle.edu.au
Rolf Niedermeier
Institut für Informatik
Friedrich-Schiller-Universität
Jena
Ernst-Abbe-Platz 2
D-07743 Jena, Germany
niedermeier@minet.uni-jena.de
Frances A. Rosamond
PC Research Unit
Office of DVC (Research)
University of Newcastle
Callaghan, NSW 2308,
Australia
frances.rosamond@newcastle.edu.au
ABSTRACT
The computation of Kemeny rankings is central to many
applications in the context of rank aggregation. Unfortu-
nately, the problem is NP-hard. We show that the Kemeny
score (and a corresponding Kemeny ranking) of an election
can be computed efficiently whenever the average pairwise
distance between two input votes is not too large. In other
words, Kemeny Score is fixed-parameter tractable with
respect to the parameter “average pairwise Kendall-Tau distance
$d_a$”. We describe a fixed-parameter algorithm with
running time $16^{\lceil d_a \rceil} \cdot \mathrm{poly}$. Moreover, we extend our studies
to the parameters “maximum range” and “average range”
of positions a candidate takes in the input votes. Whereas
Kemeny Score remains fixed-parameter tractable with re-
spect to the parameter “maximum range”, it becomes NP-
complete in case of an average range value of two. This
excludes fixed-parameter tractability with respect to the pa-
rameter “average range” unless P=NP.
Categories and Subject Descriptors
F.2.2 [Theory of Computation]: Analysis of Algorithms
and Problem Complexity—Nonnumerical Algorithms and Problems;
G.2.1 [Mathematics of Computing]: Discrete Mathematics—Combinatorics;
I.2.8 [Computing Methodologies]: Artificial Intelligence—Problem Solving,
Control Methods, and Search; J.4 [Computer Applications]: Social
and Behavioral Sciences
∗Most of the results of this paper have been presented at
COMSOC’08 under the title “Computing Kemeny Rankings,
Parameterized by the Average KT-Distance”.
General Terms
Algorithms
Keywords
Rank aggregation, NP-hard problem, exact algorithm, fixed-
parameter tractability, structural parameterization
1. INTRODUCTION
Aggregating inconsistent information has many applica-
tions ranging from voting scenarios to meta search engines
and fighting spam [1, 8, 11, 14]. In some sense, one deals
with consensus problems where one wants to find a solution
to various “input demands” such that these demands are met
as well as possible. Naturally, contradicting demands cannot
be fulfilled at the same time. Hence, the consensus solution
has to provide a balance between opposing requirements.
The concept of Kemeny consensus (or Kemeny ranking) is
among the most important conflict resolution proposals in
this context. In this paper, extending and improving previ-
ous results [3], we study new algorithmic approaches based
on parameterized complexity analysis [13, 17, 21] for effi-
ciently computing optimal Kemeny consensus solutions in
practically relevant special cases. To this end, we employ
the “similarity” between votes by measuring their average
pairwise distance.
Kemeny’s voting scheme can be described as follows. An
election (V,C) consists of a set V of n votes and a set C of
m candidates. A vote is a preference list of the candidates,
that is, a permutation on C. For instance, in the case of
three candidates a,b,c, the order c > b > a would mean that
candidate c is the best-liked and candidate a is the least-
liked for this voter. A “Kemeny consensus” is a preference
list that is “closest” to the preference lists of the voters. For
Cite as: How Similarity Helps to Efficiently Compute Kemeny Rankings,
Nadja Betzler, Michael R. Fellows, Jiong Guo, Rolf Niedermeier,
Frances A. Rosamond, Proc. of 8th Int. Conf. on Autonomous Agents
and Multiagent Systems (AAMAS 2009), Decker, Sichman, Sierra and
Castelfranchi (eds.), May, 10–15, 2009, Budapest, Hungary, pp. 657–664
Copyright © 2009, International Foundation for Autonomous Agents
and Multiagent Systems (www.ifaamas.org), All rights reserved.
AAMAS 2009 • 8th International Conference on Autonomous Agents and Multiagent Systems • 10–15 May, 2009 • Budapest, Hungary
each pair of votes v,w, the so-called Kendall-Tau distance
(KT-distance for short) between v and w, also known as the
inversion distance between two permutations, is defined as
\[ \text{KT-dist}(v,w) = \sum_{\{c,d\} \subseteq C} d_{v,w}(c,d), \]
where the sum is taken over all unordered pairs {c,d} of
candidates, and $d_{v,w}(c,d)$ is 0 if v and w rank c and d in the
same order, and 1 otherwise. Using divide-and-conquer, the
KT-distance can be computed in $O(m \cdot \log m)$ time [20]. The
score of a preference list l with respect to an election (V,C)
is defined as $\sum_{v \in V} \text{KT-dist}(l,v)$. A preference list l with the
minimum score is called a Kemeny consensus of (V,C) and
its score $\sum_{v \in V} \text{KT-dist}(l,v)$ is the Kemeny score of (V,C),
denoted as K-score(V,C). The underlying decision problem
is as follows:
Kemeny Score
Input: An election (V,C) and an integer k > 0.
Question: Is K-score(V,C) ≤ k?
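The definitions above can be illustrated with a minimal Python sketch. It assumes that a vote is given as a sequence of candidates with the best candidate first; the function names `kt_dist`, `score`, and `kemeny_brute_force` are chosen here for illustration only. The KT-distance is computed by counting inversions with merge sort, one possible divide-and-conquer realization of the $O(m \cdot \log m)$ bound, and the Kemeny score is found by exhaustive search over all m! preference lists, which is only feasible for very small m.

```python
from itertools import permutations


def kt_dist(v, w):
    """KT-distance between two votes (sequences of candidates, best first)."""
    pos_in_w = {c: i for i, c in enumerate(w)}
    seq = [pos_in_w[c] for c in v]  # positions in w, read in the order of v

    def count_inversions(a):
        # Merge sort returning (sorted list, number of inversions in a).
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv_left = count_inversions(a[:mid])
        right, inv_right = count_inversions(a[mid:])
        merged, inv = [], inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] precedes all remaining elements of left
                inv += len(left) - i
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inv

    return count_inversions(seq)[1]


def score(l, votes):
    """Score of preference list l: sum of KT-distances to all votes."""
    return sum(kt_dist(l, v) for v in votes)


def kemeny_brute_force(votes):
    """Return (Kemeny score, one Kemeny consensus) by trying all m! lists."""
    best = min(permutations(votes[0]), key=lambda l: score(l, votes))
    return score(best, votes), best
```

For the three votes a > b > c, a > b > c, and c > b > a, `kemeny_brute_force(["abc", "abc", "cba"])` reports score 3 with consensus a > b > c, so the corresponding Kemeny Score instance is a yes-instance exactly for k ≥ 3.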
Known results. Bartholdi et al. [2] showed that Kemeny
Score is NP-complete, and it remains so even when re-
stricted to instances with only four votes [14, 15]. Given the
computational hardness of Kemeny Score on the one side
and its practical relevance on the other side, polynomial-
time approximation algorithms have been studied. The Ke-
meny score can be approximated to a factor of 8/5 by a
deterministic algorithm [23] and to a factor of 11/7 by a
randomized algorithm [1]. Recently, a polynomial-time ap-
proximation scheme (PTAS) has been developed [19]. How-
ever, its running time is completely impractical. Conitzer,
Davenport, and Kalagnanam [11, 8] performed computa-
tional studies for the efficient exact computation of a Ke-
meny consensus, using heuristic approaches such as greedy
and branch-and-bound. Their experimental results encour-
age the search for practically relevant, efficiently solvable
special cases. These experimental investigations focus on
computing strong admissible bounds for speeding up search-
based heuristic algorithms. In contrast, our focus is on exact
algorithms with provable asymptotic running time bounds
for the developed algorithms.
Hemaspaandra et al. [18] provided further, exact classifications
of the classical computational complexity of Kemeny elections.
More specifically, whereas Kemeny Score is NP-complete, they
provided $\mathrm{P}^{\mathrm{NP}}_{\parallel}$-completeness results for other, more general
versions of the problem. Very recently, a parameterized
complexity study based on various problem parameterizations
has been initiated [3]. There, fixed-parameter tractability
results for the parameters “Kemeny score”, “number of
candidates” and “maximum KT-distance between two input
votes” are reported.
Finally, it is interesting to note that Conitzer [7] uses a
(different) notion of similarity (which is, furthermore, imposed
on candidates rather than voters) to efficiently compute
the closely related Slater rankings. Using the concept
of similar candidates, he identifies efficiently solvable special
cases, also yielding a powerful preprocessing technique
for computing Slater rankings.
New results. Our main result is that Kemeny Score can
be solved in $16^{\lceil d_a \rceil} \cdot \mathrm{poly}(n,m)$ time, where $d_a$ denotes the
average KT-distance between the pairs of input votes. This
means a significant improvement over the previous algorithm
for the maximum KT-distance $d_{\max}$ between pairs of input
votes, which has running time $(3d_{\max} + 1)! \cdot \mathrm{poly}(n,m)$ [3].
Clearly, $d_a \le d_{\max}$. In addition, using similar ideas, we
can show that Kemeny Score can be solved in $32^{r_{\max}} \cdot \mathrm{poly}(n,m)$
time, where $r_{\max}$ denotes the maximum range of
candidate positions of an election (see Section 2 for a formal
definition). In contrast, these two fixed-parameter tractability
results are complemented by an NP-completeness result
for the case of an average range of candidate positions of only
two, thus destroying hopes for fixed-parameter tractability
with respect to this parameterization.
2. PRELIMINARIES
Let the position of a candidate c in a vote v, denoted
by v(c), be the number of candidates that are better than c
in v. That is, the leftmost (and best) candidate in v has
position 0 and the rightmost has position m − 1. For an
election (V,C) and a candidate c ∈ C, the average position
$p_a(c)$ of c is defined as
\[ p_a(c) := \frac{1}{n} \cdot \sum_{v \in V} v(c). \]
For an election (V,C), the average KT-distance $d_a$ is defined as¹
\[ d_a := \frac{1}{n(n-1)} \cdot \sum_{u,v \in V,\, u \neq v} \text{KT-dist}(u,v). \]
Note that an equivalent definition is given by
\[ d_a := \frac{1}{n(n-1)} \cdot \sum_{a,b \in C} \#_v(a > b) \cdot \#_v(b > a), \]
where for two candidates a and b the number of votes in
which a is ranked better than b is denoted by $\#_v(a > b)$.
The latter definition is useful if the input is provided by the
outcomes of the pairwise elections of the candidates including
the margins of victory. Furthermore, we define
\[ d := \lceil d_a \rceil. \]
Further, for an election (V,C) and for a candidate c ∈ C,
the range r(c) of c is defined as
\[ r(c) := \max_{v,w \in V} \{|v(c) - w(c)|\} + 1. \]
The maximum range $r_{\max}$ of an election is given by
$r_{\max} := \max_{c \in C} r(c)$ and the average range $r_a$ is defined as
\[ r_a := \frac{1}{m} \sum_{c \in C} r(c). \]
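The parameters defined in this section can be computed directly from the votes. The following Python sketch is an illustration, not part of the original presentation: it assumes votes are sequences with the best candidate first, uses exact fractions to avoid rounding, and all function names are hypothetical. It evaluates both definitions of $d_a$, where the second one sums over ordered candidate pairs (a,b), so that both count every pair twice.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import ceil


def kt_dist(v, w):
    """KT-distance by direct pair counting (quadratic in m, for small inputs)."""
    return sum(1 for a, b in combinations(v, 2) if w.index(a) > w.index(b))


def average_positions(votes):
    """p_a(c) for every candidate c; v.index(c) is the position v(c)."""
    n = len(votes)
    return {c: Fraction(sum(v.index(c) for v in votes), n) for c in votes[0]}


def average_kt_distance(votes):
    """d_a as the sum over ordered pairs (u, v) of distinct votes."""
    n = len(votes)
    total = sum(kt_dist(u, v) for u, v in permutations(votes, 2))
    return Fraction(total, n * (n - 1))


def average_kt_distance_from_pairwise_counts(votes):
    """d_a from the pairwise election outcomes #v(a > b) * #v(b > a)."""
    n = len(votes)
    total = 0
    for a, b in permutations(votes[0], 2):  # ordered candidate pairs
        a_over_b = sum(1 for v in votes if v.index(a) < v.index(b))
        total += a_over_b * (n - a_over_b)
    return Fraction(total, n * (n - 1))


def ranges(votes):
    """r(c) for every candidate c."""
    return {c: max(v.index(c) for v in votes)
               - min(v.index(c) for v in votes) + 1
            for c in votes[0]}
```

For the votes a > b > c, b > a > c, a > b > c, both functions return $d_a = 2/3$, hence `ceil(d_a)` gives d = 1; the ranges are r(a) = r(b) = 2 and r(c) = 1, so $r_{\max} = 2$ and $r_a = 5/3$.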
Finally, we briefly introduce the relevant notions of pa-
rameterized complexity theory [13, 17, 21]. Parameterized
algorithmics aims at a multivariate complexity analysis of
problems. This is done by studying relevant problem param-
eters and their influence on the computational complexity
of problems. The hope lies in accepting the seemingly in-
evitable combinatorial explosion for NP-hard problems, but
confining it to the parameter. Thus, the decisive question
is whether a given parameterized problem is fixed-parameter
¹ To simplify the presentation, the following definition counts
the pair (u,v) as well as the pair (v,u), thus having to divide
by n(n − 1) to obtain the correct average distance value.
Nadja Betzler, Michael R. Fellows, Jiong Guo, Rolf Niedermeier, Frances A. Rosamond • How Similarity Helps to Effi ciently Compute Kemeny Rankings
v_1: a > b > c > d > e > f > ...
...
v_i: a > b > c > d > e > f > ...
v_{i+1}: b > a > d > c > f > e > ...
...
v_{2i}: b > a > d > c > f > e > ...
Figure 1: Small maximum range but large average KT-distance.
tractable (FPT) with respect to the parameter. In other
words, for an input instance I together with the parame-
ter k, we ask for the existence of a solving algorithm with
running time f(k)·poly(|I|) for some computable function f.
3. ON PARAMETERIZATIONS OF
KEMENY SCORE
This section discusses the “art” of finding different, prac-
tically relevant parameterizations of Kemeny Score. Our
paper focusses on structural parameterizations, that is, struc-
tural properties of input instances that may be exploited to
develop efficient solving algorithms for Kemeny Score. To
this end, here we investigate the realistic scenario (which, to
some extent, is also motivated by previous experimental re-
sults [11, 8]) that the given preference lists of the voters show
some form of similarity. More specifically, we consider the
parameters “average KT-distance” between the input votes,
“maximum range of candidate positions”, and “average range
of candidate positions”. Clearly, the maximum value is always
an upper bound for the average value. The parameter
“average KT-distance” reflects the situation that in an ideal
world all votes would be the same, and differences occur due to
some (limited) form of noise which makes the actual votes
different from each other (see [12, 10, 9]). With average
KT-distance as parameter we can affirmatively answer the
question whether a consensus list that is closest to the input
votes can efficiently be found. By way of contrast, the pa-
rameterization by position range rather reflects the situation
that whereas voters can be more or less decided concerning
groups of candidates (e.g., political parties), they may be
quite undecided and, thus, unpredictable, concerning the
ranking within these groups. If these groups are small this
can also imply small range values, thus making the quest for
a fixed-parameter algorithm in terms of range parameteri-
zation attractive.
It is not hard to see, however, that the parameterizations
by “average KT-distance” and by “range of position” can
significantly differ. As described in the following, there are
input instances of Kemeny Score that have a small range
value and a large average KT-distance, and vice versa. This
justifies separate investigations for both parameterizations;
these are performed in Sections 4 and 5, respectively. We
end this section with some concrete examples that exhibit
the announced differences between our notions of vote sim-
ilarity, that is, our parameters under investigation. First,
we provide an example where one can observe a small max-
imum candidate range whereas one has large average KT-
distance, see Figure 1. The election in Figure 1 consists
of n = 2i votes such that there are two groups of i identical
votes. The votes of the second group are obtained from
the first group by swapping neighboring pairs of candidates.
Clearly, the maximum range of candidates is 2. However,
for m candidates the average KT-distance $d_a$ is
\[ d_a = \frac{2 \cdot (n/2)^2 \cdot (m/2)}{n(n-1)} > m/4 \]
and, thus, $d_a$ is unbounded for an unbounded number of
candidates.
v_1: a > b > c > d > e > f > ...
v_2: b > c > d > e > f > ... > a
v_3: a > b > c > d > e > f > ...
...
Figure 2: Small average KT-distance but large maximum range.
Second, we present an example where the average KT-
distance is small but the maximum range of candidates is
large, see Figure 2. In the election of Figure 2 all votes
are equal except that candidate a is at the last position in
the second vote, but at the first position in all other votes.
Thus, the maximum range equals the range of candidate a
which equals the number of candidates, whereas by adding
more copies of the first vote the average KT-distance can be
made smaller than one.
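Both example elections are easy to generate and check numerically. The Python sketch below is a hedged illustration: candidates are represented by the integers 0,...,m − 1 instead of letters, m is assumed to be even for the election of Figure 1, and the helper names `figure1_election` and `figure2_election` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def kt_dist(v, w):
    """KT-distance by direct pair counting."""
    pos = {c: i for i, c in enumerate(w)}
    return sum(1 for a, b in combinations(v, 2) if pos[a] > pos[b])


def average_kt_distance(votes):
    """d_a; every unordered pair of votes is counted twice, as in Section 2."""
    n = len(votes)
    total = 2 * sum(kt_dist(u, v) for u, v in combinations(votes, 2))
    return Fraction(total, n * (n - 1))


def max_range(votes):
    """r_max of the election."""
    return max(max(v.index(c) for v in votes)
               - min(v.index(c) for v in votes) + 1
               for c in votes[0])


def figure1_election(i, m):
    """Two groups of i identical votes; the second group swaps the
    neighboring pairs (0,1), (2,3), ... of the first (m assumed even)."""
    first = list(range(m))
    second = [first[k ^ 1] for k in range(m)]
    return [first] * i + [second] * i


def figure2_election(copies, m):
    """All votes equal, except that candidate 0 is last in the second vote;
    `copies` further copies of the first vote are appended."""
    first = list(range(m))
    second = first[1:] + first[:1]
    return [first, second] + [first] * copies
```

With i = 3 and m = 6, the first election has maximum range 2 and $d_a = 9/5 > m/4$. With 10 additional copies and m = 6, the second election has maximum range 6 and $d_a = 5/6 < 1$.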
Finally, we have a somewhat more complicated example
displaying a case where one observes small average KT-distance
but large average range of candidates.² To this
end, we make use of the following construction based on an
election with m candidates. Let $V_m$ be a set of m votes
such that every candidate is in one of the votes at the first
and in one of the votes at the last position; the remaining
positions can be filled arbitrarily. Then, for some $N > m^3$,
add N further votes $V_N$ in which all candidates have the
same arbitrary order. Then, the average KT-distance of the
constructed election is
\[ d_a = D(V_m) + D(V_N) + D(V_N, V_m), \]
where $D(V_m)$ ($D(V_N)$) is the average KT-distance within
the votes of $V_m$ ($V_N$) and $D(V_N, V_m)$ is the average KT-distance
between pairs of votes with one vote from $V_N$ and
the other vote from $V_m$. Since $m^2$ is an upper bound for the
pairwise (and average) KT-distance between any two votes,
it holds that $D(V_m) \le m^2$, $D(V_N) \le 1$ (in fact, $D(V_N) = 0$
because all votes in $V_N$ are identical), and $D(V_N, V_m) \le m^2$.
Further, we have $m \cdot (m-1)$ ordered pairs of votes
within $V_m$, $N \cdot m$ pairs between $V_N$ and $V_m$, and $N \cdot (N-1)$
pairs within $V_N$. Since $N > m^3$ it follows that
\[ d_a \le \frac{m(m-1) \cdot m^2 + Nm \cdot m^2 + N(N-1) \cdot 1}{N(N-1)} \le 3. \]
In contrast, the range of every candidate is m, thus the
average range is m.
² Clearly, this example also exhibits the situation of a
large maximum candidate range with a small average KT-distance.
We chose nevertheless to present the example from
Figure 2 because of its simplicity.
4. PARAMETER AVERAGE KT-DISTANCE
In this section, we further extend the range of parameterizations
studied so far (see [3]) by giving a fixed-parameter algorithm
with respect to the parameter “average KT-distance”.
We start with showing how the average KT-distance can be
used to upper-bound the range of positions that a candidate
can take in any optimal Kemeny consensus. Based on this
crucial observation, we then state the algorithm.
4.1 A Crucial Observation
Our fixed-parameter tractability result with respect to the
average KT-distance of the votes is based on the following
lemma.
Lemma 1. Let $d_a$ be the average KT-distance of an election
(V,C) and $d = \lceil d_a \rceil$. Then, in every optimal Kemeny
consensus l, for every candidate c ∈ C with respect to its
average position $p_a(c)$ we have $p_a(c) - d < l(c) < p_a(c) + d$.
Proof. The proof is by contradiction and consists of two
claims: First, we show that we can find a vote with Ke-
meny score less than d · n, that is, the Kemeny score of the
instance is less than d · n. Second, we show that in every
Kemeny consensus every candidate is in the claimed range.
More specifically, we prove that every consensus in which the
position of a candidate is not in a “range d of its average position”
has a Kemeny score of at least d·n, a contradiction
to the first claim.
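Claim 1: K-score(V,C) < d · n.
Proof of Claim 1: To prove Claim 1, we show that there
is a vote v ∈ V with $\sum_{w \in V} \text{KT-dist}(v,w) < d \cdot n$, implying
this upper bound for an optimal Kemeny consensus as well.
By definition,
\[ d_a = \frac{1}{n(n-1)} \cdot \sum_{v,w \in V,\, v \neq w} \text{KT-dist}(v,w) \quad (1) \]
\[ \Rightarrow \exists v \in V \text{ with } d_a \ge \frac{1}{n(n-1)} \cdot n \cdot \sum_{w \in V,\, v \neq w} \text{KT-dist}(v,w) \quad (2) \]
\[ = \frac{1}{n-1} \cdot \sum_{w \in V,\, v \neq w} \text{KT-dist}(v,w) \quad (3) \]
\[ \Rightarrow \exists v \in V \text{ with } d_a \cdot n > \sum_{w \in V,\, v \neq w} \text{KT-dist}(v,w). \quad (4) \]
Since we have $d = \lceil d_a \rceil$, Claim 1 follows directly from
Inequality (4).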
The next claim shows the given bound on the range of possible
candidate positions.
Claim 2: In every optimal Kemeny consensus l, every
candidate c ∈ C fulfills $p_a(c) - d < l(c) < p_a(c) + d$.
Proof of Claim 2: We start by showing that, for every
candidate c ∈ C, we have
\[ \text{K-score}(V,C) \ge \sum_{v \in V} |l(c) - v(c)|. \quad (5) \]
Note that, for every candidate c ∈ C, for two votes v,w
we must have KT-dist(v,w) ≥ |v(c) − w(c)|. Without loss
of generality, assume that v(c) > w(c). Then, there must
be at least v(c) − w(c) candidates that have a smaller position
than c in v and that have a greater position than c
in w. Further, each of these candidates increases the value of
KT-dist(v,w) by one. Based on this, Inequality (5) directly
follows as, by definition, $\text{K-score}(V,C) = \sum_{v \in V} \text{KT-dist}(v,l)$.
To simplify the proof of Claim 2, in the following, we shift
the positions in l such that l(c) = 0. Accordingly, we shift
the positions in all votes in V, that is, for every v ∈ V
and every a ∈ C, we decrease v(a) by the original value
of l(c). Clearly, shifting all positions does not affect the relative
differences of positions between two candidates. Then,
let the set of votes in which c has a nonnegative position
be $V^+$ and let $V^-$ denote the remaining set of votes, that
is, $V^- := V \setminus V^+$.
Now, we show that if candidate c is placed outside of
the given range in an optimal Kemeny consensus l, then
K-score(V,C) ≥ d · n. The proof is by contradiction. We
distinguish two cases:
Case 1: $l(c) \ge p_a(c) + d$.
As l(c) = 0, in this case $p_a(c)$ becomes negative. Then,
\[ 0 \ge p_a(c) + d \Leftrightarrow -p_a(c) \ge d. \]
It follows that $|p_a(c)| \ge d$. The following shows that Claim 2
holds for this case.
\[ \sum_{v \in V} |l(c) - v(c)| = \sum_{v \in V} |v(c)| \quad (6) \]
\[ = \sum_{v \in V^+} |v(c)| + \sum_{v \in V^-} |v(c)|. \quad (7) \]
Next, replace the term $\sum_{v \in V^-} |v(c)|$ in (7) by an equivalent
term that depends on $|p_a(c)|$ and $\sum_{v \in V^+} |v(c)|$. For
this, use the following, derived from the definition of $p_a(c)$:
\[ n \cdot p_a(c) = \sum_{v \in V^+} |v(c)| - \sum_{v \in V^-} |v(c)| \]
\[ \Leftrightarrow \sum_{v \in V^-} |v(c)| = n \cdot (-p_a(c)) + \sum_{v \in V^+} |v(c)| = n \cdot |p_a(c)| + \sum_{v \in V^+} |v(c)|. \]
The replacement results in
\[ \sum_{v \in V} |l(c) - v(c)| = 2 \cdot \sum_{v \in V^+} |v(c)| + n \cdot |p_a(c)| \ge n \cdot |p_a(c)| \ge n \cdot d. \]
This says that K-score(V,C) ≥ n · d, a contradiction to
Claim 1.
Case 2: $l(c) \le p_a(c) - d$.
Since l(c) = 0, the condition is equivalent to $0 \le p_a(c) - d \Leftrightarrow d \le p_a(c)$,
and we have that $p_a(c)$ is nonnegative. Now, we
show that Claim 2 also holds for this case.
\[ \sum_{v \in V} |l(c) - v(c)| = \sum_{v \in V} |v(c)| = \sum_{v \in V^+} |v(c)| + \sum_{v \in V^-} |v(c)| \]
\[ \ge \sum_{v \in V^+} v(c) + \sum_{v \in V^-} v(c) = p_a(c) \cdot n \ge d \cdot n. \]
Thus, also in this case, K-score(V,C) ≥ n · d, a contradiction
to Claim 1.
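For small elections, Lemma 1 and its two claims can be checked by exhaustive search. The Python sketch below is only an illustration under stated assumptions: votes are sequences with the best candidate first, there are at least two votes, not all votes are identical (so that $d_a > 0$), and the name `check_lemma1` is hypothetical. It enumerates all preference lists, which is feasible only for very few candidates.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import ceil


def kt_dist(v, w):
    """KT-distance by direct pair counting."""
    pos = {c: i for i, c in enumerate(w)}
    return sum(1 for a, b in combinations(v, 2) if pos[a] > pos[b])


def check_lemma1(votes):
    """Return (claim 1 holds, claim 2 holds) for the given election."""
    n = len(votes)
    candidates = votes[0]
    total = 2 * sum(kt_dist(u, v) for u, v in combinations(votes, 2))
    d = ceil(Fraction(total, n * (n - 1)))  # d = ceil(d_a)
    p_a = {c: Fraction(sum(v.index(c) for v in votes), n) for c in candidates}

    # Score of every possible preference list (m! many).
    scores = {l: sum(kt_dist(l, v) for v in votes)
              for l in permutations(candidates)}
    k_score = min(scores.values())

    # Claim 1: the Kemeny score is smaller than d * n.
    claim1 = k_score < d * n
    # Claim 2: in EVERY optimal consensus l, l(c) lies strictly within
    # distance d of the average position of c.
    claim2 = all(p_a[c] - d < l.index(c) < p_a[c] + d
                 for l, s in scores.items() if s == k_score
                 for c in candidates)
    return claim1, claim2
```

For the election consisting of the votes a > b > c > d, b > a > c > d, a > b > d > c, and a > c > b > d, we have $d_a = 3/2$ and d = 2, and `check_lemma1` returns `(True, True)`.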
Based on Lemma 1, for every position we can define the
set of candidates that can take this position in an optimal
Kemeny consensus. The subsequent definition will be useful
for the formulation of the algorithm.
Definition 1. Let (V,C) be an election. For every inte-
ger i ∈ {0,...,m − 1}, let Pi denote the set of candidates
that can assume the position i in an optimal Kemeny con-
sensus, that is, Pi := {c ∈ C | pa(c) − d < i < pa(c) + d}.
Using Lemma 1, we can easily show the following.
Lemma 2. For every position i, |Pi| ≤ 4d.
Proof. The proof is by contradiction. Assume that there
is a position i with |Pi| > 4d. Due to Lemma 1, for every
candidate c ∈ Pi the positions which c may assume in an
optimal Kemeny consensus can differ by at most 2d−1. This
is true because, otherwise, candidate c could not be in the
given range around its average position. Then, in a Kemeny
consensus, each of the at least 4d + 1 candidates must hold
a position that differs at most by 2d−1 from position i. As
there are only 4d − 1 such positions (2d − 1 on the left and
2d − 1 on the right of i), one obtains a contradiction.
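The sets $P_i$ of Definition 1 follow directly from the average positions and d. The Python sketch below is an illustrative helper with the hypothetical name `candidate_windows`; it assumes votes are sequences with the best candidate first and that there are at least two votes. It returns d together with the list of all $P_i$, so that the bound of Lemma 2 can be inspected on concrete inputs.

```python
from fractions import Fraction
from itertools import combinations
from math import ceil


def kt_dist(v, w):
    """KT-distance by direct pair counting."""
    pos = {c: i for i, c in enumerate(w)}
    return sum(1 for a, b in combinations(v, 2) if pos[a] > pos[b])


def candidate_windows(votes):
    """Return (d, [P_0, ..., P_{m-1}]) as in Definition 1."""
    n, m = len(votes), len(votes[0])
    total = 2 * sum(kt_dist(u, v) for u, v in combinations(votes, 2))
    d = ceil(Fraction(total, n * (n - 1)))
    p_a = {c: Fraction(sum(v.index(c) for v in votes), n) for c in votes[0]}
    # P_i contains every candidate whose average position is strictly
    # closer than d to position i.
    windows = [{c for c in votes[0] if p_a[c] - d < i < p_a[c] + d}
               for i in range(m)]
    return d, windows
```

For the votes a > b > c > d > e > f, b > a > c > d > e > f, and a > b > c > d > f > e, this gives d = 2, $P_0$ = {a, b}, $P_2$ = {a, b, c, d}, and $P_5$ = {e, f}; every $P_i$ contains at most 4d = 8 candidates.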
4.2 Basic Idea of the Algorithm
In Subsection 4.4, we will present a dynamic programming
algorithm for Kemeny Score. It exploits the fact that every
candidate can only appear in a fixed range of positions in
an optimal Kemeny consensus.³ The algorithm “generates”
a Kemeny consensus from the left to the right. It tries out
all possibilities for ordering the candidates locally and then
combines these local solutions to yield an optimal Kemeny
consensus.
More specifically, according to Lemma 2, the number of
candidates that can take a position i in an optimal Kemeny
consensus for any 0 ≤ i ≤ m−1 is at most 4d. Thus, for po-
sition i, we can test all possible candidates. Having chosen a
candidate for position i, the remaining candidates that could
also assume i must either be left or right of i in a Kemeny
consensus. Thus, we test all possible two-partitionings of
this subset of candidates and compute a “partial” Kemeny
score for every possibility. For the computation of the par-
tial Kemeny scores at position i we make use of the partial
solutions computed for the position i − 1.
4.3 Definitions for the Algorithm
To state the dynamic programming algorithm, we need
some further definitions. For i ∈ {0,...,m − 1}, let I(i)
denote the set of candidates that could be “inserted” at position
i for the first time, that is,
\[ I(i) := \{c \in C \mid c \in P_i \text{ and } c \notin P_{i-1}\}. \]
Let F(i) denote the set of candidates that must be “forgotten”
at the latest at position i, that is,
\[ F(i) := \{c \in C \mid c \notin P_i \text{ and } c \in P_{i-1}\}. \]
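Given the sets $P_i$, the sets I(i) and F(i) are plain set differences. The short Python sketch below illustrates this; the function name is hypothetical, and treating $P_{-1}$ as the empty set is an assumption made here for the boundary case i = 0, which the definitions above leave implicit.

```python
def inserted_and_forgotten(P):
    """Return ([I(0), ..., I(m-1)], [F(0), ..., F(m-1)]) for windows P.

    P is a list of sets, P[i] being the candidates that can assume
    position i. P[-1] of the paper's notation is taken to be empty.
    """
    m = len(P)
    inserted = [P[i] - (P[i - 1] if i > 0 else set()) for i in range(m)]
    forgotten = [(P[i - 1] - P[i]) if i > 0 else set() for i in range(m)]
    return inserted, forgotten
```

For example, with $P_0$ = {a, b}, $P_1$ = {a, b, c}, $P_2$ = {a, b, c, d}, $P_3$ = {c, d, e, f}, $P_4$ = {d, e, f}, and $P_5$ = {e, f}, candidates e and f are inserted at position 3, while a and b are forgotten at position 3, c at position 4, and d at position 5.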
³ In contrast, the previous dynamic programming algorithms
[3] for the parameters “maximum range of candidate
positions” and “maximum KT-distance” rely on decompos-
ing the input whereas here we rather have a decomposition
of the score into partial scores. Further, here we obtain a
much better running time by using a more involved dynamic
programming approach.
For our algorithm, it is essential to subdivide the overall
Kemeny score into partial Kemeny scores (pK). More precisely,
for a candidate c and a subset R of candidates with
c ∉ R, we set
\[ \text{pK}(c,R) := \sum_{c' \in R} \sum_{v \in V} d^R_v(c,c'), \]
where for c ∉ R and c′ ∈ R we have $d^R_v(c,c') := 0$ if in v we
have c > c′, and $d^R_v(c,c') := 1$, otherwise. Intuitively, the
partial Kemeny score denotes the score that is “induced” by
candidate c and the candidate subset R if the candidates
of R have greater positions than c in an optimal Kemeny
consensus.⁴ Then, for a Kemeny consensus $l := c_0 > c_1 > \dots > c_{m-1}$,
the overall Kemeny score can be expressed by
partial Kemeny scores as follows.
\[ \text{K-score}(V,C) = \sum_{i=0}^{m-2} \sum_{j=i+1}^{m-1} \sum_{v \in V} d_{v,l}(c_i,c_j) \quad (8) \]
\[ = \sum_{i=0}^{m-2} \sum_{c' \in R} \sum_{v \in V} d^R_v(c_i,c') \text{ for } R := \{c_j \mid i < j < m\} \quad (9) \]
\[ = \sum_{i=0}^{m-2} \text{pK}(c_i, \{c_j \mid i < j < m\}). \quad (10) \]
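The decomposition in Equalities (8) to (10) holds for the score of any preference list l, not only for a Kemeny consensus, which makes it easy to test. The Python sketch below is an illustration with hypothetical function names; it assumes votes and preference lists are sequences with the best candidate first.

```python
from itertools import combinations, permutations


def kt_dist(v, w):
    """KT-distance by direct pair counting."""
    pos = {c: i for i, c in enumerate(w)}
    return sum(1 for a, b in combinations(v, 2) if pos[a] > pos[b])


def partial_kemeny(c, R, votes):
    """pK(c, R): number of pairs (vote v, candidate r in R) such that
    v ranks r better than c, i.e. d^R_v(c, r) = 1."""
    return sum(1 for v in votes for r in R if v.index(r) < v.index(c))


def score_via_partial_scores(l, votes):
    """Right-hand side of Equality (10) for the preference list l."""
    return sum(partial_kemeny(l[i], l[i + 1:], votes)
               for i in range(len(l) - 1))


def check_decomposition(votes):
    """Compare Equality (10) with the direct score for every list l."""
    return all(score_via_partial_scores(l, votes)
               == sum(kt_dist(l, v) for v in votes)
               for l in permutations(votes[0]))
```

For instance, with the single vote a > b > c, placing c before {a, b} induces pK(c, {a, b}) = 2, whereas placing a before {b, c} induces pK(a, {b, c}) = 0.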
Next, consider the corresponding three-dimensional dynamic
programming table T. Roughly speaking, define an
entry for every position i, every candidate c that can assume
i, and every candidate subset $C' \subseteq P_i \setminus \{c\}$. The
entry stores the “minimum partial Kemeny score” over all
possible orders of the candidates of C′ under the condition
that c takes position i and all candidates of C′ take positions
smaller than i. To define the dynamic programming table
formally, we need some further notation.
Let Π(C′) denote the set of all possible orders of the candidates
in C′, where C′ ⊆ C. Further, consider a Kemeny consensus
in which every candidate of C′ has a position smaller
than every candidate in C\C′. Then, the minimum partial
Kemeny score restricted to C′ is defined as
\[ \min_{(d_1 > d_2 > \dots > d_x) \in \Pi(C')} \left\{ \sum_{s=1}^{x} \text{pK}(d_s, \{d_j \mid s < j < m\} \cup (C \setminus C')) \right\} \]
with x := |C′|. That is, it denotes the minimum partial
Kemeny score over all orders of C′. We define an entry of
the dynamic programming table T for a position i, a candidate
$c \in P_i$, and a candidate subset $P'_i \subseteq P_i$ with $c \notin P'_i$.
For this, we define $L := \bigcup_{j \le i} F(j) \cup P'_i$. Then, an entry
$T(i,c,P'_i)$ denotes the minimum partial Kemeny score restricted
to the candidates in L ∪ {c} under the assumptions
that c is at position i in a Kemeny consensus, all candidates
of L have positions smaller than i, and all other candidates
have positions greater than i. That is, for |L| = i − 1, define
\[ T(i,c,P'_i) := \min_{(d_1 > \dots > d_{i-1}) \in \Pi(L)} \sum_{s=0}^{i-1} \text{pK}(d_s, C \setminus \{d_j \mid j \le s\}) + \text{pK}(c, C \setminus (L \cup \{c\})). \]
4.4 Dynamic Programming Algorithm
⁴ By convention and somewhat counterintuitively, we say
that a candidate c has a greater position than a candidate c′
in a vote if c′ > c.
Input: An election (V,C) and, for every 0 ≤ i < m, the
set $P_i$ of candidates that can assume position i in an optimal
Kemeny consensus.
Output: The Kemeny score of (V,C).
Initialization:
01 for i = 0,...,m − 1
02   for all $c \in P_i$
03     for all $P'_i \subseteq P_i \setminus \{c\}$
04       $T(i,c,P'_i) := +\infty$
05 for all $c \in P_0$
06   $T(0,c,\emptyset) := \text{pK}(c, C \setminus \{c\})$
Update:
07 for i = 1,...,m − 1
08   for all $c \in P_i$
09     for all $P'_i \subseteq P_i \setminus \{c\}$
10       if $|P'_i \cup \bigcup_{j \le i} F(j)| = i - 1$
           and $T(i-1, c', (P'_i \cup F(i)) \setminus \{c'\})$ is defined then
11         $T(i,c,P'_i) = \min_{c' \in P'_i \cup F(i)} T(i-1, c', (P'_i \cup F(i)) \setminus \{c'\})$
             $+ \text{pK}(c, (P_i \cup \bigcup_{i<j<m} I(j)) \setminus (P'_i \cup \{c\}))$
Output:
12 K-score = $\min_{c \in P_{m-1}} T(m-1, c, P_{m-1} \setminus \{c\})$
Figure 3: Dynamic programming algorithm for Kemeny Score
The algorithm is displayed in Figure 3. It is easy to
modify the algorithm such that it outputs an optimal Kemeny
consensus: for every entry $T(i,c,P'_i)$, one additionally
has to store a candidate c′ that minimizes
$T(i-1, c', (P'_i \cup F(i)) \setminus \{c'\})$ in line 11. Then, starting with a minimum entry
for position m − 1, one reconstructs an optimal Kemeny
consensus by iteratively adding the “predecessor” candidate.
The asymptotic running time remains unchanged. Moreover,
in several applications, it is useful to compute not
just one optimal Kemeny consensus but to enumerate all of
them. At the expense of an increased running time, which
clearly depends on the number of possible optimal consensus
rankings, our algorithm can be extended to provide such an
enumeration by storing all possible predecessor candidates.
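The Python sketch below is a simplified relative of the algorithm in Figure 3, not a transcription of it. Instead of the three-dimensional table T, it keeps, for every position i, a dictionary keyed by the set of candidates already placed at smaller positions (encoded as a bitmask). It shares two ingredients with Figure 3: only candidates from $P_i$ are tried at position i, and the score is accumulated through partial Kemeny scores pK. The function name, the bitmask encoding, and the layer dictionaries are assumptions of this illustration; it also assumes at least two votes that are not all identical, so that d ≥ 1, and it makes no claim to match the running time bound of Theorem 1.

```python
from fractions import Fraction
from itertools import combinations
from math import ceil, inf


def kemeny_score_dp(votes):
    """Kemeny score via a left-to-right subset DP restricted to the P_i."""
    candidates = list(votes[0])
    n, m = len(votes), len(candidates)
    pos = [{c: i for i, c in enumerate(v)} for v in votes]

    # beats[r][c] = number of votes that rank r better than c
    beats = {r: {c: sum(1 for p in pos if p[r] < p[c]) for c in candidates}
             for r in candidates}

    # d = ceil(d_a), with d_a taken from the pairwise election outcomes
    total = sum(2 * beats[a][b] * beats[b][a]
                for a, b in combinations(candidates, 2))
    d = ceil(Fraction(total, n * (n - 1)))
    p_a = {c: Fraction(sum(p[c] for p in pos), n) for c in candidates}
    windows = [[c for c in candidates if p_a[c] - d < i < p_a[c] + d]
               for i in range(m)]

    bit = {c: 1 << k for k, c in enumerate(candidates)}
    layer = {0: 0}  # placed candidates (bitmask) -> minimum partial score
    for i in range(m):
        next_layer = {}
        for placed, value in layer.items():
            for c in windows[i]:
                if placed & bit[c]:
                    continue  # c already has a smaller position
                # All candidates that are neither placed nor c come later.
                later = [r for r in candidates
                         if r != c and not placed & bit[r]]
                cost = value + sum(beats[r][c] for r in later)  # pK(c, later)
                key = placed | bit[c]
                if cost < next_layer.get(key, inf):
                    next_layer[key] = cost
        layer = next_layer
    # None is returned if no full ranking respects the windows (e.g. d = 0).
    return layer.get((1 << m) - 1)
```

By Lemma 1, every optimal Kemeny consensus places only candidates from $P_i$ at position i, so under the stated assumptions the returned value is the Kemeny score. For the votes a > b > c > d > e > f, b > a > c > d > e > f, and a > b > c > d > f > e the sketch returns 2.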
Lemma 3. The algorithm in Figure 3 correctly computes
Kemeny Score.
Proof. For the correctness, we have to show two points:
First, all table entries are well-defined, that is, for an entry
$T(i,c,P'_i)$ concerning position i there must be exactly
i − 1 candidates that have positions smaller than i. This
condition is assured by line 10 of the algorithm.⁵
Second, we must ensure that our algorithm finds an optimal
solution. Due to Equality (10), we know that the Kemeny
score can be decomposed into partial Kemeny scores.
Thus, it remains to show that the algorithm considers a decomposition
that leads to an optimal solution. For every
position, the algorithm tries all candidates in $P_i$. According
to Lemma 1, one of these candidates must be the “correct”
candidate c for this position. Further, for c we can observe
that the algorithm tries a sufficient number of possibilities
to partition all remaining candidates C\{c} such that they
have either smaller or greater positions than i. More precisely,
every candidate from C\{c} must be in exactly one
of the following three subsets:
1. The set F of candidates that have already been forgotten,
that is, $F := \bigcup_{0 \le j \le i} F(j)$.
2. The set of candidates that can assume position i, that
is, $P_i \setminus \{c\}$.
3. The set I of candidates that are not inserted yet, that
is, $I := \bigcup_{i < j < m} I(j)$.
Due to Lemma 1 and the definition of F(j), we know that
a candidate from F cannot take a position greater than i − 1
in an optimal Kemeny consensus. Thus, it is sufficient to explore
only those partitions in which the candidates from F
have positions smaller than i. Analogously, one can argue
that for all candidates in I, it is sufficient to consider partitions
in which they have positions greater than i. Thus, it
remains to try all possibilities for partitioning the candidates
from $P_i$. This is done in line 09 of the algorithm. Thus, the
algorithm returns an optimal Kemeny score.
⁵ It can still happen that a candidate takes a position outside
of the required range around its average position. Since
such an entry cannot lead to an optimal solution according
to Lemma 1, this does not affect the correctness of the algorithm.
To improve the running time it would be convenient
to “cut away” such possibilities. We leave considerations in
this direction to future work.
Theorem 1. Kemeny Score can be solved in
$O(16^d \cdot (d^2 \cdot m + d \cdot m^2 \log m \cdot n) + n^2 \cdot m \log m)$ time with average
KT-distance $d_a$ and $d = \lceil d_a \rceil$. The size of the dynamic
programming table is $O(16^d \cdot d \cdot m)$.
Proof. The dynamic programming procedure requires
the set of candidates $P_i$ for 0 ≤ i < m as input. To determine
$P_i$ for all 0 ≤ i < m, one needs the average positions
of all candidates and the average KT-distance $d_a$ of (V,C).
To determine $d_a$, compute the pairwise distances of all pairs
of votes. As there are $O(n^2)$ pairs and the pairwise KT-distance
can be computed in $O(m \log m)$ time [20], this takes
$O(n^2 \cdot m \log m)$ time. The average positions of all candidates
can be computed in $O(n \cdot m)$ time by iterating once
over every vote and adding the position of every candidate
to a counter variable for this candidate. Thus, the input
for the dynamic programming algorithm can be computed
in $O(n^2 \cdot m \log m)$ time.
Concerning the dynamic programming algorithm itself,
due to Lemma 2, for 0 ≤ i < m, the size of $P_i$ is upper-bounded
by 4d. Then, for the initialization as well as for
the update, the algorithm iterates over m positions, 4d candidates,
and $2^{4d}$ subsets of candidates. Whereas the initialization
in the innermost instruction (line 04) can be done in
constant time, in every innermost instruction of the update
phase (line 11) one has to look for a minimum entry and one
has to compute a pK-score. To find the minimum, one has to
consider all candidates from $P'_i \cup F(i)$. As $P'_i \cup F(i)$ is a subset
of $P_{i-1}$, it can contain at most 4d candidates. Further,
the required pK-score can be computed in $O(n \cdot m \log m)$
time. Thus, for the dynamic programming we arrive at
the running time of $O(m \cdot 4d \cdot 2^{4d} \cdot (4d + n \cdot m \log m)) = O(16^d \cdot (d^2 \cdot m + d \cdot m^2 \log m \cdot n))$.
Concerning the size of the dynamic programming table,
there are m positions and any position can be assumed by
at most 4d candidates. The number of considered subsets is
bounded from above by $2^{4d}$. Hence, the size of the table T
is $O(16^d \cdot d \cdot m)$.
Finally, let us discuss the differences between the dynamic
programming algorithm used for the “maximum pairwise
KT-distance” in [3] and the algorithm presented in this work.
In [3], the dynamic programming table stored all possible
orders of the candidates of a given subset of candidates. In
this work, we eliminate the need to store all orders by using
the decomposition of the Kemeny score into partial Kemeny
scores. This allows us to restrict the considerations for a
position to a candidate and its order relative to all other
candidates.
5. SMALL CANDIDATE RANGE
In this section, we consider two further parameterizations,
namely “maximum range” and “average range” of candidates.
As exhibited in Section 3, the range parameters in general
are “orthogonal” to the distance parameterizations dealt
with in Section 4. Whereas for the parameter “maximum
range” we can obtain fixed-parameter tractability by using
the dynamic programming algorithm given in Figure 3, the
Kemeny Score problem becomes NP-complete already in
case of an average range of two.
5.1 Parameter Maximum Range
In the following, we show how to bound the number of
candidates that can assume a position in an optimal Kemeny
consensus by a function of the maximum range. This enables
the application of the algorithm from Figure 3.
Lemma 4. Let r_max be the maximum range of an election
(V,C). Then, for every candidate its relative order in
an optimal consensus with respect to all but at most 3r_max
candidates can be computed in O(n · m^2) time.
Proof. We use an observation that follows directly from
the Extended Condorcet criterion [22]: If for two candidates
b,c ∈ C we have v(b) > v(c) for all v ∈ V, then in
every Kemeny consensus l it holds that l(b) > l(c). Thus, it
follows that for b,c ∈ C with max_{v∈V} v(b) < min_{v∈V} v(c),
in an optimal Kemeny consensus l we have l(b) < l(c). That
is, for two candidates with “non-overlapping range” their
relative order in an optimal Kemeny consensus can be
determined using this observation. Clearly, all these candidate
pairs can be computed in O(n · m^2) time.
Next, we show that for every candidate c there are at most
3r_max candidates whose range overlaps with the range of c.
The proof is by contradiction. Let the range of c go from
position i to j, with i < j. Further, assume that there is
a subset of candidates S ⊆ C with |S| ≥ 3r_max + 1 such
that for every candidate s ∈ S there is a vote v ∈ V with
i ≤ v(s) ≤ j. Now, consider an arbitrary input vote v ∈ V.
Since there are at most 3r_max positions p with i − r_max ≤
p ≤ j + r_max, for at least one candidate s ∈ S it must hold that
v(s) < i − r_max or v(s) > j + r_max. Thus, the range of s is
greater than r_max, a contradiction. Hence, there can be at
most 3r_max candidates that have a position in the range of c
in a vote v ∈ V. As described above, for all other candidates
we can compute the relative order in O(n · m^2) time. Hence,
the lemma follows.
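The two steps of this proof can be illustrated with a small sketch. It is a sketch under stated assumptions rather than the authors' procedure: votes are lists of candidate names with 0-based positions (best first), the range of a candidate is taken to be its highest position minus its lowest position plus one, and the function names are hypothetical.

```python
def candidate_spans(votes):
    """Map every candidate to (lowest position, highest position) over all votes."""
    lo, hi = {}, {}
    for v in votes:
        for p, c in enumerate(v):
            lo[c] = min(lo.get(c, p), p)
            hi[c] = max(hi.get(c, p), p)
    return {c: (lo[c], hi[c]) for c in lo}


def fixed_pairs_and_overlaps(votes):
    """Split candidate pairs into fixed-order pairs and overlapping pairs.

    Returns (r_max, fixed, overlaps) where fixed contains (b, c) whenever b is
    placed before c in every vote because their ranges do not overlap, and
    overlaps[b] is the set of candidates whose range overlaps that of b.
    """
    spans = candidate_spans(votes)
    # Assumed range definition: highest position - lowest position + 1.
    r_max = max(high - low + 1 for low, high in spans.values())
    fixed = set()
    overlaps = {c: set() for c in spans}
    for b in spans:
        for c in spans:
            if b == c:
                continue
            if spans[b][1] < spans[c][0]:
                fixed.add((b, c))       # b always precedes c
            elif not spans[c][1] < spans[b][0]:
                overlaps[b].add(c)      # ranges overlap, order is undetermined
    return r_max, fixed, overlaps
```

On the votes a > b > c > d, b > a > c > d and a > b > d > c this yields r_max = 2, fixes a and b before both c and d, and leaves only the pairs {a, b} and {c, d} undetermined, well within the 3r_max bound of the lemma.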
As a direct consequence of Lemma 4, we conclude that
every candidate can assume one of at most 3r_max consecutive
positions in an optimal Kemeny consensus. Recall that for
a position i the set of candidates that can assume i in an
optimal consensus is denoted by P_i (see Definition 1). Then,
using the same argument as in Lemma 2, one obtains the
following.
Lemma 5. For every position i, |P_i| ≤ 6r_max.
In complete analogy to Theorem 1, one arrives at the
following.
Theorem 2. Kemeny Score can be solved in O(32^{r_max} ·
(r_max^2 · m + r_max · m^2 log m · n) + n^2 · m log m) time with
maximum range r_max. The size of the dynamic programming
table is O(32^{r_max} · r_max · m).
5.2 Parameter Average Range
Theorem 3. Kemeny Score is NP-complete for elec-
tions with average range two.
Proof. The proof uses a reduction from an arbitrary
instance ((V,C),k) of Kemeny Score to a Kemeny Score-instance
((V′,C′),k) with average range less than two. The
construction of the election (V′,C′) is given in the following.
To this end, let a_i, 1 ≤ i ≤ |C|^2, be new candidates not
occurring in C.
• C′ := C ∪ {a_i | 1 ≤ i ≤ |C|^2}.
• For every vote v = c_1 > c_2 > ··· > c_m in V, put the
vote v′ := c_1 > c_2 > ··· > c_m > a_1 > a_2 > ··· > a_{m^2}
into V′.
It follows from the extended Condorcet criterion [22] that
if a pair of candidates has the same order in all votes, it must
have this order in a Kemeny consensus as well. Thus, in a
Kemeny consensus it holds that c > a_i for every c ∈ C and
a_i > a_j for i < j and, therefore, adding the candidates from
C′\C does not increase the Kemeny score. Hence, an optimal
Kemeny consensus with score k for (V′,C′) can be transformed
into an optimal Kemeny consensus with score k for (V,C) by
deleting the candidates of C′\C. The average range of (V′,C′)
is bounded as follows:
\[
r_a = \frac{1}{m + m^2} \cdot \sum_{c \in C'} r(c)
    = \frac{1}{m + m^2} \cdot \Bigl( \sum_{c \in C} r(c) + \sum_{c \in C' \setminus C} r(c) \Bigr)
    \le \frac{1}{m + m^2} \cdot (m^2 + m^2) < 2.
\]
Clearly, the reduction can be easily modified to work for
every constant value of at least two by choosing a C′ of
appropriate size.
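The padding construction from this proof can be tried out with the following sketch. It rests on assumptions that are not part of the original: votes are lists of candidate names with 0-based positions, the range of a candidate is its highest position minus its lowest position plus one, the dummy names "a1", "a2", ... are assumed not to clash with existing candidates, and the function names are hypothetical.

```python
def pad_election(votes):
    """Append m^2 dummy candidates, in the same order, to the end of every vote."""
    m = len(votes[0])
    # Assumed fresh names; they must not occur among the original candidates.
    dummies = [f"a{i}" for i in range(1, m * m + 1)]
    return [list(v) + dummies for v in votes]


def average_range(votes):
    """Average over all candidates of (highest position - lowest position + 1)."""
    lo, hi = {}, {}
    for v in votes:
        for p, c in enumerate(v):
            lo[c] = min(lo.get(c, p), p)
            hi[c] = max(hi.get(c, p), p)
    return sum(hi[c] - lo[c] + 1 for c in lo) / len(lo)
```

For the two votes x > y > z and z > y > x the average range is 7/3. After padding with nine dummy candidates of range one each, it drops to 16/12, which is below two as the bound in the proof predicts.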
6. CONCLUSION
Compared to earlier work [3], we significantly improved
the running time for the natural parameterization “maximum
KT-distance” for the Kemeny Score problem. There
have been some experimental studies [11, 8] that hinted that
the Kemeny problem is easier when the votes are close to a
consensus and, thus, tend to have a small average distance.
Our results for the average distance parameterization can
also be regarded as a theoretical explanation with provable
guarantees for this behavior. Moreover, we provided
fixed-parameter tractability in terms of the parameter “maximum
range of positions”, whereas this is excluded for the
parameter “average range of positions” unless P=NP. These results
are of particular interest because we indicated in Section 3
that the parameters “position range” and “pairwise distance”
are independent of each other.
As challenges for future work, we envisage the following:
• Extend our findings to the Kemeny Score problem
with input votes that may have ties or that may be
incomplete (also see [3]).
• Improve the running time as well as the memory
consumption (which is exponential in the parameter); we
believe that significant improvements are still possible.
• Implement the algorithms, perhaps including heuris-
tic improvements of the running times, and perform
experimental studies.
• Investigate typical values of the average KT-distance
and the maximum candidate range, either under some
distributional assumption or for real-world data.
Finally, we want to advocate parameterized algorithmics [13,
17, 21] as a very helpful tool for better understanding and
exploiting the numerous natural parameters occurring in
voting scenarios with associated NP-hard combinatorial
problems. Only a few investigations in this direction have been
performed so far, see, for instance, [4, 5, 6, 16].
7. ACKNOWLEDGEMENTS
We are grateful to an anonymous referee of COMSOC
2008 for constructive feedback. This work was supported
by the DFG, research project DARE, GU 1023/1, Emmy
Noether research group PIAF, NI 369/4, and project PALG,
NI 369/8 (Nadja Betzler and Jiong Guo). Michael R. Fellows
and Frances A. Rosamond were supported by the Australian
Research Council. This work was done while Michael Fel-
lows stayed in Jena as a recipient of the Humboldt Research
Award of the Alexander von Humboldt foundation, Bonn,
Germany.
8. ADDITIONAL AUTHORS
9. REFERENCES
[1] N. Ailon, M. Charikar, and A. Newman. Aggregating
inconsistent information: ranking and clustering.
Journal of the ACM, 55(5), 2008. Article 23 (October
2008).
[2] J. Bartholdi III, C. A. Tovey, and M. A. Trick. Voting
schemes for which it can be difficult to tell who won
the election. Social Choice and Welfare, 6:157–165,
1989.
[3] N. Betzler, M. R. Fellows, J. Guo, R. Niedermeier,
and F. A. Rosamond. Fixed-parameter algorithms for
Kemeny scores. In Proc. of 4th AAIM, volume 5034 of
LNCS, pages 60–71. Springer, 2008.
[4] N. Betzler, J. Guo, and R. Niedermeier.
Parameterized computational complexity of Dodgson
and Young elections. In Proc. of 11th SWAT, volume
5124 of LNCS, pages 402–413. Springer, 2008.
[5] N. Betzler and J. Uhlmann. Parameterized complexity
of candidate control in elections and related digraph
problems. In Proc. of 2nd COCOA ’08, volume 5165
of LNCS, pages 43–53. Springer, 2008.
[6] R. Christian, M. R. Fellows, F. A. Rosamond, and
A. Slinko. On complexity of lobbying in multiple
referenda. Review of Economic Design, 11(3):217–224,
2007.
[7] V. Conitzer. Computing Slater rankings using
similarities among candidates. In Proc. 21st AAAI,
pages 613–619. AAAI Press, 2006.
[8] V. Conitzer, A. Davenport, and J. Kalagnanam.
Improved bounds for computing Kemeny rankings. In
Proc. 21st AAAI, pages 620–626. AAAI Press, 2006.
[9] V. Conitzer, M. Rognlie, and L. Xia. Preference
functions that score rankings and maximum likelihood
estimation. In Proc. of 2nd COMSOC, pages 181–192,
2008.
[10] V. Conitzer and T. Sandholm. Common voting rules
as maximum likelihood estimators. In Proc. of 21st
UAI, pages 145–152. AUAI Press, 2005.
[11] A. Davenport and J. Kalagnanam. A computational
study of the Kemeny rule for preference aggregation.
In Proc. 19th AAAI, pages 697–702. AAAI Press,
2004.
[12] M. J. A. N. de Caritat (Marquis de Condorcet). Essai
sur l’application de l’analyse à la probabilité des
décisions rendues à la pluralité des voix. Paris:
L’Imprimerie Royale, 1785.
[13] R. G. Downey and M. R. Fellows. Parameterized
Complexity. Springer, 1999.
[14] C. Dwork, R. Kumar, M. Naor, and D. Sivakumar.
Rank aggregation methods for the Web. In Proc. of
10th WWW, pages 613–622, 2001.
[15] C. Dwork, R. Kumar, M. Naor, and D. Sivakumar.
Rank aggregation revisited, 2001. Manuscript.
[16] P. Faliszewski, E. Hemaspaandra, L. A.
Hemaspaandra, and J. Rothe. Llull and Copeland
voting broadly resist bribery and control. In Proc. of
22nd AAAI, pages 724–730. AAAI Press, 2007.
[17] J. Flum and M. Grohe. Parameterized Complexity
Theory. Springer, 2006.
[18] E. Hemaspaandra, H. Spakowski, and J. Vogel. The
complexity of Kemeny elections. Theoretical Computer
Science, 349:382–391, 2005.
[19] C. Kenyon-Mathieu and W. Schudy. How to rank with
few errors. In Proc. 39th STOC, pages 95–103. ACM,
2007.
[20] J. Kleinberg and E. Tardos. Algorithm Design.
Addison Wesley, 2006.
[21] R. Niedermeier. Invitation to Fixed-Parameter
Algorithms. Oxford University Press, 2006.
[22] M. Truchon. An extension of the Condorcet criterion
and Kemeny orders. Technical report, cahier 98-15 du
Centre de Recherche en Économie et Finance
Appliquées, Université Laval, Québec, Canada, 1998.
[23] A. van Zuylen and D. P. Williamson. Deterministic
algorithms for rank aggregation and other ranking and
clustering problems. In Proc. 5th WAOA, volume 4927
of LNCS, pages 260–273. Springer, 2007.