Evaluating aggressive behavior in dogs: a comparison of 3 tests
Dr. Maya Bräm, Dr. Marcus G. Doherr, Dr. Doris Lehmann, Prof. Daniel Mills, Prof. Andreas Steiger
Animal Behaviour and Welfare Group, Department of Biological Sciences, University of Lincoln, United Kingdom;
Department of Clinical Veterinary Science, Animal Neurology, University of Berne, Berne, Switzerland; and
Division of Animal Housing and Welfare, Vetsuisse Faculty, University of Berne, Berne, Switzerland.
Abstract This study assessed the consistency with which aggressive behavior occurred across 3 different
provocation tests that are currently used in practice to evaluate the behavior and safety of dogs. The
aim of this study was not to validate the tests, but to evaluate tests that are not validated but are
nevertheless being used in a legal context in Switzerland, by investigating the hypothesis that 3 different
approaches, all claiming to correctly evaluate the behavior of dogs, should be expected to show significant
agreement. The same 60 dogs were tested in 3 behavioral tests being used in Switzerland at the
time of this study in the year 2003 (Test A: Test of the American Staffordshire Terrier Club; Test B:
Halterprüfung; Test C: Test of the Canton of Basel-Stadt). ‘‘Intraspecific behavior’’ and ‘‘interspecific
behavior toward humans’’ that might relate to potential aggressive behavior were of particular interest.
The observed agreement among the 3 tests was compared relative to chance using a κ test. Significant
but low levels of agreement were found among the 3 tests for the criterion ‘‘intraspecific behavior’’
(κ = 0.133, P = .014), with the highest correlation between Tests A and B (κ = 0.345, P < .001),
and for the criterion ‘‘interspecific behavior’’ (κ = 0.135, P = .014), with Tests A and B (κ = 0.220,
P = .005) showing the highest correlation. However, even the significant values of κ were low in
absolute terms in all cases. In a further analysis, dogs evaluated to show no signs of potential aggression
in the test situations by all 3 tests were eliminated, and the results of the remaining dogs
(‘‘interspecific behavior,’’ n = 23; ‘‘intraspecific behavior,’’ n = 29) were assessed for disagreement
in pairwise combinations using a McNemar chi-square test. No significant levels of disagreement were
found for ‘‘intraspecific behavior’’; however, for ‘‘interspecific behavior,’’ Tests A and B (P = .035)
and Tests B and C (P < .001) differed significantly, with no significant difference between Tests A
and C (P = .11). The inconsistency of the results from different tests suggests test bias at the very
least and questions the validity of these tests. Further work examining the validity of each individual
test is warranted if they are to be used in a legal context.
© 2008 Elsevier Inc. All rights reserved.
Journal of Veterinary Behavior (2008) 3, 152-160 (Vol 3, No 4, July/August 2008)
Address reprint requests and correspondence to: Dr. Maya Bräm, Animal
Behaviour and Welfare Group, Department of Biological Sciences,
University of Lincoln, Riseholme Park, Lincoln, LN2 2LG, UK, Tel: +44
1522 895 481.
Aggression is one of the most common reasons for
the referral of dogs to behavior specialists (Blackshaw,
1988; Beaver, 1994; Askew, 1996; Stafford, 1996;
Landsberg et al., 1997; Overall, 1997) and for dogs to be
surrendered to shelters (Christensen et al., 2007). Reports
and articles in the popular press about dangerous dogs
and dog attacks inevitably give rise to public concern,
and such media reports can lead to political responses
aimed at controlling the problem owing to perceived public
pressure for such action. These decisions may not be based
on scientific knowledge. In Great Britain, the Dangerous
Dogs Act of 1991 was enacted in response to such reports
and was amended in 1997 because of serious flaws (Klaassen
et al., 1996) that resulted from its unusually quick passage
through parliament. In Germany, controversial lists of
‘‘dangerous’’ dog breeds have been established, and several
behavioral tests have been developed, dog taxes raised, and
new laws passed. A similar situation exists in France. Swiss
legislators have also acted on this problem in response to
public demands. In Switzerland, the Cantons of Basel-Stadt
and Basel-Land have introduced a breed list and have made
an unvalidated test obligatory for several breeds. Other
cantons are developing other solutions, such as evaluations
by individual behavior experts. However, the same question
is raised in all these countries and cantons: who is to
decide which dogs are dangerous, and how?
The current rather ad hoc situation, with a plethora of
approaches and possible tests administered by a wide
variety of different kinds of ‘‘experts,’’ inspired us to look
more closely at the methods used to assess ‘‘aggressive
behavior.’’ Since 2000, several studies have been run in Germany
focusing on the behavior tests that are partially compulsory
for certain breeds in certain ‘‘Bundesländer’’ (federal
states) (Mittmann, 2002; Bruns, 2003). Mittmann (2002)
studied the occurrence of aggressive behaviors shown by
14 listed breeds in the test of Niedersachsen (Niedersächsisches
Ministerium für Ernährung, Landwirtschaft und
Forsten, 2001). She states that the behavior test
is useful in triggering and evaluating aggressive behaviors
in dogs, but she found that only 5% of the dogs belonging to
the 2 listed categories in Germany showed ‘‘inadequate
aggressive behaviors’’ in this test. No significant difference
was found between the 2 categories of listed dogs, and
hence there was no basis to describe these breeds as
‘‘more dangerous’’ or to prioritize them for testing. Bruns’s
(2003) results show that the probability of a dog reacting
aggressively in everyday situations is strongly influenced
by its handler and the handler’s knowledge about dogs.
At the time of this study, only 1 test was scientifically
validated to test for aggressive behavior in dogs (Netto and
Planta, 1997), but it had not been widely adopted. Three
other methods currently used to evaluate dog behavior in
Switzerland were chosen for evaluation in this study.
Because none of these 3 methods had been scientifically
validated, the use of such tests to inform legal opinion is
a cause for concern. Given the use of these tests in a legal
context, it is important that the information they provide is
at least reliable, and so it was hypothesized that there
should be substantial agreement between them. Therefore,
the aim of this study was to assess the intertest reliability
of these procedures.
The intertest reliability was studied by having the same
60 dogs assessed for the 2 criteria ‘‘interspecific behavior’’
and ‘‘intraspecific behavior’’ by all 3 methods and by
comparing the evaluations. The aim of this study was not to
validate the tests, but to evaluate whether these methods
that all claim to accurately evaluate dog behavior come to
the same conclusion and to quantify their level of agree-
ment or disagreement.
Material and methods
Animals and persons
Dogs were recruited through veterinarians, dog clubs,
and animal behavior therapists in Switzerland and num-
bered according to their sequence of application. A total of
60 dogs belonging to 51 owners were tested in the 3 tests;
42 of the owners participated with 1 dog, and 9 with 2 dogs.
Because 69 dogs were enrolled but some owners withdrew,
the identity numbers of the dogs go beyond 60.
Of the 60 dogs that participated in this study, 13
belonged to the FCI (Fédération Cynologique Internationale)
group 8 (retrievers, flushing dogs, and water dogs), 19
to the FCI group 1 (sheepdogs and cattle dogs), 11 were
mixed breeds, and 17 belonged to other pure breeds of the
other FCI groups. Twenty-six were males (11 neutered),
and 34 were females (24 spayed). The ages ranged from 1.5
years to 11 years of age, with a mean age of 4.6 years and a
median age of 3.5 years.
Although 2 of the 3 tests were originally designed for
particular dog breeds belonging to the ‘‘potentially dangerous
dog breeds’’ listed in some cantons of Switzerland, it was
decided not to limit the test to any particular breeds. Svartberg
(2006) did find differences in behavior traits among breeds; however,
he states that these are independent of the original historical
function of the breed and are more likely to be related to
the current use of breeding stock. Not limiting the testing to
particular breeds additionally offered the opportunity of testing
a larger number of dogs more readily. As the focus of
this study was a comparison of the evaluations of responses
in the 3 tests, every dog was its own control, and so it is argued
that the participating breeds are not of primary importance.
The 3 tests
Three tests were chosen that had similar aims, but were
somewhat different in structure. The 3 tests showed the
following similarities: (1) they all had the objective of
identifying ‘‘potentially dangerous dogs’’; (2) they all
consider ‘‘aggressive behavior’’ as an undesired behavior
in particular situations, because it is inappropriate or
unacceptable in our society and might pose some danger
to other individuals. By contrast, in some tests, such as the
test for German shepherd dogs in Switzerland, a certain
degree of aggressive behavior is desired (Fuchs et al.,
2005); (3) they were all developed by people familiar and
experienced with canine behavior; and (4) they were and
are currently (2007) used in Switzerland. The 3 tests are
structured differently, for example, concerning test location
(open field, fenced area, kennels), duration of the test, and
degree of security for the participants.
Test A: Test of the American Staffordshire
Terrier Club (ASTC) of Switzerland
This test was designed to recognize potentially danger-
ous American Staffordshire terriers or dog–owner teams,
namely, those dogs showing ‘‘undesirable aggressive be-
havior’’ and inappropriate owner behavior. In the regula-
tions of this test, the following characteristics are
mentioned as ‘‘undesirable’’ for the dogs: aggression,
disobedience, fearfulness, and low or too low threshold of
reaction leading to slow or no recovery at all after stress.
The following behaviors are considered as ‘‘undesirable’’
for the owner: uncontrolled and insecure behavior toward
his/her dog, and violation of the animal protection laws.
There are 2 parts to this test; the first is located within a
fenced area where the dog is mostly off-leash, and the
second is on a quiet road bordering the fenced area. The
owners may motivate their dogs as they usually do, that is,
using voice, treats, or toys. In the fenced area:
1. The dog is let free to wander around and investigate the
fenced area (for 1 to 2 minutes), then the owner calls the
dog back.
2. The owner motivates the dog to play (with a toy). Dog
and owner play for about half a minute, then the dog
should stop playing when the owner gives the command.
On command of the expert, the owner resumes playing
with the dog.
3. The owner and the dog (off leash) walk through a group
of 6 to 9 people moving around.
4. The group of people forms a circle around the owner and
dog. The circle of people first closes, then opens around
the dog–owner team, walking slowly the first time, and
running the second time.
5. The owner walks the dog back and forth on the leash. After
walking in one direction, the owner asks the dog to sit,
then continues to walk with the dog. The owner walks
the dog (on the leash) back and forth a second time. On
the way back, the expert walks toward the dog and owner
and first greets the owner by handshake, then greets the dog.
6. The owner walks the dog to the end of the area and puts the
dog into a down position or asks it to sit and gives it the
command to stay. The owner walks about 20 steps away
from the dog. The expert walks past and around the wait-
ing dog several times. The owner goes and gets the dog.
On the road (dog on leash):
7. The owner and dog are passed by a cyclist, then a jogger
from behind and then from the front.
8. The owner and dog pass a ‘‘stimulus’’ dog (9 dogs
shared this job during the test) with a handler standing
to the left. The dog and owner being evaluated turn
around and walk back the way they came, thereby passing
the ‘‘stimulus’’ dog and handler on their right.
9. The dog is tied to a pole and the owner disappears out of
sight of the dog. The expert walks past the dog noisily
several times.
Test A lasted about 15 minutes in this study.
Test B: The ‘‘Halterprüfung,’’ Switzerland
The ‘‘Halterprüfung’’ (‘‘dog handler test’’) is not an
obligatory test for any dog breed or population category.
The primary aim of this test is to find out if the owner has
his or her dog under control in everyday situations
(‘‘guideability’’). The second goal is to be able to assess
the basic character/nature of the dog and consequently to be
able to decide whether the animal is a risk to its surroundings.
The test is divided into 2 parts: Part 1, in which the
present state of the dog is determined before it can pass into
the various test conditions, and Part 2, in which the dog is off
lead at all times. Within the aspect of ‘‘guideability’’ the
following points are evaluated: (1) general ‘‘guideability’’;
(2) ‘‘guideability’’ and behavior under distraction; (3)
‘‘guideability’’ and behavior in everyday situations; and
(4) ‘‘guideability’’ and behavior in the presence of conspecifics.
Within the aspect of character/nature, the following
points are evaluated: (1) the dog’s reaction to stimuli from
the environment (noise, fast movements, sudden influences,
objects); (2) the dog’s fearfulness; (3) ‘‘nerve stability’’ in
everyday situations; (4) the type of risk the dog poses to its
environment; and (5) remedial measures and recommenda-
tions. The procedure emphasizes obedience and how well
the handler has his or her dog under control, in normal
situations and especially in more stressful situations.
Part 1 involves:
1. Walking on-lead
2. Walking off-lead
3. ‘‘Down’ out of a movement and ‘down’ with
4. Being called back out of a game
Part 2 of the test took place on an open field and on a
path through the woods. The dog is off-lead during the
whole test and is put into the following situations:
1. The dog is confronted with unknown animals (chicken
in a pen, 2 goats tied to a tree trunk).
2. The dog encounters a solitary person riding a bike and
1 person jogging.
3. The dog passes through an active group of people in a
stressful context (ie, several people forming a circle, in
the center of which stand the owner and the dog, the
people in the circle all jump up into the air at the
same time and shout when landing). To relax the situa-
tion, they crouch down afterwards and initiate friendly
physical contact with the dog.
4. The dog is confronted with everyday situations such as a
noisy group of people (shouting, singing, shaking tin
cans filled with pebbles, etc.) that is walking toward
the dog and its owner; the dog must follow its owner
through this group. The group lines up to form a passage
for the dog and owner and when they pass through, the
persons drop their noisy objects (bottles filled with peb-
bles, tin cans).
5. The dog is confronted with conspecifics: the dog and its
owner must walk through a group of people playing with
their dogs.
This test lasted around 10 minutes in this study.
Test C: The Test of the Canton of Basel-Stadt, Switzerland
The aims of this test are:
1. To recognize potentially dangerous dogs of 6 nominally
‘‘dangerous dog breeds’’ and dogs of other breeds that
have shown dangerous behavior. This test has the greatest
legal force in the canton of Basel-Stadt, Switzerland.
2. To protect humans and other animals from dog
3. To protect the dog itself from being kept in conditions
not appropriate for its species (Kantonales Veterinäramt
Basel-Stadt, 2001).
Behaviors of both the dog and of the owner are taken into
account in all situations. The following dog behaviors are
noted: normal; aggressive; threatening; fearful; biting with
or without threatening; the dog leading its owner; aggressive
behavior toward the owner; and good, acceptable, or bad
obedience. Concerning the owner, the following are noted:
no reaction to the dog’s behavior; correction of the dog
(verbally or physically); fearful, relaxed, insecure, or
dominant behavior; the owner has to get the dog because
it does not react to the ‘‘come’’ command; and no random or
fearful reaction to or any correction of the dog showing
aggressive behavior toward its owner. Great emphasis is put
on the security of the experts and dogs present by minimiz-
ing possible contact between dog and experts. The experts
and the stimulus dog are both protected by a kennel, and the
owner and dog that are being tested pass by outside these
kennels. The test consists of the following situations:
1. First contact of expert and dog–owner team, separated
by a gate. The expert goes into a kennel and asks the
team to enter the fenced-in area.
2. Owner and dog on the leash walk past the kennels as if
on a normal walk.
3. Owner lets his or her dog off the leash, and both walk
back and forth past the kennels again.
4. A stimulus dog (3 intact males shared this job in this
study) is let out into the second kennel; owner and
dog walk past the kennels again.
5. The expert leaves the kennel and reads the microchip on
the dog’s left shoulder.
6. The owner puts his or her dog on the leash again, and
they walk through a gate and a narrow passageway.
The duration of this test during this study was about
5 minutes per dog.
Time schedule
One test was run per dog per day, with several dogs
assessed each day. The tests were run in a time period of
2 weeks in the month of March 2002 on dog training
grounds in Düdingen, FR, Switzerland, and on the campus
of the School of Veterinary Medicine, Berne, Switzerland.
The dogs were organized into 6 groups of 10 dogs each. To
control for any possible influence of learning on the dogs’
and handlers’ behaviors, the dogs went through the tests in
different sequences according to a Latin square design. The
owners were not informed about the test results until after
they had taken the last test, and the assessors were asked
not to give the owners any information on how they had
performed during the test, so as to avoid influencing the
owners’ behavior in subsequent tests.
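The counterbalancing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not report which test order each group received, so the cyclic 3 x 3 Latin square, the cycling of its rows over the 6 groups, and the function names `cyclic_latin_square` and `assign_orders` are all hypothetical.

```python
from collections import Counter

TESTS = ["A", "B", "C"]

def cyclic_latin_square(items):
    """Return an n x n Latin square: row i is the item list rotated by i."""
    n = len(items)
    return [[items[(i + j) % n] for j in range(n)] for i in range(n)]

def assign_orders(n_groups, items):
    """Cycle through the rows of the square so each group gets one test order."""
    square = cyclic_latin_square(items)
    return {g + 1: square[g % len(square)] for g in range(n_groups)}

# Hypothetical assignment of test orders to the 6 groups of 10 dogs.
orders = assign_orders(6, TESTS)
for group, order in orders.items():
    print(f"group {group}: {' -> '.join(order)}")

# Check the balance: each test should appear equally often in each position.
for position in range(len(TESTS)):
    print(position + 1, Counter(order[position] for order in orders.values()))
```

With 3 tests and 6 groups, each row of the square is used twice, so every test is taken first, second, and third by the same number of groups. That balance is what protects the comparison against order and learning effects.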
The 3 participating tests had different ways of evaluating
(ie, different test situations) and scoring (ie, numbering,
pass vs non-pass, etc.) to assess different behaviors in dogs,
but they were all forms of a provocation test. It was
therefore proposed to use a common outcome assessment to
compare the results of the 3 tests, based on 1-0 recording at
the end of the test of the dog’s response. Two assessors
were used for each test, with only 1 present in any test
situation, hence no interobserver comparisons were done
within the tests. These individuals were chosen because
they were the usual experts in these tests in the everyday
situation. The experts were therefore very familiar with the
test procedures and experienced with the evaluation of dog
behavior. To avoid bias of their observations, the assessors
were not present at any of the other tests and were not
informed about the evaluations in the other tests. These six
experts (2 experts per test) were asked to evaluate the dogs’
behavior in both intraspecific and interspecific (toward
humans) contexts throughout the test with the following
scoring system (Figure 3): (1) open, friendly, neutral behav-
ior; (2) dominant behavior or mistrustful, fearful behavior,
without aggression; (3) threatening, warning; (4) overt
aggression/attack/biting with threatening/warning; (5) overt
aggression/attack/biting without threatening/warning. This
scale was newly created in collaboration with all 6 experts
and based on an analysis of several tests (among others, the
test of Niedersachsen, Germany [Niedersächsisches Ministerium
für Ernährung, Landwirtschaft und Forsten, 2001],
the test for aggressive behavior by Planta and Netto, the
Netherlands [1997; 1999; 2001], and the 3 participating
tests). The assessors used this scoring system to evaluate
the dogs only for this study, hence the usual scoring
systems for the tests were not of relevance for this study.
With every case, the assessors had the option to add
Statistical Analysis
For the comparison of the results of the 3 tests among
the 60 dogs, a κ test statistic was used. This statistic
assesses the level of agreement between raters (in this
study, the 3 tests are considered ‘‘raters’’) that is beyond
what would be expected by chance alone. An extension of κ
within the statistics software STATA 7 can be used to study
the agreement of 3 assessments, by calculating a weighted
average of each individual κ. Absolute κ values can range
from 0 to 1. Conventions for the interpretation of κ suggest
that values less than 0.2 indicate slight agreement, values between
0.21 and 0.40 indicate fair agreement, values between 0.41
and 0.60 indicate moderate agreement, values between 0.61
and 0.80 indicate substantial (high) agreement, and values
over 0.81 indicate very high agreement between 2 raters
(Thursfield, 1995). Two sets of tests were run: (1) a pairwise
comparison of the agreement of the individual tests,
and (2) a comparison of the agreement of all 3 tests. For
this analysis, the software STATA 7 was used.
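For readers who want to see how a pairwise κ is obtained, the following minimal sketch computes an unweighted Cohen's κ for two ‘‘raters’’ and maps it onto the verbal bands quoted above. It is not the STATA procedure used in the study: the function names and the ten scores are invented for illustration, and the three-test extension is not reproduced.

```python
from collections import Counter

def cohen_kappa(ratings_1, ratings_2):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    if len(ratings_1) != len(ratings_2) or not ratings_1:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_1)
    # Observed agreement: share of subjects given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n
    # Chance agreement: product of the marginal proportions, summed over labels.
    counts_1, counts_2 = Counter(ratings_1), Counter(ratings_2)
    p_chance = sum(counts_1[c] * counts_2[c] for c in counts_1) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (p_observed - p_chance) / (1.0 - p_chance)

def interpret(kappa):
    """Verbal bands as quoted in the text (Thursfield, 1995)."""
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "very high"

# Invented scores on the 5-point scale for 10 dogs (not data from the study).
test_a = [1, 1, 1, 2, 1, 3, 1, 1, 2, 1]
test_b = [1, 1, 2, 2, 1, 1, 1, 1, 3, 1]
k = cohen_kappa(test_a, test_b)
print(f"kappa = {k:.3f} ({interpret(k)} agreement)")
```

The invented example illustrates why κ can be low even when raw agreement looks respectable. When most dogs receive score 1 from both tests, a large share of the agreement is expected by chance and is subtracted out.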
To provide a more rigorous assessment of dogs who
reacted with aggression to a test situation in some way,
that is, those showing any evidence of aggressive poten-
tial (including fearful and mistrustful behaviors), we
cross-tabulated the data relating to any dog who was
recorded to have reacted in an aversive way in any of the
tests (ie, any dog that was evaluated by at least 1 test to
not show open, friendly, neutral behavior), and we used
McNemar’s chi-square test of association to undertake
pairwise comparisons. The exclusion of dogs who showed
open, friendly, neutral behaviors in all tests reduced the
risk of error because a large proportion of the population
consistently showed no aggressive potential as a result of
sample bias.
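The pairwise comparison can be illustrated with a small sketch of McNemar's test, which uses only the discordant pairs of a paired 2 x 2 table. The paper does not state whether a continuity correction or an exact version was applied, so both variants are shown as assumptions; the function name `mcnemar` and the example counts are hypothetical.

```python
import math

def mcnemar(b, c, exact=False, correction=True):
    """McNemar's test on the two discordant counts of a paired 2 x 2 table.

    b: dogs scored as potentially aggressive by the first test only
    c: dogs scored as potentially aggressive by the second test only
    Returns (statistic, two-sided p-value).
    """
    n = b + c
    if n == 0:
        raise ValueError("no discordant pairs: the test is undefined")
    if exact:
        # Exact binomial version: under H0 each discordant pair is a fair coin flip.
        k = min(b, c)
        tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
        return float(k), min(1.0, 2.0 * tail)
    diff = abs(b - c)
    if correction:
        diff = max(diff - 1, 0)  # continuity correction
    statistic = diff ** 2 / n
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return statistic, math.erfc(math.sqrt(statistic / 2.0))

# Hypothetical discordant counts (not taken from the study).
chi2, p_chi2 = mcnemar(9, 2)
_, p_exact = mcnemar(9, 2, exact=True)
print(f"chi-square = {chi2:.3f}, P = {p_chi2:.3f}; exact P = {p_exact:.3f}")
```

Because the concordant cells do not enter the statistic, dogs that every test scored as open, friendly, and neutral contribute nothing to the result. With so few discordant pairs, the exact version is usually the safer choice.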
The level of agreement between tests is shown in Tables
1 and 2. Figures 1 and 2 show the results for all dogs. Missing
data, as a result of dogs not showing up for a particular
test or assessors not scoring a dog, are listed as ‘‘missing
values’’ in Tables 1 and 2. The 3 tests appear to agree on
dogs that were evaluated to show open, friendly, neutral
behavior (score 1 on the evaluation scale). These were
the majority of the subjects. Slight agreement between
the results of the 3 tests was suggested by the weighted average
of the individual κ for both the criteria ‘‘intraspecific
behavior’’ (κ = 0.133, P = .014) and ‘‘interspecific behavior
toward humans’’ (κ = 0.135, P = .014).
Table 1 Descriptive Results of the Comparison of the Three Tests A, B and C, Intraspecific Behavior towards other Dogs

             Absolute values                 Percentage of total minus missing values
Tests        Same   Diff   Missing   Total   Same   Diff   Total
A - C          34     24         2      60    59%    41%    100%
A - B          41     13         6      60    76%    24%    100%
C - B          32     22         6      60    59%    41%    100%
A - C - B      25     29         6      60    46%    54%    100%

Same = same answers, diff = different answers, missing = missing values.
Table 2 Descriptive Results of the Comparison of the Three Tests A, B and C, Interspecific Behavior towards Humans

             Absolute values                 Percentage of total minus missing values
Tests        Same   Diff   Missing   Total   Same   Diff   Total
A - C          44     13         3      60    77%    23%    100%
A - B          35     19         6      60    65%    35%    100%
C - B          36     19         5      60    65%    35%    100%
A - C - B      31     25         4      60    55%    45%    100%

Same = same answers, diff = different answers, missing = missing values.
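The percentage columns in Tables 1 and 2 are computed on the dogs with complete data, that is, the total minus the missing values. The short sketch below recomputes them from the absolute counts in the tables; the helper name `percent_agreement` is an assumption introduced here for illustration.

```python
def percent_agreement(same, diff, missing, total=60):
    """Share of identical evaluations among the dogs with complete data."""
    if same + diff + missing != total:
        raise ValueError("counts do not add up to the number of dogs")
    valid = total - missing
    return round(100 * same / valid)

# Counts copied from Tables 1 and 2: (same, diff, missing).
table_1 = {"A-C": (34, 24, 2), "A-B": (41, 13, 6), "C-B": (32, 22, 6), "A-C-B": (25, 29, 6)}
table_2 = {"A-C": (44, 13, 3), "A-B": (35, 19, 6), "C-B": (36, 19, 5), "A-C-B": (31, 25, 4)}

for name, table in (("Table 1", table_1), ("Table 2", table_2)):
    for tests, counts in table.items():
        print(name, tests, f"{percent_agreement(*counts)}% same")
```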
Twenty-three dogs showing some signs of potential
aggression in an interspecific context in at least 1 of the 3
tests and 29 dogs showing some signs of potential aggression
in an intraspecific context were evaluated as a separate
population owing to their potential significance. The scores
of the dogs were then collapsed into 2 categories: ‘‘open,
friendly, neutral behavior’’ and ‘‘potentially aggressive
behavior’’ (ie, scores 2 to 5 in the original assessment).
The distribution of subjects between tests was assessed
using McNemar’s test for pairwise comparisons. As the
sample size is rather small (n = 23 for interspecific
behavior, n = 29 for intraspecific behavior), P values of
around .1 or less were considered to be of interest. A significant
difference was found between Tests A and B (P = .035)
and between Tests B and C (P < .001) for the criterion
‘‘interspecific behavior,’’ and the difference between
Tests A and C (P = .109) was inconclusive. The criterion
‘‘intraspecific behavior’’ showed no convincing evidence
of significant disagreement in any of the 3 comparisons
(P > .15 in all cases).
Figure 1 Interspecific behavior towards humans. Two panels (‘‘Interspecific aggression 1-29’’ and ‘‘Interspecific aggression 30-60’’) show the aggression scale score of each dog, by dog number, in Test A, Test B, and Test C.
Figure 2 Intraspecific behavior towards other dogs. Two panels (‘‘Intraspecific aggression 1-29’’ and ‘‘Intraspecific aggression 30-60’’) show the aggression scale score of each dog, by dog number, in Test A, Test B, and Test C.
Key to Figures 1 and 2: A = Test of the American Staffordshire Terrier Club, B = ‘‘Halterprüfung’’, C = Test of the Canton of Basel-Stadt.
1 = open, friendly, neutral behavior; 2 = dominant behavior, mistrustful, fearful behavior without aggression;
3 = threatening behavior; 4 = attack with threat; 5 = attack without threat.
NB: the numbers labeling the individual dogs are not equal to the total number in the population because some enrolled dogs did not take part.
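The collapsing step that precedes the pairwise comparison can be sketched as follows. The scores are invented, `None` stands for a missing evaluation, and the helper names `collapse` and `paired_table` are hypothetical; the sketch shows only how the 5-point scores become the paired 2 x 2 counts that a McNemar-type test consumes.

```python
def collapse(score):
    """Map the 5-point score to 0 (score 1) or 1 (scores 2 to 5)."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f"score out of range: {score}")
    return 0 if score == 1 else 1

def paired_table(scores_x, scores_y):
    """Cross-tabulate two tests after dropping dogs with a missing score (None).

    Returns counts keyed by (collapsed score in x, collapsed score in y).
    """
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for x, y in zip(scores_x, scores_y):
        if x is None or y is None:
            continue
        table[(collapse(x), collapse(y))] += 1
    return table

# Invented scores for 8 dogs in two tests (not data from the study).
test_a = [1, 2, 3, 1, 1, 4, None, 2]
test_c = [1, 1, 1, 2, 1, 3, 1, 1]
table = paired_table(test_a, test_c)
# Discordant pairs: flagged by the first test only, flagged by the second only.
discordant = (table[(1, 0)], table[(0, 1)])
print(table, discordant)
```

Only the two discordant counts carry information about systematic disagreement between a pair of tests. Dogs that both tests scored as 1 fall into the (0, 0) cell and do not affect those counts.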
There have been several studies evaluating individual tests
(Serpell and Hsu, 2001; Mittmann, 2003; Bruns, 2003;
Fuchs, 2005), but to the authors’ knowledge, this is the first
report of any attempt to practically compare different forms
of tests currently used in the field to evaluate potential ag-
gressive behaviors in dogs, despite the widespread use of
such tests to make important decisions of both legal and
welfare significance. Diederich et al. (2006) have reviewed
different methodologies used in behavioral evaluations and
found that, at least on a theoretical basis, there is little con-
sensus on the parameters and procedures used to evaluate
behavior in dogs. Taylor and Mills (2006) have emphasized
the need for tests to be validated and have raised concern
over the quality of published tests. These results support
this growing expression of concern over the caliber of
many tests in current use.
Intraspecific behavior of the dog
The weighted average of the individual κ over the 3 tests
suggests the 3 tests are not entirely independent, but the
absolute level of agreement is low. Considering that these
tests all evaluated the same 60 dogs and all of the tests
claim to be able to evaluate correctly the behavior of the
dogs, an overall level of agreement of less than 0.2 is of
concern even without significant differences between the
tests. The level of agreement between Tests A and B at 76%
is markedly higher than the agreement of either of these
tests with Test C (both 59%), suggesting that Test C may be
quite different from the other two. The agreement across all
3 tests is even lower (46%). Of further concern is that it is
evident that disagreement mainly concerns dogs that were
evaluated as showing some signs of potential aggressive
behavior by at least one of the tests, that is, unreliability is
greatest with the dogs of potential interest for public safety.
This finding questions whether the raters evaluate behavior
and define behavior terms in the same way and whether it is
possible to generalize a dog’s behavior from one day to
another. The question of being able to generalize behaviors
noted during temperament tests in everyday situations has
been investigated by Christensen et al. (2007). They ran a
prospective study on 67 dogs that had successfully passed
a temperament test at an animal shelter, after these dogs
were re-homed. They found that certain types of aggressive
tendencies (territorial, predatory, and intraspecific aggres-
sion, in particular) were not detected efficiently. Other
studies have found that certain tests are predictive with par-
ticular personality traits but less so with others (Svartberg,
One possible explanation for the higher agreement of
Tests A and B might relate to the way the dog’s behavior
toward other dogs was evaluated in the tests. Tests A and B
both applied no restrictions on the sex of the stimulus dog,
whereas Test C required that only an intact male dog could
be the stimulus dog. The developers of this test argue this is
sufficient, as this is where most of the problems occur.
However, it was in this test that a 4-year-old intact female
dog (dog number 41, Figure 2) passed the test without problems
but later attacked another female dog (neutered) involved
in the experiment.
The surroundings of Test C also differ markedly from
those of the other 2 tests, as the whole test is held within an
enclosed area, and the tested dog is confronted with a
person or dog in a kennel and is thereby always separated from
them by bars. This situation, on the one hand, increases the
security of all individuals involved and makes it easier to
standardize the test, but on the other hand, it is somewhat
artificial and might decrease the capacity to observe the
dog’s behavior in detail.
Figure 3 Evaluation score.
Criteria: intraspecific aggression; interspecific aggression.
Answers:
1 open, friendly, neutral
2 dominant behavior or mistrustful, fearful, without aggression
3 threatening, warning
4 overt aggression / attack / bite with threatening / warning
5 overt aggression / attack / bite without threatening / warning
Interspecific behavior of the dog toward human beings
Overall agreement between the tests was significantly
better than chance and slightly higher for ‘‘interspecific
behavior toward human beings’’ compared to ‘‘intraspecific
behavior’’; however, the values are still low and there were
significant differences between the tests, with Test B
appearing less similar to the other tests (65% agreement
with both A and C). The 3 tests agreed overall in 55% of the
cases, and agreement between Tests A and C was 77% (Table 2).
Nonetheless, the McNemar test points to a difference of interest
between Tests A and C (P = .109) and a significant difference
between Tests B and C (P < .001) in the
evaluations of the dogs that did not show open, friendly,
neutral behavior in all 3 tests. In 77% (n = 10) of the 13 cases
in which Tests A and C did not agree (Figures 3 and 4), Test A
evaluated the dog higher on the scale, whereas the opposite
was the case in only 23% (n = 3). This finding suggests
that Test A was perhaps more provocative than Test C. Test
B evaluated the dogs higher on the scale than Test C in
33% of the cases, with the opposite found in only 2% of the
cases (Figures 3 and 4). Test B also has the tendency to evaluate
the dogs higher on the scale than Test A. In Test B, there
are several situations where the interactions are potentially
quite stressful (involving loud noise, people forming a circle
around the dog, people jumping up and down). These stress-
ful situations may be more likely to elicit fearful, defensive
aggressive behavior and have a cumulative effect, increasing
the score of the dogs. Evidence of significant differences be-
tween the tests in the assessment of potential aggressive be-
havior toward human beings is of particular public health
concern. However, the question again arises whether any of
these tests is predictive of real risk in the wider community.
As Christensen et al. (2007) and Svartberg (2005) have
shown, predictability varies with the behavior being evalu-
ated in the tests, and care must be taken to generalize to a
dog’s temperament or character based on the results of a sin-
gle test without taking the dog’s history and behavior in ev-
eryday situations into consideration. The variability of
these results supports this caution in the applicability of
single-test results.
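The asymmetry among the 13 discordant evaluations of Tests A and C can be checked with a short calculation. The following is a minimal Python sketch, assuming that each disagreement is reduced to which of the two tests scored the dog higher, and that either an exact two-sided binomial form or a continuity-corrected chi-square form of the McNemar test is applied. The function names are illustrative only. The text does not state which variant of the test or which dichotomization of the 5-point scale was used, so the output need not reproduce the significance levels reported here.

```python
from math import comb, erfc, sqrt


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact (binomial) McNemar p-value for discordant counts b and c."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of asymmetry
    k = min(b, c)
    # Under the null hypothesis each discordant pair is equally likely
    # to favor either test, so the smaller count is Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def mcnemar_chi2(b: int, c: int) -> float:
    """Continuity-corrected chi-square McNemar p-value (1 degree of freedom)."""
    n = b + c
    if n == 0:
        return 1.0
    stat = max(abs(b - c) - 1, 0) ** 2 / n
    # Survival function of a chi-square variable with 1 df.
    return erfc(sqrt(stat / 2.0))


# Counts taken from the text: of the 13 disagreements between Tests A and C,
# Test A scored the dog higher 10 times and Test C scored it higher 3 times.
a_higher, c_higher = 10, 3
p_exact = mcnemar_exact(a_higher, c_higher)
p_chi2 = mcnemar_chi2(a_higher, c_higher)

print(f"share of disagreements with Test A higher: {a_higher / (a_higher + c_higher):.0%}")
print(f"exact p = {p_exact:.4f}, corrected chi-square p = {p_chi2:.4f}")
```

Under these assumptions the exact p-value for the 10 versus 3 split is approximately 0.09.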
The dogs of most interest to public safety are probably
those that show some sort of averse behavior, since
aggression is frequently a response to perceived aversion.
As all the dogs were recruited by word of mouth or through
dog training schools, the selection of dogs that participated
in this study is not representative of the general population.
It is likely that the population used for this study was biased
in the direction of open, friendly, neutral dogs whose
owners actively work with them, which is why all dogs
evaluated by all 3 tests as showing open, friendly, neutral
behavior were eliminated. For the remaining dogs, the McNemar test
suggests a significant disagreement between Tests A and B
and between Tests B and C, and marginal disagreement between
Tests A and C. This level of disagreement raises the
question of whether at least one, and possibly all 3, of the tests
are invalid. This question is of importance, because the welfare
of dogs incorrectly evaluated to be a risk is compromised
by potentially imposing unnecessary measures on them,
depriving them of important needs, which potentially also
indirectly influences the owners’ well-being. On the other
hand, animals falsely evaluated to be “safe” are a source of
danger to the public if these tests create a false sense of
security. One aspect not considered in this study is the
repeatability of the tests, that is, whether the results are
similar when the test is repeated. This issue poses some
difficulties when testing behavior, as it is very difficult to
eliminate the influence of the participating animals learning
from one session to the next. A larger population of dogs, if
possible balanced for breed, age, sex, training experience,
and so on would be necessary to test the repeatability of
these tests. Further evidence of the validity, repeatability,
and usefulness of specific tests to evaluate aggressive
behavior and potential dangerousness in dogs is necessary
and urgently required, given current political trends. The
interested reader may refer to Taylor and Mills (2006) for a
guide to the development of valid tests.
In spite of the homogeneity of the population tested, the
reliability and hence the validity of these behavioral tests
used for assessing behavior in dogs in Switzerland should be
questioned with regard to their significance in evaluating
aggressive behavior. We suggest that validity should not be
assumed from the outset. Rather, validity should
be a prerequisite of use, especially in legal contexts, and at the
very least additional sources of information, such as the history of
the dog, its lifestyle, and the owner’s behavior, should
be taken into account when making an assessment.
We thank the Margaret and Francis Fleitmann Founda-
tion, Switzerland for financing this project.
References
Askew, H.R., 1996. Treatment of Behavioral Problems in Dogs and Cats: A Guide for the Small Animal Veterinarian. Blackwell Scientific Publications, Oxford.
Beaver, B.V., 1994. Owner complaints about canine behavior. J. Am. Vet. Med. Assoc. 204, 1953-1955.
Blackshaw, J.K., 1988. Abnormal behaviour in dogs. Aust. Vet. J. 65,
Bruns, S., 2003. Fünf Hunderassen und ein Hundetypus im Wesenstest nach der Niedersächsischen Gefahrtier-Verordnung vom 05.07.2000: Faktoren, die beissende von nicht-beissenden Hunden unterscheiden, Inaugural-Dissertation zur Erlangung des Grades einer Doktorin der Veterinärmedizin, Hannover.
Christensen, E., Scarlett, J., Campagna, M., Houpt, K.A., 2007. Aggressive behavior in adopted dogs that passed a temperament test. Appl. Anim. Behav. Sci. 106, 85-95.
Diederich, C., Giffroy, J., 2006. Behavioural testing in dogs. A review of methodology in search for standardisation. Appl. Anim. Behav. Sci. 97, 51-72.
Fuchs, T., Gaillard, C., Gebhardt-Henrich, S., Ruefenacht, S., Steiger, A., 2005. External factors and reproducibility of the behaviour test in German shepherd dogs in Switzerland. Appl. Anim. Behav. Sci. 94,
Kantonales Veterinäramt Basel-Stadt, 2001. Pressekonferenz “Potentiell gefährliche Hunde: Erfahrungen mit der revidierten Hundegesetzgebung,” 1. Oktober 2001,
Klaassen, B., Buckley, J.R., Esmail, A., 1996. Does the Dangerous Dog Act protect against animal attacks: a prospective study of mammalian bites in the Accident and Emergency department. Injury. 27, 89-91.
Landsberg, G., Hunthausen, W., Ackerman, L., 1997. Handbook of behavior problems of the dog and cat. Butterworth-Heinemann, Oxford.
Mittmann, A., 2002. Untersuchung des Verhaltens von 5 Hunderassen und einem Hundetypus im Wesenstest nach den Richtlinien der Niedersächsischen Gefahrtierverordnung, Hannover.
Netto, W.J., Planta, D.J.U., 1997. Behavioural testing for aggression in the domestic dog. Appl. Anim. Behav. Sci. 52, 243-263.
Niedersächsisches Ministerium für Ernährung, Landwirtschaft und Forsten, 2001. Wesenstest für Hunde, Zusammenfassung der Referate des Vets2001 Kongresses, Fribourg, Schweiz, I – XXIX. http://www.
Overall, K.L., 1997. Clinical Behavioral Medicine for Small Animals. Mosby, Inc., St. Louis, MO.
Serpell, J.A., Hsu, Y., 2001. Development and validation of a novel method for evaluating behavior and temperament in guide dogs. Appl. Anim. Behav. Sci. 72, 347-364.
Stafford, K.J., 1996. Opinions of veterinarians regarding aggression in different breeds of dogs. N Z Vet. J. 44, 138-141.
Svartberg, K., 2005. A comparison of behaviour in test and in everyday life: evidence of three consistent boldness-related personality traits in dogs. Appl. Anim. Behav. Sci. 91, 103-128.
Svartberg, K., 2006. Breed-typical behaviour in dogs – historical remnants or recent constructs? Appl. Anim. Behav. Sci. 96, 293-313.
Taylor, K., Mills, D.S., 2006. The development and assessment of temperament tests for adult companion dogs. J. Vet. Behav. Clin. Appl. Res. 105, 358-368.
Thursfield, M., 1995. Veterinary Epidemiology, 2nd Ed. Blackwell Science, London., May 17, 2003.
... x [28] x [45,62] Threatening approach x [69] Post-threat interaction x [56] Problem solving II (Bin) ...
... 10. Obedience (modified from [62]) Aim: to assess the dogs' obedience and distractibility. At the far end of the room, O asked the dog (off lead) to sit, then to lie down and stay. ...
... 15. Ball play (modified from [62]) Aim: to assess the dogs' playfulness. O threw a tennis ball across the room three times. ...
Full-text available
[This corrects the article DOI: 10.1371/journal.pone.0195448.].
... We found four examples that we believed qualified as attempts to measure convergent validity; none was able to establish it convincingly ( Figure 1; Table 1) (Bräm et al., 2008;Bennett et al., 2012;De Meester et al., 2008;Svartberg, 2005). For example, in the study by Bräm et al. (2008), three different tests used in Switzerland were compared in owned dogs. ...
... We found four examples that we believed qualified as attempts to measure convergent validity; none was able to establish it convincingly ( Figure 1; Table 1) (Bräm et al., 2008;Bennett et al., 2012;De Meester et al., 2008;Svartberg, 2005). For example, in the study by Bräm et al. (2008), three different tests used in Switzerland were compared in owned dogs. Behaviors were scored on each test according to a 5-point scale ranging from "open, friendly, neutral" to "overt aggression/attack/biting without threatening/warning". ...
... Kendall's coefficient of concordance was not statistically significant for 4/6 of the aggression characteristics tested, including aggression during handling and aggression when meeting another dog. Bräm et al., 2008 AE For convergent validity, agreement (kappa) for weighted average of pairwise comparisons for intraspecific behavior (k ¼ 0.133, P ¼ 0.14) and interspecific behavior toward humans (k ¼ 0.135, P ¼ 0.014) across the three tests was considered slight. The three tests appeared to agree on dogs that showed open, friendly, neutral behavior, which was most dogs. ...
Full-text available
Conversations with stakeholders, as well as remarks in the literature, suggest that there may be confusion about what can be concluded when a canine behavior evaluation has been described as being “validated,” “reliable,” or “predictive.” To assess the evidence, we searched PubMed and ScienceDirect using the terms “canine,” “behavior evaluation,” “temperament test,” and “shelter” to identify articles that assessed the validity or reliability of evaluations based on battery of tests used or intended for screening shelter dogs for behavior labeled aggressive and/or for adoption suitability. Despite 25+ years of publications, including solid studies performed under good to ideal conditions by skilled investigators, findings indicate there is no evidence that any canine behavior evaluation or individual subtest has come close to meeting accepted standards justifying claims that it is validated for routine use in shelters. Furthermore, the mean reported false-positive error rate in study populations was 35.1%, whereas in more typical shelter populations, it was estimated at 63.8%. We propose that the discrepancy between the actual state of the science and what people assume has been accomplished is primarily due to the following: [1] confusion from mixing colloquial with scientific uses of words such as “validated,” “predictive,” “reliable,” and “agreement”; [2] the limitations of correlation and regression as statistical methods for demonstrating agreement or predictive ability; [3] failure to account for the difference between predictive validity of an instrument in populations of dogs in a research exercise versus predictive ability and error rate for individual dogs in real-world settings; [4] conflating statistical significance with clinical significance; and, as a result of 1-4 aforementioned, [5] conferring overall validation status, despite the results of studies being much more circumscribed. 
Given their published error rates, one explanation may be that behavior evaluations lack basic face validity and/or a clear focus as to what is being measured and its relevance to postadoption outcomes. This argues against use of any behavior evaluation to make important decisions for shelter dogs, especially if the behavior(s) of concern were only observed during provocative testing. These findings indicate an opportunity to acknowledge what has been learned and bring together all stakeholders to consider the real needs of shelter dogs and what the future might look like.
... x [28] x [45,62] Threatening approach x [69] Post-threat interaction x [56] Problem solving II (Bin) ...
... 10. Obedience (modified from [62]) Aim: to assess the dogs' obedience and distractibility. At the far end of the room, O asked the dog (off lead) to sit, then to lie down and stay. ...
... 15. Ball play (modified from [62]) Aim: to assess the dogs' playfulness. O threw a tennis ball across the room three times. ...
Full-text available
Individual behavioural differences in pet dogs are of great interest from a basic and applied research perspective. Most existing dog personality tests have specific (practical) goals in mind and so focused only on a limited aspect of dogs’ personality, such as identifying problematic (aggressive or fearful) behaviours, assessing suitability as working dogs, or improving the results of adoption. Here we aimed to create a comprehensive test of personality in pet dogs that goes beyond traditional practical evaluations by exposing pet dogs to a range of situations they might encounter in everyday life. The Vienna Dog Personality Test (VIDOPET) consists of 15 subtests and was performed on 217 pet dogs. A two-step data reduction procedure (principal component analysis on each subtest followed by an exploratory factor analysis on the subtest components) yielded five factors: Sociability-obedience, Activity-independence, Novelty seeking, Problem orientation, and Frustration tolerance. A comprehensive evaluation of reliability and validity measures demonstrated excellent inter- and intra-observer reliability and adequate internal consistency of all factors. Moreover the test showed good temporal consistency when re-testing a subsample of dogs after an average of 3.8 years—a considerably longer test-retest interval than assessed for any other dog personality test, to our knowledge. The construct validity of the test was investigated by analysing the correlations between the results of video coding and video rating methods and the owners’ assessment via a dog personality questionnaire. The results demonstrated good convergent as well as discriminant validity. To conclude, the VIDOPET is not only a highly reliable and valid tool for measuring dog personality, but also the first test to show consistent behavioural traits related to problem solving ability and frustration tolerance in pet dogs.
... Similarly, the testing methodology in behavior evaluations appears to be an important factor. Bram et al. (2010) reported that even when conducted on the very same dogs, agreement was low among similar behavioral test batteries commonly used to assess non-food-related human-and dog-directed aggression in Switzerland [10]. ...
Full-text available
It is commonly believed that underweight or emaciated dogs are predisposed to food aggression toward humans. Each year, the American Society for the Prevention of Cruelty to Animals (ASPCA) receives hundreds of dogs from criminal cruelty cases. The dogs range from emaciated to overweight. We analyzed existing data from 900 such dogs to examine the relationship between body condition score and food and chew item aggression toward humans. Across all types of cruelty cases, 9.2% of dogs were aggressive over the food, chew, or both, which is a lower prevalence than that previously reported among shelter dogs. Dogs from cruelty cases originating in New York City were more likely to show aggression over food (z = 3.91, p < 0.001) and chew items (z = 2.61, p = 0.01) than dogs from large-scale cruelty cases, although it is unclear why. Female dogs were less likely to show food (z = −3.75, p < 0.001) and chew item (z = −2.25, p = 0.02) aggression compared to males. Underweight dogs were not more likely to display food aggression, but when they did, the aggression was no more severe than that of normal-weight dogs (Fisher’s exact tests = 0.41 and 0.15 for the Food Bowl and Chew Item scenarios, respectively). Breed type was not a significant predictor of aggression. Canine food aggression does not appear to be an aberrant behavior caused by a history of food scarcity but may be related to biological factors such as sex. These findings could prove useful for animal behavior subject matter experts testifying in court or consulting on cruelty cases, as they could speak with scientific validity to the question of whether there is a link between previous food scarcity and the likelihood of food aggression in dogs.
... Dogs could behave very well during testing, but had previously killed a dog and seriously hurt the dog's owner. Similarly, Bräm et al. [33] assessed 3 provocation tests that were used in practice to evaluate the behavior and safety of dogs in Switzerland, and questioned the validity of each test. For example, an intact female dog passed one test without problems but later attacked another, neutered female dog involved in the experiment. ...
The inspection protocols of the Swedish police, based on the Act (2007:1150) on Supervision of Dogs and Cats, were used to examine the characteristics of 101 seized dogs, their owners, and the circumstances in which the attacks occurred. Most common reasons to seize a dog was that the dog owner was not following a previous order or ban, or that the dog had attacked and caused damage to humans or animals. The most common circumstances of the attacks involved dogs that escaped from gardens, unleashed dogs on walks and attacks by dogs on a leash. Bull breeds caused the highest number of injuries, the most serious injuries, and they were most often categorized as high risk, followed by Rottweilers and German Shepherds. Affenpinscher, Chihuahua, Cocker Spaniel, Japanese Spitz, Pug, Shih Tzu, Shetland Sheepdog and Golden Retriever were identified as victim breeds. The seized dogs had caused substantial harm to humans, animals, and their environment. The largest proportion of dogs returned to owners occurred in the Stockholm region.
... Other observational tests are used to evaluate aggressiveness in dogs, and many of these tests focus on disobedience, fearfulness, and stress reactions to distracting stimuli such as loud noises or sudden movements. Some evaluations, such as Switzerland's "Halterprufung," are designed to assess how capable handlers are at controlling their dogs, while others, such as the test of the Canton of Basel-Stadt, aim to protect humans and other dogs from aggressive behaviors and may even determine where a dog legally can and cannot be in public areas (Bram, Doherr, Lehmann, Mills, & Steiger, 2008). ...
Full-text available
Humans readily attribute personality and behavioral traits to dogs, and these attributions influence decisions about adoption. This study focused on how these attributions could be influenced by breed and pose by using pictures of four breeds (Doberman Pinscher, Golden Retriever, pit bull, and Rottweiler) in 4 poses (dog sitting alone, sitting with a human, standing alone, and walking on a leash with a human). Participants rated each picture on friendliness, aggressiveness, and adoptability. Eye-tracking technology identified which specific features were represented in each picture to determine whether they had any effect on the judgments. Although the Golden Retriever was seen as most adoptable, pose differences had many significant effects that could be useful for increasing the adoptability of all breeds. Data also revealed facial areas that attracted more attention (e.g., faster time to first fixation and longer fixation duration), particularly when the dog was alone. Focus on these areas could help to optimize photographs to present dogs in the friendliest, least aggressive, and most adoptable way.
... Crossbreeds of American Staffordshire Terriers with pit bull terriers appear more often. In many countries, bull-type terriers have to take social tests [Bräm et al. 2008]. These breeds have not any additional breeding requirements in Poland (except from the exterior assessment during the show). ...
... The classification Terrier attracted higher scores in relation to type-specific characteristics such as aggressiveness, as well as playfulness and fearlessness, in contrast to when the fictional breed was classified as a Toy. Despite research suggesting that breed type alone is a poor indicator of aggression towards humans (Sacks et al., 2000;Seksel, 2002;Delise, 2007;Braem et al., 2008), our results indicate that as a member of the Terrier breed group, there may be a bias towards perceiving breeds such as the Staffordshire Bull Terrier as manifesting higher levels of "aggressiveness." ...
Full-text available
A survey was designed to explore the effect of type classification on perception and expectation of a dog's behavior. The survey focused on two forms of presentation: the effect of visual image versus breed name in the identification of a breed as a dangerous dog type, and the effect of breed group classification on expectation of a dog's level of aggressiveness. The findings have serious implications for Staffordshire Bull Terriers. Respondents were over 5 times more likely to misascribe by image alone the Staffordshire Bull Terrier as a dangerous breed as defined under the United Kingdom's Dangerous Dogs Act 1991. Furthermore, the classification of Terrier attracted high scores in relation to type-specific aggressiveness. These findings highlight the need for more research on personal perception of supposedly dangerous dog breeds to better understand and explain this phenomena, leading to better protection of the public and better welfare outcomes for dogs.
... Recently, Sinn et al. (2010) evaluated a behavioral test for military dogs, providing evidence of test reliability as well as validity of this measurement instrument in use at the Department of Defense in the United States. The use of TTs that were not validated, as those currently used in legal contexts in Switzerland to evaluate the behavior and safety of dogs, could be misleading, as shown by Bräm et al. (2008): the tests proved to be inconsistent mainly in the evaluation of dogs that showed signs of potentially aggressive behaviors, thereby confirming that there is plenty of room for scientific research in this area. ...
Unwelcome behaviors in pet dogs may have serious implications for the quality of life of both the animals and their owners. We investigated owners' perceptions about their dogs' behavioral issues as well as other factors that might be predictive of potential canine problematic behaviors. We distinguished between "undesirable behaviors" (behaviors that were unpleasant to the owners) and "problematic behaviors" (behaviors that the owners found difficult to overcome).We designed an online survey eliciting information about owners, their dogs, their relationship with their dogs, and whether the animals exhibited any of 15 potentially problematic behaviors. The largest proportion of respondents (65%) reported that their dogs exhibited undesirable, but not problematic, behaviors and were not interested in their modification. Only 32% of the respondents considered the behavior to be both undesirable and problematic and wished to change it. The owners' perception of a problem was associated with reports of fear- and anxiety-related behaviors. The owner's gender, marital status, and attitude toward the dog as his/her child as well as the dog's age, size, age at acquisition, and breed emerged as robust predictors. Compared with all other behavioral categories, reported aggressive canine behaviors were 3 times more likely to elicit an owner's wish to address them. This study revealed that the behaviors of dogs may be perceived differently by their owners, and the type of perception may influence the owner's actual willingness to change those behaviors. Moreover, we identified the most robust set of factors that, either individually or combined, would help predict a dog's potential problematic behaviors and an owner's attitude toward them, which will be useful in improving rational prevention and treatment strategies.
Temperament tests have been created by a range of organizations and individuals in order to assess useful, predictable behavioral tendencies in working dogs and, increasingly, in companion dogs. For the latter group, such tests may help to select suitable pets from rescue centers or to identify those already in the population that are, or are likely to be, unsuitable as pets (e.g., those with behavior problems involving aggression). Unfortunately, many of these tests seem to have been developed without a systematic scientific approach. Perhaps as a result there are few reports of these tests in the scientific literature and even fewer that fully report their reliability and specific aspects of validity. This pattern is unfortunate, because the outcome of tests for companion dogs may have the potential to affect their welfare and survival. This paper attempts to encourage a more scientific approach to the development, conduct, and evaluation of temperament tests for adult companion dogs. Five key measures of the quality of a temperament test (purpose, standardization, reliability, validity, and practicality) are identified and explained in detail. Methods for the assessment of these qualities are given together with discussion of their limitations.
Relatively few studies have evaluated the effectiveness of standardized temperament testing in preventing the adoption of dogs with aggressive tendencies from animal shelters. The objective of this study was to evaluate the following hypotheses: (1) a percentage of dogs passing a standardized temperament test (i.e. not exhibiting aggressive tendencies) in an animal shelter will exhibit aggressive behaviors after adoption, and (2) these aggressive behaviors will be heavily weighted towards behaviors that may not be effectively simulated during a temperament test such as territorial aggression, predatory aggression, intra-specific aggression, and owner-directed aggression, rather than resource guarding or fear-related behaviors.
Aggressive behaviour in dogs is an increasing problem in The Netherlands. In an attempt to find a solution to this problem the Dutch Ministry of Agriculture, Conservation and Fisheries has financially supported a study aimed at developing an aggression test for dogs. The primary goal is to use the test as an instrument for excluding very aggressive individuals of certain breeds from breeding. On the basis of two pilot studies a test has been developed with 43 sub-tests in which a variety of stimuli are presented relating to contexts that are known to elicit aggression in dogs. In the final test, 112 dogs, 75 of which were potentially aggressive breeds (PAB) and a group of 37 “control dogs”, were tested. Questionnaires were used to collect information about the aggressive history of the dog. The results show clear differences in the aggression-eliciting properties of the sub-tests. Dogs with and without biting history differ significantly in their biting/attack behaviour during the test (Mann-Whitney U-test, P = 0.02). This difference is also found for only the PAB-dogs (MWU-test, P = 0.007). For reliability of analysis, 37 dogs were re-tested. The comparison between test and re-test shows a significant correlation for total attack (SPCC = 0.78) and biting/attack (SPCC = 0.68). So that the test can be implemented in practice, two “Models for Unacceptable Aggression (MUAs)” are discussed. To validate the results of the test and the application of the MUAs the results are compared with the biting history of the dogs. The results of an MUA based exclusively on the biting/attack behaviour shows a significant relation with the biting history for all dogs and for the PAB-dogs. On the basis of these results we consider the test to be a useful instrument for the assessment of aggressive tendencies in dogs, provided the test is performed by trained researchers or trained judges and test assistants.
Dogs show considerable variation in morphology, genetics and behaviour caused by long periods of artificial selection. This is evident in the large number of breeds we have today. Behavioural differences among breeds have often been regarded as remnants from past selection during the breeds’ origin. However, the selection in many breeds has, during the last decades, gone through great changes, which could have influenced breed-typical behaviour. In order to investigate this, breed differences were studied using data from a standardized behavioural test from 13,097 dogs of 31 breeds from the Swedish dog population. Based on the test results, breed scores were calculated for four behavioural traits: playfulness, curiosity/fearlessness, sociability and aggressiveness. These traits have previously been found to be stable and valid, and hence regarded as personality traits in the dog. The present results suggested large differences between breeds in all of the investigated traits, even though there were within-breed variations. No relationships between breed-characteristic behaviour and function in the breeds’ origins were found. Instead, there were correlations between breed scores and current use of the breeding stocks, which suggest that selection in the recent past has affected breed-typical behaviour. The breeds’ use in dog shows, the dominating use in general, was negatively correlated with all investigated traits, both in sires and in dams. In contrast, use in Working dog trials was positively correlated with playfulness and aggressiveness in sires. Thus, these results suggest that selection for dog show use is positively correlated with social and non-social fearfulness, and negatively with playfulness, curiosity in potentially threatening situations and aggressiveness, whereas selection for Working dog use is positively correlated with playfulness and aggressiveness. 
Furthermore, correlation analyses show that popular breeds have higher sociability and playfulness scores than less popular breeds, suggesting that a positive attitude towards strangers is an important characteristic of a functional pet dog and desirable by dog owners. This indicates that selection towards use in dog shows may be in conflict with pet dog selection. Furthermore, these results suggest that basic dimensions of dog behaviour can be changed when selection pressure changes, and that the domestication of the dog still is in progress. A standardized behavioural test, like the one used in this study, is suggested to be highly useful as a tool in dog breeding programs.
The Swiss German Shepherd Club (SC) has applied a standardised behaviour test for over 50 years. A successful test is a prerequisite for breeding approval. The aim of the study was to investigate the influence of external factors like socialisation, husbandry, training and others on the results of the behaviour test, and to verify if these results were still consistent after a year. The tested traits were self-confidence, nerve stability, hardness, sharpness, defence drive, reaction to gunfire, and temperament. Information about husbandry, training, socialisation, and the dog's behaviour in certain situations, etc. was collected by a questionnaire. From a total of 185 owners, 149 handlers with their dogs were willing to take part in this study. After 1 year, 38 dogs were tested a second time and their owners filled out another questionnaire very similar to the first one. Logistic regression analyses were used to measure the association between the results of the behaviour traits and the different external factors.Training of the young dog and contact with school aged children were significantly associated with one or more of the behaviour traits. Significant odds ratios were found for the associations between the puppy training and nerve stability and self-confidence, as well as between young dog training and the same character traits (nerve stability and self-confidence). A further positive association was found between defence drive and the contact of the dogs with school age children. Reproducibilities of the results of the behaviour test varied between traits, so the average scores for sharpness and defence drive significantly increased from the first to the second test, for temperament however, the scores decreased. Lower scores meant a more desired behaviour as rated by the club. The results of the other traits were similar in the two tests.
As a consequence of their living close to humans as pets, for working purposes or as laboratory animals, dogs give evidence of behavioural variability, stemming from their innate capacities as well as from environmental influences. This paper reviews the behavioural tests used for dogs—tests which serve as an evaluation tool and those which serve as a means of classifying individual animals. In search of a consensus and standardisation, some material and methodological aspects of behavioural testing in dogs were collected. Behavioural test parameters that were taken into account were the terminology of the temperament concept, the test quality requirements and their implementation in the literature, the characteristics of the dog tested (source, breed, age, sex), the characteristics of the social and environmental stimuli used to elicit canine behaviour, the characteristics of the behavioural variables collected and the characteristics of the physical and physiological concomitant data obtained while assessing the behaviour. This review brings to light a lack of consensus regarding all these parameters. The procedures of testing are often particular to the investigator and thus unique. We emphasised this statement by comparing six research studies using a ball, carried out over 40 years. In view of all these differences in methodology, standardisation is suggested through the creation of a reference manual.
Six specific personality traits – playfulness, chase-proneness, curiosity/fearlessness, sociability, aggressiveness, and distance-playfulness – and a broad boldness dimension have been suggested for dogs in previous studies based on data collected in a standardized behavioural test ("dog mentality assessment", DMA). In the present study I investigated the validity of the specific traits for predicting typical behaviour in everyday life. A questionnaire with items describing the dog's typical behaviour in a range of situations was sent to owners of dogs that had carried out the DMA behavioural test 1–2 years earlier. Of the questionnaires that were sent out, 697 were returned, corresponding to a response rate of 73.3%. Based on factor analyses of the questionnaire data, behavioural factors in everyday life were suggested to correspond to the specific personality traits from the DMA. Correlation analyses suggested construct validity for the traits playfulness, curiosity/fearlessness, sociability, and distance-playfulness. Chase-proneness, which I expected to be related to predatory behaviour in everyday life, was instead related to human-directed play interest and non-social fear. Aggressiveness was the only trait from the DMA with low association to all of the behavioural factors from the questionnaire. The results suggest that three components of dog personality are measured in the DMA: (1) interest in playing with humans; (2) attitude towards strangers (interest in, fear of, and aggression towards); and (3) non-social fearfulness. These three components correspond to the traits playfulness, sociability, and curiosity/fearlessness, respectively, all of which were found to be related to a higher-order shyness–boldness dimension.