All content in this area was uploaded by Mieszko Tałasiewicz on Dec 22, 2018
Evidence and Explanation in Theorizing About Semantics
published in: Ł. Bogucki, P. Cap (eds), Exploration in Language and Linguistics
Peter Lang, Berlin 2018, pp. 51-81
The paper deals with the problem of the truth-conditions of the so-called donkey sentences (existential versus universal
reading). It starts with a new interpretation of the results obtained by Bart Geurts in his Donkey Business. This interpretation
says that the results may be explained by considering only the existential reading. Further on, new experiments are
presented, devised to gather more data about the universal reading and to identify more potential motivations for
particular answers of the informants. The results strengthen the initial presumption that donkey sentences do not have
strictly established truth-conditions for multiple-donkey situations and that correct truth-conditions for these sentences
might be, and should be, stipulated by some semantic theory based upon certain general principles. The universal reading
and the existential reading of these sentences are explanations rather than evidence. Consequently, donkey sentences should
not be taken as a test for semantic theories. This fact, strengthened by an array of similar observations from other areas of
linguistics conveyed by Carson Schütze, can be generalized to suggest that the theory of language, even the most empirical
part of it, still leaves much space for philosophical interpretation.
Omnis homo habens asinum videt illum
Let us consider donkey sentences with a relative clause, i.e. sentences of the form:
(1) Q FARMER who OWNS a DONKEY BEATS it
where FARMER, DONKEY, OWNS and BEATS are lexical items replaceable with other items of the
appropriate semantic category and Q is a quantifier (‘every’, ‘most’, ‘some’ etc.).
These sentences are widely recognized to be problematic because of the interpretation of the
anaphoric pronoun ‘it’. One sort of problem concerns the truth conditions of such sentences. If Q = ‘every’ and a
farmer owns many donkeys but beats only some, not all, of them, is (1) true or false?
Or is it ambiguous between a strong (universal) reading: Every farmer beats all the donkeys he owns,
and a weak (existential) reading: Every farmer beats some of the donkeys he owns? If so, what
triggers the particular readings?
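The two candidate readings can be made concrete with a small model checker. The sketch below is only an illustration: the encoding of a situation as a mapping from farmers to the sets of donkeys they own and beat, and the function names `strong_every` and `weak_every`, are assumptions introduced here and are not taken from any of the accounts discussed.

```python
from typing import Dict, Set, Tuple

# Hypothetical encoding: farmer -> (donkeys owned, donkeys beaten).
Situation = Dict[str, Tuple[Set[str], Set[str]]]


def strong_every(situation: Situation) -> bool:
    """Universal reading: every donkey-owning farmer beats all the donkeys he owns."""
    return all(owned <= beaten for owned, beaten in situation.values() if owned)


def weak_every(situation: Situation) -> bool:
    """Existential reading: every donkey-owning farmer beats at least one donkey he owns."""
    return all(bool(owned & beaten) for owned, beaten in situation.values() if owned)


# One farmer owns two donkeys and beats only one of them.
multiple_donkeys: Situation = {"farmer_1": ({"d1", "d2"}, {"d1"})}

# Each farmer owns exactly one donkey and beats it.
single_donkeys: Situation = {
    "farmer_1": ({"d1"}, {"d1"}),
    "farmer_2": ({"d2"}, {"d2"}),
}

print(strong_every(multiple_donkeys), weak_every(multiple_donkeys))  # False True
print(strong_every(single_donkeys), weak_every(single_donkeys))      # True True
```

Under this encoding the two readings come apart only in multiple-donkey situations; with one donkey per farmer they coincide, which is why only the former are diagnostic.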
There are many accounts on the market: E-type/D-type (Evans 1977, Elbourne, 2005), discourse-
representational (Heim 1982, Kamp 1981), Russellian (Neale 1990), dynamic (Chierchia 1992,
Kanazawa 1994), context-dependent-quantificational (King 2004), partitive (Brogaard 2007), in-scope
binding (Barker and Shan, 2008), relevant-functional (San Gines and Frapolli, 2017) – to name just
some of them. A popular criterion of evaluation in the competition among them is their adequacy,
understood as yielding the correct truth-conditions for donkey sentences, and a popular charge
against opponents in the debate is that their accounts ‘make wrong predictions’ about the
truth-conditions or ‘are inadequate to explain’ the distribution of readings of these sentences.
[Footnote: This is not the only sort of problem with donkey sentences. See for instance (Grosz et al., 2015). Others are
beyond the scope of the paper, though.]
The problem is, and this is one of the main points to be argued for in the present paper, that we do
not have anything like a sound understanding of what the truth-conditions of the donkey sentences are.
The question of the truth-conditions for donkey sentences was first posed in the Middle Ages,
and the answer was that the existential reading was the intended one (Burleigh ca. 1328 (1988), Geach
1962, Parsons 1994). The problem was introduced into the contemporary debate by Peter T.
Geach, who in 1962 maintained that the obvious answer is the universal reading. He asserted:
[Let us consider] the pair of propositions:
(12) Any man who owns a donkey beats it
(13) Some man who owns a donkey does not beat it
Medieval logicians […] argued that a pair such as (12) and (13) could both be true [in the case in which ‘each
donkey-owner had two donkeys and beat only one of them’] and were therefore not contradictories. But plainly
(12) and (13), as they would normally be understood [stress...], are in fact contradictories; in the case supposed,
(13) would be true and (12) false (Geach 1962; quotation after 1980 edition, pp. 143-145).
Geach’s claim poses a number of methodological questions, though. What are the grounds for
dismissing the medieval interpretations? How can we know that the sentences in question really do
have the interpretation that Geach says they have? Geach appeals to some ‘normal understanding’
but it is not at all clear what this is supposed to mean. There are at least two options available. First, it
may be an appeal to a common-sense understanding of the truth conditions of the sentences in question.
Second, it may be an appeal to some salient or allegedly obvious theoretical account of truth-conditions,
which normatively yields some default understanding of these sentences.
Geach himself remained silent about the methodological aspect of his disagreement with Burleigh &
Co about donkey sentences. It is most likely, though, that he had the second option in mind, as he
repeatedly expressed his general disregard for the explanatory power of ordinary language-users’
sense of naturalness. He used to treat articles in English as ‘logically vacuous’, accepted
sentences like ‘Horse which Alexander rode lived thirty years’ as well-formed and despised what he
called, after Arthur Prior, the ‘idiotism of idiom’. A sudden admiration for common-sense data, out of
the blue, would hardly fit this general picture.
However, it is clearly the first option that is being explored in the subsequent literature. And this yields
an array of further questions. To begin with: how common is this alleged common-sense
interpretation in fact? It wasn’t common among medieval logicians. Is there any particular reading,
or any particular distribution of readings, common to all language-users, such that we can tell it is a
verdict of competence about the truth conditions of donkey sentences? This is apparently an
empirical question. But before we can try to answer this question we need to decide important […]
[Footnote: A notable exception is (King 2004), who acknowledges that what the truth conditions of donkey sentences in
fact are is a controversial matter.]
[Footnote: Cf. e.g. (Geach, 1975).]
First, what exactly shall we try to measure if we want to get in touch with the linguistic competence
of the community rather than express our own theoretical preferences? We cannot just run an
armchair-style thought experiment, which has been justly criticised as showing the personal bias of a
particular researcher rather than the linguistic competence of the community (Schütze 1996). And
we cannot just ask people what reading they have in mind on a particular occasion, because, as is
well known, there is a danger of substituting speakers’ capability to rationalize their linguistic
behavior (their linguists’ or folk-linguists’ competence, so to speak) for their linguistic
competence. This means that even principled judgments about the truth conditions of donkey sentences
would not necessarily count as independent, competence-provided data against which linguistic
theories are tested, as they can turn out to be theory-laden interpretations, themselves in
need of being tested. We have to find some way to circumvent such rationalizations and get straight
to the point. We need to ask whether certain sentences are true or false in certain situations, and
generalize from the answers, rather than ask what reading they have, since ‘reading’ is already a […]
But this is not all we need. Suppose we have established some common understanding of the truth
conditions of donkey sentences by getting some true/false answers. How can we know that this
common-sense understanding of truth conditions faithfully reveals actual truth conditions of these
sentences? How can we know that it is not biased by some truth-conditionally irrelevant factors? We
can point out now at least one well-known reason to suspect that there can be irrelevant factors
interfering in our intuitive judgments (further in this paper we will identify more such reasons).
Namely, pragmatic inappropriateness is often mistaken for semantic falsity. This means that felt
differences in the conditions of acceptability of sentences such as, say, Every man who had a credit card paid the
bill with it and Every man who had a credit card kept it in a safe place cannot be taken as warranting
that there is really a semantic difference between the truth-conditions of these sentences. Indeed,
we normally pay a bill with one credit card and, indeed, it is wise to keep all
the credit cards we have in a safe place. It would be somewhat irrational to care about one card and
pay no attention to what happens to the others. But does this common-sense attitude to the dangers of
theft ipso facto set universal truth-conditions for the sentence Every man who had a credit card kept
it in a safe place?
Since Geach, not much research has been conducted to find out what linguistic competence really
says about the truth-conditions of donkey sentences, with proper treatment of the abovementioned
concerns. There is a lot of work concerning donkey sentences, whether just in passing or with
particular concentration on them, but as regards their readings an appeal to some intuitions is usually all
there is, often accompanied by phrases conveying some lack of definitive commitment: ‘it is widely
agreed that’, ‘it is pretty uncontroversial’, ‘intuitively’, ‘appear to have’, ‘it seems pretty clear’, ‘I am
tempted to think’ and so on. Regardless of whether researchers just pick one reading that is handy for
their general purposes or agree that donkey sentences are ambiguous between the strong and the weak
reading and try to account for the factors that trigger one or the other reading, a favored
reading or distribution of readings is taken for granted, as if it were just a bare fact of the
matter, waiting to serve as evidence for our theories or to be explained by some plausible account of […]
For example, Kanazawa (1994:151) claims that “the main question of [his] paper
[is] the question why it is that donkey sentences have the interpretation they do have. Why is it that
the weak reading and the strong reading are distributed in the way they are…”.
[Footnote: ‘[M]eta-linguistic awareness [...] is a special kind of language performance, one which makes special cognitive
demands, and seems to be less easily and less universally acquired than the language performances of speaking
and listening’ (Schütze 1996: 57, after C. Cazden).]
[Footnote: Just for one example of many, see (Banga et al. 2009:1).]
[Footnote: Cf. e.g. (Brogaard 2007: 421), (San Gines and Frapolli, 2017).]
My question is: are the readings distributed in the way Kanazawa says they are? My intuitive
judgment differs very much from his, and I find it difficult to concede that ‘what type of object a
given use of an anaphoric pronoun can take as its value is accessible to direct intuitions’ (Kanazawa
2001: 400). The aim of the paper is to explore the possibilities of finding out, rather than assuming
or stipulating, what the truth conditions of donkey sentences really are. I will analyze some empirical
work on this subject and present my own experiments, leading to the moderately pessimistic
conclusion that donkey sentences just don’t have theory-independent, competence-based truth-conditions
for multiple-donkey situations. It might well be the case that their truth conditions must
be stipulated by some general semantic theory rather than just discovered through our linguistic
competence. If so, the distribution of readings of these sentences cannot play the role of evidence
for a semantic theory.
2. Bart Geurts: Donkey Business
As a notable, insightful work towards empirical testing of what competence says about donkey
sentences we shall count Bart Geurts’ Donkey Business (2002), which is methodologically much more scrupulous than
is usually the case and, in spite of some flaws in the interpretation of the experiments, to which I shall
refer below, reveals some really inspiring results (in fact, in my opinion, the results are much more
significant than the author realizes).
Geurts takes three steps towards methodological rigor. First, he does not try to imagine for himself which
reading fits (1) better on different occasions but rather asks his informants. This reduces the risk of
substituting the professional competence of a linguist for the linguistic competence of the
community. Second, he does not ask them about quantifiers, but about the truth values of selected
donkey sentences in specially designed situations; quantifier reading may be only inferred from the
answers (although this inference is not trivial, and we will see that Geurts has made some mistakes
at this step). This allows for the separation of the problem of linguistic competence from the problem
of logical competence. Third, he does not describe the situations but presents them using pictures.
This reduces the bias that the researcher may pass unintentionally to his or her informants with the
mode of description. The possibility of biased results undermines – Geurts argues – the reliability of
the experiments reported in Yoon (1996), in which the relevant situations were described verbally.
Yoon established some correlation between the distribution of the universal reading and the existential
reading and some linguistic properties of the predicates standing in the BEATS position, namely
<stative>, <episodic>, <partial> and <total>. However, according to Geurts, we cannot be certain that
this correlation is not an artifact of the experimental setting (verbal description of the situations).
[Footnote: Cf. e.g. (Dever, 2004), (van Rooij, 2006).]
[Footnote: This is not to say that donkey sentences cannot serve as evidence at all. There is still a vast field for discussion
of the availability of syntactically acceptable transformations of them, under different accounts. For example,
see the discussion of weak donkey crossover in (Barker and Shan, 2008). Well-formedness perhaps is ‘accessible
to direct intuitions’ of competent speakers. Anyway, it would require an entirely different study to decide.]
In his own experiments Geurts asked his informants about the truth-value of the following sentences:
(2) Every boy that is standing next to a girl holds her hand
(3) Every railway line that crosses a road goes over it
in situations depicted respectively in the following pictures
What is diagnostic in these pictures is the third element in each. A boy stands next to two girls but
holds hands with only one of them; likewise, a railway line crosses two roads but goes over only
one of them. The expectation was that informants preferring the existential reading would judge
the respective sentences true in the depicted situations, whereas those preferring the universal reading would
be inclined to judge the sentences false (because the third element of each picture falsifies the
sentence in the universal reading). More precisely, that is what the expectation should have been. In fact Geurts
expressed a stronger expectation. He argued (2002: 138) that the fact that informants ‘deem the
sentence false […] implies that they assign it a universal reading’. This implication does not hold,
however, because a donkey pronoun, as reported in the literature, can carry an implication of
uniqueness (for further discussion cf. e.g. Kadmon 1990:302, Brasoveanu 2008:267-268, Grosz et
al. 2015). The basic intuition behind the notion is that the singular pronoun ‘it’ crucially refers to a
single owned donkey and that sentences containing such a pronoun can be properly uttered only in
contexts containing just one donkey per farmer. Contexts that do
not fulfill the uniqueness requirement (i.e. those in which there is more than one donkey per farmer)
are prone to elicit judgments of pragmatic infelicity and can result in informants rejecting the
sentence as false. Thus, in the situations depicted above, we cannot infer the universal reading from the
answer ‘false’, for it is possible that such an answer indicates rather that the informant
refuses to withdraw the implication of uniqueness.
[Footnote: Actually, Geurts asked about more sentences, but the following two, being at the opposite ends of some
gradual scale, were the most informative. Since Geurts highlighted these two, I will confine my analysis to them,
except in the note further in this section.]
[Footnote: Pictures are photocopied from the original article.]
All we can establish is a one-way dependence: those who prefer universal reading would judge the
sentences false (but not necessarily the other way round), which suggests that in search of universal
reading we must restrict ourselves to reductive reasoning. This does not mean, however, that the
results of such reasoning are invalid or unimportant.
Surprisingly, the results were very different for boys/girls than for railway lines/roads. It turned out
that almost everyone judged sentence (3) false against the depicted situation (90% false, 5% true, 5%
no answer). In comparison, sentence (2) was judged true by 65% and false only by 35%. Such results
are meaningful even after amendments concerning the factor of implication of uniqueness. If we
assume that this factor is roughly equal in both cases, we obtain a substantial difference in preference
for the universal reading, or so it may appear. How should this difference be explained? What factor is
responsible for it?
What is crucial in Geurts’ analysis of his experiment is the observation that the factor responsible for
the difference is not of a linguistic but of an ontological nature: it is the force of identity of farmers. Geurts
observes (2002:141) that informants may, on the level of mental representation of the scene, just
split one farmer into many. In particular, they may count twice the farmers that own two donkeys so
that there will be just one donkey per farmer. The less definite the farmer’s identity, the more likely
informants are to do this. Railway lines have weak identity: a line from Amsterdam to
Paris, says Geurts, may be easily split and counted as two lines, one from Amsterdam to Brussels,
the other from Brussels to Paris. Boys, and people in general, have much stronger identity and it is
not so easy to split them, even in one’s mental representation of the scene. Yet it is possible:
informants may count them as characters (Geurts 2002:143). Fred, the boy standing next to two girls,
may be counted as Alice’s neighbour and as Betty’s neighbour, separately. Splitting boys is thus less
likely than splitting railway lines, but it may happen. After all, there are contexts in which we count
people as characters very naturally, e.g. when we say that an airline company had five million
passengers last year: we do not mean that there were five million distinct persons, for some people
might have travelled with this airline several times, each time counted as a new passenger.
[Footnote: Geurts examined not only regular donkey sentences with the farmer quantifier ‘every’, but also donkey
sentences with different farmer quantifiers: ‘some’, ‘no’ and ‘not every’. What is somewhat perplexing is that
he did so using the same pictures as for regular donkeys and claimed (Geurts 2002:137) that uniform true
answers in the case of sentences with ‘some’ and uniform false answers in the case of sentences with ‘no’
suggest that ‘subjects obtain existential readings for both types of sentences’. That conclusion is plainly
baseless. Against such pictures as used by Geurts both readings would yield true with ‘some’ (because there are
some farmers that BEAT all donkeys they own) and false with ‘no’ (for the same reason).]
Splitting farmers in fact changes the picture against which the informant judges the sentences. The
diagnostic third element in each picture, in the informant’s mind’s eye, divides into (A) and (B):
Fig. 3. Splitting railway lines
Fig. 4. Splitting boys (panel labels: Fred, Betty’s neighbour; Fred, Alice’s neighbour)
Because of the newly emerging (B)-type situations the sentences are judged false; Geurts finds himself
bound to the conclusion that the preference for the universal reading depends on the ontological status
of the farmers.
This conclusion is far too modest, though. If such ontological adjustment as described by Geurts
really takes place there emerges a new element of the testing situation: an element in which there is
one farmer that owns one donkey and does not beat it (see Fig. 3 B, 4 B). That falsifies the donkey
sentence even in existential reading and thus the answer false no longer carries anything like the
suggestion of universal reading. This means that Geurts’ findings show not that the preference for
universal reading depends on ontology, but that we don’t need universal reading at all for the
explanation of the distribution of acceptability judgments. All evidence seemingly suggesting the
contrary may be reinterpreted according to Geurts’ hypothesis as showing preference for splitting
farmers rather than for universal reading of donkey sentences. Welcome back to the Middle Ages!
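The effect of the split on the existential reading can be checked mechanically. The following is a minimal sketch under assumptions made only for illustration: a farmer is encoded as a pair of sets (donkeys owned, donkeys beaten), and the helper names `weak_every` and `split` are hypothetical. It is not a model of what informants actually do.

```python
from typing import List, Set, Tuple

# Hypothetical encoding: a farmer is a pair (donkeys owned, donkeys beaten).
Farmer = Tuple[Set[str], Set[str]]


def weak_every(farmers: List[Farmer]) -> bool:
    """Existential reading with Q = 'every': each owner beats at least one owned donkey."""
    return all(bool(owned & beaten) for owned, beaten in farmers if owned)


def split(farmers: List[Farmer]) -> List[Farmer]:
    """Count each farmer once per owned donkey, so that there is one donkey per farmer."""
    return [({d}, beaten & {d}) for owned, beaten in farmers for d in sorted(owned)]


# The diagnostic element: one farmer, two donkeys, only one of them beaten.
scene = [({"d1", "d2"}, {"d1"})]
split_scene = split(scene)

print(weak_every(scene))        # True
print(weak_every(split_scene))  # False
```

Before the split the existential reading is satisfied; after it, the new one-donkey farmer who does not beat his donkey falsifies the sentence, so a ‘false’ answer needs no appeal to the universal reading.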
3. Towards a new experiment. The starting point
Theoretically, it could be the case that there is just a mere correlation between the availability of the
mental split and the tendency to obtain the universal reading and that the real reason for ‘false’
answers in the informants’ minds still is the latter but not the former. If such a correlation held
universally – for every donkey sentence and every context the tendency to obtain the universal
reading occurred if and only if the mental split were available – that would mean that for explanatory
purposes one of these concepts: mental split or universal reading, is redundant, or even void and we
don’t need the concept of universal reading for explanation, since we can use the mental split.
Besides, such a correlation would be quite a mystery. Geurts showed a very natural mechanism
behind the availability of the mental split: the force of identity of objects. The firmer the identity,
the less available the mental split; the looser the identity, the more available the mental split.
That fits our expectations about the world: ontological integrity depends on the force of identity. I
can see no comparably natural mechanism responsible for the alleged correlation between the force
of identity of objects and the reading of the quantifier.
However, it might be so that such a correlation holds incidentally just in Geurts-type experimental
settings and that in principle one could look for a different setting in which the tendency for universal
reading would not coincide with the availability of the mental split anymore (and thus would be
empirically detectable). This is the starting point for an idea of devising a new experiment that would
differentiate universal reading from the preference for splitting farmers.
The task isn’t trivial, though. Prima facie, one cannot test donkey sentences in which Q = ‘every’ in
experiments of this sort, because splitting farmers always falsifies even existential reading in such
sentences, provided there is a diagnostic element in the setting at all (if there is no such element, on
the other hand, such a setting would never falsify universal reading, which is equally uninformative).
It seems, however, that there is some promise in trying sentences of the form:
(4) Some FARMER who OWNS a DONKEY BEATS it
[Footnote: ‘Subjects [that split the FARMERS] will reject the sentence, hence obtain the ∀-reading’ (Geurts 2002:142).]
Admittedly, some theorists (e.g. Kanazawa 1994) maintain that donkey sentences in which Q =
‘some’ never have universal reading on the donkey quantifier, as it would be inconsistent with some
principles underlying their accounts (notably the principle of preservation of certain inferential
patterns due to the monotonicity properties of the determiners in Kanazawa’s account). Yet it is
still worthwhile to check and see whether language users in fact tend to avoid the universal
reading on the donkey quantifier when Q = ‘some’, or not. Moreover, the accounts that exclude the
possibility of universal reading in donkey sentences with Q = ‘some’, by no means exhaust the pool of
available theoretical options. There are different sets of competing principles on the market, many of
them freely allowing for universal readings of donkey quantifiers together with existential readings of
farmer quantifiers. Notably Chierchia’s view (1992), which is based upon the monotonicity properties
as much as Kanazawa’s is (albeit focused on different inferential patterns), gives substantially
different predictions than Kanazawa, including the universal reading for donkey sentences with Q =
‘some’ (and for all donkey sentences with right upward monotone determiners). Krifka (1996) adopts
a still different principle (‘stronger interpretation preferred’), which also contradicts Kanazawa’s
proposal. Neale (1990) doesn’t agree with Kanazawa on this point, either.
Let this suffice to support the claim that there are no overwhelming a priori arguments against
the consideration of donkey sentences with Q = ‘some’ in search of the universal reading.
4. Pilot-test. The discovery of donkey merging
Let me at this point diverge from the standard paradigm of presenting experiments. For the task I set
for myself was not just to gather some new data through a standard, well-established procedure,
commonly acknowledged as appropriate for such and such a kind of data-gathering, but rather to find
out what procedure would be appropriate for gathering the kind of data we need. Thus it might be
profitable for the reader not just to look at the experiment from the endpoint, when we can ex-post
phrase our expectations and explain the details of the procedures, but rather to see how the idea of
the experiment evolved in the course of research, due to some discoveries made on the way.
I began my inquiry in a loose way, on a small sample of participants with whom both the form of our
questions and the motivations of their answers were discussed freely. At this early stage the test was
supposed to provide some inspiring heuristics and identify relevant factors that might have an
impact on the interpretation of this sort of answers rather than provide the most reliable empirical
data. I focused on the problem of how to formulate the questions in order to bring the universal
reading out from under factors that cover it. I decided that the best way to go was to present to the
informants donkey sentences in the form of (4), differing as to the force of identity of farmers and
donkeys, and to confront each of them with two different situational settings. The combination of
judgments of the truth value of a given sentence in both situations was meant to be diagnostic.
There were a couple of versions of the sentences and pictures; eventually I focused on the following two.
[Footnote: Another popular reason to deny the possibility of the universal reading on the donkey quantifier when Q =
‘some’ is the belief that Q in donkey sentences is an unselective quantifier, which binds all variables in its
scope, farmer variable and donkey variable alike. Thus we can possibly have either ∀∀ or ∃∃. This idea is
common to different accounts, such as Lewis (1975) on one side, and Kamp (1981) and Heim (1982) on the
other.]
[Footnote: The plausibility of Kanazawa’s view depends, inter alia, on the assumption that ‘[...] the intuitions about the
validity of [...] inferences [...] are stronger than the intuitions about the truth conditions of individual donkey
sentences’ (Kanazawa 1994: 149-150). It is a very interesting claim, and perhaps true. No evidence has been
called upon to substantiate this claim, though.]
The sentences were:
(5) Some boys that hold a balloon hold it in their right hand
(6) Some railway lines that cross a road go over it.
The pictures were as in Fig. 5:
Fig. 5. The pilot study. In a situation of type A all farmers own many donkeys and beat some but not all of them. This is an
∀-excluding situation type (informants who prefer the universal reading should judge the donkey sentence false in such a
situation). In B-type situations all farmers own many donkeys, but at least one of them beats all the donkeys he owns. This is an
∀-consistent situation type (informants who prefer the universal reading are free to judge the donkey sentence true in
such a situation).
I considered four possible rational motivations for answers: preference for existential reading,
preference for universal reading, preference for donkey uniqueness and preference for a mental
split. The setting, as I believed, allowed for singling out the preference for universal reading, if there
were any. According to these initial considerations:
- Splitting farmers would erase the difference between the situations of type A and B, because after
the split there would emerge in an A-situation a new farmer who meets the conditions for a B-
situation. Splitting of course fulfills the requirement of uniqueness, too. For every farmer there
would be one donkey such that the farmer owns the donkey. Thus, if an informant split the farmers,
he would say YES in both cases: A and B. Splitting, as we have established, is consistent with there
being no universal reading.
- Existential reading for donkeys, with or without splitting, would yield the same result.
- Preference for uniqueness would either force the informants to split farmers or motivate them to
say NO in both cases (since in both A-type and B-type situations there is more than one donkey per
farmer).
- Preference for universal reading should exhibit itself in the combination [A-NO, B-YES], since A is an
∀-excluding situation type and B is an ∀-consistent situation type. A-NO says that it is neither farmer
splitting nor a preference for existential reading; B-YES indicates that it is not a preference for
uniqueness.
Summing up, such considerations led me to expect that:
[A-YES, B-YES] would mean either plain existential reading or farmer splitting;
[A-NO, B-NO] would convey an informant’s preference for uniqueness;
[A-NO, B-YES] would suggest universal reading;
[A-YES, B-NO] would be contra entailment since A-YES logically entails B-YES (and thus it should be […]).
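The expected combinations can be derived mechanically if each motivation is treated as a deterministic decision rule, which is a strong simplifying assumption made only for this sketch. The encoding of scenes, the two example scenes and all function names are hypothetical; the uniqueness rule is modelled without the option of splitting.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical encoding: a farmer is a pair (donkeys owned, donkeys beaten).
Farmer = Tuple[Set[str], Set[str]]
Scene = List[Farmer]


def split(scene: Scene) -> Scene:
    """Count every farmer once per owned donkey."""
    return [({d}, beaten & {d}) for owned, beaten in scene for d in sorted(owned)]


def existential(scene: Scene) -> bool:
    """Some farmer beats at least one donkey he owns."""
    return any(bool(owned & beaten) for owned, beaten in scene)


def universal(scene: Scene) -> bool:
    """Some farmer beats every donkey he owns."""
    return any(bool(owned) and owned <= beaten for owned, beaten in scene)


def splitting(scene: Scene) -> bool:
    """Judge the sentence against the scene as it looks after the split."""
    return existential(split(scene))


def uniqueness(scene: Scene) -> bool:
    """Reject unless every farmer owns exactly one donkey (no split assumed)."""
    return all(len(owned) == 1 for owned, _ in scene) and existential(scene)


# Type A: every farmer beats some but not all of his donkeys.
scene_a: Scene = [({"a1", "a2", "a3"}, {"a1"}), ({"b1", "b2"}, {"b1"})]
# Type B: at least one farmer beats all of his donkeys.
scene_b: Scene = [({"a1", "a2", "a3"}, {"a1"}), ({"b1", "b2"}, {"b1", "b2"})]

MOTIVATIONS: Dict[str, Callable[[Scene], bool]] = {
    "existential reading": existential,
    "universal reading": universal,
    "farmer splitting": splitting,
    "uniqueness": uniqueness,
}


def answer(value: bool) -> str:
    return "YES" if value else "NO"


patterns = {
    name: f"[A-{answer(rule(scene_a))}, B-{answer(rule(scene_b))}]"
    for name, rule in MOTIVATIONS.items()
}
for name, pattern in patterns.items():
    print(f"{name}: {pattern}")
```

Under these assumptions no rule produces [A-YES, B-NO], and only the universal reading produces [A-NO, B-YES], which is what makes that combination diagnostic in the initial design.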
The participants were students recruited from my classes as well as my colleagues, friends and
family members, and their colleagues and friends. The test was conducted on them in varying
forms over a period of a month. Usually we used printed questionnaires containing all diagnostic items
mixed with some neutral fillers. The diagnostic items were the pairs depicted in Fig. 5 or
equivalent, as well as Geurts’ original tests, as I considered it a good opportunity to check whether
his results could be confirmed.
The outcome took the form of notes and comments on individual responses rather than actual
figures suitable for statistical analysis. Eventually I obtained 59 full answers to questionnaires
containing diagnostic items and Geurts’ cases.
As to Geurts’ original setting, the general tendency was indeed confirmed, although the spread
between boys-cases and railways-cases was not as big as in his results: in my study sentence (3) was
judged true by 3 participants (5%), false by 55 participants (93%) and one person (2%) didn’t give an
answer; sentence (2) was judged true by 17 people (29%), false by 38 (64%), and 4 people (7%) gave
no answer.
[Footnote: Before the boys/balloons paradigm became established, we tried several other versions of strong objects,
like girls holding cats on a leash etc. The point is that we couldn’t use Geurts’ original version with boys holding
hands with girls. We needed farmers owning more than two donkeys; boys standing next to more than two
girls would be hard to depict (and possible pictures could be hard to interpret). That’s why we had to find
something else and we tried several options before sticking finally to boys and balloons. As we found nothing
special in any of these particular versions we finally lumped together the records of them. Weak objects were
always railway lines and roads, as in Geurts.]
For the diagnostic items I obtained the following results:
On railway lines and roads there were 35 people (59%) answering [A-YES, B-YES], 12 people (20%)
answering [A-NO, B-YES], 6 people (10%) answering [DON’T KNOW], 3 people (5%) answering [A-YES,
B-NO] and 3 people (5%) answering [A-NO, B-NO].
On boys and balloons (or equivalent) there were 37 people (63%) answering [A-YES, B-YES], 2 people
(3%) answering [DON’T KNOW], 3 people (5%) answering [A-YES, B-NO] and 17 people (29%)
answering [A-NO, B-NO]. No one answered [A-NO, B-YES] in this case.
Although I haven’t submitted them to a serious statistical analysis, these results surprised me in two
respects. First, there appeared a small but solid block of [A-YES, B-NO] answers, which I had previously classified as
contra entailment; and the answers came from people who had otherwise exhibited a sound rational
pattern of reasoning. Second, the combination [A-NO, B-YES], which I had classified as showing genuine
universal reading, did appear, but in a surprising place and proportion. Since this preference shows only
when the informant does not split the farmers, it should be more likely for strong objects than for
weak ones, namely for boys rather than railway lines. But the converse turned out to be the case: this
preference hardly showed at all in the boys cases, but climbed to 20% in the railway cases.
In search of an explanation I made two observations.
First, the informants tend to shift the uniqueness requirement from donkeys owned to donkeys
beaten. If, for instance, a boy holds several balloons, but only one in his right hand, the answer
was YES for the very reason that there was one unique balloon: not unique in the whole situation,
but unique in the boy’s right hand. In our setting there are boys in situation A who hold just one
balloon each in their right hands and, because of these boys, the whole situation meets the shifted
requirement, whereas in situation B all boys hold many balloons in their right hands and, therefore,
the situation violates even the shifted requirement (and likewise for railway lines and roads). Thus, the
combination [A-YES, B-NO] turned out to be perfectly rational, despite earlier considerations, being a
variant of the preference for uniqueness.
Second, I observed that what matters for the informants is not only the ontology of farmers, but
also that of donkeys. Geurts’ experimental setting depended only on the force of the identity of farmers.
Mine depended also on the force of the identity of donkeys. As I learned, ontologically
weak objects can be not only split, but also merged. The informants, in their struggle for uniqueness,
were not only splitting farmers but also merging donkeys. They adopted representations
of the scenes in which a given railway line crosses just one road, which meanders and goes over or
under the railway line many times.
Fig. 6. Donkey merging
Now, if in some case such a meandering road went under the railway line every time – as in situation
B – the respective donkey sentence could be judged true: B-YES. If, on the other hand, in every case
such a road went over the railway line in at least one turn – as in situation A – there was simply no
whole road that the railway line went over – hence A-NO, in full agreement with existential reading.
Such a scenario is certainly much less likely for boys and balloons, which explains why the
combination [A-NO, B-YES] hardly showed for strong entities.
These observations explained the outcome of the experiment, but they also showed that the setting
was wrong for my purpose. The shift of the uniqueness requirement can easily be handled by some
minor amendments to the pictures, but the second observation goes much deeper. It shows
that there is another, hitherto unnoticed, kind of possible motivation for answers: donkey
merging – apart from the existential reading, the universal reading, preference for uniqueness and
farmer splitting – and because of this possibility the combination [A-NO, B-YES], without further
consideration, no longer indicates universal reading. We can still explain all the results
without resorting to the hypothesis that some informants exhibit a preference for universal reading.
Thus, I needed another experiment, in which I could control ontological manoeuvring and pragmatic
5. Main experiment
5.1. Methods and design
The sentences were the same as before – (5) and (6) – but this time the pictures were different. In
particular, I made sure that uniqueness couldn’t be shifted from donkeys owned to donkeys beaten.
Fig. 7. Main study. Situations A are ∀-excluding, situations B are ∀-consistent, as before. There are no single donkeys and
uniqueness cannot be shifted. The boys are shown with their backs turned to avoid doubts about which hand is the right hand.
They look different to prevent merging of farmers (just in case).
I took into consideration six possible motivations for answers. Five of them have already been
discussed: preference for existential reading, preference for universal reading, preference for
uniqueness, farmer splitting and donkey merging. The sixth was the bias towards the ‘yes’ answer.
Not all possible motivations could be properly distinguished by the test. Some combinations of
answers would indicate several alternative motivations. However, we can separate the combinations
suggesting the universal reading and attempt to group the rest under two plausible headings,
provided we run the diagnostics not by looking at pairs of answers, but at quadruples. In total, we
get sixteen theoretically available combinations, which can be grouped into three meaningful
categories: universal reading, existential reading and pragmatic reading – and one slot for
combinations to be rejected as meaningless (or at least uninformative for the present purpose):
Cf. Cronbach 1942.
Fig. 8. Possible answers. Rows: 1. sentence (5) with strong objects (boys & balloons); 2. sentence (6) with weak objects (railway lines & roads).
1. Universal reading (UR). The diagnostic pattern [A-NO, B-YES] stands – as we have established
– for either ∀-reading or donkey merging. If it shows on strong objects, merging is unlikely; thus it
suggests ∀-reading when accompanied on weak objects by the same pattern (combination II) or by
[A-YES, B-YES], which indicates farmer-splitting on weak objects (combination I). Other possible
outcomes on weak objects are to be rejected as uninformative (see below).
2. Pragmatic reading (PR). This heading groups all combinations that show pragmatic
preference for uniqueness (pattern [A-NO, B-NO] on strong objects). This general preference is
accompanied by different manipulations on weak objects in different combinations. The most
resolute uniqueness lovers would say 4xNO (combination V); those more willing to make some
ontological manoeuvres in order to get their uniqueness would split weak farmers (combination III) or
merge weak donkeys (combination IV).
3. Existential reading (ER). This category encompasses a real preference for ∃-reading together
with only a bias towards the ‘yes’ answer and a liberal attitude towards ontology (splitting farmers or
merging donkeys). We can assign to this category the combinations VI and VII. In each of them the
pattern [A-YES, B-YES] shows on strong objects. In VI it also shows on weak objects, in VII there is
donkey merging on weak objects. The pattern [A-YES, B-YES] shows on strong objects in
combinations VIII and XV, too, but I’ve decided to classify these combinations differently (see below).
4. Rejected combinations (REJ). Generally, [A-NO, B-NO] for weak objects, which are more
susceptible to ontological modifications than strong objects, seems to be inconsistent with any YES
on strong objects. I don’t have any explanation for the pattern [A-YES, B-NO], wherever it shows.
Those combinations may, to some extent, reflect just the level of noise. Or they may be
due to some uncontrolled pragmatic effect, connected with something within the scenes or with
some aspect of the mode of presentation (such as the order of pictures in the questionnaire).
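The grouping described in items 1 to 4 can be restated as a small lookup. The following Python sketch is only an illustration of the classification as it is worded in the text; the function name `classify` and the tuple encoding of answer patterns are assumptions introduced here, not part of the original study materials.

```python
from itertools import product

# One answer pattern = (answer in situation A, answer in situation B).
# These short names are illustrative assumptions, not the author's notation.
YY, NY, NN, YN = ("YES", "YES"), ("NO", "YES"), ("NO", "NO"), ("YES", "NO")

def classify(strong, weak):
    """Map a quadruple of answers (strong objects, weak objects) to a category."""
    if strong == NY and weak in (YY, NY):      # combinations I and II
        return "UR"
    if strong == NN and weak in (YY, NY, NN):  # combinations III, IV and V
        return "PR"
    if strong == YY and weak in (YY, NY):      # combinations VI and VII
        return "ER"
    return "REJ"                               # every other combination

# Enumerate all sixteen quadruples and count how many fall into each category.
counts = {}
for strong, weak in product([YY, NY, NN, YN], repeat=2):
    label = classify(strong, weak)
    counts[label] = counts.get(label, 0) + 1
print(counts)
```

Under this reading, two of the sixteen quadruples count as universal, three as pragmatic, two as existential, and the remaining nine are rejected.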
It may seem that the assignment of combinations to the ‘existential reading’ is somewhat arbitrary,
as a 4xYES answer may well be motivated by the pragmatic preference for uniqueness resulting in
splitting farmers rather than by a logical preference for existential quantification. On the one hand, this is
not disastrous (as possible misfits in this area do not affect the most interesting category of universal
reading); on the other hand, it might be worthwhile to take some steps to discriminate
between truly logical and covertly pragmatic motivations in this category.
I decided to check this using a certain priming, namely triggering a logical pattern of responses in a
proportion of the participants by instructing them about truth-conditions for existential quantifiers
against scalar implicature (including information that ‘some’ is consistent with ‘all’). Explicit logical
instruction should strengthen logical rather than pragmatic motivation and increase the number
of logically-motivated answers in the instructed group. Roughly: if combination VI is more popular
among instructed participants than among uninstructed ones, this suggests a logical rather than
pragmatic motivation behind it.
5.2. Participants, materials and procedure
The experiment, in the form of an interactive questionnaire, was placed on the Kognilab website.
The research was conducted entirely online using the free LimeSurvey tool. All the questions were
related to pictures, drawn in an accessible manner, black and white, simple but not schematic,
devoid of any details or additions that could in any way bias basic understanding (see above). For
every question there were three answers available: “YES”, “NO”, “DON’T KNOW”. Subjects had to
tick the preferred answer for each question. Only one picture-sentence pair was displayed on the
Uniqueness requirement shifting, which explained this pattern in the pilot-test, is no longer available here.
Scalar implicature is a phenomenon within conversational implicatures (cf. Grice 1989, Horn 1972 and much
subsequent work) that is manifested by people favoring a non-logical (not consistent with logical
understanding of quantifiers) reading of sentences like:
(7) Some girls are holding flowers
when in the accompanying pictures all the girls shown are holding flowers. Most importantly, scalar implicature
results in people judging such sentences false (for contrast: in terms of classical logic some is consistent with
all). The reason for such judgment is a conversational assumption that if a speaker had seriously wanted to
convey the information that all the girls were holding flowers, he would have said so explicitly using a proper
quantifier word (‘all’). Thus, if he is using a weaker quantifier (‘some’), he certainly must have an honest
intention to convey the information that some but crucially not all the girls are holding flowers.
Without additional tests and very sophisticated statistics it would be hard to determine how exactly all the
possible motivations grouped here under the ER heading are distributed. All I’ve got – and all I’ve needed – is
just a suggestion that the logical component is still there, solid enough to make a difference after priming.
screen at the same time and returning to the questions that had already been answered in this
scenario was not possible. No time limit was imposed on the subjects.
The sentences were in Polish.
We had assumed that the results obtained for Polish would hold for
There were two versions of the questionnaire, assigned to participants on a random basis (there was
no further randomization; in particular, the order of questions within the questionnaires was
fixed). The donkey part, consisting of donkey questions and attention checkpoints (see below),
was identical in both, but only one version was preceded by explicit logical instruction about the
truth conditions for existential quantification. Both versions also contained a simple test checking
whether the informants understand existential quantification logically or pragmatically, in the form of
the following sentence-situation pair (Fig. 9):
Fig. 9. Some girls that hold a flower wear a hat.
This is not a donkey sentence (although it sounds similar) and it has a definite logical reading: against
the depicted situation it should be judged true, unless we abandon logic and move on to pragmatics:
according to scalar implicature (if ‘all’ then not ‘some’) this sentence is inappropriate (read: false) in
the depicted situation. Now, people who were not instructed and answered ‘YES’ at this point were
‘Niektórzy chłopcy, którzy mają balonik, trzymają go w prawej ręce’ (Some boys that hold a balloon hold it in
their right hand); ‘Niektóre linie kolejowe, które przecinają szosę, idą ponad nią’ (Some railway lines that cross
a road go over it).
This assumption has not been tested empirically. However, donkey experiments have been conducted in
languages other than English (notably Geurts’ original experiment, which was in Dutch, and San Gines and
Frapolli (2017), which was in Spanish) and the authors didn’t report any significant distortions. Our intuitive
judgments, as well as those of our numerous interlocutors competent in both languages, assured us that there
is no salient difference in attitude towards quantification between the two languages.
As for the diagnostic questions, the order was as follows: sentence 5 in B-type situation, sentence 6 in A-type
situation, sentence 5 in A-type situation and sentence 6 in B-type situation. The first two followed each other
immediately; the third and fourth were separated by other questions.
classified as ‘natural logicians’ (NL); those who answered ‘NO’ – as ‘natural pragmatists’ (NPR).
Informants who were instructed about the logic of quantifiers and answered ‘YES’ were classified as
‘forced logicians’ (FL); people who answered ‘NO’ even after being instructed were classified as
‘obstinate pragmatists’ (OP).
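The four participant groups follow from two binary facts: whether the informant received the logical instruction, and how they answered the control item in Fig. 9. A minimal Python sketch of that assignment is given below; the function name `participant_group` and its argument names are hypothetical and serve only to restate the definitions above.

```python
def participant_group(instructed: bool, control_answer: str) -> str:
    """Return the group label for one informant.

    instructed     -- True if the questionnaire version included logical instruction
    control_answer -- 'YES' or 'NO' answer to the control sentence of Fig. 9
    """
    groups = {
        (False, "YES"): "NL",   # natural logicians
        (False, "NO"): "NPR",   # natural pragmatists
        (True, "YES"): "FL",    # forced logicians
        (True, "NO"): "OP",     # obstinate pragmatists
    }
    return groups[(instructed, control_answer)]

print(participant_group(False, "NO"))
```

A ‘DON’T KNOW’ answer is deliberately not handled in this sketch, since the text does not say how such an answer to the control item was treated.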
Additionally, the questionnaires contained a couple of attention checkpoints - sentence-picture pairs
with plain interpretation (logical interpretation consistent with pragmatic interpretation) in order to
filter out the informants who would give incorrect answers to simple questions.
I obtained 370 forms. 91 of them were rejected because of incorrect answers to the attention
checkpoints or at least one ‘DON’T KNOW’ answer to a donkey question.
examined 279 adult native [...] speakers, aged 18-62 (M = 30.15; SD = 10.51), recruited mostly from
students and teachers of two universities and a post-secondary school. There was open access to the
application. Among participants there were 187 women and 92 men. The education level of
participants varied from basic (1%) to PhD and higher (16%) with the most common being education
at MA level (35%).
To begin, let us look at the table of accepted responses (Fig. 10).
Fig. 10. Table of responses. NPR stands for natural pragmatists, NL for natural logicians, OP for obstinate pragmatists and FL
for forced logicians. UR stands for universal reading, PR for pragmatic reading, ER for existential reading and REJ for
rejected combinations.
Perhaps such exclusiveness would seem too restrictive, yet we decided to avoid further complications.
Keeping ‘DON’T KNOWs’ would raise the number of potentially meaningful combinations from 16 to 81 (3^4).
The table is full of delights for anyone interested in tracking further correlations; I will explore just a
few of them in greater detail, those most relevant to the topic.
The overall aggregated results for all informants together are shown in Fig. 11. The results for natural
logicians, natural pragmatists, forced logicians and obstinate pragmatists are shown in Fig. 12.
Fig. 11. Overall results
Fig. 12. Grouped results.
As we may see, randomizing the assignment of the versions with and without logical instruction
to the participants turned out not to be a particularly great idea, as chance isn’t justice: in the end
there were many more forms with instruction, which places the groups without instruction
(particularly natural logicians) close to the edge of statistical accountability. Statistical tests for this
group should therefore be taken cum grano salis, yet the general tendencies can be seen quite clearly.
We may observe a statistically significant association between being in one of the four experimental
groups and the number of existential, universal and pragmatic readings: p.
Pragmatic readings were the most frequent among the natural pragmatists. In all other groups the
predominant readings were existential readings. There is a statistically significant difference between
natural and forced logicians, χ²(3) = 13.09, p < 0.005. Natural and obstinate pragmatists are a borderline
case, χ²(3) = 4.86, p < 0.2. However, the probability that the switch in relative frequency of pragmatic
readings and existential readings between these two groups is accidental is less than 0.1. Generally,
there was a statistically significant difference between instructed and uninstructed informants:
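
The two p-value bounds reported above for χ² with 3 degrees of freedom can be checked directly from the reported statistics. The Python sketch below does only that: it does not recompute the χ² values themselves, because the underlying table of responses is not reproduced here. The function name `chi2_sf_df3` is an assumption of this illustration; it uses the closed form of the upper-tail probability that holds for exactly 3 degrees of freedom.

```python
import math

def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df.

    For 3 degrees of freedom the tail has the closed form
    erfc(sqrt(x/2)) + 2*sqrt(x/(2*pi))*exp(-x/2).
    """
    z = x / 2.0
    return math.erfc(math.sqrt(z)) + 2.0 * math.sqrt(z / math.pi) * math.exp(-z)

# Statistics as reported in the text (decimal points instead of commas).
p_logicians = chi2_sf_df3(13.09)    # natural vs forced logicians
p_pragmatists = chi2_sf_df3(4.86)   # natural vs obstinate pragmatists
print(round(p_logicians, 4), round(p_pragmatists, 2))
```

The first value falls just under 0.005 and the second lies between 0.1 and 0.2, which agrees with the bounds stated for the two comparisons.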
Finally, let us have a look at the results for particular combinations (Figs. 13 and 14).
Fig. 13. Results for all combinations
Fig. 14. Results for combinations 1, 3 and 6
Let us start, for a warm-up, with the last two figures. The former (Fig. 13) informs us that, despite
there being sixteen theoretically available combinations, only three of them really matter: I, which is a
paradigm of universal reading, III, which belongs to pragmatic reading, and, above all, VI,
encompassing existential reading. The latter figure (Fig. 14) shows that the pattern of responses
confined to these three combinations is almost identical with the general grouped results stripped of
rejected combinations (Fig. 12). This fact relieves us of the burden of figuring out the
hidden motives for picking the remaining combinations. Plausibly they may be taken as showing merely
the level of statistical noise. The category of all rejected combinations, shown in the overall results in
Fig. 11, reaches a noticeable level of 10%, but this is an artifact of summing up the noise from
many meaningless combinations.
Another fact is that both universal reading and existential reading proved to be more common
among forced logicians than among natural ones. According to the hypothesis expressed at the end
of section 5.1, this fact corroborates the claim that there is some truly logical motivation in these
readings, not just bias towards ‘yes’ or ontological liberalism (which does not mean, however, that
these other motivations are significantly suppressed or not involved at all – this is an argument for
presence, not dominance of logical motivations). We might observe, too, that logical instruction
reduces the level of the noise in the answers of those who take the instruction seriously – the
proportion of rejected combinations is lowest among forced logicians.
Still another interesting outcome is that participants without logical instruction split into
pragmatists and logicians roughly by half, with a slight preponderance on the pragmatic side (exactly
58 NPR to 44 NL). It shows how deeply pragmatics affects our intuitive judgments (the more so if we
recall that some elements of the logic of quantification are taught in elementary schools, as a part of
However, the main results are surely about the readings. The struggle to tear out universal reading
from under farmer splitting has ended in only moderate success: the findings corroborate the
hypothesis that the option to equip donkey sentences with universal truth-conditions is open to the
participants. It is rather unpopular, though. Universal readings are less frequent than pragmatic
readings and much less frequent than existential readings. Surely we cannot say that ‘competence’
dictates universal reading for donkey sentences. Frankly speaking, we cannot say that ‘competence’
dictates any definite reading. Pragmatic readings are not much more popular than universal ones,
and existential readings are ‘existential’ more by name than by flesh and blood. Although we have
substantiated – through our manoeuvres with logical instruction – the claim that there is some
genuine logical motivation within this category, there can be much more than that, too (as a
reminder: among the most salient possibilities we should count psychological bias for ‘yes’ and
The only exception is the suspiciously high popularity of rejected combination XV among obstinate pragmatists. A
tempting hypothesis is that this is somehow related to the fact that obstinate pragmatists are those who failed
the girls-and-flowers test just after being explicitly instructed how to pass it. We should be prepared for a
greater proportion of weird answers in this group, I presume.
pragmatic struggle for uniqueness accompanied by a liberal attitude to ontology). And – so far – we
have no idea what the proportions of this mixture are.
The conclusion of the paper is that properly scrutinized experiments on linguistic competence do not
reveal any particular reading as determined by the linguistic competence of language users. We have
an array of interpretations available for the very same sentences in the very same contexts, and an
array of relevant factors that may be responsible for the informants’ choosing one reading or another
– many of them very remote from anything like implicit knowledge of the truth
conditions. What is perhaps most important is that the readings in general, existential reading
and universal reading alike, need some evidence. The readings are explanations - plausible or not - of
the evidence rather than the evidence itself. They are not raw data. My experiments – and my
interpretation of previous experiments – have shown, I believe, that what really is given directly in
the experiments, and thus should be counted as genuine data, is rather a mere distribution of ‘yes’
and ‘no’ answers to some questions against some pictures. Whether these answers reveal the
universal reading, the existential reading, uniqueness implicature, farmer splitting, donkey merging
or whatever there might be, is not given directly in the experiment but is a matter of further
interpretation based upon hypothesizing about factors that might or might not contribute to a given
pattern of responses.
The significance of these findings is that they may have some impact on the empirical background of
several accounts of donkey anaphora currently on offer. Until now it was the distribution of readings
that counted as empirical evidence, directly available from the competence through intuition
(Kanazawa) or experiment (Geurts); different accounts were supposed to be competing in explaining
this empirically given distribution of readings. Present results might be taken as motivating a shift in
the notion of empirical data in this domain.
This shift in the notion of empirical data would, in turn,
Recently another interesting paper about the truth conditions of donkey sentences has appeared which seeks to
observe the standards of experimental work: San Gines and Frapolli (2017). However, this paper is explicitly
not concerned with weak/strong ambiguity (ibidem: 11). It is an elaboration of how, according to a specific
framework (van Benthem’s generalized quantification), donkey instances should be counted. It is not devoted
to reflection upon truth conditionally irrelevant factors that might influence otherwise competent speakers, so
that its empirical part is lacking discussion of several important issues. First, the main experiment contains
cases of donkey co-ownership (two farmers own the same donkey), which are very likely to pose a problem for
informants (they may choose to split donkeys in such a case, which would completely change the scene in
which donkey sentences are evaluated). Secondly, the authors don’t take the uniqueness problem into
consideration at all, while at least some parts of the outcome they obtained can be explained precisely by the
informants’ struggle for uniqueness. Thirdly, they show that selective quantification together with the universal
reading would give wrong predictions in a certain experimental setting, compared to their own account, but they
fail to notice that selective quantification together with the existential reading would give just the
same predictions as theirs there. Finally, a potential assessment of their outcome within the present paper would be
still less informative, as they work with the conditional form of donkey sentences. The conditional form might introduce
different semantic intuitions than the relative form (such as an implicit quantifier ‘usually’, cf. (Barker and
It might be a good occasion to mention that the present paper does not take sides in the minimalism-
contextualism debate. The outcome is consistent with and can be accommodated in both frameworks (with
existential reading as the most plausible ‘minimal’ reading for minimalists).
Perhaps other domains within the theory of language would profit from rethinking the notion of empirical
data in these domains, too. Schütze (1996) gives a wide range of examples.
have some impact on the ways in which justification and argumentation should be presented in the
general debate in semantics. For the distribution of readings of donkey sentences used to be treated
not only as a phenomenon per se in need of explanation, but also as empirical evidence for wider
Now, as we can see, it is advisable to seek an independent motivation for a
preferred account rather than merely appeal to the alleged such-and-such distribution of readings of
donkey sentences. I think that more stress on discussing the theoretical background of the
proposed interpretations, instead of exchanging charges of alleged empirical inadequacies, would be
very profitable for the debate in general. Wherever there is a bunch of truly empirical data, it
is surely advisable to account for it.
However, the participants in the debate should not extend the
notion of the data to cover things that aren’t directly available from experience but only through
some interpretation – as the interpretations highly depend on more general assumptions, principles
and theoretical objectives that are adopted in a given account. In order to properly assess and
evaluate competing accounts we should rather consider which principle is simpler, deeper, more
important, more general and of greater explanatory power: preserving inferential patterns
(Kanazawa), preference for stronger readings (Krifka), existential quantification of indefinites, for the
sake of a uniform account of neo-Russellian semantics (Neale), unique reference of pronouns
(Kadmon), or syntactic uniformity in compositional treatment of wide range of cases (Barker and
Quite a philosophy, isn’t it?
Actually, these are two sides of the same coin: according to the general methodology of science, data predicted
by a theory serve as evidence for this theory and get explained by this theory. Evidence and explanation are
two directions of the same relation. It is rather the question of particular research interests whether donkey
sentences were discussed in the literature on their own or rather as instances illustrating more general issues
(such as anaphoric relations or quantification).
Actually, in philosophy (and in logic) there is an old and noble tradition of ignoring facts. According to this
tradition philosophy (and logic) is about what is possible or necessary – not about what is (contingently)
factual. As David Lewis wrote: “I distinguish two topics: first, the description of possible languages or grammars
as abstract semantic systems whereby symbols are associated with aspects of the world; and second, the
description of the psychological and sociological facts whereby a particular one of these abstract semantic
systems is the one used by a person or population. Only confusion comes of mixing these two topics” (Lewis
1970: 19). For a couple of decades now philosophers have been discovering increasingly often that allowing for
the verdict of competence as one of legitimate sources of support for philosophical proposals might help them
to move on from certain dead ends in argumentation. The methodological scheme behind such a use of empirical
evidence in philosophical debates is roughly this: if in a certain area of philosophical interest – say in language –
there are firmly established, important facts, then it is worthwhile to consider the option that this factual
pattern is not entirely contingent. Perhaps, even if it is not necessary in the logical sense of the word, it might
reveal some non-contingent structure of ontological or at least cognitive character. The possibility that is
embodied in this pattern might be somehow highlighted among all purely logical possibilities and therefore,
even if one cannot at a given moment state precisely what are the grounds for such discrimination, a
philosophical theory which is consistent with this empirical pattern (or – better – which explains somehow this
pattern) should be considered to be a better theory - ceteris paribus - than one which ignores (or – worse -
explicitly excludes) this pattern. As James Higginbotham would put it: “Given the premise that the sentences
we use express our thoughts, the study of semantic competence [...] is a study at the intersection of
philosophical and linguistic interests and concerns” (Higginbotham 2002:577). Thus philosophical theory may,
according to this methodological scheme, admit being informed by the linguistic competence – provided the
competence says something informative.
Non ergo grammaticus sed philosophus, proprias naturas rerum diligenter
considerans... grammaticam invenit
The origins of this paper reach back a couple of years to fruitful discussions I had with my
student Natalia Pietrulewicz. Natalia also gave me a helping hand with preparing and distributing
questionnaires and gathering the responses. I am very grateful to her for that. I would like to thank
the organizers of COGLANG 2010, Łódź and the 20th Meeting of the European Society for Philosophy and
Psychology, London, 2012, and the audiences of these events for the opportunity to discuss some initial
ideas of the work. Special thanks go to Piotr Stalmaszczyk for the inspiring and supportive environment he
created around the biennial conference series PHILANG on philosophy of language and linguistics.
This work is indirectly but heavily indebted to him for his support and encouragement.
A part of the work was funded by the grant N N101 188840 2011/2012 from Polish National
Banga, A., Heutinck, I., Berends, S. M., & Hendriks, P. (2009). Some implicatures reveal semantic
differences. In B. Botma & J. van Kampen (Eds.), Linguistics in the Netherlands (pp. 1-13).
Amsterdam: John Benjamins.
Barker, C., & Shan, C. (2008). Donkey anaphora is in-scope binding. Semantics and Pragmatics, 1(1),
Brasoveanu, A. (2008). Donkey pluralities: plural information states versus non-atomic
individuals. Linguistics and Philosophy, 31, 129-209.
Brogaard, B. (2007). The But Not All: A Partitive Account of Plural Definite Descriptions. Mind and
Language, 22, 402-426.
Burleigh, W. (1988). Von der Reinheit der Kunst der Logik, Erster Traktat. Von den Eigenschaften der
Termini. (De puritate artis logicae. De proprietatibus terminorum). [Translated and edited by Peter
Kunze, with introduction and commentary]. Hamburg: Felix Meiner Verlag.
Chierchia, G. (1992). Anaphora and Dynamic Binding. Linguistics and Philosophy, 15, 111-183.
Cronbach, L. J. (1942). Studies of acquiescence as a factor in the true-false test. Journal of
Educational Psychology, 33, 401-415.
Dever, J. (2004). Binding into character. Canadian Journal of Philosophy, Supplementary Volume 30,
Elbourne, P. D. (2005). Situations and individuals. Current studies in linguistics. Cambridge, Mass: MIT
Evans, G. (1977). Pronouns, quantifiers and relative clauses. Canadian Journal of Philosophy, 7, 467–
Geach, P. T. (1962). Reference and generality. Ithaca, NY: Cornell University Press.
Geach, P. T. (1975). Names and Identity. In S. Guttenplan (ed.), Mind and Language. Oxford: Clarendon
Geurts, B. (2002). Donkey business. Linguistics and Philosophy, 25, 129–156.
Grice, P. (1989). Studies in the way of words. Cambridge, MA: Harvard University Press.
Grosz, P. G., Patel-Grosz, P., Fedorenko, E. & Gibson, E. (2015). Constraints on Donkey Pronouns.
Journal of Semantics, 32, 619–648.
Heim, I. (1982). The Semantics of Definite and Indefinite Noun Phrases. Ph.D. dissertation, University
of Massachusetts, Amherst.
Higginbotham, J. (2002). On linguistics in philosophy and philosophy in linguistics. Linguistics and
Philosophy, 25, 573-584.
Horn, L. (1972). On the semantic properties of the logical operators in English. Doctoral dissertation,
UCLA, Los Angeles, CA. Distributed by IULC, Indiana University, Bloomington, IN.
Kadmon, N. (1990). Uniqueness. Linguistics and Philosophy, 13, 273–324.
Kamp, H. (1981). A Theory of Truth and Semantic Representation. In J. Groenendijk et al. (eds),
Formal Methods in the Study of Language, Mathematical Centre, Amsterdam.
Kanazawa, M. (1994). Weak vs. strong readings of donkey sentences and monotonicity inference in a
dynamic setting. Linguistics and Philosophy, 17, 109-158.
________(2001). Singular donkey pronouns are semantically singular. Linguistics and Philosophy, 24,
King, J. C. (2004). Context Dependent Quantifiers and Donkey Anaphora. In M. Ezcurdia, R.
Stainton & C. Viger (eds), New Essays in the Philosophy of Language, Supplement to Canadian
Journal of Philosophy, vol. 30. Calgary, Alberta, Canada: University of Calgary Press, 97-127.
Krifka, M. (1996). Pragmatic Strengthening in Plural Predications and Donkey Sentences. In T.
Galloway and J. Spence (eds), Proceedings from Semantics and Linguistic Theory VI, Cornell
University, Ithaca, NY.
Lewis, D. (1970). General Semantics. Synthese, 22, 18-67.
_______(1975). Adverbs of quantification. In E. Keenan (ed.), Formal Semantics of Natural Language.
Cambridge: Cambridge University Press.
Neale, S. (1990). Descriptive Pronouns and Donkey Anaphora. The Journal of Philosophy, 87, 113-150.
Parsons, T. (1994). Anaphoric pronouns in very late medieval supposition theory. Linguistics and
Philosophy, 17, 429-445.
van Rooij, R. (2006). Free Choice Counterfactual Donkeys. Journal of Semantics, 23(4).
San Gines, A. & Frapolli, M. J. (2017). Stop beating the donkey! A fresh interpretation of conditional
donkey sentences. Theoria, 32(1), 7–24.
Schütze, C. T. (1996). The empirical base of linguistics: Grammaticality judgments and linguistic
methodology. Chicago: University of Chicago Press.
Yoon, Y.-E. (1996). Total and partial predicates and the weak and strong interpretations. Natural
Language Semantics, 4, 217–236.