Awkward questions: language issues in the 2011 census in England
© 2016 Mark Sebba, Lancaster University
The 2011 Census in England broke new ground, as a question about language had never
previously been asked. After stakeholder consultations and a series of trials, the census
authority decided on two questions based on earlier censuses in the USA: one about the
respondent’s ‘main language’ and another about proficiency in English. This paper provides a
critique of the census questions, showing how the pressure to produce questions which were
straightforward to answer and consistent with the predominant monolingual ideology led to
the choice of two questions which were problematic in different ways. This raises doubts
about the validity of the questions themselves and the usefulness of the data collected.
Despite this, the results have been treated as objective fact. The paper asks whether the
census questions on language have served a useful purpose, or whether other methods of
collecting language data would be preferable.
Awkward questions: language issues in the 2011 census in England
Alongside mundane inquiries about the number of vehicles owned and the number of rooms
in the home, national censuses offer the prospect of asking pertinent questions about
respondents’ uses of language, which would gather reliable information about the languages
used throughout the population and allow it to be correlated with other information, such as
location, employment and education. Questions about vehicle ownership are clearly matters
of objective fact, but other questions, such as ‘What is your religion?’, invite a more subjective
response¹. Somewhere on the scale between these two lie questions about language. Even
‘simple’ questions about what languages respondents know, and how well they know them,
may be complex and subjective in practice. They often require the respondent to make a
combination of judgements, some relatively objective, and some quite subjective, linked to
personal identity and socioeconomic status, influenced by perception by the self and by others.
Inevitably, all such questions are asked within a social and historical context which both
constrains the possible answers and motivates respondents to select certain answers rather
than others from those available, in accordance with prevailing ideologies about nation,
ethnicity and language (see, e.g. Urla 1993, Kertzer and Arel 2002, Leeman 2004, Laversuch
2007). Urla (1993: 820) points to ‘a recent emergence of faith in statistical measurement as
the basis for an objective and necessary science of society,’ while arguing that administrative
processes of quantification like censuses are not socially neutral activities that deliver true
‘facts’, or even ideologically motivated distortions of facts. Rather, such processes ‘in and of
themselves constitute our most basic understandings of the social sphere and social actors’.
Such a view contrasts strongly with that of the authorities and the public, by whom censuses
are expected to – but in reality cannot – deliver objective ‘facts’ in the form of statistics.
Many countries include language questions in their national censuses, or have done in recent
times: for example, seven questions related to language use and language proficiency have
been asked in the Canadian census (Sabourin and Bélanger 2013: 3) and two in Australia
(Ozolins 1993). The USA, New Zealand and South Africa also ask language questions in
their decennial censuses, as do 76% of countries in the United Nations Economic
Commission for Europe (Aspinall 2005: 364). In the United Kingdom, however, before 2011
no questions about language had ever been asked in England, where over 80% of the
population live. In this respect the 2011 census in England was groundbreaking, as it
included two questions specifically about language: one about the respondent’s ‘main
language’ and a second, for those who declared that their main language was not English,
about English proficiency.
Both questions, though carefully tested and designed to be easy to answer and to provide
good data, can be seen as highly problematic from the viewpoint of linguists and
sociolinguists. To date, however, they have received little attention from academics, and
when the census results have been communicated, whether to users or the public at large, they
have been treated uncritically, as objective and factual. This makes it all the more important
that they be subject to scrutiny now, as preparations are under way for the next census. The
questions themselves were based on census questions previously used in the USA, which
have also attracted criticism from linguists working in this field. This seems a good moment,
therefore, to examine these questions critically in the light of current applied linguistic and
sociolinguistic knowledge. The paper is structured as follows: Section 2 deals with the
historical background and early testing of language questions. Section 3 discusses the ‘main
language’ question which was chosen for the census questionnaire. Section 4 examines the
question about language proficiency in English. Section 5 offers a discussion.
2. Language questions in the 2011 census in England
In England and Wales, the census is carried out by the Office for National Statistics (ONS).
Up to and including 2001, no language questions had been asked in a census in England.
Historically, a language question had been asked in other constituent countries of the United
Kingdom – Wales, Scotland and (Northern) Ireland – about the use of the indigenous Celtic
languages Welsh, Scottish Gaelic and Irish, providing information about language
maintenance and shift towards English, but not asking about any other languages.
For the 2011 census, the ONS initially had no plans to ask a question on language. Even its
initial consultations did not specifically invite responses on the topic of language ‘because
ONS believed that there was insufficient evidence of user demand to justify inclusion in the
2011 Census’ (ONS 2006:14). However, of almost 500 responses to the consultation, around
80 involved language, and the ONS concluded that there was a demand for a language
question (ONS 2006:14). Reasons given by stakeholders for needing language information
included a variety of language planning purposes, for example: enabling government bodies
to meet their duties under legislation like the Race Relations Act and (for users of British
Sign Language) the Disability Discrimination Act; allowing local and central government to
allocate resources for teaching English as a second language and providing translation
services within public services; identifying isolated groups of people and addressing social
inclusion; supporting regional or minority languages (ONS 2006: 15).
The United Nations Economic Commission for Europe note in their recommendations for
decennial censuses that there are various kinds of language data which countries may want to
collect (cf. Arel 2002:97):
Multilingual countries and countries with significant immigrant populations may wish to
collect data on languages that are currently written or spoken. Depending on information
needs, the following data may be collected:
a) “Mother tongue”, defined as the first language spoken in early childhood at home;
b) Main language, defined as the language which the person commands best;
c) Language(s) most currently spoken at home and/or work;
d) Knowledge of language(s), defined as the ability to speak and/or write one or more
designated languages.
(United Nations, 2006: 96)
At the first consultation stage, the majority of respondents made a case for needing
information about languages used, rather than proficiency in English, but a second
consultation in 2007 showed that 90% or more of respondents felt it was important to have
information about ability to speak, understand, read and write English, as well as the mother
tongue or first language, main language spoken at home and preferred spoken and written
language(s) for communicating with public authorities (ONS 2009:11). This resulted in the
ONS deciding to ask questions both about languages spoken and about proficiency or skills.
Between four and six years before the census date, the ONS experimented with questions
designed to find out more about respondents’ abilities in multiple languages. They tried using
different versions of a matrix to collect responses to the question ‘What languages can you
understand, speak, read or write?’. The 2007 postal pilot census included a matrix in which
respondents could indicate ‘the ability to understand, speak, read and write English, Welsh
and one other language (to be specified by the respondent)’ as well as sign language (ONS
2009: 20). Testing indicated that the matrix form of the language question was not
satisfactory. Respondents found all versions of it difficult to complete, leading to incomplete
and inconsistent responses. Furthermore, less than half of the respondents to the 2007
consultation thought that this version would meet their needs for language information, inter
alia because it failed to elicit the preferred language of communication, the details of all
languages known, and information on levels of language ability and literacy. One major
problem was a lack of clarity about the level of proficiency required for respondents to report
that they knew a language. If respondents reported they could speak languages which in fact
they knew only a little of, the data would be less useful (ONS 2009:27).
Noting that ‘international research and best practice also advises that a matrix format is not
appropriate for a question in a self-complete census questionnaire’ (ONS 2009:27), the ONS
then abandoned the matrix question design and trialled a number of versions of a question
based on the New Zealand census, ‘In which language(s) can you have a conversation about a
lot of everyday things?’ But again it was clear that consistency of responses would be a major
issue (ONS 2009:31).
At this point, the ONS seems to have concluded that none of the questions tried so far ‘met
all the essential elements of a suitable language question, let alone the desirable elements’ and
they ‘decided to test an alternative style of question for meeting user requirements within
space constraints’ (ONS 2009:31). This question was based on a question asked on the US
census long form questionnaire in 2000, and consisted of two parts: one to elicit the ‘primary
language’ of the respondent, and a second to determine their proficiency in English². These
appeared in the final census questionnaire, but each was problematic in its own way. They are
considered separately in the next two sections.
3. The ‘Primary Language’ Question
Though based on the US census question, the English census version was differently worded.
The US census asked ‘Does this person speak a language other than English at home?’,
directing the attention to the respondent’s ‘home language’ while making the assumption that
there could be only one. The final form of the question on the census form in England was as
in Figure 1:
(18) What is your main language?
□ English Go to 20
□ Other, write in (including British Sign Language)
Figure 1: the ‘primary language’ question in the 2011 census questionnaire
Thus while assuming, like the US version, that there could only be one ‘main language’, the
question does not specifically ask about languages used in the family environment.
Naming your ‘own’ language(s) is straightforward for some people, but difficult, messy
and/or strategic for others (Le Page and Tabouret-Keller 1989: 189; Pattanayak 1981: 54;
Freeland 2003). DeVries (1985: 359) notes that ‘there has been a fair amount of discussion
among researchers about the preferred phrasing of questions of this type’ with none of the
alternatives being completely satisfactory. He points out that asking for the ‘main language’
may give the respondent ‘too much freedom, in the sense that persons who use more than one
language are required to designate one of these as their main language. Such choices then
become as subjective as the ones regarding the ability to speak designated languages’.
According to the ONS, ‘the key purpose of a primary language question is to establish the
language that public authorities can communicate with respondents in’ (ONS 2009:31). Thus
it is not mainly intended to find out about the respondent’s linguistic repertoire, their
language use in specific domains like home or work, or even their preferred language. How
the ONS interpreted the purpose of this question is illustrated by an example reported in
connection with the testing process:
One respondent whose mother tongue was Afrikaans understood the question to be
asking about frequency of use rather than mother tongue and answered ‘English’ as
their main language. Although such answers would reduce the quality of information
on language diversity, the information provided would still meet essential user needs
appropriately as anyone who mainly used English as their primary mode of
communication would have appropriate language skills to be able to access services
adequately in English (ONS 2009: 36).
While the decision to adopt the US-census style question can be seen to be based on the
priorities of users determined by the 2007 consultation, it nevertheless seems to mark a
change in the strategy of the ONS: from attempting to collect information about a range of
languages used by individuals and in households, to focussing closely on a ‘key purpose’
which would be to determine how public authorities could best communicate with their user
communities. Although the need to know the ‘preferred language for communicating with
public authorities’ was expressed by over 90% of the stakeholders in the consultation,
‘mother tongue or first language’ and ‘main language (spoken at home)’ ranked slightly
higher in the list of priorities (ONS 2009: 10), just below ability to speak and understand
English; hence it is noteworthy that the ONS decided to concentrate on one of the user needs,
namely to find a language for communication, and to let go of another, i.e. gathering
information about the different languages which people were able to use for different
purposes in different contexts.
The question as it is asked assumes the respondent can identify a ‘main language’ across both
spoken and written language uses. Even where these are different – for example, where a
person’s main spoken language is Panjabi but their main written language is Urdu – the
respondent will have to choose just one. They may feel that their mother tongue is Panjabi
and therefore record that as their ‘main language,’ but this would not necessarily be the
appropriate language for the authorities to use to communicate with them in writing, as for
some Panjabi speakers, their language of literacy is Urdu. While this example applies (or did
apply) to just one community – Panjabi Muslims (see Gardner-Chloros 1997: 213) – there
could be many people for whom English is the ‘main language’ of literacy, but another
language is the main spoken language especially in the home. Likewise, there could be many
whose main spoken language is a nonstandardised language or regarded as a ‘dialect’, while
the language of literacy is the standard counterpart: for example, speakers of Sylheti for
whom standard Bangla is the language of literacy (Reynolds and Verma 2007:304).
The way the question is contextualised clearly provides an incentive to the respondent to
answer that their main language is ‘English’. Any other answer not only requires more work
to fill in the form, but also requires the respondent to subject his/her proficiency in English to
assessment in a follow-up question. Furthermore, for a multilingual respondent, choosing a
‘main language’ could be difficult, especially as different languages may be ‘main’ in
different contexts. For many people English will be a ‘main’ language in some sense, even if
it is not the language they are most fluent in, so the scales are weighted in favour of ‘English’
being given as the response. While the ONS apparently did not see this as a problem, given
that such respondents would be able to access services in English, it acknowledged that it
would have the potential to reduce information about ‘language diversity’. This question can
therefore be seen to have worked in a number of ways to limit the extent of linguistic
diversity that the census would record. The requirement to select a single ‘main language’ is
out of step with a sociolinguistically informed view of multilingualism.
4. The English Proficiency Question
Those who declared a ‘main language’ other than English were directed to a question on
English proficiency (ONS 2009:36). This question was modelled on a question asked in the
US census, ‘How well does this person speak English?’, reworded to address the respondent
and replacing ‘does’ with ‘can’:
(19) How well can you speak English?
Very well Well Not well Not at all
□ □ □ □
Figure 2: the English proficiency question in the 2011 census questionnaire
This question was problematic in a variety of ways, which will be discussed under separate
headings.
4.1 Scope of the question and underlying ideologies of language
This question focusses narrowly on one aspect of the respondent’s linguistic competence: his
or her ability to speak English. Despite the prevalence of written communication throughout
British society and the declared need of 91% of respondents to the consultation to know
about ‘ability to read English’ (ONS 2009:10), the form of the question referred only to the
spoken mode. No account was taken of the possibility that a respondent might read English
well, but speak it poorly, or vice versa. Nor did it ask explicitly about habitual use of English,
as might be inferred from the US census question, which uses ‘does’ rather than ‘can’.
More fundamentally, as argued by Zentella et al. (2007) in their cogent critique of the
question as it appears in the US Census, this question is based on a view of language which is
at odds with the one held by most linguists, and sociolinguists in particular. The question,
they say, reflects a ‘correctness ideology’ which is, in fact, the monolingual and prescriptive
view held by the majority of the population. In this view, control of language – English in this
case – ‘can be measured in terms of correctness’ and ‘insufficient correctness means loss of
meaning’ (2007: 10). They warn of the social divisiveness of this, as ‘correctness’ is well
known, from decades of sociolinguistic research, to correlate with the usage of the elite and
dominant class. Furthermore, the assumptions underlying the question ignore the interactive
nature of language by presupposing that ‘acts of communication take place in a vacuum’, and
are the acts ‘of an individual rather than a social production’ (2007: 10).
Zentella et al. (2007: 10) also criticise the arbitrariness of the four-way classification of
proficiency, which, they say, assumes that language ‘can “naturally” be measured in discrete
levels of correctness’ and that ‘individuals understand these levels’. The next section
describes the steps which the ONS took to establish the robustness of this measure.
4.2 Validity of the question and interpretation of the responses
The usefulness of this question depends on accurate self-assessment. But self-assessment of
proficiency in a named language is not straightforward. While psychological research has
shown a substantial correlation between self-concepts of ability and actual levels of ability in
a given domain, it is doubtful whether the same applies to self-assessments of language
proficiency (Edele et al. 2015: 100). No guidance was given to respondents about how to
assess their own ability in English.
Before deciding to use this question format, the ONS conducted a number of tests to assess
the validity and reliability of the question. Cognitive testing found that respondents had
satisfactorily clear interpretations of the proficiency level options; thus, for ‘very well’ ‘it was
thought that if a person did not have English as a first language then to tick this box they
should have a high level of English and possibly be fluent,’ while for ‘well,’ ‘respondents felt
this level of ability was appropriate for people who were not fluent, but were able to use their
language to get by on a day-to-day basis’. ‘Not well’ was felt to apply to someone ‘who could
follow some of what was being said, but with difficulty’ (ONS 2009: 38).
To check the accuracy of the self-assessments by L2 users of English, the ONS asked the
interviewers who were carrying out the census tests (who had not been trained in any form of
language assessment) to give their own assessment of the respondents’ English language
ability. The outcome was almost perfect agreement on the categories ‘very well’ and ‘not
well’. For the category ‘well’, the interviewers assessed the respondents’ English as better
than the respondents themselves did: of those respondents who said that they spoke English
‘well’, the interviewers put 56% in the ‘very well’ category, suggesting ‘a tendency for
people to underestimate their English proficiency’ (ONS 2009: 39).³
Acknowledging this mismatch, the ONS report points out ‘However, none of the
discrepancies in judgement between respondent and interviewer crossed the boundary
between ‘not well’ and ‘well’ which is the most crucial distinction for data users’.
Despite the reassurance provided by the ONS, it is doubtful whether the judgements of
interviewers completely untrained in language assessment are satisfactory validation of a
census measurement of language. In addition, the discrepancy in the judgements concerning
the ‘well’ category might give rise to concerns about its validity.
Research carried out in the USA on responses to this question in the 1980 US census shows
the problematic nature of the statement by the ONS that the ‘most crucial distinction for data
users’ is the boundary between ‘well’ and ‘not well’. For many data users the important
distinction may indeed lie between those categories in their commonsense interpretation, and
according to cognitive testing done by the ONS, this view is shared by the general public.
They interpret speaking English ‘well’ as ‘able to use their language to get by on a day-to-day
basis’ (i.e. not needing help with English), while ‘not well’ implies ‘able to follow some of
what was being said, but with difficulty’ (i.e. needing help with English) (ONS 2009: 38).
However, the conclusion drawn by some researchers in the US, where the question has been
used since 1980, has been that the boundary between these categories of people – those ‘not
needing help’ and those ‘needing help’ – does not lie between the respondents who answer
‘well’ and those who answer ‘not well’, but between those who answer ‘very well’ and those
who answer ‘well’.
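The practical consequence of where this boundary is drawn can be illustrated with a small sketch. The counts below are hypothetical, invented purely for illustration and not taken from any census; the function name `share_needing_help` is likewise an assumption of this sketch. It simply shows how the same set of responses yields a very different estimate of the population ‘needing help’, depending on which category is treated as the lowest acceptable level.

```python
from typing import Dict

# Response categories in the order used on the census form.
CATEGORIES = ["very well", "well", "not well", "not at all"]


def share_needing_help(counts: Dict[str, int], lowest_ok: str) -> float:
    """Return the share of respondents classed as needing help with English.

    `lowest_ok` is the lowest category still treated as 'not needing help';
    every category below it on the scale counts as 'needing help'.
    """
    cutoff = CATEGORIES.index(lowest_ok)
    needing = sum(counts[c] for c in CATEGORIES[cutoff + 1:])
    return needing / sum(counts.values())


# HYPOTHETICAL counts, invented for illustration only (not census data).
hypothetical_counts = {
    "very well": 400,
    "well": 350,
    "not well": 200,
    "not at all": 50,
}

# Boundary between 'well' and 'not well' (the reading favoured by the ONS).
ons_reading = share_needing_help(hypothetical_counts, lowest_ok="well")
# Boundary between 'very well' and 'well' (the reading of some US researchers).
us_reading = share_needing_help(hypothetical_counts, lowest_ok="very well")

print(f"Boundary below 'well':      {ons_reading:.0%}")
print(f"Boundary below 'very well': {us_reading:.0%}")
```

With these invented figures the estimated share needing help rises from 25% to 60% when the boundary moves up by one category, which is why the choice of cut-off matters so much for data users.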
This conclusion could be drawn from the English Language Proficiency Study, carried out in
1982 by the Census Bureau for the Department of Education. The study administered tests to
respondents chosen from a sample based on their responses to the language usage question in
the 1980 census. Detailed tests of English writing, reading, listening and speaking abilities
were given to respondents in their homes and compared with their answers to the census
questions (Kominski 1989, McArthur 1993: 4). The analysis of the ELPS showed that there
was ‘a strong correlation’ between the census responses and the test score. However,
In terms of a simple “pass-fail” criterion (based on an assigned cutoff point in the
scoring), it was shown that persons responding “very well” to the speaking ability
item had passing levels similar to the English-speaking population that had taken the
test (as a control group), while persons reporting ability levels as “well” or worse had
significantly higher levels of failure (Kominski 1989).
So although as Kominski puts it, ‘in this context, the “how well” item exhibited a fair degree
of validity’ the cut-off point between those who can speak English well enough (‘need no
help’) and those who cannot (‘need help’) seems, for this population at least, to lie between
the self-assessment categories of ‘speak very well’ and ‘speak well’ rather than between
‘speak well’ and ‘don’t speak well’. The former is how it has sometimes been subsequently
interpreted in the US, for example by Edith Macarthur, one of the researchers on the English
Language Proficiency Study. Her report on the ‘changing picture’ of languages in US schools
‘makes the assumption that all persons who spoke a language other than English at home and
who were reported to speak English “well,” “not well,” or “not at all” (i.e. less than “very
well”) had difficulty in English’ (Macarthur 1989: 5). Furthermore, since 1990, the census
response ‘very well’ has been used in the US as the cut-off point in the definition of
‘linguistic isolation’. Individuals and families are classified as ‘linguistically isolated’ unless
at least one person over 14 in their household speaks exclusively English, or speaks English
‘very well’⁴ (Zentella et al. 2010: 10-11).
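The ‘linguistic isolation’ rule as described here can be written out as a simple classification, which makes plain how much weight the single response ‘very well’ carries. This is only a sketch of the rule as worded in the preceding paragraph: the `Person` record and the function name are hypothetical, and the treatment of the age boundary (strictly over 14) follows the wording above rather than any official specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    age: int
    speaks_only_english: bool
    # Self-reported answer to the 'how well' item; None for people who
    # speak only English and are therefore not asked the question.
    english_rating: Optional[str] = None


def is_linguistically_isolated(household: List[Person]) -> bool:
    """Apply the rule as described in the text (an assumption of this sketch).

    A household is NOT isolated if at least one member over 14 speaks
    only English or reports speaking English 'very well'.
    """
    for person in household:
        if person.age > 14 and (
            person.speaks_only_english or person.english_rating == "very well"
        ):
            return False
    return True


# Two adults who report speaking English 'well', plus a young child
# who speaks only English: classified as isolated under this rule.
family_a = [
    Person(age=40, speaks_only_english=False, english_rating="well"),
    Person(age=38, speaks_only_english=False, english_rating="well"),
    Person(age=9, speaks_only_english=True),
]

# Same household, but one adult answers 'very well': not isolated.
family_b = [
    Person(age=40, speaks_only_english=False, english_rating="very well"),
    Person(age=38, speaks_only_english=False, english_rating="well"),
    Person(age=9, speaks_only_english=True),
]

print(is_linguistically_isolated(family_a))
print(is_linguistically_isolated(family_b))
```

The two example households differ only in one adult's choice between ‘well’ and ‘very well’, yet they fall on opposite sides of the classification.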
On the basis of all the above, at the very least we could say that the validity of the self-
assessment in the English census is problematic. Furthermore, even if the assessment has
validity, its interpretation, in terms of the actual communicative abilities of respondents, is
open to question.
4.3 Further problems with self-assessment
A similar conclusion can be drawn from other studies which have compared self-assessment
of ability in L2 with more objective measures.
In the British context, Carr-Hill et al. studied around 1100 adults from four specific
ethnic/linguistic backgrounds (Bengali, Gujerati, Punjabi and Chinese) and a selection of
refugees with Tamil, Somali, Kurdish or Bosnian as mother tongue. Participants were asked
to self-assess their abilities in English as well as complete a set of language assessment tasks,
including a written test. In the assessment, a score of 49 points was taken as indicating the
achievement of ‘survival level’ English, ‘the level at which it becomes possible to work in an
English speaking environment, though not if extensive verbal and listening communication is
required. Verbal and listening skills remain quite moderate, but a [foundation level of
literacy] has been reached [...]’ (Carr-Hill et al. 1996: 66). Carr-Hill et al. found that on
average, participants tended to be ‘rather optimistic’ in their self-ratings as ‘good’ and ‘very
good’ (p. 97). Thus the average score of the participants who self-assessed as ‘good’ was
47.0, just below ‘survival’ level and therefore, in the authors’ terms, not quite good enough to
work in an English speaking environment. The average score of those who self-assessed as
‘very good’ was just within ‘level 5’, where people ‘could work in many English-speaking
environments provided they were not required to use much written or spoken English
independently, or understand complex instructions; and could function independently in
social and community contexts’ (p.66). Carr-Hill et al. remark that while ‘one can
legitimately call individual self assessments into question on an individual basis, they are
very useful indicators in relative terms’ (p. 97, emphasis in original), shown by ‘very high’
correlations between the self-assessment score, the proportion requiring help from the
interviewer, the total raw point scores and the classification into levels (p. 97).
Edele et al. (2015) review a number of earlier studies where language self-assessment was
carried out using a general type of question, similar to that often used in censuses, e.g. where
participants were asked to rate their language ability on one or more dimensions
(understanding, speaking, etc.) with respect to a four- or five-point scale. (The Carr-Hill et al.
study was not among those they reviewed.) They remark that ‘the few studies that have
examined the validity of the general question type have yielded inconsistent findings’ (p.101)
with correlations between the self-assessments and externally validated tests varying from
moderately good to no correlation, in different studies. They conclude that ‘It is thus unclear
whether the general type of self-assessment on which most empirical evidence on
immigrants’ language skills relies is reliably related to tested language proficiency.’ They cite
a study by Finnie and Meng (2005) which found substantial differences between the results
for self-assessments and test scores. Finnie and Meng, furthermore, found that ‘tested L2
ability predicted participants’ employment status and income reliably, whereas self-assessed
language skills did not’ (Edele et al. 2015: 102).
In their own larger study based on data collected from adolescents in Germany – L2 speakers
of German with L1 backgrounds in Russian, Turkish, Polish and a large number of other
languages – Edele et al. concluded that ability self-assessments (those using a general type of
question as described above) correlated only moderately with tested linguistic ability (Edele
et al. 2015: 112).
Furthermore, a multivariate analysis suggested that self-assessments of both the general and a
more specific type ‘are systematically biased in some groups […] For instance, even though
female students scored higher on the language tests, they estimated their L1 skills on a similar
level as their male peers. This result indicates that, in relative terms, boys tend to
overestimate their abilities’ (Edele et al. 2015: 113). Highly relevant here is the finding that
different groups of immigrants (i.e. groups with different L1) also showed this kind of bias.
‘Whereas the students of Turkish origin who were born in Germany exhibited lower skills on
the L2 tests than did most other groups, they rated their L2 skills on a similar or even higher
level than students from other immigrant groups did. Students of Turkish origin thus tend to
overestimate their language abilities’ (Edele et al. 2015: 113).
This is interesting because the study of ethnic minorities in the British setting by Carr-Hill et
al. appeared to produce a similar finding, although the authors themselves dismissed this
conclusion. Carr-Hill et al. give tables showing the average score of the participants in each
language group who assessed themselves as being in the categories ‘poor’, ‘moderate’,
‘good’, ‘very good’. For example, the average score of the members of the refugee group
(from various language backgrounds) who rated themselves ‘poor’ was within Level 2,
‘moderate’ at Level 3, ‘good’ at Level 4 and ‘very good’ at Level 5, showing that the refugees
collectively made a fairly accurate assessment of their own abilities. However, the range of
scores of homogeneous language groups under each self-assessment heading is quite large,
especially for ‘good’ (between 27.6, Level 3, for the Bengali group and 72.8, Level 5, for the
Chinese group) and ‘very good’ (from 62.1 for the Bengalis to 88.6 for the Chinese). A
comparison of the proportion scoring at ‘survival level’ or better with the self-assessed score
likewise shows big differences, with only 10% of the Bengali group who self-assessed as
‘good,’ and only 61% of those who rated their English as ‘very good,’ reaching the ‘survival’
score of 49. By contrast, 79% of the Chinese group who rated themselves ‘good,’ and 89% of
those who rated themselves ‘very good,’ achieved the minimum survival score.
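The size of these differences is easier to see when the percentages just quoted are set side by side. The short sketch below uses only the four figures reported above; the variable names are illustrative and form no part of Carr-Hill et al.'s analysis.

```python
# Percentage of each group reaching the 'survival' score of 49,
# by self-assessed category, as quoted in the paragraph above.
reached_survival = {
    "Bengali": {"good": 10, "very good": 61},
    "Chinese": {"good": 79, "very good": 89},
}

# Gap in percentage points between the two groups for the same self-rating.
gaps = {
    rating: reached_survival["Chinese"][rating] - reached_survival["Bengali"][rating]
    for rating in ("good", "very good")
}

for rating, gap in gaps.items():
    print(f"Self-rated '{rating}': gap of {gap} percentage points")
```

On these figures the same self-rating of ‘good’ corresponds to a difference of 69 percentage points in the proportion reaching survival level, and ‘very good’ to a difference of 28 points.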
On the basis of these figures, Carr-Hill et al. say, ‘It is possible that reported self-assessment
– which is effectively an expression of confidence – might vary according to linguistic group,
age group and gender […] with Bengalis overstating and Chinese understating their
competence (relative to the whole sample). […] However, detailed breakdowns show that
there is very little relationship with either linguistic group or age or gender (and that was also
true of the correlation coefficients.)’ (Carr-Hill et al. 1996: 98).
Despite this disclaimer, there seem to be good reasons why groups with different L1 could
vary in their judgement of what counts as speaking L2 ‘well’ or ‘very well,’ as Edele et al.
actually found in their study. Edele et al. reflect on some of the reasons why this might occur,
particularly in migrant communities where L2 is not learnt through structured instruction:
Whereas [instructed] foreign language learners typically can draw on explicit
feedback regarding their linguistic performance (e.g., teacher responses, grades,
exams), [untaught] immigrants often lack such feedback. In addition, foreign
language learners have a relatively clear frame of reference for estimating their
language skills, namely, curricular expectations and the performance of their
classmates. The reference for immigrants, in contrast, is more ambiguous. For
example, they may compare their L2 skills to those of other immigrants or to those of
non-immigrants and their L1 skills to those of coethnics with good, basic or no skills
in the language. This ambiguity may decrease the accuracy of immigrants’ self-
assessments. (Edele et al. 2015: 102).
While by no means all the respondents in the English census whose ‘main language’ was not
English are ‘immigrants’, the considerations mentioned by Edele et al. could still apply to
many. Even those who have been settled for long periods may still have an unclear frame of
reference for judging their English abilities. Taking all this research into account, the
apparently straightforward self-assessment required by the English census appears to be of
questionable validity, even on its own terms. There is certainly a possibility that there was
substantial variation in the way that respondents self-assessed, as well as in the accuracy of
their own judgements when compared with more objective assessments.
The 2011 census offered the first opportunity in England to collect detailed information about
multilingualism in the most populous country of the British Isles. Yet despite a careful
programme of testing designed to ensure that user needs were met, the final questionnaire
contained two questions which were both, in different ways, problematic.
Deciding on the ‘main language’ question as the way to elicit information about languages
used other than English meant the loss of a number of opportunities – for example, to find out
about multilingualism (individuals and households where several languages were in use) and
about different contexts or domains, in particular home and work, where different languages
might be used. It did not distinguish between spoken and written language, or ask about
different literacies connected with different languages. It may also have led to over-reporting
of English as the ‘main language’. Thus while this question may have fulfilled the needs of
the census to have a simple question, and provided end users of the data with a
straightforward answer, it also left many important questions about multilingualism in England unanswered.
The language proficiency question, though apparently not seen as problematic by the ONS or
data users, had a number of flaws. It was the same question that was criticised by Zentella et
al. (2007) for its monolingual ‘correctness’ ideology, its failure to treat language as an
interactive process, and the arbitrariness of the four answer categories. Setting those
important issues aside, the consistency and accuracy of self-assessments is a serious
problem. It seems inevitable in the present era that any census question about language
proficiency will involve self-assessment, as other methods would be too costly. However, the
validity of the question itself, with its four possible answers, is called into question by the
studies discussed above. Doubts exist as to whether non-native speakers of a language can
self-assess speaking ability with sufficient accuracy to produce worthwhile data. Even if they
can, it is not clear whether the dividing line between being able to use the language
effectively and not being able to do so should be drawn between the categories ‘very well’
and ‘well’ or between ‘well’ and ‘not well’.
This question is important when it comes to the communication of the census results. Since
the census, many official publications based on the census data have conflated the categories
‘cannot speak English well’ and ‘cannot speak English at all’ into a single category of ‘non-
proficient’ speakers, contrasting with ‘proficient’ speakers who placed themselves in the
‘well’ or ‘very well’ categories5. This is in line with the pre-census observation by the ONS
(based on their interpretation of the US experience) ‘that a four-part scale is clear to users and
allows a two-part distinction in terms of outputs’ (ONS 2009: 38). Yet the main justification
for drawing the line between ‘not well’ and ‘well’ rather than between ‘well’ and ‘very well’
seems to be the very informal checks carried out by the ONS interviewers during the course
of the pre-census tests. This demarcation line between ‘proficient’ and ‘non-proficient’ thus
seems to have become conventionalised without having any reliable empirical basis.
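How much depends on the choice of dividing line can be illustrated with a minimal sketch. The counts below are hypothetical and are not census figures, and the function name proficient_share is introduced purely for illustration. The sketch collapses the four answer categories into a two-part ‘proficient’/‘non-proficient’ distinction at each of the two candidate dividing lines.

```python
from collections import Counter

# The four answer categories, ordered from highest to lowest self-assessment.
SCALE = ["very well", "well", "not well", "not at all"]

# HYPOTHETICAL counts for illustration only; these are NOT census figures.
responses = Counter({"very well": 400, "well": 350, "not well": 200, "not at all": 50})


def proficient_share(counts, lowest_proficient):
    """Share of respondents at or above the category `lowest_proficient`."""
    cut = SCALE.index(lowest_proficient)
    total = sum(counts[c] for c in SCALE)
    return sum(counts[c] for c in SCALE[: cut + 1]) / total


# Dividing line between 'well' and 'not well' (the convention in official outputs).
share_cut_at_well = proficient_share(responses, "well")

# Dividing line between 'very well' and 'well' (the alternative discussed above).
share_cut_at_very_well = proficient_share(responses, "very well")

print(f"'proficient' if at least 'well':      {share_cut_at_well:.0%}")
print(f"'proficient' if at least 'very well': {share_cut_at_very_well:.0%}")
```

With these invented counts, the same set of responses yields very different headline proportions of ‘proficient’ speakers depending solely on where the line is drawn.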
In 2012, a media storm followed the announcement of the census figures which showed that
1.7% of the population in England and Wales self-assessed as ‘cannot speak English well or
at all’. There were widespread misunderstandings of the figures by media and politicians (see
Sebba to appear). No voices – not even from academia – questioned the census methodology
itself. The suggestion that the self-assessment approach might have led to an underestimate of
those whose English is inadequate for everyday purposes, which seems plausible on the basis
of the discussion above, might have led to an even bigger backlash. Therefore findings such
as these are sensitive, and their communication has to be handled with the utmost
responsibility and sensitivity.
Elsewhere, there have been arguments made that language questions are more properly
answered through surveys than through censuses (for example, by the Australian Bureau of
Statistics, see Ozolins 1993:196). In the UK, the future use of decennial censuses has been
questioned, although the current recommendation is for another to be carried out in 2021,
with more use of available administrative data (House of Commons 2014). In fact, data is
already collected outside the census on one large section of the population, namely children at
school. School language censuses have been held in some cities with large numbers of
linguistic minorities since the 1980s, and since 2008 data on language used by pupils has
been collected annually by all authorities (VonAhn et al. n.d.).
The data collected is substantially different from that asked for in the decennial census,
however. Guidelines say that ‘If a child was exposed to more than one language (which may
include English) during early development the language other than English should be
recorded, irrespective of the child's proficiency in English’ (Department for Education 2014).
The school census gives a broader picture of multilingualism by not focussing on a ‘main
language’ for the child; it therefore potentially reveals the language competences, if not the
preferences, of the parents at the same time. In fact, a comparison of the top 10 languages in
the 2011 annual schools survey for England and the responses to the ‘Main Language’
question in the 2011 census for England shows a high degree of overlap. Eight languages
(Arabic, Bengali, French, Gujarati, Panjabi, Polish, Portuguese, Urdu) appear in both. Of
those that do not, Somali and Tamil appear in the Schools survey and Chinese and Spanish in
the Census top 10 only. Comparing the top 25 languages, 20 languages appear in both6 sets.
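The top 10 comparison can be reproduced from the languages named above. The following minimal sketch uses only those names; their ranking within each top 10 is not given here, so the lists are treated as unordered sets, and the variable names are illustrative.

```python
# Languages named in the text as appearing in each top 10 (order not given).
schools_top10 = {
    "Arabic", "Bengali", "French", "Gujarati", "Panjabi",
    "Polish", "Portuguese", "Somali", "Tamil", "Urdu",
}
census_top10 = {
    "Arabic", "Bengali", "Chinese", "French", "Gujarati",
    "Panjabi", "Polish", "Portuguese", "Spanish", "Urdu",
}

shared = sorted(schools_top10 & census_top10)        # in both lists
schools_only = sorted(schools_top10 - census_top10)  # Schools survey only
census_only = sorted(census_top10 - schools_top10)   # Census only

overlap_share = len(shared) / len(schools_top10)

print("Shared:", ", ".join(shared))
print("Schools survey only:", ", ".join(schools_only))
print("Census only:", ", ".join(census_only))
print(f"Overlap: {len(shared)} of {len(schools_top10)} ({overlap_share:.0%})")
```

The same set comparison could be applied to the two top 25 lists, which are not reproduced in full here.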
This kind of survey data, supplemented with demographic data from the decennial census and
data collected by other surveys, might provide better answers to some of the linguistic
questions than the data collected by the census. Certainly it seems likely that some user needs
may be met using this data (VonAhn et al. n.d.). The disadvantage of this is that correlations
cannot readily be made with other data collected by the national census through multivariate
analysis; for example, the comparisons of proficiency in English with economic activity
published by the ONS (see note 5).
It is of course easy with hindsight to criticise the decisions made by the designers of census
questionnaires. It is not the intention of this paper to direct destructive criticism at the census
authorities or the user groups who request particular questions to be added to the census
questionnaire. The act of census-taking, as shown by Urla (1993), Kertzer and Arel (2002)
and many others, is always politically and ideologically charged. The census language
questions in England were formulated in the context of a prevailing linguistic ideology which
assumes monolingualism as a norm (Blackledge 2000), and favours language shift towards
English as a way of encouraging the integration – or assimilation – of non-indigenous
minorities. Such monolingual, assimilating ideologies are not unique to Britain – arguably
they are shared by several other Anglophone countries – and likewise, ideologies which
validate only the national language can be found in many places. Issues similar to those
raised here will be pertinent to national censuses in those countries too.
While the 2011 UK census questions arguably have not produced the answers potential users
needed, they have been useful in serving some political agendas. For example, the census
statistics were used as an argument to validate a policy of withdrawing translation services to
encourage minorities to learn English (see Sebba to appear). They were also used to support
the announcement made by the Prime Minister in January 2016 that ‘22% of Muslim women
have poor or no English’ and are to be offered English lessons paid for by the government7.
An examination of the statistics, which were published by the ONS a few days earlier, shows
that the figure of 22% is on the face of it correct (the number of women who spoke English
‘not well’ or ‘not at all’ as a proportion of all those who reported themselves to be Muslim).
However, the statistics could also be used to tell a different story. Of Muslim females over
age 3, 51% reported English as their main language. In the group aged 16 to 24, only 6% did
not speak English well or at all; and aged 3 to 15, just 3%. Lack of knowledge of English is
heavily concentrated in the older age groups.
Thus the census language data, with its emphasis on ‘main language’ and proficiency in
English, can now be correlated with information about ethnic group and religion to help drive
a political agenda of assimilation and integration. An alternative approach – one which
welcomes multilingualism, and views multiple languages as valued individual and collective
resources – is possible, but not well served by the information collected by the census.
Until 2011, the UK was ‘conspicuous’ (Aspinall 2005) in its lack of language questions in the
decennial census covering four-fifths of its population. The inclusion of questions in the 2011
census means that language data for England is now available. However, as this paper has
shown, it is not clear that the questions were asked in the best possible way, nor that the
responses have been used as constructively as possible. Furthermore, hardly any attention has
been focussed on the census methodology – the results have simply been accepted as ‘fact’.
More reflection on the linguistic and social complexities of asking and answering questions
about language is, it seems, still necessary.
References
Aspinall, Peter J. 2005. Why the next census needs to ask about language: Delivery of
culturally competent health care and other services depends on such data. BMJ 331:363–4.
Blackledge, Adrian 2000. Monolingual ideologies in multilingual states:
Language, hegemony and social justice in Western liberal democracies. Estudios
de sociolingüística 1(2), 25–45.
Carr-Hill, Roy, Steve Passingham, Alison Wolf and Naomi Kent. 1996. Lost
opportunities: the language skills of linguistic minorities in England and Wales.
London: Basic Skills Agency.
Department for Education 2014. School census spring and summer 2014 guide for primary
schools. London, Department for Education.
DeVries, John. 1985. Some methodological aspects of self-report questions on
language and ethnicity. Journal of Multilingual and Multicultural Development
Edele, Aileen, Julian Seuring, Cornelia Kristen and Petra Stanat 2015. Why bother with
testing? The validity of immigrants’ self-assessed language proficiency. Social Science
Research 52: 99–123.
Finnie, R., Meng, R., 2005. Literacy and labour market income: self-assessment versus test
score measures. Appl. Econ. 37, 1935–1951.
Freeland, Jane 2003. Intercultural-bilingual Education for an Interethnic-plurilingual
Society? The Case of Nicaragua. Comparative education 39 (2): 239-260.
Gardner-Chloros, Penelope 1997. ‘Vernacular Literacy in New Minority Settings in Europe.’
In Vernacular Literacy: A Re-Evaluation, by members of the International Group for the
Study of Language Standardization and the Vernacularization of Literacy, edited by
Andrée Tabouret-Keller, R. B. Le Page, Penelope Gardner-Chloros, and Gabrielle Varro, pp.
188-221. Oxford: Clarendon Press.
House of Commons 2014. ‘Too soon to scrap the Census’. Public Administration Select
Committee (PASC) Fifteenth Report of Session 2013–14. London: House of Commons.
Kertzer, David I. and Dominique Arel (2002). Census, identity formation and
the struggle for political power. In Kertzer, David I. and Dominique Arel (eds.),
Census and identity: the politics of race, ethnicity, and language in national
censuses, pp 1-42. Cambridge: Cambridge University Press.
Kominski, R. (1989) How Good is ‘How Well’? An Examination of the Census English-
Speaking Ability Question. Washington, D.C., US Bureau of the Census. Available at
Laversuch, I.M. (2007) The Politics of Naming Race and Ethnicity: Language
Planning and Policies Regulating the Selection of Racial Ethnonyms Used by the US Census
1990–2010, Current Issues in Language Planning, 8:3, 365-382, DOI: 10.2167/cilp128.0
Le Page, R.B. and Tabouret-Keller, A. (1989) Acts of Identity: Creole-Based
Approaches to Language and Ethnicity. Cambridge University Press, Cambridge.
Leeman, Jennifer 2004. Racializing language: A history of linguistic ideologies
in the US Census. Journal of Language and Politics 3:3, 507–534.
Macafee, Caroline 2015. Scots in the Census: validity and reliability. Unpublished paper,
Forum for Research on the Languages of Scotland and Ulster, Triennial conference.
McArthur, Edith K. 1993. Language Characteristics and Schooling in the United
States, A Changing Picture: 1979 and 1989. National Center for Education Statistics,
Washington, DC. Retrieved from http://files.eric.ed.gov/fulltext/ED365167.pdf
Office for National Statistics (ONS) 2006. The 2011 Census: Assessment of
initial user requirements on content for England and Wales - Ethnicity, identity,
language and religion.
ONS 2009: ‘Final recommended questions for the 2011 Census in England and Wales:
Language. October 2009’
Ozolins, Uldis 1993. The politics of language in Australia. Cambridge University Press.
Pattanayak, D.P. (1981) Multilingualism and Mother Tongue Education. Delhi, Oxford
Reynolds, M. and Verma, M. 2007. Indic Languages. In: Britain, D. ed.
Language in the British Isles, 293 - 307. Cambridge: Cambridge University Press.
Sabourin, Patrick and Bélanger, Alain. 2013. Microsimulation of language characteristics
and language choice in multilingual regions with high immigration. Working paper 15.3 of
the Joint Eurostat/UNECE Work Session on Demographic Projections. United Nations
Statistical Commission and Economic Commission For Europe, and Statistical Office Of The
European Union (EUROSTAT).
Sebba, Mark (to appear). ‘English a foreign tongue’: The 2011 Census in England and the
misunderstanding of multilingualism. To appear in Journal of Language and Politics 16:2
United Nations. 2006. United Nations Economic Commission for Europe: Conference of
European Statisticians recommendations for the 2010 censuses of population and housing.
Prepared in cooperation with the Statistical Office of the European Communities (Eurostat).
New York and Geneva, United Nations.
Urla, Jacqueline 1993. Cultural Politics in an Age of Statistics: Numbers, Nations, and the
Making of Basque Identity. American Ethnologist 20:4, 818-843.
VonAhn, Michelle, Ruth Lupton, Richard Wiggins, John Eversley, Antony
Sanderson and Les Mayhew (n.d.). Using school census language data to
understand language distribution and links to ethnicity, socio-economic status
and educational attainment: A guide for local authority users. Institute of
Education, University of London.
Zentella, Ana Celia, Bonnie Urciuoli and Laura R Graham 2007. Problematic
Language Assessment in the US Census. Anthropology News, September 2007,
1 This was a voluntary question (the only one) asked in the UK census in 2001 and 2011.
2 These questions were also asked in Wales along with questions on knowledge of Welsh. However,
this paper will not deal with the Welsh census.
3 It is unclear why the ONS did not give more consideration to the alternative possibility, that the
interviewers were overestimating English proficiency.
4 Zentella et al. (2007: 11) strongly criticise the notion of ‘linguistic isolation’, which they say
reflects ‘misconceived notions about language, […] implying that speakers of languages other than
English are unable and/or unwilling to participate in our society’.
5 For example, ‘People who could not speak English well or at all had a lower rate of employment’
(Part of 2011 Census Analysis, English Language Proficiency in the Labour Market Release),
released by the ONS on 29 January 2014 and retrieved from
labour-market/sty-english-language-proficiency.html . See also Census Table LC2603EW,
‘Proficiency in English by economic activity’ and similar tables.
6 Albanian, Malayalam, Pashto, Shona and Yoruba appear only in the Schools survey and Farsi,
Greek, Romanian and Slovak in only the Census top 25.
7 BBC news, 18.01.2016.