Leaning too far? PISA, policy and Australia’s ‘top five’ ambitions
Radhika Gorur* and Margaret Wu
The Victoria Institute, Victoria University, Melbourne, VIC, Australia
Australia has declared its ambition to be within the ‘top five’ in the Programme for International Student Assessment (PISA) by 2025. So serious is it about this ambition that the Australian Government has incorporated it into the Australian Education Act
2013. Given this focus on PISA results and rankings, we go beyond average scores to
take a close look at Australia’s performance in PISA, examining rankings by different
geographical units, by item content and by test completion. Based on this analysis and
using data from interviews with measurement and policy experts, we show how
uninformative and even misleading the ‘average performance scores’, on which the
rankings are based, can be. We explore how a more nuanced understanding would
point to quite different policy actions. After considering the PISA data and Australia’s
‘top five’ ambition closely, we argue that neither the rankings nor such ambitions
should be given much credence.
Keywords: PISA; education policy; Australian education reforms; objectivity
What’s the good of [the rankings]? What is the benefit to the US to be told that it is number
seven or number 10? It’s useless, meaningless, except for a media beat up and political
huffing and puffing. It’s very important for the US to know, having defined certain goals like
improving participation rates for impoverished students from suburbs in large cities –
whether in fact that is happening, and if it is, why it is happening and if not, why not. And it
is irrelevant whether Chile or Russia or France is doing better or worse –that doesn’t help
one bit –in fact it probably hinders. Makes people feel uncertain, unsure, nervous, and they
rush over there and find out why they are doing better. (Malcolm Skilbeck, former Deputy
Director of Education, OECD; interview transcript)
In August 2012, Julia Gillard, then Prime Minister of Australia, declared that Australia would
strive to be ranked in the ‘top five’ in international education assessments by 2025. This generated a great deal of media attention. Soon after this declaration, the results of two international assessments – the Trends in International Mathematics and Science Study (TIMSS) and the Progress in International Reading Literacy Study (PIRLS), both conducted by the International Association for the Evaluation of Educational Achievement (IEA) – were released, and Australia had ranked rather low in both these assessments: 18th and 25th in TIMSS mathematics and science, respectively, and 27th on PIRLS, for fourth-grade students (Thomson et al., 2012). The TIMSS and PIRLS rankings heightened the anxiety that Australian politicians had already been expressing with regard to the ‘slide’ in the country’s
performance in another international assessment, the Programme for International Student
Assessment (PISA), conducted by the Organisation for Economic Co-operation and
Development (OECD), and it reinforced the government’s determination to get into the
*Corresponding author. Email: Radhika.Gorur@vu.edu.au
Discourse: Studies in the Cultural Politics of Education, 2015
Vol. 36, No. 5, 647–664, http://dx.doi.org/10.1080/01596306.2014.930020
© 2014 Taylor & Francis
‘top five’ in international rankings. So strong is this ambition that it has been inscribed into the Australian Education Act of 2013 as its very first objective, which reads: ‘Australia to be placed, by 2025, in the top 5 highest performing countries based on the performance of school students in reading, mathematics and science’ (Australian Education Act, 2013), as measured in PISA. This objective of being placed in the ‘top five’ has led to an intensification of Australia’s desire to learn from the systems that are currently in the ‘top five’ in PISA, so that Australia may displace one of them on the PISA league table.
PISA rankings are based on the average performance scores of students in tests of
reading, mathematical and scientific literacy. ‘Average performance score’, however, is
only one of many possible measures on which education systems can be ranked on the
basis of PISA data. Just as the ranking of countries depends on what is tested, who is
tested and which tests are used, so too does it depend on what kinds of analyses are
performed. More relevantly for this discussion, the average score rankings obscure a great deal of variation and are not particularly useful for developing strategies to improve performance.
So, given Australia’s ambition to be in ‘the top five by 2025’, this paper looks beyond
‘average performance’ and interrogates PISA data in some detail, examining rankings by
different geographical units, by item content and by test completion. Based on this
analysis, and supported by interviews with expert informants –policy-makers, OECD
officials and measurement experts –we argue that PISA data are quite complex and need
to be examined very closely and understood with great nuance: aggregations such as
average scores hide more than they reveal. Using examples, we make the case that, in part
because of the complexity of international educational assessments and comparisons, the
leap from ‘data’ to ‘policy’ is a treacherous one.
We begin with a brief overview of Australia’s engagement with PISA and sketch the recent history that led up to the declaration of Australia’s ‘top five in PISA’ ambition.
Next, we present a brief survey of the critique of PISA and explain our methodological
and analytical approach. This is followed by a detailed analysis of Australia’s
performance, where three aspects of performance are examined –unit of analysis, item
content and test completion –to demonstrate how average scores could be quite
misleading, particularly if used as the basis for policy decisions. Finally, we explain why
the leap from PISA average score data to policy is problematic.
Australia, PISA and the ‘top five’ ambition
PISA was conceptualised in the late 1990s, and the first PISA survey was conducted in
2000. Australia has been actively involved with PISA from its very inception. An Australian
organisation, the Australian Council for Educational Research (ACER), led the consortium
that successfully bid for and later developed and managed PISA until the 2012 survey.¹ An Australian professor, Barry McGaw, was at the helm, as Director for Education at the
OECD, when PISA was introduced. Australian psychometricians, statisticians, analysts and
academics actively use PISA data to produce various reports and working papers for the
OECD, ACER and the federal and state governments in Australia.
Australia has been ranked ‘high quality’ (i.e., having performance scores above the average for OECD countries) consistently in each PISA survey so far. But as more nations and systems joined the survey (43 in 2000, 58 in 2006 and 65 in 2012), Australia’s rankings have ‘slipped’ – as elaborated in Table 1 below.² Australia’s scores have also declined between 2000 and 2009 (see Table 2). The OECD also examines the
equity of school systems, based on the correlation between the performance of students
and their socio-economic status (SES). Systems where this correlation is higher than the
average for OECD countries are labelled ‘low equity’. Australia was rated ‘low equity’ in PISA 2000, but it recovered from this position and has been consistently placed in the ‘high quality, high equity’ quadrant in all the subsequent PISA surveys (for a detailed
account of how equity is measured in PISA, see Gorur, 2014; Rutkowski &
Rutkowski, 2013).
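To make the equity measure described above concrete, a minimal sketch of such a calculation is given below. It is illustrative only: the simple Pearson correlation, the invented data and the cut-off against an OECD-average correlation are our assumptions, and do not reproduce the OECD’s actual procedure (which works with the ESCS index, plausible values and survey weights).

```python
# Simplified sketch of the 'equity' classification described above.
# Assumptions: student-level records with a PISA-like score and an SES index;
# the real OECD analysis uses the ESCS index, plausible values and survey
# weights, none of which are reproduced here.
import numpy as np

def ses_performance_correlation(scores, ses):
    """Pearson correlation between student scores and their SES index."""
    return float(np.corrcoef(scores, ses)[0, 1])

def equity_label(country_corr, oecd_average_corr):
    """Label a system 'low equity' when performance depends on SES more
    strongly than in the average OECD country, as described in the text."""
    return "low equity" if country_corr > oecd_average_corr else "high equity"

# Toy data for illustration only (not real PISA figures).
rng = np.random.default_rng(0)
ses = rng.normal(0, 1, 5000)                        # standardised SES index
scores = 500 + 25 * ses + rng.normal(0, 80, 5000)   # reading-like scores

country_corr = ses_performance_correlation(scores, ses)
print(round(country_corr, 2), equity_label(country_corr, oecd_average_corr=0.3))
```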
Australia’s performance in 2011 in another major international test, IEA’s TIMSS,
also had not improved compared to the 2007 results (Thomson et al., 2012). Coinciding
with the 2011 TIMSS, Australia participated for the first time in PIRLS, which tests
students’ reading literacy in Grade 4 (Year 4 in Australia). Australia was placed 27th out of 45 systems in PIRLS, with performance significantly lower than that of Ireland and Northern Ireland, the USA, England, Canada, Hong Kong, Singapore and Chinese Taipei
(Thomson et al., 2012). Australia’s poor performance in the 2011 TIMSS and PIRLS
has reinforced the alarm over the state of the education system based on its PISA results.
Australia’s ‘declining performance’ has been taken up by Australian politicians and policy-makers and reported widely in the media.
In response to this challenge of ‘declining scores’, there is, currently, a huge appetite
in Australia for borrowing from the policies and practices of four of the PISA ‘top five’–
the East Asian systems of Shanghai (China), Korea, Singapore and Hong Kong (China).
The recent focus in Australia on Asia with the Henry Report on the ‘Asian Century’
(Commonwealth of Australia, 2012) has accentuated this desire to learn from the ‘high-
performing’ systems of East Asia.
Table 1. Ranking of Australia in PISA 2000, 2003, 2006 and 2009 on the reading, mathematics and science literacy scales.

Year    Reading    Mathematics    Science
2000    4          5              7
2003    4          11             6
2006    7          13             8
2009    9          15             10

Table 2. Average scores of Australia in PISA 2000, 2003, 2006 and 2009 on the reading, mathematics and science literacy scales.

Year    Reading      Mathematics    Science
2000    528 (3.5)    533 (3.5)      528 (3.5)
2003    525 (2.1)    524 (2.1)      525 (2.1)
2006    513 (2.1)    520 (2.2)      527 (2.3)
2009    515 (2.3)    514 (2.5)      527 (2.5)

Note: Standard error in brackets.

Interest in learning from the Asian PISA elite is illustrated by the process used in developing an influential report published in 2012, the Grattan Institute’s Catching Up: Learning from the Best School Systems in East Asia (Jensen, Hunter, Sonnemann, & Burns, 2012). In the section titled ‘How we wrote the report and how to read it’, the authors say:
In September, 2011, Grattan Institute brought together educators from Australia and four of
the world’s top five school systems: Hong Kong, Shanghai, Korea and Singapore. The
Learning from the Best Roundtable, attended by the Prime Minister, Julia Gillard, and the
Federal Minister for School Education, Early Childhood and Youth, Peter Garrett, sought to
analyse the success of these four systems, and what practical lessons it provided for Australia
and other countries.
Following the Roundtable, researchers from Grattan Institute visited the four education
systems studied in this report. They met educators, government officials, school principals,
teachers and researchers. They collected extensive documentation at central, District and
school levels. Grattan Institute has used this field research and the lessons taken from the
Roundtable to write this report. (2012, p. 6)
While acknowledging that practices cannot simply be plucked from one context and
uncritically adopted in another, Jensen et al. go on, nevertheless, to explain what it is that
these nations are doing that places them at the top of the PISA tables, promising that their
report shows how these practices can be adopted to improve Australia’s performance.
There are several issues with both the substance and the process in this approach of
learning from the ‘best practices’. Rushing off to observe the practices of high-performing
systems and then concluding that these practices are the reason for their success can lead
to erroneous conclusions; the same practices could well be prevalent in low-performing
systems as well –that information is not available, since only high-performing nations are
observed. Moreover, there is no way of knowing how much better the scores of these
nations might have been, had they been using other practices. So there is a basic flaw in
the premise upon which this kind of ‘learning from the best’research rests. Further,
setting up such a forum where ministers from ‘successful’systems explain how they
achieved their excellent results renders the subsequent field work practically redundant –
the ‘lessons’ are already presented to policy-makers before the field work has
commenced.
This practice of ‘learning from the best’ is, however, neither new nor specific to
Australia. In 2007, McKinsey had done similar work in their report How the World’s Best-performing School Systems Come Out on Top (Barber & Mourshed, 2007), based on the studies of PISA high performers, and America’s Common Core published the report Why We’re Behind: What Top Nations Teach Their Students But We Don’t (Common Core, 2009),
which examined the policies and practices of countries that performed better than the
USA in order to argue for particular approaches to effect improvement.
Critiquing PISA and its use in policy
The widespread influence of OECD and PISA on education policies has not gone
unnoticed by critics, and there is a large body of literature on the subject. Some of the
critique has contextualised PISA within broader discussions about globalisation and the
spread of neoliberalism, new public management and marketization (for example, Grek, 2007, 2009; Rizvi & Lingard, 2010; Stronach, 2010). In these types of critique, PISA is often a ‘policy object’ that is a symptom, an example, a consequence or one of the causes
of the coalition of practices which make up the neoliberal imaginary. PISA is also well
covered in the press in many countries, and the effects of media attention on the
discourses and public perceptions of PISA have also been discussed (for example, Whitty,
2009). Studies of the effects of PISA on the education policies and reforms in particular
nations also abound (for example, Breakspear, 2012; Simola, 2005). The focus in these
critiques is often not an engagement with the actual PISA data; more often, it is PISA’s
uptake in policy and the media, its use in justifying policies and its influence on political
narratives and policy practices.
Assessment and measurement experts, on the other hand, focus their critiques on ‘technical’ aspects of PISA – examining the fitness and effects of particular methodological choices and the validity and reliability of the modeling and the calculations. Bracey (2008), for example, has argued that PISA’s use of one-dimensional Item Response Theory limits its analytical possibilities. Bautier and Rayou (2007) demonstrated that the reasons for students’ correct or wrong answers cannot be predicted by an a priori analysis of the items, thus calling PISA’s reliability into question. The modeling that underpins survey items related to ‘interest in science’ has been examined by Ainley and Ainley (2011), who argue that constructs such as ‘interest’ are premised upon Western
understandings, and conclusions drawn from the responses of students from other
cultures, whose responses are determined by different sets of social and cultural histories,
could be distorted. In such critique, PISA is seen as a technical exercise and the effort is
to assess the extent to which PISA produces accurate representations of realities. The goal
is to encourage PISA to become more precise and accurate in representing an already
existing world. Focusing on measurement as a technical exercise deflects attention from
the performativity and politics of such calculations (Barad, 2003; Desrosières, 1998;
Porter, 1995,2003; Stengers, 2011).
In this paper, we attempt to promote critique that does not make a cut between the ‘political’ on the one hand and the ‘technical’ on the other. The result of a collaboration
between a statistician and a sociologist of measurement with a common interest in PISA,
our analysis attempts to pick its way carefully, treating PISA neither as purely political,
nor as purely technical, but as a hybrid: a socio-technical object. Our symmetrical
analytical approach and its political impulse are based in the theoretical resources of
actor-network theory (ANT). In particular, we use the approach of a ‘sociology of
measurement’ (Derksen, 2000; Gorur, 2014) – a term abbreviated from Woolgar’s (1991)
use of the term ‘sociology of measurement technologies’, to draw attention, as Woolgar
did, to the social and instrumental nature of measurement, as well as its productive
capacity. Here instrumentality refers both to the influence of the instruments and
methodologies used in measurement, and to the way in which things are ‘made to work’,
through cajoling, persuading, coercing, compromising and so on –what in ANT is called
the translation of interests. The instrumentality of apparently ‘objective’ statistical
practices can be understood by observing the everyday practices of statisticians (see
Gorur, 2011, for an example). ‘Productive capacity’ invokes the idea that measurement is
not merely representative or descriptive but also productive of realities (Latour, 2005). In
other words, unlike ‘technical’ critique, which takes measurement to be representational or descriptive, our approach sees measurement as ‘world-making’. Whilst not ‘critical’ or ‘political’ in the traditional sense, our critique nevertheless seeks to influence policy. The
aim is to persuade by empirical analysis rather than through normative or theoretical
assertion.
Our argument incorporates statistical analysis using the PISA database, which is available online,³ as well as methodologies more usually associated with policy
ethnographies, such as interviews. These interviews occurred over a period of several
years, starting in 2009, and were conducted across several related projects including the
ongoing strand of work of the first author. The interviewees were experts who were often
uniquely placed, by virtue of their specialised expertise and their official positions, to
provide insights about the phenomena being studied that were not available to others. As
such, these interviews were more in the nature of conversational and collegial
opportunities to explore the phenomena under discussion (large-scale comparisons,
PISA, contemporary issues in education policy and so on) than ‘data’ to be analysed by
parsing out themes or for performing discourse analysis. In keeping with the ANT
tradition, there was no effort to overlay the interview data with particular social theories
or to seek to understand what lay ‘behind’ the words of the informants; the purpose of the
interviews was to let the actors narrate their own theories and to get them to explain how
they made sense of their worlds (Latour, 2005). The interviewees’ expertise often resulted in explanations that were elegant and economical; so where appropriate, their explanations have been presented verbatim in this paper. In some cases, the interview data
provoked our analysis. In others, they helped us interpret the analysis and to link it to the
way it played out in policy.
Beyond ‘average performance’
If a nation is looking to introduce reforms to raise its PISA performance, there is good
reason to look beyond the rankings and the average scores, as these provide no guidance
for policy reform. ‘Average performance’ rankings provide little of practical benefit by
way of ‘lessons to learn’. They simply point out that there is room for improvement, but
they can provide no pointers about where to focus policy efforts. One OECD official
explained why ‘top five’ is quite a complex idea, and why one needs to ask complex
questions to inform policy:
Top five in what? …[F]or which students? The average student in Canada, in Korea,
Finland, Shanghai, China –that’s one thing. If you then look at high-performing students or
how low performing students do, then we may get a completely different picture. And that’s
where policy efforts are most interesting for me.
To raise its average performance, Australia would need to develop a nuanced strategy,
targeting particular aspects of the system or particular groups, and focusing resources and
attention on areas of reform that would have the most gratifying and immediate effects on
its performance scores, particularly since resources are never unlimited. For example, it
could aim at raising the performance of the lowest-performing students, or it could focus
on the top 10% of achievers, or direct greater attention towards specific groups such as
Indigenous students or refugee migrants. Alternatively, it could focus attention on
particular content areas in which its students traditionally perform poorly and try to
improve instruction in those areas. So in this section, we examine PISA data more closely
to see what we can learn. We explore how these understandings might inform policy and
explore the complexities of the data as well as the difficulties of making inferences on
their basis.
Unit of analysis
Initially, PISA was designed for the purpose of measuring the educational performance of
the OECD nations. However, with each subsequent round of PISA, more and more
education systems have shown interest in participating in PISA. In some cases, these
education systems have been associated with cities or provinces rather than whole
countries. China, for example, does not participate as a whole, but Hong Kong and
Shanghai have begun to participate. So PISA’s league tables report on jurisdictions of
various sizes, and comparisons include countries like Australia and provinces like
Shanghai.
There are some issues with this kind of comparison. First, demographic characteristics
are often associated with geography, which is also typically tied to SES. The populations
of cities, in general, have some distinct characteristics –urban populations generally fare
better than rural ones in terms of educational performance. In some cases, certain
provinces or cities might have a preponderance of a certain ethnic group or a particular
SES group –and this also produces differences in performance. Importantly, governance
structures and the challenges of governing would vary greatly between, for example, a
country like Australia and a city like Shanghai. So the unit of analysis is an important
consideration in making comparisons. But in PISA rankings, differences in size,
demography and governance structures are ignored – all ‘systems’ are ranked as if they
were the same, whether they are cities, provinces, small countries or very large nations.
Where entire countries participate, PISA presents country-level average performance
on its league tables (although it may provide state- or province-level data to the countries
where the sample size is adequate to produce such data). Where only specific provinces
participate, such as Shanghai-China or Hong Kong-China, the province data appear in the
table. But the performance of a country is not usually uniform throughout the country –
the country average often masks wide variations between one state or province and
another within the country. Table 3 shows the variation in Australia’s reading
performance in PISA 2009, by jurisdiction.
Table 3. Australian jurisdiction mean scores in PISA 2009 reading.

State              Mean score    95% confidence interval
ACT                531           520–543
WA                 522           510–534
Queensland         519           505–532
New South Wales    516           505–527
Victoria           513           504–523
South Australia    506           497–516
TAS                483           472–495
NT                 481           469–492
Australia          515           510–519

As the table illustrates, while the Australian Capital Territory (ACT) and Western Australia (WA) have averages very close to those of some of the ‘top five’ systems, Tasmania (TAS) and the Northern Territory (NT) score below the OECD average. This variation could be attributed to demographic differences between Australian jurisdictions. For example, ACT has proportionally more public servants than other states. In contrast, NT has proportionally more remote schools than other states. There are also differences in the SES characteristics between states. The PISA results more likely reflect demographic differences between states than differences between education systems.
If we compare Australian jurisdictions with jurisdictions such as Shanghai or small
countries such as Singapore, we get some interesting results (see Table 4).
Table 4. PISA 2009 reading literacy.

Rank    Jurisdiction       Mean score    95% confidence intervals
1.      Shanghai-China     556           551–561
2.      Korea              539           532–546
3.      Finland            536           531–540
4.      Hong Kong-China    533           529–537
5.      ACT                531           520–543
6.      Singapore          526           524–528
7.      Canada             524           521–527
8.      WA                 522           510–534
9.      New Zealand        521           516–525
10.     Japan              520           513–527
11.     Australia          515           510–519
        OECD average       493           492–494

Scores for Australia, ACT and WA are given in bold.

Comparing Australian jurisdiction results with those of the East Asian PISA elite, ACT is ranked fifth internationally by mean score. So a part of Australia is already in the ‘PISA top five’! Indeed, ACT’s performance is not statistically significantly different from that of second-ranked Korea. WA is ranked eighth internationally by mean score, and it is also not significantly different from Korea’s performance. On the other hand, TAS and NT both have mean scores significantly below the OECD mean, with rankings close to those of Greece and Spain. With such a diverse range of mean scores between Australian jurisdictions, a focus on Australia’s international ranking is not a very useful way to assess Australian education systems.
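Comparisons of this kind can be checked, at least approximately, from the published means and 95% confidence intervals in Table 4. The sketch below recovers an approximate standard error from each interval and applies a simple two-sample z-test; this is our simplification, not PISA’s official procedure, which relies on plausible values and replicate weights.

```python
# Rough check of whether two jurisdictions' mean scores differ significantly,
# using only the published means and 95% confidence intervals (Table 4).
# Simplification: PISA's official comparisons use plausible values and
# balanced repeated replication weights, not this shortcut.
import math

def se_from_ci(lower, upper):
    """Approximate standard error recovered from a 95% confidence interval."""
    return (upper - lower) / (2 * 1.96)

def significantly_different(mean1, ci1, mean2, ci2, z_crit=1.96):
    """Two-sample z-test on the difference between two published means."""
    se_diff = math.sqrt(se_from_ci(*ci1) ** 2 + se_from_ci(*ci2) ** 2)
    z = (mean1 - mean2) / se_diff
    return abs(z) > z_crit, z

# ACT (531, CI 520-543) versus Korea (539, CI 532-546), figures from Table 4.
different, z = significantly_different(531, (520, 543), 539, (532, 546))
print(different, round(z, 2))   # False, about -1.2: no significant difference
```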
What does this insight mean for policy reform? With ACT and WA already in the
PISA ‘top five’, perhaps Australia could use these states as role models to improve its
performance, rather than look to distant and culturally radically different systems such as
Shanghai or Korea. Given that education is deeply culturally embedded, practices and
policies might not ‘travel’ that well across cultures. So it would make eminent sense to
find within-country role models.
This ‘context’ argument – i.e., the argument that because education is deeply culturally embedded, what works in one context may not work the same way in another context, so caution is to be exercised in such borrowing – has been argued robustly in education. Alexander (2012) has argued with great clarity that the problem is not with the desire to learn from others, but with the import from distant shores of miracle cures advocated by school improvement experts. He endorses Sadler’s (1990) idea that ‘[t]he practical value of studying in a right spirit and with scholarly accuracy the working of foreign systems of education is that it will result in our being better fitted to study and understand our own’ (our emphasis). In other words, observations of other nations’ practices should be used as provocations and reference points to reflect on our own
practices. Similarly, Jasanoff (2005, p. 15) sees ‘melioration through imitation’ as a
practical ambition that is not to be denigrated, but advocates going beyond seeking
prescriptions of ‘decontextualized best practices for an imagined global administrative
elite’, towards comparisons as opportunities to investigate the complex interplay of
science and politics and their implications for governance at particular locations. This
idea of the cultural specificity of education was reiterated by an OECD official:
You should compare yourself with countries which have the same way of living as you. If
you are in Korea, your parents will spend all their money for you to study and in France they
will keep their money to live …we have some of the information but we need to interpret
correctly and to always give the context of what is the situation in each of the countries.
Another interviewee described this issue as ‘the problem of the unmeasured’, arguing that
looking at the differences between jurisdictions within countries in PISA was more useful,
because people within a country would have a good idea about the particular interplay of
factors that could be influencing student performance. But in many countries, such as the
USA, they do not have the sample size to make a state-by-state comparison of PISA
results. In the most recent TIMSS round, eight US states participated in enough numbers to get state-level data,
allowing for a limited amount of within-country comparison. As a result, one senior US
Government official explained:
We’ve been encouraging people to look at state comparisons as opposed to other countries …
[I]f I were a state that was not doing well that was looking for [policy lessons] I think
I would look at Massachusetts instead of Singapore because I don’t know what else is
happening in Singapore. Korea is a great example of that I think. Here, pretty frequently we
get questions about time spent on learning, and Korea is often thrown up as ‘they don’t spend
that much time’…but you know what is really happening? They are spending TWICE as
much time [in the after-school private coaching classes], and we don’t have that measured
very well.
Australia ‘over samples’ in PISA – in other words, more Australian students participate in
the PISA survey than the sample size stipulated by PISA for national-level results. As a
result, Australia’s PISA results can be differentiated in much greater detail than is
possible in many other countries. As one PISA expert explained:
In PISA 2000, Australia’s sampling was sufficient (roughly 6,500 students) to provide
reliable state estimates. From 2003 onwards, Australia’s sampling was even more extensive
(roughly 14,000 students), so there was sufficient data to support Australia’s major
longitudinal study, the Longitudinal Study of Australian Youth (LSAY). In 2012, Australia
increased the sample size even further (to roughly 18,000 students), and also changed the
design of the sampling, so that it included more schools with fewer students at each school.
Instead of the previous fifty students per school, it sampled 25–30 per school and nearly
doubled the number of participating schools. This allows Australia to get quite detailed data
on sub-groups within states.
These extensive sample sizes provide Australia with a great deal of data, and make it
possible to analyse and understand the patterns of performance, identify issues and areas
upon which to focus and point towards examples of excellent practices within the country
itself. Comparison with other nations, therefore, is probably the least useful way for
Australia to use the PISA data.
Rankings by item content
Unlike TIMSS, PISA tests are not based on the curricula of the participant nations.
Instead, they are based on what expert committees deem students should know and be able to do by the age of 15 in order to succeed in the world beyond school.
Experts in each domain devise the tests, with input from all the member nations, and
there are extensive field trials of the test items.
However, despite the participation of the foremost experts in developing test
questions, PISA questions are constrained by a number of limitations. It is not possible
to assess, in a standardised way, everything that is valued in terms of being ‘well
prepared’ for the world. As one senior PISA official put it:
Reading, science and maths are there [in the PISA test] largely because we can do it. We can
build a common set of things that are valued across the countries and we have the technology
for assessing them. There are other things like problem-solving or civics and citizenship –
that kind of thing where there would just be so much more difficulty in developing
agreement about what should be assessed. And then there are other things like team work
and things like that. I just don’t know how you’d assess them in any kind of standardised
way. …So you are reduced to things that can be assessed. They’ve tried writing –but the
cross-cultural language effect seems too big to be comparable.
Even within each domain, there are constraints with regard to what can be included. Each
student takes a total of about two hours of the test – one hour for the ‘major domain’ and a half hour each for the minor domains (the three literacies rotate to take turns at being the major domain). Testing students’ ‘preparedness for life’ in a certain domain of
knowledge within such constraints of time is very challenging, as one member of the
Science Functional Expert Group described:
Science was a minor domain in the first two tests –and here is an international team with 7
or 8 people and we are told that the maximum testing time you’ve got is 30 minutes. To test
preparedness for life! So [a member of the Committee] said, this is ridiculous –we can’t
possibly do everything, so why don’t we decide on one thing –we will test one aspect of
scientific literacy, and we argued about what that one thing would be for quite a long time
but in the end we decided. …we would try to construct a test about how well 15-year-olds
could critically appraise a media report involving science and technology.
This constraint on time is further exacerbated by the ‘application’ focus of PISA. Because
the questions are not curriculum based, students have to be presented with a situation and
asked questions based on the situation presented. This means that within the half hour,
time has to be made available for reading the information on which to base responses,
further reducing the number of questions which can be asked within the half hour. When
the number of test questions is small, the response to each question has a significant
impact on the overall score; in other words, every right or wrong answer can significantly –
and perhaps disproportionately –affect the average, thus making the assessment less
reliable. To overcome this problem of too few questions, PISA creates a larger set of
questions and distributes them across several students; in other words, a ‘test’ is answered
by several students, each of whom is administered a different set of questions.
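As a toy illustration of this rotation, the sketch below spreads a pool of item clusters across a small set of booklets so that no student answers every item while the full pool is still covered across students. The cluster sizes, booklet layout and names are invented for illustration and do not reproduce PISA’s actual booklet design.

```python
# Toy illustration of a rotated booklet design: a larger pool of item clusters
# is spread across booklets, and each student sits only one booklet, so the
# full pool is covered across students. The layout below is invented for
# illustration; it is not PISA's actual booklet scheme.
from itertools import cycle

clusters = {f"C{i}": [f"C{i}_item{j}" for j in range(1, 5)] for i in range(1, 7)}

# Each booklet holds three of the six clusters; together the booklets cover
# every cluster, and each cluster appears in more than one booklet.
booklet_layout = {
    "B1": ["C1", "C2", "C3"],
    "B2": ["C2", "C3", "C4"],
    "B3": ["C4", "C5", "C6"],
    "B4": ["C5", "C6", "C1"],
}

def assign_booklets(student_ids):
    """Assign booklets to students in rotation."""
    return dict(zip(student_ids, cycle(booklet_layout)))

students = [f"student_{n}" for n in range(1, 9)]
for student, booklet in assign_booklets(students).items():
    items = [item for c in booklet_layout[booklet] for item in clusters[c]]
    print(student, booklet, len(items), "items")
```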
These practical constraints and methodological choices also have bearing on how we
might understand the results. When a subject is in the minor domain, and therefore has
fewer test items, each particular item that features on the test will have a more significant
impact on a score. For example, if questions of probability feature in the ‘minor domain’
mathematics test, and students have not been exposed to much study of probability at
school, the average mathematics score would look poorer than if those questions had
been left out, or if they had been part of the test when mathematical literacy was the
major domain.
Australia was placed 15th in PISA 2009 in mathematical literacy. But student scores
in a domain could differ quite widely between items. The PISA database also publishes
results by item. Looking beyond Australia’s average score on mathematical literacy and
examining PISA results by item, we find that Australian students do exceptionally well in
answering certain questions, and exceptionally poorly in others. Table 5 shows
Australia’s performance on two PISA 2009 mathematics items.
As we can see, Australia performed extremely well on Items M408Q01TR and
M420Q01TR, ranking third and second, respectively, internationally. For Item
M408Q01TR, Shanghai-China ranked 20th, despite the fact that Shanghai took the top
spot internationally on mathematics literacy, with a mean score much higher than the
second-place country, Singapore. For Item M420Q01TR, Australia outperformed all top-
ranking countries.
Table 5. Percentage correct on two PISA 2009 mathematics items for a subset of countries.
Country Item M408Q01TR Country Item M420Q01TR
Hong Kong-China 0.60 New Zealand 0.66
Finland 0.56 Australia 0.64
Australia 0.56 Canada 0.64
Chinese Taipei 0.55 Ireland 0.62
UK 0.55 Shanghai-China 0.62
New Zealand 0.55 UK 0.60
Macao-China 0.53 USA 0.59
Iceland 0.52 Chinese Taipei 0.58
Ireland 0.51 Singapore 0.57
Singapore 0.50 Denmark 0.57
Canada 0.49 Netherlands 0.57
Spain 0.49 Norway 0.57
Germany 0.49 Czech Republic 0.56
Sweden 0.46 Finland 0.55
France 0.46 Belgium 0.55
Switzerland 0.45 Liechtenstein 0.55
Liechtenstein 0.45 Poland 0.54
Belgium 0.44 Hong Kong-China 0.54
Portugal 0.44 Germany 0.53
Shanghai-China 0.43 Hungary 0.52
More countries …More countries …
Score for Australia is given in bold.
In contrast, on Item M462Q01DR, Australia ranked 43rd internationally, with an
average score of only 0.1 out of a maximum of two, while Shanghai had an average score
of 1.5 out of a maximum of two.
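Item-level figures of the kind shown in Table 5 and discussed above can, in principle, be derived from the scored student responses in the public PISA data files. The sketch below is indicative only: the data layout, column names and toy figures are our assumptions, and a faithful analysis would need the documented score codes and student weights.

```python
# Indicative sketch of how item-level country comparisons (as in Table 5)
# could be computed from scored student responses. The column names and file
# layout are assumed for illustration; the real PISA data files have their
# own codebooks, score codes and student weights.
import pandas as pd

def mean_item_score_by_country(responses: pd.DataFrame, item: str) -> pd.Series:
    """Average scored response for one item, per country, highest first.

    `responses` is assumed to hold one row per student, a 'country' column,
    and one column per item with the scored response (0/1, or 0/1/2 for
    partial-credit items), with unanswered items left as NaN.
    """
    return (responses.groupby("country")[item]
                     .mean()
                     .sort_values(ascending=False))

# Toy example with two countries and one dichotomous item (invented data).
toy = pd.DataFrame({
    "country": ["AUS", "AUS", "AUS", "QCN", "QCN", "QCN"],
    "M408Q01": [1, 1, 0, 0, 1, 0],
})
print(mean_item_score_by_country(toy, "M408Q01"))
```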
How are we to understand this variation in Australia’s performance across the
different questions in the mathematics literacy survey? Given that Australian students
answer some questions exceedingly well, it is difficult to make the case, based on these
scores, that there is a crisis in mathematics literacy among Australian students, and that
wide-ranging reforms are required. One explanation could be that the Australian curriculum does not cover, by the age of 15, the material tested by some of the questions in the PISA survey. Unfortunately, PISA does not publish all the actual items – only a small set
of items is released –so it is not possible to identify the particular skills or knowledge
that Australia needs to focus on in order to improve its scores. But perhaps at least
Australia could feel a bit more secure about its performance, knowing that it definitely is
in the ‘top five’ in at least some aspects of mathematical literacy. If Australia were keen
to raise PISA scores, further sample-based tests could be done to identify topics in which
students do well or badly, using PISA-like questions, and then curriculum and pedagogic
reforms could target those areas.
Another issue is relevant for discussion here. Mathematics was the major domain in
2003, when Australia ranked 11th. Australian media and policy-makers talk about the ‘slide’ in Australia’s mathematics performance, comparing the results of 2003 with those of 2000, 2006 and 2009, when mathematical literacy was a minor domain. This produces
a skewed comparison, because, as we noted earlier, the test content in the minor-domain
survey has a greater impact on the average scores. It would be more useful to compare, if
one must, between 2003 and 2012; i.e., across the two surveys when mathematical
literacy was the major domain.
But even this does not really solve the problem. Over a period of nine years, there are
many changes in the cohorts of students; for example, the proportion of immigrant
students might have increased. There are also many changes brought about by a range of
education reforms. In Australia, between 2003 and 2012, we have witnessed increasing centralisation through a national curriculum, the introduction of the National Assessment Program – Literacy and Numeracy (NAPLAN), the setting up of the My School website, and many attempts to introduce performance-based pay for teachers – indeed, we have had a whole ‘Education Revolution’ (Gorur, 2013; Gorur & Koyama, 2013). The
education system has not remained stable, so such comparisons are problematic.
Introducing large-scale reforms on the strength of such apparent trends could result in
diverting resources from crucial areas and placing unnecessary, and perhaps impossible,
demands on teachers and schools.
Test completion
A well-known phenomenon in statistical data collection and measurement is that interest
and accuracy in responding to questionnaires and tests do not follow an even pattern
throughout the duration of the exercise. Generally, questions at the beginning of a questionnaire tend to be answered with greater interest and accuracy than those in the latter parts, particularly in longer tests. Arguably, how well students maintain their motivation to respond to the best of their ability depends on how important they deem their performance on the test to be. Since there are no ‘stakes’ attached to performance in PISA
for individual students, this motivation must come from elsewhere, and is often
influenced by cultural factors. Some cultural differences have been identified in the ways
students from different countries approach tests, and in how seriously they take such
tasks. Using the notion of ‘perceived task value’, Sjøberg (2007) argues that while tests
are premised upon the idea that all students will intend to do their best, students in
different countries vary in their behaviour in this regard. He claims that ‘in many modern
societies, several students are unwilling to give their best performance if they find the
PISA items long, unreadable, unrealistic, and boring, in particular if bad test results have
no negative consequence for them’ (pp. 203–304). These variations in students’ approach
to the tests may also be the result of how seriously these tests are taken by parents,
schools and society at large. Sjøberg describes the observations at one Taiwanese school
where students were taking the TIMSS test, which illustrates the importance accorded to
international tests and educational performance:
An observer from Times Educational observed the TIMSS testing at a school in Taiwan, and
he noticed that pupils and parents were gathered in the schoolyard before the big event, the
TIMSS testing. The director of the school gave an appeal in which he also urged the students
to perform their utmost for themselves and their country. Then they marched in while the
national hymn was played. Of course, they worked hard; they lived up to the expectations
from their parents, school and society. (Sjøberg, 2007, p. 221)
Sjøberg argues that interest in doing well and in persisting for two and a half hours
(2 hours for the test and 30 minutes for the student background survey) is therefore
uneven across cultures, and it could be another variable that explains differences in
performance.
One measure that can serve as a proxy for motivation to complete, or ‘willingness to answer’ (Torija, n.d.), is the number of unanswered items in each test booklet. PISA provides a code of ‘8’ for these un-attempted items (termed ‘not-reached’ in PISA). Table 6 shows the average number of not-reached items, computed by country, for the first 34 countries ordered by that average.

Table 6. Average number of not-reached items.

Country                 Average number of not-reached items
1. Shanghai-China       0.11
2. Korea                0.16
3. Netherlands          0.18
4. Hong Kong-China      0.39
5. Croatia              0.43
6. Hungary              0.45
7. Chinese Taipei       0.46
8. Slovenia             0.48
9. Finland              0.49
10. Tamil Nadu-India    0.54
11. Poland              0.55
12. Austria             0.56
13. USA                 0.57
14. Czech Republic      0.60
15. Estonia             0.61
16. Slovak Republic     0.61
17. Lithuania           0.61
18. UK                  0.63
19. Switzerland         0.66
20. Japan               0.68
21. Germany             0.69
22. Singapore           0.69
23. Romania             0.78
24. Liechtenstein       0.82
25. Canada              0.85
26. Latvia              0.87
27. Turkey              0.89
28. Belgium             0.90
29. New Zealand         1.01
30. Denmark             1.03
31. Ireland             1.04
32. Serbia              1.05
33. Norway              1.10
34. Australia           1.11
More countries …

Score for Australia is given in bold.
While it is possible that the number of not-reached items is related to students’
proficiency in the test domain, the relationship is not so clear. Examining Table 6, we find
that top-performing countries do not necessarily have fewer not-reached items. PISA is
not designed to be a speed test, so the variation of the number of not-reached items across
different countries could at least in part reflect motivation issues. Australia’s rank is 34 in
this table, somewhat inconsistent with Australia’s ranks in reading, mathematics and
science domains (9th, 15th and 10th, respectively). In Australia, 91% of students reached
the end of the test. In contrast, 95% of US students and 98% of Shanghai students
reached the end of the test. We could conclude that Australia’s average score is negatively
affected by the 9% of students who did not complete the test. Australia’s ‘average score’ could be as much a reflection of this willingness to answer or motivation to complete as it is of the students’ ‘literacy’.
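Counts of this kind can be reproduced in outline from the item-response codes in the PISA data files, treating the code ‘8’ described above as ‘not reached’. The sketch below is schematic: the record layout is an assumption for illustration, and the figures in Table 6 come from PISA’s own processing.

```python
# Schematic computation of the 'average number of not-reached items' per
# country (cf. Table 6), treating the code 8 as 'not reached' as described
# in the text. The (country, item codes) record layout is an assumption.
from collections import defaultdict

NOT_REACHED = 8

def average_not_reached(student_records):
    """Average count of not-reached items per student, by country.

    student_records: iterable of (country, [item response codes]) pairs.
    """
    totals, counts = defaultdict(int), defaultdict(int)
    for country, item_codes in student_records:
        totals[country] += sum(1 for code in item_codes if code == NOT_REACHED)
        counts[country] += 1
    return {country: totals[country] / counts[country] for country in totals}

# Toy example: three students in each of two countries (invented data).
records = [
    ("AUS", [1, 0, 8, 8]), ("AUS", [0, 8, 8, 8]), ("AUS", [1, 1, 1, 8]),
    ("QCN", [1, 1, 1, 1]), ("QCN", [1, 0, 1, 0]), ("QCN", [0, 1, 1, 1]),
]
print(average_not_reached(records))   # {'AUS': 2.0, 'QCN': 0.0}
```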
If motivation could be raised and more Australian students prevailed upon to complete the test, would Australia get a rank within the top five, negating the need to engage in the
kind of extensive reforms now being considered, including stringent accountability
measures and incentives like performance pay for teachers? This question is worth
contemplating.
From data to policy – a treacherous leap
International comparisons, however sophisticated and rigorous, are beset with a number
of inevitable limitations. Comparability across a vast diversity of contexts, histories and
cultures can only be achieved through narrowing what is compared (Gorur, 2010; Scott,
1998). Moreover, the ‘success’ of education systems, even when narrowly defined for the
purposes of comparison, is deeply affected by a wide variety of often interrelated
factors –and statistical methodologies are not very good at performing analyses that are
sensitive to the relationality of phenomena. Many factors that affect educational
performance cannot be included in such analyses, leading to what some of our
interviewees referred to as ‘the problem of the unmeasured’. This makes it difficult to
draw parallels or conclusions from these data. In any case, causation cannot be
established through numbers alone; it can only be attributed through expert interpretation.
In addition, surveys such as PISA are a snapshot of a point in time, and such cross-
sectional studies are limited in the information they can give and the conclusions they can
support.
Longitudinal analyses using such surveys are also challenged by a range of issues.
With PISA, each round of the survey focuses on a different ‘major domain’ (reading,
mathematical or scientific literacy), and comparisons from one three-year cycle to the
next are not ‘like to like’. The tests are also not sensitive enough to pick up small changes
in performance, and changes over a period of three years are usually small at a system
level. Depending on the size and nature of the system, it may take many years before the
effects of any reforms are reflected as changes in test scores on PISA. So PISA can at best
only be a very rough description of the state of an education system.
More broadly, our analysis demonstrates the complexity of ‘translating the world into
numbers’ (Gorur, 2010) and the ongoing challenge of finding certainty and clarity
through these translations (Gorur & Koyama, 2013). The OECD claims that, thanks to
PISA, we now have ‘an unprecedented comparative knowledge base of school systems
and their outcomes’ (OECD, 2007, p. 6). But turning PISA data into meaningful and useful knowledge is a challenging enterprise.
We are particularly anxious about Australia’s desire to emulate the East Asian
systems. The high scores of the East Asian nations are linked to an obsession with
educational success (Anderson & Kohler, 2012), driving families to invest heavily in
private cram-schools –an investment that could be very costly in several ways:
[Private tuition] normally maintains or exacerbates social and economic inequalities; it may
dominate children’s lives and restrict their leisure times in ways that are psychologically and
educationally undesirable; and it can be perceived in some settings as a form of corruption
that undermines social trust. (Bray, 2009, pp. 13–14)
The punishing schedule, with students spending long hours at coaching classes after
school; the high levels of competitiveness; and the shame experienced by students who
do less well are linked to high rates of depression and suicide (Ahn & Baek, 2013).
Anderson and Kohler (2012) have linked the East Asian ‘education fever’ to significant drops in fertility rates, as the levels of parental investment required to raise ‘successful’ offspring have risen dramatically. Ironically, as the authors point out, a low fertility rate could be a threat to the future economic success of these countries.
Much of the response from critics to Australia’s desire to learn from the East Asian
nations is based on the argument that the contexts of the East Asian nations differ greatly
from those of Australia, and that practices effective in Shanghai or Singapore would not
necessarily work as well here (see, for instance, Buckingham, 2012; Dinham, 2012).
Similar arguments have been made by a host of critics on the issue of ‘policy borrowing’
or ‘policy learning’. Whilst we agree that the ‘context’ argument is both valid and important, our concern is that the ‘context’ critique leaves intact – or, at least, offers no challenge to – the causal connection drawn between practices and test performance. The
context argument is also easily countered, as, for example, has been done by Jensen et al.
(2012), using a simple disclaimer, declaring that while we cannot unthinkingly adopt
policies from elsewhere, we can nevertheless learn from them. Crucially, debates about
context direct attention away from examining and understanding Australia’s PISA
performance in greater detail. As a result, the idea that Australia’s performance in
international studies is slipping, and that the system is heading towards a crisis, has
persisted.
Conclusion
Our analysis demonstrates that some of our jurisdictions are already in the ‘top five’ and that no generalised crisis in education can be inferred from a detailed reading of the PISA data. We suggest that the data point towards the need for a more
focused and targeted approach, rather than sweeping national reforms.
The fact that Australia’s performance varies markedly across mathematics items
provides a nuanced picture, pointing to possible differences between what is valued in
Australian curricula and what is tested in PISA. If we were to take seriously the
desirability of improving PISA scores, our analysis points to the need for further research
to locate the particular topics in which Australian students might be less well prepared,
rather than large-scale, system-wide reform.
In the case of ‘willingness to respond’, too, the analysis points away from the conclusion that Australian education is in a generalised crisis. While reiterating that raising PISA scores is not a self-evidently good policy objective, we would conclude that one way of improving Australia’s scores would be to improve students’ attitudes towards and commitment to test completion, rather than engage in expensive, stressful reforms
such as NAPLAN and My School.
If Australia insists on using PISA to inform policy, it would do well to explore the
data in much greater detail. Average scores obscure far more than they reveal. Using
average performance scores and rankings to inform policy is leading to damaging policy
decisions.
Notes
1. The next PISA survey in 2015 will be implemented by a consortium led by the Educational
Testing Service based in the USA. For details, see http://www.erc.ie/?p=58.
2. Information for Tables 1 and 2 was sourced from http://www.oecd.org/pisa/faqoecdpisa.htm.
3. http://www.oecd.org/pisa/.
References
Ahn, S.-Y., & Baek, H.-J. (2013). Academic achievement-oriented society and its relationship to the
psychological well-being of Korean adolescents. In C.-C. Yi (Ed.), The psychological well-being
of East Asian youth (pp. 265–279). Dordrecht: Springer Science+Business Media.
Ainley, M., & Ainley, J. (2011). A cultural perspective on the structure of student interest in science.
International Journal of Science Education, 33(1), 51–71. doi:10.1080/09500693.2010.518640
Alexander, R. (2012). Moral panic, miracle cures and educational policy: What can we really learn
from international comparison? Scottish Educational Review, 44(1), 4–21.
Anderson, T. M., & Kohler, H.-P. (2012). Education fever and the East-Asian fertility puzzle:
A case study of low fertility in South Korea. PSC Working Paper Series, PSC 12-07. Population
Studies Center, University of Pennsylvania Scholarly Commons. Retrieved October 12,
2013, from http://repository.upenn.edu/cgi/viewcontent.cgi?article=1037&context=psc_working_
papers
Australian Education Act. (2013). Australian Education Act 2013: An act in relation to school
education and reforms relating to school education, and for related purposes. Retrieved October
12, 2013, from http://www.comlaw.gov.au/Details/C2013A00067
Barad, K. (2003). Posthumanist performativity: Toward an understanding of how matter comes to
matter. Signs: Journal of Women in Culture and Society, 28, 801–831. doi:10.1086/345321
Barber, M., & Mourshed, M. (2007). How the world’s best-performing school systems come out
on top. McKinsey & Company. Retrieved April 3, 2014, from http://mckinseyonsociety.com/how-the-worlds-best-performing-schools-come-out-on-top/
Bautier, E., & Rayou, P. (2007). What PISA really evaluates: Literacy or students’universes of
reference? Journal of Educational Change, 8, 359–364. doi:10.1007/s10833-007-9043-9
Bracey, G. W. (2008). The leaning (toppling?) tower of PISA? Principal Leadership, 9(2), 49–51.
Bray, M. (2009). Confronting the shadow education system –What government policies for what
private tutoring? Paris: UNESCO.
Breakspear, S. (2012). The policy impact of PISA: An exploration of the normative effects of
international benchmarking in school system performance. OECD Education Working Papers (Vol. 71).
Buckingham, J. (2012). Keeping PISA in perspective: Why Australian education policy should
not be driven by international test results. Issue Analysis (Vol. 136). St Leonards, NSW: The
Centre for Independent Studies. Retrieved from http://www.cis.org.au/images/stories/issue-
analysis/ia136.pdf
Common Core. (2009). Why we’re behind: What top nations teach their students but we don’t.
Common Core. Retrieved from http://commoncore.org/maps/documents/reports/CCreport_why
behind.pdf
Commonwealth of Australia. (2012). Australia in the Asian century. Canberra: Author.
Derksen, L. (2000). Towards a sociology of measurement: The meaning of measurement error in the
case of DNA profiling. Social Studies of Science, 30, 803–845. doi:10.1177/030631200030006001
Desrosières, A. (1998). The politics of large numbers –A history of statistical reasoning. (C. Naish,
Trans.). Cambridge, MA and London: Harvard University Press.
Dinham, S. (2012). Our Asian schooling infatuation: The problem of PISA envy. The Conversation,
14 September. Retrieved October 19, 2012, from http://theconversation.com/our-asian-schooling-
infatuation-the-problem-of-pisa-envy-9435
Gorur, R. (2011). ANT on the PISA trail: Following the statistical pursuit of certainty. Educational
Philosophy & Theory, 43(5–6), 76–93.
Gorur, R. (2013). My school, my market. Discourse: Studies in the Cultural Politics of Education,
34 (2 Special Issue: Equity and marketisation in Australian education: Emerging policies and
practices), 214–230. doi:10.1080/01596306.2013.770248
Gorur, R. (2014). Towards a sociology of measurement in education policy. European Educational
Research Journal, 13(1) (special issue on ‘Mobile Sociologies in Education’), 58–72.
doi:10.2304/eerj.2014.13.1.58
Gorur, R., & Koyama, J. P. (2013). The struggle to technicise in education policy. The Australian
Educational Researcher, 40, 633–648. doi:10.1007/s13384-013-0125-9
Grek, S. (2007, November–December). ‘And the winner is…’: PISA and the construction of the
European education space. Paper presented at the ‘Advancing the European Education Agenda’,
European Education Policy Network Conference, Brussels, Belgium.
Grek, S. (2009). Governing by numbers: The PISA ‘effect’in Europe. Journal of Education Policy,
24(1), 23–37. doi:10.1080/02680930802412669
Jasanoff, S. (2005). Designs on nature: Science and democracy in Europe and the United States.
Princeton, NJ: Princeton University Press.
Jensen, B., Hunter, A., Sonnemann, J., & Burns, T. (2012). Catching up: Learning from the best
school systems in East Asia. Melbourne: Grattan Institute.
Latour, B. (2005). Reassembling the social: An introduction to actor-network-theory. Oxford:
Oxford University Press.
OECD. (2007). PISA –The OECD program for international student assessment. Retrieved June
14, 2009, from http://www.pisa.oecd.org/dataoecd/51/27/37474503.pdf
Porter, T. (1995). Trust in numbers –The pursuit of objectivity in science and public life. Princeton
and Chichester: Princeton University Press.
Porter, T. (2003). Measurement, objectivity, and trust. [Focus Article]. Measurement, 1, 241–255.
Rizvi, F., & Lingard, B. (2010). Globalizing education policy. London and New York, NY:
Routledge.
Rutkowski, D., & Rutkowski, L. (2013). Measuring socioeconomic background in PISA: One size
might not fit all. Research in Comparative and International Education, 8, 259–278.
doi:10.2304/rcie.2013.8.3.259
Sadler, M. (1990). How can we learn anything of practical value from the study of foreign systems
of education? In J. H. Higginson (Ed.), Selections from Michael Sadler: Studies in world
citizenship. Liverpool: Dejall and Meyorre.
Scott, J. C. (1998). Seeing like a state: How some schemes to improve the human condition have
failed. Binghamton, NY: Vail-Ballou Press.
Simola, H. (2005). The Finnish miracle of PISA: Historical and sociological remarks on teaching
and teacher education. Comparative Education, 41, 455–470. doi:10.1080/03050060500317810
Sjøberg, S. (2007). PISA and “real life challenges”: Mission impossible? In S. T. Hopman, G.
Brinek, & M. Retzl (Eds.), PISA according to PISA –does PISA keep what it promises? (pp.
203–225). Berlin: Lit Verlag.
Stengers, I. (2011). Comparison as a matter of concern. Common Knowledge, 17(1), 48–63.
doi:10.1215/0961754X-2010-035
Stronach, I. (2010). Globalizing education, educating the local. Oxon: Routledge.
Thomson, S., Hillman, K., Wernert, N., Schmid, M., Buckley, S., & Munene, A. (2012). Monitoring
Australian year 4 student achievement internationally: TIMSS and PIRLS 2011. Camberwell:
Australian Council for Educational Research.
Torija, P. (n.d.). Straightening PISA: When students do not want to answer standardized tests. Work-
in-progress (pp. 1–43). University of Padova. Retrieved November 18, 2012, from http://www.
econ.jku.at/members/Department/files/ResearchSeminar/SS12/torija.pdf
Whitty, G. (2009). Marketization and post-marketization in education. In A. Hargreaves, A.
Lieberman, M. Fullan, & D. Hopkins (Eds.), Second international handbook of educational
change (pp. 405–413). Springer International Handbooks of Education 23, Dordrecht, Heidelberg, London, New York: Springer. doi:10.1007/978-90-481-2660-6_1
Woolgar, S. (1991). Beyond the citation debate: Towards a sociology of measurement technologies
and their use in science policy. Science and Public Policy, 18, 319–332.