Open Data for Global Science

Abstract

The digital revolution has transformed the accumulation of properly curated public research data into an essential upstream resource whose value increases with use. The potential contributions of such data to the creation of new knowledge and downstream economic and social goods can in many cases be multiplied exponentially when the data are made openly available on digital networks. Most developed countries spend large amounts of public resources on research and related scientific facilities and instruments that generate massive amounts of data. Yet precious little of that investment is devoted to promoting the value of the resulting data by preserving and making them broadly available. The largely ad hoc approach to managing such data, however, is now beginning to be understood as inadequate to meet the exigencies of the national and international research enterprise. The time has thus come for the research community to establish explicit responsibilities for these digital resources. This article reviews the opportunities and challenges to the global science system associated with establishing an open data policy.
CHAPTER EIGHT
OPEN DATA FOR GLOBAL SCIENCE1
Paul Uhlir2 and Peter Schröder3
INTRODUCTION
The global science system stands at a critical juncture. On the one hand,
it is overwhelmed by a hidden avalanche of ephemeral bits that are
central components of modern research and of the emerging
‘cyberinfrastructure’4 for e-Science.5 The rational management and
exploitation of this cascade of digital assets offers boundless
opportunities for research and applications. On the other hand, the
ability to access and use this rising flood of data seems to lag behind,
despite the rapidly growing capabilities of information and
communication technologies (ICTs) to make much more effective use of
those data. As long as the attention paid to data policies and data
management by researchers, their organisations, and their funders does
not catch up with the rapidly changing research environment, research
policy and funding entities will in many cases perpetuate systemic
inefficiencies and the resulting loss or underutilisation of
valuable data resources derived from public investments. There is thus
an urgent need for rationalised national strategies and more coherent
international arrangements for sustainable access to public research data,
both to data produced directly by government entities and to data
generated in academic and not-for-profit institutions with public
funding.
1 ‘Open Data for Global Science’ was originally published in Open Data for Global Science – Special
Issue, Paul Uhlir (ed), CODATA Data Science Journal, (2007) page 36
<http://www.jstage.jst.go.jp/article/dsj/6/0/OD36/_pdf>. The views expressed in this
paper are those of the authors and not necessarily those of their institutions of employment.
2 Director of the Office of International S&T Information Programs (ISTIP) and the U.S.
National Committee for CODATA at the National Academies in Washington, DC National
Research Council.
3 Data Archiving and Networked Services (DANS).
4 The US Blue Ribbon Advisory Panel on Cyberinfrastructure anticipated an information and
communication technology (ICT) infrastructure of ‘…digital environments that become
interactive and functionally complete for research communities in terms of people, data,
information, tools and instruments and that operate at unprecedented levels of computational,
storage and data transfer capacity…’ in Revolutionizing Science and Engineering Through
Cyberinfrastructure: Report of the National Science Foundation Blue Ribbon Advisory Panel on
Cyberinfrastructure, National Science Foundation (2003)
<http://www.communitytechnology.org/nsf_ci_report/>. We use the terms
cyberinfrastructure and ICT infrastructure interchangeably in this paper.
5 ‘e-science’ refers to ‘the large-scale science that will increasingly be carried out through
distributed global collaborations enabled by the Internet. Typically, a feature of such
collaborative scientific enterprises is that they will require access to very large data collections,
very large scale computing resources and high performance visualisation back to the individual
user scientist . . . Besides information stored in Webpages, scientists will need easy access to
remote facilities, to computer – either as dedicated Teraflop computers or cheap collections of
PCs – and to information stored in dedicated databases.’ John Taylor, Director General of UK
Research Councils. See: <www.research-councils.ac.uk/escience/>.
Legal Framework for e-Research: Realising the Potential 190
In this chapter, we examine some of the implications of ‘data-driven’
research and possible ways to overcome existing barriers to the accessibility
of public research data. Our perspective is framed in the context of the
predominantly publicly funded global science system. We begin by
reviewing the growing role of digital data in research and outlining the
roles of stakeholders in the research community in developing data
access regimes. We then discuss the hidden costs of closed data
systems, the benefits and limitations of openness as the default principle
for data access, and the emerging open access models that are beginning
to form digitally networked commons. We conclude by examining the
rationale and requirements for developing overarching international
principles from the top down, as well as flexible, common-use
contractual templates from the bottom up, to establish data access
regimes founded on a presumption of openness, with the goal of better
capturing the benefits from the existing and future scientific data assets.
The ‘Principles and Guidelines for Access to Research Data from Public
Funding’ from the Organisation for Economic Cooperation and
Development (OECD), reported on in another article by Pilat and
Fukasaku,6 are the most important recent example of the high-level
(inter)governmental approach. The common-use licenses promoted by
the Science Commons are a leading example of flexible arrangements
originating within the community. Finally, we should emphasise that we
focus almost exclusively on the policy aspects of data access (institutional,
socioeconomic, and legal) rather than on the technical and management
practicalities, which are also important but beyond the scope of this article.
6 In Open Data for Global Science – Special Issue, Paul Uhlir (ed), CODATA Data Science Journal,
(2007).
THE GROWING ROLE OF DIGITAL DATA IN THE
RESEARCH PROCESS
The evolution of scientific research may be characterised by an
accelerating growth in scale, scope, and complexity. These
developments in scientific research have been accompanied by a
substantial rise in costs. Overall expenditures on research and
development (R&D) in the OECD countries increased from $163.2
billion in 1981 to $679.8 billion in 2003 (in constant prices, 2000 dollars: from
$276.6 billion in 1981 to $638 billion in 2003).7
Not surprisingly, these trends also have elicited growing governmental
policy involvement in scientific research at both the national and
international levels. The research policy establishment has promoted
greater cooperation between public researchers and the private sector, as
well as greater international cooperation in public research.8 The
phenomenal growth of the cyberinfrastructure, particularly in OECD
countries, has been both a facilitator and accelerator of these trends. It
has further magnified the scale, scope, and complexity of scientific
research by enabling the integration of research participants and
information resources from multiple disciplines, sectors, and countries.
7 Organisation for Economic Co-operation and Development, OECD Main Science and
Technology Indicators (2005).
8 See, for example, Organisation for Economic Co-operation and Development, The
Knowledge-based Economy (1996).
Continuously growing quantities of data about the universe around us
are produced by government agencies, research institutions, and industry
as a fundamental component of scientific research worldwide.
Practically anything used for research purposes can be described and
stored in a digital database. A genomic sequence, the speed of
subatomic particles, a response in a social survey, the frequency of
nouns in a text corpus, and satellite images of other planets all are used
as research data. As described in the National Research Council
symposium on The Role of Scientific and Technical Data and Information in the
Public Domain in 2002:
The rapid advances in digital technologies and networks over
the past two decades have radically altered and improved the
ways that data can be produced, disseminated, managed, and
used, both in science and in all other spheres of human
endeavour. New sensors and experimental instruments
produce exponentially increasing amounts and types of raw
data. This has created unprecedented opportunities for
accelerating research and creating wealth based on the
exploitation of data as such … There are whole areas of
science, such as bioinformatics in molecular biology and the
observational environmental sciences, that are now primarily
data driven. New software tools help to interpret and
transform the raw data into unlimited configurations of
information and knowledge. And the most important and
pervasive research tool of all, the Internet, has collapsed the
space and time in which data and information can be shared
and made available, leading to entirely new and promising
modes of research collaboration and production.9
The production of a data set thus constitutes the first stage of improving
the knowledge of some part of nature and society for further research
and innovation. Rather than a linear process, however, the use of digital
data is better conceptualised as a series of dynamic ‘chain link’
feedbacks, broadening the usability of separate and related chains (see
Box 1). The increasing supply of data frequently may be useful for
purposes beyond those contemplated in the original collection. Many
publicly funded data can be of great value for reuse by a broad range of
public and private researchers, other types of socioeconomic
applications, and the general public.
9 Paul Uhlir, ‘Discussion Framework’ in Julie Esanu and Paul Uhlir (eds), The Role of Scientific and
Technical Data and Information in the Public Domain (2003) 3.
BOX 1
Research data: their place in the research process
For most of the history of science, scientific data were usually
inextricably embedded in an all-embracing research process.
Researchers mostly collected and used their own data in their own
research projects and had access to few external data sources. However,
with the advent of digital technologies and networks, together with the
growing scale and scope of research activities worldwide, the various
parts of the research trajectory have been loosened into separate
specialised activities (for example, data collection or technical
support) that may be executed by different entities, in-house or outside
the research institute. In large-scale research, specialised data service
institutes may operate independently from the research projects they
serve. Different parties will have differing responsibilities and may have
differing claims on ‘their’ parts of the trajectories. The various phases of
the research process, including the upstream data management process,
may be subject to different policies, regulations, and legislation. This
diagram shows the main elements of the research and data trajectories.
[Diagram not reproduced: The Research Trajectory and The Data Trajectory, with possibilities for data sharing once primary data have been collected.]
The changes in the research process have not only been quantitative, but
qualitative as well, leading to discoveries never before possible. For
example, hitherto unconnected data elements can be assembled into
unexpected new results. The research strategy developed by Rita
Colwell, former Director of the U.S. National Science Foundation, in
her studies on cholera is a case in point.10 By combining large sets of
data on sea life, earth observation, historical epidemiology, DNA
analyses, and social anthropology she was able to demonstrate disease
patterns that, without the use of ICT tools and access to all the diverse
data, would have remained invisible. What is clear is that digital data
play a central part in the emerging global science system and in the
promise of e-Science. And while most of the palpable progress to date
has occurred in the more economically developed countries, the biggest
payoffs from this new research paradigm could take place in the
developing world.
These major changes in the structure and conduct of data-driven
research using the cyberinfrastructure, however, result in an increasing
need for rational organisation and planning. A more transparent and
predictable environment for access to and use of data resources would
help to optimise the national and international research system.
THE EMERGING ROLES OF STAKEHOLDERS IN THE
GLOBAL SCIENCE SYSTEM IN DEVELOPING DATA
ACCESS REGIMES
Changes in the scientific research process are coupled with changing
roles of the interdependent parties responsible for science policy and
research management. Here we briefly examine the roles of these
different stakeholders with regard to public science data policy and
management in the context of the cyberinfrastructure. There are formal
organisations, associations, and individuals involved at different
(inter)national levels in the digital data activities. They represent specific
economic, social, national, personal, and scientific interests, and play
roles as experts and managers of research. These stakeholder groups all
affect the development (or not) of data access regimes, both directly
through governmental and institutional data management and policy
implementation, and indirectly through normative and behavioural
influences.
10 Rita Colwell, ‘A Global Thirst for Safe Water: The Case of Cholera’ (Speech delivered at the
Abel Wolman Lecture at the National Academy of Sciences, 2002)
<http://www7.nationalacademies.org/wstb/2002_Wolman_Lecture.pdf>.
Governments are responsible for the legal and regulatory framework in
which the research system operates, as well as for funding it with the
taxpayers’ money. Governments have core responsibilities for general
public information rights, including overall policy over national science
systems. More specifically, governments claim responsibility for overall
policy over national science and innovation systems as a public good (for
example research for public health, national security, general
advancement of knowledge, and socioeconomic development). As
funders of research, they have an interest in promoting accountability
for the cost effectiveness and management of their public investments in
research. Governmental policies are thus crucial for establishing a
rational framework for managing and implementing the national science
system and international scientific cooperation, most of which is now
entirely dependent on digital networks. To the extent that public
scientific data (and other types of information) are fundamental
components of the modern research enterprise, governments have a
responsibility to establish the policy framework in which the research
organisations function and enable the rational development and
exploitation of those information resources. This involves a balance
between protecting and stimulating competitive and cooperative values
at different levels of the research system.
Research funding agencies are responsible for the actual allocation of
taxpayer funds to the various research activities. They are accountable
for the support and performance of the national science system. They
comprise the experts who must develop and implement national
research strategies and funding priorities in consultation with key
representatives of the scientific community. Research funding agencies
are also responsible for the more detailed allocation of public research
funds, the support of specific elements of the research infrastructure
(the people, facilities, and equipment), and the formation of policies
specific to their constituencies. Digital science increasingly requires such
specific policy and infrastructure support for networks, computing
facilities, and institutional mechanisms for storing and making available
the digital inputs and outputs of public research. This responsibility
includes the possible establishment of specialised data centres, both
within the funding agencies themselves and with their support at other
research institutions. As the research funding agencies decide on the
funding priorities, they are in a powerful position to influence the overall
data policy and management regimes for the research institutions that
they create or support.
Universities and not-for-profit research institutes manage their
employees’ implementation of publicly-funded research programs and
projects, subject to academic norms and the guidance of the sources of
their funding (both public and private, and internal and external). These
functions include support and management of ICT facilities and the
resulting data collections and repositories for publications. Many
academic research institutions now manage a large number of individual
databases—as well as specialised data centres and more comprehensive
institutional repositories and libraries—that are funded in whole or in
part with public money. Whether or not they have a data centre, they
have a responsibility for establishing policies for the access to and use of
their expanding amounts and types of research data and information.
These policies must be consistent with the requirements and interests of
their funding sources, researchers, and other institutional stakeholders,
and with the broader research community in which these institutions
operate. Widespread uncertainty about possible conflicting interests and
tasks of multiple stakeholders makes the establishment of data access
policies at research institutions crucial, though difficult. They require
consistency at the higher policy level, as well as flexibility at the
implementation level.
Learned and professional societies represent the formal side of the
otherwise more loosely defined research communities. They provide a
focal point for interaction and communication by their particular
discipline communities, especially at the national level. They are major
players in developing scientific norms, values, and standards, such as
those on academic freedom and scientific responsibilities and, increasingly,
on access to data produced by members of their research communities.
They provide concentrated expert resources that combine the
perspectives of the larger-scale changes in the operation of the science
system with the first-hand experience from the specific changes in the
day-to-day research practice in their discipline areas. The societies
promote their views within their own communities by establishing
formal and informal policies and codes of conduct for their members,
through major conferences and their journal publications, and externally
through interactions with policy makers and research managers.
International scientific organisations have a role similar to the
learned societies, but at regional or global levels. The international
non-governmental scientific organisations (NGOs) must be distinguished
from the intergovernmental organisations (IGOs). Among the IGOs
relevant in this context are the Organisation for Economic Co-operation
and Development (OECD), and some of the specialised agencies of the
United Nations, such as the United Nations Educational, Scientific, and
Cultural Organisation (UNESCO). Relevant NGOs include the
International Council for Science (ICSU), the interdisciplinary
Committee on Data for Science and Technology (CODATA), the
InterAcademy Panel on International Issues (IAP), and the Academy of
Sciences for the Developing World (TWAS). These organisations have
the subject matter interest and expertise to develop improved data
policies and practices, as well as important contacts with the policy and
research communities to promote them.
Industry research institutions generally benefit from greater access to
scientific data produced by others. Traditionally, industrial laboratories
and researchers tend to keep their own data outputs proprietary and
inaccessible to other scientists and engineers. Keeping proprietary data
inaccessible might entail lost opportunity costs for the owners as they
will not be able to benefit from the results of additional research by
other experts using those data. Industry research institutions
increasingly outsource research to universities, however, partnering with
university researchers while often keeping the data on a proprietary basis.
Industry-academic research partnerships are growing because of public
policies favouring such arrangements and economic pressures on both
academic and industrial research organisations. Public-private research
partnerships may further complicate the management of the resulting
data and the optimal allocation of rights to those data, as discussed
further in the article.
Individual researchers generate increasing amounts and types of data,
both as individuals and as participants in various kinds of formal and
informal collaborations. Individual researchers sometimes show a
different attitude to accessing data from colleagues for their own
research than towards sharing ‘their’ data with colleagues. The informal
culture at the working research level, with its strategic relations among
researchers that are often invisible to outsiders, is dominated by
traditions that in many cases have not yet caught up with recent
developments in data policies and data management. However, much of
the formal decision making on data access and sharing increasingly takes
place at the institutional level. As the main producers and users of
public scientific data, individual researchers ultimately have the greatest
stake in the development of rational data access regimes and in the
adequate funding and management of data collections and centres.
Because researchers typically have been at the forefront of both
developing and using the ICT infrastructure, they also have been some
of the most influential players, together with their employing
institutions, in creating new models of data access regimes from the
bottom up. A great deal of data exchange and collaboration takes place
informally on the internet between scientists as a result of their personal
and professional relationships and in support of their respective research
activities. Many researchers also have become part-time or specialised
data managers.
The general public includes the taxpayers whose money is invested in
public research and related data activities. Society in general has a strong
interest in seeing that the fruits of those investments are effectively
managed and used. The lay public generally is not concerned directly
with the policy and management issues pertaining to national R&D, or
to data from publicly funded research. Nevertheless, action groups of
citizens may get involved in data access issues for various specific
reasons and circumstances (e.g., local environment, health, or consumer
safety). Increasingly, journalists do their own analyses of datasets used
in the social sciences and the humanities. Moreover, with the broad
public access to the internet in many countries, the potential user base
for many kinds of public research data has expanded greatly, adding a
further important dimension to the data policy debate, as discussed
further on in the article.
Each of these major stakeholder groups in the research enterprise has a
major and growing interest in the development of more effective policies
for access to and use of publicly funded research data. Although the
sharing of data resources in networked cooperation has become
standard practice in some fields, particularly in the more economically
developed countries, in many cases researchers and their institutes
experience too much uncertainty and too many barriers to make the most
effective use of the new possibilities. This situation is exacerbated in less
developed countries that also have less fully developed technical and
human infrastructure for research, as well as institutional mechanisms
and policy frameworks.
THE HIDDEN COSTS OF CLOSED DATA SYSTEMS
As described in Box 1, digital research data are emerging in the research
system as autonomous resources, the uses of which are no longer tied
solely to their original producers or purposes. There are, of course, data
that have little value outside the narrow research project for which they
were collected or that are not useful for lack of quality, insufficient
documentation, or other deficiencies. Many types of data, however, can
be used beyond the ambit of the original producers and users in diverse
and unlimited ways, at different times and places, and potentially by
anyone with access to the ICT infrastructure. The sharing of public
research data opens up new opportunities to raise the quality and
productivity of research, but the full realisation of this potential requires
additional attention to data policy and practice.
At the same time, there are competitive values and other legitimate
reasons for restricting access to data from publicly funded research,
which are reviewed further on in the article. The different stakeholders
involved may perceive conflicting interests when considering the
benefits and drawbacks of open access to data. Many researchers tend
to treat the data they produce through publicly-funded research as
individual or institutional property, and this view frequently is reinforced
by the lack of adequate policy guidance from their public funding
sources.
There are, however, a number of negative implications11 to the efficiency
and effectiveness of the research system from unnecessarily balkanised
and closed access regimes in light of the (quasi) public good12 nature of
such digital data resources.
11 J Reichman and Paul Uhlir, ‘Database Protection at the Crossroads: Recent Developments
and Their Impact on Science and Technology’ (Spring 1999) 14 (2) Berkeley Technology Law
Journal 819–21.
12 Both the public nature of the research and the resulting data have public-good characteristics.
A public good is both non-rival and non-excludable. The former means that it costs nothing to
provide the good to another person once someone has produced it (in other words, it has a
zero marginal cost of distribution). The latter refers to the characteristic that once such a good
is produced, the producer cannot exclude others from benefiting from it. Inge Kaul et al,
‘Defining Global Public Goods’ in Kaul et al (eds), Global Public Goods: International Cooperation in
the 21st Century (1999). Public research and publicly funded scientific data on digital networks
may be considered as ‘quasi public goods’ in that they are to a certain degree appropriable,
although they nonetheless have public-interest characteristics that make them capable of
production only if subsidised by public funding. See Michel Callon, ‘Is Science a Public
Good?’ (1994) 19 Science, Technology and Human Values 395.
Higher research costs
Most obviously, restricting access imposes structural inefficiencies and
higher research costs. Many factual databases cannot or should not be
independently recreated, either because they contain observations of
unique phenomena or historical information, or because they cost a great
deal to generate.13 Moreover, databases with a monopoly status that are
maintained on a closed proprietary basis will tend to result in higher,
anti-competitive pricing.14 Managing publicly funded databases on a
restrictive, proprietary basis also adds substantial administrative
overhead on both ends of each transaction, further taxing the
public research system. The problem is exacerbated when public
institutions license data to other public institutions at high cost and
with restrictions.
Lost opportunity costs
Perhaps less obviously, much less data-intensive research is
possible if the publicly-funded data are not shared or made easily
available online. This results in significant lost opportunity costs that are
certain to occur, but are difficult to measure.15 A simple analogy might
suffice to illustrate this effect. Just as it would hardly be cost-effective
research management to limit the use of a telescope or an accelerator to
the researchers and engineers who designed the instrument, it is a waste
of effort and money to limit the use of data to the researchers
responsible for their original collection and lose the potential benefits of
greatly expanded applications for those data that may have some broader
utility.
13 National Research Council, A Question of Balance: Private Rights and the Public Interest in Scientific
Databases (1999) 19–20.
14 Peter Weiss, ‘Conflicting International Public Sector Information Policies and Their Effects
on the Public Domain and the Economy’ in Julie Esanu and Paul Uhlir (eds), The Role of
Scientific and Technical Data and Information in the Public Domain (2003) 129–32; and J Reichman and
Paul Uhlir, ‘Database Protection at the Crossroads: Recent Developments and Their Impact on
Science and Technology’ (Spring 1999) 14 (2) Berkeley Technology Law Journal 819–21.
15 It is difficult to determine what might have been possible if only the data were openly
available. This was analysed in at least one instance when the U.S. Landsat program was
privatised in the mid-1980s. National Research Council, Bits of Power: Issues in Global Access to
Scientific Data (1997) 121–24.
Barriers to innovation
The production downstream of copyrightable or patentable intellectual
goods by both the public and private sectors depends to a large extent
on access to the free flow of upstream public factual data and
information. The overprotection or unavailability of public databases
leads to deadweight social costs, taxing the innovation system in each
country and slowing scientific progress.16
Less effective cooperation, education, and training
A failure to make research data easily available, or erecting barriers that
are too high, necessarily results in less effective interdisciplinary, inter-
institutional, inter-sectoral, and international cooperation. In the same
way, students may be less effectively educated and trained if they are
unable to work with a broad cross-section of data. These barriers are
reinforced in many cases by myopic policies that provide access and
restricted use for a small number of pre-approved investigators formally
associated with specific research projects and programs, even at an
international level, while greatly constraining both access and use of
those data by researchers and other potential users in ‘non-approved’
disciplines, institutions, sectors, and nations.
Sub-optimal quality of data
Data organised in a closed environment frequently will be subject to a
process of validation and verification from a substantially smaller and
less diverse scientific community than data that are openly available.
This will increase the risk of lower data quality and, consequently, of
lower-quality research outcomes. Less comprehensive opportunities for
quality control will diminish the return on investments in data as well as
research.
16 J Reichman and Paul Uhlir, ‘A Contractually Reconstructed Research Commons for Scientific
Data in a Highly Protectionist Intellectual Property Environment’ (Winter/Spring 2003) 66 Law
and Contemporary Problems – Duke University School of Law 410–16.
Widening gap between OECD nations and developing
countries
Developing countries are particularly disadvantaged by a lack of
availability or high barriers to access. Although not all databases
produced in OECD countries are relevant in less developed ones, either
because of their subject matter or geographic focus, those that do have
broad applicability as a global public good will typically be unused in the
developing world if there is a high price for access and, in many cases,
if there is any charge at all.
Unnecessary access barriers to publicly funded research data therefore
result in diminished returns on the social and scientific capital
investments in public research and in the inefficient distribution of
benefits from those investments, even as improving technological
capabilities offer ever greater opportunities to increase that return.
THE SCIENTIFIC AND SOCIOECONOMIC BENEFITS
OF GREATER OPENNESS
In view of the trends and the role of public data in science discussed
above and the inefficiencies of the current ad hoc system, there are many
compelling reasons for developing more comprehensive access regimes
at the institutional, national, and international levels, with open access as
the default rule. This is the case whether the data are produced within
government or by entities funded by government sources, although
some important distinctions apply, as outlined below.
Open access in the context of public research data may be defined as
access on equal terms for the international research community, as well
as industry, with the fewest restrictions on (re)use, and at the lowest
possible cost.17
This definition is also consistent with the ‘full and open’ data policy used
in various international environmental projects and in environmental
(and other) research in the United States over the past two decades.18
17 Preferably at no more than the marginal cost of dissemination (the cost of fulfilling a user
request), which is essentially zero online.
18 National Research Council, Bits of Power: Issues in Global Access to Scientific Data (1997) 1, 15–16.
Because the value of scientific data lies in their use, open access to and
sharing of data from publicly-funded research offers many advantages
over a closed, proprietary system that places high barriers to both access
and subsequent re-use. Open access to such data:
• reinforces open scientific inquiry,
• encourages diversity of analysis and opinion,
• promotes new research and new types of research,
• enables the application of automated knowledge
  discovery tools online,
• allows the verification of previous results,
• makes possible the testing of new or alternative
  hypotheses and methods of analysis,
• supports studies on data collection methods and
  measurement,
• facilitates the education of new researchers,
• enables the exploration of topics not envisioned by the
  initial investigators,
• permits the creation of new data sets, information, and
  knowledge when data from multiple sources are
  combined,
• helps transfer factual information to and promote
  capacity building in developing countries,
• promotes interdisciplinary, inter-sectoral,
  inter-institutional, and international research, and
• generally helps to maximise the research potential of
  new digital technologies and networks, thereby
  providing greater returns from the public investment in
  research.19
Open access to factual data plays a vital enabling role in all these areas.
Creating a level playing field for researchers and their institutes is
19 See, for example, S E Feinberg, M E Martin, and M L Straf (eds), Sharing Research Data (1985);
National Research Council, A Question of Balance: Private Rights and the Public Interest in Scientific
Databases (1999); and Arzberger et al, ‘Promoting Access to Public Research Data for Science,
Economic, and Social Development’ (2004) CODATA Data Science Journal, 135–52.
impossible without broad and effective access to publicly funded
research data. Nevertheless, there are essential distinctions to be made
between data produced by government entities and by entities funded by
government sources, as well as across disciplines and types of data.
Moreover, there may be important and legitimate reasons for not
making publicly funded research data openly accessible, but rather
keeping them secret or proprietary, at least for limited times and in
specific circumstances. These nuances and exceptions are complex, but
important to understand in the development of access regimes. We
touch on them only briefly below.
Policy Considerations for Data Produced by Government
Entities
The data and databases generated directly through government research
have the following additional policy considerations favouring their open
availability and unrestricted reuse20:
Legal considerations
Consistent with Article 19 of the Universal Declaration of Human Rights,
national law on information rights should include public access to data
and information produced by the government, and related freedom of
expression by the public. Moreover, a government entity needs no legal
incentives from exclusive property rights to create the data. Both the
activities that the government undertakes and the information produced
by it in the course of those activities are a public good, properly in the
public domain. Data produced through public research frequently have
global public-good characteristics.21
Socio-economic considerations
Open access is the most efficient way to disseminate public data and
information online in order to maximise the value and return on the
20 Paul Uhlir and UNESCO, Policy Guidelines for the Development and Promotion of Governmental
Public-Domain Information (2004) 49.
21 See, for example, Dana Dalrymple, ‘Scientific Knowledge as a Global Public Good:
Contributions to Innovation and the Economy’ in Julie Esanu and Paul Uhlir (eds), The Role of
Scientific and Technical Data and Information in the Public Domain (2003) 35–51.
public investment in its production.22 There are numerous economic
and non-economic positive externalities—especially through network
effects—that can be realised on an exponential basis (though they may
be difficult to quantify) through the open dissemination of public-
domain data and information on the internet.23 Conversely, the
commercialisation of public data on an exclusive basis produces de facto
public monopolies that have inherent economic inefficiencies and tend
to be contrary to the public interest.
Ethical considerations
The public has already paid for the production of the information. The
burden of fees for access falls disproportionately on the poorest and
most disadvantaged individuals (and researchers), including those in
developing countries when the information is made available online.
This is an important consideration for public, governmental scientific
data that constitute a global public good.
Good governance considerations
Transparency of governance is undermined by restricting citizens from
access to and use of public data and information created at their expense
and on their behalf. Rights of freedom of expression are compromised
by restrictions on re-use and re-dissemination of public information. It
is no coincidence that the most repressive political systems make the
least amount of government information, especially factual data, publicly
available.
Although there are strong arguments in favour of a default rule of
openness in support of publicly-funded research, at the same time there
are various legitimate, countervailing policies that may limit the free and
unrestricted access to and use of government information, including
22 Joseph Stiglitz et al (commissioned by the Computer and Communications Industry
Association), The Role of Government in a Digital Age (2000).
23 Joseph Stiglitz et al (commissioned by the Computer and Communications Industry
Association), The Role of Government in a Digital Age (2000). See also Peter Weiss, ‘Conflicting
International Public Sector Information Policies and Their Effects on the Public Domain and
the Economy’ in Julie Esanu and Paul Uhlir (eds), The Role of Scientific and Technical Data and
Information in the Public Domain (2003) 129–32; Commission of the European Communities
(European Union), Public sector information: A key resource for Europe (1998); and PIRA
International for the Directorate General for the Information Society (European Union),
Commercial Exploitation of Europe’s Public Sector Information, Final Report (2000).
research data. For example, there are statutory exemptions to public
access and use based on national security and law enforcement concerns,
the need to protect personal privacy, and to respect confidential
information (plus other exemptions to Freedom of Information laws,
where applicable).24 Government agencies also should respect the
proprietary rights in information originating from the private sector that
is made available for government use, unless expressly exempted.
Governments may also adopt policies against competing directly with
the private sector in providing certain information products and services.
‘Emerging Open Access Models’ examines more explicitly some of the
additional factors that need to be considered in limiting disclosure of
data in research funded by the government.
Policy Factors to Consider in Disseminating Government-
Funded Research Data
The access policies for research data produced by non-governmental
entities with government funds25 have rationales similar to those
outlined above for government-produced data. There are additional
factors that may come into play, however.
In some areas of research or in certain research programs, the recipient
of a government grant or contract may be granted exclusive use of the
research data for a specifically established period or until publication of
the research results. These policies vary across disciplines, institutions, and
countries, and in many cases there are no expressly stated, formal rules,
just community practice and norms. In some instances, it is appropriate
for data to be withheld even after publication, either because of
confidentiality or privacy requirements, or because the underlying data
are part of a longitudinal study spanning many years. However,
generally accepted scientific norms and the exigencies of the scientific
process that require access to data underlying published results for the
purpose of independent verification make disclosure of such data
24 For a compendium of freedom of information laws and their exceptions, see
<http://www.freedominfo.org>.
25 This is certainly the case when public sources provide 100 percent of the funding. As the
percentage of public funding in any given research project diminishes, the corresponding
rationale and arguments for full policy control become weaker as well.
following publication an essential prerequisite for sound science, even if
there is no formal rule in place.26
Moreover, open access to research data will not in itself result in
usability. Optimum accessibility and usability presuppose a trajectory of
proper organisation and curation of a database with ‘added’ value, which
also adds costs to its production. Investments in preparing factual data
for broader use may easily qualify for intellectual property protection
and require some source of funding for providing enhanced access to
other users. In most cases, however, there is a compelling reason to
develop legal and funding mechanisms that will actively promote public
accessibility to those publicly funded data resources. Such complications
strengthen the case for further cooperation among the different parties
involved in developing the policies and institutional mechanisms for
improved data management and access.
Some OECD countries or research funding agencies also have policies
that favour the commercialisation of government-funded research.27
For research areas in which commercial applications are inherent or
desirable, there will be additional motivations for the researcher to keep
the data proprietary and under conditions of trade secrecy, at least until
patent rights are secured. Furthermore, the non-governmental research
may involve a mix of public and private funds or partners, or include
parties from multiple countries, which can complicate the allocation of
rights in the research data. In such cases, the application of an open
access data policy also may be inappropriate, unless expressly agreed to
by all the participating parties.
The issues raised in public-private relationships take many forms and
contain some inherent tensions, such as openness versus exclusivity,
public goods versus private investments, public domain versus
proprietary rights, and competition versus monopoly, among others.
This mix of motivations, priorities, and requirements is context-
26 See, for example, National Research Council, Community Standards for Sharing Publication-Related
Data and Materials (2002).
27 Perhaps the best known of these is the 1980 ‘Bayh-Dole Act’ in the United States, which
states in part: ‘[i]t is the policy and objective of Congress to use the patent system to promote
the utilisation of inventions arising from federally supported research or development…[and]
to promote the collaboration between commercial concerns and non-profit organisations,
including universities…’, Public Law No 96–517, § 6(a), 94 Stat 3015 (1980), codified as
amended at 35 USC, § 200.
dependent, typically unique to the parties involved, and frequently not
amenable to inflexible statutory and regulatory frameworks. In such
cases, the ordering of the respective rights and interests of the parties
involved is most efficiently accomplished through contracts. Such
private agreements provide maximum flexibility within the larger
research policy context. What is especially important to emphasise here
is that such agreements can in many cases provide for conditionally open
access that advances the public interest goals associated with the public
funding, while effectively protecting existing proprietary private
interests.28
This bifurcated ordering of interests can take many forms. At the most
basic level, it is possible to provide free access for not-for-profit research
and education (and other) users, while restricting commercial users and
uses to a reimbursable, or even for-profit, basis. Various techniques of
price discrimination and product differentiation may be similarly
employed, based on factors such as time (for example, real-time access
for commercial users vs. delayed access for non-profits), scope of
coverage (for example, geographic or subject matter limitations), levels
of customer support or service, and other possible distinctions.29 Such
strategies can help promote scientifically and socially beneficial access
and use, not only in the complex public-private research relationships,
but even in exclusively private-sector settings.30
In addition to these complexities within the government-funded
academic and not-for-profit research context, there are important
distinctions that need to be made among different disciplines and types
of research. A major difference is between those areas of science that
are dominated by ‘big science’ research projects and programs, and those
that remain predominately ‘small science’ research endeavours,
performed by a single investigator (or small group).31 The former are
28 J Reichman and Paul Uhlir, ‘A Contractually Reconstructed Research Commons for Scientific
Data in a Highly Protectionist Intellectual Property Environment’ (Winter/Spring 2003) 66 Law
and Contemporary Problems – Duke University School of Law 410–16.
29 National Research Council, Bits of Power: Issues in Global Access to Scientific Data (1997) 124–6.
30 See generally, J Reichman and Paul Uhlir, ‘A Contractually Reconstructed Research
Commons for Scientific Data in a Highly Protectionist Intellectual Property Environment’
(Winter/Spring 2003) 66 Law and Contemporary Problems – Duke University School of Law Part IV.
31 Traditionally, ‘small science’ research was done primarily in experimental laboratory sciences,
such as chemistry and biology; in fieldwork studies such as ecology, anthropology, and various
typically cooperative, whereas the latter tend to be more competitive, or
at least insular. Most big science programs have instituted a formal data
access regime in established data centres, frequently on an open access
basis (as discussed further in ‘Emerging Open Access Models’), whereas
small science endeavours generally have no formal access rules governing
their research data.
Another key distinction across scientific disciplines is between the
observational and experimental sciences, where the types of data that
need to be preserved and made broadly available differ significantly.32
Typically, for observational data sets, it is the raw or minimally
processed data that have the greatest value for reuse in research, whereas
in the experimental sciences, it is the highly evaluated and verified data
that are preserved and made available for broad use.
Finally, as already noted for government-produced data, an important
distinction must be made between data collected on human subjects and
data on other, impersonal, subjects.33 Research data on human subjects
are restricted in various ways on ethical and legal grounds to protect
personal privacy.
The bottom line in all of these categories of research and data types,
however, is that open access to publicly funded research data should be
the default rule and operating presumption, rather than the exception,
and the exceptions to openness should be based on explicit, well-
justified grounds.
areas of social science; and in studies of human subjects, such as the biomedical and
behavioural sciences. The autonomous nature of the research, and in many cases the privacy
concerns associated with human studies, have precluded the sharing of data or the pooling of
small data sets in centralised repositories. Here the research has been more competitive than
cooperative and any exchanges of data were typically done on an informal, collegial basis, rather
than through some formally structured data access regime. With the advent of higher capacity
computing and digital networks, however, some of these research areas have organised ‘big
science’ research programs (for example, the human genome project) and become much more
data-intensive. They have established their own specialised data centres (for example, genomic
and protein data in molecular biology) or formed distributed data networks with nodes (for
example, ecological or biodiversity data). J Reichman and Paul Uhlir, ‘A Contractually
Reconstructed Research Commons for Scientific Data in a Highly Protectionist Intellectual
Property Environment’ (Winter/Spring 2003) 66 Law and Contemporary Problems – Duke University
School of Law 343–4 and 426–7.
32 National Research Council, Preserving Scientific Data on Our Physical Universe (1995) 34–6.
33 Organisation for Economic Co-operation and Development, OECD Guidelines on the Protection
of Privacy and Transborder Flows of Personal Data (1980).
EMERGING OPEN ACCESS MODELS
The presumption of openness and the implementation of an open access
policy as the default rule in publicly funded research is certainly not a
revolutionary concept. Not only are there solid justifications for such a
policy as outlined above, but there are innumerable examples of
successful implementations of this policy in practice in both government
and government-funded institutions, in many fields of research, and in
many countries. In this section we characterise these examples broadly
and provide a number of specific references. Box 2 identifies a range of
distributed, open, collaborative research and information production and
dissemination activities using digital networks,34 while Box 3 provides
details about one compelling example, identified in Box 2, of open
access to academic materials at a world-class university.
There are many new kinds of distributed, open collaborative research
and information production and dissemination on digital networks.
Examples of open data and information production activities include:
Box 2
• Open-source software movement (such as Linux and tens of thousands of
  other programs worldwide, many of which originated in academia and are
  developed for research purposes);
• Distributed Grid computing or e-Science (such as SETI@Home,
  LHC@home);
• Community-based open peer review (such as Journal of
  Atmospheric Chemistry and Physics); and
• Collaborative research Web sites and portals (such as NASA
  Clickworkers, Wikipedia, Curriki).
The following are examples of open data and information dissemination
and permanent retention:
34 Paul Uhlir, ‘The emerging role of open repositories for the scientific literature as a
fundamental component of the public research infrastructure’ in G Sica (ed), Open Access: Open
Problems (2006).
• Open data centres and archives (such as GenBank, the Protein
  Data Bank, The SNP Consortium, Digital Sky Survey);
• Federated open data networks (such as World Data Centers, Global
  Biodiversity Information Facility, NASA Distributed Active Archive
  Centers);
• Virtual observatories (such as the International Virtual Observatory
  for astronomy, Digital Earth);
• Open access journals (such as BioMed Central, Public Library of
  Science, plus more than 2,500 scholarly journals);
• Open institutional repositories for that institution’s scholarly works
  (such as the Indian Institute for Science, plus hundreds globally);
• Open institutional repositories for publications in a specific subject
  area (such as PubMedCentral, the physics arXiv);
• Free university curricula online (such as the MIT
  OpenCourseWare); and
• Emerging discipline-based commons (such as the Conservation
  Commons, the Geoscience Information Commons).
Box 3
The OpenCourseWare initiative at the Massachusetts Institute
of Technology
The digital revolution is transforming information economics in a radical
way. In the public science system one of the interesting trends is the
development of additional user bases for ‘secondary’ use of data,
information, and knowledge. When openly available, publicly funded
digital resources can have many new useful ‘lives’ in addition to their
primary uses. Use of the internet has minimised distribution costs.
Open access is a way of cutting transaction costs. Low access barriers
serve the original purposes of the public investment and increase the
return on the investment: a broader scientific workforce can be put to
work to get additional results without investments in additional
resources.
Low access barriers make it possible to meet an important demand that
cannot be served through traditional markets. For example, in 1999 the
Massachusetts Institute of Technology (MIT) investigated a business
model for selling its curriculum materials online. When it appeared that
there would be an insufficient market for this service, MIT did not
abandon the idea, but changed the original business model into one of
open access: the ‘OpenCourseWare’ initiative. The university now offers
free access to well over one thousand courses and has received hundreds
of millions of hits on its portal from educators, students, and self-learners
from all over the world. Of course, the project initially was greeted with
a great deal of apprehension among the MIT faculty, but eventually this
bold vision was accepted. As expressed by President Emeritus of MIT
Charles M Vest: ‘OpenCourseWare looks counterintuitive in a market-driven
world. But it really is consistent with what I believe is the best about MIT. It is
innovative. It expresses our belief in the way education can be advanced – by
constantly widening access to information and by inspiring others to participate.’
Together, these various open access activities constitute an emerging
globally networked ‘commons’ for public science, representing a broad
range of information types, institutional structures, disciplines, and
countries. A common policy aspect of all these activities is their
provision of free and open access online, with either reduced retention
of intellectual property rights through permissive licensing mechanisms35
or, much less frequently, a statutory public domain status.36
In the area of data from publicly funded research, there already are many
open access activities throughout the world, although no comprehensive
compendium currently exists. As indicated in Box 2 there are at least
two major types of institutional models specific to data: (1) open data
centres or archives, and (2) federated37 open data networks. The former
is a centralised model whereas the latter has a connected set of
distributed nodes. There are numerous examples of each type of open
35 For a selection of such permissive licensing templates, which use statutory intellectual
property protection, but with only ‘some rights reserved’ instead of all the rights accorded
under the statute, see the Creative Commons and its more recent Science Commons initiative
<http://www.creativecommons.org>.
36 The public domain status of factual data is a complex legal subject. Some countries expressly
exclude government-generated information from copyright. Moreover, under traditional
copyright law, factual compilations that lacked creativity or originality in their selection or
arrangement, like many of the databases that are the subject of discussion in this paper, were
not copyrightable and all the data in those compilations were in the public domain. However,
some jurisdictions had so-called ‘sweat-of-the-brow’ common-law protections (for example, the
United Kingdom and certain states in the United States), while others adopted more formal
statutory protection of non-copyrightable compilations (for example, the Scandinavian
Catalogue Rule). More recently, the European Union enacted exclusive property protection of
databases and compilations of information (Directive 96/9/EC of the European Parliament and the
Council of 11 March 1996 on the Legal Protection of Databases [1996] OJ L 077), which has been
implemented in all E.U. Member States and Affiliated States, as well as in some other countries.
This protection in most countries applies even to government and government-funded
databases. In most countries there are very limited exceptions for public-interest uses of data
(for example, for public scientific research or education), and in some jurisdictions (for
example, France, Italy, Greece) there are no exceptions at all. For a comprehensive description
and analysis of the E.U. Database Directive and its potential long-term effects on public
research, see J Reichman and Paul Uhlir, ‘Database Protection at the Crossroads: Recent
Developments and Their Impact on Science and Technology’ (Spring 1999) 14 (2) Berkeley
Technology Law Journal 819–21; and J Reichman and Paul Uhlir, ‘A Contractually Reconstructed
Research Commons for Scientific Data in a Highly Protectionist Intellectual Property
Environment’ (Winter/Spring 2003) 66 Law and Contemporary Problems – Duke University School of
Law 410–16.
37 This type of management structure for distributed scientific data archives and data centres
was first described in National Research Council, Preserving Scientific Data on Our Physical Universe
(1995) 51–3. This model was based on a ‘flat’ corporate management model described in
Charles Handy, ‘Balancing Corporate Power: A New Federalist Paper’ (1992) 70(6) Harvard
Business Review 59–72. The key elements of a federated management model are: subsidiarity (the
power is assumed to lie within the subordinate units of the organisation), pluralism
(interdependence of members), standardisation of key elements to facilitate cooperation and
interoperability, a separation of powers (responsibilities), and strong leadership from a small
central directorate that is effective but not overbearing.
access data model operated either directly by government agencies or by
government-funded entities (universities and not-for-profit research
institutes).
Despite the successful adoption of open data access policies and
practices in many areas of public research, the application of such
regimes remains fragmented and inconsistent—a patchwork of
uncoordinated and largely disparate activities, many of which are ad hoc,
bottom-up endeavours. In too many cases, establishing satisfactory
arrangements for data access seems to go beyond the means and
imagination available at the working level. If finding adequate solutions
without outside help is too much trouble, the researchers involved may
easily succumb to passive risk avoidance. In view of the potential
benefits that can be derived from increasing and improving access to
such resources, establishing a more transparent and predictable
environment that is coordinated at the national and international levels is
desirable.
Some science policy leaders have begun to address these exigencies at
the national level. For example, China established the Scientific Data
Sharing Program in 2002.38 Canada launched a National Consultation
on Access to Scientific Research Data in 200439 and, that same year, the
Research Council of Norway released a white paper documenting the
important role of databases as a research infrastructure component.40 In
2005, the U.S. National Science Board called for an initiative to develop
a national policy framework for long-lived data collections,41 which was
followed up by the establishment of an Interagency Working Group on
Digital Data in the White House Office of Science and Technology
38 Jinpei Cheng, ‘Development of China’s Scientific Data Sharing Policy’ in Julie Esanu and
Paul Uhlir (eds), Strategies for Preservation of and Open Access to Scientific Data in China (2006). Also
discussed in the article by Guan-hua Xu in Open Data for Global Science – Special Issue, Paul Uhlir
(ed), CODATA Data Science Journal, (2007).
39 David Strong and Peter Leach (National Research Council), National Consultation on Access to
Scientific Research Data (2005) 82. Also discussed in the article by Sabourin and Dumouchel in
Open Data for Global Science – Special Issue, Paul Uhlir (ed), CODATA Data Science Journal, (2007).
40 The Research Council of Norway, The Need for Scientific Equipment, Databases, Collections of
Scientific Material, and Other Infrastructure (2004) report submitted as input to the White Paper on
Research (2005) Oslo (Abridged English version).
41 National Science Board (National Science Foundation), Long-Lived Digital Data Collections:
Enabling Research and Education in the 21st Century (2005) 64.
Policy.42 Most research funding agencies in the United States also have
developed data policy guidelines for their grantees that encourage data
sharing or deposits in established community data repositories, within
specific discipline or research program contexts. However, the existing
institutional policies still remain ad hoc and sub-optimally coordinated at
the national level in the United States, as in most other countries.
At the international level, initiatives such as the Budapest Open Access
Initiative, the Bethesda Declaration, and the Berlin Declaration,43
although focused more on open access to the scholarly journal literature
than to the data, have helped to pave the way for further national
policies. The new ‘Guidelines for Access to Research Data from Public
Funding’ from the OECD, endorsed by the governments of OECD
countries (as discussed towards the end of this paper44), may be expected
to play an important catalytic role.
While these incipient institutional models and policy approaches are
commendable indicators that the scientific community is awakening to
the opportunities and challenges of comprehensively rationalised data
access regimes in public science, a great deal more can and should be
done. And although the patchwork quilt of bottom-up data access
regimes has served some research communities well in some cases, this
loosely decentralised aggregation of approaches could achieve much
greater results from a concerted national and international policy and
funding focus.
TOWARD OPEN DATA REGIMES: GUIDING
PRINCIPLES AND FLEXIBLE CONTRACTUAL
TEMPLATES
The foregoing discussion has sought to develop a rationale for more
formalised data access policies and procedures in public research, based
on a core default principle of openness. The benign neglect of research
data and databases thus far has not been regarded as a significant policy
blunder. The most pressing database requirements seem to have been
met through the ad hoc resourcefulness and volunteerism of dedicated
individuals in public science.45 But the brief history of the digital age
already is replete with major losses of data and missed opportunities46
that are certain to multiply in the absence of sustained focus and action.
42 Declan Butler, ‘Agencies join forces to share data’ (2007) 446 Nature 354.
43 The Budapest Open Access Initiative (2002) is available at:
<http://www.soros.org/openaccess/read.shtml/>; the Bethesda Statement on Open Access
Publishing (2003) is available at: <http://www.earlham.edu/~peters/fos/bethesda.htm/>; and
the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities (2003) is available at:
<http://www.zim.mpg.de/openaccess-berlin/berlindeclaration.html/>.
44 This is also discussed in an article in Open Data for Global Science – Special Issue, Paul
Uhlir (ed), CODATA Data Science Journal, (2007).
As previously discussed, it also is important to recognise that public
policies in the developed and developing countries alike are shaped by
legitimate considerations and interests that do not leave all scientific
information and data in the public domain or under pure open access
conditions. Instead, they impose limitations upon openness and
cooperation in the conduct of public research and the utilisation of its
findings, in varying degrees and for a variety of purposes. Consequently,
there is a need for public policies and institutional arrangements to seek
a judicious balance between positive and negative effects upon the
conduct of publicly funded research that are likely to ensue from the
granting and enforcing of private ownership rights in scientific and
technical data and information. Yet, in recent decades the policy balance
in this regard has been disrupted in ways that some science policy
analysts perceive as threatening the long-term vitality of fundamental
scientific research.47
A successful data access regime must involve a comprehensive
framework of policies and procedures that are based on a complete set
of supporting principles and guidelines. Areas that require attention in
developing principles and subsequent access regimes include
organisational and management, financial and economic, legal, socio-
cultural, and technical considerations.48 The costs of inaction in the
current state of affairs continue to accumulate, while the opportunities
provided by the emerging cyberinfrastructure and new science initiatives
will remain suboptimal.
45 Stephen Maurer, Richard Firestone and Charles Scriver, ‘Science’s neglected legacy’ (2000)
405 Nature.
46 See, for example, National Research Council, Bits of Power: Issues in Global Access to Scientific
Data (1997) 121–4.
47 See, for example, Paul Uhlir, ‘Discussion Framework’ in Julie Esanu and Paul Uhlir (eds), The
Role of Scientific and Technical Data and Information in the Public Domain (2003) 129–32.
48 Arzberger et al, ‘Science and Government: An International Framework to Promote Access
to Data’ 303 Science 1777–8.
Because of the diverse role of data in different fields of research, and the
diverse and sometimes competing interests of the different stakeholders
in the research enterprise, the formal data regimes need to be tailored to
specific circumstances, but managed for the greatest return on the public
investments. These conditions make it essential for most policy
directives from the top at the national and international levels to be
flexible and not rigidly prescriptive, while providing sufficiently strong
and comprehensive guidance to the entities at the working level to
implement effective regimes that are responsive to their particular
interests.
In this final section we examine some mechanisms that can improve
top-down guidance on the one hand, and bottom-up flexibility on the
other. The former are the high-level international principles that can
help guide the development of specific data access regimes at the
(inter)national level. The latter involve the practical implementation
through the development and voluntary adoption of new licensing
templates that rights holders can select as standard options to provide
access and use on less restrictive terms and conditions. We conclude
with a brief overview of a major new initiative that seeks to integrate
more effectively the top-down and bottom-up approaches.
Guiding principles
A good starting point for regulation at the more general level is the
development of international principles, based on consensus by the
national participants, which can help provide guidance to the
governments, the public agencies, institutions, and individual researchers
engaged in publicly funded research worldwide.49 Coherent, consensus-based
international principles, building on the experience of established
successful models, should provide a number of benefits. They indicate
the collective importance placed by science leaders in the national
governments on public research data issues. They can articulate a
rationale and responsibility for improving the management and funding
of the public data resources. They can provide guidance for the
development of new access regimes based on a common set of values
and objectives. And they can help establish an international level playing
field for research and industry. The end result may be expected to be
a higher return on public investments in research and substantial
increases in productivity and cost-effectiveness.
49 One example of this type of consensus-building international process is the OECD Ministerial
Declaration on Access to Research Data from Public Funding of 30 January 2004 and the 2007 OECD
Guidelines that followed it, as described by Pilat and Fukasaku in Open Data for Global Science –
Special Issue, Paul Uhlir (ed), CODATA Data Science Journal, (2007). The Declaration was inspired
by the successful examples of data sharing on the (inter)national and institutional levels. The
science ministers agreed that OECD guidelines would contribute to reaching common science
policy goals by improving the quality and productivity of scientific research and increasing the
cost effectiveness of public investment in scientific research. The essence of the Declaration lies
in the Principles that systematically treat the main points of the data access issues that have
been worked out in subsequent Guidelines.
The development of overarching international principles that cover
publicly funded research data in many countries must, of course, be
restricted to the essentials. Across the many different countries,
disciplines, and institutes, complete compliance with the principal rules
will be difficult, and there will always be exceptions. Context-dependent
solutions will have to be found, but these exceptions cannot and should
not all be part of the principles. The perspective can
only be that of stating the default rules, including the core openness
principle. Applying the principles and working out the specific details
will be the responsibility of the stakeholders identified above—the
national governments, public research funding agencies, and universities
and public research institutes—in collaboration with the research
community, as represented by the learned societies and the private
sector. The principles therefore should offer the general international
guidance for further regulation by the parties more directly involved.
The principles should not conflict with national legislation, nor harm
other national, institutional, or individual interests. Strong, simple
principles should be distilled from a much more extensive body of input
and from a broad consultative process.
At the level of international science policy, principles represent the
broadest common denominator of existing policies and (best) practices.
But from this common ground they should guide emerging processes of
change. International principles ultimately may look like abstract
noncommittal generalities, but they can empower those who have to
find the practical solutions with the right guidance for implementation.
Finally, international principles should be part of a common policy
strategy to seize the new opportunities to increase the return on public
investment in research and enhance the productivity and quality of
research. The high-level principles should have primacy—they are the
Why in the process. The principles then need to be implemented in a
sensible access regime by the research organisations – the How in the
process.
Contractual templates for the flexible implementation of the
openness principle
To implement the general guiding principles, one way to deal with the
potential imbalance in the statutory intellectual property system is to
seek to amend the aspects that affect public research most negatively.
However, this is not easily done, especially in view of the fact that many
of these laws are quite recent and largely have ignored such
considerations as they were debated and enacted.
There is, however, another and rather different approach whose practical
aspects merit wide attention and support for its further development.
The proposed approach consists of the voluntary use of the rights held
by intellectual property owners, which allow them to construct by means
of licensing contracts conditions of ‘common-use’ that emulate the key
features of the public domain that are most beneficial for collaborative
research in all its forms. The intention is to promote the cooperative use
of scientific data, information, materials and research tools that actually
are not in the public domain, and whose licensed use is therefore legally
protected by an intellectual property regime. Such an undertaking may
be properly described as creating ‘global information commons for
science’, inasmuch as a ‘common’ constitutes a collectively held and
managed bundle of resources to which access by cooperating parties is
rendered open (though perhaps limited in its extent or use) under
minimal transactions cost conditions.
The economic logic and practical feasibility of the ‘contractually
constructed commons’ approach can be derived from non-market
mechanisms constructed as systems of customary rights and restraints.
Historically, it was deliberate acts of private enclosure rather than some
imagined tragedy of over-grazing that often spelled the end of the
agrarian commons. The legal system today makes it possible for the
owners of a tangible resource held in common to protect their collective
use-rights, and manage their contractually constructed common-pool so
as to sustain and augment the benefits that it yields. Consequently,
because information cannot be depleted by overuse, individuals having
private ownership rights in intellectual property may voluntarily use
contracts to construct a common use-rights area that is all inclusive, in
granting access to those wishing to use the contents. Furthermore, and
because the common in this case is owned and not part of the public
domain, the benefits that all users can enjoy from such an arrangement
may be preserved and enhanced. This can be accomplished by reserving
the legal right to exclude certain usage practices that might otherwise
undermine the willingness of others to similarly pool the information
that they have created.
The respective rights of the participants in the public research system
can be most effectively mediated through the use of contracts at the
individual researcher and institutional levels. Common-use licensing
approaches that promote broad access and reuse rather than restrict it,
such as those being developed by the new Science Commons under the
Creative Commons, mentioned in ‘Emerging Open Access Models’ above,
can preserve essential ownership rights while improving the social
benefits and returns on the public investments in research.50 They can
help to achieve a productive balance between the domains of proprietary
R&D and publicly funded open science.
TOWARDS GLOBAL INFORMATION COMMONS FOR
SCIENTIFIC DATA AND INFORMATION
The rationalisation of policies and practices across nations, institutions,
and disciplines may be expected to result in much greater social and
economic impact from the investment in public research overall by
enabling greater access to and use of scientific data and information
resources, and by facilitating interdisciplinary and international
cooperation in public science and education. Because of the
international scope of digital networks and research collaborations,
strategic international approaches for building information commons are
both necessary and desirable. In short, the adoption in recent years of
the many innovative and promising open initiatives and common-use
licensing approaches from the bottom up, coupled with the introduction
of some new top-down policy proposals at the international level (at the
OECD) and at the national level in several countries, makes this an
appropriate time to integrate these efforts.
50 See the companion article by Onsrud and Campbell in Open Data for Global Science – Special
Issue, Paul Uhlir (ed), CODATA Data Science Journal, (2007).
It is for all the reasons established in this article that several international
science policy organisations—CODATA, ICSU, and Science
Commons—are joining efforts to launch the Global Information
Commons for Science Initiative. This Initiative51 has the overall goal to
accelerate the development and scaling up of open scientific data and
information resources on a global basis, with particular focus on
‘common use’ licensing approaches. The specific objectives are to:
1. Improve understanding and increase awareness of the
societal and economic benefits of easy access to and use
of scientific data and information, especially focusing
on those resulting from governmental or publicly
funded research activities;
2. Promote the broad adoption of successful institutional
and legal models for providing open availability on a
sustainable basis and facilitating reuse of data and
information;
3. Help coordinate the efforts of the many stakeholders in
the world’s diverse research community who are
engaged in devising and implementing effective
approaches to attaining these objectives, with particular
attention to the circumstances of the developing as well
as developed countries;
4. Develop an online ‘open knowledge environment’ to
promote all of the objectives of the Initiative, including
providing an online collaboratory for work with
different research communities to define, test, analyse,
and create new knowledge about the information
commons paradigm.
In our view, such an Initiative can help devise and promote new
normative and legal structures for the exchange of data and information
that are expected to be especially well-suited for the future conduct of
collaborative research in many domains of science. By rationalising the
policy and management systems in publicly funded research, the value of
global digital networks and related technological advances to the
progress of science can be fully realised.
51 The original ideas for the Global Information Commons for Science Initiative were
presented in a series of reports published at the U.S. National Academies, in a seminal article by J
Reichman and Paul Uhlir, ‘A Contractually Reconstructed Research Commons for Scientific
Data in a Highly Protectionist Intellectual Property Environment’ (Winter/Spring 2003) 66 Law
and Contemporary Problems – Duke University School of Law 410–16 and in P David and M Spence,
Toward Institutional Infrastructures for e-Science: The Scope of the Challenges (Report to the Joint
Information Systems Committee of the Research Councils of Great Britain, Oxford Internet
Institute Report No 2, 2003)
<http://www.oii.ox.ac.uk/resources/publications/OIIRR_E-Science_0903.pdf>.
These ideas were more fully fleshed out following an international
workshop at UNESCO Headquarters in Paris on 1–2 September 2005 on the theme ‘Creating
the Information Commons for Science: Toward Institutional Policies and Guidelines for
Action’ (details of the Workshop rationale and proceedings are available at:
<http://www.codataweb.org/UNESCOmtg/index.html>). That event was organised by
CODATA with the joint sponsorship of ICSU, ICSTI, INASP, UNESCO, and TWAS, and
with the collaboration of the OECD.
... 3) Long term preservation (pelestarian) institutional repository menjamin pelestarian karya intelektual yang disimpan dalam format digital. 4) Easier and faster accessibility (aksesibilltas yang lebih mudah dan cepat) institutional repository dapat ditemukan melalui mesin pencari sehingga mudah diakses dan berdampak pada banyaknya pengguna dan sitasi, 5) Copyright (hak cipta) institutional repository tidak akan melanggar hak cipta, karena sivitas akademik dapat menyimpan karyanya dan tetap mempertahankan hak cipta pada karyanya (Uhlir and Schröder 2007). ...
Article
Full-text available
Institutional repository provides an opportunity for visitors to be able to access the collection or information owned by the library without having to come to the library. Through the institutional repository of the library helps promote the scientific publications produced by the academic community and business will be able to increase the number of citation from a publication. The management of scientific publications using the institutional repository is already contained in the Circular Letter of the Director general of higher education decree No. 152/E/T/2012 dated January 27, 2012 about the Publication of Scientific papers (S1, S2, and S3) and the Circular of the directorate of higher education No. 1864/E4/2015 dated 15 July 2015, concerning Credit Score Assessment (PAK) (Lecturer must be able to be traced online). But seeing the urgency the application or use of the institutional repository is not all institutions of higher education implement it. One of the only private high school in Yogyakarta . Interestingly the agency has not been applying the institutional repository.This study uses a case study research Methodology qualitative. A Single approach Grounded (embedded research), as a guide to conduct and determine the flow of the research. This research was conducted by direct interview, the main data Source in qualitative research are the words and actions of the informant is a librarian. However, to maintain the privacy and good name of the institution, the informant did not deign to mention the name of the college/ institution in the research. 
Results and discussion : 1) a Lot of Things That Should be Prepared; 2) the Limited Fulfillment of the Needs of Implementation; 3) the Demands On the Application of the Institutional Repository Conclusion : it is Necessary the presence of the ability of human resources management and technology resources and information as well as the commitment of the institution to the organization with the support of the entire academic community members in filling the content of institutional repository. As well as his involvement the role of librarians as a profession that is in charge of going to this documentation.Keywords: College, Institutional Repository, Library, Librarian, Policy. AbstrakInstitutional repository memberikan peluang bagi pemustaka untuk dapat mengakses koleksi atau informasi yang dimiliki perpustakaan tanpa harus datang ke perpustakaan. Melalui institutional repository perpustakaan membantu mempromosikan publikasi ilmiah yang dihasilkan oleh sivitas akademika dan usaha ini akan mampu meningkatkan jumlah sitiran dari publikasi. Pengelolaan publikasi ilmiah dengan menggunakan institutional repository sudah termuat pada Surat Edaran Dirjen Dikti No. 152/E/T/2012 tanggal 27 Januari 2012 tentang Publikasi Karya Ilmiah (S1, S2, dan S3) dan Surat Edaran Dikti No 1864/E4/2015 tanggal 15 Oktober 2015 perihal Penilaian Angka Kredit (PAK) Dosen (harus dapat ditelusur secara online). Namun melihat urgensinya penerapan atau penggunaan institutional repository tersebut belum semuanya lembaga perguruan tinggi menerapkannya. Salah satunya sekolah tinggi swasta di Yogyakarta. Menariknya lembaga tersebut belum menerapakan institutional repository. Penelitian ini menggunakan penelitian studi kasus dengan metodologi kualitatif. Pendekatan tunggal terpancang (embedded research), sebagai pedoman untuk melakukan dan menentukan alur penelitian. 
Penelitian ini dilakukan secara wawancara langsung, sumber data utama dalam penelitian kualitatif ialah kata-kata dan tindakan dari informan yakni pustakawan. Namun untuk menjaga privasi dan nama baik institusinya, informan tidak berkenan untuk menyantumkan/menyebutkan nama perguruan tinggi/ institusinya dalam penelitan. Hasil dan pembahasan : 1) Banyak Hal Yang Harus Disiapkan; 2) Tuntutan Terhadap Penerapan Institutional Repository; 3) Terbatasnya Pemenuhan Kebutuhan Implementasi. Simpulan: Perlu adanya kemampuan pengelolaan sumber daya manusia dan sumber daya teknologi dan informasi serta komitmen institusi terhadap penyelenggaraan dengan dukungan seluruh komponen sivitas akademik dalam mengisi konten institutional repository. Serta keterlibatannya peran pustakawan sebagai profesi yang membidangi akan hal dokumentasi.Kata kunci: Institutional Repository, Kebijakan, Perpustakaan, Perguruan Tinggi, Pustakawan.
... Open data also reduces the need for individual data collection. Researchers can skip the often costly and resource intensive steps of data collection (Uhlir and Schröder 2007). Hence, scholars can publish related studies quicker and therefore spend more time on other discoveries (Fischer and Zigmond 2010). ...
Article
Full-text available
We investigate what fosters or inhibits data sharing behaviour in a sample of 173 innovation management researchers. Theoretically, we integrate resource-based arguments with social exchange considerations to juxtapose the trade-off between data as a proprietary resource for researchers and the benefits that reciprocity in academic relations may provide. Our empirical analysis reveals that the stronger scholars perceive the comparative advantage of non-public datasets, the lower the likelihood of data sharing. Expected communal benefits may increase the likelihood of data sharing, while negative perceptions of increased data scrutiny are consequential in inhibiting data sharing. Only institutional pressure may help to solve this conundrum; most respondents would therefore like to see journal policies that foster data sharing.
... The U.S. OD initiative was inaugurated in 2009 by the President's Memorandum on Transparency and Open Government, followed by the U.K. government's initiative regarding OD in 2011 (Meijer et al., 2014). Although the majority of OD initiatives are in public sectors, OD is not limited to "open government" but to other fields, too, including science, economics, and culture (Aiello et al., 2019;Uhlir & Schröder, 2007). OD is also becoming important in research and has the potential to improve the governance of public institutions (Schalkwyk et al., 2016). ...
Article
Full-text available
Data are the most important resource of the 21st century. The open data (OD) movement provides publicly available data for the development of a knowledge-based society. As such, the concept of OD is a valuable information technology (IT) tool for economic, social, and human development, which adds value. To further develop these processes on a global scale, users need to manage the quality of OD in their practices. Otherwise, what is the point of using data just for the sake of using it (in science or practice) without thinking about data compliance with norms, standards, and so forth? This article aims to provide an overview of (meta)data quality dimensions, sub-dimensions, and metrics used within OD assessment-related research papers. To achieve this, the authors performed a systematic literature review (SLR) and extracted data from 86 relevant studies dealing with the evaluation of OD. The article endows the progress made so far in OD assessment research. Findings of reviewing the assessment of the OD in the light of existing (meta)data quality dimensions unveil the potential of metadata. Furthermore, the analysis disclosed the need for greater use of quantitative methods in research, and metadata can greatly assist in this.
... This requires sociological change such as incentives and reducing barriers to data sharing through citation and use metrics (Costello et al., 2013) and through supporting education and establishing community standards (Kattge et al., 2020;Michener, 2015). Ultimately, this would reduce research costs, improve collaborative efforts and increase research opportunities (Uhlir and Schröder, 2007). Many raw data records and long-tail datasets may not even be available in a digital spreadsheet (e.g. ...
Article
Full-text available
Trait data represent the basis for ecological and evolutionary research and have relevance for biodiversity conservation, ecosystem management and earth system modelling. The collection and mobilization of trait data has strongly increased over the last decade, but many trait databases still provide only species-level, aggregated trait values (e.g. ranges, means) and lack the direct observations on which those data are based. Thus, the vast majority of trait data measured directly from individuals remains hidden and highly heterogeneous, impeding their discoverability, semantic interoperability, digital accessibility and (re-)use. Here, we integrate quantitative measurements of verbatim trait information from plant individuals (e.g. lengths, widths, counts and angles of stems, leaves, fruits and inflorescence parts) from multiple sources such as field observations and herbarium collections. We develop a workflow to harmonize heterogeneous trait measurements (e.g. trait names and their values and units) as well as additional information related to taxonomy, measurement or fact and occurrence. This data integration and harmonization builds on vocabularies and terminology from existing metadata standards and ontologies such as the Ecological Trait-data Standard (ETS), the Darwin Core (DwC), the Thesaurus Of Plant characteristics (TOP) and the Plant Trait Ontology (TO). A metadata form filled out by data providers enables the automated integration of trait information from heterogeneous datasets. We illustrate our tools with data from palms (family Arecaceae), a globally distributed (pantropical), diverse plant family that is considered a good model system for understanding the ecology and evolution of tropical rainforests. We mobilize nearly 140,000 individual palm trait measurements in an interoperable format, identify semantic gaps in existing plant trait terminology and provide suggestions for the future development of a thesaurus of plant characteristics. 
Our work thereby promotes the semantic integration of plant trait data in a machine-readable way and shows how large amounts of small trait data sets and their metadata can be integrated into standardized data products.
... The digital revolution has transformed curated public research data into an essential upstream resource whose value increases with use. The use of Open Data [30] in artificial intelligence (AI) systems makes those systems highly replicable and allows users to gather data to replicate these systems tailored to their needs. Thus, in order to maximize the applicability of our model, our aim is to select Open Data that is publicly available and allows for the reusability of the model. ...
Article
Full-text available
With climate change driving an increasingly stronger influence over governments and municipalities, sustainable development, and renewable energy are gaining traction across the globe. This is reflected within the EU 2030 agenda, that envisions a future where there is universal access to affordable, reliable and sustainable energy. One of the challenges to achieve this vision lies on the low reliability of certain renewable sources. While both particulars and public entities try to reach self-sufficiency through sustainable energy generation, it is unclear how much investment is needed to mitigate the unreliability introduced by natural factors such as varying wind speed and daylight across the year. In this sense, a tool that aids predicting the energy output of sustainable sources across the year for a particular location can aid greatly in making sustainable energy investments more efficient. In this paper, we make use of Open Data sources, Internet of Things (IoT) sensors and installations distributed across Europe to create such tool through the application of Artificial Neural Networks. We analyze how the different factors affect the prediction of energy production and how Open Data can be used to predict the expected output of sustainable sources. As a result, we facilitate users the necessary information to decide how much they wish to invest according to the desired energy output for their particular location. Compared to state-of-the-art proposals, our solution provides an abstraction layer focused on energy production, rather that radiation data, and can be trained and tailored for different locations using Open Data. Finally, our tests show that our proposal improves the accuracy of the forecasting, obtaining a lower mean squared error (MSE) of 0.040 compared to an MSE 0.055 from other proposals in the literature.
Chapter
Full-text available
Different norms, rules and practices (referred as institutions) organize the exchange of germplasm to address broader global challenges such as advancement of science and innovation, food security, sustainable agriculture and global equity. Some of these institutions are now embedded in various treaties and national regulations. This chapter demonstrates that these regulations are not as successful as they could be because they only partially integrate the complexity of the germplasm exchange environment. In order to better understand how germplasm exchange could be improved, it is important to go beyond the often-employed legalistic approach to examine the social contexts in which exchange takes place.
Article
Making decisions regarding data and the overall credibility of research constitutes research data governance. In this paper, we present results of an exploratory study of the stakeholders of research data governance. The study was conducted among individuals who work in academic and research institutions in the US, with the goal of understanding what entities are perceived as making decisions regarding data and who researchers believe should be responsible for governing research data. Our results show that there is considerable diversity and complexity across stakeholders, both in terms of who they are and their ideas about data governance. To account for this diversity, we propose to frame research data governance in the context of polycentric governance of a knowledge commons. We argue that approaching research data from the commons perspective will allow for a governance framework that can balance the goals of science and society, allow us to shift the discussion toward protection from enclosure and knowledge resilience, and help to ensure that multiple voices are included in all levels of decision-making.
Chapter
Open Science not only means the openness of various resources involved in a scientific study but also the connections among those resources that demonstrate the origin, or provenance, of a scientific finding or derived dataset. In this chapter, the authors used the PROV Ontology, a community standard for representing and exchanging machine-readable provenance information in the Semantic Web, and extended it for capturing provenance in the IPython Notebook, a software platform that enables transparent workflows. The developed work was used in conjunction with scientists' workflows in the Ecosystem Assessment Program of the U.S. NOAA Northeast Fisheries Science Center. This work provides a pathway towards formal, well-annotated provenance in an electronic notebook. Not only will the use of such technologies and standards facilitate the verifiability and reproducibility of ecosystem assessments, their use will also provide solid support for Open Science at the interface of science and ecosystem management for sustainable marine ecosystems.
41 National Science Board (National Science Foundation), Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century (2005) 64.
David Strong and Peter Leach (National Research Council), National Consultation on Access to Scientific Research Data (2005) 82. Also discussed in the article by Sabourin and Dumouchel in Open Data for Global Science – Special Issue, Paul Uhlir (ed), CODATA Data Science Journal, (2007).
The Research Council of Norway, The Need for Scientific Equipment, Databases, Collections of Scientific Material, and Other Infrastructure (2004), report submitted as input to the White Paper on Research (2005), Oslo (abridged English version).
Jinpei Cheng, 'Development of China's Scientific Data Sharing Policy' in Julie Esanu and Paul Uhlir (eds), Strategies for Preservation of and Open Access to Scientific Data in China (2006). Also discussed in the article by Guan-hua Xu in Open Data for Global Science – Special Issue, Paul Uhlir (ed), CODATA Data Science Journal, (2007).