Paul N. Edwards
Steven J. Jackson
Geoffrey C. Bowker
Cory P. Knobel
January 2007
Report of a Workshop on “History & Theory of Infrastructure:
Lessons for New Scientific Cyberinfrastructures”
NSF Grant 0630263 • Human and Social Dynamics • Computer and Information Science and Engineering • Office of Cyberinfrastructure
Table of Contents
EXECUTIVE SUMMARY
I. INTRODUCTION
   BACKGROUND TO THE WORKSHOP
   THE LONG NOW OF INFRASTRUCTURE
   DEFINING CYBERINFRASTRUCTURE
   BUILDING CYBERINFRASTRUCTURE
II. DYNAMICS
   A HISTORICAL MODEL OF INFRASTRUCTURE DEVELOPMENT
   SYSTEMS VS. NETWORKS AND WEBS
   REVERSE SALIENTS
   GATEWAYS
   PATH DEPENDENCE
   SCALE EFFECTS
   APPLYING THE HISTORICAL MODEL TO CYBERINFRASTRUCTURE
III. TENSIONS
   INTEREST AND EXCLUSION
   OWNERSHIP AND INVESTMENT MODELS
   DATA CULTURES, DATA TENSIONS
IV. DESIGN
   STANDARDS AND FLEXIBILITY
V. CONCLUSIONS AND RECOMMENDATIONS
   RECOMMENDATIONS
      Learning from cyberinfrastructure
      Improving cyberinfrastructural practice
      Enhancing resilience, sustainability, and reach
VI. BIBLIOGRAPHY
APPENDIX A. CONFERENCE PARTICIPANTS
Understanding Infrastructure
i
Executive Summary
National Science Foundation support for scientific cyberinfrastructure dates to the
1960s. Since about 2000, however, efforts in cyberinfrastructure development have
gathered momentum, guided by an increasingly comprehensive vision. Yet assembling
the range of NSF-sponsored projects into a genuine infrastructure — highly reliable,
widely accessible basic capabilities and services supporting the full range of scientific
work — remains an elusive goal. Close study of other infrastructures, from railroads and
electric power grids to telephone, cellular services, and the Internet, provides insights
that can help guide and consolidate the NSF vision.
Since the 1980s, historians, sociologists, and information scientists have been studying
how and why infrastructures form and evolve; how they work; and how they (sometimes)
disintegrate or fail. In September 2006, a three-day NSF-funded workshop on “History
and Theory of Infrastructure: Lessons for New Scientific Cyberinfrastructures” took place
at the University of Michigan. Participants included experts in social and historical
studies of infrastructure development, and domain scientists, information scientists, and
NSF program officers involved in building, using, and funding cyberinfrastructure. The
goal was to distill concepts, stories, metaphors, and parallels that might help realize the
NSF vision for scientific cyberinfrastructure. This report summarizes the workshop
findings, and outlines a research agenda for the future.
Social and historical analyses reveal some base-level tensions that complicate the work
of infrastructural development. These include:
• Time, e.g. short-term funding decisions vs. the longer time scales over which
infrastructures typically grow and take hold
• Scale, e.g. disconnects between global interoperability and local optimization
• Agency, e.g. navigating processes of planned vs. emergent change in complex
and multiply-determined systems.
Such complications challenge simple notions of infrastructure building as a planned,
orderly, and mechanical act. They also suggest that boundaries between technical and
social solutions are mobile, in both directions: the path between the technological and
the social is not static and there is no one correct mapping. Robust cyberinfrastructure
will develop only when social, organizational, and cultural issues are resolved in tandem
with the creation of technology-based services. Sustained and proactive attention to
these concerns will be critical to long-term success.
Dynamics. Historical infrastructures – the automobile/gasoline/roadway system,
electrical grids, railways, telephony, and most recently the Internet – become ubiquitous,
accessible, reliable, and transparent as they mature. The initial stage in infrastructure
formation is system-building, characterized by the deliberate and successful design of
technology-based services. Next, technology transfer across domains and locations
results in variations on the original design, as well as the emergence of competing
systems. Infrastructures typically form only when these various systems merge, in a
process of consolidation characterized by gateways that allow dissimilar systems to be
linked into networks. In this phase, standardization and inter-organizational
communication techniques are critical. As multiple systems assemble into networks, and
networks into webs or “internetworks,” early choices constrain the options available
moving forward, creating what historical economists call “path dependence.”
Tensions. Transparent, reliable infrastructural services create vast benefits, but there
are always losers as well as winners in infrastructure formation. Questions of ownership,
management, control, and access are always present. For example:
• Who decides on rules and conventions for sharing, storing, and preserving data?
• Local variation vs. global standards: how do we resolve frictions between
localized routines and cultures that stand in the way of effective collaboration?
• How can national cyberinfrastructure development move forward without
compromising possibilities for international or even global infrastructure
formation?
Design. These and other tensions inherent to infrastructure growth present imperatives
to develop navigation strategies that recognize the likelihood of unforeseen (and
potentially negative) path dependence and/or institutional or cultural barriers to adoption.
Cyberinfrastructure seeks to enable a decentralized research environment that: 1)
permits distributed collaboration; 2) provides incentives for participation at all levels; and
3) encourages the advancement of cross-boundary and interdisciplinary scholarship.
Since all three of these goals are simultaneously social and organizational in nature and
central to the technical base, designing effective navigation strategies will depend on
strategic collaborations between social, domain, and information scientists. In particular,
comparative studies of cyberinfrastructure projects can reveal key factors in success
(and failure). Research on practices of standardization and modularity can help retain
the openness, flexibility, and broad-scale usability of cyberinfrastructure, minimizing the
path-dependent effects of standard-setting.
Recommendations: NSF should consider action in three broad areas.
• Learning from cyberinfrastructure. By applying well-understood evaluation
tools, we can assess and compare existing cyberinfrastructure projects, both in
the US and abroad. The resulting knowledge can be used to improve reporting
mechanisms and incentive structures. Cyberinfrastructure projects can also be
instrumented to collect social and organizational data.
• Improving cyberinfrastructural practice. Social science research can assist
with NSF goals of training and enrolling professionals into the
cyberinfrastructure-based research agenda. These goals may be achieved in
part by improving diagnostics for current research environments, providing direct
training for information managers, graduate students, and early-career faculty,
and developing funding structures that support work on multiple time scales.
• Enhancing resilience, sustainability, and reach. Since infrastructures develop
by creating links among varied systems, the NSF agenda may be promoted by
forging and strengthening connections outside academic and governmental
channels. Social scientists can help to recruit under-represented groups and
institutions, as well as to create partnerships with organizations that have
substantial existing expertise in areas complementary to scientific research, such
as intellectual property standards and management.
I. Introduction
Background to the workshop
Academic scientists and funding agencies throughout the advanced industrialized world
have recently embarked on major efforts to imagine, develop, and build new forms of
“cyberinfrastructure” or “e-science”.[1] Separately and for several decades, historians and
social scientists have studied the development of other kinds of infrastructure (railroads,
waterworks, highways, telephony, business communication systems, the Internet, etc.).
Reading across the body of this work produces two striking general results: first, there is
a good deal of contingency, uncertainty, and historical specificity that attends any
process of infrastructural development; second, despite these variations, there are
shared patterns, processes, and emergent lessons that hold widely true across the
comparative history and social study of infrastructure. This report represents a first
attempt to bring these two fields of inquiry and practice together. In particular, it seeks to
distill from the messy history and practice of infrastructure some general lessons and
principles that might inform, guide, and in some cases caution the contemporary work of
cyberinfrastructural development.
The report reflects the findings of the NSF-funded workshop, History and Theory of
Infrastructure: Lessons for New Scientific Cyberinfrastructures. Hosted by the University
of Michigan School of Information, and sponsored by the National Science Foundation’s
Human and Social Dynamics Program, the Computer and Information Science and
Engineering Directorate, and the Office of Cyberinfrastructure,[2] the workshop brought
together more than thirty historians, social scientists, domain scientists, and
cyberinfrastructure developers for three days of open, focused, interdisciplinary
discussion around the patterns, perils, and possibilities of infrastructure (both cyber and
other).[3]
The group was charged with three general tasks: first, to identify dynamics, tensions,
strategies, and design challenges that are common across the wider history and
contemporary practice of infrastructural development; second, to begin to distill from this
[1] While we use the NSF term “cyberinfrastructure” throughout this report, similar
arguments can be made for the UK “e-science” program and other efforts to develop
new computational infrastructure in support of innovative and collaborative science.
[2] NSF grant #0630263. We thank each of these entities for their generous support.
[3] We thank the workshop participants for their many and invaluable contributions. In
particular, Tineke Egyedi, Cal Lee, Erik van der Vleuten, and JoAnne Yates composed
“think pieces” which we used as the basis for workshop sessions (and for parts of this
report). These individuals, as well as Johan Schot, Jane Summerton, and Fran Berman,
also advised us during a planning session held at the NSF on June 26, 2006. University
of Michigan doctoral students Clapperton Mavhunga, Yong-Mi Kim, Charles Kaylor, and
Trond Jacobsen served as rapporteurs and technical assistants, under the capable
direction of Cory Knobel. For a full list of all workshop participants including disciplinary
and institutional affiliations, see Appendix A.
collected experience concrete lessons and principles that might shape and inform the
activities indicated under the National Science Foundation’s Vision for
Cyberinfrastructure; and third, to propose a research agenda for cyberinfrastructure
studies.[4]
Our work drew upon and benefited greatly from the valuable series of
“Cyberinfrastructure for…” reports addressing applications of CI in various domains, but
this workshop concerned a different question: rather than “What can cyberinfrastructure
bring to the social sciences and humanities?” we asked “What can the findings and
methods of social and historical analysis bring to the development of
cyberinfrastructure?”[5]
The report that follows delivers the workshop’s key findings in five sections. The
Introduction provides a general overview and orientation to concepts of infrastructure,
along with a brief overview of cyberinfrastructure and the NSF’s cyberinfrastructure
program. The Dynamics section surveys questions relating to the genesis, development,
and scaling of infrastructure, and includes examples, patterns, and principal findings
from historical studies of infrastructure. The Tensions section describes infrastructure as
fundamentally contested, and samples the kinds of conflicts that developing
infrastructures have frequently encountered. It pays particular attention to scientific data
and data cultures as both focal objects and stumbling blocks in cyberinfrastructure
development. The Design Strategies section explores a fundamental contradiction: if
effective infrastructures are rarely “built” in an entirely top-down, orderly, and blueprint-like
way (as we shall argue), can we nevertheless think of design as a reasonable and
important aspiration for would-be infrastructure developers? What might count as
legitimate and promising design strategies? The Conclusions and Recommendations
section collects the broader findings and translates them into concrete recommendations
for those charged with shaping and implementing the NSF cyberinfrastructure vision.
Beyond the workshop itself, the report signals the first fruits of what we hope will be an
ongoing and mutually beneficial collaboration between those with expertise in the social
and historical analysis of infrastructure and those tasked with developing it in
contemporary settings. As the workshop resoundingly demonstrated, a good deal rides
on the front-end phases of infrastructure development. Building a robust, empirical, and
broad-based analytic capacity to support cyberinfrastructure development should be an
NSF priority of the highest order.
[4] See, e.g., the NSF Cyberinfrastructure Council’s NSF’s Cyberinfrastructure Vision for
21st Century Discovery (ver. 7.1, July 20, 2006), available at www.nsf.gov/od/oci/ci-v7.pdf.
For an important earlier account of this vision, see the Report of the Blue-Ribbon
Advisory Panel on Cyberinfrastructure (“the Atkins Report”), available at
www.nsf.gov/od/oci/reports/toc.jsp.
[5] For answers to the first question, see, inter alia, the Report of the Commission on
Cyberinfrastructure for the Humanities and Social Sciences, and the Final Report of NSF
SBE-CISE Workshop on Cyberinfrastructure and the Social Sciences. These and other
domain-specific cyberinfrastructure reports can be accessed through the NSF Office of
Cyberinfrastructure homepage, at: http://www.nsf.gov/od/oci/reports.jsp.
The long now of infrastructure
Stewart Brand’s “clock of the long now” will chime once every millennium: a cuckoo will
pop out (Brand 1999). Accustomed as we are to the “information revolution,” the
accelerating pace of the “24/7” lifestyle, and the multi-connectivity provided by the World
Wide Web, we rarely step back and ask what changes have been occurring at a slower
pace, in the background. For the development of cyberinfrastructure, the long now is
about 200 years. This is when two suites of changes began to occur in the organization
of knowledge and the academy which have accompanied – slowly – the rise of an
information infrastructure to support them: an exponential increase in information-gathering
activities by the state (statistics) and by knowledge workers (the encyclopedists)
on the one hand, and on the other the accompanying development of technologies and
organizational practices to sort, sift, and store information.
When dealing with infrastructures, we need to look to the whole array of organizational
forms, practices, and institutions which accompany, make possible, and inflect the
development of new technology. JoAnne Yates made this point beautifully in describing
the first commercial use of punch card data tabulators, in the insurance industry. That
use became possible because of organizational changes within the industry. Without
new forms of information management, heralded by such low status technologies as the
manila folder and carbon paper, accompanied by new organizational forms, there would
have been no niche for punch card readers to occupy (Yates 1989). Similarly, Manuel
Castells argued that the roots of contemporary “network society” are new organizational
forms created in support of large corporate organizations, which long predate the arrival
of computerization (Castells 1996). James Beniger described the entire period from the
first Industrial Revolution to the present as an ongoing “control revolution” in which
societies responded to mass production, distribution, and consumption with both
technological and organizational changes, designed to manage ever-increasing flows of
goods, services, and information (Beniger 1986). In general there is more continuity than
cleavage in the relationship of contemporary “information society” to the past (Chandler
and Cortada 2003).
The lesson of all these studies is that organizations are (in part) information processors.
People, routines, forms, and classification systems are as integral to information
handling as computers, Ethernet cables, and Web protocols. The boundary between
technological and organizational means of information processing is mobile. It can be
shifted in either direction, and technological mechanisms can only substitute for human
and organizational ones when the latter are prepared to support the substitution.
In the “long now,” two key facets of scientific information infrastructures stand out. One
clusters around the nature of work in the social and natural sciences. Scientific
disciplines were formed in the early 1800s, a time Michel Serres felicitously describes as
the era of x-ology, where “x” was “geo,” “socio,” “bio” and so forth (Serres 1990).
Auguste Comte classified the division of labor in the sciences, placing mathematics and
physics as the most developed and best models, and sociology as the most complex
and least developed, more or less where Norbert Wiener placed them 130 years later in
Cybernetics and Society (Wiener 1951). This was also the period during which the object
we now call the database came to be the lynchpin of the natural and social sciences.
Statistics etymologically refers to “state-istics,” or the quantitative study of societies
(states); it arose along with censuses, medical records, climatology, and other
increasingly powerful techniques for monitoring population composition and health
(Porter 1986). Equally, the natural sciences – moved by the spirit of the encyclopedists –
began creating vast repositories of data. Such repositories were housed in individual
institutions, such as botanical gardens and museums of natural history. Today they are
increasingly held in electronic form, and this is fast becoming the norm rather than the
exception. For example, the Ecological Society of America publishes digital
supplements, including databases and source code for simulation models, for articles
published in its journals (www.esapubs.org/archive/), and a researcher publishing a
protein sequence must also publish his or her data in the (now worldwide) Protein Data
Bank.
The second facet clusters around scientists’ communication patterns. In the 17th and 18th
centuries scientists were largely “men of letters” who exchanged both public and private
correspondence, such as the famous Leibniz/Clarke exchange. From the early 19th
century a complex structure of national and international conferences and publishing
practices developed, including especially the peer-reviewed scientific journal.
Communication among an ever-broader scientific community was no longer two-way,
but n-way. New forms of transportation undergirded the development of a truly
international scientific community, aided also by linguae francae, principally English and
French.
New scientific cyberinfrastructures must be understood as an outgrowth of these
developments. Databases and n-way communication among scientists have developed
embedded in organizational and institutional practices and norms. There is far more
continuity than many recognize. However, as scientific infrastructure goes cyber, there is
also genuine discontinuity. The social and natural sciences grew up together with
communication and data-processing technology. Changes in these latter will have ripple
effects throughout the complex web of relations that constitutes scientific activity.
Defining Cyberinfrastructure
Most often, cyberinfrastructure is defined by jotting down a laundry list. The reference
Atkins Report for the National Science Foundation defines it as those layers that sit
between base technology (a computer science concern) and discipline-specific science.
The focus is on value-added systems and services that can be widely shared across
scientific domains, both supporting and enabling large increases in multi- and inter-
disciplinary science while reducing duplication of effort and resources. According to the
Atkins Report, cyberinfrastructure consists of “hardware, software, personnel, services
and organizations” (p. 13). This list recognizes from the outset that infrastructure is about
more than just pipes and machines. The more recent cyberinfrastructure vision
document is similarly diffuse, though it regrettably somewhat sidelines the social and
organizational in the definition:
Cyberinfrastructure integrates hardware for computing, data and
networks, digitally enabled sensors, observatories and experimental
facilities, and an interoperable suite of software and middleware services
and tools. Investments in interdisciplinary teams and cyberinfrastructure
professionals with expertise in algorithm development, system operations,
and applications development are also essential to exploit the full power
of cyberinfrastructure to create, disseminate, and preserve scientific data,
information, and knowledge (NSF CI Vision ver. 7.1, p. 6).
Both these definitions do, however, draw attention to the dynamic, complex nature of
cyberinfrastructure development.
While accepting this broad characterization, this report’s long-now perspective invites a
discussion of first principles. For this we return to Star and Ruhleder’s now classic
definition of infrastructure (Star and Ruhleder 1996), originally composed for a paper on
one of the early scientific collaboratories, the Worm Community System. Here we show
how their definitions can be ordered along two axes, the social/technical and the
local/global:
Figure 1. Cyberinfrastructure as distributions along technical/social & global/local
axes (diagram courtesy of Florence Millerand).
In building cyberinfrastructure, the key question is not whether this is a “social” problem
or a “technical” one. That is putting it the wrong way around. The question is whether we
choose, for any given problem, a social or a technical solution, or some combination. It is
the distribution of solutions that is the object of study. An everyday example comes from
the problem of email security. How do I distribute my trust? I can delegate it to my
machine, and use Pretty Good Privacy (PGP) encryption for all my email messages. Or I
can work socially and organizationally to make certain that sysops, the government, and
others who might have access to my email internalize a value of my right to privacy. Or I
can change my own beliefs about the need for privacy – arguably a necessity with the
new infrastructure.
For our purposes, cyberinfrastructure is the set of organizational practices, technical
infrastructure and social norms that collectively provide for the smooth operation of
scientific work at a distance. All three are objects of design and engineering; a
cyberinfrastructure will fail if any one is ignored.
Building Cyberinfrastructure
At first glance, the term “building” seems apposite. After all, infrastructures are
composed of interoperating systems, each of which had a builder. But complex
structures have different types of builders and are not always the result of intentional
planning (Dennett 1996). As we will see in the Dynamics section of this report, the
eventual growth of complex infrastructure and the forms it takes are the result of
converging histories, path dependencies, serendipity, innovation, and “bricolage”
(tinkering). Speaking of cyberinfrastructure as a machine to be built or a technical
system to be designed tends to downplay the importance of social, institutional,
organizational, legal, cultural, and other non-technical problems developers always face.
Axelrod and Cohen’s idea of harnessing complexity cautions against seeking tight
control over technologically-enabled organizational structures; even if it were a good
idea, it simply wouldn’t work (Axelrod and Cohen 2001). By extension, the organizational
aspects of science and the role of the social sciences in cyberinfrastructure should be
integrated into the work of design. Here, one of Star and Ruhleder’s observations is key:
Infrastructure is fixed in modular increments, not all at once or globally.
Because infrastructure is big, layered, and complex, and because it
means different things locally, it is never changed from above. Changes
take time and negotiation, and adjustment with other aspects of the
systems involved.
Hence this report turns away from a language of design and engineering, reframing the
discussion in a more organic lexicon. Since infrastructures are incremental and modular,
they are always constructed in many places (the local), combined and recombined (the
modular), and they take on new meaning in both different times and spaces (the
contextual). Better, then, to deploy a vocabulary of “growing,” “fostering,” or
“encouraging” in the evolutionary sense when analyzing cyberinfrastructure.
Adopting an alternative framework for cyberinfrastructure analysis frees us from
presuming that the final working form of scientific cyberinfrastructure will look much like
the initial vision. Further, this framework is responsive to the findings of science studies
that science, theory, and inquiry are created locally, and build out from these local
contexts. As cyberinfrastructure grows and takes shape by drawing in new communities,
each with its distinctive histories, needs, and practices, we can expect a common sense
and (partially) shared understandings of cyberinfrastructure to emerge. Such processes
may be aided by the crafting of a shared functional lexicon or “pattern language”
(Alexander 1979) for cyberinfrastructure.
II. Dynamics
This section outlines an historical model of infrastructure development, one which has
been repeatedly confirmed across numerous cases from the 19th century to the present.
For cyberinfrastructure projects, this model leads to three significant conclusions. First,
true infrastructures only begin to form when locally constructed, centrally controlled
systems are linked into networks and internetworks governed by distributed control and
coordination processes. Second, infrastructure formation typically starts with technology
transfer from one location or domain to another; adapting a system to new conditions
introduces technical variations as well as social, cultural, organizational, legal, and
financial adjustments. Third, infrastructures are consolidated by means of gateways that
permit the linking of heterogeneous systems into networks.
The section then turns to three key dynamics of infrastructure development. Reverse
salients — critical unsolved problems — may be technical, but are also frequently social
or organizational in nature, particularly in the network/internetwork formation phase.
Gateways are defined as technologies and standards applied across multiple
communities of practice. The transition from systems to networked infrastructures
requires generic and meta-generic gateways, as opposed to the dedicated or improvised
gateways used in systems. Third, as infrastructures grow they create path dependence;
as organizations and individuals come to rely on an infrastructure, they adapt to it,
coupling many small-scale and local elements to the larger commodity service. This
phenomenon has positive and negative aspects.
These concepts explain why it is difficult to alter infrastructures once they have become
established, and thus why choices in the early phases of development — as in the case
of cyberinfrastructure today — really make a difference. The section ends by comparing
the historical trajectories of electric power, computing, and cyberinfrastructure.
A historical model of infrastructure development
The time scale in historical studies of infrastructural change is decades to centuries –
considerably longer than most research projects in cyberinfrastructure! Historians’
principal model of infrastructure development draws on Thomas Parke Hughes’s
Networks of Power (1983), on the evolution of electric power. Hughes’s model was
adapted and extended over two decades by a loose-knit group of historians and
sociologists studying “large technical systems” (LTS), including telephone, railroads, air
traffic control, and other major infrastructures (Bijker and Law, 1992; Braun and Joerges,
1994; Coutard, 1999; Coutard et al., 2004; La Porte, 1991; Mayntz and Hughes, 1988;
Bijker et al., 1987; Kaijser et al., 1995; Summerton, 1994). The Hughes model
conceptualizes invention and innovation in terms of systems rather than isolated
devices.
System builders and systems. System builders create and promote systems, i.e.
linked sets of devices that fill a functional need. Hughes’s paradigmatic example of a
system builder is Thomas Edison. Other inventors had already hit upon light bulbs; what
set Edison apart was his conception of a lighting system including generators, cables,
and light bulbs. The system delivered a service (lighting), rather than a commodity
(electricity) or an isolated device (the light bulb). Similarly, digital computing did not
achieve commercial success until manufacturers such as Univac and IBM supplied not
just CPUs, but complete data processing systems, including mass storage (magnetic
tape and disks), input devices (keyboards, punch cards), and output devices (printers,
card punches). They also rapidly found that technical systems alone were insufficient;
they had to supply training, software, and other kinds of support as well. Historians
concur that IBM’s rise to dominance in the late 1950s was based as much on the
services it supplied to customers as on the technical features of its products; it also built
on its large installed base of punch card and other equipment. Indeed, IBM’s research
group at Almaden is currently trying to establish the discipline of service science.
Successful system building always includes organizational, financial, legal, and
marketing elements. Historians have noted the common phenomenon of system-builder
teams made up of one or more “technical wizards” or “supertechs,” who handle system
conception and innovation, working together with a “maestro,” who orchestrates the
organizational, financial, and marketing aspects of the new system. Such teams can also
include a charismatic “champion” who stimulates external interest in the project,
promoting it against competing systems and generating widespread adoption
(McKenney et al., 1995). Well-known examples of such teams in information and
communication infrastructure are Alexander Graham Bell and Theodore Vail (AT&T);
Thomas Watson Sr. and James Bryce (IBM); Robert Taylor, Lawrence Roberts, Robert
Kahn and Vint Cerf (ARPANET/Internet); Steve Jobs and Steve Wozniak (Apple); Bill
Gates, Paul Allen, and Steve Ballmer (Microsoft); and Tim Berners-Lee and Robert
Cailliau (World Wide Web).
“Wizard,” “maestro,” and “champion” label roles, not people; they may be held by
individuals, groups, or organizations, as well as in various combinations. Our emphasis
here is not on heroic individuals, whose powers and importance are almost always
exaggerated, but on the social features of this pattern. First, system building typically
begins as a social act (even a dyad is a social system). Second, the wizard-maestro-
champion combination reflects the spectrum of crucial capabilities: technical,
organizational, and social.
Government agencies have sometimes played key roles in the system-building phase of
major infrastructures. During and after World War II, for example, the principal sources
of support for US digital computing research were military agencies, especially the Office
of Naval Research and the Air Force. Very large contracts for the SAGE air defense
system helped IBM take the lead in the American computer industry (Edwards, 1996).
The government has the ability to plan for the long term; the Dutch government in the
sixteenth century, for example, planned forestry growth over the subsequent two
hundred years as part of its naval construction infrastructure. Similarly, government has
the ability to shepherd research projects over long periods of time – as witness the
successful creation of the Internet.
Technology transfer and growth. Once an LTS has been successfully constructed in
one location, technology transfer to other locations (organizations, cities, nations)
follows. Because conditions at the new locations differ, this process always produces
variations on the original system design as well as new organizational support. This
adaptation leads to a phenomenon Hughes called “technological style”: the distinctive
look and feel of the “same” technical system as it appears in differing local and national
contexts. As it develops, a new LTS not only requires further technical innovation, but
also increasingly incorporates heterogeneous components. Finance capital, legal
representation, and political and regulatory relationship management become
indispensable elements of the total system. Relevant economic forces include
economies of scale and scope, and economies of reach (Kaijser, 2003).
As the LTS spreads from place to place, competing systems may be introduced with
dissimilar, frequently incompatible properties. In the early days of electric power, for
example, competition occurred among dozens of systems using different line voltages,
as well as both direct and alternating current, all with their advantages and defects.
Issues of scaling become crucial during the growth phase. Systems that worked well in a
small local area, with a few hundred users, typically require substantial redesign in order
to function in many places with thousands or millions of users. Concerns such as
technical support, billing, capital investment, management of user expectations,
marketing, and many other issues come to the forefront during this phase.
During growth, attention to users and user communities can become critical to success
or failure. A key problem is that the development process builds expertise among the
developers; as a result, developers can lose their ability to see how novices, or users in
a different field, perceive and use their system. Information technology projects
frequently founder when they attempt to transition rapidly from a small, close-knit
developer community to a larger, more diverse community of novice users, such as
consumers or scientists in unrelated fields. Rapid growth can make this especially
difficult to manage; users often take the process into their own hands, leading to
divergent norms, practices, and standards implementation (Fischer, 1992; Hanseth and
Monteiro, 1998; Kahin and Abbate, 1995; Abbate, 1999).
Consolidation: network formation. In consolidation, the final stage of the LTS model,
competition among technological systems and standards is resolved in one of two ways.
In rare cases, one system wins total victory over the others. More often, developers
create gateways that allow previously incompatible systems to interoperate. The rotary
converter, for example, allowed AC power to be converted to DC on a large scale,
permitting competing electrical distribution systems to be connected (David and Bunn,
1988). Today, gateways include AC/DC power converters for consumer electronics and
telephone adapters for international travel. Platform-independent software, languages,
and presentation formats such as Java, HTML, and PDF are information technology
examples. By allowing heterogeneous technical systems to interoperate, gateways (see
below) permit the creation of networks such as power grids, railroad and telephone
systems, and the ARPANET, NSFNET, and Internet.
As in the system-building phase, the goal of network formation is to deliver a service. For
example, distributed packet switching computer networks were first developed in the
mid-1960s at the UK’s National Physical Laboratory (NPL) and at various ARPA
research contractors, especially Bolt Beranek and Newman, in the United States. In
each case, the critical step was not the technical development of packet switching itself,
but the conception of the network as a way to share data and programs among
expensive mainframe computers under a timesharing regime (Abbate, 1999; Hafner,
1996; Hauben and Hauben, 1997). The network delivered not only a physical connection
or a communication technique (packet switching), but a service (shared programs and
data).
The consolidation phase can be seen as complete when the service in question has
become, from the point of view of users, a commodity resource, i.e. an undifferentiated
good such as electricity, telephone switching, or IP connectivity. The “computer utilities”
envisioned in the late 1960s — giant timesharing computers that would provide
“powerful and reliable systems capable of serving large communities” (Fano, 1992
[original 1967], p. 39) — represent a vision of computing as a commodity infrastructure.
Ultimately this was supplied not through systems (timesharing), but through the Internet
and grid computing.
Recently, historians have begun attending to another aspect of consolidation: the role of
infrastructure in transnational linking. Transborder bridges and tunnels; power grids;
national telegraph and telephone systems; containerized international shipping and
road/rail transport; airports; the Internet and WWW; and many other infrastructure
projects involve resolution of political, legal, and financial issues simultaneously with
technical standards. At the same time, they alter the nature of national boundaries,
especially at the level of culture and national sovereignty (Held et al., 1999; Schot et al.,
2006; Vleuten et al., 2006). Delinking occurs, particularly in wartime; transnational
infrastructural links are usually among the first objects of military engagement. Clearly,
scientific cyberinfrastructure will play a role in transnational linking as well. Coordinating
now with cyberinfrastructure projects worldwide may reduce the difficulty of
consolidation, if and when it occurs.
Governments have played important roles in the consolidation phase of infrastructure
development. The general approach during the 1850-1975 period has been called the
“modern infrastructural ideal”: universal service by a single provider (Graham and
Marvin, 2001). Following this logic, many national governments provided most or all of
these services; national rail, road, electric power, and PT&T (post, telephone &
telegraph) networks are the prime examples. The United States kept some of these
services private, but in many cases allowed formation of monopoly providers or “public
utilities” under oversight by regulatory agencies. Telephone, electric power, sewer,
water, and natural gas are examples (Friedlander, 1995a; Friedlander, 1995b;
Friedlander, 1996). In other cases, such as rail and air transport, competition continued,
but within a legal framework of public oversight. In its early phases the Internet was
created with ARPA and NSF funding, and it was used principally for military, research,
and educational activities; these features allowed its government sponsors to prohibit
commercial activity until the Internet was privatized in the early 1990s. Today, the
Internet and WWW are subject to national regulations of many kinds.
Splintering of the “modern infrastructural ideal.” Although early versions of the LTS
model ended with consolidation, there is a further phase. Starting around 1975, in the
United States, the UK, and to lesser extent elsewhere, the model of monopoly utilities
was increasingly displaced by a deregulated, market-oriented approach, with reduced
but still significant public oversight (as in air transport, telephone, television, and energy
services). By increasing the ability of multiple suppliers to coordinate their operations,
balance loads, and handle system breakdowns, new information technologies played a
major role in this ongoing transition. Increased capacity for decentralized coordination
(as opposed to centralized control) enabled a retreat from the logic of vertical integration.
The result in most infrastructure areas has been a pronounced splintering of the single-
provider, monopoly utility model (Graham and Marvin, 2001). Frequently this has also
meant service tiering, with wealthy customers and heavy users receiving premium,
highly reliable services, while poor people and infrequent users must rely on low-grade
services or be excluded altogether – an issue being revisited today under the broad
rubric of network neutrality, which is highly relevant to the social goal of producing
scientific cyberinfrastructure for disadvantaged communities.
Having emerged in an era of ideological opposition to large new government-funded
projects, the cyberinfrastructure movement has sought new models in the success of
open-source software development projects, especially Linux and Mozilla Firefox, with
its highly refined “Bugzilla” bug-reporting system. The RFC process in the development
of the Internet was a major precursor to these (Russell, 2006). Building the Internet
involved massive long-term investment by ARPA, the NSF, and other government
agencies – frequently, as the Hughes report to the NAS noted, involving long-term bets
on particular key players irrespective of short-term payoff (the continuing CNRS funding
model).
Systems vs. networks and webs
The growth, consolidation, and splintering phases of the historical model mark a key
transition from homogeneous, centrally controlled, often geographically local systems to
heterogeneous, widely distributed networks in which central control may be partially or
wholly replaced by coordination.
In general, and specifically within the cyberinfrastructure framework,
infrastructures are not systems. Instead, they are networks or webs that enable locally
controlled and maintained systems to interoperate more or less seamlessly. It is typically
only in the consolidation phase, with the appearance of standardized, generic gateways,
that most LTSs become genuine infrastructures, i.e. ubiquitous, reliable, and widely
shared resources operating on national and transnational scales. Thus we define a
spectrum running from systems (centrally organized and controlled) to networks (linked
systems, with control partially or wholly distributed among the nodes) to webs (networks
of networks based primarily on coordination rather than control). Table 1 summarizes
this distinction.
| Infrastructures | Systems | Networks | Internetworks or Webs |
| --- | --- | --- | --- |
| Key actors | System builders; users (adjustment roles) | Gateway builders; standards bodies; corporations & governments; users (transformative roles) | Gateway builders; standards bodies; corporations & governments; users (foundational roles) |
| Elements | Heterogeneous components and subsystems | Heterogeneous systems | Heterogeneous networks |
| Gateways and standards | Dedicated or improvised | Generic or meta-generic | Generic or meta-generic |
| Control vs. coordination | Control: central, strong | Control and coordination: partially distributed, moderate strength | Coordination: widely distributed, weak; reliant on other infrastructures |
| Boundaries | Closed, stable | Open, reconfigurable | Open, reconfigurable; virtual or second-order large technical systems |
| Examples | Local electric power company; enterprise computing (e.g. banks, insurance companies) | Railroad; electric power grids; grid computing; NEES, GEON; national weather services | Intermodal freight; global telephone system (fixed + mobile + VOIP); Internet and WWW; World Weather Watch |

Table 1. Systems vs. infrastructures (modified from Edwards, 1998a).
The last column in Table 1, “Internetworks or webs,” refers to integration across
networks. Perhaps the best example is intermodal freight, in which ISO standard
containers, which may be mounted on standard truck or rail wheelbases and lifted into
container ship holds, smooth transfers among independent road, rail, and shipping
infrastructures. In information technology, the Internet and WWW are obvious examples.
However, integrated information internetworks long predate the Internet. Telegraph,
telephone, and postal mail were linked into a 19th-century “analog information
internetwork.” Another pre-Internet example is the World Weather Watch, which collects
data from national weather services, sends it to global data processing centers, and
returns processed global forecasts and data to the national services for their own use
(Edwards, 2006).
The analog information internetwork
Greg Downey has described how 19th-century business users effectively
combined the available communication systems into a single information
internetwork, a “…combination of character-transmission telegraph,
voice-transmission telephone, and physical-transport Post Office
networks. I call this an ‘analog’ internetwork because… information could
only move over each component network in a single form, requiring
repeated physical translations as it moved through the internetwork
(handwriting to voice to dot-and-dash and back again). Although the
telegraph itself was in some sense ‘digital’ — based as it was on three
possible states: no pulse, a short pulse (dot), and a longer pulse (dash)
— those states were conveyed at varying cadences through the physical
actions of rapidly pressing telegraph keys and attentively listening to
telegraph sounders, and so were still analog at the core.
Historical actors who used and studied the telegraph, telephone, and Post
Office saw the three as an internetwork. Business texts from the 1910s
through the 1930s instructed students that proper business practice when
sending telegrams involved all three media: even when paying for the
‘report delivery’ and ‘repeat back’ options to make sure telegrams were
accurately transmitted and received (with those reports coming by
telephone), important telegrams were to be ‘confirmed immediately by a
properly dated and signed letter’” (Downey, 2001: 213-14).
The moments at which systems become linked into networks, and networks become
linked to form internetworks, thus represent crucial phases in infrastructure formation.
Virtual infrastructures and second-order LTSs: email, WWW, cellular telephony.
Better information technology increases the capacity for distributing control from a
central point to the nodes in a network. It also permits integration of services across
networks. The contemporary phrase for this capability is “digital convergence.” As we
have seen, this is in fact an old trend, not necessarily dependent on computer
technology. However, the flexibility and power of computers has undeniably been the
principal reason for the explosion of new basic services built upon existing
infrastructures in recent decades. Analysts have named these “virtual infrastructures”
and “second-order large technical systems” (Braun, 1994; Edwards, 1998b).
Cyberinfrastructure for the sciences is a specialized manifestation of this trend.
The principal historical examples of successful large-scale infrastructure formation since
the 1970s are email, the World Wide Web, and cellular telephony. Since both email and
the Web “sit on top of” the Internet, they are also outstanding examples of virtual
infrastructures. Cellular telephony, requiring very large investments in cell towers and
transmitters, followed a trajectory more like that of electric power or rail. All three cases,
however, depended strongly on pre-existing infrastructures. The WWW, often touted as
a miracle of decentralized grassroots development, would not exist without the Internet,
whose early history strongly resembles those of other infrastructures. Cellular
telephony’s success is strongly linked to its integration with the pre-existing land-line
telephone network, which provided a huge head start toward building a critical mass;
this integration also makes it a second-order LTS, combining existing services in new
ways based largely on information technology. (Even with this base to build on,
however, cellular telephony’s growth curve resembles those of the other infrastructures
charted in Figure 3, below; the first cellular phone call was made in 1973.) The explosive
growth experienced by all three of these recent infrastructures would not have occurred
without the slower growth of the older infrastructures that underlie them.
Together with the splintering phenomenon described above, our increasing capacity to
build virtual infrastructures and second-order large technical systems using coordination
mechanisms is important background to cyberinfrastructure formation. These
phenomena have created a paradigm of increasingly articulated, fragmented and tiered
service delivery, across all infrastructures. The decline of the monopoly public utility
model suggests that forming new large-scale infrastructures may be more difficult than in
the past. Indeed, NSF reports on cyberinfrastructure reflect a distaste for large-scale,
long-term projects by single providers, and a corresponding enthusiasm for open source
models and “federated” systems linked by coordination rather than control.
Reverse salients
The LTS tradition highlights “reverse salients” in system development. This is a military
metaphor referring to points where an advancing front is held back. In terms of
technological change, the analogy refers to engineering problems whose solution is
required for the entire system to work or to grow. Examples include long-distance
transmission of electric power, automatic switching of telephone calls, and linking
computer networks that use different data packet sizes and addressing schemes.
Since reverse salients are often widely understood to be the most significant problems in
a field, they are normally a locus of intense research efforts. Typically multiple groups
converge on solutions around the same time.
Reverse salients need not be technical; in fact, the most important reverse salients are
often legal, political, social, or cultural. Government and other national-level institutions
have played critical roles in identifying, and sometimes also in shifting, reverse salients
in the sciences. For example, in its 1992 report Computing the Future, the National
Research Council criticized the discipline of computer science for a narrow agenda that
failed to engage applications areas and interdisciplinary arenas such as human-
computer interaction (Hartmanis and Lin, 1992). Arguably the current NSF
cyberinfrastructure initiatives represent a continuation of this effort, this time focused on
changing the culture and social relations of both computer science and the domain
sciences to reduce duplication of effort while creating basic middleware services on
which present and future inter- and multi-disciplinary research can rely.
Overcoming social reverse salients in computer networking
In the late 1960s era of expensive mainframe computers, jealously
guarded by system operators and research groups, ARPA’s idea of
sharing programs and data across a network with researchers located
elsewhere seemed like an invasion, particularly since such sharing often
required assistance from local operators.
In an interview, Lawrence Roberts recalled how ARPA compelled its
contractors not only to connect to, but also to use, the early ARPANET.
“The universities were being funded by us, and we said,We are going to
build a network and you are going to participate in it. And you are going to
connect it to your machines. By virtue of that we are going to reduce our
computing demands on the office. So that you understand, we are not
going to buy you new computers until you have used up all of the
resources of the network.’ So over time we started forcing them to be
involved” (quoted in Abbate 1999, 55).
In a similar but less dramatic way, the NSF also compelled participation in
the NSFNET by requiring its supercomputer centers to make network
connections available to all qualified educational or research users. As in
the case of the ARPANET, having provided very costly equipment, the
funding agency was in a position to set conditions that strongly promoted
broad-based, inexpensive access.
Examples of reverse salients relevant to cyberinfrastructure include: generating
metadata (an unfunded mandate); the tangle of intellectual property rights; techniques
for federating databases held at multiple institutions using different equipment and data
formats; domain-specific data sharing and publication cultures; the reluctance of
modelers who have been working with a given program to shift to a better one if the
learning curve is too steep; the lack of incentives in universities for infrastructure-
building and data sharing work; and the inability to translate across different fields.
Gateways
As Tineke Egyedi has observed, gateway technologies confer differing degrees of
flexibility on technical systems depending on the degree to which they are standardized.
Gateways may be dedicated (improvised, or devised specifically for a particular system);
generic (standardized sockets opening one system to interconnection with others); or
meta-generic (“modeled,” i.e. specifying a framework or protocol for the creation of
specific generic standards, without specifying those standards directly). Table 2 outlines
Egyedi’s framework.
| Degree of Standardization | Scope of Gateway Solution | Examples |
| --- | --- | --- |
| High (“modeled”) | Meta-generic | OSI [6] |
| Medium (standardized) | Generic | XML, Java, ISO container [7] |
| Low (“improvised”) | Dedicated | AC/DC rotary converter |

Table 2. Relationship between degree of standardization and scope of gateway solution
(from Egyedi, 2001).
Plug adapters (e.g. 3-pin to 2-pin AC power, Firewire 400/800) and AC/DC power
converters are excellent everyday examples. Gateway technologies of all three types are
manifested in software as well: document format converters allow one document format
to be converted into another; emulators allow one operating system to mimic the
properties of another; and so on.
Gateways represent a key principle of infrastructure development: plugs and sockets
that allow new systems to be joined to an existing framework easily and with minimal
constraint. Gateways are often wrongly understood as “technologies,” i.e. hardware or
software alone. A more accurate approach conceives them as combining a technical
solution with a social choice, i.e. a standard, both of which must be integrated into
existing users’ communities of practice. Because of this, gateways rarely perform
perfectly.
“Information technology standards have been touted as a means to
interoperability and software portability, but they are more easily lauded
than built or followed. Users say they want low-cost, easily maintained,
plug-and-play, interoperable systems, yet each user community has
specific needs and few of them want to discard their existing systems.
Every vendor wants to sell its own architecture and turbo-charged
features, and each architecture assumes different views of a particular
domain (e.g., business forms, images, databases). International
standards founder on variations in culture and assumptions — for
example, whether telephone companies are monopolies — in North
America, Europe, and Asia” (Libicki, 1995: 35).
[6] The Open Systems Interconnection Reference Model defines seven “layers” of
computer network function, from physical links to applications. Within each layer,
standards can evolve separately so long as they conform to the model (see Abbate,
1999, Chapter 5).
[7] XML is the eXtensible Markup Language. Java is a cross-platform computer language.
ISO (International Standards Organization) container refers to standard sizes, shapes,
and connectors for shipping containers used for freight transport by ship, rail, and truck.
“Below the level of the work.” Neither the exact implementation of standards, nor their
integration into local communities of practice, can ever be wholly anticipated. For this
reason, gateways in information infrastructures work best when they interlock with the
existing framework “below the level of the work,” i.e. without specifying exactly how work
is to be done or exactly how information is to be processed (Forster and King, 1995).
Most systems that attempt to force conformity to a particular conception of a work
process (e.g. Lotus Notes) have failed to achieve infrastructural status because they
violate this principle (Grudin, 1989; Vandenbosch and Ginzberg, 1996). By contrast,
email has become fully infrastructural because it can be used for virtually any work task.
Path dependence
Path dependence refers to the “lock-in” effects of choices among competing
technologies. It is possible, following widespread adoption, for inferior technologies to
become so dominant that superior technologies cannot unseat them in the marketplace.
Standard examples include keyboards (QWERTY vs. Dvorak), video (VHS vs.
Betamax), and nuclear reactor designs (light water vs. gas-graphite). Factors that can
reduce the ability to adopt an alternative include social investment (e.g. the ~100-hr.
training required to learn QWERTY) and the difficulty of overcoming positive network
effects (e.g. for the case of automobiles, the gasoline distribution network was well
established long before rural electric grids). Individual habits and organizational routines
are highly efficient modes of organizing behavior, but they can be strongly resistant to
change.
Key elements of the path dependence concept are:
Localized learning. Individuals and organizations “satisfice” rather than
optimize. All possible technological choices cannot remain on the table forever.
Once they have made an initial investment, people adapt themselves, their
organizations, and their technological choices to that investment rather than
(re)consider alternatives (Foray, 1997).
Irreversibility. Beyond some tipping point of widespread adoption, choosing an
alternative to the dominant system becomes too costly, not only in money but
also in time, attention, retraining, and coordination.
Network effects. The value of certain kinds of technology increases
exponentially with widespread adoption. Telephones aren’t worth much if only a
few people have them, but become indispensable when most people do.
Inefficiency. For economists, true path dependence exists only if some
alternative technological path would be substantially more efficient in some
sense (usually cost, but also labor time, etc.). This effect is debated. Some
economists argue that the claimed inefficiencies are improbable and in any case
cannot be proven, since there is no way to determine all the real-world
ramifications (including inefficiencies) of an alternative technological system if it
was never widely implemented (David, 1985).
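The lock-in dynamic summarized above can be sketched in a few lines. In this toy adoption model (loosely in the spirit of economists' increasing-returns models; all quantities are invented, not empirical), each new adopter chooses whichever technology currently offers higher value, where value combines intrinsic quality with a network benefit proportional to the installed base.

```python
# Toy lock-in model: each adopter picks the technology with the higher
# current value = intrinsic quality + network benefit * installed base.
# All parameters are illustrative, not empirical.

def adopt(quality_a: float, quality_b: float, head_start_a: int,
          benefit: float = 1.0, adopters: int = 1000) -> dict:
    base = {"A": head_start_a, "B": 0}
    for _ in range(adopters):
        value_a = quality_a + benefit * base["A"]
        value_b = quality_b + benefit * base["B"]
        # Ties favor the incumbent A, mirroring habit and switching costs.
        base["A" if value_a >= value_b else "B"] += 1
    return base

# B is intrinsically superior, but A's small head start snowballs:
print(adopt(quality_a=1.0, quality_b=5.0, head_start_a=10))
# Without the head start, the better technology wins instead:
print(adopt(quality_a=1.0, quality_b=5.0, head_start_a=0))
```

In the first run A captures all 1,000 new adopters despite B's higher quality; in the second, with no installed base to amplify, B wins. The crossover illustrates the tipping point beyond which choosing the alternative becomes too costly.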
Whether or not path dependence leads to economic inefficiencies, the concept is a
useful metaphor for cyberinfrastructure developers (Figure 2). Technological change is
always path dependent in the sense that it builds on, and takes for granted, what has
gone before. Today’s choices constrain tomorrow’s possibilities. Yet they also create
new possibilities, i.e. directions that could not have been taken in the absence of
technology X. Thus, as workshop participants stressed, path dependence leads to
many positive effects.
Figure 2. Visualizing path dependence and discontinuity. (Graphic prepared by
Trond Jacobsen.)
As a metaphor, path dependence also applies to the practice of science in ways
particularly relevant to cyberinfrastructure. Progress is possible precisely because new
practices build upon old ones (positive path dependence), but this can also mean
inheriting defects, entrenching them even more deeply. Climatologist Michael
Oppenheimer coined the term “negative scientific learning” to describe this
phenomenon.
A relevant example is data collection in the experimental sciences. Since the currency of
scientists’ careers is reputation, based on credit for new discoveries, the data produced
by experiments were traditionally treated as the private intellectual property of the
experimenter. Typically these data were closely guarded, at least until the results of data
analysis were published, and often afterward as well. Publication of raw data in their
entirety was a rare exception, not the rule.
In the last two decades or so, as data sets have grown ever larger and new techniques
of data mining and reanalysis have improved, it has become clear that the private
ownership model for scientific data represents an inefficient use of resources (as path
dependence would predict). Much experimental data can and should be released for
others to analyze and reuse. The NSF and other agencies now require public release of
data after an appropriate waiting period (to allow experimenters to publish and receive
credit for their work). Yet despite this requirement, changing practices based on the
private ownership model has proven much more difficult than anticipated, for both
technical and social reasons. Decades or even centuries of private-ownership practice
have led to a plethora of data collection practices and data formats, many of them
idiosyncratic, as well as an absence of the metadata needed by other scientists to
understand how the data were originally produced.
experimental sciences devalue efforts to share or publish data, or even to record
metadata beyond that needed by the original producers. The consequence is that much
“shared” data remains useless to others; the effort required for one group to understand
another’s output, apply quality controls, and reformat it to fit a different purpose often
exceeds that of generating a similar data set from scratch.
Path Dependence: Notes from the Workshop
The integration of ad hoc systems into large networks is rife with
examples of “bad choices.” Tight integration leads to a need for
standards, which requires making choices, often ones that in
retrospect create inefficiencies. By the same token, there are manifold
examples of felicitous choices.
We need to distinguish between technological and institutional paths.
Humans can function as intermediaries across boundaries imposed by
technological paths, linking projects, disciplines and institutions. How can
we cultivate awareness of the social dimensions of integration?
Communication across fields, cultures, and institutions begins with pidgin
languages. If this communication endures, the pidgin can become a full-
fledged creole: a bridging lingua franca, spoken natively across divides.
What incentives exist, or can be created, to enable or generate translation
across entrenched practices and institutions?
“Scientific revolutions” can be seen as breakout moments when old, well-worn paths of
theory, data collection, and analysis are overturned in favor of new ones. The
cyberinfrastructure and e-science efforts, if successful, may represent such a moment.
Intellectual path dependence implies that seeding ideas and work practices that view
data as a fundamentally collective, shared resource, rather than as the private
possession of individuals and work groups, could have enormous impact.
Scale effects
Cyberinfrastructure developers are focused on transforming small-scale, short-term,
local projects into large-scale, functional infrastructures. The pattern of history suggests
that this will take a long time — on the order of decades.
Scholars of the diffusion of innovation have demonstrated an “S-curve” pattern of
adoption for successful large technical systems. In the case of major infrastructures, the
duration of this curve is typically 40-50 years (Figure 3). After an initial period of slow,
roughly linear growth, these infrastructures entered a period of exponential growth, then
slowed again as they approached their maximum extent. Both the adoption rate
(Figure 4) and the 35-year period between about 1970 (early ARPANET) and the present
fit this model well.
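The S-curve can be written as a logistic function. The sketch below uses assumed parameters (the growth rate r and midpoint year t_mid are our illustration, not values fitted to the figures): with r = 0.1 per year, adoption rises from 10% to 90% of saturation in about 44 years, matching the 40-50 year span described above.

```python
import math

def adoption_fraction(t: float, r: float = 0.1, t_mid: float = 25.0) -> float:
    """Logistic S-curve: fraction of maximum network size at year t.
    r is the growth rate per year; t_mid is the year of 50% adoption."""
    return 1.0 / (1.0 + math.exp(-r * (t - t_mid)))

# Slow start, rapid middle, saturation at the end:
for year in (0, 10, 25, 40, 50):
    print(f"year {year:>2}: {adoption_fraction(year):.1%} of maximum extent")
```

The 10%-to-90% duration is 2·ln(9)/r ≈ 4.4/r years, so the historical 40-50 year spans correspond to growth rates around 0.09-0.11 per year.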
Figure 3. Growth of infrastructures in the United States as a percentage of their maximum
network size (reproduced from Grübler and Nakićenović, 1991).
Figure 4. Technology adoption relative to total US population. B = birth of system
(reproduced from Hannemyr, 2003).
Applying the historical model to cyberinfrastructure
Just as email has become infrastructural because it can be readily used for many tasks,
the Excel spreadsheet has become the vehicle of choice for transfer of data among
some scientific communities. It is easier for scientists in dissimilar specialties to use the
pidgin they all know (Excel) than to learn each other’s “languages,” i.e. database designs
(Borgman, 2007). The example demonstrates the basic principles of infrastructure
formation:
• Reverse salients (no common data format or software);
• Attempts to bridge dissimilar systems using Excel as a gateway;
• Path dependence (entrenchment of a widely shared but inefficient standard that
arrived on the scene before better ones were available).
Finally, of course, the example illustrates the urgency of the cyberinfrastructure project.
Where does scientific cyberinfrastructure now stand along the infrastructure
development path sketched in this section? Given the large number and wide variety of
cyberinfrastructure elements, no detailed answer to this question is possible without
further research; we think much could be learned from a more systematic attempt to
provide one. Still, we can sketch a very preliminary analysis, encapsulated in the table
at the end of this section, comparing electric power, electronic digital computing, and
cyberinfrastructure.
The basic pattern to date, to which NSF initiatives are responding, has been one of
stovepiped construction of specialized systems for domain sciences. Within a few
domain areas, especially meteorology, purpose-built networks for distributed digital data
collection, analysis, and distribution predate the Internet (World Weather Watch;
Edwards 2006), but these were rare. Other sciences took advantage of the Internet and
NSFNet in the 1980s. Numerous collaboratory projects were initiated in a variety of
domains during the first half of the 1990s (although predecessor projects date to the
early 1970s).⁸ More recent major projects include Teragrid and NEES, both begun in
2000, and GEON (started in 2003).
Although many of these projects involve innovative collaboration among subdisciplines
that may not normally connect, the LTS historical model would characterize virtually all
of them as system-building, i.e. the first, early phase of competing and conflicting,
locally based development (where “local” in this case refers to scientific domain rather
than geographical location). They supply limited, specialized services to a predefined
community, rather than providing ubiquitous, widely shared, commodity services across
the sciences. In terms of the historical model, technology transfer would be the next
step. This might occur, for example, if some set of tools built for one collaboratory or
distributed information environment were taken up by a similar project in another
scientific domain. Another step would be the emergence of gateways capable of
connecting existing systems.

⁸ For a substantial list by date, see the catalog developed by the Science of
Collaboratories project:
www.scienceofcollaboratories.org/Resources/colisting.php?startDate+asc
Comparative analysis of different cyberinfrastructure projects could, as we suggest in
the Recommendations section of this report, illuminate the kinds of transfers,
gateways, and other processes most likely to lead in the direction of true infrastructure.
Development phase: System building
  Electric power networks: Edison; Westinghouse
  Computing: Univac; IBM; “Seven Dwarves” (GE, Honeywell, etc.)
  Cyberinfrastructure: Grid computing (GriPhyN, Teragrid); Collaboratories (SPARC);
  Domain science networks (NEES); E-science, e-Social Science (UK)

Development phase: Reverse salients
  Electric power networks: Long distance power transmission; AC/DC conversion; Load factor
  Computing: Machine- and vendor-specific languages