The Information Society, 16:169-185, 2000
© 2000 Taylor & Francis
0197-2243/00 $12.00 + .00
Shaping the Web: Why the Politics of Search Engines Matters
Lucas D. Introna
London School of Economics, London, United Kingdom
Helen Nissenbaum
University Center for Human Values, Princeton University, Princeton, New Jersey, USA
This article argues that search engines raise not merely technical issues but also political ones. Our study of search engines suggests that they systematically exclude (in some cases by design and in some, accidentally) certain sites and certain types of sites in favor of others, systematically giving prominence to some at the expense of others. We argue that such biases, which would lead to a narrowing of the Web’s functioning in society, run counter to the basic architecture of the Web as well as to the values and ideals that have fueled widespread support for its growth and development. We consider ways of addressing the politics of search engines, raising doubts whether, in particular, the market mechanism could serve as an acceptable corrective.
Keywords: search engines, bias, values in design, World Wide Web, digital divide, information access
Received 17 July 1997; accepted 24 November 1998.
We are indebted to many colleagues for commenting on and questioning earlier versions of this article: audiences at the conference “Computer Ethics: A Philosophical Enquiry,” London; members of the seminars at the Kennedy School of Government, Harvard University, and the Center for Arts and Cultural Policy Studies, Princeton University; Steven Tepper, Eszter Hargittai, Phil Agre; and Rob Kling and reviewers for The Information Society. We are grateful to Lee Giles, Brian LaMacchia, Andrea LaPaugh (and members of her graduate seminar), and Andrew Tomkins for technical guidance, and to our able research assistants Michael Cohen and Sayumi Takahashi. H. Nissenbaum acknowledges the invaluable support of the National Science Foundation through grant SBR-9806234.
Address correspondence to Helen Nissenbaum, University Center for Human Values, Princeton University, Princeton, NJ 08544-1013, USA. E-mail:
The Internet, no longer merely an e-mail and file-sharing system, has emerged as a dominant interactive medium. Enhanced by the technology of the World Wide Web, it has become an integral part of the ever-expanding global media system, moving onto center stage of media politics alongside traditional broadcast media (television and radio). Enthusiasts of the “new medium” have heralded it as a democratizing force that will give voice to diverse social, economic, and cultural groups, to members of society not frequently heard in the public sphere. It will empower the traditionally disempowered, giving them access both to typically unreachable nodes of power and to previously inaccessible troves of information.
To scholars of traditional media, these optimistic claims must have a ring of familiarity, echoing similar optimistic predictions concerning the democratizing and empowering capacities of both radio and television. Instead of the expected public gains and fulfillment of democratic possibilities, instead of the spreading of access and power, however, the gains, the power, and the access were consolidated in the hands of a few dominant individuals and institutions. In the words of acclaimed media critic Robert McChesney (1999, p. 1):
The American media system is spinning out of control in a hyper-commercialized frenzy. Fewer than ten transnational media conglomerates dominate much of our media; fewer than two dozen account for the overwhelming majority of our newspapers, magazines, films, television, radio, and books. With every aspect of our media culture now fair game for commercial exploitation, we can look forward to the full-scale commercialization of sports, arts, and education, the disappearance of notions of public service from public discourse, and the degeneration of journalism, political coverage, and children’s programming under commercial pressure.
McChesney’s work (1993, 1997b) traces, in very subtle and convincing detail, how commercial interests were woven into the very fiber of the modern media networks through legislation, market mechanisms, and the like. These moves progressively pushed out and silenced the public service agenda, which was very central to the vision of the early pioneers in the field; McChesney’s historical account of radio is very telling in this regard. His central argument, historically grounded, is that the fundamental course of media is determined primarily by how they’re owned and operated. Most U.S. communication media, going back to AM radio in the 1920s, have followed this path: At first, when they do not seem commercially viable, they are developed by the nonprofit, noncommercial sector. When their profit-making potential emerges, however, the corporate sector starts colonizing the media and, through a variety of mechanisms, usually its dominance of politicians, muscles out the rest and takes over. McChesney argues that this pattern is seen in the cases of FM radio, in UHF television, and to some extent in satellite and cable.
On the prospects of the Internet, there are divergent predictions. Some, like Dan Schiller and McChesney, influenced by their knowledge of other media, anticipate a similar narrowing of prospects for the Internet. They point to the commitment of the United States to private ownership of communications technology as the single most important and consistent historical policy position that influenced the course of telecommunications development. And this same commitment is clearly evident in the rhetoric of the political foundations of the Internet, namely, the fact that of five “values” that Vice-President Gore identified as ones that should define and guide the development of the Global Internet Infrastructure, the first one listed was “private investment” (Office of the Vice President, 1995). Schiller asks, “What is the likelihood of robust adherence to . . . elemental democratic prescription, when the character of the network development is now all-too-evidently to be given mainly as a function of unrestrained corporate ambition and private design?” (Schiller, 1995, p. 6). Others, like Mark Poster, offer a contrasting view, arguing that the distinctly “post-modern” nature of the Internet, with its capacity to disseminate material rather than centralize it, will discourage the endowment of authority, both academic and political. Its development, therefore, is unlikely to mirror that of previous media.
The broader debate about the dual possibilities of media, to be democratizing or to be colonized by specialized interests at the expense of the public good, inspires and motivates this article on the politics of search engines. The general position we defend, and illustrate in this one case, is that although the Internet and the Web offer exciting prospects for furthering the public good, the benefits are conditional, resting precariously on a number of political, economic, and technical factors. Following Poster, we are buoyed by clear instances where the Web and Internet have served broad political ends. But we also see irrefutable signs of gradual centralization and commercialization of guiding forces. Like McChesney, we are particularly concerned with the way these competing interests (centralized commercial vs. decentralized public) may, early on, be woven in, or out, of the very fiber of media networks. Search engines constitute a particularly telling venue for this competition. And prospects, as seen from the perspective of the time of writing this article, do not look good for broad public interests.
Search engines constitute a powerful source of access and accessibility within the Web. Access, already a thorny issue, is the subject of much scholarship and research (Golding, 1994; Hoffman & Novak, 1998; Pollack, 1995; Schiller, 1995), as well as a lengthy report by the National Telecommunications and Information Administration, Falling Through the Net. Focusing on social, economic, and racial factors, these works show how access to the Web is preconfigured in subtle but politically important ways, resulting in exclusion of significant voices. It is not enough, however, to worry about overcoming these traditional barriers, to focus only on the granting of entry to the media space of the Web. It is not enough if, as we argue, the space itself is distorted in favor of those wealthy in technical or economic resources through the mechanism of biased search engines. The politics of search engines thus represents the broader struggle to sustain the democratic potential of traditional media, the Internet, and the World Wide Web in particular.
In a statistical study of Web search engines, S. Lawrence and C. L. Giles estimated that none of the search engines they studied, taken individually, indexed more than 16% of the total indexable Web, which they estimate to consist of 800 million pages. Combining the results of the search engines they studied, they estimated the coverage to increase to approximately 42%. This confirms the primitive impressions of many users, namely, that the Web is almost inconceivably large, and also that search engines only very partially meet the desperate need for an effective way of finding things. When judging what the producers of search engines have accomplished so far, optimists, focusing on the half-full portion of the cup, may legitimately marvel at the progress in Web search technologies and at the sheer bulk of pages that are successfully found. In this article, however, we are concerned with the half-empty portion of the cup: the portions of the Web that remain hidden from view.
The purpose of this article is not, however, to bemoan the general difficulties of building comprehensive search engines, nor to highlight the technological difficulties that must surely impose limits on the range of scope and coverage that even the best search engines can achieve. Our concern, rather, is with the ways that developers, designers, and producers of search engines will direct these technological limitations, the influences that may come into play in determining any systematic inclusions and exclusions, the wide-ranging factors that dictate systematic prominence for some sites, dictating systematic invisibility for others. These, we think, are political issues. They are important because what people (the seekers) are able to find on the Web determines what the Web consists of for them. And we all, individuals and institutions alike, have a great deal at stake in what the Web consists of.
Although a complete discussion of the technical detail of search engines is beyond the scope of this article, we highlight aspects of search engines that we consider relevant to our discussion of their politics. We briefly discuss the nature of the connection between search engines and Web pages, the process by which this relationship is established, and how this relationship affects the producers (or owners) of Web pages wishing to have their pages recognized. Web-page providers seeking recognition from search engines for their Web pages must focus on two key tasks: (a) being indexed and (b) achieving a ranking in the top 20 search results displayed.
On Being Indexed
Having a page indexed, the essential first stage of being recognized by search engines, is extremely important. Without much exaggeration one could say that to exist is to be indexed by a search engine. If a Web page is not in the index of a search engine, a person wishing to access it must know the complete Uniform Resource Locator (URL), also known as the Web page address, such as the one for CEPE98. Since there is no rigid standard for producing URLs, they are not obvious or even logical in the way we tend to think that the addresses of our physical homes are logical. Sometimes the Internet domain-name structure may help, such as “ac.uk” or “edu” for an academic institution in the United Kingdom or United States. However, for most searches we do not have any idea of the URLs involved.
This is where search engines enter the picture. They create a map of the Web by indexing Web pages according to keywords and then create enormous databases that link page content to keywords to URLs. When a seeker of information submits a keyword (or phrase), presumably one that best captures his or her interest, the search-engine database returns to the seeker a list of URLs linked to that keyword, ideally including all those that are relevant to the seeker’s interest. It is important to note that search engines use the notion of a keyword (i.e., that which is indexed and hence used for searching) in a rather minimal sense. Keywords are not determined a priori by the designers of the search engines’ databases nor, explicitly, by some other authority, but rather they are “deduced” from Web pages themselves in the process of indexing. In a particular Web page a keyword can be any of the following:
• Actual keywords indicated by the Web-page designer in an HTML metatag as follows: <meta NAME="keywords" CONTENT="list of keywords">.
• All or some of the words appearing in the title that is indicated by the HTML <TITLE> tag as follows: <TITLE>Whatever is the title of the page</TITLE>.
• The first X words in a Web page (possibly excluding stop words).
• All the words in the Web page (possibly excluding stop words).
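The four keyword sources listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word list, the `KeywordSources` class, and the `extract_keywords` function are hypothetical names introduced here, and the sketch does not describe the indexing software of any actual search engine.

```python
from html.parser import HTMLParser

# Hypothetical stop-word list; real engines use their own, unpublished lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is"}


class KeywordSources(HTMLParser):
    """Collects meta keywords, title words, and body words from one page."""

    def __init__(self):
        super().__init__()
        self.meta_keywords = []
        self.title_words = []
        self.body_words = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.meta_keywords = [
                k.strip().lower() for k in content.split(",") if k.strip()
            ]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Lower-case, strip punctuation, and drop stop words.
        words = [w.strip(".,;:!?\"'()").lower() for w in data.split()]
        words = [w for w in words if w and w not in STOP_WORDS]
        if self._in_title:
            self.title_words.extend(words)
        else:
            self.body_words.extend(words)


def extract_keywords(html, first_x=5):
    """Return the four candidate keyword sets for a single page."""
    parser = KeywordSources()
    parser.feed(html)
    parser.close()
    return {
        "meta": parser.meta_keywords,
        "title": parser.title_words,
        "first_x": parser.body_words[:first_x],
        "all": parser.body_words,
    }


PAGE = """<html><head><title>Holiday Cottages in Wales</title>
<meta name="keywords" content="holiday, cottages, Wales">
</head><body><p>Rent a cottage by the sea.</p></body></html>"""

result = extract_keywords(PAGE, first_x=3)
```

Which of these four sets an engine actually stores decides what a page can be found under, which is why the choice matters to page producers.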
Most search engines use at least some of the words in the title tag of the Web page as the relevant keywords for indexing purposes. It is obviously important for Web-page producers as well as seekers to know what words on a particular Web page are seen as keywords by the indexing software of search engines. Thus, one might naturally ask: How does a search engine go about creating its database and what does it store in it?
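As a rough illustration of a database that links keywords to URLs, the following minimal sketch builds an inverted index in Python. The function names `build_index` and `lookup`, the example URLs, and the rule that a multi-word query must match every word are all assumptions made for this example, not a description of any particular engine.

```python
from collections import defaultdict


def build_index(pages):
    """Map each keyword to the sorted list of URLs whose pages carry it.

    `pages` is a dict of URL -> list of keywords deduced from that page.
    """
    index = defaultdict(set)
    for url, keywords in pages.items():
        for keyword in keywords:
            index[keyword.lower()].add(url)
    return {keyword: sorted(urls) for keyword, urls in index.items()}


def lookup(index, query):
    """Return URLs indexed under every word of the query (assumed AND rule)."""
    words = query.lower().split()
    if not words:
        return []
    matches = set(index.get(words[0], []))
    for word in words[1:]:
        matches &= set(index.get(word, []))
    return sorted(matches)


# Hypothetical pages and keywords, for illustration only.
PAGES = {
    "http://example.org/cottages": ["holiday", "cottages", "wales"],
    "http://example.org/games": ["shareware", "games"],
    "http://example.org/hotels": ["holiday", "hotels"],
}
INDEX = build_index(PAGES)
```

A page that never enters `PAGES` can never be returned by `lookup`, whatever the seeker types; this is the sense in which to exist is to be indexed.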
The answer to this question depends on which of basically two categories (and within these categories, the further subcategories) the search engine fits. One category includes directory-based search engines such as Yahoo! and Aliweb. In this category, the vast majority of the pages indexed are manually submitted to the search engines’ editors by Webmasters (and other creators of Web pages). The other category includes search engines that automatically harvest URLs by means of spiders (also referred to as robots or softbots). Among the most well-known search engines fitting this category are Alta Vista, Lycos, and Hotbot.
In the case of directory-based search engines, Web-page creators submit URLs to the search engines for possible inclusion into their databases. If you wanted your page recognized by Yahoo!, for example, you would submit your URL and background information to a human editor, who would review the page and decide whether or not to schedule your page for indexing. If your page is scheduled for indexing, it would be retrieved by the indexing software, which would parse the page and index it according to the keywords found in the page. For directory-based search engines, therefore, human gatekeepers hold the key to inclusion in their indexed databases. At the time of writing this article, there is a considerable backlog, so this process can take up to six months from the time of submission to the time of inclusion.
Web owners wishing to have their pages indexed must surely wonder what criteria these human editors use to decide whether or not to index their pages. This is a major bone of contention, especially for anyone contesting these decision criteria. With Yahoo!, for example, representatives say that they use criteria of relevancy (Phua, 1998). The exact nature of these criteria, however, is not widely known or publicly disseminated and, evidently, these criteria are not consistently applied by the various editors. As a result, you may have your page rejected (without notification) and would not know what to do to get it accepted. Danny Sullivan, the editor of Search Engine Watch, believes that the base success rate for any submitted page’s being listed with Yahoo! is approximately 25%. Two factors that seem to increase the chances of being listed are the number of links (to and from a given site, also referred to as inlinks and outlinks) and how full a particular category happens to be. When editors feel they need more references within a category, they lower the entry barriers. Defending their approach, representatives of Yahoo! maintain they list what users want, arguing that if users were not finding relevant information they would cease using Yahoo!. We return to this form of response later.
With Aliweb, a very small site in comparison to its competitors, users submit supplemental information about their Web-page content and keywords as a way of helping the indexing software improve the quality of its indexing and hence provide better search results. Representatives of Aliweb emphasize that they do not provide comprehensive coverage; rather, they emphasize high-quality search results. Because this is a small site, it is still able to index most of its submissions. As it becomes larger, it may, like its competitors, need to establish criteria for inclusion and exclusion.
Being indexed by search engines that automatically harvest URLs is a matter of being visited by a spider (also called robot, crawler, softbot, agent, etc.). Spiders usually start crawling from a historical list of URLs, especially documents with many links elsewhere, such as server lists, “What’s New” pages, and other popular sites on the Web. Software robots crawl the Web (that is, automatically traverse the Web’s hypertext structure), first retrieving a document and then recursively retrieving all documents that are referenced (linked by other URLs) in the original document. Web owners interested in having their pages indexed might wish they had access to details concerning the routes spiders follow when they crawl, which sites they favor, which they visit and how often, which not, and so forth. This, however, is a complicated technical subject, and the details are steadfastly guarded as trade secrets by the respective search engine companies. From our experience and discussions with those involved in the field, we would contend with some certainty that spiders are guided by a set of criteria that steer them in a systematic way to select certain types of sites and pages and not select others. However, the blackout on information about search engine crawl algorithms means we can only try to infer the character of these algorithms from search engine selection patterns, an inexact exercise.
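The general traversal described above (start from a list of seed URLs, retrieve a document, then follow its links) can be sketched without any network access by treating the Web as a small in-memory link graph. Everything here is an assumption for illustration: the `crawl` function, the page names, and the use of a breadth-first queue in place of literal recursion. Actual crawl algorithms are, as noted, not public.

```python
from collections import deque


def crawl(link_graph, seeds, max_pages=100):
    """Visit pages reachable from the seeds, breadth first.

    `link_graph` maps each page to the list of pages it links to.
    Returns pages in the order visited; `max_pages` caps the crawl.
    """
    frontier = deque(dict.fromkeys(seeds))  # de-duplicated seed list
    seen = set(frontier)
    visited = []
    while frontier and len(visited) < max_pages:
        page = frontier.popleft()
        visited.append(page)
        for target in link_graph.get(page, []):
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return visited


# Hypothetical miniature Web. "island" links out, but nothing links to it.
WEB = {
    "whats-new": ["a", "b"],
    "a": ["c"],
    "b": ["a"],
    "c": [],
    "island": ["a"],
}

harvested = crawl(WEB, ["whats-new"])
```

In this toy graph the page named `island` is never harvested: a spider that only follows links from its seed list cannot discover a page that no visited page links to.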
We have learned something of the nature of spider algorithms from a paper on efficient crawling by Cho, Garcia-Molina, and Page, presented at the WWW7 conference (Cho et al., 1998). This paper, which discusses commonly used metrics for determining the “importance” of a Web page by crawling spiders, provides key insights relevant to the main claims of our article. Because of its significance, we discuss it here in some detail. Cho et al. (1998, p. 1) write:
Given a Web page P, we can define the importance of the page, I(P), in one of the following ways . . . :
1. Similarity to a Driving Query Q. A query Q drives the crawling process, and I(P) is defined to be the textual similarity between P and Q . . . .
2. Backlink Count. The value of I(P) is the number of links to P that appear over the entire web. We use IB(P) to refer to this importance metric. Intuitively, a page P that is linked to by many pages is more important than one that is seldom referenced. On the web, IB(P) is useful for ranking query results, giving end-users pages that are more likely to be of general interest. Note that evaluating IB(P) requires counting backlinks over the entire web. A crawler may estimate this value with IB'(P), the number of links to P that have been seen so far.
3. PageRank. The IB(P) metric treats all links equally. Thus, a link from the Yahoo! home page counts the same as a link from some individual’s home page. However, since the Yahoo! home page is more important (it has a much higher IB count), it would make sense to value that link more highly. The PageRank backlink metric, IR(P), recursively defines the importance of a page to be the weighted sum of the backlinks to it. Such a metric has been found to be very useful in ranking results of user queries (Page 1998.2). We use IR'(P) for the estimated value of IR(P) when we have only a subset of pages available.
4. Location Metric. The IL(P) importance of page P is a function of its location, not of its contents. If URL u leads to P, then IL(P) is a function of u. For example, URLs ending with “.com” may be deemed more useful than URLs with other endings, or URLs containing the string “home” may be more of interest than other URLs. Another location metric that is sometimes used considers URLs with fewer slashes more useful than those with more slashes. All these examples are local metrics since they can be evaluated simply by looking at the URLs. (emphasis added)
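To make two of the quoted metrics concrete, here is a minimal Python sketch of a backlink count and a location score. The function names, the toy link graph, and especially the numeric weights in `location_score` are hypothetical: the quoted passage names the signals (a “.com” ending, the string “home”, the number of slashes) but gives no weights, and the sketch checks the host name for “.com” as a simplifying assumption.

```python
from urllib.parse import urlparse


def backlink_counts(link_graph):
    """Count, for each page, the distinct pages that link to it.

    Run on the whole graph this plays the role of IB(P); run on only the
    part of the graph seen so far it plays the role of the estimate IB'(P).
    """
    counts = {page: 0 for page in link_graph}
    for source, targets in link_graph.items():
        for target in set(targets):
            if target != source:  # ignore self-links
                counts[target] = counts.get(target, 0) + 1
    return counts


def location_score(url):
    """Score a URL from its text alone (illustrative, assumed weights)."""
    parsed = urlparse(url)
    score = 0
    if parsed.netloc.endswith(".com"):
        score += 10  # assumed bonus for a ".com" host
    if "home" in url.lower():
        score += 5   # assumed bonus for the string "home"
    score -= parsed.path.count("/")  # fewer slashes scores higher
    return score


# Hypothetical link graph: page -> pages it links to.
WEB = {
    "yahoo": ["games", "news"],
    "fan-page": ["games"],
    "games": ["yahoo"],
    "news": [],
    "community": [],
}
full = backlink_counts(WEB)                             # IB over the whole graph
partial = backlink_counts({"yahoo": WEB["yahoo"]})      # IB' after one page

shallow = location_score("http://www.example.com/home.html")
deep = location_score("http://www.example.org/users/jane/pages/index.html")
```

Under these assumed weights the shallow commercial URL outscores the deeply nested personal one before either page's content has been read.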
The Similarity to a Driving Query Q metric uses a query term or string, such as “holiday cottages,” for example, as the basic heuristic for crawling. This means that the spider does not need to make a decision about importance since it will be directed in its search by the query string itself. For our discussion, this metric is of minor significance. The real issue emerges when the crawling spider must “decide” importance without the use of a submitted query term. This is where the other metrics play the dominant role. The Backlink metric uses the backlink (or inlink) count as its importance heuristic. The value of the backlink count is the number of links to the page that appear over the entire Web, for example, the number of links over the entire Web that refer to a given URL. The assumption here is that “a page that is linked to by many pages is more important than one that is seldom referenced.” Obviously, this is a very reasonable heuristic. We know from academic research that it is wise to look at the “canonical” works that are referred to (or cited, in academic language) by many other authors. We know also, however, that not all topics necessarily have canons. Furthermore, although in some fields a small number of citations may make a particular work a canon, in other fields it takes a vast number of citations to reach canonical status. Thus, the Backlink heuristic would tend to crawl and gather the large topics/fields (such as “shareware computer games”), because even a relatively unimportant site in this big field will be seen as more important (will have relatively more backlinks or inlinks) than an actually important site in a small field (such as “the local community services information”), which would have relatively fewer backlinks or inlinks. The essential point is that the large fields determine the measure, or threshold, of importance, through sheer volume of backlinks, in ways that would tend to push out the equally important small fields. We return to this issue later, in our market discussion.
With the PageRank metric, this problem is exacerbated. Instead of treating all links equally, this heuristic gives prominence to backlinks from other important pages, that is, pages with high backlink counts. Thus, since a link from the Yahoo! home page is more important (it has a much higher IB count), it would make sense to value that link more highly. In the analogy of academic papers, a metric like this would imply that a particular paper is even more important if referred to by others who are already seen as important, that is, by other canons. More simply, you are important if others who are already seen as important indicate that you are important. The problem with the Backlink and PageRank metrics is that they assume that backlinks are a reliable indication of importance or relevance. In those cases where authors of pages create links to other pages they see as valuable, this assumption may be true. There are, however, many organizations that actively cultivate backlinks by inducing Web-page creators to add a link to their page through incentives such as discounts on products, free software utilities, access to exclusive information, and so forth. Obviously, not all Web-page creators have equal access to the resources or the incentive to induce others to link to them.
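The effect of weighting links by the importance of their source can be illustrated with a simplified PageRank-style iteration. This is a sketch under stated assumptions, not the metric as implemented by any engine: the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages without outgoing links, and the toy graph are all choices made for this example.

```python
def pagerank(link_graph, damping=0.85, iterations=50):
    """Simplified power iteration over a page -> outlinks graph."""
    pages = set(link_graph)
    for targets in link_graph.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = link_graph.get(page, [])
            if targets:
                # A page passes its rank on in equal shares to its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumed handling of pages with no outlinks: spread evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph: "favored" and "overlooked" each have exactly one
# backlink, but the page linking to "favored" is itself heavily linked.
WEB = {
    "hub": ["favored"],
    "fan1": ["hub"],
    "fan2": ["hub"],
    "fan3": ["hub"],
    "obscure": ["overlooked"],
    "favored": [],
    "overlooked": [],
}
ranks = pagerank(WEB)
```

Both target pages have the same backlink count, yet the one endorsed by the well-linked hub ends up with the higher score, which is the compounding effect described above.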
The Location Metric uses location information from the URL to determine “next steps” in the crawl. “For example, URLs ending with ‘.com’ may be deemed more useful than URLs with other endings, or URLs containing the string ‘home’ may be more of interest than other URLs.” Even though the authors do not indicate what they see as more important, one can assume that these decisions are made when crawl heuristics are set for a particular spider. It may therefore be of great significance “where you are located” as to how important you are seen to be. With the URL as the basis of decision making, many things can aid you in catching the attention of the crawling spider, such as having the right domain name, being located in the root directory, and so forth. From this discussion on crawling metrics we can conclude that pages with many backlinks, especially backlinks from other pages with high backlink counts, which are at locations seen as useful or important to the crawling spider, will become targets for harvesting.
Another criterion that seems to guide spiders is breadth or depth of representation. If a spider’s algorithm favors breadth (rather than depth), it would visit more sites but index them only partially. In the case of big sites such as America Online (AOL), Geocities, and so forth, spiders will index only a fraction of the pages hosted there. If your site is hosted on AOL or another big site, there is a good chance that it will not be included.
Another reason that a site, and so all the pages on that server, may be excluded from search engine databases is that the owner/Webmaster of that server has excluded spiders through the robot exclusion standard by means of a “robots.txt” file. This is often done because requests for pages from spiders may significantly increase the load on a server and reduce the level of service to all other users. CNN, for example, excludes all spiders from its site, as do many sites that offer free Web-page space. It is also important to note that the harvesting spiders of the search engines we looked at process only HTML files and in particular HTML tags. If important information on your Website is in other formats, such as Acrobat files, or represented by a graphic file, this information could be lost in the indexing process.
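How a well-behaved spider might honor a “robots.txt” file can be sketched with Python's standard `urllib.robotparser` module. The two rule sets, the spider name `ExampleSpider`, the example URLs, and the helper `allowed` are hypothetical; they are not the rules of CNN or of any real site.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rule sets.
EXCLUDE_ALL = "User-agent: *\nDisallow: /\n"          # shuts out every spider
PARTIAL = "User-agent: *\nDisallow: /private/\n"      # shuts out one directory

blocked_everywhere = allowed(
    EXCLUDE_ALL, "ExampleSpider", "http://www.example.com/index.html"
)
blocked_private = allowed(
    PARTIAL, "ExampleSpider", "http://www.example.com/private/page.html"
)
open_public = allowed(
    PARTIAL, "ExampleSpider", "http://www.example.com/index.html"
)
```

Note that the exclusion is set per server: when a hosting site disallows all spiders, every page it hosts becomes invisible to compliant search engines regardless of what the individual page authors want.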
Having said all of this, it ought to be acknowledged that most spider-based search engines do also allow autonomous submissions by Webmasters/designers. Software is available that automatically generates the required electronic formats and facilitates submission to a number of search engines simultaneously. Using this route has had very mixed results, according to the Webmasters we spoke to.
On Being Ranked
Indexing is but one hurdle to clear for the creators of Web pages who strive for recognition through search engines. Having been successful in the indexing game, their concern shifts to ranking. Many observe that to be noticed by a person doing a search, a Web page has to be ranked among the top 10 to 20 listed as hits. Because most search engines display the 10 most relevant hits on the first page of the search results, Web designers jealously covet those 10 or 20 top slots. The importance of ranking is regularly discussed by leading authors in the field of Web-site promotion:
There is competition for those top ten seats. There is serious competition. People are trying to take away the top spots every day. They are always trying to fine-tune and tweak their HTML code and learn the next little trick. The best players even know dirty ways to “bump off” their competition while protecting their own sites (Anderson & Henderson, 1997).
Although we have not found large-scale empirical studies measuring the effects of ranking on the behavior of seekers, we observe anecdotally that seekers are likely to look down a list and then cease looking when they find a “hit.” A study of travel agents using computerized airline reservations systems, which showed an overwhelming likelihood that they would select a flight from the first screenful of search results, is suggestive of what we might expect among Web users at large (Friedman & Nissenbaum, 1996). Indeed, if this were not the case it would be difficult to see why Webmasters are going to all the effort to get into the first screen, and there is significant evidence that they do, indeed, take it very seriously. Now it may be that it is not only the first screen but the second and third screen as well. Nevertheless, even though we cannot say without further research exactly where this line may be (and it may vary with topic, type of searcher, and so forth), we can propose that it does matter whether you are in the first few screens rather than much lower down in the order. One could also argue such a position from an information-overload point of view; we shall not pursue it here (Wurman, 1989).
Relevancy ranking is an enormously difficult task. Some researchers working on search technologies argue that relevancy ranking is currently the greater challenge facing search engines and that developments in technical know-how and sheer capacity to find and index sites have not nearly been matched by the technical capacity to resolve relevancy ranking. Besides the engineering challenges, experts must struggle with the challenge of approximating a complex human value (relevancy) with a computer algorithm. In other words, according to these experts, while we seem to be mastering the coverage issue, we continue to struggle with the issue of what precisely to extract from the enormous bulk of possibilities for a given search.
Most ranking algorithms of search engines use both the position and the frequency of keywords as a basis for their ranking heuristics (Pringle et al., 1998). Accordingly, a document with a high frequency of keywords in the beginning of a document is seen as more relevant (to the keyword entered) than one with a low frequency lower down in the document. Other ranking schemes, like the heuristic used by Lycos, are based on so-called inlink popularity. The popularity score of a particular site is calculated based on the total number of other sites that contain links to that site (see also backlink value, discussed earlier). High link popularity leads to an improved ranking. As with the crawl metrics discussed earlier, one sees the standard or threshold of relevance being set by the big sites at the expense of equally relevant small sites.
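A ranking heuristic that combines keyword position and frequency might look roughly like the following Python sketch. The scoring formula is an assumption invented for illustration (each occurrence contributes more the closer it sits to the start of the document); the source states only that position and frequency are both used, not how they are weighted. The names `relevance`, `rank_pages`, and the sample pages are hypothetical.

```python
def relevance(text, keyword):
    """Score a document for one keyword by frequency and position.

    Assumed rule: an occurrence at word position i in a document of n
    words contributes 1 - i/n, so early and frequent occurrences win.
    """
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    target = keyword.lower()
    return sum(
        1.0 - position / len(words)
        for position, word in enumerate(words)
        if word == target
    )


def rank_pages(pages, keyword):
    """Return page names ordered from most to least relevant."""
    return sorted(pages, key=lambda name: relevance(pages[name], keyword),
                  reverse=True)


# Hypothetical documents.
PAGES = {
    "early": "Cottages, cottages by the sea",
    "late": "A long introduction before any cottages",
    "none": "Shareware computer games",
}
order = rank_pages(PAGES, "cottages")
```

Under this assumed rule a page that opens with images or introductory text, and mentions its key term only later, is ranked below a page that repeats the term at the top, whatever their actual relevance to the seeker.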
The desire and battle for ranking have generated a field of knowledge called search engine design, which teaches how to design a Web page in order to optimize its ranking and combines these teachings with software to assess its ranking potential. On one end of the spectrum, practices that make reasonable use of prima facie reasonable heuristics help designers to optimize their Web pages’ expected rankings when they are legitimately relevant to the person searching. On the other end of the spectrum, some schemes allow Web designers to manipulate, or trick, the heuristics: schemes such as relevancy spamming, where Web-page designers “trick” the ranking algorithm into ranking their pages higher than they deserve to be ranked by means of keyword stuffing, invisible text, tiny text, and so forth. Such spamming activities doubly punish the innocent. If, for example, you design a Web page with a few graphic images at the beginning, followed somewhere toward the middle with text, you would be severely “punished” by the algorithm both because key terms are positioned relatively low down on the page and also because you would be competing for rank with those who are, as it were, less scrupulous.
Out of this strange ranking warfare has emerged an
impossible situation: Search-engine operators are loath to
give out details of their ranking algorithms for fear that
spammers will use this knowledge to trick them.
Yet ethical Web-page designers can legitimately defend a need
to know how to design for, or indicate relevancy to, the
ranking algorithm so that those who search find what is
genuinely relevant to their searches.
TABLE 1
Summary of criteria for indexing and ranking
Perspective, followed by reasons for exclusion
Search engine: Indexing
    Directory-type search engines
        The human editor does not include your submission on the
        basis of criteria not generally known and apparently
        inconsistently applied.
    Automatic-harvesting-type search engines
        Site not visited because of spider exclusion standard set
        by the Webmaster.
        Site not in the crawl path of the spider (not sufficiently
        rich in backlinks).
        Part of a large (often free) site that is only partially
        indexed.
        Documents don’t conform to HTML standard (pdf, gif, etc.).
Ranking (in top 10 when relevant)
        Did not buy the keyword or top spot.
        Not high in inlink popularity (from and to site).
        Relevant keywords not in meta tag or title.
        Keyword spammers have pushed you down.
        Important parts of your title are stop words.
        Your pages have been altered (dumped off) through
        unethical practices by your competitors.
Seeker: Finding appropriate content
        Using only one search engine (sometimes a default that
        user is unaware of).
        Inappropriate use of search criteria.
Beyond the challenge of second-guessing ranking algorithms,
there may yet be another, more certain, method
of getting results. Some producers of Web sites pursue
other ways of elevating their ranking, ways that are outside
of the technical fray: They try to buy them. This subject
is an especially sensitive one, and representatives of
several major search engines indignantly deny that they sell
search positions. Recently, however, in a much-publicized
move, Alta Vista and DoubleClick have invited advertisers
to bid for positions in their top slots (Hansell, 1999).
Yahoo! sells prominence indirectly by allowing Web owners
to pay for express indexing. This allows them to move
ahead in the 6-month queue. Another method for buying
prominence, less controversial but not unproblematic,
allows Web owners to buy keywords for purposes of banner
ads. Amazon Books, for example, has a comprehensive
arrangement with Yahoo!, and Barnes & Noble has one with
Lycos. If a seeker submits a search to Yahoo! with the term
“book” in it, or a term with a name that corresponds to an
author’s name or book title in the Amazon database, the
seeker would get the Amazon banner (and URL) on his or
her search result screen. This is also true for many other
companies and products.
The battle for ranking is fought not only between search
engines and Web masters/designers but also among
organizations wishing for prominence. There is sufficient
evidence to suggest that the fierce competition for both
presence and prominence in a listing has led to practices such
as one organization’s retrieving a competitor’s Web page,
editing it so that it will not do well in the ranking, and
resubmitting it as an updated submission, or one
organization’s buying a competitor’s name as a keyword and then
having the first organization’s banner and URL displayed
when a search is done on that keyword.
In Table 1, we summarize the main points of our description,
showing some of the ways search engine designers
and operators commonly make choices about what to
include in and exclude from their databases. These choices
are embedded in human-interpreted decision criteria, in
crawl heuristics, and in ranking algorithms.
We may wonder how all this affects the nature of Web
users’ experiences. Based on what we have learned so far
about the way search engines work, we would predict that
information seekers on the Web, whose experiences are
mediated through search engines, are most likely to find
popular, large sites whose designers have enough technical
savvy to succeed in the ranking game, and especially those
sites whose proprietors are able to pay for various means of
improving their site’s positioning. Seekers are less likely
to find less popular, smaller sites, including those that are
not supported by knowledgeable professionals.
When a
search does yield these sites, they are likely to have lower
prominence in rankings.
These predictions are, of course, highly general and will
vary considerably according to the keywords or phrases
with which a seeker initiates a search, and this, in turn,
is likely to be affected by the seeker’s competence with
search engines. The nature of experiences of information
seekers will also vary according to the search engines
they choose. Some users may actively seek one search
engine over others, but some will simply, and perhaps
unknowingly, use a default engine provided by institutions
or Internet service providers.
We are unlikely
to find much relief from these robust irregularities in
meta search engines like Metacrawler, Ask Jeeves, and
Debriefing because they base their results on existing search
engines and normally accomplish their task by recognizing
only higher-order search keys rather than first-order
We note further that not only are most users
unaware of these particular biases, they seem also to be
unaware that they are unaware.
Readers may find little to trouble them in this description
of search engine proclivities. What we have before us is an
evolving marketplace in search engines: We ought to let
producers of search engines do what they will and let users
decide freely which they like best. Search engines whose
offerings are skewed either because their selections are not
comprehensive or because they prioritize listings according
to highest bid will suffer in the marketplace. And even
if they do not, the collective preferences of participants
should not be second-guessed. As the representatives of
Yahoo! we cited earlier have argued, users’ reactions must
remain the benchmark of quality: Dissatisfied seekers will
defect from an inadequate search engine to another that
does a better job of indexing and prioritizing. Thus will
the best search engines flourish; the poor ones will fade
away due to lack of use. McChesney (1997b, p. 12)
describes a comparable faith in the market mechanism as
it applied to traditional broadcast media: “In the United
States, the notion that commercial broadcasting is the
superior system because it embodies market principles is
closely attached to the notion that the market is the only
‘democratic’ regulatory mechanism, and that this
democratic market is the essence of Americanism, patriotism,
and all that is good and true in the world.” Both McChesney
and Schiller, however, have criticized
the idea that a media market best represents democratic
ideals. In the case of search engines, we are, likewise, not
optimistic about the promise of development that is shaped
only by a marketplace.
As anyone who has used search engines knows, the
dominant search engines do not charge seekers for the
search service. Rather, the arrangement resembles that
of commercial television where advertisers pay television
stations for the promise of viewers. Similarly, search
engines attract paid advertisements based on the promise of
search usage. High usage, presumably, garners advertisers
and high charges. To succeed, therefore, search engines
must establish a reputation for satisfying seekers’ desires
and needs; this way they will attract seekers in the first
place, and then will keep them coming back.
As a way
of simplifying the discussion, however, we refer to the
marketplace as a marketplace in search engines with seekers as
the buyers. This strategy does not, as far as we have been
able to tell, alter the substantive outcomes of the particular
issues we have chosen to highlight.
We do not dispute the basic fact of the matter, namely
that a marketplace for search engines (and seekers, if you
will) is possible. It is also possible that such a market,
reflecting discrepant degrees of satisfaction by seekers,
will result in some search engines flourishing and others
failing. Our dissatisfaction with this forecast is not that
it cannot come true but what it would mean, from the
perspective of social values and the social investment in
the Internet, if it did. Why, the critic might ask, on what
grounds, would we presume to override the wishes of users
as they are cleanly reflected in their market choices?
Our reply to this challenge, which we try to keep as free
from sentimental prejudices as possible, cites two main
sources of concern. One is that the conditions needed for
a marketplace to function in a democratic and efficient
way are simply not met in the case of search engines. The
other is our judgment that Web-search mechanisms are
too important to be shaped by the marketplace alone. We
discuss each in turn, the first one only briefly.
A virtue frequently claimed by defenders of the market
mechanism is that participants are free to express their
preferences through the choices they make among
alternatives. Through their choices, incompetent, inefficient
suppliers are eliminated in favor of competent, efficient
suppliers. As many critics have pointed out, however, this holds
true only for markets in which those who supply goods
or services have an equal opportunity to enter the market
and communicate with potential customers, and in which
those who demand goods and services are fully informed
and act in a rational manner. Such an ideal market simply
does not exist, and this is especially so in the case of search
engines.
If we focus on the demand side first, we see that most
users of the Web lack critical information about
alternatives. Only a small fraction of users understand how search
engines work and by what means they yield their results. It
is misleading to suggest that these users are meaningfully
expressing preferences or exercising free choice when they
select from the alternatives. Though we lack systematic
empirical evidence, the anecdotal result of asking people
why they use or prefer one search engine to others
is some version of “It finds what I’m looking for” and a
shrug. Now, if one is searching for a specific product or
service, it may be possible to know in advance how to
determine that one has indeed found what one was looking
for. When searching for information, however, it is
difficult (if not impossible) to make such a conclusive
assessment, since the locating of information also serves to
inform one about that which one is looking for. This is
an old information-retrieval problem, often expressed as
“how do you know what you do not know until you know
it,” with which information science scholars have been
battling for many years. It seems unlikely that this would
be different for search engines. In fact, the partiality of any
search attempt (even if we assume a competent searcher)
will magnify this problem in the context of search engines.
Not only this, we would also claim that users tend to be
ignorant about the inherent partiality present in any search
engine's search results (as explained earlier, in the
technical overview). They tend to treat search-engine results the
way they treat the results of library catalogue searches.
Given the vastness of the Web, the close guarding of
algorithms, and the abstruseness of the technology to most
users, it should come as no surprise that seekers are
unfamiliar with, even unaware of, the systematic mechanisms that
drive search engines. Such awareness, we believe, would
make a difference. Although here, too, we came across
no systematic empirical findings, we note that in spheres
outside of the electronic media, people draw clear and
definitive distinctions between information and
recommendations coming from disinterested, as compared with
interested, sources, between impartial advice as compared
with advertisement.
And anecdotal experience bears this
out, as when customers learned that Amazon Books, for
example, had been representing as “friendly
recommendations” what were in reality paid advertisements. Customers
responded with great ire, and Amazon hastily retreated.
The problem is equally complex on the supply side of the
supposed market. We have already indicated the complex
hurdles that need to be cleared to get listed and ranked
appropriately. They all indicate that there simply is no level
playing field by any stretch of the imagination. It seems
clear that the “market will decide” view (problematic in
most cases) is extremely problematic in this context. It is
also doubtful that this can be resolved to the point where
the market argument will become valid.
The question of whether a marketplace in search engines
sufficiently approximates a competitive free market is,
perhaps, subordinate to the question of whether we ought to
leave the shaping of search mechanisms to the marketplace
in the first place. We think this would be a bad idea.
Developments in Web searching are shaped by two
distinct forces. One is the collective preferences of seekers.
In the current, commercial model, search engines wishing
to achieve greatest popularity would tend to cater to
majority interests. While markets undoubtedly would force
a degree of comprehensiveness and objectivity in listings,
there is unlikely to be much market incentive to list sites
of interest to small groups of individuals, such as
individuals interested in rare animals or objects, individuals
working in narrow and specialized fields or, for that
matter, individuals of lesser economic power, and so forth. But
popularity with seekers is not the only force at play. The
other is the force exerted by entities wishing to be found.
Here, there is enormous inequality. Some enter the
market already wielding vastly greater prowess and economic
power than others. The rich and powerful clearly can
influence the tendencies of search engines; their dollars can
(and in a restricted way do already)
play a decisive role in
what gets found. For example, of the top 100 sites (based
on traffic), just 6 are not .com commercial sites.
If we
exclude universities, NASA, and the U.S. government, this
number drops to two. One could reasonably argue that the
United Nations site ought to generate at least enough traffic
to be on the list if we consider that Amazon is in position
10 and USA Today in position 35. The cost to a search
engine of losing a small number of searching customers
may be outweighed by the benefits of pandering to the
“masses” and to entities paying fees for the various forms of
enhanced visibility. We can expect, therefore, that at least
some drift will be caused by those wishing to be found,
which, in turn, would further narrow the field of what is
available to seekers of information, association, support,
and services.
It may be useful to think of the Web as a market of
markets, instead of as just one market. When we seek, we
are not interested in information in general; rather, we are
interested in specific information related to our specific
interests and needs. Seekers might be in the market for
information about, for example, packaged tour holidays
or computer hardware suppliers. For these markets, where
we expect the demand for information to be great, we
would expect the competition for recognition to be great
as well. Companies would pay high prices for the keyword
banners that will ensure them the top spot and a search
will generate many hits for the seekers. In contrast, there
are other, significantly smaller markets: for information
about a rare medical condition or about the services of a
local government authority or community.
In this market of markets, there is likely to be little
incentive to ensure inclusion of these small markets and
only a small cost (in loss of participation) for their
exclusion. Although we do not have empirical evidence, we
would expect the law of Pareto to apply (see Sen, 1985).
We could imagine that a high percentage of search requests
(say 80%, for argument’s sake) are directed to a
small percentage (say 20%) of the big markets, which
would be abundantly represented in search results.
Only
a small percentage of the search requests (say 20%) would
be addressed to the large percentage (say 80%) of the
smaller markets, which would be underrepresented. This
scenario would explain the limited incentive for inclusion
and relatively low cost of exclusion. We find this result
problematic.
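The arithmetic behind this 80/20 scenario can be worked through in a short sketch. It reads the scenario as 20% of the markets (the big ones) receiving 80% of the search requests; the function name, the parameters, and the totals of 1,000 requests and 100 markets are hypothetical figures chosen only to make the proportions visible, not empirical data.

```python
def requests_per_market(total_requests, total_markets,
                        big_market_share=0.2, big_request_share=0.8):
    """Average requests per big market and per small market.

    Assumes the illustrative Pareto split used in the text: a
    `big_market_share` fraction of markets receives a
    `big_request_share` fraction of all search requests.
    """
    big_markets = total_markets * big_market_share
    small_markets = total_markets - big_markets
    big_requests = total_requests * big_request_share
    small_requests = total_requests - big_requests
    return big_requests / big_markets, small_requests / small_markets


# Hypothetical totals, for argument's sake.
per_big, per_small = requests_per_market(1000, 100)
print(per_big, per_small, per_big / per_small)
```

On these assumed numbers, each big market draws about 40 requests and each small market about 2.5, a sixteenfold gap. That gap is one way to see why an engine guided only by request volume has little incentive to cover any individual small market, even though the small markets together make up most of the markets.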
A market enthusiast does not find this result
problematic. This is exactly what the market is supposed to do; the
range and nature of choices are supposed to ebb and flow
in response to the ebb and flow of the wants and needs of
market participants, from varieties of salad dressings to
makes of automobiles. Nevertheless, we resist this
conclusion not because we are suspicious of markets in general
(for cars and salad dressings, they are fine) but because
maintaining the variety of options on the Web is of special
importance. We resist the conclusion because we think that
the value of comprehensive, thorough, and wide-ranging
access to the Web lies within the category of goods that
Elizabeth Anderson describes in her book Values in Ethics
and Economics as goods that should not be left entirely
(if at all)
to the marketplace
(Anderson, 1993).
Anderson constructs an elaborate argument defending
the claim that there are ethical limitations on the scope
of market norms for a range of goods (and services).
Abstracting principles from cases that are likely to be
noncontroversial in this regard (for example, friendship, persons,
and political goods like the vote), she then argues that
these principles apply to goods that are likely to be more
controversial in this regard, such as public spaces, artistic
endeavor, addictive drugs, and reproductive capacities. For
some goods, such as cars, bottled salad dressings, and so
on, unexamined wants, expressed through the marketplace,
are a perfectly acceptable basis for distribution. For
others, including those that Anderson identifies, market
norms do not properly express the valuations of a liberal
democratic society like ours, which is committed to
“freedom, autonomy and welfare”
(Anderson, 1993, p. 141).
Although it is not essential to our position that we
uncritically accept the whole of Anderson’s analysis, we accept
at least this: that there are certain goods (ones that
Anderson calls “political goods,” including among them schools
and public places) that must be distributed not in
accordance with market norms but “in accordance with public
(Anderson, 1993, p. 159).
Sustaining the 80% of small markets that would be
neglected by search engines shaped by market forces
qualifies as a task worthy of public attention. Sustaining a full
range of options here is not the same as sustaining a full
range of options in bottled salad dressings or cars because
the former enriches the democratic arena, may serve
fundamental interests of many of the neediest members of our
society, and more (on which we elaborate in the next
section). We make political decisions to save certain goods
that might fall by the wayside in a purely market-driven
society. In this way, we recognize and save national
treasures, historic homes, public parks, schools, and so forth.
In this spirit, we commit to serving groups of people, like
the disabled, even though (and because) we know that a
market mechanism would not cater to their needs.
We make special accommodation for nonprofit efforts through
tax exemption without consideration for popularity.
We see an equivalent need in the case of search engines.
In order to make the case convincing, however, we need
to introduce into the picture a substantive claim, because
our argument against leaving search engines fully to the
mercy of the marketplace is not based on formal grounds
(or at least, we do not see them). We base our case against
leaving it to the market on the particular function that we
see search engines serving and on the substantive vision
of the Web that we think search engines (and
search-and-retrieval mechanisms more generally)
ought to sustain. We
do not argue unconditionally that the trajectory of search
engine development is wrong or politically dangerous in
itself, but rather that it undermines a particular, normative
vision of the Web in society. Those who do not share in this
vision are unlikely to be convinced that search engines are
different (in kind)
from salad dressings and automobiles.
The case that search engines are a special, political good
presumes that the Web, too, is a special good.
The thesis we here elaborate is that search engines,
functioning in the manner outlined earlier, raise political
concerns not simply because of the way they function, but also
because the way they function seems to be at odds with
the compelling ideology of the Web as a public good. This
ideology portrays the fundamental nature and ethos of the
Web as a public good of a particular kind, a rich array of
commercial activity, political activity, artistic activity,
associations of all kinds, communications of all kinds, and
a virtually endless supply of information. In this regard
the Web was, and is still seen by many as, a democratic
medium that can circumvent the hegemony of the
traditional media market, even of government control.
Over the course of a decade or so, computerized networks
(the Internet and now the Web) have been envisioned
as a great public good. Those who have held and
promoted this vision over the course of, perhaps, a decade
have based their claims on a combination of what we have
already achieved and what the future promises. For example,
with only a fraction of the population in the United
States linked to the Internet, Al Gore
the vision of a Global Internet Infrastructure. This
conception of the great public good (part reality, part wishful
thinking) has gripped people from a variety of sectors,
including scholars, engineers and scientists, entrepreneurs,
and politicians. Each has highlighted a particular
dimension of the Web’s promise, some focusing on information,
some on communication, some on commerce, and so on.
Although we cannot enumerate here all possible public
benefits, we highlight a few.
A theme that is woven throughout most versions of the
promise is that the Web contributes to the public good by
serving as a special kind of public space. The Web earns
its characterization as public in many of the same ways
as other spaces earn theirs, and it contributes to the
public good for many of the same reasons. One feature that
pushes something into the realm we call public is that it is
not privately owned. The Web does seem to be public in
this sense: Its hardware and software infrastructure is not
wholly owned by any person or institution or, for that
matter, by any single nation. Arguably, it does not even come
under the territorial jurisdiction of any existing sovereign
state.
There is no central or located clearinghouse that
specifies or vets content or regulates overall who has the
right of access. All those who accept the technical
protocols, conform to technical standards (HTML, for
example), and are able to connect to it may enter the Web. They
may access others on the Web and, unless they take special
precautions, they may be accessed. When I post my Web
pages, I may make them available to any of the millions of
potential browsers, even if, like a street vendor, I decide to
charge a fee for entry to my page. The collaborative nature
of much of the activity on the Web leads to a sense of the
Web’s being not simply unowned but collectively owned.
The Web fulfills some of the functions of other
traditional public spaces: museums, parks, beaches, and
schools. It serves as a medium for artistic expression, a
space for recreation, and a place for storing and exhibiting
items of historical and cultural importance, and it can
educate. Beyond these functions, the one that has earned it
greatest approbation both as a public space and a political
good is its capacity as a medium for intensive
communication among and between individuals and groups in just
about all the permutations that one can imagine, namely,
one-to-one, one-to-many, etc. It is the Hyde Park Corner
of the electronic age, the public square where people may
gather as a mass or associate in smaller groups. They may
talk and listen, they may plan and organize. They air
viewpoints and deliberate over matters of public importance.
Such spaces, where content is regulated only by a few
fundamental rules, embody the ideals of the liberal democratic
The idea of the Web as a public space and a forum for
political deliberation has fueled discussions on
teledemocracy for some time
(Abramson et al., 1988; Arterton, 1987).
The notion of the public sphere as a forum in which
communicatively rational dialogue can take place unsullied
by ideology has had one of its strongest proponents in
Habermas. Although there is no universal
agreement among scholars on the extent of the effect the Web
may have in the political sphere, several contributors to the
debate have cited cases in which the Web appears to have
had a decisive impact on the outcome. Douglas Kellner
gives some examples: Zapatistas in their struggle
against the Mexican government, the Tiananmen Square
democracy movement, environmental activists who
exposed McDonald’s through the McLibel campaign, and the
Clean Clothes Campaign supporting attempts of Filipino
garment workers to expose exploitative working
conditions.
We have not yet mentioned the perhaps dominant
reason for conceiving of the Web as a public good, namely,
its function as a conveyor of information. As a public
means of access to vast amounts of information, the Web
promises widespread benefits. In this so-called
information age, being among the information-rich is considered to
be so important that some, like the philosopher Jeroen van
den Hoven (1994, 1998), have argued that it makes sense
to construe access to information as one of the Rawlsian
“primary goods,” compelling any just society to guarantee
a basic, or reasonable, degree of it to all citizens. Growing
use of the Web as a repository for all manner of information
(e.g., government documents, consumer goods, scientific
and artistic works, local public announcements, etc.) lends
increasing weight to this prescription. The Web, according
to the vision, is not intended as a vehicle for further
expanding the gap between haves and have-nots, but for narrowing
it (see, e.g., Civille, 1996; Hoffman & Novak, 1998).
The view of the Internet as a public good, as a
globally inclusive, popular medium, fueled much of the
initial social and economic investment in the medium and
its supporting technology, convincing progressive
politicians (or those who wish to appear progressive) to support
it with investment and political backing.
The vision has
also motivated idealistic computer scientists and engineers
to volunteer energy and expertise toward developing and
promulgating the hardware and software, from the likes
of Jonathan Postel, one of the early builders of the
Internet, who worked to keep its standards open and free,
to professionals and researchers volunteering in efforts to
wire schools and help build infrastructure in poorer
nations. These inclusive values were very much in the minds
of creators of the Web like Tim Berners-Lee:
The universality of the Web includes the fact that the
information space can represent anything from one’s personal
private jottings to a polished global publication. We as people
can, with or without the Web, interact on all scales. By
being involved on every level, we ourselves form the ties which
weave the levels together into a sort of consistency, balancing
the homogeneity and the heterogeneity, the harmony and the
diversity. We can be involved on a personal, family, town,
corporate, state, national, union, and international levels. Culture
exists at all levels, and we should give it a weighted balanced
respect at each level.
While the promise of the Web as a public space and a
public good continues to galvanize general, political, and
commercial support, many observers and scholars have
cautioned that the goods are not guaranteed. The benefits
of the vast electronic landscape, the billions of gigabytes
of information, and the participation of millions of people
around the world depend on a number of contingencies.
Issuing one such caution, Lewis Branscomb calls for
political effort to protect public interests against
encroaching commercial interests. He worries about the enormous
amount of money “invested in the new business
combinations to exploit this consumer information market; the
dollars completely swamp the modest investments being
made in bringing public services to citizens and public
(p. 27), urging federal, state, and local
government to develop and realize the many non-profit public
service applications necessary for the realization of the
‘promise of NII’ (p. 31).
Gary Chapman and Marc Rotenberg, writing in 1993
on behalf of the organization Computer Professionals for
Social Responsibility, listed a number of problems that
would need to be solved before the National Information
Infrastructure would be capable of serving the public
interest. Of particular relevance to us here is Chapman and
Rotenberg’s reference to Marvin Sirbu’s call for
“Development of standardized methods for information
finding: White Pages directories, Yellow Pages,
information indexes.” Without an effective means of finding what
you need, the benefits of an information and
communication infrastructure like the Web are significantly
diminished. We can conjure up analogies: a library containing all
the printed books and papers in the world without covers
and without a catalogue; a global telephone network
without a directory; a magnificent encyclopedia, haphazardly
organized and lacking a table of contents.
Search engines are not the only answer to this need,
but they still are the most prominent, the one to which
most users turn when they want to explore new territory on
the Web. The power, therefore, that search engines wield
in their capacity to highlight and emphasize certain Web
sites, while making others, essentially, disappear, is
considerable. If search engines systematically highlight Web
sites with popular appeal and mainstream commercial
purpose, as well as Web sites backed by entrenched economic
powers, they amplify these presences on the Web at the
expense of others. Many of the neglected venues and sources
of information, suffering from lack of traffic, perhaps
actually disappear, further narrowing the options to Web
If trends in the design and function of search engines
lead to a narrowing of options on the Web—an actual nar-
rowing or a narrowing in what can be located—the Web
as a public good of the particular kind that many envi-
sioned is undermined. The ideal Web serves all people,
not just some, not merely those in the mainstream. It is
precisely the inclusivity and breadth that energized many
to think that this technology would mean not just business
as usual in the electronic realm, not merely a new tool for
entrenched views and powers. The ideal Web would extend
the possibilities for association, would facilitate access to
obscure sources of information, would give voice to many
of the typically unheard, and would preserve intensive and
broadly inclusive interactivity.
In considering the effects of a biased indexing and re-
trieval system, our attention first was drawn to the seekers.
It is from the perspective of seekers that we noted the sys-
tematic narrowing of Web offerings: There would be fewer
opportunities to locate various types of information, indi-
viduals, and organizations, a narrowing of the full range
of deliberative as well as recreational capabilities. If ac-
cess to the Web is understood as access by seekers to all
of these resources, then the outcome of biased search en-
gines amounts to a shrinking of access to the Web. This
perspective, however, does not represent all that is at stake.
At stake is access to the Web in the shape of those, in ad-
dition, who would like to be found, to be seen and heard.
Marc Raboy describes this dimension of the new medium:
The notion of “access” has traditionally meant different
things in broadcasting and in telecommunications. In the
broadcasting model, emphasis is placed on the active receiver,
on free choice, and access refers to the entire range of prod-
ucts on offer. In the telecommunications model, emphasis is
on the sender, on the capacity to get one’s messages out, and
access refers to the means of communication. In the new me-
dia environment, public policy will need to promote a new
hybrid model of communication, which combines the social
and cultural objectives of both broadcasting and telecommu-
nications, and provides new mechanisms—drawn from both
traditional models—aimed at maximizing equitable access to
services and the means of communication for both senders
and receivers (Raboy, 1998, p. 224).
The public good of the Web lies not merely in its func-
tioning as a repository for seekers to find things, but as
a forum for those with something (goods, services, view-
points, political activism, etc.) to offer. The cost of a biased
search-and-retrieval mechanism may even be greater for
Web-site owners wishing to be found—the senders. Con-
sider an example of just one type of case, someone seeking
information about, say, vacation rentals in the Fiji Islands.
Because one rental is all the person needs, he or she is likely
to look down a list of options and stop looking when he or
she finds it. There is no loss to the seeker even if it turns out
that lower down on the list there are many other candidates
meeting his or her criteria. The seeker has found what he or
she needs. Those who are not found (because their lower
ranking deprives them of attention or recognition) are of-
fering, arguably, just as much value to the seeker. Our loss
in this case is twofold: One is that if continuing invisibil-
ity causes options to atrophy, the field of opportunity is
thinned; the other is that many of those reaching out for
attention or connection are not being served by the Web.
If search mechanisms systematically narrow the scope of
what seekers may find and what sites may be found, they
will diminish the overall value of the Web as a public forum
and as a broadly inclusive source of information.
Many have observed that to realize the vision of the
Web as a democratizing technology or, more generally, as a
public good, we must take the question of access seriously.
We agree with this sentiment but wish to expand what the
term covers. Access involves not merely a computer and
a network hookup, as some have argued, nor, in addition,
the skills and know-how that enable effective use. Access
implies a comprehensive mechanism for finding and being
found. It is in this context that we raise the issue of the
politics of search engines—a politics that at present seems
to push the Web into a drift that does not resonate with one
of the historically driving ideologies. We also believe we
have shown why a rally to the market will not save the day,
will not ensure our grand purpose. The question of how to
achieve it is far harder.
We have claimed that search-engine design is not only a
technical matter but also a political one. Search engines
are important because they provide essential access to the
Web both to those with something to say and offer and to
those wishing to hear and find. Our concern is with the
evident tendency of many of the leading search engines to
give prominence to popular, wealthy, and powerful sites at
the expense of others. This they do through the technical
mechanisms of crawling, indexing, and ranking algorithms
as well as through human-mediated trading of prominence
for a fee. As long as this tendency continues, we expect
these political effects will become more acute as the Web
We regret this tendency not because it goes against our
personal norms of fair play but because it undermines a
substantive ideal—the substantive vision of the Web as an
inclusive democratic space. This ideal Web is not merely a
new communications infrastructure offering greater band-
width, speed, massive connectivity, and more, but also a
platform for social justice. It promises access to the kind
of information that aids upward social mobility; it helps
people make better decisions about politics, health, educa-
tion, and more. The ideal Web also facilitates associations
and communication that could empower and give voice to
those who, traditionally, have been weaker and ignored.
A drift toward popular, commercially successful institu-
tions, through the partial view offered by search engines,
seriously threatens these prospects. Scrutiny and discus-
sion are important responses to these issues, but policy and
action are also needed—to fill that half-empty portion of
the cup. We offer preliminary suggestions, calling for a
combination of regulation through public policy and
value-conscious design innovation.
The tenor of our suggestions is enhancement. We do not
see that regulating and restricting development of commer-
cial search engines is likely to produce ends that we would
value—as it were, siphoning off from the half-full portion.
This course of action is likely to be neither practically ap-
pealing nor wise, and might smack of cultural elitism or
paternalism. Amartya Sen (1987, p. 9), commenting on
existing schools of thought within the field of economics,
wrote: “It is not my purpose to write off what has been or is
being achieved, but definitely to demand more.” We take
a similar stance in response to our study of Web search
engines.
As a first step we would demand full and truthful dis-
closure of the underlying rules (or algorithms) governing
indexing, searching, and prioritizing, stated in a way that
is meaningful to the majority of Web users. Obviously,
this might help spammers. However, we would argue that
the impact of these unethical practices would be severely
dampened if both seekers and those wishing to be found
were aware of the particular biases inherent in any given
search engine. We believe, on the whole, that informing
users will be better than the status quo, in spite of the
difficulties. Those who favor a market mechanism would
perhaps be pleased to note that disclosure would move
us closer to fulfilling the criteria of an ideal competitive
market in search engines. Disclosure is a step in the right
direction because it would lead to a clearer grasp of what
is at stake in selecting among the various search engines,
which in turn should help seekers to make informed de-
cisions about which search engines to use and trust. But
disclosure by itself may not sustain and enhance Web of-
ferings in the way we would like it to—that is, by retaining
transparency for those less popular sites to promote inclu-
The marketplace alone, as we have argued, is not ade-
quate. As a policy step, we might, for example, consider
public support for developing more egalitarian and in-
clusive search mechanisms and for research into search
and meta-search technologies that would increase trans-
parency and access. Evidently, if we leave the task of chart-
ing the Web in the hands of commercial interests alone, we
will merely mirror existing asymmetries of power in the
very structure of the Web (McChesney, 1999). Although
these and other policies could promise a fairer representa-
tion of Web offerings, a second key lies in the technology
Values in Design
Philosophers of technology have recognized the intricate
connection between technology and values—social, polit-
ical, and moral values. These ideas—that technological
systems may embed or embody values—resonate in so-
cial and political commentary on information technology
written by engineers as well as by philosophers and ex-
perts in cyberlaw (see, e.g., Friedman, 1997; Lessig, 1999;
Nissenbaum, 1998). Translating these ideas into practice
implies that we can build better systems—that is to say,
systems that better reflect important social values—if we
build them with an explicit commitment to values. With
this article, the commitment we hope to inspire among
the designers and builders of search engine technology is
a commitment to the value of fairness as well as to the
suite of values represented by the ideology of the Web as
a public good.
Two technical approaches that appear to be attracting
interest are not without drawbacks. One would increase
segmentation and diversification. Search engines would
become associated with particular segments of society—
borders drawn perhaps according to traditional categories
(sports, entertainment, art, and so forth). A problem with
segmentation overall, however, is that it could fragment
the very inclusiveness and universality of the Web that we
value. The Web may eventually merely mirror the institu-
tions of society with its baggage of asymmetrical power
structures, privilege, and so forth.
The other approach is to develop individualized spiders
that go out and search for pages based on individual cri-
teria, building individualized databases according to in-
dividual needs. There is, however, a significant “cost”
in automatic harvesting via spiders that even the existing
population of spiders imposes on system resources; this
has already caused concern (see Kostner, 1995).
There is much interesting work under way concerning
the technology of search engines that could, in principle,
help: for example, improving the way individual pages
indicate relevance (also referred to as metadata) (see Mar-
chiori, 1998), refining overall search engine technology,
and improving Web resource presentation and visualiza-
tion (see Hearst, 1997) and meta-search technology
(Lawrence & Giles, 1998). Although improvements like
these might accidentally promote values, they hold great-
est promise as remedies for the current politics of search
engines if they are explicitly guided by values. We urge
engineers and scientists who adhere to the ideology of
the Web, to its values of inclusivity, fairness, and scope
of representation, and so forth, to pursue improvements
in indexing, searching, accessing, and ranking with these
values firmly in their sights. It is good to keep in mind
that the struggle to chart the Web and capture the atten-
tion of the information seekers is not merely a technical
challenge; it is also political.
1. In an online survey the NDP Group polled 22,000 seekers who
accessed search engines to determine their satisfaction with the search
engine. Ninety-six percent indicated that they were satisfied
with the search results. This would seem to go against our argument.
However, in another study done by researchers from British Telecom,
PC-literate but not regular users of the Internet found their search
results disappointing and generally “not worth the effort” (Pollock &
Hockley, 1997). This may indicate that a fairly high level of searching
skill is necessary to get what you want. We return to this issue when we
discuss the market argument for the development of search engines.
2. Winner, L. 1980. Do artifacts have politics? Daedalus 109:121
3. For those interested in more detail, the Web site http://www.
se is a good place to start.
4. We are thinking here of the top 10 to 20 when it is a matter of
actual relevancy. We later discuss the issue of spamming.
5. One could argue that it is also possible for a Web page to be
found through portal sites, which are increasingly popular, though as a
matter of fact, we think it would be highly unlikely that a link would be
established through a portal site if it does not meet the indexing criteria
for search engines.
6. We realize we have not listed all the means through which
pages may be found. For example, one may access a page through
an outlink from another page. The problem with such means is that
they depend on somewhat unpredictable serendipity. One needs also
to add that increasing numbers of alternatives are emerging as viable
options, such as portal sites and keyword retrieval via Centraal’s Real
Name system
http://
. Nevertheless, the majority of
those who access the Web continue to do it through search engines.
There is no reason to believe that this would change in the foreseeable
future.
7. We note, for readers who are aware of the debate currently
raging over domain names, that an effective system of search and re-
trieval is a constructive response to the debate and would lessen the
impact of whatever decisions are made. We argue that domain names
are important in inverse proportion to the efficacy of available search
mechanisms, for if individuals and institutions can easily be found on
the basis of content and relevancy, there is less at stake in the precise
formulation of their domain names. In other words, a highly effective
indexing and retrieval mechanism can mitigate the effects of domain-
name assignments.
8. A stop word is a frequently occurring word such as the, to, and
we that is excluded because it occurs too often. Stop words are not
indexed. This is not insignificant if one considers that the word “web”
is a stop word in Alta Vista. So if you are a company doing Web design
and have “Web design” in your title, you may not get indexed and will
be ranked accordingly.
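
The effect described in note 8 can be illustrated with a minimal sketch. It assumes a toy inverted index over page titles; the function name `build_index`, the stop list, and the two sample pages are hypothetical and are not drawn from Alta Vista's actual implementation.

```python
import re

# Hypothetical stop list for illustration. Note 8 reports that "web" was
# a stop word in Alta Vista; "the", "to", and "we" are the note's examples.
STOP_WORDS = {"the", "to", "we", "web"}


def build_index(pages, stop_words=STOP_WORDS):
    """Map each non-stop word to the set of page ids whose title contains it."""
    index = {}
    for page_id, title in pages.items():
        for word in re.findall(r"[a-z0-9]+", title.lower()):
            if word in stop_words:
                continue  # stop words are never entered into the index
            index.setdefault(word, set()).add(page_id)
    return index


# Two hypothetical pages, keyed by an arbitrary id.
pages = {
    "studio": "Web Design",
    "garden": "Garden Design",
}
index = build_index(pages)
```

In this sketch the page titled "Web Design" can be reached only through the term "design", which it shares with every other design-related page, so the word that distinguishes it contributes nothing to its retrieval.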
9. The <TITLE> tag is either created by the Web-page designer
or deduced by a converter. For example, when you create an MSWord
document and want to publish it on the Web, you can save it as HTML
directly in the MSWord editor. In this case the MSWord editor assumes
that the first sentence it can find in the document is the title and will
place this in the <TITLE> tag in the HTML source code it generates.
10. Most of the directory-based search engines also use some form
of automatic harvesting to augment their manually submitted
database.
11. When parsing the page, the spider views the page in HTML
format and treats it as one long string of words, as explained by Alta
Vista: Alta Vista treats every page on the Web and every article of
Usenet news as a sequence of words. A word in this context means any
string of letters and digits delimited either by punctuation and other non-
alphabetic characters (for example, &, %, $, /, #), or by white
space (spaces, tabs, line ends, start of document, end of document).
To be a word, a string of alphanumerics does not have to be spelled
correctly or be found in any dictionary. All that is required is that
someone type it as a single word in a Web page or Usenet news article.
Thus, the following are words if they appear delimited in a document:
HAL5000, Gorbachevnik, 602e21, www, http, EasierSaidThanDone,
etc. The following are all considered to be two words because the
internal punctuation separates them: don’t,, x y, AT&T,
3.14159, U.S., All’sFairInLoveAndWar.
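
The word-splitting rule quoted in note 11 can be approximated in a few lines. This is a sketch under stated assumptions, not Alta Vista's parser: the function name `split_into_words` is hypothetical, and the pattern treats only ASCII letters and digits as word characters, which the quoted description does not specify.

```python
import re


def split_into_words(text):
    """Return the maximal runs of ASCII letters and digits, in order.

    Every other character (punctuation, symbols, white space) acts as a
    delimiter, following the rule quoted in note 11.
    """
    return re.findall(r"[A-Za-z0-9]+", text)


# Strings taken from the note's own examples.
single = split_into_words("HAL5000")
split_amp = split_into_words("AT&T")
split_dot = split_into_words("3.14159")
split_apostrophe = split_into_words("All'sFairInLoveAndWar")
```

Under this rule "HAL5000" stays whole, while "AT&T", "3.14159", and "All'sFairInLoveAndWar" each break into two words at the internal punctuation, which matches the behavior the note describes.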
12. Page is one of the designers of Google, and the details presented
here are the heuristics used by Google (at least the earlier version of
these heuristics).
13. We are not claiming that this is a straightforward and uncontro-
versial metric. The decision about the “similarity” between the query
term and the document is by no means trivial. Decisions on how to im-
plement the determination of “similarity” can indeed be of significance
to our discussion. However, we do not pursue this discussion here.
14. In the cases of Excite, Hotbot, and Lycos, there is evidence
that this is a major consideration for determining indexing appeal—
refer to
Exclusion, using this metric, is less likely for a search engine like
Alta Vista, which goes for massive coverage, than for its smaller, more
selective competitors.
15. For search-engine operators it is a matter of deciding between
breadth and depth: Should many sites be partially indexed or few
sites fully indexed, since they know a priori that they cannot in-
clude everything? (Brake, 1997). Louis Monier, in a response to John
Pike—Webmaster for the Federation of American Scientists site—
indicated that Alta Vista indexed 51,570 of the estimated 300,000
pages of the Geocities site. This amounts to approximately 17% cov-
erage. He thought this to be exceptionally good. Pike indicated that
Alta Vista indexed 600 of their 6000 pages.
Refer to this discussion at 11638.html and
http:// 13066.html as
well as to the New Scientist paper at
16. For a discussion of this standard, refer to http://info.webcrawler.
17. Another reason for excluding spiders from sites such as CNN
is that their content is constantly in flux and one does not want search
engines to index (and now cache) old content. Another issue worth
noting here is that many search engines now have large caches to go
along with their indexes.
18. Refer to the New Scientist paper at
keysites/networld/lost.html. The “cost” of a spider visit can be signif-
icant for a site. A responsible spider will request a page only every so
many seconds. However, the pressure to index has induced what is
termed “rapid fire.” This means that the spider requests in rapid suc-
cession, which may make the server unavailable to any other user.
Although there is a danger that this problem will worsen, there seems
to be a generally optimistic view among experts that we will develop
technical mechanisms to deal with it, for example, proposals to devise
extensions to HTTP, or parallel spiders.
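
The contrast in note 18 between a responsible spider and “rapid fire” can be made concrete with a small sketch. It only computes a request timetable and performs no network access; the function names, the delay values, and the page paths are hypothetical assumptions chosen for illustration.

```python
def schedule_requests(urls, min_delay_seconds):
    """Return (start_time, url) pairs spaced min_delay_seconds apart."""
    return [(i * min_delay_seconds, url) for i, url in enumerate(urls)]


def requests_in_window(schedule, window_seconds):
    """Count requests whose start time falls within the first window_seconds."""
    return sum(1 for start, _ in schedule if start < window_seconds)


# Ten hypothetical pages on a single site.
urls = [f"/page{i}.html" for i in range(10)]

# A responsible spider waits several seconds between requests (5.0 is an
# assumed value); a "rapid fire" spider barely waits at all (0.1 assumed).
polite = schedule_requests(urls, min_delay_seconds=5.0)
rapid = schedule_requests(urls, min_delay_seconds=0.1)

polite_load = requests_in_window(polite, 10.0)
rapid_load = requests_in_window(rapid, 10.0)
```

With these assumed delays, the polite schedule places two requests on the server in the first ten seconds, whereas the rapid-fire schedule places all ten, which is the kind of burst the note warns can crowd out other users.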
19. Although at present some spiders are unable to deal with fea-
tures such as frames and are better with simple HTML files, there are
spiders that have been developed that are now able to handle a variety
of formats.
20. Lee Giles disputes this. He still considers indexing to be a huge
problem.
21. Also referred to as spamdexing. Refer to
cyber/index/metatags.html for a reasonable discussion of this issue.
22. “To stay ahead of the game, the major search engines change
their methods for determining relevancy rankings every few months.
This is usually when they discover that a lot of people have learned the
latest technique and are all sneaking into a side door. They also try to fool
the tricksters . . . sometimes they put irrelevant pages at the top of the list
just to cause confusion”
Patrick Anderson & Michael Henderson, ed-
itor & publisher, Hits To Sales, at
23. At the WWW7 Conference, researchers in Australia devised
an ingenious method for attempting to reverse-engineer the relevance-
ranking algorithms of various commercial search engines, causing con-
sternation and some outrage—see Pringle et al.
24. Lawsuits have been filed by Playboy Enterprises, Inc., and Es-
tee Lauder Companies, Inc., challenging such arrangements between
Excite, Inc., and other companies that have “bought” their respective
names for purposes of banner ads. See Kaplan
25. “If you want the traffic and the exposure, you are going to pay
for the education or you are going to pay for the service. There is no
other way to do it. It is not easy. It is not magic. It takes time, effort, and
knowledge. Then it takes continual monitoring to keep the position you
worked so hard to get in the first place. Please do not misunderstand—
the competition is fierce and severe for those top spots, which is why
the search engines can charge so much money to sell keyword banners”
(Anderson & Henderson, 1997, emphasis added).
26. Some large sites (universities, for example) allow users to sub-
mit keywords, which the site, in turn, submits to a particular default
search engine (frequently Yahoo!). If users select “search” on the
Netscape toolbar it takes them to the Netscape Web pages where they
have a list of search engines. In this case Excite is the default search
engine. There is clearly considerable advantage to being chosen as the
default search engine on the Netscape or other equivalent Web page.
27. This is because, as Giles and Lawrence remarked in verbal con-
sultation, there is a fair degree of convergence in the results yielded by
various search engine algorithms and decision criteria.
28. One should note that search engines also market them-
selves aggressively. They also establish agreements with other service
providers to become defaults on their pages. Refer to footnote 26.
29. As noted by one of the reviewers, this is equally true outside
the electronic media.
30. Refer to for the latest list.
31. And engines that use link popularity for priority listing will be
even more prone to reifying a mode of conservatism on the Web.
32. This guess is not far from reality, as searches for sex-related
key terms are by far the most frequent—constituting perhaps as high a
percentage as 80% of overall searches.
33. Our discussion of the Web would probably be more accurately
addressed to the Internet as a whole. We think that the more inclusive
discussion would only strengthen our conclusions but would probably
introduce unnecessary complexity.
34. See Johnson and Post. This article puts forward an ex-
treme version of this view. We will not engage further in the debate.
35. Popular news media reflect the hold of this vision of the Web.
In an article in The New York Times about the Gates Learning Foun-
dation’s recent donation for public-access computers to libraries, the
gift is discussed in terms of bridging economic inequality and over-
coming technical illiteracy. Librarians are qu