Subject Access Points in Electronic Retrieval*

BIRGER HJØRLAND
University College of Borås

LYKKE KYLLESBECH NIELSEN
Royal School of Library and Information Science, Copenhagen
INTRODUCTION

This is the first ARIST chapter devoted to subject access points (SAPs, also called search fields or document representations) in databases. The term is used here in a much wider sense than just headings. ROWLEY & FARROW use this narrower sense and find that "the concept of the access point belongs to manually searched indexes" and is arguably irrelevant to databases with search systems allowing keyword access (p. 253). In our wider sense, SAPs are fundamental to any kind of document retrieval. This subject has earlier been scattered across many different chapters (especially those on document representation, which have not been reviewed in ARIST since 1974 by HARRIS). A systematic cumulation of findings related to each kind of subject access data has never been undertaken in ARIST or elsewhere, although texts such as that by LANCASTER cover many of the relevant findings. This review cannot cover all relevant studies but concentrates on the broader theoretical perspective.
SUBJECT AND ACCESS DATA

Much more research has been done on searching and retrieving documents and information¹ and on users than on access points. Retrieval, however, is essentially the use of access data, and people cannot use something that is not there.

* We thank Raya Fidel, rector emeritus Tor Henriksen, and other reviewers for valuable feedback during the writing of this article.

¹ In this article the word "information" is used synonymously with data, which we believe is the ordinary sense of this word. In other papers we apply a more Shannon-inspired meaning of information. However, it is outside the scope of this article to discuss this concept further.

Annual Review of Information Science and Technology (ARIST), Volume 35, 2001
Martha E. Williams, Editor
Published for the American Society for Information Science and Technology (ASIST)
By Information Today, Inc., Medford, NJ

Access points determine in a rather firm way the objective possibilities that are provided for the talented user (or for any formalized, algorithmic, or automatic procedure). Therefore, it is essential in information science (IS) to develop knowledge about what kinds of subject data exist as well as the strengths, weaknesses, and relative contributions of each kind. For example, what proportion of a given set of relevant documents is missed by using only one access point, such as words from titles? How much can additional access points increase recall, and how do they affect precision? Only from such knowledge are we able to study how users or algorithms utilize these possibilities, which form the subjective factors in retrieval. For example, if people (or algorithms) do not use references as access points (in citation databases) because they do not know about this possibility or they misjudge it, then an objective possibility in retrieval is not utilized.
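The recall and precision questions above can be made concrete with a small calculation. The following Python sketch is purely illustrative: the four documents, their titles and abstracts, and the relevance judgments are invented, and matching is a simple whole-word test rather than the behavior of any real search system. It compares searching one term in titles only with searching it in titles plus abstracts.

```python
# Illustrative sketch: invented documents and relevance judgments, not data
# from any study. Each document has two searchable fields (SAPs).
DOCS = {
    "d1": {"title": "citation analysis of chemistry journals",
           "abstract": "we study references in chemistry"},
    "d2": {"title": "bibliometric patterns",
           "abstract": "citation counts and co-citation in physics"},
    "d3": {"title": "library catalog design",
           "abstract": "subject headings and a citation style guide"},
    "d4": {"title": "references in scholarly writing",
           "abstract": "why authors cite: a citation motive study"},
}
RELEVANT = {"d1", "d2", "d4"}  # hypothetical judgments for the query "citation"


def search(term, fields):
    """Return ids of documents in which `term` occurs as a word in any of `fields`."""
    return {doc_id for doc_id, rec in DOCS.items()
            if any(term in rec[f].split() for f in fields)}


def recall_precision(retrieved, relevant):
    """Recall = hits / relevant documents; precision = hits / retrieved documents."""
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


def evaluate(term, fields):
    """Search `term` in the given fields and score the result against RELEVANT."""
    return recall_precision(search(term, fields), RELEVANT)


if __name__ == "__main__":
    for fields in (["title"], ["title", "abstract"]):
        recall, precision = evaluate("citation", fields)
        print(fields, f"recall={recall:.2f}", f"precision={precision:.2f}")
```

With these invented data, adding abstracts as an access point raises recall from 0.33 to 1.00, while precision falls from 1.00 to 0.75 because d3 mentions "citation" only incidentally.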
Knowledge about SAPs is also crucial in relation to the design of information systems because it is related to the fundamental question of which possibilities should be provided. It is rather trivial to think that systems should provide as many retrieval possibilities as possible (to believe, for example, that databases providing access to searching abstracts are better than those that do not, other things being equal). Faster access to more information is an important demand from users, but this is primarily provided by better computer technology, especially storage technology, not by IS. The availability of many kinds of access points in databases demands much space, which is provided by developments in information technology (IT). There has therefore been an IT-driven growth in subject access data that is outlined below. This growth is mainly quantitative, while the qualitative ways in which the technological potential has been utilized are a central issue for research in IS.

Information science is concerned with how IT developments can best be used to represent and to retrieve documents and information. This is related more to qualitative characteristics of subject access points than to quantitative issues. IS should ask questions such as: Given certain constraints, what are the optimal ways to design a system? Theoretically we should have comprehensive knowledge of the kinds of access data and their characteristics. Each existing retrieval system should then be seen as realizing more or fewer of these possibilities.
The term information retrieval (IR) was introduced by MOOERS in 1951. He also introduced the term "information retrieval languages" as the generic term for classification codes, keywords, free-text retrieval, and other search elements or SAPs. At the same time the empiricist, experimental approach to document retrieval (references, surrogates, or information) was founded as an important research tradition in IS. This tradition is normally termed the information retrieval tradition in IS, and it has some distinctive characteristics that distinguish it from other research traditions within the field, such as the facet-analytic tradition, the cognitive approach, and semiotic approaches. Analytically it is important to distinguish IR as a field of study from IR as a specific approach or research tradition, because different traditions may provide useful contributions to this field. (The IR tradition may, like empiricism in general, have certain blind spots.)
The basic element in IR is the user's interaction with a database (or with electronic information environments such as the World Wide Web). The user has a query² that has to match, more or less exactly or directly,³ some elements, which may be termed access points, search keys, retrieval keys, data elements, or document representations. There are many kinds of such access points; they have many different functions, and they have different informational values in different search situations.
What are these subject access points? Many texts in IS differentiate between subject access data, so-called descriptive data, and other kinds of data such as call numbers. Metadata is the generic term for all such kinds of data. In major research libraries, librarians usually provide the descriptive data and subject specialists provide the subject data. Many people think that there is a clear and sharp functional division among subject data, descriptive data, and other kinds of metadata.⁴ This was virtually true in the age of printed card catalogs, where the descriptive data allowed for searches for known items and subject data allowed for searches for known or unknown documents about a given subject. In the age of electronic retrieval, however, there is no clear-cut functional division. All words in titles have become searchable, and titles are thus both descriptive elements and SAPs. Search profiles can include many kinds of data. Hypothetically, it may be relevant to limit a subject search according to the name of a publisher, a journal, or even a language code. Subject data are not strictly limited to specific kinds of data; under specific circumstances any kind of data may serve to identify documents about a subject (cf. HJØRLAND, 1997, pp. 11-32). But what is that "something" that subject data are meant to identify? What are subjects?

² There have been attempts in IR to avoid queries, and systems that allow "navigating" seem to avoid this concept. We do not see this as a theoretical problem for our views on subjects, so we do not discuss it here. We are also aware that advanced technologies, such as Latent Semantic Indexing (LSI), can retrieve relevant documents even when they do not share any words with the query. LSI uses statistically derived "concepts" to improve searching performance (see GORDON & DUMAIS). However, such "concepts" must be based on subject access points, so knowledge of these is still necessary.

³ A direct match is obtained in systems based on Boolean logic. Such a match is between words (a lexical match), not between concepts (a semantic match). Implicit or latent semantic matches can be obtained by taking advantage of the implicit higher-order structure in the association of terms with documents. Such structures represent important associative relationships that are not evident in individual documents (cf. BERRY ET AL.).

⁴ Such a sharp dichotomy can be found in, for example, a Danish dictionary of information science, Informationsordbogen, published in 1996 by the Danish Standardization Organization (FRIIS-HANSEN ET AL.).

⁵ Theme is opposed to rheme: what an author tells about a theme.

⁶ "Information analysis" is, for example, used for subject analysis in the INSPEC database.
"Subject" is one of several related terms used in the literature. Terms that are sometimes considered synonyms and sometimes used with different meanings are shown below:

Subject (subject matter; subject-predicate)
Aboutness
Topic (topicality; topic/comment)
Theme (with "central theme" and the German "Leitmotiv")⁵
Domain (cognitive domain, scientific domain)
Field (information field, field of knowledge, field of research)
Content
Information⁶
Other (including related terms such as "discipline" and "concept")
These concepts are considered very difficult both in IS and in linguistics, and when used in other fields such as semiotics, psychology, and cognitive sciences. One proposal for differentiating some of these terms is given by BERNIER (p. 192). In his opinion, subject indexes are different from, and can be contrasted with, indexes to concepts, topics, and words. Subjects are what authors are working and reporting on. Presentations can be organized into topics and use words and concepts. A document can have the subject of chromatography. Papers using chromatography as a research method or discussing its detection do not have chromatography as their subject. Indexers are easily attracted to indexing concepts and words rather than subjects, but this is not good indexing.
Bernier does not, however, differentiate authors' subjects from those of the information seeker. A user may want a document about a subject that is different from the one intended by its author. From the point of view of information systems, the subject of a document is related to the questions that the document can answer for the user. Such a distinction between a content-oriented and a request-oriented approach is emphasized by SOERGEL (1985). A request-oriented approach implies that subject analysis should try to predict the questions that the document will help to answer. Based on such analyses, HJØRLAND (1992) proposes that subjects are the epistemological or informative potentials of documents, and he sees the role of the indexer as that of predicting the most important future applications of the document. This view corresponds to the functional theory about sources in history, which states that what counts as an information source is always relative to the question that it is supposed to answer.
In linguistics, the corresponding concept is mostly known as "topic" (which is contrasted with the notion of "comment," i.e., what is said about a given topic). A concise encyclopedic article on this topic with further references is provided by VAN KUPPEVELT. A 1975 conference at the University of California, Santa Barbara, was devoted to Subject and Topic (LI). In one of the papers, CHAFE treats a range of phenomena related to subject, topic, point of view, givenness, contrastiveness, and definiteness. In her text, NORD (1991) addresses subject matter from the point of view of translation theory. In psychology, subject/predicate has been treated by HORNBY.
In recent years the terms "topic" and "topicality" have been popular in IS. Many writers (e.g., BOYCE and WANG & SOERGEL) agree that topicality is only one of many factors influencing relevance, but they have not succeeded in defining this concept in a clear way. GREEN (1995) and GREEN & BEAN found that there is not one kind, but rather many kinds of relationships between texts and questions that are perceived as being "on the same topic." They have not, however, considered how concepts such as aboutness, theme, or subject relate to topic. These are different concepts that people use when searching for unknown documents, but we do not know much about how such concepts differ or overlap in ordinary use, nor do we have any theory that provides a well-defined meaning for these concepts.
According to JANES (p. 767): "Over the last several decades, a number of other words have been used to not only describe what goes on in people's heads when they make judgments about documents, but also to ask them to tell us about it. Our results might lead one to believe that these several concepts and terms overlap . . . But it may go further than this. Perhaps what we have called 'topicality,' 'utility,' 'satisfaction,' 'pertinence,' and a variety of other names are in fact dimensions of a larger, multidimensional [. . .] concept . . ."
This problem is still unsolved, although some hints are given. For example, WANG & SOERGEL suggest that "field" is (or should be used as) a broader term than "topic," and BOYCE (p. 109) suggests that the use of references or citation indexes is a recall-oriented technique in which each iteration brings in more and more documents of questionable topicality. This last suggestion points to a difference between a field defined as a network of citing papers and a topic defined as a conceptual or terminological structure. What kind of theory is needed to clarify these concepts further?
Because these are concepts about structures in knowledge, epistemology is the most relevant discipline. Different theories in epistemology imply, however, different views of knowledge structures. Classical rationalism imagines a highly ordered universe of knowledge, in which every concept has its well-defined place in relation to all other concepts. The modern view is much more pragmatic, viz., that knowledge serves cognitive systems and that the structures of knowledge reflect the needs and behavior of activity systems and discourse communities. This view implies that the concepts we are talking about (e.g., topic) are concepts we use about units or parts relating to (human) communication and that their definition must be grounded in sociocognitive theories.
Different kinds of SAPs describe the subject of a given document in different ways: more or less exhaustively, more or less generally or specifically, in a more or less open or closed way, and so on. Most importantly, they may describe the subject of a document from different interpretations of the relevance of the given document to future questions put to the database. Because any document can in principle answer an unlimited number of questions, subject analysis prioritizes the most important questions that the document is supposed to answer in the future. The most valuable SAPs are those that make it possible for the user to identify the most highly relevant documents, that is, those that make the highly relevant documents the most visible in the database at the expense of less-relevant documents.
Major Technology-Driven Stages in the Development of Subject Access Points (SAPs)

Manual indexing and classification in libraries. This first stage has deep roots in the history of libraries and comprises especially books and other physical units. A more formal research area was established about 1876 by Melvil Dewey and others. This stage concentrated mostly on the organization of specific physical collections of documents and on enabling access either to known documents or to documents on specific subjects in these collections. Important developments in this stage were Charles A. Cutter's (1837-1903) rules for a dictionary catalog, Melvil Dewey's (1851-1931) Decimal Classification system, Henry E. Bliss's (1870-1955) work on knowledge organization, and principles developed by S. R. Ranganathan (1892-1972). This stage still influences some research traditions in library science.
Classification research is built on theoretical traditions and assumptions other than those of the IR tradition. The most influential work in this tradition is Ranganathan's Colon Classification from 1933, and the most important kinds of SAPs in this stage are classification codes and subject headings. The main approach to subject access is a top-down division of "the universe of knowledge" according to rational principles.
A more empirical orientation was established by HULME (1911a) in the principle of bibliographical warrant or literary warrant, which states that a class or a subject heading must be established only if there exists literature to be classified by that group. In this way subject retrieval was not only built on top-down analyses of the universe of knowledge but was also somewhat influenced by the existing literature in a bottom-up manner. SAPs in this stage are produced and controlled by librarians and information specialists (including subject specialists) and constrained by their subject knowledge.
Another major constraint in this stage/tradition is that the principles were developed for subject access to physical units (e.g., books), not documentary units (e.g., journal articles). This implies a level of subject description and concepts that are often much broader than those needed by researchers in specific investigations. A third major constraint in this stage/tradition is that because the available space (e.g., on printed catalog cards) was very limited, the SAPs tended to contain scanty information.
Nevertheless, this stage/tradition developed important principles that many researchers find useful in a fully electronic environment (see, e.g., POLLITT ET AL.). What de Grolier wrote in 1955 is still regarded by many as true:⁷
We feared some years ago that classification was becoming useless, that the treatment of natural language texts by machines . . . would replace classification. Classification and the classificationists would become something like the dinosaurs, killed by the progress of evolution. This has proved to be a complete fallacy. When you examine the new literature you find that more and more classification . . . is considered as something quite essential in information retrieval . . . It is quite evident that hierarchies, generally speaking, are something which can not be avoided in an information retrieval language which is to be useful for the reader. (DE GROLIER, p. 11)
"Documentation" and scientific communication. "Documentation" is the name of a movement founded by Paul Otlet (1868-1944). The establishment of the International Institute of Bibliography in Brussels in 1895 (from 1937 called Fédération Internationale de Documentation (FID)) and of the Universal Decimal Classification (UDC) system in 1905, with the aim of universal bibliographical control, was a major achievement of this movement. The documentalists often regarded themselves as more service-minded, more technology-oriented, and more advanced than librarians. Whereas traditional librarians often had an orientation toward the humanities, the documentalists were mostly affiliated with science, technology, and business. They indexed single articles in journals and books and played a central role in establishing international abstracting journals.⁸ They were less interested in collection development and more concerned with providing better access to knowledge independent of specific collections; they were less interested in keeping books for their own sake or for broad cultural purposes and highly interested in establishing services that could stimulate the application of knowledge to specific purposes. The foundations of user studies (BERNAL) and bibliometrics (e.g., BRADFORD) are also part of this stage/tradition, which is primarily characterized by a more specific subject approach, a deeper level of indexing, and a more scientific attitude toward goals and problems.

⁷ SALTON is an example of an explicit disagreement with this view.
Information storage and retrieval by computers. This stage has been developing mainly since 1950 and can be seen as a technological modernization of documentation. (The American Documentation Institute (ADI), founded in 1937, changed its name in 1968 to American Society for Information Science (ASIS); in 2000 ASIS added "Technology" (ASIS&T).) The establishment of computer-based abstract services, such as Chemical Abstracts and MEDLINE, in the 1960s was important during this stage. The development of descriptor-based and free-text retrieval (mainly based on titles and abstracts), Boolean logic, and field-specific subject access, as well as the measurement of recall and precision and other innovations, was extremely important in document retrieval.
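As a minimal sketch of how Boolean logic combines with field-specific subject access, the Python code below builds an inverted index keyed by (field, term) and expresses AND, OR, and NOT as set intersection, union, and difference. The field tags TI (title words) and DE (descriptors), the three records, and the helper names are hypothetical; real databases use their own field codes and query syntax.

```python
from collections import defaultdict

# Hypothetical records: TI = title words (derived), DE = descriptors (assigned).
RECORDS = {
    1: {"TI": "Subject indexing in chemistry", "DE": ["indexing", "chemistry"]},
    2: {"TI": "Chemistry of catalysts", "DE": ["catalysis"]},
    3: {"TI": "Automatic indexing experiments", "DE": ["indexing", "evaluation"]},
}


def build_index(records):
    """Inverted index mapping (field, term) to the set of record ids."""
    index = defaultdict(set)
    for rid, rec in records.items():
        for word in rec["TI"].lower().split():
            index[("TI", word)].add(rid)
        for descriptor in rec["DE"]:
            index[("DE", descriptor.lower())].add(rid)
    return index


INDEX = build_index(RECORDS)


def lookup(field, term):
    """Copy of the record ids whose `field` contains `term` (empty set if none)."""
    return set(INDEX.get((field, term.lower()), set()))


# Boolean operators map onto set operations: AND = &, OR = |, NOT = -.
if __name__ == "__main__":
    print(lookup("DE", "indexing") & lookup("TI", "chemistry"))    # {1}
    print((lookup("TI", "chemistry") | lookup("DE", "chemistry"))
          - lookup("DE", "catalysis"))                             # {1}
    print(lookup("TI", "evaluation"), lookup("DE", "evaluation"))  # set() {3}
```

The last query illustrates field-specific access: "evaluation" is retrievable only as an assigned descriptor, not as a title word. Every match here is lexical, as in the Boolean systems described in footnote 3.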
Information retrieval (IR) as a research tradition started with the Cranfield experiments in the 1950s, and today's Text REtrieval Conference (TREC) full-text experiments continue this tradition (see NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY).
This third stage improved information services and research efforts in IS in an important way. Computer technology made it possible to use many kinds of SAPs, both the traditional kinds produced by information specialists and words from the documents themselves (e.g., titles and abstracts). It removed the monopoly of librarians and information specialists over subject access and established a direct competition between SAPs produced by different agencies.
An underlying premise in this stage has often been that the length of the searchable record itself was the most important parameter in retrieval (LANCASTER, p. 68). SAPs were often seen merely as "semantic condensations" of the texts represented (implying that the ultimate goal was full-text representation and nothing more). Research was dominated by quantitative methodologies, and not much research on qualitative differences (semantics or meanings) among different kinds of SAPs was undertaken. The premise was empiricist, first and foremost, in its attempt to measure the efficiency of subject retrieval points empirically (e.g., by measuring recall and precision). It was also empiricist in its avoidance of "metaphysical"-based classifications and in its favoring of "atomist" SAPs, such as the Uniterm system devised by Mortimer Taube in 1951 and similar systems that depended on specific words from the documents themselves.

⁸ The history of the abstract journal goes back, however, to 1665 (cf. MANZER).
One associated tendency in this stage was the attempt to formalize and to automate retrieval and to eliminate human interpretation and subject analysis. We must distinguish between the economic pressure to automate practical systems on one side and the scientific evaluation of the performance of various aspects of human-based and mechanized retrieval systems on the other side. It is legitimate and highly desirable to reduce costs and improve efficiency in information systems. Basic research, however, should illuminate basic strengths and drawbacks in different approaches and not be blinded by the pressure to use automated or cheap solutions. Because of such tendencies, important approaches related to interpretation were neglected, and the research did not yield as satisfactory a body of knowledge as desired.
Citation-based retrieval (1963-). Eugene Garfield's introduction of the Science Citation Index in 1963 marks the fourth important stage in the development of SAPs. The possibility of retrieving documents according to the citations they receive represents a real innovation in IR, and this technique is able to supplement all forms of term-based retrieval in very important and qualitatively new ways. This innovation has also prompted research on motives to cite other documents, on sociological patterns in citing, on the relative role of terms and references as SAPs, and on the semantic relations between citing and cited papers. In this way, citation-based retrieval has changed our understanding not only of subject relatedness but also of the concept of subject matter and of the fundamental aim of IR itself. Because it may be relevant to cite papers that have no words in common with the citing papers (or no simple semantic relation such as narrower terms, broader terms, and synonyms), naive conceptions of subject relatedness or subject matter can no longer persist. Semantic relations may be implicit or latent. Semantic relations in science are determined by theoretical advances, which may change the verbal description of the research phenomena completely; this is why statistical patterns in vocabulary may sometimes be a less efficient measure of subject relatedness than patterns in citations.
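One simple way to see how retrieval by citations differs from term matching is sketched below in Python. The papers, their reference lists, and the two titles are invented. `cited_by` performs a forward citation search (which papers cite a given paper), and `cocitation_counts` counts how often two documents appear together in reference lists. Neither depends on shared vocabulary; this illustrates the principle only and does not describe how the Science Citation Index is implemented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical citation data: citing paper -> set of papers it cites.
CITES = {
    "P1": {"A", "B"},
    "P2": {"A", "B", "C"},
    "P3": {"B", "C"},
    "P4": {"A"},
}
# Hypothetical titles, used only to show that citation links need no shared words.
TITLES = {
    "A": "The concept of subject",
    "P4": "Aboutness and indexing theory",
}


def cited_by(paper):
    """Forward citation search: papers whose reference lists include `paper`."""
    return {citing for citing, refs in CITES.items() if paper in refs}


def cocitation_counts():
    """Count how often each unordered pair of documents is cited together."""
    counts = Counter()
    for refs in CITES.values():
        for pair in combinations(sorted(refs), 2):
            counts[pair] += 1
    return counts


def shared_title_words(a, b):
    """Words two titles have in common (the basis of a purely lexical match)."""
    return set(TITLES[a].lower().split()) & set(TITLES[b].lower().split())


if __name__ == "__main__":
    print(sorted(cited_by("A")))            # ['P1', 'P2', 'P4']
    print(cocitation_counts().most_common())
    print(shared_title_words("A", "P4"))    # set()
```

Here P4 is retrieved from A through a citation link even though the two titles share no words, which is exactly the kind of relation that term-based retrieval alone cannot find.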
Citation behavior is extremely important because the goal of IR is to provide the references that are useful in solving a specific problem. A scientific article is a documentation of how a specific research problem is solved. The problem is formulated in the article, and the problem has determined the kind of information needed⁹ by the author to solve it. Based on this need, information was sought and selected, and the documents actually used were finally cited in the article. Each of the thousands of articles produced weekly is a kind of case study in IR. Every article not only poses a definite IR problem, but the list of references provided by the author is also the key to how that particular person has solved the problem. Thus, it is possible to check theories of IR against how well they match the actual documents cited.
According to the traditional view in philosophy of science, science should be able to predict future events. In other words, theories and models of IR should be able to predict citations that will appear in particular papers. Most research on relevance and on IR seems to have overlooked this fact. From what we do know, it seems extremely unlikely that an algorithm would be able to select references from electronic databases and end up with exactly the same references that appear in a given article. From this point of view, theories of IR seem naive and unrealistic (and the goal of prediction seems to be wrong).
A more detailed study of citation behavior can illuminate the real problem of IR, which is that cited documents are not simply a set of documents sharing a fixed set of attributes that are not represented in the nonselected items. Documents that are similar from the point of view of retrieval algorithms need not be co-cited, whereas documents that are not similar are often co-cited. Ordinary retrieval algorithms and citation practices seem simply to reflect different theories about subject relatedness.
Because authors may cite other papers in order to flatter or to impress, the prediction of which references a given author will finally select for a given paper cannot be used as a valid criterion in IR. The criteria for IR should be based not on social or psychological motives but on epistemological principles for the advancement of public knowledge. In this way, our insight from citation indexes has profoundly changed not only the methods of IR but also the concept of subject relatedness itself and the basic aim of retrieving information.
We can no longer regard the prediction of individual use as the ideal criterion for IR, nor can we regard IR as a value-free technique. Instead, we have to face the fact that the goals of IR are deeply rooted in epistemological norms for what should be regarded as good science and good citation behavior.

⁹ Information need is an important concept in IS. People may have many needs with complicated interrelations. A more precise need arises when a specific decision is made to write a paper. From that point until the paper is printed, the author seeks information, selects information, and decides what to cite in the paper. The references in the paper represent only one stage in the development of the author's information need. However, they are the most tangible, public, and available expression of how the author has seen and resolved his or her needs. People who are used to reading and interpreting papers can evaluate authors' conceptual horizons, compare them with others, and study their development and how they are influenced. In this way scholars may have methods to determine information needs other than behavioral methods.
Full text, hypertext, Internet, and digital libraries. Full-text retrieval marks the fifth and final step in the development of SAPs. Until this point, space limits were a major constraint in the development of subject access systems because the length of the record in itself is an important parameter in retrieval. At this stage, every single word and every possible combination of words in full-text documents are potential SAPs, as is every conceivable kind of value-added information provided by authors, readers, or intermediaries.
Given full-text representations, the first important theoretical problem that arises is whether any kind of value-added information is necessary. Can the extra information provided by abstracting and indexing, at least in principle, increase recall and/or precision? If not, then we seem to have reached the end of the line, in that no further contributions from research or practice in IS are needed.
The answer to this question is closely linked to theoretical views on the concept of subject. POULSEN sees a subject as something that is expressed in the literature (in a transparent and self-evident way?). By defining subjects in this way it is impossible even to pose the problem of whether a given text always represents the optimal representation of itself. By defining subjects as informative or epistemological potentials, HJØRLAND (1992; 1997) established the possibility that documents may be implicit or even wrong about their own subject matter; hence, information professionals are still needed.
To take an extreme example, a document about Jews written by a Nazi author should not only be indexed as being about Jews; it is also important to make the Nazi view visible in the subject analysis (e.g., to index it as Nazi propaganda about Jews). Subjects are not objectively "given" but are influenced by broader views, which are important for the information seeker to know and should therefore be part of the subject analysis. Whether this is also practical, economic, and realistic is another question that must be explored by evaluating specific subject access systems.
Toward a Taxonomy of Subject Access Points

Figure 1 outlines some important criteria for the classification of SAPs. In general, access points should be regarded as a system wherein each element contributes to the overall performance of the retrieval system. For example, in research libraries it would be a waste of resources to provide subject access to articles in the library catalog if this access is redundant with the subject information that can be found in, for example, CD-ROM databases in the same library.
Access Points Classified by Provider or Agent
Author-generated (e.g., document titles, abstracts, and keywords)
Value-added, including those provided by publisher or editor (e.g., journal name, publisher name, and cover information); indexer/abstractor/information specialist (e.g., classification codes, descriptors, identifiers, and abstracts); reviewers, readers, and other writers (e.g., reviews with links on Internet, best-seller statistics, citations, and citation indexing)

Access Points Classified by Kind
Verbal vs. nonverbal (nonverbal is sometimes called symbolic)
Long forms vs. short forms (e.g., abstracts vs. single keywords or classification codes)
Controlled vs. uncontrolled forms (or closed vs. open systems)
Derived vs. assigned forms (e.g., titles vs. identifiers)
Forms based on checklist or facet analysis vs. forms based on free analysis
Explicit vs. implicit (e.g., descriptors vs. references, journal names, or publishers. Implicit SAPs are mostly made for purposes other than IR. Titles are explicit SAPs when the authors intend them to be used for IR)
Content-oriented (or descriptive) vs. question-oriented (or evaluative)
Precoordinated vs. postcoordinated indexing forms
Syntactic indexing forms vs. forms without syntax (syntactic devices are, e.g., roles and links; they are also applied in the PRECIS indexing system)
Manually produced vs. computer-generated (computer-generated access points are sometimes produced by retroconversions in databases)

Figure 1. Some taxonomic criteria for subject access points
It is evident that a comprehensive description of all potential kinds of access points generated by the authors of documents implies a comprehensive typology of kinds of documents. It also implies a description of the structure (architecture or composition) of each kind of document, listing all types of SAPs. Because document structures develop in response to different demands, they are also influenced by epistemological positions or paradigms. Figure 2 shows the potential SAPs in a typical scientific article.
Norms (of scientific method and philosophy of science, external to the article)
• Observation and description
• Problem statement
• Hypothesis
• Experiment
• Theory building
(According to the basic view formulated in HJØRLAND (1997), there exist different epistemological views, and each implies different standards or ideals regarding the structure of documents. Thus a typical empiricist article reflects the development of the empiricist research tradition.)

Elements Contained in the Article
• Bibliographical identification (journal name, volume, pages)
• Title
• Author(s)
• Corporate affiliation and address
• Author abstract
• Author keywords
• Introduction
• Apparatus and materials, method, results, discussion
• Conclusion
• Acknowledgements
• References

Value-Added Information (subject access points, access, and evaluation information)
• Bibliographical description
• Relationship to other editions
• Biographical information
• Institutional information
• Indexer abstracts
• Indexer descriptors and identifiers
• Classification codes
• Language codes
• Document type codes
• Editorial comments
• Links to citing papers, reviews, and criticism
• Information about availability of document
• Evaluations
• Target group information
• "Key word plus" and "research fronts"
• Other kinds of links and semantic networks

Figure 2. Structure and elements in a typical scientific article
In monographs, additional subject access points could be based on their composition (e.g., books/volumes, parts, chapters, sections, subsections, bibliography, and index). Internet documents form a third kind. The Internet search engine AltaVista provides the SAPs shown in Figure 3.
Searchable by Search Engine AltaVista (search codes in brackets)
• Words or phrases contained in the URL (Uniform Resource Locator) of the document [url:]
• Title [title:]
• Links (URLs of other documents to which there is a reference) [link:]
• Words from the clickable text of a link [anchor:]
• Words in filenames of pictures contained in documents [image:]
• Words and phrases in the full text (except image tags, links, and URLs) [text:]
• Java applets [applet:]
(Also searchable are domain names, host names, and "similar URLs.")

Figure 3. Subject access points in Internet (HTML) documents (Based on ALTAVISTA: Advanced Search Cheat Sheet)
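The search codes in Figure 3 act as field restrictions. A term prefixed with a code is matched only against that one access point, while a bare term is matched against all of them. The Python sketch below illustrates the idea on a two-document toy collection. Everything in it is hypothetical: the `WebDocument` class, the `search` function, and the sample documents. It is not AltaVista's implementation. It covers only five of the codes (no `link:` or `applet:`) and does no stemming or phrase matching.

```python
import re
from dataclasses import dataclass, field


def words(s: str) -> set:
    """Lowercase alphanumeric tokens: a crude stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


@dataclass
class WebDocument:
    """Hypothetical record holding a few of the access points in Figure 3."""
    url: str
    title: str
    text: str
    anchors: list = field(default_factory=list)  # clickable link texts
    images: list = field(default_factory=list)   # picture filenames

    def field_words(self, code: str) -> set:
        # Map each search code to the access point it restricts a term to.
        sources = {
            "url": [self.url],
            "title": [self.title],
            "text": [self.text],
            "anchor": self.anchors,
            "image": self.images,
        }
        if code not in sources:
            raise ValueError(f"unsupported search code: {code}:")
        return set().union(*(words(s) for s in sources[code]))

    def all_words(self) -> set:
        codes = ("url", "title", "text", "anchor", "image")
        return set().union(*(self.field_words(c) for c in codes))


def matches(doc: WebDocument, token: str) -> bool:
    """'title:nightmare' searches one field; a bare term searches all fields."""
    code, sep, term = token.partition(":")
    if sep:
        return term.lower() in doc.field_words(code)
    return token.lower() in doc.all_words()


def search(docs, query: str):
    """Every whitespace-separated term must match (implicit AND)."""
    terms = query.split()
    return [d for d in docs if all(matches(d, t) for t in terms)]


docs = [
    WebDocument(url="http://example.org/sleep/nightmares.html",
                title="Nightmares in Children",
                text="A clinical study of sleep disturbances.",
                images=["sleep-lab.jpg"]),
    WebDocument(url="http://example.org/mideast.html",
                title="The Conflict between Egypt and Israel: "
                      "A Nightmare in Modern Politics",
                text="A diplomatic history of the peace process.",
                anchors=["peace talks"]),
]
```

The sketch has no stemming. As a result, `search(docs, "title:nightmare")` finds only the metaphorical political title, while `search(docs, "nightmares")` finds only the clinical one. Field restriction narrows *where* a word is sought, but it does nothing about *what* the word means.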
Other kinds of documents, such as newspapers, popular magazines, patents, pictures, and sound recordings, present different structures, different kinds of potential access points, and different retrieval problems. The information to be derived from a document depends on the information contained in that document. Some documents have, for example, author-generated titles, abstracts, and keywords, while others do not. The need to add such elements is more evident in the latter case, but adding them is not necessarily redundant in the former.
A taxonomy of derived SAPs thus clearly must be based on a taxonomy of documents and document structures. Some research in this area has been done in such fields as composition studies (e.g., BAZERMAN, 1988) and genre analysis (e.g., …). In this still new and relatively unexplored field, we lack a taxonomy of document types, their composition and elements, and consequently of the relative contributions of such elements in IR.
We know more about scientific research articles than about all other kinds of documents, including scholarly monographs. Thus, unless otherwise stated, this review considers only primary scientific articles.

10 A rhematic title indicates the kind of document considered rather than what the document is about; e.g., the terms "novel," "letter," and "dissertation" are examples of rhematic titles.
In our view, the essential quality of SAPs is their ability to express the aspect of a given document that would be most useful in answering the questions put to the specific database in which the SAPs' performance is to be evaluated. Poor titles, bad indexing, and poor SAPs in general are those that express unimportant (or perhaps even false) information about a given document. All questions concerning the choice of formal aspects of retrieval language (e.g., standardization, pre- vs. postcoordination, length of representation) are subordinate.
If a need for value-added information is to be justified in future systems, it must be done by arguments about the ability of information specialists to interpret documents in relation to other documents and to the specific user group they are serving. Meaning, semantics, and epistemology become the most important theoretical perspectives that can be generalized from specific domains.
RESEARCH ON SPECIFIC SUBJECT ACCESS POINTS

Document Titles
A title is the name of a document, given by the author and influenced by the norms existing at the given time. According to BERNARD, there exists a discipline within literary history, called titrology, which confines itself to the study of titles. For nearly 50 years it has generated an impressive number of publications (mostly in French). One survey of titrology is given by GENETTE, who defines the functions of titles in the following way: "The first function, the only mandatory one in literary practice and institution, is the function of designation or identification. It is the only one to be mandatory, but it is impossible to separate from the others, since under the semantic pressure of the environment, even a simple opus number can be invested with meaning. The second one is the descriptive function: thematic, rhematic,10 mixed, or ambiguous. . . . [The last] is the function called seductive" (GENETTE, pp. 718-719).
Whereas most books and journal articles have titles, other kinds of documents (e.g., pictures, and nonprinted documents such as letters) may lack them. Names may characterize what they name, and their use in retrieval is based on this assumption, which, however, is not always true.
The most common measure of title informativity has been the number of "substantive" words that a title includes (e.g., counting all words except trivial words, such as articles, that appear on a "stop list"). Because titles can express many different things, this method gives a very rough measure and can be misleading.
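The stop-list count described above takes only a few lines to implement. In this sketch the stop list is a small, hypothetical one chosen for illustration; real databases use longer, tuned lists. The first sample title is a metaphorical one and the second a descriptive professional title.

```python
import re

# Tiny illustrative stop list (an assumption; real lists are longer and
# tuned to the database and language).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "at", "and", "or", "to",
              "for", "by", "with", "between", "is", "are", "as"}


def substantive_words(title: str) -> list:
    """Words in the title that are not on the stop list.

    Hyphenated compounds such as "cow-pats" are kept as one word.
    """
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", title.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def informativity(title: str) -> int:
    """The crude count-based measure: number of substantive words."""
    return len(substantive_words(title))


metaphorical = "The Conflict between Egypt and Israel: A Nightmare in Modern Politics"
technical = "Female Sex Pheromone in the Skin and Circulation of a Garter Snake"
```

Both titles score well: `informativity(metaphorical)` is 6 and `informativity(technical)` is 7. Yet for a psychologist looking for literature on nightmares, the first title is misleading. The count registers how many content words a title has, not what they are taken to mean, which is why the measure is so rough.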
According to NORD (1995), titles can be intended to achieve six communicative functions. Four of these (referentiality, expressivity, appellativity, and phatic function) can be universally assigned to all texts and text types. The other two (metatextuality and distinctive function) can be observed as specific functions of particular text types: the distinctive function is typical of names or labels, and the metatextual function is found in metatexts such as text commentaries, reviews, abstracts, and summaries. Therefore, titles are not just texts but typical texts presenting a complex hierarchy of communicative functions. In spite of their complex functionality, titles present simple syntactic-semantic structures.
Nord found:
• only four macrostructural types (simple titles, title-subtitle combinations, duplex titles with "or," and title series);
• six syntactic forms (nominal titles, verbal titles, sentence titles, adverbial titles, attributive titles, and interjection titles); and
• a limited number of microstructural patterns, such as "NP & NP" = nominal phrase + connective + nominal phrase (as in John Jakes: Heaven and Hell).

Therefore, title elements have to be polyfunctional if the title is to achieve its intended functions. This is also typical of other communicative signs.
The design or form of a title varies with time, culture, subject matter, and document type. BERNARD analyzed a representative sample of French monographs from 880 to 1991. He found that titles in the nineteenth and twentieth centuries are distinctly shorter than those of the seventeenth and eighteenth centuries, whereas titles from 880 to 1523 are as short as recent ones. Books republished in modern times often bear titles that are abbreviations of their original title. In modern terms, Renaissance titles served at once as title, subtitle, signature, and fourth cover page. The development of carefully structured titles and subtitles legitimizes the use of the title without the subtitle. Another development is the appearance of homonymic works, that is, different works bearing the same title. Sometimes a title is repeated intentionally, for example to parody a work or to locate a work within a tradition. In general, however, authors in the Middle Ages and the Renaissance did not take the precaution of attaching a unique label to their works, which we consider so important today.
Titles are intended to indicate what the document is about (its subject). Authors usually choose a name that draws potential readers by indicating the document's content at a glance, thus contributing to its initial selection or rejection. We have little knowledge of how titles are actually used, or how they should be interpreted, in selection processes. Among the few studies on this subject are those by ATKINSON, BAZERMAN (1985), and NAHL-JAKOBOVITS & JAKOBOVITS. Studies such as the one by SARACEVIC on the comparative effects of titles, abstracts, and full texts on the relevance judgment of documents are pertinent. He found that of 207 answers judged relevant from full text, 131 were judged so from titles and 160 from abstracts.11 He also found that it seems to be easier for users to recognize nonrelevant documents than to recognize relevant documents from the title.
A title normally constitutes the most concise statement of a document's content. It is often used as a surrogate for the document in bibliographies, databases, indexes, tables of contents, current-awareness services, and reference lists, and it is heavily used in IR. However, because the title is a name, it is the author who decides how informative it will be and what kind of information is given priority. The great importance of informative titles is emphasized almost unanimously in the literature by writers, journal editors, and authors of guidance books for scientific and professional authors (YITZHAKI, 1996).
When we evaluate titles as SAPs, we have to consider the skills, motives, and norms that may influence the author's choice of title, and hence the title's subsequent possibilities and limitations in IR.
For example, an author may want a title that "sounds good," perhaps a poetic one. Metaphorical language is one of the most common problems with titles in IR. Consider a psychologist seeking information about nightmares who uses titles for subject access in the Social Sciences Citation Index. A title such as "The Conflict between Egypt and Israel: A Nightmare in Modern Politics" is a problem for this searcher.
Another problem with title words is the lack of control of synonyms and homonyms. Within a given time period of the Social Sciences Citation Index, "AIDS" is a useful access point for the illness. Searched across the total time span of the database, however, other meanings such as "teaching aids" may cause very low precision.
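The homonym problem is easy to reproduce. The sketch below uses four invented titles, not actual Social Sciences Citation Index records. A whole-word title search for "aids" retrieves every one of them. A case-sensitive search for "AIDS" is included as one possible, fragile heuristic; the source does not propose it.

```python
import re


def title_word_search(titles, word, case_sensitive=False):
    """Return the titles containing `word` as a whole word."""
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(rf"\b{re.escape(word)}\b", flags)
    return [t for t in titles if pattern.search(t)]


def precision(retrieved, relevant):
    """Share of retrieved items that are relevant (0.0 if nothing retrieved)."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)


# Hypothetical titles spanning several decades of a database.
titles = [
    "Visual Aids in the Primary Classroom",
    "Teaching Aids for Second-Language Learners",
    "AIDS Prevention among Adolescents",
    "Stigma and AIDS: A Survey of Attitudes",
]
relevant = {titles[2], titles[3]}  # the two titles about the illness

loose = title_word_search(titles, "aids")                       # all four titles
strict = title_word_search(titles, "AIDS", case_sensitive=True)  # the last two
```

Here `precision(loose, relevant)` is 0.5 and `precision(strict, relevant)` is 1.0. Case sensitivity only helps if the database preserves capitalization and does not set titles in all capitals. Restricting the search to the period in which the term is unambiguous, as the text implies, does not depend on case at all.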
In composition studies, CROSBY suggests a high correlation between the quality of a written composition and its title. The back-and-forth process of finding an appropriate title stimulates creativity, unity, revision, and significance. He classified 300 titles according to their apparent purpose in order to infer certain lessons for writers. The classification includes:
• Titles announcing the general subject, such as "The Age of Adolescence" and "The Collective Corporation";
• Titles indicating a specific topic, including "The Decline of Courtesy" and "Toward a New Morality";
• Titles indicating the controlling question. Some titles indicate the question that the writer is answering, and they go a long way toward helping the writer stay focused: e.g., "Is Culture Worthwhile?" and "How Can We Recover Our Joy?";
• Titles announcing the thesis, such as "This Thing Called Love is Pathological" and "The Rip-Off Age is the Clue to Nation's Ills"; and
• Titles that bid for attention. Some methods of attracting attention include alliteration, deliberate ambiguity, intriguing word coupling, allusions from serious and pop culture, and the twist (something unexpected).

11 The ability to evaluate relevance from bibliographical records seems to be much better in the study reported by SARACEVIC than in the study by WELWERT, reported in English in HJØRLAND (1988).
The length of a title is also important for retrieval. The longer the title, the more words it contains, and so the greater the probability should be that it will be retrieved by a given query. This is not always the case, however. KELLER found that master's theses with 1 to 12 words in the title had a greater chance of being retrieved than did those with 13 to 18 words. This shows that factors other than the number of words are at work.
The difference between titles in professional scientific journals and in popular science journals is not just a question of length but also of emphasis (see Figure 4). It should be remembered that the title is always a choice among possible alternatives. What the author considers the core subject is not necessarily the same as the searcher's core interest. A paper may be relevant to a searcher from a point of view different from the one expressed in the title, or from one not expressed explicitly at all. Titles often express more general claims than the paper covers. They may be seductive or inflated, and a given subculture may encourage a kind of marketing of papers that resembles commercial thinking more than scientific precision.
The hard sciences tend to have longer titles than the softer and popular sciences. Analyses by BUXTON & MEADOWS (1977) and YITZHAKI (1992; 1996; 1992) demonstrated a trend toward longer (and more informative) titles. The trend occurred over a wide range of subject fields and was apparent before KWIC indexes and computer-based searching of title words became common. Although the trend preceded the introduction of these tools, the tools undoubtedly contributed greatly to the growing awareness of the importance of title informativity. In the humanities a somewhat similar trend seems to have occurred, but more weakly and at a slower pace. (These studies do not discuss alternative hypotheses, such as increasing specialization in research creating a need for more words to express a given piece of research.)
VOORBIJ studied the relative roles of title keywords and subject descriptors for monographs in the humanities and social sciences held in the online public access catalog (OPAC) of the National Library of the Netherlands. He found that 37% of the records were considerably enhanced by a subject descriptor and that 49% were slightly or considerably enhanced. In a second study, subject librarians searched the same topics using title keywords and using subject descriptors. The relative recalls were 48% and 86%, respectively. Failure analysis revealed why so many records that were found by descriptors were not found by title words. First, the title of a publication does not always offer sufficient clues for retrieval. Second, and more important, a topic can be expressed in titles in widely diverse ways. Descriptors remove the burden of vocabulary control from the user.

Articles for Professional Audiences
• Insects as Selective Agents on Plant Vegetative Morphology: Egg Mimicry Reduces Egg Laying by Butterflies (K. Williams and L. Gilbert, Science, 1981)
• Female Sex Pheromone in the Skin and Circulation of a Garter Snake (W. Garstka and D. Crews, Science, 1981)
• The Reproductive Behavior and the Nature of Sexual Selection in Scatophaga stercoraria L. (Diptera: Scatophagidae). IX. Spatial Distribution of Fertilization Rates and Evolution of Male Search Strategy within the Reproductive Area (G. Parker, Evolution, 1974)

Articles for Popular Audiences
• Coevolution of a Butterfly and a Vine (L. Gilbert, Scientific American, 1982)
• The Ecological Physiology of a Garter Snake (D. Crews and W. Garstka, Scientific American, 1982)
• Sex around the Cow-pats (G. Parker, New Scientist, 1979)

Figure 4. Comparison of professional and popular titles (Based on MYERS, p. 275)
The study clearly demonstrates the benefits of descriptors over title words. It does not, however, consider the functions of those descriptors in relation to other kinds of subject access data that will probably soon be available from other sources, such as the tables of contents and book descriptions used, for example, by Amazon.com.
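Figures such as Voorbij's 48% and 86% are relative recalls. A common way to compute relative recall is against a pool, namely the relevant records found by any of the methods compared. The minimal sketch below assumes that definition; the source does not spell out Voorbij's exact procedure. The record IDs are invented and do not reproduce his data.

```python
def relative_recall(results):
    """Relative recall of each search method against a pooled set.

    `results` maps a method name to the set of relevant record IDs it found.
    The union of all sets stands in for the unknown set of all relevant
    records, so relative recall overstates true recall whenever every
    method misses some relevant records.
    """
    pool = set().union(*results.values())
    if not pool:
        return {name: 0.0 for name in results}
    return {name: len(found) / len(pool) for name, found in results.items()}


# Hypothetical relevant records found by each method for one topic.
results = {
    "title keywords": {"r1", "r2", "r3"},
    "subject descriptors": {"r2", "r3", "r4", "r5", "r6"},
}
rr = relative_recall(results)  # title keywords 0.5, descriptors 5/6
```

Records that no method found are invisible to the pooled denominator. True recall can therefore only be lower than the relative figure.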
A study of COMPENDEX by BYRNE compared titles and abstracts as subject access points. Titles retrieved 22% of the citations, abstracts retrieved 61%, and titles and abstracts combined retrieved 75%. The study did not report any precision figures, but it indicates that titles alone perform very poorly compared with abstracts. COMPENDEX is dominated by articles, and we must expect this problem to be even greater with monographs.
In another study, BARKER ET AL. examined chemical databases. They found that summaries increased recall over titles by 68%, but at the expense of a …% drop in precision. Keywords increased recall by 35% with a 10% drop in precision.
HODGES tested the effectiveness of title keywords in retrieval and concluded that less than 50% of the relevant titles were retrieved by words in titles. Surprisingly, this study found that the social sciences had better retrieval from titles (…%) than the hard sciences (42%); arts and humanities retrieved …%. This low rate of retrieval from titles was attributed to three sources:
(1) the titles themselves;
(2) ignorance, by the user and the information specialist, of the subject vocabulary in use; and
(3) general language problems.
Even the best efforts of users and specialists are not likely to improve this rate significantly.
Hodges argues, however, that in many instances this recall is more than adequate for the user. Many students and faculty do not require the entire body of literature on a topic. They are just trying to determine the kinds and amount of material being written on a given topic, or they want an introduction to a topic or an entry point into the literature. Also, because of their timeliness and economy, title-word indexes will, in all likelihood, remain an important element of indexing.
When titles are used for retrieval, their words are merged with those from other titles: titles in the same journal, in other journals, in other kinds of documents in the domain, and perhaps also in other domains. IR is always done in one or more specific collections, and the actual context determines the most rational search strategy.
The principal disadvantages of having authors rather than professional indexers provide access points may stem from the fact that authors do not have the same overview of the total database (or of the total literature in the field). Hence, they may have difficulty predicting the discriminative value of words and their combinations. Their selections can easily be either too specific or too general.
Because titles differ in their informational value, they have a different status in different databases. Some printed bibliographies (e.g., ERIC) use titles as document surrogates, or document representations, in the index under each descriptor. Others (e.g., Psychological Abstracts) apply a value-added index phrase with a higher informational value. (This may, of course, reflect a decision that is not grounded in a difference in the informativeness of titles in educational and psychological research.)
PERITZ examined the frequency of noninformative titles in library and information science (LIS) and in sociology. Articles with noninformative titles made up 27% of the total in LIS and 15% in sociology. In both fields the study showed that such articles were concentrated in a few journals.
Conclusion. Investigations of titles as access points tend to emphasize quantitative aspects, such as length, the number of "substantive" words, and differences between domains and over time. Studies of qualitative aspects of titles are scarce and are found mostly in disciplines outside IS (e.g., linguistics and composition studies).
Assume that different theoretical views or paradigms see a given paper differently, including what in that paper is of interest. Then these views should yield different criteria for the informativity of given titles. For example, we might expect positivist-oriented information seekers to value titles that express the kind of statistical methods used in a paper, and hermeneutically oriented seekers to value titles that express the interpretative attitudes of the author. This implies that title informativity cannot be measured by an objective standard, for example, by number of words. Nor is such informativity simply a subjective or cognitive value in an individual, psychological sense. The epistemological view implies that the informativity of titles is something to be inferred theoretically from views formulated in epistemology.
Abstracts
According to ALTERMAN, text summarization is not a single phenomenon. There are many different kinds of summaries, such as abstracts, epitomes, overviews, abridgements, digests, and recapitulations. Alterman does not, however, describe the differences among them. We can add the following: annotations, briefs, cuts, extracts, part texts (e.g., half texts as opposed to full texts), précis, and Zentralblätter. In IS, however, the two most common distinctions are indicative vs. informative abstracts and evaluative (or critical) vs. nonevaluative abstracts.
The philosophy of science has an important argument, viz., that one's observations are not independent of one's theoretical assumptions (cf. CHALMERS, chaps. 1 and 2). This principle is also valid for the observation/reading of documents. It applies to the interpretation of their essential or core information (or rather, their informational potentials) and thus to the summarization of them. As a consequence, even nonevaluative abstracts cannot simply be regarded as objective descriptions of a document; they are influenced by norms, interests, and epistemological positions.
Today most scientific journals publish authors' abstracts for all their articles. These abstracts may be used directly in bibliographical databases. Alternatively, they may be edited, revised, or replaced by an abstract written by a professional abstractor, who then usually signs it. We call such value-added abstracts "indexer abstracts."
LANCASTER believes that the length of a given search field is the most important factor in information retrieval:

For retrieval purposes, the longer the abstract the better. At least, the longer the abstract the more access points it provides, and the more access points the greater the potential for high recall in retrieval. At the same time, it must be recognized that precision is likely to deteriorate: the longer the abstract, the more "minor" aspects of the document that will be brought in and the greater the potential for false associations. (LANCASTER, p. 21)
Because the brief abstract provides more access points than title or selective indexing, the item it represents will be more retrievable. Likewise, the exhaustive indexing may make this item more retrievable than it would be in a search on the brief abstract but less retrievable than it would be in a search on the expanded abstracts. . . . The longer the record, the greater the chance that spurious relationships will occur. Spurious relationships, of course, cause lower precision. (LANCASTER, pp. 227ff.)
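Lancaster's trade-off can be reproduced on a toy collection. The sketch below uses four invented documents with invented relevance judgments. It runs the same one-word query against the title field and against the abstract field and reports recall and precision for each.

```python
import re


def tokens(text):
    """Lowercase word tokens (no stemming)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def recall_precision(docs, query_terms, field, relevant_ids):
    """Search one field for any of the query terms; return (recall, precision)."""
    retrieved = {d["id"] for d in docs if tokens(d[field]) & query_terms}
    hits = retrieved & relevant_ids
    recall = len(hits) / len(relevant_ids) if relevant_ids else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical records; documents 1, 2, and 4 are judged relevant to "reading".
docs = [
    {"id": 1, "title": "Reading Comprehension in Second Graders",
     "abstract": "Tests reading comprehension of children aged seven and eight."},
    {"id": 2, "title": "Understanding Written Texts",
     "abstract": "A study of reading comprehension among adult learners."},
    {"id": 3, "title": "Listening Comprehension and Working Memory",
     "abstract": "Listening span is compared with reading span as a minor control condition."},
    {"id": 4, "title": "Eye Movements in Skilled Readers",
     "abstract": "Eye tracking during the reading of expository passages."},
]
relevant = {1, 2, 4}

title_result = recall_precision(docs, {"reading"}, "title", relevant)        # recall 1/3, precision 1.0
abstract_result = recall_precision(docs, {"reading"}, "abstract", relevant)  # recall 1.0, precision 0.75
```

The abstract field finds all three relevant documents, including one whose title says "Readers" rather than "reading." It also retrieves a study of listening comprehension that mentions reading only as a minor control condition. This is exactly the kind of "minor aspect" and "false association" the quotation describes.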
From our point of view, however, this quantitative measure (the length of the field) is less interesting than how well the field satisfies the needs of users in given situations. Some subject analyses are simply better than others. For that reason, the strategy of putting as many different subject descriptions as possible into the document representations is neither a correct theory nor a correct strategy. This can be disproved both theoretically and empirically (cf. …). Therefore, we need a theory about what should be expressed in different SAPs (viewed as a system) and about the abstract's role in this system. The ability to see what is important and to express it in a way that maximizes its visibility to the user must be the only factor that matters.
LANCASTER writes further:

At the present time, authors and publishers have little incentive for "embroidering" abstracts to make the underlying work seem more attractive than it really is. Price . . . has argued that this could become a danger in a completely electronic environment. . . . Publishers would want to promote use because they would probably be paid on this basis. Authors would want to promote use if this factor became, as it might, a criterion used in promotion and tenure decisions. The term "spoofing" has been used to refer to the embroidering of Web pages to increase their retrievability. . . . (LANCASTER, p. 116)
This quotation is the key to understanding the role of value-added information provided by information specialists. Their perspective is different from those of authors and publishers. Ideally, they read on behalf of the user (or on behalf of science or some collective goals and values). Commercial or self-promoting embroidering of abstracts may be rare in the printed world. A more "scientistic" embroidering of the whole text, for example through name dropping, may, however, be the rule rather than the exception (and some embroidering may be unconscious and subtle).
Abstractors can, at least ideally, have an overview of the system in which the single document is going to be organized. They have implicit knowledge of the visibility and retrievability of different documents in the database. They can therefore improve the visibility of those aspects of a given document that will be most useful. Most importantly, because all documents are based on implicit assumptions, information professionals can make a difference by explicating such epistemological assumptions. Two specific examples of how this can make an important difference are given by HERRELL and by WINDSOR.
The work of abstractors can be guided by thesauri, classification systems, checklists, and facet analysis (FIDEL, 1986). In this way their specific subject analysis can be somewhat formalized. The most important factor is not the degree of formalization, however. It is the fact that abstractors write on behalf of the users and from the perspective of a more-or-less specific collection or database with more-or-less well-defined functions in the information environment.
Conclusion. Abstracts are important in IR as access points and as indicators of the relevance of documents during a search. Abstracts increase recall and precision much more than titles and keywords do. Their efficiency depends not only on their length but also on their content. Like titles, abstracts face the problem of giving users a relevant description of the document being represented. Such a description is in principle not value free or neutral but always biased in one direction or another. In information systems, abstracts should ideally be written on behalf of the user and from the perspective and goals implicit in the specific system. This is why many information systems have their own abstractors and do not rely on author-created abstracts.
References/Citations
Searches that use the references in documents as SAPs, directly or via citation indexes, are called chain searches.12 They represent a qualitatively different method from term searching. How should we evaluate the relative strengths and weaknesses of term searching vis-à-vis citation or chain searching?
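Chain searching can follow references in two directions. It can go backward, through the reference lists of documents already found, or forward, through a citation index that records which later documents cite them. The sketch below shows both on a tiny invented citation network. The function names and data are hypothetical; real chain searching draws on actual reference lists or a citation database.

```python
from collections import deque


def backward_chain(references, seeds, max_depth=2):
    """Follow reference lists out from the seed documents, breadth-first.

    Returns {document: number of steps from the nearest seed}.
    """
    seen = set(seeds)
    found = {}
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        doc, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for ref in references.get(doc, []):
            if ref not in seen:
                seen.add(ref)
                found[ref] = depth + 1
                queue.append((ref, depth + 1))
    return found


def citation_index(references):
    """Invert reference lists: for each cited document, list who cites it."""
    cited_by = {}
    for citing, refs in references.items():
        for ref in refs:
            cited_by.setdefault(ref, []).append(citing)
    return cited_by


# Hypothetical reference lists: document -> documents it cites.
references = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "X": ["A"],
}
backward = backward_chain(references, ["A"])       # B, C at 1 step; D, E at 2 steps
forward = citation_index(references).get("A", [])  # later work citing A: ["X"]
```

The depth limit matters in practice. Each step back in time widens the net but also moves further from the seed document's topic, so precision is likely to fall with depth.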
Chain searching is often quite valuable (see, e.g., WELWERT, reported in English in HJØRLAND (1988)). In that study, a search on the subject "reading comprehension vs. listening comprehension" produced three sets of relevant references. Database searching yielded 79, manual sources yielded 42, and chain searching in the references thus located yielded 82. These last 82 references could not, of course, have been found without the previous bibliographic search, but the example indicates the significance of chain searching.
It may also indicate the high degree of uncertainty of bibliographic searching, in that so many references were not found by a thorough search of databases and printed bibliographies.
Chain searching vs. bibliographic searching can be further illustrated by field studies (PAO, 1993) and controlled studies (PAO & WORTHEN) that compare literature references and terms as search criteria. These studies were performed in medicine. They built on a pool of common references in MEDLINE (a database that maintains a high level of indexing quality) and SCISEARCH (a science citation index). They cannot be regarded as definitive, but they do indicate the following:
.
The
level
of
overlap
is low (4-S%)
when
terms
and
refer_
ences
are
used
for
searching.
.
Given
a
high
quality
of indexing,
term
searching
seems
to
be more
efficient
than
reference
searching
(term"search_
i.g
ir,
MEDLINE
gave
a
mean
recall
of 77%
and
a
pre_
cision
level
of
55%;
reference
searching
in
SCISEAFiCH
gave
a
recall
of
33%
arrd
a
precision
level
of
60%).
• Compared with term searching alone, adding reference searching increased recall by a mean of 24%. Moreover, the documents retrieved by both search strategies showed high precision.
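To make the measures in these findings concrete, the following Python sketch computes recall and precision for one strategy, the recall gain from combining two strategies, and the set retrieved by both. The document sets are invented for illustration and do not reproduce the Pao studies' data; the `evaluate` helper is hypothetical. The gain is computed in percentage points of recall; this section does not say whether the reported 24% is an absolute or a relative figure.

```python
def evaluate(retrieved, relevant):
    """Return (recall, precision) of a retrieved set against the set of relevant documents."""
    hits = retrieved & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Invented toy data, not taken from the studies: documents 1-10 are relevant.
relevant = set(range(1, 11))
term_search = {1, 2, 3, 4, 5, 6, 11, 12}  # retrieved by term searching
reference_search = {5, 7, 8, 13}          # retrieved by reference searching

term_recall, term_precision = evaluate(term_search, relevant)
combined_recall, _ = evaluate(term_search | reference_search, relevant)
both = term_search & reference_search     # retrieved by both strategies

print(f"term searching: recall {term_recall:.0%}, precision {term_precision:.0%}")
print(f"recall gain from adding reference searching: {combined_recall - term_recall:.0%}")
print(f"retrieved by both: {sorted(both)}, precision {evaluate(both, relevant)[1]:.0%}")
```

In this toy case the small overlap is exactly what lets the combined search raise recall: each strategy contributes relevant documents the other misses.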
¹² … searching is by using the Web of Science produced by …
Unfortunately, these studies lack a closer analysis of the nature of the terms and references that yield few or no results. Studies of this kind are typically quantitative rather than qualitative. If recall can be increased by 24% by including reference searching, it would be worth analyzing which concepts typically should have been included in the bibliographic records but were not. Could such findings lead to new instructions for indexers and thereby improve indexing practices?
HARTER ET AL. also found that the subject similarity between pairs of cited and citing documents is typically very small, indicating that term searching and chain searching are complementary methods.
GREEN (2000) compared chain searching with the use of standard bibliographic tools in the humanities and found that fewer than 5% of the references were found by both types of search. Precision of retrieval based on bibliographic references from "seed documents" appears to be high.
Whereas bibliographic tools generally observe well-defined boundaries of coverage with respect to subject, date, format, and language, the relevant literature may not respect the same boundaries, especially in the humanities. This is one reason chain searching is so important.
Green also found (p. 22…) that although most of the sample documents were covered in the bibliographic tools being used, only 10% were assigned index terms that matched the user's need in both breadth and depth. She says, "suffice it to say that there are no trivial or easy solutions to the overwhelming problem of assigning subject descriptors to documents that will consistently enable users to locate all, but only, the literature relevant to their needs" (p. 225).
The efficiency of bibliographic searching is, of course, determined by how much of the relevant literature has been recorded, analyzed by subject, and described in a way that allows searchers to locate it via bibliographies, databases, and reference literature.
The bibliographic approach is characterized by formal rules that determine what is included in a bibliography or database and how it is described (e.g., by using descriptors). The document description is largely an expression of competence in administering a set of rules. The efficiency of the result depends in particular on whether the formal rules can ensure a product that meets the users' needs.
The strength of the formal approach is that little material is excluded on the basis of value-based criteria. The weakness is that, because they are formal, these systems do not prioritize materials according to relevance. They may, for example, include all books longer than 49 pages, exclude book reviews, or leave parts of a document unindexed. A lack of resources, or of adequate rules for carrying out the formal program, might lie behind the random inclusion of both highly relevant and nonrelevant references. In real life there is almost always a lack of resources, which means that highly relevant references are often absent. Such formal omissions should not be expected in reference lists, which may, however, contain other kinds of omissions.
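The point that formal rules select material without regard to relevance can be illustrated with a toy coverage policy built from the examples just given (books longer than 49 pages are included, book reviews are excluded). The Python sketch below is hypothetical: the `Item` record, its fields, and the `formally_included` rule are assumptions for illustration, not the rules of any actual bibliography or database.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    doc_type: str     # e.g. "book", "book review", "article"
    pages: int
    relevance: float  # relevance to some user's need; the rule below never consults it


def formally_included(item: Item) -> bool:
    """Hypothetical formal coverage rule: decides on form alone, never on relevance."""
    if item.doc_type == "book review":
        return False            # a whole document type is excluded
    if item.doc_type == "book":
        return item.pages > 49  # only books longer than 49 pages are covered
    return True                 # all other material is covered


# Hypothetical items chosen to show relevance and coverage pulling apart.
items = [
    Item("Short but crucial monograph", "book", 40, relevance=0.9),
    Item("Long but marginal monograph", "book", 300, relevance=0.1),
    Item("Penetrating review of a key book", "book review", 3, relevance=0.8),
]
for item in items:
    print(f"{item.title}: included={formally_included(item)}, relevance={item.relevance}")
```

Here the two most relevant items are excluded and the least relevant one is included, while no item is rejected for being judged unimportant: both the weakness and the strength of purely formal selection.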
The efficiency of chain searching (assuming that one can identify relevant seed documents) is determined by how well each document identifies and cites relevant information in its reference list. The method presupposes that the scientific literature in the field is neither unrelated to other research in the field nor simply redundant.
In other words, it assumes that researchers are extremely conscientious in their literature searching and in their referencing of relevant sources, and that references are selected with a view to informing the reader of important literature. It also presupposes, for example, that scientists do not cite on purely formal or presentational grounds. Most importantly, it presupposes that authors are not biased in selecting information but give even consideration to papers that argue both for and against their own view.
This last assumption seems to contradict the results of psychological research:
As shown by a multitude of studies, such information-seeking processes often are not balanced: people prefer information that supports their favored or chosen decision alternative compared to information that opposes it. . . . the preference for supporting (consonant) compared to conflicting (dissonant) information occurs if people have decided voluntarily and with a certain degree of commitment for a particular alternative . . . We will refer to this preference for supporting information as confirmation bias. . . . Therefore, it can be concluded that individuals carry out biased information seeking while making decisions, and that this happens from the moment they commit themselves to a particular alternative. (SCHULZ-HARDT ET AL., p. 655)
In citation studies, MACROBERTS & MACROBERTS (1988; 1989) have considered authors' motives for not citing relevant documents; they are also, together with SEGLEN (and GARFIELD himself), among the most qualified and dedicated critics of the misuse of citation indexes.
Psychological factors are important in studying why authors cite other documents. As GARFIELD (p. 85) points out, there are many kinds of citation motivations:
• Paying homage to pioneers;
• Giving credit for related work (homage to peers);
• Identifying methodology, equipment, and so on;
• Providing background reading;
• Correcting one's own work;
• Correcting the work of others;
• Criticizing previous work;
• Substantiating claims;
• Alerting to forthcoming work;
• Providing leads to poorly disseminated, poorly indexed, or uncited work;
• Authenticating data and classes of facts (physical constants, and so on);
• Identifying original publications in which an idea or concept was discussed;
• Identifying original publications or other work describing an eponymic concept or term;
• Disclaiming work or ideas of others (negative claims); and
• Disputing priority claims of others (negative homage).
SEGLEN (p. 29) also lists a range of problems concerning the selection of references:
• References are selected because of their usefulness for the author, which is something different from their quality;
• Only a small fraction of all used material is cited;
• General knowledge is not cited;
• Knowledge is often cited from secondary sources;
• Documents supporting an author's arguments are cited more often than other documents;
• Flattering (citing editors, potential referees, and other authorities);
• Showing off (citing hot new "in" articles);
• Reference copying (references provided by other authors);
• Conventions (in biochemistry, for example, methods are cited but not reagents);
• Self-citations; and
• Citing colleagues (often reflecting informal transfer of information).
Such research sheds light on the relative usefulness of references and descriptors in information seeking. To the degree that the conventions can be generalized and described, they are of immediate relevance.
For example, given the conventions listed above, we can predict that citation indexing should perform well on a search for biochemical methods but rather badly on a search for a reagent.
There are many studies in this area of citation behavior that directly or indirectly illuminate both the strengths and the weaknesses of citations as SAPs, but space limitations prevent us from referring to more of them.
It should be clear that evaluating the possibilities of chain searching is connected with studies of cooperation and competition among scientists and of their resulting citation behavior. Studies in the sociology of science and in epistemology are highly relevant.
It is not difficult to see the importance of, for example, KUHN's well-known work on scientific paradigms, which directly explains how groups of scientists develop different criteria for relevance and, in turn, different citation behavior.
Conclusion. A given subject access point (e.g., descriptors or references) cannot be expected to have a fixed information value regardless of the conventions of the knowledge domain and its writing culture. This is a serious argument against positivistic approaches, which try to develop general algorithms and measures without regard for the content and context of the information.
To the extent that the demands of "optimal citation behavior" are met, the reference list of every document represents a perfect, "selective" bibliography in the field or