Automatic review assignment can significantly improve the productivity of many people, such as conference organizers, journal editors, and grant administrators. A general setup of the review assignment problem involves assigning a set of reviewers on a committee to a set of documents to be reviewed, under review quota constraints, so that the reviewers assigned to a document can collectively cover its multiple topic aspects. No previous work has addressed this committee review assignment setup while also considering matching multiple aspects of topics and expertise. In this paper, we tackle the problem of committee review assignment with multi-aspect expertise matching by casting it as an integer linear programming problem. The proposed algorithm can naturally accommodate any probabilistic or deterministic method for modeling multiple aspects to automate committee review assignments. Evaluation using a multi-aspect review assignment test set constructed from ACM SIGIR publications shows that the proposed algorithm is effective and efficient for committee review assignments based on multi-aspect expertise matching.
We present an image retrieval framework based on automatic query expansion in a concept feature space by generalizing the vector space model of information retrieval. In this framework, images are represented by vectors of weighted concepts similar to the keyword-based representation used in text retrieval. To generate the concept vocabularies, a statistical model is built by utilizing Support Vector Machine (SVM)-based classification techniques. The images are represented as "bag of concepts" that comprise perceptually and/or semantically distinguishable color and texture patches from local image regions in a multi-dimensional feature space. To explore the correlation between the concepts and overcome the assumption of feature independence in this model, we propose query expansion techniques in the image domain from a new perspective based on both local and global analysis. For the local analysis, the correlations between the concepts based on the co-occurrence pattern, and the metrical constraints based on the neighborhood proximity between the concepts in encoded images, are analyzed by considering local feedback information. We also analyze the concept similarities in the collection as a whole in the form of a similarity thesaurus and propose an efficient query expansion based on the global analysis. The experimental results on a photographic collection of natural scenes and a biomedical database of different imaging modalities demonstrate the effectiveness of the proposed framework in terms of precision and recall.
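The global-analysis expansion step described above can be illustrated with a minimal sketch: build concept profiles from co-occurrence across the collection (a simple form of similarity thesaurus) and add the concepts most similar to the query's concepts. Everything here is a hypothetical simplification — the concept names, weights, and the cosine-based thesaurus stand in for the SVM-derived concept vectors used in the actual framework.

```python
# Sketch of query expansion by global analysis: profile each concept by
# its weights across documents, then expand the query with the most
# similar concepts. Concept names and weights are illustrative only.
def cosine(u, v):
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = sum(w * w for w in u.values()) ** 0.5
    nv = sum(w * w for w in v.values()) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def expand_query(query, doc_vectors, n_extra=2):
    """query: set of concepts; doc_vectors: list of {concept: weight} dicts."""
    # Profile each concept by its weights across the documents.
    profiles = {}
    for i, vec in enumerate(doc_vectors):
        for concept, w in vec.items():
            profiles.setdefault(concept, {})[i] = w
    candidates = set(profiles) - set(query)
    scored = {
        c: max((cosine(profiles[c], profiles[q])
                for q in query if q in profiles), default=0.0)
        for c in candidates
    }
    extra = sorted(scored, key=scored.get, reverse=True)[:n_extra]
    return set(query) | set(extra)
```

With documents in which "sky" and "blue" co-occur, `expand_query({"sky"}, docs, n_extra=1)` adds "blue" rather than an unrelated concept.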
The research looks at a Web page as a graph structure, or Web graph, and tries to classify different Web graphs in a new coordinate space: Out-Degree, In-Degree. The Out-Degree coordinate is defined as the number of outgoing links from a given Web page. The In-Degree coordinate is the number of Web pages that point to a given Web page. J. Kleinberg's (1998) Web algorithm for discovering “hub Web pages” and “authority Web pages” is applied in this new coordinate space. Some very uncommon phenomena were discovered, and new, interesting results are interpreted. The author believes that understanding the underlying Web page as a graph will help design better Web algorithms, enhance retrieval and Web performance, and recommends using graphs as part of a visual aid for search engine designers.
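The hub/authority algorithm applied in that coordinate space can be sketched in a few lines. This is a generic toy implementation of Kleinberg's HITS iteration, not the study's code; the graph representation (a dict from page to its outgoing links) is an assumption for illustration.

```python
# Minimal HITS sketch: authority score sums the hub scores of in-linking
# pages; hub score sums the authority scores of linked-to pages; both are
# L2-normalized each iteration.
def hits(graph, iterations=50):
    """graph: dict mapping each page to the list of pages it links to."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    hubs = {n: 1.0 for n in nodes}
    auths = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        auths = {n: sum(hubs[u] for u in graph if n in graph[u])
                 for n in nodes}
        norm = sum(a * a for a in auths.values()) ** 0.5 or 1.0
        auths = {n: a / norm for n, a in auths.items()}
        hubs = {n: sum(auths[v] for v in graph.get(n, ()))
                for n in nodes}
        norm = sum(h * h for h in hubs.values()) ** 0.5 or 1.0
        hubs = {n: h / norm for n, h in hubs.items()}
    return hubs, auths
```

On a toy graph where pages `a` and `b` both link to `c`, page `c` emerges as the authority and `a`, `b` as hubs.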
This paper proposes knowledge map creation and maintenance approaches that utilize information retrieval and data mining techniques to facilitate knowledge management in virtual communities of practice. Besides evaluating their performance using synthesized data, the generated knowledge maps for documents collected from the teachers' cyber community, SCTNet, and the master thesis repository at Taiwan's National Central Library are evaluated by domain experts. Domain experts are asked to revise the obtained knowledge maps, and the proportion of modification is small and acceptable. Therefore, the developed approaches are suitable for supporting knowledge management of professional communities on the Internet.
A perspective is presented to bring into focus the state of the art and the development of methods and criteria that relate to the problem of design and performance evaluation of information systems. Two major aspects of design and evaluation are considered: the initiation, planning, development, and testing of new information systems, including modification of existing structures; and the appraisal and measurement of operational systems and their components. A taxonomy of information systems is presented in order to provide a basis for organized evaluation of system performance.
The purpose of the present study is to analyse and map the trends in research on prion diseases by applying bibliometric tools to the scientific literature published between 1973 and 2002. The data for the study were obtained from the Medline database. The aim is to determine the volume of scientific output in the above period, the countries involved and the trends in the subject matters addressed. Significant growth is observed in scientific production since 1991 and particularly in the period 1996–2001. The countries found to have the highest output are the United States, the United Kingdom, Japan, France and Germany. The collaboration networks established by scientists are also analysed in this study, as well as the evolution in the subject matters addressed in the papers they published, that are observed to remain essentially constant in the three sub-periods into which the study is divided.
Multimedia is proliferating on Web sites as the Web continues to enhance the integration of multimedia and textual information. In this paper we examine trends in multimedia Web searching by Excite users from 1997 to 2001. Results from an analysis of 1,025,910 Excite queries from 2001 are compared to similar Excite datasets from 1997 to 1999. Findings include: (1) multimedia queries have decreased as a proportion of general queries since 1997, due to the introduction of multimedia buttons near the query box, (2) identified multimedia queries are longer than non-multimedia queries, and (3) audio queries are more prevalent than image or video queries among identified multimedia queries. Overall, we see multimedia Web searching undergoing major changes as Web content and searching evolve.
Search engines play an essential role in the usability of Internet-based information systems and without them the Web would be much less accessible, and at the very least would develop at a much slower rate. Given that non-English users now tend to make up the majority in this environment, our main objective is to analyze and evaluate the retrieval effectiveness of various indexing and search strategies based on test-collections written in four different languages: English, French, German, and Italian. Our second objective is to describe and evaluate various approaches that might be implemented in order to effectively access document collections written in another language. As a third objective, we will explore the underlying problems involved in searching document collections written in the four different languages, and we will suggest and evaluate different database merging strategies capable of providing the user with a single unique result list.
The Web, and consequently the information contained in it, is growing rapidly. Every day a huge amount of newly created information is electronically published in Digital Libraries, whose aim is to satisfy users' information needs. In this paper, we envisage a Digital Library not only as an information resource where users may submit queries to satisfy their daily information needs, but also as a collaborative working and meeting space for people sharing common interests. Indeed, we will present a personalized collaborative Digital Library environment, where users may organize the information space according to their own subjective view, may build communities, may become aware of each other, may exchange information and knowledge with other users, and may get recommendations based on preference patterns of users.
Citation analysis is performed in order to evaluate authors and scientific collections, such as journals and conference proceedings. Currently, two major systems exist that perform citation analysis: Science Citation Index (SCI) by the Institute for Scientific Information (ISI) and CiteSeer by the NEC Research Institute. The SCI, mostly a manual system up until recently, is based on the notion of the ISI Impact Factor, which has been used extensively for citation analysis purposes. On the other hand the CiteSeer system is an automatically built digital library using agents technology, also based on the notion of ISI Impact Factor. In this paper, we investigate new alternative notions besides the ISI impact factor, in order to provide a novel approach aiming at ranking scientific collections. Furthermore, we present a web-based system that has been built by extracting data from the Databases and Logic Programming (DBLP) website of the University of Trier. Our system, by using the new citation metrics, emerges as a useful tool for ranking scientific collections. In this respect, some first remarks are presented, e.g. on ranking conferences related to databases.
The experimental evidence accumulated over the past 20 years indicates that text indexing systems based on the assignment of appropriately weighted single terms produce retrieval results that are superior to those obtainable with other, more elaborate text representations. These results depend crucially on the choice of effective term-weighting systems. This article summarizes the insights gained in automatic term weighting, and provides baseline single-term-indexing models with which other, more elaborate content analysis procedures can be compared.
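The single-term weighting idea the article surveys can be sketched with the common tf·idf form. This is one representative instance, not the specific weighting variants compared in the article, which combine several term-frequency, collection-frequency, and normalization components.

```python
import math

# Sketch of single-term tf * idf weighting: terms frequent in a document
# but rare across the collection receive the highest weights.
def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per doc."""
    n = len(docs)
    df = {}  # document frequency of each term
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    vectors = []
    for doc in docs:
        weights = {}
        for term in set(doc):
            tf = doc.count(term)            # raw term frequency
            idf = math.log(n / df[term])    # inverse document frequency
            weights[term] = tf * idf
        vectors.append(weights)
    return vectors
```

A term occurring in every document gets idf 0 and thus weight 0, which is exactly the discrimination effect the weighting is meant to capture.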
This paper deals with information needs, seeking, searching, and uses within scholarly communities by introducing theory from the field of science and technology studies. In particular it contributes to the domain-analytic approach in information science by showing that Whitley’s theory of ‘mutual dependence’ and ‘task uncertainty’ can be used as an explanatory framework in understanding similarity and difference in information practices across intellectual fields. Based on qualitative case studies of three specialist scholarly communities across the physical sciences, applied sciences, social sciences and arts and humanities, this paper extends Whitley’s theory into the realm of information communication technologies. The paper adopts a holistic approach to information practices by recognising the interrelationship between the traditions of informal and formal scientific communication and how it shapes digital outcomes across intellectual fields. The findings show that communities inhabiting fields with a high degree of ‘mutual dependence’ coupled with a low degree of ‘task uncertainty’ are adept at coordinating and controlling channels of communication and will readily co-produce field-based digital information resources, whereas communities that inhabit fields characterised by the opposite cultural configuration, a low degree of ‘mutual dependence’ coupled with a high degree of ‘task uncertainty’, are less successful in commanding control over channels of communication and are less concerned with co-producing field-based digital resources and integrating them into their epistemic and social structures. These findings have implications for the culturally sensitive development and provision of academic digital resources such as digital libraries and web-based subject portals.
In earlier papers the authors focused on differences in the ageing of journal literature in science and the social sciences. It was shown that for several fields and topics, bibliometric standard indicators based on journal articles need to be modified in order to provide valid results. In fields where monographs, books or reports are important means of scientific information, standard models of scientific communication are not reflected by journal literature alone. To identify fields where the role of non-serial literature is considerable or critical in terms of bibliometric standard methods, the totality of the bibliographic citations indexed in the 1993 annual cumulation of the SCI and SSCI databases has been processed. The analysis is based on three indicators: the percentage of references to serials, the mean reference age, and the mean reference rate. Applications of these measures at different levels of aggregation (i.e., to journals in selected science and social science fields) lead to the following conclusions. 1. The percentage of references to serials proved to be a sensitive measure for characterising typical differences in communication behaviour between the sciences and the social sciences. 2. However, there is an overlap zone which includes fields like mathematics, technology-oriented science, and some social science areas. 3. In certain social sciences, part of the information seems even to originate in non-scientific sources: references to non-serials do not always represent monographs, pre-prints or reports. Consequently, the model of information transfer from scientific literature to scientific (journal) literature assumed by standard bibliometrics requires substantial revision before valid results can be expected through its application to social science areas.
Analyzing actions to be supported by information and information retrieval (IR) systems is vital for understanding the needs of different types of information, search strategies and relevance assessments, in short, understanding IR. A necessary condition for this understanding is to link results from information seeking studies to the body of knowledge by IR studies. The actions to be focused on in this paper are tasks from the angle of problem solving. I will analyze certain features of work tasks and relate these features to types of information people are looking for and using in their tasks, patterning of search strategies for obtaining information and relevance assessments in choosing retrieved documents. The major claim is that these information activities are systematically connected to task complexity and structure of the problem at hand. The argumentation is based on both theoretical and empirical results from studies on information retrieval and seeking.
Test collections have traditionally been used by information retrieval researchers to improve their retrieval strategies. To be viable as a laboratory tool, a collection must reliably rank different retrieval variants according to their true effectiveness. In particular, the relative effectiveness of two retrieval strategies should be insensitive to modest changes in the relevant document set, since individual relevance assessments are known to vary widely. The test collections developed in the TREC workshops have become the collections of choice in the retrieval research community. To verify their reliability, NIST investigated the effect that changes in the relevance assessments have on the evaluation of retrieval results. Very high correlations were found among the rankings of systems produced using different relevance judgment sets. The high correlations indicate that the comparative evaluation of retrieval performance is stable despite substantial differences in relevance judgments, and thus reaffirm the use of the TREC collections as laboratory tools.
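The correlation between system rankings described above is conventionally measured with Kendall's tau, which can be sketched as follows. The system names are hypothetical; this simple form assumes each ranking is a permutation of the same systems with no ties.

```python
from itertools import combinations

# Sketch of Kendall's tau between two system rankings (best-first lists):
# tau = (concordant pairs - discordant pairs) / total pairs, so identical
# rankings give 1.0 and fully reversed rankings give -1.0.
def kendall_tau(rank_a, rank_b):
    """rank_a, rank_b: lists of the same systems, ordered best-first."""
    pos_b = {s: i for i, s in enumerate(rank_b)}
    concordant = discordant = 0
    for (i, x), (j, y) in combinations(enumerate(rank_a), 2):
        if (pos_b[x] - pos_b[y]) * (i - j) > 0:
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / pairs
```

Comparing rankings produced from two different relevance judgment sets then reduces to one call, e.g. `kendall_tau(ranking_from_judgments_1, ranking_from_judgments_2)`.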
Cross-language information retrieval (CLIR) systems allow users to find documents written in different languages from that of their query. Simple knowledge structures such as bilingual term lists have proven to be a remarkably useful basis for bridging that language gap. A broad array of dictionary-based techniques have demonstrated utility, but comparison across techniques has been difficult because evaluation results often span only a limited range of conditions. This article identifies the key issues in dictionary-based CLIR, develops unified frameworks for term selection and term translation that help to explain the relationships among existing techniques, and illustrates the effect of those techniques using four contrasting languages for systematic experiments with a uniform query translation architecture. Key results include identification of a previously unseen dependence of pre- and post-translation expansion on orthographic cognates and development of a query-specific measure for translation fanout that helps to explain the utility of structured query methods.
Environmental scanning is the acquisition and use of information about events and trends in an organization's external environment, the knowledge of which would assist management in planning the organization's future courses of action. This paper reports a study of how 13 chief executives in the Canadian publishing and telecommunications industries scan their environments and use the information in decision making. Each respondent was asked to relate two critical incidents of information use. The incidents were analyzed according to their environmental sectors, the information sources, and their use in decision making. The interview data suggest that the chief executives concentrate their scanning on the competition, customer, regulatory, and technological sectors of the environment. In the majority of cases, the chief executives used environmental information in the Entrepreneur decisional role, initiating new products, projects, or policies. The chief executives acquire or receive environmental information from multiple, complementary sources. Personal sources are important for information on customers and competitors, whereas printed or formal sources are also important for information on technological and regulatory matters.
This paper describes algorithms and data structures for applying a parallel computer to information retrieval. Previous work has described an implementation based on overlap encoded signatures. That system was limited by (a) the necessity of keeping the signatures in primary memory and (b) the difficulties involved in implementing document-term weighting. Overcoming these limitations required adapting the inverted index techniques used on serial machines. The most obvious adaptation, also previously described, suffers from the fact that data must be sent between processors at query time. Since interprocessor communication is generally slower than local computation, this suggests that an algorithm which does not perform such communication might be faster. This paper presents a data structure, called a partitioned posting file, in which the interprocessor communication takes place at database-construction time, so that no data movement is needed at query-time. Performance characteristics and storage overhead are established by benchmarking against a synthetic database. Based on these figures, it appears that currently available hardware can deliver interactive document ranking on databases containing between 1 and 8192 Gigabytes of text.
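The partitioned posting file idea can be sketched serially: documents are assigned to processors at database-construction time and each processor holds complete postings for its own documents, so at query time the query is broadcast, each processor ranks its own documents locally, and only the merged results move between processors. The partitioning function and the simple term-frequency scoring below are illustrative assumptions, not the paper's exact scheme.

```python
# Sketch of a partitioned posting file. Each "processor" owns a complete
# inverted index over its own document subset; no posting data crosses
# processors at query time.
def build_partitions(docs, n_procs):
    """docs: {doc_id: token list}. Returns one inverted index per processor."""
    partitions = [{} for _ in range(n_procs)]
    for doc_id, tokens in docs.items():
        index = partitions[hash(doc_id) % n_procs]
        for term in tokens:
            index.setdefault(term, {}).setdefault(doc_id, 0)
            index[term][doc_id] += 1
    return partitions

def search(partitions, query_terms, k=3):
    """Broadcast the query; each partition scores locally; merge the results."""
    scores = {}
    for index in partitions:  # in a real system these loops run in parallel
        for term in query_terms:
            for doc_id, tf in index.get(term, {}).items():
                scores[doc_id] = scores.get(doc_id, 0) + tf
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Because each partition's scores involve only its own documents, the per-partition loops are independent, which is what removes the query-time interprocessor communication.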
While research into theoretical models of information retrieval (IR), IR in libraries, and testing of search algorithms have been cornerstones of IR research for decades, there has been comparatively little research into the problems of IR in business. Because of the growing magnitude and urgency of these problems, it is important to assess them more completely. This paper is an essay that draws on interviews with over 40 people experiencing real-life retrieval problems to characterize better these problems in the context of the work their organizations perform.
Technical terms and proper names constitute a major problem in dictionary-based cross-language information retrieval (CLIR). However, technical terms and proper names in different languages often share the same Latin or Greek origin, being thus spelling variants of each other. In this paper we present a novel two-step fuzzy translation technique for cross-lingual spelling variants. In the first step, transformation rules are applied to source words to render them more similar to their target language equivalents. The rules are generated automatically using translation dictionaries as source data. In the second step, the intermediate forms obtained in the first step are translated into a target language using fuzzy matching. The effectiveness of the technique was evaluated empirically using five source languages and English as a target language. The two-step technique performed better, in some cases considerably better, than fuzzy matching alone. Even using the first step as such showed promising results.
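The two-step technique can be sketched directly: apply transformation rules to the source word, then fuzzy-match the intermediate form against a target-language lexicon. The rules and lexicon below are invented examples (the study generates its rules automatically from translation dictionaries), and digram Dice similarity stands in for whatever fuzzy matching function is used.

```python
# Step 1: rule-based transformation of a source word toward target-language
# spelling. Step 2: fuzzy matching of the intermediate form against a
# target lexicon by digram (character bigram) Dice similarity.
def apply_rules(word, rules):
    """rules: list of (source_substring, target_substring) pairs."""
    for src, tgt in rules:
        word = word.replace(src, tgt)
    return word

def digrams(word):
    return {word[i:i + 2] for i in range(len(word) - 1)}

def fuzzy_match(word, lexicon):
    """Return the lexicon entry with the highest Dice digram similarity."""
    def dice(a, b):
        da, db = digrams(a), digrams(b)
        return 2 * len(da & db) / (len(da) + len(db))
    return max(lexicon, key=lambda entry: dice(word, entry))
```

For instance, with hypothetical rules `[("k", "c"), ("f", "ph")]`, a source spelling variant like "morfologia" becomes "morphologia", which then fuzzy-matches the English "morphology".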
This paper presents a Foreign-Language Search Assistant that uses noun phrases as fundamental units for document translation and query formulation, translation and refinement. The system (a) supports the foreign-language document selection task providing a cross-language indicative summary based on noun phrase translations, and (b) supports query formulation and refinement using the information displayed in the cross-language document summaries. Our results challenge two implicit assumptions in most of cross-language Information Retrieval research: first, that once documents in the target language are found, Machine Translation is the optimal way of informing the user about their contents; and second, that in an interactive setting the optimal way of formulating and refining the query is helping the user to choose appropriate translations for the query terms.
Networked information retrieval aims at the interoperability of heterogeneous information retrieval (IR) systems. In this paper, we show how differences concerning search operators and database schemas can be handled by applying data abstraction concepts in combination with uncertain inference. Different data types with vague predicates are required to allow for queries referring to arbitrary attributes of documents. Physical data independence separates search operators from access paths, thus solving text search problems related to noun phrases, compound words and proper nouns. Projection and inheritance on attributes support the creation of unified views on a set of IR databases. Uncertain inference allows for query processing even on incompatible database schemas.
We present a new paradigm for the automatic creation of document headlines that is based on direct transformation of relevant textual information into well-formed textual output. Starting from an input document, we automatically create compact representations of weighted finite sets of strings, called WIDL-expressions, which encode the most important topics in the document. A generic natural language generation engine performs the headline generation task, driven by both statistical knowledge encapsulated in WIDL-expressions (representing topic biases induced by the input document) and statistical knowledge encapsulated in language models (representing biases induced by the target language). Our evaluation shows similar performance in quality with a state-of-the-art, extractive approach to headline generation, and significant improvements in quality over previously proposed solutions to abstractive headline generation.
Studies on contrastive genre analysis have become a current issue in research on languages for specific purposes (LSP) and are intended to economize specialist communication. The present article compares the formal schemata and linguistic devices of German abstracts and their English equivalents, written by German medical scholars, with English native speaker (NS) abstracts. The source material is a corpus of 20 abstracts taken from German medical journals representing different degrees of specialism/professionalism. The method of linguistic analysis includes (1) the overall length of articles/abstracts, (2) the representation/arrangement of “moves”, and (3) the linguistic means (complexity of sentences, finite verb forms, active and passive voice, tenses, linking words, and lexical hedging). Results show no correlation between the length of articles and the length of abstracts. In contrast to NS author abstracts, the move “Background information” predominated in the structure of the studied German non-native speaker (GNNS) abstracts, whereas “Purpose of study” and “Conclusions” were not clearly stated. In linguistic terms, the German abstracts frequently contained lexical hedges, complex and enumerating sentence structures, passive voice and past tense, as well as linkers of adversative, concessive and consecutive character. The GNNS English equivalent abstracts were author translations and contained structural and linguistic inadequacies which may hamper their general readability for the scientific community. Therefore, abstracting should be systematically incorporated into language courses for the medical profession and for technical translators.
Free-text retrieval is less effective than it might be because of its dependence on notions that evolved with controlled vocabulary representation and searching. The structure and nature of the discourse level features of natural language text types are not incorporated. In an attempt to address this problem, an exploratory study was conducted for the purpose of determining whether information abstracts reporting on empirical work do possess a predictable discourse-level structure and whether there are lexical clues that reveal this structure. A three phase study was conducted, with Phase I making use of four tasks to delineate the structure of empirical abstracts based on the internalized notions of 12 expert abstractors. Phase II consisted of a linguistic analysis of 276 empirical abstracts that suggested a linguistic model of an empirical abstract, which was tested in Phase III with a two stage validation procedure using 68 abstracts and four abstractors. Results indicate that expert abstractors do possess an internalized structure of empirical abstracts, whose components and relations were confirmed repeatedly over the four tasks. Substantively the same structure revealed by the experts was manifested in the sample of abstracts, with a relatively small set of recurring lexical clues revealing the presence and nature of the text components. Abstractors validated the linguistic model at an average level of 86%. Results strongly support the presence of a detectable structure in the text-type of empirical abstracts. Such a structure may be of use in a variety of text-based information processing systems. The techniques developed for analyzing natural language texts for the purpose of providing more useful representations of their semantic content offer potential for application of other types of natural language texts.
This article presents part of phase 2 of a research project funded by the NSF National Science Digital Library Project, which observed how academic users interact with the ScienceDirect information retrieval system for simulated class-related assignments. The ultimate goal of the project is twofold: (1) to find ways to improve science and engineering students’ use of science e-journal systems; (2) to develop methods to measure user interaction behaviors. A process-tracing technique recorded participants’ processes and measurable interaction behaviors; a think-aloud protocol captured participants’ affective and cognitive verbalizations; pre- and post-search questionnaires solicited demographic information, prior experience with the system, and comments. We explored possible relationships between affective feelings and cognitive behaviors. During search interactions both feelings and thoughts occurred frequently. Positive feelings were more common and were associated more often with thoughts about results. Negative feelings were associated more often with thoughts related to the system, search strategy, and task. Learning styles were also examined as a factor influencing behavior. Engineering graduate students with an assimilating learning style searched longer and paused less than those with a converging learning style. Further exploration of learning styles is suggested.
Many approaches to decision support for academic library acquisition budget allocation have been proposed to reflect diverse management requirements. Unlike these methods, which focus mainly on either statistical analysis or goal programming, this paper introduces a model (ABAMDM, acquisition budget allocation model via data mining) that explicitly uses descriptive knowledge discovered in historical circulation data to support allocating the library acquisition budget. The major concern in this study is that the budget allocation should reflect the requirement that the more a department makes use of its acquired materials in the present academic year, the more budget it receives for the coming year. The primary output of the ABAMDM, used to derive weights for acquisition budget allocation, contains two parts: descriptive knowledge via utilization concentration, and suitability via utilization connection for the departments concerned. An application to the library of Kun Shan University of Technology is described to demonstrate the ABAMDM in practice.
A common task in both Webmetrics and Web information retrieval is to identify a set of Web pages or sites that are similar in content. In this paper we assess the extent to which links, colinks and couplings can be used to identify similar Web sites. As an experiment, a random sample of 500 pairs of domains from the UK academic Web were taken and human assessments of site similarity, based upon content type, were compared against ratings for the three concepts. The results show that using a combination of all three gives the highest probability of identifying similar sites, but surprisingly this was only a marginal improvement over using links alone. Another unexpected result was that high values for either colink counts or couplings were associated with only a small increased likelihood of similarity. The principal advantage of using couplings and colinks was found to be greater coverage in terms of a much larger number of pairs of sites being connected by these measures, instead of increased probability of similarity. In information retrieval terminology, this is improved recall rather than improved precision.
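The three connection measures compared in that study can be sketched from a site-level link graph. The graph representation (a dict from site to the set of sites it links to) and the example sites are hypothetical; the study itself worked with human similarity judgments over UK academic domains.

```python
# Sketch of the three measures: a direct link between two sites, a colink
# (a third site linking to both), and a coupling (both sites linking to
# the same third site).
def connected_by_link(graph, a, b):
    """True if either site links directly to the other."""
    return b in graph.get(a, set()) or a in graph.get(b, set())

def colink_count(graph, a, b):
    """Number of third sites linking to both a and b."""
    return sum(1 for s, out in graph.items()
               if s not in (a, b) and a in out and b in out)

def coupling_count(graph, a, b):
    """Number of third sites that both a and b link to."""
    shared = graph.get(a, set()) & graph.get(b, set())
    return len(shared - {a, b})
```

A combined indicator of the kind the paper evaluates would then test whether any (or all) of the three measures exceed chosen thresholds for a pair of sites.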
Described here is a study of how students actively read electronic journal papers to prepare for classroom discussions. Eighteen students enrolled in a graduate course participated in this study; half of them read the documents privately, while the other half shared their readings. These readers were digitally monitored as they read, annotated, and shared the electronic (e-) documents over the course of several weeks during a semester. This monitoring yielded a comprehensive data bank of 60 e-documents (with 1923 markings) and 56 computer logs. Using semi-structured interviews, the reading, marking, and navigational activities of the participating readers were analyzed in detail. Under scrutiny were a range of activities that the subjects carried out. Analyses of the data revealed the types of markings that the users employ and the ways in which those markings were placed. A derivation of the user-perceived functions of the marking structures was then carried out. The findings lead to several implications for informing the design of reading and marking applications in digital libraries.
Information-seeking is important for lawyers, who have access to many dedicated electronic resources. However there is considerable scope for improving the design of these resources to better support information-seeking. One way of informing design is to use information-seeking models as theoretical lenses to analyse users’ behaviour with existing systems. However many models, including those informed by studying lawyers, analyse information-seeking at a high level of abstraction and are only likely to lead to broad-scoped design insights. We illustrate that one potentially useful (and lower-level) model is Ellis’s – by using it as a lens to analyse and make design suggestions based on the information-seeking behaviour of 27 academic lawyers, who were asked to think aloud whilst using electronic legal resources to find information for their work. We identify similar information-seeking behaviours to those originally found by Ellis and his colleagues in scientific domains, along with several that were not identified in previous studies such as ‘updating’ (which we believe is particularly pertinent to legal information-seeking). We also present a refinement of Ellis’s model based on the identification of several levels that the behaviours were found to operate at and the identification of sets of mutually exclusive subtypes of behaviours.
Previous studies of academic web interlinking have tended to hypothesise that the relationship between the research of a university and links to or from its web site should follow a linear trend, yet the typical distribution of web data, in general, seems to be a non-linear power law. This paper assesses whether a linear trend or a power law is the most appropriate method with which to model the relationship between research and web site size or outlinks. Following linear regression, analysis of the confidence intervals for the logarithmic graphs, and analysis of the outliers, the results suggest that a linear trend is more appropriate than a non-linear power law.
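The two candidate models compared in this abstract can be sketched concretely: a linear trend is fitted directly by ordinary least squares, while a power law y = a·x^b is linear in log-log space and can be fitted the same way after a log transform. A minimal sketch, assuming made-up research/outlink figures (the variable names and data are illustrative, not the study's):

```python
import numpy as np

# Illustrative data: research measure (x) vs. web site outlinks (y)
x = np.array([10.0, 25.0, 40.0, 80.0, 150.0, 300.0])
y = np.array([55.0, 130.0, 190.0, 400.0, 760.0, 1480.0])

# Linear model: y = m*x + c, fitted by ordinary least squares
m, c = np.polyfit(x, y, 1)

# Power-law model: y = a * x**b, fitted as a line in log-log space
b, log_a = np.polyfit(np.log(x), np.log(y), 1)
a = np.exp(log_a)

def r_squared(y_obs, y_pred):
    # Coefficient of determination: 1 - residual SS / total SS
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1 - ss_res / ss_tot

r2_linear = r_squared(y, m * x + c)
r2_power = r_squared(y, a * x ** b)
print(f"linear R^2 = {r2_linear:.3f}, power-law R^2 = {r2_power:.3f}")
```

Comparing the two fits (here via R² in the original data space, together with confidence intervals and outlier analysis as the abstract describes) is what allows one model to be judged more appropriate than the other.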
The study attempts to apply J.R. Bettman's consumer choice theory to information processing, evaluation and utilization within the present and future Zambian context. Central to this is the belief that information is a marketable commodity, one that is critically important to all activities relating to research and development (R & D) in Zambia. However, the author argues that unless there are discernible changes in the traditional attitudes and cultural values of Zambians, information processed and organized by academic libraries in Zambia will never be fully utilized, even in research and development activities.
The aim of this paper is to study the link relationships between the Nordic academic web space – comprising 23 Finnish, 11 Danish and 28 Swedish academic web domains – and the European one. Through social network analysis we intend to detect sub-networks within the Nordic network and the position and role of the different university web domains, and to understand the structural topology of this web space. Co-link analysis, with asymmetrical matrices and the cosine measure, is used to identify thematic clusters. Results show that the Nordic network is a cohesive network made up of three well-defined sub-networks, and that it rests on the Finnish and Swedish sub-networks. We conclude that the Danish network has less visibility than those of the other Nordic countries. The Swedish one is the principal Nordic sub-network, and the Finnish network is slightly isolated from Europe, with the exception of the University of Helsinki.
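The co-link analysis with a cosine measure mentioned above can be sketched as follows: each domain is represented by the vector of sources that link to it (a column of an asymmetrical inlink matrix), and the cosine of two domains' columns measures how similar their sets of co-linking sources are. A minimal sketch with hypothetical domain names and made-up link counts:

```python
import numpy as np

# Rows: source sites; columns: hypothetical academic domains.
# Entry [i, j] = 1 if source i links to domain j (illustrative data only).
domains = ["helsinki.fi", "lu.se", "ku.dk"]
inlinks = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
])

def cosine(u, v):
    # Cosine similarity between two inlink profile vectors
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Pairwise cosine similarities; clusters of high-similarity domains
# would correspond to the thematic sub-networks the paper identifies.
for i in range(len(domains)):
    for j in range(i + 1, len(domains)):
        sim = cosine(inlinks[:, i], inlinks[:, j])
        print(f"{domains[i]} ~ {domains[j]}: {sim:.2f}")
```

Clustering the resulting similarity matrix (the paper does not specify which clustering algorithm it applies to the cosine values) yields the thematic groupings described in the results.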
Academic planning is becoming a very complex problem due to a variety of changes that have affected the availability of funds for education. Some of these changes, such as demographic shifts, social pressures and technological advances, are external to the academic institution affected. These factors make planning increasingly important. This paper describes the use of technologically available tools to address the problems faced in planning activities at universities. It proposes the use of mathematical models and forecasting techniques to predict and therefore plan for change with enough lead time to make these changes effective. This paper describes how administrators can allocate limited resources to where they are most effective. A model-based decision support system, used by the decision-maker in planning and responding quickly to changes, is presented. The system includes a number of alternative quantitative techniques that vary in complexity to suit the decision-maker's need to forecast change before it happens so as to plan for it. These changes ensure the requisite quality of graduates. The system also uses popular software packages, known as spreadsheets, to evaluate “what if” scenarios of budgets and enrollments.
The Web is an enormous set of documents connected through hypertext links created by authors of Web pages. These links have been studied quantitatively, but little has been done so far in order to understand why these links are created. As a first step towards a better understanding, we propose a classification of link types in academic environments on the Web. The classification is multi-faceted and involves different aspects of the source and the target page, the link area and the relationship between the source and the target. Such classification provides an insight into the diverse uses of hypertext links on the Web, and has implications for browsing and ranking in IR systems by differentiating between different types of links. As a case study we classified a sample of links between sites of Israeli academic institutions.
The citation practices of academics and practitioners who have published papers in the proceedings of a national computer conference are analyzed in order to measure how the two groups differ in their use of published information. The comparison is based upon types of documents cited, age of cited literature and core journals cited. The study found that both groups cited the same group of core journals and cited documents of approximately the same age. Both groups cited journals most frequently, and when document types are ordered according to citation frequency, the rankings are identical. However, the actual citation frequencies by type of document are not the same, and these differences are attributed to unequal levels of awareness of and access to specific categories of documents.
An operational prototype of an inhomogeneous distributed database system has been built to provide homogeneous access for researchers and managers of science policy to data on research projects held in three countries. The architecture is sufficiently general not only to accommodate additional countries and databases on research projects, but also to provide a general framework for other international collaborative projects aimed at the provision of information. The design exhibits an optimal compromise between sufficient data to satisfy a retrieval request and the minimal data for transmission to maintain the database.
Advances in the publishing world have given rise to new models of digital library development. Open access publishing modes are expanding their presence and realize the digital library idea in various ways. While user-centered evaluation of digital libraries has drawn considerable attention in recent years, these systems are currently viewed mainly from the publishing, economic and scientometric perspectives. The present study explores the concepts of usefulness and usability in the evaluation of an e-print archive. The results demonstrate that several attributes of usefulness, such as the level and the relevance of information, and of usability, such as ease of use and learnability, as well as functionalities commonly met in these systems, affect user interaction and satisfaction.
Multimedia data can require significant examination time to find desired features (“content analysis”). An alternative is using natural-language captions to describe the data, and matching captions to English queries. But it is hard to include everything in the caption of a complicated datum, so significant content analysis may still seem required. We discuss linguistic clues in captions, both syntactic and semantic, that can simplify or eliminate content analysis. We introduce the notion of concept depiction and rules for depiction inference. Our approach is implemented in an expert system which demonstrated significant increases in recall in experiments.
A software research and development project for the U.S. Department of Energy provided an opportunity to explore the information-seeking behavior of energy researchers. The DOE project, entitled “Online Access to Knowledge,” or “OAK,” is developing a microcomputer interface for improving end-user access to energy databases. Interviews with 18 researchers and 34 search intermediaries in energy-related fields indicate a reliance on databases as sources of information. The interview data suggest a migration of searchers toward commercial systems that offer the widest choice of database coverage. Despite previous efforts to encourage direct use of RECON databases, most energy researchers interviewed preferred that others do their searching for them. Librarians and technical information specialists, although recognizing the potential for researchers to use databases directly, doubted that such use will be common in the near future. However, this and other studies suggest a trend towards first-hand use of databases by end-users in the energy field, particularly younger researchers. Preliminary testing of the OAK software indicates that end-users will search, if provided with adequate tools. These findings are discussed in the light of previous research on the information gathering habits of scientists and engineers.
Part I of two articles reviews six research literatures that consider access from different vantage points to identify common aspects of the concept `access to information'. The resulting multi-dimensional framework includes (1) conceptualizations of information itself (resource/commodity, data in the environment, representation of knowledge and part of the communication process), (2) conceptualizations of the notion of access (knowledge, technology, communication, control, goods/commodities and rights), (3) a set of general information seeking facets (context, situation, strategies and outcomes) and (4) a variety of influences and constraints (physical, cognitive, affective, economic, social and political). Only a comprehensive consideration of these factors will allow us to understand the concept of access to information, as well as develop and study systems, institutions and policies that foster improved access.
It has been shown (S. Lawrence, 2001, Nature, 411, 521) that journal articles which have been posted without charge on the internet are more heavily cited than those which have not been. Using data from the NASA Astrophysics Data System (ads.harvard.edu) and from the ArXiv e-print archive at Cornell University (arXiv.org) we examine the causes of this effect.
This paper evaluates and compares the retrieval effectiveness of various search models, based on either automatic text-word indexing or on manually assigned controlled descriptors. Retrieval is from a relatively large collection of bibliographic material written in French. Moreover, for this French collection we evaluate improvements that result from combining automatic and manual indexing. First, when considering various contexts, this study reveals that the combined indexing strategy always obtains the best retrieval performance. Second, when users wish to conduct exhaustive searches with minimal effort, we demonstrate that manually assigned terms are essential. Third, the evaluations presented in this paper reveal the comparative retrieval performances that result from manual and automatic indexing in a variety of circumstances.
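One common way to combine an automatic text-word run with a controlled-descriptor run is simple score fusion such as CombSUM over max-normalized scores; the abstract does not specify the paper's fusion method, so the following is an illustrative sketch with made-up document scores:

```python
from collections import defaultdict

def combsum(*runs):
    """Fuse several {doc_id: score} retrieval runs by summing
    max-normalized scores (CombSUM-style fusion)."""
    fused = defaultdict(float)
    for run in runs:
        top = max(run.values())
        for doc, score in run.items():
            fused[doc] += score / top  # normalize each run to [0, 1]
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative scores from a text-word run and a descriptor run
automatic = {"d1": 2.0, "d2": 1.5, "d3": 0.5}
manual = {"d2": 1.0, "d4": 0.8}

ranking = combsum(automatic, manual)
print(ranking)
```

A document retrieved by both indexing strategies (here d2) accumulates evidence from both runs and rises in the fused ranking, which is the intuition behind the combined strategy outperforming either run alone.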
Subject access to documents is influenced by two kinds of indeterminacy: the indeterminacy of the indexer's selection of indexing descriptors and the indeterminacy of the inquirer's selection of search terms. The possibility of successful retrieval depends on how these two indeterminacies interact. Five types of interaction are discussed and a change in the traditional method of subject searching is suggested as a way of reducing the effect of one of these two indeterminacies and of avoiding those types of interactions where the retrieval of the desired document(s) is impossible.
The greatest number of open access journals (OAJs) is found in the sciences, and their influence is growing. However, there are only a few studies on the acceptance, and thereby the integration, of these OAJs in the scholarly communication system. Even fewer studies provide insight into the differences across disciplines. This study is an analysis of the citing behaviour in journals within three science fields: biology, mathematics, and pharmacy and pharmacology. It is a statistical analysis of OAJs as well as non-OAJs, including both the citing and cited sides of journal-to-journal citations. The multivariate linear regression reveals many similarities in citing behaviour across fields and media. But it also points to great differences in the integration of OAJs. The integration of OAJs in the scholarly communication system varies considerably across fields. The implications for bibliometric research are discussed.
Part II summarizes and extends the review of fundamental conceptualizations of access to information across six relevant research literatures developed in part I. It identifies unique underlying assumptions of the concept `access to information' within each of the different disciplines. We discuss implications of the conceptualizations of access, and of influences and constraints on access. We then integrate the common and unique conceptualizations and the implications to propose a general model of access to information. The goal of the two articles is to identify common and unique, as well as hidden and overlooked, aspects of how access is conceptualized in a selected set of relevant literature, and to suggest a comprehensive perspective that may be applied to future studies and policies related to information access.
This study applies theories about organizational information processing and about valuing information to better understand the influences on method of access and on effects of using online information. Interviews in four organizations indicated that users manage such systems in different ways suitable to the organization's problem-solving needs and personnel skills, in order to obtain considerable benefits and overcome some problems. Questionnaire data showed that type of database and organizational differences strongly influenced access method, but access method had no independent influence on usage or on perceived outcomes. Counter to expectations based upon the difficulties in assessing the cost/benefit ratio of information obtained from external sources, task variables had little independent influence on access method, usage, or outcomes. The moderate relationship between using online information in one's work and the two outcome factors seems generalizable across organizations. Differences in tasks, especially amount of information in one's task, appeared to influence the relationship between use and outcomes.
Traditional approaches to information retrieval, based on automatic or manually constructed keywords, are inappropriate for certain desirable tasks in an intelligent information system. Obtaining simple answers to direct questions, a summary of an event sequence that could span multiple documents, and an update of recent developments in an ongoing event sequence are three examples of such tasks. In this paper, the SCISOR system is described. SCISOR illustrates the potential for increased recall and precision of stored information through the understanding in context of articles in its domain of corporate takeovers. A constrained form of marker passing is used to answer queries of the knowledge base posed in natural language. Among other desirable characteristics, this method of retrieval focuses search on likely candidates, and tolerates incomplete or incorrect input indices very well.
This paper discusses the database design developed by the authors' company for its new range of database systems as an example of a design of database which allows a wider range of capabilities than does the conventional IR database. The paper discusses the range of searching options which are now possible. It describes how searches may be prestructured to pinpoint the desired target items, which are then selected from a browsable display. It describes the navigation capability which allows the user to move from record to record within the database, thus facilitating the extraction of “related” information. The design is discussed from the user viewpoint, and emphasis is placed on the facilities and how they complement existing Boolean/key access facilities.
Numerous studies of engineers' information seeking behavior have found that accessibility was the factor that most influenced their selection of information sources. The concept of accessibility, however, is ambiguous and has been given various interpretations by both researchers and engineers. Detailed interviews with 32 engineers, in which they described incidents of personal information seeking in depth, uncovered some of the specific factors that are part of the concept. Engineers selected sources because they had the right format, the right level of detail, or a lot of information in one place, as well as for other reasons. When looking for human information resources, the engineers most frequently selected sources with which they were familiar, while saving time was the most frequently mentioned reason for selecting documentary sources. Future research should continue to examine the concept of accessibility through detailed empirical investigations.