The critical elements of the change process were designed into the strategic planning process and the pilot project for the Integrated Academic Information Management System (IAIMS) at the University of Maryland. These elements were: support by the institutional leadership; a critical mass of interested participants from diverse groups across the organization, committed to the project and with ownership of the plan; a motivating level of dissatisfaction with the status quo; the construction of a scenario describing the desired future and an assessment of the needs required to achieve it; technical and consulting help; a pilot project with replicable features to demonstrate the concept and feasibility of the approach; and the participation of opinion leaders initially, with later identification of additional opinion leaders who would become part of the pattern of acceptance of the innovation and diffusion of the technology across the campus.
The IAIMS project at the University of Utah has focused on clinical linkages to facilitate the research, teaching, and service mission of the Medical Center. The planning phase focused on the relationship among the users and providers of the system and developed a scenario describing the professor and clinical clerk making rounds at the bedside. The prototype Health Evaluation through Logical Processes (HELP) system brings together three sources necessary to solve a medical problem: the patient database, the medical literature, and an expert in the subject. Microcomputers provide access to the HELP system and a complementary literature knowledge database.
The technologies contributing to today's information world are profound. Electronic technologies are transforming the way we handle information and the medium used to pursue knowledge. What once seemed beyond reach--that computers, communications, information, and knowledge systems could be merged harmoniously to transmit information via networks--is reality today. The Integrated Academic Information Management System (IAIMS) approach for transfer of biomedical information within a medical center also offers opportunities for networks across institutional lines. The tradition of cooperation and resource sharing among libraries makes them logical facilitators of the emerging, universal information environment. The authors present a course of action based on visions of a new era, realities of the present, and strategies for the future.
The IAIMS concept calls for the logical integration of information derived from databases in the functional areas of administration, clinical care, education, libraries, and research. To accomplish this task, a technological base must be established. It is suggested that such an infrastructure will consist of a systems component, a linked databases component, and a management component. An IAIMS should rapidly evolve into an information utility and, within a decade, provide a knowledge management system.
Medical informatics is still in its early stages of evolution and definition. If informatics is to attain the status of a specialized field of study within the health science curriculum, its ambiguity must be eliminated. This article discusses the term "medical informatics" and the impact of the new field of study on the curriculum, education, and training of health care professionals, and on health care information systems research and development.
The Administration on Aging (AoA) sponsored a national information system known as the SCAN system. The SCAN bibliographic database went on-line in March 1982. Congress, however, had repealed the authority for the clearinghouse, so the Government let all of the services of SCAN expire by September 1982. The Government then tried to keep the SCAN information accessible by offering it to the private sector, soliciting applicants through both the Federal Register and the Commerce Business Daily. As a result, the American Association of Retired Persons (AARP) and AoA signed an agreement calling for AARP to make much of the SCAN information accessible on-line and to update it.
This article describes the Indexing Aid Project for conducting research in the areas of knowledge representation and indexing for information retrieval in order to develop interactive knowledge-based systems for computer-assisted indexing of the periodical medical literature. The system uses an experimental frame-based knowledge representation language, FrameKit, implemented in Franz Lisp. The initial prototype is designed to interact with trained MEDLINE indexers who will be prompted to enter subject terms as slot values in filling in document-specific frame data structures that are derived from the knowledge-base frames. In addition, the automatic application of rules associated with the knowledge-base frames produces a set of Medical Subject Heading (MeSH) keyword indices to the document. Important features of the system are representation of explicit relationships through slots which express the relations; slot values, restrictions, and rules made available by inheritance through "is-a" hierarchies; slot values denoted by functions that retrieve values from other slots; and restrictions on slot values displayable during data entry.
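A rough Python analogue may make the frame machinery concrete. The actual system used FrameKit in Franz Lisp; the frames, slots, inheritance path, and rule below are all invented for illustration, not taken from the Indexing Aid Project itself:

```python
# Illustrative analogue of a frame-based indexing aid: frames with slots,
# "is-a" inheritance of slots and rules, and rules that emit MeSH headings.
# All frame names, slots, and headings here are hypothetical.

FRAMES = {
    "disease": {
        "is_a": None,
        "slots": {"site": None, "etiology": None},
        "rules": [],  # each rule maps a filled instance to MeSH headings
    },
    "neoplasm": {
        "is_a": "disease",
        "slots": {"histology": None},
        "rules": [lambda f: ["Neoplasms"]
                  + (["Lung Neoplasms"] if f.get("site") == "lung" else [])],
    },
}

def inherited_slots(name):
    """Collect slot names up the is-a hierarchy."""
    slots = {}
    while name is not None:
        for s in FRAMES[name]["slots"]:
            slots.setdefault(s, None)
        name = FRAMES[name]["is_a"]
    return slots

def index_document(frame_name, filled):
    """Fill a document-specific frame instance (as an indexer would when
    prompted for slot values) and fire inherited rules to produce
    candidate MeSH headings."""
    instance = inherited_slots(frame_name)
    instance.update(filled)
    headings = []
    name = frame_name
    while name is not None:  # rules are inherited too
        for rule in FRAMES[name]["rules"]:
            headings.extend(rule(instance))
        name = FRAMES[name]["is_a"]
    return instance, headings
```

The sketch shows only the two features most central to the abstract: slots made available by inheritance through "is-a" links, and rules attached to knowledge-base frames that generate MeSH keyword indices from filled slot values.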
Many issues relating to the protection of intellectual property are economic in nature. This article applies economic analysis to several of those issues that arise in an international context. The first model concerns how one nation's choice of a particular form of protection will affect the economic welfare of its trading partners. Then the economics of unilateral, bilateral, and multilateral action are compared. The final analyses cover the optimal number of members in a multilateral agreement and the choice between mutually exclusive international agreements.
A Zipfian model of an automatic bibliographic system is developed using parameters describing the contents of its database and its inverted file. The underlying structure of the Zipf distribution is derived, with particular emphasis on its application to word frequencies, especially with regard to the inverted files of an automatic bibliographic system. Andrew Booth developed a form of Zipf's law which estimates the number of words of a particular frequency for a given author and text. His formulation has been adopted as the basis of a model of term dispersion in an inverted file system. The model is also distinctive in its consideration of the proliferation of spelling errors in free text, and in its inclusion of all searchable elements from the system's inverted file. The model is applied to the National Library of Medicine's MEDLINE and carries implications for the determination of database storage requirements, search response time, and search exhaustiveness.
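Booth's low-frequency form can be stated compactly: under a Zipfian rank-frequency distribution, the number of distinct words occurring exactly n times falls off roughly as 2/(n(n+1)) relative to the number occurring once. A minimal sketch, with invented constants rather than MEDLINE parameters:

```python
# Sketch of Booth's low-frequency form of Zipf's law. I(n), the number of
# distinct words of frequency n, is estimated from I(1), the number of
# words occurring exactly once. Constants are illustrative only.

def booth_count(i1, n):
    """Expected number of distinct words occurring exactly n times."""
    return i1 * 2.0 / (n * (n + 1))

def estimate_vocabulary(i1, max_n):
    """Approximate distinct-term count by summing I(n) up to max_n --
    the kind of quantity that drives inverted-file storage estimates."""
    return sum(booth_count(i1, n) for n in range(1, max_n + 1))
```

Summing such counts over frequencies is one way a model like this connects word-frequency structure to inverted-file size, and hence to storage and response-time projections.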
A procedure for automated indexing of pathology diagnostic reports at the National Institutes of Health is described. Diagnostic statements in medical English are encoded by computer into the Systematized Nomenclature of Pathology (SNOP). SNOP is a structured indexing language constructed by pathologists for manual indexing. It is of interest that effective automatic encoding can be based upon an existing vocabulary and code designed for manual methods. The techniques utilized include morphosyntactic analysis, a simple syntax analysis, matching of multiword dictionary entries, and synonym substitution.
A new, fully automated approach for indexing documents is presented, based on associating textwords in a training set of bibliographic citations with the indexing of journals. This journal-level indexing takes the form of a consistent, timely set of journal descriptors (JDs) indexing the individual journals themselves; this indexing is maintained in journal records in a serials authority database. The advantage of this novel approach is that the training set does not depend on previous manual indexing of hundreds of thousands of documents (i.e., any such indexing already in the training set is not used), but rather on the relatively small intellectual effort of indexing at the journal level, usually a matter of a few thousand unique journals, for which retrospective indexing to maintain consistency and currency may be feasible. If successful, JD indexing would provide topical categorization of documents outside the training set (journal articles, monographs, Web documents, reports from the grey literature, etc.) and could therefore be applied in searching. Because JDs are quite general, corresponding to subject domains, their most probable use would be for improving or refining search results.
Modern information retrieval systems are designed to supply relevant information in response to requests received from the user population. In most retrieval environments the search requests consist of keywords, or index terms, interrelated by appropriate Boolean operators. Since it is difficult for untrained users to generate effective Boolean search requests, trained search intermediaries are normally used to translate original statements of user need into useful Boolean search formulations. Methods are introduced in this study which reduce the role of the search intermediaries by making it possible to generate Boolean search formulations completely automatically from natural language statements provided by the system patrons. Frequency considerations are used automatically to generate appropriate term combinations as well as Boolean connectives relating the terms. Methods are covered to produce automatic query formulations both in a standard Boolean logic system, as well as in an extended Boolean system in which the strict interpretation of the connectives is relaxed. Experimental results are supplied to evaluate the effectiveness of the automatic query formulation process, and methods are described for applying the automatic query formulation process in practice.
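The frequency-based idea can be sketched as follows, assuming (one plausible reading, not the article's exact method) that broad, high-document-frequency terms are grouped into OR clauses while specific terms stand alone, with all clauses joined by AND; the cutoff is invented:

```python
# Hedged sketch of frequency-guided Boolean query formulation from a
# natural-language request. The 10% document-frequency cutoff is an
# assumption for illustration.

def build_boolean_query(terms, doc_freq, n_docs, broad_cutoff=0.1):
    """terms: content words extracted from the user's request.
    doc_freq: term -> number of documents containing it.
    Returns a conjunctive query string."""
    narrow = [t for t in terms if doc_freq.get(t, 0) / n_docs < broad_cutoff]
    broad = [t for t in terms if t not in narrow]
    clauses = [f"({t})" for t in narrow]       # specific terms: required
    if broad:                                  # broad terms: any may match
        clauses.append("(" + " OR ".join(broad) + ")")
    return " AND ".join(clauses)
```

In an extended Boolean system of the kind mentioned, the strict AND/OR interpretation would be relaxed into a weighted match, but the frequency-driven grouping step would look much the same.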
Much of what white collar workers do in offices is communication-related. White collar workers make up the majority of the labor force in the United States today and account for the majority of current labor costs. Because office automation provides more productive, structured techniques for handling both written and oral communication, it offers the potential to make organizations more productive by improving organizational communication. This article: (1) defines communication, (2) identifies the potential benefits to be realized from implementing office automation, and (3) offers caveats related to the implementation of office automation systems. Realization of the benefits of office automation depends upon the degree to which new modes of communication can successfully be substituted for traditional modes.
User satisfaction as a measure of library effectiveness has been studied using Kantor's branching technique, with the Cleveland Health Sciences Library, Cleveland, Ohio, as the study site. The study independently measured user bibliographic information performance, collection development policy performance, acquisition policy performance, user catalog performance, circulation performance, library operation/functioning performance, and user search performance. Each of these categories represented one or more of the factors of library performance that could account for users' failure. The sample consisted of 1000 requests for book titles. The book availability rate was 59.60% without the help of the librarian and 63.50% with the help of the librarian. The results indicate that a dynamic study of library policies at regular intervals using Kantor's method, which is easily reproducible, can maximize library resources for the most effective fulfillment of user demand.
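Kantor's branching technique models availability as a chain of successive hurdles, so the overall availability rate is the product of the per-branch success rates. A minimal sketch with illustrative rates, not the Cleveland figures:

```python
# Kantor branching analysis in miniature: a requested title is available
# only if it clears every branch (e.g. acquisition, circulation, library
# operations, user search), so overall availability is the product of the
# per-branch success probabilities. Rates below are invented.

def overall_availability(branch_rates):
    """branch_rates: success probability at each successive branch."""
    p = 1.0
    for rate in branch_rates:
        p *= rate
    return p
```

The appeal of the method for repeated use is visible in the sketch: measuring each branch separately shows which hurdle depresses the overall rate most, so policy changes can be targeted at that branch.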
'A Navigator of Natural Language Organized Data' (ANNOD) is a retrieval system which combines probabilistic, linguistic, and empirical means to rank individual paragraphs of full text for their similarity to natural language queries posed by users. ANNOD includes common word deletion, word root isolation, query expansion by a thesaurus, and application of a complex empirical matching (ranking) algorithm. The Hepatitis Knowledge Base, the text of a prototype information system, was the file used for testing ANNOD. Responses to a series of users' unrestricted natural language queries were evaluated by three testers. Information needed to answer 85 to 95% of the queries was located and displayed, in both the classified (listed in the Table of Contents) and unclassified portions of the text. Development of this retrieval system resulted from the complementarity of and interaction between computer science and medical domain expert knowledge. Extension of these techniques to larger knowledge bases is needed to clarify their proper role.
Search results for nine topics in the Medical Behavioral Sciences are reanalyzed to compare the overall performance of descriptor and citation search strategies in identifying relevant and novel documents. Overlap percentages between an aggregate "descriptor-based" database (MEDLINE, EXCERPTA MEDICA, PSYCINFO) and an aggregate "citation-based" database (SCISEARCH, SOCIAL SCISEARCH) ranged from 1% to 26%, with a median overlap of 8% relevant retrievals found using both search strategies. For seven topics in which both descriptor and citation strategies produced reasonably substantial retrievals, two patterns of search performance and novelty distribution were observed: 1) Where descriptor and citation retrieval showed little overlap, novelty retrieval percentages differed by 17-23% between the two strategies; 2) Topics with a relatively high percentage retrieval overlap showed little difference (1-4%) in descriptor and citation novelty retrieval percentages. These results reflect the varying partial congruence of two literature networks and represent two different types of subject relevance.
The National Library of Medicine has offered TOXLINE, an online interactive bibliographic database of biomedical (toxicology) information, since 1972. Files from 11 secondary sources comprise the TOXLINE database. The sources supplied bibliographic records in different formats and data structures, and data from each supplier's format had to be converted into a format suitable for TOXLINE. Three different, successive retrieval systems were used for the TOXLINE database, each requiring reformatting of the data. Algorithms for generating terms for inverted file search methods were tested. Special characters peculiar to the scientific literature were evaluated during search term generation. Developing search term algorithms for chemical names in the scientific literature required techniques different from those used for nonscientific literature. Problems with replication of bibliographic records from multiple secondary sources are described. Some observations about online interactive databases since TOXLINE was first offered are noted.
A study to compare the cost effectiveness of retrospective manual and on-line bibliographic searching is described. Forty search queries were processed against seven abstracting-indexing publications and the corresponding SDC/ORBIT data bases. Equivalent periods of coverage and searcher skill levels were used for both search models. Separate task times were measured for question analysis, searching, photocopying, shelving, and output distribution. Component costs were calculated for labor, information, reproduction, equipment, physical space, and telecommunications. Results indicate that on-line searching is generally faster, less costly, and more effective than manual searching. However, for certain query/information-source combinations, manual searching may offer some advantages in precision and turn-around time. The results of a number of related studies are reviewed.
Since its founding in 1937, the National Cancer Institute (NCI) has supported a substantial program of information dissemination. Two peer-reviewed journals, begun in 1940 and 1959, are supplemented by a congressionally mandated International Cancer Research Data Bank (ICRDB), established in 1972. The NCI has made available online databases of published cancer literature and cancer research in progress for the past decade, using the National Library of Medicine (NLM) MEDLARS system. Recently, a clinical-practice-oriented cancer-information system called Physician Data Query (PDQ) has been developed for access at the NLM, as well as through commercial database vendors. The impact of the NCI information programs is currently under prospective evaluation.
This article discusses the development of a vertically oriented CD-ROM database product in the medical subdiscipline of oncology. Called OncoDisc, the CD-ROM is mastered by ISG. It contains three major information collections: (1) PDQ (a system of files and relationships through which the user accesses the data); (2) Cancerlit (the research literature that underlies the treatment information contained in PDQ); and (3) full-text articles. SearchLITE, the retrieval system, is written in the C language and has been implemented on the DEC VAX family and the IBM PC/XT and PC/AT. The disc provides a personal library of oncology information for immediate local use by the health professional; it requires no subscription to an online service, no telecommunications, and no online search charges.
This is an overview of the Federal government's support of health information clearinghouses--why they were initiated, their purpose, problems, and impact. Federal clearinghouses emerged in the 1960s to identify, organize, and provide access to a substantive body of information. As a support service to their sponsoring agencies, and often the only source for "fugitive" information, they constantly change to meet program priorities and facilitate the flow of information to multilevel audiences. In addition, the increasing complexity and quantity of information has intensified the need for organizing resources into coherent, manageable form. As a primary source for unbiased health information, clearinghouses strive to present a balanced view of research issues and treatment modalities. They provide inexpensive access to reliable health information, especially publicly funded research and information for the public good, and play an important role in meeting the nation's health objectives.
An investigation of the relationship between National Institutes of Health (NIH) funding and the quantity and nature of biomedical publications is reported for 120 U.S. medical school complexes. A correlation of 0.95 was found between the amount of NIH funds received and the number of biomedical publications from the medical schools. Medical school ranks based on bibliometric measures were found to correlate at the 0.80-0.90 level with ranks based on peer assessments of the schools. The characteristics of the medical school papers varied with the type of school. The average citation influence per paper increased with the publication size of the schools. This was true even when factors such as public versus private control, geographic region, average research level (from basic to clinical), and subject emphasis were controlled. The positive relationship between number of papers from a school and its citation influence holds within individual research levels and within subfields.
The rate of citation duplication was examined in three databases: MEDLINE, BIOSIS, and LIFE SCIENCES COLLECTION. Duplicate citations were found to be more pertinent than unique citations. The duplicate citations came from a highly compact literature, while citations unique to a single database were very widely scattered. The pertinent duplicated citations were more likely to be retrieved in searches that had more terms overall, had a higher percentage of thesaurus terms, and had terms which appeared in both title and abstract. These results suggest that the rate of duplication of citations in multidatabase searches may be used to rank output according to probable pertinence.
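The suggested ranking use of duplication can be sketched simply: order the retrieved citations by the number of databases that returned them. A minimal sketch over hypothetical result sets:

```python
# Rank multidatabase search output by duplication: citations retrieved by
# more databases are ranked higher, on the hypothesis (from the study
# above) that duplicated citations are more likely to be pertinent.

def rank_by_duplication(results_by_db):
    """results_by_db: database name -> set of citation IDs.
    Returns citation IDs, most-duplicated first; ties broken by ID."""
    counts = {}
    for ids in results_by_db.values():
        for cid in ids:
            counts[cid] = counts.get(cid, 0) + 1
    return sorted(counts, key=lambda cid: (-counts[cid], cid))
```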
A design for an on-line serials decision-making and collection analysis system is proposed. It is composed of four basic components: citation data, conventional serial records data, utility/cost ratio compilation and journal ranking techniques, and user interface software. The system would have the ability to respond specifically to user interest profiles and to integrate locally generated data. It is postulated that such a system is capable of satisfactorily resolving the major criticisms of the use of citation data for selection purposes: that libraries are diverse in their interests and that no aggregate list can be more than generally relevant; that the inclusion of cost data is essential and that citation ranking without regard to cost can be misleading; and that other relevant data should be considered as well.
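The utility/cost component of such a system might be sketched as below; the scores and costs are invented, and a real implementation would derive utility from citation data, user interest profiles, and locally generated use data as the design proposes:

```python
# Minimal sketch of utility/cost journal ranking for serials decisions:
# order candidate journals by utility per subscription dollar, so that
# citation-derived value is never read apart from cost. Data invented.

def rank_journals(journals):
    """journals: list of (title, utility_score, annual_cost_dollars).
    Returns the list sorted by utility/cost ratio, best value first."""
    return sorted(journals, key=lambda j: j[1] / j[2], reverse=True)
```

This is precisely the criticism the proposal answers: a pure citation ranking would put the highest-utility journal first regardless of price, whereas the ratio can promote a cheaper, moderately cited title.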
Data from a national survey (n = 1666) of researchers, practitioners, and policymakers in the field of rural mental health services were used to conduct a sociometric analysis of person-to-person communication in the field. This article describes the structure of the person-to-person communication network in terms of its connectedness, centrality, homogeneity, and differentiation. Despite the diversity of survey respondents, and apparently meager interorganizational communication, communication in the field is similar, in many respects, to that observed in "invisible colleges." While the probability of two randomly chosen individuals being in contact is low (0.0008), over 70% were connected indirectly. The person-to-person communication network is also highly centralized and exhibits higher than expected communication among respondents in the same professional role, type of work organization, and geographical region. It does not appear to be highly differentiated with respect to topic, since the majority of information providers are contacted with respect to a number of topics.
Increasingly, images are being incorporated into computer-information systems, allowing faster and more reliable access to legal documents, fingerprints, medical images, and so on. But designing viable computer-human interactions (CHI) for image-information systems can be particularly difficult. This article presents an overall approach to developing viable image CHI, involving user metaphors for comprehending image data and methods for locating, accessing, and displaying computer images. Since medical-image applications involve almost all image display problems, a medical-image radiology-workstation application is used as a driving example to present critical image CHI issues.
This article raises several questions regarding training for computer-based reference services, including who is to be trained and who is responsible for training. It discusses these issues and then provides a summary of the training provided to date by search service suppliers, database suppliers, library schools and extension programs, library cooperatives, and professional organizations. The available training materials are also discussed. Some projections are made of likely future activities.
The purpose of this article is to examine the ramifications of legislative recognition of the concept of fair use in the Copyright Act of 1976. The fair use concept, while of small consequence in its normative origins, has turned out to be the foundation of the most perplexing and divisive issues in the new legislative guidelines governing copyright. Legislative recognition of the concept of fair use, coupled with enormous growth of a new technology--extending from xerography to on-line database systems--creates de facto exemptions to both the intent and content of new copyright guidelines. The issue is not one of limiting use or suppressing information, but of mechanisms for safeguarding the rights of copyright holders, be they authors or publishers, and insuring the free flow of information by providing a proper return on both intellectual creativity and capital expenditures. The authors argue that the elimination, or at least curtailment of fair use doctrine, coupled with an increase in technological approaches to reporting of secondary use of copyrighted material, will benefit all sections of the knowledge industry. Authors will receive proper royalties on use; publishers will be able to sell more books and journals at lower prices; and librarians will be liberated from extensive chores such as monitoring usage or determining fee schedules and transferences. The issue is one of fair return--an issue obscured and ultimately subverted by fair use.
The database model presented in this article is suitable for applications in which queries may require noncrisp references to certain attributes. The data item (attribute) values may be crisp or fuzzy. For instance, such adjectives as "high" or "normal" may be attribute values for the attribute "blood pressure." A disease or a condition can be described by a number of symptoms which may be crisp alphanumeric values or fuzzy terms such as "high" or "normal." A query into this database can retrieve diseases which have "similar" symptoms. The similarity or "indistinguishability" is a measure defined by the database user on the relations that describe a family of diseases. This database system in conjunction with a rule base can provide the framework for a medical consultation system.
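A minimal sketch of such similarity-based retrieval, assuming a user-defined similarity table over fuzzy attribute values; the values, diseases, and threshold below are all invented for illustration:

```python
# Fuzzy-attribute retrieval in miniature: attribute values like "high"
# are compared through a user-defined similarity (indistinguishability)
# measure, and a disease matches a query when every queried symptom is
# similar enough. All data here is hypothetical.

SIMILARITY = {  # symmetric similarity between fuzzy blood-pressure values
    ("high", "high"): 1.0,
    ("high", "elevated"): 0.8,
    ("high", "normal"): 0.1,
    ("normal", "normal"): 1.0,
    ("normal", "elevated"): 0.4,
    ("elevated", "elevated"): 1.0,
}

def sim(a, b):
    """Look up similarity in either order; unknown pairs score 0."""
    return SIMILARITY.get((a, b), SIMILARITY.get((b, a), 0.0))

def retrieve(diseases, query, threshold=0.5):
    """diseases: name -> {attribute: fuzzy or crisp value}.
    Returns names whose every queried attribute meets the threshold."""
    hits = []
    for name, attrs in diseases.items():
        if all(sim(attrs.get(k, ""), v) >= threshold
               for k, v in query.items()):
            hits.append(name)
    return sorted(hits)
```

Coupled with a rule base, the same similarity machinery could drive a consultation dialogue, as the abstract suggests.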
One of the key variables in the optimization of the retrieval process is the logic imposed on a set of query terms. If the query is small, a combinatorial algorithm can be employed to identify search expressions having an optimal logical form. An experiment is described in which this was done for queries expressed against MEDLINE, for a variety of criterion variables. The method employed is useful not only for assisting in identifying optimal logical forms, as demonstrated, but also as an experimental control device to assist investigations into the effects of varying the set, and number, of search terms. The experiment also suggests several novel properties of effective searching against MEDLINE, for example that searching with four MeSH terms is likely to be more successful than searching with a lesser number of terms, provided search logic is optimal. The latter result is, surprisingly, true for both Precision and Recall, i.e., a tradeoff between the maximally attainable values of these variables fails to hold when the number of search terms varies in this range.
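The combinatorial idea can be illustrated on a toy scale. This sketch tries only pure conjunctions (every AND over a subset of the query terms) rather than the full space of Boolean forms, and scores each on a labeled sample by F1; everything here is an assumption for illustration, not the experiment's actual procedure:

```python
# Toy combinatorial search for an optimal query form: enumerate every
# nonempty AND-subset of the query terms and keep the one with the best
# F1 on a relevance-labeled document sample.

from itertools import combinations

def matches(doc_terms, required):
    """A document matches a conjunction if it contains every term."""
    return all(t in doc_terms for t in required)

def best_conjunction(terms, docs):
    """docs: list of (set_of_terms, is_relevant) pairs."""
    best, best_f1 = None, -1.0
    for k in range(1, len(terms) + 1):
        for subset in combinations(terms, k):
            tp = sum(1 for d, rel in docs if rel and matches(d, subset))
            fp = sum(1 for d, rel in docs if not rel and matches(d, subset))
            fn = sum(1 for d, rel in docs if rel and not matches(d, subset))
            f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
            if f1 > best_f1:
                best, best_f1 = subset, f1
    return best, best_f1
```

The exhaustive loop also shows why the method doubles as an experimental control: holding the term set fixed while the logic varies isolates the effect of logical form from the effect of term choice.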
The area of rural mental health services was used as a testbed to study information-seeking behavior in a field that includes researchers, policymakers, and practitioners. Findings from a nationwide survey (n = 1666) describe the sources that were used to obtain information about various topics and the use and value of these sources by or to individuals in various work roles and settings. The findings demonstrate the importance of person-to-person communication; differences in the sources used, and the value placed on these sources, by individuals in different work roles and settings; and that information-seeking episodes generally involve using multiple sources (a mean of 5.0) to obtain information about several topics (a mean of 3.2).
Issues related to the demand for library service are not new to library administrators and other information scientists. However, formal economic analysis of that demand is just beginning. This paper has two objectives: to explain the theoretical links between the demand for library service and key economic variables and to describe how demand was estimated in one specific experimental context: the demand for library service of institutional users of the Cleveland Health Sciences Library. This study is cited to illustrate some problems that are likely to arise in attempting to estimate statistically the parameters of library demand functions. The need for more precise economic analysis of library demand has grown as forms of information that have traditionally been provided free to users begin to acquire explicit price tags. This trend is likely to continue. Most economic models for setting library user fees require specific inputs about the demand for information and the sensitivity of demand to changes in economic variables. Even in situations where user fees are not applicable, an understanding of the demand function can be useful in predicting how the amount of library service demanded might change if underlying economic or noneconomic variables change. As more complete data become available, economic analysis of library demand will be employed more frequently as a policymaking tool.
Although industrial and consumer videodisc technology has been available since 1978, information providers are now just beginning to explore the possibilities of using videodisc and optical disk technology as publishing media. Firms involved in the design, development, and delivery of information products and services are looking at these technologies as new business opportunities and distribution channels. The areas in which information providers are using videodisc and optical disk technology are covered briefly.
The National Technical Information Service (NTIS) addresses the issue facing all government information providers--justification of the activity on a cost/benefit basis--by being self-supporting. The user pays for the information provided on a cost-recovery basis. Within the NTIS, a new program adds to the resources available to the health professional and/or consumer. The Center for the Utilization of Federal Technology (CUFT) links information, Federal technology resources, and new technologies to new users, including the private sector, to facilitate commercialization and therefore enhance utilization. To bring the Federal research and development (R & D) community together with potential non-Federal users, CUFT provides information products and undertakes networking activities in its Office of Applied Technology. The program initiates the link from the public to private sector for commercialization of newly developed Federal technology in its Office of Federal Patent Licensing. Individual products and examples of successful projects addressing the health community and its concerns are described. The CUFT program is increasing its online availability to deal with the increasing volume of information available and the growing number of users in health-related fields as well as in other areas of Federal scientific and technical information.
This article reports on exploratory experiments in evaluating and improving a thesaurus through studying its effect on retrieval. A formula called DISTANCE was developed to measure the conceptual distance between queries and documents encoded as sets of thesaurus terms. DISTANCE references MeSH (Medical Subject Headings) and assesses the degree of match between a MeSH-encoded query and document. The performance of DISTANCE on MeSH is compared to the performance of people in the assessment of conceptual distance between queries and documents, and is found to simulate the human performance with surprising accuracy. The power of the computer simulation stems both from the tendency of people to rely heavily on broader-than (BT) relations in making decisions about conceptual distance and from the thousands of accurate BT relations in MeSH. One source of discrepancy between the algorithm's measurement of closeness between query and document and people's measurement is occasional inconsistency in the BT relations. Our experiments with adding non-BT relations to MeSH showed how these non-BT relations could improve document ranking, if DISTANCE were also appropriately revised to treat these relations differently from BT relations.
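A DISTANCE-style measure might be sketched as shortest-path length through BT links, averaged over query terms; the tiny hierarchy and the scoring rule below are invented, not the article's actual formula:

```python
# Hedged sketch of conceptual distance over a broader-than (BT) hierarchy:
# the distance between two MeSH terms is the shortest path through BT
# links, and a query-document distance averages each query term's distance
# to its closest document term. The mini-hierarchy is hypothetical.

from collections import deque

BT = {  # term -> its broader terms
    "Lung Neoplasms": ["Neoplasms", "Lung Diseases"],
    "Neoplasms": ["Diseases"],
    "Lung Diseases": ["Diseases"],
    "Diseases": [],
}

def bt_distance(a, b):
    """Shortest path between terms, following BT links in both directions."""
    edges = {}
    for child, parents in BT.items():
        for p in parents:
            edges.setdefault(child, set()).add(p)
            edges.setdefault(p, set()).add(child)
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        term, d = queue.popleft()
        if term == b:
            return d
        for nxt in edges.get(term, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return float("inf")

def distance(query_terms, doc_terms):
    """Average, over query terms, of the closest document term."""
    return sum(min(bt_distance(q, t) for t in doc_terms)
               for q in query_terms) / len(query_terms)
```

Adding non-BT relations would amount to adding differently weighted edges to this graph, which is why the abstract notes that DISTANCE itself would need revising to treat them differently from BT links.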
End-user searching of National Library of Medicine (NLM) online data bases over eleven years has been investigated through transaction logs, questionnaires, and follow-up interviews. From 1976 through 1984, pathologists and pharmacists performed 8,313 searches. Highlights of our studies are compared with a review of other end-user research. Volume of searching is directly related to the convenient placement of the terminal in the workplace. Slightly fewer than half of all potential searchers actually search for themselves. Practices of pharmacists and pathologists do not differ in important ways. Nonmediated searchers feel they need answers more promptly than do those who obtain mediated searches. End-users perform very simple searches, mostly using only the AND operator. Problems with technique are fewer and more easily solved than those with the vocabulary and content of the system. The major problems involve the most powerful capabilities of MEDLINE--subheadings and explosions--which sometimes cause substantial loss of references, though in relatively few searches. One-on-one teaching is most popular, with trial-and-error the most frequent procedure used in actual learning.
An online search strategy to help find pairs of medical literatures that are logically (scientifically) related but noninteractive is described and exemplified. 'Noninteractive' means that the two literatures have no articles in common, do not cite each other, and are not co-cited, thus implying that any logical relationship between them may be unintended and perhaps unnoticed. This article proposes a strategy for identifying such sets of literature. The proposed strategy consists of two parts. The first part is an exploratory process intended to stimulate human creativity in perceiving connections that identify logically related pairs of literatures. Second, and more prescriptive, is a method for eliminating all pairs except those that are noninteractive.
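The noninteractivity test itself is a checkable predicate over article sets and citation links. A minimal sketch, assuming literatures are sets of article identifiers and a single map from each citing article to the articles it cites (the representation is an assumption, not the article's method):

```python
def noninteractive(lit_a, lit_b, cites):
    """True if two literatures (sets of article ids) are noninteractive:
    no shared articles, no direct citation either way, and no co-citation
    (no outside article citing a member of each). `cites` maps each
    article id to the set of article ids it cites."""
    # 1. No articles in common.
    if lit_a & lit_b:
        return False
    # 2. No article in one literature cites an article in the other.
    for a in lit_a:
        if cites.get(a, set()) & lit_b:
            return False
    for b in lit_b:
        if cites.get(b, set()) & lit_a:
            return False
    # 3. Not co-cited: no third article cites members of both.
    for paper, refs in cites.items():
        if paper in lit_a or paper in lit_b:
            continue
        if (refs & lit_a) and (refs & lit_b):
            return False
    return True
```

In practice the filtering step would run this test against citation data retrieved online, discarding every candidate pair for which it returns False.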
The radical changes and improvements in health sciences libraries during the last quarter century have been primarily achieved through the leadership of the National Library of Medicine (NLM) in the application of technology and in the creation of a biomedical communications network. This article describes principal programs and activities of the National Library of Medicine and their effects on health sciences libraries: the Medical Literature Analysis and Retrieval System (MEDLARS), implementation of the Medical Library Assistance Act (MLAA), and defense of "fair use" of copyrighted material. The article briefly summarizes more recent Federal activities which directly affect access to and dissemination of health information and concludes with a summary of problems for which solutions must be found if health sciences libraries are to be prepared to meet the future. It is clear from comparing the programs described with current government attitudes that, although the Federal government has promoted advancement in the dissemination of biomedical information in the past, this trend is reversing, and Federal funding to libraries is decreasing while the cost of accessing information is increasing.
This article addresses file maintenance of subject headings in the National Library of Medicine's MEDLINE, a bibliographic retrieval file which is indexed using Medical Subject Headings (MeSH), NLM's subject authority. The emphasis is on class maintenance, in which sets of records in MEDLINE are changed by maintenance actions that reflect yearly changes in MeSH. Specific types of maintenance action are described with examples to highlight problem areas. Class maintenance failures, checking class maintenance, and informing end-users of MeSH changes resulting in class maintenance are discussed. Technical constraints and feasibility are not analyzed in detail; rather, the approach is geared toward requirements and impact of maintenance from the standpoint of vocabulary specialists and end-users. The article concludes with a brief statement on defining file maintenance policy.
This project was designed to test the relative efficacy of index terms and full text for the retrieval of documents in those MEDLINE journals for which full-text searching was also available. The full-text files used were MEDIS from Mead Data Central and CCML from BRS Information Technologies. One hundred clinical medical topics were searched in these two files as well as the MEDLINE file to accumulate the necessary data. It was found that full text identified significantly more relevant articles than did the indexed file, MEDLINE. The full-text searches, however, lacked the precision of searches done in the indexed file. Most relevant items missed in the full-text files, but identified in MEDLINE, were missed because the searcher failed to account for some aspect of natural language, used a logical or positional operator that was too restrictive, or included a concept which was implied, but not expressed, in the natural language. Very few of the unique relevant full-text citations would have been retrieved by title or abstract alone. Finally, as of July 1990, the most current issue of a journal was just as likely to appear in MEDLINE as in one of the full-text files.
Official printer and sales agent for publications of the Federal government, the U.S. Government Printing Office (GPO) receives typed and electronic manuscripts from virtually every agency of government. Either in house or through commercial procurement, GPO provides typesetting, printing, and binding services to produce finished publications. GPO also disseminates these publications through the 1400 Depository Libraries and the Superintendent of Documents Sales Program. GPO employs a hierarchical marketing system which helps assure public exposure for every sales program title, while assigning increasing levels of promotion to titles with the greatest sales potential. As a trend, GPO sees fewer consumer-oriented publications and more professional-use titles. GPO also observes a new appreciation of the value of government statistical information, and increased agency efforts to provide improved public access to these data. GPO is working with publishing agencies and information-technology suppliers to study ways of accommodating demand for electronic information dissemination.
This article explores changes that can occur during the creation of a derivative thesaurus. A term translation dictionary is proposed to aid MeSH-trained and other searchers who would be using the Cancer Information Thesaurus.
MEDLINE is presented as a prototype for on-line bibliographic search systems. Creation of the data base, indexing language, and file organization are reviewed. On accessing the files, search logic is illustrated with a sample MEDLINE search. NLM's development of a document delivery system to complement its bibliographic retrieval system is discussed.
This study investigated additional attributes of online searchers believed to affect the quality of their search results. Subjects were selected from the online searching courses in six library schools. The searching proficiency of the subjects was measured by their performance on two DIALOG searches. Their creativity level was measured using two self-report inventories; their intelligence level was approximated from their GRE Verbal and Quantitative scores; and their personality traits in regard to masculinity, femininity, and self-esteem were measured using the Interpersonal Disposition Inventory. The large number of independent predictor variables was reduced by factor analysis, and the derived factors were related to the dependent variable, online searching performance, in a multiple regression analysis. The findings suggest that differences in searching performance can be attributed, to a small degree only, to general verbal and quantitative aptitude, artistic creativity, and an inclination toward critical and analytical creative thinking. The findings also raise doubts, however, that high intelligence and the other attributes cited by writers in the field are necessary for high performance. The notion that searching performance can be predicted by or is dependent upon certain cognitive or personality traits has thus become highly suspect.
There have been a number of major evaluations of the performance of retrieval systems against large full text and surrogate (bibliographic) databases. These evaluations have concentrated on the experimental determination of the Precision Ratio, the fraction of retrieved items that are relevant to an information request, and the Recall Ratio, the fraction of the total number of relevant items that were actually retrieved. While these measures have met with general acceptance, they have also generated much controversy. The purpose of this article is to review the results of several of the largest evaluations and to propose a simple model for the performance of such systems that may help explain the relationship between these measures and user behavior.
The Neurological Information Network (NIN) of the National Institute of Neurological and Communicative Disorders and Stroke (NINCDS) was a loosely structured assemblage of a variety of information-transfer activities that existed for approximately 20 years, starting in the early 1960s. These activities included the Neurosciences Research Program at the Massachusetts Institute of Technology, the Parkinson's Disease Information Center at Columbia University, the Brain Information Service at UCLA, the Information Center for Hearing, Speech, and Disorders of Human Communication at the Johns Hopkins Medical Institutions, the Clinical Neurology Information Center at the University of Nebraska, the Cerebrovascular Disease Abstracts generated at the Mayo Foundation and appearing in the journal Stroke, and Epilepsy Abstracts published by Excerpta Medica. The article discusses primarily the sociopolitical factors that govern the creation and life of activities of the type enumerated.
Visual documents--motion sequences on film, videotape, and digital recording--constitute a major source of information for the Space Agency, as well as all other government and private sector entities. This article describes a method for automatically selecting key frames from visual documents. These frames may in turn be used to represent the total image sequence of visual documents in visual libraries, hypermedia systems, and training. The algorithm reduces 51 minutes of video sequences to 134 frames, a reduction of information in the range of 700:1.
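The article's specific algorithm is not given in the abstract; one common family of key-frame selectors keeps a frame whenever it differs sufficiently from the last kept frame. A minimal sketch under that assumption, with frames represented as numeric feature vectors (e.g. grey-level histograms):

```python
def select_key_frames(frames, threshold):
    """Return indices of key frames. A frame becomes a key frame when its
    mean absolute difference from the most recently kept key frame exceeds
    `threshold`. `frames` is a list of equal-length numeric vectors."""
    if not frames:
        return []
    keys = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        ref = frames[keys[-1]]
        diff = sum(abs(a - b) for a, b in zip(frames[i], ref)) / len(ref)
        if diff > threshold:
            keys.append(i)
    return keys
```

A 700:1 reduction corresponds to keeping roughly one frame out of every 700, which a scheme like this achieves on footage with long static stretches.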
The use of terms from natural and social scientific titles and abstracts is studied from the perspective of sublanguages and their specialized dictionaries. Different notions of sublanguage distinctiveness are explored. Objective methods for separating hard and soft sciences are suggested based on measures of sublanguage use, dictionary characteristics, and sublanguage distinctiveness. Abstracts were automatically classified with a high degree of accuracy by using a formula that considers the degree of uniqueness of terms in each sublanguage. This may prove useful for text filtering or information retrieval systems.
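The abstract's classification formula is not reproduced; a minimal sketch of one uniqueness-based scheme is shown below, scoring each sublanguage by the share of an abstract's terms that appear in that sublanguage's vocabulary and no other. The vocabularies and scoring rule are illustrative assumptions.

```python
def classify(abstract_terms, sublanguage_vocab):
    """Assign an abstract (a set of terms) to the sublanguage whose
    *unique* vocabulary (terms in it and in no other sublanguage)
    covers the largest share of the abstract's terms."""
    scores = {}
    for name, vocab in sublanguage_vocab.items():
        # Terms belonging to every other sublanguage's vocabulary.
        others = set().union(*(v for n, v in sublanguage_vocab.items()
                               if n != name))
        unique = vocab - others
        scores[name] = len(abstract_terms & unique) / max(len(abstract_terms), 1)
    return max(scores, key=scores.get), scores
```

Terms shared across sublanguages (like "energy" appearing in both a physics and a sociology vocabulary) contribute nothing to any score, which is what lets distinctive hard-science terminology separate abstracts so cleanly.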