Article

Building an autonomous citation index for grey literature: the economics working papers case

Abstract

This paper describes an autonomous citation index named CitEc that has been developed by the authors. The system has been tested using a particular type of grey literature: working papers available in the RePEc (Research Papers in Economics) digital library. Both its architecture and performance are analysed in order to determine if the system has the quality required to be used for information retrieval and for the extraction of bibliometric indicators.

... Currently, many RIS, such as Google Scholar, Web of Science (Clarivate Analytics), Scopus, RePEc/CitEc, etc., generate citation relationships. They usually use a well-established procedure [4] with two main steps: ...
... The one or two sentences to the left and right of each in-text citation give the context in which a reference is used in a paper. By analyzing these context data [4], we can assess how the reference is mentioned, e.g. how the cited paper was used in the citing paper, among other things. ...
... Currently, many RIS, such as Google Scholar, Web of Science (Clarivate Analytics), Scopus, RePEc/CitEc, etc., generate citation relationships. They usually use a well-established procedure [3] with two main steps: ...
... The one or two sentences to the left and right of each in-text reference give the context in which a reference is used in a paper. By analyzing these context data [3], we can assess how the reference is mentioned, e.g. how the cited paper was used in the citing paper, among other things. ...
Conference Paper
Full-text available
This paper presents a method for processing the content of research papers in binary PDF format on the server side, which gives research information systems new features for citation content analysis. The method efficiently generates JSON versions of PDF documents, which make it easier to recognize a paper's references, in-text references, citation context, etc. As a result, one can parse an extended set of citation data, including the location of citations in the structure of a research paper, how often the same reference is mentioned, the style in which a reference is mentioned, and so on. Based on these data we upgrade traditional citation relationships by adding semantic attributes. By formatting these semantic data according to the W3C Web Annotation Data Model and integrating them with annotation tools, we visualize citation relationships, their semantic attributes and related statistics as annotations for readers of PDF documents from a research information system.
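The citation-context idea described in the snippets and abstract above can be illustrated with a minimal Python sketch. It is not the method of the cited paper: the function name `citation_contexts`, the `window` parameter, the naive regular-expression sentence splitter and the assumption that in-text references look like `[4]` are all hypothetical choices made for illustration.

```python
import re

def citation_contexts(text, window=1):
    """Return the sentences surrounding each numeric in-text citation.

    Assumptions (illustrative only): citations are written as [n], and
    sentences end with '.', '!' or '?' followed by whitespace.
    """
    # Naive sentence split; abbreviations such as "e.g." will be split too.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    marker = re.compile(r"\[(\d+)\]")
    contexts = []
    for i, sentence in enumerate(sentences):
        for match in marker.finditer(sentence):
            lo = max(0, i - window)                   # first sentence of the context
            hi = min(len(sentences), i + window + 1)  # one past the last sentence
            contexts.append({
                "reference": int(match.group(1)),
                "sentence_index": i,
                "context": " ".join(sentences[lo:hi]),
            })
    return contexts

sample = "A first. We follow the procedure [4] closely. It works. Final one."
result = citation_contexts(sample)
```

With `window=1` each context holds the citing sentence plus one sentence on either side. A production system would need a proper sentence tokenizer and support for author-year citation styles.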
... Instead, the data is put in the public domain and several so-called RePEc services have been developed that use the metadata in various ways, such as an email dissemination service (NEP, http://nep.repec.org/, see Chu and Krichel, 2003, and Barrueco Cruz and Krichel, 2005), websites with search and browsing functionalities (the most prominent ones being EconPapers, http://econpapers.repec.org/, and IDEAS, http://ideas.repec.org/), ...
... This system could easily grow with the profession's use of the network. Such a system would be of great benefit to the profession" (Krichel, 1999, p.1 Krichel (1999; 2005) note that the main reason for using ReDIF is the simplicity it offers for bibliographic description, which allows records to be prepared by non-specialist staff, considering that MARC is sometimes hard to understand even for librarians themselves. Another point is that the metadata in ReDIF templates can be updated quickly. ...
Article
Full-text available
Introduction: The disciplinary repository Research Papers in Economics (RePEc) provides access to a large number of preprints, journal articles, books, book chapters and software in economics and business administration. The repository aggregates bibliographic records prepared by different universities, institutes, publishers and authors who work collaboratively following document organization standards. Objectives: This work primarily identifies and analyses how RePEc operates, including the organization of its archives, which is characterized by the use of the Guildford protocol and RePEc's own ReDIF metadata templates for document description. Methodology: Part of this research was based on a theoretical study of the literature; another part was carried out by observing a series of characteristics visible on the RePEc website and in the archives of a journal that contributes to the repository. Results: The repository was found to be a decentralized collaborative project that also offers several services derived from metadata analysis. Conclusions: ReDIF templates and the Guildford communication protocol are key elements for organizing records in RePEc, and there is also a similarity with Dublin Core metadata.
... On May 10th 2015, after 3 months of testing and experiments with the new facilities for authors to enrich the metadata of their publications, only 211 semantic linkages had been created by registered authors. Among the other linkages, the biggest groups are: about 7.3 million citation linkages imported from the CitEc database [1]; about 1.3 million "publication" linkages from researchers' personal profiles to their publications; about 60 thousand "person" linkages from organizational profiles to researchers' profiles; etc. ...
Conference Paper
Full-text available
Our paper presents a semantically enrichable type of research information system, which differs from the traditional one by allowing users to create semantic linkages between information objects and thereby enrich the initial content. This approach was implemented as a whole ecosystem of tools and services at the SocioNet research information system, which is publicly available to the research community. By making semantic linkages, SocioNet users create a semantic layer over the metadata of its content that visualizes their scientific knowledge or hypotheses about relationships between research outputs. Such facilities, in particular, open new opportunities for authors of research outputs. Authors can substantially enrich the metadata of their research outputs after the papers have been published and have become available in SocioNet. Authors can provide comments and notes updating publication abstracts, data about their motivations for citing the papers in their reference lists, associations with newer relevant publications, data about their personal roles in and contributions to collective research outputs, etc.
... One of the few that is currently updated is CitEc at http://citec.repec.org/. CitEc regularly processes full texts of publications linked from metadata collected at RePEc [2]. At the end of July 2014 CitEc had recognized more than 6.3 million citations between RePEc documents. ...
Conference Paper
Full-text available
A CRIS with an implemented semantic linkage technique opens new opportunities for authors of research outputs. Using the facilities of such a CRIS, authors can substantially enrich the metadata of their research outputs after the papers have been published and have become available in the CRIS. The proposed approach allows authors to provide data for enrichment purposes. These data include their personal roles in and contributions to collective research, comments and notes updating publication abstracts, data about their motivations for citing the papers in their reference lists, associations with newer relevant publications, etc. Technically, authors perform such enrichment by semantically linking the metadata of related publications. They do not overwrite the initial publishers' metadata. In the paper we present this approach and its pilot implementation at the Socionet CRIS, with a focus on serving the large international community of RePEc users. We expect that the results of such enrichment will improve professional recognition of “who did what” and “how research outputs were used”. We also expect interest from research management and evaluation organizations.
... The biggest part of them (about 5M) is harvested with regular updates from the CitEc.repec.org system (Barrueco and Krichel 2005). All of these linkages have the semantic meaning of a "citation", but the authors of these citation linkages can semantically enrich them by using the Socionet tools and semantic vocabularies presented above. ...
Article
Full-text available
A growing number of research information systems use a semantic linkage technique to represent explicitly the relationships between elements of their content. This practice is now reaching a maturity at which existing data on semantically linked research objects, and the scientific relationships they express, can be recognized as a new data source for scientometric studies. Recent activities to provide scientists with tools for expressing, in the form of semantic linkages, their knowledge, hypotheses and opinions about relationships between available information objects also support this trend. The study presents one such activity performed within the Socionet research information system, with a special focus on (a) a taxonomy of scientific relationships that can exist between research objects, especially between research outputs; and (b) a semantic segment of a research e-infrastructure that includes semantic interoperability support, monitoring of changes in linkages and linked objects, notifications and a new model of scientific communication, and finally scientometric indicators built by processing semantic linkage data. Based on knowledge of what semantic linkage data are and how they are stored in a research information system, we propose an abstract computing model of a new data source. This model helps to better understand what new indicators can be designed for scientometric studies. Using current semantic linkage data collected in Socionet, we present some statistical experiments, including examples of indicators based on two data sets: (a) what objects are linked and (b) what scientific relationships (semantics) are expressed by the linkages.
... Instead, the data is put in the public domain and several so-called RePEc services have been developed that use the metadata in various ways, such as an email dissemination service (NEP, http://nep.repec.org/, see Chu and Krichel, 2003, and Krichel, 2005), websites with search and browsing functionalities (the most prominent ones being EconPapers, http://econpapers.repec.org/, and IDEAS, http://ideas.repec.org/), ...
Article
Identifying authorship correctly and efficiently is a difficult problem when the literature is abundant, but poorly recorded. Homonyms are tedious to differentiate. This paper describes how the field of economics has organized itself with respect to author identification. We describe the RePEc project with a special emphasis on the RePEc Author Service. We then discuss how the concept is currently being expanded to the entire scientific body with the AuthorClaim project.
... Describing the process of creating an autonomous citation index is beyond the scope of this work because of the different approaches from which it can be tackled. One of them has already been analysed in Barrueco (2005). In general, the process has three stages: 1. Harvesting. ...
Article
Full-text available
One of the challenges faced by institutional repositories is to demonstrate and quantify with objective data that open access documents are cited and used more than other papers. Some repositories already include analyses of the use of their documents. In addition, several international projects focus on compiling citation indexes. However, to date these initiatives have been isolated. To assess open access accurately, results from different institutions and disciplines must be combined to obtain overall indicators that enable comparisons between authors, institutions, etc. In this paper, we present a proposal for an architecture that facilitates the collection, distribution and aggregation of the data required to measure the use and impact of papers stored in institutional repositories.
... The precise details of this base are beyond the scope of this paper. Secondly, we have a series of three software modules, one for each step in the reference linking process (Barrueco, 2005): ...
Article
Full-text available
Citation indexes are key tools in the science communication system for two reasons. Firstly, they are an excellent information source for searching the scientific literature, since they enable navigation through the links between documents represented by bibliographic references. Secondly, they allow scientific production to be evaluated. Counting citations is a common procedure for evaluating the quality of a research paper. In Spain, this evaluation can only be carried out using tools produced by the ISI, which have limited coverage of journals published outside Anglo-Saxon countries. As a result, the evaluation of Spanish scientific production is limited to works published in international journals. There is no tool for evaluating research (mainly in the Social Sciences and Humanities) published in local journals. With the INCISO research project we will investigate the possibility of creating a citation index by automatic means. The deliverables of the project will be software to create citation indexes automatically and a sample citation index for the social sciences. Ministerio de Educación y Ciencia. Research project HUM2004-05532. Peer reviewed
... Jose Manuel Barrueco Cruz is the creator of CitEc at http://citec.repec.org, see [3]. This is a citation index for Economics based on documents available in the RePEc digital library as described at http://repec.org. ...
Article
Full-text available
We have developed a system that can build a citation index in an automated way. It has been tested with Spanish journals. We need to evaluate our system, mainly its effectiveness in retrieving citations. Criteria for evaluating the system are presented and discussed, and the results of applying them to our system are shown and analyzed. Ministerio de Educación y Ciencia. Research project HUM2004-05532. Peer reviewed
... 26 See http://citec.repec.org/ or Barrueco Cruz and Krichel (2005). 27 Impact factors are computed using various methods. ...
Article
In this paper, we study the citation decision of a scientific author. By citing a related work, authors can make their arguments more persuasive. We call this the correlation effect. But if authors cite other work, they may give the impression that they think the cited work is more competent than theirs. We call this the reputation effect. These two effects may be the main sources of citation bias. We empirically show that there is a citation bias in Economics by using data from RePEc. We also report how the citation bias differs across regions (U.S., Europe and Asia).
Conference Paper
During her lifetime, Pina Bausch had already started to collect material containing her work and in this laid the foundations for an archive. For preserving this cultural heritage in the area of performing arts it was of special interest to integrate ideational resources such as memory fragments or oral storytelling as well as to offer flexible knowledge exploration experiences. Therefore, the digital Pina Bausch archive is realized as a Linked Data archive containing data on various different materials such as manuscripts, choreography notes, programs, photographs, posters, drawings, videos and even oral history related to Pina Bausch’s work. In this paper, an insight into the used techniques is presented together with the modeling approach based on FRBR and a machine-readable Dublin Core application profile specifically adapted for managing the archive.
Article
Full-text available
This paper describes the results of an analysis of Italian working papers (WPs) available both in RePEc (Research Papers in Economics) and in institutional repositories (IRs) and websites. Given that RePEc is a disciplinary repository based on the active involvement of economic institutions, rather than authors, our analysis explores the institutions' propensity for making their collections available in both disciplinary and institutional repositories. The paper therefore provides a profile of the Italian economics institutions participating in RePEc as well as an in-depth analysis of the availability of their WPs and WP series. Moreover, the IRs and websites of the Italian institutions participating in RePEc were analysed to compare the scientific content available in these important sources of free-access information (RePEc, IRs and websites).