Philipp Mayr

  • PhD
  • Group Leader at GESIS - Leibniz-Institute for the Social Sciences

About

295
Publications
46,284
Reads
3,816
Citations
Introduction
My research interests are non-textual ranking in digital libraries, bibliometric methods, evaluation of information systems, and applied informetrics.
Current institution
GESIS - Leibniz-Institute for the Social Sciences
Current position
  • Group Leader
Additional affiliations
January 1998 - July 2000
Humboldt-Universität zu Berlin
Position
  • Tutor
November 2004 - present
GESIS - Leibniz-Institute for the Social Sciences
Position
  • Group Leader
Education
January 2005 - December 2008
Humboldt-Universität zu Berlin
Field of study
  • Information Science


Publications (295)
Article
Full-text available
OpenAlex is a promising open source of scholarly metadata, and competitor to established proprietary sources, such as the Web of Science and Scopus. As OpenAlex provides its data freely and openly, it permits researchers to perform bibliometric studies that can be reproduced in the community without licensing barriers. However, as OpenAlex is a rap...
Preprint
In this research-in-progress paper, we apply a computational measure correlating with originality from creativity science: Divergent Semantic Integration (DSI), to a selection of 99,557 scientific abstracts and titles selected from the Web of Science. We observe statistically significant differences in DSI between subject and field of research, and...
Book
The Joint Workshop of the 5th Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE2024; https://eeke-workshop.github.io/) and the 4th AI + Informetrics (AII2024; https://ai-informetrics.github.io/) was held in Changchun, China and online, co-located with the iConference2024. The two workshop series are designed to activel...
Preprint
Full-text available
This study compares and analyses publication and document types in the following bibliographic databases: OpenAlex, Scopus, Web of Science, Semantic Scholar and PubMed. The results demonstrate that typologies can differ considerably between individual database providers. Moreover, the distinction between research and non-research texts, which is re...
Article
Full-text available
In this paper, we compare the performance of several popular pre-trained reference extraction and segmentation toolkits combined in different pipeline configurations on three different datasets. The extraction is end-to-end, i.e. the input is PDF documents, and the output is parsed reference objects. The evaluation is for reference strings and indi...
Chapter
The 14th iteration of the Bibliometric-enhanced Information Retrieval (BIR) workshop series takes place at ECIR 2024 as a full-day workshop. BIR addresses research topics related to academic search and recommendation, at the intersection of Information Retrieval, Natural Language Processing, and Bibliometrics. As an interdisciplinary scientific eve...
Chapter
The VADIS system addresses the demand of providing enhanced information access in the domain of the social sciences. This is achieved by allowing users to search and use survey variables in context of their underlying research data and scholarly publications which have been interlinked with each other.
Article
Purpose The recent proliferation of preprints could be a way for researchers worldwide to increase the availability and visibility of their research findings. Against the background of rising publication costs caused by the increasing prevalence of article processing fees, the search for other ways to publish research results besides traditional jo...
Preprint
Objective: To explore and compare the performance of ChatGPT and other state-of-the-art LLMs on domain-specific NER tasks covering different entity types and domains in TCM against COVID-19 literature. Methods: We established a dataset of 389 articles on TCM against COVID-19, and manually annotated 48 of them with 6 types of entities belonging to 3...
Article
Full-text available
Evaluation of researchers’ output is vital for hiring committees and funding bodies, and it is usually measured via their scientific productivity, citations, or a combined metric such as the h-index. Assessing young researchers is more critical because it takes a while to get citations and increment of h-index. Hence, predicting the h-index can hel...
Article
Full-text available
Acknowledgments in scientific papers may give an insight into aspects of the scientific community, such as reward systems, collaboration patterns, and hidden research trends. The aim of the paper is to evaluate the performance of different embedding models for the task of automatic extraction and classification of acknowledged entities from the ack...
Preprint
Full-text available
Purpose: The recent proliferation of preprints could be a way for researchers worldwide to increase the availability and visibility of their research findings. Against the background of rising publication costs caused by the increasing prevalence of article processing fees, the search for other ways to publish research results besides traditional j...
Book
The Joint Workshop of the 4th Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE2023; https://eeke-workshop.github.io/) and the 3rd AI + Informetrics (AII2023; https://ai-informetrics.github.io/) was held at Santa Fe, New Mexico, USA and online, co-located with the ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2...
Preprint
Full-text available
Acknowledgments in scientific papers may give an insight into aspects of the scientific community, such as reward systems, collaboration patterns, and hidden research trends. The aim of the paper is to evaluate the performance of different embedding models for the task of automatic extraction and classification of acknowledged entities from the ack...
Preprint
Full-text available
Evaluation of researchers' output is vital for hiring committees and funding bodies, and it is usually measured via their scientific productivity, citations, or a combined metric such as the h-index. Assessing young researchers is more critical because it takes a while to get citations and increment of h-index. Hence, predicting the h-index can hel...
Article
Full-text available
Retrievability measures the influence a retrieval system has on the access to information in a given collection of items. This measure can help in making an evaluation of the search system based on which insights can be drawn. In this paper, we investigate the retrievability in an integrated search system consisting of items from various categories...
Article
Full-text available
In 2014, a union of German research organisations established Projekt DEAL, a national-level project to negotiate licensing agreements with large scientific publishers. Negotiations between DEAL and Elsevier began in 2016, and broke down without a successful agreement in 2018; in this time, around 200 German research institutions cancelled their li...
Preprint
Full-text available
Retrievability measures the influence a retrieval system has on the access to information in a given collection of items. This measure can help in making an evaluation of the search system based on which insights can be drawn. In this paper, we investigate the retrievability in an integrated search system consisting of items from various categories...
Article
Full-text available
Open Access (OA) facilitates access to articles. However, authors or funders often must pay the publishing costs, which prevents authors who do not receive financial support from participating in OA publishing and from benefiting from the citation advantage for OA articles. OA may exacerbate existing inequalities in the publication system rather than overcome them. To investigate t...
Chapter
The 13th iteration of the Bibliometric-enhanced Information Retrieval (BIR) workshop series will take place at ECIR 2023 as a full-day workshop. BIR tackles issues related to, for instance, academic search and recommendation, at the intersection of Information Retrieval, Natural Language Processing, and Bibliometrics. As an interdisciplinary scient...
Article
Full-text available
Analysis of acknowledgments is particularly interesting as acknowledgments may give information not only about funding, but they are also able to reveal hidden contributions to authorship and the researcher’s collaboration patterns, context in which research was conducted, and specific aspects of the academic work. The focus of the present research...
Article
Full-text available
Since 2013, the usage of preprints as a means of sharing research in biology has rapidly grown, in particular via the preprint server bioRxiv. Recent studies have found that journal articles that were previously posted to bioRxiv received a higher number of citations or mentions/shares on other online platforms compared to articles in the same jour...
Preprint
Analysis of acknowledgments is particularly interesting as acknowledgments may give information not only about funding, but they are also able to reveal hidden contributions to authorship and the researcher's collaboration patterns, context in which research was conducted, and specific aspects of the academic work. The focus of the present research...
Conference Paper
Full-text available
The paper outlines the motivation and build-up of the DFG-funded GEOcite project at University of Passau. The project works on a domain-specific approach to automatically extract, segment, match and visualize literature references in the German speaking geography domain with the objective to provide a novel basis for a scientometric monitoring inst...
Preprint
In this paper, we provide an overview of the SV-Ident shared task as part of the 3rd Workshop on Scholarly Document Processing (SDP) at COLING 2022. In the shared task, participants were provided with a sentence and a vocabulary of variables, and asked to identify which variables, if any, are mentioned in individual sentences from scholarly documen...
Preprint
Full-text available
Nowadays there is a growing trend in many scientific disciplines to support researchers by providing enhanced information access through linking of publications and underlying datasets, so as to support research with infrastructure to enhance reproducibility and reusability of research results. In this research note, we present an overview of an on...
Preprint
Open Access (OA) facilitates access to articles. However, authors or funders often must pay the publishing costs, which prevents authors who do not receive financial support from participating in OA publishing and from benefiting from the citation advantage for OA articles. OA may exacerbate existing inequalities in the publication system rather than overcome them. To investigate t...
Preprint
Full-text available
Acknowledgments in scientific papers may give an insight into aspects of the scientific community, such as reward systems, collaboration patterns, and hidden research trends. The aim of the paper is to evaluate the performance of different embedding models for the task of automatic extraction and classification of acknowledged entities from the ack...
Conference Paper
Full-text available
The 3rd Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE 2022) was held online at the ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2022. The goal of this workshop series (https://eekeworkshop.github.io/) is to engage the related communities in open problems in the extraction and evaluation of know...
Article
Full-text available
The Research Topic on “Knowledge Discovery and Data Exploitation” aims at promoting interdisciplinary research in computational linguistics and in Natural Language Processing (NLP) applied to the fields of Bibliometrics, Scientometrics, and Information Retrieval. It is a follow-up of our previous Research Topic: “Mining Scientific Papers: NLP-enhan...
Preprint
Full-text available
In this paper, we investigate the retrievability of datasets and publications in a real-life Digital Library (DL). The measure of retrievability was originally developed to quantify the influence that a retrieval system has on the access to information. Retrievability can also enable DL engineers to evaluate their search engine to determine the eas...
Article
Full-text available
International mobility in academia can enhance the human and social capital of researchers and consequently their scientific outcome. However, there is still a very limited understanding of the different mobility patterns among scholars with various socio-demographic characteristics. By studying these differences, we can detect inequalities in acce...
Chapter
The 12th iteration of the Bibliometric-enhanced Information Retrieval (BIR) workshop series is a full-day ECIR 2022 workshop. BIR tackles issues related to, for instance, academic search and recommendation, at the intersection of Information Retrieval, Natural Language Processing, and Bibliometrics. As an interdisciplinary scientific event, BIR bri...
Preprint
Full-text available
International mobility in academia can enhance the human and social capital of researchers and consequently their scientific outcome. However, there is still a very limited understanding of the different mobility patterns among scholars with various socio-demographic characteristics. The aim of this study is twofold. First, we investigate to what e...
Preprint
Full-text available
Since 2013, the usage of preprints as a means of sharing research in biology has rapidly grown, in particular via the preprint server bioRxiv. Recent studies have found that journal articles that were previously posted to bioRxiv received a higher number of citations or mentions/shares on other online platforms compared to articles in the same jour...
Preprint
Full-text available
Automatic processing of bibliographic data has become very important in digital libraries, data science and machine learning, on the one hand because it helps keep pace with the significant yearly increase in published papers and on the other because of its inherent challenges. This processing has several aspects including but not limited t...
Preprint
Full-text available
In this demo paper, we present ConSTR, a novel Contextual Search Term Recommender that utilises the user's interaction context for search term recommendation and literature retrieval. ConSTR integrates a two-layered recommendation interface: the first layer suggests terms with respect to a user's current search term, and the second layer suggests t...
Article
The 11th Bibliometric-enhanced Information Retrieval Workshop (BIR 2021) was held online on April 1st, 2021, at ECIR 2021 as a virtual event. The interdisciplinary BIR workshop series aims to bring together researchers from different communities, especially Scientometrics/Bibliometrics and Information Retrieval. We report on the 11th BIR, its invit...
Preprint
Full-text available
In 2014, a union of German research organisations established Projekt DEAL, a national-level project to negotiate licensing agreements with large scientific publishers. Negotiations between DEAL and Elsevier began in 2016, and broke down without a successful agreement in 2018; in this time, around 200 German research institutions cancelled their li...
Article
Full-text available
This study investigates the development of open access (OA) to journal articles from authors affiliated with German universities and non-university research institutions in the period 2010–2018. Beyond determining the overall share of openly available articles, a systematic classification of distinct categories of OA publishing allowed us to identi...
Article
Full-text available
In recent years, increased stakeholder pressure to transition research to Open Access has led to many journals converting, or ‘flipping’, from a closed access (CA) to an open access (OA) publishing model. Changing the publishing model can influence the decision of authors to submit their papers to a journal, and increased article accessibility may...
Article
Full-text available
Traditionally, Web of Science and Scopus have been the two most widely used databases for bibliometric analyses. However, during the last few years some new scholarly databases, such as Dimensions, have come up. Several previous studies have compared different databases, either through a direct comparison of article coverage or by comparing the cit...
Preprint
Full-text available
In recent years, increased stakeholder pressure to transition research to Open Access has led to many journals converting, or 'flipping', from a closed access (CA) to an open access (OA) publishing model. Changing the publishing model can influence the decision of authors to submit their papers to a journal, and increased article accessibility may...
Chapter
The Bibliometric-enhanced Information Retrieval (BIR) workshop series at ECIR tackles issues related to academic search, at the intersection of Information Retrieval, Natural Language Processing and Bibliometrics. BIR is a hot topic investigated by both academia and the industry. In this overview paper, we summarize the 11th iteration of the worksh...
Chapter
A variety of schemas and ontologies are currently used for the machine-readable description of bibliographic entities and citations. This diversity, and the reuse of the same ontology terms with different nuances, generates inconsistencies in data. Adoption of a single data model would facilitate data integration tasks regardless of the data suppli...
Preprint
Full-text available
Traditionally, Web of Science and Scopus have been the two most widely used databases for bibliometric analyses. However, during the last few years some new scholarly databases, such as Dimensions, have come up. Several previous studies have compared different databases, either through a direct comparison of article coverage or by comparing the cit...
Chapter
Secondary analysis or the reuse of existing survey data is a common practice among social scientists. Searching for relevant datasets in Digital Libraries is a somewhat unfamiliar behaviour for this community. Dataset retrieval, especially in the social sciences, incorporates additional material such as codebooks, questionnaires, raw data files and...
Preprint
Full-text available
The goal of this workshop is to engage the related communities in open problems in the extraction and evaluation of knowledge entities from scientific documents. This workshop entitles this cutting-edge and cross-disciplinary direction Extraction and Evaluation of Knowledge Entity (EEKE), highlighting the development of intelligent methods for iden...
Preprint
Full-text available
This study investigates the development of open access (OA) to journal articles from authors affiliated with German universities and non-university research institutions in the period 2010-2018. Beyond determining the overall share of openly available articles, a systematic classification of distinct categories of OA publishing allows to identify d...
Preprint
Full-text available
Secondary analysis or the reuse of existing data is a common practice among social scientists. The complexity of datasets, however, exceeds those known from traditional document retrieval. Dataset retrieval, especially in the social sciences, incorporates additional material such as codebooks, questionnaires, raw data files and more. Due to the div...
Article
The Bibliometric-enhanced Information Retrieval workshop series (BIR) was launched at ECIR in 2014 [Mayr et al., 2014] and it was held at ECIR each year since then. This year we organized the 10th iteration of BIR as an all-virtual workshop with a peak of 97 participants. The workshop series at ECIR and JCDL/SIGIR tackles issues related to academic...
Article
Full-text available
ECIR 2020 was one of the many conferences affected by the COVID-19 pandemic. The Conference Chairs decided to keep the initially planned dates (April 14-17, 2020) and move to a fully online event. In this report, we describe the experience of organising the ECIR 2020 Workshops in this scenario from two perspectives: the workshop organisers and th...
Preprint
Full-text available
A variety of schemas and ontologies are currently used for the machine-readable description of bibliographic entities and citations. This diversity, and the reuse of the same ontology terms with different nuances, generates inconsistencies in data. Adoption of a single data model would facilitate data integration tasks regardless of the data suppli...
Preprint
Full-text available
ECIR 2020 (https://ecir2020.org/) was one of the many conferences affected by the COVID-19 pandemic. The Conference Chairs decided to keep the initially planned dates (April 14-17, 2020) and move to a fully online event. In this report, we describe the experience of organizing the ECIR 2020 Workshops in this scenario from two perspectives: the worksh...
Article
A potential motivation for scientists to deposit their scientific work as preprints is to enhance its citation or social impact. In this study we assessed the citation and altmetric advantage of bioRxiv, a preprint server for the biological sciences. We retrieved metadata of all bioRxiv preprints deposited between November 2013 and December 2017, a...
Chapter
The Bibliometric-enhanced Information Retrieval workshop series (BIR) was launched at ECIR in 2014 [19] and it was held at ECIR each year since then. This year we organize the 10th iteration of BIR. The workshop series at ECIR and JCDL/SIGIR tackles issues related to academic search, at the crossroads between Information Retrieval, Natural Language...
Article
Full-text available
Citation metrics have value because they aim to make scientific assessment a level playing field, but urgent transparency-based adjustments are necessary to ensure that measurements yield the most accurate picture of impact and excellence. One problematic area is the handling of self-citations, which are either excluded or inappropriately accounted...
Preprint
The Bibliometric-enhanced Information Retrieval workshop series (BIR) was launched at ECIR in 2014 [Mayr et al., 2014] and it was held at ECIR each year since then. This year we organize the 10th iteration of BIR. The workshop series at ECIR and JCDL/SIGIR tackles issues related to academic search, at the crossroads between Information Retrieval, N...
Preprint
Full-text available
Citation metrics have value because they aim to make scientific assessment a level playing field, but urgent transparency-based adjustments are necessary to ensure that measurements yield the most accurate picture of impact and excellence. One problematic area is the handling of self-citations, which are either excluded or inappropriately accounted...
Article
The 4th joint BIRNDL workshop was held at the 42nd ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019) in Paris, France. BIRNDL 2019 intended to stimulate IR researchers and digital library professionals to elaborate on new approaches in natural language processing, information retrieval, scientometrics, and reco...
Preprint
Full-text available
The popular social media platforms are now making it possible for scholarly articles to be shared rapidly in different forms, which in turn can significantly improve the visibility and reach of articles. Many authors are now utilizing the social media platforms to disseminate their scholarly articles (often as pre- or post- prints) beyond the paywa...
Preprint
The Bibliometric-enhanced Information Retrieval workshop series (BIR) at ECIR tackled issues related to academic search, at the crossroads between Information Retrieval and Bibliometrics. BIR is a hot topic investigated by both academia (e.g., ArnetMiner, CiteSeerx, DocEar) and the industry (e.g., Google Scholar, Microsoft Academic Search, Semantic...
Preprint
Full-text available
Scholarly articles are now increasingly being mentioned and discussed in social media platforms, sometimes even as pre- or post-print version uploads. Measures of social media mentions and coverage are now emerging as an alternative indicator of impact of scholarly articles. This article aims to explore how much scholarly research output from India...
Conference Paper
Full-text available
The popular social media platforms are now making it possible for scholarly articles to be shared rapidly in different forms, which in turn can significantly improve the visibility and reach of articles. Many authors are now utilizing the social media platforms to disseminate their scholarly articles (often as pre-or post-prints) beyond the paywall...
Article
Full-text available
Since the Simple Knowledge Organization System (SKOS) specification and its SKOS eXtension for Labels (SKOS-XL) became formal W3C recommendations in 2009 a significant number of conventional knowledge organization systems (KOS) (including thesauri, classification schemes, name authorities, and lists of codes and terms, produced before the arrival o...
Article
Full-text available
In this paper, we analyze a major part of the research output of the Networked Knowledge Organization Systems (NKOS) community in the period 2000 to 2016 from a network analytical perspective. We focus on the papers presented at the European and U.S. NKOS workshops and in addition four special issues on NKOS in the last 16 years. For this purpose,...
Chapter
In this paper, three different possible inputs (reference strings, reference segments and a combination of reference strings and segments) were tested to find the best performing strategy for citation matching. Our evaluation on a manually curated gold standard showed that the input data consisting of the combination of reference segments and refer...
Conference Paper
The deluge of scholarly publication poses a challenge for scholars to find relevant research and for policy makers to seek in-depth information and understand research impact. Information retrieval (IR), natural language processing (NLP) and bibliometrics could enhance scholarly search, retrieval and user experience, but their use in digital libraries is...
Preprint
Full-text available
In this paper, three different possible inputs (reference strings, reference segments and a combination of reference strings and segments) were tested to find the best performing strategy for citation matching. Our evaluation on a manually curated gold standard showed that the input data consisting of the combination of reference segments and refer...
Preprint
Full-text available
A potential motivation for scientists to deposit their scientific work as preprints is to enhance its citation or social impact, an effect which has been empirically observed for preprints in physics, astronomy and mathematics deposited to arXiv. In this study we assessed the citation and altmetric advantage of bioRxiv, a preprint server for the bi...
Preprint
Full-text available
Citation matching is a challenging task due to different problems such as the variety of citation styles, mistakes in reference strings and the quality of identified reference segments. The classic citation matching configuration used in this paper is the combination of blocking technique and a binary classifier. Three different possible inputs (re...
Conference Paper
Full-text available
This demo paper presents a generic toolchain to extract, segment and match literature references from full-text PDF files in the project EXCITE. The aim of EXCITE is extracting and matching citations from social science publications and making more citation data available to researchers. Each single step in the EXCITE pipeline and the open source t...
Article
The Bibliometric-enhanced Information Retrieval workshop series (BIR) at ECIR tackled issues related to academic search, at the crossroads between Information Retrieval and Bibliometrics. BIR is a hot topic investigated by both academia (e.g., ArnetMiner, CiteSeerX, DocEar) and the industry (e.g., Google Scholar, Microsoft Academic Search, Seman...
Article
Full-text available
The Research Topic on “NLP-enhanced Bibliometrics” aims to promote interdisciplinary research in bibliometrics, Natural Language Processing (NLP) and computational linguistics in order to enhance the ways bibliometrics can benefit from large-scale text analytics and sense mining of papers. The objectives of such research are to provide insights into...
Chapter
The Bibliometric-enhanced Information Retrieval workshop series (BIR) at ECIR tackles issues related to academic search, at the crossroads between Information Retrieval and Bibliometrics. BIR is a hot topic investigated by both academia (e.g., ArnetMiner, CiteSeerX, DocEar) and the industry (e.g., Google Scholar, Microsoft Academic Search,...
Preprint
Full-text available
In recent years, increased stakeholder pressure to transition research to Open Access has led to many journals "flipping" from a toll access to an open access publishing model. Changing the publishing model can influence the decision of authors to submit their papers to a journal, and increased article accessibility may influence citation behaviour...
Preprint
In this article, we describe highly cited publications in a PLOS ONE full-text corpus. For these publications, we analyse the citation contexts concerning their position in the text and their age at the time of citing. By selecting the perspective of highly cited papers, we can distinguish them based on the context during citation even if we do not...
