Philipp Mayr

GESIS - Leibniz-Institute for the Social Sciences | GESIS · Department of Knowledge Technologies for the Social Sciences

PhD

About

272
Publications
37,592
Reads
2,489
Citations
Introduction
My research interests are non-textual ranking in digital libraries, bibliometric methods, evaluation of information systems, and applied informetrics.
Additional affiliations
November 2004 - present
GESIS - Leibniz-Institute for the Social Sciences
Position
  • Group Leader
January 1998 - July 2000
Humboldt-Universität zu Berlin
Position
  • Tutor
Education
January 2005 - December 2008
Humboldt-Universität zu Berlin
Field of study
  • Information Science

Publications (272)
Preprint
BACKGROUND Recent advances in large language models (LLMs) have shown remarkable performance on various downstream tasks in zero- and few-shot scenarios, shedding light on named entity recognition (NER) in low-resource domains. Traditional Chinese medicine (TCM) against COVID-19 has been a new research topic and led to niche research literature. NE...
Article
Full-text available
Evaluation of researchers’ output is vital for hiring committees and funding bodies, and it is usually measured via their scientific productivity, citations, or a combined metric such as the h-index. Assessing young researchers is more critical because it takes a while to accumulate citations and increase the h-index. Hence, predicting the h-index can hel...
Article
Full-text available
Acknowledgments in scientific papers may give an insight into aspects of the scientific community, such as reward systems, collaboration patterns, and hidden research trends. The aim of the paper is to evaluate the performance of different embedding models for the task of automatic extraction and classification of acknowledged entities from the ack...
Preprint
Full-text available
Purpose: The recent proliferation of preprints could be a way for researchers worldwide to increase the availability and visibility of their research findings. Against the background of rising publication costs caused by the increasing prevalence of article processing fees, the search for other ways to publish research results besides traditional j...
Book
The Joint Workshop of the 4th Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE2023; https://eeke-workshop.github.io/) and the 3rd AI + Informetrics (AII2023; https://ai-informetrics.github.io/) was held at Santa Fe, New Mexico, USA and online, co-located with the ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2...
Preprint
Full-text available
Acknowledgments in scientific papers may give an insight into aspects of the scientific community, such as reward systems, collaboration patterns, and hidden research trends. The aim of the paper is to evaluate the performance of different embedding models for the task of automatic extraction and classification of acknowledged entities from the ack...
Preprint
Full-text available
Evaluation of researchers' output is vital for hiring committees and funding bodies, and it is usually measured via their scientific productivity, citations, or a combined metric such as the h-index. Assessing young researchers is more critical because it takes a while to accumulate citations and increase the h-index. Hence, predicting the h-index can hel...
Article
Full-text available
Retrievability measures the influence a retrieval system has on the access to information in a given collection of items. This measure can help evaluate the search system and draw insights from that evaluation. In this paper, we investigate the retrievability in an integrated search system consisting of items from various categories...
Article
Full-text available
In 2014, a union of German research organisations established Projekt DEAL, a national-level project to negotiate licensing agreements with large scientific publishers. Negotiations between DEAL and Elsevier began in 2016, and broke down without a successful agreement in 2018; in this time, around 200 German research institutions cancelled their li...
Preprint
Full-text available
Retrievability measures the influence a retrieval system has on the access to information in a given collection of items. This measure can help evaluate the search system and draw insights from that evaluation. In this paper, we investigate the retrievability in an integrated search system consisting of items from various categories...
Article
Full-text available
Open Access (OA) facilitates access to articles. However, authors or funders often must pay the publishing costs, which prevents authors who do not receive financial support from participating in OA publishing and from benefiting from the citation advantage for OA articles. OA may exacerbate existing inequalities in the publication system rather than overcome them. To investigate t...
Chapter
The 13th iteration of the Bibliometric-enhanced Information Retrieval (BIR) workshop series will take place at ECIR 2023 as a full-day workshop. BIR tackles issues related to, for instance, academic search and recommendation, at the intersection of Information Retrieval, Natural Language Processing, and Bibliometrics. As an interdisciplinary scient...
Article
Full-text available
Analysis of acknowledgments is particularly interesting as acknowledgments may give information not only about funding, but they are also able to reveal hidden contributions to authorship and the researcher’s collaboration patterns, context in which research was conducted, and specific aspects of the academic work. The focus of the present research...
Article
Full-text available
Since 2013, the usage of preprints as a means of sharing research in biology has rapidly grown, in particular via the preprint server bioRxiv. Recent studies have found that journal articles that were previously posted to bioRxiv received a higher number of citations or mentions/shares on other online platforms compared to articles in the same jour...
Preprint
Analysis of acknowledgments is particularly interesting as acknowledgments may give information not only about funding, but they are also able to reveal hidden contributions to authorship and the researcher's collaboration patterns, context in which research was conducted, and specific aspects of the academic work. The focus of the present research...
Conference Paper
Full-text available
The paper outlines the motivation and build-up of the DFG-funded GEOcite project at University of Passau. The project works on a domain-specific approach to automatically extract, segment, match and visualize literature references in the German speaking geography domain with the objective to provide a novel basis for a scientometric monitoring inst...
Preprint
In this paper, we provide an overview of the SV-Ident shared task as part of the 3rd Workshop on Scholarly Document Processing (SDP) at COLING 2022. In the shared task, participants were provided with a sentence and a vocabulary of variables, and asked to identify which variables, if any, are mentioned in individual sentences from scholarly documen...
Preprint
Full-text available
Nowadays there is a growing trend in many scientific disciplines to support researchers by providing enhanced information access through linking of publications and underlying datasets, so as to support research with infrastructure to enhance reproducibility and reusability of research results. In this research note, we present an overview of an on...
Preprint
Open Access (OA) facilitates access to articles. However, authors or funders often must pay the publishing costs, which prevents authors who do not receive financial support from participating in OA publishing and from benefiting from the citation advantage for OA articles. OA may exacerbate existing inequalities in the publication system rather than overcome them. To investigate t...
Preprint
Full-text available
Acknowledgments in scientific papers may give an insight into aspects of the scientific community, such as reward systems, collaboration patterns, and hidden research trends. The aim of the paper is to evaluate the performance of different embedding models for the task of automatic extraction and classification of acknowledged entities from the ack...
Conference Paper
Full-text available
The 3rd Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE 2022) was held online at the ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2022. The goal of this workshop series (https://eekeworkshop.github.io/) is to engage the related communities in open problems in the extraction and evaluation of know...
Preprint
Full-text available
In this paper, we investigate the retrievability of datasets and publications in a real-life Digital Library (DL). The measure of retrievability was originally developed to quantify the influence that a retrieval system has on the access to information. Retrievability can also enable DL engineers to evaluate their search engine to determine the eas...
Article
Full-text available
International mobility in academia can enhance the human and social capital of researchers and consequently their scientific outcome. However, there is still a very limited understanding of the different mobility patterns among scholars with various socio-demographic characteristics. By studying these differences, we can detect inequalities in acce...
Chapter
The 12th iteration of the Bibliometric-enhanced Information Retrieval (BIR) workshop series is a full-day ECIR 2022 workshop. BIR tackles issues related to, for instance, academic search and recommendation, at the intersection of Information Retrieval, Natural Language Processing, and Bibliometrics. As an interdisciplinary scientific event, BIR bri...
Preprint
Full-text available
International mobility in academia can enhance the human and social capital of researchers and consequently their scientific outcome. However, there is still a very limited understanding of the different mobility patterns among scholars with various socio-demographic characteristics. The aim of this study is twofold. First, we investigate to what e...
Preprint
Full-text available
Since 2013, the usage of preprints as a means of sharing research in biology has rapidly grown, in particular via the preprint server bioRxiv. Recent studies have found that journal articles that were previously posted to bioRxiv received a higher number of citations or mentions/shares on other online platforms compared to articles in the same jour...
Preprint
Full-text available
Automatic processing of bibliographic data has become very important in digital libraries, data science and machine learning, on the one hand because it helps keep pace with the significant yearly increase in published papers, and on the other because of its inherent challenges. This processing has several aspects including but not limited t...
Preprint
Full-text available
In this demo paper, we present ConSTR, a novel Contextual Search Term Recommender that utilises the user's interaction context for search term recommendation and literature retrieval. ConSTR integrates a two-layered recommendation interface: the first layer suggests terms with respect to a user's current search term, and the second layer suggests t...
Article
The 11th Bibliometric-enhanced Information Retrieval Workshop (BIR 2021) was held online on April 1st, 2021, at ECIR 2021 as a virtual event. The interdisciplinary BIR workshop series aims to bring together researchers from different communities, especially Scientometrics/Bibliometrics and Information Retrieval. We report on the 11th BIR, its invit...
Preprint
Full-text available
In 2014, a union of German research organisations established Projekt DEAL, a national-level project to negotiate licensing agreements with large scientific publishers. Negotiations between DEAL and Elsevier began in 2016, and broke down without a successful agreement in 2018; in this time, around 200 German research institutions cancelled their li...
Article
Full-text available
This study investigates the development of open access (OA) to journal articles from authors affiliated with German universities and non-university research institutions in the period 2010–2018. Beyond determining the overall share of openly available articles, a systematic classification of distinct categories of OA publishing allowed us to identi...
Article
Full-text available
In recent years, increased stakeholder pressure to transition research to Open Access has led to many journals converting, or ‘flipping’, from a closed access (CA) to an open access (OA) publishing model. Changing the publishing model can influence the decision of authors to submit their papers to a journal, and increased article accessibility may...
Article
Full-text available
Traditionally, Web of Science and Scopus have been the two most widely used databases for bibliometric analyses. However, during the last few years some new scholarly databases, such as Dimensions, have come up. Several previous studies have compared different databases, either through a direct comparison of article coverage or by comparing the cit...
Preprint
Full-text available
In recent years, increased stakeholder pressure to transition research to Open Access has led to many journals converting, or 'flipping', from a closed access (CA) to an open access (OA) publishing model. Changing the publishing model can influence the decision of authors to submit their papers to a journal, and increased article accessibility may...
Chapter
The Bibliometric-enhanced Information Retrieval (BIR) workshop series at ECIR tackles issues related to academic search, at the intersection of Information Retrieval, Natural Language Processing and Bibliometrics. BIR is a hot topic investigated by both academia and the industry. In this overview paper, we summarize the 11th iteration of the worksh...
Chapter
A variety of schemas and ontologies are currently used for the machine-readable description of bibliographic entities and citations. This diversity, and the reuse of the same ontology terms with different nuances, generates inconsistencies in data. Adoption of a single data model would facilitate data integration tasks regardless of the data suppli...
Preprint
Traditionally, Web of Science and Scopus have been the two most widely used databases for bibliometric analyses. However, during the last few years some new scholarly databases, such as Dimensions, have come up. Several previous studies have compared different databases, either through a direct comparison of article coverage or by comparing the cit...
Chapter
Secondary analysis or the reuse of existing survey data is a common practice among social scientists. Searching for relevant datasets in Digital Libraries is a somewhat unfamiliar behaviour for this community. Dataset retrieval, especially in the social sciences, incorporates additional material such as codebooks, questionnaires, raw data files and...
Preprint
Full-text available
The goal of this workshop is to engage the related communities in open problems in the extraction and evaluation of knowledge entities from scientific documents. This workshop entitles this cutting-edge and cross-disciplinary direction Extraction and Evaluation of Knowledge Entity (EEKE), highlighting the development of intelligent methods for iden...
Preprint
Full-text available
This study investigates the development of open access (OA) to journal articles from authors affiliated with German universities and non-university research institutions in the period 2010-2018. Beyond determining the overall share of openly available articles, a systematic classification of distinct categories of OA publishing allows us to identify d...
Preprint
Full-text available
Secondary analysis or the reuse of existing data is a common practice among social scientists. The complexity of datasets, however, exceeds those known from traditional document retrieval. Dataset retrieval, especially in the social sciences, incorporates additional material such as codebooks, questionnaires, raw data files and more. Due to the div...
Article
The Bibliometric-enhanced Information Retrieval workshop series (BIR) was launched at ECIR in 2014 [Mayr et al., 2014] and has been held at ECIR each year since then. This year we organized the 10th iteration of BIR as an all-virtual workshop with a peak of 97 participants. The workshop series at ECIR and JCDL/SIGIR tackles issues related to academic...
Article
Full-text available
ECIR 2020 was one of the many conferences affected by the COVID-19 pandemic. The Conference Chairs decided to keep the initially planned dates (April 14-17, 2020) and move to a fully online event. In this report, we describe the experience of organising the ECIR 2020 Workshops in this scenario from two perspectives: the workshop organisers and th...
Preprint
Full-text available
A variety of schemas and ontologies are currently used for the machine-readable description of bibliographic entities and citations. This diversity, and the reuse of the same ontology terms with different nuances, generates inconsistencies in data. Adoption of a single data model would facilitate data integration tasks regardless of the data suppli...
Preprint
Full-text available
ECIR 2020 (https://ecir2020.org/) was one of the many conferences affected by the COVID-19 pandemic. The Conference Chairs decided to keep the initially planned dates (April 14-17, 2020) and move to a fully online event. In this report, we describe the experience of organizing the ECIR 2020 Workshops in this scenario from two perspectives: the worksh...
Article
A potential motivation for scientists to deposit their scientific work as preprints is to enhance its citation or social impact. In this study we assessed the citation and altmetric advantage of bioRxiv, a preprint server for the biological sciences. We retrieved metadata of all bioRxiv preprints deposited between November 2013 and December 2017, a...
Chapter
The Bibliometric-enhanced Information Retrieval workshop series (BIR) was launched at ECIR in 2014 [19] and has been held at ECIR each year since then. This year we organize the 10th iteration of BIR. The workshop series at ECIR and JCDL/SIGIR tackles issues related to academic search, at the crossroads between Information Retrieval, Natural Language...
Article
Full-text available
Citation metrics have value because they aim to make scientific assessment a level playing field, but urgent transparency-based adjustments are necessary to ensure that measurements yield the most accurate picture of impact and excellence. One problematic area is the handling of self-citations, which are either excluded or inappropriately accounted...
Preprint
The Bibliometric-enhanced Information Retrieval workshop series (BIR) was launched at ECIR in 2014 [Mayr et al., 2014] and has been held at ECIR each year since then. This year we organize the 10th iteration of BIR. The workshop series at ECIR and JCDL/SIGIR tackles issues related to academic search, at the crossroads between Information Retrieval, N...
Preprint
Full-text available
Citation metrics have value because they aim to make scientific assessment a level playing field, but urgent transparency-based adjustments are necessary to ensure that measurements yield the most accurate picture of impact and excellence. One problematic area is the handling of self-citations, which are either excluded or inappropriately accounted...
Article
The 4th joint BIRNDL workshop was held at the 42nd ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019) in Paris, France. BIRNDL 2019 intended to stimulate IR researchers and digital library professionals to elaborate on new approaches in natural language processing, information retrieval, scientometrics, and reco...
Preprint
Full-text available
The popular social media platforms are now making it possible for scholarly articles to be shared rapidly in different forms, which in turn can significantly improve the visibility and reach of articles. Many authors are now utilizing the social media platforms to disseminate their scholarly articles (often as pre- or post-prints) beyond the paywa...
Preprint
The Bibliometric-enhanced Information Retrieval workshop series (BIR) at ECIR tackled issues related to academic search, at the crossroads between Information Retrieval and Bibliometrics. BIR is a hot topic investigated by both academia (e.g., ArnetMiner, CiteSeerx, DocEar) and the industry (e.g., Google Scholar, Microsoft Academic Search, Semantic...
Preprint
Full-text available
Scholarly articles are now increasingly being mentioned and discussed in social media platforms, sometimes even as pre- or post-print version uploads. Measures of social media mentions and coverage are now emerging as an alternative indicator of impact of scholarly articles. This article aims to explore how much scholarly research output from India...
Conference Paper
Full-text available
The popular social media platforms are now making it possible for scholarly articles to be shared rapidly in different forms, which in turn can significantly improve the visibility and reach of articles. Many authors are now utilizing the social media platforms to disseminate their scholarly articles (often as pre- or post-prints) beyond the paywall...