Peter Murray-Rust’s research while affiliated with University of Cambridge and other places

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (209)


Image credit: DEI.
Concept for a federated scholarly information network. A federated network of institutional repositories constitutes the underlying infrastructure. Ideally, this infrastructure is designed redundantly, such that large fractions of nodes may go offline and the remaining nodes still provide 100% of the content. Users only directly interact with the output and narrative layers. The output layer contains all research objects, text, data and code. The narrative layer combines research objects in various forms, including research articles. The community layer encompasses the social technologies we are referring to in this article. See also our companion publication [6]. Modified from [49].
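
The redundancy claim in this caption can be made concrete with a toy model: if each research object is replicated on r independently chosen repository nodes, the chance that it disappears when a fraction f of nodes goes offline is roughly f^r. A minimal simulation sketch follows; all parameter values are illustrative assumptions, not figures from the paper.

import random

def availability(num_nodes=100, replicas=3, offline_fraction=0.5,
                 num_items=10_000, seed=0):
    """Estimate the fraction of content items still reachable when a given
    fraction of repository nodes goes offline, assuming each item is
    replicated on `replicas` randomly chosen nodes."""
    rng = random.Random(seed)
    nodes = list(range(num_nodes))
    offline = set(rng.sample(nodes, int(num_nodes * offline_fraction)))
    still_available = 0
    for _ in range(num_items):
        holders = rng.sample(nodes, replicas)
        if any(n not in offline for n in holders):
            still_available += 1
    return still_available / num_items

if __name__ == "__main__":
    # With 3 replicas per item, even half the network going dark leaves
    # most items reachable (roughly 1 - 0.5**3 = 87.5%).
    print(f"available fraction: {availability():.3f}")
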
Mastodon over Mammon: towards publicly owned scholarly knowledge
  • Literature Review
  • Full-text available

July 2023 · 153 Reads · 10 Citations · Peter Murray-Rust · [...]

Twitter is in turmoil and the scholarly community on the platform is once again starting to migrate. As with the early internet, scholarly organizations are at the forefront of developing and implementing a decentralized alternative to Twitter, Mastodon. Both historically and conceptually, this is not a new situation for the scholarly community. Historically, scholars were forced to leave the social media platform FriendFeed after it was bought by Facebook in 2009. Conceptually, the problems associated with public scholarly discourse subjected to the whims of corporate owners are not unlike those of scholarly journals owned by monopolistic corporations: in both cases the perils associated with a public good in private hands are palpable. For both short form (Twitter/Mastodon) and longer form (journals) scholarly discourse, decentralized solutions exist, some of which are already enjoying some institutional support. Here we argue that scholarly organizations, in particular learned societies, are now facing a golden opportunity to rethink their hesitations towards such alternatives and support the migration of the scholarly community from Twitter to Mastodon by hosting Mastodon instances. Demonstrating that the scholarly community is capable of creating a truly public square for scholarly discourse, impervious to private takeover, might renew confidence and inspire the community to focus on analogous solutions for the remaining scholarly record—encompassing text, data and code—to safeguard all publicly owned scholarly knowledge.
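
As a hedged illustration of the decentralized infrastructure the authors advocate, the sketch below polls the public timeline of a Mastodon instance through Mastodon's standard REST API (the /api/v1/timelines/public endpoint generally requires no authentication for public posts). The instance URL is a placeholder, not a server mentioned in the paper.

import requests

# Hypothetical scholarly Mastodon instance; substitute a real server.
INSTANCE = "https://mastodon.example-learned-society.org"

def fetch_public_timeline(limit=5):
    """Fetch recent public posts from a Mastodon instance via its REST API."""
    resp = requests.get(
        f"{INSTANCE}/api/v1/timelines/public",
        params={"limit": limit},
        timeout=10,
    )
    resp.raise_for_status()
    for status in resp.json():
        account = status["account"]["acct"]
        created = status["created_at"]
        print(f"{created}  @{account}: {status['url']}")

if __name__ == "__main__":
    fetch_public_timeline()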


Mining the literature for ethics statements: A step towards standardizing research ethics

December 2022 · 466 Reads · 1 Citation

Ethical aspects of research continue to gain attention, be that in the process of proposing and planning research or performing, documenting or publishing it. One of the ways in which this trend manifests itself is the increasingly common addition of ethics statements to publications in fields like biomedicine, psychology or ethnography. Such ethics statements in publications provide the reader with a window into some of the practical yet typically hidden aspects of research ethics. As more and more publications are becoming available in full text and in machine readable formats through repositories like Europe PubMed Central, we propose to mine the literature for ethics statements and to extract information about the various aspects of research ethics that they address. The more standardized these statements are, the better the mined materials can be converted into structured and queryable information that can in turn be used to inform efforts towards higher levels of standardization in research ethics. This paper sketches out the motivation for such mining and outlines some methodological approaches that could be leveraged towards this end.
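
A minimal sketch of the kind of retrieval step the abstract proposes, using Europe PMC's public REST API to find open-access articles and fetch their full-text XML. The query terms and the keyword filter are illustrative assumptions, not the authors' actual pipeline.

import requests

EPMC = "https://www.ebi.ac.uk/europepmc/webservices/rest"

def search_open_access(query, page_size=25):
    """Search Europe PMC for open-access records matching the query."""
    resp = requests.get(
        f"{EPMC}/search",
        params={"query": f"({query}) AND OPEN_ACCESS:Y",
                "format": "json",
                "pageSize": page_size},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["resultList"]["result"]

def full_text_xml(pmcid):
    """Fetch the full-text XML of an open-access article by its PMC id."""
    resp = requests.get(f"{EPMC}/{pmcid}/fullTextXML", timeout=30)
    resp.raise_for_status()
    return resp.text

if __name__ == "__main__":
    # Illustrative query; real work would use a curated set of terms.
    hits = search_open_access('"ethics statement" OR "ethical approval"', page_size=5)
    for hit in hits:
        pmcid = hit.get("pmcid")
        print(hit.get("title"), pmcid)
        if pmcid and "ethic" in full_text_xml(pmcid).lower():
            print("  -> full text mentions ethics; keep for statement extraction")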


Figure 1. Ethics Statement from Cui et al. (2021) with putative markup of some key elements. Colors indicate the legal basis (pink), some boilerplate language pertaining to ethical review, approval and permissions (purple), oversight body (yellow) and approval number (green) as well as the aspect of the research that triggered the need for ethical oversight (grey).
Mining the literature for ethics statements: a step towards standardizing research ethics

September 2022 · 23 Reads · 1 Citation

Ethical aspects of research are steadily receiving more attention, from descriptions of proposed research to documentation of ongoing work to reports on research already performed. One of the ways in which this trend manifests itself is the increasingly common addition of ethics statements to publications in fields like biomedicine, psychology or ethnography. Such ethics statements in publications provide the reader with a window into some of the practical yet typically hidden aspects of research ethics. As more and more publications are becoming available in full text and in machine readable formats through repositories like Europe PubMed Central, we propose to mine the literature for ethics statements and to extract information about the various aspects of research ethics that they address. The more standardized these statements are, the better the mined materials can be converted into structured and queryable information that can in turn be used to inform efforts towards higher levels of standardization in research ethics. This paper sketches out the motivation for such mining and outlines some methodological approaches that could be leveraged towards this end.
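
To make the idea of converting ethics statements into structured, queryable information more concrete, here is a small rule-based sketch that pulls out a few of the elements highlighted in Figure 1 (oversight body, approval number, legal basis, consent). The example text and regular expressions are illustrative assumptions; a production system would need far more robust patterns or a trained model.

import re

# Illustrative example text, loosely modelled on the kind of ethics
# statement shown in Figure 1; not a quotation from any paper.
STATEMENT = (
    "This study was approved by the Institutional Review Board of "
    "Example University (approval number 2021-0042) and was conducted "
    "in accordance with the Declaration of Helsinki. Written informed "
    "consent was obtained from all participants."
)

# Hand-written patterns for a few of the elements highlighted in Figure 1.
PATTERNS = {
    "oversight_body": re.compile(
        r"(Institutional Review Board[^(.,]*|Ethics Committee[^(.,]*)"),
    "approval_number": re.compile(
        r"approval (?:number|no\.?)\s*([A-Za-z0-9/-]+)", re.IGNORECASE),
    "legal_basis": re.compile(
        r"(Declaration of Helsinki|GDPR|Common Rule)"),
    "consent": re.compile(
        r"(written informed consent|informed consent)", re.IGNORECASE),
}

def extract_elements(text):
    """Return a dict of matched ethics-statement elements (or None)."""
    out = {}
    for name, pattern in PATTERNS.items():
        m = pattern.search(text)
        out[name] = m.group(1).strip() if m else None
    return out

if __name__ == "__main__":
    for key, value in extract_elements(STATEMENT).items():
        print(f"{key}: {value}")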


Unpacking IPCC and IPBES Reports

September 2022 · 57 Reads

Biodiversity Information Science and Standards

Humanity is facing a set of existential challenges, including the handling of the parallel and interconnected crises of climate change and biodiversity loss. In an effort to address these challenges, international bodies like the Intergovernmental Panel on Climate Change (IPCC) and the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES) have been created. These bodies are producing a series of reports that compile scientific information on specific aspects of climate and biodiversity research and discuss policy options and other social implications. So far, these reports have been provided in formats that make it hard to mobilize the knowledge encapsulated in them. Our contribution demonstrates technical workflows for achieving such mobilization by mining the reports. Development of these workflows is spearheaded by young researchers from the SemanticClimate team of interns based at the National Institute of Plant Genome Research (NIPGR) in New Delhi, India, and volunteers from all over the world. The tool chain includes methods for cleaning up the formatting, for extracting and processing raw text, tables and figures and annotating them semantically with the help of controlled vocabularies, ontologies and Wikidata. The semantic information and the mined information can then be combined in a way that iteratively improves both, eventually resulting in versions of the reports wherein entities like species, countries or references are semantically marked up and rendered in responsive formats. Our framework supports multilingual and specialist interests (e.g., endangered species, plant chemistry), and we will also briefly discuss how the use of standard open licensing could further contribute to mobilizing information from the reports. We are keen to work with other groups sharing these interests as well as with the teams involved in producing the reports.
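
The semantic annotation step described above can be sketched with a call to Wikidata's public search API, which maps a mined term (a species, country or chemical) to a Wikidata item id. This is a generic illustration, not the SemanticClimate toolchain itself; the example terms are assumptions.

import requests

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def link_to_wikidata(term, language="en"):
    """Map a mined entity mention (species, country, chemical, ...) to the
    best-matching Wikidata item, if any, using the wbsearchentities API."""
    resp = requests.get(
        WIKIDATA_API,
        params={"action": "wbsearchentities", "search": term,
                "language": language, "type": "item",
                "limit": 1, "format": "json"},
        timeout=15,
    )
    resp.raise_for_status()
    hits = resp.json().get("search", [])
    if not hits:
        return None
    top = hits[0]
    return {"term": term, "qid": top["id"],
            "label": top.get("label"),
            "description": top.get("description")}

if __name__ == "__main__":
    # Entities of the kind an IPCC or IPBES chapter might mention.
    for term in ["Panthera tigris", "Bangladesh", "methane"]:
        print(link_to_wikidata(term))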




Figure 6. The consensus supertree produced from an analysis of 924 source trees from the journal IJSEM. 
A machine-compiled microbial supertree from figure-mining thousands of papers

May 2017 · 97 Reads · 6 Citations

Background: There is a huge diversity of microbial taxa, the majority of which have yet to be fully characterized or described. Plant, animal and fungal taxa are formally named and described in numerous vehicles. For prokaryotes, by contrast, all new validly described taxa appear in just one repository: the International Journal of Systematic and Evolutionary Microbiology (IJSEM). This is the official journal of record for bacterial names of the International Committee on Systematics of Prokaryotes (ICSP) of the International Union of Microbiological Societies (IUMS). It also covers the systematics of yeasts. This makes IJSEM an excellent candidate against which to test systems for the automated and semi-automated synthesis of published phylogenies.

New information: In this paper we apply computer vision techniques to automatically convert phylogenetic tree figure images from IJSEM back into re-usable, computable phylogenetic data in the form of Newick strings and NEXML. Furthermore, we go on to use the extracted phylogenetic data to compute a formal phylogenetic MRP supertree synthesis, and we compare this to previous hypotheses of taxon relationships given by NCBI's standard taxonomy tree. This is the world's first attempt at automated supertree construction using data exclusively extracted by machines from published figure images. Additionally, we reflect on how recent changes to UK copyright law have enabled this project to go ahead without requiring permission from copyright holders, and on the related challenges and limitations of doing research on copyright-restricted material.
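
The matrix representation with parsimony (MRP) step can be illustrated with a small self-contained sketch: every non-trivial clade of every source tree becomes a binary character, with taxa absent from that tree coded as '?'. The toy Newick strings below are assumptions for illustration; in the paper the source trees were extracted from IJSEM figure images.

def parse_newick(newick):
    """Very small Newick parser: returns a nested tuple of leaf labels.
    Ignores branch lengths and internal node labels; not a full parser."""
    s = newick.strip().rstrip(";")
    pos = 0

    def parse():
        nonlocal pos
        if s[pos] == "(":
            pos += 1                      # consume '('
            children = [parse()]
            while s[pos] == ",":
                pos += 1
                children.append(parse())
            assert s[pos] == ")"
            pos += 1                      # consume ')'
            while pos < len(s) and s[pos] not in ",()":
                pos += 1                  # skip internal label / branch length
            return tuple(children)
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos].split(":")[0]  # leaf label, branch length dropped

    return parse()

def clades(tree):
    """Return (leaf set, list of clades) for a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}, []
    leaves, found = set(), []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found += child_clades
    found.append(frozenset(leaves))
    return leaves, found

def mrp_matrix(newicks):
    """Build an MRP matrix: one binary character per non-trivial clade,
    '?' for taxa that do not occur in the source tree at all."""
    trees = [clades(parse_newick(n)) for n in newicks]
    all_taxa = sorted(set().union(*(leaves for leaves, _ in trees)))
    columns = []
    for leaves, tree_clades in trees:
        for clade in tree_clades:
            if 1 < len(clade) < len(leaves):          # skip trivial clades
                columns.append({t: "1" if t in clade
                                else "0" if t in leaves else "?"
                                for t in all_taxa})
    return all_taxa, columns

if __name__ == "__main__":
    # Two toy source trees; real inputs would be the Newick strings
    # recovered from the mined figure images.
    taxa, cols = mrp_matrix(["((A,B),C);", "((B,C),D);"])
    for t in taxa:
        print(t.ljust(4), "".join(col[t] for col in cols))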


A day in the life of: a content miner and team

July 2016 · 39 Reads · 1 Citation

Insights: the UKSG journal

It’s tough for Peter getting out of bed today – yesterday he travelled to Brussels and back, fighting for ‘The Right to Read is the Right to Mine’ (R2RR2M). Content mining – also known as text and data mining (TDM) – is a hot topic in Europe. It’s got huge promise, with two million scholarly publications a year and so much data that we can’t take it all in – 5,000 papers a day (and grey literature, and theses, and…). So we must have machines to help.




Citations (71)


... Mastodon is a decentralized microblogging social medium which operates similarly to X (formerly known as Twitter) and to web or email servers (Brembs et al., 2023). Many Mastodon users have decided to choose Linux as an alternative operating system ...

Reference:

Optimizing Sentiment Analysis on the Linux Desktop Using N-Gram Features
Mastodon over Mammon: towards publicly owned scholarly knowledge

... In the second part, the poster explores what the benefits and risks would be of making more use of FAIR Digital Objects in research ethics workflows (Hegde et al. 2022). The components considered include the circumstances suggesting or even requiring an ethical review, the types of information that need to be exchanged during the process, the types of communications set up to convey said information, the stakeholders involved in any part of the process, the ways in which metadata about the process is stored and shared, and rules that govern any of these aspects and related matters. ...

Mining the literature for ethics statements: a step towards standardizing research ethics

... Furthermore, there is merit in the idea of making Jupyter notebooks or similar environments for combining computational and narrative elements a publication type of their own. This is already the case in some places, as examplified by [101] or [102] in the Journal of Open Source Software. ...

pygetpapers: a Python library for automated retrieval of scientific literature

The Journal of Open Source Software

... In October 2022, the German Society of Sport Science (dvs) set up an ad hoc committee "Research Data Management" (RDM) with the aim, among others, of developing RDM guidelines for the German sports science community. Setting up this committee came at a time when FAIR and open research data management practices (see Wilkinson et al., 2016; Murray-Rust, 2008 for definitions, respectively) became increasingly acknowledged by the (sports) science communities (e.g. Caldwell et al., 2020), and was accompanied by recent developments within the German sports science community, e.g. the release of the sports science research data repository MO|RE data (Klemm et al., 2024a), highlighting the potential of sustainable sports science RDM (Krüger, Biniossek, Stocker, & Betz, 2023). ...

Open Data in Science

Nature Precedings

... In contrast, the UK and Japan have explicit exceptions to copyright for content mining, and EU member states have to at least permit copying for noncommercial research or private studies [53, 54]. Several resources besides the major scientific publishers are available for data mining. Most available data sources also provide automated access through an API, simplifying data collection. ...

Responsible Content Mining
  • Citing Chapter
  • January 2016

... Automated (software-based) transcription tools have been developed, although their failure rate is reported to be high (Stoltzfus et al., 2012), with the few successful applications being on sets of tree figures from certain journals that have policies for standardized plotting (Mounce et al., 2017). With recent advances in machine learning we assume that the various requirements will soon converge, enabling development of sophisticated and robust transcription solutions; in the meantime, manual transcription is a viable option for smaller phylogenies. ...

A machine-compiled microbial supertree from figure-mining thousands of papers

... In order to avoid multiple inheritance, some researchers took the approach of developing ontologies using single inheritance and reorganizing them by reasoning with a DL reasoner [11]. This corresponds to the reorganization of the is-a hierarchy based on a transcriptional hierarchy in step 4 of the proposed method. ...

ChemAxiom – An Ontological Framework for Chemistry in Science
  • Citing Article
  • September 2009

Nature Precedings

... The authors utilized Latent Dirichlet Allocation (LDA) "to identify latent topics diachronically and to identify representative dissertations of those topics". Morgan et al. (2008) used "OSCAR3, an Open Source chemistry text-mining tool, to parse and extract data from theses in PDF, and from theses in Office Open XML document format". Brook et al. (2014) emphasized that the "main barriers against the uptake of TDM were not technical, but, primarily a lack of awareness among the academics, and a skills gap". ...

Extracting and re-using research data from chemistry e-theses: the SPECTRa-T project
  • Citing Article
  • June 2008

... LDT's background included exposure to writing studies education as well as training in professional aspects of biomedical writing for expert audiences [26]. Unlike the type of training suggested in work like the ACS Style Guide [27] or Write Like a Chemist [3], this perspective is multidisciplinary and therefore necessarily more flexible because chemistry is only one of the sciences represented. Furthermore, biomedical writing is also intended to reach various audiences outside the sciences, such as patients and caregivers. ...

The ACS style guide: Effective communication of scientific information
  • Citing Article
  • January 2006

ACS Symposium Series

... Making chemical databases more FAIR (findable, accessible, interoperable, and reusable) and open-licensed benefits both computational chemistry and cheminformatics [1,2]. The discussion of the need for FAIR and Open in chemistry has been ongoing for some time now [3,4]. ...

Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on

Journal of Cheminformatics