Silvio Peroni

University of Bologna | UNIBO · Department of Computer Science and Engineering DISI

Ph.D.

About

179
Publications
53,684
Reads
2,364
Citations
Introduction
Silvio Peroni holds a Ph.D. degree in Computer Science and he is a researcher at the University of Bologna. His research interests include:
  • Semantic Publishing
  • Semantic Web technologies
  • interfaces for semantic data
  • design patterns for digital documents and ontology modelling
  • markup languages for complex documents
  • automatic processes of analysis and segmentation of documents
Additional affiliations
April 2013 - present
Italian National Research Council
Position
  • Consultant
January 2012 - October 2015
University of Bologna
Position
  • PostDoc Position
November 2011 - October 2012
University of Oxford
Position
  • Consultant
Education
June 2010 - December 2010
University of Oxford
Field of study
  • Computer Science
January 2009 - May 2012
University of Bologna
Field of study
  • Computer Science
April 2008 - September 2008
The Open University (UK)
Field of study
  • Computer Science

Publications

Publications (179)
Article
In this paper we introduce the Publishing Workflow Ontology (PWO), i.e., an OWL 2 DL ontology for the description of workflows that is particularly suitable for formalising typical publishing processes such as the publication of articles in journals. We support the presentation with a discussion of all the ontology design patterns that have been re...
Article
Full-text available
In this paper we investigate whether it is possible to create a computational approach that allows us to distinguish topical tags (i.e. talking about the topic of a resource) and non-topical tags (i.e. describing aspects of a resource that are not related to its topic) in folksonomies, in a way that correlates with humans. Towards this goal, we col...
Article
Full-text available
Purpose. Citation data needs to be recognized as a part of the Commons – those works that are freely and legally available for sharing – and placed in an open repository. Design/methodology/approach. The Open Citation Corpus is a new open repository of scholarly citation data, made available under a Creative Commons CC0 1.0 public domain dedication...
Article
Full-text available
Ontologies are knowledge constructs essential for creation of the Web of Data. Good documentation is required to permit people to understand ontologies and thus employ them correctly, but this is costly to create by traditional authorship methods, and is thus inefficient to create in this way until an ontology has matured into a stable structure. We...
Article
Semantic publishing is the use of Web and Semantic Web technologies to enhance the meaning of a published journal article, to facilitate its automated discovery, to enable its linking to semantically related articles, to provide access to data within the article in actionable form, and to facilitate integration of data between articles. Recently, s...
Preprint
Full-text available
Our goal was to obtain the digital twin of the temporary exhibition "The Other Renaissance: Ulisse Aldrovandi and the Wonders of the World", to make it accessible online to users using various devices (from smartphones to VR headsets). We started with a preliminary assessment of the exhibition, focussing on possible acquisition constraints - time,...
Preprint
Full-text available
The aim of this work is to understand the retraction phenomenon in the arts and humanities domain through an analysis of the retraction notices: formal documents stating and describing the retraction of a particular publication. The retractions and the corresponding notices are identified using the data provided by Retraction Watch. Our methodology...
Preprint
Full-text available
The paper introduces a tool prototype that combines SHACL's capabilities with ad-hoc validation functions to create a controlled and user-friendly form interface for producing valid RDF data. The proposed tool is developed within the context of the OpenCitations Data Model (OCDM) use case. The paper discusses the current status of the tool, outline...
Preprint
Full-text available
OpenCitations Meta is a new database that contains bibliographic metadata of scholarly publications involved in citations indexed by the OpenCitations infrastructure. It adheres to Open Science principles and provides data under a CC0 license for maximum reuse. The data can be accessed through a SPARQL endpoint, REST APIs, and dumps. OpenCitations...
Preprint
Full-text available
The data within collections from all Digital Humanities fields must be trustworthy. To this end, both provenance and change-tracking systems are needed. This contribution offers a systematic review of the metadata representation models for provenance in RDF, focusing on the problem of modelling conjectures in humanistic data.
Preprint
Full-text available
The work presented in this paper is twofold. On the one hand, we aim to define the concept of semantic artefact catalogue (SAC) by overviewing various definitions used to clarify the meaning of our target of observation, including the meaning of the focal item: semantic artefacts. On the other hand, we aim to identify metrics and dimensions that ca...
Article
Full-text available
In this work, we investigate existing citation practices by analysing a huge set of articles published in journals to measure which metadata are used across the various scholarly disciplines, independently of the particular citation style adopted, for defining bibliographic references. We selected the most cited journals in each of the 27 subject...
Article
Full-text available
In the past, several works have investigated ways for combining quantitative and qualitative methods in research assessment exercises. Indeed, the Italian National Scientific Qualification (NSQ), i.e. the national assessment exercise which aims at deciding whether a scholar can apply to professorial academic positions as Associate Professor and Ful...
Article
Full-text available
In this article, we show and discuss the results of a quantitative and qualitative analysis of open citations to retracted publications in the humanities domain. Our study was conducted by selecting retracted papers in the humanities domain and marking their main characteristics (e.g., retraction reason). Then, we gathered the citing entities and a...
Preprint
Full-text available
This article introduces a methodology to perform live time-traversal queries on RDF datasets and software based on this procedure. It offers a solution to manage the provenance and change-tracking of entities described using RDF. Although these two aspects are crucial factors in ensuring verifiability and trust, some of the most prominent knowledge...
Chapter
This paper presents a methodology for designing a containerized and distributed open science infrastructure to simplify its reusability, replicability, and portability in different environments. The methodology is depicted in a step-by-step schema based on four main phases: (1) Analysis, (2) Design, (3) Definition, and (4) Managing and provisioning...
Article
Full-text available
The importance of open bibliographic repositories is widely accepted by the scientific community. For evaluation processes, however, there is still some skepticism: even if large repositories of open access articles and free publication indexes exist and are continuously growing, assessment procedures still rely on proprietary databases, mainly due...
Article
Full-text available
OpenCitations is an independent not-for-profit infrastructure organization for open scholarship dedicated to the publication of open bibliographic and citation data by the use of Semantic Web (Linked Data) technologies. OpenCitations collaborates with projects that are part of the Open Science ecosystem and complies with the UNESCO founding princip...
Article
Full-text available
Scholarly data is growing continuously, containing information about articles from a plethora of venues including conferences, journals, etc. Many initiatives have been taken to make scholarly data available in the form of Knowledge Graphs (KGs). These efforts to standardize these data and make them accessible have also led to many challenges su...
Article
Full-text available
Citation indexes are by now part of the research infrastructure in use by most scientists: a necessary tool in order to cope with the increasing amounts of scientific literature being published. Commercial citation indexes are designed for the sciences and have uneven coverage and unsatisfactory characteristics for humanities scholars, while no com...
Preprint
Full-text available
OpenCitations is an independent not-for-profit infrastructure organization for open scholarship dedicated to the publication of open bibliographic and citation data by the use of Semantic Web (Linked Data) technologies. OpenCitations collaborates with projects that are part of the Open Science ecosystem and complies with the UNESCO founding princip...
Article
Full-text available
This work aims to identify classes of DOI mistakes by analysing the open bibliographic metadata available in Crossref, highlighting which publishers were responsible for such mistakes and how many of these incorrect DOIs could be corrected through automatic processes. By using a list of invalid cited DOIs gathered by OpenCitations while processing...
Preprint
Full-text available
This paper presents a methodology for designing a containerized and distributed open science infrastructure to simplify its reusability, replicability, and portability in different environments. The methodology is depicted in a step-by-step schema based on four main phases: (1) Analysis, (2) Design, (3) Definition, and (4) Managing and provisioning...
Article
Full-text available
This study presents the results of an experiment we performed to measure the coverage of Digital Humanities (DH) publications in mainstream open and proprietary bibliographic data sources, by further highlighting the relations among DH and other disciplines. We created a list of DH journals based on manual curation and bibliometric data. We used th...
Article
Digital archives of memory institutions are typically concerned with the cataloguing of artefacts of artistic, historical, and cultural value. Recently, new forms of citizen participation in cultural heritage have emerged, producing a wealth of material spanning from visitors’ experiential feedback on exhibitions and cultural artefacts to digitally...
Preprint
Full-text available
In this article, we show and discuss the results of a quantitative and qualitative analysis of citations to retracted publications in the humanities domain. Our study was conducted by selecting retracted papers in the humanities domain and marking their main characteristics (e.g., retraction reason). Then, we gathered the citing entities and annota...
Preprint
Full-text available
The importance of open bibliographic repositories is widely accepted by the scientific community. For evaluation processes, however, there is still some skepticism: even if large repositories of open access articles and free publication indexes exist and are continuously growing, assessment procedures still rely on proprietary databases in many cou...
Preprint
Full-text available
Citation indexes are by now part of the research infrastructure in use by most scientists: a necessary tool in order to cope with the increasing amounts of scientific literature being published. Commercial citation indexes are designed for the sciences and have uneven coverage and unsatisfactory characteristics for humanities scholars, while no com...
Article
Full-text available
Semantic Web technologies are widely used for storing RDF data and making them available on the Web through SPARQL endpoints, queryable using the SPARQL query language. While the use of SPARQL endpoints is strongly supported by Semantic Web experts, it hinders broader use of RDF data by common Web users, engineers and developers unfamiliar with Sem...
Preprint
Full-text available
Purpose. This study presents the results of an experiment we performed to measure the coverage of Digital Humanities (DH) publications in mainstream open and proprietary bibliographic data sources, by also highlighting the relation that exists between DH and other disciplines. Methodology. We created a list of DH journals based on manual curation a...
Article
Full-text available
In this article, we show the results of a quantitative and qualitative analysis of open citations on a popular and highly cited retracted paper: “Ileal-lymphoid-nodular hyperplasia, non-specific colitis and pervasive developmental disorder in children” by Wakefield et al., published in 1998. The main purpose of our study is to understand the beha...
Preprint
Full-text available
Automatic processing of bibliographic data has become very important in digital libraries, data science and machine learning, owing on the one hand to the need to keep pace with the significant yearly increase in published papers and on the other hand to its inherent challenges. This processing has several aspects including but not limited t...
Preprint
Full-text available
In the past, several works have investigated ways for combining quantitative and qualitative methods in research assessment exercises. In this work, we aim at introducing a methodology to explore whether citation-based metrics, calculated only considering open bibliographic and citation data, can yield insights on how human peer-review of research...
Preprint
Full-text available
In this article, we present a methodology which takes as input a collection of retracted articles, gathers the entities citing them, characterizes such entities according to multiple dimensions (disciplines, year of publication, sentiment, etc.), and applies a quantitative and qualitative analysis on the collected values. The methodology is compose...
Preprint
Full-text available
The need for open scientific knowledge graphs is ever increasing. While there are large repositories of open access articles and free publication indexes, there are still few free knowledge graphs exposing citation networks, and often their coverage is partial. Consequently, most evaluation processes based on citation counts rely on commercial cita...
Preprint
Full-text available
In the past, several works have investigated ways for combining quantitative and qualitative methods in research assessment exercises. Indeed, the Italian National Scientific Qualification (NSQ), i.e. the national assessment exercise which aims at deciding whether a scholar can apply to professorial academic positions as Associate Professor and Ful...
Preprint
Full-text available
In this article, we show the results of a quantitative and qualitative citation analysis on a popular and highly cited retracted paper: "Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children" by Wakefield et al., published in 1998. The main purpose of our study is to understand the behavior of th...
Preprint
Full-text available
Automatic text analysis methods, such as Topic Modelling, are gaining much attention in the Humanities. However, scholars need to have extensive coding skills to use such methods appropriately. The need for this technical expertise prevents the broad adoption of these methods in Humanities research. In this paper, to help scholars in the Humaniti...
Preprint
Full-text available
Ontology reuse aims to foster interoperability and facilitate knowledge reuse. Several approaches are typically evaluated by ontology engineers when bootstrapping a new project. However, current practices are often motivated by subjective, case-by-case decisions, which hamper the definition of a recommended behaviour. In this chapter we argue that...
Chapter
Ontology reuse aims to foster interoperability and facilitate knowledge reuse. Several approaches are typically evaluated by ontology engineers when bootstrapping a new project. However, current practices are often motivated by subjective, case-by-case decisions, which hamper the definition of a recommended behaviour. In this chapter we argue that...
Chapter
A variety of schemas and ontologies are currently used for the machine-readable description of bibliographic entities and citations. This diversity, and the reuse of the same ontology terms with different nuances, generates inconsistencies in data. Adoption of a single data model would facilitate data integration tasks regardless of the data suppli...
Preprint
Full-text available
Semantic Web technologies are widely used for storing RDF data and making them available on the Web through SPARQL endpoints, queryable using the SPARQL query language. While the use of SPARQL endpoints is strongly supported by Semantic Web experts, it hinders broader use of these data by common Web users, engineers and developers unfamiliar with...
Preprint
Full-text available
A variety of schemas and ontologies are currently used for the machine-readable description of bibliographic entities and citations. This diversity, and the reuse of the same ontology terms with different nuances, generates inconsistencies in data. Adoption of a single data model would facilitate data integration tasks regardless of the data suppli...
Article
Full-text available
In this article, we discuss the outcomes of an experiment where we analysed whether and to what extent the introduction, in 2012, of the new research assessment exercise in Italy (a.k.a. Italian Scientific Habilitation) affected self-citation behaviours in the Italian research community. The Italian Scientific Habilitation attests to the scientific...
Article
Full-text available
OpenCitations is an infrastructure organization for open scholarship dedicated to the publication of open citation data as Linked Open Data using Semantic Web technologies, thereby providing a disruptive alternative to traditional proprietary citation indexes. Open citation data are valuable for bibliometric analysis, increasing the reproducibility...
Article
Full-text available
In this paper, we present COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations (http://opencitations.net/index/coci). COCI is the first open citation index created by OpenCitations, in which we have applied the concept of citations as first-class data entities, and it contains more than 445 million DOI-to-DOI citation links derived f...
Preprint
Full-text available
OpenCitations is a scholarly infrastructure organization dedicated to open scholarship and the publication of open bibliographic and citation data as Linked Open Data using Semantic Web technologies, to the development of software tools and services that enable convenient access to these open data, and to community advocacy for open citations. This...
Article
Full-text available
Background: The 2010 reform of the Italian university system introduced the National Scientific Habilitation (ASN) as a requirement for applying to permanent professor positions. Since the CVs of the 59,149 candidates and the results of their assessments have been made publicly available, the ASN constitutes an opportunity to perform analyses abou...
Preprint
Full-text available
In this paper, we present COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations (http://opencitations.net/index/coci). COCI is the first open citation index created by OpenCitations, in which we have applied the concept of citations as first-class data entities, and it contains more than 445 million DOI-to-DOI citation links derived f...
Article
Full-text available
In this paper we introduce the latest version (Version 2.0) of OSCAR, the OpenCitations RDF Search Application, which has several improved features and extends the query workflow compared with the previous version (Version 1.0) that we presented at the workshop entitled Semantics, Analytics, Visualisation: Enhancing Scholarly Dissemination (SAVE-S...
Preprint
Full-text available
In this work, we discuss the result of an experiment designed to track how authors use self-citations in their articles. In particular, we have analysed a subset of all the articles published between 1959 and 2016 in ScienceDirect written by the participants in the 2012-2013 Italian Scientific Habilitation so as to see if their citation habits ha...
Preprint
Full-text available
Background. The 2010 reform of the Italian university system introduced the National Scientific Habilitation (ASN) as a requirement for applying to permanent professor positions. Since the CVs of the 59149 candidates and the results of their assessments have been made publicly available, the ASN constitutes an opportunity to perform analyses about...
Preprint
Background. The 2010 reform of the Italian university system introduced the National Scientific Habilitation (ASN) as a requirement for applying to permanent professor positions. Since the CVs of the 59149 candidates and the results of their assessments have been made publicly available, the ASN constitutes an opportunity to perform analyses about...
Preprint
Full-text available
The need for scholarly open data is ever increasing. While there are large repositories of open access articles and free publication indexes, there are still only a few examples of free citation networks, and their coverage is partial. One of the results is that most of the evaluation processes based on citation counts rely on commercial citation databas...
Preprint
Full-text available
In this paper, we analyse the current availability of open citations data in one particular dataset, namely COCI (the OpenCitations Index of Crossref open DOI-to-DOI citations; http://opencitations.net/index/coci) provided by OpenCitations. The results of these analyses show a persistent gap in the coverage of the currently available open citation...
Article
Over the last 20 years, the use of automated and semi-automated techniques for extracting meanings from text has been widely debated in the social sciences. Automated and semi-automated techniques can be employed in all research phases: data collection (e.g. scraping), data cleaning (e.g. lemmatization of words), analysis (e.g. Named Entity Recogn...
Preprint
Full-text available
Alternative metrics (aka altmetrics) are gaining increasing interest in the scientometrics community as they can capture both the volume and quality of attention that a research work receives online. Nevertheless, there is limited knowledge about their effectiveness as a means of measuring the impact of research when compared to traditional citation-...
Chapter
Full-text available
Over the past eight years, we have been involved in the development of a set of complementary and orthogonal ontologies that can be used for the description of the main areas of the scholarly publishing domain, known as the SPAR (Semantic Publishing and Referencing) Ontologies. In this paper, we introduce this suite of ontologies, discuss the basic...
Conference Paper
Citations within academic literature keep gaining more importance both for the work of scholars and for improving digital library-related tools and services. We present in this article the preliminary results of an investigation on the characterisations of citations whose objective is to propose a framework for globally enriching citations with exp...
Conference Paper
This paper is about web applications to browse and efficiently visualise large Linked Open Datasets (LOD). The focus is on the customisation of LOD views over semantic datasets, also for non-expert users. The paper presents the motivation and the details of a visual data format and a chain of tools to easily produce and customise such visualisations....