SoftwareX 24 (2023) 101565
Available online 21 October 2023
2352-7110/© 2023 The Author. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
PyblioNet – Software for the creation, visualization and analysis of bibliometric networks
Matthias Müller
University of Hohenheim, Stuttgart, Germany
ARTICLE INFO
Keywords:
Network
Bibliometrics
Science mapping
Scopus
Python
ABSTRACT
PyblioNet is a software tool for the creation, visualization and analysis of bibliometric networks. It combines a
Python-based data collection tool that accesses the Scopus database with a browser-based visualization and
analysis tool. It allows users to create networks of publication data based on joint authorship, citations, co-
citations, bibliographic coupling, and shared keywords. The overall goal of PyblioNet is to provide valuable
insight and context when conducting research, to help users identify areas for further investigation, and to
support the development of a robust research framework.
Metadata
C1 Current code version: V0.8
C2 Permanent link to code/repository used for this code version: https://github.com/Mat-Mueller/PyblioNet
C3 Permanent link to reproducible capsule:
C4 Legal code license: MIT license
C5 Code versioning system used: git
C6 Software code languages, tools and services used: Python, HTML, JavaScript
C7 Compilation requirements, operating environments and dependencies: Pybliometrics 3.4.0, NetworkX 2.7.1, VisJs 9.1.6, Python 3.10
C8 If available, link to developer documentation/manual:
C9 Support email for questions: m_mueller@uni-hohenheim.de
1. Motivation and significance

In recent years, the field of bibliometrics has gained considerable momentum in the research community [1]. Within the field of bibliometrics, scientific mapping is a method of visualizing and analyzing scientific research to identify the structure and dynamics of a field of study [2,3]. It uses the information in bibliometric data to determine the relatedness between publications based on, for example, shared authors, direct citation, bibliographic coupling, co-citation, or shared keywords.

The general process of scientific mapping includes two main steps: (1) data collection (and pre-processing) and (2) visualization and analysis [4,5]. The goal of PyblioNet is to support researchers in performing bibliometric analyses, scoping reviews, or daily literature searches by combining a Scopus data collection and pre-processing tool with a user-friendly and intuitive analysis and visualization tool. PyblioNet builds on the Python libraries Pybliometrics [6] and NetworkX [7], as well as the JavaScript library vis.js [8], and can download publication data directly from the Scopus database via Scopus APIs and visualize and analyze the resulting network using a browser-based tool.
The three main advantages of PyblioNet in contrast to existing software capable of using Scopus data (e.g. Bibliometrix [9], Bibexcel [10], VOSviewer [11], Litstudy [12], CiteSpace [13], or SciMAT [14]; see also [4,15] for detailed overviews of bibliometric software) are: (i) the ability to work with unique Scopus identifiers for authors and documents, which avoids the need for complex data cleaning and manual assignment [16,17], (ii) the possibility to collect further information on citing documents, e.g. to determine co-citation relationships, and (iii) the ability to include references and citing documents in the analysis and visualization, e.g. to scope for relevant literature.
More specifically, the main functions of PyblioNet include:
- literature searches based on Scopus advanced query strings
- calculation of publication networks based on shared authors, citation, bibliographic coupling, co-citation and shared keywords
- network manipulation and export of network data
- filtering of network nodes and links (by node type, publication date, degree centrality, network level or link weight)
- visualization and analysis tools (e.g. searching for specific nodes, resizing or recoloring nodes, force-directed or hierarchical layout)
https://doi.org/10.1016/j.softx.2023.101565
Received 31 July 2023; Received in revised form 11 October 2023; Accepted 15 October 2023
- direct user interaction (e.g. manual repositioning of nodes, node pop-ups with the publication's abstract, access to the full text of a publication via its DOI)
In order to use PyblioNet, users need access to the Scopus database and a valid Scopus API key, which needs to be entered upon first usage. After that, the user can start entering Scopus advanced search query strings. After downloading publication data, an HTML file is generated that can be opened in a browser and contains all necessary data and visualization/analysis tools.
2. Software description
2.1. CLI data collection
PyblioNet consists of two main components. The first component is a Python-based data collection script built around the Pybliometrics library [6] that downloads publication data from the Scopus database using the Scopus Abstract Retrieval and Search APIs (Fig. 1).
Users can use PyblioNet by executing a Python script, which requires installing libraries such as Pybliometrics [6] and NetworkX [7]. Alternatively, users can run the exe file, which bundles all necessary libraries. On first use, users need to enter a valid Scopus API key in order to access the database via Pybliometrics.¹ After that, users can start by entering Scopus advanced search query strings.² PyblioNet will display how many publications were found using the search query and ask the user whether to continue. If so, the user can continue with the standard settings, or with an advanced mode where the user can decide on the following settings:
- Minimum citation count: exclude search results with fewer citations than this threshold. (default: 0).
- Use cached data if possible: reuse publication data already cached on your computer instead of downloading it again. (default: yes).
- Download information about citing papers: downloading information on publications citing the search results is necessary for co-citation analysis but takes additional time. (default: yes).
- Create extra nodes for references and citing papers: creating extra nodes for references and citing papers can result in huge networks that may be too large to visualize. If the user chooses yes, PyblioNet will later ask for a minimum occurrence threshold for reference and citing-paper nodes. (default: yes).
- Download abstracts: downloading abstracts for search results increases the size of the HTML file and takes additional time. (default: yes).
- Minimum weight for bibliographic coupling: include bibliographic coupling links between publications only if there are at least x shared references (reduces network size). (default: 1).
- Minimum weight for co-citation: include co-citation links between publications only if there are at least x shared citing publications (reduces network size). (default: 1).
- Minimum weight for shared keywords: include shared keyword links between publications only if there are at least x shared keywords (reduces network size). (default: 1).
- Create Gephi file: creates an additional .gexf file of the network [18] which can be opened in Gephi. (default: no).
More specifically, PyblioNet first downloads information on the initial set of publications via Pybliometrics [6], which accesses the Scopus Search API using Scopus advanced search query strings. In a second step, further information on the references of the main search results is collected using the Abstract Retrieval API (FULL and REF views). Finally, citing publications are collected for each main search result, again via the Scopus Search API. The publication data is then used to create network data using NetworkX [7], where each publication is represented as a node and relationships between nodes are visualized via links connecting nodes. PyblioNet covers five methods to determine network relationships:
- Shared authorship: nodes are connected if they share one or more authors (using Scopus author IDs).
- Citation: nodes are connected (via a directed link) if one cites the other (using Scopus EIDs).
- Bibliographic coupling: nodes are connected if they share one or more references (using Scopus EIDs; only for Scopus main results).
- Co-citation: nodes are connected if they share one or more citing papers (using Scopus EIDs; only for Scopus main results).
- Shared keywords: nodes are connected if they share one or more keywords (using author keywords; only for Scopus main results).
Fig. 1. UML activity diagram of the Python data collection tool.
¹ See also https://pybliometrics.readthedocs.io/en/stable/access.html on how to access the Scopus database.
² See also https://www.scopus.com/search/form.uri?display=advanced.
Finally, an HTML file is created that contains both the network data and the analysis and visualization tools.
2.2. Browser analysis and visualization tool

The second component is an HTML/JavaScript analysis and visualization tool built on the VisJs [8] package. The visualization and analysis tool is designed as a browser-based graphical interface that provides a user-friendly and intuitive way to navigate through complex publication data.
PyblioNet also allows for different filtering and visualization methods. First, users can choose to display only publication data from the main search results, or also citing and cited publications. Further filtering of nodes can be done based on publication date or current degree centrality. Link filtering allows users to easily switch between the five network levels: authorship, citation, bibliographic coupling, co-citation, and shared keywords [5,19–21]. In the case of bibliographic coupling, co-citation and keyword relationships, links can additionally be filtered by their weight (representing the number of commonly cited or citing literature or shared keywords).
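The link-weight filter described above amounts to keeping only edges whose weight meets a minimum threshold. A minimal NetworkX sketch (the function name and toy graph are illustrative, not PyblioNet's API):

```python
import networkx as nx

def filter_links_by_weight(graph, min_weight):
    """Keep only links whose weight (shared references, citing
    papers, or keywords) is at least min_weight."""
    kept = [(u, v) for u, v, w in graph.edges(data="weight")
            if w >= min_weight]
    return graph.edge_subgraph(kept).copy()

G = nx.Graph()
G.add_edge("a", "b", weight=3)  # e.g. three shared references
G.add_edge("b", "c", weight=1)
F = filter_links_by_weight(G, 2)  # only the a-b link survives
```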
For visualization, users can enter search queries to highlight nodes: for example, the search query "agent based and network" will highlight nodes that mention "agent based" and "network" in their keywords, title or abstract. Nodes can be resized based on their current degree centrality (and, in the case of citation networks, also based on their in-degree or out-degree) or their number of citations. Nodes can be recolored to identify cluster structures within large and dense networks using a Louvain community detection algorithm [21] as implemented by [22], or based on common journals (node colors are also used to identify clusters that are analyzed in more detail using the "Show information" button). The default visualization of nodes is based on a force-directed layout algorithm that places well-connected nodes in the center of the network and less well-connected nodes at the periphery [23]. Users can also choose a hierarchical layout where the y-coordinate of each node is based on the publication year, thereby positioning older publications at the top and newer ones at the bottom (in the case of a citation network, for example, the hierarchical layout shows the cumulative nature of a research field). By changing the spring length, users can define the distance between nodes, and the node size setting defines the size of the visualized nodes.
Finally, the "Show information" button opens a new window showing the number of nodes and edges as well as the most frequent keywords and journals. If users have previously colored nodes (based on the Louvain algorithm or common journals), additional information for the communities is displayed. Additionally, users can delete selected nodes (press and hold Ctrl to select multiple nodes), export the current set of nodes in a Gephi-compatible format [18], or display additional navigation buttons.
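The Gephi-compatible export corresponds to writing the network in GEXF format, which NetworkX supports directly; a round-trip sketch (file path illustrative):

```python
import os
import tempfile

import networkx as nx

# Toy network with one weighted link (hypothetical publications).
G = nx.Graph()
G.add_edge("pub1", "pub2", weight=2)

# Write the network as .gexf, the format Gephi opens [18].
path = os.path.join(tempfile.mkdtemp(), "network.gexf")
nx.write_gexf(G, path)

# Round-trip check: read the file back.
H = nx.read_gexf(path)
```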
PyblioNet also allows for direct user interaction, such as repositioning nodes manually via drag-and-drop, hovering over nodes to get more information such as the abstract and keywords, and highlighting a node and its direct peers by clicking on it, e.g. to identify related literature. To quickly access a publication directly from the publisher, double-clicking on a node opens a new tab using the publication's DOI or, if no DOI is available, opens Google Scholar with the publication's title as a search query.
3. Illustrative examples
3.1. Literature search
In this section, we illustrate how users can use PyblioNet. A first use case of PyblioNet consists of performing literature searches in order to find, scan and evaluate relevant academic and scholarly articles or books to gather information on a specific topic. To illustrate this, we use the following example search query:
TITLE-ABS-KEY(("innovation diffusion" OR "diffusion of innovation") AND (agent-based OR multi-agent)) AND DOCTYPE(ar)
aiming at finding relevant published articles within the field of innovation diffusion that apply the method of agent-based modelling. The search query returns 173 results; in sum, these articles use 6342 references and are cited by 1179 publications. After removing duplicates (e.g. two or more documents use the same reference), a network of 173 main documents, 4751 references and 967 citing papers is created (see also Fig. 2).
After opening the network in a browser, users can scan for relevant literature, for example by starting with well-connected or highly cited nodes. After identifying an interesting publication, the five different network levels allow for scanning a node's peers (so-called snowballing [24]) to identify additional relevant publications. As PyblioNet also downloads and visualizes cited and citing literature, the snowballing is not limited to the initial list of results but may surface further relevant publications not included in the initial search results.
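The snowballing step above is, in graph terms, a walk over a node's neighborhood on the chosen network level. A toy sketch with hypothetical publication names:

```python
import networkx as nx

# Toy network on one level (e.g. citation); names are made up.
G = nx.Graph()
G.add_edges_from([("seed", "p1"), ("seed", "p2"), ("p2", "p3")])

# One snowballing step: the direct peers of an interesting publication.
first_step = set(G.neighbors("seed"))

# A second step: peers of peers, excluding nodes already seen.
second_step = set().union(*(G.neighbors(p) for p in first_step))
second_step -= first_step | {"seed"}
```

Starting from "seed", the first step finds p1 and p2, and the second step surfaces p3, a publication not directly linked to the starting point.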
Fig. 2. Example of a citation network of 173 main search results.
3.2. Science mapping
A second use case addresses research questions within the broad eld
of bibliographic analysis and science mapping. For this we chose the
example of published articles within the journal ‘SoftwareX, obtained
via the search query ISSN(23527110) which resulted in 984 articles
using 24.422 references and 17.961 citing papers. Fig. 3 presents the
resulting networks for the citation, co-citation and keyword analysis as
well as bibliographic coupling.
The citation and co-citation analyses of articles published in SoftwareX show only sparsely connected networks where the majority of nodes remain isolated. This indicates that articles published in the journal neither recognize related articles nor are considered together by articles outside the journal. The bibliographic coupling and keyword analyses, however, indicate a strong common basis and a (perceived) thematic relatedness. A further analysis of the corresponding keywords of the communities in the bibliographic coupling network shows the following information for the two biggest communities.
Cluster 1 (177 nodes). Keywords: python: 39, machine learning: 19, deep learning: 11, image processing: 7, gravitational waves: 6, time series: 5, optimization: 5, computer vision: 5, image analysis: 4, data analysis: 4, software: 4, open-source software: 3, tensorflow: 3, pytorch: 3, feature extraction: 3.

Cluster 3 (138 nodes). Keywords: python: 12, openfoam: 7, c++: 5, visualization: 5, computational fluid dynamics: 4, data analysis: 4, high performance computing: 4, high-performance computing: 4, simulation: 4, finite element method: 4, cfd: 4, permeability: 3, finite volume method: 3, finite elements: 3, gpu: 3, multiphysics: 3, android: 3, numerical simulations: 3, library: 3, turbulence: 3, parallel: 3.
Finally, common references connecting articles at the bibliographic coupling level can be analyzed by switching to a citation perspective and including the references made by the articles. Fig. 4 shows the resulting network, where important references are highlighted by scaling node size based on the in-degree of nodes.
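Per-cluster keyword summaries like the ones above are frequency counts over the author keywords of a community's publications. A stdlib sketch with made-up data (not the actual SoftwareX corpus):

```python
from collections import Counter

# Toy author-keyword lists for the publications of one community.
cluster_keywords = [
    ["python", "machine learning"],
    ["python", "deep learning"],
    ["python", "machine learning", "optimization"],
]

# Count keyword occurrences across the community.
counts = Counter(kw for pub in cluster_keywords for kw in pub)
top = counts.most_common(2)  # most frequent keywords first
```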
4. Impact
PyblioNet complements the existing set of software designed to help
researchers to perform bibliometric analysis and literature search. It
offers a range of useful features that can change how researchers,
Fig. 3. Example of articles published in SoftwareX.
M. Müller
SoftwareX 24 (2023) 101565
5
educators, or institutions engage with bibliometric analysis, literature
reviews, and knowledge exploration. By combining data collection and
preprocessing with user-friendly visualization and analysis tools,
PyblioNet provides a comprehensive platform for gaining deeper in-
sights into the intricate web of scientic knowledge.
One of PyblioNets standout features is its seamless integration with
unique Scopus identiers for authors and documents. This eliminates the
need for time-consuming and error-prone manual data cleaning and
assignment, resulting in more accurate and reliable bibliometric ana-
lyses and allows accessing and incorporating information on citing
documents, and thus, co-citation relationships. The potential use cases
are widespread and range from identifying gaps in existing research,
developing research frameworks, identication of key authors and
works, or synthesizing information.
Using PyblioNet requires a valid Scopus API key as well as access to the Scopus API. The Python module is also available as a standalone exe file, and the visualization and analysis tool requires only a browser. A main disadvantage of PyblioNet compared to other software that uses data obtained directly from the Scopus homepage is speed: downloading all information for a set of 100 main search results can take 15 minutes or more, depending on the internet connection.
5. Conclusions

In this paper, we presented PyblioNet, a software suite designed to support researchers in bibliometric analysis, scoping reviews, and daily literature searches. PyblioNet is a valuable addition to the bibliometrics field, providing researchers with an efficient and versatile tool for bibliometric analysis, literature searches, and science mapping. By streamlining data collection and offering extensive analysis and visualization capabilities, PyblioNet contributes to a deeper understanding of scholarly interactions and knowledge dissemination, fostering informed decision-making and advancing research in diverse domains.
Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments

I gratefully acknowledge the support of the members of the Computational Science Hub (CSH) of the University of Hohenheim. In particular, I want to thank Konstantin Kuck, Daniela Bendel and Martin Müller for their help.
Fig. 4. Citation network including information on referenced publications.
References

[1] Donthu N, Kumar S, Mukherjee D, Pandey N, Lim WM. How to conduct a bibliometric analysis: an overview and guidelines. J Bus Res 2021;133:285–96. https://doi.org/10.1016/j.jbusres.2021.04.070.
[2] Small H. Visualizing science by citation mapping. J Am Soc Inf Sci 1999;50(9):799–813. https://doi.org/10.1002/(SICI)1097-4571(1999)50:9<799::AID-ASI9>3.0.CO;2-G.
[3] Chen C. Science mapping: a systematic review of the literature. J Data Inf Sci 2017;2(2):1–40. https://doi.org/10.1515/jdis-2017-0006.
[4] Moral-Muñoz JA, Herrera-Viedma E, Santisteban-Espejo A, Cobo MJ. Software tools for conducting bibliometric analysis in science: an up-to-date review. EPI 2020;29(1). https://doi.org/10.3145/epi.2020.ene.03.
[5] Zupic I, Čater T. Bibliometric methods in management and organization. Organ Res Methods 2015;18(3):429–72. https://doi.org/10.1177/1094428114562629.
[6] Rose ME, Kitchin JR. pybliometrics: Scriptable bibliometrics using a Python interface to Scopus. SoftwareX 2019;10:100263. https://doi.org/10.1016/j.softx.2019.100263.
[7] Hagberg AA, Schult DA, Swart PJ. Exploring network structure, dynamics, and function using NetworkX. In: Varoquaux G, Vaught T, Millman J, editors. Proceedings of the 7th Python in Science Conference; 2008. p. 11–5.
[8] vis.js community. vis.js; 2023. URL: https://visjs.org.
[9] Aria M, Cuccurullo C. bibliometrix: An R-tool for comprehensive science mapping analysis. J Informetr 2017;11(4):959–75. https://doi.org/10.1016/j.joi.2017.08.007.
[10] Persson O, Danell R, Wiborg-Schneider J. How to use Bibexcel for various types of bibliometric analysis. In: Celebrating scholarly communication studies: A festschrift for Olle Persson at his 60th birthday; 2009. p. 9–24. https://portal.research.lu.se/ws/files/5902071/1458992.pdf.
[11] van Eck NJ, Waltman L. Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics 2010;84(2):523–38. https://doi.org/10.1007/s11192-009-0146-3.
[12] Heldens S, Sclocco A, Dreuning H, van Werkhoven B, Hijma P, Maassen J, et al. litstudy: a Python package for literature reviews. SoftwareX 2022;20:101207. https://doi.org/10.1016/j.softx.2022.101207.
[13] Chen C. CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. J Am Soc Inf Sci Technol 2006;57(3):359–77. https://doi.org/10.1002/asi.20317.
[14] Cobo MJ, López-Herrera AG, Herrera-Viedma E, Herrera F. SciMAT: A new science mapping analysis software tool. J Am Soc Inf Sci Tec 2012;63(8):1609–30. https://doi.org/10.1002/asi.22688.
[15] Cobo MJ, López-Herrera AG, Herrera-Viedma E, Herrera F. Science mapping software tools: Review, analysis, and cooperative study among tools. J Am Soc Inf Sci Tec 2011;62(7):1382–402. https://doi.org/10.1002/asi.21525.
[16] Strotmann A, Zhao D. Author name disambiguation: What difference does it make in author-based citation analysis? J Am Soc Inf Sci Tec 2012;63(9):1820–33. https://doi.org/10.1002/asi.22695.
[17] Baas J, Schotten M, Plume A, Côté G, Karimi R. Scopus as a curated, high-quality bibliometric data source for academic research in quantitative science studies. Quant Sci Stud 2020;1(1):377–86. https://doi.org/10.1162/qss_a_00019.
[18] Bastian M, Heymann S, Jacomy M. Gephi: an open source software for exploring and manipulating networks. ICWSM 2009;3(1):361–2. https://doi.org/10.1609/icwsm.v3i1.13937.
[19] Chang YW, Huang MH, Lin CW. Evolution of research subjects in library and information science based on keyword, bibliographical coupling, and co-citation analyses. Scientometrics 2015;105(3):2071–87. https://doi.org/10.1007/s11192-015-1762-8.
[20] Boyack KW, Klavans R. Co-citation analysis, bibliographic coupling, and direct citation: Which citation approach represents the research front most accurately? J Am Soc Inf Sci Tec 2010;61(12):2389–404. https://doi.org/10.1002/asi.21419.
[21] Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech 2008;2008(10):P10008. https://doi.org/10.1088/1742-5468/2008/10/P10008.
[22] Corneliu S. Louvain community detection for Javascript; 2020. [online] Available: https://github.com/upphiminn/jLouvain.
[23] Jacomy M, Venturini T, Heymann S, Bastian M. ForceAtlas2, a continuous graph layout algorithm for handy network visualization designed for the Gephi software. PLoS One 2014;9(6):e98679. https://doi.org/10.1371/journal.pone.0098679.
[24] Wohlin C, Runeson P, Neto PMS, Engström E, do Carmo Machado I, de Almeida ES. On the reliability of mapping studies in software engineering. J Syst Softw 2013;86(10):2594–610. https://doi.org/10.1016/j.jss.2013.04.076.
M. Müller
... Data features are summarized in descriptive analysis [65]. The expansion of financial literacy articles was studied using VOSviewer [22], visualizing and analyzing bibliometric networks [49]. In the bibliometric section, we analyzed the key information summary, publication trends, prominent authors, sources, and affiliations and conducted a collaboration analysis based on documents and countries. ...
Article
Full-text available
In this study, the level of financial literacy among SMEs was examined through bibliometric and content analyses. Thoroughly analyzing and quantifying the literature on financial literacy and SMEs are the goal of bibliometric research. This comprehensive overview aims to pinpoint trends, research gaps, important authors, and key ideas in order to guide future studies and policy initiatives aimed at improving the financial literacy and welfare of small- and medium-sized enterprises (SMEs). A comprehensive search of articles was conducted in 2024 to extract data. The search utilized the Scopus database and included inclusion and exclusion criteria. A total of 195 articles published between 2021 and 2024 were identified. Key concepts, including "Global Financial Literacy," were discovered through the use of the Biblioshiny app and VOS viewer program, which allows for the visualization of networks involving keywords and bibliographic coupling. Topics include: "Empowerment through financial literacy: Overcoming the manacles of Domestic violence"; "SMEs Development and planning and preparedness"; "Addressing the gap in financial inclusion and personal finance behavior"; and "Addressing disparities and enhancing education." The following are potential areas for future research: the level of financial literacy among small- and medium-sized enterprises (SMEs) around the world, the effectiveness of targeted interventions to improve SMEs' financial literacy, the role of SMEs in domestic violence policies, the factors that influence SMEs' planning processes, and the promotion of global equality and financial well-being.
... This integration of both textual analysis and visualization results in "even more effective analysis of text data by making it easier to identify patterns, outliers, and correlations that may not be readily apparent from the raw text alone" (Milev, 2023). TechMiner (Velasquez, 2023), PyblioNet (Müller, 2023), Pybliometrics (Rose & Kitchin, 2019), pyBibX (Pereira et al., 2023), metaknowledge (McLevey & McIlroy-Young, 2017), and litstudy (Heldens et al., 2022) are a few examples of tools that have been developed to add value to the analysis of textual data. Other tools that can analyze text from various sources include MoreThanSentiments (Jiang & Srinivasan, 2023) and TAll (Aria et al., 2023). ...
Article
Full-text available
In the era of big and ubiquitous data, professionals and students alike are finding themselves needing to perform a number of textual analysis tasks. Historically, the general lack of statistical expertise and programming skills has stopped many with humanities or social sciences backgrounds from performing and fully benefiting from such analyses. Thus, we introduce Coconut Libtool ( www.coconut-libtool.com/ ), an open‐source, web‐based application that utilizes state‐of‐the‐art natural language processing (NLP) technologies. Coconut Libtool analyzes text data from customized files and bibliographic databases such as Web of Science, Scopus, and Lens. Users can verify which functions can be performed with the data they have. Coconut Libtool deploys multiple algorithmic NLP techniques at the backend, including topic modeling ( LDA, Biterm, and BERTopic algorithms), network graph visualization, keyword lemmatization, and sunburst visualization. Coconut Libtool is the people‐first web application designed to be used by professionals, researchers, and students in the information sciences, digital humanities, and computational social sciences domains to promote transparency, reproducibility, accessibility, reciprocity, and responsibility in research practices.
... The general literature retrieval and analysis method used in this study comprises of four steps (see Figure 1). For data retrieval and analysis, we use the PyblioNet software tool (Müller, 2023a(Müller, , 2023b, which combines a Python-based data collection tool (accessing the Scopus database) with a browser-based visualization and analysis tool. This allows the creation of networks of publication data based on citations, co-citations, co-authorships, bibliographic coupling, and shared keywords. ...
Article
Full-text available
In this article, we analyze the literature on the simultaneous green (or sus- tainable) and digital transition, or simply the twin transition. We conduct a bibliometric analysis based on a citation network of scientific articles on digi- talization and sustainability. Our results show that both research strands have well-established but largely separate research traditions. Only recently, there has been a growing interest in studying them together. An in-depth analysis of the community structure of the citation network reveals that the literature is highly fragmented, with a significant number of hidden links between the two research strands, connecting seemingly unrelated thematic clusters.
... This integration of both textual analysis and visualization results in "even more effective analysis of text data by making it easier to identify patterns, outliers, and correlations that may not be readily apparent from the raw text alone" (Milev, 2023). TechMiner (Velasquez, 2023), PyblioNet (Müller, 2023), Pybliometrics (Rose & Kitchin, 2019), pyBibX (Pereira et al., 2023), metaknowledge (McLevey & McIlroy-Young, 2017), and litstudy (Heldens et al., 2022) are a few examples of tools that have been developed to add value to the analysis of textual data. Other tools that can analyze text from various sources include MoreThanSentiments (Jiang & Srinivasan, 2023) and TAll (Aria et al., 2023). ...
Preprint
Full-text available
In the era of big and ubiquitous data, professionals and students alike are finding themselves needing to perform a number of textual analysis tasks. Historically, the general lack of statistical expertise and programming skills has stopped many with humanities or social sciences backgrounds from performing and fully benefiting from such analyses. Thus, we introduce Coconut Libtool (www.coconut-libtool.com/), an open-source, web-based application that utilizes state-of-the-art natural language processing (NLP) technologies. Coconut Libtool analyzes text data from customized files and bibliographic databases such as Web of Science, Scopus, and Lens. Users can verify which functions can be performed with the data they have. Coconut Libtool deploys multiple algorithmic NLP techniques at the backend, including topic modeling (LDA, Biterm, and BERTopic algorithms), network graph visualization, keyword lemmatization, and sunburst visualization. Coconut Libtool is the people-first web application designed to be used by professionals, researchers, and students in the information sciences, digital humanities, and computational social sciences domains to promote transparency, reproducibility, accessibility, reciprocity, and responsibility in research practices.
... Co-citation and coupling analyses are commonly used in systematic reviews [27][28][29][30][31]. They fulfill the functions of revealing the main research approaches, establishing the fronts, and revealing future research directions [14]. ...
Article
Name ambiguity is a common problem in many bibliographic repositories affecting data integrity and validity. This article presents an author name disambiguation (AND) literature review using the theory of the consolidated meta-analytic approach, including quantitative techniques and bibliometric aspects. The literature review covers information from 211 documents of the Web of Science and Scopus databases in the period 2003 to 2022. A taxonomy based on the literature was used to organize the identified approaches to solve the AND problem. We identified that the most widely used AND solving approaches are author grouping associated with similarity functions and clustering methods and some works using author assignment allied to classification methods. The countries that publish most in AND are the USA, China, Germany, and Brazil with 21%, 19%, 13% and 8% of the total papers, respectively. The review results provide an overview of AND state-of-the-art research that can direct further investigation based on the quantitative and qualitative information from the AND research history.
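The "author grouping associated with similarity functions and clustering methods" identified above as the dominant AND approach can be illustrated with a minimal sketch. The name list, the 0.8 threshold, and the greedy first-fit clustering are illustrative assumptions for exposition, not the method of any specific paper surveyed:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Normalized string similarity between two author name strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def group_authors(names, threshold=0.8):
    """Greedy author grouping: assign each name to the first existing
    cluster whose representative is similar enough, else open a new one."""
    clusters = []  # each cluster is a list of name variants
    for name in names:
        for cluster in clusters:
            if similarity(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

names = ["Mueller, M.", "Muller, M.", "Rose, M. E.", "Kitchin, J. R."]
clusters = group_authors(names)
# "Mueller, M." and "Muller, M." fall into the same cluster;
# the other two names each form their own cluster.
```

Real AND systems replace the string ratio with richer similarity functions (co-author overlap, affiliation, topic) and the greedy pass with proper clustering, but the grouping-by-similarity skeleton is the same.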
Article
Gephi is an open source software for graph and network analysis. It uses a 3D render engine to display large networks in real-time and to speed up the exploration. A flexible and multi-task architecture brings new possibilities to work with complex data sets and produce valuable visual results. We present several key features of Gephi in the context of interactive exploration and interpretation of networks. It provides easy and broad access to network data and allows for spatializing, filtering, navigating, manipulating and clustering. Finally, by presenting dynamic features of Gephi, we highlight key aspects of dynamic network visualization.
Article
Researchers are often faced with exploring new research domains. Broad questions about the research domain, such as who are the influential authors or what are important topics, are difficult to answer due to the overwhelming number of relevant publications. Therefore, we present litstudy: a Python package that enables answering such questions using simple scripts or Jupyter notebooks. The package enables selecting scientific publications and studying their metadata using visualizations, bibliographic network analysis, and natural language processing. The software was previously used in a publication on the landscape of Exascale computing, and we envision great potential for reuse.
Article
Scopus is among the largest curated abstract and citation databases, with a wide global and regional coverage of scientific journals, conference proceedings, and books, while ensuring only the highest quality data are indexed through rigorous content selection and re-evaluation by an independent Content Selection and Advisory Board. Additionally, extensive quality assurance processes continuously monitor and improve all data elements in Scopus. Besides enriched metadata records of scientific articles, Scopus offers comprehensive author and institution profiles, obtained from advanced profiling algorithms and manual curation, ensuring high precision and recall. The trustworthiness of Scopus has led to its use as bibliometric data source for large-scale analyses in research assessments, research landscape studies, science policy evaluations, and university rankings. Scopus data have been offered for free for selected studies by the academic research community, such as through application programming interfaces, which have led to many publications employing Scopus data to investigate topics such as researcher mobility, network visualizations, and spatial bibliometrics. In June 2019, the International Center for the Study of Research was launched, with an advisory board consisting of bibliometricians, aiming to work with the scientometric research community and offering a virtual laboratory where researchers will be able to utilize Scopus data.
Article
Bibliometrics has become an essential tool for assessing and analyzing the output of scientists, cooperation between universities, the effect of state-owned science funding on national research and development performance and educational efficiency, among other applications. Therefore, professionals and scientists need a range of theoretical and practical tools to measure experimental data. This review aims to provide an up-to-date review of the various tools available for conducting bibliometric and scientometric analyses, including the sources of data acquisition, performance analysis and visualization tools. The included tools were divided into three categories: general bibliometric and performance analysis, science mapping analysis, and libraries; a description of all of them is provided. A comparative analysis of database source support, pre-processing capabilities, and analysis and visualization options is also provided in order to facilitate understanding. Although there are numerous bibliometric databases from which to obtain data for bibliometric and scientometric analysis, they have been developed for different purposes. The number of exportable records is between 500 and 50,000, and the coverage of the different science fields is unequal in each database. Concerning the analyzed tools, Bibliometrix contains the most extensive set of techniques and is suitable for practitioners through Biblioshiny. VOSviewer offers excellent visualization and is capable of loading and exporting information from many sources. SciMAT is the tool with powerful pre-processing and export capabilities. In view of the variability of features, users need to decide on the desired analysis output and choose the option that best fits their aims.
Article
We present a wrapper for the Scopus RESTful API written for Python 3. The wrapper allows users to access the Scopus database via user-friendly interfaces and can be used without prior knowledge of RESTful APIs. The package provides classes to interact with different Scopus APIs to retrieve information as diverse as citation counts, author information or document abstracts. Files are cached to speed up subsequent analysis. The package addresses all users of Scopus data, such as researchers working in Science of Science or evaluators. It facilitates reproducibility of research projects and enhances data integrity for researchers using Scopus data. Keywords: Scopus, Software, Python, Bibliometrics, Scientometrics
Article
Purpose: We present a systematic review of the literature concerning major aspects of science mapping to serve two primary purposes: First, to demonstrate the use of a science mapping approach to perform the review so that researchers may apply the procedure to the review of a scientific domain of their own interest, and second, to identify major areas of research activities concerning science mapping, intellectual milestones in the development of key specialties, evolutionary stages of major specialties involved, and the dynamics of transitions from one specialty to another. http://www.jdis.org/10.1515/jdis-2017-0006
Article
Bibliometric analysis is a popular and rigorous method for exploring and analyzing large volumes of scientific data. It enables us to unpack the evolutionary nuances of a specific field, while shedding light on the emerging areas in that field. Yet, its application in business research is relatively new, and in many instances, underdeveloped. Accordingly, we endeavor to present an overview of the bibliometric methodology, with a particular focus on its different techniques, while offering step-by-step guidelines that can be relied upon to rigorously perform bibliometric analysis with confidence. To this end, we also shed light on when and how bibliometric analysis should be used vis-à-vis other similar techniques such as meta-analysis and systematic literature reviews. As a whole, this paper should be a useful resource for gaining insights on the available techniques and procedures for carrying out studies using bibliometric analysis. Keywords: Bibliometric analysis; Performance analysis; Science mapping; Citation analysis; Co-citation analysis; Bibliographic coupling; Co-word analysis; Network analysis; Guidelines.
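Two of the science-mapping techniques named in the keywords above, co-citation analysis and bibliographic coupling, reduce to simple set operations on reference lists: two papers are bibliographically coupled when their reference lists overlap, and two items are co-cited when some third paper cites both. A minimal sketch with illustrative toy data (the reference lists are assumptions, not real bibliographic records):

```python
from itertools import combinations

# refs[p] = set of items that paper p cites (illustrative toy data)
refs = {
    "P1": {"A", "B", "C"},
    "P2": {"B", "C", "D"},
    "P3": {"A", "D"},
}

def coupling_strength(p, q):
    """Bibliographic coupling: number of references shared by p and q."""
    return len(refs[p] & refs[q])

def cocitation_count(a, b):
    """Co-citation: number of citing papers whose reference lists
    contain both a and b."""
    return sum(1 for r in refs.values() if a in r and b in r)

coupling = {(p, q): coupling_strength(p, q) for p, q in combinations(refs, 2)}
# coupling_strength("P1", "P2") == 2  (shared references B and C)
# cocitation_count("B", "C") == 2    (B and C are cited together by P1 and P2)
```

Tools such as PyblioNet build exactly these pairwise weights into network edges, which is why the two measures are complementary: coupling links citing papers, co-citation links cited ones.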
Article
This study involved using three methods, namely keyword, bibliographic coupling, and co-citation analyses, for tracking the changes of research subjects in library and information science (LIS) during 4 periods (5 years each) between 1995 and 2014. We examined 580 highly cited LIS articles, and the results revealed that the two subjects “information seeking (IS) and information retrieval (IR)” and “bibliometrics” appeared in all 4 phases. However, a decreasing trend was observed in the percentage of articles related to IS and IR, whereas an increasing trend was identified in the percentage of articles focusing on bibliometrics. Particularly, in the 3rd phase (2005–2009), the proportion of articles on bibliometrics exceeded 80%, indicating that bibliometrics became predominant. Combining various methods to explore research trends helps researchers gain a deeper understanding of how disciplines develop.
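The keyword analysis used alongside coupling and co-citation in the study above amounts to counting how often keyword pairs co-occur within the same article, which yields the edge weights of a co-word network. A minimal sketch; the article-to-keyword assignments are illustrative assumptions:

```python
from collections import Counter
from itertools import combinations

# keywords per article (illustrative toy data)
articles = {
    "doc1": ["information retrieval", "bibliometrics"],
    "doc2": ["bibliometrics", "citation analysis"],
    "doc3": ["information retrieval", "bibliometrics", "citation analysis"],
}

# Count co-occurrences; sorting each keyword pair gives a canonical
# (a, b) key so (x, y) and (y, x) are tallied together.
cooccurrence = Counter()
for kws in articles.values():
    for a, b in combinations(sorted(set(kws)), 2):
        cooccurrence[(a, b)] += 1
# e.g. "bibliometrics" and "information retrieval" co-occur in doc1 and doc3
```

Feeding these counts into a network layout (as PyblioNet or Gephi would) turns the keyword pairs into weighted edges, and recurring high-weight clusters mark the research subjects that the study tracks across its four phases.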