Figure 3 - uploaded by Antti Knutas
Visualization of social network analysis results from the example dataset

Source publication
Conference Paper
Full-text available
There is an increasing number of scientific articles being published, which makes tracking the state of the art more time-consuming. There are software tools available to help with systematic mapping studies in a field of science, but most of these tools are closed source and involve several manual time-consuming steps that could be automated furth...

Context in source publication

Context 1
... of components are intended to make software design for different environments easier. This way the open source R functions can, for example, be included in other desktop analysis software packages, and more analysis components can similarly be added to the server without recompiling the server software itself. The entire process can be deployed to a group of dynamically scaling cloud instances.

The analysis process is initiated by a user from the front-end web server, which keeps track of input files and queues job requests in a relational database. The queue of job requests is then batch processed by a separate analysis program, which can run on a single server in low-traffic situations or on several server instances when necessary. After the analysis is complete, the analysis program uploads the results to storage and updates the database with the completed job status. The request process between the server components is illustrated in Figure 1. The separation of components might seem excessive at first glance, but all of the software components can run on a single machine in low-traffic situations. Conversely, the components can be deployed on separate servers with multiple simultaneous program instances if necessary, and a cloud storage service with content distribution services can be used for file storage.

The system works on publication records available for download from the Thomson Reuters Web of Science Core Collection. The system analyses seven essential variables from each publication, which include the authors, keywords, publication forum, article type and cited articles. The user downloads the literature data from Web of Science and uploads it to the analysis system via a web interface. The system then removes duplicate records and performs an exploratory data analysis on the provided literature data. The analysis identifies, for instance, the most cited articles and authors, the most common keywords, and the journals with the most publications.
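The deduplication and exploratory counting steps described above can be sketched roughly as follows. This is a minimal illustration and not the NAILS implementation, which is written in R: the record layout, the field names (`id`, `authors`, `keywords`, `times_cited`) and the sample values are all hypothetical and chosen only to keep the example self-contained.

```python
from collections import Counter

# Hypothetical, heavily simplified records. Real Web of Science exports
# contain many more fields and use different field names.
records = [
    {"id": "A1", "authors": ["Smith J", "Lee K"],
     "keywords": ["CSCL", "Distance education"], "times_cited": 40},
    {"id": "A2", "authors": ["Lee K"],
     "keywords": ["CSCL", "Higher education"], "times_cited": 12},
    {"id": "A1", "authors": ["Smith J", "Lee K"],   # duplicate of the first record
     "keywords": ["CSCL", "Distance education"], "times_cited": 40},
    {"id": "A3", "authors": ["Garcia M"],
     "keywords": ["Distance education"], "times_cited": 25},
]


def remove_duplicates(records):
    """Keep the first record seen for each identifier."""
    seen = set()
    unique = []
    for record in records:
        if record["id"] not in seen:
            seen.add(record["id"])
            unique.append(record)
    return unique


def explore(records, top_n=3):
    """Return simple summary statistics for a list of publication records."""
    unique = remove_duplicates(records)
    # Keywords are lower-cased so that spelling variants in case are merged.
    keywords = Counter(k.lower() for r in unique for k in r["keywords"])
    authors = Counter(a for r in unique for a in r["authors"])
    most_cited = sorted(unique, key=lambda r: r["times_cited"], reverse=True)
    return {
        "record_count": len(unique),
        "top_keywords": keywords.most_common(top_n),
        "top_authors": authors.most_common(top_n),
        "most_cited": [(r["id"], r["times_cited"]) for r in most_cited[:top_n]],
    }


summary = explore(records)
for name, value in summary.items():
    print(name, value)
```

Matching duplicates on a single identifier is an assumption made for brevity; a real pipeline would likely need a more careful matching rule.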
These statistics are accompanied by visualizations for a quick data overview. Additionally, the system extracts the citation network data from the literature. Having access to a citation network enables calculating how many times each reference has been cited by papers inside the analyzed dataset. This feature is useful for identifying influential sources within a field of science, especially because it finds often-cited papers and books not listed in the primary literature dataset. In addition to providing an exploratory analysis report, the system extracts and exports data about citation and author cooperation networks that can be visualized, e.g., with the Gephi [4] open graph visualization platform.

This dataset of citation connections can be used to calculate the relative influence of publications in the network, for example using eigenvector centrality analysis. Eigenvector centrality is a measure of node centrality that can be applied to identify nodes that play central roles in the network structure. It can be seen as a weighted sum of not only direct connections but also indirect connections of every length [5]. Compared to simpler geometrical measures like degree centrality (i.e. total number of citations), eigenvector centrality also considers the influence of the connected nodes and takes into account the entire pattern of the graph. Where degree centrality gives a simple count of the number of connections a node has, eigenvector centrality assigns higher values to connections to higher-ranking nodes [13]. For example, with this calculation method a node with few high-ranking connections might outrank a node with a larger number of low-ranking connections.

For the purpose of demonstration, a sample dataset was retrieved from the Thomson Reuters Web of Science with the search term “computer supported collaborative learning” and a year limit of 1990-2015. In total, 1806 records were stored and processed with the analysis system.
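The contrast between degree and eigenvector centrality described above can be illustrated with a small sketch. It makes several assumptions that are not taken from the source: the graph is a made-up toy network, citation links are treated as undirected, and the scores are computed with a shifted power iteration (x ← x + Ax, then normalized). The source does not state which algorithm or settings were used in Gephi, so this should be read as one possible way to compute the measure, not as the authors' procedure.

```python
import math
from collections import Counter
from itertools import combinations


def eigenvector_centrality(edges, max_iterations=500, tolerance=1e-10):
    """Approximate eigenvector centrality of an undirected graph."""
    # Build an undirected adjacency structure from the edge list.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    scores = {node: 1.0 / len(neighbors) for node in neighbors}
    for _ in range(max_iterations):
        # Shifted power iteration: each node keeps its own score and adds
        # the scores of its neighbors. The shift avoids oscillation on
        # bipartite graphs without changing the dominant eigenvector.
        updated = {
            n: scores[n] + sum(scores[m] for m in neighbors[n])
            for n in neighbors
        }
        norm = math.sqrt(sum(v * v for v in updated.values()))
        updated = {n: v / norm for n, v in updated.items()}
        change = sum(abs(updated[n] - scores[n]) for n in neighbors)
        scores = updated
        if change < tolerance:
            break
    return scores


# Toy network (hypothetical): a tightly connected core c1..c4,
# node X linked to two core nodes, and node Y linked to one core node
# plus three otherwise unconnected leaf nodes.
core = ["c1", "c2", "c3", "c4"]
edges = list(combinations(core, 2)) + [
    ("X", "c1"), ("X", "c2"),
    ("Y", "c3"), ("Y", "l1"), ("Y", "l2"), ("Y", "l3"),
]

degree = Counter(node for edge in edges for node in edge)
scores = eigenvector_centrality(edges)
for node in ("X", "Y"):
    print(node, "degree:", degree[node], "eigenvector:", round(scores[node], 3))
```

In this toy network Y has twice as many connections as X, yet X receives the higher eigenvector score because both of its links go to well-connected core nodes, while most of Y's links go to peripheral leaves.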
The processing and output rendering phase took 32 seconds on a dual-core 2 GHz Xeon test server. The entire analysis process, from database search through analysis server upload to result download, took three and a half minutes.

The keyword summary section from the exploratory data analysis is displayed in Figure 2 as a sample feature. It gives an overview of the other common research topics in the dataset. In the sample dataset it can be seen that distance education and higher learning are current research topics in computer-supported collaborative learning. Another result from the exploratory data analysis is the publication citation counts. An example result for citation counts from the sample dataset is displayed in Table 1. Another measure of relevance is the centrality values, displayed in the rightmost column (eigenvector), which are discussed further in the next paragraph.

Table 1. Four most cited articles in the example dataset

Additionally, we applied an influence analysis to the network using the eigenvector centrality measure. In Figure 3 we present a visualization of the network analysis results using the Gephi [4] visualization software. In the graph, size and shade denote the influence of a node (article), with the darkest and largest nodes being the most influential. Because of size limitations we display only the 250 most influential nodes according to the eigenvector centrality analysis results. Additionally, we marked the most cited article in black (node D) and the three most central articles in white and light gray (nodes A, B, C). In the figure the benefits of centrality analysis become apparent. The literature review article by Kreijns (node D) has the most citations but is not as central to the field of science as, for example, the three other nodes, which discuss fundamental issues of CSCL and are commonly cited by other influential articles in the dataset. The value of centrality analysis is highlighted by the results from the sample dataset.
A basic citation count would not have highlighted the fundamental articles, because the citations are more diffused among several valued papers, whereas literature review articles are rarer and often cited, despite not interacting bibliographically as much with the field of science. The exploratory analysis figures and network analysis results can help a researcher get a quick visual overview of, and then deeper insight into, the investigated field of science. By having the data automatically analyzed and visualized, the researcher can identify the core publications, publication trends, common research themes and the direction of the latest research. With centrality analysis the core publications can be identified more reliably than by measuring the total number of citations, because it takes citation weighting into account and considers the centrality of citing articles. In multidisciplinary fields, network analysis enables identifying the interplay of citations and the contributions from other disciplines. This enables the researcher to see how the multidisciplinary nature of the field has formed and from which papers. Another opportunity emerges from adding the dimension of time to the network, which visualizes the evolution of the publication network. This feature illustrates how the literature in a given field of science came to exist over time and allows one to identify publication forums and influential publications at different periods.

The research question of this paper was how to make systematic mapping studies more straightforward and accessible for researchers. We presented an open, extensible cloud-based literature analysis architecture as a solution, along with an implementation of that architecture. The presented tool, NAILS, allows the user to get a statistical and network overview of bibliographical datasets by uploading them to the cloud-based analysis service.
The service uses an open source, extensible plugin architecture, which can serve as a platform for researchers who implement additional analysis features. The sample implementation is a basic version and could benefit from additional features. The largest limitation is that data import currently works only with Web of Science input data and thus cannot process articles not included in that database. The second limitation is that while the system analyses both statistical and citation network data, at the moment it visualizes only statistical data, requiring a user-installed software package for network visualization and the display of centrality values. Future work will include adding these features using the open plugin architecture and including additional major data sources, such as Scopus. Additionally, having automatic import tools as browser plugins for different scientific datasets would make initiating analysis jobs even easier for researchers, but automatic downloads would also involve complicated copyright ...

Citations

... This points to the relevance of providing an overview of the drivers of innovation in industrial clusters. In this sense, the objective of this research is to map the international scientific production on the factors that influence innovation in industrial clusters, based on the exploration of bibliometric networks, which enable the classification and structuring of the field of knowledge based on graphical representations generated from measures of the relationships between bibliometric units (journals, authors, keywords, etc.) (Knutas et al., 2015). This objective unfolds into the following secondary objectives: 1) to present the temporal distribution of publications over the years (from 1998 to 2023); 2) to identify the publication sources with the largest volume of publications on the topic; ...
Article
Full-text available
There is a preponderance of systematic literature review studies compared with bibliometric studies connecting the constructs “innovation” and “industrial clusters”. In addition, these reviews show a scarcity of research providing a holistic view of the determinants of innovation in the context of industrial clusters. Therefore, the objective of this research is to map the international scientific production on the factors that influence innovation in industrial clusters. The results indicate that this thematic domain has gained growing attention in recent years owing to multiple motivations (customer needs, progress in ICTs, etc.) and, based on the measurement of scientific production by means of bibliometric networks, confirm three classical laws of bibliometrics (Bradford's law, Lotka's law and Zipf's law). It is concluded that the topic studied here continues to attract great interest among multiple social actors. Furthermore, innovation in industrial clusters is influenced by several mechanisms, notably R&D, knowledge management and interaction among the multiple social actors.
... (Blei et al., 2003; Pritchard et al., 2000) is used for the respective analyses. In WoS, for the first term, we used the codes present in the NAILS project, the Network Analysis Interface for Literature Studies (Knutas et al., 2015); for the second, third, and fourth terms, due to the number of articles found in WoS, the proposed parameters were analyzed manually. In Scopus, we used the codes present in the Bibliometrix project (Aria and Cuccurullo, 2017). ...
Article
Full-text available
Mining is an activity of importance to the world economy, moving billions of dollars per year and employing a network of people. After the tragedies observed in Brazil involving mining dams, reflections have been raised about the safety of these environments and their impacts on the environment. This paper focused on compliance weaknesses in tailing dam regulations in Brazil and the impacts of environmental accidents in Canada. In addition, it carried out a bibliometric analysis with strings/terms related to tailing dams and a Brazil-Canada comparison, Canada being a developed country where a similar tailing dam accident happened. It discussed the following topics: salient keywords and emerging themes; temporal and geographical distribution of publications; publication growth; word cloud; federal legislation and instruments for the safety of dams; regulations and the role of inspection agencies; tailing dam accidents; and the Canadian scenario of mining regulations, and it makes a comparison between these two countries.
... A literature search on innovation automation was performed for this paper with the help of an analysis on the records downloaded from Web of Science. The analysis (for further details concerning the method, see Knutas et al., 2015 and the NAILS project) identifies the important authors, journals, and keywords in the dataset based on the number of occurrences and citation counts. This search shows that extant research linked to the keywords innovation and automation does not focus on innovation automation in the sense discussed in this paper, but instead is limited to more traditional contexts, such as the automation of industrial processes. ...
Article
The Internet economy and computer-aided innovation enable improvements in the quality and quantity of outcomes of innovation processes. Traditional “research pipes” are often too slow to fit with contemporary business logic. In this paper, we focus on the intersection of innovation and automation and the potential they create together. Innovation automation represents a next generation of automation that has structural implications. Automation in the innovation context is about maintaining the richness of creative innovation processes while also absorbing a greater amount of data, information, and knowledge inputs and producing more holistic outputs that meet customer needs better and are faster on the market. The paper builds a novel academic “playground” for the research on innovation automation as the efficient and effective use of co-creative intelligence—the fusion and mixture of artificial intelligence, human intelligence, and the intelligence of crowds. Covering the wide field of innovation automation requires various future research programs. The main focus areas in this paper are related to understanding innovation automation, enabling the way to new management of innovation and ecosystem development. We also propose relevant research themes for the future.
... Betweenness centrality measures the distance and frequency of the nodes that appeared among all other nodes in the field, and it reveals the articles' ability to connect future research with past works (Baker et al., 2020), and it also reveals the link strength of the article's author's connection with another group of authors in the same field (Gholampour et al., 2019). Eigen-centrality measures the article's connection with the other practical and influential articles that show the articles' relative influence within the network by direct and indirect connections (Knutas et al., 2015). We calculate these four centrality measures by using the Gephi software to evaluate the articles' influence on board committees. ...
Article
Purpose This study aims to examine the literature from the Web of Science database published on board committees between 2002 and 2023 and outline the quantitative summary, journey of board committees’ research and suggest future research directions. Design/methodology/approach This study examines bibliometric-content analysis combined with a systematic literature review of articles on board committees to document the summary of the field. The authors used co-citation, co-occurrence and cluster analysis under bibliometric-content analysis to present the field summary. Findings Board committee composition, such as their gender, independence and expertise, as well as factors affecting corporate governance, such as reporting quality, earnings management and board monitoring, all have a significant impact on board committee literature. The field is getting growing attention from authors, journals and countries. Nevertheless, there is a need for further exploration in areas like expertise, member age and tenure, the economic crisis and the nomination and remuneration committee, which have not yet received sufficient attention. Originality/value This paper has both theoretical and practical contributions. From a theoretical perspective, this study substantiates the prevalence of agency theory within board committee literature, reinforcing the foundational role of agency theory in shaping discussions about board committees. On practical ground, the comprehensive overview of board committee literature offers scholars a road map for navigating this field and directing their future research journey. The identification of research gaps in certain areas serves as a catalyst for scholars to explore untapped dimensions, enabling them to strengthen the essence of the committees’ performance.
... Various algorithms have been developed to perform topic modelling. Among these, Latent Dirichlet Allocation (LDA) was selected for its simplicity and popularity in performing literature review studies (e.g., D'Amato D. et al., 2017; Pirola et al., 2020; Knutas et al., 2015). LDA is an unsupervised and generative probabilistic model (Blei et al., 2003). ...
... LDA has been implemented mainly in two different programming languages: Python and R. This work uses R because the source code is inspired by the Network Analysis Interface for Literature Studies (NAILS project) (Knutas et al., 2015) and the work published by Asmussen & Møller (2019), which also uses R. ...
Article
Full-text available
The fourth industrial revolution has resulted in technology advancements in the manufacturing industry. However, the innovation potential embedded in these technologies should be unlocked by a viable application, i.e., the business model (BM). The BM as a holistic concept featuring different interacting elements is thus emerging as a promising vehicle for innovation. Current BM research describes the entire domain but lacks depth in the characterization of its individual components. This paper investigates the available manufacturing literature through the lens of the BM concept performing a scientometric analysis. The results are presented in a relational framework that provides an in-depth characterization of the manufacturing element of the BM and highlights identified connections that link the BM components. This is the basis for tools that will support firms in developing manufacturing portfolios aligned with their strategic goals.
... In view of these discussions about entrepreneurial intention and sustainability, it is worth mentioning what Morin (1996) stated about what science is since he states that science is considered a community that presents the essence of the relationships between scientists of a friendly and hostile nature, as well as collaboration and rivalry, concurrently and constantly. Based on the construction of scientific knowledge, Knutas et al. (2015) emphasize that scientific research can be an essential agent in changing and expanding knowledge, showing how investigations are structured -to provide a vision of broad information on the dataset found in the empirical and conceptual literature on all levels of the scientific process. Furthermore, Peña Ramírez et al. (2021) highlight the contribution of scientific research to science when investigating bibliometric indicators, also considering the most valuable researchers who share the characteristic of publishing in journals on a given topic. ...
... Scientific publications show trends and influences in the most diverse areas of knowledge since they are agents of change in science and, consequently, in scientific understanding. Faced with the need to research how the fields of knowledge on entrepreneurial intention and sustainability are being structured in the academic sphere, especially considering the analysis of bibliometric networks, as well as content analysis (Knutas et al., 2015), the question is: How are scientific researches that jointly address entrepreneurial intention and sustainability structured? From this research question, the aim of the study is to investigate the international scientific production on entrepreneurial intention and sustainability. ...
Article
Full-text available
The aim of this paper was to analyze international scientific production on entrepreneurial intention and sustainability. Bibliometric research was carried out according to bibliometric laws (Lotka, Bradford and Zipf). In addition, content analysis was used to identify the main methodological approaches adopted. From the 76 documents analyzed, emerging topics related to entrepreneurial intention and sustainability were found: entrepreneurial education and gender; sustainable practices; innovation and personality traits; intention, sustainable entrepreneurship, and social entrepreneurship. These results suggest the development of research that addresses the context of social, economic and sustainable entrepreneurship. This research is relevant for providing reflections and knowledge for future research in the field of entrepreneurship, especially in relation to the alignment of entrepreneurial intention and sustainability.
... 'Machining of superalloys' and 'machining of nickel-based alloys' were primarily used as search terms resulting in maximum number of articles. Initial search results were analysed using NAILS project [23] developed by which helped in identifying missed articles in the particular area. The addition of the missed papers to the search results brought in a cumulative of 266 papers. ...
Article
This article presents a detailed systematic review on the current developments in the machining of superalloys. Superalloys have extensive high temperature applications and research has been conducted in various machining processes. The conventional and nonconventional processing of superalloys is an emerging research area and needs attention. This systematic literature survey intends to consolidate, review, and critically highlight the machining aspects carried out in this field. Bibliometric analysis using BibExcel is conducted to capture new insights in the focused field of research. Articles published since 1991 on material processing of superalloys are listed and referenced. Article statistics on each class of publications have been highlighted. A citation analysis is also reported and the articles with the highest impact are identified. Geographical mapping reveals the role of universities and institutions in the focused research. Gephi is used to identify the key research clusters and topics. Seven prominent clusters were identified and articles in each research cluster have been elaborated. A quality versus quantity analysis of the various publications is done. Present state of art in the focused research and the future directions on the material processing of superalloys based on the review are reported.
... In this sense, an overview of the literature produced on digital entrepreneurship will be provided, based on the exploration of bibliometric networks, which enable the classification and structuring of the field of knowledge based on graphical representations generated from measures of the relationships between bibliometric units (journals, authors, keywords, etc.) (Knutas, Hajikhani, Salminen, Ikonen, & Porras, 2015). ...
Article
Full-text available
The objective of this research was to map the international scientific production on digital entrepreneurship. To achieve this purpose, an exploratory, descriptive and bibliometric study with a quantitative approach was conducted. The measurement of scientific production was based on the three classical laws of bibliometrics: Bradford's law, Zipf's law and Lotka's law. The study followed the steps defined by Sousa, Fontenele, Silva and Filho (2019) for conducting bibliometric research, namely: selection of the database and bibliometric software, definition of the sample, collection of the sample, tabulation and treatment of descriptive data, and data analysis. The data were collected from the Web of Science database and tabulated in Excel. The VOS Viewer software was used to analyze the bibliometric networks of keyword co-occurrence and citation (authors and journals). This study shows in general terms the main thematic streams explored internationally on digital entrepreneurship: Digital Transformation, Strategy, Data Science and Digital Innovation. It also points to other aspects related to the construct, offering theoretical support for other research, especially to expand the field of knowledge on the subject analyzed.
... Most bibliometric studies have been carried out based on the Web of Science (ESTOQUE ET AL., 2019; LIU, 2013). In this study, the Network Analysis Interface for Literature Studies (NAILS) application, conceived by Knutas et al. (2015), was used for the analytical and bibliometric work. NAILS works in an integrated way with one of the largest and most consolidated networks of high-impact journals in the world, the Web of Science (WoS) (SALMINEN ET AL., 2019), and is easily coupled to other visual analysis software, providing researchers with statistical and academic support in their publications, where the R software is used (FREIRE-SILVA ET AL., 2019). ...
Article
Full-text available
This study is the result of research developed during the lead author's doctoral studies in development and environment, one of whose research topics is ecosystem services. The study aimed to identify global research trends involving highly cited articles on Ecosystem Services (ES) from 2000 to 2020. The research is explanatory with a quantitative approach, using bibliometric analysis based on a survey carried out in the Network Analysis Interface for Literature Studies (NAILS) database as a source of articles published from 2000 to 2020, corresponding to the four search terms. The results cover the general panorama of research on ES. Presented in order are: the most cited journals (1); the most cited authors in studies on ES (2); a sample of the year with the highest production of studies on ES (3); and application by location, scientific area and choice of ES (4). Some of the points raised show that all of the ES studies have a conservationist character; another observation shows that many studies on ES have an economic/monetary character, seeking the sustainable use of environments.
... The exported files were analysed with the Network Analysis Interface for Literature Studies (NAILS cf. Knutas et al., 2015) in order to produce node and edge files of the bibliometric network. These files could then be imported into the open source network visualisation software Gephi (Bastian, Heymann & Jacomy, 2009), which was then used to filter the network to identify the specific research communities for the analysis. ...
Article
Full-text available
For a research article (RA) to be accepted, not only for publication, but also by its readers, it must display proficiency in the content, methodologies and discourse conventions of its specific discipline. While numerous studies have investigated the linguistic characteristics of different research disciplines, none have utilised Social Network Analysis techniques to identify communities prior to analysing their language use. This study aims to investigate the language use of three highly specific research communities in the fields of Psychology, Physics and Sports Medicine. We were interested in how these language features are related to the total number of citations, the eigencentrality within the community and the intra-network citations of the individual RAs. Applying Biber’s Multidimensional Analysis approach, a total of 771 RA abstracts published between 2010 and 2019 were analysed. We evaluated correlations between one of three network characteristics (citations, eigencentrality and in-degree), the corpora’s dimensions and 72 individual language features. The pattern of correlations suggests that features cited by other RAs within the discourse community network are in almost all cases different from those that are cited by RAs from outside the network. This finding highlights the challenges of writing for both a discipline-specific and a wider audience.