Article

Detecting research topic trends by author-defined keyword frequency


Abstract

Detecting research trends helps researchers and decision makers promptly identify and analyze research topics. However, because of citation and publication delays, previous trend-analysis studies tend to identify trends only after the fact (ex post). In this study, we employ author-defined keywords to represent topics and propose a simple, effective, ex-ante approach, called author-defined keyword frequency prediction (AKFP), to detect research trends. More specifically, AKFP relies on a long short-term memory (LSTM) neural network, with four categories of features proposed as input variables: temporal features, persistence, community size, and community development potential. To verify the effectiveness and feasibility of AKFP, we also propose a simple but effective method to build a balanced and sufficient data set and conduct extensive comparative experiments on data extracted from the ACM Digital Library. The empirical results confirm the feasibility of keyword frequency prediction in terms of forecasting precision: short- and medium-term frequency prediction achieves excellent performance, and long-term prediction reaches acceptable accuracy. In addition, we find that the proposed features have significant but uneven effects on AKFP. The temporal features are always an important factor; persistence is strongly correlated with community size, and both matter more for short- and medium-term prediction, whereas community development potential is particularly significant for long-term prediction.
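The abstract does not include an implementation, but the described setup lends itself to a compact sketch: an LSTM regressor that reads a keyword's yearly feature vectors and predicts its next-period frequency. The following is a minimal sketch, assuming PyTorch; the feature layout, shapes, and hyperparameters are illustrative assumptions, not the authors' configuration.

# Minimal sketch (not the authors' code): an LSTM that maps a sequence of
# per-year feature vectors for one keyword to its next-period frequency.
# The feature layout is an assumption based on the abstract's four categories:
# [temporal feature, persistence, community size, community development potential].
import torch
import torch.nn as nn

class AKFPSketch(nn.Module):
    def __init__(self, n_features: int = 4, hidden_size: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)  # regression head: predicted keyword frequency

    def forward(self, x):                # x: (batch, years, n_features)
        _, (h_n, _) = self.lstm(x)       # h_n: (num_layers, batch, hidden_size)
        return self.head(h_n[-1]).squeeze(-1)

# Toy usage: 8 keywords, 10 observed years, 4 feature categories per year.
model = AKFPSketch()
x = torch.rand(8, 10, 4)
y = torch.rand(8)                        # placeholder next-period frequencies
loss = nn.MSELoss()(model(x), y)
loss.backward()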


... Keyword analysis is a widely used classic bibliometric technique that illustrates a given research discipline's core topics and ideas (Cheng et al., 2020). According to Lu et al. (2021), the keyword analysis technique depends on the keywords selected by the authors of the articles to express topics that are most relevant to the research. The objective of keyword analysis is twofold (Choi et al., 2011). ...
... Keyword frequency analysis is extensively used to indicate a topic's significance, where high-frequency keywords are considered hot topics (Lu et al., 2021). Trevisani & Tuzzi (2018) indicated that the increasing frequency of a particular keyword over time mirrors the historical evolution of the corresponding notion. ...
... Co-word analysis is a content analysis technique that can map connections between items in textual data (Cobo et al., 2011). This technique depends on the co-occurrence of keywords, examines relationships among words located in the title, abstract, and keyword sections in documents, and describes the keyword centrality (Lu et al., 2021). This indicates the cognitive structure of a given research discipline (the main concepts, problems, and ideas treated by the research field) (Hu & Zhang, 2015). ...
Article
Full-text available
The objectives of the study are to illustrate the evolution of the audit quality research discipline over the past forty years, determine whether this research discipline is expandable and, if so, what potential future research avenues need further examination, and understand the current knowledge structure of the audit quality research discipline. To achieve these objectives, we employed bibliometric techniques (keyword-frequency and co-word analyses) to review a dataset of 1,831 articles extracted from the Scopus database between 1981 and 2021. A newly introduced keyword frequency tool (K-indicator) was used to measure the evolutionary stages of the audit quality research discipline. We then employed co-word analysis visualizations to present the cognitive structure of the audit quality field. The K-indicator revealed that audit quality had become a mature discipline with established concepts, keywords, and conclusions. Also, it indicated that despite extensive audit quality research, there is room for further research. The co-word analysis showed that audit quality had reached a tight and coherent status from 1981 to the end of 2021. Co-word visualizations indicated that the audit quality structure revolves around four main themes: auditor characteristics, client-related factors, audit firm characteristics, and audit regulations. Therefore, the audit quality research discipline concentrated on some specific elements and ignored others. To the best of the authors' knowledge, no similar study has been conducted to determine whether the audit quality notion is still researchable. Therefore, the results of this study would add much value for audit researchers, practitioners, and regulators.
... These extracted keywords can subsequently undergo further analysis. Much keyword-based research leads to predictions of technological trends, indicating whether a particular technology is likely to gain or lose popularity [6][7][8]. ...
... However, these predictions typically do not include the magnitude of growth or decline of the technology in question [6,7]. To overcome this limitation, one proposed solution involves using author-defined keyword features as input for Long Short-Term Memory (LSTM) [9] and regression models to predict future keyword frequencies [8]. This prediction allows for a more comprehensive assessment of the growth and decline of specific keywords. ...
... Lu et al. used Author-Defined Keyword Frequency Prediction (AKFP) to detect trends in research topics [8]. The prediction model employed a Long Short-Term Memory (LSTM) with temporal features, persistence, community size, and community development potential as inputs for the LSTM. ...
Article
Full-text available
In this ever-changing technological landscape, the ability to quickly predict technological trends becomes crucial for any company or institute engaged in informed decision-making and strategic planning. Data for predicting technological trends can come from various sources such as patent data, which is easily accessible to the public due to the nature of patents. This research is aimed at patent analysis, focusing on combining the keyword-based method, social network analysis (SNA) method, and neural network prediction to propose a feasible keyword trend prediction method based on patent analysis by targeting upcoming keyword trends. More specifically, we utilize Long Short-Term Memory (LSTM) to predict changes in keyword frequency using keyword centralities as input. To assess the effectiveness of the proposed method, we constructed the input dataset using the USPTO patent database in the Information and Communication Technology (ICT) field. We then experimented to compare the proposed method with the benchmark method. Furthermore, to counteract the unbalanced nature of patent data, the SMOGN method is introduced. The results demonstrate its potential for application in broader contexts.
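The abstract above uses keyword centralities as LSTM inputs. As a rough illustration of that feature-construction step only, here is a minimal sketch assuming per-year keyword co-occurrence graphs built with networkx; the SMOGN rebalancing and the LSTM are omitted, and the data layout is an assumption.

# Minimal sketch (assumed workflow, not the paper's code): per-year keyword
# co-occurrence graphs -> centrality features that could feed an LSTM.
from itertools import combinations
import networkx as nx

def yearly_centralities(docs_by_year):
    """docs_by_year: {year: [[kw, kw, ...], ...]} -> {year: {kw: (deg, btw, cls)}}"""
    features = {}
    for year, docs in docs_by_year.items():
        g = nx.Graph()
        for kws in docs:
            g.add_edges_from(combinations(sorted(set(kws)), 2))
        deg = nx.degree_centrality(g)
        btw = nx.betweenness_centrality(g)
        cls = nx.closeness_centrality(g)
        features[year] = {k: (deg[k], btw[k], cls[k]) for k in g.nodes}
    return features

# Toy usage with two "patents" per year.
demo = {2022: [["lstm", "forecasting"], ["lstm", "patent analysis"]],
        2023: [["lstm", "patent analysis", "forecasting"]]}
print(yearly_centralities(demo)[2023]["lstm"])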
... This clustering process enables a more focused analysis and interpretation of the extensive dataset, allowing for deeper insights into the various branches of the topic under investigation. While some authors focused on identifying patterns for the most used keywords in the past [69], others worked on defining methods to help predict future trends [70,71]. Our research followed in their footsteps. ...
... Considering keywords that registered a steady increase in mentions over time, such as "ethical consumption" and "sustainability", we can infer that these areas will continue to attract attention, given the growing concern for environmental sustainability and ethical business practices. While some authors focused on identifying patterns for the most used keywords in the past [69], others worked on defining methods to help predict future trends [70,71]. Our research followed in their footsteps. ...
Article
Full-text available
Ethical food consumption has gained significant attention in the past years, reflecting a societal shift towards ethical behavior. Our study examines the evolution of ethical food consumption research over the past three decades, aiming to map its transformation. We identified key trends, influential contributors, and major thematic clusters through a bibliometric analysis, employing VOSviewer (v.1.6.18) for bibliometric visualization, focusing on citation networks and keyword co-occurrences to reveal the field’s structure and dynamics. We made extensive use of the Web of Science database, where we selected 1096 relevant articles and review papers. Our analysis shows a notable rise in publications starting in 2005, with a peak in 2022, indicating increased scholarly interest in the topic. The findings underscore the importance of integrating empathy and human values into ethical food consumption, highlighting the critical roles of animal welfare, sustainability, and social justice. Despite a strong pro-ethical attitude among consumers, a significant “attitude-behavior gap” persists, emphasizing the need for strategies that bridge this divide. Our results emphasize the importance of interdisciplinary efforts to align ethical practices with broader societal goals, offering valuable insights for future research and policy-making to promote sustainable and ethical food consumption worldwide.
... Many researchers have recently explored technical trends based on deep neural networks to overcome these shortcomings. For example, some researchers [10,11] used recurrent neural networks to capture the dynamic properties of techniques. In addition, pre-trained language models were introduced to enhance the representation of documents and words in technology trend mining [12,13]. ...
... First, the difficulty in identifying technical terms in the corpus results in the output of trend mining being hard for humans to understand. Many existing methods use uni-grams to represent trends [4,10,11,13,15]. However, uni-gram representations often yield indistinct topic descriptions and inadequate interpretability. ...
Article
Full-text available
The past decades have witnessed significant progress in scientific research, where new technologies emerge and traditional technologies constantly evolve. As a critical task in the Science of Science (SciSci), automatically mining technology trends from massive scientific publications have attracted broad research interests in various communities. While existing approaches can achieve remarkable performance, there are still many critical challenges to address, such as data sparsity, cross-document influence, and temporal dependency. To this end, in this paper, we propose a technical terms-based graph propagated neural topic model for mining technology trends in scientific publications. Specifically, we first utilize the documents’ citation relations and technical terms to construct a heterogeneous graph. Then, we design a term propagation network to spread the technical terms on the heterogeneous graph to overcome the sparseness of technical terms. In addition, we develop a dynamic embedded topic modeling method to capture the temporal dependencies for technical terms in cross-document, which can discover the distribution of technical terms over time. Finally, extensive experiments on real-world scientific datasets validate the effectiveness and interpretability of our approach compared with state-of-the-art baselines.
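The abstract above first builds a heterogeneous graph from citation relations and technical terms before propagating terms over it. As a rough illustration of that graph-construction step only, here is a minimal networkx sketch under assumed input formats; the term propagation network and the neural topic model are not reproduced.

# Minimal sketch (assumed data layout, not the authors' implementation):
# a heterogeneous graph with document nodes, technical-term nodes,
# "cites" edges between documents, and "mentions" edges between documents and terms.
import networkx as nx

def build_hetero_graph(citations, doc_terms):
    """citations: [(citing_doc, cited_doc)]; doc_terms: {doc: [term, ...]}"""
    g = nx.Graph()
    for doc, terms in doc_terms.items():
        g.add_node(doc, kind="document")
        for t in terms:
            g.add_node(t, kind="term")
            g.add_edge(doc, t, kind="mentions")
    for citing, cited in citations:
        g.add_edge(citing, cited, kind="cites")
    return g

g = build_hetero_graph([("d1", "d2")],
                       {"d1": ["neural topic model"], "d2": ["graph propagation"]})
print(g.nodes(data=True), g.edges(data=True))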
... Computational power helped discern meaningful information from unstructured data, or 'text' (Grimmer & Stewart, 2013). After extracting keywords from briefing documents, the issues related to the keywords can be quantified (Lu et al., 2021; Pao, 1978). The measurements from the method are keyword frequency and weighted keyword frequency (known as term frequency-inverse document frequency, TF-IDF) in a single document. ...
... The measurements from the method are keyword frequency and weighted keyword frequency (known as term frequency-inverse document frequency, TF-IDF) in a single document. A high frequency of keywords mentioned in a document can be interpreted as marking relatively important concepts that receive strong attention (Grimmer & Stewart, 2013; Lu et al., 2021). Since the briefing materials contain not only the government's announcements but also the dialogs between speakers and news reporters, the daily frequency of keywords in briefing documents indicates the salience with which the government and the public considered the issue related to the keywords. ...
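Since this excerpt contrasts raw keyword frequency with TF-IDF weighting, a short illustration may help; it uses scikit-learn's CountVectorizer and TfidfVectorizer on a made-up toy "briefing" corpus.

# Minimal sketch: raw counts vs. TF-IDF weights for the same toy corpus.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

briefings = ["hospital bed capacity and quarantine policy",
             "quarantine policy update for healthcare facilities",
             "vaccination schedule for healthcare workers"]

counts = CountVectorizer().fit_transform(briefings)   # raw keyword frequency per document
tfidf = TfidfVectorizer().fit_transform(briefings)    # frequency down-weighted by document frequency
print(counts.toarray())
print(tfidf.toarray().round(2))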
Article
The prolonged COVID−19 pandemic has given governments the challenge of increasing policy effect certainties while tackling uncertainties derived from the crisis. This research investigates the policy learning that occurred across the waves by specifically focusing on South Korea’s policy implementations directed at healthcare facility management, including practitioners, during the pandemic. To empirically analyze the government’s prompt response to changing COVID−19 situations, a text analysis of the official government briefings and a semi-structured interview were conducted. The results show that the government may have gained confidence in their policy decision and implemented policies more decisively in the later waves despite the surge of COVID−19 cases. Our findings provide an example of an uncertainty-certainty mechanism in a crisis that explains a relationship between policy learning and confidence. We also suggest capabilities that enable governments to enhance policy effects and cope with uncertainties.
... These so-called keyphrases concisely and explicitly encapsulate the core content of a document, which makes them valuable for a variety of NLP and information retrieval tasks. For instance, keyphrases were proven useful for improving document indexing (Fagan, 1987;Zhai, 1997;Jones and Staveley, 1999;Gutwin et al., 1999;Boudin et al., 2020), summarization (Zha, 2002;Wan et al., 2007;Koto et al., 2022) and question-answering (Subramanian et al., 2018;Lee et al., 2021), analyzing topic evolution (Hu et al., 2019;Cheng et al., 2020;Lu et al., 2021) or assisting with reading comprehension (Chi et al., 2007;Jiang et al., 2023a). ...
Preprint
Full-text available
Keyphrase generation refers to the task of producing a set of words or phrases that summarises the content of a document. Continuous efforts have been dedicated to this task over the past few years, spreading across multiple lines of research, such as model architectures, data resources, and use-case scenarios. Yet, the current state of keyphrase generation remains unknown as there has been no attempt to review and analyse previous work. In this paper, we bridge this gap by presenting an analysis of over 50 research papers on keyphrase generation, offering a comprehensive overview of recent progress, limitations, and open challenges. Our findings highlight several critical issues in current evaluation practices, such as the concerning similarity among commonly-used benchmark datasets and inconsistencies in metric calculations leading to overestimated performances. Additionally, we address the limited availability of pre-trained models by releasing a strong PLM-based model for keyphrase generation as an effort to facilitate future research.
... The top three author keywords were international students, online learning and COVID-19, with respective frequencies of 92, 83 and 79. As the high frequency of keywords reflects the hot topics in the research field (Lu et al., 2021), the top three keywords might indicate that international students' online education caused by COVID-19 is a hotspot in virtual mobility research. ...
Article
Full-text available
With the development of information and communication technology, virtual mobility, as an approach to internationalising education, allows students to receive cross-border education through online educational interactions without going abroad. Additionally, the outbreak of the COVID-19 pandemic accelerated the development of virtual mobility and brought its challenges to the fore, leading to concerns about its sustainability in the post-pandemic era. This study aims to analyse the state and trends of virtual mobility research through a bibliometric analysis of the existing literature. A bibliometric study of 540 virtual mobility-related publications from the Scopus database, spanning from 1998 to July 2024, has been conducted. Performance analysis is utilised to examine the annual publication distribution, the main contributors (authors, journals, articles, affiliations and countries), and the top author keywords in virtual mobility research. Science mapping is adopted to reveal the social network and intellectual structure, followed by the thematic evolution of keywords within the virtual mobility research domain. The results indicate that the publication trend can be identified in three waves, with a sharp increase in the latest period. The dominant research hotspot is international students’ online education caused by COVID-19; collaborative online international learning (COIL), student engagement, e-learning, and the Programme for International Student Assessment (PISA) are the four research areas with potential for further research.
... Traditionally, identifying and detecting research topics has posed challenges. Previous studies used machine learning methods to identify research topics through different approaches: clustering and co-citation analysis to investigate research topics [20] [21], co-word analysis to calculate keyword frequency from the publication context to establish research topics [22] [23], and topic modelling to study tokens within publications to create research topics [24]. However, these kinds of studies typically focus on exploring research topics within specific fields. ...
Preprint
Full-text available
Data visualization has been used to communicate insightful findings and support decision making in target communities for many years. In this digital era, the widespread adoption of Generative Artificial Intelligence (GenAI) technology and Large Language Models (LLMs) has provided new approaches for data-driven decision making. To study the advantages of data visualization and GenAI for decision making, this paper created an application, comprising a dashboard and a chatbot, to investigate research topic trend analysis for Hong Kong universities and global patterns. The potential of using data visualization and GenAI technology to accelerate data-driven decision making for academic development is discussed. The results indicate that data visualization explores research topic data by revealing patterns and trends over the years, while GenAI technology supports conversational interaction with research topic data to draw actionable conclusions and recommendations. Leveraging GenAI technology helps university management and individual researchers identify potential research topics and talent, improves the accuracy and effectiveness of the decision-making process, and frees up time for other strategic and career planning.
... Keywords play a crucial role in academic papers, as they concisely summarize the core topic, objectives, target audience, and methodology employed in the research. A systematic analysis of keywords can reveal the trends and evolution of research in a particular academic field, as well as the focus of research at a given time (48). Keywords are not only a quick way to understand the main idea of a paper, but also an important indicator of the concerns and research hotspots in an academic field (48). ...
Article
Full-text available
Background Tumor-associated neutrophils (TANs) play crucial roles in tumor progression, immune response modulation, and therapeutic outcomes. Despite significant advancements in TAN research, a comprehensive bibliometric analysis that objectively presents the current status and trends in this field is lacking. This study aims to fill this gap by visually analyzing global trends in TANs research using bibliometric and knowledge mapping techniques. Methods We retrieved articles and reviews related to TANs from the Web of Science Core Collection database, spanning the period from 2012 to 2024. The data were analyzed using bibliometric tools such as Excel 365, CiteSpace, VOSviewer, and Bibliometrix (R-Tool of R-Studio) to identify key trends, influential countries and institutions, collaborative networks, and citation patterns. Results A total of 615 publications were included in the bibliometric analysis, showing a significant upward trend in TANs research over the last two decades. The United States and China emerged as the leading contributors with the highest number of publications and citations. The journal with the most publications in this field is Frontiers in Immunology. Prominent authors such as Fridlender ZG were identified as key contributors, with their works frequently cited. The analysis highlighted major research themes, including the role of TANs in tumor microenvironment modulation, their dual functions in tumor promotion and suppression, and the exploration of TAN-targeted therapies. Emerging research hotspots include studies on TAN plasticity and their interactions with other immune cells. Conclusion This study is the first to employ bibliometric methods to visualize trends and frontiers in TANs research. The findings provide valuable insights into the evolution of the field, highlighting critical areas for future investigation and potential collaborative opportunities. This comprehensive analysis serves as a crucial resource for researchers and practitioners aiming to advance TAN research and its application in cancer therapy.
... It is important to note that these are not novel concepts or keywords in this research field. They might be used differently, depending mostly on the authors' scope of study [61]. From the map, the keywords used in recent years but not that frequently include "space radiation", "equatorial plasma bubble", "LSTM", "deep learning", "space weather forecasting", "neural network", "GIC" which refers to geomagnetically induced currents, "telescopes", and "Sun: UV radiation". ...
Article
Full-text available
Space weather (SpW) is a phenomenon caused by a variety of solar events and has the potential to disrupt infrastructure systems and technology, putting them at risk. Despite SpW’s immense impact, there has been a notable absence of bibliometric analysis studies to understand the research trends, regional distribution, social structure, conceptual structure, and knowledge gaps. This review synthesized Scopus documents in the SpW domain from 1988 to 2021. In this study, three tools were used: Microsoft Excel, VOSviewer, and Harzing’s Publish or Perish for statistical analysis, graphical presentation, and citation metrics, respectively. Of the 3,956 articles, roughly 70% were published in the last ten years, revealing rapid growth in SpW research. The study found that China ranked third in publication volume, following the United States and the United Kingdom, with the Russian Federation following closely in fourth place. This study also presents six key findings, including the growth pattern of publications, contributions and authorship collaboration by country, the most productive and influential authors, co-authorship status, the most influential journals and articles, research clusters, and newly discovered SpW subtopics. These findings provide useful insight and aid the advancement and progress of this field.
... This suggests that while nursing is a relevant field, it constitutes a smaller portion of the research focus than other topics. papers, and keyword analysis of research papers in a certain field can quickly locate the research hotspots and frontiers in the field [46]. A total of 29 documents retrieved from the WoS database were imported into VOSviewer. ...
Article
Full-text available
The study aims to analyze the research trends, collaborative networks, and evolving themes in parent-adolescent sexual and reproductive health (SRH) communication research from 2010 to 2024. A bibliometric analysis was conducted using the Web of Science (WoS) Core Collection database. The search, completed on May 23, 2024, used the terms "parent-adolescent communication" and "sexual and reproductive health" as search criteria. We identified 29 documents and exported the data in RIS format for analysis. VOSviewer software was employed to visualize co-authorship networks, keyword trends, and research hotspots. The analysis revealed significant shifts in research focus over time, from communication dynamics in 2019 to a stronger emphasis on adolescent SRH issues by 2020. Key research areas included public health, paediatrics, and psychology, with notable contributions from institutions such as Bahir Dar University, The Centres for Disease Control and Prevention (USA), and Makerere University. Collaborative networks identified highly active research groups, with researchers like Kemigisha E and Nyakato VN playing central roles. Keyword trends indicated a growing interest in topics such as HIV prevention, mental health, and adolescent risk behaviours. This study highlights the dynamic nature of parent-adolescent SRH communication research, emphasizing the importance of addressing evolving SRH challenges. Despite its reliance on a single database, the analysis provides valuable insights into research trends and collaborations. Future studies should incorporate multiple databases and broader publication types to enhance understanding and support effective policy development.
... To identify trending topics, we analyze the frequency of keywords associated with VRP research topics over time [32]: ...
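The excerpt above tabulates keyword frequency over time to identify trending topics; a minimal sketch of that tabulation with pandas, using made-up records, is shown here.

# Minimal sketch (illustrative data): count keyword occurrences per publication year.
import pandas as pd

records = pd.DataFrame({
    "year":    [2021, 2021, 2022, 2022, 2023],
    "keyword": ["vrp", "metaheuristic", "vrp", "drone delivery", "vrp"],
})
trend = records.groupby(["year", "keyword"]).size().unstack(fill_value=0)
print(trend)  # rows: years, columns: keywords, values: yearly frequency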
Article
Full-text available
This bibliometric analysis focuses on the vehicle routing problem (VRP) model in the field of logistics delivery. The study utilizes a comprehensive dataset of 2,000 VRP-related publications obtained from the Scopus database, spanning the years 2007 to 2023. Through the application of bibliometric methods, this research aims to uncover key insights regarding research trends, country contributions, and recent topics within the VRP research network. Various bibliometric indicators, including publication count, author productivity, relevant sources, institutional affiliation, and citation frequency, are employed to conduct the analysis. The findings shed light on the evolution and trajectory of VRP research, while also highlighting noteworthy countries and topics that have received significant attention. This study not only enhances the overall understanding of VRP but also serves as a foundation for future investigations aimed at enhancing the efficiency and effectiveness of logistics delivery.
... The two main areas in bibliometric measurement are dynamic analysis and structural analysis (Chen et al., 2017). Dynamic analysis considers the development and distribution of publications, keyword trends, keyword frequency, citation metrics, distribution patterns, and impact measures like the h-index (Lu et al., 2021). ...
Article
Full-text available
This study aims to analyze global trends in STEM-based robotic physics education through a bibliometric analysis conducted using data obtained from the Scopus database. STEM and robotics are increasingly gaining attention as effective approaches in physics education, aligning with the 21st-century demand for technical skills, creativity, and problem-solving abilities. This research utilizes R bibliometric software to identify publication patterns, researcher collaborations, and key trends in STEM-based robotic physics education from 2008 to 2024. The methods involved searching for articles using the keywords "STEM AND robotic AND physics AND education," resulting in 53 downloaded and analyzed articles. The analysis was conducted in two main steps: data collection from Scopus and metadata analysis to identify research trends and collaboration networks among researchers. The results indicated a substantial growth in the volume of publications and international collaborations, with the United States leading in publication contributions, followed by Ukraine and Colombia. The key findings reveal that recent research trends focus on learning media and STEM. Meanwhile, topics that have received less attention include the impact of STEM on the curriculum and the development of critical thinking skills. Several topics, such as augmented reality and critical thinking, were also identified as challenging potential avenues for future research. This research offers significant guidance for future researchers in developing more effective STEM-based education strategies and suggests closer collaboration between academics and practitioners in various countries. Future research is recommended to explore alternative bibliometric analysis tools and expand the database coverage to enrich perspectives and enhance the scope of findings.
... Furthermore, the concept of emergence highlights how new ideas in SD and SM evolve and spread within society [25]. By recognising both innovation and emergence, identifying the research trends enables this research to capture and analyse research topics promptly, thus facilitating the advancement of knowledge and progress towards UN Agenda 2030 in these academic fields [27]. ...
Article
Full-text available
This study aimed to identify emerging trends and topics in strategic management and sustainable development research in the context of global disruptions, especially when they combine into polycrises, contributing to a state of uncertainty known as the non-ergodic world. The authors employed the Scopus database to collect and analyse academic literature from 2015 to 2023. Additionally, they reviewed United Nations reports to complement the academic data with a statistical analysis of the progress made in implementing the Sustainable Development Goals (SDG). The results revealed discrepancies in this progress, with advances in poverty eradication, responsible consumption and production and climate action. However, more progress is needed to reduce hunger, preserve life on land and below water, promote peace, establish justice and strengthen institutions. This lack of focus is also mirrored in the most cited academic research on strategic management and sustainable development. The analysed research experienced a 13.0 % annual growth, with an average of 26.9 citations from 84 countries. The bibliometric analysis identified further research areas, including sustainable economic development, business sustainability strategies, leadership, and the impact of global disruptions. Thus, reflecting the challenges of achieving the United Nations Agenda 2030 in the context of global disruptions, this study confirmed the increasing role of integrating sustainable development considerations into strategic management research and highlighted the need for more research and action to meet the SDGs.
... To overcome this issue, Long Short-Term Memory (LSTM) is proposed with complex "gates" that can retain important information in the data vectors (Hochreiter & Schmidhuber, 1997;Lu et al., 2021). State-of-the-art research indicates that LSTM architecture has resulted in significant improvements compared to the other neural networks for these sequence-to-sequence predictions (Chaudhari & Thakkar, 2023;Taheri & Aliakbary, 2022). ...
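For reference, the "gates" mentioned in this excerpt are the standard LSTM gating equations (following Hochreiter & Schmidhuber, 1997, written here in common modern notation):

\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad h_t = o_t \odot \tanh(c_t)
\end{aligned}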
Article
Identifying scholars with potentials early in their careers is critical for informed evaluations, effective allocation of funding, and tenure decisions, which in turn propel advancements in science and technology. This paper investigates the impact of social capital features on the identification of such scholars. Utilizing a comprehensive dataset spanning from 1991 to 2020, extracted from the Microsoft Academic Knowledge Graph, we analyze the novelty values of 56,568 scholars’ future publications using disruption index. We identify potential scholars as those within the top 1% based on these values. Our approach involves extracting nine key features of structural, relational, and cognitive capital from the dynamic co-authorship networks of these scholars during their early career stages. The influence of these features on scholar identification is assessed through ablation experiments using an LSTM-based predictive model. Our findings underscore the critical importance of cognitive capital features in the identification process. Furthermore, the integration of structural and relational capital features markedly enhances the model’s predictive accuracy, achieving significant improvements in precision metrics. Notably, relational capital features demonstrate a greater influence than structural features in predicting scholar potentials. These results provide essential insights and practical implications for strategies aimed at recognizing and fostering outstanding academic talent.
... Keyword emergence analysis is a technique in text mining that identifies keywords exhibiting a significant increase in frequency within texts, thereby revealing dynamic shifts, trends over time, and new focal points within a research area [85,86]. In this article, the Kleinberg algorithm is employed for the analysis of keyword emergence. ...
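The Kleinberg algorithm mentioned above is not reproduced here. As a rough, explicitly simplified stand-in for the same idea, the sketch below flags "burst" years in which a keyword's frequency jumps well above its recent baseline; the window and threshold are arbitrary choices.

# Minimal stand-in for burst detection (NOT Kleinberg's algorithm): flag years
# where a keyword's frequency exceeds its trailing mean by k standard deviations.
import statistics

def burst_years(freq_by_year, window=3, k=2.0):
    years = sorted(freq_by_year)
    bursts = []
    for i in range(window, len(years)):
        history = [freq_by_year[y] for y in years[i - window:i]]
        mean = statistics.mean(history)
        sd = statistics.pstdev(history) or 1.0   # avoid division by zero
        if (freq_by_year[years[i]] - mean) / sd >= k:
            bursts.append(years[i])
    return bursts

print(burst_years({2018: 2, 2019: 3, 2020: 2, 2021: 12, 2022: 4}))  # -> [2021]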
Article
Full-text available
Urban historical heritage areas serve as vital repositories of urban culture and history, playing a crucial role in cultural inheritance and the promotion of urban development. The protection and development of these heritage areas are essential for preserving the cultural characteristics and architectural styles of cities. Despite the growing body of research, a comprehensive review of the dynamic evolution, research frontiers, and future trajectories in this field remains absent. To bridge this gap, this study draws on the Web of Science Core Collection database, selecting 828 papers published between 2000 and 2024 that focus on urban historical heritage conservation and development. By employing Python programming and network analysis tools, this study conducted a systematic analysis of research structures and trends over the past 25 years. The results indicate that countries such as China and Italy, along with their respective research institutions, are at the forefront of global research in this area. Furthermore, this study identified research hotspots, including historic districts, sustainable urban development, urban regeneration, risk assessment, 3D modeling, digital documentation, and cultural tourism. This research not only discusses the challenges faced in the field but also explores future development trends, providing new theoretical perspectives and practical guidance for subsequent studies.
... A2: numerous scholars have delved into the concepts expressed by keywords by associating them with trend topics (Lu et al., 2021;Huang et al., 2022). Often, in fact, one of the most frequently used methods of generating trend topics is precisely through keyword frequency analysis (Calof et al., 2022). ...
Article
Full-text available
This study aims to conduct a Structured Literature Review (SLR) on Non-Performing Loans (NPLs), defined as distressed credits or deteriorated loans, to explore their historical developments and prospects. NPLs played a prominent role in the global financial landscape after the 2007 economic crisis and, nowadays, their volume is managed thanks to regulatory intervention. However, academic research on this topic is limited and sparse, particularly in relation to market volume and price trends, as well as emerging management strategies and the sustainability perspective. Therefore, our objective is to fill this gap and observe how the academic literature has responded to the development of this instrument. The SLR, and its associated bibliometric analysis, conducted using the Biblioshiny package available in RStudio, were performed on a sample of 1,236 academic documents (Articles, Book Chapters and Conference Papers) available on Scopus and published from 2010 to 2023. The sample, selected through a rigorous and validated screening procedure, was then studied across variables defined in the Analytical Framework: Topics, Research Methods, Research Area and Geographical Area. Based on this analysis, as explicitly stated in the findings section, we observe specific trending topics and geographical areas of study related to NPLs. This study not only helps fill a significant gap in the academic literature concerning NPLs, but also provides important implications for financial practice and economic policy conducted by professionals, aiding in a better understanding of how to address and manage NPLs in various economic and geographical contexts. The originality of this research lies in its structured approach and use of bibliometric analysis to examine a wide range of academic publications over an extended period, serving as a potential base for further insights and future studies.
... The option provided by Scopus to limit the selected papers by their indexed keywords was used to reduce the number of papers even further. Scopus indexed keywords are generated from author keywords (Lu et al. 2021), which are a list of topic-specific words chosen by authors that gives information about the topics under investigation (Sun and Teichert 2022; Zhang et al. 2015), and summarises and characterises the content of the scientific publication (Kwon, 2018). Keywords contain important information for both human indexing and automatic indexing systems to organise information more effectively (Fadlalla and Amani 2015). ...
Article
Full-text available
Climate change is one of the biggest challenges facing the world today, threatening societies and the future of the planet. The impacts of climate change are more severe in poor and marginalised populations like Indigenous communities, where people rely heavily on their Indigenous Knowledge (IK) to adapt to the changing environment. Climate change adaptation and resilience are critical for the survival of Indigenous communities under the threat of climate change. This systematic literature review seeks to understand how IK contributes to climate change adaptation and resilience. A total of 71 papers from Scopus were analysed using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) method. It investigated three research questions: (i) How is IK understood in climate change studies? (ii) What kind of IK is used to address climate change and enhance adaptation and resilience? and finally, (iii) What could be done to maximise the use of IK towards enhancing climate adaptation and resilience? The study found that Indigenous people use IK to predict extreme climatic conditions, prepare for them, and live through them, making use of Indigenous adaptation strategies in multiple manifestations. The solutions for maximising the benefits of IK centre on two dominant themes: the need for more research on IK and climate change with diverse focus areas, and the need to bridge IK with scientific knowledge. This review provides a starting point for such research that will draw upon IK to enhance climate adaptation and resilience towards meaningful sustainable development.
... • Novelty: We introduce the concept of potential development periods to describe the novelty of topics inspired by Lu et al. [31], which is defined as the inverse of the potential development period. The novelty reflects the life cycle of topics. ...
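Written out, a plausible reading of the definition in this excerpt (inferred from the sentence above, not quoted from the paper; PDP denotes the potential development period of topic z):

\mathrm{Novelty}(z) = \frac{1}{\mathrm{PDP}(z)}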
Article
Full-text available
With the widespread use of online social media, Public Opinion Events (POEs) quickly propagate on the Internet, generating a vast amount of textual data centered around various discussed topics. The development of POEs is closely linked to the evolution of these topics. However, in the development of POEs, the key challenges lie in estimating the duration of different topics, dealing with their dynamic nature, and quantifying topic evolution to predict the number of topics in the future. In this paper, we propose an Unsupervised Spatio-Temporal Graph Attention approach (USTGAT-TT) to tackle these challenges. First, we introduce a topic evolution periods generation method without human intervention. Initially, POE data undergoes pre-processing to establish initial periods and extract keywords. According to the persistence and hotness of keywords, new periods are reconstructed and keywords are clustered by their similarity to form topics. Then we analyze three pieces of knowledge to further learn the evolution of topics: macro properties, micro properties, and dynamic topic network graphs built from topic co-occurrence relationships. Finally, we design a Spatio-Temporal Graph Attention topic trend prediction model (STGAT-TT) by taking the mutual effect of topics and temporal dependencies into account. At the same time, an attention mechanism and an averaging method are employed to obtain the contribution of topics, and Long Short-Term Memory (LSTM) is used to predict the number of topics in the next period to study the state of POEs. Experiments on five POEs show the effectiveness of the proposed approach. It can estimate the duration of topics to form periods and quantify their features to learn evolution and predict the number of topics in the next period.
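One step in the pipeline described above is clustering keywords by similarity to form topics. The paper's actual similarity measure and clustering method are not specified here, so the following is only an assumed instantiation using TF-IDF character n-gram vectors and agglomerative clustering from scikit-learn.

# Minimal sketch (assumed instantiation): cluster keywords into topics by
# cosine similarity of TF-IDF character n-gram vectors.
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

keywords = ["covid outbreak", "covid-19 outbreak", "vaccine policy",
            "vaccination policy", "school closure"]
vecs = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 4)).fit_transform(keywords)
labels = AgglomerativeClustering(n_clusters=3, metric="cosine",
                                 linkage="average").fit_predict(vecs.toarray())
print(dict(zip(keywords, labels)))  # near-duplicate keywords land in the same cluster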
... Thus, studies examining the impact of CSR and sustainability on various areas, including responsible consumption, production, human resources, supply chain management, industry, innovation, infrastructure, national culture, decision-making, the circular economy, affordable and clean energy, and more, demonstrate that sustainable development and CSR are emerging as global trends [33][34][35][36][37][38][39][40][41]. Identifying these research trends allows researchers to recognise and analyse research topics promptly, thus facilitating the advancement of knowledge in these fields [42]. The relationship between sustainability and SM has evolved from initial concepts to a sophisticated framework for managing business organisations, guiding them to integrate sustainability into their corporate, competitive, and functional strategies [43]. ...
Preprint
Full-text available
This article aims to identify the relationship between sustainability and strategic management to determine whether sustainability can be considered a strategic management research fashion. This involves a bibliometric analysis of recent academic literature from 2021 to 2023 to identify the latest academic research, key trends, collaboration and keyword networks within this relationship. The analysis was conducted using two datasets from the Scopus database. These datasets focus on English-language journal articles on business, management and accounting. The first covers academic research on strategic management, while the second expands to sustainability and sustainable development. The results show that strategic management research focusing on sustainability has recently grown faster (24.70%) than the whole strategic management research area (14.30%). Furthermore, the geographical analysis of co-authorship identified articles from 88 countries, suggesting a broad interest in this relationship. Notably, the strategic management network mapping revealed a unique, sustainable development, corporate social responsibility and sustainability cluster. Moreover, extended mapping revealed four clusters covering crisis management, strategic and creative sustainable development, operational and regulatory sustainability, sustainable supply chains, and resource management. The results thus confirm the rapid growth and widespread coverage of research on sustainability and strategic management, highlighting sustainability as a strategic management research fashion.
... Collaborations between influential global players, such as the USA and China, are important for driving technological advances and facilitating the emergence of innovative protein platforms (Bassoli et al. 2023). The article keywords, considering a minimum of one occurrence, are shown in Fig. 4. The analysis of keywords is a useful way to find trends in emerging topics and identify critical points that may be of interest for the purposes of research, development, and innovation (Lu et al. 2021). Considering the last decade, a total of 317 keywords related to protein extraction from biomass were found. ...
... Keyword searching techniques are commonly used in various search engines, databases, and journal sites. In this section, the author's keywords from previous studies are analyzed to determine research trends (Lu et al., 2021). To ensure accuracy, similar American and British spellings, as well as singular and plural keywords, are combined along with their abbreviations. ...
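The excerpt above merges British/American spellings, singular/plural forms, and abbreviations before counting keywords. A minimal sketch of that normalisation step follows; the merge table is illustrative only, not taken from the cited study.

# Minimal sketch (illustrative mapping): normalise author keywords before counting.
CANONICAL = {               # assumed merge table: variant -> canonical form
    "colour": "color",
    "labour platforms": "labor platforms",
    "gig workers": "gig worker",
    "ai": "artificial intelligence",
}

def normalise(keyword: str) -> str:
    kw = keyword.strip().lower()
    kw = CANONICAL.get(kw, kw)
    # fold trailing-"s" plurals of canonical forms not covered by the table
    return kw[:-1] if kw.endswith("s") and kw[:-1] in CANONICAL.values() else kw

print([normalise(k) for k in ["Colour", "AI", "gig workers", "Gig economy"]])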
Article
The gig economy in India is still at an infant stage, with approximately seven million workers participating in it. The urban and ruban (Rural-UrBan Region) population is more aware of the concept, while the rural population is yet to embrace it. The gig economy paradigm has gained traction in recent years and is developing with a defined perspective on, and impact on, the problem of unemployment in the country. This demands a scientometric approach to understand this area and its contemporary situation. A scientometric review of the 'Gig Economy' in the 'Indian context' was performed over a data set of 60 documents published from 2017 to 2023, drawn from the Scopus scientific indexing database. The research work investigates lead authors' country affiliations, the most published authors, leading research journals, and top trending topics. This research finds that India is seeing a paradigm shift in the gig economy with the advent of the COVID-19 pandemic, which has redefined the temporary work made possible by digital platforms. Work dynamics are altered by digital labour platforms, which serve as middlemen, and the trend indicates that India's gig economy will change with the effective use of internetwork platforms. Globally, Indian authors contributed the most scholarly research.
... Our research revealed the main themes that were present in relation to COVID-19 and international migration in the first years of the pandemic. The ex-post analysis allowed understanding what was relevant in the field (Lu et al. 2021). Further research is needed for ex-ante identification of future trends in this area. ...
Article
Full-text available
Refugees increasingly become part of the European societies. Afghans, Syrians, Ukrainians fled their countries due to war, conflicts, persecution, and settled, temporary or not, in more stable countries. During pandemics, with openness towards foreigners shrinking, and borders closing, the situation of refugees might become uncertain. Our scoping review explores what academics considered relevant about refugees to Europe in relation to the COVID-19 pandemic. The findings reveal increasing resentment against immigrants, a need for redesigning European migration policies, preparing welfare systems, asylum protection mechanisms, and societies as a whole, in order to prevent disruptions in the eventuality of large-scale crises. Such implications are to also be considered for the consequences of the current Russian aggression on Ukraine.
... We determined underresearched themes by identifying the least frequent IAASB factors and infrequent keywords (repeated two times or less). Rare keywords indicate less-studied research themes that may represent future research opportunities (Lu et al., 2021). Studying such factors would add value to audit quality literature. ...
Conference Paper
Full-text available
Purpose: The objective of this study is to understand the knowledge structure of the audit quality research discipline and to compare perspectives of audit academics and practitioners of audit quality. The academic perspective is represented by audit quality literature from 1981 to 2021, whereas the 2014 IAASB framework represents the practitioner's perspective. Design/methodology/approach: The keyword frequency analysis was used to analyze 2,110 articles published in 332 journals indexed in the Scopus database. Findings: Audit academics and practitioners perceive audit quality differently. Academic literature overemphasized some themes and overlooked others. A great academic attention was directed to input and contextual factors. Some contextual factors received attention in prior literature, while others were overlooked. Implications: The results of this study inform academics of areas that represent potential future research opportunities. Evidence-based findings can help auditors, standard-setters, and regulators review and improve auditing standards and best practices to meet the needs of financial reporting stakeholders. Originality/value: The study contributes to audit quality literature, as it shows points of commonalities and divergence between audit academics and practitioners. It extends Simnett et al.'s (2016) study through providing a more global perspective and covering research of various methods from a wide range of peer-reviewed journals.
... As a result of the research carried out, a co-authorship network was created. In their study detecting research topic trends by author-defined keyword frequency (ADKF), Lu et al. (2021) used four common supervised machine learning approaches: linear regression (LR), k-nearest neighbor (KNN), eXtreme Gradient Boosting (XGBoost), and random forest (RF). However, these are more standard approaches to bibliometric analysis compared to the GAT, in which the model is structure-aware. ...
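For context, here is a compact sketch of the four baseline regressors named in this excerpt, trained on synthetic stand-in features with scikit-learn and xgboost; the data and hyperparameters are placeholders, not the settings used in the cited studies.

# Minimal sketch (synthetic data): the four baselines named above, compared by MSE.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 4))                      # stand-in for four keyword feature columns
y = X @ np.array([3.0, 1.0, 2.0, 0.5]) + rng.normal(0, 0.1, 200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {"LR": LinearRegression(), "KNN": KNeighborsRegressor(),
          "XGBoost": XGBRegressor(n_estimators=100), "RF": RandomForestRegressor()}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    print(name, round(mean_squared_error(y_te, m.predict(X_te)), 4))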
Article
Full-text available
Inventory control is one of the key areas of research in logistics. Using the SCOPUS database, we have processed 9,829 articles on inventory control using a triangulation of statistical methods and machine learning. We have proven the usefulness of the proposed statistical method and Graph Attention Network (GAT) architecture for determining trend-setting keywords in inventory control research. We have demonstrated the changes in the research conducted between 1950 and 2021 by presenting the evolution of keywords in articles. A novelty of our research is the applied approach to bibliometric analysis using unsupervised deep learning. It allows us to identify the keywords that determined the high citation rate of an article. The theoretical framework for the intellectual structure of research proposed in the studies on inventory control is general and can be applied to any area of knowledge.
Article
Full-text available
Objective: The aim of this study is to assign keywords, using their Turkish abstracts, to peer-reviewed articles published on the website of the journal Türk Kütüphaneciliği that lack author-assigned keywords. In this way, it aims to provide more effective access to works in the web archive that are difficult to retrieve due to missing keywords. Method: The study examined 58 peer-reviewed articles published without keywords between 1995 and 1999. The YAKE algorithm, taking the Turkish abstracts as input, was used for keyword assignment, while the Zemberek natural language processing tool and the Python programming language were used to process the texts. The meaningfulness of the assigned keywords was measured with the Meaningfulness Check Ratio (AKO) and Mean Absolute Error (OMH) values. The contextual validity of the assigned keywords was measured with the AC1 coefficient, which quantifies inter-rater agreement between three expert evaluators and the keywords assigned by the algorithm. Findings: The most frequently assigned keywords were "kütüphane" (library), "bilgi" (information), "hizmet" (service), and "makale" (article). The algorithm's mean absolute error was 0.099, indicating that it assigns keywords with high accuracy. However, a low level of agreement was found between the expert evaluators and the keywords assigned by the algorithm. Conclusion: The study shows that keyword extraction from Turkish abstracts is an important method for improving access to digital documents. For algorithms to perform better, they need to be trained on expert-curated datasets. In addition, the use of structured abstracts and longer abstract texts is recommended. Originality: This study offers an original approach to improving access to digital documents by extracting keywords from the Turkish abstracts of scientific articles. It is one of the first studies aimed at increasing web access to digitized articles in the field of Library and Information Science.
Article
Topic analysis aims to study topic evolution and trends in order to help researchers understand the process of knowledge evolution and creation. This paper develops a novel topic evolution analysis framework, which we use to demonstrate, forecast, and explain topic evolution from the perspective of the geometrical motion of topic embeddings generated by pretrained language models. Our dataset comprises approximately 15 million papers in the computer science field, with 7,000 “fields of study” to represent the topics. First, we demonstrated that over 80% of topics had undergone obvious motion in the semantic vector space, based on the hyperplane and its normal vector generated by a support vector machine. Subsequently, we verified the predictability of the motion based on three vector regression models by predicting topic embeddings. Finally, we employed a decoder to explain the predicted motion, whose forecast embeddings can capture about 50% of unseen topics. Our research framework shows that topic evolution can be analyzed via the geometrical motion of topic embeddings, and the semantic motion of old topics nurtures new topics. The current study opens new research pathways in topic analysis and sheds light on the topic evolution mechanism from a novel geometric perspective. Peer Review https://www.webofscience.com/api/gateway/wos/peer-review/10.1162/qss_a_00344
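The hyperplane-and-normal-vector idea above can be illustrated with a small sketch: fit a linear SVM that separates a topic's embeddings from two periods and read the hyperplane's normal vector as the direction of semantic motion. This is a hedged illustration assuming scikit-learn and synthetic embeddings, not the authors' pipeline or pretrained-language-model vectors.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical topic embeddings from two time periods (synthetic, 8-dimensional).
rng = np.random.default_rng(0)
early = rng.normal(loc=0.0, scale=1.0, size=(100, 8))
late = rng.normal(loc=0.5, scale=1.0, size=(100, 8))

X = np.vstack([early, late])
y = np.array([0] * len(early) + [1] * len(late))

clf = LinearSVC(C=1.0, max_iter=10000).fit(X, y)
normal = clf.coef_[0]                         # normal vector of the separating hyperplane
direction = normal / np.linalg.norm(normal)   # direction of semantic motion in the vector space
print(direction)
```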
Article
Scientific knowledge evolution is an important signal for the innovative development of science and technology. As we know, new concepts and ideas are frequently born out of extensive recombination of existing concepts or notions. The evolution of a single knowledge unit or concept can be transformed into the formation of its ego-centered network from the perspective of combinatorial innovation. Specifically, we propose eight research hypotheses from three aspects, namely, preferential attachment, transitivity, and homophily mechanisms. The 10,462 egocentric networks of scientific knowledge were extracted from a knowledge co-occurrence network (KCN), and Exponential Random Graph Models (ERGMs) were applied to model these sample networks individually, taking into account the influence of endogenous network structure and exogenous knowledge attribute variables. By conducting large-scale analytics on the fitting results, we found that (1) degree centrality has a positive effect on knowledge evolution in 99.9% of the sample networks, while the clustering coefficient contributes to knowledge evolution in 56.8% of the sample networks at the 0.05 significance level; (2) the adoption behavior and domain impact of authors positively influence scientific knowledge evolution in 93.5% and 80.8% of the sample networks, respectively; and (3) the knowledge type as well as the journal rank has an impact on knowledge network evolution, demonstrating the homophily mechanism during the evolution of scientific knowledge.
Article
Full-text available
This article aims to identify the relationship between sustainability and strategic management to determine whether sustainability can be considered a strategic management research fashion. This involves a bibliometric analysis of recent academic literature from 2021 to 2023 to identify the latest academic research, key trends, collaboration and keyword networks within this relationship. The analysis was conducted using two datasets from the Scopus database. These datasets focus on English-language journal articles on business, management and accounting. The first covers academic research on strategic management, while the second expands to sustainability and sustainable development. The results show that strategic management research focusing on sustainability has recently grown faster (24.70%) and with higher funding frequency (22.4%) than the whole strategic management research field (14.30% and 17.5%, respectively). Furthermore, the geographical analysis of co-authorship identified articles from 88 countries, suggesting a broad interest in this relationship. Notably, the strategic management network mapping revealed a unique, sustainable development, corporate social responsibility, and sustainability cluster. Moreover, extended mapping revealed four clusters: strategic and innovation-driven sustainability, operational corporate sustainability, crisis management and environmental economics, and sustainable supply chain and resource management. The thematic analysis further highlights well-developed sustainability and strategic management research topics like digitalisation, circular economy, sustainable supply chain management, sustainable development goals, industry 4.0, COVID-19, environmental sustainability, etc. that are contributing to the progress of sustainability and strategic management research. The results thus confirm the rapid growth and widespread coverage of research on sustainability and strategic management, highlighting sustainability as a strategic management research fashion.
Article
This study was conducted to examine disaster-related research published in the nursing field between January 2012 and July 2023. The research data were obtained from the Web of Science (WoS) database. Disaster-related nursing publications were searched with the keywords "disaster", "catastrophe", "calamity", "stunner", "cataclysm", "nursing", and "nursing care". Bibliometric analyses were performed using the VOSviewer program. A total of 270 studies were included. The analysis showed that most of the studies in the field were published in 2021. The most productive author was Abbas Ebadi, and the most cited author was De Los Santos. The most productive country was the USA, and the most cited country was the Philippines. Turkey ranked tenth in productivity and twelfth in citations. The most productive institution was the Karolinska Institute, and the most cited institution was Sultan Qaboos University. The most productive and most cited journal on the subject was the Journal of Nursing Management, and the work with the most bibliographic coupling was Labrague (2020). The most frequently used keywords in the field were COVID-19, nurses, nursing, pandemic, disaster, disasters, disaster nursing, qualitative research, mental health, and emergency preparedness. The study shows that disaster-related research in nursing continues to grow. The results are expected to help assess the current state of disaster-related nursing research and to guide future studies.
Article
The purpose of the article is to analyze global trends in the practical use of artificial intelligence algorithms in library science in 2019–2023, establish the state of practical use of AI algorithms in the libraries of leading countries, and identify problems and prospects for implementing foreign experience in the practice of Ukrainian libraries. The methodology of the research includes content analysis, literature review, and systematization. The 20% most influential (by the CiteScore metric in Scopus) scientific journals in library and information science in 2019–2023 were selected. Then 100 articles related to artificial intelligence were filtered. Only those articles that present practical results were used for this study. The results. The analysis of the articles identified the main research topics of artificial intelligence in library science: the application of artificial intelligence in digital linguistics (20%), scientometrics and altmetrics (45.7%), integration with Big Data to ensure data quality (5.7%), research on historical and cultural heritage (11.4%), and integration of AI technologies into library production (17.1%). The results of the research clarify the state of development of AI problems in foreign library science and identify methodologies for integrating AI technologies into modern library production. The scientific novelty of the article is explained by the absence of comprehensive Ukrainian studies on the international experience of implementing AI in library activities, which emphasizes the need for such research. The practical significance. Examples of the practical implementation of AI algorithms are valuable because studying the approaches, analyzing the mistakes, and drawing on the conclusions of experienced scientists will improve models of AI application in the work of Ukrainian archives, libraries, and other document-communication institutions.
Article
With the exponential growth of the volume of scientific literature, it is particularly important to grasp the research frontier. Predicting emerging research topics will help research institutions and scholars promptly discover promising research topics. However, previous studies mainly focused on identifying and detecting emerging research topics and lacked a method to efficiently represent and predict the emerging degree of research topics. Therefore, this study proposes a novel deep learning-based method to predict the emerging degree of research topics. First, a new indicator, the emerging index, is proposed based on the emerging attributes such as novelty, growth, and impact to quantitatively measure the emerging degree of research topics. Second, new features reflecting the emerging attributes of the research topics are extracted by constructing heterogeneous networks of bibliographic entities in the research domain. Finally, a deep learning-based time series model was employed to predict the future emerging index based on these new features. Data from the neoplasms and metabolism research domains in the PubMed Central database were used to validate the proposed method. The experimental results showed that the emerging index proposed effectively measures the emerging degree of the research topics. Furthermore, the deep learning-based model demonstrates superior performance to other models in predicting the emerging index, as evidenced by both error-based and rank-based metrics.
Article
Scientific literature records the research progress of science and technology. Research topics of technologies are evolving in scientific literature. The temporal distribution of research topic keywords in the literature can reflect the evolving stages of a research topic over time. A research topic can be in different evolving stages with different evolving distributions. Previous work mainly focused on visualizing the temporal distribution of keyword weights to illustrate the developing history and trend of a research topic in a literature collection. Quantitatively measuring the evolving stage of a research topic keyword against a baseline distribution can help to detect topic evolving stages in a large scientific literature corpus in an automatic way. How to build a quantitative baseline and how to quantitatively compare the topic temporal distribution with the baseline distribution are two challenges. In this paper, an explicit function of the research heat curve is obtained by constructing a differential equation system of evolving research population groups within a research community on a research topic represented by a topic keyword. Six segments of the heat curve are obtained from the zero points of the derivatives of the heat curve, which together with the full heat curve are used as the quantitative baselines for measuring the temporal distribution of a research topic in different evolving stages. The temporal distribution of a research topic keyword in a scientific literature collection is obtained from the TF-IDF features of the literature collection. A curve shape matching algorithm is designed to match the temporal distribution curve with each baseline segment of the heat curve function, obtaining a distance that measures the shape similarity between the baseline segment curve and the temporal distribution curve. The segment with the smallest distance is used as a quantitative indicator of the evolving stage of the research topic. Experiments on the produced distributions and the real distributions confirm the effectiveness of the heat curve matching method for measuring evolving stages from the temporal distribution of topics.
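As a rough illustration of the curve-shape-matching step described above, the sketch below compares the shape of a keyword's yearly weight curve with a set of baseline segments using resampling and z-normalization; it is a simplified stand-in under stated assumptions (NumPy, Euclidean shape distance), not the paper's heat-curve construction or its actual matching algorithm.

```python
import numpy as np

def shape_distance(curve, baseline):
    """Distance between the shape of a keyword's yearly weight curve and a baseline segment."""
    def znorm(x):
        x = np.asarray(x, dtype=float)
        return (x - x.mean()) / (x.std() + 1e-9)
    # Resample the baseline to the curve's length so only shape is compared.
    t = np.linspace(0, 1, len(curve))
    tb = np.linspace(0, 1, len(baseline))
    baseline_resampled = np.interp(t, tb, baseline)
    return np.linalg.norm(znorm(curve) - znorm(baseline_resampled))

def evolving_stage(curve, baseline_segments):
    """Return the index of the baseline segment whose shape is closest to the curve."""
    distances = [shape_distance(curve, segment) for segment in baseline_segments]
    return int(np.argmin(distances))

# Toy example: a growing keyword curve matched against three hypothetical segments.
curve = [1, 2, 4, 7, 11, 16]
segments = [[5, 4, 3, 2, 1], [1, 2, 4, 8, 16], [3, 3, 3, 3, 3]]
print(evolving_stage(curve, segments))   # expected: 1 (the growing segment)
```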
Article
Background: The COVID-19 pandemic has triggered a significant increase in academic research in the realm of social sciences. As such, there is an increasing need for the scientific community to adopt effective and efficient methods to examine the potential role and contribution of social sciences in the fight against COVID-19. Objectives: This study aims to identify the key topics and explore publishing trends in social science research pertaining to COVID-19 via automated literature analysis. Methods: The automated literature analysis employed utilizes keyword analysis and topic modelling technique, specifically Latent Dirichlet Allocation, to highlight the most relevant research terms, overarching research themes and research trends within the realm of social science research on COVID-19. Results: The focus of research and topics were derived from 9733 full-text academic papers. The bulk of social science research on COVID-19 centres on the following themes: 'Clinical Treatment', 'Epidemic Crisis', 'Mental Influence', 'Impact on Students', 'Lockdown Influence' and 'Impact on Children'. Conclusion: This study adds to our understanding of key topics in social science research on COVID-19. The automated literature analysis presented is particularly useful for librarians and information specialists keen to explore the role and contributions of social science topics in the context of pandemics.
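To make the Latent Dirichlet Allocation step concrete, here is a minimal, hedged sketch assuming scikit-learn; the toy documents, number of topics, and preprocessing are illustrative and not the study's corpus or configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Tiny illustrative corpus (placeholder sentences, not the 9733 full-text papers).
docs = [
    "lockdown impact on students and online learning",
    "clinical treatment and vaccination of patients",
    "mental health effects of prolonged social isolation",
    "economic crisis and policy response during the epidemic",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_terms = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}: {top_terms}")
```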
Article
Full-text available
With the wide application of keyphrases in many Information Retrieval (IR) and Natural Language Processing (NLP) tasks, automatic keyphrase prediction has been emerging. However, these statistically important phrases are contributing increasingly less to the related tasks because the end‐to‐end learning mechanism enables models to learn the important semantic information of the text directly. Similarly, keyphrases are of little help for readers to quickly grasp the paper's main idea because the relationship between the keyphrase and the paper is not explicit to readers. Therefore, we propose to generate keyphrases with specific functions for readers to bridge the semantic gap between them and the information producers, and verify the effectiveness of the keyphrase function for assisting users’ comprehension with a user experiment. A controllable keyphrase generation framework (the CKPG) that uses the keyphrase function as a control code to generate categorized keyphrases is proposed and implemented based on Transformer, BART, and T5, respectively. For the Computer Science domain, the Macro‐avgs of P@5, R@5, and F1@5 on the Paper with Code dataset are up to 0.680, 0.535, and 0.558, respectively. Our experimental results indicate the effectiveness of the CKPG models.
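The control-code idea can be sketched, under assumptions, with an off-the-shelf sequence-to-sequence model: prepend a category token to the source text so a fine-tuned model conditions its keyphrase output on it. The snippet assumes the Hugging Face transformers library and the public facebook/bart-base checkpoint; the <method> control token and the source text are hypothetical, and a base model that has not been fine-tuned for keyphrase generation will not produce CKPG-quality output.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Hypothetical control code steering which keyphrase category should be generated.
control_code = "<method>"
source = control_code + " We propose a transformer-based model for keyphrase generation in scholarly text."

inputs = tokenizer(source, return_tensors="pt", truncation=True)
outputs = model.generate(**inputs, max_length=16, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```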
Article
Full-text available
Mapping the knowledge structure from word co-occurrences in a collection of academic papers has been widely used to provide insight into the topic evolution in an arbitrary research field. In a traditional approach, the paper collection is first divided into temporal subsets, and then a co-word network is independently depicted in a 2D map to characterize each period’s trend. To effectively map emerging research trends from such a time-series of co-word networks, this paper presents TrendNets, a novel visualization methodology that highlights the rapid changes in edge weights over time. Specifically, we formulated a new convex optimization framework that decomposes the matrix constructed from dynamic co-word networks into a smooth part and a sparse part: the former represents stationary research topics, while the latter corresponds to bursty research topics. Simulation results on synthetic data demonstrated that our matrix decomposition approach achieved the best burst detection performance over four baseline methods. In experiments conducted using papers published in the past 16 years at three conferences in different fields, we showed the effectiveness of TrendNets compared to the traditional co-word representation. We have made our codes available on the Web to encourage scientific mapping in all research fields.
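The smooth-plus-sparse decomposition can be illustrated with a deliberately simplified heuristic: alternate between low-pass filtering the keyword-by-time matrix (stationary topics) and soft-thresholding the residual (bursty topics). This NumPy sketch is an assumption-laden stand-in for intuition only; TrendNets itself solves a convex optimization problem rather than this alternating scheme.

```python
import numpy as np

def smooth_sparse_split(M, window=3, lam=1.0, n_iter=20):
    """Split a keywords x time matrix M into a smooth part S and a sparse part E.

    Simplified alternating heuristic (moving-average smoothing plus soft-thresholding),
    not the convex program used by TrendNets.
    """
    E = np.zeros_like(M, dtype=float)
    kernel = np.ones(window) / window
    for _ in range(n_iter):
        # Smooth part: low-pass filter the residual M - E along the time axis.
        S = np.apply_along_axis(
            lambda r: np.convolve(r, kernel, mode="same"), 1, M - E)
        # Sparse part: soft-threshold what the smooth part cannot explain (bursts).
        R = M - S
        E = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)
    return S, E

M = np.array([[1, 1, 1, 8, 1, 1],      # one bursty keyword
              [2, 2, 3, 3, 4, 4]], dtype=float)
S, E = smooth_sparse_split(M)
print(np.round(E, 2))                   # the burst shows up in the sparse part
```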
Article
Full-text available
Despite persistent efforts in understanding the creativity of scientists over different career stages, little is known about the underlying dynamics of research topic switching that drives innovation. Here, we analyze the publication records of individual scientists, aiming to quantify their topic switching dynamics and its influence. We find that the co-citing network of a scientist's papers exhibits a clear community structure where each major community represents a research topic. Our analysis suggests that scientists have a narrow distribution of the number of topics. However, researchers nowadays switch between topics more frequently than those in the early days. We also find that a high switching probability early in a career is associated with low overall productivity, yet with high overall productivity later in a career. Interestingly, the average citation count per paper is negatively correlated with the switching probability at all career stages. We propose a model that can explain the main observed features.
Article
Full-text available
Our purpose is to adapt a statistical method for the analysis of discrete numerical series to the keywords appearing in scientific articles of a given area. As an example, we apply our methodological approach to the study of the keywords in the Library and Information Sciences (LIS) area. Our objective is to detect the new author keywords that appear in a fixed knowledge area in the period of 1 year in order to quantify the probabilities of survival for 10 years as a function of the impact of the journals where they appeared. Many of the new keywords appearing in the LIS field are ephemeral. Actually, more than half are never used again. In general, the terms most commonly used in the LIS area come from other areas. The average survival time of these keywords is approximately 3 years, being slightly higher in the case of words that were published in journals classified in the second quartile of the area. We believe that measuring the appearance and disappearance of terms will allow understanding some relevant aspects of the evolution of a discipline, providing in this way a new bibliometric approach.
Article
Full-text available
As interdisciplinary branches of ecology develop rapidly in the 21st century, the contents of ecological research have become more abundant than ever before. Along with the exponential growth in the number of published papers, it is more and more difficult for ecologists to get a clear picture of their discipline. Nevertheless, the era of big data has brought us massive amounts of well-documented historical literature and various data processing techniques, which greatly facilitates bibliometric analysis of ecology. Frequency has long been used as the primary metric in keyword analysis to detect ecological hotspots; however, this method can be somewhat biased. In our study, we suggest a method called PAFit to measure keyword popularity, which considers ecology-related topics in a large temporal dynamical knowledge network, and we find that the popularity of ecological topics follows the "rich get richer" and "fit get richer" mechanisms. The feasibility of network analysis and its superiority over simply using frequency are explored and justified, and PAFit is validated by its outstanding performance in predicting the growth of frequency and degree. In addition, our research encourages ecologists to consider their domain knowledge in a large dynamical network, and to be ready to participate in interdisciplinary collaborations when necessary.
Article
Full-text available
The ability to predict the long-term impact of a scientific article soon after its publication is of great value towards accurate assessment of research performance. In this work we test the hypothesis that good predictions of long-term citation counts can be obtained through a combination of a publication's early citations and the impact factor of the hosting journal. The test is performed on a corpus of 123,128 WoS publications authored by Italian scientists, using linear regression models. The average accuracy of the prediction is good for citation time windows above two years, decreases for lowly-cited publications, and varies across disciplines. As expected, the role of the impact factor in the combination becomes negligible after only two years from publication.
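A minimal sketch of the kind of linear regression described above, assuming scikit-learn; the feature values and citation counts below are made up for illustration and are not the WoS corpus used in the study.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: [citations in the first two years, journal impact factor].
X = np.array([[3, 2.1], [10, 4.5], [0, 1.2], [7, 3.3], [15, 6.0], [2, 2.8]])
y = np.array([12, 55, 3, 30, 90, 10])    # long-term citation counts (made up)

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)
print(model.predict([[5, 3.0]]))          # predicted long-term citations for a new paper
```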
Article
Full-text available
Science foresight comprises a range of methods to analyze past, present and expected research trends, and uses this information to predict the future status of different fields of science and technology. With the ability to identify high-potential development directions, science foresight can be a useful tool to support the management and planning of future research activities. Science foresight analysts can choose from a rather large variety of approaches. There is, however, relatively little information about how the various approaches can be applied in an effective way. This paper describes a three-step methodological framework for science foresight on the basis of published research papers, consisting of (i) life-cycle analysis, (ii) text mining and (iii) knowledge gap identification by means of automated clustering. The three steps are connected using the research methodology of the research papers, as identified by text mining. The potential of combining these three steps in one framework is illustrated by analyzing scientific literature on wind catchers; a natural ventilation concept which has received considerable attention from academia, but with quite low application in practice. The knowledge gaps that are identified show that the automated foresight analysis is indeed able to find uncharted research areas. Results from a sensitivity analysis further show the importance of using full-texts for text mining instead of only title, keywords and abstract. The paper concludes with a reflection on the methodological framework, and gives directions for its intended use in future studies.
Article
Full-text available
Keyword networks, formed from keywords occurring in scholarly articles provide a useful mechanism for understanding academic research trends. In keyword networks, keywords are represented as nodes and a link is formed between a pair of keywords if they appear in the same article. Each link is assigned a weight, representing the number of co-occurrences of the pair in different articles. A statistical and visual analysis of the structural and temporal characteristics of such networks reveals the organizing pattern and the evolution of keywords. In this study we analyse the difference between structured keyword system and unstructured keyword system. We use keywords from two prominent business management journals from USA and India and analyse the corresponding keyword networks. Our results indicate that the network characteristics of structured keyword system are more suitable than unstructured keyword system to analyse research trends and bring forth the emerging areas and popular research methods. The adoption of structured keyword system will aid researchers and funding agencies to optimize their decision on the use of research funding.
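A keyword network of the kind described above can be built in a few lines; the sketch below assumes the networkx library and a toy list of per-article keyword sets (the keywords are illustrative, not taken from the two journals studied).

```python
from itertools import combinations
import networkx as nx

# Toy per-article author keyword lists.
articles = [
    ["supply chain", "machine learning", "forecasting"],
    ["supply chain", "inventory control"],
    ["machine learning", "forecasting", "inventory control"],
]

G = nx.Graph()
for keywords in articles:
    for u, v in combinations(sorted(set(keywords)), 2):
        if G.has_edge(u, v):
            G[u][v]["weight"] += 1     # link weight = number of co-occurrences
        else:
            G.add_edge(u, v, weight=1)

print(list(G.edges(data=True)))
```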
Conference Paper
Full-text available
We assessed all papers published in two key environmental modelling journals in 2008 to determine the degree to which the citation counts of the papers could be predicted without considering the papers' quality. We applied both random forests and generalized additive models to predict citation counts, using a range of easily quantified or categorised characteristics of the papers as covariates. The more highly cited papers were, on average, longer, had longer reference lists, had more authors, were more likely to have been published in Environmental Modelling and Software, and were less likely to include differential or integral equations than papers with lower citation counts. Other types of equations had no effect. Although these factors had significant predictive power regardless of which statistical modelling approach was applied, unknown factors (presumably, research quality and relevance) accounted for the majority of the variability in citation rates. A longer version of this paper, focusing on the random forest model results and considering several additional potential predictive variables, has been submitted for consideration for publication in Environmental Modelling & Software (Robson and Mousquès, submitted).
Article
Full-text available
The importance of a research article is routinely measured by counting how many times it has been cited. However, treating all citations with equal weight ignores the wide variety of functions that citations perform. We want to automatically identify the subset of references in a bibliography that have a central academic influence on the citing paper. For this purpose, we examine the effectiveness of a variety of features for determining the academic influence of a citation. By asking authors to identify the key references in their own work, we created a dataset in which citations were labeled according to their academic influence. Using automatic feature selection with supervised machine learning, we found a model for predicting academic influence that achieves good performance on this dataset using only four features. The best features, among those we evaluated, were features based on the number of times a reference is mentioned in the body of a citing paper. The performance of these features inspired us to design an influence-primed h-index (the hip-index). Unlike the conventional h-index, it weights citations by how many times a reference is mentioned. According to our experiments, the hip-index is a better indicator of researcher performance than the conventional h-index.
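To make the contrast with the conventional h-index concrete, here is a small hedged sketch: a standard h-index plus one possible reading of the influence-primed variant, in which each citing paper contributes its in-text mention count rather than a flat count of one. The exact weighting used by the authors may differ; this is an illustration of the idea, not their definition.

```python
def h_index(citations):
    """Classic h-index: the largest h such that h papers have at least h citations each."""
    cites = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(cites, start=1) if c >= rank)

def hip_index(citation_mentions):
    """Hypothetical influence-primed variant: each paper's citation count is replaced by
    the sum of in-text mention counts over its citing papers, then the h-index rule applies."""
    weighted_counts = [sum(mentions) for mentions in citation_mentions]
    return h_index(weighted_counts)

# Toy data: three papers; each inner list holds mention counts from the citing papers.
mentions_per_paper = [[3, 1, 2], [1, 1], [5]]
print(h_index([len(m) for m in mentions_per_paper]))   # conventional h-index
print(hip_index(mentions_per_paper))                   # mention-weighted variant
```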
Article
Full-text available
We investigate the community structure of physics subfields in the citation network of all Physical Review publications between 1893 and August 2007. We focus on well-cited publications (those receiving more than 100 citations), and apply modularity maximization to uncover major communities that correspond to clearly identifiable subfields of physics. While most of the links between communities connect those with obvious intellectual overlap, there sometimes exist unexpected connections between disparate fields due to the development of a widely applicable theoretical technique or by cross fertilization between theory and experiment. We also examine communities decade by decade and also uncover a small number of significant links between communities that are widely separated in time.
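Modularity maximization of the kind described above is available off the shelf; the sketch below assumes networkx and a toy undirected citation graph rather than the Physical Review data.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy citation graph (treated as undirected): two loosely connected subfields.
G = nx.Graph()
G.add_edges_from([
    ("A1", "A2"), ("A1", "A3"), ("A2", "A3"), ("A3", "A4"),
    ("B1", "B2"), ("B1", "B3"), ("B2", "B3"),
    ("A4", "B1"),   # a single cross-community link
])

communities = greedy_modularity_communities(G)
for i, community in enumerate(communities):
    print(f"community {i}: {sorted(community)}")
```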
Article
Full-text available
Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, backpropagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
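Since the AKFP approach in the main article relies on an LSTM for keyword frequency prediction, a minimal sketch of such a sequence regressor may help; it assumes PyTorch, and the feature count, hidden size, and data shapes are illustrative rather than the configurations used in either paper.

```python
import torch
import torch.nn as nn

class FrequencyLSTM(nn.Module):
    """Minimal LSTM regressor: maps a sequence of yearly feature vectors to a single
    predicted value for the next year (illustrative, not the original configuration)."""
    def __init__(self, n_features, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):              # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])   # use the hidden state of the last time step

model = FrequencyLSTM(n_features=4)
x = torch.randn(8, 10, 4)              # 8 keywords, 10 years, 4 features per year
y_hat = model(x)                       # predictions of shape (8, 1)
print(y_hat.shape)
```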
Article
The rapid development of scientific fields in the modern era has made it harder for prospective scholars to find a proper research field for their future studies. Thus, having a vision of the future could help them pick the right research path and ensure that it is worth investing in. In this study, we use the article keywords of computer science journals and conferences, assigned by INSPEC controlled indexing, to construct a temporal scientific knowledge network. By observing snapshots of the keyword networks over time, we can utilize link prediction methods to foresee the future structures of these networks. We use two different approaches for this link prediction problem. First, we utilize three topology-based link prediction algorithms, two of which are commonly used in the literature. We also propose a third algorithm based on the nodes' (keywords') clustering coefficients, their centrality measures such as eigenvector centrality, and node community information. Then, we use node topological features and the outputs of the aforementioned topology-based link prediction algorithms as features to feed five machine learning link prediction algorithms (SVM, Random Forest Classifier, K-Nearest Neighbors, Gaussian Naïve Bayes, and Multinomial Naïve Bayes). All tested predictors show considerable performance, and their results are discussed in this paper.
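For the topology-based predictors mentioned above, two commonly used scores (Adamic-Adar and the Jaccard coefficient) are available directly in networkx; the sketch below scores all non-edges of a toy keyword co-occurrence graph and is an illustration, not the INSPEC-based network from the study.

```python
import networkx as nx

# Toy keyword co-occurrence network.
G = nx.Graph()
G.add_edges_from([
    ("deep learning", "neural network"),
    ("deep learning", "image classification"),
    ("neural network", "image classification"),
    ("topic model", "lda"),
    ("topic model", "neural network"),
])

# Score candidate (currently missing) links with two classic topology-based predictors.
candidates = list(nx.non_edges(G))
for u, v, score in nx.adamic_adar_index(G, candidates):
    print(f"Adamic-Adar({u}, {v}) = {score:.3f}")
for u, v, score in nx.jaccard_coefficient(G, candidates):
    print(f"Jaccard({u}, {v}) = {score:.3f}")
```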
Article
Real-time processing and learning of conflicting data, especially messages coming from different ideas, locations, and time, in a dynamic environment such as Twitter is a challenging task that recently gained lots of attention. This paper introduces a framework for managing, processing, analyzing, detecting, and tracking topics in streaming data. We propose a model selector procedure with a hybrid indicator to tackle the challenge of online topic detection. In this framework, we built an automatic data processing pipeline with two levels of cleaning. Regular and deep cleaning are applied using multiple sources of meta knowledge to enhance data quality. Deep learning and transfer learning techniques are used to classify health-related tweets, with high accuracy and improved F1-Score. In this system, we used visualization to have a better understanding of trending topics. To demonstrate the validity of this framework, we implemented and applied it to health-related twitter data from users originating in the USA over nine months. The results of this implementation show that this framework was able to detect and track the topics at a level comparable to manual annotation. To better explain the emerging and changing topics in various locations over time the result is graphically displayed on top of the United States map.
Article
Author keywords for scientific literature are terms selected and created by authors. Although most studies have focused on how to use author keywords to represent research interests, little is known about the process by which authors select keywords. To fill this research gap, this study presents a pilot study on author keyword selection behavior. Our empirical results show that the average percentages of author keywords appearing in titles, abstracts, and both titles and abstracts are 31%, 52.1%, and 56.7%, respectively. Meanwhile, we find that keywords also appear in references and among high-frequency keywords. The proportions of author-selected keywords appearing in the references and among high-frequency keywords are 41.6% and 56.1%, respectively. In addition, keywords of papers written by core (productive) authors are found to appear less frequently in the titles and abstracts of their papers than those of other authors, and to appear more frequently in references and high-frequency keywords. The percentages of keywords appearing in the titles and abstracts of scientific papers are negatively correlated with the papers' citation counts. In contrast, the percentages of author keywords appearing among high-frequency keywords are positively associated with citation counts.
Article
This paper proposes the keyword-citation-keyword (KCK) network to analyze the knowledge structure of a discipline. Different from traditional co-word network analysis, the KCK network highlights the importance of keywords assigned in different articles, as well as the semantic relationship between keywords across articles. In this study, we select the computer science domain as an example to illustrate the proposed method. Meanwhile, the results of network analysis, PageRank analysis, and research topic analysis are compared with those of traditional co-word analysis. A total of 110,360 articles with 164,146 unique keywords and 1,615,030 references collected from the ACM digital library were used for this empirical study. The results demonstrate that the KCK network performs better at detecting indirect links between keywords with stronger semantic relationships, identifying important knowledge units, and discovering topics of greater significance. Findings from this study contribute a new perspective and understanding for elucidating discipline knowledge structures, and provide guidance for applying this method in various disciplines.
Article
Predicting the citation counts of academic papers is of considerable significance to scientific evaluation. This study used a four-layer Back Propagation (BP) neural network model to predict the five-year citations of 49,834 papers in the library, information and documentation field indexed by the CSSCI database and published from 2000 to 2013. We extracted six paper features, two journal features, nine author features, eight reference features, and five early citation features to make the prediction. The empirical experiments showed that the performance of the BP neural network is significantly better than those of the six baseline models. In terms of the prediction effect, the accuracy of the model at predicting infrequently cited papers was higher than that for frequently cited ones. We determined that five essential features have significant effects on the prediction performance of the model, i.e., ‘citations in the first two years’, ‘first-cited age’, ‘paper length’, ‘month of publication’, and ‘self-citations of journals’, and the other features contribute only slightly to the prediction.
Article
The number of received citations has been used as an indicator of the impact of academic publications. Developing tools to find papers that have the potential to become highly cited has recently attracted increasing scientific attention. The topics scholars are concerned with may change over time in accordance with research trends, resulting in changes in received citations. Author-defined keywords, titles, and abstracts provide valuable information about a research article. This study applies a latent Dirichlet allocation technique to extract topics and keywords from articles; five keyword popularity (KP) features are defined as indicators of the emerging trends of articles. Binary classification models are utilized to predict papers that were highly cited or less highly cited by a number of supervised learning techniques. We empirically compare the KP features of articles with other commonly used journal-related and author-related features proposed in previous studies. The results show that, with KP features, the prediction models are more effective than those with journal and/or author features, especially in the management information systems discipline.
Article
Emerging research topic detection can benefit research foundations and policy-makers. Given the long-term and recent interest in detecting emerging research topics, various approaches have been proposed in the literature. However, there is still a lack of well-established linkages between a clear conceptual definition of emerging research topics and the indicators proposed for operationalization. This work follows the definition by Wang (2018), and several machine learning models are used together to detect and foresee emerging research topics. Finally, experimental results on a gene editing dataset reveal three emerging research topics, which makes clear that it is feasible to identify emerging research topics with our framework.
Article
Ranking models lie at the heart of research on information retrieval (IR). During the past decades, different techniques have been proposed for constructing ranking models, from traditional heuristic methods, probabilistic methods, to modern machine learning methods. Recently, with the advance of deep learning technology, we have witnessed a growing body of work in applying shallow or deep neural networks to the ranking problem in IR, referred to as neural ranking models in this paper. The power of neural ranking models lies in the ability to learn from the raw text inputs for the ranking problem to avoid many limitations of hand-crafted features. Neural networks have sufficient capacity to model complicated tasks, which is needed to handle the complexity of relevance estimation in ranking. Since there have been a large variety of neural ranking models proposed, we believe it is the right time to summarize the current status, learn from existing methodologies, and gain some insights for future development. In contrast to existing reviews, in this survey, we will take a deep look into the neural ranking models from different dimensions to analyze their underlying assumptions, major design principles, and learning strategies. We compare these models through benchmark tasks to obtain a comprehensive empirical understanding of the existing techniques. We will also discuss what is missing in the current literature and what are the promising and desired future directions.
Article
With the growing number of published scientific papers worldwide, the need for evaluation and quality assessment methods for research papers is increasing. Scientific fields such as scientometrics, informetrics, and bibliometrics establish quantified analysis methods and measurements for evaluating scientific papers. In this area, an important problem is to predict the future influence of a published paper. In particular, early discrimination between influential papers and insignificant papers may have important applications. In this regard, one of the most important metrics is the number of citations to the paper, since this metric is widely utilized in the evaluation of scientific publications and, moreover, serves as the basis for many other metrics such as the h-index. In this paper, we propose a novel method for predicting the long-term citations of a paper based on the number of its citations in the first few years after publication. In order to train a citation count prediction model, we employed an artificial neural network, a powerful machine learning tool with growing applications in many domains, including image and text processing. The empirical experiments show that our proposed method outperforms state-of-the-art methods with respect to prediction accuracy in both yearly and total prediction of the number of citations.
Article
Users of social media websites tend to rapidly spread breaking news and trending stories without considering their truthfulness. This facilitates the spread of rumors through social networks. A rumor is a story or statement for which truthfulness has not been verified. Efficiently detecting and acting upon rumors throughout social networks is of high importance to minimizing their harmful effect. However, detecting them is not a trivial task. They belong to unseen topics or events that are not covered in the training dataset. In this paper, we study the problem of detecting breaking news rumors, instead of long-lasting rumors, that spread in social media. We propose a new approach that jointly learns word embeddings and trains a recurrent neural network with two different objectives to automatically identify rumors. The proposed strategy is simple but effective to mitigate the topic shift issues. Emerging rumors do not have to be false at the time of the detection. They can be deemed later to be true or false. However, most previous studies on rumor detection focus on long-standing rumors and assume that rumors are always false. In contrast, our experiment simulates a cross-topic emerging rumor detection scenario with a real-life rumor dataset. Experimental results suggest that our proposed model outperforms state-of-the-art methods in terms of precision, recall, and F1.
Article
The growing availability of large diachronic corpora of scientific literature offers the opportunity to read the temporal evolution of concepts, methods, and applications, i.e., the history of the disciplines involved in the strand under investigation. After retrieving the most relevant keywords, bag-of-words approaches produce words × time-points contingency tables, i.e., the frequencies of each word in the set of texts grouped by time-points. Through the analysis of word counts over the observed period of time, the main purpose of the study is, after reconstructing the "life-cycle" of words, to cluster words that have similar life-cycles and thus detect prototypical or exemplary temporal patterns. Unveiling such relevant and (through expert opinion) meaningful inner dynamics enables us to trace a historical narrative of the discipline of interest. However, different readings of this history are possible depending on the type of data normalization, which is needed to account for the fluctuating size of texts across time and the general problems of data sparsity and strong asymmetry. This study proposes a methodology consisting of (1) a stepwise information retrieval procedure for keyword selection and (2) a two-stage functional clustering approach for statistical learning. Moreover, a sample of possible normalizations of word frequencies is considered, showing that the different concept of curve similarity induced in clustering by the type of transformation heavily affects the composition and size of the groups. The corpus of titles of scientific papers published in the American Statistical Association journals in the time span 1888-2012 is examined for illustration.
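A toy version of the normalize-then-cluster step described above might look like the following, assuming scikit-learn and NumPy; the contingency table, the row-sum normalization, and the choice of k-means with three clusters are illustrative assumptions, not the functional clustering procedure used in the study.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical words x time-points contingency table (yearly counts for 6 words).
counts = np.array([
    [1, 2, 5, 9, 14, 20],    # steadily growing
    [0, 1, 4, 10, 15, 22],   # steadily growing
    [12, 10, 7, 4, 2, 1],    # declining
    [15, 11, 8, 5, 2, 0],    # declining
    [2, 8, 15, 8, 3, 1],     # burst
    [1, 6, 14, 9, 2, 1],     # burst
], dtype=float)

# Normalize each life-cycle so clustering compares curve shapes rather than volumes.
profiles = counts / counts.sum(axis=1, keepdims=True)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(profiles)
print(labels)   # words with similar life-cycles share a cluster label
```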
Article
Nowadays, within the global economy, organizations of all kinds require strategic information to ensure that their decision making is competitive in uncertain, complex environments. This is a key factor for knowledge-intensive companies insofar as it affects their capacity to anticipate, influence, and collaborate. The specialized literature is beginning to connect two disciplines, strategic intelligence and public relations, from an enterprise management approach, proposing concepts such as "Public Relations Intelligence". This paper aims to explore their research areas using bibliometric analysis to uncover how these disciplines evolve, in order to propose future research fronts relating to technological observatories. First, a systematic review was carried out to identify and systematize the scientific information available between 2006 and 2016 in publications with international impact on common topics. Second, bibliometric analysis based on patterns of co-citation and the co-occurrence of keywords was employed to assess impact and maturity. The main findings suggest that there is already a potential emerging research field between strategic intelligence and public relations, highlighting common topics such as strategy, issue management, and reputation, with the American countries and territories dominating the literature. This work provides evidence of the need to foster the conditions for the consolidation of this field as a topic of research, and it may well represent a valuable opportunity to enhance technological observatories in the networked society.
Article
As network analysis methods prevail, more metrics are applied to co-word networks to reveal hot topics in a field. However, few studies have examined the relationships among these metrics. To bridge this gap, this study explores the relationships among different ranking metrics, including one frequency-based and six network-based metrics, in order to understand the impact of network structural features on ranking themes on co-word networks. We collected bibliographic data from three disciplines from Web of Science (WoS), and generated 40 simulation networks following the preferential attachment assumption. Correlation analysis on the empirical and simulated networks shows strong relationships among the metrics. Their relationships are consistent across disciplines. The metrics can be categorized into three groups according to the strength of their correlations, where Degree Centrality, H-index, and Coreness are in one group, Betweenness Centrality, Clustering Coefficient, and frequency in another, and Weighted PageRank by itself. Regression analysis on the simulation networks reveals that network topology properties, such as connectivity, sparsity, and aggregation, influence the relationships among selected metrics. In addition, when comparing the top keywords ranked by the metrics in the three disciplines, we found the metrics exhibit different discriminative capacity. Coreness and H-index may be better suited for categorizing keywords rather than ranking keywords. Findings from this study contribute to a better understanding of the relationships among different metrics and provide guidance for using them effectively in different contexts.
Article
Patent citation analysis is considered a useful tool for identifying emerging technologies. However, the outcomes of previous methods are likely to reveal no more than current key technologies, since they can only be performed at later stages of technology development due to the time required for patents to be cited (or fail to be cited). This study proposes a machine learning approach to identifying emerging technologies at early stages using multiple patent indicators that can be defined immediately after the relevant patents are issued. For this, first, a total of 18 input and 3 output indicators are extracted from the United States Patent and Trademark Office database. Second, a feed-forward multilayer neural network is employed to capture the complex nonlinear relationships between input and output indicators in a time period of interest. Finally, two quantitative indicators are developed to identify trends of a technology's emergingness over time. Based on this, we also provide the practical guidelines for implementation of the proposed approach. The case of pharmaceutical technology shows that our approach can facilitate responsive technology forecasting and planning.
Article
While logistic sigmoid neurons are more biologically plausible than hyperbolic tangent neurons, the latter work better for training multi-layer neural networks. This paper shows that rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks in spite of the hard non-linearity and non-differentiability at zero, creating sparse representations with true zeros, which seem remarkably suitable for naturally sparse data. Even though they can take advantage of semi-supervised setups with extra unlabelled data, deep rectifier networks can reach their best performance without requiring any unsupervised pre-training on purely supervised tasks with large labelled data sets. Hence, these results can be seen as a new milestone in the attempts at understanding the difficulty in training deep but purely supervised neural networks, and at closing the performance gap between neural networks learnt with and without unsupervised pre-training.
Article
Detecting emerging research topics is essential, not only for research agencies but also for individual researchers. Previous studies have created various bibliographic indicators for the identification of emerging research topics. However, as indicated by Rotolo et al. (2015), the most serious problems are the lack of an acknowledged definition of emergence and incomplete elaboration of the linkages between the definitions that are used and the indicators that are created. With these issues in mind, this study first adjusts the definition of an emerging technology that Rotolo et al. (2015) have proposed in order to accommodate the analysis. Next, a set of criteria for the identification of emerging topics is proposed according to the adjusted definition and attributes of emergence. By the use of two sets of parameter values, several emerging research topics are identified. Finally, evaluation tests are conducted by demonstration of the proposed approach and comparison with previous studies. The strength of the present methodology lies in the fact that it is fully transparent, straightforward, and flexible.
Article
To understand quantitatively how scientists choose and shift their research focus over time is of high importance, because it affects the ways in which scientists are trained, science is funded, knowledge is organized and discovered, and excellence is recognized and rewarded. Despite extensive investigation into various factors that influence a scientist's choice of research topics, quantitative assessments of the mechanisms that give rise to macroscopic patterns characterizing the research-interest evolution of individual scientists remain limited. Here we perform a large-scale analysis of publication records, and we show that changes in research interests follow a reproducible pattern characterized by an exponential distribution. We identify three fundamental features responsible for the observed exponential distribution, which arise from a subtle interplay between exploitation and exploration in research-interest evolution. We developed a random-walk-based model, allowing us to accurately reproduce the empirical observations. This work uncovers and quantitatively analyses macroscopic patterns that govern changes in research interests, thereby showing that there is a high degree of regularity underlying scientific research and individual careers.
Article
The claim that co-citation analysis is a useful tool for mapping subject-matter specialties of scientific research in a given period is examined. A method has been developed using quantitative analysis of content words related to publications in order to: (1) study the coherence of research topics within sets of publications citing clusters, i.e., (part of) the "current work" of a specialty; (2) study differences in research topics between sets of publications citing different clusters; and (3) evaluate the recall of "current work" publications concerning the specialties identified by co-citation analysis. Empirical support is found for the claim that co-citation analysis does indeed identify subject-matter specialties. However, different clusters may identify the same specialty, and the results are far from complete concerning the identified "current work." These results are in accordance with the opinion of some experts in the fields. The low recall of co-citation analysis concerning the "current work" of specialties is shown to be related to the way in which researchers build their work on earlier publications: the "missed" publications equally build on very recent earlier work, but are less "consensual" and/or less "attentive" in their referencing practice. Evaluation of national research performance using co-citation analysis appears to be biased by this "incompleteness."
Article
A number of bibliometric studies have shown that many factors impact citation counts besides the scientific quality. This paper used a large bibliometric dataset to investigate the impact of the different statistical properties of author-selected keywords and the network attributes of their co-occurrence networks on citation counts. Four statistical properties of author-selected keywords were considered: (i) Keyword growth (i.e., the relative increase or decrease in the presence statistics of an underlying keyword over a given period of time); (ii) Keyword diversity (i.e., the level of variety in a set of author-selected keywords); (iii) Number of keywords; and (iv) Percentage of new keywords. This study also considered network centrality which is a network attribute from the keyword co-occurrence network. Network centrality was calculated using the average of three basic network centrality measures: degree, closeness and betweenness centrality. A correlation and regression analysis showed that all of these factors had a significant positive relation with citation counts except the percentage of new keywords that had a significant negative relation. However, when the effect of four potential control variables (i.e., the number of article authors, the length of an article, the quality of the journal in which the article was published and the length of the title of an article) were controlled, only four variables related to author-selected keywords showed a significant relation with citation counts. Keyword growth, number of keywords and network centrality showed a positive relation with citation counts; whereas, the percentage of new keywords showed a negative relation with citation counts. The implications of these findings are discussed in this article.
Article
Quantitative measurements of bibliometrics based on knowledge entities (i.e., keywords) improve competencies in tracking the structure and dynamic development of various scientific domains. Co-word networks (a content analysis technique and type of knowledge network) are often employed to discern relationships among various scientific concepts in scholarly publications to reveal the development and evolution of scientific knowledge. In relation to evolutionary network analysis, different link prediction methods in network science can assist in the prediction of missing links and modelling of network dynamics. These traditional methods (based on topological similarity scores and time series methods of link prediction) can be used to predict future co-occurrence trends among scientific concepts. This study attempted to build supervised learning models for link prediction in co-word networks using network topological similarity metrics and their temporal evolutionary information. In addition to exploring the underlying mechanism of temporal co-word network evolution, classification datasets containing links with both positive and negative labels were also built. A set of topological metrics and their temporal evolutionary information were produced to describe instances of classification datasets. Supervised classifications methods were then applied to classify the links and accurately predict future associations among keywords. Time series based forecasting methods were used to predict the future values of topological evolution. Results in relation to supervised link prediction by different classifiers showed that both static and dynamic information are valuable in predicting new links between literary concepts extracted from scientific literature.
Article
This study involved using three methods, namely keyword, bibliographic coupling, and co-citation analyses, for tracking the changes of research subjects in library and information science (LIS) during 4 periods (5 years each) between 1995 and 2014. We examined 580 highly cited LIS articles, and the results revealed that the two subjects “information seeking (IS) and information retrieval (IR)” and “bibliometrics” appeared in all 4 phases. However, a decreasing trend was observed in the percentage of articles related to IS and IR, whereas an increasing trend was identified in the percentage of articles focusing on bibliometrics. Particularly, in the 3rd phase (2005–2009), the proportion of articles on bibliometrics exceeded 80 %, indicating that bibliometrics became predominant. Combining various methods to explore research trends in certain disciplines facilitates a deeper understanding for researchers of the development of disciplines.
Article
This paper examines the research patterns and trends of Recommendation System (RecSys) in China during the period of 2004–2013. Data (keywords in articles) was collected from the China Academic Journal Network Publishing Database (CAJD) and the China Science Periodical Database (CSPD). A co-word analysis was conducted to measure correlation among the extracted keywords. The cluster analysis and social network analysis revealed 12 theme-clusters, network characteristics (centrality and density) of the clusters, the strategic diagram, and the correlation network. The study results show that there are several important themes with a high correlation in Chinese RecSys research, which is considered to be relatively focused, mature, and well-developed overall. Some research themes have developed on a considerable scale, while others remain isolated and undeveloped. This study also identified a few emerging themes with great potential for development. It was also determined that studies overall on the applications of RecSys are increasing.
Article
The aim of this study is to map and analyze the structure and evolution of the scientific literature on gender differences in higher education and science, focusing on factors related to differences between 1991 and 2012. Co-word analysis was applied to identify the main concepts addressed in this research field. Hierarchical cluster analysis was used to cluster the keywords and a strategic diagram was created to analyze trends. The data set comprised a corpus containing 652 articles and reviews published between 1991 and 2012, extracted from the Thomson Reuters Web of Science database. In order to see how the results changed over time, documents were grouped into three different periods: 1991-2001, 2002-2007, and 2008-2012. The results showed that the number of themes has increased significantly over the years and that gender differences in higher education and science have been considered by specific research disciplines, suggesting important research-field-specific variations. Overall, the study helps to identify the major research topics in this domain, as well as highlighting issues to be addressed or strengthened in further work.
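The strategic diagram mentioned above is typically drawn from two quantities per keyword cluster: density (internal link strength) and centrality (links to other clusters). The sketch below computes both for a toy co-occurrence matrix and cluster assignment; the matrix, the labels, and the exact normalisation are illustrative assumptions, since definitions vary across studies.

```python
# Minimal sketch: density and centrality of keyword clusters, the two axes of
# a strategic diagram. Assumes a symmetric co-occurrence matrix `cooc` and a
# cluster label per keyword (both hypothetical here).
import numpy as np

cooc = np.array([            # toy 4-keyword co-occurrence matrix
    [0, 3, 1, 0],
    [3, 0, 0, 1],
    [1, 0, 0, 4],
    [0, 1, 4, 0],
])
labels = np.array([0, 0, 1, 1])   # two clusters of two keywords each

for c in np.unique(labels):
    inside = labels == c
    internal = cooc[np.ix_(inside, inside)]
    external = cooc[np.ix_(inside, ~inside)]
    n = inside.sum()
    density = internal.sum() / 2 / max(n * (n - 1) / 2, 1)   # mean internal link strength
    centrality = external.sum()                              # total external link strength
    print(f"cluster {c}: density={density:.2f}, centrality={centrality}")
```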
Conference Paper
In this paper, we study the problem of predicting the future citation count of a scientific article after a given time interval following its publication. To this end, we gather and conduct an exhaustive analysis of a dataset of more than 1.5 million scientific papers from the computer science domain. On analysis of the dataset, we notice that the citation counts of the articles over the years follow a diverse set of patterns; on closer inspection we identify six broad categories of citation patterns. This important observation motivates us to adopt a stratified learning approach in the prediction task, whereby we propose a two-stage prediction model: in the first stage, the model maps a query paper into one of the six categories, and in the second stage a regression module is run only on the subpopulation corresponding to that category to predict the future citation count of the query paper. Experimental results show that categorizing this huge dataset during the training phase leads to a remarkable improvement (around 50%) over the well-known baseline system.
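A minimal sketch of this two-stage, stratified idea is given below: a classifier first assigns a paper to a citation-pattern category, and a regressor trained only on that category's sub-population then predicts the future citation count. The features, labels, and model choices (random forest plus linear regression) are placeholders, not the paper's actual setup.

```python
# Minimal sketch of stratified two-stage citation prediction:
# stage 1 classifies a paper into a citation-pattern category,
# stage 2 runs a regressor trained only on that category's sub-population.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))                 # placeholder paper features
category = rng.integers(0, 3, size=600)       # placeholder citation-pattern labels (3 of 6 shown)
citations = 10 * category + X[:, 0] + rng.normal(size=600)  # synthetic future citation counts

stage1 = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, category)
stage2 = {c: LinearRegression().fit(X[category == c], citations[category == c])
          for c in np.unique(category)}

def predict_citations(x):
    """Route a query paper through stage 1, then the matching stage-2 regressor."""
    c = stage1.predict(x.reshape(1, -1))[0]
    return stage2[c].predict(x.reshape(1, -1))[0]

print(predict_citations(X[0]))
```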
Article
Understanding the evolution of research topics is crucial for detecting emerging trends in science. This paper proposes a new approach and framework to discover the evolution of topics based on dynamic co-word networks and the communities within them. The NEViewer software was developed according to this approach and framework. Compared with existing studies and science mapping software tools, our work is innovative in three aspects: (a) it designs a longitudinal framework based on the dynamics of co-word communities; (b) it proposes a community labelling algorithm and community evolution verification algorithms; and (c) it visualizes the evolution of topics at the macro and micro levels using alluvial diagrams and coloured networks, respectively. A case study in computer science and a careful assessment were carried out, demonstrating that the new method and the NEViewer software are feasible and effective.
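One simple way to approximate this kind of topic-evolution tracing, sketched below under assumptions, is to detect keyword communities in each time slice and match communities across consecutive periods by the Jaccard overlap of their keywords. This is only a stand-in for NEViewer's labelling and verification algorithms, and the toy two-period networks are invented.

```python
# Minimal sketch: detect keyword communities per time slice and match
# consecutive-period communities by Jaccard overlap of their keywords.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def communities(G):
    return [set(c) for c in greedy_modularity_communities(G)]

def jaccard(a, b):
    return len(a & b) / len(a | b)

# Toy co-word networks for two consecutive periods (edges = co-occurring keywords).
G1 = nx.Graph([("lstm", "rnn"), ("rnn", "nlp"), ("lstm", "nlp"),
               ("co-word", "bibliometrics"), ("nlp", "bibliometrics")])
G2 = nx.Graph([("lstm", "transformer"), ("lstm", "nlp"), ("transformer", "nlp"),
               ("co-word", "bibliometrics"), ("nlp", "bibliometrics")])

for c1 in communities(G1):
    best = max(communities(G2), key=lambda c2: jaccard(c1, c2))
    print(sorted(c1), "->", sorted(best), f"(overlap={jaccard(c1, best):.2f})")
```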
Article
Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms for new data sets. Our analysis casts some light on why recent "High Throughput" methods achieve surprising success--they appear to search through a large number of hyper-parameters because most hyper-parameters do not matter much. We anticipate that growing interest in large hierarchical models will place an increasing burden on techniques for hyper-parameter optimization; this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.
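The grid-versus-random comparison can be reproduced in miniature with scikit-learn, as sketched below under assumptions: an SVM on the digits dataset stands in for the neural networks of the study, and both searches are given the same budget of nine configurations.

```python
# Minimal sketch: random search vs. grid search for hyper-parameter tuning.
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [1e-4, 1e-3, 1e-2]}, cv=3)
rand = RandomizedSearchCV(SVC(), {"C": loguniform(1e-2, 1e2), "gamma": loguniform(1e-5, 1e-1)},
                          n_iter=9, cv=3, random_state=0)   # same budget: 9 configurations

grid.fit(X, y)
rand.fit(X, y)
print("grid best:", grid.best_score_, grid.best_params_)
print("random best:", rand.best_score_, rand.best_params_)
```

Because the random search samples the full continuous range rather than a fixed lattice, it can land close to good values of the few hyper-parameters that actually matter, which is the paper's central argument.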
Article
The methods of patent analysis are largely divided into network-based patent analysis and keyword-based morphological patent analysis. Both methods have shortcomings: internal patent information composed of natural language cannot be analyzed with the network-based method, and the correlation between patents cannot be analyzed with the keyword-based morphological method. In this research, we analyze patents in the Light Emitting Diode (LED) and wireless broadband fields via a method that incorporates both the network-based and the keyword-based patent analysis methods. Using network indices, we identify the characteristics of the patent keyword network and perform a trend analysis to discover how keywords play a significant role in network changes over time. The analysis results indicate that the patent keyword network is sporadic but clustered and shows a clear power-law distribution. Further, inflow keywords are highly likely to form new connections with other keywords in the existing associated communities. We also confirm that, as time passes, the top core keywords of a particular technology field continue to play an important role in the network, and that the rate of technological change in the wireless broadband field is faster than in the LED field. Through the proposed analysis, researchers can easily grasp which technology keywords are important in a specific technology field and identify the relations between essential technology elements; furthermore, this information can be utilized for developing new technologies by combining the technology elements extracted from community analysis.
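The network indices mentioned here (density, clustering, a heavy-tailed degree distribution) can be computed directly with networkx, as in the rough sketch below; a Barabási–Albert graph stands in for a real patent keyword network, and the log-log slope is only a crude approximation of a power-law exponent (a proper fit would use a dedicated package such as powerlaw).

```python
# Minimal sketch: basic indices of a keyword network and a rough log-log view
# of its degree distribution (a formal power-law fit would need a dedicated
# package; only a crude least-squares check is sketched here).
import numpy as np
import networkx as nx

G = nx.barabasi_albert_graph(500, 2, seed=1)   # stand-in for a patent keyword network

print("density:", nx.density(G))
print("average clustering:", nx.average_clustering(G))

degrees = np.array([d for _, d in G.degree()])
values, counts = np.unique(degrees, return_counts=True)
slope, intercept = np.polyfit(np.log(values), np.log(counts), 1)
print("approximate power-law exponent:", -slope)
```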
Article
The aim of this study is to map the intellectual structure of the digital library (DL) field in China during the period 2002–2011. Co-word analysis was employed to reveal patterns in the Chinese DL field by measuring the association strength of keywords in relevant journals. Data were collected from the Chinese Journal Full-Text Database for the period 2002–2011. The co-occurrence matrix of keywords was then analyzed using multivariate statistical analysis and social network analysis. The results comprise five parts: seven clusters of keywords, a two-dimensional map, the density and centrality of clusters, a strategic diagram, and a relation network. The results show that there are some hot research topics and some marginal topics in the Chinese DL field, but the research topics are relatively decentralized compared with international studies.
Article
This paper argues that the technology life cycle literature is confused and incomplete. This literature is first reviewed with consideration of the related concepts of the life cycles for industries and products. By exploring the inter-relationships between these, an integrated view of the technology life cycle is produced. A new conceptualization of the technology life cycle is then proposed. This is represented as a model that incorporates three different levels for technology application, paradigm and generation. The model shows how separate paradigms emerge over time to achieve a given application. It traces the eras of ferment and incremental change and shows how technology generations evolve within these. It also depicts how the eras are separated by the emergence of a dominant design, and how paradigms are replaced at a technological discontinuity. By adopting this structure, the model can demarcate the evolution of technologies at varying levels of granularity from the specific products in which they may be manifest to the industries in which they are exploited. By taking technology as the unit of analysis the model departs from previous work, which has adopted a product-based perspective predominantly. The paper discusses the managerial and research implications associated with the technology life cycle, and indicates how these inform future research directions. As well as contributing to academic knowledge, the results of this research are of value to those who make decisions about the development, exploitation and use of technology including technology developers, engineers, technologists, R & D managers, and designers.
Article
Scientists usually develop research ideas inspired by previous publications, but they are unlikely to follow every publication in an unbounded literature collection. The volume of literature keeps expanding extremely fast, while not all papers contribute equal impact to the academic community. Being aware of potentially influential literature would put one in an advantageous position when choosing important research references. Hence, estimating potential influence is of great significance. We study the challenging problem of identifying potentially influential literature. We examine a set of hypotheses on the fundamental characteristics of highly cited papers and find some interesting patterns. Based on these observations, we learn to identify potentially influential literature via Future Influence Prediction (FIP), which aims to estimate the future influence of literature. The system takes a series of features of a particular publication as input and produces as output the estimated citation count of that article after a given time period. We consider several regression models to formulate the learning process and evaluate their performance based on the coefficient of determination (R²). Experimental results on a large real-world data set show a mean average predictive performance of 83.6% measured in R². We apply the learned model to the application of bibliography recommendation and obtain a prominent performance improvement in terms of Mean Average Precision (MAP).
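A minimal sketch of this regression setup, under assumptions, is shown below: synthetic paper features stand in for the publication features the paper uses, a gradient-boosting regressor stands in for its models, and performance is reported as R² on held-out papers.

```python
# Minimal sketch: regression-based future-citation prediction evaluated by R^2.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Placeholder features, e.g. venue impact, author productivity, early citations, topic novelty.
X = rng.normal(size=(1000, 4))
future_citations = 5 + 3 * X[:, 2] + 1.5 * X[:, 0] + rng.normal(scale=0.5, size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, future_citations, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
print("R^2 on held-out papers:", r2_score(y_te, model.predict(X_te)))
```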
Article
We describe a new learning procedure, back-propagation, for networks of neurone-like units. The procedure repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector. As a result of the weight adjustments, internal 'hidden' units which are not part of the input or output come to represent important features of the task domain, and the regularities in the task are captured by the interactions of these units. The ability to create useful new features distinguishes back-propagation from earlier, simpler methods such as the perceptron-convergence procedure.
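For concreteness, the sketch below is a minimal numpy implementation of the procedure described: a forward pass through one hidden layer, a squared-error comparison of actual and desired outputs, and weight updates from the back-propagated error signal. XOR and the specific layer sizes and learning rate are assumptions for the example.

```python
# Minimal sketch: back-propagation for a one-hidden-layer network on XOR,
# adjusting weights to reduce squared error between actual and desired outputs.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(scale=0.5, size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(10000):
    h = sigmoid(X @ W1 + b1)              # hidden activations
    y = sigmoid(h @ W2 + b2)              # network output
    dy = (y - t) * y * (1 - y)            # error signal at the output layer
    dh = (dy @ W2.T) * h * (1 - h)        # error back-propagated to hidden units
    W2 -= 1.0 * h.T @ dy; b2 -= 1.0 * dy.sum(0)
    W1 -= 1.0 * X.T @ dh; b1 -= 1.0 * dh.sum(0)

print(np.round(y.ravel(), 2))             # should approach [0, 1, 1, 0]
```

The hidden units end up encoding intermediate features of the XOR task, which is exactly the ability the abstract highlights as distinguishing back-propagation from earlier procedures.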
Article
This paper rigorously establishes that standard multilayer feedforward networks with as few as one hidden layer using arbitrary squashing functions are capable of approximating any Borel measurable function from one finite dimensional space to another to any desired degree of accuracy, provided sufficiently many hidden units are available. In this sense, multilayer feedforward networks are a class of universal approximators.
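The result is often quoted in the following form for continuous functions on a compact set (the paper itself states a stronger version for Borel measurable functions with respect to an appropriate metric); this is a paraphrase, not the paper's exact notation.

```latex
% Universal approximation, continuous case (paraphrase): for any continuous f on a
% compact set K in R^d, any squashing function \sigma, and any \varepsilon > 0,
% there exist N, \beta_j, w_j, b_j such that
\left|\, f(x) \;-\; \sum_{j=1}^{N} \beta_j \,\sigma\!\left( w_j^{\top} x + b_j \right) \right| \;<\; \varepsilon
\qquad \text{for all } x \in K.
```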
Article
New concepts and ideas build on older ones. This path dependence in knowledge evolution has prompted research to identify important knowledge elements, research trends, and opportunities by analyzing publication data. In our study, keyword networks formed from published academic articles were analyzed to examine how keywords are associated with each other and to identify important keywords and their change over time. Based on MIS publication data from 1999 to 2008, our analysis provided several notable findings. First, while the MIS field has changed rapidly, resulting in many new keywords, the connectivity among them is highly clustered. Second, the keyword networks show a clear power-law distribution, which implies that the more popular a keyword, the more likely it is to be selected by new researchers and used in follow-on studies. In addition, a strong hierarchical structure is identified in the network. Third, the network-based perspective reveals interdisciplinary keywords which differ from the popular ones and have the potential to lead research in the MIS field.
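One way to operationalize the contrast drawn here between popular and interdisciplinary keywords, sketched below purely as an assumption rather than the paper's own measure, is to compare degree (popularity) with betweenness centrality relative to degree (bridging position); the Les Misérables co-appearance graph is a stand-in for a real keyword network.

```python
# Minimal sketch: popular keywords (high degree) vs. potentially interdisciplinary
# keywords (high betweenness relative to degree) in a keyword network.
import networkx as nx

G = nx.les_miserables_graph()            # stand-in for an MIS keyword network
deg = dict(G.degree())
btw = nx.betweenness_centrality(G)

popular = sorted(deg, key=deg.get, reverse=True)[:5]
bridging = sorted(btw, key=lambda n: btw[n] / max(deg[n], 1), reverse=True)[:5]

print("most popular:", popular)
print("most bridging (interdisciplinary candidates):", bridging)
```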
Article
A modified approach to algorithmic historiography is used to investigate the changing influence of the work of Conrad Hal Waddington over the period 1945–2004. Overall, Waddington's publications were cited by almost 5,500 source items in the Web of Science (Thomson Scientific, formerly Thomson ISI, Philadelphia, PA). Rather than simply analyzing the data set as a whole, older works by Waddington are incorporated into a series of historiographic maps (networks of highly cited documents), which show long-term and short-term research themes grounded in Waddington's work. Analysis by 10–20-year periods and the use of social network analysis software reveals structures (thematic networks and subnetworks) that are hidden in a mapping of the entire 60-year period. Two major Waddington-related themes emerge: canalization-genetic assimilation and embryonic induction. The first persists over the 60 years studied, while active, visible research in the second appears to have declined markedly between 1965 and 1984, only to reappear in conjunction with the emergence of a new research field, Evolutionary Developmental Biology.
Article
In this paper, co-word analysis is used to analyze the evolution of the stem cell field. Articles from stem cell journals are downloaded from PubMed for analysis. Term selection is one of the most important steps in co-word analysis, so useless and overly general subject headings are removed first, and the major and minor subject headings are then weighted respectively. Next, an improved information entropy measure, combined with expert consultation, is used to select subject headings. Hierarchical cluster analysis is used to cluster the subject headings, and a strategic diagram is constructed to analyze evolutionary trends in the stem cell field.
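A rough sketch of the entropy-based selection idea is given below under assumptions: major and minor headings receive different (hypothetical) weights, and headings are ranked by the entropy of their weight distribution across documents; the documents, weights, and ranking rule are all illustrative, not the paper's "improved information entropy".

```python
# Minimal sketch: weight major/minor subject headings differently, then rank
# headings by the entropy of their weight distribution across documents
# (a rough stand-in for the entropy-based selection step).
import math
from collections import defaultdict

MAJOR, MINOR = 2.0, 1.0   # hypothetical weights for major/minor headings

# Placeholder documents: (heading, is_major) pairs.
docs = {
    "d1": [("stem cell", True), ("differentiation", False)],
    "d2": [("stem cell", True), ("mesenchymal stem cell", True)],
    "d3": [("stem cell", False), ("differentiation", True), ("transplantation", False)],
}

weights = defaultdict(dict)   # heading -> {document: weight}
for d, headings in docs.items():
    for term, is_major in headings:
        weights[term][d] = MAJOR if is_major else MINOR

def entropy(dist):
    total = sum(dist.values())
    return -sum((w / total) * math.log2(w / total) for w in dist.values())

for term, dist in sorted(weights.items(), key=lambda kv: -entropy(kv[1])):
    print(f"{term}: entropy={entropy(dist):.2f} over {len(dist)} documents")
```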
Article
In this study, we aim to evaluate the global scientific production of stem cell research over the past 16 years, provide insights into the characteristics of stem cell research activities, and identify patterns, tendencies, or regularities in the papers. Data are based on the online version of SCI, Web of Science, from 1991 to 2006. Articles referring to stem cells were assessed from many aspects, including an exponential fit of the trend of publication outputs during 1991–2006, the distribution of source titles, and author keyword and KeyWords Plus analyses. Based on an exponential fit of the yearly publications of the last decade, it can also be calculated that in 2011 the number of scientific papers on the topic of stem cells will be twice the number of publications in 2006. Synthetically analyzing the three kinds of keywords, it can be concluded that the application of stem cell transplantation technology to human disease therapy, especially research related to "embryonic stem cell" and "mesenchymal stem cell", is the orientation of stem cell research in the 21st century. This bibliometric method can help relevant researchers grasp the panorama of global stem cell research and establish further research directions.
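The kind of exponential fit and doubling calculation described here can be sketched as below; the yearly counts are synthetic, and numpy's polyfit on log counts is just one simple way to perform the fit.

```python
# Minimal sketch: exponential fit of yearly publication counts and a
# short extrapolation, in the spirit of the growth-trend analysis described.
import numpy as np

years = np.arange(1997, 2007)
counts = np.array([120, 150, 190, 240, 310, 400, 520, 660, 850, 1100])  # synthetic

# Fit log(counts) = a * year + b, i.e. counts ~ exp(a * year + b).
a, b = np.polyfit(years, np.log(counts), 1)
predict = lambda yr: np.exp(a * yr + b)

print("doubling time (years):", np.log(2) / a)
print("predicted 2011 output:", round(float(predict(2011))))
```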