Article

Semantic domain comparison of research keywords by indicator-based fuzzy distances: A new prospect

Article
Full-text available
Social networks have become popular among researchers and scientists. Specialized platforms for researchers offer many metrics and indicators that are used to evaluate individual scientists and assess the strength of their impact. In this article, the authors perform a systematic comparison between the main university-level ResearchGate (RG) metrics (total RG Score, number of publications, number of affiliated profiles) and ARWU. A tool for acquiring the RG metrics of research units and a framework for calculating alternative university ranks were implemented and tested. As a point of reference, the ranking system of the Academic Ranking of World Universities (ARWU, 2019) was used. The authors used a web-scraping technique to acquire data. Data analysis was based on Spearman's rho and multiple linear regression (MLR). Ten additional ranks were developed and compared with the benchmark ranking. The k-means clustering method was used to identify groups of ARWU universities. The results show that the metrics provided by specialized social networks can be used for the assessment of universities; however, an in-depth evaluation requires a more advanced procedure and indicators that measure many areas of scholarly activity, such as research, integration, application, teaching, and co-creation. The clustering method also showed that the distances between ARWU universities, measured in RG metric values, are larger at the top of the ranking. University authorities should encourage researchers to use specialized social networks, and train them in doing so, not only to promote their own achievements but also to increase the impact and recognition of their respective research units. At the end of the article, some limitations of the method and some practical recommendations for university authorities are formulated.
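A minimal sketch of the rank-correlation step described above, assuming invented ranking data (this is not the authors' code):

```python
# Hypothetical example: comparing two university rankings with Spearman's rho.
from scipy.stats import spearmanr

# Positions of the same five universities in ARWU and in an RG-metric ranking
# (values invented for illustration).
arwu_rank = [1, 2, 3, 4, 5]
rg_rank   = [2, 1, 3, 5, 4]

rho, p_value = spearmanr(arwu_rank, rg_rank)
print(f"Spearman's rho = {rho:.2f} (p = {p_value:.3f})")
```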
Article
Full-text available
This article proposes an approach to compare semantic networks using concept-centered sub-networks. A concept-centered sub-network is defined as an induced network whose vertex set consists of the given concept (ego) and all its adjacent concepts (alters) and whose link set consists of all the links between the ego and alters (including alter-alter links). By looking at the vertex and link overlap indices of concept-centered networks we infer semantic similarity of the underlying concepts. We cross-evaluate the semantic similarity by close-reading textual contexts from which networks are derived. We illustrate the approach on written and interview texts from an ethnographic study of flood management practice in England.
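An illustrative sketch of the vertex-overlap index on concept-centered sub-networks, using a toy graph and a Jaccard overlap (the concepts and edges are invented, not taken from the study):

```python
# Vertex overlap of two concept-centered (ego) sub-networks.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("flood", "risk"), ("flood", "river"), ("risk", "river"),
                  ("hazard", "risk"), ("hazard", "river"), ("flood", "plain")])

def ego_vertices(graph, concept):
    """Vertex set of the concept-centered sub-network: the ego plus its alters."""
    return set(nx.ego_graph(graph, concept).nodes)

v1, v2 = ego_vertices(G, "flood"), ego_vertices(G, "hazard")
vertex_overlap = len(v1 & v2) / len(v1 | v2)   # Jaccard vertex-overlap index
print(vertex_overlap)
```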
Article
Full-text available
Estimating the semantic similarity between text data is one of the challenging and open research problems in the field of Natural Language Processing (NLP). The versatility of natural language makes it difficult to define rule-based methods for determining semantic similarity measures. To address this issue, various semantic similarity methods have been proposed over the years. This survey article traces the evolution of such methods beginning from traditional NLP techniques such as kernel-based methods to the most recent research work on transformer-based models, categorizing them based on their underlying principles as knowledge-based, corpus-based, deep neural network–based methods, and hybrid methods. Discussing the strengths and weaknesses of each method, this survey provides a comprehensive view of existing systems in place for new researchers to experiment and develop innovative ideas to address the issue of semantic similarity.
Article
Full-text available
Our study is a scientometric analysis of research publications in science and social science disciplines in Pakistan during 2009–2018. The study examines 2000 published articles belonging to 50 research scholars from different disciplines. The analysis is conducted on three levels: researcher level, field level, and domain level. In this paper we discuss readability scores, title formats, single and multiple authorship of articles, citation rates, publication rates over time, the research contribution of both genders, and the impact of authors' PhD institutions on research publications. IR features such as the number of words, sentence length, and readability scores are extracted. The results indicate that more research is being conducted in science disciplines than in the social sciences. It also appears that the readability scores of authors changed over time, and science articles have higher readability scores than social science articles. Moreover, title formats are observed to affect citation rates. Titles tend to use more colons over time; simple sentences account for more than 50% of titles, question marks for below 3%, and the rest are titles with colons. Citation rates also appear to increase with co-authorship. Male researchers contribute more to research activities in science disciplines, while in the social sciences the ratio is almost equal; PhD institutions appear to influence researcher performance only slightly. Finally, the publication rates of science disciplines are much higher than those of social science disciplines, indicating that the growth of the sciences is more rapid in Pakistan.
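A hedged sketch of one of the extracted features, the Flesch Reading Ease score (206.835 - 1.015·ASL - 84.6·ASW); the syllable counter below is a crude stand-in, not the study's implementation:

```python
import re

def count_syllables(word):
    # Rough vowel-group heuristic, adequate only for illustration.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

print(flesch_reading_ease("The study examines published articles. Scores change over time."))
```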
Article
Full-text available
Scientific articles available in Open Access (OA) have been found to attract more citations and online attention to the extent that it has become common to speak about OA Altmetrics Advantage. This research investigates how the OA Altmetrics Advantage holds for a specific case of research articles, namely the research outputs from universities in Finland. Furthermore, this research examines disciplinary and platform specific differences in that (dis)advantage. The new methodological approaches developed in this research focus on relative visibility, i.e. how often articles in OA journals receive at least one mention on the investigated online platforms, and relative receptivity, i.e. how frequently articles in OA journals gain mentions in comparison to articles in subscription-based journals. The results show significant disciplinary and platform specific differences in the OA advantage, with articles in OA journals within for instance veterinary sciences, social and economic geography and psychology receiving more citations and attention on social media platforms, while the opposite was found for articles in OA journals within medicine and health sciences. The results strongly support field- and platform-specific considerations when assessing the influence of journal OA status on altmetrics. The new methodological approaches used in this research will serve future comparative research into OA advantage of scientific articles over time and between countries.
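An illustrative computation of the two metrics defined above, on invented data; reading relative receptivity as a ratio of mean mention counts is an assumption:

```python
# relative visibility  = share of articles with at least one mention;
# relative receptivity = mean mentions of OA articles / mean mentions of non-OA.
oa_mentions  = [0, 3, 1, 5, 0, 2]   # mentions per OA-journal article (invented)
sub_mentions = [1, 0, 0, 2, 0, 1]   # mentions per subscription-journal article

visibility_oa  = sum(m > 0 for m in oa_mentions) / len(oa_mentions)
visibility_sub = sum(m > 0 for m in sub_mentions) / len(sub_mentions)
receptivity = (sum(oa_mentions) / len(oa_mentions)) / (sum(sub_mentions) / len(sub_mentions))

print(visibility_oa, visibility_sub, receptivity)
```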
Article
Full-text available
The purpose of this study is to ascertain the suitability of GS's url-based method as a valid approximation of universities' academic output measures, taking into account three aspects (retroactive growth, correlation, and coverage). To do this, a set of 100 Turkish universities was selected as a case study. Productivity in Web of Science (WoS), Scopus and GS (2000–2013) was captured in two measurement iterations (2014 and 2018). In addition, a total of 18,174 documents published by a subset of 14 research-focused universities were retrieved from WoS, verifying their presence in GS within the official university web domain. Findings suggest that retroactive growth in GS is unpredictable and dependent on each university, making this parameter hard to evaluate at the institutional level. In contrast, the correlation of productivity between GS (url-based method) and WoS and Scopus (selected sources) is moderately positive, even though it varies depending on the university, the year of publication, and the year of measurement. Finally, only 16% of the 18,174 articles analyzed were indexed on the official university website, although up to 84% were indexed in other GS sources. This work shows that the url-based method of calculating institutional productivity in GS is not a good proxy for the total number of publications indexed in WoS and Scopus, at least in the national context analyzed. However, the main reason is not directly related to the operation of GS, but to a lack of universities' commitment to open access.
Article
Full-text available
Calculating semantic similarity between words is a challenging task in many domains, such as natural language processing (NLP), information retrieval, and plagiarism detection. WordNet is a conceptually organized lexical dictionary in which each concept has several characteristics: synsets and glosses. Synsets represent sets of synonyms of a given word, and glosses are short descriptions. In this paper, we propose a new approach for calculating semantic similarity between two concepts. The proposed method is based on concepts from set theory and WordNet properties, calculating the relatedness between the synsets and glosses of the two concepts.
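A hedged sketch in the spirit of this approach (not the authors' method): a set-theoretic Jaccard overlap between the synonym sets and gloss words of two concepts, via NLTK's WordNet interface:

```python
from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet')

def concept_sets(word):
    """Union of synonym lemmas and gloss words over all synsets of a word."""
    synonyms, gloss_words = set(), set()
    for s in wn.synsets(word):
        synonyms.update(l.name() for l in s.lemmas())
        gloss_words.update(s.definition().lower().split())
    return synonyms | gloss_words

a, b = concept_sets("car"), concept_sets("automobile")
jaccard = len(a & b) / len(a | b)   # set-theoretic relatedness
print(jaccard)
```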
Article
Full-text available
There have been many attempts to identify relationships among concepts corresponding to terms from biomedical information ontologies such as the Unified Medical Language System (UMLS). In particular, vector representation of such concepts using information from UMLS definition texts is widely used to measure the relatedness between two biological concepts. However, conventional relatedness measures have a limited range of applicable word coverage, which limits the performance of these models. In this paper, we propose a concept-embedding model of a UMLS semantic relatedness measure to overcome the limitations of earlier models. We obtained context texts of biological concepts that are not defined in UMLS by utilizing Wikipedia as an external knowledge base. Concept vector representations were then derived from the context texts of the biological concepts. The degree of relatedness between two concepts was defined as the cosine similarity between corresponding concept vectors. As a result, we validated that our method provides higher coverage and better performance than the conventional method.
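A minimal sketch of the relatedness definition used above, cosine similarity between two concept vectors; the vectors here are random placeholders:

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

vec_gene, vec_protein = np.random.rand(300), np.random.rand(300)  # stand-ins
print(cosine(vec_gene, vec_protein))
```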
Article
Full-text available
The assessment of semantic similarity between lexical terms plays a critical part in semantic-oriented applications for natural language processing and cognitive science. The optimization of calculation models remains a challenging issue for improving the performance of similarity measurement. In this paper, we investigate WordNet-based measures, including distance-based, information-based, feature-based, and hybrid measures. Among them, distance-based measures are considered to have the lowest computational complexity due to simple distance calculation. However, most existing works ignore the meronymy relation between concepts and the non-uniformity of path distances caused by various semantic relations; path distances are simply determined by the conceptual hyponymy relation. To solve this problem, we propose a novel model to calculate the path distance between concepts, together with a similarity measure that nonlinearly transforms the distance into semantic similarity. In the proposed model, we assign different weights, in accordance with the various relations, to the edges that link different concepts. On the basis of the distance model, we use five structural properties of WordNet for similarity measurement: multiple meanings, multiple inheritance, link type, depth, and local density. Our similarity measure is compared against state-of-the-art WordNet-based measures on the M&C, R&G and WS-353 datasets. According to the experimental results, the proposed measure outperforms the others in terms of both Pearson and Spearman correlation coefficients, which indicates the effectiveness of our distance model. In addition, we construct six additional benchmarks to show that the proposed measure maintains stable performance.
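A hedged sketch of the general idea, not the authors' exact model: a weighted shortest-path distance over typed relations, mapped to similarity with a nonlinear exponential decay. The relation weights and the decay constant alpha are invented:

```python
import math
import networkx as nx

REL_WEIGHTS = {"hypernym": 1.0, "meronym": 1.5}  # assumed relation weights

G = nx.Graph()
G.add_edge("car", "vehicle", weight=REL_WEIGHTS["hypernym"])
G.add_edge("bicycle", "vehicle", weight=REL_WEIGHTS["hypernym"])
G.add_edge("car", "wheel", weight=REL_WEIGHTS["meronym"])

def similarity(c1, c2, alpha=0.4):
    d = nx.shortest_path_length(G, c1, c2, weight="weight")
    return math.exp(-alpha * d)   # nonlinear distance-to-similarity transform

print(similarity("car", "bicycle"))
```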
Article
Full-text available
The standard image representation can be considered obsolete in the image processing area, since it was designed mainly to visualize images, not to support image processing algorithms. For that reason, seeking alternative image representations becomes an important issue. This paper focuses on images represented by means of fuzzy functions (so-called fuzzy images) and investigates their sensitivity to arbitrary distortions. It shows that this fuzzy representation is less sensitive to distortions than the raster image representation. Finally, it also shows the impact on a practical image processing task, in which the fuzzy representation achieved significantly better results.
Article
Full-text available
Science studies are persistently challenged by the elusive structures of their subject matter, be it scientific knowledge or the various collectivities of researchers engaged with its production. Bibliometrics has responded by developing a strong and growing structural bibliometrics, which is concerned with delineating fields and identifying thematic structures. In the course of these developments, a concern emerged and is steadily growing. Do the sets of publications, authors or institutions we identify and visualise with our methods indeed represent thematic structures? To what extent are results of topic identification exercises determined by properties of knowledge structures, and to what extent are they determined by the approaches we use? Do we produce more than artefacts? These questions triggered the collective process of comparative topic identification reported in this special issue. The introduction traces the history of bibliometric approaches to topic identification, identifies the major challenges involved in these exercises, and introduces the contributions to the special issue.
Article
Full-text available
We present an unsupervised explainable word embedding technique, called EVE, which is built upon the structure of Wikipedia. The proposed model defines the dimensions of a semantic vector representing a word using human-readable labels, thereby making it readily interpretable. Specifically, each vector is constructed using the Wikipedia category graph structure together with the Wikipedia article link structure. To test the effectiveness of the proposed word embedding model, we consider its usefulness in three fundamental tasks: 1) intruder detection, to evaluate its ability to identify a non-coherent vector in a list of coherent vectors; 2) ability to cluster, to evaluate its tendency to group related vectors together while keeping unrelated vectors in separate clusters; and 3) sorting relevant items first, to evaluate its ability to rank vectors (items) relevant to the query at the top of the result. For each task, we also propose a strategy to generate a task-specific, human-interpretable explanation from the model. These tasks demonstrate the overall effectiveness of the explainable embeddings generated by EVE. Finally, we compare EVE with the Word2Vec, FastText, and GloVe embedding techniques across the three tasks, and report improvements over the state of the art.
Article
Full-text available
Measuring Semantic Textual Similarity (STS) between words/terms, sentences, paragraphs, and documents plays an important role in computer science and computational linguistics. It also has many applications in several fields, such as biomedical informatics and geoinformation. In this paper, we present a survey of different methods of textual similarity, and we also report on the availability of software and tools useful for STS. In natural language processing (NLP), STS is an important component of many tasks, such as document summarization, word sense disambiguation, short answer grading, and information retrieval and extraction. We divide the measures of semantic similarity into three broad categories: (i) topological/knowledge-based, (ii) statistical/corpus-based, and (iii) string-based. More emphasis is given to the methods related to the WordNet taxonomy, because topological methods play an important role in understanding the intended meaning of an ambiguous word, which is very difficult to process computationally. We also propose a new method for measuring semantic similarity between sentences, which exploits the advantages of taxonomy methods and merges this information into a language model. It considers WordNet synsets for lexical relationships between nodes/words, and a unigram language model is implemented over a large corpus to assign the information content value between two nodes of different classes.
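An illustrative computation of corpus-based information content, IC(w) = -log P(w), from unigram counts; the counts are invented:

```python
import math

counts = {"vehicle": 120, "car": 300, "sedan": 12}   # toy unigram counts
total = sum(counts.values())

def information_content(word):
    return -math.log(counts[word] / total)

# Rarer, more specific words carry more information content.
for w in counts:
    print(w, round(information_content(w), 3))
```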
Article
Full-text available
Recent interest towards university rankings has led to the development of several ranking systems at national and global levels. Global ranking systems tend to rely on internationally accessible bibliometric databases and reputation surveys to develop league tables at a global level. Given their access to and in-depth knowledge of local institutions, national ranking systems tend to include a more comprehensive set of indicators. The purpose of this study is to conduct a systematic comparison of national and global university ranking systems in terms of their indicators, coverage and ranking results. Our findings indicate that national rankings tend to include a larger number of indicators that primarily focus on educational and institutional parameters, whereas global ranking systems tend to have fewer indicators mainly focusing on research performance. Rank similarity analysis between national rankings and global rankings filtered for each country suggests that, with the exception of a few instances, global rankings do not strongly predict the national rankings.
Article
Full-text available
Scientometrics is the study of the quantitative aspects of the process of science as a communication system. It is centrally, but not only, concerned with the analysis of citations in the academic literature. In recent years it has come to play a major role in the measurement and evaluation of research performance. In this review we consider: the historical development of scientometrics, sources of citation data, citation metrics and the "laws" of scientometrics, normalisation, journal impact factors and other journal metrics, visualising and mapping science, evaluation and policy, and future developments.
Article
Full-text available
In just a decade, the international university rankings have become dominant measures of institutional performance for policy-makers worldwide. Bolstered by the façade of scientific neutrality, these classification systems have reinforced the hegemonic model of higher education – that of the elite, Anglo-Saxon research university – on a global scale. The process is a manifestation of what Bourdieu and Wacquant have termed US “cultural imperialism.” However, the rankings paradigm is facing growing criticism and resistance, particularly in regions such as Latin America, where the systems are seen as forcing institutions into a costly and high-stakes “academic arms race” at the expense of more pressing development priorities. That position, expressed at the recent UNESCO conferences in Buenos Aires, Paris, and Mexico City, shows the degree to which the rankings have become a fundamental element in the contest for cultural hegemony, waged through the prism of higher education.
Conference Paper
Full-text available
This paper describes the Duluth systems that participated in Task 2 of SemEval-2012. These systems were unsupervised and relied on variations of the Gloss Vector measure found in the freely available software package WordNet::Similarity. This method was moderately successful for the Class-Inclusion, Similar, Contrast, and Non-Attribute categories of semantic relations, but mimicked a random baseline for the other six categories.
Article
Full-text available
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
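A usage sketch with gensim (assuming gensim >= 4, and a toy corpus): training a skip-gram model with negative sampling and frequent-word subsampling, as described above:

```python
from gensim.models import Word2Vec

sentences = [["air", "canada", "flies", "to", "toronto"],
             ["canada", "is", "a", "country"]]          # toy corpus

model = Word2Vec(sentences,
                 sg=1,            # skip-gram
                 negative=5,      # negative sampling
                 sample=1e-5,     # subsampling of frequent words
                 vector_size=50, window=5, min_count=1)
print(model.wv.most_similar("canada", topn=2))
```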
Article
Full-text available
Automated measures of semantic relatedness are important for effectively processing medical data for a variety of tasks such as information retrieval and natural language processing. In this paper, we present a context vector approach that can compute the semantic relatedness between any pair of concepts in the Unified Medical Language System (UMLS). Our approach was developed on a corpus of inpatient clinical reports. We use 430 pairs of clinical concepts manually rated for semantic relatedness as the reference standard. The experiments demonstrate that incorporating a combination of the UMLS and WordNet definitions can improve the semantic relatedness. The paper also shows that the second-order co-occurrence vector measure is more effective than path-based methods for semantic relatedness.
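A hedged sketch of a second-order co-occurrence vector: a concept is represented by averaging the first-order co-occurrence vectors of the words in its definition. The tiny co-occurrence table is invented for illustration:

```python
import numpy as np

vocab = ["blood", "pressure", "heart", "pain"]
cooc = {                      # first-order co-occurrence vectors over `vocab`
    "blood":    np.array([0, 4, 3, 1], float),
    "pressure": np.array([4, 0, 2, 2], float),
    "heart":    np.array([3, 2, 0, 3], float),
}

def second_order_vector(definition_words):
    return np.mean([cooc[w] for w in definition_words if w in cooc], axis=0)

v1 = second_order_vector(["blood", "pressure"])      # e.g. "hypertension"
v2 = second_order_vector(["heart", "blood"])         # e.g. "cardiac"
print(float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))))
```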
Article
Full-text available
The LEXIMAPPE method and Multidimensional Scaling (MDS) are discussed as methods to visualize (map) characteristics of structures of word-occurrence (co-word) relations. Utilization of MDS is proposed as an alternative mapping method able to circumvent problematic features of LEXIMAPPE maps of the total co-word structure. A comparison of both methods on the same real-life co-word matrix demonstrates topological advantages of an extended MDS-mapping.
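An illustrative MDS mapping of a small co-word matrix; the data and the co-occurrence-to-distance conversion are invented for the sketch:

```python
import numpy as np
from sklearn.manifold import MDS

cooc = np.array([[0, 8, 1],
                 [8, 0, 2],
                 [1, 2, 0]], float)          # symmetric co-word counts
dist = 1.0 / (1.0 + cooc)                    # assumed co-occurrence-to-distance map
np.fill_diagonal(dist, 0.0)

coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dist)
print(coords)   # 2-D map positions of the three terms
```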
Conference Paper
Full-text available
Computing semantic relatedness of natural language texts is a key component of tasks such as information retrieval and summarization, and often depends on knowledge of a broad range of real-world concepts and relationships. We address this knowledge integration issue by computing semantic relatedness using personalized PageRank (random walks) on a graph derived from Wikipedia. This paper evaluates methods for building the graph, including link selection strategies, and two methods for representing input texts as distributions over the graph nodes: one based on a dictionary lookup, the other based on Explicit Semantic Analysis. We evaluate our techniques on standard word relatedness and text similarity datasets, finding that they capture similarity information complementary to existing Wikipedia-based relatedness measures, resulting in small improvements on a state-of-the-art measure.
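A sketch of the core step only (not the paper's full pipeline): personalized PageRank on a small concept graph, with the random walk biased toward the nodes representing the input text; the graph is invented:

```python
import networkx as nx

G = nx.Graph([("bank", "money"), ("bank", "river"), ("money", "finance"),
              ("river", "water")])

# Restart distribution concentrated on the input text's concepts.
personalization = {"bank": 0.5, "money": 0.5, "river": 0.0,
                   "finance": 0.0, "water": 0.0}
scores = nx.pagerank(G, alpha=0.85, personalization=personalization)
print(scores)   # stationary distribution; such vectors can be compared by cosine
```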
Article
Assessment of new research topics and emerging technologies in any branch of knowledge is important for researchers, universities and research institutes, research investors, industry sectors, and scientific policymakers for a variety of reasons. The basic premise of this research is that the topics of interest for academic research are those that are still underdeveloped but relatively well sponsored by investors. This paper proposes a method to identify and evaluate topics for their research, industrial, and commercial potential based on development, investment, and the investment-to-development ratio (investment appeal). Since the target audience of this paper is researchers in all fields of knowledge, most of whom are unfamiliar with scientometric schemes, the proposed method is designed to be simple, based on easily accessible meta-databases and without any need for clustering of keywords. The development index is defined as the keyword link strength obtained from the keyword co-occurrence network, and investment is introduced as the number of sponsors associated with each keyword. From qualitative analysis of the development-investment diagram, six sets of keywords are identified, labelled For Research, For Industry, For Commerce, Matured, Academic, and Chaotic. Because research topics belong to these sets only to uncertain degrees and the sets overlap, they are defined as fuzzy sets. A fuzzy model, called the Fuzzy Research Ranking System (FRRS), is designed to characterize the fuzzy behavior of research topics and assess their potential; its output is the membership of keywords in each of the six predefined fuzzy sets. The proposed method has been implemented for a sample knowledge domain, Geo-Engineering, an interdisciplinary field with significant technological capacity. Expert review of the results shows that the method is well suited to identifying research topics with technological and industrial perspectives among purely scientific keywords, and can efficiently provide researchers with a ranked list of research topics.
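A heavily hedged sketch of the indicators described above, with invented data: development as total co-occurrence link strength, investment as sponsor count. The trapezoid-like membership function standing in for one FRRS fuzzy set is an assumption, not the authors' calibration:

```python
link_strength = {"geothermal": 40, "tunnelling": 220}   # development index (invented)
sponsors      = {"geothermal": 30, "tunnelling": 25}    # investment index (invented)

def investment_appeal(kw):
    return sponsors[kw] / link_strength[kw]   # investment-to-development ratio

def membership_for_research(appeal, a=0.1, b=0.5):
    """Assumed ramp membership in the 'For Research' fuzzy set."""
    if appeal <= a:
        return 0.0
    if appeal >= b:
        return 1.0
    return (appeal - a) / (b - a)

for kw in link_strength:
    print(kw, round(membership_for_research(investment_appeal(kw)), 2))
```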
Chapter
Semantics can be used to assess responses in question answering systems (QAS). The responses are typically short sentences. Assessing short sentences for similarity with the expected answer is a challenge for artificial intelligence. Unlike long paragraphs, short texts lack adequate and accurate semantic information, so existing algorithms do not work well for them. This paper provides the state of the art on semantic similarity techniques and proposes a research framework to enhance the accuracy of short-text assessment. Keywords: NLP, text summarization, semantic similarity, short text, information content, information retrieval, text assessment, accuracy, QAS, semantic information.
Article
Semantic similarity is a fundamental task in natural language processing that determines the similarity between two concepts within a taxonomy. For example, a pair of words (e.g., car and bike) appear similar because they share the same category (e.g., vehicle). Numerous computation methods, such as distance-based and feature-based approaches, are proposed to precisely depict this similarity. As knowledge graphs become heterogeneous (e.g., DBpedia), existing methods have limitations on utilizing multi-view features (e.g., abstract, structure, and categories). On the one hand, some features are incomplete for various reasons, reducing the effectiveness of embedding methods. On the other hand, the hidden connections among multi-view features are omitted by existing approaches. To address the problems mentioned above, we first extract three subgraphs from a heterogeneous knowledge graph and then combine various embedding approaches to capture the global semantics of each concept. Next, we offer subgraph-based feature fusion models that improve concept representation by fusing multi-view features. Finally, we devise mixed computation methods to calculate the semantic similarity between the two concepts. Experiment results show that multi-view features, particularly the abstract feature, can effectively improve the performance of the proposed methods. Compared to existing approaches, our methods significantly improve the Pearson correlation coefficient by about 7%. The source code of this paper is available at: https://github.com/fiego/SubgraphSS.
Article
The international and global dimension of higher education continues to expand in an increasingly inter-dependent and connected world, notwithstanding geo-political tensions and conflicts. There are several different strands of research and scholarship in relation to international and global phenomena. Each has distinctive perspectives, methods, and concerns; issues that it seeks to investigate, understand, and explain; and often, policies and practices that it wants to develop and change. These strands of thought can be summarised as follows: comparative higher education, higher education and international development, post-colonial studies of higher education, global higher education studies, and studies of international education mobility. This Special Issue of Oxford Review of Education adds a further strand, one not prominent in the English-language literature but longstanding and influential in China, the perspective of thinking through the world (tianxia). Each of these strands is the starting point for one of the six articles in the Special Issue. The introduction compares the approaches taken in the six articles to matters of geo-spatiality, ethics and values, and relations of power. It also discusses how each perspective sees the other perspectives; and their respective suggestions about the future development of research and scholarship in relation to international and global higher education.
Article
This paper presents REWOrD, an approach to compute semantic relatedness between entities in the Web of Data representing real-world concepts. REWOrD exploits the graph nature of RDF data and the SPARQL query language to access this data. Through simple queries, REWOrD constructs weighted vectors keeping the informativeness of RDF predicates used to make statements about the entities being compared. The most informative path is also considered to further refine informativeness. Relatedness is then computed as the cosine of the weighted vectors. Unlike previous approaches based on Wikipedia, REWOrD does not require any preprocessing or custom data transformation. Indeed, it can leverage any RDF knowledge base as a source of background knowledge. We evaluated REWOrD in different settings by using a new dataset of real-world entities and investigated its flexibility. Compared to related work on classical datasets, REWOrD obtains comparable results while, on the one hand, avoiding the burden of preprocessing and data transformation and, on the other hand, providing more flexibility and applicability in a broad range of domains.
Article
Nowadays, recommendation models based on matrix factorization (MF) suffer from rating sparsity, because the user-product rating matrix is usually sparse. To address this problem, it is useful to fuse contextual data or side information into basic MF models. Following this core idea, this paper proposes a modified recommendation model, MFFR (matrix factorization fusing reviews), which recommends products by fusing information from user reviews and user ratings. First, MFFR constructs a user-product preference matrix from user reviews by using the Latent Dirichlet Allocation (LDA) topic model. Then MFFR predicts ratings and generates personalized top-n recommended products by using an MF model to learn comprehensive latent factors of the user-product rating matrix and the user-product preference matrix simultaneously. Experimental results on three published datasets demonstrate that MFFR achieves more accurate predicted ratings and hits more correct products in top-n recommendation than comparative traditional models. MFFR can effectively raise the quality of recommendation, especially at high levels of rating sparsity.
Article
The similarity method has an important effect on several tasks in natural language processing, such as information retrieval, automatic translation, and named entity recognition. Hypernymy/hyponymy relations are widespread in semantic webs and knowledge graphs, so computing the similarity of hypernymy/hyponymy is a key issue in the text processing field. Both feature-based and IC-based measures have obvious deficiencies: the feature-based method estimates similarity from the depth of the node, and the IC-based method computes similarity from the position of the deepest common parent. Each relies on a single parameter, so their performance is somewhat inaccurate and unstable. To address this deficiency, our paper proposes a hybrid method that computes the similarity of hypernymy/hyponymy through a hybrid parameter (dhype(lch)) combining two parameters: the depth of the node and the position of the deepest common parent. Compared with several similarity methods, the proposed method achieved better performance in terms of accuracy rate, Pearson correlation coefficient, and artificial fitting effect.
Article
Recommender systems play an indispensable role in today’s online businesses. In these systems, memory-based (neighborhood-based) collaborative filtering is an important strategy to predict items as expected by users. It consists of two phases: computing the preference similarity between each pair of users in the offline phase and predicting the rating of an active user for a target item in the online phase by aggregating ratings of his/her neighbors for the target item. Previous studies on memory-based collaborative filtering have heavily concentrated on proposing methods for the computation of user preference similarity. To further improve the performance of memory-based collaborative filtering, this paper is aimed at the rating prediction phase. By optimizing a proposed objective function, the method we used in the rating prediction phase helps more accurately estimate the weight between the active user and each of his/her neighbors. The experimental results show that the proposed method outperforms others, especially in the case of a small and medium number of selected neighbors.
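For background, a sketch of the standard mean-centered neighborhood prediction (not the paper's optimized weights): the active user's rating for the target item is the user's mean plus a similarity-weighted average of neighbor deviations:

```python
def predict_rating(user_mean, neighbors):
    """neighbors: list of (similarity, neighbor_rating, neighbor_mean)."""
    num = sum(s * (r - m) for s, r, m in neighbors)
    den = sum(abs(s) for s, r, m in neighbors)
    return user_mean if den == 0 else user_mean + num / den

# Two neighbors who rated the target item (invented values).
print(predict_rating(3.5, [(0.9, 4.0, 3.0), (0.4, 2.0, 2.5)]))
```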
Article
Recent research has shifted to investigating knowledge integration in an interdisciplinary field and measuring the interdisciplinarity. Conventional citation analysis does not consider the context of citations, which limits the understanding of interdisciplinary knowledge integration. This study introduces a novel analytical framework to characterize interdisciplinary knowledge integration by both the content, i.e., integrated knowledge phrases (IKPs), and location of citances (i.e., citing sentences) in addition to citations. Seven knowledge categories are used to classify IKPs, including Research Subject, Theory, Research Methodology, Technology, Human Entity, Data, and Others. The eHealth field is explored as an exemplar interdisciplinary field in the case study. The result reveals that the ranks of source disciplines quantified by the integrated knowledge phrases are different from those by citations, especially in terms of average knowledge integration density. The distributions of the IKPs over the knowledge categories differ among source disciplines, indicating their different contributions to knowledge integration of eHealth field. The knowledge from adjacent disciplines is integrated into the field faster than that from other disciplines. Knowledge distributions over sections of articles are also different among source disciplines, and a correlation between knowledge categories and the sections they were used is observed. The analytical framework offers a way to better understand an interdisciplinary field by disclosing the characteristics of interdisciplinary knowledge integration from the perspective of knowledge content and usage.
Article
This article takes the challenges of global governance and legitimacy seriously and looks at new ways in which international organizations (IOs) have attempted to ‘govern’ without explicit legal or regulatory directives. Specifically, we explore the growth of global performance indicators as a form of social control that appears to have certain advantages even as states and civil society actors push back against international regulatory authority. This article discusses the ways in which Michael Zürn's diagnosis of governance dilemmas helps to explain the rise of such ranking systems. These play into favored paradigms that give information and market performance greater social acceptance than rules, laws, and directives designed by international organizations. We discuss how and why these schemes can constitute governance systems, and some of the evidence regarding their effects on actors’ behaviors. Zürn's book provides a useful context for understanding the rise and effectiveness of Governance by Other Means: systems that ‘inform’ and provoke competition among states, shaping outcomes without directly legislating performance.
Article
Semantic similarity plays a critical role in geospatial cognition, semantic interoperability, information integration, and information retrieval and reasoning in geographic information science. Although some computational models for semantic similarity measurement have been proposed in the literature, these models overlook spatial distribution characteristics and geometric features and pay little attention to the types and ranges of properties. This paper presents a novel semantic similarity measurement approach that employs a richer structured semantic description containing properties as well as relations. The approach captures geo-semantic similarity more accurately and effectively by evaluating the contributions of ontological properties, measuring the effect of relative position in the ontology hierarchy, and computing geometric feature similarity for geospatial entities. A water body ontology is used to illustrate the approach in a case study. A human-subject experiment was carried out, and the experimental results show that the proposed approach performs well, as indicated by the high correlation between its computed similarity results and human judgements of similarity.
Article
New sources of citation data have recently become available, such as Microsoft Academic, Dimensions, and the OpenCitations Index of CrossRef open DOI-to-DOI citations (COCI). Although these have been compared to the Web of Science Core Collection (WoS), Scopus, or Google Scholar, there is no systematic evidence of their differences across subject categories. In response, this paper investigates 3,073,351 citations found by these six data sources to 2,515 English-language highly-cited documents published in 2006 from 252 subject categories, expanding and updating the largest previous study. Google Scholar found 88% of all citations, many of which were not found by the other sources, and nearly all citations found by the remaining sources (89-94%). A similar pattern held within most subject categories. Microsoft Academic is the second largest overall (60% of all citations), including 82% of Scopus citations and 86% of WoS citations. In most categories, Microsoft Academic found more citations than Scopus and WoS (182 and 223 subject categories, respectively), but had coverage gaps in some areas, such as Physics and some Humanities categories. After Scopus, Dimensions is fourth largest (54% of all citations), including 84% of Scopus citations and 88% of WoS citations. It found more citations than Scopus in 36 categories, more than WoS in 185, and displays some coverage gaps, especially in the Humanities. Following WoS, COCI is the smallest, with 28% of all citations. Google Scholar is still the most comprehensive source. In many subject categories Microsoft Academic and Dimensions are good alternatives to Scopus and WoS in terms of coverage.
Article
Fast-and-frugal heuristics are simple judgement strategies that are based on only a few predictor variables. Bornmann and Marewski (Scientometrics 120(2):419–459, 2019) introduced bibliometrics-based heuristics (BBHs) which are judgement strategies in evaluative bibliometrics being solely based on publication and/or citation data. To support the understanding and applying of BBHs, Bornmann (in press) proposed bibliometrics-based decision trees (BBDTs) that are visualized BBHs. In this letter to the editor, a BBDT is presented that can be used for the interpretation of results from the Leiden ranking.
Article
Semantic relatedness (SR) is a form of measurement that quantitatively identifies the relationship between two words or concepts based on the similarity or closeness of their meaning. In the recent years, there have been noteworthy efforts to compute SR between pairs of words or concepts by exploiting various knowledge resources such as linguistically structured (e.g. WordNet) and collaboratively developed knowledge bases (e.g. Wikipedia), among others. The existing approaches rely on different methods for utilizing these knowledge resources, for instance, methods that depend on the path between two words, or a vector representation of the word descriptions. The purpose of this paper is to review and present the state of the art in SR research through a hierarchical framework. The dimensions of the proposed framework cover three main aspects of SR approaches including the resources they rely on, the computational methods applied on the resources for developing a relatedness metric, and the evaluation models that are used for measuring their effectiveness. We have selected 14 representative SR approaches to be analyzed using our framework. We compare and critically review each of them through the dimensions of our framework, thus, identifying strengths and weaknesses of each approach. In addition, we provide guidelines for researchers and practitioners on how to select the most relevant SR method for their purpose. Finally, based on the comparative analysis of the reviewed relatedness measures, we identify existing challenges and potentially valuable future research directions in this domain.
Article
Word embeddings resulting from neural language models have been shown to be a great asset for a large variety of NLP tasks. However, such architectures can be difficult and time-consuming to train. Instead, we propose to drastically simplify the computation of word embeddings through a Hellinger PCA of the word co-occurrence matrix. We compare these new word embeddings with some well-known embeddings on named entity recognition and movie review tasks and show that we can reach similar or even better performance. Although deep learning is not really necessary for generating good word embeddings, we show that it can provide an easy way to adapt embeddings to specific tasks.
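A minimal sketch of the idea, assuming toy counts: PCA applied to the square roots of row-normalized co-occurrence probabilities (the Hellinger transform):

```python
import numpy as np
from sklearn.decomposition import PCA

counts = np.array([[10, 2, 0],
                   [3, 8, 1],
                   [0, 1, 9]], float)            # word-by-context counts (invented)
probs = counts / counts.sum(axis=1, keepdims=True)
hellinger = np.sqrt(probs)                       # Hellinger transform

embeddings = PCA(n_components=2).fit_transform(hellinger)
print(embeddings)   # one low-dimensional embedding per word
```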
Article
In this paper we consider different approaches to assigning distances between fuzzy numbers. A pseudo-metric on the set of fuzzy numbers arising from the idea of the value of a fuzzy number is described, and some of its topological properties are noted. Reducing functions are used to define a family of metrics on the space of fuzzy numbers; some convergent properties for these metrics are illustrated. Finally, a fuzzy distance between fuzzy numbers is introduced and its basic properties are studied.
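A hedged illustration of one common construction of this kind (not necessarily the paper's): an L2-type distance between two fuzzy numbers from their alpha-cut endpoints, d(A,B)^2 = ∫₀¹ [(a⁻(α)-b⁻(α))² + (a⁺(α)-b⁺(α))²] dα, here for triangular fuzzy numbers with a simple Riemann sum:

```python
import numpy as np

def tri_cut(l, m, r, alpha):
    """Alpha-cut endpoints of the triangular fuzzy number (l, m, r)."""
    return l + alpha * (m - l), r - alpha * (r - m)

def fuzzy_distance(A, B, steps=1000):
    alphas = np.linspace(0.0, 1.0, steps)
    acc = 0.0
    for a in alphas:
        (al, ar), (bl, br) = tri_cut(*A, a), tri_cut(*B, a)
        acc += (al - bl) ** 2 + (ar - br) ** 2
    return (acc / steps) ** 0.5

print(fuzzy_distance((1, 2, 3), (2, 3, 4)))   # ~sqrt(2) for these shifted numbers
```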
Article
The paper deals with the well-known notion of (dis)similarity measures between fuzzy sets. We provide three separate lists of axioms that fit with the respective notions of “general comparison measure”, “similarity measure” and “dissimilarity measure”. Then we review some of the most important axiomatic definitions of (dis)similarity measures in the literature, by referring to the axioms in those lists satisfied by each specific definition. This common framework will make our study about the formal relationships among different axiomatic definitions much easier: some of them, which are apparently different, do in fact share many commonalities. We provide a self-contained picture of these relationships, by providing formal results and counterexamples that reflect which of the (dis)similarity definitions in the literature are connected by implication relations and which of them are not. We finalize the paper with an in-depth study about the notion of “duality” between similarity and dissimilarity measures as well as with some concluding remarks.
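One standard duality construction discussed in such axiomatizations, shown here as an illustrative instance rather than the paper's full theory: a dissimilarity measure obtained from a similarity measure through the standard negation,

```latex
\[
  D(A,B) \;=\; 1 - S(A,B), \qquad S(A,B) \in [0,1],
\]
```

so that maximal similarity corresponds to zero dissimilarity and vice versa.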
Article
The authors wish to present a class of mathematical functions that can be used to calculate the distance between fuzzy sets. The essential point is that this distance is given by a semi-pseudometric, i.e., the triangle inequality is not fulfilled. The relation to ordinary pseudometrics is explained. Some theoretical examples are also provided.
Article
Given two fuzzy subsets μ and ν of a metric space S (e.g., the Euclidean plane), we define the 'shortest distance' between μ and ν as a density function on the non-negative reals; our definition is applicable both when μ and ν are discrete-valued and when they are 'smooth' (i.e., differentiable), and it generalizes the definition of shortest distance for crisp sets in a natural way. We also define the mean distance between μ and ν and show how it relates to the shortest distance. The relationship to earlier definitions of distance between fuzzy sets [1,3] is also discussed.
Article
This paper extends the work of Pappis and Karacapilidis (1993) to present and compare the properties of several measures of similarity of fuzzy values. The measures examined in this paper are based on the geometric model, the set-theoretic approach, and the matching function S presented in (Chen, 1988). It is shown that several properties are common to all measures, while some properties do not hold for all of them.
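A hedged sketch of the matching function S attributed above to Chen (1988), for fuzzy values represented as membership-grade vectors, assuming the form S(A,B) = Σaᵢbᵢ / max(Σaᵢ², Σbᵢ²):

```python
import numpy as np

def matching_S(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / max(a @ a, b @ b))

A = [0.2, 0.7, 1.0, 0.4]   # membership grades (invented)
B = [0.1, 0.8, 0.9, 0.5]
print(matching_S(A, B))    # equals 1.0 when the two grade vectors coincide
```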
Article
A fuzzy set is a class of objects with a continuum of grades of membership. Such a set is characterized by a membership (characteristic) function which assigns to each object a grade of membership ranging between zero and one. The notions of inclusion, union, intersection, complement, relation, convexity, etc., are extended to such sets, and various properties of these notions in the context of fuzzy sets are established. In particular, a separation theorem for convex fuzzy sets is proved without requiring that the fuzzy sets be disjoint.
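The classical pointwise operations from this paper, written out for two fuzzy sets given by membership grades over the same universe of objects (the grade values are example data):

```python
union        = lambda mu_a, mu_b: [max(a, b) for a, b in zip(mu_a, mu_b)]
intersection = lambda mu_a, mu_b: [min(a, b) for a, b in zip(mu_a, mu_b)]
complement   = lambda mu_a: [1.0 - a for a in mu_a]

mu_A = [0.1, 0.6, 1.0]   # grades of membership (example values)
mu_B = [0.4, 0.5, 0.8]
print(union(mu_A, mu_B), intersection(mu_A, mu_B), complement(mu_A))
```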
Article
Ordered sets of documents are encountered more and more in information distribution systems, such as information retrieval systems. Classical similarity measures for ordinary sets of documents hence need to be extended to these ordered sets. This is done in this paper using fuzzy set techniques. First, a general similarity measure is developed which contains the classical strong similarity measures, such as Jaccard, Dice, and Cosine, as well as the classical weak similarity measures, such as Recall and Precision. These measures are then extended to compare fuzzy sets of documents. Measuring the similarity of ordered sets of documents is a special case of this, where the higher the rank of a document, the lower its weight in the fuzzy set. Concrete forms of these similarity measures are presented. All these measures are new, and those for the weak similarity measures are the first of their kind (other strong similarity measures were given in a previous paper by Egghe and Michel). Some of these measures are then tested in the IR system Profil-Doc. The engine SPIRIT© extracts ranked document sets in three different contexts, each for 600 requests. The practical usability of the OS-measures is then discussed on the basis of these experiments.
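A hedged sketch of the idea: classical Jaccard extended to fuzzy document sets via sum-of-min over sum-of-max of membership grades, with a rank-decaying weight as an assumed instance of "the higher the rank, the lower the weight":

```python
def fuzzy_jaccard(mu_a, mu_b):
    docs = set(mu_a) | set(mu_b)
    num = sum(min(mu_a.get(d, 0), mu_b.get(d, 0)) for d in docs)
    den = sum(max(mu_a.get(d, 0), mu_b.get(d, 0)) for d in docs)
    return num / den

def ranked_to_fuzzy(ranked_docs):
    # Assumed decay: the top-ranked document gets the highest grade.
    return {d: 1.0 / (i + 1) for i, d in enumerate(ranked_docs)}

print(fuzzy_jaccard(ranked_to_fuzzy(["d1", "d2", "d3"]),
                    ranked_to_fuzzy(["d2", "d1", "d4"])))
```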
Conference Paper
Wikipedia provides a knowledge base for computing word relatedness in a more structured fashion than a search engine and with more coverage than WordNet. In this work we present experiments on using Wikipedia for computing semantic relatedness and compare it to WordNet on various benchmarking datasets. Existing relatedness measures perform better using Wikipedia than a baseline given by Google counts, and we show that Wikipedia outperforms WordNet when applied to the largest available dataset designed for that purpose. The best results on this dataset are obtained by integrating Google, WordNet and Wikipedia based measures. We also show that including Wikipedia improves the performance of an NLP application processing naturally occurring texts.
Conference Paper
We introduce Wiktionary as an emerging lexical semantic resource that can be used as a substitute for expert-made resources in AI applications. We evaluate Wiktionary on the pervasive task of computing semantic relatedness for English and German by means of correlation with human rankings and solving word choice problems. For the first time, we apply a concept vector based measure to a set of different concept representations like Wiktionary pseudo glosses, the first paragraph of Wikipedia articles, English WordNet glosses, and GermaNet pseudo glosses. We show that: (i) Wiktionary is the best lexical semantic resource in the ranking task and performs comparably to other resources in the word choice task, and (ii) the concept vector based approach yields the best results on all datasets in both evaluations.