
Social Semantic Search: A Case Study on Web 2.0 for Science

Authors:
  • iMinds - Ghent University

Abstract

When researchers formulate search queries to find relevant content on the Web, those queries typically consist of keywords that can only be matched in the content or its metadata. The Web of Data extends this functionality by bringing structure and giving well-defined meaning to the content, and it enables humans and machines to work together using controlled vocabularies. Due to the high degree of mismatches between the structure of the content and the vocabularies in different sources, searching over multiple heterogeneous repositories of structured data is considered challenging. Therefore, the authors present a semantic search engine for researchers that facilitates search in research-related Linked Data. To facilitate high-precision interactive search, they annotated and interlinked structured research data with ontologies from various repositories in an effective semantic model. Furthermore, the authors' system is adaptive, as researchers can synchronize using new social media accounts and efficiently explore new datasets.
DOI: 10.4018/IJSWIS.2017100108

Volume 13 • Issue 4 • October-December 2017
Copyright © 2017, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.


Laurens De Vocht, IDLab, Department of Electronics and Information Systems, Ghent University – imec, Ghent, Belgium
Selver Softic, Graz University of Technology, Graz, Austria
Ruben Verborgh, IDLab, Department of Electronics and Information Systems, Ghent University – imec, Ghent, Belgium
Erik Mannens, IDLab, Department of Electronics and Information Systems, Ghent University – imec, Ghent, Belgium
Martin Ebner, Graz University of Technology, Graz, Austria


Digital Libraries, Linked Data, Research 2.0, Semantic Search, Social Media, Web 2.0, Web of Data

The evolution of Web 2.0 enabled many users to become the main content providers on the Web via
wikis, blogs and other content publishing platforms. The Web 2.0 for Science, also known as Science 2.0
or Research 2.0, aims to adapt Web 2.0 for researchers. It entails a set of tools and services that
researchers use to discover resources, such as academic publications or events they might be interested
in, as an alternative to traditional search engines (De Vocht et al., 2011). The tools and services are
typically APIs, publishing feeds, search and discovery services, and interfaces designed based on
social profiles (Parra & Duval, 2010; Ullmann et al., 2010). Research 2.0 comprises interacting with
information published on Social Media, online collaboration platforms and other Web 2.0 tools.
These platforms are seeing increasing uptake (Van Noorden, 2014). The data is available in the
form of posts, threads, tags and user information, and is transferable into semantic form, since widely
used and accepted vocabularies for these domains exist. Weaving microblogs into the Web of Data is
interesting from a researcher-centric semantic search perspective. Twitter, as an exemplary microblog
Social Media platform, can help resolve scientific citations (Weller et al., 2011).
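As an illustration of how a post, its author and its tags could be carried over into semantic form, the following minimal Python sketch maps one microblog post to subject-predicate-object triples. The text above does not name the vocabularies involved; the choice of SIOC and Dublin Core terms, the `post_to_triples` helper and all `example.org` URIs are assumptions made for this sketch only, not the authors' mapping.

```python
import re

# Namespaces of two widely used vocabularies (chosen here as an assumption).
SIOC = "http://rdfs.org/sioc/ns#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def post_to_triples(post_uri, author_uri, text, created):
    """Return (subject, predicate, object) triples describing one post."""
    triples = [
        (post_uri, RDF_TYPE, SIOC + "Post"),
        (post_uri, SIOC + "has_creator", author_uri),
        (post_uri, SIOC + "content", text),
        (post_uri, DCTERMS + "created", created),
    ]
    # Each hashtag becomes a topic link; the tag URI scheme is hypothetical.
    for tag in re.findall(r"#(\w+)", text):
        tag_uri = "http://example.org/tag/" + tag.lower()
        triples.append((post_uri, SIOC + "topic", tag_uri))
    return triples


example = post_to_triples(
    "http://example.org/post/1",
    "http://example.org/user/alice",
    "Great keynote on #LinkedData at #ISWC",
    "2017-10-01T09:00:00Z",
)
for triple in example:
    print(triple)
```

Once posts are expressed this way, the hashtag topics become resources that can be interlinked with other datasets rather than plain strings.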
Studies on the use of microblogs like Twitter during conferences within the science community
showed that researchers were using Twitter to discuss and communicate asynchronously on topics
during conferences (Ebner et al., 2010) and in their everyday work (Reinhardt et al., 2009). A
survey on Twitter use for scientific purposes (Letierce et al., 2010) showed that Twitter is not only
a communication medium but also a reliable source of data for scientific analysis, profiling tasks
and trend detection (Tao et al., 2011; Mathioudakis & Koudas, 2010; Softic, Ebner et al., 2010).
Twitter hashtags have an influence on the structuring of communication within Twitter as well as
on community building (Laniado & Mika, 2010; Bakshy et al., 2011).
However, the mass-produced data remains in so-called ‘data silos’, bound to a specific platform
or stored somewhere within databases. Access to these data sources goes through specialized
application interfaces (APIs), which require specialized technical knowledge to retrieve the data in a
desirable form. Many information sources of public interest remain captured behind a so-called ‘walled
garden’. Combining information resources across the walls leads to a high degree of mismatches between
the vocabulary and data structure of the different sources (Herzig & Tran, 2012). When users formulate
a (Web) search in a certain context across multiple data sources, it often includes keywords. In many
cases the semantic importance and meaning of the keywords are not considered. The keyword order
and combination in a query affect the context, the precise goal of the search and thus the results.
Mostly, direct querying approaches have been tried, and applications were often built around a limited
set of supported query patterns. Furthermore, queries are still hard to construct for end users or even
developers, despite GUIs and advanced query builders. Vocabularies are getting more streamlined
and Linked Data is maturing. This opens up many more possibilities compared to traditional keyword
search. Therefore, we propose a semantic model that drives the search engine and is optimized for
this use case. The key variables in this regard are the efficiency (performance and
complexity) and the effectiveness (search precision) of the proposed engine and thus, indirectly, of the
model it implements.
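Since effectiveness is equated with search precision here, it may help to recall how precision is commonly computed: the fraction of retrieved results that are judged relevant. The Python sketch below uses the standard information retrieval definition; the `precision` function name and the sample identifiers are hypothetical, and this is not necessarily the exact evaluation protocol used in the article.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved results that are relevant (0.0 if none retrieved)."""
    if not retrieved:
        return 0.0
    relevant = set(relevant)
    hits = sum(1 for result in retrieved if result in relevant)
    return hits / len(retrieved)


# Hypothetical result list and relevance judgements for one query.
results = ["paper:1", "paper:7", "conf:A", "person:3"]
judged_relevant = {"paper:1", "conf:A", "person:9"}

score = precision(results, judged_relevant)
print(score)
```

Here two of the four retrieved results are judged relevant, so the precision is 0.5; the relevant but missed `person:9` would affect recall, not precision.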

Social semantic search combines concepts aimed at personalized information retrieval with well-defined
services resolving case-specific Web user needs. The understanding of semantic search in the scope
of information retrieval (IR) differs from that in the Semantic Web community (Tran, Herzig &
Ladwig, 2011). However, common to many semantic search approaches is the use of a ‘Semantic Model’,
which includes (heterogeneous) data sources, a query mechanism and a matching framework. We
investigate how researchers find results by implementing an engine that enables them to interact
with relevant data sources. It is relevant to measure if and how well the semantic model proves to
be useful in tackling these issues. We address the following research questions by
applying social semantic search to Research 2.0:
1. How does the semantic (search) model reveal relations between resources interlinked in a scientific
research context? Our approach and evaluation illustrate how to apply these paradigms for
semantic search within Research 2.0;
2. How well does the implementation of a semantic model enable researchers to find people
(researchers), documents (papers) and events (conferences), as well as some other related entities
relevant for the context?
3. How do researchers effectively search given a certain search context? For instance, detecting
conferences based on their earlier activities on social media;
4. How does the proposed engine perform compared to a relevant semantic baseline?
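The three components that the text attributes to a semantic model (heterogeneous data sources, a query mechanism and a matching framework) can be pictured with a toy Python sketch. Everything in it is an assumption made for illustration: the `SemanticModel` class, the plain-string predicates such as `label` and `presentedAt`, and the sample resources are hypothetical and do not reproduce the engine described in the article.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]


@dataclass
class SemanticModel:
    # Heterogeneous data sources, each reduced to a list of triples.
    sources: dict[str, list[Triple]] = field(default_factory=dict)

    def triples(self):
        """Iterate over the triples of all sources as one merged graph."""
        for source_triples in self.sources.values():
            yield from source_triples

    def query(self, keyword: str) -> set[str]:
        """Query mechanism: resources whose label contains the keyword."""
        kw = keyword.lower()
        return {s for s, p, o in self.triples() if p == "label" and kw in o.lower()}

    def match(self, a: str, b: str) -> set[str]:
        """Matching framework: resources that both a and b link to."""
        def linked(resource):
            return {o for s, p, o in self.triples() if s == resource and p != "label"}
        return linked(a) & linked(b)


model = SemanticModel({
    "publications": [
        ("paper:1", "label", "Social Semantic Search"),
        ("paper:1", "presentedAt", "conf:A"),
    ],
    "social": [
        ("person:1", "label", "Example Researcher"),
        ("person:1", "attended", "conf:A"),
    ],
})

print(model.query("semantic"))
print(model.match("paper:1", "person:1"))
```

In this toy setting a keyword resolves to a paper, and the match step reveals that the paper and a person from a different source are related through the same conference, which is the kind of cross-source relation the first research question asks about.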