End-User Evaluations of Semantic Web Technologies
Rob McCool1, Andrew J. Cowell2 and David A. Thurman3
1 Knowledge Systems Lab, Stanford University
robm@ksl.stanford.edu
2 Rich Interaction Environments, Pacific Northwest National Laboratory
andrew@pnl.gov
3 National Security Directorate, Battelle Pacific Northwest Division
thurmand@battelle.org
Abstract. Stanford University’s Knowledge Systems Laboratory (KSL) is
working in partnership with Battelle Memorial Institute and IBM Watson
Research Center to develop a suite of technologies for information extraction,
knowledge representation & reasoning, and human-information interaction,
collectively entitled “Knowledge Associates for Novel Intelligence” (KANI). We
have developed an integrated analytic environment composed of a collection of
analyst associates, software components that aid the user at different stages of
the information analysis process. An important part of our participatory design
process has been to ensure our technologies and designs are tightly integrated
with the needs and requirements of our end users. To this end, we perform a
sequence of evaluations towards the end of the development process that ensure
the technologies are both functional and usable. This paper reports on that
process.
1. Introduction
An often-overlooked element in the software engineering lifecycle is the end
user, the individual who will eventually use the tool, technique or
technology developed for some real purpose. The computer science
literature is awash with stories of lavishly funded projects that failed to take this
stakeholder into account, resulting in systems that fail to meet the
requirements of their users and are a chore to use. Within KANI, we have used a
participatory design process to ensure that our designs and processes are in line
with the thoughts of our subject matter experts. Here we report on our iterative
evaluation of Search on TAP, an application developed at the Stanford Knowledge
Systems Lab.
2. Search on TAP
Search on TAP is an end-user application that uses documents from the Semantic
Web to enhance the search experience beyond the capabilities provided by
typical Information Retrieval systems. Search on TAP builds on techniques
previously described by Guha et al. [2004a] to aggregate information from
multiple websites. The information from these websites is translated via scraping
from HTML into RDF, and then merged together via a series of owl:sameAs
assertions [Guha et al., 2004b]. The end result is a coherent data set that can then
be used to perform both entity-based search as well as traditional keyword
search. Search on TAP covers 31 source sites over 12 topics, resulting in 188,680
pages containing 1,089,389 entities.
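The merging step can be pictured with a small sketch. It is not the TAP implementation: it assumes the scraped data is already available as (subject, predicate, object) tuples, groups subjects connected by owl:sameAs assertions using a union-find structure, and pools their properties under one canonical identifier. All URIs and values are hypothetical, and object values are left as they are for brevity.

```python
from collections import defaultdict


def merge_same_as(triples, same_as_pairs):
    """Pool the properties of subjects linked by owl:sameAs assertions."""
    parent = {}

    def find(node):
        # Return the canonical representative of the group containing node.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Arbitrary but deterministic choice: keep the smaller URI.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    merged = defaultdict(set)
    for subject, predicate, obj in triples:
        merged[find(subject)].add((predicate, obj))
    return dict(merged)


# Hypothetical data from two source sites describing the same person.
triples = [
    ("siteA:harrison_ford", "rdfs:label", "Harrison Ford"),
    ("siteB:hford", "ex:occupation", "actor"),
    ("siteB:other", "rdfs:label", "Someone Else"),
]
same_as = [("siteA:harrison_ford", "siteB:hford")]
merged = merge_same_as(triples, same_as)
```

After the merge, both facts about the person sit under a single identifier, which is the property that entity-based search relies on.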
2.1 The User Experience
The use of structured information from the Semantic Web enables users to
perform queries that are not possible with a simple keyword-based search
engine. The types of queries that the Search on TAP engine supports include:
Entity queries about a single named entity. While keyword search
engines can perform such queries, Search on TAP supports
disambiguation of named entities. For example, in a query “Harrison
Ford”, the system can distinguish between Harrison Ford, the modern
actor who played Han Solo, and Harrison Ford, the silent film star from
the 1920s.
Attribute queries allow the user to ask for specific attributes of an entity,
such as Harrison Ford’s birth date, or the population of China. The user
may also ask for entities related to another entity, such as querying for
“Chicago buildings” or “rail mobile nuclear missiles”.
Comparison queries allow the user to ask for entities that compare in a
particular way to another entity. For example, buildings taller than the
Sears Tower or roller coasters faster than the American Eagle.
Group queries support searching for groups of entities, such as countries
with a population greater than 250 million or movies starring both Meg
Ryan and Tom Hanks.
Search on TAP can be accessed online at http://tap.stanford.edu.
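To make the comparison and group query types concrete, the sketch below evaluates both over a tiny in-memory list of entity records. It is only an illustration: the record layout, the function names, and all labels and numbers are placeholders, not TAP data or the TAP query engine.

```python
import operator

# Hypothetical records; labels and numbers are placeholders, not TAP data.
BUILDINGS = [
    {"label": "Tower A", "height_m": 300},
    {"label": "Tower B", "height_m": 450},
    {"label": "Tower C", "height_m": 200},
]
COUNTRIES = [
    {"label": "Country X", "population_millions": 300},
    {"label": "Country Y", "population_millions": 50},
]


def comparison_query(entities, prop, reference_label, relation=operator.gt):
    """Return labels of entities whose prop compares to the reference entity's."""
    reference = next(e for e in entities if e["label"] == reference_label)
    return [
        e["label"]
        for e in entities
        if e is not reference and relation(e[prop], reference[prop])
    ]


def group_query(entities, predicate):
    """Return labels of all entities satisfying the predicate."""
    return [e["label"] for e in entities if predicate(e)]


# "buildings taller than Tower A"
taller = comparison_query(BUILDINGS, "height_m", "Tower A")
# "countries with a population greater than 250 million"
large = group_query(COUNTRIES, lambda e: e["population_millions"] > 250)
```

A comparison query needs a reference entity to compare against, whereas a group query only needs a condition, which is why the two functions take different arguments.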
2.2 The User Interface
The user interface for the Search on TAP application is a basic browser-based
search interface consisting of a single text entry box labeled Query and a
submit button. This is familiar to users and is comparable to other search
technologies. As the user types, suggestions are offered to aid the user in
automatically completing their query. As shown in Figure 1, this
auto-completion feature accelerates the search experience while also educating
the user on the coverage of the Search on TAP knowledge bases (see footnote 1).
Fig. 1. Entering a query into Search on TAP
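One simple way to offer such suggestions is prefix matching over a sorted list of the labels known to the knowledge base. The sketch below assumes exactly that; the paper does not describe how Search on TAP implements auto-completion, and the labels and function names here are illustrative.

```python
from bisect import bisect_left


def build_index(labels):
    """Return the distinct labels, lowercased and sorted for binary search."""
    return sorted({label.lower() for label in labels})


def suggest(index, prefix, limit=5):
    """Return up to `limit` indexed labels that start with `prefix`."""
    prefix = prefix.lower()
    start = bisect_left(index, prefix)  # first label >= prefix
    suggestions = []
    for label in index[start:start + limit]:
        if not label.startswith(prefix):
            break  # sorted order: no later label can match either
        suggestions.append(label)
    return suggestions


# Hypothetical labels of instances and properties.
index = build_index(["Tom Hanks", "Tom Cruise", "Tower Bridge", "birthdate"])
```

Because suggestions are drawn only from labels the system actually holds, the list doubles as a hint about what the knowledge base covers.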
This functionality was included as a result of specific user feedback,
discussed in Section 3.2. Once a user submits a query (by typing a set of
words into the query box), the system responds with two columns of results.
On the left, Semantic Web-based entities matching the query are displayed,
while traditional keyword-search results are displayed on the right (see
Figure 2).
Fig. 2. Traditional & Search on TAP results
Displaying results together in this manner enables Search on TAP to add
value to the potentially ambiguous keyword results by providing another
mechanism to narrow down one’s primary search aim. After the user selects a
specific result, additional query-specific details, including associated
sources, are presented (see Figure 3).
1 A full list of sources indexed by Search on TAP is available at
http://sp11.stanford.edu/crawl-050210.html
Fig. 3. Detailed information for results
2.3 User Interaction
The Search on TAP user interface is intended to mimic as closely as possible the
interface of traditional information retrieval systems such as search engines.
Toward that end, the primary query interface is the text entry box, into which the
user enters a set of query keywords. This interface approach was taken in lieu of a
different approach, such as having the user fill out a form based on the structure of
the underlying information, because the structure of the underlying information is
very broad. Unlike a structured information source with information about a single
domain, such as replacement parts for cellular phones, the Search on TAP dataset
is intended to be a prototype for the comprehensive information that will be
available on the Semantic Web.
Because natural language processing techniques have proven to be error prone, our
approach has been to analyze the query keywords using the structure of the
information as a guide. While this does not afford the user the level of flexibility
that true natural language processing would provide, we believe it provides a
better user experience than an error-laden natural language processing system.
Our philosophy is similar to the one that drove Palm Computing to use the
“Graffiti” alphabet [Palm, 2005]. Because handwriting recognition at the time was
very error prone, Palm developed a pseudo-alphabet that was easier to decode with
computing resources at the time, and as a result their product was considered to
have a superior user experience compared to the Apple Newton, which tried to
perform full handwriting recognition.
In this spirit, Search on TAP analyzes a set of keywords entered by the user using
the underlying structured information as a guide. In particular, it looks for words
that can represent Classes, Properties, and instances of classes, and tries to find
sequences of these representations. For example, an attribute query as described
above would look for an instance of a class, followed by a property name that can
apply to that class. To ask for Tom Hanks’ birthday, for instance, a user can enter
“tom hanks birthdate”. The underlying Semantic Web information contains an
instance of a tap:Person with an rdfs:label of “Tom Hanks”, and an rdf:Property
with the rdfs:label “birthdate”. The query recognizer can then recognize that the
query contains an instance followed by a property, and deduce that the user is
asking for the value of the birthdate property for Tom Hanks.
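The instance-followed-by-property pattern can be sketched as below. This is a toy illustration, not the Search on TAP recognizer: it assumes two hypothetical lookup tables keyed by lowercased rdfs:label, tries every split of the keyword sequence, and accepts a split only when the property's domain matches the instance's class. The `ex:` identifiers are invented.

```python
# Hypothetical lookup tables keyed by lowercased rdfs:label.
INSTANCES = {"tom hanks": ("ex:TomHanks", "tap:Person")}    # label -> (instance, class)
PROPERTIES = {"birthdate": ("ex:birthdate", "tap:Person")}  # label -> (property, domain)


def recognize_attribute_query(query):
    """Return (instance, property) if the query is an instance label
    followed by a property label applicable to that instance, else None."""
    words = query.lower().split()
    for split in range(1, len(words)):
        instance_label = " ".join(words[:split])
        property_label = " ".join(words[split:])
        if instance_label in INSTANCES and property_label in PROPERTIES:
            instance, instance_class = INSTANCES[instance_label]
            prop, domain = PROPERTIES[property_label]
            if instance_class == domain:
                return instance, prop
    return None


result = recognize_attribute_query("tom hanks birthdate")
```

Each of the system's other patterns could be expressed in the same style as a different expected sequence of class, property, and instance matches.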
There are 37 such combinations defined in the Search on TAP system, though
many of them are redundant. These patterns are used to answer all four types of
extended queries described above.
Complications arise in the process of understanding the user’s query. The first
complication is that one query may have multiple interpretations. In fact, some of
them can have thousands of interpretations, particularly when data from movies is
included. There are hundreds of movies with very common words as their title,
which means many false matches must be processed and removed by the system.
Search on TAP uses RDF Schema information to weed out nonsensical pairings.
When finding Class-Property-Instance patterns as described above, two types of
interpretations are typically removed from consideration.
The first type of interpretation that is removed is the incomplete one. If some of
the user’s keywords matched structured information, but not all, then that
interpretation is removed from consideration. The second type that is removed is
one in which the set of classes, properties, and instances does not form anything
coherent. This occurs when properties appear which cannot be applied to a class or
instance that is also mentioned, or when a class and instance are mentioned and
cannot be reasonably connected via property values or chains of property values.
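The two removal rules can be sketched as a filter over candidate interpretations. The sketch simplifies heavily: an interpretation here is a hypothetical record holding the keywords it matched, the class of the matched instance, and the domain of the matched property, and coherence is reduced to a direct class and domain match rather than the chains of property values described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretation:
    matched_words: frozenset  # query keywords this interpretation accounts for
    instance_class: str       # class of the matched instance
    property_domain: str      # class the matched property applies to


def is_viable(interp, query_words):
    """Apply both filters: completeness, then coherence."""
    complete = interp.matched_words == frozenset(query_words)
    coherent = interp.property_domain == interp.instance_class
    return complete and coherent


query_words = ["tom", "hanks", "birthdate"]
candidates = [
    # Complete and coherent: kept.
    Interpretation(frozenset(query_words), "tap:Person", "tap:Person"),
    # Incomplete: "birthdate" was not matched.
    Interpretation(frozenset(["tom", "hanks"]), "tap:Person", "tap:Person"),
    # Incoherent: the property does not apply to the instance's class.
    Interpretation(frozenset(query_words), "tap:Movie", "tap:Person"),
]
viable = [c for c in candidates if is_viable(c, query_words)]
```

Only the first candidate survives, which mirrors how schema information prunes the many false matches that common words produce.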
A final complication that arises with this technique is the issue of training. When
training a user to use palmOne’s Graffiti system, for example, there are 26
characters and several gestures that a user must relearn. Because the scope of the
Semantic Web is the full scope of human knowledge, the Search on TAP input
system does not have the luxury of such a small set of things to remember. Users
must potentially learn hundreds of class names and hundreds of properties. The
question of how much of a burden this is to the user is part of what we evaluated
in user testing for this tool.
3. Evaluation
A two-step process was used to evaluate the Search on TAP user experience. A
heuristic evaluation of the component first allowed us to review the tool against
common usability and consistency standards. The elements indicated by the
review were re-designed, re-implemented, and reviewed in iterative fashion. After
successful heuristic evaluation, the component was evaluated using a group of
practicing analysts.
3.1 Heuristic Evaluation
Heuristic evaluation is a usability analysis method that utilizes history and
experience to discover problems with a particular component. It involves
performing some typical tasks using the component and then noting any disparities
between the component design and a checklist of user interface design
principles. The heuristics used in this experiment follow Nielsen’s [1994]
checklist of ten general principles for user
interface design, including visibility of system status, consistency and standards,
minimalist design, flexibility and efficiency of use, and help and documentation. A
study conducted by two usability professionals at Battelle found 17 issues with the
initial design. These included issues relating to speaking the user’s language (e.g.,
removal of overly technical prose within the user interface), error prevention (e.g.,
users could potentially enter a refinement query prior to entering the main query)
and performance (e.g., a considerable amount of time passed without system status
updates).
3.2 User Evaluation
Three Battelle analysts were recruited to help evaluate Search on TAP. They were
recruited randomly from an available analyst pool and consisted of two women
and one man, all in their 30s. The sessions highlighted a number of
issues related to customizing such a tool for a specific domain. The participants
had difficulty understanding the breadth of information behind Search on TAP.
Conveying that breadth while at the same time managing expectations
(no tool will be able to answer everything) was a challenge. One solution to
this problem was to provide an automatic completion capability (see Figure 1) that
guided the user in selecting property names and values. Another highlight from the
analyst sessions was the location of the Search on TAP results in relation to the
traditional keyword results. Originally, Search on TAP results were shown on the
right side of the screen. Due to familiarity with advertisements in popular search
engine pages, users indicated they had “learned” to disregard anything on the
right-hand side of such a page. The simple solution, shown in Figure 2, was to
reverse the columns and provide Search on TAP results on the left side.
4. Summary
The evaluation of a semantic search application using heuristic and user-centered
methods provided insight into users’ preferences for interacting with semantic
information. Subsequent design changes based on user feedback resulted in
improvements to the application, leading to a more useful tool for a sample user
population. The Search on TAP technology is currently undergoing more formal
testing under a government-sponsored evaluation program.
Acknowledgements
The authors would like to thank their respective teams at Stanford University and
Battelle Pacific Northwest Division. This work is supported in part by the
Advanced Research and Development Activity’s Novel Intelligence from Massive
Data (NIMD) program.
References
[Guha et al., 2004a] R. Guha, R. McCool, and E. Miller. Semantic Search. In
Proceedings of WWW 2003, Budapest, Hungary, 2003.
[Guha et al., 2004b] R. V. Guha, R. McCool, and R. Fikes. Contexts for the
Semantic Web. In Proceedings of the Third International Semantic Web
Conference, Hiroshima, Japan, 2004.
[Nielsen, 1994] J. Nielsen. Heuristic evaluation. In J. Nielsen and R. L. Mack
(Eds.), Usability Inspection Methods. John Wiley & Sons, New York, NY, 1994.
[Palm, 2005] palmOne. Ways to Enter Data into a palmOne Device.
http://www.palm.com/us/products/input/