Natalya F. Noy’s research while affiliated with Mountain View College and other places

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (42)


Eye tracking the user experience – An evaluation of ontology visualization techniques
  • Article
  • November 2016 · 193 Reads · 68 Citations
  • Semantic Web
  • Bo Fu · Natalya F. Noy · Margaret-Anne Storey
Various ontology visualization techniques have been developed over the years, offering essential interfaces for browsing and interacting with ontologies in an effort to assist with ontology understanding. Yet few studies have focused on evaluating the usability of existing ontology visualization techniques. This paper presents an eye-tracking user study that evaluates two commonly used ontology visualization techniques, namely indented list and graph. The eye-tracking experiment and analysis presented in this paper complement the set of existing evaluation protocols for ontology visualization. In addition, the results of this study contribute to a greater understanding of the strengths and weaknesses of the two visualization techniques, and in particular of how and why one is more effective than the other. Based on approximately 500 MB of eye-movement data containing around 30 million rows of gaze data generated by a Tobii eye tracker, we found evidence suggesting that indented lists are more efficient at supporting information search, while graphs are more efficient at supporting information processing.


Goods: Organizing Google's Datasets

  • June 2016 · 354 Reads · 196 Citations
  • Alon Halevy · Flip Korn · Natalya F. Noy · [...] · Steven Euijong Whang

Enterprises increasingly rely on structured datasets to run their businesses. These datasets take a variety of forms, such as structured files, databases, spreadsheets, or even services that provide access to the data. The datasets often reside in different storage systems, may vary in their formats, and may change every day. In this paper, we present GOODS, a project to rethink how we organize structured datasets at scale, in a setting where teams use diverse and often idiosyncratic ways to produce the datasets and where there is no centralized system for storing and querying them. GOODS extracts metadata ranging from salient information about each dataset (owners, timestamps, schema) to relationships among datasets, such as similarity and provenance. It then exposes this metadata through services that allow engineers to find datasets within the company, to monitor datasets, to annotate them so that others can use them, and to analyze relationships between them. We discuss the technical challenges that we had to overcome in order to crawl and infer the metadata for billions of datasets, to maintain the consistency of our metadata catalog at scale, and to expose the metadata to users. We believe that many of the lessons we learned are applicable to building large-scale enterprise-level data-management systems in general.


Discovering Structure in the Universe of Attribute Names

  • April 2016 · 30 Reads · 19 Citations

Recently, search engines have invested significant effort in answering entity-attribute queries from structured data, but have focused mostly on queries for frequent attributes. In parallel, several research efforts have demonstrated that there is a long tail of attributes, often thousands per class of entities, that are of interest to users. Researchers are beginning to leverage these new collections of attributes to expand the ontologies that power search engines and to recognize entity-attribute queries. Because of the sheer number of potential attributes, such tasks require us to impose some structure on this long and heavy tail of attributes. This paper introduces the problem of organizing the attributes by expressing the compositional structure of their names as a rule-based grammar. These rules offer a compact and rich semantic interpretation of multi-word attributes, while generalizing from the observed attributes to new unseen ones. The paper describes an unsupervised learning method to generate such a grammar automatically from a large set of attribute names. Experiments show that our method can discover a precise grammar over 100,000 attributes of Countries while providing a 40-fold compaction over the attribute names. Furthermore, our grammar enables us to increase the precision of attributes from 47% to more than 90% with only a minimal curation effort. Thus, our approach provides an efficient and scalable way to expand ontologies with attributes of user interest.
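The compositional-grammar idea can be sketched with a few hand-written rules; the rules, attribute names, and function names below are hypothetical illustrations, not the grammar learned in the paper:

```python
import re

# A few hand-written rules in the spirit of a compositional grammar over
# attribute names. Each rule maps a surface pattern of an attribute name
# to a structured interpretation (rule label, head attribute, modifier).
# Rules and examples are invented for illustration.
RULES = [
    # "<head> of <modifier>"  e.g. "density of population"
    (re.compile(r"^(?P<head>\w+) of (?P<mod>[\w ]+)$"),
     lambda m: ("of", m.group("head"), m.group("mod"))),
    # "<modifier> <head>"     e.g. "urban population"
    (re.compile(r"^(?P<mod>\w+) (?P<head>\w+)$"),
     lambda m: ("mod", m.group("head"), m.group("mod"))),
    # bare head               e.g. "gdp"
    (re.compile(r"^(?P<head>\w+)$"),
     lambda m: ("head", m.group("head"), None)),
]

def parse_attribute(name: str):
    """Return (rule, head, modifier) for the first rule that matches."""
    for pattern, build in RULES:
        m = pattern.match(name)
        if m:
            return build(m)
    return None

print(parse_attribute("density of population"))  # ('of', 'density', 'population')
print(parse_attribute("urban population"))       # ('mod', 'population', 'urban')
print(parse_attribute("gdp"))                    # ('head', 'gdp', None)
```

Because the rules match patterns rather than fixed strings, a small rule set covers many unseen attribute names, which is the source of the compaction the abstract describes.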


[Figures and tables: structured log of changes in Protégé and iCAT; the five features extracted from each change-log record for association mining; the iCAT user interface used for editing the ICD-11 and ICTM ontologies; training and prediction with sliding windows; prediction across ontologies]
Analysis of User Editing Patterns in Ontology Development Projects
  • Article
  • Full-text available
  • June 2015 · 294 Reads · 18 Citations
  • Journal on Data Semantics

The development of real-world ontologies is a complex undertaking, commonly involving a group of domain experts with different expertise who work together in a collaborative setting. These ontologies are usually large scale and have complex structures. To assist in the authoring process, ontology tools are key to making the editing process as streamlined as possible. Being able to predict confidently what users are likely to do next as they edit an ontology will enable us to focus and structure the user interface accordingly and to facilitate more efficient interaction and information discovery. In this paper, we use data mining, specifically association rule mining, to investigate whether we can predict the next editing operation that a user will make based on the change history. We simulated and evaluated continuous prediction across time using a sliding-window model. We used association rule mining to generate patterns from the ontology change logs in the training window and tested these patterns on logs in the adjacent testing window. We also evaluated the impact of different training and testing window sizes on prediction accuracy. Finally, we evaluated our prediction accuracy across different user groups and different ontologies. Our results indicate that we can indeed predict the next editing operation a user is likely to make. We will use the discovered editing patterns to develop a recommendation module for our editing tools and to design user interface components that better fit user editing behaviors.
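The sliding-window prediction setup can be illustrated with a minimal sketch. The toy change log, window sizes, and simple bigram "rules" below are invented stand-ins for the paper's richer association rules mined over several features per change record:

```python
from collections import Counter, defaultdict

# Toy change log: one editing operation per change record.
log = ["create_class", "add_parent", "edit_label", "create_class",
       "add_parent", "edit_label", "create_class", "add_parent",
       "add_comment", "create_class", "add_parent", "edit_label"]

def train(window):
    """Count op -> next-op transitions inside the training window."""
    rules = defaultdict(Counter)
    for prev, nxt in zip(window, window[1:]):
        rules[prev][nxt] += 1
    return rules

def predict(rules, op):
    """Predict the most frequent follower of `op`, or None if unseen."""
    return rules[op].most_common(1)[0][0] if rules[op] else None

# Sliding windows: train on the first 8 operations, test on the rest.
rules = train(log[:8])
hits = sum(predict(rules, prev) == nxt
           for prev, nxt in zip(log[8:], log[9:]))
print(f"correct predictions: {hits} of {len(log) - 9}")  # 2 of 3
```

Sliding the training and testing windows forward over the log, and varying their sizes, gives the kind of continuous evaluation the abstract describes.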


SWEET ontology coverage for earth system sciences

  • December 2014 · 834 Reads · 36 Citations
  • Earth Science Informatics

Scientists in the Earth and Environmental Sciences (EES) domain increasingly use ontologies to analyze and integrate their data. For example, NASA's SWEET ontologies (Semantic Web for Earth and Environmental Terminology) have become the de facto standard ontologies for formally representing the EES domain (Raskin 2010). Now we must develop principled ways both to evaluate existing ontologies and to ascertain their quality in a quantitative manner. Existing literature describes many potential quality metrics for ontologies. Among these metrics is the coverage metric, which approximates the relevancy of an ontology to a corpus (Yao et al., PLoS Comput Biol 7(1):e1001055, 2011). This paper has three primary contributions to the EES domain: (1) we present an investigation of the applicability of existing coverage techniques to the EES domain; (2) we present a novel expansion of existing techniques that uses thesauri to generate equivalence and subclass axioms automatically; and (3) we present an experiment to establish an upper-bound coverage expectation for the SWEET ontologies against real-world EES corpora from DataONE (Michener et al., Ecol Inform 11:5–15, 2012), and a corpus of research articles designed to specifically match the topics covered by the SWEET ontologies. This initial evaluation suggests that the SWEET ontologies can accurately represent real corpora within the EES domain.
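The coverage metric discussed above can be sketched roughly as the fraction of corpus terms matched by an ontology's labels, optionally after expanding labels with thesaurus synonyms. The labels, synonyms, and corpus below are invented examples, not SWEET or DataONE data:

```python
# Minimal sketch of corpus coverage by an ontology's labels, with
# optional thesaurus-based expansion. All terms below are made up.
ontology_labels = {"precipitation", "temperature", "aerosol"}
thesaurus = {"precipitation": {"rainfall", "rain"},
             "temperature": {"warmth"}}

def coverage(corpus_terms, labels, synonyms=None):
    """Fraction of corpus terms matched by a label or one of its synonyms."""
    expanded = set(labels)
    if synonyms:
        for label in labels:
            expanded |= synonyms.get(label, set())
    return len(corpus_terms & expanded) / len(corpus_terms)

corpus = {"rainfall", "temperature", "salinity", "aerosol"}
print(coverage(corpus, ontology_labels))             # 0.5 without synonyms
print(coverage(corpus, ontology_labels, thesaurus))  # 0.75 with expansion
```

The gap between the two numbers is what thesaurus-generated equivalence axioms buy: "rainfall" only counts as covered once it is linked to "precipitation".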


Reasoning Based Quality Assurance of Medical Ontologies: A Case Study

  • November 2014 · 23 Reads · 2 Citations
  • AMIA Annual Symposium Proceedings

The World Health Organisation is using OWL as a key technology to develop ICD-11, the next version of the well-known International Classification of Diseases. Besides providing better opportunities for data integration and linkage to other well-known ontologies such as SNOMED CT, one of the main promises of using OWL is that it will enable various forms of automated error checking. In this paper we investigate how automated OWL reasoning, along with a justification-finding service, can be used as a quality-assurance technique for the development of large and complex ontologies such as ICD-11. Using the International Classification of Traditional Medicine (ICTM), Chapter 24 of ICD-11, as a case study, and an expert panel of knowledge engineers, we reveal the kinds of problems that can occur, how they can be detected, and how they can be fixed. Specifically, we found that a logically inconsistent version of the ICTM ontology could be repaired using justifications (minimal entailing subsets of an ontology). Although over 600 justifications for the inconsistency were initially computed, we found that there were three main manageable patterns or categories of justifications involving TBox and ABox axioms. These categories represented meaningful domain errors to an expert panel of ICTM project knowledge engineers, who were able to use them to successfully determine the axioms that needed to be revised in order to fix the problem. All members of the expert panel agreed that the approach was useful for debugging and ensuring the quality of ICTM.
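The notion of a justification (a minimal subset of axioms that entails the inconsistency) can be illustrated on a toy, made-up example. The brute-force search and simplistic consistency check below stand in for a real OWL reasoner and are not how ICTM was debugged:

```python
from itertools import combinations

# Made-up mini-ontology: subclass edges, a disjointness axiom, and an
# instance assertion. The names are invented illustrations, not ICTM.
axioms = {
    "a1": ("sub", "Fever", "Sign"),
    "a2": ("sub", "Fever", "Disease"),
    "a3": ("disjoint", "Sign", "Disease"),
    "a4": ("inst", "case42", "Fever"),
}

def superclasses(cls, subs):
    """All classes reachable from `cls` via subclass edges (incl. itself)."""
    seen, todo = set(), [cls]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo += [p for (ch, p) in subs if ch == c]
    return seen

def inconsistent(names):
    """True if some individual falls under two disjoint classes."""
    chosen = [axioms[n] for n in names]
    subs = [(a[1], a[2]) for a in chosen if a[0] == "sub"]
    disj = [(a[1], a[2]) for a in chosen if a[0] == "disjoint"]
    for kind, ind, cls in chosen:
        if kind == "inst":
            ups = superclasses(cls, subs)
            if any(x in ups and y in ups for x, y in disj):
                return True
    return False

# A justification is a minimal inconsistent subset of the axioms.
justifications = []
for size in range(1, len(axioms) + 1):
    for subset in combinations(sorted(axioms), size):
        if inconsistent(subset) and not any(set(j) <= set(subset)
                                            for j in justifications):
            justifications.append(subset)
print(justifications)  # [('a1', 'a2', 'a3', 'a4')]
```

Removing any one axiom from the reported set restores consistency, which is exactly the property that makes justifications useful repair hints for knowledge engineers.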


[Figure: screenshot of the question form that experts completed to verify relations in SNOMED CT; all 200 questions were randomly ordered per expert]
An empirically derived taxonomy of errors in SNOMED CT

  • November 2014 · 26 Reads · 6 Citations
  • AMIA Annual Symposium Proceedings

Ontologies underpin methods throughout biomedicine and biomedical informatics. However, as ontologies increase in size and complexity, so does the likelihood that they contain errors. Effective methods for identifying errors are typically manual and expert-driven; however, automated methods are essential at the scale of modern biomedical ontologies. The effect of ontology errors on their application is unclear, creating a challenge in differentiating salient, relevant errors from those that have no discernible effect. As a first step in understanding the challenge of identifying salient, common errors at a large scale, we asked 5 experts to verify a random subset of complex relations in the SNOMED CT CORE Problem List Subset. The experts found 39 errors that followed several common patterns. Initially, the experts disagreed about errors almost entirely, indicating that ontology verification is very difficult and requires many eyes on the task. It is clear that additional empirically based, application-focused ontology verification methods are necessary. Toward that end, we developed a taxonomy that can serve as a checklist to consult during ontology quality assurance.


[Figures and tables: overview of the crowdsourcing workflow adapted to ontology verification; the filtering steps used to select 200 relationships for verification from the entailments of the SNOMED CT CORE Problem List subset; mean free-marginal κ between the experts themselves and between the experts and the crowd]
Using the wisdom of the crowds to find critical errors in biomedical ontologies: a study of SNOMED CT

  • October 2014 · 145 Reads · 54 Citations
  • Journal of the American Medical Informatics Association

Objectives: The verification of biomedical ontologies is an arduous process that typically involves peer review by subject-matter experts. This work evaluated the ability of crowdsourcing methods to detect errors in SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms) and to address the challenges of scalable ontology verification. Methods: We developed a methodology to crowdsource ontology verification that uses micro-tasking combined with a Bayesian classifier. We then conducted a prospective study in which both the crowd and domain experts verified a subset of SNOMED CT comprising 200 taxonomic relationships. Results: The crowd identified errors as well as any single expert at about one-quarter of the cost. The inter-rater agreement (κ) between the crowd and the experts was 0.58; the inter-rater agreement between experts themselves was 0.59, suggesting that the crowd is nearly indistinguishable from any one expert. Furthermore, the crowd identified 39 previously undiscovered, critical errors in SNOMED CT (eg, 'septic shock is a soft-tissue infection'). Discussion: The results show that the crowd can indeed identify errors in SNOMED CT that experts also find, and the results suggest that our method will likely perform well on similar ontologies. The crowd may be particularly useful in situations where an expert is unavailable, budget is limited, or an ontology is too large for manual error checking. Finally, our results suggest that the online anonymous crowd could successfully complete other domain-specific tasks. Conclusions: We have demonstrated that the crowd can address the challenges of scalable ontology verification, completing not only intuitive, common-sense tasks, but also expert-level, knowledge-intensive tasks.
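Inter-rater agreement of the kind reported above (κ ≈ 0.58) can be computed with Cohen's κ; the yes/no judgment vectors below are invented for illustration, not the study's data:

```python
# Cohen's kappa: chance-corrected agreement between two raters on the
# same set of yes/no relationship judgments. Judgments are made up.
def cohen_kappa(a, b):
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    # Expected agreement under independent raters with these marginals.
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)

crowd  = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
expert = ["yes", "no",  "no", "yes", "no", "yes", "yes", "no"]
print(round(cohen_kappa(crowd, expert), 2))  # 0.5
```

The chance correction is the point: two raters who agree 75% of the time on a balanced yes/no task only get κ = 0.5, which is why a crowd-vs-expert κ of 0.58 against an expert-vs-expert κ of 0.59 supports the "nearly indistinguishable" claim.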


Indented Tree or Graph? A Usability Study of Ontology Visualization Techniques in the Context of Class Mapping Evaluation

  • October 2013 · 179 Reads · 56 Citations
  • Lecture Notes in Computer Science

Research effort in ontology visualization has largely focused on developing new visualization techniques. At the same time, researchers have paid less attention to investigating the usability of common visualization techniques that many practitioners regularly use to visualize ontological data. In this paper, we focus on two popular ontology visualization techniques: indented tree and graph. We conduct a controlled usability study with an emphasis on the effectiveness, efficiency, workload and satisfaction of these visualization techniques in the context of assisting users during evaluation of ontology mappings. Findings from this study have revealed both strengths and weaknesses of each visualization technique. In particular, while the indented tree visualization is more organized and familiar to novice users, subjects found the graph visualization to be more controllable and intuitive without visual redundancy, particularly for ontologies with multiple inheritance.


Simplified OWL Ontology Editing for the Web: Is WebProtégé Enough?

  • October 2013 · 66 Reads · 17 Citations
  • Lecture Notes in Computer Science

Ontology engineering is a task that is notorious for its difficulty. As the group that developed Protégé, the most widely used ontology editor, we are keenly aware of how difficult the users perceive this task to be. In this paper, we present the new version of WebProtégé that we designed with two main goals in mind: (1) create a tool that will be easy to use while still accounting for commonly used OWL constructs; (2) support collaboration and social interaction around distributed ontology editing as part of the core tool design. We designed this new version of the WebProtégé user interface empirically, by analysing the use of OWL constructs in a large corpus of publicly available ontologies. Since the beta release of this new WebProtégé interface in January 2013, our users from around the world have created and uploaded 519 ontologies on our server. In this paper, we describe the key features of the new tool and our empirical design approach. We evaluate language coverage in WebProtégé by assessing how well it covers the OWL constructs that are present in ontologies that users have uploaded to WebProtégé. We evaluate the usability of WebProtégé through a usability survey. Our analysis validates our empirical design, suggests additional language constructs to explore, and demonstrates that an easy-to-use web-based tool that covers most of the frequently used OWL constructs is sufficient for many users to start editing their ontologies.


Citations (37)


... Effectively built QALD systems may significantly aid domain experts in their ontology augmentation evaluation process. As a result, domain experts and ontologists may collaborate to debate and implement necessary improvements to the ontology increment under review [5][6][7]. ...

Reference:

Arbitrary Verification of Ontology Increments using Natural Language
How Ontologies are Made: Studying the Hidden Social Dynamics Behind Collaborative Ontology Engineering Projects
  • Citing Article
  • January 2013

SSRN Electronic Journal

... It is important to leverage the data amount and evaluate the trustworthiness of extracted information using the truth finding technology. Fortunately, textual patterns, such as E-A (entity-attribute) patterns [13,14], S-O-V (subject-object-value) patterns [42], dependency parsing patterns (by PATTY [27]), and meta patterns (by MetaPAD [17]), have been proposed to turn text data into structures in an unsupervised way. Specifically, Google's Biperpedia generated the E-A patterns (e.g., "A of E" and "E 's A") from users' fact-seeking queries by replacing entity with "E" and noun-phrase attribute with "A". ...

Discovering Structure in the Universe of Attribute Names
  • Citing Conference Paper
  • April 2016

... In the field of semantic data visualization, eye tracking has been mainly utilized as an evaluation tool to assess the usability issues of established visualization techniques [60,61]. The benefit of utilizing eye tracking in the evaluation of a visualization technique is that direct measures such as user visual search and processing behaviors can be revealed to complement indirect measures such as user success, time on task, and overall satisfaction that implicitly reflect the usability of a visualization. ...

Eye tracking the user experience – An evaluation of ontology visualization techniques
  • Citing Article
  • November 2016

Semantic Web

... To the best of our knowledge, our hybrid reasoning framework is the first approach applying statistical methods as part of logic-based subsumption checking. Conversely, Movshovitz-Attias et al. [18] train a subsumption classifier and include logic-based features using a KB, in particular the overlap of concepts' properties in Biperpedia. Like us they exploit dependency trees, used in terms of a feature expressing whether any paths in the dependency trees of the two concepts match. ...

Discovering Subsumption Relationships for Web-Based Ontologies
  • Citing Conference Paper
  • January 2010

... We decided to implement essential characteristics as classes in order to facilitate upgrading and extension of the ontology. Nevertheless, it remains unintuitive for domain experts as Horridge et al. (2013) admit it: "as the group that 166 developed Protégé, the most widely used ontology editor, we are keenly aware of how difficult the users perceive this task [ontology engineering] to be". The implementation of the linguistic dimension was achieved by annotations (metadata in the W3C sense). ...

Simplified OWL Ontology Editing for the Web: Is WebProtégé Enough?
  • Citing Conference Paper
  • October 2013

Lecture Notes in Computer Science

... Fu et al. [FNS13] compared indented trees and graphs (the typical representations of ontologies), and found that the former are more familiar to novices, while the latter are more intuitive and controllable. OWLViz [Hor] and OntoGraf [Fal] visualize ontologies using node-link diagrams, but they struggle with large ontologies and have limited interaction possibilities. ...

Indented Tree or Graph? A Usability Study of Ontology Visualization Techniques in the Context of Class Mapping Evaluation
  • Citing Conference Paper
  • October 2013

Lecture Notes in Computer Science

... An important part of solving the adaptation problem is studying the properties of the user's entity in the information system, which is reflected in [4,22]. These works discuss several approaches for collecting information about the user and evaluating their equipment and software. ...

Analysis of User Editing Patterns in Ontology Development Projects

Journal on Data Semantics

... DL-based mechanisms allow ontology curators to formally and unambiguously represent concept meanings and relationships, and to use off the shelf reasoning tools such as HermiT 6 to automate the computation of the relationship between two class expressions and consistency checks . Rector, et al. developed an effective quality assurance mechanism using reasoners to incorporate qualifiers (e.g., acute or chronic) in the post-coordination in SNOMED CT 7 9 . The post-coordination approach using compositional expressions has been used to build a common ontology to harmonize ICD-11 and SNOMED CT 10,11 . ...

Reasoning Based Quality Assurance of Medical Ontologies: A Case Study
  • Citing Article
  • November 2014

AMIA Annual Symposium Proceedings

... Summaries reduce the cognitive load on domain experts during multidimensional data exploration, allowing them to drill-down to specific instances as needed [111]. While many visualization recommendation systems exist for analyzing numerical data [7,[112][113][114], visualizations in healthcare often include categorical and text data [115,[116][117][118]. As such, node-link diagrams are a common data representation and have been used for tracking family history [119], decision-making [22,120], and identifying hidden variables [121]. ...

An empirically derived taxonomy of errors in SNOMED CT

AMIA Annual Symposium Proceedings