Huanying (Helen) Gu

New York Institute of Technology, New York City, NY, USA

Publications (15) · 8.96 Total impact

  • Article: A study of terminology auditors' performance for UMLS semantic type assignments.
    ABSTRACT: Auditing healthcare terminologies for errors requires human experts. In this paper, we present a study of the performance of auditors looking for errors in the semantic type assignments of complex UMLS concepts. In this study, concepts are considered complex whenever they are assigned combinations of semantic types. Past research has shown that complex concepts have a higher likelihood of errors. The results of this study indicate that individual auditors are not reliable when auditing such concepts and their performance is low, according to various metrics. These results confirm the outcomes of an earlier pilot study. They imply that to achieve an acceptable level of reliability and performance, when auditing such concepts of the UMLS, several auditors need to be assigned the same task. A mechanism is then needed to combine the possibly differing opinions of the different auditors into a final determination. In the current study, in contrast to our previous work, we used a majority mechanism for this purpose. For a sample of 232 complex UMLS concepts, the majority opinion was found reliable and its performance for accuracy, recall, precision and the F-measure was found statistically significantly higher than the average performance of individual auditors.
    Journal of Biomedical Informatics 06/2012; · 1.79 Impact Factor
  • Article: Relationship auditing of the FMA ontology.
    Huanying Helen Gu, Duo Wei, Jose L V Mejino, Gai Elhanan
    ABSTRACT: The Foundational Model of Anatomy (FMA) ontology is a domain reference ontology based on a disciplined modeling approach. Due to its large size, semantic complexity and manual data entry process, errors and inconsistencies are unavoidable and might remain within the FMA structure without detection. In this paper, we present computable methods to highlight candidate concepts for various relationship assignment errors. The process starts by locating structures formed by transitive structural relationships (part_of, tributary_of, branch_of) and examines their assignments in the context of the IS-A hierarchy. The algorithms were designed to detect five major categories of possible incorrect relationship assignments: circular, mutually exclusive, redundant, inconsistent, and missed entries. A domain expert reviewed samples of these presumptive errors to confirm the findings. Seven thousand and fifty-two presumptive errors were detected, with the largest proportion related to part_of relationship assignments. The results highlight the fact that errors are unavoidable in complex ontologies and that well-designed algorithms can help domain experts focus on concepts with a high likelihood of errors and make the most of their effort to ensure consistency and reliability. In the future, similar methods might be integrated with data entry processes to offer real-time error detection.
    Journal of Biomedical Informatics 07/2009; 42(3):550-7. · 1.79 Impact Factor
  • Article: Relationship Auditing of the FMA Ontology.
    Huanying (Helen) Gu, Duo Wei, Jose L V Mejino Jr, Gai Elhanan
    ABSTRACT: The Foundational Model of Anatomy (FMA) Ontology is a domain reference ontology based on a disciplined modeling approach. Due to its large size, semantic complexity and manual data entry process, errors and inconsistencies are unavoidable and might remain within the FMA structure without detection. In this paper, we present computable methods to highlight candidate concepts for various relationship assignment errors. The process starts by locating structures formed by transitive structural relationships (part_of, tributary_of, branch_of) and examines their assignments in the context of the IS-A hierarchy. The algorithms were designed to detect five major categories of possible incorrect relationship assignments: circular, mutually exclusive, redundant, inconsistent, and missed entries. A domain expert reviewed samples of these presumptive errors to confirm the findings. A total of 7052 presumptive errors were detected, with the largest proportion related to part_of relationship assignments. The results highlight the fact that errors are unavoidable in complex ontologies and that well-designed algorithms can help domain experts focus on concepts with a high likelihood of errors and make the most of their effort to ensure consistency and reliability. In the future, similar methods might be integrated with data entry processes to offer real-time error detection.
    Journal of Biomedical Informatics 02/2009; · 1.79 Impact Factor
  • Article: Structural group-based auditing of missing hierarchical relationships in UMLS.
    Yan Chen, Huanying Helen Gu, Yehoshua Perl, James Geller
    ABSTRACT: The Metathesaurus of the UMLS was created by integrating various source terminologies. The inter-concept relationships were either integrated into the UMLS from the source terminologies or specially generated. Due to the extensive size and inherent complexity of the Metathesaurus, the accidental omission of some hierarchical relationships was inevitable. We present a recursive procedure that allows a human expert, with the support of an algorithm, to locate missing hierarchical relationships. The procedure starts with a group of concepts with exactly the same (correct) semantic type assignments. It then partitions the concepts, based on child-of hierarchical relationships, into smaller, singly rooted, hierarchically connected subgroups. The auditor only needs to focus on the subgroups with very few concepts and their concepts with semantic type reassignments. The procedure was evaluated by comparing it with a comprehensive manual audit, and it exhibited perfect error recall.
    Journal of Biomedical Informatics 09/2008; 42(3):452-67. · 1.79 Impact Factor
  • Article: Structural group auditing of a UMLS semantic type's extent.
    ABSTRACT: Each UMLS concept is assigned one or more of the semantic types (STs) from the Semantic Network. Due to the size and complexity of the UMLS, errors are unavoidable. We present two auditing methodologies for groups of semantically similar concepts. The straightforward procedure starts with the extent of an ST, which is the group of all concepts assigned this ST. We divide the extent into groups of concepts that have been assigned exactly the same set of STs. An algorithm finds subgroups of suspicious concepts. The human auditor is presented with these subgroups, which purportedly exhibit the same semantics, and thus she will notice different concepts with wrong or missing ST assignments. The dynamic procedure detects concepts which become suspicious in the course of the auditing process. Both procedures are applied to two semantic types. The results are compared with a comprehensive manual audit and show a very high error recall with a much higher precision.
    Journal of Biomedical Informatics 07/2008; 42(1):41-52. · 1.79 Impact Factor
  • Article: Evaluation of a UMLS Auditing Process of Semantic Type Assignments.
    ABSTRACT: The UMLS is a terminological system that integrates many source terminologies. Each concept in the UMLS is assigned one or more semantic types from the Semantic Network, an upper level ontology for biomedicine. Due to the complexity of the UMLS, errors exist in the semantic type assignments. Finding assignment errors may unearth modeling errors. Even with sophisticated tools, discovering assignment errors requires manual review. In this paper we describe the evaluation of an auditing project of UMLS semantic type assignments. We studied the performance of the auditors who reviewed potential errors. We found that four auditors, interacting according to a multi-step protocol, identified a high rate of errors (one or more errors in 81% of concepts studied) and that results were sufficiently reliable (0.67 to 0.70) for the two most common types of errors. However, reliability was low for each individual auditor, suggesting that review of potential errors is resource-intensive.
    AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 02/2007;
  • Article: Using the metaschema to audit UMLS classification errors.
    Huanying Helen Gu, Hua Min, Yi Peng, Li Zhang, Yehoshua Perl
    ABSTRACT: The Unified Medical Language System integrates about 800,000 concepts from 99 biomedical terminologies. Each concept is assigned to at least one semantic type of the Semantic Network. During the integration, it is unavoidable that some classification errors and inconsistencies will be introduced. In this paper, we present an auditing technique to find such errors and inconsistencies. Our technique is based on an expert reviewing the pure intersections of meta-semantic types of the metaschema, a compact abstract view of the Semantic Network. Analysis results are reported for pure intersections containing 1 to 6 concepts, and various kinds of errors are identified.
    Proceedings / AMIA ... Annual Symposium. AMIA Symposium 02/2002;
  • Article: Identifying a Forest Hierarchy in an OODB Specialization Hierarchy Satisfying Disciplined Modeling
    Yehoshua Perl, James Geller, Huanying (Helen) Gu
    ABSTRACT: Our work is motivated by the desire to develop methods to comprehend large vocabularies and large schemas of Object-Oriented Databases. The ability of a user of a database participating in a federated system to retrieve information from the other database systems will be greatly enhanced by acquiring a better comprehension of these systems. We are trying to develop both a theoretical paradigm and a methodology to analyze existing large schemas. Our approach to achieve comprehension is based on combining two concepts: informational thinning (i.e. concentration on the specialization hierarchy of the schema) and partitioning. In this paper we present a new technique for modeling which is called disciplined modeling. Based on the rules of disciplined modeling we develop a theoretical paradigm to support the existence of a meaningful forest hierarchy within the specialization hierarchy. Such a hierarchy functions as a skeleton of the schema and supports comprehension and partitioning effort...
    12/2000;
  • Article: Representing the UMLS as an OODB: Modeling Issues and Advantages
    ABSTRACT: Objective: The Unified Medical Language System (UMLS) designed by NLM combines many well established authoritative medical informatics terminologies in one knowledge representation system. Such a resource is very valuable to the healthcare community and industry. However, the UMLS is very large and complex and poses serious comprehension problems for users and maintenance personnel. We present a representation to support the comprehension and navigation of the UMLS. Design: An Object-Oriented Database representation is utilized to represent the two major components of the UMLS, the Metathesaurus and the Semantic Network, as a unified system. The semantic types of the Semantic Network are modeled as semantic type classes. Intersection classes are defined to model concepts of multiple semantic types which are removed from the semantic type classes. Results: We present examples of how the intersection classes help expose omissions of concepts, highlight errors of semantic type classification, and uncover ambiguities of concepts in the UMLS. The resulting UMLS OODB schema is deeper and more refined than the Semantic Network since intersection classes are introduced. The Metathesaurus is classified into a larger number of mutually exclusive, uniform sets of concepts. The schema improves the comprehension and navigation of the Metathesaurus. Conclusions: The UMLS OODB schema supports the comprehension and navigation of the Metathesaurus. It also supports exposing and resolving modeling problems in the UMLS. This research was (partially) done under a cooperative agreement between the National Institute of Standards and Technology Advanced Technology Program (under the HIIT contract #70NANB5H1011) and the Healthcare Open Systems and Trials, Inc. consortium. ...
    12/2000;
  • Article: Partitioning a Vocabulary's IS-A Hierarchy into Trees
    ABSTRACT: This paper introduces a methodology for partitioning a vocabulary into small, meaningful pieces. The partitioning is done with respect to the vocabulary's IS-A hierarchy. The methodology, based on a set of rules for refining the IS-A hierarchy, is a process carried out by a user in conjunction with the computer. The methodology is demonstrated on a complex portion of a vocabulary. INTRODUCTION: Controlled medical vocabularies ("vocabularies" for short) [1,2] play an important role in many medical enterprises that employ a large number of disparate information systems (e.g., clinical databases). Often, each such system has its own inherent "language" or terminology. A vocabulary allows for the integration of the different systems and the standardization of common information handling tasks, helping to reduce the overall cost of data processing. A vocabulary can also aid in the orientation of users of the information systems. However, a vocabulary can be of an overwhelming...
    12/2000;
  • Article: Modeling the UMLS Using an OODB
    ABSTRACT: INTRODUCTION: The Unified Medical Language System (UMLS) [1-4] designed by the National Library of Medicine (NLM) combines many well established medical informatics terminologies in a unified system. It enables electronic access to a very large compendium of medical terminologies. However, the UMLS is large and complex, which poses serious comprehension problems. It is difficult to maintain and use the UMLS without proper comprehension. Designers, maintainers and users of the UMLS need tools to help with their work. Tools for retrieval and manipulation of the content of such a system are insufficient. Rather, they must help professionals reach a level of comprehension essential to performing their tasks. In previous work [5,6], we have developed a methodology for representing Controlled Medical Terminologies (CMTs) as Object-Oriented Databases (OODBs) to provide support for comprehension of their structure and content. The comprehension support was achieved ...
    12/2000;
  • Article: Using a Similarity Measurement to Partition a Vocabulary of Medical Concepts
    Huanying (Helen) Gu, James Geller, Michael Halper
    ABSTRACT: Controlled medical vocabularies have become increasingly important in a range of medical informatics applications. However, the extensive size of most vocabularies often makes it difficult for users to gain an understanding of their contents. In previous work, we have investigated the partitioning of a large semantic-network based medical vocabulary into smaller units, for the purpose of easier graphical display and comprehension. The partitioning process relied heavily on a domain expert. In this paper, we propose a structural method for automating the partitioning of a vocabulary. The structural method is based on a definition of the similarity of a pair consisting of a child concept and its parent concept in the semantic network. A distribution over these similarities for all pairs in the semantic network is then computed. Based on this distribution, the semantic network can be partitioned into more manageable pieces. The approach has been applied to the InterMED and a complex portion of the MED, two large medical vocabularies.
    12/2000;
  • Article: Utilizing OODB Schema Modeling For Vocabulary Management
    ABSTRACT: INTRODUCTION: Large medical vocabularies are emerging as important resources for use in medical information systems. Acceptance of these vocabularies has been slow, however. Part of this may be an inability to understand and adapt a system developed elsewhere to systems grown at home (the "not invented here" syndrome). These vocabularies also present significant maintenance challenges for their creators, especially when they grow to 10s or 100s of thousands of terms. We are exploring the use of an object-oriented database (OODB) paradigm for generating high-level vocabulary schemata, intended to enhance comprehension by users and maintainers of a large controlled medical vocabulary. We present one such schema (for the Object Oriented Health Vocabulary repository (OOHVR)) generated from the Columbia-Presbyterian Medical Center (CPMC) Medical Entities Dictionary (MED) [1] and show how the comprehension it provides has improved the MED content. BACKGROUND: The Medical Entities...
    12/2000;
  • Chapter: Using a Similarity Measurement to Partition a Vocabulary of Medical Concepts
    Huanying Helen Gu, James Geller, Li-min Liu, Michael Halper
    ABSTRACT: Controlled medical vocabularies have become increasingly important in a range of medical informatics applications. However, the extensive size of most vocabularies often makes it difficult for users to gain an understanding of their contents. In previous work, we have investigated the partitioning of a large semantic-network based medical vocabulary into smaller units, for the purpose of easier graphical display and comprehension. The partitioning process relied heavily on a domain expert. In this paper, we propose a structural method for automating the partitioning of a vocabulary. The structural method is based on a definition of the similarity of a pair consisting of a child concept and its parent concept in the semantic network. A distribution over these similarities for all pairs in the semantic network is then computed. Based on this distribution, the semantic network can be partitioned into more manageable pieces. The approach has been applied to the InterMED and a complex portion of the MED, two large medical vocabularies.
    12/1998; p. 810.
  • Article: How to Partition a Complex Schema of a Medical Terminology
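The majority mechanism described in "A study of terminology auditors' performance for UMLS semantic type assignments" can be illustrated with a short Python sketch. It assumes binary judgments (True means the auditor believes the concept's semantic type assignment is wrong), a reference standard given as a list of booleans, and a strict-majority rule in which ties count as "no error". The function names and the toy data are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: combine several auditors' binary judgments by strict
# majority, then score the combined opinion against a reference standard.

def majority_says_error(votes):
    """True if more than half of the auditors flagged the concept (ties -> False)."""
    return sum(votes) * 2 > len(votes)


def scores(predicted, reference):
    """Return (accuracy, precision, recall, F-measure) for boolean error labels."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # flagged, truly erroneous
    fp = sum(1 for p, r in pairs if p and not r)      # flagged, actually correct
    fn = sum(1 for p, r in pairs if r and not p)      # missed errors
    tn = sum(1 for p, r in pairs if not p and not r)  # correctly left alone
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return accuracy, precision, recall, f_measure


# Toy data: rows are three hypothetical auditors, columns are four concepts.
auditors = [
    [True, False, True, False],
    [True, True, False, False],
    [False, False, True, False],
]
reference = [True, False, True, True]  # hypothetical reference standard

majority = [majority_says_error(column) for column in zip(*auditors)]
accuracy, precision, recall, f_measure = scores(majority, reference)
```

On this toy data the majority opinion is [True, False, True, False], giving accuracy 0.75, precision 1.0, recall 2/3 and an F-measure of 0.8. With an even number of auditors, a different tie-breaking rule might be more appropriate.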
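Two of the five error categories named in "Relationship auditing of the FMA ontology", circular and redundant assignments, can be approximated as graph checks over a transitive relationship such as part_of. The Python sketch below is an illustration under stated assumptions, not the authors' algorithm: "circular" is read as a concept that can reach itself, "redundant" as a direct assignment already implied by a longer chain, the IS-A context the paper uses is ignored, and the concept pairs are toy data.

```python
from collections import defaultdict


def build_graph(edges):
    """Map each child concept to the set of its direct part_of parents."""
    graph = defaultdict(set)
    for child, parent in edges:
        graph[child].add(parent)
    return graph


def reachable(graph, start, skip_edge=None):
    """Concepts reachable from `start`, optionally ignoring one direct edge."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):  # .get avoids creating new keys
            if (node, nxt) == skip_edge or nxt in seen:
                continue
            seen.add(nxt)
            stack.append(nxt)
    return seen


def circular(graph):
    """Assumed reading of 'circular': a concept transitively part_of itself."""
    return sorted(c for c in graph if c in reachable(graph, c))


def redundant(graph):
    """Assumed reading of 'redundant': a direct edge also implied by a longer path."""
    return sorted((c, p) for c in graph for p in graph[c]
                  if p in reachable(graph, c, skip_edge=(c, p)))


# Toy part_of assignments (hypothetical, not taken from the FMA).
edges = [
    ("left atrium", "heart"),
    ("heart", "cardiovascular system"),
    ("left atrium", "cardiovascular system"),  # implied by the two above
    ("structure X", "structure Y"),
    ("structure Y", "structure X"),            # closes a cycle
]
part_of = build_graph(edges)
cycles = circular(part_of)
redundant_pairs = redundant(part_of)
```

Here `cycles` is ["structure X", "structure Y"] and `redundant_pairs` is [("left atrium", "cardiovascular system")]. Flagged items are only candidates; as in the abstract, a domain expert would still review them.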
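The grouping steps described in "Structural group auditing of a UMLS semantic type's extent" (split an ST's extent into groups with exactly the same set of STs) and "Structural group-based auditing of missing hierarchical relationships in UMLS" (split a group into singly rooted, hierarchically connected subgroups via child-of links) can be sketched in Python as follows. This is a simplified, non-recursive sketch under assumptions: a root is a group member none of whose parents is in the group, a root's subgroup is everything reachable below it inside the group (so subgroups may overlap), and the concepts A to E and their links are hypothetical.

```python
from collections import defaultdict


def partition_extent(semantic_types, st):
    """Split the extent of `st` into groups sharing exactly the same set of STs."""
    groups = defaultdict(set)
    for concept, sts in semantic_types.items():
        if st in sts:
            groups[frozenset(sts)].add(concept)
    return dict(groups)


def rooted_subgroups(group, parents):
    """Map each root of `group` to the group members hierarchically below it."""
    children = defaultdict(set)
    for concept in group:
        for parent in parents.get(concept, ()):
            if parent in group:
                children[parent].add(concept)
    # A root has no parent inside the group.
    roots = [c for c in group
             if not any(p in group for p in parents.get(c, ()))]
    subgroups = {}
    for root in roots:
        members, stack = {root}, [root]
        while stack:  # walk child-of links downward, staying inside the group
            for child in children[stack.pop()]:
                if child not in members:
                    members.add(child)
                    stack.append(child)
        subgroups[root] = members
    return subgroups


# Toy data: hypothetical concepts A-E labeled with UMLS semantic type names.
semantic_types = {
    "A": {"Pharmacologic Substance"},
    "B": {"Pharmacologic Substance", "Organic Chemical"},
    "C": {"Pharmacologic Substance", "Organic Chemical"},
    "D": {"Pharmacologic Substance", "Organic Chemical"},
    "E": {"Disease or Syndrome"},
}
parents = {"C": {"B"}, "D": {"A"}}  # hypothetical child-of links

groups = partition_extent(semantic_types, "Pharmacologic Substance")
both = frozenset({"Pharmacologic Substance", "Organic Chemical"})
subgroups = rooted_subgroups(groups[both], parents)
```

In this example D ends up in a singleton subgroup because its only parent, A, carries a different combination of semantic types. Small subgroups like this are the ones the abstracts suggest an auditor should examine first.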

Institutions

  • 2012
    • New York Institute of Technology
      New York City, NY, USA
  • 2002–2009
    • University of Medicine & Dentistry of New Jersey
      • Department of Health Informatics
      Newark, NJ, USA
  • 1998–2008
    • New Jersey Institute of Technology
      • Department of Computer Science
      Newark, NJ, USA