ABSTRACT: Terms representing chemical concepts found in the Unified Medical Language System (UMLS) are used to derive an expanded semantic network with mutually exclusive semantic types. The UMLS Semantic Network (SN) is composed of a collection of broad categories called semantic types (STs) that are assigned to concepts. Within the UMLS's coverage of the chemical domain, we find a great many concepts being assigned more than one ST. This leads to the situation where the extent of a given ST may contain concepts elaborating variegated semantics. A methodology for expanding the chemical subhierarchy of the SN into a finer-grained categorization of mutually exclusive types with semantically uniform extents is presented. We call this network a Chemical Specialty Semantic Network (CSSN). A CSSN is derived automatically from the existing chemical STs and their assignments. The methodology incorporates a threshold value governing the minimum size of a type's extent needed for inclusion in the CSSN. Thus, different CSSNs can be created by choosing different threshold values based on varying requirements.
A complete CSSN with 68 STs is derived using a threshold value of 300. It is used effectively to provide high-level categorizations for a random sample of compounds from the "Chemical Entities of Biological Interest" (ChEBI) ontology. The effect on the size of the CSSN using various threshold parameter values between one and 500 is shown.
The methodology has several potential applications, including its use to derive a pre-coordinated guide for ST assignments to new UMLS chemical concepts, as a tool for auditing existing concepts, inter-terminology mapping, and to serve as an upper-level network for ChEBI.
Journal of Cheminformatics 05/2012; 4(1):9. · 3.59 Impact Factor
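The thresholding idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's algorithm: it only assumes that each concept's exact combination of ST assignments defines a candidate type, and that a candidate is kept when its extent holds at least `threshold` concepts. The function name `derive_types`, the concept identifiers, and the toy assignments are hypothetical, and the abstract does not say how concepts in below-threshold extents are handled, so here they are simply left out.

```python
from collections import defaultdict

def derive_types(assignments, threshold):
    """Group concepts by their exact set of semantic types and keep only
    the groups whose extent has at least `threshold` concepts.

    Each concept falls into exactly one group, so the kept extents are
    mutually exclusive by construction.
    """
    extents = defaultdict(set)
    for concept, semantic_types in assignments.items():
        extents[frozenset(semantic_types)].add(concept)
    return {combo: members for combo, members in extents.items()
            if len(members) >= threshold}

# Hypothetical toy data: concept id -> assigned semantic types.
assignments = {
    "C1": {"Steroid"},
    "C2": {"Steroid"},
    "C3": {"Steroid", "Hormone"},
    "C4": {"Steroid", "Hormone"},
    "C5": {"Hormone"},
}

types = derive_types(assignments, threshold=2)
for combo, members in types.items():
    print(sorted(combo), sorted(members))
```

Lowering the threshold admits more types and raising it admits fewer, which is the trade-off the abstract describes for threshold values between one and 500.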
ABSTRACT: The relationships of the UMLS Metathesaurus are used to describe the nature of the connections between pairs of concepts. The occurrence of multiple relationships between a given pair of concepts may be indicative of some kind of inconsistency. A methodology to algorithmically identify and then categorize all pairs of concepts with multiple relationships is presented. These potentially problematic concept pairs are grouped into four categories, including those that are possibly conflicting from a hierarchical standpoint and those that may violate a mutual-exclusion constraint. Samples of the identified concept pairs are reviewed. Some of the errors and inconsistencies found during the review are reported. The findings indicate that questionable UMLS relationship triples can be easily detected by algorithmic approaches, are common, and at times can be corrected in the UMLS itself.
Biomedical and Health Informatics (BHI), 2012 IEEE-EMBS International Conference on; 01/2012
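As a rough illustration of the detection step described in the abstract above, the sketch below collects the distinct relationships recorded for each concept pair and flags pairs that have more than one. It assumes relationships arrive as (source, relationship, target) triples and treats a pair as ordered; the function name, the concept identifiers, and the sample triples are hypothetical, and the paper's four-way categorization is not reproduced.

```python
from collections import defaultdict

def pairs_with_multiple_relationships(triples):
    """Return the concept pairs that carry more than one distinct
    relationship, mapped to the set of those relationships."""
    relationships = defaultdict(set)
    for source, relationship, target in triples:
        relationships[(source, target)].add(relationship)
    return {pair: rels for pair, rels in relationships.items()
            if len(rels) > 1}

# Hypothetical sample triples; the relationship labels are illustrative.
triples = [
    ("C1", "PAR", "C2"),
    ("C1", "CHD", "C2"),   # same pair, second relationship -> flagged
    ("C1", "RB", "C3"),
    ("C4", "SIB", "C5"),
    ("C4", "SIB", "C5"),   # exact duplicate, not a second relationship
]

flagged = pairs_with_multiple_relationships(triples)
for pair, rels in flagged.items():
    print(pair, sorted(rels))
```

Using a set per pair means exact duplicate triples are not mistaken for multiple relationships; only pairs with genuinely different relationship labels are passed on for review.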
ABSTRACT: This paper strives to overcome a major problem encountered by a previous expansion methodology for discovering concepts highly likely to be missing a specific semantic type assignment in the UMLS. This methodology is the basis for an algorithm that presents the discovered concepts to a human auditor for review and possible correction. We analyzed the problem of the previous expansion methodology and discovered that it was due to an obstacle constituted by one or more concepts assigned the UMLS Semantic Network semantic type Classification. A new methodology was designed that bypasses such an obstacle without a combinatorial explosion in the number of concepts presented to the human auditor for review. The new expansion methodology with obstacle avoidance was tested with the semantic type Experimental Model of Disease and found over 500 concepts missed by the previous methodology that are in need of this semantic type assignment. Furthermore, other semantic types suffering from the same major problem were discovered, indicating that the methodology is of more general applicability. The algorithmic discovery of concepts that are likely missing a semantic type assignment is possible even in the face of obstacles, without an explosion in the number of processed concepts.
Journal of Biomedical Informatics 09/2011; 45(1):61-70. · 2.13 Impact Factor
ABSTRACT: Auditors of a large terminology, such as SNOMED CT, face a daunting challenge. To aid them in their efforts, it is essential to devise techniques that can automatically identify concepts warranting special attention. "Complex" concepts, which by their very nature are more difficult to model, fall neatly into this category. A special kind of grouping, called a partial-area, is utilized in the characterization of complex concepts. In particular, the complex concepts that are the focus of this work are those appearing in intersections of multiple partial-areas and are thus referred to as overlapping concepts. In a companion paper, an automatic methodology for identifying and partitioning the entire collection of overlapping concepts into disjoint, singly-rooted groups, that are more manageable to work with and comprehend, has been presented. The partitioning methodology formed the foundation for the development of an abstraction network for the overlapping concepts called a disjoint partial-area taxonomy. This new disjoint partial-area taxonomy offers a collection of semantically uniform partial-areas and is exploited herein as the basis for a novel auditing methodology. The review of the overlapping concepts is done in a top-down order within semantically uniform groups. These groups are themselves reviewed in a top-down order, which proceeds from the less complex to the more complex overlapping concepts. The results of applying the methodology to SNOMED's Specimen hierarchy are presented. Hypotheses regarding error ratios for overlapping concepts and between different kinds of overlapping concepts are formulated. Two phases of auditing the Specimen hierarchy for two releases of SNOMED are reported on. With the use of the double bootstrap and Fisher's exact test (two-tailed), the auditing of concepts and especially roots of overlapping partial-areas is shown to yield a statistically significant higher proportion of errors.
Journal of Biomedical Informatics 09/2011; 45(1):1-14. · 2.13 Impact Factor
ABSTRACT: Each Unified Medical Language System (UMLS) concept is assigned one or more semantic types (STs). A dynamic methodology for aiding an auditor in finding concepts that are missing the assignment of a given ST, S, is presented.
The first part of the methodology exploits the previously introduced Refined Semantic Network and accompanying refined semantic types (RST) to help narrow the search space for offending concepts. The auditing is focused in a neighborhood surrounding the extent of an RST, T (of S) called an envelope, consisting of parents and children of concepts in the extent. The audit moves outward as long as missing assignments are discovered. In the second part, concepts not reached previously are processed and reassigned T as needed during the processing of S's other RSTs. The set of such concepts is expanded in a similar way to that in the first part.
The number of errors discovered is reported. To measure the methodology's efficiency, "error hit rates" (i.e., errors found in concepts examined) are computed.
The methodology was applied to three STs: Experimental Model of Disease (EMD), Environmental Effect of Humans, and Governmental or Regulatory Activity. The EMD experienced the most drastic change. For its RST "EMD intersection Neoplastic Process" (RST "EMD") with only 33 (31) original concepts, 915 (134) concepts were found by the first (second) part to be missing the EMD assignment. Changes to the other two STs were smaller.
The results show that the proposed auditing methodology can help to effectively and efficiently identify concepts lacking the assignment of a particular semantic type.
Journal of the American Medical Informatics Association 07/2009; 16(5):746-57. · 3.57 Impact Factor
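The first part of the methodology in the abstract above (auditing an envelope of parents and children, then moving outward while missing assignments keep turning up) can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm: the hierarchy is given as plain `parents` and `children` dictionaries, and the human auditor's judgment is stood in for by a hypothetical callback `is_missing_assignment`. All names and the toy hierarchy are invented for the example.

```python
def envelope(extent, parents, children):
    """Parents and children of concepts in the extent, minus the extent."""
    neighbors = set()
    for concept in extent:
        neighbors |= parents.get(concept, set())
        neighbors |= children.get(concept, set())
    return neighbors - extent

def expand_audit(extent, parents, children, is_missing_assignment):
    """Review the envelope repeatedly, growing the extent with every
    concept judged to be missing the assignment, until a round finds none.

    Returns (found, reviewed): the concepts judged to be missing the
    assignment, and every concept that was presented for review.
    """
    extent = set(extent)
    reviewed, found = set(), set()
    while True:
        candidates = envelope(extent, parents, children) - reviewed
        reviewed |= candidates
        newly_found = {c for c in candidates if is_missing_assignment(c)}
        if not newly_found:
            break
        found |= newly_found
        extent |= newly_found
    return found, reviewed

# Hypothetical toy hierarchy: a chain A -> B -> C -> D -> E.
children = {"A": {"B"}, "B": {"C"}, "C": {"D"}, "D": {"E"}}
parents = {"B": {"A"}, "C": {"B"}, "D": {"C"}, "E": {"D"}}
truly_missing = {"B", "C"}  # stands in for the auditor's decisions

found, reviewed = expand_audit({"A"}, parents, children,
                               lambda c: c in truly_missing)
hit_rate = len(found) / len(reviewed)
print(sorted(found), sorted(reviewed), hit_rate)
```

Because expansion stops at the first round with no discoveries, concept E is never presented for review; the ratio of errors found to concepts examined corresponds to the "error hit rate" the abstract mentions.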
ABSTRACT: Chemical concepts assigned multiple "Chemical Viewed Structurally" semantic types (STs) in the Unified Medical Language System (UMLS) are subject to ambiguous interpretation. The multiple assignments may denote the fact that a specific represented chemical (combination) is a conjugate, derived via a chemical reaction of chemicals of the different types, or a complex, composed of a mixture of such chemicals. The previously introduced Refined Semantic Network (RSN) is modified to properly model these varied multi-typed chemical combinations.
The RSN was previously introduced as an enhanced abstraction of the UMLS's concepts. It features new types, called intersection semantic types (ISTs), each of which explicitly captures a unique combination of ST assignments in one abstract unit. The ambiguous ISTs of different "Chemical Viewed Structurally" ISTs of the RSN are replaced with two varieties of new types, called conjugate types and complex types, which explicitly denote the nature of the chemical interactions. Additional semantic relationships help further refine that new portion of the RSN rooted at the ST "Chemical Viewed Structurally."
The number of new conjugate and complex types and the amount of changes to the type assignment of chemical concepts are presented.
The modified RSN, consisting of 35 types and featuring 22 new conjugate and complex types, is presented. A total of 800 (about 98%) chemical concepts representing multi-typed chemical combinations from "Chemical Viewed Structurally" STs are uniquely assigned one of the new types. An additional benefit is the identification of a number of illegal ISTs and ST assignment errors, some of which are direct violations of exclusion rules defined by the UMLS Semantic Network.
The modified RSN provides an enhanced abstract view of the UMLS's chemical content. Its array of conjugate and complex types provides a more accurate model of the variety of combinations involving chemicals viewed structurally. This framework will help streamline the process of type assignments for such chemical concepts and improve user orientation to the richness of the chemical content of the UMLS.
Journal of the American Medical Informatics Association 11/2008; 16(1):116-31. · 3.57 Impact Factor
ABSTRACT: The Unified Medical Language System (UMLS) integrates about 880,000 concepts from 100 biomedical terminologies. Each concept is categorized to at least one semantic type of the Semantic Network. During the integration, it is unavoidable that some categorization errors and inconsistencies will be introduced. In this paper, we present an auditing technique to find such errors and inconsistencies. Our technique is based on an expert reviewing the pure intersections of meta-semantic types of a metaschema, a compact abstract view of the UMLS Semantic Network. We use a divide and conquer approach, handling small pure intersections differently from medium to large pure intersections. By using this approach, we limit the number of concepts reviewed, for which we expect a high percentage of errors. We reviewed all concepts in 657 pure intersections containing one to 10 concepts. Various kinds of errors are identified and the analysis of the results is presented in the paper. Also, we checked the pure intersections containing more than 10 concepts for their semantic soundness, where the semantically suspicious pure intersections are presented in the paper and their concepts are reviewed.
Artificial Intelligence in Medicine 06/2004; 31(1):29-44. · 1.36 Impact Factor
ABSTRACT: Object-oriented databases (OODBs) have been utilized for complex modeling tasks within a variety of application domains. The OODB schema, typically expressed in a graphical notation, can serve as a useful presentation tool for the information contained in the underlying OODB. However, such a schema can be a large, complex network of classes and relationships. This may greatly hinder its effectiveness in helping users gain an understanding of the OODB's contents and data organization. To facilitate this orientation process, a theoretical framework is presented that guides the refinement of an existing schema's subclass-of relationship hierarchy - the backbone of any OODB. The framework sets forth three rules which, when satisfied, lead to the establishment of a collection of contexts, each of which exhibits an internal subclass-of tree structure. A formal proof of this result is presented. An algorithmic methodology, involving a human-computer interaction, describes how the approach can be applied to a given OODB schema. An application of the methodology to an example OODB schema is included.
Knowledge and Information Systems 01/2004; 6:315-344.
ABSTRACT: Capturing the semantics of concepts in a terminology has been an important problem in AI. A two-level approach has been proposed where concepts are classified into high-level semantic types, with these types constituting a portion of the concepts' semantics. We present an algorithmic methodology for refining such two-level terminologic networks. A new network is produced consisting of "pure" semantic types and intersection types. Concepts are uniquely re-assigned to these new types. Overall, these types form a better conceptual abstraction, with each exhibiting uniform semantics. Using them, it becomes easier to detect classification errors. The methodology is applied to the UMLS.
ABSTRACT: Semantic networks (SNs) are excellent knowledge representation structures. However, large semantic networks are difficult to comprehend. To overcome this difficulty, several methods of partitioning have been developed that rely on different mixes of structural and semantic methods. However, little has appeared in the literature concerning the question whether a partition of a semantic network creates subnetworks that agree with human insight. We address this issue by presenting a comparison between the results of an algorithmic partitioning method and a partition created by a group of experts. Subsequently, we show how a network partition can be used to generate various partial views of a semantic network, which facilitate user orientation. Examples from the Unified Medical Language System (UMLS) SN are used to demonstrate partial views.
IEEE Transactions on Information Technology in Biomedicine 07/2002; · 1.98 Impact Factor
ABSTRACT: The Unified Medical Language System (UMLS) integrates many well-established biomedical terminologies. The UMLS Semantic Network (SN) can help orient users to the vast knowledge content of the UMLS Metathesaurus (META) via its abstract conceptual view. However, the SN itself is large and complex and may still be difficult to comprehend. Our technique partitions the SN into smaller meaningful units amenable to display on limited-sized computer screens. The basis for the partitioning is the distribution of the relationships within the SN. Three rules are applied to transform the original partition into a second more cohesive partition.
IEEE Transactions on Information Technology in Biomedicine 07/2002; 6(2):102-8. · 1.98 Impact Factor
ABSTRACT: Controlled medical terminologies are increasingly becoming strategic components of various healthcare enterprises. However, the typical medical terminology can be difficult to exploit due to its extensive size and high density. The schema of a medical terminology offered by an object-oriented representation is a valuable tool in providing an abstract view of the terminology, enhancing comprehensibility and making it more usable. However, schemas themselves can be large and unwieldy. We present a methodology for partitioning a medical terminology schema into manageably sized fragments that promote increased comprehension. Our methodology has a refinement process for the subclass hierarchy of the terminology schema. The methodology is carried out by a medical domain expert in conjunction with a computer. The expert is guided by a set of three modeling rules, which guarantee that the resulting partitioned schema consists of a forest of trees. This makes it easier to understand and consequently use the medical terminology. The application of our methodology to the schema of the Medical Entities Dictionary (MED) is presented.
Methods of Information in Medicine 08/2001; 40(3):204-12. · 1.60 Impact Factor
ABSTRACT: The purpose of this paper is to demonstrate how the transformation of a medical vocabulary based on a Semantic Network (SN) model into a vocabulary based on an Object-Oriented Database (OODB) model helps in the maintenance of the vocabulary. We describe an OODB schema which captures the overall structure of the vocabulary in a compact form and uncovers some errors and inconsistencies made in the vocabulary's original modeling. The resolution of these mistakes leads to an improved version of the SN-based vocabulary. A new OODB schema for the vocabulary is then derived based on the improved SN version. This experience demonstrates how the abstraction and modeling capabilities of OODBs can be used to enhance a user's understanding of the overarching structure of a complex medical vocabulary system. The OODB schema developed herein serves as the basis for the Object-Oriented Healthcare Vocabulary Repository (OOHVR), a medical vocabulary implemented as an ONTOS database.
ABSTRACT: Objective: Controlled medical terminologies (CMTs) have been recognized as important tools in a variety of medical informatics applications ranging from patient-record systems to decision-support systems. CMTs are typically organized in semantic network structures consisting of tens to hundreds of thousands of concepts. This overwhelming size and complexity can be a serious barrier to their maintenance and wide-spread utilization. In this paper, we propose the use of object-oriented databases (OODBs) to address the problems posed by the extensive scope and high complexity of most CMTs for maintenance personnel and general users alike. Design: We present a methodology that allows an existing CMT, modeled as a semantic network, to be represented as an equivalent OODB. Such a representation is called an Object-Oriented Healthcare Terminology Repository (OOHTR). Results: The major benefit of an OOHTR is its schema which provides an important layer of structural abstraction. Using the high-level view of a CMT afforded by the schema, one can gain insight into the CMT's overarching organization and begin to better comprehend it. Our methodology is applied to the Medical Entities Dictionary (MED), a large CMT developed at Columbia-Presbyterian Medical Center. Examples of how the OOHTR schema facilitated updating, correcting, and improving the design of the MED are presented. Conclusion: The OOHTR schema can serve as an important abstraction mechanism for enhancing comprehension of a large CMT, and thus aids in usability.
ABSTRACT: Controlled vocabularies have been used as the means for unifying disparate terminologies found within an application field. This unification leads to better administration of information and enhanced communication among various parties. Semantic networks have been shown to be excellent vehicles for modeling controlled vocabularies. However, they often lack the necessary access flexibility and robustness required by external agents such as intelligent information-locators and decision-support systems. In this paper, we describe the process of mapping an existing medical vocabulary based on a semantic network model into an Object-Oriented Database (OODB) system. We first consider two straightforward approaches to carrying out this task and describe their deficiencies. We then present a new approach which yields a very compact OODB schema for the representation of the vocabulary's entire hierarchy and inter-connectivity. We refer to the resulting OODB as the Object-Oriented Healthcare Vocabulary Repository (OOHVR), which is currently up and running in the context of ONTOS, a commercially available OODB system.
ABSTRACT: Controlled medical terminologies are increasingly becoming strategic components of various healthcare enterprises. However, a typical medical terminology can be difficult to exploit due to its extensive size and high density. The schema of a medical terminology offered by an object-oriented database representation provides an abstract view of a terminology. Thus, the schema enhances terminology comprehensibility, presentation and usability. However, terminology schemas themselves can be large and unwieldy. In this paper, we present a methodology for partitioning a medical terminology schema into more manageably-sized fragments that promote increased comprehension. The application of our methodology to the schema of a large, existing medical terminology, called the Medical Entities Dictionary, is presented.
Information Technology Applications in Biomedicine, 2000. Proceedings. 2000 IEEE EMBS International Conference on; 02/2000
ABSTRACT: The Unified Medical Language System (UMLS) combines many well-established authoritative medical informatics terminologies in one knowledge representation system. Such a resource is very valuable to the health care community and industry. However, the UMLS is very large and complex and poses serious comprehension problems for users and maintenance personnel. The authors present a representation to support the user's comprehension and navigation of the UMLS.
An object-oriented database (OODB) representation is used to represent the two major components of the UMLS, the Metathesaurus and the Semantic Network, as a unified system. The semantic types of the Semantic Network are modeled as semantic type classes. Intersection classes are defined to model concepts of multiple semantic types, which are removed from the semantic type classes.
The authors provide examples of how the intersection classes help expose omissions of concepts, highlight errors of semantic type classification, and uncover ambiguities of concepts in the UMLS. The resulting UMLS OODB schema is deeper and more refined than the Semantic Network, since intersection classes are introduced. The Metathesaurus is classified into more mutually exclusive, uniform sets of concepts. The schema improves the user's comprehension and navigation of the Metathesaurus.
The UMLS OODB schema supports the user's comprehension and navigation of the Metathesaurus. It also helps expose and resolve modeling problems in the UMLS.
Journal of the American Medical Informatics Association 01/2000; 7(1):66-80. · 3.57 Impact Factor
ABSTRACT: Controlled medical vocabularies are useful in application areas such as medical information systems and decision-support systems. However, such vocabularies are large and complex, and working with them can be daunting. It is important to provide a means for orienting vocabulary designers and users to the vocabulary's contents. We describe a methodology for partitioning a vocabulary based on an IS-A hierarchy into small meaningful pieces. The methodology uses our disciplined modeling framework to refine the IS-A hierarchy according to prescribed rules in a process carried out by a user in conjunction with the computer. The partitioning of the hierarchy implies a partitioning of the vocabulary. We demonstrate the methodology with respect to a complex sample of the MED, an existing medical vocabulary.
Artificial Intelligence in Medicine 02/1999; 15(1):77-98. · 1.36 Impact Factor
ABSTRACT: The Unified Medical Language System (UMLS) combines many well-established authoritative medical informatics terminologies in one system. Such a resource is very valuable to the healthcare industry. However, the UMLS is very large and complex and poses serious comprehension problems for users and maintenance personnel. Furthermore, the sets of concepts of semantic types are not semantically uniform and thus are difficult to study. We describe a method to represent two components of the UMLS, the Metathesaurus (META) and the Semantic Network, as an OODB. The resulting UMLS OODB schema is deeper and more refined than the Semantic Network. It offers semantically uniform classes, which improves support for comprehension and navigation of META. The UMLS OODB also exposes problems in the semantic type classifications.