James Geller

New Jersey Institute of Technology, Newark, NJ, USA

Publications (106) · 50.53 Total impact

  • Article: Rule-based support system for multiple UMLS semantic type assignments.
    ABSTRACT: BACKGROUND: When new concepts are inserted into the UMLS, they are assigned one or several semantic types from the UMLS Semantic Network by the UMLS editors. However, not every combination of semantic types is permissible. It was observed that many concepts with rare combinations of semantic types have erroneous semantic type assignments or prohibited combinations of semantic types. The correction of such errors is resource-intensive. OBJECTIVE: We design a computational system to inform UMLS editors as to whether a specific combination of two, three, four, or five semantic types is permissible, prohibited, or questionable. METHODS: We identify a set of inclusion and exclusion instructions in the UMLS Semantic Network documentation and derive corresponding rule-categories as well as rule-categories from the UMLS concept content. We then design an algorithm adviseEditor based on these rule-categories. The algorithm specifies rules telling an editor how to proceed when considering a tuple (pair, triple, quadruple, quintuple) of semantic types to be assigned to a concept. RESULTS: Eight rule-categories were identified. A Web-based system was developed to implement the adviseEditor algorithm, which returns for an input combination of semantic types whether it is permitted, prohibited or (in a few cases) requires more research. The numbers of semantic type pairs assigned to each rule-category are reported. Interesting examples for each rule-category are illustrated. Cases of semantic type assignments that contradict rules are listed, including recently introduced ones. CONCLUSION: The adviseEditor system implements explicit and implicit knowledge available in the UMLS in a system that informs UMLS editors about the permissibility of a desired combination of semantic types. Using adviseEditor might help accelerate the work of the UMLS editors and prevent erroneous semantic type assignments.
    Journal of Biomedical Informatics 10/2012; · 1.79 Impact Factor
  • Conference Proceeding: Enhancing the Famous People Ontology by Mining a Social Network
    Soon Ae Chun, Tian Tian, James Geller
    VLDB'12 workshop on Semantic Web Search (SSW), Istanbul, Turkey; 08/2012
  • Article: A study of terminology auditors' performance for UMLS semantic type assignments.
    ABSTRACT: Auditing healthcare terminologies for errors requires human experts. In this paper, we present a study of the performance of auditors looking for errors in the semantic type assignments of complex UMLS concepts. In this study, concepts are considered complex whenever they are assigned combinations of semantic types. Past research has shown that complex concepts have a higher likelihood of errors. The results of this study indicate that individual auditors are not reliable when auditing such concepts and their performance is low, according to various metrics. These results confirm the outcomes of an earlier pilot study. They imply that to achieve an acceptable level of reliability and performance, when auditing such concepts of the UMLS, several auditors need to be assigned the same task. A mechanism is then needed to combine the possibly differing opinions of the different auditors into a final determination. In the current study, in contrast to our previous work, we used a majority mechanism for this purpose. For a sample of 232 complex UMLS concepts, the majority opinion was found reliable and its performance for accuracy, recall, precision and the F-measure was found statistically significantly higher than the average performance of individual auditors.
    Journal of Biomedical Informatics 06/2012; · 1.79 Impact Factor
  • Article: New Abstraction Networks and a New Visualization Tool in Support of Auditing the SNOMED CT Content.
    ABSTRACT: Medical terminologies are large and complex. Frequently, errors are hidden in this complexity. Our objective is to find such errors, which can be aided by deriving abstraction networks from a large terminology. Abstraction networks preserve important features but eliminate many minor details, which are often not useful for identifying errors. Providing visualizations for such abstraction networks aids auditors by allowing them to quickly focus on elements of interest within a terminology. Previously we introduced area taxonomies and partial area taxonomies for SNOMED CT. In this paper, two advanced, novel kinds of abstraction networks, the relationship-constrained partial area subtaxonomy and the root-constrained partial area subtaxonomy are defined and their benefits are demonstrated. We also describe BLUSNO, an innovative software tool for quickly generating and visualizing these SNOMED CT abstraction networks. BLUSNO is a dynamic, interactive system that provides quick access to well organized information about SNOMED CT.
    AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 01/2012; 2012:237-46.
  • Article: Deriving an Abstraction Network to Support Quality Assurance in OCRe.
    ABSTRACT: An abstraction network is an auxiliary network of nodes and links that provides a compact, high-level view of an ontology. Such a view lends support to ontology orientation, comprehension, and quality-assurance efforts. A methodology is presented for deriving a kind of abstraction network, called a partial-area taxonomy, for the Ontology of Clinical Research (OCRe). OCRe was selected as a representative of ontologies implemented using the Web Ontology Language (OWL) based on shared domains. The derivation of the partial-area taxonomy for the Entity hierarchy of OCRe is described. Utilizing the visualization of the content and structure of the hierarchy provided by the taxonomy, the Entity hierarchy is audited, and several errors and inconsistencies in OCRe's modeling of its domain are exposed. After appropriate corrections are made to OCRe, a new partial-area taxonomy is derived. The generalizability of the paradigm of the derivation methodology to various families of biomedical ontologies is discussed.
    AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 01/2012; 2012:681-9.
  • Article: Overcoming an obstacle in expanding a UMLS semantic type extent.
    ABSTRACT: This paper strives to overcome a major problem encountered by a previous expansion methodology for discovering concepts highly likely to be missing a specific semantic type assignment in the UMLS. This methodology is the basis for an algorithm that presents the discovered concepts to a human auditor for review and possible correction. We analyzed the problem of the previous expansion methodology and discovered that it was due to an obstacle constituted by one or more concepts assigned the UMLS Semantic Network semantic type Classification. A new methodology was designed that bypasses such an obstacle without a combinatorial explosion in the number of concepts presented to the human auditor for review. The new expansion methodology with obstacle avoidance was tested with the semantic type Experimental Model of Disease and found over 500 concepts missed by the previous methodology that are in need of this semantic type assignment. Furthermore, other semantic types suffering from the same major problem were discovered, indicating that the methodology is of more general applicability. The algorithmic discovery of concepts that are likely missing a semantic type assignment is possible even in the face of obstacles, without an explosion in the number of processed concepts.
    Journal of Biomedical Informatics 09/2011; 45(1):61-70. · 1.79 Impact Factor
  • Article: Abstraction of complex concepts with a refined partial-area taxonomy of SNOMED.
    ABSTRACT: An algorithmically-derived abstraction network, called the partial-area taxonomy, for a SNOMED hierarchy has led to the identification of concepts considered complex. The designation "complex" is arrived at automatically on the basis of structural analyses of overlap among the constituent concept groups of the partial-area taxonomy. Such complex concepts, called overlapping concepts, constitute a tangled portion of a hierarchy and can be obstacles to users trying to gain an understanding of the hierarchy's content. A new methodology for partitioning the entire collection of overlapping concepts into singly-rooted groups, that are more manageable to work with and comprehend, is presented. Different kinds of overlapping concepts with varying degrees of complexity are identified. This leads to an abstract model of the overlapping concepts called the disjoint partial-area taxonomy, which serves as a vehicle for enhanced, high-level display. The methodology is demonstrated with an application to SNOMED's Specimen hierarchy. Overall, the resulting disjoint partial-area taxonomy offers a refined view of the hierarchy's structural organization and conceptual content that can aid users, such as maintenance personnel, working with SNOMED. The utility of the disjoint partial-area taxonomy as the basis for a SNOMED auditing regimen is presented in a companion paper.
    Journal of Biomedical Informatics 08/2011; 45(1):15-29. · 1.79 Impact Factor
  • Article: A survey of SNOMED CT direct users, 2010: impressions and preferences regarding content and quality.
    Gai Elhanan, Yehoshua Perl, James Geller
    ABSTRACT: Little information exists concerning SNOMED CT (Systematized Nomenclature of Medicine-Clinical Terms) users. This report describes current impressions and preferences of direct SNOMED CT users regarding coverage, quality, and concept details, and the change request mechanism. A 43-question anonymous survey was distributed electronically to relevant online communities. Data on user demographic characteristics, modes and purposes of use, means and frequencies of access, satisfaction with SNOMED CT content coverage and quality and with the change request mechanism were recorded. The survey was conducted in January 2010 and elicited 215 responses. Details regarding users' profiles, modes of use and access were reported elsewhere. The coverage of SNOMED CT was perceived to be at least 85% complete by 42% of responders, and 60% were at least satisfied with its quality. Various deficiencies were encountered at least 'somewhat often' by 28-61% of responders. Incorrect data were more bothersome than missing data. Users indicated that significant resources should be allocated to more consistent and complete conceptual representations and to further enhance content coverage. Enhanced synonym coverage and the introduction of textual definitions were important to users (54% and 63%, respectively). The study's limitations are a survey format with limited control over recruitment and selection bias, and a lack of information regarding the SNOMED CT version used by responders. Despite overall satisfaction, direct users indicated a strong desire to improve consistency, quality, and completeness of conceptual representations and concept details, as well as a continued desire to expand coverage. The survey provides much needed data for informed decisions regarding the use and development goals of SNOMED CT. Focused periodical surveys are warranted.
    Journal of the American Medical Informatics Association 08/2011; 18 Suppl 1:i36-44. · 3.61 Impact Factor
  • Article: A prediction model for web search hit counts using word frequencies.
    Tian Tian, Soon Ae Chun, James Geller
    J. Information Science. 01/2011; 37:462-475.
  • Conference Proceeding: Google Knows Who is Famous Today - Building an Ontology from Search Engine Knowledge and DBpedia.
    Proceedings of the 5th IEEE International Conference on Semantic Computing (ICSC 2011), Palo Alto, CA, USA, September 18-21, 2011; 01/2011
  • Conference Proceeding: Enhancing the Interface for Ontology-Supported Homonym Search.
    Tian Tian, James Geller, Soon Ae Chun
    Advanced Information Systems Engineering Workshops - CAiSE 2011 International Workshops, London, UK, June 20-24, 2011. Proceedings; 01/2011
  • Article: A Formal Approach to Evaluating Medical Ontology Systems using Naturalness.
    IJCMAM. 01/2010; 1:1-18.
  • Conference Proceeding: Predicting Web Search Hit Counts.
    Tian Tian, James Geller, Soon Ae Chun
    2010 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2010, Toronto, Canada, August 31 - September 3, 2010, Main Conference Proceedings; 01/2010
  • Article: Detecting duplicate biological entities using Shortest Path Edit Distance.
    Alex Rudniy, Min Song, James Geller
    ABSTRACT: Duplicate entity detection in biological data is an important research task. In this paper, we propose a novel and context-sensitive Shortest Path Edit Distance (SPED) extending and supplementing our previous work on Markov Random Field-based Edit Distance (MRFED). SPED transforms the edit distance computational problem to the calculation of the shortest path between two selected vertices of a graph. We produce several modifications of SPED by applying Levenshtein, arithmetic mean, histogram difference and TFIDF techniques to solve subtasks. We compare SPED performance to other well-known distance algorithms for biological entity matching. The experimental results show that SPED produces competitive outcomes.
    International Journal of Data Mining and Bioinformatics 01/2010; 4(4):395-410. · 0.43 Impact Factor
  • Conference Proceeding: Improving Web Search Results for Homonyms by Suggesting Completions from an Ontology.
    Tian Tian, James Geller, Soon Ae Chun
    Current Trends in Web Engineering - 10th International Conference on Web Engineering, ICWE 2010 Workshops, Vienna, Austria, July 2010, Revised Selected Papers; 01/2010
  • Article: A Survey of Direct Users and Uses of SNOMED CT: 2010 Status.
    Gai Elhanan, Yehoshua Perl, James Geller
    ABSTRACT: SNOMED CT is gaining momentum and endorsements as an international clinical terminology. However, many vendors await a clearer business case and clients' demand. We conducted a survey of direct users of SNOMED CT to determine the current profile of users, modes of use, and attitudes towards different aspects of the terminology. A web-based survey consisting of 43 questions was distributed in January 2010, and 215 responses were elicited. This paper summarizes findings regarding profiles of users and their SNOMED CT use. The results indicate significant use by non-researchers and by industry and government sectors. Many users are relative newcomers with less than 3 years' experience with SNOMED CT, and production-related use was reported by 39% of respondents. Most users are satisfied with the level of content coverage. The results indicate that SNOMED CT has a solid footing in production systems, and that SNOMED CT is mostly used for concept searches and clinical coding.
    AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 01/2010; 2010:207-11.
  • Article: Auditing SNOMED Integration into the UMLS for Duplicate Concepts.
    ABSTRACT: The UMLS contains terms from many sources. Every update of a source requires reintegration. Each new term needs to be assigned to a preexisting UMLS concept, or a new concept must be created. Whenever the integration process unnecessarily creates a new concept, this is undesirable. We report on a method to detect such undesirable duplicate concepts. Terms are removed from the UMLS and reintegrated using "piecewise synonym generation." The concept of the reintegrated term is programmatically compared to the initial concept of the term (before removal). If they are different, this indicates an error, either in the integration process or in the initial concept. Thus, such a term-concept pair is deemed suspicious. A study of five hierarchies of the SNOMED found 7.7% suspicious matches. A human expert needs to evaluate the correctness of suspicious concepts. In a sample of 149 of those, 19% of concepts were found to be duplicates.
    AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 01/2010; 2010:321-5.
  • Article: Shortest Path Edit Distance for Enhancing UMLS Integration and Audit.
    Alex Rudniy, James Geller, Min Song
    ABSTRACT: Expansion of the UMLS is an important long-term research project. This paper proposes Shortest Path Edit Distance (SPED) as an algorithm for improving existing source-integration and auditing techniques. We use SPED as a string similarity measure for UMLS terms that are known to be synonyms because they are assigned to the same concept. We compare SPED with several other well-known string matching algorithms using two UMLS samples as a test bed. One of those samples is SNOMED-based. SPED transforms the task of calculating the edit distance between two strings into a problem of finding a shortest path from a source to a destination in a node-and-link graph. In the algorithm, the two strings are used to construct the graph. The Pulling algorithm is applied to find a shortest path, which determines the string similarity value. SPED was superior for one of the data sets, with a precision of 0.6.
    AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 01/2010; 2010:697-701.
  • Article: The Neighborhood Auditing Tool: A Hybrid Interface for Auditing the UMLS.
    ABSTRACT: The UMLS's integration of more than 100 source vocabularies, not necessarily consistent with one another, causes some inconsistencies. The purpose of auditing the UMLS is to detect such inconsistencies and to suggest how to resolve them while observing the requirement of fully representing the content of each source in the UMLS. A software tool, called the Neighborhood Auditing Tool (NAT), that facilitates UMLS auditing is presented. The NAT supports "neighborhood-based" auditing, where, at any given time, an auditor concentrates on a single focus concept and one of a variety of neighborhoods of its closely related concepts. Typical diagrammatic displays of concept networks have a number of shortcomings, so the NAT utilizes a hybrid diagram/text interface that features stylized neighborhood views which retain some of the best features of both the diagrammatic layouts and text windows while avoiding the shortcomings. The NAT allows an auditor to display knowledge from both the Metathesaurus (concept) level and the Semantic Network (semantic type) level. Various additional features of the NAT that support the auditing process are described. The usefulness of the NAT is demonstrated through a group of case studies. Its impact is tested with a study involving a select group of auditors.
    Journal of Biomedical Informatics 03/2009; · 1.79 Impact Factor
  • Article: Using WordNet synonym substitution to enhance UMLS source integration.
    ABSTRACT: Synonym-substitution algorithms have been developed for the purpose of matching source vocabulary terms with existing Unified Medical Language System (UMLS) terms during the integration process. A drawback is the possible explosion in the number of newly generated (potential) synonyms, which can tax computational and expert review resources. Experiments are run using a synonym-substitution approach based on WordNet to see how constraining two methodological parameters, namely, "maximum number of substitutions per term" and "maximum term length," affects performance. Our hypothesis is that these values can be constrained rather tightly, thus greatly speeding up the methodology, without a marked decline in the additional matches produced. Furthermore, we investigate whether a limitation on only the first of the two parameters is sufficient to achieve the same results. A four-stage synonym-substitution methodology using WordNet is presented. A group of experiments is carried out in which the two methodological parameters "maximum number of substitutions per term" and "maximum term length" are varied. The purpose is to examine their effect on the growth in the number of potential synonyms generated and the associated loss of results. The experiments are based on the re-integration of the "Minimal Standard Terminology" (MST) into the UMLS. Synonym-substitution matches found to be inconsistent with the current content of the UMLS and thus deemed to be incorrect are further manually scrutinized as an audit of the original integration of the MST. An increase of 11% in the number of "MST term/UMLS term" matches was achieved using the synonym-substitution methodology. Importantly, this result prevailed when tight threshold values (such as a maximum of two synonym substitutions per term) were imposed on the parameters. Furthermore, it was found that limiting only the "maximum number of substitutions per term" parameter was sufficient to obtain the performance enhancement. During the additional audit phase, a number of the reported mismatches were actually seen to be correct, representing an additional 10% increase in the number of matches obtained. A synonym-substitution methodology that utilizes WordNet is a useful automated aid in UMLS source integration. Experiments showed that there was a significant speed-up but no degradation in match results when the methodology's "maximum number of substitutions per term" parameter was relatively tightly constrained. The methodology also helped to discover errors in the MST's original integration, and improve the quality of the UMLS's conceptual content.
    Artificial intelligence in medicine 01/2009; 46(2):97-109. · 1.65 Impact Factor

Institutions

  • 1998–2012
    • New Jersey Institute of Technology
      • Department of Computer Science
      Newark, NJ, USA
  • 2008–2011
    • Borough of Manhattan Community College
      New York City, NY, USA
  • 2007
    • University of Medicine & Dentistry of New Jersey
      Newark, NJ, USA
  • 2006
    • University of Missouri - Kansas City
      • School of Computing and Engineering
      Kansas City, MO, USA
  • 2004
    • Kean University
      USA