Journal of Biomedical Informatics (J Biomed Inform)

Publisher: Elsevier

Description

The Journal of Biomedical Informatics (formerly Computers and Biomedical Research) has been redesigned to reflect a commitment to high-quality original research papers and reviews in the area of biomedical informatics. Although published articles are motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, imaging, and bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices, and formal evaluations of completed systems, including clinical trials of information technologies, would generally be more suitable for publication in other venues. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report.

  • Impact factor
    2.13
  • 5-year impact
    2.43
  • Cited half-life
    4.40
  • Immediacy index
    0.55
  • Eigenfactor
    0.01
  • Article influence
    0.84
  • Website
    Journal of Biomedical Informatics website
  • Other titles
    Journal of biomedical informatics (Online)
  • ISSN
    1532-0480
  • OCLC
    45147742
  • Material type
    Document, Periodical, Internet resource
  • Document type
    Internet Resource, Computer File, Journal / Magazine / Newspaper

Publisher details

Elsevier

  • Pre-print
    • Author can archive a pre-print version
  • Post-print
    • Author can archive a post-print version
  • Conditions
    • Voluntary deposit by author of pre-print allowed on institution's open scholarly website and pre-print servers
    • Voluntary deposit by author of author's post-print allowed on institution's open scholarly website, including institutional repository
    • Deposit due to Funding Body, Institutional and Governmental mandate only allowed where separate agreement between repository and publisher exists
    • Set statement to accompany deposit
    • Published source must be acknowledged
    • Must link to journal home page or article's DOI
    • Publisher's version/PDF cannot be used
    • Articles in some journals can be made Open Access on payment of additional charge
    • NIH authors' articles will be submitted to PMC after 12 months
    • Authors who are required to deposit in subject repositories may also use Sponsorship Option
    • Pre-print cannot be deposited for The Lancet
  • Classification
    green

Publications in this journal

    ABSTRACT: Insights about patterns of system use are often gained through the analysis of system log files, which record the actual behavior of users. In a clinical context, however, few attempts have been made to typify system use through log file analysis. The present study offers a framework for identifying, describing, and discerning among patterns of use of a clinical information retrieval system. We use the session attributes of volume, diversity, granularity, duration, and content to define a multidimensional space in which each specific session can be positioned. We also describe an analytical method for identifying the common archetypes of system use in this multidimensional space. We demonstrate the value of the proposed framework with a log file of the use of a health information exchange (HIE) system by physicians in an emergency department (ED) of a large Israeli hospital. The analysis reveals five distinct patterns of system use, which have yet to be described in the relevant literature. The results of this study have the potential to inform the design of HIE systems for efficient and effective use, thus increasing their contribution to the clinical decision-making process.
    Journal of Biomedical Informatics 07/2014;
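To make the framework above concrete, here is a minimal sketch of clustering log-file sessions into usage archetypes. The CSV file, its column names, and the use of k-means are assumptions for illustration; the abstract does not specify the authors' archetype-identification method.

# Illustrative sketch: cluster log-file sessions into usage archetypes.
# Assumes a hypothetical CSV with one row per session and numeric columns for the
# five session attributes named in the abstract; k-means is a stand-in clustering step.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

sessions = pd.read_csv("hie_sessions.csv")           # hypothetical export of the HIE log
features = sessions[["volume", "diversity", "granularity", "duration", "content"]]

X = StandardScaler().fit_transform(features)         # put attributes on a common scale
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

sessions["pattern"] = kmeans.labels_
print(sessions.groupby("pattern")[features.columns].mean())  # profile of each archetype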
    ABSTRACT: This work proposes a histology image indexing strategy based on multimodal representations obtained from the combination of visual features and associated semantic annotations. Both data modalities are complementary information sources for an image retrieval system, since visual features lack explicit semantic information and semantic terms do not usually describe the visual appearance of images. The paper proposes a novel strategy to build a fused image representation using matrix factorization algorithms and data reconstruction principles to generate a set of multimodal features. The methodology can seamlessly recover the multimodal representation of images without semantic annotations, allowing us to index new images using visual features only, and also accepting single example images as queries. Experimental evaluations on three different histology image data sets show that our strategy is a simple, yet effective approach to building multimodal representations for histology image search, and outperforms the response of the popular late fusion approach to combine information.
    Journal of Biomedical Informatics 05/2014;
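A simplified sketch of the fused-representation idea: factorize a stacked visual+text feature matrix, then project an unannotated image into the shared latent space using its visual features only. The NMF/NNLS setup, matrix sizes, and random data are illustrative assumptions, not the paper's exact algorithm.

# Illustrative sketch: learn a shared latent space from visual + annotation features
# with non-negative matrix factorization, then index a new image that has visual
# features only.
import numpy as np
from scipy.optimize import nnls
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
V = rng.random((100, 50))        # hypothetical visual features (images x visual dims)
T = rng.random((100, 30))        # hypothetical annotation/term features (images x terms)

X = np.hstack([V, T])                              # images x (visual + text) features
nmf = NMF(n_components=10, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X)                           # latent multimodal representation per image
H_visual = nmf.components_[:, :V.shape[1]]         # visual part of each latent factor

# Project a new, unannotated image into the latent space from its visual features alone.
v_new = rng.random(50)
w_new, _ = nnls(H_visual.T, v_new)                 # non-negative least-squares fit
scores = W @ w_new                                 # similarity to indexed images
print(scores.argsort()[::-1][:5])                  # top-5 retrieved image indices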
    ABSTRACT: Cost-benefit analysis is a prerequisite for making good business decisions. In the business environment, companies intend to make profit from maximizing information utility of published data while having an obligation to protect individual privacy. In this paper, we quantify the trade-off between privacy and data utility in health data publishing in terms of monetary value. We propose an analytical cost model that can help health information custodians (HICs) make better decisions about sharing person-specific health data with other parties. We examine relevant cost factors associated with the value of anonymized data and the possible damage cost due to potential privacy breaches. Our model guides an HIC to find the optimal value of publishing health data and could be utilized for both perturbative and non-perturbative anonymization techniques. We show that our approach can identify the optimal value for different privacy models, including K-anonymity, LKC-privacy, and ∊-differential privacy, under various anonymization algorithms and privacy parameters through extensive experiments on real-life data.
    Journal of Biomedical Informatics 04/2014;
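A toy sketch of the kind of trade-off such a cost model formalizes: choose the privacy parameter that maximizes the monetary value of the anonymized release minus the expected damage from a privacy breach. The functional forms and constants below are invented for illustration.

# Toy sketch of the privacy/utility trade-off described in the abstract: pick the
# privacy parameter k that maximizes (data utility value) minus (expected breach damage).
def utility_value(k, base_value=100_000.0):
    """Monetary value of the released data; stronger anonymization (larger k) loses value."""
    return base_value / k

def expected_damage(k, damage_per_breach=500_000.0):
    """Expected breach cost; assumes re-identification risk falls off faster than utility."""
    risk = 1.0 / (k * k)
    return risk * damage_per_breach

net_value = {k: utility_value(k) - expected_damage(k) for k in range(2, 51)}
best_k = max(net_value, key=net_value.get)
print("best k:", best_k, "net value:", round(net_value[best_k], 2))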
    ABSTRACT: Most of the information in Electronic Health Records (EHRs) is represented in free textual form. Practitioners searching EHRs need to phrase their queries carefully, as the record might use synonyms or other related words. In this paper we show that an automatic query expansion method based on the Unified Medical Language System (UMLS) Metathesaurus improves the results of a robust baseline when searching EHRs. The method uses a graph representation of the lexical units, concepts and relations in the UMLS Metathesaurus. It is based on random walks over the graph, which start on the query terms. Random walks are a well-studied technique on both Web and knowledge-base graphs. Our experiments over the TREC Medical Record track show improvements in both the 2011 and 2012 datasets over a strong baseline. Our analysis shows that the success of our method is due to the automatic expansion of the query with extra terms, even when they are not directly related in the UMLS Metathesaurus. The terms added in the expansion go beyond simple synonyms, and also add other kinds of topically related terms. Expansion of queries using related terms in the UMLS Metathesaurus beyond synonymy is an effective way to overcome the gap between query and document vocabularies when searching for patient cohorts.
    Journal of Biomedical Informatics 04/2014;
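A small sketch of query expansion by random walks over a concept graph, with networkx's personalized PageRank standing in for the random-walk machinery and a toy graph standing in for the UMLS Metathesaurus.

# Small sketch of random-walk query expansion: run personalized PageRank over a concept
# graph, seeded at the query terms, and add the top-ranked extra concepts to the query.
import networkx as nx

graph = nx.Graph()
graph.add_edges_from([
    ("atrial fibrillation", "anticoagulant"),
    ("anticoagulant", "warfarin"),
    ("atrial fibrillation", "stroke"),
    ("stroke", "thrombus"),
    ("warfarin", "INR"),
])

query_terms = ["atrial fibrillation"]
personalization = {n: (1.0 if n in query_terms else 0.0) for n in graph.nodes}
ranks = nx.pagerank(graph, alpha=0.85, personalization=personalization)

expansion = [term for term, _ in sorted(ranks.items(), key=lambda kv: -kv[1])
             if term not in query_terms][:3]
print(query_terms + expansion)   # expanded query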
    ABSTRACT: Medical documentation is a time-consuming task and there is a growing number of documentation requirements. In order to improve documentation, harmonization and standardization based on existing forms and medical concepts are needed. Systematic analysis of forms can contribute to standardization building upon new methods for automated comparison of forms. Objectives of this research are quantification and comparison of data elements for breast and prostate cancer to discover similarities, differences and reuse potential between documentation sets. In addition, common data elements for each entity should be identified by automated comparison of forms. A collection of 57 forms regarding prostate and breast cancer from quality management, registries, clinical documentation of two university hospitals (Erlangen, Münster), research datasets, certification requirements and trial documentation were transformed into the Operational Data Model (ODM). These ODM-files were semantically enriched with concept codes and analyzed with the compareODM algorithm. Comparison results were aggregated and lists of common concepts were generated. Grid images, dendrograms and spider charts were used for illustration. Overall, 1008 data elements for prostate cancer and 1232 data elements for breast cancer were analyzed. Average routine documentation consists of 390 data elements per disease entity and site. Comparisons of forms identified up to 20 comparable data elements in cancer conference forms from both hospitals. Urology forms contain up to 53 comparable data elements with quality management and up to 21 with registry forms. Urology documentation of both hospitals contains up to 34 comparable items with international common data elements. Clinical documentation sets share up to 24 comparable data elements with trial documentation. Within clinical documentation administrative items are most common comparable items. Selected common medical concepts are contained in up to 16 forms. The amount of documentation for cancer patients is enormous. There is an urgent need for standardized structured single source documentation. Semantic annotation is time-consuming, but enables automated comparison between different form types, hospital sites and even languages. This approach can help to identify common data elements in medical documentation. Standardization of forms and building up forms on the basis of coding systems is desirable. Several comparable data elements within the analyzed forms demonstrate the harmonization potential, which would enable better data reuse. Identifying common data elements in medical forms from different settings with systematic and automated form comparison is feasible.
    Journal of Biomedical Informatics 04/2014;
    ABSTRACT: Poor device design that fails to adequately account for user needs, cognition, and behavior is often responsible for use errors resulting in adverse events. This poor device design is also often latent, and could be responsible for "No Fault Found" (NFF) reporting, in which medical devices sent for repair by clinical users are found to be operating as intended. Unresolved NFF reports may contribute to incident under reporting, clinical user frustration, and biomedical engineering technologist inefficacy. This study uses human factors engineering methods to investigate the relationship between NFF reporting frequency and device usability. An analysis of medical equipment maintenance data was conducted to identify devices with a high NFF reporting frequency. Subsequently, semi-structured interviews and heuristic evaluations were performed in order to identify potential usability issues. Finally, usability testing was conducted in order to validate that latent usability related design faults result in a higher frequency of NFF reporting. The analysis of medical equipment maintenance data identified six devices with a high NFF reporting frequency. Semi-structured interviews, heuristic evaluations and usability testing revealed that usability issues caused a significant portion of the NFF reports. Other factors suspected to contribute to increased NFF reporting include accessory issues, intermittent faults and environmental issues. Usability testing conducted on three of the devices revealed 23 latent usability related design faults. These findings demonstrate that latent usability related design faults manifest themselves as an increase in NFF reporting and that devices containing usability related design faults can be identified through an analysis of medical equipment maintenance data.
    Journal of Biomedical Informatics 04/2014;
    ABSTRACT: Interpretation of the cardiotocogram (CTG) is a difficult task, since its evaluation is complicated by great inter- and intra-individual variability. Previous studies have predominantly analyzed clinicians' agreement on CTG evaluation based on quantitative measures (e.g. the kappa coefficient) that do not offer any insight into clinical decision making. In this paper we aim to examine the agreement on evaluation in detail and provide a data-driven analysis of clinical evaluation. For this study, nine obstetricians provided clinical evaluations of 634 CTG recordings (each ca. 60min long). We studied the agreement on evaluation and its dependence on the increasing number of clinicians involved in the final decision. We showed that, despite the large number of clinicians, agreement on CTG evaluation is difficult to reach. The main reason is the inherent inter- and intra-observer variability of CTG evaluation. A latent class model provides a better and more natural way to aggregate the CTG evaluations than majority voting, especially for a larger number of clinicians. Significant improvement was reached in particular for the pathological evaluation, giving new insight into the process of CTG evaluation. Further, the analysis of the latent class model revealed that clinicians unconsciously use four classes when evaluating CTG recordings, despite the fact that the clinical evaluation was based on FIGO guidelines, where three classes are defined.
    Journal of Biomedical Informatics 04/2014;
    ABSTRACT: A myriad of new tools and algorithms have been developed to help public health professionals analyze and visualize the complex data used in infectious disease control. To better understand approaches to meet these users' information needs, we conducted a systematic literature review focused on the landscape of infectious disease visualization tools for public health professionals, with a special emphasis on geographic information systems (GIS), molecular epidemiology, and social network analysis. The objectives of this review are to: (1) identify public health user needs and preferences for infectious disease information visualization tools; (2) identify existing infectious disease information visualization tools and characterize their architecture and features; (3) identify commonalities among approaches applied to different data types; and (4) describe tool usability evaluation efforts and barriers to the adoption of such tools. We identified articles published in English from January 1, 1980 to June 30, 2013 from five bibliographic databases. Articles with a primary focus on infectious disease visualization tools, needs of public health users, or usability of information visualizations were included in the review. A total of 88 articles met our inclusion criteria. Users were found to have diverse needs, preferences and uses for infectious disease visualization tools, and the existing tools are correspondingly diverse. The architecture of the tools was inconsistently described, and few tools in the review discussed the incorporation of usability studies or plans for dissemination. Many studies identified concerns regarding data sharing, confidentiality and quality. Existing tools offer a range of features and functions that allow users to explore, analyze, and visualize their data, but the tools are often for siloed applications. Commonly cited barriers to widespread adoption included lack of organizational support, access issues, and misconceptions about tool use. As the volume and complexity of infectious disease data increases, public health professionals must synthesize highly disparate data to facilitate communication with the public and inform decisions regarding measures to protect the public's health. Our review identified several themes: consideration of users' needs, preferences, and computer literacy; integration of tools into routine workflow; complications associated with understanding and use of visualizations; and the role of user trust and organizational support in the adoption of these tools. Interoperability also emerged as a prominent theme, highlighting challenges associated with the increasingly collaborative and interdisciplinary nature of infectious disease control and prevention. Future work should address methods for representing uncertainty and missing data to avoid misleading users as well as strategies to minimize cognitive overload. Funding for this study was provided by the NIH (Grant# 1R01LM011180-01A1).
    Journal of Biomedical Informatics 04/2014;
    ABSTRACT: Create an automated algorithm for predicting elderly patients' medication-related risks for readmission and validate it by comparing results with a manual analysis of the same patient population. Outcome and Assessment Information Set (OASIS) and medication data were reused from a previous, manual study of 911 patients from 15 Medicare-certified home health care agencies. The medication data was converted into standardized drug codes using APIs managed by the National Library of Medicine (NLM), and then integrated in an automated algorithm that calculates patients' high risk medication regime scores (HRMRs). A comparison of the results between algorithm and manual process was conducted to determine how frequently the HRMR scores were derived which are predictive of readmission. HRMR scores are composed of polypharmacy (number of drugs), Potentially Inappropriate Medications (PIM) (drugs risky to the elderly), and Medication Regimen Complexity Index (MRCI) (complex dose forms, instructions or administration). The algorithm produced polypharmacy, PIM, and MRCI scores that matched with 99, 87 and 99 percent of the scores, respectively, from the manual analysis. Imperfect match rates resulted from discrepancies in how drugs were classified and coded by the manual analysis vs. the automated algorithm. HRMR rules lack clarity, resulting in clinical judgments for manual coding that were difficult to replicate in the automated analysis. The high comparison rates for the three measures suggest that an automated clinical tool could use patients' medication records to predict their risks of avoidable readmissions.
    Journal of Biomedical Informatics 04/2014;
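A minimal sketch of assembling an HRMR-style score from the three components named above. The PIM list, complexity weights, and medication record fields are invented placeholders; the actual algorithm first standardizes drug codes through NLM terminology APIs.

# Minimal sketch of an HRMR-style score from the three components in the abstract:
# polypharmacy, potentially inappropriate medications (PIM), and a medication regimen
# complexity (MRCI) term. All lists and weights below are placeholders.
PIM_LIST = {"diazepam", "amitriptyline", "glyburide"}              # stand-in PIM list
FORM_COMPLEXITY = {"tablet": 1, "inhaler": 3, "injection": 4}      # invented MRCI-like weights

def hrmr_components(medications):
    """medications: list of dicts like {'name': ..., 'form': ..., 'doses_per_day': ...}."""
    polypharmacy = len(medications)
    pim = sum(1 for m in medications if m["name"].lower() in PIM_LIST)
    mrci = sum(FORM_COMPLEXITY.get(m["form"], 2) + m["doses_per_day"] for m in medications)
    return {"polypharmacy": polypharmacy, "pim": pim, "mrci": mrci}

regimen = [
    {"name": "Diazepam", "form": "tablet", "doses_per_day": 2},
    {"name": "Metformin", "form": "tablet", "doses_per_day": 2},
    {"name": "Insulin glargine", "form": "injection", "doses_per_day": 1},
]
print(hrmr_components(regimen))   # component scores for this toy regimen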
    ABSTRACT: Advanced Cardiac Life Support (ACLS) is a series of team-based, sequential and time-constrained interventions, requiring effective communication and coordination of activities that are performed by the care provider team on a patient undergoing cardiac arrest or respiratory failure. State-of-the-art ACLS training is conducted in a face-to-face environment under expert supervision and suffers from several drawbacks, including conflicting care provider schedules and the high cost of training equipment. The major objective of this study is to describe the design, implementation, and evaluation of a novel approach for delivering ACLS training to care providers using the proposed virtual reality simulator, which can overcome the challenges and drawbacks imposed by the traditional face-to-face training method. We compare the efficacy and performance outcomes associated with traditional ACLS training with the proposed novel approach of using a virtual reality (VR) based ACLS training simulator. One hundred and forty-eight (148) ACLS-certified clinicians, translating into 26 care provider teams, were enrolled for this study. Each team was randomly assigned to one of three treatment groups: control (traditional ACLS training), persuasive (VR ACLS training with comprehensive feedback components), or minimally persuasive (VR ACLS training with limited feedback components). The teams were tested across two different ACLS procedures that vary in the degree of task complexity: ventricular fibrillation or tachycardia (VFib/VTach) and pulseless electrical activity (PEA). The difference in performance between the control and persuasive groups was not statistically significant (P = .37 for PEA and P = .1 for VFib/VTach). However, the difference in performance between the control and minimally persuasive groups was significant (P = .05 for PEA and P = .02 for VFib/VTach). The pre-post comparison of the groups' performances showed that the control (P = .017 for PEA, P = .01 for VFib/VTach) and persuasive (P = .02 for PEA, P = .048 for VFib/VTach) groups improved their performances significantly, whereas the minimally persuasive group did not (P = .45 for PEA, P = .46 for VFib/VTach). Results also suggest that the benefit of persuasiveness is constrained by the potentially interruptive nature of these features. Our results indicate that VR-based ACLS training with proper feedback components can provide a learning experience similar to face-to-face training, and therefore could serve as a more easily accessed supplementary tool to traditional ACLS training. Our findings also suggest that the degree of persuasive features in VR environments has to be designed considering the interruptive nature of the feedback elements.
    Journal of Biomedical Informatics 04/2014;
    ABSTRACT: While the study of privacy preserving data publishing has drawn a lot of interest, some recent work has shown that existing mechanisms do not limit all inferences about individuals. This paper is a positive note in response to this finding. We point out that not all inference attacks should be countered, in contrast to all existing works known to us, and based on this we propose a model called SPLU. This model protects sensitive information, by which we refer to answers for aggregate queries with small sums, while queries with large sums are answered with higher accuracy. Using SPLU, we introduce a sanitization algorithm to protect data while maintaining high data utility for queries with large sums. Empirical results show that our method behaves as desired.
    Journal of Biomedical Informatics 04/2014;
    ABSTRACT: Electronic health record (EHR) data show promise for deriving new ways of modeling human disease states. Although EHR researchers often use numerical values of laboratory tests as features in disease models, a great deal of information is contained in the context within which a laboratory test is taken. For example, the same numerical value of a creatinine test has different interpretation for a chronic kidney disease patient and a patient with acute kidney injury. We study whether EHR research studies are subject to biased results and interpretations if laboratory measurements taken in different contexts are not explicitly separated. We show that the context of a laboratory test measurement can often be captured by the way the test is measured through time. We perform three tasks to study the properties of these temporal measurement patterns. In the first task, we confirm that laboratory test measurement patterns provide additional information to the stand-alone numerical value. The second task identifies three measurement pattern motifs across a set of 70 laboratory tests performed for over 14,000 patients. Of these, one motif exhibits properties that can lead to biased research results. In the third task, we demonstrate the potential for biased results on a specific example. We conduct an association study of lipase test values to acute pancreatitis. We observe a diluted signal when using only a lipase value threshold, whereas the full association is recovered when properly accounting for lipase measurements in different contexts (leveraging the lipase measurement patterns to separate the contexts). Aggregating EHR data without separating distinct laboratory test measurement patterns can intermix patients with different diseases, leading to the confounding of signals in large-scale EHR analyses. This paper presents a methodology for leveraging measurement frequency to identify and reduce laboratory test biases.
    Journal of Biomedical Informatics 04/2014;
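A brief sketch of the underlying idea of separating laboratory results by their measurement pattern before testing an association, here using nothing more than per-patient measurement counts as the context split. File names, column names, and thresholds are assumptions for illustration, not the paper's motif-detection method.

# Brief sketch of context separation by measurement pattern: split patients by how often
# a lab test was measured before testing an association, instead of pooling all values.
import pandas as pd

labs = pd.read_csv("lipase_measurements.csv")         # hypothetical: patient_id, time, value
outcomes = pd.read_csv("pancreatitis_labels.csv")     # hypothetical: patient_id, pancreatitis

per_patient = labs.groupby("patient_id").agg(
    n_measurements=("value", "size"),
    max_lipase=("value", "max"),
).reset_index()
per_patient["context"] = per_patient["n_measurements"].apply(
    lambda n: "repeated" if n >= 3 else "one-off")      # crude measurement-pattern split

merged = per_patient.merge(outcomes, on="patient_id")
for context, group in merged.groupby("context"):
    elevated = group["max_lipase"] > 180                # illustrative lipase threshold
    rate = group.loc[elevated, "pancreatitis"].mean()
    print(context, round(rate, 3))                      # outcome rate among elevated results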
    ABSTRACT: Background: The ability to predict acuity (patients' care needs) would provide a powerful tool for health care managers to allocate resources. Such estimations and predictions for the care process can be produced from the vast amounts of healthcare data using information technology and computational intelligence techniques. Tactical decision-making and resource allocation may also be supported with different mathematical optimization models. Methods: This study was conducted with a data set comprising electronic nursing narratives and the associated Oulu Patient Classification (OPCq) acuity. A mathematical model for the automated assignment of patient acuity scores was utilized and evaluated with the pre-processed data from 23,528 electronic patient records. The methods to predict patients' acuity were based on linguistic pre-processing, vector-space text modeling, and regularized least-squares regression. Results: The experimental results show that it is possible to obtain accurate predictions about patient acuity scores for the coming day based on the assigned scores and nursing notes from the previous day. Making same-day predictions leads to even better results, as access to the nursing notes for the same day boosts the predictive performance. Furthermore, textual nursing notes allow for more accurate predictions than previous acuity scores. The best results are achieved by combining both of these information sources. The developed model achieves a concordance index of 0.821 when predicting the patient acuity scores for the following day, given the scores and text recorded on the previous day. Conclusions: By applying language technology to electronic patient documents it is possible to accurately predict the acuity scores of the coming day based on the previous day's assigned scores and nursing notes.
    Journal of Biomedical Informatics 04/2014;
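A compact sketch of the modeling pipeline the abstract describes, with TF-IDF as the vector-space text model and ridge regression as the regularized least-squares learner. The file and column names are assumptions, and the study reports best results when the previous day's acuity scores are combined with the note text.

# Compact sketch of the pipeline in the abstract: represent nursing notes in a vector
# space and predict next-day acuity with regularized least squares (ridge regression).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

notes = pd.read_csv("nursing_notes.csv")   # hypothetical: note_text, next_day_score

vectorizer = TfidfVectorizer(max_features=20_000)
X_text = vectorizer.fit_transform(notes["note_text"])

X_train, X_test, y_train, y_test = train_test_split(
    X_text, notes["next_day_score"], test_size=0.2, random_state=0)

model = Ridge(alpha=1.0).fit(X_train, y_train)
print("R^2 on held-out notes:", round(model.score(X_test, y_test), 3))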
    ABSTRACT: The anonymization of health data streams is important to protect these data against potential privacy breaches. A large number of research studies aiming at offering privacy in the context of data streams has been recently conducted. However, the techniques that have been proposed in these studies generate a significant delay during the anonymization process, since they concentrate on applying existing privacy models (e.g., k-anonymity and l-diversity) to batches of data extracted from data streams in a period of time. In this paper, we present delay-free anonymization, a framework for preserving the privacy of electronic health data streams. Unlike existing works, our method does not generate an accumulation delay, since input streams are anonymized immediately with counterfeit values. We further devise late validation for increasing the data utility of the anonymization results and managing the counterfeit values. Through experiments, we show the efficiency and effectiveness of the proposed method for the real-time release of data streams.
    Journal of Biomedical Informatics 04/2014;
    ABSTRACT: Publications are a key data source for investigator profiles and research networking systems. We developed ReCiter, an algorithm that automatically extracts bibliographies from PubMed using institutional information about the target investigators. ReCiter executes a broad query against PubMed, groups the results into clusters that appear to constitute distinct author identities and selects the cluster that best matches the target investigator. Using information about investigators from one of our institutions, we compared ReCiter results to queries based on author name and institution and to citations extracted manually from the Scopus database. Five judges created a gold standard using citations of a random sample of 200 investigators. About half of the 10,471 potential investigators had no matching citations in PubMed, and about 45% had fewer than 70 citations. Interrater agreement (Fleiss' kappa) for the gold standard was 0.81. Scopus achieved the best recall (sensitivity) of 0.81, while name-based queries had 0.78 and ReCiter had 0.69. ReCiter attained the best precision (positive predictive value) of 0.93, while Scopus had 0.85 and name-based queries had 0.31. ReCiter accesses the most current citation data, uses limited computational resources and minimizes manual entry by investigators. Generation of bibliographies using name-based queries will not yield high accuracy. Proprietary databases can perform well but require manual effort. Automated generation with higher recall is possible but requires additional knowledge about investigators.
    Journal of Biomedical Informatics 03/2014;
    ABSTRACT: Gene set enrichment analysis (GSEA) annotates gene microarray data with functional information from the biomedical literature to improve gene-disease association prediction. We hypothesize that supplementing GSEA with comprehensive gene function catalogs built automatically using information extracted from the scientific literature will significantly enhance GSEA prediction quality. Gold standard gene sets for breast cancer (BrCa) and colorectal cancer (CRC) were derived from the literature. Two gene function catalogs (CMeSH and CUMLS) were automatically generated: (1) by using Entrez Gene to associate all recorded human genes with PubMed article IDs, and (2) by associating the genes mentioned in each PubMed article with the article's MeSH terms (in CMeSH) and extracted UMLS concepts (in CUMLS). Microarray data from the Gene Expression Omnibus for BrCa and CRC was then annotated using CMeSH and CUMLS and, for comparison, also with several pre-existing catalogs (C2, C4 and C5 from the Molecular Signatures Database). Ranking was done using a standard GSEA implementation (GSEA-p). Gene function predictions for enriched array data were evaluated against the gold standard by measuring the area under the receiver operating characteristic curve (AUC). Comparison of rankings using the literature enrichment catalogs, the pre-existing catalogs and five randomly generated catalogs shows that the literature-derived enrichment catalogs are more effective. The AUC for BrCa using the unenriched gene expression dataset was 0.43, increasing to 0.89 after gene set enrichment with CUMLS. The AUC for CRC using the unenriched gene expression dataset was 0.54, increasing to 0.9 after enrichment with CMeSH. C2 increased AUC (BrCa 0.76, CRC 0.71) but C4 and C5 performed poorly (between 0.35 and 0.5). The randomly generated catalogs also performed poorly, equivalent to random guessing. Gene set enrichment significantly improved prediction of gene-disease association. Selection of the enrichment catalog had a substantial effect on prediction accuracy. The literature-based catalogs performed better than the MSigDB catalogs, possibly because they are more recent. Catalogs generated automatically from the literature can be kept up to date. Prediction of gene-disease association is a fundamental task in biomedical research. GSEA provides a promising method when using literature-based enrichment catalogs.
    Journal of Biomedical Informatics 03/2014;
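A short sketch of the evaluation step reported above: scoring a ranked list of gene-disease association predictions against a gold-standard gene set with the area under the ROC curve. Gene names and scores are invented for illustration.

# Short sketch of the evaluation in the abstract: compare ranked gene-disease association
# scores against a gold-standard gene set using AUC.
from sklearn.metrics import roc_auc_score

gold_standard = {"BRCA1", "BRCA2", "TP53"}                    # genes in the gold-standard set
ranked_scores = {                                             # enrichment-derived scores
    "BRCA1": 0.92, "TP53": 0.80, "EGFR": 0.55,
    "BRCA2": 0.47, "GAPDH": 0.10, "ACTB": 0.05,
}

labels = [1 if gene in gold_standard else 0 for gene in ranked_scores]
scores = list(ranked_scores.values())
print("AUC:", round(roc_auc_score(labels, scores), 3))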
    ABSTRACT: Proliferation of health information technologies creates opportunities to improve clinical and public health, including high quality, safer care and lower costs. To maximize such potential benefits, health information technologies must readily and reliably exchange information with other systems. However, evidence from public health surveillance programs in two states suggests that operational clinical information systems often fail to use available standards, a barrier to semantic interoperability. Furthermore, analysis of existing policies incentivizing semantic interoperability suggests they have limited impact and are fragmented. In this essay, we discuss three approaches for increasing semantic interoperability to support national goals for using health information technologies. A clear, comprehensive strategy requiring collaborative efforts by clinical and public health stakeholders is suggested as a guide for the long road towards better population health data and outcomes.
    Journal of Biomedical Informatics 03/2014;

Related Journals