Journal of Biomedical Informatics (J Biomed Inform)

Publisher: Elsevier

Journal description

The Journal of Biomedical Informatics (formerly Computers and Biomedical Research) has been redesigned to reflect a commitment to high-quality original research papers and reviews in the area of biomedical informatics. Although published articles are motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, imaging, and bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices, and formal evaluations of completed systems, including clinical trials of information technologies, would generally be more suitable for publication in other venues. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report.

Current impact factor: 2.19

Impact Factor Rankings

2015 Impact Factor Available summer 2016
2014 Impact Factor 2.194
2013 Impact Factor 2.482
2012 Impact Factor 2.131
2011 Impact Factor 1.792
2010 Impact Factor 1.719
2009 Impact Factor 2.432
2008 Impact Factor 1.924
2007 Impact Factor 2
2006 Impact Factor 2.346
2005 Impact Factor 2.388
2004 Impact Factor 1.013
2003 Impact Factor 0.855
2002 Impact Factor 0.862

Impact factor over time


Additional details

5-year impact 3.44
Cited half-life 5.20
Immediacy index 0.61
Eigenfactor 0.01
Article influence 1.14
Website Journal of Biomedical Informatics website
Other titles Journal of biomedical informatics (Online)
ISSN 1532-0480
OCLC 45147742
Material type Document, Periodical, Internet resource
Document type Internet Resource, Computer File, Journal / Magazine / Newspaper

Publisher details


  • Pre-print
    • Author can archive a pre-print version
  • Post-print
    • Author can archive a post-print version
  • Conditions
    • Author's pre-print on any website, including arXiv and RePEc
    • Author's post-print on author's personal website immediately
    • Author's post-print on open access repository after an embargo period of between 12 months and 48 months
    • Permitted deposit due to Funding Body, Institutional and Governmental policy or mandate, may be required to comply with embargo periods of 12 months to 48 months
    • Author's post-print may be used to update arXiv and RePEc
    • Publisher's version/PDF cannot be used
    • Must link to publisher version with DOI
    • Author's post-print must be released with a Creative Commons Attribution Non-Commercial No Derivatives License
    • Publisher last reviewed on 03/06/2015
  • Classification
    • green

Publications in this journal

  • Fan-Shu Chen · Zhen-Ran Jiang
    ABSTRACT: Predicting the Anatomical Therapeutic Chemical (ATC) codes of drugs is of vital importance for drug classification and repositioning, yet discovering new associations between drugs and ATC codes remains difficult. We propose a novel method named drug-domain hybrid (dD-Hybrid), which incorporates drug-domain interaction network information into prediction models to predict drugs' ATC codes. It is based on the assumption that drugs interacting with the same domain tend to share therapeutic effects. The results demonstrate that dD-Hybrid performs comparably to other methods on the gold-standard dataset. Further, several newly predicted drug-ATC pairs have been verified experimentally, offering an effective way to repurpose existing drugs.
    Journal of Biomedical Informatics 10/2015; DOI:10.1016/j.jbi.2015.09.016
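The core assumption above — drugs sharing an interaction domain tend to share therapeutic effects — can be illustrated with a minimal guilt-by-association sketch. This is not the authors' dD-Hybrid model; all drug, domain, and code names below are hypothetical toy data.

```python
from collections import defaultdict

def predict_atc(drug, drug_domains, known_atc):
    # Pool the ATC codes of every drug that interacts with at least one
    # of the same domains, and rank candidate codes by vote count.
    votes = defaultdict(int)
    for other, codes in known_atc.items():
        if other != drug and drug_domains.get(drug, set()) & drug_domains.get(other, set()):
            for code in codes:
                votes[code] += 1
    return sorted(votes, key=votes.get, reverse=True)

# toy data (hypothetical drugs, domains, and codes)
domains = {"drugA": {"kinase"}, "drugB": {"kinase"}, "drugC": {"gpcr"}}
atc = {"drugB": {"L01"}, "drugC": {"C07"}}
candidates = predict_atc("drugA", domains, atc)
```

Here `drugA` shares the `kinase` domain only with `drugB`, so it inherits `drugB`'s code as its top candidate.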
  •
    ABSTRACT: Pharmacovigilance (PV) is defined by the World Health Organization as the science and activities related to the detection, assessment, understanding and prevention of adverse effects or any other drug-related problem. An essential aspect of PV is acquiring knowledge about Drug-Drug Interactions (DDIs). The shared tasks on DDI extraction organized in 2011 and 2013 have pointed out the importance of this issue and provided benchmarks for Drug Name Recognition, DDI extraction, and DDI classification. In this paper, we present our text mining systems for these tasks and evaluate their results on the DDI-Extraction benchmarks. Our systems rely on machine learning techniques using both feature-based and kernel-based methods. The results obtained for drug name recognition are encouraging. For DDI extraction, our hybrid system combining a feature-based method and a kernel-based method was ranked second in the DDI-Extraction-2011 challenge, and our two-step system for DDI detection and classification was ranked first in the DDI-Extraction-2013 task at SemEval. We discuss our methods and results and give pointers to future work.
    Journal of Biomedical Informatics 10/2015; DOI:10.1016/j.jbi.2015.09.015
  •
    ABSTRACT: The 2014 i2b2/UTHealth Natural Language Processing (NLP) shared task featured a new longitudinal corpus of 1,304 records representing 296 diabetic patients. The corpus contains three cohorts: patients who have a diagnosis of coronary artery disease (CAD) in their first record and continue to have it in subsequent records; patients who do not have a diagnosis of CAD in the first record but develop it by the last record; and patients who do not have a diagnosis of CAD in any record. This paper details the process used to select records for this corpus and provides an overview of novel research uses for it. This is the only annotated corpus of longitudinal clinical narratives currently available to the general research community.
    Journal of Biomedical Informatics 10/2015; DOI:10.1016/j.jbi.2015.09.018
  •
    ABSTRACT: Utilizing external collections to improve retrieval performance is a challenging research problem because the various test collections were created for different purposes. Improving medical information retrieval has also gained much attention as various types of medical documents have become available to researchers in machine-processable formats. In this paper, we propose an effective method for utilizing external collections based on the pseudo-relevance feedback approach. Our method incorporates the structure of external collections when estimating the individual components of the final feedback model. Extensive experiments on three medical collections (TREC CDS, CLEF eHealth, and OHSUMED) were performed, and the results were compared with a representative expansion approach utilizing external collections to show the superiority of our method.
    Journal of Biomedical Informatics 10/2015; DOI:10.1016/j.jbi.2015.09.017
  •
    ABSTRACT: Liver cancer is the sixth most frequently diagnosed cancer, and Hepatocellular Carcinoma (HCC) in particular represents more than 90% of primary liver cancers. Clinicians assess each patient's treatment on the basis of evidence-based medicine, which may not always apply to a specific patient given the biological variability among individuals. Over the years, and for the particular case of Hepatocellular Carcinoma, some research studies have developed strategies for assisting clinicians in decision making, using computational methods (e.g. machine learning techniques) to extract knowledge from clinical data. However, these studies have some limitations that have not yet been addressed: some do not focus entirely on Hepatocellular Carcinoma patients, others have strict application boundaries, and none considers the heterogeneity between patients or the presence of missing data, a common drawback in healthcare contexts. In this work, a real, complex Hepatocellular Carcinoma database composed of heterogeneous clinical features is studied. We propose a new cluster-based oversampling approach, robust to small and imbalanced datasets, which accounts for the heterogeneity of patients with Hepatocellular Carcinoma. The preprocessing procedures of this work are based on data imputation considering appropriate distance metrics for both heterogeneous and missing data (HEOM) and on clustering studies to assess the underlying patient groups in the studied dataset (K-means). The final approach is applied in order to diminish the impact of underlying patient profiles with reduced sizes on survival prediction. It is based on K-means clustering and the SMOTE algorithm to build a representative dataset and use it as a training set for different machine learning procedures (logistic regression and neural networks). The results are evaluated in terms of survival prediction and compared with baseline approaches that do not consider clustering and/or oversampling, using the Friedman rank test. Our proposed methodology coupled with neural networks outperformed all others, suggesting an improvement over the classical approaches currently used in Hepatocellular Carcinoma prediction models.
    Journal of Biomedical Informatics 10/2015; DOI:10.1016/j.jbi.2015.09.012
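The oversampling step above can be sketched with a simplified, pure-Python stand-in for SMOTE: within one cluster, synthetic points are generated by linear interpolation between a minority sample and its nearest minority neighbour. This is only an illustration of the interpolation idea, not the paper's full cluster-based pipeline.

```python
import random

def smote_like(minority, n_new, seed=0):
    # Interpolate between a random minority sample and its nearest
    # minority neighbour (squared Euclidean distance) -- a simplified,
    # pure-Python stand-in for SMOTE's synthetic sample generation.
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        b = min((p for p in minority if p is not a),
                key=lambda p: sum((x - y) ** 2 for x, y in zip(a, p)))
        t = rng.random()  # random position along the segment a -> b
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic

cluster = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # toy minority cluster
new_points = smote_like(cluster, 4)
```

Every synthetic point lies on a segment between two real cluster members, so the oversampled set stays inside the cluster's convex hull.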
  •
    ABSTRACT: Objective: Clinical pathways translate the best available evidence into practice, indicating the most widely applicable order of treatment interventions for particular treatment goals. We propose a practice-based clinical pathway development process and a data-driven methodology for extracting common clinical pathways from electronic health record (EHR) data that is patient-centered, consistent with clinical workflow, and facilitates evidence-based care. Materials and methods: Visit data of 1,576 chronic kidney disease (CKD) patients who developed acute kidney injury (AKI) from 2009 to 2013 were extracted from the EHR. We model each patient's multi-dimensional clinical records as one-dimensional sequences using novel constructs designed to capture information on each visit's purpose, procedures, medications and diagnoses. Analysis and clustering of visit sequences identify distinct types of patient subgroups. Characterizing visit sequences as Markov chains, we extract significant transitions and visualize them as clinical pathways across subgroups. Results: We identified 31 patient subgroups whose extracted clinical pathways provide insights into how patients' conditions and medication prescriptions may progress over time. We identify pathways that show typical disease progression, practices that are consistent with guidelines, and sustainable improvements in patients' health conditions. Visualization of pathways depicts the likelihood and direction of disease progression under varied contexts. Discussion and conclusions: The accuracy of EHR data and the diversity of patients' conditions and practice patterns are critical challenges in learning insightful practice-based clinical pathways. Learning and visualizing clinical pathways from actual practice data captured in the EHR may facilitate efficient practice review by healthcare providers and support patient engagement in shared decision making.
    Journal of Biomedical Informatics 09/2015; DOI:10.1016/j.jbi.2015.09.009
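The Markov-chain characterization above boils down to estimating transition probabilities between visit types from one-dimensional visit sequences. A minimal sketch, with hypothetical visit codes:

```python
from collections import Counter, defaultdict

def transition_probs(sequences):
    # Count first-order transitions across all visit sequences,
    # then normalise each row into a probability distribution.
    counts = defaultdict(Counter)
    for seq in sequences:
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1
    return {src: {dst: n / sum(row.values()) for dst, n in row.items()}
            for src, row in counts.items()}

# toy visit sequences (hypothetical visit-type codes)
seqs = [["clinic", "lab", "clinic"], ["clinic", "lab", "dialysis"]]
probs = transition_probs(seqs)
```

Transitions whose probability exceeds a chosen threshold would then be drawn as edges in the visualized pathway.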
  •
    ABSTRACT: For the 2014 i2b2/UTHealth de-identification challenge, we introduced a new non-parametric Bayesian hidden Markov model using a Dirichlet Process (HMM-DP). The model aims to reduce task-specific feature engineering and to generalize well to new data. For the challenge, we developed a variational method to learn the model and an efficient approximation algorithm for prediction. To accommodate out-of-vocabulary words, we designed a number of feature functions to model such words. The results show the model is capable of using local context cues to make correct predictions without manual feature engineering, and it performs as accurately as state-of-the-art conditional random field models in a number of categories. To incorporate long-range and cross-document context cues, we developed a skip-chain conditional random field model to align the results produced by HMM-DP, which further improved performance.
    Journal of Biomedical Informatics 09/2015; DOI:10.1016/j.jbi.2015.09.004
  •
    ABSTRACT: Objective: With the ARX Data Anonymization Tool, structured biomedical data can be de-identified using syntactic privacy models, such as k-anonymity. Data is transformed with two methods: (a) generalization of attribute values, followed by (b) suppression of data records. The former method results in data that is well suited for analyses by epidemiologists, while the latter method significantly reduces loss of information. Our tool uses an optimal anonymization algorithm that maximizes output utility according to a given measure. To achieve scalability, existing optimal anonymization algorithms exclude parts of the search space by predicting the outcome of data transformations regarding privacy and utility without explicitly applying them to the input dataset. These optimizations cannot be used if data is transformed with generalization and suppression. As optimal data utility and scalability are important for anonymizing biomedical data, we had to develop a novel method. Methods: In this article, we first confirm experimentally that combining generalization with suppression significantly increases data utility. Next, we prove that, within this coding model, the outcome of data transformations regarding privacy and utility cannot be predicted. As a consequence, existing algorithms fail to deliver optimal data utility. We confirm this finding experimentally. The limitation of previous work can be overcome at the cost of increased computational complexity. However, scalability is important for anonymizing data with user feedback. Consequently, we identify properties of datasets that may be predicted in our context and propose a novel and efficient algorithm. Finally, we evaluate our solution with multiple datasets and privacy models. Results: This work presents the first thorough investigation of which properties of datasets can be predicted when data is anonymized with generalization and suppression. Our novel approach adapts existing optimization strategies to our context and combines different search methods. The experiments show that our method is able to efficiently solve a broad spectrum of anonymization problems. Conclusion: Our work shows that implementing syntactic privacy models is challenging and that existing algorithms are not well suited for anonymizing data with transformation models that are more complex than generalization alone. As such models have been recommended for use in the biomedical domain, our results are of general relevance for de-identifying structured biomedical data.
    Journal of Biomedical Informatics 09/2015; DOI:10.1016/j.jbi.2015.09.007
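The coding model described above — generalization of quasi-identifiers followed by suppression of records in small equivalence classes — can be sketched in a few lines. This illustrates k-anonymity with the two transformation steps, not ARX's optimal search algorithm; the quasi-identifiers and generalization rule are hypothetical.

```python
from collections import Counter

def anonymize(records, k, generalize):
    # Step (a): generalize each record's quasi-identifier values.
    generalized = [generalize(r) for r in records]
    # Step (b): suppress records whose equivalence class has fewer
    # than k members, so every released record is k-anonymous.
    sizes = Counter(generalized)
    kept = [g for g in generalized if sizes[g] >= k]
    return kept, len(generalized) - len(kept)

# toy quasi-identifiers (age, ZIP); generalize age to its decade
# and ZIP to a 3-digit prefix
records = [(34, "81675"), (37, "81677"), (52, "80331")]
kept, n_suppressed = anonymize(records, 2, lambda r: (r[0] // 10 * 10, r[1][:3]))
```

The third record forms an equivalence class of size one after generalization, so it is suppressed rather than forcing a coarser generalization of the whole dataset.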
  •
    ABSTRACT: Coronary artery disease (CAD) is the leading cause of death both in the UK and worldwide. Detecting related risk factors and tracking their progression over time is of great importance for the early prevention and treatment of CAD. This paper describes an information extraction system that was developed to automatically identify risk factors for heart disease in medical records while the authors participated in the 2014 i2b2/UTHealth NLP Challenge. Our approaches rely on several natural language processing (NLP) techniques, such as machine learning, rule-based methods, and dictionary-based keyword spotting, to cope with the complicated clinical contexts inherent in a wide variety of risk factors. Our system achieved encouraging performance on the challenge test data, with an overall micro-averaged F-measure of 0.915, competitive with the best system (F-measure of 0.927) of this challenge task.
    Journal of Biomedical Informatics 09/2015; DOI:10.1016/j.jbi.2015.09.006
  •
    ABSTRACT: The second track of the 2014 i2b2 challenge asked participants to automatically identify risk factors for heart disease among diabetic patients using natural language processing techniques on clinical notes. This paper describes a rule-based system developed using a combination of regular expressions, concepts from the Unified Medical Language System (UMLS), and freely available resources from the community. With a performance (F1=90.7) that is significantly higher than the median (F1=87.20) and close to that of the top-performing system (F1=92.8), it was the best rule-based system among all submissions to the challenge. This system was also used to evaluate the utility of different terminologies in the UMLS for the challenge task. Of the 155 terminologies in the UMLS, 129 (76.78%) have no representation in the corpus. The Consumer Health Vocabulary had very good coverage of relevant concepts and was the most useful terminology for the challenge task. While segmenting notes into sections and lists has a significant impact on performance, identifying negations and the experiencer of the medical event results in negligible gain.
    Journal of Biomedical Informatics 09/2015; DOI:10.1016/j.jbi.2015.08.025
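The regular-expression component of such a rule-based system can be sketched as a dictionary of compiled cue patterns applied to each note. The cue terms below are a hypothetical miniature dictionary; a real system would derive them from UMLS terminologies such as the Consumer Health Vocabulary.

```python
import re

# hypothetical cue dictionary; a real system would use UMLS-derived terms
RISK_PATTERNS = {
    "hypertension": re.compile(r"\b(hypertension|high blood pressure)\b", re.I),
    "smoker": re.compile(r"\b(smoker|smoking|tobacco use)\b", re.I),
    "diabetes": re.compile(r"\bdiabet(es|ic)\b", re.I),
}

def spot_risk_factors(note):
    # Return every risk factor whose cue pattern matches the note text.
    return {name for name, pat in RISK_PATTERNS.items() if pat.search(note)}

found = spot_risk_factors("Pt is a long-time smoker with poorly controlled diabetes.")
```

Case-insensitive matching and word boundaries keep the patterns from firing inside unrelated tokens.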
  •
    ABSTRACT: Clinical trials are essential for determining whether new interventions are effective. To determine patients' eligibility to enroll in these trials, clinical trial coordinators often perform a manual review of clinical notes in patients' electronic health records. This is a very time-consuming and exhausting task. The process can be expedited if coordinators are directed toward the specific parts of the text that are relevant for eligibility determination. In this study, we describe the creation of a dataset that can be used to evaluate automated methods capable of identifying sentences in a note that are relevant for screening a patient's eligibility for clinical trials. Using this dataset, we also present results for four simple natural language processing methods that can be used to automate this task. We found that this is a challenging task (maximum F-score=26.25), but a promising direction for further research.
    Journal of Biomedical Informatics 09/2015; DOI:10.1016/j.jbi.2015.09.008
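A simple baseline of the kind evaluated above flags sentences containing trial-related cue terms and is scored by F-measure against gold-standard sentence annotations. The cue list is hypothetical, and this is only a generic illustration of the task setup, not one of the paper's four methods.

```python
TRIAL_CUES = ("eligib", "enroll", "consent", "criteria")  # hypothetical cue list

def relevant(sentence):
    # Flag a sentence as screening-relevant if it contains any cue substring.
    s = sentence.lower()
    return any(cue in s for cue in TRIAL_CUES)

def f_score(predicted, gold):
    # F-measure over sets of sentence identifiers.
    tp = len(predicted & gold)
    if not predicted or not gold or not tp:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

With predicted sentences {1, 2, 3} against gold sentences {2, 3, 4}, precision and recall are both 2/3, giving an F-score of 2/3.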
  •
    ABSTRACT: Objectives: Named entity recognition (NER), a sequential labeling task, is one of the fundamental tasks for building clinical natural language processing (NLP) systems. Machine learning (ML) based approaches can achieve good performance, but they often require large amounts of annotated samples, which are expensive to build because domain experts are needed for annotation. Active learning (AL), a sample selection approach integrated with supervised ML, aims to minimize the annotation cost while maximizing the performance of ML-based models. In this study, our goal was to develop and evaluate both existing and new AL methods for a clinical NER task: identifying concepts of medical problems, treatments, and lab tests in clinical notes. Methods: Using the annotated NER corpus from the 2010 i2b2/VA NLP challenge, which contained 349 clinical documents with 20,423 unique sentences, we simulated AL experiments using a number of existing and novel algorithms in three categories: uncertainty-based, diversity-based, and baseline sampling strategies. They were compared with passive learning, which uses random sampling. Learning curves that plot the performance of the NER model against the estimated annotation cost (based on the number of sentences or words in the training set) were generated to evaluate the different active learning methods and passive learning, and the area under the learning curve (ALC) score was computed. Results: Based on the learning curves of F-measure vs. number of sentences, uncertainty sampling algorithms outperformed all other methods in ALC. Most diversity-based methods also performed better than random sampling in ALC. To achieve an F-measure of 0.80, the best uncertainty-sampling method saved 66% of annotations in sentences compared with random sampling. For the learning curves of F-measure vs. number of words, uncertainty sampling methods again outperformed all other methods in ALC. To achieve an F-measure of 0.80, the best uncertainty-based method saved 42% of annotations in words compared with random sampling, whereas the best diversity-based method reduced annotation effort by only 7%. Conclusion: In the simulated setting, AL methods, particularly uncertainty-sampling based approaches, seemed to significantly reduce annotation cost for the clinical NER task. The actual benefit of active learning in clinical NER should be further evaluated in a real-time setting.
    Journal of Biomedical Informatics 09/2015; DOI:10.1016/j.jbi.2015.09.010
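The uncertainty-based strategies above share one core step: ranking unlabeled items by how unsure the current model is and sending the most uncertain ones to annotators. A minimal least-confidence sketch, with hypothetical posteriors standing in for a trained NER model's output:

```python
def least_confidence_batch(unlabeled, predict_proba, batch_size):
    # Rank items by the probability of their most likely label; the
    # lower that probability, the less confident the model, so items
    # are returned in order of increasing confidence.
    scored = sorted(unlabeled, key=lambda x: max(predict_proba(x)))
    return scored[:batch_size]

# toy class posteriors for three sentences (hypothetical model output)
posteriors = {"s1": [0.9, 0.1], "s2": [0.6, 0.4], "s3": [0.55, 0.45]}
batch = least_confidence_batch(list(posteriors), posteriors.get, 2)
```

Each AL round would label the returned batch, retrain the model, and recompute posteriors before selecting the next batch.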
  •
    ABSTRACT: Biomedical knowledge resources (KRs) are mainly expressed in English, and many applications using them suffer from the scarcity of knowledge in non-English languages. The goal of the present work is to take maximum advantage of the lexicons of existing multilingual biomedical KRs to enrich their non-English counterparts. We propose to combine different automatic methods to generate pair-wise language alignments. More specifically, we use two well-known translation methods (GIZA++ and Moses), and we propose a new ad-hoc method specially devised for multilingual KRs. The resulting alignments are then used to transfer semantics between KRs across their languages. Transference quality is ensured by checking the semantic coherence of the generated alignments. Experiments have been carried out over the Spanish, French and German UMLS Metathesaurus counterparts. As a result, the enriched Spanish KR can grow up to 1,514,217 concepts (originally 286,659), the French KR up to 1,104,968 concepts (originally 83,119), and the German KR up to 1,136,020 concepts (originally 86,842).
    Journal of Biomedical Informatics 09/2015; DOI:10.1016/j.jbi.2015.08.026
  •
    ABSTRACT: Background and objective: Risk stratification aims to provide physicians with an accurate assessment of a patient's clinical risk so that an individualized prevention or management strategy can be developed and delivered. Existing risk stratification techniques mainly focus on predicting the overall risk of an individual patient in a supervised manner and, at the cohort level, often offer little insight beyond a flat score-based segmentation of the labeled clinical dataset. To this end, in this paper, we propose a new approach for risk stratification that explores a large volume of electronic health records (EHRs) in an unsupervised fashion. Methods: Along this line, this paper proposes a novel probabilistic topic modeling framework called the probabilistic risk stratification model (PRSM), based on Latent Dirichlet Allocation (LDA). The proposed PRSM recognizes a patient's clinical state as a probabilistic combination of latent sub-profiles, and generates sub-profile-specific risk tiers of patients from their EHRs in a fully unsupervised fashion. The achieved stratification results can readily be interpreted as high, medium and low risk, respectively. In addition, we present an extension of PRSM, called weakly supervised PRSM (WS-PRSM), which incorporates minimal prior information into the model in order to improve risk stratification accuracy and to make our models highly portable to risk stratification tasks for various diseases. Results: We verify the effectiveness of the proposed approach on a clinical dataset containing 3463 coronary heart disease (CHD) patient instances. Both PRSM and WS-PRSM were compared with two established supervised risk stratification algorithms, i.e., logistic regression and support vector machines; the comparison showed the effectiveness of our models in risk stratification of CHD in terms of Area Under the receiver operating characteristic Curve (AUC) analysis. Moreover, WS-PRSM achieves a performance gain of over 2% relative to PRSM on the experimental dataset, demonstrating that incorporating risk scoring knowledge as prior information can improve risk stratification performance. Conclusions: Experimental results reveal that our models achieve competitive performance in risk stratification in comparison with existing supervised approaches. In addition, the unsupervised nature of our models makes them highly portable to risk stratification tasks for various diseases. Moreover, the patient sub-profiles and sub-profile-specific risk tiers generated by our models are coherent and informative, and hold significant potential for further tasks such as patient cohort analysis. We hypothesize that the proposed framework can readily meet the demand for risk stratification from a large volume of EHRs in an open-ended fashion.
    Journal of Biomedical Informatics 09/2015; DOI:10.1016/j.jbi.2015.09.005
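Once a topic model has inferred a patient's mixture over latent sub-profiles, the final tiering step described above reduces to reading a risk label off the mixture. A heavily simplified sketch of that last step (the sub-profile names, mixture weights, and tier labels are hypothetical; the LDA inference itself is omitted):

```python
def risk_tier(mixture, tier_of):
    # Assign the patient to the risk tier of their dominant
    # latent sub-profile (the sub-profile with the highest weight).
    dominant = max(mixture, key=mixture.get)
    return tier_of[dominant]

# hypothetical sub-profile -> tier mapping and patient mixture
tiers = {"profile0": "low", "profile1": "medium", "profile2": "high"}
patient_mixture = {"profile0": 0.1, "profile1": 0.2, "profile2": 0.7}
tier = risk_tier(patient_mixture, tiers)
```

In the full model the tier labels themselves emerge from the sub-profile-specific risk distributions rather than being fixed in advance.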
  •
    ABSTRACT: Objective: Literature database search is a crucial step in the development of clinical practice guidelines and systematic reviews. Even in the age of information technology, literature search is still conducted manually; it is therefore costly, slow, and subject to human error. In this research, we sought to improve the traditional search approach using innovative query expansion and citation ranking approaches. Methods: We developed a citation retrieval system composed of query expansion and citation ranking methods. The methods are unsupervised and easily integrated over the PubMed search engine. To validate the system, we developed a gold standard consisting of citations that were systematically searched and screened to support the development of cardiovascular clinical practice guidelines. The expansion and ranking methods were evaluated separately and compared with baseline approaches. Results: Compared with the baseline PubMed expansion, the query expansion algorithm improved recall (80.2% vs. 51.5%) with a small loss in precision (0.4% vs. 0.6%). The algorithm could find all citations used to support a larger number of guideline recommendations than the baseline approach (64.5% vs. 37.2%, p<0.001). In addition, the citation ranking approach performed better than PubMed's "most recent" ranking (average precision +6.5%, recall@k +21.1%, p<0.001), PubMed's rank by "relevance" (average precision +6.1%, recall@k +14.8%, p<0.001), and a machine learning classifier that identifies scientifically sound studies from MEDLINE citations (average precision +4.9%, recall@k +4.2%, p<0.001). Conclusions: Our unsupervised query expansion and ranking techniques are more flexible and effective than PubMed's default search engine behavior and the machine learning classifier. Automated citation finding is promising for augmenting the traditional literature search.
    Journal of Biomedical Informatics 09/2015; DOI:10.1016/j.jbi.2015.09.003
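A common unsupervised query expansion scheme of the kind discussed above takes the top-ranked (pseudo-relevant) documents from an initial search and adds their most frequent terms to the query. A minimal sketch with toy documents (the paper's actual expansion algorithm is not specified here):

```python
from collections import Counter

def expand_query(query_terms, top_docs, n_expansion):
    # Add the most frequent terms from the pseudo-relevant documents,
    # excluding terms already present in the query.
    counts = Counter()
    for doc in top_docs:
        counts.update(t for t in doc.lower().split() if t not in query_terms)
    return list(query_terms) + [t for t, _ in counts.most_common(n_expansion)]

# toy top-ranked documents from an initial search
docs = ["statin therapy reduces ldl cholesterol",
        "statin therapy lowers cardiovascular risk"]
expanded = expand_query(["statin"], docs, 2)
```

The expanded query is then reissued to the search engine, trading a little precision for substantially higher recall, as in the evaluation above.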
  •
    ABSTRACT: Despite recent progress in prediction and prevention, heart disease remains a leading cause of death. One preliminary step in heart disease prediction and prevention is risk factor identification. Many studies have been proposed to identify risk factors associated with heart disease; however, none have attempted to identify all risk factors. In 2014, the Informatics for Integrating Biology and the Bedside (i2b2) National Center issued a clinical natural language processing (NLP) challenge that included a track (track 2) for identifying heart disease risk factors in clinical texts over time. This track aimed to identify medically relevant information related to heart disease risk and track its progression over sets of longitudinal patient medical records. Identification of tags and attributes associated with disease presence and progression, risk factors, and medications in the patient's medical history was required. Our participation led to the development of a hybrid pipeline system based on both machine learning-based and rule-based approaches. Evaluation using the challenge corpus revealed that our system achieved an F1-score of 92.68%, making it the top-ranked system (without additional annotations) of the 2014 i2b2 clinical NLP challenge.
    Journal of Biomedical Informatics 09/2015; DOI:10.1016/j.jbi.2015.09.002
  •
    ABSTRACT: Healthcare is in a period of significant transformational activity through the accelerated adoption of healthcare technologies, new reimbursement systems that emphasize shared savings and care coordination, and the commonplace use of mobile technologies by patients, providers, and others. The complexity of healthcare creates barriers to transformational activity and has the potential to inhibit the desired paths toward change envisioned by policymakers. Methods for understanding how change is occurring within this complex environment are important to the evaluation of delivery system reform and the role of technology in healthcare transformation. This study examines the use of an integrative review methodology to evaluate the healthcare literature for evidence of technology transformation in healthcare. The methodology integrates the evaluation of a broad set of literature with an established evaluative framework to develop a more complete understanding of a particular topic. We applied this methodology and the framework of punctuated equilibrium (PEq) to the analysis of the healthcare literature from 2004 to 2012, a time during which technology was at the forefront of healthcare policy, for evidence of technology transformation. The analysis demonstrated that the established PEq framework, applied to the literature, showed considerable potential for evaluating the progress of policies that encourage healthcare transformation. Significant inhibitors to change were identified through the integrative review and categorized into ten themes that describe the resistant structure of healthcare delivery: variations in the environment; market complexity; regulations; flawed risks and rewards; change theories; barriers; ethical considerations; competition and sustainability; environmental elements; and internal elements. We hypothesize that the resistant nature of the healthcare system described by this study creates barriers to the direct consumer involvement and engagement necessary for transformational change. Future policies should be directed at removing these barriers by demanding and emphasizing open technologies and unrestricted access to data, rather than the restricted access currently prescribed by technology vendors, practitioners, and policies that perpetuate market equilibrium.
    Journal of Biomedical Informatics 09/2015; DOI:10.1016/j.jbi.2015.09.014