Journal of Biomedical Informatics (J Biomed Informat)

Publisher: Elsevier

Journal description

The Journal of Biomedical Informatics (formerly Computers and Biomedical Research) has been redesigned to reflect a commitment to high-quality original research papers and reviews in the area of biomedical informatics. Although published articles are motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, imaging, and bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices, and formal evaluations of completed systems, including clinical trials of information technologies, would generally be more suitable for publication in other venues. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report.

Current impact factor: 2.48

Impact Factor Rankings

2015 Impact Factor Available summer 2015
2013 / 2014 Impact Factor 2.482
2012 Impact Factor 2.131
2011 Impact Factor 1.792
2010 Impact Factor 1.719
2009 Impact Factor 2.432
2008 Impact Factor 1.924
2007 Impact Factor 2
2006 Impact Factor 2.346
2005 Impact Factor 2.388
2004 Impact Factor 1.013
2003 Impact Factor 0.855
2002 Impact Factor 0.862

Additional details

5-year impact: 2.43
Cited half-life: 4.40
Immediacy index: 0.55
Eigenfactor: 0.01
Article influence: 0.84
Website: Journal of Biomedical Informatics website
Other titles: Journal of biomedical informatics (Online)
ISSN: 1532-0480
OCLC: 45147742
Material type: Document, Periodical, Internet resource
Document type: Internet Resource, Computer File, Journal / Magazine / Newspaper

Publisher details


  • Pre-print
    • Author can archive a pre-print version
  • Post-print
    • Author can archive a post-print version
  • Conditions
    • Pre-print allowed on any website or open access repository
    • Voluntary deposit by author of author's post-print allowed on author's personal website, or institution's open scholarly website including Institutional Repository, without embargo, where there is not a policy or mandate
    • Deposit due to Funding Body, Institutional and Governmental policy or mandate only allowed where separate agreement between repository and the publisher exists.
    • Permitted deposit due to Funding Body, Institutional and Governmental policy or mandate, may be required to comply with embargo periods of 12 months to 48 months.
    • Set statement to accompany deposit
    • Published source must be acknowledged
    • Must link to journal home page or articles' DOI
    • Publisher's version/PDF cannot be used
    • Articles in some journals can be made Open Access on payment of additional charge
    • NIH authors' articles will be submitted to PubMed Central after 12 months
    • Publisher last contacted on 18/10/2013
  • Classification
    • green

Publications in this journal

  •
    ABSTRACT: The second track of the 2014 i2b2/UTHealth Natural Language Processing shared task focused on identifying medical risk factors related to Coronary Artery Disease (CAD) in the narratives of longitudinal medical records of diabetic patients. The risk factors included hypertension, hyperlipidemia, obesity, smoking status, and family history, as well as diabetes and CAD, and indicators that suggest the presence of those diseases. In addition to identifying the risk factors, this track of the 2014 i2b2/UTHealth shared task studied the presence and progression of the risk factors in longitudinal medical records. Twenty teams participated in this track, and submitted 49 system runs for evaluation. Six of the top 10 teams achieved F1 scores over 0.90, and all 10 scored over 0.87. The most successful system used a combination of additional annotations, external lexicons, hand-written rules and Support Vector Machines. The results of this track indicate that identification of risk factors and their progression over time is well within the reach of automated systems. Copyright © 2015. Published by Elsevier Inc.
    Journal of Biomedical Informatics 07/2015; DOI:10.1016/j.jbi.2015.07.001
  •
    ABSTRACT: In recognition of potential barriers that may inhibit the widespread adoption of biomedical software, the 2014 i2b2 Challenge introduced a special track, Track 3-Software Usability Assessment, in order to develop a better understanding of the adoption issues that might be associated with the state-of-the-art clinical NLP systems. This paper reports the ease of adoption assessment methods we developed for this track, and the results of evaluating five clinical NLP system submissions. A team of human evaluators performed a series of scripted adoptability test tasks with each of the participating systems. The evaluation team consisted of four "expert evaluators" with training in computer science, and eight "end user evaluators" with mixed backgrounds in medicine, nursing, pharmacy, and health informatics. We assessed how easy it is to adopt the submitted systems along the following three dimensions: communication effectiveness (i.e., how effective a system is in communicating its designed objectives to intended audience), effort required to install, and effort required to use. We used a formal software usability testing tool, TURF, to record the evaluators' interactions with the systems and 'think-aloud' data revealing their thought processes when installing and using the systems and when resolving unexpected issues. Overall, the ease of adoption ratings that the five systems received are unsatisfactory. Installation of some of the systems proved to be rather difficult, and some systems failed to adequately communicate their designed objectives to intended adopters. Further, the average ratings provided by the end user evaluators on ease of use and ease of interpreting output are -0.35 and -0.53, respectively, indicating that this group of users generally deemed the systems extremely difficult to work with. 
    While the ratings provided by the expert evaluators are higher, 0.6 and 0.45, respectively, these ratings are still low, indicating that they also had considerable struggles. The results of the Track 3 evaluation show that the adoptability of the five participating clinical NLP systems has a great margin for improvement. Remedy strategies suggested by the evaluators included (1) more detailed and operating-system-specific use instructions; (2) provision of more pertinent onscreen feedback for easier diagnosis of problems; (3) including screen walk-throughs in use instructions so users know what to expect and what might have gone wrong; (4) avoiding jargon and acronyms in materials intended for end users; and (5) packaging prerequisites required within software distributions so that prospective adopters of the software do not have to obtain each of the third-party components on their own. Copyright © 2015. Published by Elsevier Inc.
    Journal of Biomedical Informatics 07/2015; DOI:10.1016/j.jbi.2015.07.008
  •
    ABSTRACT: Polarity classification is the main subtask of sentiment analysis and opinion mining, well-known problems in natural language processing that have attracted increasing attention in recent years. Existing approaches mainly rely on the subjective part of text in which sentiment is expressed explicitly through specific words, called sentiment words. These approaches, however, are still far from being good in the polarity classification of patients' experiences since they are often expressed without any explicit expression of sentiment, but an undesirable or desirable effect of the experience implicitly indicates a positive or negative sentiment. This paper presents a method for polarity classification of patients' experiences of drugs using domain knowledge. We first build a knowledge base of polar facts about drugs, called FactNet, using extracted patterns from Linked Data sources and relation extraction techniques. Then, we extract generalized semantic patterns of polar facts and organize them into a hierarchy in order to overcome the missing knowledge issue. Finally, we apply the extracted knowledge, i.e., polar fact instances and generalized patterns, for the polarity classification task. Different from previous approaches for personal experience classification, the proposed method explores the potential benefits of polar facts in domain knowledge aiming to improve the polarity classification performance, especially in the case of indirect implicit experiences, i.e., experiences which express the effect of one entity on other ones without any sentiment words. Using our approach, we have extracted 9703 triplets of polar facts at a precision of 92.26 percent. In addition, experiments on drug reviews demonstrate that our approach can achieve 79.78 percent precision in polarity classification task, and outperforms the state-of-the-art sentiment analysis and opinion mining methods. Copyright © 2015. Published by Elsevier Inc.
    Journal of Biomedical Informatics 07/2015; DOI:10.1016/j.jbi.2015.06.017
  •
    ABSTRACT: This paper introduces a model that predicts future changes in systolic blood pressure (SBP) based on structured and unstructured (text-based) information from longitudinal clinical records. For each patient, the clinical records are sorted in chronological order and SBP measurements are extracted from them. The model predicts future changes in SBP based on the preceding clinical notes. This is accomplished using least median squares regression on salient features found using a feature selection algorithm. Using the prediction model, a correlation coefficient of 0.47 is achieved on unseen test data (p < .0001). This is in contrast to a baseline correlation coefficient of 0.39. Copyright © 2015. Published by Elsevier Inc.
    Journal of Biomedical Informatics 07/2015; DOI:10.1016/j.jbi.2015.06.024
  •
    ABSTRACT: A recent promise to access unstructured clinical data from electronic health records on a large scale has revitalized the interest in automated de-identification of clinical notes, which includes the identification of mentions of Protected Health Information (PHI). We describe the methods developed and evaluated as part of the i2b2/UTHealth 2014 challenge to identify PHI defined by 25 entity types in longitudinal clinical narratives. Our approach combines knowledge-driven (dictionaries and rules) and data-driven (machine learning) methods with a large range of features to address de-identification of specific named entities. In addition, we have devised a two-pass recognition approach that creates a patient-specific run-time dictionary from the PHI entities identified in the first pass with high confidence, which is then used in the second pass to identify mentions that lack specific clues. The proposed method achieved overall micro F1-measures of 91% on strict and 95% on token-level evaluation on the test dataset (514 narratives). Whilst most PHI entities can be reliably identified, mentions of Organisations and Professions were particularly challenging. Still, the overall results suggest that automated text mining methods can be used to reliably process clinical notes to identify personal information, thus providing a crucial step in large-scale de-identification of unstructured data for further clinical and epidemiological studies. Copyright © 2015. Published by Elsevier Inc.
    Journal of Biomedical Informatics 07/2015; DOI:10.1016/j.jbi.2015.06.029
  •
    ABSTRACT: This paper describes the use of an agile text mining platform (Linguamatics' Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 Challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-Score of 91.7%, one percent behind the top performing system. Copyright © 2015. Published by Elsevier Inc.
    Journal of Biomedical Informatics 07/2015; DOI:10.1016/j.jbi.2015.06.030
  •
    ABSTRACT: Despite considerable research efforts, the process of metastasis formation is still a subject of intense discussion, and even established models differ considerably in basic details and in the conclusions drawn from them. Mathematical and computational models add a new perspective to the research as they can quantitatively investigate the processes of metastasis and the effects of treatment. However, existing models look at only one treatment option at a time. We enhanced a previously developed computer model (called CaTSiT) that enables quantitative comparison of different metastasis formation models with clinical and experimental data, to include the effects of chemotherapy, external beam radiation, radioimmunotherapy and radioembolization. CaTSiT is based on a discrete event simulation procedure. The growth of the primary tumor and its metastases is modeled by a piecewise-defined growth function that describes the growth behavior of the primary tumor and metastases during various time intervals. The piecewise-defined growth function is composed of analytical functions describing the growth behavior of the tumor based on characteristics of the tumor, such as dormancy, or the effects of various therapies. The spreading of malignant cells into the blood is modeled by intravasation events, which are generated according to a rate function. Further events in the model describe the behavior of the released malignant cells until the formation of a new metastasis. The model is published under the GNU General Public License version 3. To demonstrate the application of the computer model, a case of a patient with a hepatocellular carcinoma and multiple metastases in the liver was simulated. Besides the untreated case, different treatments were simulated at two time points: one directly after diagnosis of the primary tumor and the other several months later. 
    Except for early applied radioimmunotherapy, no treatment strategy was able to eliminate all metastases. These results emphasize the importance of early diagnosis and of proceeding with treatment even if no clinically detectable metastases are present at the time of diagnosis of the primary tumor. CaTSiT could be a valuable tool for quantitative investigation of the process of tumor growth and metastasis formation, including the effects of various treatment options. Copyright © 2015. Published by Elsevier Inc.
    Journal of Biomedical Informatics 07/2015; DOI:10.1016/j.jbi.2015.07.011
  •
    ABSTRACT: To improve neonatal patient safety through automated detection of medication administration errors (MAEs) in high alert medications including narcotics, vasoactive medication, intravenous fluids, parenteral nutrition, and insulin using the electronic health record (EHR); to evaluate rates of MAEs in neonatal care; and to compare the performance of computerized algorithms to traditional incident reporting for error detection. We developed novel computerized algorithms to identify MAEs within the EHR of all neonatal patients treated in a level four neonatal intensive care unit (NICU) in 2011 and 2012. We evaluated the rates and types of MAEs identified by the automated algorithms and compared their performance to incident reporting. Performance was evaluated by physician chart review. In the combined 2011 and 2012 NICU data sets, the automated algorithms identified MAEs at the following rates: fentanyl, 0.4% (4 errors/1005 fentanyl administration records); morphine, 0.3% (11/4009); dobutamine, 0 (0/10); and milrinone, 0.3% (5/1925). We found higher MAE rates for other vasoactive medications including: dopamine, 11.6% (5/43); epinephrine, 10.0% (289/2890); and vasopressin, 12.8% (54/421). Fluid administration error rates were similar: intravenous fluids, 3.2% (273/8567); parenteral nutrition, 3.2% (649/20124); and lipid administration, 1.3% (203/15227). We also found 13 insulin administration errors with a resulting rate of 2.9% (13/456). MAE rates were higher for medications that were adjusted frequently and fluids administered concurrently. The algorithms identified many previously unidentified errors, demonstrating significantly better sensitivity (82% vs. 5%) and precision (70% vs. 50%) than incident reporting for error recognition. Automated detection of medication administration errors through the EHR is feasible and performs better than currently used incident reporting systems. 
    Automated algorithms may be useful for real-time error identification and mitigation. Copyright © 2015. Published by Elsevier Inc.
    Journal of Biomedical Informatics 07/2015; DOI:10.1016/j.jbi.2015.07.012
  •
    ABSTRACT: Genome-wide association studies (GWAS) are a powerful tool for pathogenetic studies of complex diseases. The rich genetic information of GWAS data is mostly not fully utilized. In this study, we developed a sliding window-based genotype dependence testing tool, SWGDT. SWGDT can be applied to GWAS data for genome-wide susceptibility gene scans utilizing known causal gene information. To evaluate the performance of SWGDT, a real GWAS dataset of Kashin-Beck disease (KBD) was analyzed. Immunohistochemistry was also performed to validate the relevance of the identified gene to KBD. SWGDT analysis of KBD GWAS data identified a novel candidate gene, TACR1, for KBD. Immunohistochemistry showed that the expression level of TACR1 protein in KBD articular cartilage was significantly higher than that in healthy articular cartilage. The real GWAS data analysis results illustrate the performance of SWGDT for genome-wide susceptibility gene scans. SWGDT can help to identify novel disease genes that may be missed by GWAS. Copyright © 2015 Elsevier Inc. All rights reserved.
    Journal of Biomedical Informatics 07/2015; DOI:10.1016/j.jbi.2015.06.025
  •
    ABSTRACT: Identifying key variables such as disorders within the clinical narratives in electronic health records has wide-ranging applications within clinical practice and biomedical research. Previous research has demonstrated reduced performance of disorder named entity recognition (NER) and normalization (or grounding) in clinical narratives than in biomedical publications. In this work, we aim to identify the cause for this performance difference and introduce general solutions. We use closure properties to compare the richness of the vocabulary in clinical narrative text to biomedical publications. We approach both disorder NER and normalization using machine learning methodologies. Our NER methodology is based on linear-chain conditional random fields with a rich feature approach, and we introduce several improvements to enhance the lexical knowledge of the NER system. Our normalization method - never previously applied to clinical data - uses pairwise learning to rank to automatically learn term variation directly from the training data. We find that while the size of the overall vocabulary is similar between clinical narrative and biomedical publications, clinical narrative uses a richer terminology to describe disorders than publications. We apply our system, DNorm-C, to locate disorder mentions and in the clinical narratives from the recent ShARe / CLEF eHealth Task. For NER (strict span-only), our system achieves precision = 0.797, recall = 0.713, f-score = 0.753. For the normalization task (strict span + concept) it achieves precision = 0.712, recall = 0.637, f-score = 0.672. The improvements described in this article increase the NER f-score by 0.039 and the normalization f-score by 0.036. We also describe a high recall version of the NER, which increases the normalization recall to as high as 0.744, albeit with reduced precision. We perform an error analysis, demonstrating that NER errors outnumber normalization errors by more than 4-to-1. 
    Abbreviations and acronyms are found to be frequent causes of error, in addition to the mentions the annotators were not able to identify within the scope of the controlled vocabulary. Disorder mentions in text from clinical narratives use a rich vocabulary that results in high term variation, which we believe to be one of the primary causes of reduced performance in clinical narrative. We show that pairwise learning to rank offers high performance in this context, and introduce several lexical enhancements - generalizable to other clinical NER tasks - that improve the ability of the NER system to handle this variation. DNorm-C is a high performing, open source system for disorders in clinical text, and a promising step towards NER and normalization methods that are trainable to a wide variety of domains and entities. Copyright © 2015. Published by Elsevier Inc.
    Journal of Biomedical Informatics 07/2015; DOI:10.1016/j.jbi.2015.07.010
  •
    ABSTRACT: Older adults are at increased risk of adverse drug events due to medication. Older adults tend to take more medication and are at higher risk of chronic illness. Over-the-counter (OTC) medication does not require healthcare provider oversight and understanding OTC information is heavily dependent on a consumer's ability to understand and use the medication appropriately. Coupling health technology with effective communication is one approach to address the challenge of communicating health and improving health related tasks. However, the success of many health technologies also depends on how well the technology is designed and how well it addresses users needs. This is especially true for the older adult population. This paper describes 1) a formative study performed to understand how to design novel health technology to assist older adults with OTC medication information, and 2) how a user-centered design process helped to refine the initial assumptions of user needs and help to conceptualize the technology. An iterative design process was used. The process included two brainstorming and review sessions with human-computer interaction researchers and design sessions with older adults in the form of semi-structured interviews. Methods and principles of user-centered research and design were used to inform the research design. Two researchers with expertise in human-computer interaction performed expert reviews of early system prototypes. After initial prototypes were developed, seven older adults were engaged in semi-structured interviews to understand usability concerns and features and functionality older adults may find useful for selecting appropriate OTC medication. Eight usability concerns were discovered and addressed in the two rounds of expert review, and nine additional usability concerns were discovered in design sessions with older adults. Five themes emerged from the interview transcripts as recommendations for design. 
    These recommendations represent opportunities for technology such as the one described in this paper to support older adults in the OTC decision-making process. This paper illustrates the use of an iterative user-centered process in the formative stages of design and its usefulness for understanding aspects of the technology design that are useful to older adults when making decisions about OTC medication. The technology support mechanisms included in the initial model were revised based on the results from the iterative design sessions and helped to refine and conceptualize the system being designed. Copyright © 2015. Published by Elsevier Inc.
    Journal of Biomedical Informatics 07/2015; DOI:10.1016/j.jbi.2015.07.006
  •
    ABSTRACT: Youth are prolific users of cell phone minutes and text messaging. Numerous programs using short message service text messaging (SMS) have been employed to help improve health behaviors and health outcomes. However, we lack information on whether and what type of interaction or engagement with SMS program content is required to realize any benefit. We explored youth engagement with an automated SMS program designed to supplement a 25-session youth development program with demonstrated efficacy for reductions in teen pregnancy. Using two years of program data, we report on youth participation in design of message content and response frequency to messages among youth enrolled in the intervention arm of a randomized controlled trial (RCT) as one indicator of engagement. There were 221 youth between the ages of 14-18 enrolled over two years in the intervention arm of the RCT. Just over half (51%) were female; 56% were Hispanic; and 27% African American. Youth were sent 40,006 messages of which 16,501 were considered bi-directional where youth were asked to text a response. Four-fifths (82%) responded at least once to a text. We found variations in response frequency by gender, age, and ethnicity. The most popular types of messages youth responded to include questions and quizzes. The first two months of the program in each year had the highest response frequency. An important next step is to assess whether higher response to SMS results in greater efficacy. This future work can facilitate greater attention to message design and content to ensure messages are engaging for the intended audience. Copyright © 2015. Published by Elsevier Inc.
    Journal of Biomedical Informatics 07/2015; DOI:10.1016/j.jbi.2015.07.003
  •
    ABSTRACT: The identification of gene-phenotype relationships is very important for the treatment of human diseases. Studies have shown that genes causing the same or similar phenotypes tend to interact with each other in a protein-protein interaction (PPI) network. Thus, many identification methods based on the PPI network model have achieved good results. However, in the PPI network, some interactions between the proteins encoded by candidate gene and the proteins encoded by known disease genes are very weak. Therefore, some studies have combined the PPI network with other genomic information and reported good predictive performances. However, we believe that the results could be further improved. In this paper, we propose a new method that uses the semantic similarity between the candidate gene and known disease genes to set the initial probability vector of a random walk with a restart algorithm in a human PPI network. The effectiveness of our method was demonstrated by leave-one-out cross-validation, and the experimental results indicated that our method outperformed other methods. Additionally, our method can predict new causative genes of multifactor diseases, including Parkinson's disease, breast cancer and obesity. The top predictions were good and consistent with the findings in the literature, which further illustrates the effectiveness of our method. Copyright © 2015. Published by Elsevier Inc.
    Journal of Biomedical Informatics 07/2015; DOI:10.1016/j.jbi.2015.07.005
  •
    ABSTRACT: In the present work, a cardiovascular simulator designed for both clinical and training use is presented. The core of the simulator is a lumped parameter model of the cardiovascular system provided with several modules for the representation of baroreflex control, blood transfusion, ventricular assist device (VAD) therapy and drug infusion. For training use, a Pre-Set Disease module allows the user to select one or more cardiovascular diseases with different levels of severity. For clinical use, a Self-Tuning module was implemented. In this case, the user can enter patient-specific data and the simulator will automatically tune its parameters to the desired hemodynamic condition. The simulator can also be interfaced with external systems such as the Specialist Decision Support System (SDSS), which is devoted to guiding the choice of the appropriate level of VAD support based on the clinical characteristics of each patient. The Pre-Set Disease module makes it possible to reproduce a wide range of pre-set cardiovascular diseases involving the heart and the systemic and pulmonary circulation. In addition, the user can test different therapies such as drug infusion, VAD therapy and volume transfusion. The Self-Tuning module was tested on six different hemodynamic conditions, including a VAD patient condition. In all cases the simulator reproduced the desired hemodynamic condition with an error < 10%. The cardiovascular simulator could be of value in the clinical arena. Clinicians and students can use the Pre-Set Disease module for training and to gain an overall knowledge of the pathophysiology of common cardiovascular diseases. The Self-Tuning module is envisaged as a useful tool to visualize a patient's status, test different therapies and get more information about specific hemodynamic conditions. In this sense, the simulator, in conjunction with SDSS, constitutes a support to clinical decision-making. Copyright © 2015. Published by Elsevier Inc.
    Journal of Biomedical Informatics 07/2015; DOI:10.1016/j.jbi.2015.07.004
  •
    ABSTRACT: The physical spaces within which the work of health occurs - the home, the intensive care unit, the emergency room, even the bedroom - influence the manner in which behaviors unfold, and may contribute to efficacy and effectiveness of health interventions. Yet the study of such complex workspaces is difficult. Health care environments are complex, chaotic workspaces that do not lend themselves to the typical assessment approaches used in other industrial settings. This paper provides two methodological advances for studying internal health care environments: a strategy to capture salient aspects of the physical environment and a suite of approaches to visualize and analyze that physical environment. We used a Faro ™ laser scanner to obtain point cloud data sets of the internal aspects of home environments. The point cloud enables precise measurement, including the location of physical boundaries and object perimeters, color, and light, in an interior space that can be translated later for visualization on a variety of platforms. The work was motivated by vizHOME, a multi-year program to intensively examine the home context of personal health information management in a way that minimizes repeated, intrusive, and potentially disruptive in-vivo assessments. Thus, we illustrate how to capture, process, display, and analyze point clouds using the home as a specific example of a health care environment. Our work presages a time when emerging technologies facilitate inexpensive capture and efficient management of point cloud data, thus enabling visual and analytical tools for enhanced discharge planning, new insights for designers of consumer-facing clinical informatics solutions, and a robust approach to context-based studies of health-related work environments. Copyright © 2015. Published by Elsevier Inc.
    Journal of Biomedical Informatics 07/2015; DOI:10.1016/j.jbi.2015.07.007
  •
    ABSTRACT: Automated feature extraction from medical images is an important task in imaging informatics. We describe a graph-based technique for automatically identifying vascular substructures within a vascular tree segmentation. We illustrate our technique using vascular segmentations from computed tomography pulmonary angiography images. The segmentations were acquired in a semi-automated fashion using existing segmentation tools. A 3D parallel thinning algorithm was used to generate the vascular skeleton and then graph-based techniques were used to transform the skeleton to a directed graph with bifurcations and endpoints as nodes in the graph. Machine-learning classifiers were used to automatically prune false vascular structures from the directed graph. Semantic labeling of portions of the graph with pulmonary anatomy (pulmonary trunk and left and right pulmonary arteries) was achieved with high accuracy (percent correct ⩾0.97). Least-squares cubic splines of the centerline paths between nodes were computed and were used to extract morphological features of the vascular tree. The graphs were used to automatically obtain diameter measurements that had high correlation (r⩾0.77) with manual measurements made from the same arteries. Copyright © 2015. Published by Elsevier Inc.
    Journal of Biomedical Informatics 07/2015; DOI:10.1016/j.jbi.2015.07.002
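    The skeleton-to-graph step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the thinning step has already produced a one-voxel-wide skeleton, given here as a set of integer (x, y, z) coordinates, and that a root voxel (for example, the start of the pulmonary trunk) is known. The function names `classify_nodes` and `skeleton_to_graph` are hypothetical.

```python
from itertools import product

# 26-connected neighbourhood offsets in 3D (the voxel itself is excluded).
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def neighbours(voxel, skeleton):
    """Return the skeleton voxels that touch `voxel` (26-connectivity)."""
    x, y, z = voxel
    return [
        (x + dx, y + dy, z + dz)
        for dx, dy, dz in OFFSETS
        if (x + dx, y + dy, z + dz) in skeleton
    ]


def classify_nodes(skeleton):
    """Split skeleton voxels into endpoints (1 neighbour) and bifurcations (>= 3)."""
    endpoints, bifurcations = set(), set()
    for voxel in skeleton:
        degree = len(neighbours(voxel, skeleton))
        if degree == 1:
            endpoints.add(voxel)
        elif degree >= 3:
            bifurcations.add(voxel)
    return endpoints, bifurcations


def skeleton_to_graph(skeleton, root):
    """Walk outward from `root`; return directed edges (parent, child, voxel_path)."""
    endpoints, bifurcations = classify_nodes(skeleton)
    nodes = endpoints | bifurcations
    edges = []
    visited = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for start in neighbours(node, skeleton):
            if start in visited:
                continue
            # Follow the chain of degree-2 voxels until the next node is reached.
            path = [node, start]
            visited.add(start)
            current = start
            while current not in nodes:
                unvisited = [n for n in neighbours(current, skeleton) if n not in visited]
                if not unvisited:
                    break
                current = unvisited[0]
                visited.add(current)
                path.append(current)
            edges.append((node, current, path))
            if current in nodes:
                stack.append(current)
    return edges


# Toy Y-shaped skeleton: a trunk along z that splits into two branches.
toy_skeleton = {
    (0, 0, 0), (0, 0, 1), (0, 0, 2),
    (1, 0, 3), (2, 0, 4),
    (-1, 0, 3), (-2, 0, 4),
}
toy_edges = skeleton_to_graph(toy_skeleton, root=(0, 0, 0))
```

    The voxel path stored on each edge is the kind of centerline that could then be fitted with a spline to extract morphological features; real skeletons usually need extra handling for clusters of adjacent high-degree voxels, which this sketch ignores.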
  •
    ABSTRACT: Biclustering has become a popular technique for the study of gene expression data, especially for discovering functionally related gene sets under different subsets of experimental conditions. Most biclustering approaches use a measure or cost function that determines the quality of biclusters. In such cases, the development of both a suitable heuristic and a good measure for guiding the search is essential for discovering interesting biclusters in an expression matrix. Nevertheless, not all existing biclustering approaches base their search on evaluation measures for biclusters. There exists a diverse set of biclustering tools that follow different strategies and algorithmic concepts which guide the search towards meaningful results. In this paper we present an extensive survey of biclustering approaches, classifying them into two categories according to whether or not they use evaluation metrics within the search method: Biclustering Algorithms based on Evaluation Measures and Non Metric-based Biclustering Algorithms. In both cases, they have been classified according to the type of meta-heuristics on which they are based. Copyright © 2015. Published by Elsevier Inc.
    Journal of Biomedical Informatics 07/2015; DOI:10.1016/j.jbi.2015.06.028
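    As an illustration of what a bicluster evaluation measure can look like, the sketch below computes the mean squared residue introduced by Cheng and Church. The abstract does not name the specific measures the survey covers, so choosing this one is an assumption made for illustration, and the function name `mean_squared_residue` is hypothetical. A score of 0 indicates a perfectly additive (coherent) bicluster; larger values indicate less coherence.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the bicluster given by row and column indices.

    Each residue is a_ij - (row mean) - (column mean) + (overall mean),
    computed within the selected submatrix only.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(row) / n_cols for row in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows
    total = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall_mean) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return total / (n_rows * n_cols)


# Toy expression matrix: rows 0-2 differ only by a constant shift (coherent),
# while row 3 follows a different pattern.
expression = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [4.0, 5.0, 6.0],
    [9.0, 1.0, 5.0],
]
coherent_score = mean_squared_residue(expression, rows=[0, 1, 2], cols=[0, 1, 2])
noisy_score = mean_squared_residue(expression, rows=[0, 1, 2, 3], cols=[0, 1, 2])
```

    A search heuristic guided by such a measure would add or remove rows and columns so as to keep the score below a chosen threshold; the non-metric-based algorithms in the survey's second category search without a score of this kind.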
  •
    ABSTRACT: Schizophrenia (SCZ) is a common complex disorder with poorly understood mechanisms and no effective drug treatments. Despite the high prevalence and vast unmet medical need represented by the disease, many drug companies have moved away from the development of drugs for SCZ. Therefore, alternative strategies are needed for the discovery of truly innovative drug treatments for SCZ. Here, we present a disease phenome-driven computational drug repositioning approach for SCZ. We developed a novel drug repositioning system, PhenoPredict, by inferring drug treatments for SCZ from diseases that are phenotypically related to SCZ. The key to PhenoPredict is the availability of a comprehensive drug treatment knowledge base that we recently constructed. PhenoPredict retrieved all 18 FDA-approved SCZ drugs and ranked them highly (recall=1.0, and average ranking of 8.49%). In novel predictions, PhenoPredict showed clear improvements in Precision-Recall (PR) curves over PREDICT, one of the most comprehensive drug repositioning systems currently available, with a significant 98.8% improvement in the area under the curve (AUC) of the PR curves. In addition, we discovered many drug candidates with mechanisms of action fundamentally different from traditional antipsychotics, some of which had published literature evidence indicating their treatment benefits in SCZ patients. In summary, although the fundamental pathophysiological mechanisms of SCZ remain unknown, integrated systems approaches to studying phenotypic connections among diseases may facilitate the discovery of innovative SCZ drugs. Copyright © 2015. Published by Elsevier Inc.
    Journal of Biomedical Informatics 07/2015; DOI:10.1016/j.jbi.2015.06.027
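    The two retrieval figures quoted in this abstract (recall and average ranking) can be illustrated with a small sketch. The abstract does not define how the average ranking percentage is computed, so the definition used here (mean 1-based rank of the retrieved known drugs, as a percentage of the ranked list length) is an assumption, as are the function name `recall_and_average_rank` and the placeholder drug names.

```python
def recall_and_average_rank(ranked_candidates, known_drugs):
    """Return (recall, average rank in percent) for known drugs in a ranked list.

    Assumed definition: average rank is the mean 1-based position of the
    retrieved known drugs, divided by the list length and multiplied by 100.
    """
    position = {drug: rank for rank, drug in enumerate(ranked_candidates, start=1)}
    found_ranks = [position[drug] for drug in known_drugs if drug in position]
    recall = len(found_ranks) / len(known_drugs)
    if not found_ranks:
        return recall, None
    mean_rank = sum(found_ranks) / len(found_ranks)
    return recall, 100.0 * mean_rank / len(ranked_candidates)


# Placeholder data: ten ranked candidates, two of which are known treatments.
ranked = ["drug_a", "drug_b", "drug_c", "drug_d", "drug_e",
          "drug_f", "drug_g", "drug_h", "drug_i", "drug_j"]
known = {"drug_a", "drug_c"}
recall, average_rank_percent = recall_and_average_rank(ranked, known)
```

    Under this assumed definition, a lower percentage means the known drugs sit nearer the top of the ranked list.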