ABSTRACT: Managing vast datasets collected across multiple clinical imaging communities has become critical as datasets grow ever larger and more diverse. Developing data management infrastructure is further complicated by technical and experimental advances that drive modifications to existing protocols and require new types of research data to be incorporated into existing data management systems. In this paper, an extensible data management system for clinical neuroimaging studies is introduced: the Human Clinical Imaging Database (HID) and Toolkit. The database schema is designed to support the storage of new data types without changes to the underlying schema. The infrastructure supports management of experiment data, such as imaging protocol and behavioral task parameters, as well as subject-specific data, including demographics, clinical assessments, and behavioral task performance metrics. Of particular interest, embedded clinical data entry and management tools improve the consistency of data reporting and enable automatic entry of data into the database. The Clinical Assessment Layout Manager (CALM) allows users to create online data entry forms for use within and across sites; data entered through these forms are loaded into the underlying database by the generic clinical assessment management engine (GAME). Importantly, the system is designed to operate in a distributed environment, serving both human users and client applications in a service-oriented manner. Querying relies on a built-in multi-database parallel query builder and result combiner, which supports web-accessible queries within and across multiple federated databases. The system and its documentation are open source and available from the Neuroimaging Informatics Tools and Resource Clearinghouse (NITRC) site.
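The abstract states that HID stores new data types without changing its schema but does not say how. One common way to obtain that property is a generic entity-attribute-value (EAV) layout. The sketch below assumes such a layout purely for illustration; the table `observation` and the functions `record` and `fetch` are hypothetical and are not HID's actual schema or API.

```python
import sqlite3


def make_db():
    """Create a hypothetical generic (EAV-style) schema in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    # One row per (subject, assessment, attribute): a new assessment type
    # becomes new rows, never new columns or tables.
    conn.execute(
        "CREATE TABLE observation ("
        " subject_id TEXT NOT NULL,"
        " assessment TEXT NOT NULL,"
        " attribute TEXT NOT NULL,"
        " value TEXT,"
        " PRIMARY KEY (subject_id, assessment, attribute))"
    )
    return conn


def record(conn, subject_id, assessment, values):
    """Store a dict of attribute -> value (values kept as text for simplicity)."""
    conn.executemany(
        "INSERT INTO observation VALUES (?, ?, ?, ?)",
        [(subject_id, assessment, k, str(v)) for k, v in values.items()],
    )


def fetch(conn, subject_id, assessment):
    """Return one subject's assessment as a dict of attribute -> value."""
    rows = conn.execute(
        "SELECT attribute, value FROM observation"
        " WHERE subject_id = ? AND assessment = ? ORDER BY attribute",
        (subject_id, assessment),
    )
    return dict(rows.fetchall())
```

For example, `record(conn, "S001", "new_scale_v2", {"item_1": 3})` stores a previously unseen assessment type with no `ALTER TABLE`. The trade-off of this design is that type checking and validation move out of the schema and into application code, which is one role form-driven tools such as CALM and GAME could plausibly play.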
ABSTRACT: Investigators perform multi-site functional magnetic resonance imaging studies to increase statistical power, to enhance generalizability, and to improve the likelihood of sampling relevant subgroups. Yet undesired site variation in imaging methods could offset these potential advantages. We used variance components analysis to investigate sources of variation in the blood oxygen level-dependent (BOLD) signal across four 3-T magnets in voxelwise and region-of-interest (ROI) analyses. Eighteen participants traveled to four magnet sites to complete eight runs of a working memory task involving emotional or neutral distraction. Person variance was more than 10 times larger than site variance for five of the six ROIs studied. Person-by-site interactions, however, contributed sizable unwanted variance to the total. Averaging over runs increased between-site reliability, with many voxels showing good to excellent between-site reliability when eight runs were averaged, and with ROIs showing fair to good reliability. Between-site reliability depended on the specific functional contrast analyzed as well as on the number of runs averaged. Although median effect size was correlated with between-site reliability, dissociations were observed for many voxels. Brain regions where the pooled effect size was large but between-site reliability was poor were associated with reduced individual differences. Brain regions where the pooled effect size was small but between-site reliability was excellent were associated with a balance of participants who displayed consistently positive or consistently negative BOLD responses. Although between-site reliability of BOLD data can be good to excellent, acquiring highly reliable data requires robust activation paradigms, ongoing quality assurance, and careful experimental control.
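The abstract names variance components analysis and between-site reliability but does not give the estimators or the reliability coefficient. The sketch below assumes a balanced person × site design with several runs per cell, uses the standard ANOVA (method-of-moments) estimators with negative estimates truncated at zero, and computes one common generalizability-theory style coefficient, σ²_person / (σ²_person + σ²_site + σ²_person×site + σ²_residual / n_runs). That formula is an assumption for illustration and may differ from the one used in the study; note that averaging runs shrinks only the residual term, which is why person-by-site interaction variance stays "unwanted" no matter how many runs are averaged.

```python
from statistics import fmean


def variance_components(data):
    """data[i][j][k]: response of person i at site j on run k.

    Balanced design with at least 2 persons, 2 sites and 2 runs is assumed.
    """
    P, S, N = len(data), len(data[0]), len(data[0][0])
    cell = [[fmean(data[i][j]) for j in range(S)] for i in range(P)]
    pm = [fmean(cell[i]) for i in range(P)]                       # person means
    sm = [fmean(cell[i][j] for i in range(P)) for j in range(S)]  # site means
    gm = fmean(pm)                                                # grand mean

    ss_p = N * S * sum((m - gm) ** 2 for m in pm)
    ss_s = N * P * sum((m - gm) ** 2 for m in sm)
    ss_ps = N * sum((cell[i][j] - pm[i] - sm[j] + gm) ** 2
                    for i in range(P) for j in range(S))
    ss_e = sum((x - cell[i][j]) ** 2
               for i in range(P) for j in range(S) for x in data[i][j])

    ms_p = ss_p / (P - 1)
    ms_s = ss_s / (S - 1)
    ms_ps = ss_ps / ((P - 1) * (S - 1))
    ms_e = ss_e / (P * S * (N - 1))

    # Solve the expected-mean-square equations; clip negative estimates to 0.
    return {
        "person": max(0.0, (ms_p - ms_ps) / (N * S)),
        "site": max(0.0, (ms_s - ms_ps) / (N * P)),
        "person_x_site": max(0.0, (ms_ps - ms_e) / N),
        "residual": ms_e,
    }


def between_site_reliability(vc, n_runs):
    """Assumed G-theory coefficient; averaging n_runs divides only the residual."""
    denom = vc["person"] + vc["site"] + vc["person_x_site"] + vc["residual"] / n_runs
    return vc["person"] / denom if denom > 0 else float("nan")
```

Evaluating `between_site_reliability` for increasing `n_runs` reproduces the qualitative pattern in the abstract: reliability rises with the number of averaged runs but is capped by site and person-by-site variance.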
ABSTRACT: The ever-increasing size of the biomedical literature makes more precise information retrieval, and the ability to tap implicit knowledge in the scientific literature, a necessity. In this chapter, three new variants of the expectation-maximization (EM) method for semisupervised document classification (Machine Learning 39:103-134, 2000) are first introduced to refine biomedical literature meta-searches. The retrieval performance of a multi-mixture-per-class EM variant with Agglomerative Information Bottleneck clustering (Slonim and Tishby (1999) Agglomerative information bottleneck. In Proceedings of NIPS-12), using the Davies-Bouldin cluster validity index (IEEE Transactions on Pattern Analysis and Machine Intelligence 1:224-227, 1979), rivaled that of state-of-the-art transductive support vector machines (TSVM) (Joachims (1999) Transductive inference for text classification using support vector machines. In Proceedings of the International Conference on Machine Learning (ICML)). Moreover, the multi-mixture-per-class EM variant refined search results more than an order of magnitude faster than TSVM. A second tool, CRFNER, uses conditional random fields (Lafferty et al. (2001) Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of ICML-2001) to recognize 15 types of named entities in schizophrenia abstracts; it outperformed ABNER (Settles (2004) Biomedical named entity recognition using conditional random fields and rich feature sets. In Proceedings of COLING 2004 International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA)) in biological named entity recognition and reached an F1 score of 82.5% on the second set of named entities.
Methods in molecular biology (Clifton, N.J.) 02/2009; 569:173-96.
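The chapter's EM variants build on EM for semisupervised document classification. The sketch below shows only the basic scheme of that cited method as commonly described (multinomial naive Bayes, one mixture component per class, unlabeled documents given soft labels in the E-step); it does not implement the chapter's multi-mixture-per-class variant, the Agglomerative Information Bottleneck clustering, or the Davies-Bouldin model selection. Laplace smoothing with `alpha=1.0` and the fixed iteration count are illustrative assumptions, and all function names are hypothetical.

```python
import math
from collections import Counter


def train_nb(docs, resp, classes, vocab, alpha=1.0):
    """M-step: multinomial naive Bayes parameters from (soft) class weights."""
    total = sum(sum(r.values()) for r in resp)
    prior, word_prob = {}, {}
    for c in classes:
        weight = sum(r[c] for r in resp)
        prior[c] = (weight + alpha) / (total + alpha * len(classes))  # smoothed prior
        counts = Counter()
        for doc, r in zip(docs, resp):
            for w, n in doc.items():
                counts[w] += r[c] * n  # word counts weighted by class responsibility
        denom = sum(counts.values()) + alpha * len(vocab)
        word_prob[c] = {w: (counts[w] + alpha) / denom for w in vocab}
    return prior, word_prob


def posterior(doc, prior, word_prob):
    """E-step for one document: P(class | doc), computed in log space for stability."""
    logp = {
        c: math.log(prior[c])
        + sum(n * math.log(word_prob[c][w]) for w, n in doc.items() if w in word_prob[c])
        for c in prior
    }
    top = max(logp.values())
    z = sum(math.exp(v - top) for v in logp.values())
    return {c: math.exp(v - top) / z for c, v in logp.items()}


def semisupervised_em(labeled, unlabeled, iterations=10):
    """labeled: list of (tokens, label); unlabeled: list of token lists."""
    classes = sorted({y for _, y in labeled})
    docs_l = [Counter(tokens) for tokens, _ in labeled]
    docs_u = [Counter(tokens) for tokens in unlabeled]
    vocab = set().union(*docs_l, *docs_u)
    hard = [{c: float(c == y) for c in classes} for _, y in labeled]
    model = train_nb(docs_l, hard, classes, vocab)  # initialise from labeled data only
    for _ in range(iterations):
        soft = [posterior(d, *model) for d in docs_u]                   # E-step
        model = train_nb(docs_l + docs_u, hard + soft, classes, vocab)  # M-step
    return model


def classify(tokens, model):
    post = posterior(Counter(tokens), *model)
    return max(post, key=post.get)
```

In a meta-search setting, the labeled set would be a few documents marked relevant or irrelevant and the unlabeled set the remaining search hits; words that co-occur with labeled examples (for instance "pathway" alongside "gene") then acquire class evidence through the unlabeled documents.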
ABSTRACT: Because of the increasing need to protect subject privacy, it is desirable to be able to deidentify structural MR images so that they do not reveal full facial detail. A program was developed that uses models of nonbrain structures to remove potentially identifying facial features. When a novel image is presented, the optimal linear transform is computed for the input volume (Fischl et al. : Neuron 33:341-355; Fischl et al. : Neuroimage 23 (Suppl 1):S69-S84). A brain mask is constructed from the union of all voxels with nonzero probability of being brain and is then morphologically dilated. All voxels outside the mask with a nonzero probability of being a facial feature are set to 0. The algorithm was applied to 342 datasets that included two different T1-weighted pulse sequences and four diagnostic groups (depressed, Alzheimer's, and elderly and young controls). Visual inspection showed that no dataset had brain tissue removed. In a detailed analysis of the impact of defacing on skull-stripping, 16 datasets were bias-corrected with N3 (Sled et al. : IEEE Trans Med Imaging 17:87-97), defaced, and then skull-stripped using either a hybrid watershed algorithm (Ségonne et al. : Neuroimage 22:1060-1075, in FreeSurfer) or Brain Surface Extractor (Sandor and Leahy : IEEE Trans Med Imaging 16:41-54; Shattuck et al. : Neuroimage 13:856-876); defacing did not appreciably influence the outcome of skull-stripping. The results suggest that the automatic defacing algorithm is robust, efficiently removes nonbrain tissue, and does not unduly influence the outcome of the processing methods used; in some cases, skull-stripping was improved. These analyses support the algorithm as a viable method for enabling data sharing with minimal data alteration within large-scale multisite projects.
Human Brain Mapping 10/2007; 28(9):892-903. · 6.88 Impact Factor
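The masking steps described in the defacing abstract (union of nonzero brain probability, morphological dilation, zeroing facial-feature voxels outside the mask) can be sketched as follows. This assumes the registration step has already been done, so `p_brain` and `p_face` are brain and facial-feature probability maps already resampled into the input image's voxel grid. The 6-connected structuring element and the number of dilation iterations are illustrative choices, not values from the paper, and the function names are hypothetical. Volumes are nested Python lists to keep the sketch dependency-free.

```python
from itertools import product

# 6-connected (face-neighbour) structuring element
OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dilate(mask, iterations=1):
    """Binary dilation of a 3-D boolean volume, repeated `iterations` times."""
    X, Y, Z = len(mask), len(mask[0]), len(mask[0][0])
    for _ in range(iterations):
        out = [[list(row) for row in plane] for plane in mask]
        for x, y, z in product(range(X), range(Y), range(Z)):
            if mask[x][y][z]:
                for dx, dy, dz in OFFSETS:
                    nx, ny, nz = x + dx, y + dy, z + dz
                    if 0 <= nx < X and 0 <= ny < Y and 0 <= nz < Z:
                        out[nx][ny][nz] = True
        mask = out
    return mask


def deface(image, p_brain, p_face, dilation_iters=1):
    """Return a copy of `image` with facial-feature voxels outside the dilated brain mask set to 0."""
    X, Y, Z = len(image), len(image[0]), len(image[0][0])
    # Brain mask: every voxel with nonzero probability of being brain.
    brain = [[[p_brain[x][y][z] > 0 for z in range(Z)] for y in range(Y)] for x in range(X)]
    brain = dilate(brain, dilation_iters)  # safety margin around the brain
    out = [[list(row) for row in plane] for plane in image]
    for x, y, z in product(range(X), range(Y), range(Z)):
        if not brain[x][y][z] and p_face[x][y][z] > 0:
            out[x][y][z] = 0
    return out
```

The dilation is what protects voxels near the brain boundary: a voxel with some facial-feature probability that lies inside the dilated mask is kept. For real volumes, an array library (for example `scipy.ndimage.binary_dilation`) would replace the explicit loops.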
ABSTRACT: The performance of automated methods that isolate brain from nonbrain tissue in magnetic resonance (MR) structural images may be influenced by MR signal inhomogeneities, the type of MR image set, regional anatomy, and the age and diagnosis of the subjects studied. The present study compared four methods against manually stripped images: Brain Extraction Tool (BET; Smith : Hum Brain Mapp 17:143-155); 3dIntracranial (Ward  Milwaukee: Biophysics Research Institute, Medical College of Wisconsin; in AFNI); a Hybrid Watershed algorithm (HWA, Segonne et al.  Neuroimage 22:1060-1075; in FreeSurfer); and Brain Surface Extractor (BSE, Sandor and Leahy  IEEE Trans Med Imag 16:41-54; Shattuck et al.  Neuroimage 13:856-876). The methods were applied to uncorrected and bias-corrected datasets; Legacy and Contemporary T1-weighted image sets; and four diagnostic groups (depressed, Alzheimer's, young control, and elderly control). To provide a criterion for outcome assessment, two experts manually stripped six sagittal sections of each dataset at locations where brain and nonbrain tissue are difficult to distinguish. Methods were compared on Jaccard similarity coefficients, Hausdorff distances, and an Expectation-Maximization algorithm. Methods tended to perform better on Contemporary datasets; bias correction did not significantly improve method performance. Mesial sections were the most difficult for all methods. Although image sets from the Alzheimer's disease (AD) group were the most difficult to strip, HWA and BSE were more robust across diagnostic groups than 3dIntracranial and BET. With respect to specificity, BSE tended to perform best across all groups, whereas HWA was more sensitive than the other methods. These results may direct users toward a method appropriate to their T1-weighted datasets and improve the efficiency of processing for large, multisite neuroimaging studies.
Human Brain Mapping 03/2006; 27(2):99-113. · 6.88 Impact Factor
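Two of the comparison measures named in the skull-stripping abstract have standard definitions: the Jaccard similarity coefficient |A ∩ B| / |A ∪ B| and the symmetric Hausdorff distance, the larger of the two directed "worst closest-point" distances between the masks. The sketch below computes both on masks represented as sets of integer voxel coordinates, with an assumed voxel `spacing` (for example in mm) to convert indices to physical distance. How the study applied these measures (for example per sagittal section, or on boundary voxels only) is not specified in the abstract, so this is an illustration of the measures, not the study's evaluation code.

```python
import math


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two voxel-coordinate sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0  # two empty masks agree fully


def hausdorff(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance between two non-empty voxel sets, in spacing units."""
    def dist(p, q):
        return math.sqrt(sum(((pi - qi) * s) ** 2 for pi, qi, s in zip(p, q, spacing)))

    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))
```

The two measures are complementary: Jaccard summarizes overall overlap, while the Hausdorff distance is driven by the single worst outlier, so a method that leaves one stray island of nonbrain tissue can score well on Jaccard yet poorly on Hausdorff. The brute-force search is O(|A|·|B|); distance transforms are the usual choice for full-size volumes.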