Provenance in neuroimaging

Laboratory of Neuro Imaging (LONI), Department of Neurology, University of California Los Angeles School of Medicine, 635 Charles E. Young Drive South, Suite 225, Los Angeles, CA 90095-7334, USA.
NeuroImage (Impact Factor: 6.36). 08/2008; 42(1):178-95. DOI: 10.1016/j.neuroimage.2008.04.186
Source: PubMed


Provenance, the description of the history of a set of data, has grown more important with the proliferation of consortium-based research efforts in neuroimaging. Knowledge about the origin and history of an image is crucial for establishing the quality of data and results; detailed information about how it was processed, including the specific software routines and operating systems that were used, is necessary for proper interpretation, high-fidelity replication, and re-use. We have drafted a mechanism for describing provenance in a simple, easy-to-use environment, relieving the user of the burden of documentation while still providing a rich description of an image's provenance. This combination of ease of use and highly descriptive metadata should greatly facilitate the collection of provenance and the subsequent sharing of data.
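
To make the idea concrete, the kind of information the abstract lists (the software routine, its version, and the operating system) can be captured automatically at processing time. The following is only a minimal sketch in Python, not the mechanism the authors describe: the record layout, the function names, and the example tool name `skull_strip` are all assumptions made for illustration.

```python
import hashlib
import json
import os
import platform
import sys
import tempfile
from datetime import datetime, timezone


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_provenance_record(input_path, tool_name, tool_version, parameters):
    """Describe one processing step (hypothetical schema, for illustration)."""
    return {
        # Identify the input by content, not only by file name.
        "input": {"path": input_path, "sha256": file_sha256(input_path)},
        # The specific software routine and its version.
        "software": {"name": tool_name, "version": tool_version},
        "parameters": parameters,
        # The computing environment the step ran in.
        "environment": {
            "os": platform.system(),
            "os_release": platform.release(),
            "machine": platform.machine(),
            "python": sys.version.split()[0],
        },
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
    }


def append_step(history, record):
    """Return a new history list with the record added; the input is unchanged."""
    return history + [record]


if __name__ == "__main__":
    # Stand-in for an image file; a real pipeline would pass the image path.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"placeholder image bytes")
        image_path = tmp.name
    try:
        # "skull_strip" and its parameters are hypothetical examples.
        step = make_provenance_record(
            image_path, "skull_strip", "1.0", {"threshold": 0.5}
        )
        history = append_step([], step)
        print(json.dumps(history, indent=2))
    finally:
        os.remove(image_path)
```

Because the record is assembled by the program rather than typed in by the user, documentation adds no manual work, which is the trade-off the abstract emphasizes. Storing a content hash of the input lets a later reader verify that the file under examination is the one that was actually processed.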

7 Reads
  • Source
    • "The final stage of a pipeline life cycle is provenance tracking, which represents the comprehensive recording of the processing steps applied to the datasets. This can also be extended to the archiving of the computing environment used for production (e.g., the version of the software that was used for processing), and the origin of the datasets that were used as inputs (MacKenzie-Graham et al., 2008). Provenance is a critical step to achieve reproducible research, which is itself considered as a cornerstone of the scientific method (Mesirov, 2010)."
    ABSTRACT: The analysis of neuroimaging databases typically involves a large number of inter-connected steps called a pipeline. The pipeline system for Octave and Matlab (PSOM) is a flexible framework for the implementation of pipelines in the form of Octave or Matlab scripts. PSOM does not introduce new language constructs to specify the steps and structure of the workflow. All steps of analysis are instead described by a regular Matlab data structure, documenting their associated command and options, as well as their input, output, and cleaned-up files. The PSOM execution engine provides a number of automated services: (1) it executes jobs in parallel on a local computing facility as long as the dependencies between jobs allow for it and sufficient resources are available; (2) it generates a comprehensive record of the pipeline stages and the history of execution, which is detailed enough to fully reproduce the analysis; (3) if an analysis is started multiple times, it executes only the parts of the pipeline that need to be reprocessed. PSOM is distributed under an open-source MIT license and can be used without restriction for academic or commercial projects. The package has no external dependencies besides Matlab or Octave, is straightforward to install, and supports a variety of operating systems (Linux, Windows, Mac). We ran several benchmark experiments on a public database including 200 subjects, using a pipeline for the preprocessing of functional magnetic resonance images (fMRI). The benchmark results showed that PSOM is a powerful solution for the analysis of large databases using local or distributed computing resources.
    Frontiers in Neuroinformatics 04/2012; 6:7. DOI:10.3389/fninf.2012.00007 · 3.26 Impact Factor
  • Source
    • "These multivariate neuroimaging data can be processed and analyzed in a huge variety of ways using algorithms that are in a continual state of evolution and improvement. As a result, understanding complete data and processing provenance [10] across these diverse data sets remains a significant neuroinformatics challenge for the imaging community."
    ABSTRACT: Neuroimaging researchers have developed rigorous community data and metadata standards that encourage meta-analysis as a method for establishing robust and meaningful convergence of knowledge of human brain structure and function. Capitalizing on these standards, the BrainMap project offers databases, software applications, and other associated tools for supporting and promoting quantitative coordinate-based meta-analysis of the structural and functional neuroimaging literature. In this report, we describe recent technical updates to the project and provide an educational description for performing meta-analyses in the BrainMap environment. The BrainMap project will continue to evolve in response to the meta-analytic needs of biomedical researchers in the structural and functional neuroimaging communities. Future work on the BrainMap project regarding software and hardware advances is also discussed.
    BMC Research Notes 09/2011; 4(349):349. DOI:10.1186/1756-0500-4-349
  • Source
    • "In neuroimaging studies, data provenance, or the history of how the data were acquired and subsequently processed, is often discussed but seldom implemented [24]. Recently, several groups have proposed provenance challenges in order to evaluate the status of various provenance models [25]."
    ABSTRACT: Modern computational neuroscience employs diverse software tools and multidisciplinary expertise to analyze heterogeneous brain data. The classical problems of gathering meaningful data, fitting specific models, and discovering appropriate analysis and visualization tools give way to a new class of computational challenges: management of large and incongruous data, integration and interoperability of computational resources, and data provenance. We designed, implemented and validated a new paradigm for addressing these challenges in the neuroimaging field. Our solution is based on the LONI Pipeline environment [3], [4], a graphical workflow environment for constructing and executing complex data processing protocols. We developed study-design, database and visual language programming functionalities within the LONI Pipeline that enable the construction of complete, elaborate and robust graphical workflows for analyzing neuroimaging and other data. These workflows facilitate open sharing and communication of data and metadata, concrete processing protocols, result validation, and study replication among different investigators and research groups. The LONI Pipeline features include distributed grid-enabled infrastructure, virtualized execution environment, efficient integration, data provenance, validation and distribution of new computational tools, automated data format conversion, and an intuitive graphical user interface. We demonstrate the new LONI Pipeline features using large-scale neuroimaging studies based on data from the International Consortium for Brain Mapping [5] and the Alzheimer's Disease Neuroimaging Initiative [6]. User guides, forums, instructions and downloads of the LONI Pipeline environment are available at
    PLoS ONE 09/2010; 5(9). DOI:10.1371/journal.pone.0013070 · 3.23 Impact Factor