Article

Applications of the pipeline environment for visual informatics and genomics computations.

Laboratory of Neuro Imaging (LONI), University of California, Los Angeles, Los Angeles, CA 90095, USA.
BMC Bioinformatics (Impact Factor: 2.67). 07/2011; 12:304. DOI: 10.1186/1471-2105-12-304
Source: PubMed

ABSTRACT: Contemporary informatics and genomics research requires efficient, flexible and robust management of large heterogeneous data, advanced computational tools, powerful visualization, reliable hardware infrastructure, interoperability of computational resources, and detailed provenance of data and analysis protocols. The Pipeline is a client-server distributed computational environment that facilitates the visual graphical construction, execution, monitoring, validation and dissemination of advanced data-analysis protocols.
This paper reports on applications of the LONI Pipeline environment to two informatics challenges: graphical management of diverse genomics tools, and interoperability of informatics software. Specifically, this manuscript presents the concrete details of deploying general informatics suites and individual software tools to new hardware infrastructures; the design, validation and execution of new visual analysis protocols via the Pipeline graphical interface; and the integration of diverse informatics tools via the Pipeline eXtensible Markup Language (XML) syntax. We demonstrate each of these processes using several established informatics packages (e.g., miBLAST, EMBOSS, mrFAST, GWASS, MAQ, SAMtools, Bowtie) for basic local sequence alignment and search, molecular biology data analysis, and genome-wide association studies. These examples demonstrate the power of the Pipeline graphical workflow environment to integrate bioinformatics resources that provide a well-defined syntax for dynamic specification of input/output parameters and run-time execution controls.
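
To illustrate what such an XML-based tool description looks like in practice, the short Python sketch below assembles a minimal, hypothetical module definition for a Bowtie alignment step, declaring its inputs, output and a run-time argument. The element and attribute names here are illustrative assumptions and do not reproduce the actual Pipeline (.pipe) schema, which is normally authored through the graphical interface rather than by hand.

    # Minimal sketch of a Pipeline-style XML module description. The schema shown
    # (module/input/output/argument elements) is a hypothetical illustration,
    # not the actual .pipe format.
    import xml.etree.ElementTree as ET

    module = ET.Element("module", name="Bowtie alignment",
                        location="/usr/local/bin/bowtie")     # assumed install path
    reads = ET.SubElement(module, "input", name="reads", type="File")
    ET.SubElement(reads, "format", extension="fastq")
    ET.SubElement(module, "input", name="index", type="String")
    sam = ET.SubElement(module, "output", name="alignments", type="File")
    ET.SubElement(sam, "format", extension="sam")
    ET.SubElement(module, "argument", flag="-S")              # request SAM output

    ET.indent(module)                                         # Python 3.9+
    print(ET.tostring(module, encoding="unicode"))

The point of the sketch is only that inputs, outputs and run-time arguments are declared explicitly, which is what lets a workflow engine validate parameters and schedule the step.
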
The LONI Pipeline environment (http://pipeline.loni.ucla.edu) provides a flexible graphical infrastructure for efficient biomedical computing and distributed informatics research. The interactive Pipeline resource manager enables the utilization and interoperability of diverse types of informatics resources. The Pipeline client-server model provides computational power to a broad spectrum of informatics investigators: experienced developers and novice users, users with or without access to advanced computational resources (e.g., Grid, data), and basic and translational scientists. The open development, validation and dissemination of computational networks (pipeline workflows) facilitates the sharing of knowledge, tools, protocols and best practices, and enables the unbiased validation and replication of scientific findings by the entire community.
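
To make the notion of a pipeline workflow concrete, the sketch below chains the kind of two-step alignment protocol such a workflow encapsulates graphically: a Bowtie alignment followed by SAMtools conversion, sorting and indexing. The index prefix and file names are hypothetical placeholders, the SAMtools flags assume version 1.x option syntax, and in the actual environment these steps would be defined as connected modules rather than hard-coded calls.

    # Sketch of a two-step alignment protocol that a graphical workflow might wrap:
    # Bowtie (reads -> SAM) followed by SAMtools (SAM -> sorted, indexed BAM).
    # "hg19_index", "reads.fastq" and the output names are hypothetical placeholders.
    import subprocess

    def run(cmd):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)        # stop the protocol if a step fails

    run(["bowtie", "-S", "hg19_index", "reads.fastq", "aligned.sam"])     # align reads
    run(["samtools", "view", "-bS", "-o", "aligned.bam", "aligned.sam"])  # SAM -> BAM
    run(["samtools", "sort", "-o", "aligned.sorted.bam", "aligned.bam"])  # coordinate sort
    run(["samtools", "index", "aligned.sorted.bam"])                      # build .bai index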

ABSTRACT: The volume, diversity and velocity of biomedical data are increasing exponentially, providing petabytes of new neuroimaging and genetics data every year. At the same time, tens of thousands of computational algorithms are developed and reported in the literature, along with thousands of software tools and services. Users demand intuitive, quick and platform-agnostic access to data, software tools, and infrastructure from millions of hardware devices. This explosion of information, scientific techniques, computational models, and technological advances leads to enormous challenges in data analysis, evidence-based biomedical inference and reproducibility of findings. The Pipeline workflow environment provides a crowd-based distributed solution for consistent management of these heterogeneous resources. The Pipeline allows multiple (local) clients and (remote) servers to connect, exchange protocols, control execution, monitor the states of different tools or hardware, and share complete protocols as portable XML workflows. In this paper, we demonstrate several advanced computational neuroimaging and genetics case studies and end-to-end pipeline solutions. These are implemented as graphical workflow protocols in the context of analyzing imaging (sMRI, fMRI, DTI), phenotypic (demographic, clinical), and genetic (SNP) data.
Brain Imaging and Behavior (Impact Factor: 2.67). 08/2013.

ABSTRACT: Regional cortical thickness alterations have been reported in many chronic inflammatory and painful conditions, including inflammatory bowel diseases (IBD) and irritable bowel syndrome (IBS), even though the mechanisms underlying such neuroplastic changes remain poorly understood. In order to better understand the mechanisms contributing to grey matter changes, the current study sought to identify differences in regional cortical thickness alterations between healthy controls and two chronic visceral pain syndromes, with and without chronic gut inflammation. Forty-one healthy controls, 11 IBS subjects with diarrhea, and 16 subjects with ulcerative colitis (UC) underwent high-resolution T1-weighted magnetization-prepared rapid acquisition gradient echo scans. Structural image preprocessing and cortical thickness analysis within the regions of interest were performed using the Laboratory of Neuro Imaging (LONI) Pipeline. Group differences were determined using the general linear model and linear contrast analysis. The two disease groups differed significantly in several cortical regions. UC subjects showed greater cortical thickness in anterior cingulate cortical subregions and in primary somatosensory cortex compared with both IBS and healthy subjects. Compared with healthy subjects, UC subjects showed lower cortical thickness in orbitofrontal cortex and in mid and posterior insula, while IBS subjects showed lower cortical thickness in the anterior insula. Large effect sizes for the correlation between symptom duration and thickness in the orbitofrontal cortex and postcentral gyrus were observed only in UC subjects. The findings suggest that the mechanisms underlying the observed gray matter changes in UC subjects represent a consequence of peripheral inflammation, while in IBS subjects central mechanisms may play a primary role.
PLoS ONE (Impact Factor: 3.53). 01/2014; 9(1):e84564.
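
The group comparison described above rests on a general linear model with linear contrasts. The sketch below shows that core computation on synthetic data: the group sizes are taken from the abstract, but the thickness values, the age covariate and the effect size are invented for illustration and have nothing to do with the published results.

    # Minimal GLM + linear-contrast sketch for a regional cortical-thickness
    # comparison. Data, design (intercept, group indicator, age) and contrast are
    # synthetic illustrations, not the published analysis.
    import numpy as np

    rng = np.random.default_rng(0)
    n_uc, n_hc = 16, 41                               # group sizes from the abstract
    group = np.r_[np.ones(n_uc), np.zeros(n_hc)]      # 1 = UC, 0 = healthy control
    age = rng.normal(40, 12, n_uc + n_hc)             # hypothetical covariate
    thickness = 2.5 + 0.15 * group - 0.005 * age + rng.normal(0, 0.1, n_uc + n_hc)

    X = np.column_stack([np.ones_like(group), group, age])   # design matrix
    beta, *_ = np.linalg.lstsq(X, thickness, rcond=None)     # OLS estimates
    resid = thickness - X @ beta
    df = X.shape[0] - np.linalg.matrix_rank(X)                # residual degrees of freedom
    sigma2 = resid @ resid / df                               # error variance estimate

    c = np.array([0.0, 1.0, 0.0])                             # contrast: UC minus HC
    se = np.sqrt(sigma2 * c @ np.linalg.pinv(X.T @ X) @ c)    # standard error of contrast
    t_stat = (c @ beta) / se
    print(f"UC-vs-HC thickness difference: {c @ beta:.3f} mm, t({df}) = {t_stat:.2f}")
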
ABSTRACT: Next-generation sequencing studies generate a large quantity of genetic data in a relatively cost- and time-efficient manner and provide an unprecedented opportunity to identify candidate causative variants that lead to disease phenotypes. A challenge to these studies is the generation of sequencing artifacts by current technologies. To identify and characterize the properties that distinguish false positive variants from true variants, we sequenced a child and both parents (one trio) using DNA isolated from three sources (blood, buccal cells, and saliva). The trio strategy allowed us to identify variants in the proband that could not have been inherited from the parents (Mendelian errors) and would most likely indicate sequencing artifacts. Quality control measurements were examined, and three measurements were found to identify the greatest number of Mendelian errors: read depth, genotype quality score, and alternate allele ratio. Filtering the variants on these measurements, applied independently, removed ~95% of the Mendelian errors while retaining 80% of the called variants. After filtering, the concordance between identical samples isolated from different sources was 99.99%, compared to 87% before filtering. This high concordance suggests that different sources of DNA can be used in trio studies without affecting the ability to identify causative polymorphisms. To facilitate analysis of next-generation sequencing data, we developed the Cincinnati Analytical Suite for Sequencing Informatics (CASSI) to store sequencing files, metadata (e.g., relatedness information) and file versions, to filter data and annotate variants, and to identify candidate causative polymorphisms that follow de novo, rare recessive homozygous, or compound heterozygous inheritance models. We conclude that the data-cleaning process improves the signal-to-noise ratio in terms of variants and facilitates the identification of candidate disease-causing polymorphisms.
Frontiers in Genetics. 01/2014; 5:16.
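
The three quality-control filters named above (read depth, genotype quality and alternate-allele ratio) and the trio-based Mendelian-error check are straightforward to state in code. The sketch below uses invented thresholds and a toy record layout; it is not CASSI's implementation, and the cutoffs are not the study's.

    # Sketch of trio-based variant QC: flag Mendelian errors and apply read-depth,
    # genotype-quality and alternate-allele-ratio filters. Thresholds and the record
    # layout are hypothetical illustrations, not the CASSI defaults.
    MIN_DEPTH, MIN_GQ, MIN_ALT_RATIO, MAX_ALT_RATIO = 10, 20, 0.30, 0.70

    def mendelian_error(child, mother, father):
        """True if the child's genotype could not be inherited from the parents.
        Genotypes are (allele, allele) tuples, e.g. (0, 1) for a heterozygote."""
        return not any(a in mother and b in father or a in father and b in mother
                       for a, b in (child, child[::-1]))

    def passes_qc(call):
        """Apply the three single-sample filters to one genotype call."""
        if call["depth"] < MIN_DEPTH or call["gq"] < MIN_GQ:
            return False
        if sum(call["genotype"]) == 1:                       # heterozygous call
            ratio = call["alt_reads"] / call["depth"]
            return MIN_ALT_RATIO <= ratio <= MAX_ALT_RATIO   # balanced allele support
        return True

    # Toy example: a heterozygous child call from two homozygous-reference parents
    child  = {"genotype": (0, 1), "depth": 42, "gq": 99, "alt_reads": 20}
    mother = {"genotype": (0, 0), "depth": 35, "gq": 99, "alt_reads": 0}
    father = {"genotype": (0, 0), "depth": 38, "gq": 99, "alt_reads": 1}

    err = mendelian_error(child["genotype"], mother["genotype"], father["genotype"])
    kept = all(passes_qc(c) for c in (child, mother, father))
    print(f"Mendelian error: {err}, passes QC filters: {kept}")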
