ABSTRACT: Proteomics approaches enable interrogation of large numbers of proteins to provide a more comprehensive understanding of biological systems. High throughput proteomics typically utilizes liquid chromatography – mass spectrometry technology for data acquisition. Bioinformatic analysis tools are essential to manage and mine resulting high volume proteomics data sets. Data analysis is a current bottleneck for many proteomics researchers because complete and freely accessible already-developed systems are not available. In addition, most analysis systems require experienced bioinformatician input immediately upon data acquisition. For proteomics to achieve greatest impact in biology, data analysis must be more efficient and effective. We present the Proteome Discovery Pipeline (PDP), a web-based analysis platform that provides proteomics data analysis without requirement for specialized hardware or input from bioinformatics specialists for initial data analyses. Functionalities of the PDP include spectrum visualization, deconvolution, alignment, normalization, statistical significance tests, and pattern recognition. The PDP provides proteomic researchers with a user-friendly web-based data analysis package that can handle multiple file formats and facilitates data analysis from multiple proteomics technology platforms. The system is flexible and extensible to enable further development. In this paper the PDP development is described and the system capabilities are illustrated through a case study of human plasma proteomics data analysis.
ABSTRACT: The advent of high-throughput phenotyping technologies has created a deluge of information that is difficult to deal with without the appropriate data management tools. These data management tools should integrate defined workflow controls for genomic-scale data acquisition and validation, data storage and retrieval, and data analysis, indexed around the genomic information of the organism of interest. To maximize the impact of these large datasets, it is critical that they are rapidly disseminated to the broader research community, allowing open access for data mining and discovery. We describe here a system that incorporates such functionalities developed around the Purdue University high-throughput ionomics phenotyping platform. The Purdue Ionomics Information Management System (PiiMS) provides integrated workflow control, data storage, and analysis to facilitate high-throughput data acquisition, along with integrated tools for data search, retrieval, and visualization for hypothesis development. PiiMS is deployed as a World Wide Web-enabled system, allowing for integration of distributed workflow processes and open access to raw data for analysis by numerous laboratories. PiiMS currently contains data on shoot concentrations of P, Ca, K, Mg, Cu, Fe, Zn, Mn, Co, Ni, B, Se, Mo, Na, As, and Cd in over 60,000 shoot tissue samples of Arabidopsis (Arabidopsis thaliana), including ethyl methanesulfonate, fast-neutron and defined T-DNA mutants, and natural accession and populations of recombinant inbred lines from over 800 separate experiments, representing over 1,000,000 fully quantitative elemental concentrations. PiiMS is accessible at www.purdue.edu/dp/ionomics.