Computational and informatics strategies for identification of specific protein interaction partners in affinity purification mass spectrometry experiments.
ABSTRACT: Analysis of protein interaction networks and protein complexes using affinity purification and mass spectrometry (AP/MS) is among the most commonly used and successful applications of proteomics technologies. One of the foremost challenges of AP/MS data is the large number of false-positive protein interactions present in unfiltered data sets. Here we review computational and informatics strategies for detecting specific protein interaction partners in AP/MS experiments, with a focus on incomplete (as opposed to genome-wide) interactome mapping studies. These strategies range from standard statistical approaches, to empirical scoring schemes optimized for a particular type of data, to advanced computational frameworks. The common denominator among these methods is the use of label-free quantitative information, such as spectral counts or integrated peptide intensities, that can be extracted from AP/MS data. We also discuss related issues such as combining multiple biological or technical replicates and dealing with data generated using different tagging strategies. Computational approaches for benchmarking scoring methods are discussed, and the need for reference AP/MS data sets is highlighted. Finally, we discuss the possibility of more extensive modeling of experimental AP/MS data, including integration with external information such as protein interaction predictions based on functional genomics data.
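The label-free scoring idea described above can be sketched in a few lines: compare a prey protein's spectral counts across bait replicates against a panel of negative-control purifications. The function name, the counts, and the Z-like enrichment score below are illustrative assumptions, not any specific published scoring scheme.

```python
# Minimal sketch: score a prey's specificity from spectral counts by
# comparing its mean count in bait replicates against negative controls.
# All names and numbers are hypothetical, for illustration only.
from statistics import mean, stdev

def specificity_score(bait_counts, control_counts):
    """Z-like enrichment of a prey's spectral counts over control runs."""
    mu = mean(control_counts)
    sigma = stdev(control_counts) or 1.0  # guard against zero spread
    return (mean(bait_counts) - mu) / sigma

# Prey observed in 3 bait replicates vs. 4 control purifications:
score = specificity_score([12, 15, 11], [1, 0, 2, 1])
```

A prey abundant in controls (a common contaminant) would score near zero under this scheme, which is the intuition behind many of the empirical scoring approaches the review surveys.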
- Source available from: qa.iis.sinica.edu.tw
ABSTRACT: In this study, we present a fully automated tool, called IDEAL-Q, for label-free quantitation analysis. It accepts raw data in the standard mzXML format, as well as search results from major search engines, including Mascot, SEQUEST, and X!Tandem, as input. To quantify as many identified peptides as possible, IDEAL-Q uses an efficient algorithm to predict the elution time of a peptide unidentified in a specific LC-MS/MS run but identified in other runs. The predicted elution time is then used to detect peak clusters of the assigned peptide. Detected peptide peaks are processed by statistical and computational methods and further validated against signal-to-noise ratio, charge state, and isotopic distribution criteria (SCI validation) to filter out noisy data. The performance of IDEAL-Q has been evaluated in several experiments. First, a serially diluted protein mixed with Escherichia coli lysate showed a high correlation with expected ratios and demonstrated good linearity (R² = 0.996). Second, in a biological replicate experiment on THP-1 cell lysate, IDEAL-Q quantified 87% (1,672 peptides) of all identified peptides, surpassing the 45.7% (909 peptides) achieved by the conventional identity-based approach, which quantifies only peptides identified in all LC-MS/MS runs. Manual validation of all 11,940 peptide ions in six replicate LC-MS/MS runs revealed that 97.8% of the peptide ions were correctly aligned and 93.3% were correctly validated by SCI. The mean protein ratio, 1.00 ± 0.05, thus demonstrates the high accuracy of IDEAL-Q without human intervention. Finally, IDEAL-Q was applied again to the biological replicate experiment, but with an additional SDS-PAGE step, to demonstrate its compatibility with label-free experiments that include fractionation. For flexible workflow design, IDEAL-Q supports different fractionation strategies and various normalization schemes, including multiple spiked internal standards.
User-friendly interfaces are provided to facilitate convenient inspection, validation, and modification of quantitation results. In summary, IDEAL-Q is an efficient, user-friendly, and robust quantitation tool, and it is available for download. (Molecular & Cellular Proteomics, 2009; 9(1):131-44)
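The cross-run elution-time prediction step described above can be illustrated, at the level of a toy model, as fitting a mapping between the retention times of peptides identified in two runs and then extrapolating for a peptide seen in only one of them. The simple least-squares fit and the retention times below are hypothetical assumptions for illustration; IDEAL-Q's actual alignment algorithm is more elaborate.

```python
# Toy sketch of cross-run elution-time alignment: fit a linear mapping
# between retention times of peptides identified in both runs, then predict
# where a peptide identified only in run A should elute in run B.
# The fitting choice and all times (minutes) are illustrative assumptions.
def fit_linear(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Retention times of peptides identified in both runs:
run_a = [10.2, 22.5, 35.1, 48.0]
run_b = [11.0, 23.1, 36.0, 49.2]
slope, intercept = fit_linear(run_a, run_b)

# Peptide seen only in run A at 30.0 min; predicted elution in run B:
predicted = slope * 30.0 + intercept
```

The predicted time would then define a narrow retention-time window in which to search run B for the peptide's peak cluster, which is the role the prediction plays in IDEAL-Q's workflow.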
-
ABSTRACT: In the last ten years, the field of proteomics has expanded at a rapid rate. A range of exciting new technologies has been developed and enthusiastically applied to an enormous variety of biological questions. However, the degree of stringency required in proteomic data generation and analysis appears to have been underestimated. As a result, there are likely to be numerous published findings that are of questionable quality and require further confirmation and/or validation. This manuscript outlines a number of key issues in proteomic research, including those associated with experimental design, differential display and biomarker discovery, protein identification, and analytical incompleteness. In an effort to set a standard that reflects current thinking on the necessary and desirable characteristics of publishable manuscripts in the field, a minimal set of guidelines for proteomics research is then described. These guidelines will serve as the criteria that editors of PROTEOMICS will use to assess future submissions to the Journal. (PROTEOMICS, 2006; 6(1):4-8)
-
ABSTRACT: Assembling protein complexes and protein interaction networks from affinity purification-based proteomics data sets remains a challenge. When little a priori knowledge of the complexes exists, it is difficult to place proteins in their proper locations and to evaluate the results of clustering approaches. Here we have systematically compared multiple hierarchical and partitioning clustering approaches using a well-characterized but highly complex human protein interaction network data set centered on the conserved AAA+ ATPases Tip49a and Tip49b. This network challenges clustering algorithms because Tip49a and Tip49b are present in four distinct complexes, the network contains modules, and the network has multiple attachments. We compared the use of binary data, quantitative proteomics data in the form of normalized spectral abundance factors (NSAFs), and Z-score normalization. In our analysis, a partitioning approach indicated the major modules in a network. Next, although Euclidean distance was sensitive to scaling, with data transformation all the attachments in a data set were recovered in one branch of a dendrogram. Finally, when Pearson correlation and hierarchical clustering were used, complexes were well separated and their attachments were placed in the proper locations. Each of these three approaches provided distinct information useful for assembling a network of multiple protein complexes. (Journal of Proteome Research, 2009; 8(6):2944-52)
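The normalized spectral abundance factor (NSAF) used above as quantitative input for clustering is simple to compute: each protein's spectral count is divided by its length to give a spectral abundance factor (SAF), and the SAFs are then normalized to sum to one within a purification. The proteins, counts, and lengths below are invented for illustration.

```python
# Sketch of NSAF computation: SAF = spectral count / protein length,
# NSAF = SAF / sum of all SAFs in the same purification.
# Protein names, counts, and lengths are hypothetical example values.
def nsaf(counts, lengths):
    """Return per-protein NSAF values for one purification."""
    saf = {p: counts[p] / lengths[p] for p in counts}
    total = sum(saf.values())
    return {p: s / total for p, s in saf.items()}

counts = {"Tip49a": 120, "Tip49b": 95, "preyX": 10}
lengths = {"Tip49a": 456, "Tip49b": 463, "preyX": 210}
values = nsaf(counts, lengths)  # NSAFs sum to 1.0 within the run
```

Because NSAF corrects for protein length and run-to-run sampling depth, vectors of NSAF values across purifications are better suited than raw counts as input to the distance measures (Euclidean, Pearson correlation) compared in the study.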