ABSTRACT: Crystallization is the most serious bottleneck in high-throughput protein-structure determination by diffraction methods. We have used data mining of the large-scale experimental results of the Northeast Structural Genomics Consortium and experimental folding studies to characterize the biophysical properties that control protein crystallization. This analysis leads to the conclusion that crystallization propensity depends primarily on the prevalence of well-ordered surface epitopes capable of mediating interprotein interactions and is not strongly influenced by overall thermodynamic stability. We identify specific sequence features that correlate with crystallization propensity and that can be used to estimate the crystallization probability of a given construct. Analyses of entire predicted proteomes demonstrate substantial differences in the amino acid-sequence properties of human versus eubacterial proteins, which likely reflect differences in biophysical properties, including crystallization propensity. Our thermodynamic measurements do not generally support previous claims regarding correlations between sequence properties and protein stability.
ABSTRACT: An abundance of protein structures emerging from structural genomics and the Protein Structure Initiative (PSI) are not amenable to ready functional assignment because of a lack of sequence and structural homology to proteins of known function. We describe a high-throughput NMR methodology (FAST-NMR) to annotate the biological function of novel proteins through the structural and sequence analysis of protein-ligand interactions. This is based on basic tenets of biochemistry where proteins with similar functions will have similar active sites and exhibit similar ligand binding interactions, despite global differences in sequence and structure. Protein-ligand interactions are determined through a tiered NMR screen using a library composed of compounds with known biological activity. A rapid co-structure is determined by combining the experimental identification of the ligand binding site from NMR chemical shift perturbations with the protein-ligand docking program AutoDock. Our CPASS (Comparison of Protein Active Site Structures) software and database are then used to compare this active site with proteins of known function. The methodology is demonstrated using unannotated protein SAV1430 from Staphylococcus aureus.
Journal of the American Chemical Society 12/2006; 128(47):15292-9. DOI:10.1021/ja0651759 · 11.44 Impact Factor
ABSTRACT: Recent technological advances and experimental techniques have contributed to an increasing number and size of NMR datasets. In order to scale up productivity, laboratory information management systems for handling these extensive data need to be designed and implemented. The SPINS (Standardized ProteIn Nmr Storage) Laboratory Information Management System (LIMS) addresses these needs by providing an interface for archival of complete protein NMR structure determinations, together with functionality for depositing these data to the public BioMagResBank (BMRB). The software tracks intermediate files during each step of an NMR structure-determination process, including: data collection, data processing, resonance assignments, resonance assignment validation, structure calculation, and structure validation. The underlying SPINS data dictionary allows for the integration of various third party NMR data processing and analysis software, enabling users to launch programs they are accustomed to using for each step of the structure determination process directly out of the SPINS user interface.
Proteins Structure Function and Bioinformatics 03/2006; 62(4):843-51. DOI:10.1002/prot.20840 · 2.92 Impact Factor
ABSTRACT: Recent developments provide automated analysis of NMR assignments and three-dimensional (3D) structures of proteins. These approaches are generally applicable to proteins ranging from about 50 to 150 amino acids. In this chapter, we summarize progress by the Northeast Structural Genomics Consortium in standardizing the NMR data collection process for protein structure determination and in building an integrated platform for automated protein NMR structure analysis. Our integrated platform includes the following principal steps: (1) standardized NMR data collection, (2) standardized data processing (including spectral referencing and Fourier transformation), (3) automated peak picking and peak list editing, (4) automated analysis of resonance assignments, (5) automated analysis of NOESY data together with 3D structure determination, and (6) methods for protein structure validation. In particular, the software AutoStructure for automated NOESY data analysis is described in this chapter, together with a discussion of practical considerations for its use in high-throughput structure production efforts. The critical area of data quality assessment has evolved significantly over the past few years and involves evaluation of both intermediate and final peak lists, resonance assignments, and structural information derived from the NMR data. Methods for quality control of each of the major automated analysis steps in our platform are also discussed. Despite significant remaining challenges, when good quality data are available, automated analysis of protein NMR assignments and structures with this platform is both fast and reliable.
Methods in Enzymology 02/2005; 394:111-41. DOI:10.1016/S0076-6879(05)94005-6 · 2.19 Impact Factor
ABSTRACT: In this chapter we describe the core Protein Production Platform of the Northeast Structural Genomics Consortium (NESG) and outline the strategies used for producing high-quality protein samples using Escherichia coli host vectors. The platform is centered on 6X-His affinity-tagged protein constructs, allowing for a similar purification procedure for most targets, and the implementation of high-throughput parallel methods. In most cases, these affinity-purified proteins are sufficiently homogeneous that a single subsequent gel filtration chromatography step is adequate to produce protein preparations that are greater than 98% pure. Using this platform, over 1000 different proteins have been cloned, expressed, and purified in tens of milligram quantities over the last 36-month period (see Summary Statistics for All Targets). Our experience using a hierarchical multiplex expression and purification strategy, also described in this chapter, has allowed us to achieve success in producing not only protein samples but also many three-dimensional structures. As of December 2004, the NESG Consortium has deposited over 145 new protein structures to the Protein Data Bank (PDB); about two-thirds of these protein samples were produced by the NESG Protein Production Facility described here. The methods described here have proven effective in producing quality samples of both eukaryotic and prokaryotic proteins. These improved robotic and/or parallel cloning, expression, protein production, and biophysical screening technologies will be of broad value to the structural biology, functional proteomics, and structural genomics communities.
Methods in Enzymology 02/2005; 394:210-43. DOI:10.1016/S0076-6879(05)94008-1 · 2.19 Impact Factor
ABSTRACT: Modern protein NMR spectroscopy laboratories have a rapidly growing need for an easily queried local archival system of raw experimental NMR datasets. SPINS (Standardized ProteIn Nmr Storage) is an object-oriented relational database that provides facilities for high-volume NMR data archival, organization of analyses, and dissemination of results to the public domain by automatic preparation of the header files required for submission of data to the BioMagResBank (BMRB). The current version of SPINS coordinates the process from data collection to BMRB deposition of raw NMR data by standardizing and integrating the storage and retrieval of these data in a local laboratory file system. Additional facilities include a data mining query tool, graphical database administration tools, and an NMRStar v2.1.1 file generator. SPINS also includes a user-friendly internet-based graphical user interface, which is optionally integrated with Varian VNMR NMR data collection software. This paper provides an overview of the data model underlying the SPINS database system, a description of its implementation in Oracle, and an outline of future plans for the SPINS project.