ABSTRACT: Fibrinogen is a large, heterogeneous, aggregation/degradation-prone protein playing a central role in blood coagulation and associated pathologies, whose structure is not completely resolved. When a high-molecular-weight fraction was analyzed by size-exclusion high-performance liquid chromatography/small-angle X-ray scattering (HPLC-SAXS), several composite peaks were apparent and, because of the stickiness of fibrinogen, the analysis was complicated by severe capillary fouling. Novel SAS analysis tools developed as part of the UltraScan Solution Modeler (US-SOMO; http://somo.uthscsa.edu/), an open-source suite of utilities with advanced graphical user interfaces whose initial goal was the hydrodynamic modeling of biomacromolecules, were implemented and applied to this problem. They include the correction of baseline drift due to the accumulation of material on the SAXS capillary walls, and the Gaussian decomposition of non-baseline-resolved HPLC-SAXS elution peaks. It was thus possible to resolve at least two species co-eluting under the main fibrinogen monomer peak, probably resulting from in-column degradation, and two others under an oligomers peak. The overall and cross-sectional radii of gyration, molecular mass and mass/length ratio of all species were determined using the manual or semi-automated procedures available within the US-SOMO SAS module. Differences between monomeric species and linear and sideways oligomers were thus identified and rationalized. This new US-SOMO version additionally contains several computational and graphical tools, implementing functionalities such as the mapping of residues contributing to particular regions of P(r), and an advanced module for the comparison of primary I(q) versus q data with model curves computed from atomic-level structures or bead models. It should be of great help in multi-resolution studies involving hydrodynamics, solution scattering and crystallographic/NMR data.
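The Gaussian decomposition of non-baseline-resolved elution peaks can be illustrated with a minimal sketch (not the US-SOMO implementation; peak positions, widths and noise level are invented): two overlapping Gaussian elution profiles are synthesized and their centers recovered by nonlinear least squares.

```python
import numpy as np
from scipy.optimize import curve_fit

def two_gaussians(t, a1, c1, w1, a2, c2, w2):
    """Sum of two Gaussian elution peaks as a function of elution frame t."""
    g1 = a1 * np.exp(-0.5 * ((t - c1) / w1) ** 2)
    g2 = a2 * np.exp(-0.5 * ((t - c2) / w2) ** 2)
    return g1 + g2

t = np.linspace(0.0, 100.0, 501)
# Two co-eluting species: centers 45 and 55, partially overlapping.
true_params = (1.0, 45.0, 4.0, 0.6, 55.0, 5.0)
rng = np.random.default_rng(0)
signal = two_gaussians(t, *true_params) + rng.normal(0.0, 0.005, t.size)

# Fit starting from rough initial guesses for (amplitude, center, width) x 2.
p0 = (0.8, 43.0, 5.0, 0.5, 57.0, 5.0)
popt, _ = curve_fit(two_gaussians, t, signal, p0=p0)
print("recovered centers:", popt[1], popt[4])
```

The same idea extends to more Gaussians and to a drifting baseline by adding a slowly varying term to the model before fitting.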
ABSTRACT: UltraScan Solution Modeler (US-SOMO) computes hydrodynamic parameters and small-angle scattering data from biological macromolecular structural representations and compares them with experimental data for structural determination and validation. At XSEDE 12, a GUI-integrated gateway was introduced to offload large computations to various HPC resources. The gateway was directly integrated into the Qt/GUI-based software to give users a seamless experience. The software is available as source code or precompiled for Apple Mac OS X, MS Windows and Linux. Current cluster resources include TACC Lonestar and Stampede, SDSC Trestles and a 256-CPU cluster local to the University of Texas Health Science Center at San Antonio. The simplicity of the design allowed the implementation of a new method of modeling small-angle scattering data that provided new scientific insights and was presented at the 2012 international small-angle scattering conference. Since its introduction, multiple workshops have been taught and users are beginning to utilize the gateway in their biological research.
Proceedings of the Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery, San Diego, California; 01/2013
ABSTRACT: The UltraScan gateway provides a user-friendly web interface for the evaluation of experimental analytical ultracentrifuge data using the UltraScan modeling software. The analysis tasks are executed on TeraGrid and campus computational resources. The gateway has been highly successful in providing this service to end users and is consistently among the top five gateways by community account usage. This continued growth, and the accompanying challenges of sustainability, required additional support to revisit the job management architecture. In this paper we describe the enhancements to the UltraScan gateway middleware infrastructure provided through the TeraGrid Advanced User Support program. The advanced support efforts primarily focused on a) expanding the supported TeraGrid resources to incorporate new machines; b) upgrading UltraScan's job management interfaces to use GRAM5 in place of the deprecated WS-GRAM; c) providing realistic usage scenarios to the GRAM5 and INCA resource testing and monitoring teams; d) creating general-purpose, resource-specific, and UltraScan-specific error handling and fault tolerance strategies; and e) providing forward and backward compatibility for the job management system between UltraScan version 2 (currently in production) and version 3 (expected to be released mid-2011).
Proceedings of the 2011 TeraGrid Conference: Extreme Digital Discovery; 01/2011
ABSTRACT: We compare here the utility of sedimentation velocity (SV) to sedimentation equilibrium (SE) analysis for the characterization of reversible systems. Genetic algorithm optimization in UltraScan is used to optimize the model and to obtain solution properties of all components present in the system. We apply our method to synthetic and experimental data, and suggest limits for the accessible kinetic range. We conclude that equilibrium constants obtained from SV and SE analysis are equivalent, but that SV experiments provide better confidence for the K(d), can better account for the presence of contaminants and provide additional information including rate constants and shape parameters.
ABSTRACT: The US-SOMO suite provides a flexible interface for accurately computing solution parameters from 3D structures of biomacromolecules through bead-modeling approaches. We present an extended analysis of the influence of accessible surface area screening, overlap reduction routines, and approximations for non-coded residues and missing atoms on the computed parameters for models built by the residue-to-bead direct correspondence and the cubic grid methods. Importantly, by taking the theoretical hydration into account at the atomic level, the performance of the grid-type models becomes comparable or exceeds that of the corresponding hydrated residue-to-bead models.
ABSTRACT: Solving large non-negatively constrained least squares systems is frequently used in the physical sciences to estimate model parameters which best fit experimental data. Analytical Ultracentrifugation (AUC) is an important hydrodynamic experimental technique used in biophysics to characterize macromolecules and to determine parameters such as molecular weight and shape. We previously developed a parallel divide and conquer method to facilitate solving the large systems obtained from AUC experiments. New AUC instruments equipped with multi-wavelength (MWL) detectors have recently increased the data sizes by three orders of magnitude. Analyzing the MWL data requires significant compute resources. To better utilize these resources, we introduce a procedure allowing the researcher to optimize the divide and conquer scheme along a continuum from minimum wall time to minimum compute service units. We achieve our results by implementing a preprocessing stage performed on a local workstation before job submission.
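The divide-and-conquer idea for sparse non-negative least squares can be sketched as a two-stage procedure: solve NNLS on each partition of the parameter space, pool the surviving (nonzero) parameters, then re-solve jointly. The toy design matrix below uses orthonormal columns so the example is exactly solvable; real AUC design matrices come from Lamm-equation simulations and are far from orthogonal, and the sizes here are invented for illustration.

```python
import numpy as np
from scipy.optimize import nnls

# Toy design matrix with orthonormal columns (illustration only).
rng = np.random.default_rng(1)
A, _ = np.linalg.qr(rng.normal(size=(200, 120)))

x_true = np.zeros(120)
x_true[[10, 55, 110]] = [2.0, 1.0, 3.0]   # sparse: only three solutes present
b = A @ x_true

def divide_and_conquer_nnls(A, b, n_parts=6):
    """Stage 1: NNLS on each column partition, keeping columns that receive
    nonzero weight.  Stage 2: joint NNLS over the pooled survivors."""
    keep = []
    for cols in np.array_split(np.arange(A.shape[1]), n_parts):
        w, _ = nnls(A[:, cols], b)
        keep.extend(cols[w > 1e-8])
    keep = np.array(sorted(keep))
    w_final, _ = nnls(A[:, keep], b)
    x = np.zeros(A.shape[1])
    x[keep] = w_final
    return x

x = divide_and_conquer_nnls(A, b)
```

Because each stage-1 subproblem is far smaller than the full system, memory use per worker drops and the partitions can be solved in parallel; the wall-time/service-unit trade-off is then governed by how finely the parameter space is partitioned.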
ABSTRACT: Progress in analytical ultracentrifugation (AUC) has been hindered by obstructions to hardware innovation and by software incompatibility. In this paper, we announce and outline the Open AUC Project. The goals of the Open AUC Project are to stimulate AUC innovation by improving instrumentation, detectors, acquisition and analysis software, and collaborative tools. These improvements are needed for the next generation of AUC-based research. The Open AUC Project combines ongoing work from several different groups. A new base instrument is described, one that is designed from the ground up to be an analytical ultracentrifuge. This machine offers an open architecture, hardware standards, and application programming interfaces for detector developers. All software will use the GNU General Public License to assure that intellectual property is available in open-source format. The Open AUC strategy facilitates collaborations, encourages sharing, and eliminates the chronic impediments that have plagued AUC innovation for the last 20 years. This ultracentrifuge will be equipped with multiple and interchangeable optical tracks so that state-of-the-art electronics and improved detectors will be available for a variety of optical systems. The instrument will be complemented by a new rotor, enhanced data acquisition and analysis software, as well as collaboration software. Described here are the instrument, the modular software components, and a standardized database that will encourage and ease integration of data analysis and interpretation software.
Biophysics of Structure and Mechanism 04/2009; 39(3):347-59. · 2.44 Impact Factor
ABSTRACT: The interpretation of solution hydrodynamic data in terms of macromolecular structural parameters is not a straightforward task. Over the years, several approaches have been developed to cope with this problem, the most widely used being bead modeling in various flavors. We report here the implementation of the SOMO (SOlution MOdeller; Rai et al. in Structure 13:723-734, 2005) bead modeling suite within one of the most widely used analytical ultracentrifugation data analysis software packages, UltraScan (Demeler in Modern analytical ultracentrifugation: techniques and methods, Royal Society of Chemistry, UK, 2005). The US-SOMO version is now under complete graphical interface control, and has been freed from several constraints present in the original implementation. In the direct beads-per-atoms method, virtually any kind of residue as defined in the Protein Data Bank (e.g., proteins, nucleic acids, carbohydrates, prosthetic groups, detergents, etc.) can now be represented with beads whose number, size and position are all defined in user-editable tables. For large structures, a cubic grid method based on the original AtoB program (Byron in Biophys J 72:408-415, 1997) can be applied either directly to the atomic structure, or to a previously generated bead model. The hydrodynamic parameters are then computed in the rigid-body approximation. An extensive set of tests was conducted to further validate the method, and the results are presented here. Owing to its accuracy, speed, and versatility, US-SOMO should make it possible to take full advantage of the potential of solution hydrodynamics as a complement to higher-resolution techniques in biomacromolecular modeling.
Biophysics of Structure and Mechanism 03/2009; 39(3):423-35. · 2.44 Impact Factor
ABSTRACT: We report a model-independent analysis approach for fitting sedimentation velocity data which permits simultaneous determination of shape and molecular weight distributions for mono- and polydisperse solutions of macromolecules. Our approach allows for heterogeneity in the frictional domain, providing a more faithful description of the experimental data for cases where frictional ratios are not identical for all components. Because of increased accuracy in the frictional properties of each component, our method also provides more reliable molecular weight distributions in the general case. The method is based on a fine-grained two-dimensional grid search over s and f/f0, where the grid is a linear combination of whole boundary models represented by finite element solutions of the Lamm equation with sedimentation and diffusion parameters corresponding to the grid points. A Monte Carlo approach is used to characterize confidence limits for the determined solutes. Computational algorithms addressing the very large memory needs for a fine-grained search are discussed. The method is suitable for globally fitting multi-speed experiments, and constraints based on prior knowledge about the experimental system can be imposed. Time- and radially invariant noise can be eliminated. Serial and parallel implementations of the method are presented. We demonstrate with simulated and experimental data of known composition that our method provides superior accuracy and lower variance fits to experimental data compared to other methods in use today, and show that it can be used to identify modes of aggregation and slow polymerization.
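As a reminder of why a grid over (s, f/f0) suffices to parameterize the Lamm equation, the standard textbook relations (the Einstein relation for the friction coefficient, the Svedberg equation, and the minimal friction coefficient of the equivalent sphere) can be combined to eliminate M and f, so that each grid point determines a unique diffusion coefficient. This derivation is from general hydrodynamic relations, not quoted from the paper itself:

```latex
f = \frac{RT}{N_A D}, \qquad
M = \frac{sRT}{D\,(1-\bar{v}\rho)}, \qquad
f_0 = 6\pi\eta\left(\frac{3M\bar{v}}{4\pi N_A}\right)^{1/3}
\quad\Longrightarrow\quad
D = \frac{RT}{N_A \, 18\pi \left(\eta\,\frac{f}{f_0}\right)^{3/2}
      \sqrt{\dfrac{s\,\bar{v}}{2\,(1-\bar{v}\rho)}}}
```

Here η and ρ are the solvent viscosity and density, and v̄ is the partial specific volume; with these fixed, (s, f/f0) maps one-to-one onto (s, D), and each grid point yields one finite element Lamm-equation solution for the linear combination fit.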
Biophysics of Structure and Mechanism 03/2009; 39(3):405-14. · 2.44 Impact Factor
ABSTRACT: A computational approach for fitting sedimentation velocity experiments from an analytical ultracentrifuge in a model-independent fashion is presented. This chapter offers a recipe for obtaining high-resolution information for both the shape and the molecular weight distributions of complex mixtures that are heterogeneous in shape and molecular weight and provides suggestions for experimental design to optimize information content. A combination of three methods is used to find the solution most parsimonious in parameters and to verify the statistical confidence intervals of the determined parameters. A supercomputer implementation with a MySQL database back end is integrated into the UltraScan analysis software. The UltraScan LIMS Web portal is used to perform the calculations through a Web interface. The performance and limitations of the method when employed for the analysis of complex mixtures are demonstrated using both simulated data and experimental data characterizing amyloid aggregation.
Methods in enzymology 02/2009; 454:87-113. · 1.90 Impact Factor
ABSTRACT: Frequently in the physical sciences, experimental data are analyzed to determine model parameters using techniques known as parameter estimation. Eliminating the effects of noise from experimental data often involves Tikhonov or Maximum-Entropy regularization. These methods introduce a bias which smoothes the solution. In the problems considered here, the exact answer is sharp, containing a sparse set of parameters. Therefore, it is desirable to find the simplest set of model parameters for the data with an equivalent goodness-of-fit. This paper explains how to bias the solution towards a parsimonious model with a careful application of Genetic Algorithms. A method of representation, initialization and mutation is introduced to efficiently find this model. The results are compared with results from two other methods on simulated data with known content. Our method is shown to be the only one to achieve the desired results. Analysis of Analytical Ultracentrifugation sedimentation velocity experimental data is the primary example application.
Genetic and Evolutionary Computation Conference, GECCO 2007, Proceedings, London, England, UK, July 7-11, 2007; 01/2007
ABSTRACT: Sedimentation experiments can provide a large amount of information about the composition of a sample, and the properties of each component contained in the sample. To extract the details of the composition and the component properties, experimental data can be described by a mathematical model, which can then be fitted to the data. If the model is nonlinear in the parameters, the parameter adjustments are typically performed by a nonlinear least squares optimization algorithm. For models with many parameters, the error surface of this optimization often becomes very complex, the parameter solution tends to become trapped in a local minimum and the method may fail to converge. We introduce here a stochastic optimization approach for sedimentation velocity experiments utilizing genetic algorithms which is immune to such convergence traps and allows high-resolution fitting of nonlinear multi-component sedimentation models to yield distributions for sedimentation and diffusion coefficients, molecular weights, and partial concentrations.
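The genetic-algorithm approach to sparse, parsimonious fitting can be sketched with a toy stand-in for the sedimentation problem: the signal is a sparse sum of candidate exponential decays, and the genome is a bit mask selecting which candidates are present. All rates, population sizes and the penalty weight below are invented for illustration; this is not the UltraScan GA.

```python
import math
import random

rates = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4]   # candidate decay rates
ts = [0.25 * i for i in range(40)]
true_mask = [0, 1, 0, 0, 1, 0, 0, 0]                # only candidates 1 and 4 present

def model(mask, t):
    return sum(m * math.exp(-r * t) for m, r in zip(mask, rates))

data = [model(true_mask, t) for t in ts]            # noise-free toy "experiment"

def fitness(mask):
    ssr = sum((model(mask, t) - d) ** 2 for t, d in zip(ts, data))
    return ssr + 0.05 * sum(mask)                   # parsimony penalty per component

random.seed(2)
pop = [[random.randint(0, 1) for _ in rates] for _ in range(24)]
best = min(pop, key=fitness)
start_fit = fitness(best)
for _ in range(60):
    nxt = [best[:]]                                 # elitism: keep the best genome
    while len(nxt) < len(pop):
        a, b = random.sample(pop, 2)                # binary tournament selection
        child = (a if fitness(a) < fitness(b) else b)[:]
        child[random.randrange(len(child))] ^= 1    # single bit-flip mutation
        nxt.append(child)
    pop = nxt
    best = min(pop, key=fitness)
print(best, fitness(best))
```

The penalty term plays the role of the parsimony bias discussed above: adding a spurious component must reduce the residual by more than the penalty to survive selection, so sharp, sparse solutions are favored over smoothed ones.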
ABSTRACT: We present a novel divide and conquer method for parallelizing a large scale multivariate linear optimization problem, which is commonly solved using a sequential algorithm with the entire parameter space as the input. The optimization solves a large parameter estimation problem where the result is sparse in the parameters. By partitioning the parameters and the associated computations, our technique overcomes memory constraints when used in the context of a single workstation and achieves high processor utilization when large workstation clusters are used. We implemented this technique in a widely used software package for the analysis of a biophysics problem, which is representative for a large class of problems in the physical sciences. We evaluate the performance of the proposed method on a 512-processor cluster and offer an analytical model for predicting the performance of the algorithm.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, November 11-17, 2006, Tampa, FL, USA; 01/2006
ABSTRACT: High-resolution analysis approaches for sedimentation experiments have recently been developed that promise to provide a detailed description of heterogeneous samples by identifying both shape and molecular weight distributions. In this study, we describe the effect experimental noise has on the accuracy and precision of such determinations and offer a stochastic Monte Carlo approach, which reliably quantifies the effect of noise by determining the confidence intervals for the parameters that describe each solute. As a result, we can now predict reliable confidence intervals for determined parameters. We also explore the effect of various experimental parameters on the confidence intervals and provide suggestions for improving the statistics by applying a few practical rules for the design of sedimentation experiments.
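The Monte Carlo idea for parameter confidence intervals can be illustrated with a minimal sketch: fit the observed data, generate many synthetic datasets from the best-fit model plus noise at the experimental level, refit each, and take percentiles of the refitted parameter. A straight-line fit stands in for the sedimentation model here; all numbers are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.arange(20.0)
sigma = 0.1
y = 1.0 + 0.3 * x + rng.normal(0.0, sigma, x.size)   # "experimental" data

# Best fit to the observed data (np.polyfit returns [slope, intercept]).
slope, intercept = np.polyfit(x, y, 1)

# Monte Carlo: refit synthetic datasets built from the best-fit model plus
# noise at the experimental level, then take percentiles of the refits.
slopes = []
for _ in range(500):
    y_syn = intercept + slope * x + rng.normal(0.0, sigma, x.size)
    s_i, _ = np.polyfit(x, y_syn, 1)
    slopes.append(s_i)
lo, hi = np.percentile(slopes, [2.5, 97.5])
print(f"slope = {slope:.4f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
```

The width of the resulting interval directly reflects the experimental noise level and the information content of the design, which is why the rules of thumb above (more scans, better radial coverage) tighten the intervals.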
Colloid and Polymer Science 286(2):129-137. · 2.16 Impact Factor
ABSTRACT: UltraScan Solution Modeler (US-SOMO) processes atomic and lower-resolution bead model representations of biological and other macromolecules to compute various hydrodynamic parameters, such as the sedimentation and diffusion coefficients, relaxation times and intrinsic viscosity, and small-angle scattering curves, that contribute to our understanding of molecular structure in solution. Knowledge of biological macromolecules' structure aids researchers in understanding their function as a path to disease prevention and therapeutics for conditions such as cancer, thrombosis, Alzheimer's disease and others. US-SOMO provides a convergence of experimental, computational, and modeling techniques, in which detailed molecular structure and properties are determined from data obtained in a range of experimental techniques that, by themselves, give incomplete information. Our goal in this work is to develop the infrastructure and user interfaces that will enable a wide range of scientists to carry out complicated experimental data analysis techniques on XSEDE. Our user community predominantly consists of biophysics and structural biology researchers. A recent search on PubMed reports 9,205 papers in the past decade referencing the techniques we support. We believe our software will provide these researchers a convenient and unique framework to refine structures, thus advancing their research. The computed hydrodynamic parameters and scattering curves are screened against experimental data, effectively pruning potential structures into equivalence classes. Experimental methods may include analytical ultracentrifugation, dynamic light scattering, small-angle X-ray and neutron scattering, NMR, fluorescence spectroscopy, and others. One source of macromolecular models is X-ray crystallography. However, the conformation in solution may not match that observed in the crystal form.
Using computational techniques, an initial fixed model can be expanded into a search space utilizing high-temperature molecular dynamics approaches or stochastic methods such as Brownian dynamics. The number of structures produced can vary greatly, ranging from hundreds to tens of thousands or more. This introduces a number of cyberinfrastructure challenges. Computing hydrodynamic parameters and small-angle scattering curves can be computationally intensive for each structure, and therefore cluster compute resources are essential for timely results. Input and output data sizes can vary greatly, from less than 1 MB to 2 GB or more. Although the parallelization is trivial, the data-size variability is accompanied by a large range of job sizes, from one to potentially thousands of cores, with compute times of minutes to hours. In addition to the distributed computing infrastructure challenges, an important concern was how to allow a user to conveniently submit, monitor and retrieve results from within the C++/Qt GUI application while maintaining a method for authentication, approval and registered-publication usage throttling. Middleware supporting these design goals has been integrated into the application with assistance from the Open Gateway Computing Environments (OGCE) collaboration team. The approach was tested on various XSEDE clusters and local compute resources. This paper reviews current US-SOMO functionality and implementation with a focus on the newly deployed cluster integration.
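The screening step that prunes candidate structures into equivalence classes can be sketched as follows: each candidate model curve is given an optimal linear scale factor against the experimental I(q), and models with an acceptable reduced chi-squared are kept. The Guinier-like curves, error model and cutoff below are invented for illustration, not taken from US-SOMO.

```python
import numpy as np

rng = np.random.default_rng(4)
q = np.linspace(0.01, 0.5, 50)
I_true = np.exp(-(q * 25.0) ** 2 / 3.0)          # Guinier-like curve, Rg = 25 A
err = 0.02 * I_true + 1e-4                        # toy per-point uncertainties
I_exp = I_true + rng.normal(0.0, 1.0, q.size) * err

# Candidate model curves computed from hypothetical structures.
models = {
    "rg25": np.exp(-(q * 25.0) ** 2 / 3.0),       # matches the experiment
    "rg40": np.exp(-(q * 40.0) ** 2 / 3.0),       # clearly different shape
}

def reduced_chi2(I_model, I_exp, err):
    """Best linear scale factor, then reduced chi-squared against the data."""
    w = 1.0 / err ** 2
    c = np.sum(w * I_model * I_exp) / np.sum(w * I_model ** 2)
    return np.sum(w * (c * I_model - I_exp) ** 2) / (q.size - 1)

accepted = [name for name, m in models.items() if reduced_chi2(m, I_exp, err) < 2.0]
print(accepted)
```

Applied across thousands of candidates on a cluster, this per-structure test is embarrassingly parallel, which is why the gateway's job-fan-out model fits the problem well.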
ABSTRACT: The advent of parallel computing technology and low-cost computing hardware has facilitated the adoption of high-performance computing tools for the analysis of sedimentation data. Over the past 15 years, we have developed the UltraScan software (Demeler et al., http://ultrascan.uthscsa.edu) to support sedimentation analysis, experimental design, and data management. We describe here recent extensions and advances in methodology that have been adapted in UltraScan. High-performance computing methods implemented on parallel supercomputers utilizing grid computing technology are used to analyze sedimentation experiments at much higher resolution than was previously possible. We discuss the implementation of parallel computing in three novel algorithms used in UltraScan for modeling sedimentation velocity experiments and provide guidelines for effective data analysis.
Colloid and Polymer Science 286(2):139-148. · 2.16 Impact Factor
ABSTRACT: Sedimentation velocity experiments reveal information about the molecular weight and shape of sedimenting macromolecules. The observables in such experiments are the sedimentation and diffusion coefficients and the concentrations of the individual solutes. We have developed parallel optimization algorithms that allow us to extract molecular parameters from mixtures of macromolecules using a nearly model-independent approach. Using a combination of deterministic and stochastic optimization, we are able to fit complex analytical ultracentrifugation experiments globally with excellent convergence properties. Our software uses the TIGRE grid middleware to distribute the computing effort to TeraGrid and other computing resources, and offers a public web portal for the hydrodynamic analysis of AUC experiments. Our solutions provide unparalleled resolution, allow us to characterize polymerization and aggregation events, and provide high-resolution information in structure and function studies in the solution state. The sedimentation (s_k) and diffusion (D_k) coefficients are parameters of the Lamm equation, and uniquely define the molecular weight and shape of each solute k in the mixture, while the amplitude of each term determines its partial concentration (c_k). In an AUC experiment, the goal is to correctly determine s, D and c for each solute, as well as n, the number of solutes present in the mixture. The inverse problem of fitting experimental data to simulations of Lamm equation systems represents a difficult optimization problem which is nonlinear with respect to the fitting parameters. We present here a method for evaluating experimental data by applying multiple optimization algorithms in series to obtain the most likely parsimonious parameter distribution that satisfies Occam's razor. Our approach is implemented on a parallel computing platform utilizing the globus-based TIGRE grid middleware, which can be conveniently accessed through a web portal. Results can be further analyzed with the UltraScan software. Our approach includes algorithms for initialization, systematic noise deconvolution, parameter search and parsimonious regularization.