ABSTRACT: This technical note addresses the stability analysis of nonlinear dynamic systems. Three main contributions are made. First, we show that the standard assumption of a continuous Lyapunov function can be (and in some cases must be) relaxed. We introduce the concept of the ‘weak’ Lyapunov function, which requires that an annulus condition be satisfied. We believe that this annulus condition is a more natural construct, because it is precisely what is needed to make the forward Lyapunov theorem true. Second, we provide an example of a nonlinear system with a stable equilibrium point that cannot be shown to be stable with a continuous Lyapunov function. Finally, we demonstrate a simpler and less restrictive proof of the converse Lyapunov theorem.
Full-text · Article · Sep 2014 · IEEE Transactions on Automatic Control
ABSTRACT: We found in previous work that the error surfaces of recurrent networks have spurious valleys that can cause significant difficulties in training these networks. Our earlier work focused on single-layer networks. In this paper, we extend the previous results to general layered digital dynamic networks. We describe two types of spurious valleys that appear in the error surfaces of these networks. These valleys are not affected by the desired network output (or by the problem that the network is trying to solve). They depend only on the input sequence and the architecture of the network. The insights gained from this analysis suggest procedures for improving the training of recurrent neural networks.
No preview · Article · Nov 2013 · IEEE Transactions on Neural Networks and Learning Systems
ABSTRACT: In this paper, we introduce a new procedure for efficient training of recurrent neural networks. The new procedure uses a batch training method based on a modified version of the Levenberg-Marquardt algorithm. Gradient information from individual sequences is used to mitigate the effect of spurious valleys in the error surface of recurrent networks. The method is tested on the modeling and control of several physical systems.
ABSTRACT: The computation time for Monte Carlo (MC) simulation of a nanostructure growth process was shown to be reduced by an order of magnitude compared to conventional atomistic and meso-scale models through the prediction of the structure evolution ahead of every growth step. This approach was used to grow one of the longest (∼194 nm) reported carbon nanotubes (CNTs) from atomistic simulations. The key to the approach is the finding from simulation experiments that the CNT synthesis process exhibits nonlinear and recurring near-stationary dynamics.
No preview · Article · Mar 2012 · Chemical Physics Letters
ABSTRACT: This paper describes a modification to the method of Reduction Of Dissipativity Domain with Linear Boundaries (RODD-LBI), which was introduced by Barabanov and Prokhorov [7]. The RODD method is a computational technique for the global stability analysis of nonlinear dynamic systems. In this paper we introduce an extension to the original RODD method that is designed to speed up convergence. The efficiency of the extended algorithm is demonstrated through numerical examples.
ABSTRACT: This paper describes a practical framework for using multilayer feedforward neural networks to simultaneously fit both a function and its first derivatives. This framework involves two steps. The first step is to train the network to optimize a performance index, which includes both the error in fitting the function and the error in fitting the derivatives. The second step is to prune the network by removing neurons that cause overfitting and then to retrain it. This paper describes two novel types of overfitting that are only observed when simultaneously fitting both a function and its first derivatives. A new pruning algorithm is proposed to eliminate these types of overfitting. Experimental results show that the pruning algorithm successfully eliminates the overfitting and produces the smoothest responses and the best generalization among all the training algorithms that we have tested.
No preview · Article · Jun 2011 · IEEE Transactions on Neural Networks
ABSTRACT: The variation in the fitting accuracy of neural networks (NNs) when used to fit databases comprising potential energies obtained from ab initio electronic structure calculations is investigated as a function of the number and nature of the elements employed in the input vector to the NN. Ab initio databases for H(2)O(2), HONO, Si(5), and H(2)C=CHBr were employed in the investigations. These systems were chosen so as to include four-, five-, and six-body systems containing first, second, third, and fourth row elements with a wide variety of chemical bonding and whose conformations cover a wide range of structures that occur under high-energy machining conditions and in chemical reactions involving cis-trans isomerizations, six different types of two-center bond ruptures, and two different three-center dissociation reactions. The ab initio databases for these systems were obtained using density functional theory/B3LYP, MP2, and MP4 methods with extended basis sets. A total of 31 input vectors were investigated. In each case, the elements of the input vector were chosen from interatomic distances, inverse powers of the interatomic distance, three-body angles, and dihedral angles. Both redundant and nonredundant input vectors were investigated. The results show that among all the input vectors investigated, the set employed in the Z-matrix specification of the molecular configurations in the electronic structure calculations gave the lowest NN fitting accuracy for both Si(5) and vinyl bromide. The underlying reason for this result appears to be the discontinuity present in the dihedral angle for planar geometries. The use of trigonometric functions of the angles as input elements produced significantly improved fitting accuracy as this choice eliminates the discontinuity. 
The most accurate fitting was obtained when the elements of the input vector were taken to have the form R(ij) (-n), where the R(ij) are the interatomic distances. When the Levenberg-Marquardt procedure was modified to permit error minimization with respect to n as well as the weights and biases of the NN, the optimum powers were all found to lie in the range of 1.625-2.38 for the four systems studied. No statistically significant increase in fitting accuracy was achieved for vinyl bromide when a different value of n was employed and optimized for each bond type. The rate of change in the fitting error with n is found to be very small when n is near its optimum value. Consequently, good fitting accuracy can be achieved by employing a value of n in the middle of the above range. The use of interparticle distances as elements of the input vector rather than the Z-matrix variables employed in the electronic structure calculations is found to reduce the rms fitting errors by factors of 8.86 and 1.67 for Si(5) and vinyl bromide, respectively. If the interparticle distances are replaced with input elements of the form R(ij) (-n) with n optimized, further reductions in the rms error by a factor of 1.31 to 2.83 for the four systems investigated are obtained. A major advantage of using this procedure to increase NN fitting accuracy rather than increasing the number of neurons or the size of the database is that the required increase in computational effort is very small.
No preview · Article · May 2010 · The Journal of Chemical Physics
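The input-vector form described in the abstract above, R(ij)(-n), can be illustrated with a minimal sketch. The function name `inverse_power_inputs`, the Cartesian-coordinate input, and the three-atom geometry are assumptions made for illustration and do not come from the paper; n = 2.0 is chosen only because it lies near the middle of the reported 1.625-2.38 range.

```python
import numpy as np

def inverse_power_inputs(coords, n=2.0):
    """Return R_ij**(-n) for every atom pair i < j (hypothetical helper)."""
    coords = np.asarray(coords, dtype=float)
    # Index pairs (i, j) with i < j, so each interatomic distance appears once.
    i, j = np.triu_indices(len(coords), k=1)
    distances = np.linalg.norm(coords[i] - coords[j], axis=1)
    return distances ** (-n)

# Hypothetical three-atom geometry with pair distances 3, 4, and 5.
coords = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
features = inverse_power_inputs(coords, n=2.0)
```

Unlike a dihedral angle, each element of this vector varies continuously with the geometry, which is the property the abstract credits for the improved fits.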
ABSTRACT: A novel method is presented that significantly reduces the computational bottleneck of executing high-level, electronic structure calculations of the energies and their gradients for a large database that adequately samples the configuration space of importance for systems containing more than four atoms that are undergoing multiple, simultaneous reactions in several energetically open channels. The basis of the method is the high degree of correlation that generally exists between the Hartree-Fock (HF) and higher-level electronic structure energies. It is shown that if the input vector to a neural network (NN) includes both the configuration coordinates and the HF energies of a small subset of the database, MP4(SDQ) energies with the same basis set can be predicted for the entire database using only the HF and MP4(SDQ) energies for the small subset and the HF energies for the remainder of the database. The predictive error is shown to be less than or equal to the NN fitting error if a NN is fitted to the entire database of higher-level electronic structure energies. The general method is applied to the computation of MP4(SDQ) energies of 68,308 configurations that comprise the database for the simultaneous, unimolecular decomposition of vinyl bromide into six different reaction channels. The predictive accuracy of the method is investigated by employing successively smaller subsets of the database to train the NN to predict the MP4(SDQ) energies of the remaining configurations of the database. The results indicate that for this system, the subset can be as small as 8% of the total number of configurations in the database without loss of accuracy beyond that expected if a NN is employed to fit the higher-level energies for the entire database. The utilization of this procedure is shown to save about 78% of the total computational time required for the execution of the MP4(SDQ) calculations. 
The sampling error involved with selection of the subset is shown to be about 10% of the predictive error for the higher-level energies. A practical procedure for utilization of the method is outlined. It is suggested that the method will be equally applicable to the prediction of electronic structure energies computed using even higher-level methods than MP4(SDQ).
No preview · Article · Sep 2009 · The Journal of Chemical Physics
ABSTRACT: This paper describes newly discovered types of overfitting that occur when simultaneously fitting a function and its first derivatives with multilayer feedforward neural networks. We analyze the overfitting and demonstrate how it develops. These types of overfitting occur over very narrow regions in the input space, so a validation set is not helpful in detecting them. A new pruning algorithm is proposed to eliminate these types of overfitting. Simulation results show that the pruning algorithm successfully eliminates the overfitting, produces smooth responses, and provides excellent generalization capabilities. The proposed pruning algorithm can be used with any single-output, two-layer network that uses a hyperbolic tangent transfer function in the hidden layer.
ABSTRACT: In recent years, there has been significant interest in implementing neural networks on FPGAs. This paper describes a simple technique for implementing multi-layer neural networks, with arbitrary numbers of neurons and layers, on FPGAs, using minimal resources. The network architecture can be modified simply by loading memory with the architecture parameters and the network weights and biases. The paper also presents an application of the technology, in which a smart position sensor system is implemented with a neural network on a Xilinx Spartan 3E FPGA development system.
ABSTRACT: A general method for the development of potential-energy hypersurfaces is presented. The method combines a many-body expansion to represent the potential-energy surface with two-layer neural networks (NN) for each M-body term in the summations. The total number of NNs required is significantly reduced by employing a moiety energy approximation. An algorithm is presented that efficiently adjusts all the coupled NN parameters to the database for the surface. Application of the method to four different systems of increasing complexity shows that the fitting accuracy of the method is good to excellent. For some cases, it exceeds that available by other methods currently in literature. The method is illustrated by fitting large databases of ab initio energies for Si(n) (n=3,4,...,7) clusters obtained from density functional theory calculations and for vinyl bromide (C(2)H(3)Br) and all products for dissociation into six open reaction channels (12 if the reverse reactions are counted as separate open channels) that include C-H and C-Br bond scissions, three-center HBr dissociation, and three-center H(2) dissociation. The vinyl bromide database comprises the ab initio energies of 71 969 configurations computed at MP4(SDQ) level with a 6-31G(d,p) basis set for the carbon and hydrogen atoms and Huzinaga's (4333/433/4) basis set augmented with split outer s and p orbitals (43321/4321/4) and a polarization f orbital with an exponent of 0.5 for the bromine atom. It is found that an expansion truncated after the three-body terms is sufficient to fit the Si(5) system with a mean absolute testing set error of 5.693x10(-4) eV. Expansions truncated after the four-body terms for Si(n) (n=3,4,5) and Si(n) (n=3,4,...,7) provide fits whose mean absolute testing set errors are 0.0056 and 0.0212 eV, respectively. 
For vinyl bromide, a many-body expansion truncated after the four-body terms provides fitting accuracy with mean absolute testing set errors that range between 0.0782 and 0.0808 eV. These errors correspond to mean percent errors that fall in the range 0.98%-1.01%. Our best result using the present method truncated after the four-body summation with 16 NNs yields a testing set error that is 20.3% higher than that obtained using a 15-dimensional (15-140-1) NN to fit the vinyl bromide database. This appears to be the price of the added simplicity of the many-body expansion procedure.
No preview · Article · Jun 2009 · The Journal of Chemical Physics
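For orientation, a many-body expansion of the kind named in the abstract above, truncated after the four-body terms, is commonly written as follows. The notation is an assumption made here for illustration: the subscript NN marks terms represented by neural networks, interatomic distances are used as arguments, and one-body terms are omitted. The paper's own formulation, including its moiety energy approximation, may differ.

```latex
V(\mathbf{r}_1,\dots,\mathbf{r}_N) \approx
  \sum_{i<j} V^{(2)}_{\mathrm{NN}}(r_{ij})
  + \sum_{i<j<k} V^{(3)}_{\mathrm{NN}}(r_{ij}, r_{ik}, r_{jk})
  + \sum_{i<j<k<l} V^{(4)}_{\mathrm{NN}}(r_{ij}, r_{ik}, r_{il}, r_{jk}, r_{jl}, r_{kl})
```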
ABSTRACT: This paper gives a detailed analysis of the error surfaces of certain recurrent networks and explains some difficulties encountered in training recurrent networks. We show that these error surfaces contain many spurious valleys, and we analyze the mechanisms that cause the valleys to appear. We demonstrate that the principal mechanism can be understood through the analysis of the roots of random polynomials. This paper also provides suggestions for improvements in batch training procedures that can help avoid the difficulties caused by spurious valleys, thereby improving training speed and reliability.
Preview · Article · May 2009 · IEEE Transactions on Neural Networks
ABSTRACT: An improved neural network (NN) approach is presented for the simultaneous development of accurate potential-energy hypersurfaces and corresponding force fields that can be utilized to conduct ab initio molecular dynamics and Monte Carlo studies on gas-phase chemical reactions. The method is termed combined function derivative approximation (CFDA). The novelty of the CFDA method lies in the fact that although the NN has only a single output neuron that represents potential energy, the network is trained in such a way that the derivatives of the NN output match the gradient of the potential-energy hypersurface. Accurate force fields can therefore be computed simply by differentiating the network. Both the computed energies and the gradients are then accurately interpolated using the NN. This approach is superior to having the gradients appear in the output layer of the NN because it greatly simplifies the required architecture of the network. The CFDA permits weighting of function fitting relative to gradient fitting. In every test that we have run on six different systems, CFDA training (without a validation set) has produced smaller out-of-sample testing error than early stopping (with a validation set) or Bayesian regularization (without a validation set). This indicates that CFDA training does a better job of preventing overfitting than the standard methods currently in use. The training data can be obtained using an empirical potential surface or any ab initio method. The accuracy and interpolation power of the method have been tested for the reaction dynamics of H+HBr using an analytical potential. The results show that the present NN training technique produces more accurate fits to both the potential-energy surface and the corresponding force fields than the previous methods. 
The fitting and interpolation accuracy is so high (rms error=1.2 cm(-1)) that trajectories computed on the NN potential exhibit point-by-point agreement with corresponding trajectories on the analytic surface.
Preview · Article · May 2009 · The Journal of Chemical Physics
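The idea in the abstract above, training a single-output network so that both its output and its input derivative match targets, can be sketched as a combined performance index. Everything named here is an assumption for illustration: the one-input tanh network, the weighting parameter `rho`, and the sum-of-squares form. It is a sketch of the general idea, not the CFDA algorithm as published.

```python
import numpy as np

def network(x, w1, b1, w2, b2):
    """One-input, one-output tanh network; returns output and d(output)/dx."""
    a = np.tanh(np.outer(x, w1) + b1)      # hidden activations, shape (N, H)
    y = a @ w2 + b2                        # network output, shape (N,)
    dy_dx = ((1.0 - a**2) * w1) @ w2       # analytic derivative with respect to x
    return y, dy_dx

def combined_index(params, x, target, target_grad, rho=1.0):
    """Squared function error plus rho times squared derivative error."""
    y, dy_dx = network(x, *params)
    return np.sum((y - target) ** 2) + rho * np.sum((dy_dx - target_grad) ** 2)

# Hypothetical check: a one-neuron network that reproduces tanh(x) exactly.
x = np.linspace(-1.0, 1.0, 5)
params = (np.array([1.0]), np.array([0.0]), np.array([1.0]), 0.0)
exact = combined_index(params, x, np.tanh(x), 1.0 - np.tanh(x) ** 2)
mismatched = combined_index(params, x, np.tanh(x), np.zeros_like(x))
```

Here `rho` plays the role of the relative weighting between function fitting and gradient fitting that the abstract mentions; setting it to zero recovers ordinary function fitting.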
ABSTRACT: Previous methods proposed for obtaining analytic potential-energy surfaces (PES) from ab initio electronic structure calculations are not self-starting. They generally require that the sampling of configuration space important in the reaction dynamics of the process being investigated be initiated by using chemical intuition or a previously developed semiempirical potential-energy surface. When the system under investigation contains four or more atoms undergoing three- and four-center reactions in addition to bond scission processes, obtaining a sufficiently converged initial sampling can be very difficult due to the extremely large volume of configuration space that is important in the reaction dynamics. It is shown that by combining direct dynamics (DD) with previously reported molecular dynamics (MD), novelty sampling (NS), and neural network (NN) methods, an analytical surface suitable for MD computations for large systems may be obtained. Application of the method to the investigation of N-O bond scission and cis-trans isomerization reactions of HONO followed by comparison of the resulting neural network potential-energy surface to one obtained by using a semiempirical potential to initiate the sampling shows that the two potential surfaces are the same within the fitting accuracy of the surfaces. It is concluded that the combination of direct dynamics, molecular dynamics, novelty sampling, and neural network fitting provides a self-starting, robust, and accurate DD/MD/NS/NN method for the execution of first-principles, ab initio, molecular dynamics studies in systems containing four or more atoms which are undergoing simultaneous two-, three-, and four-center reactions.
No preview · Article · Feb 2009 · The Journal of Physical Chemistry A
ABSTRACT: A generalized method that permits the parameters of an arbitrary empirical potential to be efficiently and accurately fitted to a database is presented. The method permits the values of a subset of the potential parameters to be considered as general functions of the internal coordinates that define the instantaneous configuration of the system. The parameters in this subset are computed by a generalized neural network (NN) with one or more hidden layers and an input vector with at least 3n-6 elements, where n is the number of atoms in the system. The Levenberg-Marquardt algorithm is employed to efficiently effect the optimization of the weights and biases of the NN as well as all other potential parameters being treated as constants rather than as functions of the input coordinates. In order to effect this minimization, the usual Jacobian employed in NN operations is modified to include the Jacobian of the computed errors with respect to the parameters of the potential function. The total Jacobian employed in each epoch of minimization is the concatenation of two Jacobians, one containing derivatives of the errors with respect to the weights and biases of the network, and the other with respect to the constant parameters of the potential function. The method provides three principal advantages. First, it obviates the problem of selecting the form of the functional dependence of the parameters upon the system's coordinates by employing a NN. If this network contains a sufficient number of neurons, it will automatically find something close to the best functional form. This is the case since Hornik et al. [Neural Networks 2, 359 (1989)] have shown that two-layer NNs with sigmoid transfer functions in the first hidden layer and linear functions in the output layer are universal approximators for analytic functions. 
Second, the entire fitting procedure is automated so that excellent fits are obtained rapidly with little human effort. Third, the method provides a procedure to avoid local minima in the multidimensional parameter hyperspace. As an illustrative example, the general method has been applied to the specific case of fitting the ab initio energies of Si(5) clusters that are observed in a molecular dynamics (MD) simulation of the machining of a silicon workpiece. The energies of the Si(5) configurations obtained in the MD calculations are computed using the B3LYP procedure with a 6-31G(**) basis set. The final ab initio database, which comprises the density functional theory energies of 10 202 Si(5) clusters, is fitted to an empirical Tersoff potential containing nine adjustable parameters, two of which are allowed to be the functions of the Si(5) configuration. The fitting error averaged over all 10 202 points is 0.0148 eV (1.43 kJ mol(-1)). This result is comparable to the accuracy achieved by more general fitting methods that do not rely on an assumed functional form for the potential surface.
No preview · Article · Aug 2008 · The Journal of Chemical Physics
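The concatenated Jacobian described in the abstract above can be sketched with the textbook Levenberg-Marquardt update. The function name `lm_step`, the damping value `mu`, and the random linear test problem are assumptions made for illustration; how the paper adapts the damping or evaluates the two Jacobians is not shown here.

```python
import numpy as np

def lm_step(jac_network, jac_potential, errors, mu=1e-2):
    """One textbook Levenberg-Marquardt update over the joined parameter vector."""
    # Columns: derivatives of the errors w.r.t. network weights and biases,
    # followed by derivatives w.r.t. the constant potential parameters.
    jac = np.hstack([jac_network, jac_potential])
    hessian_approx = jac.T @ jac + mu * np.eye(jac.shape[1])
    return -np.linalg.solve(hessian_approx, jac.T @ errors)

# Hypothetical linear problem: errors = J @ theta - t, evaluated at theta = 0.
rng = np.random.default_rng(0)
jac_network = rng.normal(size=(20, 3))
jac_potential = rng.normal(size=(20, 2))
theta_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
errors = -(jac_network @ theta_true[:3] + jac_potential @ theta_true[3:])
step = lm_step(jac_network, jac_potential, errors, mu=1e-10)
```

Because both parameter groups share one update, the network weights and the constant potential parameters are adjusted together in each epoch, as the abstract describes.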
ABSTRACT: A previously reported method for conducting molecular dynamics simulations of gas-phase chemical dynamics on ab initio potential-energy surfaces using modified novelty sampling and feedforward neural networks is applied to the investigation of the unimolecular dissociation of vinyl bromide. The neural network is fitted to a database comprising the MP4(SDQ) energies computed for 71 969 nuclear configurations using an extended basis set. Dissociation rate coefficients and branching ratios at an internal excitation energy of 6.44 eV for all six open reaction channels are reported. The distribution of vibrational energy in HBr formed in three-center dissociation is computed and found to be in excellent accord with experimental measurements. Computational requirements for the electronic structure calculations, neural network training, and trajectory calculations are given. The weight and bias matrices required for implementation of the neural network potential are made available through the Supplementary Material.
No preview · Article · Nov 2007 · The Journal of Chemical Physics
ABSTRACT: Training a neural network is a difficult optimization problem because of numerous local minima. Many global search algorithms have been used to train neural networks. However, local search algorithms are more efficient with computational resources, and therefore numerous random restarts with a local algorithm may be more effective than a global algorithm. This study uses Monte-Carlo simulations to determine the efficiency of a local search algorithm relative to nine stochastic global algorithms when using a neural network on function approximation problems. The computational requirements of the global algorithms are several times higher than the local algorithm and there is little gain in using the global algorithms to train neural networks. Since the global algorithms only marginally outperform the local algorithm in obtaining a lower local minimum and they require more computational resources, the results in this study indicate that with respect to the specific algorithms and function approximation problems studied, there is little evidence to show that a global algorithm should be used over a more traditional local optimization routine for training neural networks. Further, neural networks should not be estimated from a single set of starting values whether a global or local optimization method is used.
No preview · Article · Oct 2007 · Neural Processing Letters
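The random-restart strategy discussed in the abstract above can be sketched on a toy problem. The plain gradient-descent local search, the quartic loss with two minima per dimension, and all parameter values are assumptions chosen for illustration; the study itself compares a local algorithm against nine stochastic global algorithms on neural network training, which this sketch does not reproduce.

```python
import numpy as np

def loss(w):
    """Toy loss with two minima per dimension (near -1.30 and 1.13)."""
    return np.sum(w**4 - 3.0 * w**2 + w)

def grad(w):
    return 4.0 * w**3 - 6.0 * w + 1.0

def local_search(w0, lr=0.01, steps=2000):
    """Plain gradient descent from a single starting point."""
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w -= lr * grad(w)
    return w, loss(w)

def multistart(n_restarts, dim, rng, scale=3.0):
    """Run the local search from random starts and keep the best result."""
    best_w, best_loss = None, np.inf
    for _ in range(n_restarts):
        w0 = rng.uniform(-scale, scale, size=dim)
        w, value = local_search(w0)
        if value < best_loss:
            best_w, best_loss = w, value
    return best_w, best_loss

best_w, best_loss = multistart(n_restarts=10, dim=1, rng=np.random.default_rng(0))
```

A single start at `w0 = [2.0]` settles in the shallower minimum near 1.13, which is the failure mode behind the abstract's advice against estimating a network from one set of starting values.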
ABSTRACT: This paper introduces a general framework for describing dynamic neural networks--the layered digital dynamic network (LDDN). This framework allows the development of two general algorithms for computing the gradients and Jacobians for these dynamic networks: backpropagation-through-time (BPTT) and real-time recurrent learning (RTRL). The structure of the LDDN framework enables an efficient implementation of both algorithms for arbitrary dynamic networks. This paper demonstrates that the BPTT algorithm is more efficient for gradient calculations, but the RTRL algorithm is more efficient for Jacobian calculations.
Preview · Article · Feb 2007 · IEEE Transactions on Neural Networks