Randomised algorithms offer fast solutions for problem solving with statistical characterization. In machine learning and computational intelligence, research on randomized algorithms for training neural networks can be traced back to the 1980s. A common feature of these pioneering works is that they are all non-iterative solutions for training single-layer feed-forward networks (SLFNs). The aim was to overcome some shortcomings of gradient-based training algorithms by randomly assigning the input weights and biases and determining the output weights by least-squares methods.
This special issue aims to promote randomized approaches for neural networks, providing both historical developments with milestone results and new trends in this research topic. We truly value advances in knowledge on randomized learner models and randomized learning algorithms, in both theory and applications.
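The two-step recipe described above (random, fixed input weights and biases; output weights by least squares) can be sketched in a few lines of NumPy. The tanh activation, the [-2, 2] sampling range and all function names are illustrative choices of this sketch, not prescribed by the editorial:

```python
import numpy as np

def train_random_slfn(X, y, n_hidden=50, seed=0):
    """Non-iterative SLFN training: random (fixed) input weights and
    biases, output weights by least squares."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-2.0, 2.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-2.0, 2.0, size=n_hidden)                # random biases
    H = np.tanh(X @ W + b)                                   # hidden-layer output matrix
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)             # least-squares output weights
    return W, b, beta

def predict_random_slfn(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

On a toy regression task this reaches a small training error with a single linear solve, which is exactly the efficiency argument made for these methods.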


... Among them, we can cite the use of sparse formulations [9], semi-supervised settings [10], and many more. An insightful editorial on randomized methods for training neural networks can be found in [11]. For a more complete exposition on random assignment of weights in neural networks, see also the recent survey in [12]. ...

... Note that both terms here have a very intuitive explanation. Due to the symmetry of the Gaussian distribution, the most probable prediction in (13) is given by considering the most probable set of parameters according to the posterior mean (11). The variance of the estimate in (14) is characterized by two terms, where the first term represents the noise level, and the second term is given by the uncertainty in the data itself. ...

... First, we initialize σ² and γ to some default values. Then, at every iteration, we compute the current mean and covariance according to (11) and (12), and update the current estimates of σ² and γ according to the following formulas: ...

Random vector functional-link (RVFL) networks are randomized multilayer perceptrons with a single hidden layer and a linear output layer, which can be trained by solving a linear modeling problem. In particular, they are generally trained using a closed-form solution of the (regularized) least-squares approach. This paper introduces several alternative strategies for performing full Bayesian inference (BI) of RVFL networks. Distinct from standard or classical approaches, our proposed Bayesian training algorithms make it possible to derive an entire probability distribution over the optimal output weights of the network, instead of a single pointwise estimate according to some given criterion (e.g., least squares). This provides several known advantages, including the possibility of introducing additional prior knowledge in the training process, the availability of an uncertainty measure during the test phase, and the capability of automatically inferring hyper-parameters from the given data. In this paper, two BI algorithms for regression are first proposed that, under some practical assumptions, can be implemented by a simple iterative process with closed-form computations. Simulation results show that one of the proposed algorithms, Bayesian RVFL, is able to outperform standard training algorithms for RVFL networks even when their regularization factor is carefully selected via a line-search procedure. A general strategy based on variational inference is also presented, with an application to data modeling problems with noisy outputs or outliers. As we discuss in this paper, using recent advances in automatic differentiation, this strategy can be applied to a wide range of additional situations in an immediate fashion.
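Under a Gaussian prior and Gaussian noise, the kind of closed-form iterative Bayesian training described above can be sketched with evidence-approximation (MacKay-style) hyperparameter updates. The symbol names and exact update rules below are a generic textbook variant, not necessarily the paper's equations (11)-(14); alpha plays the role of the prior precision:

```python
import numpy as np

def bayesian_rvfl_fit(H, y, n_iter=50):
    """Iterate the posterior mean/covariance of the output weights and the
    hyperparameters: alpha (prior precision), sigma2 (noise variance)."""
    N, m = H.shape
    alpha, sigma2 = 1.0, 1.0
    HtH, Hty = H.T @ H, H.T @ y
    for _ in range(n_iter):
        Sigma = np.linalg.inv(alpha * np.eye(m) + HtH / sigma2)  # posterior covariance
        mu = Sigma @ Hty / sigma2                                # posterior mean
        gamma_eff = m - alpha * np.trace(Sigma)                  # effective number of parameters
        alpha = gamma_eff / (mu @ mu)                            # update prior precision
        sigma2 = np.sum((y - H @ mu) ** 2) / (N - gamma_eff)     # update noise variance
    return mu, Sigma, sigma2

def bayesian_rvfl_predict(h, mu, Sigma, sigma2):
    """Predictive mean and variance: noise term plus parameter uncertainty,
    mirroring the two-term variance decomposition discussed in the excerpt."""
    return h @ mu, sigma2 + h @ Sigma @ h
```

Note how the predictive variance splits into the noise level and a data-dependent uncertainty term, which is the intuition quoted in the citation excerpt above.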

... Falls are a major health hazard for older adults and result in high mortality and injury rates [55]. Randomised learning approaches are attractive for their asymptotically faster runtimes and computationally efficient models [48,56]. The main idea behind utilising randomised learning for neural networks is to assign random weights and biases to the neural network inputs and compute the output parameters by solving a linear system [57]. ...

... Randomised algorithms [32] have received significant attention in recent years for large-scale computing applications, due to their asymptotically faster runtimes and efficient numerical implementations. Neural networks and machine learning models have also exploited randomised algorithms for faster training [48,56]. To the best of our knowledge, this is the first instance of randomised weights-based RVFL neural networks for fall detection. ...

Falls are a major health concern and result in high morbidity and mortality rates in older adults, with high costs to health services. Automatic fall classification and detection systems can provide early detection of falls and timely medical aid. This paper proposes a novel Random Vector Functional Link (RVFL) stacking ensemble classifier with fractal features for the classification of falls. The fractal Hurst exponent is used as a representative of fractal dimensionality for capturing the irregularity of accelerometer signals for falls and other activities of daily life. The generalised Hurst exponents, along with wavelet transform coefficients, are leveraged as the input feature space for a novel stacking ensemble of RVFLs with an RVFL neural network meta-learner. Novel fast selection criteria for base classifiers are presented, founded on a proposed diversity indicator obtained from the overall performance values during the training phase. The proposed features and the stacking ensemble provide the highest classification accuracy of 95.71% compared with other machine learning techniques, such as Random Forest (RF), Artificial Neural Network (ANN) and Support Vector Machine (SVM). The proposed ensemble classifier is 2.3× faster than a single Decision Tree and achieves the highest speedups in training time of 317.7× and 198.56× compared with a highly optimised ANN and an RF ensemble, respectively. The significant improvements in training times, of the order of 100×, and the high accuracy demonstrate that the proposed RVFL ensemble is a prime candidate for real-time, embedded wearable device-based fall detection systems.
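As a rough illustration of the fractal feature used above, the generalized Hurst exponent H(q) can be estimated from the scaling of the q-th order structure function. This is a minimal generic estimator, not the paper's exact implementation; the lag range is an arbitrary default:

```python
import numpy as np

def generalized_hurst(x, q=2.0, lags=range(2, 20)):
    """Estimate H(q) from the scaling law
       mean(|x(t+tau) - x(t)|**q)  ~  tau**(q * H(q)),
    via a log-log linear fit over the given lags."""
    lags = np.asarray(list(lags))
    S = np.array([np.mean(np.abs(x[lag:] - x[:-lag]) ** q) for lag in lags])
    slope = np.polyfit(np.log(lags), np.log(S), 1)[0]  # slope = q * H(q)
    return slope / q
```

For ordinary Brownian motion the estimate is close to H = 0.5; anti-persistent and persistent signals give smaller and larger values, which is what makes H(q) useful as an irregularity feature for accelerometer traces.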

... Differently from previous reviews [17,18,19], in this chapter we focus on recent attempts at extending these ideas to the deep case, where a (possibly very large) number of hidden layers is stacked to obtain multiple intermediate representations. ...

... where N(v) is the neighborhood of v and, as before, x_l(v) is the driving input information for vertex v at layer l. The two deep reservoir models expressed by (18) and (19) are based on randomization, as conventional RC approaches are, and are formalized in [106] and [107], respectively. Experimental assessments in these papers indicate the great potential of the randomization approach also in dealing with complex data structures, often establishing new state-of-the-art accuracy results on problems in the areas of document processing, cheminformatics and social network analysis. ...

Randomized Neural Networks explore the behavior of neural systems where the majority of connections are fixed, either in a stochastic or a deterministic fashion. Typical examples of such systems consist of multi-layered neural network architectures where the connections to the hidden layer(s) are left untrained after initialization. Limiting the training algorithms to operate on a reduced set of weights inherently characterizes the class of Randomized Neural Networks with a number of intriguing features. Among them, the extreme efficiency of the resulting learning processes is undoubtedly a striking advantage with respect to fully trained architectures. Moreover, despite the involved simplifications, randomized neural systems possess remarkable properties both in practice, achieving state-of-the-art results in multiple domains, and in theory, making it possible to analyze intrinsic properties of neural architectures (e.g., before training of the hidden layers' connections). In recent years, the study of Randomized Neural Networks has been extended towards deep architectures, opening new research directions for the design of effective yet extremely efficient deep learning models in vectorial as well as in more complex data domains. This chapter surveys all the major aspects regarding the design and analysis of Randomized Neural Networks, and some of the key results with respect to their approximation capabilities. In particular, we first introduce the fundamentals of randomized neural models in the context of feed-forward networks (i.e., Random Vector Functional Link and equivalent models) and convolutional filters, before moving to the case of recurrent systems (i.e., Reservoir Computing networks). For both, we focus specifically on recent results in the domain of deep randomized systems, and (for recurrent models) their application to structured domains.
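A minimal reservoir-computing example in the spirit of the recurrent systems surveyed above: the recurrent weights are random and fixed (rescaled to a chosen spectral radius) and only the linear readout is trained. All sizes and constants here are illustrative defaults, not values from the chapter:

```python
import numpy as np

def make_reservoir(n_res=100, rho=0.9, seed=0):
    """Fixed random reservoir, rescaled to spectral radius rho."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, size=n_res)
    W = rng.normal(size=(n_res, n_res))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))   # set spectral radius
    return W_in, W

def run_reservoir(u, W_in, W):
    """Drive the reservoir with a scalar input sequence u; collect states."""
    x = np.zeros(len(W_in))
    states = np.empty((len(u), len(W_in)))
    for t, ut in enumerate(u):
        x = np.tanh(W @ x + W_in * ut)
        states[t] = x
    return states

def train_readout(states, y, washout=50, ridge=1e-6):
    """Ridge-regression readout, discarding the initial transient."""
    S, yy = states[washout:], y[washout:]
    return np.linalg.solve(S.T @ S + ridge * np.eye(S.shape[1]), S.T @ yy)
```

Only the readout solve involves training; the recurrent part is generated once and never touched, which is the defining trait of Reservoir Computing.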

... In general, randomized learning methods have advantages in training efficiency [20]. This type of method usually adopts a two-step training paradigm, which randomly assigns the input weights and biases of the hidden nodes and solves for the output weights by the least-squares method. ...

... where N in (20) denotes the number of samples in the training, validation, or test set. The error results are averaged over all of the runs. ...

This paper develops an incremental randomized learning method for an extended Echo State Network (φ-ESN), which has a reservoir with a random static projection, to better cope with non-linear time series data modelling problems. Although the typical φ-ESN can effectively improve the prediction performance of the network by adding a random static nonlinear hidden layer, the input weights and biases of the hidden neurons in this extended static layer are randomly assigned, so some neurons have little effect on reducing the model error, resulting in high model complexity, poor generalization and large performance fluctuations. A constructive incremental randomized learning method, termed OLS-φ-ESN, is proposed for generating the nodes of the extended static nonlinear hidden layer. A two-step training paradigm is adopted: the input weights and biases of the hidden neurons in the extended static layer are randomly assigned according to a supervisory mechanism, and the output weights are solved by a least-squares algorithm. Based on the Orthogonal Least Squares (OLS) search algorithm, the proposed supervisory mechanism is designed with an adaptive threshold to better control the compactness of the generated learner model. Simulation results concerning both nonlinear time series prediction and system identification tasks indicate some advantages of our proposed OLS-φ-ESN in terms of a more compact model and sound generalization.
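The constructive idea (draw random candidate nodes, keep the one that best explains the current residual in the OLS sense) can be sketched as follows. The supervisory mechanism and adaptive threshold of OLS-φ-ESN are simplified here to a fixed candidate pool and a plain error-reduction score, and the sketch targets a plain feed-forward layer rather than a φ-ESN:

```python
import numpy as np

def ols_incremental(X, y, max_nodes=30, n_cand=50, tol=1e-8, seed=0):
    """Grow the random hidden layer node by node: draw n_cand random
    candidates, keep the one with the largest OLS error-reduction score
    on the current residual, then refit the output weights."""
    rng = np.random.default_rng(seed)
    H, e = np.empty((len(X), 0)), y.copy()
    beta = np.zeros(0)
    for _ in range(max_nodes):
        Wc = rng.uniform(-2, 2, size=(X.shape[1], n_cand))
        bc = rng.uniform(-2, 2, size=n_cand)
        G = np.tanh(X @ Wc + bc)                        # candidate node outputs
        score = (G.T @ e) ** 2 / np.sum(G * G, axis=0)  # OLS error-reduction ratio
        H = np.column_stack([H, G[:, np.argmax(score)]])
        beta, *_ = np.linalg.lstsq(H, y, rcond=None)    # refit all output weights
        e = y - H @ beta
        if np.mean(e ** 2) < tol:                       # fixed threshold (paper: adaptive)
            break
    return H, beta
```

Selecting only candidates that demonstrably reduce the residual is what keeps the generated model compact, in contrast to adding purely random nodes.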

... In particular, we can distinguish between two- or three-phase methods, iterative two-block minimization methods, and some variants of these two main classes. The simplest learning scheme derived by the two-block decomposition of the variables is a simple two-phase method in which an initial value ω_1^0 is chosen and a single optimization of (11) is performed to get ω_2^{*0}, so that only a full iteration is performed with k = 0. Randomization algorithms for FNN [121,142] fit in this class of two-phase methods. Indeed, a randomized algorithm consists in generating the initial value ω_1^0 randomly, according to any continuous probability distribution, and in performing a single global (exact or approximate) minimization of problem (11) to get ω_2^{*0}. ...

... This may be due to the effect of randomization of the weights. See [121] and the special issue [142] for overviews of the different ways in which randomization can be applied to the design of shallow FNN and the statistical learning properties of the corresponding randomized learning scheme. ...

The paper presents an overview of global issues in optimization methods for training feedforward neural networks (FNN) in a regression setting. We first recall the learning optimization paradigm for FNN and briefly discuss global schemes for the joint choice of the network topology and the network parameters. The main part of the paper focuses on the core subproblem, namely the continuous unconstrained (regularized) weights optimization problem, with the aim of reviewing global methods arising both in multi-layer perceptron/deep networks and in radial basis networks. We review some recent results on the existence of non-global stationary points of the unconstrained nonlinear problem and the role of determining a global solution in a supervised learning paradigm. Local algorithms that are widely used to solve the continuous unconstrained problems are addressed, with a focus on possible improvements to exploit the global properties. Hybrid global methods specifically devised for FNN training optimization problems, which embed local algorithms, are discussed too.

... Randomized approaches for large-scale computing are highly desirable due to their effectiveness and efficiency [20]. In machine learning for data modelling, randomized algorithms have demonstrated great potential in developing fast learner models and learning algorithms with much less computational cost [19,25,28]. Readers are strongly recommended to refer to our survey paper [25] for more details. ...

... Thus, for many real-world applications, deterministic methods used in building constructive neural networks seem to have little or no applicability. As one pathway to generate neural networks with the universal approximation property (corresponding to perfect learning power), randomized approaches have great potential to provide a faster and feasible solution [19,25,28]. ...

This paper contributes to the development of randomized methods for neural networks. The proposed learner model is generated incrementally by stochastic configuration (SC) algorithms, termed Stochastic Configuration Networks (SCNs). In contrast to the existing randomised learning algorithms for single-layer feed-forward neural networks (SLFNs), we randomly assign the input weights and biases of the hidden nodes in the light of a supervisory mechanism, and the output weights are analytically evaluated in either a constructive or a selective manner. As fundamentals of SCN-based data modelling techniques, we establish some theoretical results on the universal approximation property. Three versions of SC algorithms are presented for regression problems (applicable to classification problems as well) in this work. Simulation results concerning both function approximation and real-world data regression indicate some remarkable merits of our proposed SCNs in terms of less human intervention in setting the network size, the scope adaptation of random parameters, fast learning and sound generalization.
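The SCN supervisory mechanism can be illustrated for the single-output case: a candidate node is accepted only if it satisfies an inequality constraint tied to the current residual, and the scope λ of the random parameters is enlarged adaptively when no candidate qualifies. The constraint below is a simplified form of the SCN condition; the value of r, the scope schedule and all names are this sketch's own choices:

```python
import numpy as np

def scn_fit(X, y, max_nodes=50, n_cand=100, r=0.99, scopes=(1, 5, 10, 50), seed=0):
    """Add a node only if some candidate g satisfies the inequality
       <e, g>**2 / <g, g>  >=  (1 - r) * <e, e>,
    enlarging the scope lam of the random parameters when none does."""
    rng = np.random.default_rng(seed)
    H, e = np.empty((len(X), 0)), y.copy()
    beta = np.zeros(0)
    for _ in range(max_nodes):
        g_new = None
        for lam in scopes:                              # adaptive scope of random parameters
            Wc = rng.uniform(-lam, lam, size=(X.shape[1], n_cand))
            bc = rng.uniform(-lam, lam, size=n_cand)
            G = np.tanh(X @ Wc + bc)
            xi = (G.T @ e) ** 2 / np.sum(G * G, axis=0) - (1 - r) * (e @ e)
            if np.any(xi >= 0):                         # supervisory inequality met
                g_new = G[:, np.argmax(xi)]
                break
        if g_new is None:                               # no admissible candidate: stop
            break
        H = np.column_stack([H, g_new])
        beta, *_ = np.linalg.lstsq(H, y, rcond=None)    # output weights, constructive refit
        e = y - H @ beta
    return H, beta
```

The inequality is what distinguishes SCNs from purely random node generation: every accepted node is guaranteed to make a non-negligible contribution to reducing the residual.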

... Although some convergence properties of these methods can be theoretically established, there exist some limitations in practice due to the extensive search for the hidden parameters. Randomized learning techniques for neural networks have become popular in recent years because of their good potential in dealing with large-scale data analysis, fast dynamic modelling, and real-time data processing [4,14,19,20,23]. To the best of our knowledge, research on randomized algorithms for training neural networks can be traced back to the 1980s [2]. ...

... To simplify the specification of the random parameters in RVFL networks, Husmeier suggested using symmetric and adjustable intervals for approximating a class of nonlinear maps [9]. For more details about the history and recent developments of randomized methods for training neural networks, readers may refer to an informative editorial [23]. ...

... It constantly incurs expensive computational costs due to the repetitive training process. Thus, alternative optimization methods, such as randomized algorithms [3][4][5][6], became highly favored. Some representative lines of research include random vector functional link neural networks (RVFL) [7][8][9], pseudoinverse networks [10], and stochastic configuration networks (SCNs) [11,12]. ...

As a compact and effective learning model, the random vector functional link neural network (RVFL) has been confirmed to have universal approximation capabilities. It has gained considerable attention in various fields. However, the randomly generated parameters in RVFL often lead to the loss of valid information and to data redundancy, which severely degrades the model's performance in practice. This paper first proposes an efficient network parameter learning approach for the original RVFL with a pseudoinverse learner (RVFL-PL). Instead of taking the random feature mapping directly, RVFL-PL adopts a non-iterative manner to obtain influential enhancement nodes implanted with valuable information from the input data, which improves the quality of the enhancement nodes and eases the problem caused by the randomly assigned parameters in the standard RVFL. Since the network parameters are optimized analytically, this improved variant maintains the efficiency of the standard RVFL. Further, RVFL-PL is extended to a multilayered structure (mRVFL-PL) to obtain high-level representations from the input data. The results of comprehensive experiments on some benchmarks indicate the performance improvement of the proposed method compared to other corresponding methods.

... There is usually a complex nonlinear relationship between the fully connected feature vector and the output category. Compared with RVFL, SCNs [26][27][28][29][30][31] have the merit of ensuring the universal approximation ability of the built randomized learner. The innovative contribution of SCNs is the way of assigning the random parameters with an inequality constraint and adaptively selecting the scope of the random parameters. ...

Industrial data contain a lot of noisy information, which cannot be well suppressed in deep learning models. Current industrial data classification models suffer from feature incompleteness, inadequate self-adaptability, insufficient approximation capacity of the classifier and weak robustness. To this end, this paper proposes an intelligent classification method based on self-attention learning features and stochastic configuration networks (SCNs). This method imitates the human cognitive mode of regulating feedback so as to achieve ensemble learning. Firstly, at the feature extraction stage, a fused deep neural network model based on self-attention is constructed. It adopts a self-attention long short-term memory (LSTM) network and a self-attention residual network with adaptive hierarchies, which respectively extract the global temporal features and local spatial fault features of the industrial time-series dataset after noise suppression. Secondly, at the classifier design stage, the fused complete feature vectors are sent to SCNs with universal approximation capability to establish general classification criteria. Then, based on generalized error and entropy theory, performance indexes for real-time evaluation of the credibility of uncertain classification results are established, and an adaptive adjustment mechanism of the self-attention fusion networks over the network hierarchy is built to realize the self-optimization of multi-hierarchy complete features and their classification criteria. Finally, a fuzzy integral is used to integrate the classification results of self-attention fusion network models with different hierarchies to improve the robustness of the classification model. Compared with other classification models, the proposed model performs better on a rolling bearing fault dataset.

... It is widely used in deep and transfer learning [34][35][36][37]. Also, it has good potential for handling large-scale data, fast dynamic modeling, and real-time data processing [38][39][40]. In recent years, RVFL network models have been used in classification and regression, but there are few studies involving the number of hidden nodes and scale factor values. ...

The random vector functional link (RVFL) network is suitable for solving the nonlinear problems arising from transformer fault symptoms and different fault types, due to its simple structure and strong generalization ability. However, the RVFL network has the disadvantage that its structure and parameters are largely determined by experience. In this paper, we propose a method to improve the RVFL neural network algorithm by introducing the concept of hidden node sensitivity, classifying each hidden-layer node, and removing nodes with low sensitivity. The simplified network structure avoids interfering nodes and improves the global search capability. The five characteristic gases produced by transformer faults are divided into two groups. A fault diagnosis model of three layers with four classifiers was built. We also investigated the effects of the number of hidden nodes and of scale factor values on the RVFL network's learning ability. Simulation results show that the number of hidden-layer nodes has a large impact on the network model when the number of input dimensions is small: the network requires more hidden-layer neurons and a smaller threshold range. The size of the scale factor has a significant influence on the network model with larger input dimensions. This paper describes the theoretical basis for parameter selection in RVFL neural networks; the theoretical basis for selecting the number of hidden nodes and the scale factor is derived. The importance of parameter selection for improving diagnostic accuracy is verified through simulation experiments in transformer fault diagnosis.
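A hedged sketch of sensitivity-based pruning of hidden nodes: here "sensitivity" is simplified to how much a node's output actually moves the network output (|output weight| × activation spread). This scoring rule is this sketch's own simplification, not the paper's exact definition:

```python
import numpy as np

def prune_by_sensitivity(H, beta, keep_ratio=0.5):
    """Score each hidden node by |output weight| x activation spread and
    keep the top fraction; low-scoring nodes barely move the output."""
    score = np.abs(beta) * H.std(axis=0)
    n_keep = int(len(beta) * keep_ratio)
    return np.sort(np.argsort(score)[::-1][:n_keep])
```

After pruning, the output weights are simply refit by least squares on the kept columns, so the simplified network stays cheap to retrain.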

... This indicates that the output weights of the neural network can be sparse [12]. In addition, the pseudoinverse is usually used to train randomized neural networks [13]. When dealing with high-dimensional data, an ill-posed problem may occur, and output weights with large amplitudes will be obtained, which leads to an insufficient generalization capacity of the randomized neural networks. ...

To address the architecture complexity and ill-posed problems of neural networks when dealing with high-dimensional data, this article presents a Bayesian-learning-based sparse stochastic configuration network (SCN) (BSSCN). The BSSCN inherits the basic idea of training an SCN in the Bayesian framework but replaces the common Gaussian distribution with a Laplace one as the prior distribution of the output weights of SCN. Meanwhile, a lower bound of the Laplace sparse prior distribution using a two-level hierarchical prior is adopted based on which an approximate Gaussian posterior with sparse property is obtained. It leads to the facilitation of training the BSSCN, and the analytical solution for output weights of BSSCN can be obtained. Furthermore, the hyperparameter estimation process is derived by maximizing the corresponding lower bound of the marginal likelihood function based on the expectation-maximization algorithm. In addition, considering the uncertainties caused by both noises in the real-world data and model mismatch, a bootstrap ensemble strategy using BSSCN is designed to construct the prediction intervals (PIs) of the target variables. The experimental results on three benchmark data sets and two real-world high-dimensional data sets demonstrate the effectiveness of the proposed method in terms of both prediction accuracy and quality of the constructed PIs.
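The bootstrap-ensemble construction of prediction intervals can be sketched generically: train B models on resampled data, then combine the ensemble spread (model uncertainty) with the mean residual variance (noise). This is a standard recipe, not the exact BSSCN procedure; all names and constants are illustrative:

```python
import numpy as np

def bootstrap_pi(X, y, X_test, train_fn, predict_fn, B=30, z=1.96, seed=0):
    """Prediction intervals: ensemble mean +/- z * sqrt(model variance
    across bootstrap models + average residual variance)."""
    rng = np.random.default_rng(seed)
    preds, resid_var = [], 0.0
    for _ in range(B):
        idx = rng.integers(0, len(X), size=len(X))      # resample with replacement
        model = train_fn(X[idx], y[idx])
        preds.append(predict_fn(model, X_test))
        resid_var += np.mean((predict_fn(model, X[idx]) - y[idx]) ** 2) / B
    preds = np.array(preds)
    half = z * np.sqrt(preds.var(axis=0) + resid_var)   # ~95% half-width for z=1.96
    return preds.mean(axis=0) - half, preds.mean(axis=0) + half
```

The two variance terms mirror the two sources of uncertainty named in the abstract: noise in the data and model mismatch.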

... In order to mitigate the problems caused by the traditional back-propagation (BP) algorithm and its variants, a great deal of research has been conducted. Among the alternatives, randomized algorithms are one option [4]. The weights of the hidden layer are usually selected by randomization, while the weights of the output layer are obtained by a direct algebraic (pseudoinverse) operation. ...

... However, RBFN obtains unsatisfactory solutions for some cases and results in poor generalization [6]. In contrast, ELM provides an effective solution for SLFNs with good generalization and extremely fast learning and has therefore been widely applied in various applications like regression [7], data classification [1], [7], image segmentation [8], dimension reduction [9], medical image classification [10], [11], [12], face classification [13], etc. ...

Extreme learning machine (ELM), a randomized learning paradigm for a single-hidden-layer feed-forward network, has gained significant attention for solving problems in diverse domains due to its faster learning ability. The output weights in ELM are determined by an analytic procedure, while the input weights and biases are randomly generated and fixed during the training phase. The learning performance of ELM is highly sensitive to many factors, such as the number of nodes in the hidden layer, the initialization of the input weights and the type of activation function in the hidden layer. Moreover, the performance of ELM is affected by the presence of random input weights, and the model suffers from an ill-posed problem. Hence, we propose a backward-forward algorithm for a single feed-forward neural network that improves the generalization capability of the network with fewer hidden nodes. Here, both input and output weights are determined mathematically, which gives the network its performance advantages. The proposed model provides an improvement over the extreme learning machine with respect to the number of nodes used for generalization.

... With the development of neural networks, randomized algorithms have become a hot research topic due to their fast learning ability and much lower computational cost [17]-[19]. These randomized algorithms have two common characteristics: 1) randomly assigning the input weights and biases of the neural network and 2) using the least-squares method to solve for the output weights. ...

Accurate and fast recognition of fiber intrusion signals has always been a fundamental task in the Optical Fiber Pre-warning System (OFPS). However, currently existing recognition models tend to focus on one aspect and lack a comprehensive approach. In this paper, a dropout-based Stochastic Configuration Network (SCN) optical fiber intrusion signal recognition model, named DropoutSCN, is first proposed. By combining dropout with randomized algorithm models, it not only enhances fast learning ability, but also improves the generalization performance of the recognition model. In the experiments, compared with traditional Artificial Neural Network (ANN), Random Vector Functional Link (RVFL) and original SCN models, the proposed DropoutSCN model has the lowest root mean square error (RMSE). In terms of time efficiency, it reduces the time delay by about 2.5 times compared with the traditional ANN. In addition, this paper applies dropout to SCN, which provides feasible ideas and a reference for the study of randomized algorithm models.

... In the original LUBE method, all parameters of the NNs, including the input weights, biases and output weights, need to be tuned, which leads to a slow training process. To address this issue, randomized methods for training networks have been developed [2,27,31,32,34,36]. In this paper, the DSCN proposed in [33] is employed to implement the LUBE method. ...

... These algorithms differ in the feature mapping phase: ELM uses random feature mapping (weights from the input to the hidden layer are generated randomly), while RBFN uses distance-based random feature mapping (the centers of the RBFs are generated randomly). However, RBFN obtains unsatisfactory solutions for some cases and results in poor generalization [47]. In contrast, ELM provides an effective solution for SLFNs with good generalization and extremely fast learning and has therefore been widely applied in various applications like regression [21], data classification [18,21], image segmentation [38], dimension reduction [22], medical image classification [37,52,55], face classification [33], etc. ...

Extreme learning machine (ELM), a randomized learning paradigm for a single-hidden-layer feed-forward network, has gained significant attention for solving problems in diverse domains due to its faster learning ability. The output weights in ELM are determined by an analytic procedure, while the input weights and biases are randomly generated and fixed during the training phase. The learning performance of ELM is highly sensitive to many factors, such as the number of nodes in the hidden layer, the initialization of the input weights and the type of activation function in the hidden layer. Although various works on ELM have been proposed in the last decade, the effect of all these influencing factors on classification performance has not been fully investigated yet. In this paper, we test the performance of ELM with different configurations through an empirical evaluation on three standard handwritten character datasets, namely MNIST, the ISI-Kolkata Bangla numeral and the ISI-Kolkata Odia numeral datasets, as well as a newly developed NIT-RKL Bangla numeral dataset. Finally, we derive some best ELM configurations which can serve as general guidelines for designing ELM-based classifiers.

... This is part of the measure concentration phenomena [18,19,34] which form the background of classical statistical physics (Gibbs' theorem about the equivalence of microcanonical and canonical ensembles [13]) and of asymptotic theorems in probability [32]. In machine learning, these phenomena are in the background of random projection methods [28], learning large Gaussian mixtures [1], and various randomized approaches to learning [50], [42], and they shed light on the theory of approximation by random bases [16]. It is highly probable that the recently described manifold disentanglement effects [5] are universal in essentially high-dimensional data analysis and relate to essential multidimensionality rather than to specific deep learning algorithms. ...

... A simple way to set the values of W_1 is to choose them randomly [7,[10][11][12][13][14]. In practice, it is found that if no constraints are applied to the randomly generated values, the solution becomes unstable. ...

In this work, we give an overview of the pseudoinverse learning (PIL) algorithm as well as its applications. The PIL algorithm is a non-gradient-descent algorithm for multi-layer perceptrons. The weight matrices of the network can be computed exactly by the PIL algorithm, so it effectively avoids the problems of slow convergence and local minima. Moreover, PIL does not require user-selected parameters, such as step size or learning rate. The algorithm has been successfully applied in fields such as software reliability engineering and astronomical data analysis.
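The flavour of pseudoinverse-based training, with no step size or learning rate, can be shown in a few lines. As a simplification of the full PIL scheme, the hidden weights here are simply fixed at random values and only the output layer is solved exactly via the Moore-Penrose pseudoinverse; all widths and names are this sketch's choices:

```python
import numpy as np

def pinv_net_fit(X, y, widths=(100, 50), seed=0):
    """Hidden weights are fixed (here: random); the output layer is the
    exact least-squares solution W_out = pinv(H) @ y, so the solve involves
    no step size, learning rate or local-minimum issues."""
    rng = np.random.default_rng(seed)
    H, layers = X, []
    for w in widths:
        Wl = rng.normal(scale=1 / np.sqrt(H.shape[1]), size=(H.shape[1], w))
        bl = rng.normal(size=w)
        layers.append((Wl, bl))
        H = np.tanh(H @ Wl + bl)
    W_out = np.linalg.pinv(H) @ y          # Moore-Penrose pseudoinverse solve
    return layers, W_out

def pinv_net_predict(X, layers, W_out):
    H = X
    for Wl, bl in layers:
        H = np.tanh(H @ Wl + bl)
    return H @ W_out
```

The entire "training" is one pseudoinverse computation, which is why such methods avoid the convergence and hyperparameter issues of gradient descent.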

... Significant growth of the problem space has led to a scalability issue for conventional machine learning approaches, which require iterating over entire batches of data for multiple epochs. This phenomenon results in a strong demand for simple, fast machine learning algorithms well suited for deployment in numerous data-rich applications [7]. This provides a strong case for research in the area of randomized neural networks (RNNs) [10], which were very popular in the late 1980s and early 1990s. ...

The theory of the random vector functional link network (RVFLN) has provided a breakthrough in the design of neural networks (NNs), since it conveys solid theoretical justification of randomized learning. Existing works on RVFLNs are hardly scalable for data stream analytics because they suffer from the issue of complexity as a result of the absence of structural learning scenarios. A novel class of RVFLN, namely the parsimonious random vector functional link network (pRVFLN), is proposed in this paper. pRVFLN features an open structure paradigm where the network structure can be built from scratch and generated automatically in accordance with the degree of nonlinearity and the time-varying properties of the system being modelled. pRVFLN is equipped with complexity reduction scenarios where inconsequential hidden nodes can be pruned and input features can be dynamically selected. pRVFLN puts into perspective an online active learning mechanism which expedites the training process and relieves operator labelling efforts. In addition, pRVFLN introduces a non-parametric type of hidden node, developed using an interval-valued data cloud. This hidden node fully reflects the real data distribution and is not constrained to a specific cluster shape. All learning procedures of pRVFLN follow a strictly single-pass learning mode, which is applicable for online real-time deployment. The efficacy of pRVFLN was rigorously validated through numerous simulations and comparisons with state-of-the-art algorithms, where it produced the most encouraging numerical results. Furthermore, the robustness of pRVFLN was investigated, and a new conclusion is drawn on the scope of the random parameters, which plays a vital role in the success of randomized learning.
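The strictly single-pass readout training mentioned above can be realized with recursive least squares (RLS), where each sample updates the output weights once and is then discarded. This generic sketch covers only the readout; pRVFLN's structural learning, pruning and data-cloud nodes are omitted:

```python
import numpy as np

class RLSReadout:
    """Recursive least squares: one weight update per sample, then the
    sample is discarded (single-pass, online)."""
    def __init__(self, dim, delta=100.0):
        self.P = delta * np.eye(dim)       # inverse-covariance estimate
        self.w = np.zeros(dim)             # output weights
    def update(self, h, y):
        Ph = self.P @ h
        k = Ph / (1.0 + h @ Ph)            # gain vector
        self.w = self.w + k * (y - h @ self.w)
        self.P = self.P - np.outer(k, Ph)  # rank-one covariance downdate
```

Because no sample is stored or revisited, memory use is constant in the stream length, which is the property that makes single-pass learning viable for real-time deployment.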

... Their simplicity is also at the heart of several recent improvements on their basic formulation, such as ensemble strategies (Alhamdoosh and Wang, 2014), the inclusion of unlabeled data in the training process (Scardapane et al., 2016), the use of fuzzy neurons (He et al., 2016), and many more. We refer to the editorial in Wang (2016) and the overview in Scardapane and Wang (2017) for two introductory papers on the topic, and to Li and Wang (2017) for a complete discussion on the problem of selecting a proper range for the randomly generated weights. ...

A random vector functional-link (RVFL) network is a neural network composed of a randomised hidden layer and an adaptable output layer. Training such a network reduces to a linear least-squares problem, which can be solved efficiently. Still, selecting a proper number of nodes in the hidden layer is a critical issue, since an improper choice can lead to either overfitting or underfitting for the problem at hand. Additionally, small-sized RVFL networks are favoured in situations where computational considerations are important. In the case of RVFL networks with a single output, unnecessary neurons can be removed adaptively with sparse training algorithms such as the Lasso, but these are suboptimal for the case of multiple outputs. In this paper, we extend some prior ideas to devise a group-sparse training algorithm which avoids the shortcomings of previous approaches. We validate our proposal on a large set of experimental benchmarks, and we analyse several state-of-the-art optimisation techniques for solving the overall training problem. We show that the proposed approach can obtain an accuracy comparable to standard algorithms, while at the same time producing extremely sparse hidden layers.
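As an illustrative sketch of the group-sparsity idea for multi-output networks (not the paper's exact algorithm), scikit-learn's MultiTaskLasso penalizes entire rows of the output-weight matrix, so each hidden node is either kept for all outputs or pruned for all of them. All data, sizes and the regularization strength below are assumptions:

```python
# Hedged sketch: group-sparse output weights for a multi-output RVFL-style
# network. MultiTaskLasso uses an L2/L1 penalty that zeroes whole rows of the
# coefficient matrix, so a hidden node survives for all outputs or for none.
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
Y = np.column_stack([X[:, 0] * X[:, 1], np.sin(X[:, 2])])  # two toy outputs

n_hidden = 100
W = rng.uniform(-1.0, 1.0, size=(4, n_hidden))  # random, fixed input weights
b = rng.uniform(-1.0, 1.0, size=n_hidden)       # random, fixed biases
H = np.tanh(X @ W + b)                          # hidden-layer features

model = MultiTaskLasso(alpha=0.05).fit(H, Y)
# A hidden node is kept if its coefficient column is nonzero for any output.
kept = int(np.sum(np.any(model.coef_ != 0.0, axis=0)))
print(kept, "of", n_hidden, "hidden nodes kept")
```

Nodes whose coefficient groups are driven to zero can simply be dropped from the hidden layer, shrinking the network without retraining the random weights.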

... Further investigations were presented in [21], where some theoretical results were established. Readers may refer to a recently published editorial for a more in-depth discussion of the randomized learner models and their relevant learning algorithms [48]. However, these centralized learning algorithms depend critically on the assumption that the entire training data set is available to a single processor with substantially more computational resources than the individual nodes of a distributed environment. ...

This paper focuses on developing new algorithms for distributed cooperative learning based on zero-gradient-sum (ZGS) optimization in a network setting. Specifically, the feedforward neural network with random weights (FNNRW) is introduced to train on data distributed across multiple learning agents, with each agent running the program on a subset of the entire data. In this scheme, there is no requirement for a fusion center, due to, e.g., practical limitations, security, or privacy reasons. The centralized FNNRW problem is reformulated into an equivalent separable form with consensus constraints among nodes and is solved by the ZGS-based distributed optimization strategy, which theoretically guarantees convergence to the optimal solution. The proposed method is more effective than existing methods using the decentralized average consensus (DAC) and alternating direction method of multipliers (ADMM) strategies. It is simple and requires fewer computational and communication resources, making it well suited for potential applications, such as wireless sensor networks, artificial intelligence, and computational biology, involving datasets that are often extremely large, high-dimensional and located on distributed data sources. We show simulation results on both synthetic and real-world datasets.

... Although feed-forward neural networks with error back-propagation learning algorithms have been widely used to solve nonlinear data regression problems, they suffer from local minima, over-fitting, sensitivity to the choice of learning parameters and very slow learning rates. Randomized approaches for training neural networks offer advantages in learning efficiency and in ease of model building and selection [3,27]. Neural networks with random weights (NNRWs) were proposed in [23], where the input weights and biases at the hidden nodes are randomly assigned in [−1, 1] and the output weights are given analytically by the well-known least squares method. ...

A learner model with fast learning and compact architecture is expected for industrial data modeling. To achieve these goals during stochastic configuration networks (SCNs) construction, we propose an improved version of SCNs in this paper. Unlike the original SCNs, the improved one employs a new inequality constraint in the construction process. In addition, to speed up the construction efficiency of SCNs, a node selection method is proposed to adaptively select nodes from a candidate pool. Moreover, to reduce the redundant nodes of the built SCNs model, we further compress the model based on the singular value decomposition algorithm. The improved SCNs are compared with other methods over four datasets and then applied to the ammonia-nitrogen concentration prediction task in the wastewater treatment process. Experimental results indicate that the proposed method has good potential for industrial data analytics.

Stochastic configuration networks (SCNs) are a class of randomized learner models that ensure the universal approximation property, whereby random weights and biases are drawn from the uniform distribution and selected by a supervisory mechanism. This paper looks into the impact of the distribution of the random weights on the performance of SCNs. In light of a fundamental principle in machine learning, namely that a model with smaller weights tends to generalize better, we recommend using symmetric zero-centered distributions in constructing SCNs to improve the generalization performance. Further, we introduce a scalar in the distributions to make the SCN model adaptively feasible for different datasets. Simulation results are reported for both regression and classification tasks over twenty-one benchmark datasets using SCN. Results are also presented on ten regression datasets using a deep implementation of SCN, known as deep stochastic configuration networks (DeepSCN).

The operating state of insulators is directly related to the stability of power transmission lines. Existing methods for insulator state recognition cannot achieve satisfactory performance. In this paper, the self-blast state recognition of glass insulators is investigated by using an adaptive learning representation. To increase the adaptability of the network to different scales, we propose a solution based on multi-scale information throughout the entire process, beginning from low-scale and moving to high-scale subnetworks. The multi-scale information is aggregated in a parallel way to take advantage of a rich information representation. Then, an imitation of the human thinking pattern is employed: utilizing an entropy-based cost function, we update the parameters of the learner model in real time. Based on the constraint of the evaluation index, an adaptive depth representation is constructed for glass insulators that fail the reliability evaluation, realizing self-optimizing regulation of the feature space. Correspondingly, a stochastic configuration network (SCN) classifier is re-constructed to fit the updated multi-hierarchy knowledge space and carry out the re-recognition process. Finally, fuzzy integration is employed to ensemble the multi-hierarchy networks and improve the model's generalization. Recognition results on an aerial dataset of insulator images demonstrate the effectiveness of the proposed approach.

A significant research area in medical imaging analysis is digital mammography breast cancer detection at an early stage. For breast mass classification into the benign or malignant category, an enhanced automated computer-aided diagnosis (CAD) model is suggested in this work, enabling radiologists to identify breast diseases correctly in less time. First, a fast discrete curvelet transform with wrapping (FDCT-WRP) is deployed to extract the curve-like features and create a feature set. Then, a combined feature reduction strategy using principal component analysis (PCA) and linear discriminant analysis (LDA) is applied to produce a more relevant, reduced feature set. Finally, a new enhanced learning algorithm called MODPSO-ELM, which incorporates modified particle swarm optimization (MODPSO) and an extreme learning machine (ELM), is proposed for the classification task. In the MODPSO-ELM algorithm, MODPSO is utilized to optimize the hidden-node parameters (input weights and hidden biases) of a single-hidden-layer feedforward neural network (SLFN), and the output weights are determined analytically. The proposed CAD model has been evaluated on three standard datasets with a 10 × k-fold stratified cross-validation (SCV) test. The experiments show that the suggested CAD model yields the best outcome for the MIAS dataset and obtains accuracies of 98.94% and 98.76% for the DDSM and INbreast datasets, respectively. The experimental results indicate that the proposed model is superior to other state-of-the-art models, with a substantially reduced number of features and better classification accuracy.

In this study, a novel approach based on reservoir computing, a successful method for modeling sequential datasets, and extreme learning machines, which have a high generalization capacity, is proposed to model non-sequential datasets or systems. The proposed approach does not require any optimization stage; all weights (except those in the output layer), the biases, the number of neurons in the reservoir, the activation functions and the parameters of the activation functions are determined arbitrarily, and the weights in the output layer are calculated from these arbitrarily assigned parameters. The proposed approach was evaluated and validated on 60 different benchmark datasets. The obtained results were compared with literature findings and with results obtained by each of the extreme learning machine (ELM), randomized artificial neural network, random vector functional link, stochastic ELM, and pruned stochastic ELM methods. The achieved results are good enough for the approach to be employed in classification and regression.

Randomized Neural Networks explore the behavior of neural systems where the majority of connections are fixed, either in a stochastic or a deterministic fashion. Typical examples of such systems consist of multi-layered neural network architectures where the connections to the hidden layer(s) are left untrained after initialization. Limiting the training algorithms to operate on a reduced set of weights inherently characterizes the class of Randomized Neural Networks with a number of intriguing features. Among them, the extreme efficiency of the resulting learning processes is undoubtedly a striking advantage with respect to fully trained architectures. Besides, despite the involved simplifications, randomized neural systems possess remarkable properties both in practice, achieving state-of-the-art results in multiple domains, and theoretically, allowing to analyze intrinsic properties of neural architectures (e.g. before training of the hidden layers’ connections). In recent years, the study of Randomized Neural Networks has been extended towards deep architectures, opening new research directions to the design of effective yet extremely efficient deep learning models in vectorial as well as in more complex data domains. This chapter surveys all the major aspects regarding the design and analysis of Randomized Neural Networks, and some of the key results with respect to their approximation capabilities. In particular, we first introduce the fundamentals of randomized neural models in the context of feed-forward networks (i.e., Random Vector Functional Link and equivalent models) and convolutional filters, before moving to the case of recurrent systems (i.e., Reservoir Computing networks). For both, we focus specifically on recent results in the domain of deep randomized systems, and (for recurrent models) their application to structured domains.

In environmental security monitoring applications, an optical fiber prewarning system (OFPS) serves not only to locate intrusion events but also to recognize them. As a nonlinear network for recognition, the stochastic configuration network (SCN) is considered a promising method because it does not require setting the network scale beforehand. However, under the specific requirements of OFPS applications, the small feature distance between the different intrusion signals to be classified makes it necessary to set a smaller error tolerance, with the side effect that meeting the constraint condition becomes a challenge. To overcome this, we improve the configuration method of the hidden-layer nodes in the SCN. As network construction proceeds, the increment of hidden-layer nodes in each loop is gradually increased, and the space of the corresponding randomly generated parameters is enlarged. An SCN with variable increments of hidden nodes can adjust the number of hidden nodes added in each loop for continuous construction, obtaining higher classification accuracy. This study has great significance for the application of SCN to the classification of intrusion signals in OFPS. © 2019 Society of Photo-Optical Instrumentation Engineers (SPIE).

Although high accuracies have been achieved by artificial neural networks (ANNs), determining the optimal number of neurons in the hidden layer and the activation function is still an open issue. In this paper, the applicability of assigning the number of neurons in the hidden layer and the activation function randomly is investigated. Based on the findings, two novel versions of randomized ANNs, namely stochastic and pruned stochastic, are proposed to achieve higher accuracy without any time-consuming optimization stage. The proposed approaches were evaluated and validated against the basic versions of the popular randomized ANNs [1]: the random weight neural network [2], the random vector functional link [3] and the extreme learning machine [4] methods. In the stochastic version of randomized ANNs, not only the weights and biases of the neurons in the hidden layer but also the number of neurons in the hidden layer and each activation function are assigned randomly. In the pruned stochastic version of these methods, the winner networks are pruned according to a novel strategy in order to produce a faster response. The proposed approaches were validated on 60 datasets (30 classification and 30 regression datasets). The obtained accuracies and time usages show that both versions of randomized ANNs can be employed for classification and regression.

Neural networks, as powerful tools for data mining and knowledge engineering, can learn from data to build feature-based classifiers and nonlinear predictive models. Training neural networks involves the optimization of non-convex objective functions, and usually the learning process is costly and infeasible for applications associated with data streams. A possible, albeit counter-intuitive, alternative is to randomly assign a subset of the networks' weights, so that the resulting optimization task can be formulated as a linear least-squares problem. This methodology can be applied to both feedforward and recurrent networks, and similar techniques can be used to approximate kernel functions. Many experimental results indicate that such randomized models can reach sound performance compared to fully adaptable ones, with a number of favourable benefits, including (i) simplicity of implementation, (ii) faster learning with less human intervention, and (iii) the possibility of leveraging all linear regression and classification algorithms (e.g., ℓ1-norm minimization for obtaining sparse formulations). All these points make them attractive and valuable to the data mining community, particularly for handling large-scale data mining in real time. However, the literature in the field is extremely vast and fragmented, with many results being reintroduced multiple times under different names. This overview aims at providing a self-contained, uniform introduction to the different ways in which randomization can be applied to the design of neural networks and kernel functions. A clear exposition of the basic framework underlying all these approaches helps to clarify innovative lines of research, open problems and, most importantly, foster the exchange of well-known results throughout different communities.

The leave-one-out error is an important statistical estimator of the performance of a learning algorithm. Unlike the empirical error, it is almost unbiased and is frequently used for model selection. We review attempts at justifying the use of the leave-one-out error in machine learning. We especially focus on the concept of stability of a learning algorithm and show how it can be used to formally link the leave-one-out error to the generalization error. Stability has also motivated recent work on averaging techniques similar to bagging, which we briefly summarize in the paper. The ideas we develop are illustrated in some detail in the context of kernel-based learning algorithms.
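For ordinary least squares, the leave-one-out error discussed above even has a closed form: with hat matrix H = X(XᵀX)⁻¹Xᵀ and residuals r = y − Hy, the i-th leave-one-out residual equals r_i / (1 − H_ii), so no n model refits are needed. A small numerical check on toy data (sizes and coefficients are hypothetical):

```python
# Closed-form leave-one-out residuals for OLS vs. brute-force refitting.
import numpy as np

rng = np.random.default_rng(0)
n = 40
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

# Fast path: hat matrix and the r_i / (1 - H_ii) identity.
H = X @ np.linalg.inv(X.T @ X) @ X.T
r = y - H @ y
loo_fast = r / (1.0 - np.diag(H))

# Slow path: refit with each point held out, one at a time.
loo_slow = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    beta, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    loo_slow[i] = y[i] - X[i] @ beta

print("max discrepancy:", float(np.max(np.abs(loo_fast - loo_slow))))
```

The two computations agree to numerical precision, which is why the leave-one-out estimate is cheap for linear models such as randomized networks with least-squares output layers.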

We discuss the role of random basis function approximators in modeling and control. We analyze the published work on random basis function approximators and demonstrate that their favorable error rate of convergence O(1/n) is guaranteed only with very substantial computational resources. We also discuss implications of our analysis for applications of neural networks in modeling and control.

We extend existing theory on stability, namely how much changes in the training data influence the estimated models, and generalization performance of deterministic learning algorithms to the case of randomized algorithms. We give formal definitions of stability for randomized algorithms and prove non-asymptotic bounds on the difference between the empirical and expected error as well as the leave-one-out and expected error of such algorithms that depend on their random stability. The setup we develop for this purpose can be also used for generally studying randomized learning algorithms. We then use these general results to study the effects of bagging on the stability of a learning method and to prove non-asymptotic bounds on the predictive performance of bagging which have not been possible to prove with the existing theory of stability for deterministic learning algorithms.

In the field of neural network research, a number of described experiments seem to be in contradiction with classical pattern recognition or statistical estimation theory. The authors attempt to give some experimental understanding of why this could be possible by showing that a large fraction of the parameters (the weights of neural networks) are of less importance and do not need to be measured with high accuracy. The remaining part is capable of implementing the desired classifier, and because this is only a small fraction of the total number of weights, the reported experiments seem more realistic from a classical point of view.

A system architecture and a network computational approach compatible with the goal of devising a general-purpose artificial neural network computer are described. The functionalities of supervised learning and optimization are illustrated, and cluster analysis and associative recall are briefly mentioned.

An algorithm is called stable at a training set S if any change of a single point in S yields only a small change in the output. Stability of the learning algorithm is necessary for learnability in the supervised classification and regression setting. In this paper, we give formal definitions of strong and weak stability for randomized algorithms and prove non-asymptotic bounds on the difference between the empirical and expected error.

The relationship between 'learning' in adaptive layered networks and the fitting of data with high-dimensional surfaces is discussed. This leads naturally to a picture of 'generalization' in terms of interpolation between known data points and suggests a rational approach to the theory of such networks. A class of adaptive networks is identified which makes the interpolation scheme explicit. This class has the property that learning is equivalent to the solution of a set of linear equations. These networks thus represent ...

A randomized algorithm is one that makes random choices during its execution. The behavior of such an algorithm may thus be random even on a fixed input. The design and analysis of a randomized algorithm focus on establishing that it is likely to behave well on every input; the likelihood in such a statement depends only on the probabilistic choices made by the algorithm during execution and not on any assumptions about the input. It is especially important to distinguish a randomized algorithm from the average-case analysis of algorithms, where one analyzes an algorithm assuming that its input is drawn from a fixed probability distribution. With a randomized algorithm, in contrast, no assumption is made about the input.

In this paper we explore and discuss the learning and generalization characteristics of the random vector version of the Functional-link net and compare these with those attainable with the GDR algorithm. This is done for a well-behaved deterministic function and for real-world data. It seems that ‘overtraining’ occurs for stochastic mappings. Otherwise there is saturation of training.

To accelerate the training of kernel machines, we propose to map the input data to a randomized low-dimensional feature space and then apply existing fast linear methods. The features are designed so that the inner products of the transformed data are approximately equal to those in the feature space of a user-specified shift-invariant kernel. We explore two sets of random features, provide convergence bounds on their ability to approximate various radial basis kernels, and show that in large-scale classification and regression tasks linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernel machines.
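The construction described above can be sketched for the Gaussian (RBF) kernel: sampling frequencies from a normal distribution and phases uniformly, the inner product of the random cosine features approximates the kernel value. The unit bandwidth and the sizes below are illustrative assumptions:

```python
# Random Fourier features approximating k(x, z) = exp(-||x - z||^2 / 2).
import numpy as np

rng = np.random.default_rng(0)
d, D = 5, 2000  # input dimension, number of random features

w = rng.normal(size=(D, d))               # frequencies ~ N(0, I), unit bandwidth
b = rng.uniform(0.0, 2 * np.pi, size=D)   # random phases

def phi(x):
    # Feature map z(x) = sqrt(2/D) * cos(Wx + b); E[z(x)^T z(y)] = k(x, y).
    return np.sqrt(2.0 / D) * np.cos(w @ x + b)

x = rng.normal(size=d)
z = rng.normal(size=d)
exact = np.exp(-np.sum((x - z) ** 2) / 2.0)
approx = float(phi(x) @ phi(z))
print("exact:", exact, "approx:", approx)
```

The approximation error shrinks at roughly O(1/√D), so a few thousand features usually suffice for a close kernel estimate while keeping training linear in the data size.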

Randomized neural networks are immortalized in this well-known AI Koan: In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6. "What are you doing?" asked Minsky. "I am training a randomly wired neural net to play tic-tac-toe," Sussman replied. "Why is the net wired randomly?" asked Minsky. Sussman replied, "I do not want it to have any preconceptions of how to play." Minsky then shut his eyes. "Why do you close your eyes?" Sussman asked his teacher. "So that the room will be empty," replied Minsky. At that moment, Sussman was enlightened. We analyze shallow random networks with the help of concentration of measure inequalities. Specifically, we consider architectures that compute a weighted sum of their inputs after passing them through a bank of arbitrary randomized nonlinearities. We identify conditions under which these networks exhibit good classification performance, and bound their test error in terms of the size of the dataset and the number of random nonlinearities.

From an analytical approach to the multilayer network architecture, we deduce a polynomial-time algorithm for learning from examples. We call it JNN, for "Jacobian Neural Network". Although this learning algorithm is a randomized algorithm, it gives a correct network with probability 1. The JNN learning algorithm is defined for a wide variety of multilayer networks, computing real output vectors from real input vectors, through one or several hidden layers, with weak assumptions on the activation functions of the hidden units. Starting from an exact learning algorithm for a given database, we propose a regularization technique which improves the performance on applications, as can be verified on several benchmark problems. Moreover, the JNN algorithm does not require a priori statements about the network architecture, since the number of hidden units for a one-hidden-layer network is computed by learning. Finally, we show that a modular approach allows learning with a reduced number of weights.

We examined closely the cerebellar circuit model that we have proposed previously. The model granular layer generates a finite but very long sequence of active neuron populations without recurrence, which is able to represent the passage of time. For ...

Randomized algorithms for very large matrix problems have received a great deal of attention in recent years. Much of this work was motivated by problems in large-scale data analysis, and this work was performed by individuals from many different research communities. This monograph will provide a detailed overview of recent work on the theory of randomized matrix algorithms as well as the application of those ideas to the solution of practical problems in large-scale data analysis. An emphasis will be placed on a few simple core ideas that underlie not only recent theoretical advances but also the usefulness of these tools in large-scale data applications. Crucial in this context is the connection with the concept of statistical leverage. This concept has long been used in statistical regression diagnostics to identify outliers; and it has recently proved crucial in the development of improved worst-case matrix algorithms that are also amenable to high-quality numerical implementation and that are useful to domain scientists. Randomized methods solve problems such as the linear least-squares problem and the low-rank matrix approximation problem by constructing and operating on a randomized sketch of the input matrix. Depending on the specifics of the situation, when compared with the best previously-existing deterministic algorithms, the resulting randomized algorithms have worst-case running time that is asymptotically faster; their numerical implementations are faster in terms of clock-time; or they can be implemented in parallel computing environments where existing numerical algorithms fail to run at all. Numerous examples illustrating these observations will be described in detail.
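The sketch-and-solve idea for the linear least-squares problem mentioned above can be illustrated with a plain Gaussian sketch. The sizes here are toy values; practical implementations use structured sketches (e.g. subsampled randomized Hadamard transforms) for speed:

```python
# Sketch-and-solve least squares: compress the tall matrix A with a random
# sketch S, then solve the much smaller problem min ||S(Ax - b)||.
import numpy as np

rng = np.random.default_rng(0)
m, n = 5000, 20                      # heavily overdetermined system
A = rng.normal(size=(m, n))
x_true = rng.normal(size=n)
b = A @ x_true + 0.01 * rng.normal(size=m)  # noisy right-hand side

k = 400                              # sketch size, a small multiple of n
S = rng.normal(size=(k, m)) / np.sqrt(k)    # Gaussian sketching matrix

x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)  # small solve
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)           # full solve

print("solution gap:", float(np.linalg.norm(x_sketch - x_exact)))
```

The sketched solve works on a k × n system instead of m × n, and random-matrix theory guarantees the subspace geometry of A is approximately preserved, so the two solutions are close with high probability.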

Stability is a general notion that quantifies the sensitivity of a learning algorithm's output to small changes in the training dataset (e.g. deletion or replacement of a single training sample). Such conditions have recently been shown to be powerful enough to characterize learnability in the general learning setting under i.i.d. samples, where uniform convergence is not necessary for learnability but stability is both sufficient and necessary. We here show that similar stability conditions are also sufficient for online learnability, i.e. whether there exists a learning algorithm that, under any sequence of examples (potentially chosen adversarially), produces a sequence of hypotheses with no regret in the limit with respect to the best hypothesis in hindsight. We introduce online stability, a stability condition related to uniform leave-one-out stability in the batch setting, that is sufficient for online learnability. In particular, we show that popular classes of online learners, namely algorithms that fall in the category of Follow-the-(Regularized)-Leader, Mirror Descent, gradient-based methods and randomized algorithms like Weighted Majority and Hedge, are guaranteed to have no regret if they have such an online stability property. We provide examples suggesting that the existence of an algorithm with such a stability condition might in fact be necessary for online learnability. For the more restricted binary classification setting, we establish that such a stability condition is in fact both sufficient and necessary. We also show that for a large class of online learnable problems in the general learning setting, namely those with a notion of sub-exponential covering, no-regret online algorithms with such a stability condition exist.

A key challenge for neural modeling is to explain how a continuous stream of multimodal input from a rapidly changing environment can be processed by stereotypical recurrent circuits of integrate-and-fire neurons in real time. We propose a new computational model for real-time computing on time-varying input that provides an alternative to paradigms based on Turing machines or attractor neural networks. It does not require a task-dependent construction of neural circuits. Instead, it is based on principles of high-dimensional dynamical systems in combination with statistical learning theory and can be implemented on generic evolved or found recurrent circuitry. It is shown that the inherent transient dynamics of the high-dimensional dynamical system formed by a sufficiently large and heterogeneous neural circuit may serve as universal analog fading memory. Readout neurons can learn to extract in real time from the current state of such recurrent neural circuit information about current and past inputs that may be needed for diverse tasks. Stable internal states are not required for giving a stable output, since transient internal states can be transformed by readout neurons into stable target outputs due to the high dimensionality of the dynamical system. Our approach is based on a rigorous computational model, the liquid state machine, that, unlike Turing machines, does not require sequential transitions between well-defined discrete internal states. It is supported, as the Turing machine is, by rigorous mathematical results that predict universal computational power under idealized conditions, but for the biologically more realistic scenario of real-time processing of time-varying inputs. Our approach provides new perspectives for the interpretation of neural coding, the design of experiments and data analysis in neurophysiology, and the solution of problems in robotics and neurotechnology.

A theoretical justification for the random vector version of the functional-link (RVFL) net is presented in this paper, based on a general approach to adaptive function approximation. The approach consists of formulating a limit-integral representation of the function to be approximated and subsequently evaluating that integral with the Monte-Carlo method. Two main results are: (1) the RVFL is a universal approximator for continuous functions on bounded finite-dimensional sets, and (2) the RVFL is an efficient universal approximator, with the rate of approximation error convergence to zero of order O(C/√n), where n is the number of basis functions and C is independent of n. Similar results are also obtained for neural nets with hidden nodes implemented as products of univariate functions or radial basis functions. Some possible ways of enhancing the accuracy of multivariate function approximations are discussed.

Y.-H. Pao, Y. Takefuji, Functional-link net computing, IEEE Comput. J. 25 (5) (1992) 76-79.

S. Ross, J.A. Bagnell, Stability Conditions for Online Learnability, The Computing Research Repository (CoRR), August 2011.