Conference Paper

Mixture of Neural Networks: Some Experiments with the Multilayer Feedforward Architecture


Abstract

A Modular Multi-Net System consists of several networks, each of which solves part of a problem. The original problem is decomposed into subproblems, and each network focuses on solving one subproblem. The Mixture of Neural Networks (MixNN) consists of expert networks, which solve the subproblems, and a gating network, which weights the outputs of the expert networks. The expert networks and the gating network are trained together in order to reduce the correlation among the networks and to minimize the error of the system. In this paper we present the Mixture of Multilayer Feedforward (MixMF), a method based on MixNN which uses Multilayer Feedforward networks at the expert level. Finally, we have performed a comparison among Simple Ensemble, MixNN and MixMF, and the results show that MixMF is the best-performing method.
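The sketch below is a minimal illustration of the mixture structure the abstract describes, not the paper's MixMF implementation: for brevity the experts are linear models and the gate is a softmax over a linear map (the paper uses Multilayer Feedforward experts), and all sizes and learning rates are illustrative assumptions. The key point it shows is the joint training of both levels: the gating weights combine the expert outputs, and the gradients of the system error flow to experts and gate together.

import numpy as np

rng = np.random.default_rng(0)
N, D, K = 200, 3, 4                      # samples, input dimension, number of experts
X = rng.normal(size=(N, D))
t = np.sin(X[:, 0]) + 0.1 * rng.normal(size=N)   # toy regression target

W = rng.normal(scale=0.1, size=(K, D))   # expert weights (one row per expert)
b = np.zeros(K)                          # expert biases
V = rng.normal(scale=0.1, size=(K, D))   # gating weights
c = np.zeros(K)                          # gating biases
lr = 0.05

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for epoch in range(200):
    y_k = X @ W.T + b                    # expert outputs, shape (N, K)
    g = softmax(X @ V.T + c)             # gating weights, shape (N, K)
    y = (g * y_k).sum(axis=1)            # weighted system output, shape (N,)
    err = y - t                          # system error

    # Gradients of the mean squared error of the whole system:
    # experts and gate are updated together, as in the mixture architecture.
    d_yk = err[:, None] * g                          # dL / d expert outputs
    d_z = err[:, None] * g * (y_k - y[:, None])      # dL / d gating pre-activations
    W -= lr * (d_yk.T @ X) / N
    b -= lr * d_yk.mean(axis=0)
    V -= lr * (d_z.T @ X) / N
    c -= lr * d_z.mean(axis=0)

print("final MSE:", float(np.mean((y - t) ** 2)))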


References
Article
Using an ensemble of classifiers, instead of a single classifier, can lead to improved generalization. The gains obtained by combining, however, are often affected more by the selection of what is presented to the combiner than by the actual combining method that is chosen. In this paper we focus on data selection and classifier training methods, in order to "prepare" classifiers for combining. We review a combining framework for classification problems that quantifies the need for reducing the correlation among individual classifiers. Then, we discuss several methods that make the classifiers in an ensemble more complementary. Experimental results are provided to illustrate the benefits and pitfalls of reducing the correlation among classifiers, especially when the training data is in limited supply.
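As a small illustration of the correlation this abstract refers to (the data below are made up for the example), the pairwise correlation between the error patterns of individual classifiers on a common validation set can be measured directly; lower correlation means the classifiers fail on different patterns, which is what makes combining them worthwhile.

import numpy as np

# Hypothetical predictions of three classifiers on the same validation set.
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 1])
preds = np.array([
    [0, 1, 0, 0, 1, 0, 1, 1, 0, 1],   # classifier A
    [0, 1, 1, 0, 0, 0, 1, 1, 1, 1],   # classifier B
    [1, 1, 1, 0, 1, 0, 0, 1, 0, 1],   # classifier C
])

errors = (preds != y_true).astype(float)      # 1 where a classifier is wrong
corr = np.corrcoef(errors)                    # pairwise correlation of error patterns
print(np.round(corr, 2))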
Conference Paper
As shown in the bibliography, training an ensemble of networks is an interesting way to improve performance with respect to a single network. However, there are several methods of constructing the ensemble, and there are no complete results showing which one is the most appropriate. In this paper we present a comparison of eleven different methods. We have trained ensembles of 3, 9, 20 and 40 networks to show results across a wide spectrum of sizes. The results show that the improvement in performance beyond 9 networks in the ensemble depends on the method but is usually marginal. Also, the best method is called “Decorrelated” and uses a penalty term in the usual Backpropagation error function to decorrelate the network outputs in the ensemble.
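A rough sketch of the penalty-term idea mentioned above, under an assumed general form of the penalty rather than the cited method's exact formulation: the squared error of the network being trained is augmented with a term lam * (y_new - t) * (y_prev - t) that discourages it from repeating the errors of an already trained network, so only the output-layer error signal of backpropagation changes.

import numpy as np

def decorrelated_output_delta(y_new, y_prev, t, lam=0.5):
    # Standard backpropagation for the error 0.5 * (y_new - t)**2 uses the
    # output error signal (y_new - t).  Adding the decorrelation penalty
    # lam * (y_new - t) * (y_prev - t) changes that signal to
    # (y_new - t) + lam * (y_prev - t); the rest of backprop is unchanged.
    # y_prev is the fixed output of a previously trained network.
    return (y_new - t) + lam * (y_prev - t)

# Made-up single pattern: both networks underestimate the target, so the
# penalty pushes the new network's output up more strongly than the plain
# error signal would, steering it away from repeating the same error.
t, y_prev, y_new = 1.0, 0.4, 0.7
print(decorrelated_output_delta(y_new, y_prev, t))   # -0.6 instead of -0.3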
Conference Paper
The demand for solving complex problems has directed the research trend in intelligent systems toward the design of cooperative multi-expert systems. One way of achieving effective cooperation is through sharing resources such as information and components. In this paper, we study classifier combination techniques from a cooperation perspective. The degree to which, and the method by which, multiple classifier systems share training resources can be a measure of cooperation. Even though data modification techniques, such as bagging and k-fold cross-validation, have been used extensively, there is no guidance on whether sharing training patterns results in higher accuracy, or under what conditions. We carried out a set of experiments to examine the effect of sharing training patterns on several architectures by varying the overlap between 0% and 100% of the size of the training subsets. The overall conclusion is that sharing training patterns among classifiers is beneficial.
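A minimal sketch, with assumed names and a simple sharing scheme, of how training subsets with a controlled amount of overlap could be generated for an experiment of this kind: each classifier receives a fixed-size subset in which a chosen fraction is drawn from a pool shared by all classifiers and the remainder is unique to it.

import numpy as np

def overlapping_subsets(n_patterns, n_classifiers, subset_size, overlap, seed=0):
    # overlap is the fraction (0.0 - 1.0) of each subset drawn from a pool
    # shared by all classifiers; the remaining patterns are unique per classifier.
    rng = np.random.default_rng(seed)
    n_shared = int(round(overlap * subset_size))
    n_unique = subset_size - n_shared
    needed = n_shared + n_classifiers * n_unique
    assert needed <= n_patterns, "not enough patterns for disjoint unique parts"

    perm = rng.permutation(n_patterns)
    shared = perm[:n_shared]
    subsets = []
    for k in range(n_classifiers):
        start = n_shared + k * n_unique
        unique = perm[start:start + n_unique]
        subsets.append(np.concatenate([shared, unique]))
    return subsets

# Example: 1000 patterns, 3 classifiers, subsets of 200 patterns with 50% overlap.
for s in overlapping_subsets(1000, 3, 200, overlap=0.5):
    print(len(s), "patterns, first few indices:", s[:5])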
Conference Paper
As shown in the bibliography, training an ensemble of networks is an interesting way to improve performance with respect to a single network. However, there are several methods of constructing the ensemble. In this paper we present some new results in a comparison of twenty different methods. We have trained ensembles of 3, 9, 20 and 40 networks to show results across a wide spectrum of sizes. The results show that the improvement in performance beyond 9 networks in the ensemble depends on the method but is usually low. Also, the best method for an ensemble of 3 networks is called "decorrelated" and uses a penalty term in the usual backpropagation function to decorrelate the network outputs in the ensemble. For the case of 9 and 20 networks the best method is conservative boosting, and finally, for 40 networks, the best method is Cels.
Article
This book, which is wholly devoted to the subject of model combination, is divided into ten chapters. In addition to the first two introductory chapters, the book covers some of the following topics: multiple classifier systems; combination methods when the base classifier outputs are 0/1; methods when the outputs are continuous, e.g., posterior probabilities; methods for classifier selection; bagging and boosting; the theory of fixed combination rules; and the concept of diversity. Overall, it is a very well-written monograph. It explains and analyzes different approaches comparatively so that the reader can see how they are similar and how they differ. The literature survey is extensive. The MATLAB code for many methods is given in chapter appendices allowing readers to play with the explained methods or apply them quickly to their own data. The book is a must-read for researchers and practitioners alike.
Article
Bootstrap samples with noise are shown to be an effective smoothness and capacity control technique for training feed-forward networks and for other statistical methods such as generalized additive models. It is shown that the noisy bootstrap performs best in conjunction with weight decay regularization and ensemble averaging. The two-spiral problem, a highly non-linear, noise-free data set, is used to demonstrate these findings. The combination of the noisy bootstrap and ensemble averaging is also shown to be useful for generalized additive modeling, and is demonstrated on the well-known Cleveland Heart Data [7].
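A small sketch of the combination this abstract describes, with made-up data and with a ridge-regularized linear model standing in for a feed-forward network trained with weight decay: each ensemble member is fit on a bootstrap resample whose inputs are perturbed with Gaussian noise, and the ensemble prediction is the average of the members.

import numpy as np

rng = np.random.default_rng(0)
N, D = 120, 5
X = rng.normal(size=(N, D))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.3 * rng.normal(size=N)

def fit_ridge(X, y, alpha):
    # Closed-form ridge regression; the alpha penalty plays the role of
    # weight decay for this linear stand-in model.
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def noisy_bootstrap_ensemble(X, y, n_members=10, noise_std=0.1, alpha=1.0):
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, len(X), size=len(X))                    # bootstrap resample
        X_b = X[idx] + noise_std * rng.normal(size=X[idx].shape)      # add input noise
        members.append(fit_ridge(X_b, y[idx], alpha))
    return members

members = noisy_bootstrap_ensemble(X, y)
X_test = rng.normal(size=(20, D))
pred = np.mean([X_test @ w for w in members], axis=0)                 # ensemble averaging
print(pred[:5])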