Chapter

Meta-learning and Neurocomputing – A New Perspective for Computational Intelligence


Abstract

This chapter presents an analysis of the computational mechanisms of induction, in order to assess the potential of meta-learning methods against common base-learning practices. To this aim, a formal investigation of inductive mechanisms is first carried out, drawing a distinction between fixed-bias and dynamic-bias learning. A survey then gathers suggestions and examples proposed in the literature for increasing the efficiency of common learning algorithms. The natural laboratory for this kind of investigation is the field of connectionist learning. To explore the meta-learning possibilities of neural network systems, knowledge-based neurocomputing techniques are considered; among them, certain hybridisation strategies are analysed in detail and presented as distinctive illustrations of a new perspective on Computational Intelligence.


Chapter
The common practices of machine learning appear to be frustrated by a number of theoretical results denying the possibility of any meaningful implementation of a "superior" learning algorithm. However, there exist some general assumptions that, even when overlooked, preside over the activity of researchers and practitioners. A thorough reflection on such essential premises brings forward the meta-learning approach as the most suitable for escaping the long-standing riddle of induction, while also claiming epistemological soundness. Several examples of meta-learning models can be found in the literature, yet the combination of computational intelligence techniques with meta-learning models remains scarcely explored. Our contribution to this particular research line consists in the realisation of Mindful, a meta-learning system based on neuro-fuzzy hybridisation. We present the Mindful system, first situating it inside the general context of the meta-learning frameworks proposed in the literature. Finally, a complete session of experiments is illustrated, comprising both base-level and meta-level learning activity. The appreciable experimental results underline the suitability of the Mindful system for managing past accumulated learning experience while facing novel tasks.
Chapter
Full-text available
The aim of this chapter is to introduce the reader to the main topics that Artificial Intelligence (AI), over the course of its short history, has addressed both in its applicative (engineering) variant and in its theoretical (cognitive) one. By the end of this chapter the reader should: be acquainted with the evolution of some of the most influential research trends in AI; be aware of the most recent positions in the current debate within AI; and be briefly introduced to some classic philosophical and epistemological problems addressed by AI.
Conference Paper
Full-text available
This paper reviews the supervised learning versions of the no-free-lunch theorems in a simplified form. It also discusses the significance of those theorems, and their relation to other aspects of supervised learning.
Article
Full-text available
The Support Vector Machine algorithm is sensitive to the choice of parameter settings. If these are not set correctly, the algorithm may have a substandard performance. Suggesting a good setting is thus an important problem. We propose a meta-learning methodology for this purpose and exploit information about the past performance of different settings. The methodology is applied to set the width of the Gaussian kernel. We carry out an extensive empirical evaluation, including comparisons with other methods (fixed default ranking; selection based on cross-validation and a heuristic method commonly used to set the width of the SVM kernel). We show that our methodology can select settings with low error while providing significant savings in time. Further work should be carried out to see how the methodology could be adapted to different parameter setting tasks.
Article
Full-text available
This paper introduces stacked generalization, a scheme for minimizing the generalization error rate of one or more generalizers. Stacked generalization works by deducing the biases of the generalizer(s) with respect to a provided learning set. This deduction proceeds by generalizing in a second space whose inputs are (for example) the guesses of the original generalizers when taught with part of the learning set and trying to guess the rest of it, and whose output is (for example) the correct guess. When used with multiple generalizers, stacked generalization can be seen as a more sophisticated version of cross-validation, exploiting a strategy more sophisticated than cross-validation's crude winner-takes-all for combining the individual generalizers. When used with a single generalizer, stacked generalization is a scheme for estimating (and then correcting for) the error of a generalizer which has been trained on a particular learning set and then asked a particular question. After introducing stacked generalization and justifying its use, this paper presents two numerical experiments. The first demonstrates how stacked generalization improves upon a set of separate generalizers for the NETtalk task of translating text to phonemes. The second demonstrates how stacked generalization improves the performance of a single surface-fitter. With the other experimental evidence in the literature, the usual arguments supporting cross-validation, and the abstract justifications presented in this paper, the conclusion is that for almost any real-world generalization problem one should use some version of stacked generalization to minimize the generalization error rate. This paper ends by discussing some of the variations of stacked generalization, and how it touches on other fields like chaos theory.
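The scheme described above can be illustrated with a minimal sketch. The data, the two level-0 generalizers (a least-squares line and a mean predictor) and the single-weight level-1 combiner are all invented for illustration; Wolpert's formulation is far more general.

```python
# Minimal sketch of stacked generalization on hypothetical 1-D regression data.
# Level-0: a least-squares line and a mean predictor. Their out-of-fold guesses
# train a level-1 combiner, here a single mixing weight w chosen in closed form
# to minimise the squared error of w*p1 + (1-w)*p2.

def fit_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return lambda x: my + b * (x - mx)

def out_of_fold(fit, xs, ys, k=2):
    # predictions on each point from a model trained on the other folds
    preds = [None] * len(xs)
    for f in range(k):
        tr = [i for i in range(len(xs)) if i % k != f]
        g = fit([xs[i] for i in tr], [ys[i] for i in tr])
        for i in range(len(xs)):
            if i % k == f:
                preds[i] = g(xs[i])
    return preds

def stack(xs, ys):
    p1 = out_of_fold(fit_line, xs, ys)
    p2 = out_of_fold(fit_mean, xs, ys)
    num = sum((y - b) * (a - b) for y, a, b in zip(ys, p1, p2))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2)) or 1.0
    w = num / den                      # level-1 "generalizer"
    g1, g2 = fit_line(xs, ys), fit_mean(xs, ys)
    return lambda x: w * g1(x) + (1 - w) * g2(x)

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 1.9, 4.1, 5.9, 8.1, 9.9]   # roughly y = 2x
model = stack(xs, ys)
print(round(model(6.0), 1))
```

On near-linear data the combiner learns a weight close to 1 for the line, so the stack behaves almost like the better level-0 generalizer, which is the intended effect of deducing and correcting level-0 biases.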
Conference Paper
Full-text available
Meta-learning, as applied to model selection, consists of inducing mappings from tasks to learners. Traditionally, tasks are characterised by the values of pre-computed meta-attributes, such as statistical and information-theoretic measures, induced decision trees' characteristics and/or landmarkers' performances. In this position paper, we propose to (meta-)learn directly from induced decision trees, rather than rely on a hand-crafted set of pre-computed characteristics. Such meta-learning is possible within the framework of the typed higher-order inductive learning framework we have developed.
Conference Paper
Full-text available
Landmarking is a novel approach to describing tasks in meta-learning. Previous approaches to meta-learning mostly considered only statistics-inspired measures of the data as a source for the definition of meta-attributes. Contrary to such approaches, landmarking tries to determine the location of a specific learning problem in the space of all learning problems by directly measuring the performance of some simple and efficient learning algorithms themselves. In the experiments reported we show how such a use of landmark values can help to distinguish between areas of the learning space favouring different learners. Experiments, both with artificial and real-world databases, show that landmarking selects, with moderate but reasonable level of success, the best performing of a set of learning algorithms.
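The landmarking idea can be sketched in a few lines: describe a task not by statistical measures but by the accuracy of cheap learners run on it. The two landmarkers (majority class and leave-one-out 1-NN) and the toy dataset below are illustrative choices, not the ones used in the paper.

```python
# Hedged sketch of landmarking: characterise a task by the accuracies of two
# cheap "landmark" learners instead of statistics-inspired meta-attributes.

from collections import Counter

def majority_landmark(X, y):
    # accuracy of always predicting the most frequent class
    most = Counter(y).most_common(1)[0][0]
    return sum(1 for t in y if t == most) / len(y)

def one_nn_landmark(X, y):
    # leave-one-out accuracy of 1-nearest-neighbour (squared Euclidean distance)
    correct = 0
    for i, xi in enumerate(X):
        j = min((k for k in range(len(X)) if k != i),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(xi, X[k])))
        correct += y[j] == y[i]
    return correct / len(y)

# toy task: two well-separated clusters
X = [(0, 0), (0, 1), (1, 0), (1, 1), (3, 3), (3, 4), (4, 3), (4, 4)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
meta_features = (majority_landmark(X, y), one_nn_landmark(X, y))
print(meta_features)
```

A high 1-NN landmark paired with a chance-level majority landmark locates this task in a region of the learning space favouring instance-based learners, which is exactly the kind of discrimination landmark values are meant to provide.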
Conference Paper
Full-text available
Common inductive learning strategies offer the tools for knowledge acquisition, but possess some inherent limitations due to the use of a fixed bias during the learning process. To overcome the limitations of such base-learning approaches, a novel research trend explores the potentialities of meta-learning, oriented to the development of mechanisms based on a dynamical search of bias. This could lead to an improvement of the base-learner performance on specific learning tasks, by profiting from the accumulated past experience. Just as a significant set of I/O data is needed for efficient base-learning, appropriate meta-data characterization is of crucial importance for useful meta-learning. In order to characterize meta-data, a collection of meta-features discriminating among different base-level tasks should first be identified. This paper focuses on the characterization of meta-data, through an analysis of meta-features that can capture the properties of specific tasks to be solved at base level. This kind of approach represents a first step toward the development of a meta-learning system, capable of suggesting the proper bias for base-learning different specific task domains.
Conference Paper
Full-text available
Given the wide variety of available classification algorithms and the volume of data today's organizations need to analyze, the selection of the right algorithm to use on a new problem is an important issue. In this paper we present a combination of techniques to address this problem. The first one, zooming, analyzes a given dataset and selects relevant (similar) datasets that were processed by the candidate algorithms in the past. This process is based on the concept of "distance", calculated on the basis of several dataset characteristics. The information about the performance of the candidate algorithms on the selected datasets is then processed by a second technique, a ranking method. Such a method uses performance information to generate advice in the form of a ranking, indicating which algorithms should be applied in which order. Here we propose the adjusted ratio of ratios ranking method. This method takes into account not only accuracy but also the time performance of the candidate algorithms. The generalization power of this ranking method is analyzed. For this purpose, an appropriate methodology is defined. The experimental results indicate that on average better results are obtained with zooming than without it.
Article
Full-text available
We present here an original work that applies meta-learning approaches to select models for time-series forecasting. In our work, we investigated two meta-learning approaches, each one used in a different case study. Initially, we used a single machine learning algorithm to select among two models to forecast stationary time series (case study I). Following, we used the NOEMON approach, a more recent work in the meta-learning area, to rank three models used to forecast time series of the M3-Competition (case study II). The experiments performed in both case studies revealed encouraging results.
Article
Full-text available
Feedforward neural networks trained by error backpropagation are examples of nonparametric regression estimators. We present a tutorial on nonparametric inference and its relation to neural networks, and we use the statistical viewpoint to highlight strengths and weaknesses of neural models. We illustrate the main points with some recognition experiments involving artificial data as well as handwritten numerals. In way of conclusion, we suggest that current-generation feedforward neural networks are largely inadequate for difficult problems in machine perception and machine learning, regardless of parallel-versus-serial hardware or other implementation issues. Furthermore, we suggest that the fundamental challenges in neural modeling are about representation rather than learning per se. This last point is supported by additional experiments with handwritten numerals.
Article
Full-text available
We present a meta-learning method to support selection of candidate learning algorithms. It uses a k-Nearest Neighbor algorithm to identify the datasets that are most similar to the one at hand. The distance between datasets is assessed using a relatively small set of data characteristics, which was selected to represent properties that affect algorithm performance. The performance of the candidate algorithms on those datasets is used to generate a recommendation to the user in the form of a ranking. The performance is assessed using a multicriteria evaluation measure that takes not only accuracy, but also time into account. As it is not common in Machine Learning to work with rankings, we had to identify and adapt existing statistical techniques to devise an appropriate evaluation methodology. Using that methodology, we show that the meta-learning method presented leads to significantly better rankings than the baseline ranking method. The evaluation methodology is general and can be adapted to other ranking problems. Although here we have concentrated on ranking classification algorithms, the meta-learning framework presented can provide assistance in the selection of combinations of methods or more complex problem solving strategies.
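The core of the method described above can be sketched concisely: find the k past datasets whose meta-feature vectors are closest to the new one, then merge their observed algorithm rankings into a recommendation. All meta-data below (dataset names, meta-feature vectors, algorithm rankings) are invented for illustration, and plain Euclidean distance plus average-rank aggregation stand in for the paper's more careful distance and multicriteria measures.

```python
# Illustrative k-NN meta-learning sketch: past datasets are described by a
# small meta-feature vector (e.g. class entropy, feature correlation, size)
# and an observed ranking of candidate algorithms, best first.

past = {
    "d1": ((0.2, 0.8, 100), ["c4.5", "knn", "nb"]),
    "d2": ((0.3, 0.7, 120), ["c4.5", "nb", "knn"]),
    "d3": ((0.9, 0.1, 5000), ["knn", "nb", "c4.5"]),
}

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def recommend(meta, k=2):
    # k most similar past datasets vote by averaging each algorithm's rank
    nearest = sorted(past, key=lambda d: dist(past[d][0], meta))[:k]
    algos = past[nearest[0]][1]
    avg = {a: sum(past[d][1].index(a) for d in nearest) / k for a in algos}
    return sorted(algos, key=avg.get)

print(recommend((0.25, 0.75, 110)))
```

A new dataset resembling d1 and d2 inherits their shared preference for c4.5; in the actual method the performance information would also fold in training time through a multicriteria evaluation measure.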
Article
We prove a lower bound of Ω((1/ɛ)ln(1/δ)+VCdim(C)/ɛ) on the number of random examples required for distribution-free learning of a concept class C, where VCdim(C) is the Vapnik-Chervonenkis dimension and ɛ and δ are the accuracy and confidence parameters. This improves the previous best lower bound of Ω((1/ɛ)ln(1/δ)+VCdim(C)) and comes close to the known general upper bound of O((1/ɛ)ln(1/δ)+(VCdim(C)/ɛ)ln(1/ɛ)) for consistent algorithms. We show that for many interesting concept classes, including kCNF and kDNF, our bound is actually tight to within a constant factor.
Book
The volume is the first one in the world literature which is a comprehensive, up-to-date account of computing with words, a new direction in broadly perceived intelligent systems, proposed and advocated by Professor Zadeh, the founder of fuzzy sets theory and fuzzy logic. Historically, computing was focused on the manipulation of numbers. However, in recent years it became more and more obvious that computing had been encompassing not only this, but also the manipulation of signals, sounds, images and text; the latter aspects of computing are becoming more and more relevant. However, the traditional manipulation of text (e.g., machine translation, spell checking, etc.) does not cover the mainstream of computing with words, mainly the representation and manipulation of propositions from natural language describing operations and relations. Such problems cannot be solved by conventional methods of logic, probability theory and numerical analysis. Fuzzy logic is shown to be an effective tool to handle such problems. Computing with words may form a basis of a computational theory of perceptions inspired by a remarkable human ability to perform a wide variety of tasks just on the basis of vague and imprecise information expressed in natural language. In Part 1, foundations of computing with words related to linguistic aspects, fuzzy logic and approximate reasoning, granularity, calculations on verbal quantities, and general architectures for the implementation of computing with words are presented.
Book
The volume is the first one in the world literature which is a comprehensive, up-to-date account of computing with words, a new direction in broadly perceived intelligent systems, proposed and advocated by Professor Zadeh, the founder of fuzzy sets theory and fuzzy logic. Historically, computing was focused on the manipulation of numbers. However, in recent years it became more and more obvious that computing had been encompassing not only this, but also the manipulation of signals, sounds, images and text; the latter aspects of computing are becoming more and more relevant. However, the traditional manipulation of text (e.g., machine translation, spell checking, etc.) does not cover the mainstream of computing with words, mainly the representation and manipulation of propositions from natural language describing operations and relations. Such problems cannot be solved by conventional methods of logic, probability theory and numerical analysis. Fuzzy logic is shown to be an effective tool to handle such problems. Computing with words may form a basis of a computational theory of perceptions inspired by a remarkable human ability to perform a wide variety of tasks just on the basis of vague and imprecise information expressed in natural language. In Part 2, applications in a wide array of fields are presented which use the paradigm of computing with words, exemplified by reasoning, data analysis, data mining, machine learning, risk analyses, reliability and quality control, decision making, optimization and control, databases, medical diagnosis, business analyses, traffic management, power system planning, military applications, etc.
Chapter
This paper explores the proposition that inductive learning from examples is fundamentally limited to learning only a small fraction of the total space of possible hypotheses. We begin by defining the notion of an algorithm reliably learning a good approximation to a concept C. An empirical study of three algorithms (the classical algorithm for maximally specific conjunctive generalizations, ID3, and back-propagation for feed-forward networks of logistic units) demonstrates that each of these algorithms performs very poorly for the task of learning concepts defined over the space of Boolean feature vectors containing 3 variables. Simple counting arguments allow us to prove an upper bound on the maximum number of concepts reliably learnable from m training examples.
Conference Paper
Machine learning has not yet succeeded in the design of robust learning algorithms that generalize well from very small datasets. In contrast, humans often generalize correctly from only a single training example, even if the number of potentially relevant features is large. To do so, they successfully exploit knowledge acquired in previous learning tasks, to bias subsequent learning. This paper investigates learning in a lifelong context. In contrast to most machine learning approaches, which aim at learning a single function in isolation, lifelong learning addresses situations where a learner faces a stream of learning tasks. Such scenarios provide the opportunity for synergetic effects that arise if knowledge is transferred across multiple learning tasks. To study the utility of transfer, several approaches to lifelong learning are proposed and evaluated in an object recognition domain. It is shown that all these algorithms generalize consistently more accurately from scarce training data than comparable “single-task” approaches.
Article
There is no free lunch, no single learning algorithm that will outperform other algorithms on all data. In practice different approaches are tried and the best algorithm selected. An alternative solution is to build new algorithms on demand by creating a framework that accommodates many algorithms. The best combination of parameters and procedures is searched here in the space of all possible models belonging to the framework of Similarity-Based Methods (SBMs). Such a meta-learning approach gives a chance to find the best method in all cases. Issues related to meta-learning and first tests of this approach are presented.
Article
The volume is the first one in the world literature which is a comprehensive, up-to-date account of computing with words, a new direction in broadly perceived intelligent systems, proposed and advocated by Professor Zadeh, the founder of fuzzy sets theory and fuzzy logic. Computing with words may form a basis of a computational theory of perception inspired by a remarkable human ability to perform a wide variety of tasks on the basis of vague and imprecise information expressed in natural language. In Part 2, applications in a wide array of fields are presented which use the paradigm of computing with words, exemplified by reasoning, data analysis, data mining, machine learning, risk analyses, reliability and quality control, decision making, optimization and control, databases, medical diagnosis, business analyses, traffic management, power system planning, military applications, etc.
Article
Learning is regarded as the phenomenon of knowledge acquisition in the absence of explicit programming. A precise methodology is given for studying this phenomenon from a computational viewpoint. It consists of choosing an appropriate information gathering mechanism, the learning protocol, and exploring the class of concepts that can be learned using it in a reasonable (polynomial) number of steps. Although inherent algorithmic complexity appears to set serious limits to the range of concepts that can be learned, it is shown that there are some important nontrivial classes of propositional concepts that can be learned in a realistic sense.
Article
This chapter reproduces the English translation by B. Seckler of the paper by Vapnik and Chervonenkis in which they gave proofs for the innovative results they had obtained in a draft form in July 1966 and announced in 1968 in their note in Soviet Mathematics Doklady. The paper was first published in Russian as Vapnik, V. N. and Chervonenkis, A. Ya., "On the uniform convergence of relative frequencies of events to their probabilities", Teoriya Veroyatnostei i ee Primeneniya 16(2), 264–279 (1971).
Article
In this paper, we describe a general approach to scaling data mining applications that we have come to call meta-learning. Meta-learning refers to a general strategy that seeks to learn how to combine a number of separate learning processes in an intelligent fashion. We desire a meta-learning architecture that exhibits two key behaviors. First, the meta-learning strategy must produce an accurate final classification system. This means that a meta-learning architecture must produce a final outcome that is at least as accurate as a conventional learning algorithm applied to all available data. Second, it must be fast, relative to an individual sequential learning algorithm when applied to massive databases of examples, and operate in a reasonable amount of time. This paper focuses primarily on issues related to the accuracy and efficacy of meta-learning as a general strategy. A number of empirical results are presented demonstrating that meta-learning is technically feasible in wide-area, network computing environments.
Article
Common inductive learning strategies offer tools for knowledge acquisition, but possess some inherent limitations due to the use of a fixed bias during the learning process. To overcome the limitations of such base-learning approaches, a research trend explores the potentialities of meta-learning, which is oriented to the development of mechanisms based on a dynamical search of bias. This may lead to an improvement of the base-learner performance on specific learning tasks, by profiting from the accumulated past experience. In this paper, we present a meta-learning framework called Mindful (Meta INDuctive neuro-FUzzy Learning) which is founded on the integration of connectionist paradigms and fuzzy knowledge management. Due to its peculiar organisation, Mindful can be exploited on different levels of application, being able to accumulate learning experience in cross-task contexts. This specific knowledge is gathered during the meta-learning activity and is exploited to suggest parametrisations for future base-learning tasks. The evaluation of the Mindful system is detailed through an ensemble of experimental sessions involving both synthetic domains and real-world data.
Article
We present here an original work that applies meta-learning approaches to select models for time-series forecasting. In our work, we investigated two meta-learning approaches, each one used in a different case study. Initially, we used a single machine learning algorithm to select among two models to forecast stationary time series (case study I). Following, we used the NOEMON approach, a more recent work in the meta-learning area, to rank three models used to forecast time series of the M3-Competition (case study II). The experiments performed in both case studies revealed encouraging results.
Article
In this paper, we present the meta-learning evolutionary artificial neural network (MLEANN), an automatic computational framework for the adaptive optimization of artificial neural networks (ANNs) wherein the neural network architecture, activation function, connection weights, learning algorithm and its parameters are adapted according to the problem. We explored the performance of MLEANN and conventionally designed ANNs for function approximation problems. To evaluate the comparative performance, we used three different well-known chaotic time series. We also present the state-of-the-art popular neural network learning algorithms and some experimentation results related to convergence speed and generalization performance. We explored the performance of the backpropagation algorithm, conjugate gradient algorithm, quasi-Newton algorithm and Levenberg–Marquardt algorithm for the three chaotic time series. Performances of the different learning algorithms were evaluated when the activation functions and architecture were changed. We further present the theoretical background, algorithm and design strategy, and demonstrate how effective the proposed MLEANN framework is for designing a neural network that is smaller, faster and has better generalization performance.
Article
A fuzzy set is a class of objects with a continuum of grades of membership. Such a set is characterized by a membership (characteristic) function which assigns to each object a grade of membership ranging between zero and one. The notions of inclusion, union, intersection, complement, relation, convexity, etc., are extended to such sets, and various properties of these notions in the context of fuzzy sets are established. In particular, a separation theorem for convex fuzzy sets is proved without requiring that the fuzzy sets be disjoint.
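The definitions above translate directly into code: a fuzzy set is just a membership function into [0, 1], and union, intersection and complement are pointwise max, min and 1 − membership. The "tall" set below is an invented example, not one from the paper.

```python
# Worked sketch of Zadeh's basic fuzzy-set operations.

def tall(height_cm):
    # membership rises linearly from 0 at 160 cm to 1 at 190 cm (invented shape)
    return min(1.0, max(0.0, (height_cm - 160) / 30))

def union(f, g):          # grade of membership in "f or g"
    return lambda x: max(f(x), g(x))

def intersection(f, g):   # grade of membership in "f and g"
    return lambda x: min(f(x), g(x))

def complement(f):        # grade of membership in "not f"
    return lambda x: 1.0 - f(x)

short = complement(tall)
print(tall(175), short(175))           # a 175 cm person is 0.5 tall, 0.5 short
print(intersection(tall, short)(175))  # nonzero: fuzzy sets need not be disjoint
```

The last line illustrates why the separation theorem mentioned in the abstract cannot require disjointness: unlike crisp sets, a fuzzy set and its complement generally overlap.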
Article
We prove a lower bound of Ω((1/ɛ)ln(1/δ)+VCdim(C)/ɛ) on the number of random examples required for distribution-free learning of a concept class C, where VCdim(C) is the Vapnik-Chervonenkis dimension and ɛ and δ are the accuracy and confidence parameters. This improves the previous best lower bound of Ω((1/ɛ)ln(1/δ)+VCdim(C)) and comes close to the known general upper bound of O((1/ɛ)ln(1/δ)+(VCdim(C)/ɛ)ln(1/ɛ)) for consistent algorithms. We show that for many interesting concept classes, including kCNF and kDNF, our bound is actually tight to within a constant factor.
Article
Designing artificial neural networks (ANNs) for different applications has been a key issue in the ANN field. At present, ANN design still relies heavily on human experts who have sufficient knowledge about ANNs and the problem to be solved. As ANN complexity increases, designing ANNs manually becomes more difficult and unmanageable. Simulated evolution offers a promising approach to tackle this problem. This paper describes an evolutionary approach to design ANNs. The ANNs designed by the evolutionary process are referred to as evolutionary ANNs (EANNs). They represent a special class of ANNs in which evolution is another fundamental form of adaptation in addition to learning (also known as weight training). This paper describes an evolutionary programming (EP) based system to evolve both architectures and connection weights (including biases) of ANNs. Five mutation operators have been proposed in our evolutionary algorithm. In order to improve the generalisation ability of evolved ANNs, these five operators are applied sequentially and selectively. Validation sets have also been used in the evolutionary process in order to improve generalisation further. The evolutionary algorithm allows ANNs to grow as well as shrink during the evolutionary process. It incorporates the weight learning process as part of its mutation process. The whole EANN system can be regarded as a hybrid evolution and learning system. Extensive experimental studies have been carried out to test this EANN system. This paper gives some of the experimental results which show the effectiveness of the system.
Conference Paper
Thomas G. Dietterich, Department of Computer Science, Oregon State University, Corvallis, OR 97331. This paper explores the proposition that inductive learning from examples is fundamentally limited to learning only a small fraction of the total space of possible hypotheses. We begin by defining the notion of an algorithm reliably learning a good approximation to a concept C. An empirical study of three algorithms (the classical algorithm for maximally specific conjunctive generalizations, ID3, and back-propagation for feed-forward networks of logistic units) demonstrates that each of these algorithms performs very poorly for the task of learning concepts defined over the space of Boolean feature vectors containing 3 variables. Simple counting arguments allow us to prove an upper bound on the maximum number of concepts reliably learnable from m training examples.
Conference Paper
Many aspects of concept learning research can be understood more clearly in light of a basic mathematical result stating, essentially, that positive performance in some learning situations must be offset by an equal degree of negative performance in others. We present a proof of this result and comment on some of its theoretical and practical ramifications.
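The conservation result above can be checked numerically on a toy domain. The setup below (2-bit inputs, a majority-vote "learner") is invented for illustration: averaged over all boolean target concepts, any fixed prediction on an off-training-set point is correct exactly half the time.

```python
# Toy demonstration of the conservation law for generalization performance:
# enumerate every boolean concept on 2-bit inputs, train on three points,
# and measure accuracy on the held-out fourth point.
from itertools import product

points = list(product([0, 1], repeat=2))   # the 4 possible inputs
train, test = points[:3], points[3]

hits = total = 0
for labels in product([0, 1], repeat=4):   # all 16 target concepts
    target = dict(zip(points, labels))
    # an arbitrary learner: predict the majority label seen in training
    pred = 1 if sum(target[p] for p in train) >= 2 else 0
    hits += pred == target[test]
    total += 1
print(hits, total)
```

The learner scores 8 out of 16: for every assignment of training labels, the unseen point's label is unconstrained, so gains on some targets are offset exactly by losses on others, regardless of which learner is plugged in.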
Conference Paper
Boosting is a general method for improving the accuracy of any given learning algorithm. This short paper introduces the boosting algorithm AdaBoost, and explains the underlying theory of boosting, including an explanation of why boosting often does not suffer from overfitting. Some examples of recent applications of boosting are also described.
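The AdaBoost loop can be sketched compactly on 1-D data with threshold stumps as the weak learner. The dataset is invented and interval-shaped, so no single stump separates it, but the reweighting scheme drives the ensemble to zero training error in a few rounds.

```python
# Minimal AdaBoost sketch: labels in {-1, +1}, weak learners are 1-D stumps.
import math

X = [0, 1, 2, 3, 4, 5, 6, 7]
y = [1, 1, 1, -1, -1, -1, 1, 1]   # not separable by any single threshold

def stumps():
    # all threshold classifiers: predict s left of the threshold, -s right
    for t in range(len(X) + 1):
        for s in (1, -1):
            yield lambda x, t=t, s=s: s if x < t - 0.5 else -s

def weighted_error(h, w):
    return sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)

def adaboost(max_rounds=10):
    w = [1.0 / len(X)] * len(X)        # uniform example weights
    ensemble = []
    def predict(x):
        return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
    for _ in range(max_rounds):
        h = min(stumps(), key=lambda h: weighted_error(h, w))
        err = weighted_error(h, w)
        if err >= 0.5:                 # no weak learner beats chance
            break
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-12))
        ensemble.append((alpha, h))
        # up-weight the examples this round's stump got wrong
        w = [wi * math.exp(-alpha * yi * h(xi)) for wi, xi, yi in zip(w, X, y)]
        z = sum(w)
        w = [wi / z for wi in w]
        if all(predict(xi) == yi for xi, yi in zip(X, y)):
            break                      # training error already zero
    return predict

clf = adaboost()
print(sum(clf(xi) == yi for xi, yi in zip(X, y)), "of", len(X), "correct")
```

Each round the misclassified points gain weight, forcing the next stump to focus on them; the weighted vote of three stumps already fits this interval concept exactly, which is the mechanism behind boosting's accuracy gains.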
Book
The book grew out of the European StatLog project, in which a team of 6 University and 6 Industrial research groups performed a controlled evaluation of a score of procedures on as many example datasets "to determine to what extent the various techniques met the needs of industry" (p. 4). By highlighting strengths and weaknesses of popular approaches on realistic problems, the project also hoped to contribute to their improvement. Procedures were drawn from the three fields of the book title and their models trained on most of each data set, then tested on the remaining, unseen data. (All fine-tuning of the procedures was to have relied solely on the training sets, with the test data secreted in a separate site, though one lapse of these controls was noted.) The evaluation results were summarized from several angles and then, interestingly, were themselves made a dataset for the purpose of what could be called meta-modeling. That is, a rule-based model (an Application Assistant) was trained to forecast, from summary features of a dataset, the most promising method(s) for that problem. That program and a utility facilitating comparison studies (an Evaluation Assistant) were deposited in the public domain. All of the datasets, and almost all of the classification algorithms, are similarly available, with pointers provided in the Appendices. Though the book has over a dozen contributors (with 60 mentioned as part of the StatLog project), the editors apparently required several iterations and revisions to avoid duplication and make it more unified. This effort was largely successful (with exceptions noted below) and greatly enhances its readability. The book should prove to be of strong interest to researchers -- for whom it will no doubt spark a number of studies extending its results -- as well as to practitioners, who will welcome its practical tone, and find much in it of immediate utility.
Article
Recent years have seen a resurgence of interest in the use of metacognition in intelligent systems. This article is part of a small section meant to give interested researchers an overview and sampling of the kinds of work currently being pursued in this broad area. The current article offers a review of recent research in two main topic areas: the monitoring and control of reasoning (metareasoning) and the monitoring and control of learning (metalearning). Copyright © 2007, Association for the Advancement of Artificial Intelligence. All rights reserved.
Article
this article is to present the major results in each of these four directions. We begin with a discussion of the philosophical foundations, since these will provide a framework for the remainder of the article. This is followed by sections that describe (a) theoretical results, (b)
Article
Interest in the study of neural networks has grown remarkably in the last several years. This effort has been characterized in a variety of ways: as the study of brain-style computation, connectionist architectures, parallel distributed-processing systems, neuromorphic computation, artificial neural systems. The common theme to these efforts has been an interest in looking at the brain as a model of a parallel computational device very different from that of a traditional serial computer.