Article

An Inductive Inference Bibliography

Article
Full-text available
Publications that have influenced the growth of artificial intelligence are often difficult to obtain. We first collected titles of several thousand publications from many well-known sources and then selected about 2,000 titles considered to be especially influential. We have identified, and in a few cases created, online versions of about half of these "classics in AI." Searchable text of the documents enables additional analysis of trends and influences. Integration into the rest of the AITopics information portal contextualizes the classic publications. Copyright © 2013, Association for the Advancement of Artificial Intelligence.
Article
Recent results in the theory of inductive inference are summarized. They concern deciphering of automata, language identification, prediction of functions, inference with additional information, strategies, functionals, index sets, characterization of identification types, uniform inference, and inference of nonrandom sequences. For proofs and further results in the field of inductive inference due to mathematicians of the German Democratic Republic a detailed bibliography is included.
Conference Paper
An algorithm is presented that is capable of performing inductive inference in finite algebraic structures. Given a set of functions defined by a partial list of their values, the task of the algorithm is to hypothesize a “reasonable model” and to infer from it the “missing” function values. The model is constructed stepwise in evolutionary fashion. A new method is developed in which the consistency of the evolving model is guaranteed at all times through the use of a special “normal representation” for the model. The algorithm also takes into account the notion of evidence by introducing a special “evidence measure”. Although consistent with the original function values, a model with insufficient “evidence measure” is rejected. An implementation of the algorithm for the special case of a single binary function was constructed.
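A minimal sketch of the kind of consistency-driven search this abstract describes, under strong simplifying assumptions: the structure is a small partial binary operation table, the only structural requirement imposed is associativity, and the "evidence" score is a toy stand-in for the paper's evidence measure. The carrier set and known values are invented for illustration.

```python
from itertools import product

# Toy instance (hypothetical): a 3-element carrier and a partially observed
# binary operation, given as {(a, b): value}.
CARRIER = [0, 1, 2]
KNOWN = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (2, 2): 1}

def is_associative(op):
    """Check (a*b)*c == a*(b*c) for all triples over the carrier."""
    return all(op[(op[(a, b)], c)] == op[(a, op[(b, c)])]
               for a in CARRIER for b in CARRIER for c in CARRIER)

def consistent_models(known):
    """Enumerate total completions of the partial table, keep the associative
    ones, and attach a crude 'evidence' score: the fraction of the table that
    was actually observed rather than guessed."""
    missing = [(a, b) for a in CARRIER for b in CARRIER if (a, b) not in known]
    evidence = len(known) / len(CARRIER) ** 2
    models = []
    for values in product(CARRIER, repeat=len(missing)):
        op = dict(known)
        op.update(zip(missing, values))
        if is_associative(op):
            models.append((evidence, op))
    return models

if __name__ == "__main__":
    models = consistent_models(KNOWN)
    print(len(models), "consistent models; e.g.:", sorted(models[0][1].items()))
```

Addition modulo 3 is one completion consistent with the sample above, so the search returns at least one model; a real implementation would build the model stepwise rather than enumerate completions wholesale.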
Article
We survey methods for learning context-free languages (CFL's) in the theoretical computer science literature. We first present some important negative results. Then, we consider five types of methods: those that take text as input, those that take structural information as input, those that rely on CFL formalisms that are not based on context-free grammars, those which learn subclasses of CFL's, and stochastic methods. A description of the subclasses of CFL's considered is provided, as is an extensive bibliography.

1 Introduction

One may arrive at a grammar by intuition, guess-work, all sorts of partial methodological hints, reliance on past experience, etc. It is no doubt possible to give an organized account of many useful procedures of analysis, but it is questionable whether these can be formulated rigorously, exhaustively and simply enough to qualify as a practical and mechanical discovery algorithm [for grammars]. [Cho57] The problem of grammatical inference is, i...
Article
Full-text available
Published as: K. M. Podnieks. Probabilistic program synthesis. In: Theory of Algorithms and Programs, Vol. 3, Latvia State University, 1977, pp. 57–88 (in Russian).

The following model of inductive inference is considered. An arbitrary numbering tau = {tau-0, tau-1, tau-2, ...} of total functions N->N is fixed. A "black box" outputs the values f(0), f(1), ..., f(m), ... of some function f from the numbering tau. Processing these values by some algorithm (a strategy) F, we try to identify a tau-index of f (i.e. a number n such that f = tau-n). Strategy F outputs an infinite sequence of hypotheses h-0, h-1, ..., h-m, .... If lim h-m = n and tau-n = f, we say that F identifies in the limit a tau-index of f. The complexity of identification is measured by the number of mind changes, i.e. by F-tau(f) = card{m | h-m <> h-(m+1)}. One can verify easily that for any numbering tau there exists a deterministic strategy F such that F-tau(tau-n) <= n for all n. This estimate is exact [Ba 74]. In the current paper the corresponding exact estimate ln n + o(log n) is proved for probabilistic strategies.

English translation: K. Podnieks. INDUCTIVE INFERENCE OF FUNCTIONS BY PROBABILISTIC STRATEGIES, 1992.
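The deterministic baseline mentioned in this abstract (the F-tau(tau-n) <= n bound) is achieved by identification by enumeration: always conjecture the least index consistent with the data seen so far. A minimal sketch follows, with a small finite list of Python functions standing in for the numbering tau; the paper's probabilistic ln n + o(log n) strategy is not reproduced here.

```python
# Identification by enumeration: a toy numbering tau of total functions,
# here just a finite list standing in for an effective numbering.
TAU = [
    lambda x: 0,          # tau_0
    lambda x: x,          # tau_1
    lambda x: x * x,      # tau_2
    lambda x: x + 1,      # tau_3
]

def identify(f, steps=20):
    """Feed f(0), f(1), ... to the enumeration strategy and return the list
    of hypotheses h_0, h_1, ... (tau-indices).  The number of mind changes is
    at most the least correct index, matching the F_tau(tau_n) <= n bound."""
    hypotheses = []
    for m in range(steps):
        data = [(x, f(x)) for x in range(m + 1)]
        # least index consistent with everything seen so far
        h = next(i for i, g in enumerate(TAU) if all(g(x) == y for x, y in data))
        hypotheses.append(h)
    return hypotheses

if __name__ == "__main__":
    hyps = identify(TAU[2])
    mind_changes = sum(1 for a, b in zip(hyps, hyps[1:]) if a != b)
    print("hypotheses:", hyps)
    print("mind changes:", mind_changes)   # at most 2 for tau_2
```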
Article
Full-text available
Original title: K. M. Podnieks. Comparison of various types of limiting synthesis and prediction of functions (part two). Proceedings of the Latvia State University, 1974, Vol. 233, pp. 35–44 (in Russian). Prediction: f(m+1) is guessed from given f(0), ..., f(m). Program synthesis: a program computing f is guessed from given f(0), ..., f(m). The hypotheses are required to be correct for all sufficiently large m, or with some positive frequency. These approaches yield a hierarchy of function prediction and program synthesis concepts. The problem of comparing these concepts is solved.
Article
Full-text available
Original title: K. M. Podnieks. Comparison of various types of limiting synthesis and prediction of functions. Proceedings of the Latvia State University, 1974, Vol. 210, pp. 68–81 (in Russian). Prediction: f(m+1) is guessed from given f(0), ..., f(m). Program synthesis: a program computing f is guessed from given f(0), ..., f(m). The hypotheses are required to be correct for all sufficiently large m, or with some positive frequency. These approaches yield a hierarchy of function prediction and program synthesis concepts. The problem of comparing these concepts is solved.
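To make the relation between the two settings concrete, here is a small sketch (again over a toy finite numbering, not the constructions compared in the paper): a synthesis strategy is turned into a prediction strategy by predicting f(m+1) with the currently conjectured program.

```python
# Turning a program-synthesis strategy into a prediction strategy: predict
# f(m+1) by running the currently conjectured program.  TAU is a toy stand-in
# for an effective numbering of total functions.
TAU = [lambda x: 1, lambda x: x % 2, lambda x: 2 * x]

def synthesize(data):
    """Least tau-index consistent with the observed pairs (x, f(x))."""
    return next(i for i, g in enumerate(TAU) if all(g(x) == y for x, y in data))

def predict(f, steps=10):
    """Predict f(m+1) from f(0), ..., f(m) and count prediction errors."""
    errors = 0
    for m in range(steps):
        data = [(x, f(x)) for x in range(m + 1)]
        guess = TAU[synthesize(data)](m + 1)
        if guess != f(m + 1):
            errors += 1
    return errors

if __name__ == "__main__":
    print("prediction errors on tau_2:", predict(TAU[2]))
```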
Chapter
Full-text available
Recently there have been several studies of special classes of identifiable sets of recursive functions. The complexity of such classes is generally characterized in the literature within the theory of complexity classes of recursive functions. With respect to this question, the present paper adds two further viewpoints: (A) the classification of the index sets of identifiable sets of recursive functions within the arithmetical hierarchy, and (B) the classification, within the arithmetical hierarchy of function sets, of the functionals required to identify sets of recursive functions of a special type. In this way the paper contributes to the subject of limiting decision procedures (cf. Gold [2], Barzdin' [1]).
Article
Full-text available
Published as: K. M. Podnieks. Probabilistic synthesis of enumerated classes of functions. Dokl. Akad. Nauk SSSR, 1975, Vol. 223, No. 5, pp. 1071–1074 (in Russian); English translation in: Soviet Math. Dokl., 1975, Vol. 16, No. 4, pp. 1042–1045. Proofs were published as: K. M. Podnieks. Probabilistic synthesis of programs. In: Theory of Algorithms and Programs, Vol. 3, Latvia State University, 1977, pp. 57–88 (in Russian).

The following model of inductive inference is considered. An arbitrary numbering tau = {tau-0, tau-1, tau-2, ...} of total functions N->N is fixed. A "black box" outputs the values f(0), f(1), ..., f(m), ... of some function f from the numbering tau. Processing these values by some algorithm (a strategy) F, we try to identify a tau-index of f (i.e. a number n such that f = tau-n). Strategy F outputs an infinite sequence of hypotheses h-0, h-1, ..., h-m, .... If lim h-m = n and tau-n = f, we say that F identifies in the limit a tau-index of f. The complexity of identification is measured by the number of mind changes, i.e. by F-tau(f) = card{m | h-m <> h-(m+1)}. One can verify easily that for any numbering tau there exists a deterministic strategy F such that F-tau(tau-n) <= n for all n. This estimate is exact [Ba 74]. In the current paper the corresponding exact estimate ln n + o(log n) is obtained for probabilistic strategies.

English translation with proofs: K. Podnieks. INDUCTIVE INFERENCE OF FUNCTIONS BY PROBABILISTIC STRATEGIES, 1992.
Article
Full-text available
Paper published as: K. M. Podnieks. Computational complexity of prediction strategies. In: Theory of Algorithms and Programs, Vol. 3, Latvia State University, 1977, pp. 89–102 (in Russian). The value f(m+1) is predicted from given f(1), ..., f(m). For every enumeration T(n, x) there is a strategy that predicts the n-th function of T making no more than log2(n) errors (Barzdins-Freivalds). It is proved in the paper that such "optimal" strategies require time of order 2^(2^(cm)) to compute the m-th prediction (^ stands for exponentiation).
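The log2(n) bound has the flavor of the classical "halving" argument: predicting by majority vote among the candidates still consistent with the data gives at most log2(N) errors on any function drawn from a pool of N. The sketch below shows only this finite-pool version, with an invented pool of candidate functions; it is not the Barzdins-Freivalds construction over a full enumeration, nor does it address the time bound proved in the paper.

```python
from collections import Counter
from math import log2

# A finite pool of candidate functions standing in for an initial segment of
# an enumeration T(n, x); the target is assumed to lie in the pool.
POOL = [lambda x, k=k: (k >> (x % 4)) & 1 for k in range(16)]

def halving_predict(f, steps=15):
    """Majority-vote ('halving') prediction: each error eliminates at least
    half of the still-consistent candidates, so errors <= log2(len(POOL))."""
    consistent = list(POOL)
    errors = 0
    for m in range(steps):
        votes = Counter(g(m) for g in consistent)
        guess = votes.most_common(1)[0][0]
        truth = f(m)
        if guess != truth:
            errors += 1
        consistent = [g for g in consistent if g(m) == truth]
    return errors

if __name__ == "__main__":
    target = POOL[11]
    print("errors:", halving_predict(target), " bound:", log2(len(POOL)))
```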
Article
Full-text available
The general problem of finding minimal programs realizing given “program descriptions” is considered, where program descriptions may be of finite or infinite length and may specify arbitrary program properties. The problem of finding minimal programs consistent with finite or infinite input-output lists is a special case (for infinite input-output lists, this is a variant of E. M. Gold's function identification problem). Although most program minimization problems are not recursively solvable, they are found to be no more difficult than the problem of deciding whether any given program realizes any given description, or the problem of enumerating programs in order of nondecreasing length (whichever is harder). This result is formulated in terms of k-limiting recursive predicates and functionals, defined by repeated application of Gold's limit operator. A simple consequence is that the program minimization problem is limiting recursively solvable for finite input-output lists and 2-limiting recursively solvable for infinite input-output lists, with weak assumptions about the measure of program size. Gold regarded limiting function identification (more generally, “black box” identification) as a model of inductive thought. Intuitively, iterated limiting identification might be regarded as higher-order inductive inference performed collectively by an ever-growing community of lower order inductive inference machines.
Thesis
This thesis is concerned with algorithms for generating generalisations from experience. These algorithms are viewed as examples of the general concept of a hypothesis discovery system which, in its turn, is placed in a framework in which it is seen as one component in a multi-stage process which includes stages of hypothesis criticism or justification, data gathering and analysis, and prediction. Formal and informal criteria which should be satisfied by the discovered hypotheses are given. In particular, they should explain experience and be simple. The formal work uses the first-order predicate calculus.

These criteria are applied to the case of hypotheses which are generalisations from experience. A formal definition of generalisation from experience, relative to a body of knowledge, is developed and several syntactical simplicity measures are defined. This work uses many concepts taken from resolution theory (Robinson, 1965). We develop a set of formal criteria that must be satisfied by any hypothesis generated by an algorithm for producing generalisations from experience.

The mathematics of generalisation is developed. In particular, in the case when there is no body of knowledge, it is shown that there is always a least general generalisation of any two clauses in the generalisation ordering. (In resolution theory, a clause is an abbreviation for a disjunction of literals.) This least general generalisation is effectively obtainable. Some lattices induced by the generalisation ordering, in the case where there is no body of knowledge, are investigated.

The formal set of criteria is investigated. It is shown that for a certain simplicity measure, and under the assumption that there is no body of knowledge, there always exist hypotheses which satisfy them. Generally, however, there is no algorithm which, given the sentences describing experience, will produce as output a hypothesis satisfying the formal criteria. These results persist for a wide range of other simplicity measures. However, several useful cases for which algorithms are available are described, as are some general properties of the set of hypotheses which satisfy the criteria.

Some connections with philosophy are discussed. It is shown that, with sufficiently large experience, in some cases, any hypothesis which satisfies the formal criteria is acceptable in the sense of Hintikka and Hilpinen (1966). The role of simplicity is further discussed. Some practical difficulties which arise because of Goodman's (1965) "grue" paradox of confirmation theory are presented. A variant of the formal criteria suggested by the work of Meltzer (1970) is discussed. This allows an effective method to be developed when this was not possible before. However, the possibility is countenanced that inconsistent hypotheses might be proposed by the discovery algorithm.

The positive results on the existence of hypotheses satisfying the formal criteria are extended to include some simple types of knowledge. It is shown that they cannot be extended much further without changing the underlying simplicity ordering. A program which implements one of the decidable cases is described. It is used to find definitions in the game of noughts and crosses and in family relationships.

An abstract study is made of the progression of hypothesis discovery methods through time. Some possible and some impossible behaviours of such methods are demonstrated. This work is an extension of that of Gold (1967) and Feldman (1970).
The results are applied to the case of machines that discover generalisations. They are found to be markedly sensitive to the underlying simplicity ordering employed.
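The least general generalisation result is constructive; the sketch below implements its term-level core (anti-unification of two first-order terms, with repeated mismatching subterm pairs mapped to the same variable). It covers terms only, not full clauses or generalisation relative to a body of knowledge.

```python
# Least general generalisation (anti-unification) of two first-order terms.
# A term is a tuple ('f', arg1, ..., argk); anything else is a constant.
# Mismatching subterm pairs are replaced by variables, reusing the same
# variable for repeated occurrences of the same pair.

def lgg(s, t, table=None, counter=None):
    if table is None:
        table, counter = {}, [0]
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        # same function symbol and arity: generalise argument-wise
        return (s[0],) + tuple(lgg(a, b, table, counter)
                               for a, b in zip(s[1:], t[1:]))
    if s == t:
        return s
    if (s, t) not in table:                 # same mismatch -> same variable
        table[(s, t)] = f"X{counter[0]}"
        counter[0] += 1
    return table[(s, t)]

if __name__ == "__main__":
    # p(f(a, a), b) and p(f(c, c), b) generalise to p(f(X0, X0), b)
    t1 = ("p", ("f", "a", "a"), "b")
    t2 = ("p", ("f", "c", "c"), "b")
    print(lgg(t1, t2))
```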
Article
Nature is to us like an infinite ballot box, the contents of which are being continually drawn, ball after ball, and exhibited to us. Science is but the careful observation of the succession in which balls of various character present themselves ([12], p. 150). The project of formulating an account of scientific inference in terms of concepts drawn from probability theory, and based on the programmatic belief that inductive logic (that is, the theory of the principles of inductive reasoning) is the same as probability logic, is one usually associated in our own day with the names of such twentieth-century philosophers of science as John Maynard Keynes, Hans Reichenbach and Rudolf Carnap ([13], [27], [3]).
Conference Paper
Recent results in induction theory are reviewed that demonstrate the general adequacy of the induction system of Solomonoff and Willis. Several problems in pattern recognition and A.I. are investigated through these methods. The theory is used to obtain the a priori probabilities that are necessary in the application of stochastic languages to pattern recognition. A simple, quantitative solution is presented for part of Winston's problem of learning structural descriptions from examples. In contrast to work in non-probabilistic prediction, the present methods give probability values that can be used with decision theory to make critical decisions.
Article
Any computable function φ may be viewed as a “generalization” of a finite function. Specifically, there is a “sample” (finite subset) of φ such that every minimal program for the sample is a program for φ. Like the representation of a function by a program, its representation by a sample is machine dependent. However, relative to any finite number of machines on which φ is programmable, there is a sample of φ which represents φ for each of the machines. On the other hand, given a representative sample of φ, the values of φ for arguments in its domain can only be found in the limit in general. If it is known that some program length b suffices for a function φ, then an upper bound can be effectively inferred from b on the cardinality of any representative sample of φ which does not contain redundant elements. Conversely, a bound on representative sample “size” (interpreted not as cardinality, but as a finite-one function of finite functions) effectively supplies a bound on requisite program length. Apart from these general considerations, certain detailed relationships between representative samples and minimal programs are also developed. For example, it is shown that any decision function with a representative sample of l elements can be programmed with no more than (4 + [log2 m])(l − 1) + c bits, where the largest argument appearing in the representative sample is m bits long. Such bounds can be interpreted as bounds on the information-theoretic complexity of the representative samples (finite functions) concerned.
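As a purely numerical illustration of the quoted bound, the snippet below evaluates (4 + [log2 m])(l − 1) for a few sample values of the sample size l and the argument length m, reading the bracket as a ceiling and leaving the machine-dependent constant c symbolic.

```python
from math import ceil, log2

def length_bound(l, m):
    """Bits needed (up to an additive machine-dependent constant c) for a
    decision function with a representative sample of l elements whose
    largest argument is m bits long: (4 + ceil(log2 m)) * (l - 1)."""
    return (4 + ceil(log2(m))) * (l - 1)

if __name__ == "__main__":
    for l, m in [(10, 8), (100, 8), (100, 64)]:
        print(f"l={l:4d}, m={m:3d} bits  ->  {length_bound(l, m)} bits + c")
```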
Article
Machines would be more useful if they could learn to perform tasks for which they were not given precise methods. Difficulties that attend giving a machine this ability are discussed. It is proposed that the program of a stored-program computer be gradually improved by a learning procedure which tries many programs and chooses, from the instructions that may occupy a given location, the one most often associated with a successful result. An experimental test of this principle is described in detail. Preliminary results, which show limited success, are reported and interpreted. Further results and conclusions will appear in the second part of the paper.
Article
This chapter discusses the applications of variable-valued logic to pattern recognition and machine learning. Most of the work done on multiple-valued logic in relation to computer science is oriented toward providing a theoretical and practical basis for constructing non-binary computer and switching systems. Few attempts have been made to investigate other applications of multiple-valued logic. Among these few, there were attempts to apply multiple-valued logic to programming languages or neural modelling. The chapter presents some results on the application of an extended form of multi-valued logic called variable-valued logic to pattern recognition and artificial intelligence. The concept of a variable-valued logic system (VLS) extends known multiple-valued logic systems (MLS) in two directions. Firstly, it permits the propositions and variables in the propositions to take values from different domains, which can vary in the kind and number of elements and in the structure relating the elements. Secondly, it generalizes some of the traditionally used operators and adds new operators that are “most orthogonal” to the previous ones.
Article
In Part I, four ostensibly different theoretical models of induction are presented, in which the problem dealt with is the extrapolation of a very long sequence of symbols—presumably containing all of the information to be used in the induction. Almost all, if not all problems in induction can be put in this form. Some strong heuristic arguments have been obtained for the equivalence of the last three models. One of these models is equivalent to a Bayes formulation, in which a priori probabilities are assigned to sequences of symbols on the basis of the lengths of inputs to a universal Turing machine that are required to produce the sequence of interest as output. Though it seems likely, it is not certain whether the first of the four models is equivalent to the other three. Few rigorous results are presented. Informal investigations are made of the properties of these models. There are discussions of their consistency and meaningfulness, of their degree of independence of the exact nature of the Turing machine used, and of the accuracy of their predictions in comparison to those of other induction methods. In Part II these models are applied to the solution of three problems—prediction of the Bernoulli sequence, extrapolation of a certain kind of Markov chain, and the use of phrase structure grammars for induction. Though some approximations are used, the first of these problems is treated most rigorously. The result is Laplace's rule of succession. The solution to the second problem uses less certain approximations, but the properties of the solution that are discussed are fairly independent of these approximations. The third application, using phrase structure grammars, is the least exact of the three. First a formal solution is presented. Though it appears to have certain deficiencies, it is hoped that presentation of this admittedly inadequate model will suggest acceptable improvements in it. This formal solution is then applied in an approximate way to the determination of the “optimum” phrase structure grammar for a given set of strings. The results that are obtained are plausible, but subject to the uncertainties of the approximation used.
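The Bernoulli-sequence result mentioned above, Laplace's rule of succession, predicts a success on the next trial with probability (k + 1)/(n + 2) after observing k successes in n trials. A short sketch:

```python
def laplace_rule(sequence):
    """Probability that the next symbol is 1, given a 0/1 Bernoulli sequence,
    by Laplace's rule of succession: (k + 1) / (n + 2)."""
    n, k = len(sequence), sum(sequence)
    return (k + 1) / (n + 2)

if __name__ == "__main__":
    print(laplace_rule([1, 1, 0, 1, 1, 1, 0, 1]))   # 7/10 = 0.7
```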
Article
Both in designing a new programming language and in extending an existing language, the designer is faced with the problem of deriving a “natural” grammar for the language. We are proposing an interactive approach to the grammar design problem wherein the designer presents a sample of sentences and structures as input to a grammatical inference algorithm. The algorithm then constructs a grammar which is a reasonable generalization of the examples submitted by the designer. The implementation is presently restricted to a subclass of operator precedence grammars, but a second algorithm is outlined which applies to a larger class of context-free grammars.
Article
A new method of estimating the entropy and redundancy of a language is described. This method exploits the knowledge of the language statistics possessed by those who speak the language, and depends on experimental results in prediction of the next letter when the preceding text is known. Results of experiments in prediction are given, and some properties of an ideal predictor are developed.
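A rough computational analogue of the experiment, with a simple add-one-smoothed bigram model standing in for the human predictor (the paper's actual bounds are derived from the distribution of guess ranks and are not reproduced here): bits per letter are estimated as the average of -log2 p(next letter | previous letter).

```python
from collections import Counter, defaultdict
from math import log2

def bigram_entropy(text):
    """Estimate bits per letter as the average of -log2 p(c | previous c),
    where p is an add-one-smoothed bigram model fit on the same text.
    A crude stand-in for the human predictor in the prediction experiment."""
    alphabet = sorted(set(text))
    counts = defaultdict(Counter)
    for prev, cur in zip(text, text[1:]):
        counts[prev][cur] += 1
    total_bits = 0.0
    for prev, cur in zip(text, text[1:]):
        ctx = counts[prev]
        p = (ctx[cur] + 1) / (sum(ctx.values()) + len(alphabet))  # smoothing
        total_bits += -log2(p)
    return total_bits / (len(text) - 1)

if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog " * 20
    print(f"{bigram_entropy(sample):.2f} bits per letter")
```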
Chapter
This chapter presents a survey of results in grammatical inference. The grammatical inference problem can be described as follows: a finite set of symbol strings from some language L, and possibly a finite set of strings from the complement of L, are known, and a grammar for the language is to be discovered. Any attempt to formalize the grammatical inference problem must include precise formulations of several concepts. The four central notions are: (1) the hypothesis space, (2) the measure of adequacy, (3) the rules by which the samples are drawn, and (4) the criterion for success in the limit of the inference process. The chapter discusses the intermediate behavior of inference algorithms and criteria for choosing a grammar on the basis of a finite amount of information, presents a number of methods which have been developed for inferring grammars, and evaluates some of their properties. In the typical inference situation, the grammars that are of interest will tend to be more complex if they are nearer the right end of the spectrum, and so the decision as to which grammar to choose rests on the questions of how tight a fit is required and how much complexity can be tolerated.
Chapter
Without Abstract
Article
The Nerode realization technique for synthesizing finite-state machines from their associated right-invariant equivalence relations is modified to give a method for synthesizing machines from finite subsets of their input–output behavior. The synthesis procedure includes a parameter that one may adjust to obtain machines that represent the desired behavior with varying degrees of accuracy and that consequently have varying complexities. We discuss some of the uses of the method, including an application to a sequential learning problem.
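In the k-tails formulation of this synthesis method, the adjustable parameter is the tail length k: prefixes of the sample are merged into one state whenever they admit the same continuations of length at most k, so larger k gives a tighter but larger machine. The sketch below handles only an acceptor built from positive sample strings (the paper treats general input-output behavior), and the resulting transition relation may be nondeterministic.

```python
def k_tails(sample, k):
    """Synthesize a (possibly nondeterministic) acceptor from positive
    strings: prefixes with the same set of length-<=k continuations in the
    sample are merged into one state.  Larger k -> more states, tighter fit."""
    prefixes = {s[:i] for s in sample for i in range(len(s) + 1)}

    def tail(p):
        return frozenset(s[len(p):] for s in sample
                         if s.startswith(p) and len(s) - len(p) <= k)

    states = {p: tail(p) for p in prefixes}
    transitions = {}                      # (state, symbol) -> set of states
    for p in prefixes:
        for a in {s[len(p)] for s in sample if s.startswith(p) and len(s) > len(p)}:
            transitions.setdefault((states[p], a), set()).add(states[p + a])
    accepting = {states[p] for p in prefixes if p in sample}
    return states[""], transitions, accepting

if __name__ == "__main__":
    sample = {"ab", "abab", "ababab"}     # a toy positive sample from (ab)+
    start, trans, accepting = k_tails(sample, k=2)
    all_states = {q for q, _ in trans} | {t for ts in trans.values() for t in ts}
    print("states:", len(all_states), " accepting:", len(accepting))
```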
Article
The determination of pattern recognition rules is viewed as a problem of inductive inference, guided by generalization rules, which control the generalization process, and problem knowledge rules, which represent the underlying semantics relevant to the recognition problem under consideration. The paper formulates the theoretical framework and a method for inferring general and optimal (according to certain criteria) descriptions of object classes from examples of classification or partial descriptions. The language for expressing the class descriptions and the guidance rules is an extension of the first-order predicate calculus, called variable-valued logic calculus VL21. VL21 involves typed variables and contains several new operators especially suited for conducting inductive inference, such as selector, internal disjunction, internal conjunction, exception, and generalization. Important aspects of the theory include: 1) a formulation of several kinds of generalization rules; 2) an ability to uniformly and adequately handle descriptors (i.e., variables, functions, and predicates) of different types (nominal, linear, and structured) and of different arity (i.e., different number of arguments); 3) an ability to generate new descriptors, which are derived from the initial descriptors through a rule-based system (i.e., an ability to conduct so-called constructive induction); 4) an ability to use the semantics underlying the problem under consideration. An experimental computer implementation of the method is briefly described and illustrated by an example.
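Two of the generalization rules formulated in the paper have simple attribute-value analogues: extending the reference of a selector by internal disjunction, and dropping a condition once it no longer constrains anything. The toy sketch below applies both to hypothetical attributes; it omits the typed descriptors, constructive induction, and semantic guidance rules of the full VL21 method.

```python
# Toy attribute-value analogue of two generalization rules: internal
# disjunction (merge allowed values of a selector across positive examples)
# and condition dropping (remove a selector that admits every domain value).
# Attribute names and domains are hypothetical.
DOMAINS = {
    "shape": {"circle", "square", "triangle"},
    "size": {"small", "large"},
    "colour": {"red", "green", "blue"},
}

def generalize(examples):
    """Return {attribute: set of allowed values} covering all examples."""
    description = {a: set() for a in DOMAINS}
    for ex in examples:
        for a, v in ex.items():
            description[a].add(v)                 # internal disjunction
    return {a: vals for a, vals in description.items()
            if vals != DOMAINS[a]}                # dropping condition

if __name__ == "__main__":
    positives = [
        {"shape": "circle", "size": "small", "colour": "red"},
        {"shape": "square", "size": "small", "colour": "green"},
        {"shape": "triangle", "size": "small", "colour": "blue"},
    ]
    print(generalize(positives))   # {'size': {'small'}}: shape, colour dropped
```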