
Learning the structure of a mathematical group

Authors: Jamrozik and Shultz

Abstract

A mathematical group is a set equipped with an operation that satisfies the properties of closure, associativity, identity, and inverse. Understanding of group structure can be assessed by changing the elements and operations across different versions of the same underlying group. Participants learned this structure more quickly over four successive versions of a subset of the Klein 4-group, suggesting some understanding of the group structure (Halford, Bain, Maybery, & Andrews, 1998). Because an artificial neural network learning the task failed to show a comparable improvement, it was argued that such models are incapable of learning abstract group structure (Phillips & Halford, 1997). Here we show that an improved neural model, one that adheres more closely to the task used with humans, does speed its learning over changing versions of the task, demonstrating that neural networks can learn and generalize abstract structure.
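To make the task concrete, here is a rough illustration in Python (not code from the paper): the Klein 4-group can be realized as bitwise XOR on {0, 1, 2, 3}, and relabelling the elements produces an isomorphic "new version" of the same abstract structure, analogous to the changed versions given to participants. The stimulus labels below are hypothetical.

# A minimal sketch, assuming the XOR realization of the Klein 4-group;
# element labels are hypothetical and not taken from the paper.
from itertools import product

elements = [0, 1, 2, 3]
op = lambda x, y: x ^ y  # bitwise XOR: 0 is the identity and every element is its own inverse

# Cayley table for "version 1" of the task, with hypothetical stimulus labels.
labels_v1 = {0: 'A', 1: 'B', 2: 'C', 3: 'D'}
table_v1 = {(labels_v1[x], labels_v1[y]): labels_v1[op(x, y)]
            for x, y in product(elements, repeat=2)}

# "Version 2": the same abstract structure under new surface labels.
labels_v2 = {0: 'P', 1: 'Q', 2: 'R', 3: 'S'}
table_v2 = {(labels_v2[x], labels_v2[y]): labels_v2[op(x, y)]
            for x, y in product(elements, repeat=2)}

print(table_v1[('B', 'C')], table_v2[('Q', 'R')])  # 'D' 'S': the same cell in both versions

The two Cayley tables differ only in their surface labels, which is exactly the sense in which successive versions of the task share one underlying group.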
... Some authors have argued that ordinary artificial neural networks are inherently unable to learn systematically structured representations such as mathematical groups (Phillips & Halford, 1997; Marcus, 1998). However, a first step toward refuting this claim has recently been made: it has been demonstrated that neural networks can learn particular mathematical groups, such as the Klein 4-group, in a fashion that simulates learning by humans (Jamrozik & Shultz, 2007). We now extend the task to learning the abstract structure of small finite groups with three and four elements, and we compare knowledge-based learning with knowledge-free learning. ...
... We make no claim that these neural networks understand groups in the same explicit fashion as mathematical specialists do. But these algorithms could well serve to model how ordinary people acquire complex, abstract, highly-structured knowledge (Jamrozik & Shultz, 2007; Egri & Shultz, 2006). Modeling expert explicit mathematical knowledge would require in addition that this implicit learned knowledge be converted into an axiomatic characterization that expresses the three group properties (i)–(iii) listed in our earlier section on The Abstract Group Structure, to allow for a concise representation of both finite and infinite groups (Schlimm, 2008). ...
Article
Full-text available
It has recently been shown that neural networks can learn particular mathematical groups, for example, the Klein 4-group (Jamrozik & Shultz, 2007). However, there are groups with any number of elements, all of which are said to instantiate the abstract group structure. Learning to differentiate groups from other structures that are not groups is a very difficult task. Contrary to some views, we show that neural networks can learn to recognize finite groups consisting of up to four elements. We present this problem as a case study that exhibits the advantages of knowledge-based learning over knowledge-free learning. We also show the surprising result that the way in which the KBCC algorithm recruits previous knowledge reflects some deep structural properties of the patterns that are learned, namely the structure of the subgroups of a given group.
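As a companion sketch (this is not the KBCC setup, just a direct check of the axioms), the classification target described above, deciding whether a small operation table is a group, can be stated as follows; the example tables are made up.

# A minimal sketch, not the KBCC setup: decide whether a small operation
# table defines a group by checking the axioms directly. Example tables are made up.
from itertools import product

def is_group(elements, table):
    """table[(x, y)] gives x * y; return True iff the table defines a group."""
    elements = list(elements)
    # Closure
    if any(table[(x, y)] not in elements for x, y in product(elements, repeat=2)):
        return False
    # Associativity
    if any(table[(table[(x, y)], z)] != table[(x, table[(y, z)])]
           for x, y, z in product(elements, repeat=3)):
        return False
    # Identity
    identities = [e for e in elements
                  if all(table[(e, x)] == x and table[(x, e)] == x for x in elements)]
    if not identities:
        return False
    e = identities[0]
    # Inverses
    return all(any(table[(x, y)] == e and table[(y, x)] == e for y in elements)
               for x in elements)

# Addition mod 3 is a group; the same table with one cell altered is not.
z3 = {(x, y): (x + y) % 3 for x in range(3) for y in range(3)}
broken = dict(z3)
broken[(1, 2)] = 1
print(is_group(range(3), z3), is_group(range(3), broken))  # True False

The networks in the cited work learn this distinction from examples rather than by applying the axioms explicitly; the sketch only makes the target category precise.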
Article
Cascade-correlation learning is used to model pronoun acquisition in children. The cascade-correlation algorithm constructs a feed-forward neural network, building its own topology on top of the input and output units. Personal pronoun acquisition is an interesting non-linear problem in psychology. A mother will refer to her son as "you" and to herself as "me", but the son must infer for himself that when he speaks to his mother, she becomes "you" and he becomes "me". Learning the shifting reference of these pronouns is a difficult task that most children nevertheless master. We show that learning of two different noun-and-pronoun addressee patterns is consistent with naturalistic studies. We also observe a surprising factor in pronoun reversal: increasing the amount of exposure to noun patterns can decrease or eliminate reversal errors in children.
Article
Full-text available
A set of hypotheses is formulated for a connectionist approach to cognitive modeling. These hypotheses are shown to be incompatible with the hypotheses underlying traditional cognitive models. The connectionist models considered are massively parallel numerical computational systems that are a kind of continuous dynamical system. The numerical variables in the system correspond semantically to fine-grained features below the level of the concepts consciously used to describe the task domain. The level of analysis is intermediate between those of symbolic cognitive models and neural models. The explanations of behavior provided are like those traditional in the physical sciences, unlike the explanations provided by symbolic models. Higher-level analyses of these connectionist models reveal subtle relations to symbolic models. Parallel connectionist memory and linguistic processes are hypothesized to give rise to processes that are describable at a higher level as sequential rule application. At the lower level, computation has the character of massively parallel satisfaction of soft numerical constraints; at the higher level, this can lead to competence characterizable by hard rules. Performance will typically deviate from this competence since behavior is achieved not by interpreting hard rules but by satisfying soft constraints. The result is a picture in which traditional and connectionist theoretical constructs collaborate intimately to provide an understanding of cognition.
Article
Full-text available
At root, the systematicity debate over classical versus connectionist explanations for cognitive architecture turns on quantifying the degree to which human cognition is systematic. We introduce into the debate recent psychological data that provides strong support for the purely structure-based generalizations claimed by Fodor and Pylyshyn (1988). We then show, via simulation, that two widely used connectionist models (feedforward and simple recurrent networks) do not capture the same degree of generalization as human subjects. However, we show that this limitation is overcome by tensor networks that support relational processing.
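For orientation, the tensor networks referred to above build on tensor-product (role-filler) binding. Below is a minimal NumPy sketch of that binding idea with made-up role and filler vectors; it is not a reconstruction of the models used in the study.

# A minimal NumPy sketch of tensor-product (role-filler) binding; the roles,
# fillers, and dimensionality here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
dim = 16
roles = {name: rng.standard_normal(dim) for name in ('agent', 'patient')}
fillers = {name: rng.standard_normal(dim) for name in ('John', 'Mary')}

# Bind each filler to its role with an outer product, then superpose the bindings.
relation = (np.outer(roles['agent'], fillers['John'])
            + np.outer(roles['patient'], fillers['Mary']))

# Unbind: project the relation onto a role vector and see which filler matches best.
probe = roles['agent'] @ relation
scores = {name: probe @ vec / (np.linalg.norm(probe) * np.linalg.norm(vec))
          for name, vec in fillers.items()}
print(max(scores, key=scores.get))  # 'John' is expected to win (approximately, since the vectors are random)

Because binding is done with outer products, the same machinery applies to any filler placed in any role, which is the kind of structure-based generalization at issue in the debate.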
Article
Full-text available
Humans routinely generalize universal relationships to unfamiliar instances. If we are told "if glork then frum," and "glork," we can infer "frum"; any name that serves as the subject of a sentence can appear as the object of a sentence. These universals are pervasive in language and reasoning. One account of how they are generalized holds that humans possess mechanisms that manipulate symbols and variables; an alternative account holds that symbol-manipulation can be eliminated from scientific theories in favor of descriptions couched in terms of networks of interconnected nodes. Can these "eliminative" connectionist models offer a genuine alternative? This article shows that eliminative connectionist models cannot account for how we extend universals to arbitrary items. The argument runs as follows. First, if these models, as currently conceived, were to extend universals to arbitrary instances, they would have to generalize outside the space of training examples. Next, it is shown that the class of eliminative connectionist models that is currently popular cannot learn to extend universals outside the training space. This limitation might be avoided through the use of an architecture that implements symbol manipulation.
Article
Full-text available
The non-linear complexities of neural networks make network solutions difficult to understand. Sanger's contribution analysis is here extended to the analysis of networks automatically generated by the cascade-correlation learning algorithm. Because such networks have cross connections that bypass hidden layers, standard analyses of hidden unit activation patterns are insufficient. A contribution is defined as the product of an output weight and the associated activation on the sending unit, whether that sending unit is an input or a hidden unit, multiplied by the sign of the output target for the current input pattern. Intercorrelations among contributions, as gleaned from the matrix of contributions × input patterns, can be subjected to principal components analysis (PCA) to extract the main features of variation in the contributions. Such an analysis is applied to three problems: continuous XOR, arithmetic comparison, and distinguishing between two interlocking spirals. In all three ...
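A toy NumPy sketch of the contribution definition and the subsequent PCA is given below; the activations, weights, and targets are made up, and the sketch only illustrates the computation described in the abstract.

# A toy NumPy sketch of contribution analysis as defined above; the
# activations, output weights, and target signs are made up.
import numpy as np

rng = np.random.default_rng(1)
n_patterns, n_senders = 20, 5                                  # sending units = inputs plus hidden units
activations = rng.standard_normal((n_patterns, n_senders))     # sending-unit activations per input pattern
output_weights = rng.standard_normal(n_senders)                # weights into one output unit
target_signs = np.sign(rng.standard_normal(n_patterns))        # sign of the output target per pattern

# contribution = output weight * sending-unit activation * sign of the output target
contributions = target_signs[:, None] * activations * output_weights[None, :]

# PCA on the patterns-by-contributions matrix, via SVD of the centered data.
centered = contributions - contributions.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)
print(np.round(explained, 2))  # proportion of contribution variance carried by each component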
Article
The Cascade-Correlation learning algorithm constructs a multi-layer artificial neural network as it learns to perform a given task. The resulting network's size and topology are chosen specifically for this task. In the resulting "cascade" networks, each new hidden unit receives incoming connections from all input and pre-existing hidden units. In effect, each new unit adds a new layer to the network. This allows Cascade-Correlation to create complex feature detectors, but it typically results in a network that is deeper, in terms of the longest path from input to output, than is necessary to solve the problem efficiently. In this paper we investigate a simple variation of Cascade-Correlation that will build deep nets if necessary, but that is biased toward minimizing network depth. We demonstrate empirically, across a range of problems, that this simple technique can reduce network depth, often dramatically. However, we show that this technique does not, in general, reduce the total number of weights or improve the generalization ability of the resulting networks.
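The cascade wiring described above, in which each recruited hidden unit receives connections from the inputs and from every earlier hidden unit, can be sketched as a simple forward pass; the weights and sizes below are arbitrary, and candidate training and correlation maximization are not modeled.

# A schematic sketch of cascade wiring: each hidden unit sees the inputs plus
# all earlier hidden units. Weight values and sizes are arbitrary; recruitment
# and correlation maximization are omitted.
import numpy as np

def cascade_forward(x, hidden_weights, output_weights):
    """x: input vector; hidden_weights[k] has length len(x) + k + 1
    (inputs + earlier hidden units + bias); output_weights has length
    len(x) + len(hidden_weights) + 1."""
    units = list(x)                        # activations visible to later units
    for w in hidden_weights:
        net = np.dot(w, units + [1.0])     # connections from inputs, earlier hidden units, and bias
        units.append(np.tanh(net))         # the new unit effectively adds a new layer
    return np.dot(output_weights, units + [1.0])

rng = np.random.default_rng(2)
x = [0.5, -1.0]
hidden_weights = [rng.standard_normal(3), rng.standard_normal(4)]  # two recruited hidden units
output_weights = rng.standard_normal(5)
print(cascade_forward(x, hidden_weights, output_weights))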
Article
Five experiments were performed to test whether participants induced a coherent representation of the structure of a task, called a relational schema, from specific instances. Properties of a relational schema include: an explicit symbol for a relation, a binding that preserves the truth of a relation, potential for higher-order relations, omnidirectional access, potential for transfer between isomorphs, and the ability to predict unseen items in isomorphic problems. However, relational schemas are not necessarily coded in abstract form. Predictions from relational schema theory were contrasted with predictions from configural learning and other nonstructural theories in five experiments in which participants were taught a structure composed of a set of initial-state, operator → end-state instances. The initial-state, operator pairs were presented and participants had to predict the correct end-state. Induction of a relational schema was achieved efficiently by adult participants, as indicated by their ability to predict items of a new isomorphic problem. The relational schemas induced showed the omnidirectional access property, there was efficient transfer to isomorphs, and structural coherence had a powerful effect on learning. The "learning to learn" effect traditionally associated with the learning-set literature was observed, and the long-standing enigma of learning-set acquisition is explained by a model composed of relational schema induction and structure mapping. Performance was better after reversal of operators than after a shift to an alternate structure, even though the latter entailed more overlap with previously learned tasks in terms of the number of configural associations that were preserved. An explanation for the reversal-shift phenomenon in terms of induction and mapping of a relational schema is proposed. The five experiments provided evidence supporting predictions from relational schema theory, and no evidence was found for configural or nonstructural learning theories.
Article
To give an adequate explanation of cognition and perform certain practical tasks, connectionist systems must be able to extrapolate. This work explores the relationship between input representation and extrapolation, using simulations of multilayer perceptrons trained to model the identity function. The simulations show that input representation has a marked effect on extrapolation.
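A minimal sketch of this kind of simulation, using scikit-learn's MLPRegressor with arbitrary settings rather than the networks from the study: train a small network on the identity function over [0, 1] and probe it inside and outside that range.

# A minimal sketch, not the original simulations: fit a small MLP to the
# identity function on [0, 1], then probe it inside and outside that range.
# Hidden-layer size and training settings are arbitrary choices.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
x_train = rng.uniform(0.0, 1.0, size=(200, 1))
y_train = x_train.ravel()                              # identity: the target equals the input

net = MLPRegressor(hidden_layer_sizes=(10,), activation='logistic',
                   max_iter=5000, random_state=0)
net.fit(x_train, y_train)

x_probe = np.array([[0.25], [0.75], [1.5], [2.0]])     # the last two lie outside the training range
print(np.round(net.predict(x_probe), 2))
# Interpolation is typically close to the identity; extrapolation usually is not.

Whether and how far the predictions degrade outside [0, 1] depends on how the inputs and outputs are represented, which is the effect the study examines.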
Article
A fundamental issue in cognitive science is whether human cognitive processing is better explained by symbolic rules or by subsymbolic neural networks. A recent study of infant familiarization to sentences in an artificial language seems to have produced data that can only be explained by symbolic rule learning and not by unstructured neural networks (Marcus, Vijayan, Bandi Rao, & Vishton, 1999). Here we present successful unstructured neural network simulations of the infant data, showing that these data do not uniquely support a rule-based account. In contrast to other simulations of these data, these simulations cover more aspects of the data with fewer assumptions about prior knowledge and training, using a more realistic coding scheme based on sonority of phonemes. The networks show exponential decreases in attention to a repeated sentence pattern, more recovery to novel sentences inconsistent with the familiar pattern than to novel sentences consistent with the familiar pattern, occasional familiarity preferences, more recovery to consistent novel sentences than to familiarized sentences, and extrapolative generalization outside the range of the training patterns. A variety of predictions suggest the utility of the model in guiding future psychological work. The evidence, from these and other simulations, supports the view that unstructured neural networks can account for the existing infant data.