
Adaptive Resonance Theory as a computational model of learning inflection classes

Abstract

In this study, we investigate how humans use generalisation to learn verb forms for different grammatical persons and verbs, and what role inflection classes play in this process. Inspired by recent advances in computational modelling of morphological processing (Elsner et al., 2019), we propose the task of unsupervised inflection class clustering: a computational model has to cluster verb forms of different paradigm cells from different lemmas into inflection classes. As a model, we use Adaptive Resonance Theory (ART) (Carpenter & Grossberg, 1987), a cognitively inspired neural network architecture in which generalisation plays a central role, controlled by the vigilance parameter. The network consists of an input layer, where new stimuli come in, and a recognition layer, which represents learned categories. We performed experiments on a dataset of phonetic Latin verb forms (Beniamine et al., 2020). The results suggest that our ART model is able to learn a system of inflection classes, and that the ‘critical feature patterns’ of the ART model can be analysed to explain which features in the input data a learned category attends to (Grossberg, 2020).
[Figure: Adjusted Rand Index (y-axis, 0.0–0.8) as a function of vigilance (x-axis, 0.0–1.0) for the 2-gram and 3-gram set and concat representations, with a K-means baseline.]
Adaptive Resonance Theory as a computational model of learning inflection classes
Peter Dekker, Heikki Rasilo & Bart de Boer AI Lab, Vrije Universiteit Brussel
peter.dekker@ai.vub.ac.be
How do humans use generalisation in the production of verb morphology?
What role do inflection classes play in this process?
Recent computer models of morphological processing
Mostly generation of inflected forms (Elsner et al., 2019; Kodner et al., 2022)
Some work on clustering inflection classes: supervised (Guzmán Naranjo, 2019, 2020) and unsupervised approaches (Beniamine et al., 2018; Lefevre et al., 2021)
This study: Can Adaptive Resonance Theory learn a system of inflection classes? Which features does the model attend to?
Task: Unsupervised inflection class clustering
Cluster verb paradigms (1 datapoint = all forms for one verb) into inflection classes
Adaptive Resonance Theory (Carpenter & Grossberg, 1987)
Cognitively inspired neural network model of category learning
Input layer (new stimuli) and recognition layer (learned categories)
Vigilance parameter: controls the degree of generalisation
Explainability via critical feature patterns (Grossberg, 2020)
This study: ART1 clustering model (Carpenter & Grossberg, 2002)
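The ART1 dynamics just described (bottom-up category choice, vigilance test, fast learning on binary inputs) can be illustrated with a minimal sketch. This is our own simplification, not the poster's implementation; the choice-function constant `beta` and all names are assumptions:

```python
# Minimal ART1-style clustering of binary vectors (lists of 0/1).
# Hypothetical simplification for illustration; not the authors' code.

def art1_cluster(inputs, vigilance=0.5, beta=1.0):
    weights = []   # one binary prototype per learned category
    labels = []
    for x in inputs:
        x_norm = sum(x)
        # rank categories by the choice function |x AND w| / (beta + |w|)
        order = sorted(
            range(len(weights)),
            key=lambda j: -sum(a & b for a, b in zip(x, weights[j]))
                          / (beta + sum(weights[j])),
        )
        for j in order:
            overlap = sum(a & b for a, b in zip(x, weights[j]))
            # vigilance (match) test: resonance if |x AND w| / |x| >= rho
            if x_norm == 0 or overlap / x_norm >= vigilance:
                # fast learning: prototype becomes the intersection
                weights[j] = [a & b for a, b in zip(x, weights[j])]
                labels.append(j)
                break
        else:
            # every category was reset: recruit a new one
            weights.append(list(x))
            labels.append(len(weights) - 1)
    return labels, weights
```

With vigilance near 0 almost all inputs merge into one broad category; near 1 almost every input gets its own category, which mirrors the generalisation axis explored in the results figure.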
Data
Romance Verbal Inflection Dataset (Beniamine et al., 2020)
Phonetic word forms with inflection classes for evaluation
Our study: Latin present tense
Representation: n-grams (n = 2/3), as form concat and paradigm set
Future work
Experiment with the ordering of the data
Study language change using an agent-based model equipped with an ART network (cf. Round et al., 2022; Parker et al., 2018; Hare & Elman, 1995; Cotterell et al., 2018)
Example lemmas by inflection class:
I: staːre, domaːre, amaːre
II: teneːre, caleːre, timeːre
III: sapere, trahere, skriːbere
IV: dormiːre, niːre, sentiːre
special: esse, iːre, posse
Example paradigms (one per class; present tense plus infinitive):

     I        II        III      IV         special
1SG  stoː     teneoː    sapioː   dormioː    sum
2SG  staːs    teneːs    sapis    dormiːs    es
3SG  stat     tenet     sapit    dormit     est
1PL  staːmus  teneːmus  sapimus  dormiːmus  sumus
2PL  staːtis  teneːtis  sapitis  dormiːtis  estis
3PL  stant    tenent    sapiunt  dormiunt   sunt
INF  staːre   teneːre   sapere   dormiːre   esse
Paradigm set: presence of an n-gram in all forms together. Example (bigrams over the paradigm of esse):
su um es … st mu us … ti is un nt
 1  1  1 0  1  1  1 0  1  1  1  1

Form concat: n-grams represented separately for each form and concatenated. Example (per-cell bigram vectors for dormiːre):
1SG: do … oː → 1 0 0 0 1
2SG: do … ːs → 1 0 0 0 1
3SG: do … it → 1 0 0 0 1
1PL: do … us → 1 0 0 0 1
2PL: do … is → 1 0 0 0 1
3PL: do … nt → 1 0 0 0 1
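The two representations can be sketched as follows. This is a toy reconstruction under our own assumptions about n-gram extraction (no word-boundary symbols are added, and the vocabulary is built from the example paradigm itself):

```python
# Toy sketch of the paradigm-set and form-concat representations.
# Assumed reconstruction for illustration; not the authors' feature code.

def ngrams(form, n=2):
    """All character n-grams occurring in a form."""
    return {form[i:i + n] for i in range(len(form) - n + 1)}

def paradigm_set_vector(forms, vocab, n=2):
    """One binary vector: does the n-gram occur anywhere in the paradigm?"""
    present = set().union(*(ngrams(f, n) for f in forms))
    return [int(g in present) for g in vocab]

def form_concat_vector(forms, vocab, n=2):
    """Per-form binary vectors, concatenated cell by cell."""
    vec = []
    for f in forms:
        grams = ngrams(f, n)
        vec.extend(int(g in grams) for g in vocab)
    return vec

esse = ["sum", "es", "est", "sumus", "estis", "sunt"]
vocab = sorted(set().union(*(ngrams(f) for f in esse)))
v_set = paradigm_set_vector(esse, vocab)      # length: |vocab|
v_cat = form_concat_vector(esse, vocab)       # length: 6 * |vocab|
```

The set representation collapses the paradigm into one vector, so a datapoint stays small but loses which cell an n-gram came from; the concat representation keeps cell identity at the cost of a much longer vector.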
Analysis of clusters (model: 3-gram, paradigm set representation)
Bar: cluster
Colour: real inflection class of the datapoints assigned to each cluster
Text in bar: learned n-gram features (distinctive features in bold)

Clustering similarity to the real inflection classes (Adjusted Rand Index), shown for the different representations (2/3-gram, set and concat) across vigilance values
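The Adjusted Rand Index used for this evaluation can be computed from the pair-counting contingency table. Below is a self-contained sketch of the standard formula; the example labels used to exercise it are invented, not the poster's data:

```python
# Adjusted Rand Index between a model clustering and gold inflection classes.
# Standard pair-counting formula; illustrative implementation only.
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    n = len(labels_a)
    pairs = Counter(zip(labels_a, labels_b))   # contingency-table cells
    a = Counter(labels_a)                      # row sums
    b = Counter(labels_b)                      # column sums
    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)      # chance-level agreement
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:                  # degenerate clusterings
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

ARI is 1.0 for identical partitions (regardless of how clusters are numbered) and close to 0 for chance-level agreement, which is why it suits unsupervised clusters that carry arbitrary labels.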
Conclusion
ART learns a system of inflection classes, and the learned n-grams can be interpreted using critical feature patterns
Trigrams with the set representation at moderate vigilance give the best results
References
Beniamine, S., Bonami, O., & Sagot, B. (2018). Inferring Inflection Classes with Description Length. Journal of Language Modelling, 5(3), 465. https://doi.org/10.15398/jlm.v5i3.184
Beniamine, S., Maiden, M., & Round, E. (2020). Opening the Romance Verbal Inflection Dataset 2.0: A CLDF lexicon. Proceedings of the 12th Language Resources and Evaluation Conference, 3027–3035.
Carpenter, G. A., & Grossberg, S. (1987). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 37(1), 54–115. https://doi.org/10.1016/S0734-189X(87)80014-2
Carpenter, G. A., & Grossberg, S. (2002). Adaptive Resonance Theory.
Cotterell, R., Kirov, C., Hulden, M., & Eisner, J. (2018). On the Diachronic Stability of Irregularity in Inflectional Morphology. arXiv:1804.08262 [cs]. http://arxiv.org/abs/1804.08262
Elsner, M., Sims, A. D., Erdmann, A., ... Stevens-Guille, S. (2019). Modeling morphological learning, typology, and change: What can the neural sequence-to-sequence framework contribute? Journal of Language Modelling, 7(1), 53. https://doi.org/10.15398/jlm.v7i1.244
Grossberg, S. (2020). A Path Toward Explainable AI and Autonomous Adaptive Intelligence: Deep Learning, Adaptive Resonance, and Models of Perception, Emotion, and Action. Frontiers in Neurorobotics, 14. https://www.frontiersin.org/article/10.3389/fnbot.2020.00036
Guzmán Naranjo, M. (2019). Analogical classification in formal grammar. Zenodo. https://doi.org/10.5281/ZENODO.3191825
Guzmán Naranjo, M. (2020). Analogy, complexity and predictability in the Russian nominal inflection system. Morphology, 30(3), 219–262. https://doi.org/10.1007/s11525-020-09367-1
Hare, M., & Elman, J. L. (1995). Learning and morphological change. Cognition, 56(1), 61–98. https://doi.org/10.1016/0010-0277(94)00655-5
Kodner, J., Khalifa, S., Batsuren, K., … Vylomova, E. (2022). SIGMORPHON–UniMorph 2022 Shared Task 0: Generalization and Typologically Diverse Morphological Inflection. Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, 176–203.
LeFevre, G., Elsner, M., & Sims, A. D. (2021). Formalizing Inflectional Paradigm Shape with Information Theory. https://doi.org/10.7275/JZ7Z-J842
Parker, J., Reynolds, R., & Sims, A. D. (2018). A Bayesian Investigation of Factors Shaping the Network Structure of Inflection Class Systems. Proceedings of the Society for Computation in Linguistics, 3.
Round, E., Mann, S., Beniamine, S., Lindsay-Smith, E., Esher, L., & Spike, M. (2022). Cognition and the stability of evolving complex morphology: An agent-based model. Joint Conference on Language Evolution.