MSc thesis: Reconstructing language ancestry by performing word prediction with neural networks

Abstract

In recent years, computational methods have led to new discoveries in the field of historical linguistics. In my thesis, I applied the machine learning paradigm, successful in many computing tasks, to historical linguistics. I proposed the task of word prediction: by training a machine learning model on pairs of words in two languages, it learns the sound correspondences between the two languages and should be able to predict unseen words. I used two neural network models, a recurrent neural network (RNN) encoder-decoder and a structured perceptron, to perform this task. I showed that word prediction yields results for multiple tasks in historical linguistics, such as phylogenetic tree reconstruction, identification of sound correspondences and cognate detection. On top of this, I showed that the task of word prediction can be extended to phylogenetic word prediction, in which information is shared between language pairs based on the assumed structure of the ancestry tree. This task could be used for protoform reconstruction and could in the future lead to direct reconstruction of the optimal tree at prediction time.
  
   
 
Supervisors:

Assessor:
Contents
1 Introduction
2 Method
3 Results
4 Context vector analysis
5 Phylogenetic word prediction
6 Conclusion and discussion
7 Appendix
Chapter 1
Introduction
                 
       historical linguistics    
             
  machine learning          
               
            
              
           historical linguistics 
          natural language processing   
 word prediction         
1.1 Historical linguistics
1.1.1 Historical linguistics: the comparative method and beyond
             
              
           
                 
         
             
       
               
            
              
            
   cognates       
             
    
          protolanguage  protosounds
    protoforms    
            
            B   
 A  B     A
            

                 
                
              
              
              mass lexical comparison
             
               
            
         
                
        
1.1.2 Sound changes
             
      regular               
       Neogrammarian hypothesis of the regularity of sound change 
              
               
             
phonemic changes, loss of segments, and insertion/movement of segments
                 
                
              
Phonemic changes
Phonemic changes            
                  
          mergers  splits  merger 
               
   split            
                
         assimilation  vowel changes
Assimilation (regular): a sound becomes more similar to a neighbouring sound. For example, nokte became notte: the /k/ assimilated to the following /t/. A special case is umlaut, in which a vowel is fronted under the influence of a vowel in the following syllable: the plural of gast became gestiz, with /a/ changing to /e/ under the influence of the /i/ in the ending. The /i/ was later lost, but the /e/ remained, which is why modern German has the alternation Gast ~ Gäste.
Lenition (regular): a sound becomes weaker. Examples are voicing (a voiceless sound becomes voiced), degemination (a geminate becomes simplex) and nasalization (a non-nasal sound becomes nasal). In strata > strada, the /t/ was voiced to /d/; in gutta > gota, the geminate was simplified. Lenition can ultimately result in the complete loss of a sound, as in regāle > real.
Vowel changes (regular): vowels change as well, for example through lenition or through lowering, fronting and rounding; an example of fronting is dut > dʏt, where /u/ became /ʏ/. In coalescence, two sounds merge into a single new sound. In compensatory lengthening, the loss of a segment is compensated by lengthening of a neighbouring vowel, as in bɛst > bɛ:t.
Loss of segments
                 
     
Loss (regular): a segment disappears completely. Aphaeresis is loss at the beginning of a word, as with the /k/ of knee; apocope is loss at the end of a word; syncope is loss in the middle of a word, as in the common pronunciation of chocolate.
Haplology (sporadic): a repeated sequence of sounds is reduced to a single occurrence. In Basque, sagar 'apple' and ardo 'wine' combine into sagardo 'cider', in which only one ar sequence survives.
Insertion/movement of segments
                
Insertion (regular): a segment is added to a word. Prothesis is insertion at the beginning of a word, as in scala > escala; epenthesis is insertion inside a word, as in poclum > poculum; excrescence is the insertion of a consonant, as in amonges > amongst.
Metathesis (sporadic): segments switch position within a word, as in wæps > wasp and parabola > palabra.
1.1.3 Computational methods in historical linguistics
             
              
            
               
             
              
             
      
            
         genotypic   
   phenotypic  Genotypic       
 regular sound correspondences        Phenotypic 
                 
              
  
               
                 
Cognate detection
 cognate detection            
           n  
             
                 
               
             
              
              
              
             
                   
                 
              
             
Sound correspondence detection
Sound correspondence detection         
                
             
               
                
                
               
               
    
Protoform reconstruction
 protoform reconstruction           
           
           
                
          
          
 functional load hypothesis            
      
Phylogenetic tree reconstruction
              
                
              
                 
  maximum parsimony  likelihood-based   
Distance-based methods           
              
               
molecular clock            
           Q     
                  
               
              
              
               
           
             
                
          greedy minimum evolution  
   
Character-based methods           
              
 maximum parsimony methods  likelihood-based methods     
                 
           long branch aention  
              
      likelihood          
             
           
                 
           
             
1.2 Developments in natural language processing
              
             natural language processing 
                
         
1.2.1 Natural language processing
   natural language processing         
          
             
               
                
             
            
             
             
            
1.2.2 Machine learning and language
Machine learning             
                
              
 training examples (x, y)          
  test examples x    y          
y                  

              
               
                
                  
     
             
              
                 
             
              
             
            
1.2.3 Deep neural networks
Neural networks              
              
                
            
                 
               deep
learning               
    representation learning       feature
engineering               
                  
                
               
                 
               
             
               
               
             
               
             
      encoder-decoder     
              
          
1.3 Word prediction
1.3.1 Word prediction
                 
               
               
word prediction           
                   
                  
            
                 
           pairwise word prediction  
                
            
    phylogenetic word prediction        

               
           reconstruction of phylogenetic
trees              
     sound correspondences      
              
    cognate detection         
                
               
              
              
                
  
             
               
                 
              
              
               
1.3.2 Model desiderata
                 
                  wd,B  
B   wd,A   A          wd,B 
wd,B              
  
             
               
                
              
                
       
semantic shift
 c  A B          
             c   
 d      d          c
                   
 cross-concept cognate pairs             

1.4 Summary
             
               
   natural language processing         
             
               
    pairwise word prediction          
          
             pairwise word prediction 
             phylogenetic
word prediction              
      
Chapter 2
Method
                    
               
       pairwise word prediction      
         Phylogenetic word prediction
             
2.1 Pairwise word prediction
2.1.1 Task
               
The model is trained on pairs $(w_{c,A}, w_{c,B})$: the words for the same concept $c$ in two languages $A$ and $B$. At test time, it has to predict, for an unseen concept $d$, the target word $w_{d,B}$ from the source word $w_{d,A}$.
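To make the setup concrete, the following sketch shows one way such training and test pairs could be laid out; the transcriptions, concepts and language pair are illustrative assumptions, not taken from the thesis data.

```python
# Hypothetical layout of pairwise word prediction data for an ordered
# language pair (A, B); transcriptions and concepts are made up.
train_pairs = {            # concept c -> (w_{c,A}, w_{c,B})
    "night": ("naxt", "nacht"),
    "water": ("water", "vas3r"),
    "stone": ("ston", "Stain"),
}
test_pairs = {             # concept d -> (w_{d,A}, gold w_{d,B})
    "fish": ("fiS", "vis"),
}

for concept, (source, gold) in test_pairs.items():
    print(f"predict {concept}: {source} -> ?  (gold: {gold})")
```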
       A B      C
    
               
              
               
               
                  
  
2.1.2 Models
              
Two models were used to perform this task: an RNN encoder-decoder and a structured perceptron.
RNN encoder-decoder
                
               
                  
                
    

               
              
                
  encoder                
                  
  decoder             
           
               
               
             
                
          Xavier initialization  
               
                
                 
Weights are drawn from $\mathcal{N}(0, 1/n_{\text{incoming}})$, where $n_{\text{incoming}}$ is the number of incoming connections of the unit.
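A minimal NumPy sketch of this initialization; the layer sizes are arbitrary, and interpreting the second argument of the normal distribution as the variance is an assumption.

```python
import numpy as np

def xavier_init(n_incoming, n_outgoing, seed=0):
    """Sample a weight matrix from N(0, 1/n_incoming)."""
    rng = np.random.default_rng(seed)
    std = np.sqrt(1.0 / n_incoming)  # variance 1/n_incoming
    return rng.normal(0.0, std, size=(n_incoming, n_outgoing))

W = xavier_init(64, 128)
print(W.shape, round(float(W.std()), 3))  # std close to 1/sqrt(64) = 0.125
```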
                
        
            
                   somax
            
The loss function is the categorical cross-entropy between the softmax output and the target, extended with an L2 regularization term.
              
     Adagrad          
                 
                 
                 
 Lasagne       
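The fragments above describe a loss built from the categorical cross-entropy over the softmax outputs plus an L2 regularization term, optimized with Adagrad through Lasagne. Below is a framework-free NumPy sketch of the loss itself; the regularization strength and the toy values are placeholders.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def loss(logits, targets_onehot, weights, lam=1e-4):
    """Categorical cross-entropy over softmax outputs plus an L2 term."""
    probs = softmax(logits)
    ce = -np.sum(targets_onehot * np.log(probs + 1e-12), axis=-1).mean()
    l2 = lam * sum(np.sum(w ** 2) for w in weights)
    return ce + l2

# toy example: two time steps, four output characters
logits = np.array([[2.0, 0.1, -1.0, 0.3], [0.0, 1.5, 0.2, -0.5]])
targets = np.eye(4)[[0, 1]]
print(loss(logits, targets, weights=[np.ones((4, 4))]))
```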
                
                
           
                 
 
Cognacy prior             
                  
cognacy prior           
               
                  
                 
           
      Lnew      LC E    
 CP
\[
L_{\text{new}} = L_{CE}(t, p) \cdot CP(t, p)
\]
\[
CP(t, p) = \frac{1}{1 + e^{\,L_{CE}(t, p) - \theta}}
\]
\[
\theta = L_{CE,\text{history}} + v\sigma
\]

where
$L_{\text{new}}$ is the adapted loss,
$L_{CE}(t, p)$ is the categorical cross-entropy loss between target word $t$ and prediction $p$,
$CP(t, p)$ is the cognacy prior for that pair,
$\theta$ is the threshold,
$L_{CE,\text{history}}$ is computed from the cross-entropy losses observed so far during training,
$v$ is a factor, and $\sigma$ the standard deviation of those observed losses.
     LCE (t, p)           
               t
p                
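Following the equations above, the sketch below shows how the cognacy prior could downweight the loss of probable non-cognate pairs; treating $L_{CE,\text{history}}$ as the mean of previously observed losses, $\sigma$ as their standard deviation, and $v = 1$ are assumptions.

```python
import numpy as np

def cognacy_prior_loss(l_ce, loss_history, v=1.0):
    """L_new = L_CE * CP, with a sigmoid cognacy prior around threshold theta.
    Assumption: theta = mean(historical losses) + v * their std."""
    theta = np.mean(loss_history) + v * np.std(loss_history)
    cp = 1.0 / (1.0 + np.exp(l_ce - theta))  # near 1 for low-loss (likely cognate) pairs
    return l_ce * cp, cp

history = [0.8, 1.1, 0.9, 1.3, 1.0]
for l_ce in (0.7, 3.5):  # a likely cognate vs. a likely non-cognate
    new_loss, cp = cognacy_prior_loss(l_ce, history)
    print(f"L_CE={l_ce:.1f}  CP={cp:.2f}  L_new={new_loss:.2f}")
```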
                
                  
               
            
Structured perceptron
               structured percep-
tron           perceptron   
     
Algorithm        I     N
           xn   ˆyn  
     w
\[
\hat{y}_n = \operatorname*{argmax}_{y \in Y} \, w^{T}\phi(x_n, y)
\]
wTϕ(xn, yn)             ϕ
  w   argmax            yn  
         ˆyn  argmax   
               ˆyn
    ˆyn      yn     
              
   
\[
w \leftarrow w + \phi(x_n, y_n) - \phi(x_n, \hat{y}_n)
\]
 I   w         averaged structured
perceptron                
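Below is a minimal sketch of the structured perceptron update described above, assuming per-position character features and no transition features, so that the argmax decomposes over positions; the averaging step of the averaged variant is omitted, and the thesis itself relies on the seqlearn implementation rather than code like this.

```python
import numpy as np

def phi(x_feats, y_labels, n_feats, n_labels):
    """Joint feature map phi(x, y): each position's feature vector is added
    to the block of weights belonging to its label."""
    vec = np.zeros(n_feats * n_labels)
    for f, y in zip(x_feats, y_labels):
        vec[y * n_feats:(y + 1) * n_feats] += f
    return vec

def train_structured_perceptron(data, n_feats, n_labels, iterations=5):
    w = np.zeros(n_feats * n_labels)
    for _ in range(iterations):
        for x_feats, y_gold in data:
            # without transition features, the argmax over label sequences
            # reduces to an independent argmax per position
            scores = np.array(x_feats) @ w.reshape(n_labels, n_feats).T
            y_hat = scores.argmax(axis=1)
            if not np.array_equal(y_hat, y_gold):
                w += phi(x_feats, y_gold, n_feats, n_labels) \
                     - phi(x_feats, y_hat, n_feats, n_labels)
    return w

# toy usage: two features per position, three output labels
data = [([np.array([1.0, 0.0]), np.array([0.0, 1.0])], np.array([0, 2]))]
print(train_structured_perceptron(data, n_feats=2, n_labels=3).reshape(3, 2))
```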
      
Application              
             
                 
         
      seqlearn       
         

[Figure: the RNN encoder-decoder, with encoder and decoder hidden states $h_{t_0} \dots h_{t_3}$ and target characters $T_{t_0} \dots T_{t_3}$; the cognacy prior $p(\text{cog})$ is computed from the loss $E(t, p)$ and the threshold $\theta$. A further figure outlines the averaged structured perceptron algorithm.]
2.1.3 Data
Data set
               
              
             
              
                 
                
             
    
               
             
               
                
              
            
             ASJPcode    
              
              
                  
                 
                  
                 
               
             
             
Input encoding
               
one-hot, phonetic, and embedding
  embedding             
   
One-hot: in the one-hot encoding, every character is represented by a vector of length $n_{\text{characters}}$ that contains a 1 at the position corresponding to that character and 0 everywhere else.
Phonetic  phonetic              
            
              
              
                 
                 
   
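A sketch of the first two input encodings; the character inventory and the phonetic feature values below are toy examples, not the ASJP character set or the feature table used in the thesis.

```python
import numpy as np

alphabet = ["a", "e", "i", "n", "t", "k"]          # toy character inventory
char_index = {c: i for i, c in enumerate(alphabet)}

def one_hot_encode(word):
    """One-hot: one vector of length n_characters per character."""
    mat = np.zeros((len(word), len(alphabet)))
    for pos, ch in enumerate(word):
        mat[pos, char_index[ch]] = 1.0
    return mat

# toy articulatory features: [consonant, voiced, nasal, front]
phonetic_features = {
    "a": [0, 1, 0, 0], "e": [0, 1, 0, 1], "i": [0, 1, 0, 1],
    "n": [1, 1, 1, 0], "t": [1, 0, 0, 0], "k": [1, 0, 0, 0],
}

def phonetic_encode(word):
    """Phonetic: one vector of feature values per character."""
    return np.array([phonetic_features[ch] for ch in word], dtype=float)

print(one_hot_encode("nakt"))
print(phonetic_encode("nakt"))
```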




 

               
                 


    · · ·
· · ·
· · ·
 · · ·
· · ·
· · ·
· · ·
              
 
Embedding               
 embedding  Word embeddings           
               
                  
                
            interchangeability   
                
              
           phonotactics    
              
                
                   
           phonetic   
         
           
              
                
                
                
                  
                
   
                 
           
             
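The surviving fragments suggest that the embedding encoding carries the idea of word embeddings over to characters, so that characters occurring in similar (interchangeable) contexts receive similar vectors. Below is a minimal sketch of that idea using a character co-occurrence matrix reduced with a truncated SVD; the thesis may well have learned its embeddings differently, for instance jointly with the network.

```python
import numpy as np

def char_embeddings(words, dim=4, window=1):
    """Character vectors from within-word co-occurrence counts, reduced
    with a truncated SVD (a stand-in for a proper embedding model)."""
    alphabet = sorted({ch for w in words for ch in w})
    idx = {c: i for i, c in enumerate(alphabet)}
    counts = np.zeros((len(alphabet), len(alphabet)))
    for w in words:
        for i, ch in enumerate(w):
            for j in range(max(0, i - window), min(len(w), i + window + 1)):
                if i != j:
                    counts[idx[ch], idx[w[j]]] += 1
    u, s, _ = np.linalg.svd(np.log1p(counts), full_matrices=False)
    return alphabet, u[:, :dim] * s[:dim]

alphabet, emb = char_embeddings(["nakt", "nacht", "water", "vas3r"])
for ch, vec in zip(alphabet, np.round(emb, 2)):
    print(ch, vec)
```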


[Table: example of the positional context features (START, LEFT, RIGHT) used as structured perceptron input.]
              
              
           
              
            
                
               
             
                  
nlr N           S s
               
                
                 
             
Target encoding
                
               
            
           
               
       
Input normalization
                  
                
                 
             
     
                
           
2.1.4 Experiments
                 
        
  
  
              
     
                
                
                  
  
Training
                 
                 
                
    
Evaluation
                 
               
                  
            
            
   
               
                 
                
   n             
                
  n     maximal cliques         
                   
  maximal cliques          n
               
          Levenshtein distance 
                 
               
                
                 
             
                
              
         source prediction    
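The fragments above mention evaluation based on the Levenshtein distance between predicted and gold target words. Below is a sketch of a length-normalized edit distance, which makes distances comparable across words of different lengths; the exact normalization used in the thesis is not recoverable from this extract.

```python
def levenshtein(a, b):
    """Standard dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def normalized_distance(prediction, target):
    return levenshtein(prediction, target) / max(len(prediction), len(target))

print(normalized_distance("vis", "fis"))  # one substitution over length 3 -> 0.33
```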
                
            
             
                
          
2.2 Applications
              
This section describes three applications of pairwise word prediction: phylogenetic tree reconstruction, sound correspondence identification and cognate detection.
2.2.1 Phylogenetic tree reconstruction
               
            
               
               UPGMA
     neighbor joining        
    LingPy     
              
            
               
         QDist     
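The fragments above name UPGMA and neighbor joining (as provided by LingPy) for building a tree from pairwise language distances, and QDist for comparing the result against a reference tree. A hedged sketch of the UPGMA step, here using scipy's average-linkage clustering on a hypothetical distance matrix rather than LingPy itself:

    # Hedged sketch: build a UPGMA-style tree from a symmetric matrix of
    # language distances (e.g. mean prediction error per language pair).
    # scipy's average-linkage clustering is used here; the thesis refers
    # to LingPy's UPGMA and neighbor-joining implementations instead.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, to_tree
    from scipy.spatial.distance import squareform

    languages = ["nld", "deu", "eng"]          # hypothetical example
    dist = np.array([[0.00, 0.35, 0.55],
                     [0.35, 0.00, 0.50],
                     [0.55, 0.50, 0.00]])      # hypothetical distances

    linkage_matrix = linkage(squareform(dist), method="average")  # UPGMA
    root = to_tree(linkage_matrix)

    def newick(node):
        # Convert the scipy tree to a simple Newick-like string with
        # language labels and merge heights.
        if node.is_leaf():
            return languages[node.id]
        return f"({newick(node.left)},{newick(node.right)}):{node.dist:.2f}"

    print(newick(root) + ";")

The resulting tree can then be compared to a reference tree with a quartet-based measure such as the one computed by QDist.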
2.2.2 Sound correspondence identification
                
              
                 
       internal     
     output          
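As a rough illustration of the general idea (not necessarily the analysis carried out in this thesis), frequent sound correspondences between two languages can be surfaced by tallying which segments are substituted for one another in aligned word pairs. The sketch below uses Python's difflib for a crude alignment; all example data is hypothetical.

    # Illustrative sketch: count substituted segments between source and
    # target words to surface candidate sound correspondences.
    from collections import Counter
    from difflib import SequenceMatcher

    def correspondence_counts(word_pairs):
        counts = Counter()
        for source, target in word_pairs:
            for op, i1, i2, j1, j2 in SequenceMatcher(None, source, target).get_opcodes():
                if op == "replace" and (i2 - i1) == (j2 - j1):
                    # Count position-wise substitutions within the replaced span.
                    for a, b in zip(source[i1:i2], target[j1:j2]):
                        counts[(a, b)] += 1
        return counts

    # Hypothetical Dutch-German pairs:
    pairs = [("water", "wasser"), ("appel", "apfel"), ("tijd", "zeit")]
    print(correspondence_counts(pairs).most_common(5))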
             
            
          
2.2.3 Cognate detection
Cognate detection             
               
                
               
               
              
               
                
                 
    per word           
    at UPGMA     link clustering   
  MCL       LingPy   
     θ= 0.7         θ= 0.8 
   
             
                 
    
             
               bcubed
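The fragments mention clustering the per-word distances with LingPy's flat UPGMA, link clustering and MCL (with thresholds around θ = 0.7 and θ = 0.8) and evaluating the resulting cognate clusters with B-cubed scores. A minimal sketch of B-cubed precision, recall and F-score on a hypothetical toy example:

    # Minimal sketch of B-cubed precision, recall and F-score for comparing
    # a predicted cognate clustering against gold cognate classes.  Both
    # arguments map each word id to a cluster label.
    def bcubed(gold, predicted):
        items = list(gold)
        precision = recall = 0.0
        for x in items:
            same_pred = [y for y in items if predicted[y] == predicted[x]]
            same_gold = [y for y in items if gold[y] == gold[x]]
            overlap = len([y for y in same_pred if gold[y] == gold[x]])
            precision += overlap / len(same_pred)
            recall += overlap / len(same_gold)
        precision /= len(items)
        recall /= len(items)
        f_score = 2 * precision * recall / (precision + recall)
        return precision, recall, f_score

    # Hypothetical toy example: four words, two gold cognate classes.
    gold = {"w1": "A", "w2": "A", "w3": "B", "w4": "B"}
    pred = {"w1": 1, "w2": 1, "w3": 1, "w4": 2}
    print(bcubed(gold, pred))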

2.3 Summary
           pairwise word prediction  
                
            
     cognacy prior loss         
    
            
     
           
           
      
           

Chapter 3
Results
3.1 Word prediction
               
           
                
                
            
          
     
  
v = 1.0 and v = 2.0
         
               
              
                
                   
  
             
            
               
               
                
             
              
                 
             
3.2 Phylogenetic tree reconstruction
              
             

[Table: example word predictions, showing the input word, the model's prediction and the target word.]
[Table: word prediction results per model and setting, including v = 1.0 and v = 2.0, with values such as 0.4374 and 0.3249.]
              
                 
         
               
      
              
              
               
               
              
                
         
              
               
                
             
                  
              
                 
        
3.3 Identication of sound correspondences
            
           
                
              
[Figure: reconstructed phylogenetic trees of the Slavic languages (bel, bul, ces, hrv, pol, rus, slk, slv, ukr), with topologies differing across the settings.]
[Table: tree comparison scores for the tested settings, including v = 1.0 and v = 2.0; several settings share the value 0.047619.]
              
                
                  
          
             
              

3.4 Cognate detection
               
           
               
              
MCL (θ = 0.7), link clustering (θ = 0.7) and flat UPGMA (θ = 0.8)
              
              
           
              
             
                   

          
            
                
        
[Table: B-cubed scores for cognate detection with MCL (θ = 0.7), link clustering (θ = 0.7) and flat UPGMA (θ = 0.8); values include 0.932077 and 0.929840.]
3.5 Summary
                
                 
            
    
                 
              
phylogenetic word prediction        
Chapter 4
Context vector analysis
               
           context vectors    
               
           
               

                 
            
                
        
4.1 Extraction of context vectors and input/target words
               
                  
                
n_hidden
             
                
len(word) × n_features
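As a concrete illustration of this extraction step (PyTorch is used purely for the sketch; the sizes and names are assumptions), the encoder can be run over a word and the hidden state after every segment collected into a len(word) × n_hidden matrix of context vectors:

    # Hedged sketch, assuming a GRU encoder: collect the encoder hidden
    # state after every input segment, giving a len(word) x n_hidden
    # matrix of context vectors per word.
    import torch
    import torch.nn as nn

    n_features = 16   # size of the per-segment input encoding (assumed)
    n_hidden = 32     # size of the encoder hidden state (assumed)

    encoder = nn.GRU(input_size=n_features, hidden_size=n_hidden, batch_first=True)

    def context_vectors(word_matrix):
        # word_matrix: tensor of shape (len(word), n_features)
        outputs, _ = encoder(word_matrix.unsqueeze(0))   # (1, len(word), n_hidden)
        return outputs.squeeze(0).detach()               # (len(word), n_hidden)

    word = torch.rand(5, n_features)                     # a 5-segment dummy word
    print(context_vectors(word).shape)                   # torch.Size([5, 32])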
      
              
               
             
4.2 PCA visualization
             
   principal components analysis          
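A minimal sketch of such a PCA visualization with scikit-learn and matplotlib, on placeholder data standing in for the extracted context vectors and their segment labels:

    # Sketch: project the context vectors onto two principal components
    # and plot them, coloured by the segment they belong to.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    vectors = rng.random((200, 32))                     # placeholder context vectors
    labels = rng.choice(list("aeiou"), size=200)        # placeholder segment labels

    projected = PCA(n_components=2).fit_transform(vectors)
    for char in sorted(set(labels)):
        mask = labels == char
        plt.scatter(projected[mask, 0], projected[mask, 1], label=char, s=10)
    plt.legend()
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.show()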
                
               
                  
    

[Figure: PCA visualization of the context vectors.]
             
              

           
            
          
              
                
                
              
4.3 Cluster analysis
               
                
               
             
     
4.3.1 Distance matrices
               
        cosine distance     
              
              
              
               scikit-learn
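For example, the distance-matrix step could look as follows with scikit-learn; aggregating the context vectors into one mean vector per segment label is an assumption made only for this sketch:

    # Sketch: pairwise cosine distances between mean context vectors per
    # segment label, using scikit-learn.  The data here is a placeholder.
    import numpy as np
    from sklearn.metrics.pairwise import cosine_distances

    rng = np.random.default_rng(0)
    vectors = rng.random((200, 32))                     # placeholder context vectors
    labels = rng.choice(list("aeiou"), size=200)        # placeholder segment labels

    unique_labels = sorted(set(labels))
    means = np.vstack([vectors[labels == lab].mean(axis=0) for lab in unique_labels])
    distance_matrix = cosine_distances(means)           # (n_labels, n_labels)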

              
  
4.3.2 Clustering
               
               
               
Flat UPGMA, MCL, Link clustering and Affinity propagation
   LingPy              
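The clustering step can be sketched, for instance, with scikit-learn's affinity propagation on the precomputed distances (negated so that they act as similarities), and two labelings can be compared with the adjusted Rand index; LingPy's flat UPGMA, link clustering and MCL would slot in at the same place. The data below is a placeholder.

    # Hedged sketch: cluster items from a precomputed distance matrix with
    # affinity propagation and compare two labelings with the adjusted
    # Rand index.
    import numpy as np
    from sklearn.cluster import AffinityPropagation
    from sklearn.metrics import adjusted_rand_score

    rng = np.random.default_rng(0)
    distance_matrix = rng.random((10, 10))
    distance_matrix = (distance_matrix + distance_matrix.T) / 2.0   # symmetrize
    np.fill_diagonal(distance_matrix, 0.0)

    # Affinity propagation expects similarities, so the distances are negated.
    labels_a = AffinityPropagation(affinity="precomputed",
                                   random_state=0).fit_predict(-distance_matrix)
    labels_b = rng.integers(0, 3, size=10)      # e.g. a reference clustering
    print(adjusted_rand_score(labels_a, labels_b))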
                
               
             
  
    
    
                
               
   
 θ        
           
          
          
         
          
           
          
        
          
        
           
          
          
               
           
              
                 
            
    
    
    
    
             
θ= 0.2              
 
                
           
         θ= 0.2   
          
               
                
               
                  
                 
                

                  
                 
          
4.4 Summary
                  
                
               
                
             
          
Chapter 5
Phylogenetic word prediction
5.1 Beyond pairwise word prediction
                 
                
             
             
             
 
  phylogenetic word prediction          
             
               
        protoforms       
  
               
               
                
          
5.2 Method
5.2.1 Network architecture
               
                 
                  
               
             
    feed-forward            
                
               
                 
              
            

   
 
              
                   
                 
         
               
              
             
      
    Recursive neural networks       
             
               
                  
               
             
                
                 
                  
            
             
5.2.2 Weight sharing and protoform inference
               
                
                  
              
                
               
 
                
               
     from         to
               
  
                
             
              
              
             
              
             
               
          
          protoforms   
                 
            
  
5.2.3 Implementation details
                 
               
   f(x) = max(0, x)          
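For concreteness, the rectified linear activation f(x) = max(0, x) and the idea of combining two child representations through such a layer can be sketched in numpy as follows; the layer sizes, the weights and the specific way of combining the children are assumptions made only for this illustration:

    # Sketch: ReLU activation and a single feed-forward layer that maps two
    # child hidden states to a shared "ancestor" representation.
    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)

    n_hidden = 32
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(n_hidden, 2 * n_hidden))   # hypothetical weights
    b = np.zeros(n_hidden)

    def ancestor_representation(child_a, child_b):
        # Concatenate the two child hidden states and apply one ReLU layer.
        return relu(W @ np.concatenate([child_a, child_b]) + b)

    h_nld, h_deu = rng.normal(size=n_hidden), rng.normal(size=n_hidden)
    print(ancestor_representation(h_nld, h_deu).shape)   # (32,)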
              
                
 
5.2.4 Training and prediction
              
              
               
          
             
               
 
5.2.5 Experiments
                
             
               
       ((nld, deu), eng)       
            ((nld, eng), deu)
((deu, eng), nld)            
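These three configurations can be written down directly as nested tuples, following the notation above. A small sketch that enumerates them and lists, for each topology, which pair of languages shares the lower ancestor:

    # Sketch: the three candidate rooted topologies for Dutch (nld),
    # German (deu) and English (eng), written as nested tuples.
    from itertools import combinations

    topologies = [(("nld", "deu"), "eng"),
                  (("nld", "eng"), "deu"),
                  (("deu", "eng"), "nld")]

    def leaves(node):
        # Collect the leaf labels of a nested-tuple tree.
        return [node] if isinstance(node, str) else [l for child in node for l in leaves(child)]

    def sister_pairs(tree):
        # Pairs of leaves grouped under the inner (non-root) ancestor.
        inner, _ = tree
        return list(combinations(leaves(inner), 2))

    for tree in topologies:
        print(tree, "->", sister_pairs(tree))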

5.3 Results
            
         ((nld, deu), eng)   
((nld, eng), deu) and ((deu, eng), nld)
        v= 1.0        
[Table: phylogenetic word prediction results for the three tree configurations; values include 0.5528, 0.5900, 0.6807, 0.5647, 0.6889 and 0.5806.]
             
          
              
      
  
  

[Table: further results, with values including 0.8945.]
  r         
          
            
                
 
                 
r         
              
              
               
              
              
                
             
    
           
           
             
               
            i33n   inslap3  
          
            
            
5.4 Discussion
              
              
                 
              
               
              
           
            
                  
                   
       
           
               
              
                
               
                
                
    
5.5 Summary
      phylogenetic word prediction task    
              
               
              
             
              
              
 
Chapter 6
Conclusion and discussion
               
      word prediction         
              
                
              
                
          
               phylogenetic word prediction
              
     
             
                deep
neural network as a model of sound correspondences in historical linguistics   
     cognacy prior loss          
                
                   
             
        embedding encoding     
             
             
visualize the patterns learned by a neural network by comparing clusterings
              new method to
infer cognate judgments        phylogenetic word prediction            protoform
reconstruction from a neural network
                
                    
                 
             
                
              
                
              
           

              
             
          
             
            
               
              toy data 
              
           
           
                
  
                
              
               
             
               
               
           
Acknowledgments
                
                
               
               
              

Chapter 7
Appendix

Bibliography
               
 Nature 
               
     Information retrieval 
            
            Medical image analysis

            
  Proceedings of the 17th international conference on Computational linguistics-Volume 1 
    
The Journal of the Acoustical Society of
America 
               
The annals of mathematical statistics
    Comparative Indo-European linguistics: an introduction   
            
  IJCNLP  
    Paern recognition and machine learning 
              
       Proceedings of the National Academy of Sciences

                 
               
  Science 
               
          STUF-Language Typology
and Universals Sprachtypologie und Universalienforschung 
                
   Symposium on Discrete Algorithms: Proceedings of the eleventh annual ACM-SIAM
symposium on Discrete algorithms    

       AMSTERDAM STUDIES IN THE THEORY AND
HISTORY OF LINGUISTIC SCIENCE SERIES 4  
   Historical linguistics: an introduction     
            
     Language 
               
          
arXiv preprint arXiv:1406.1078
         Procedia Computer Science 

   Indo-European linguistics: an introduction   
            
    Proceedings of the ACL-02 conference on Empirical methods in natural
language processing-Volume 10      
              
arXiv preprint arXiv:1406.1231
       Practical structured learning techniques for natural language process-
ing    
         
                
                 
                  
 
            Joint IAPR International
Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern
Recognition (SSPR)   
               
  Journal of Machine Learning Research 
        URL: hp://ielex. mpi. nl
The Routledge Handbook of Historical Linguistics. Routledge
 
The comparative method reviewed: regularity and irregularity in language
change   
          Studies in linguistic analysis
              Science 

              
Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
 
              Proceedings of the
Fourteenth International Conference on Artificial Intelligence and Statistics
           
    Neural Networks, 1996., IEEE International Conference on   
 
        Deep learning  
               
  Nature 
      
                 
    Proceedings of the National Academy of Sciences 
             
    arXiv preprint arXiv:1611.04798
                 
          Nature 
              
     
         Neural computation 
        Language History, Language Change, Language Relationship: An
Introduction to Historical and Interpretative Linguistics  
                 
           Current Biology

               
    Proceedings of the International Conference Recent Advances in Natural Language
Processing  
           Proceedings
of the National Academy of Sciences 
              
          Mayan 
             
   Speech & language processing   
          
 Proceedings of the 19th international conference on Computational linguistics-Volume 1  
   
 BIBLIOGRAPHY
              
    
               
  Computer speech & language 
             
  EMNLP  
              Soviet
physics doklady    
           EACL 2012 

            
              Bioinfor-
matics 
               
       Advances in neural information processing systems
 
             Pro-
ceedings of the 45th Annual Meeting of the ACL: Student Research Workshop   
  
                 
       Journal of Molecular Biology 
               e Compar-
ative Method Reviewed: Regularity and Irregularity in Language Change    

      Morphologische Untersuchungen auf dem Gebiete der indogerman-
ischen Sprachen    
             
 Philosophical Transactions of the Royal Society of London. Series A, containing papers of a
mathematical or physical character 
                 e London,
Edinburgh, and Dublin Philosophical Magazine and Journal of Science 
             
 Proceedings of the 2014 conference on Empirical Methods in Natural Language Processing (EMNLP)
 
              PLoS One 
            
arXiv preprint arXiv:1605.05172
        
BIBLIOGRAPHY 
            
   Psychological review 
               
  Nature 
              
  Molecular Biology and Evolution 
              
    Learning 
             
     Proceedings of the IEEE Conference on Computer Vision and
Paern Recognition  
               
       Proceedings of the NIPS-2010 Deep Learning and
Unsupervised Feature Learning Workshop  
              
University of Kansas Scientic Bulletin 
                
Advances in Neural Information Processing Systems  
   Historical linguistics 
         
             
 IEEE Transactions on Information eory 

... We were particularly interested in seeing what encoder-decoder models, standard in machine translation, could produce on the task. However, at the beginning of this work, in 2019, unpublished preliminary experiments as well as other works (Dekker 2018) seemed to conclude that using complex neural networks was likely not a good direction for cognate prediction, as they appeared to be outperformed by statistical models in our case, or by simplistic neural networks (perceptrons) in other works; at the same time, these models of interest were also being used successfully for proto-form reconstruction (Meloni et al. 2021, on arXiv in 2019). As we believed in the potential of those methods, we decided to investigate the situation more in depth, first by understanding whether using complex neural networks for cognate prediction was possible at all, then by using them on historical data, before trying to understand the latent information they plausibly learn. ...
... Of these, historical word prediction, or studying 'word' change through time, is the first axis of interest of this thesis. As seen above, many automatic methods can be applied to this task, but we are specifically interested in neural networks; this had barely been done before the beginning of this thesis (Beinborn et al. 2013; Dekker 2018), with, at the time, very mixed results. We aimed to challenge this, and to exhaustively study the applicability and usefulness of neural networks for this task. ...
... Padgett (1943). Character-level low-resource machine translation and cognate prediction present a number of similarities: both model sequence-to-sequence relations over structured data, and machine translation has been used for cognate prediction or proto-form reconstruction in recent years. Preliminary experiments (Fourrier 2020) and previous work (Dekker 2018) have shown that sequence-to-sequence models do not perform as well as simpler neural or non-neural methods on etymological cognates. This raises the question that started this PhD: ...
Thesis
In historical linguistics, cognates are words that descend in direct line from a common ancestor, called their proto-form, and therefore are representative of their respective languages' evolution through time, as well as of the relations between these languages synchronically. As they reflect the phonetic history of the languages they belong to, they allow linguists to better determine all manner of synchronic and diachronic linguistic relations (etymology, phylogeny, sound correspondences). Cognates of related languages tend to be linked through systematic phonetic correspondence patterns, which neural networks could well learn to model, being especially good at learning latent patterns. In this dissertation, we seek to methodically study the applicability of machine-translation-inspired neural networks to historical word prediction, relying on the surface similarity of both tasks. We first create an artificial dataset inspired by the phonetic and phonotactic rules of Romance languages, which allows us to vary task complexity and data size in a controlled environment, and therefore to identify whether and under which conditions neural networks are applicable. We then extend our work to real datasets (after having updated an etymological database to gather a sufficient amount of data), study the transferability of our conclusions to real data, and then the applicability of a number of data augmentation techniques to the task, to try to mitigate low-resource situations. We finally investigate our best models, multilingual neural networks, in more detail. We first confirm that, on the surface, they seem to capture language relatedness information and phonetic similarity, confirming prior work. We then discover, by probing them, that the information they store is actually more complex: our multilingual models encode a phonetic language model, and learn enough latent historical information to allow decoders to reconstruct the (unseen) proto-form of the studied languages as well as or better than bilingual models trained specifically on the task. This latent information is likely the explanation for the success of multilingual methods in previous works.
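As a rough illustration of the probing methodology this abstract alludes to, the sketch below trains a linear classifier on frozen encoder representations to test how much language-identity information they carry. It is a minimal sketch under assumed inputs: the file names, array shapes and the use of scikit-learn are placeholders, not the author's actual setup.

# Linear probe on frozen encoder states (illustrative only; file names and shapes are placeholders).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical inputs: one pooled encoder vector per word, plus the language each word came from.
encoder_states = np.load("encoder_states.npy")   # shape: (n_words, hidden_dim)
languages = np.load("languages.npy")             # shape: (n_words,), integer language ids

X_train, X_test, y_train, y_test = train_test_split(
    encoder_states, languages, test_size=0.2, random_state=0)

probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print("language-identity probe accuracy:", probe.score(X_test, y_test))

A probe scoring well above chance suggests the property is linearly recoverable from the representations; it does not by itself show that the decoder actually uses it.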
... While many works proposed computational approaches to perceive historical sound changes (Mielke, 2008; Dekker, 2018; Boldsen and Paggio, 2022) and others suggested using the listener-driven model to perceive ongoing sound changes (Janson, 1983; Sanker, 2018a; Quam and Creel, 2021), few works explored the intersection of both computational and human perception. The benefits of doing so can be substantial. ...
... Each transcription represents a sequence of phonemes. Translating a sequence of phonemes from one language to another can be framed as a machine translation task, as both execute a cross-lingual sequence-to-sequence task (Dekker, 2018; Fourrier and Sagot, 2020a). ...
... This paper is based on the first author's unpublished MSc thesis (Dekker 2018). ...
Article
Full-text available
In this paper, we investigate how the prediction paradigm from machine learning and Natural Language Processing (NLP) can be put to use in computational historical linguistics. We propose word prediction as an intermediate task, where the forms of unseen words in some target language are predicted from the forms of the corresponding words in a source language. Word prediction allows us to develop algorithms for phylogenetic tree reconstruction, sound correspondence identification and cognate detection, in ways close to attested methods for linguistic reconstruction. We will discuss different factors, such as data representation and the choice of machine learning model, that have to be taken into account when applying prediction methods in historical linguistics. We present our own implementations and evaluate them on different tasks in historical linguistics.
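To make the task concrete, here is a minimal sketch of word prediction framed as character-level sequence-to-sequence learning with an RNN encoder-decoder, one of the model families considered. The alphabet, layer sizes, single Dutch-German cognate pair and training loop are toy placeholders, not the implementation evaluated in the paper.

# Toy character-level encoder-decoder for word prediction (source word -> corresponding target word).
import torch
import torch.nn as nn

PAD, SOS, EOS = 0, 1, 2
chars = {c: i + 3 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}
V = len(chars) + 3

def encode(word):
    return [chars[c] for c in word] + [EOS]

class WordPredictor(nn.Module):
    def __init__(self, vocab=V, emb=32, hid=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb, padding_idx=PAD)
        self.encoder = nn.GRU(emb, hid, batch_first=True)
        self.decoder = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, src, tgt_in):
        _, h = self.encoder(self.emb(src))        # final hidden state summarises the source word
        dec_out, _ = self.decoder(self.emb(tgt_in), h)
        return self.out(dec_out)                  # one distribution over characters per position

# One real Dutch-German cognate pair ("water" / "wasser"), just to show the shapes involved.
src = torch.tensor([encode("water")])
tgt = torch.tensor([encode("wasser")])
tgt_in = torch.cat([torch.tensor([[SOS]]), tgt[:, :-1]], dim=1)   # teacher-forcing input

model = WordPredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss(ignore_index=PAD)

for step in range(100):                           # toy training loop on a single pair
    logits = model(src, tgt_in)
    loss = loss_fn(logits.reshape(-1, V), tgt.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

At test time an unseen source word would be encoded the same way, and the target word decoded character by character starting from the SOS symbol (for example greedily, taking the argmax at each step).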
Article
Full-text available
Significance: Do different aspects of language evolve in different ways? Here, we infer the rates of change in lexical and grammatical data from 81 languages of the Pacific. We show that, in general, grammatical features tend to change faster and have higher amounts of conflicting signal than basic vocabulary. We suggest that subsystems of language show differing patterns of dynamics and propose that modeling this rate variation may allow us to extract more signal, and thus trace language history deeper than has been previously possible.
Article
Full-text available
In this paper, we present our first attempts at building a multilingual Neural Machine Translation framework under a unified approach. We are then able to employ attention-based NMT for many-to-many multilingual translation tasks. Our approach does not require any special treatment of the network architecture and allows us to learn a minimal number of free parameters with a standard training procedure. Our approach has shown its effectiveness in an under-resourced translation scenario, with considerable improvements of up to 2.6 BLEU points. In addition, the approach has achieved interesting and promising results when applied to translation tasks for which there is no direct parallel corpus between the source and target languages.
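The abstract above does not spell out the exact mechanism, so the snippet below only illustrates one common recipe for many-to-many multilingual translation without any architectural change: every source sequence is tagged with the desired target language, so that a single shared model can be trained on all language pairs at once. The tag format and the toy sentence pairs are invented for the example.

# Sharing one translation model across language pairs by prefixing a target-language tag.
# This tagging scheme is a generic illustration, not necessarily the mechanism used in the paper.
def make_multilingual_example(src_tokens, tgt_tokens, tgt_lang):
    return ["<2{}>".format(tgt_lang)] + src_tokens, tgt_tokens

pairs = [
    (["la", "maison"], ["the", "house"], "en"),   # fr -> en
    (["the", "house"], ["das", "Haus"], "de"),    # en -> de
]
corpus = [make_multilingual_example(s, t, lang) for s, t, lang in pairs]
for src, tgt in corpus:
    print(src, "->", tgt)

Because all directions share parameters, such a model can sometimes translate between language pairs for which it never saw a direct parallel corpus, which is the zero-resource setting the abstract alludes to.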
Article
Full-text available
We propose a sequence labeling approach to cognate production based on the orthography of the words. Our approach leverages the idea that orthographic changes represent sound correspondences to a fairly large extent. Given an input word in language L1, we seek to determine its cognate pair in language L2. To this end, we employ a sequential model which captures the intuition that orthographic changes are highly dependent on the context in which they occur. We apply our method to two pairs of languages. Finally, we investigate how second-language learners perceive the orthographic changes from their mother tongue to the language they learn.
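A minimal sketch of the general idea of casting cognate production as sequence labeling, under the assumption that character-aligned training pairs are already available: each source character, seen with its neighbours, is labeled with the substring it becomes in the target language. The tiny Latin-to-Italian alignments, window features and scikit-learn classifier below are illustrative stand-ins, not the authors' actual model.

# Cognate production as per-character labeling (toy, hand-aligned Latin -> Italian data).
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each item: a source word and, for every character, the substring it maps to in the target word.
aligned_pairs = [
    ("nocte", ["n", "o", "t", "t", "e"]),   # Latin nocte(m) -> Italian notte
    ("lacte", ["l", "a", "t", "t", "e"]),   # Latin lacte(m) -> Italian latte
    ("octo",  ["o", "t", "t", "o"]),        # Latin octo     -> Italian otto
]

def char_features(word, i):
    return {
        "char": word[i],
        "prev": word[i - 1] if i > 0 else "<s>",
        "next": word[i + 1] if i < len(word) - 1 else "</s>",
    }

X, y = [], []
for word, labels in aligned_pairs:
    for i, label in enumerate(labels):
        X.append(char_features(word, i))
        y.append(label)

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X, y)

def produce(word):
    return "".join(model.predict([char_features(word, i) for i in range(len(word))]))

print(produce("nocte"))   # on this toy data the model should reproduce "notte"

In this toy data the 'c' -> 't' change happens before 't', mirroring the real Latin -ct- > Italian -tt- correspondence; context features of this kind are what would let a model keep such changes conditional on their environment.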
Conference Paper
Full-text available
Cognates are words in different languages that are associated with each other by language learners. Thus, cognates are important indicators for the prediction of the perceived difficulty of a text. We introduce a method for automatic cognate production using character-based machine translation. We show that our approach is able to learn production patterns from noisy training data and that it works for a wide range of language pairs. It even works across different alphabets, e.g. we obtain good results on the tested language pairs English-Russian, English-Greek, and English-Farsi. Our method performs significantly better than similarity measures used in previous work on cognates.
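The central preprocessing trick behind character-based machine translation is to present each word as a sequence of character tokens, so that an off-the-shelf translation model learns character-level correspondences. The sketch below only shows that data-preparation step on two real English-German cognate pairs; the '|||' separator is a common plain-text convention for parallel data, not necessarily the authors' format.

# Turn word pairs into character-tokenized parallel data for a standard translation system.
def to_char_tokens(word):
    return " ".join(word)

cognate_pairs = [("night", "Nacht"), ("water", "Wasser")]   # real English-German cognates
for english, german in cognate_pairs:
    print(to_char_tokens(english), "|||", to_char_tokens(german))
# n i g h t ||| N a c h t
# w a t e r ||| W a s s e r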
Article
Full-text available
In this paper, we explore the use of convolutional networks (ConvNets) for the purpose of cognate identification. We compare our architecture with binary classifiers based on string similarity measures on different language families. Our experiments show that convolutional networks achieve competitive results across concepts and across language families at the task of cognate identification.
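For contrast with the ConvNet approach, here is a sketch of the kind of string-similarity baseline such classifiers are typically compared against: a normalized similarity score plus a hard decision threshold. The specific measure (difflib's matching-block ratio) and the 0.6 threshold are arbitrary illustrative choices, not the measures evaluated in the paper.

# A thresholded string-similarity baseline for cognate identification (illustrative only).
from difflib import SequenceMatcher

def normalized_similarity(a, b):
    # Proportion of matching characters; 1.0 means identical strings.
    return SequenceMatcher(None, a, b).ratio()

def is_cognate(a, b, threshold=0.6):
    return normalized_similarity(a, b) >= threshold

print(is_cognate("nacht", "noche"))   # True: enough characters match
print(is_cognate("hand", "mano"))     # False: same meaning, but little surface similarity

Such baselines work well when cognates remain orthographically close, but can miss pairs whose relatedness only shows through systematic sound correspondences, which is where learned models are expected to help.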
Article
Full-text available
We propose the first implementation of an infinite-order generative dependency model. The model is based on a new recursive neural network architecture, the Inside-Outside Recursive Neural Network. This architecture allows information to flow not only bottom-up, as in traditional recursive neural networks, but also top-down. This is achieved by computing content as well as context representations for any constituent, and letting these representations interact. Experimental results on the English section of the Universal Dependency Treebank show that the infinite-order model achieves a perplexity seven times lower than the traditional third-order model using counting, and tends to choose more accurate parses in k-best lists. In addition, reranking with this model achieves state-of-the-art unlabelled attachment scores and unlabelled exact match scores.
Article
Non-linguists are not always able to distinguish easily between genetic groupings established by the comparative method and those proposed on other grounds. A much-publicized recent example is Cavalli-Sforza et al. (1988), where the very deep macro groupings the authors assume for languages of the Pacific and New World are treated as the same kind of grouping as Indo-European or Uralic. Another example is Renfrew 1991, where, following Illič-Svityč, Nostratic is described as based on the same comparative method as Indo-European (1991: 6). More tangential to linguistic questions, Stoneking and Cann (1989), analyzing mitochondrial DNA divergences in human populations, estimate the rate of evolution by assuming that Papua New Guinea, Australia, and North America each constitutes a single population of a single age. This amounts to an implicit assumption that each of these areas was colonized only once, an assumption compatible with the linguistic macro groupings of Indo-Pacific, Australian, and Amerind. In these and other works, non-linguists working on human prehistory assume or implicitly assume that macro groupings are the same kind of entity as families like Indo-European, that they are established by the same comparative method as Indo-European was, and that the received view in linguistics takes them as established or plausible. Works like these, or the macro groupings they assume, are cited with approval in popular works such as Turner 1988 and Gould 1991.
Conference Paper
In this paper, we present phoneme-level Siamese convolutional networks for the task of pair-wise cognate identification. We represent a word as a two-dimensional matrix and employ a Siamese convolutional network for learning deep representations. We present Siamese architectures that jointly learn phoneme-level feature representations and language relatedness from raw words for cognate identification. Compared to previous works, we train and test on larger and more realistic datasets, and show that Siamese architectures consistently perform better than the traditional linear classifier approach.
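A minimal sketch of the Siamese idea described above, assuming words are padded symbol sequences embedded into a positions-by-features matrix and scored by the distance between the two towers' outputs. The alphabet, layer sizes and pooling are illustrative; the paper itself works on phoneme-level representations and trains on labeled cognate data, none of which is reproduced here.

# Siamese convolutional cognate scorer: two words, one shared ConvNet tower, one distance.
import torch
import torch.nn as nn
import torch.nn.functional as F

MAX_LEN, EMB, ALPHABET = 10, 16, "abcdefghijklmnopqrstuvwxyz"
char2id = {c: i + 1 for i, c in enumerate(ALPHABET)}   # 0 is reserved for padding

def word_to_ids(word):
    ids = [char2id[c] for c in word[:MAX_LEN]]
    return torch.tensor(ids + [0] * (MAX_LEN - len(ids)))

class Tower(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(len(ALPHABET) + 1, EMB, padding_idx=0)
        self.conv = nn.Conv1d(EMB, 32, kernel_size=3, padding=1)

    def forward(self, ids):
        x = self.emb(ids).transpose(1, 2)       # (batch, EMB, MAX_LEN): the word as a 2D matrix
        x = F.relu(self.conv(x))
        return x.max(dim=2).values              # max-pool over positions -> one vector per word

class SiameseCognateScorer(nn.Module):
    def __init__(self):
        super().__init__()
        self.tower = Tower()                    # a single shared tower: the "Siamese" part

    def forward(self, ids_a, ids_b):
        return F.pairwise_distance(self.tower(ids_a), self.tower(ids_b))   # small = more cognate-like

model = SiameseCognateScorer()
a = word_to_ids("nacht").unsqueeze(0)
b = word_to_ids("noche").unsqueeze(0)
print(model(a, b))                              # untrained distance, shown only for the shapes

Training would then push this distance down for cognate pairs and up for non-cognates, for instance with a contrastive or margin-based loss.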
Book
This accessible, hands-on introduction to historical linguistics - the study of language change - does not just talk about topics. With abundant examples and exercises, it helps students learn for themselves how to do historical linguistics. Distinctive to the book is its integration of the standard traditional topics with others now considered vital to historical linguistics: explanation of 'why' languages change; sociolinguistic aspects of linguistic change; syntactic change and grammaticalization; distant genetic relationships (how to show that languages are related); areal linguistics; and linguistic prehistory. Examples come from a wide range of languages. Those from the history of more familiar languages such as English, French, German and Spanish make the concepts they illustrate more accessible, while others from numerous non-Indo-European languages help to demonstrate the depth and richness of the concepts and methods they illustrate. With its lucid and engaging style, expert guidance and comprehensive coverage, this book is not only an invaluable textbook for students coming to the subject for the first time, but also an entertaining and engaging read for specialists in the field. Key features:
* Practical hands-on approach including numerous student exercises
* Wide range of languages and examples
* Accessible writing style aimed at students
* Comprehensive and insightful coverage of essential topics
Article
Segmentation of the left ventricle (LV) from cardiac magnetic resonance imaging (MRI) datasets is an essential step for the calculation of clinical indices such as ventricular volume and ejection fraction. In this work, we employ deep learning algorithms combined with deformable models to develop and evaluate a fully automatic segmentation tool for the LV from short-axis cardiac MRI datasets. The method employs deep learning algorithms to learn the segmentation task from ground truth data. Convolutional networks are employed to automatically detect the LV chamber in the MRI dataset. Stacked autoencoders are utilized to infer the shape of the LV. The inferred shape is incorporated into deformable models to improve the accuracy and robustness of the segmentation. We validated our method using 45 cardiac MR datasets taken from the MICCAI 2009 LV segmentation challenge and showed that it outperforms the state-of-the-art methods. Excellent agreement with the ground truth was achieved. Validation metrics (percentage of good contours, Dice metric, average perpendicular distance and conformity) were computed as 96.69%, 0.94, 1.81 mm and 0.86, versus 79.2%-95.62%, 0.87-0.9, 1.76-2.97 mm and 0.67-0.78 obtained by other methods, respectively.
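For reference, the Dice metric reported in this abstract is a standard overlap measure between a predicted and a ground-truth binary mask; a minimal version, with made-up toy masks:

# Dice coefficient between two binary segmentation masks (toy example values).
import numpy as np

def dice(mask_a, mask_b):
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    return 2.0 * intersection / (a.sum() + b.sum())

predicted = np.array([[0, 1, 1], [0, 1, 0]])
ground_truth = np.array([[0, 1, 1], [1, 1, 0]])
print(dice(predicted, ground_truth))   # 2 * 3 / (3 + 4) = 0.857...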