All content in this area was uploaded by Peter Dekker on Feb 02, 2018
Reconstructing language ancestry by performing word prediction with neural networks
Supervisors:
Assessor:
Contents
1 Introduction
2 Method
3 Results
4 Context vector analysis
5 Phylogenetic word prediction
6 Conclusion and discussion
7 Appendix
Chapter 1
Introduction
historical linguistics
machine learning
historical linguistics
natural language processing
word prediction
1.1 Historical linguistics
1.1.1 Historical linguistics: the comparative method and beyond
cognates
protolanguage protosounds
protoforms
B
A B A
mass lexical comparison
1.1.2 Sound changes
regular
Neogrammarian hypothesis of the regularity of sound change
phonemic changes, loss of segments, insertion/movement of segments
Phonemic changes
Phonemic changes
mergers splits merger
split
assimilation vowel changes
Assimilation (regular)
nokte notte /k/ /t/
/t/
Umlaut
gast gestiz /a/ /e/
/i/ /e/ /i/
Gast Gäste
Lenition (regular)
voicing voiceless voiced degemination geminate
simplex nasalization non-nasal nasal
strata strada /t/
/d/ gutta
gota
regāle real
Vowel changes (regular)
lenition
lowering, fronting, rounding dut
dʏt /u/ /ʏ/ Coalescence
Compensatory lengthening
bɛst bɛ:t
Loss of segments
Loss (regular) Aphaeresis
k knee apocope
syncope chocolate
Haplology (sporadic) haplology
sagar ardo
sagardo ar
Insertion/movement of segments
Insertion (regular)
prothesis scala escala
epenthesis poclum poculum
excrescence amonges amongst
Metathesis (sporadic) Metathesis
wæps wasp parabola palabra
1.1.3 Computational methods in historical linguistics
genotypic
phenotypic Genotypic
regular sound correspondences Phenotypic
Cognate detection
cognate detection
n
Sound correspondence detection
Sound correspondence detection
Protoform reconstruction
protoform reconstruction
functional load hypothesis
Phylogenetic tree reconstruction
maximum parsimony likelihood-based
Distance-based methods
molecular clock
Q
greedy minimum evolution
Character-based methods
maximum parsimony methods likelihood-based methods
long branch attraction
likelihood
1.2 Developments in natural language processing
natural language processing
1.2.1 Natural language processing
natural language processing
1.2.2 Machine learning and language
Machine learning
training examples (x, y)
test examples x y
y∗
1.2.3 Deep neural networks
Neural networks
deep
learning
representation learning feature
engineering
encoder-decoder
1.3 Word prediction
1.3.1 Word prediction
word prediction
pairwise word prediction
phylogenetic word prediction
reconstruction of phylogenetic
trees
sound correspondences
cognate detection
1.3.2 Model desiderata
$w_{d,B}$
$B$ $w_{d,A}$ $A$ $w_{d,B}$
$w_{d,B}$
semantic shift
c A B
c
d d c
cross-concept cognate pairs
1.4 Summary
natural language processing
pairwise word prediction
pairwise word prediction
phylogenetic
word prediction
Chapter 2
Method
pairwise word prediction
Phylogenetic word prediction
2.1 Pairwise word prediction
2.1.1 Task
$(w_{c,A}, w_{c,B})$
$c$ $A$ $B$
$d$ $w_{d,B}$ $w_{d,A}$
A B C
2.1.2 Models
RNN encoder-decoder structured perceptron
RNN encoder-decoder
encoder
decoder
Xavier initialization
$\mathcal{N}\left(0, \frac{1}{n_{incoming}}\right)$
$n_{incoming}$
softmax
categorical cross-entropy
softmax L2 regularization term
Adagrad
Lasagne
Cognacy prior
cognacy prior
$L_{new}$ $L_{CE}$
$CP$
$$L_{new} = L_{CE}(t, p) \cdot CP(t, p)$$
$$CP(t, p) = \frac{1}{1 + e^{L_{CE}(t, p) - \theta}}$$
$$\theta = L_{CE\,history} + v\sigma$$
$L_{new}$
$L_{CE}(t, p)$ $t$ $p$
$CP(t, p)$ $t$ $p$
$\theta$
$L_{CE\,history}$
$v$
$L_{CE}(t, p)$
$t$
$p$
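The cognacy prior loss formulas can be illustrated with a small numeric sketch. It is not the thesis implementation: it assumes that $L_{CE\,history}$ stands for the mean of previously observed cross-entropy losses and that $\sigma$ is their standard deviation, and the function names `cognacy_prior`, `threshold` and `cognacy_prior_loss` are hypothetical.

```python
import math
from statistics import mean, pstdev


def cognacy_prior(loss_ce, theta):
    """CP(t, p) = 1 / (1 + exp(L_CE(t, p) - theta))."""
    return 1.0 / (1.0 + math.exp(loss_ce - theta))


def threshold(loss_history, v):
    """theta = L_CE_history + v * sigma.

    Assumption: L_CE_history is the mean of the past losses and sigma is
    their (population) standard deviation.
    """
    return mean(loss_history) + v * pstdev(loss_history)


def cognacy_prior_loss(loss_ce, loss_history, v=1.0):
    """L_new = L_CE(t, p) * CP(t, p)."""
    theta = threshold(loss_history, v)
    return loss_ce * cognacy_prior(loss_ce, theta)


# Toy loss history (made-up numbers, for illustration only).
history = [1.0, 2.0, 3.0]
theta = threshold(history, v=1.0)

# A pair with a loss far above the threshold is damped much more strongly
# than a pair with a loss below the threshold.
low = cognacy_prior_loss(0.5, history, v=1.0)
high = cognacy_prior_loss(10.0, history, v=1.0)
```

Under these assumptions the prior equals 0.5 exactly at the threshold and approaches 0 for very large losses, so word pairs that the model predicts poorly contribute little to the total loss.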
Structured perceptron
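structured perceptron perceptron
Algorithm $I$ $N$
$x_n$ $\hat{y}_n$
$w$
$$\hat{y}_n = \arg\max_{y \in Y} w^T \phi(x_n, y)$$
$w^T \phi(x_n, y)$ $\phi$
$w$ argmax $y_n$
$\hat{y}_n$ argmax
$\hat{y}_n$
$\hat{y}_n$ $y_n$
$$w \leftarrow w + \phi(x_n, y_n) - \phi(x_n, \hat{y}_n)$$
$I$ $w$ averaged structured perceptron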
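The prediction and update rules of the structured perceptron can be sketched on toy data. This is a simplified illustration, not the implementation used in the thesis: the feature map `phi` (counts of aligned input/output symbol pairs) is hypothetical, and the argmax enumerates a small explicit candidate set instead of using dynamic programming over sequences.

```python
from collections import Counter
from itertools import product


def phi(x, y):
    """Hypothetical feature map: counts of aligned (input, output) symbols."""
    return Counter(zip(x, y))


def score(w, x, y):
    """Dot product w^T phi(x, y) for sparse weights stored in a Counter."""
    return sum(w[f] * c for f, c in phi(x, y).items())


def predict(w, x, candidates):
    """y_hat = argmax over the candidate outputs of w^T phi(x, y)."""
    return max(candidates, key=lambda y: score(w, x, y))


def train(examples, candidates, iterations=5):
    """Averaged structured perceptron over I iterations and N examples."""
    w = Counter()
    w_sum = Counter()
    steps = 0
    for _ in range(iterations):
        for x, y in examples:
            y_hat = predict(w, x, candidates)
            if y_hat != y:
                # w <- w + phi(x_n, y_n) - phi(x_n, y_hat_n)
                w.update(phi(x, y))
                w.subtract(phi(x, y_hat))
            w_sum.update(w)  # accumulate weights for averaging
            steps += 1
    return Counter({f: total / steps for f, total in w_sum.items()})


# Toy data: map "a" to "x" and "b" to "y", position by position.
examples = [("ab", "xy"), ("ba", "yx")]
candidates = ["".join(c) for c in product("xy", repeat=2)]
w_avg = train(examples, candidates)
```

Averaging the weight vector over all steps, rather than returning the final one, is what distinguishes the averaged variant; on this toy data both learned mappings are recovered.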
Application
seqlearn
[Figure residue: network diagram; surviving labels: h_{t0} h_{t1} h_{t2} h_{t3} (twice), T_{t0} T_{t1} T_{t2} T_{t3}, p(cog), E(t, p), θ]
averaged structured perceptron algorithm
2.1.3 Data
Data set
ASJPcode
Input encoding
one-hot, phonetic, embedding
embedding
One-hot one-hot encoding $n_{characters}$
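As a minimal sketch of one-hot input encoding, each character of a word becomes a vector of length $n_{characters}$ with a single 1 at the position of that character. The three-symbol alphabet and the function name `one_hot_encode` are assumptions made for illustration; they are not the character inventory or code used in the thesis.

```python
def one_hot_encode(word, alphabet):
    """Return one row per character, each of length len(alphabet)."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    matrix = []
    for ch in word:
        row = [0] * len(alphabet)
        row[index[ch]] = 1  # mark the position of this character
        matrix.append(row)
    return matrix


# Hypothetical alphabet of three symbols.
encoded = one_hot_encode("ab", "abc")
```

The result is a matrix of shape len(word) by n_characters in which every row sums to 1.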
Phonetic phonetic
Embedding
embedding Word embeddings
interchangeability
phonotactics
phonetic
[Table residue; surviving column labels: START, iLEFT, SLEFT, pRIGHT, ...]
nlr N S s
Target encoding
Input normalization
2.1.4 Experiments
Training
Evaluation
n
n maximal cliques
maximal cliques n
Levenshtein distance
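For reference, the Levenshtein distance between a predicted word and a target word can be computed with the standard dynamic programming recurrence. The sketch below is a generic implementation, not the evaluation code of the thesis; whether and how the thesis normalizes the distance is not recoverable here, so no normalization is applied.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions
    needed to turn sequence a into sequence b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


distance = levenshtein("kitten", "sitting")
```

Only two rows of the edit matrix are kept in memory, which is sufficient when only the distance and not the alignment is needed.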
source prediction
2.2 Applications
phylogenetic tree reconstruction, sound correspondence identification, cognate detection
2.2.1 Phylogenetic tree reconstruction
UPGMA
neighbor joining
LingPy
QDist
2.2.2 Sound correspondence identification
internal
output
2.2.3 Cognate detection
Cognate detection
per word
flat UPGMA link clustering
MCL LingPy
θ= 0.7 θ= 0.8
bcubed
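B-cubed scores compare a predicted clustering with a gold clustering item by item. The following sketch uses the standard definition of B-cubed precision, recall and F-score; the function name `bcubed` and the toy cluster labels are illustrative assumptions and do not reproduce the evaluation code or data of the thesis.

```python
def bcubed(gold, pred):
    """Return (precision, recall, f_score) for two clusterings.

    gold and pred map every item to a cluster label.
    """
    items = list(gold)

    def average_overlap(a, b):
        # For each item: share of its cluster in a that is also in its
        # cluster in b, averaged over all items.
        total = 0.0
        for i in items:
            same_a = [j for j in items if a[j] == a[i]]
            both = [j for j in same_a if b[j] == b[i]]
            total += len(both) / len(same_a)
        return total / len(items)

    precision = average_overlap(pred, gold)
    recall = average_overlap(gold, pred)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy example: item "b" is placed in the wrong cluster.
gold = {"a": 1, "b": 1, "c": 2}
pred = {"a": 1, "b": 2, "c": 2}
precision, recall, f_score = bcubed(gold, pred)
```

Because every item always shares a cluster with itself, precision and recall are strictly positive, so the F-score is always defined.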
2.3 Summary
pairwise word prediction
cognacy prior loss
Chapter 3
Results
3.1 Word prediction
v = 1.0, v = 2.0
3.2 Phylogenetic tree reconstruction
Prediction
Input target
[Table residue: rows labelled v = 1.0 (three times) and v = 2.0 (three times); surviving values: 0.4374, 0.3249]
3.3 Identification of sound correspondences
[Figure residue: leaf labels of four trees, one tree per line, in order of appearance]
bul slv hrv rus bel ukr pol ces slk
bel ukr rus slv hrv bul pol ces slk
bul ces slk slv hrv pol rus bel ukr
bel rus ukr hrv slv bul ces slk pol
[Table residue: rows labelled v = 1.0 and v = 2.0; every surviving cell value is 0.047619]
3.4 Cognate detection
MCL (θ = 0.7), Link clustering (θ = 0.7), Flat UPGMA (θ = 0.8)
[Table residue: columns θ = 0.7, θ = 0.7, θ = 0.8; surviving values: 0.932077, 0.929840]
3.5 Summary
phylogenetic word prediction
Chapter 4
Context vector analysis
context vectors
4.1 Extraction of context vectors and input/target words
$n_{hidden}$
len(word) × $n_{features}$
4.2 PCA visualization
principal components analysis
4.3 Cluster analysis
4.3.1 Distance matrices
cosine distance
scikit-learn
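A pairwise cosine distance matrix over vectors can be sketched in plain Python as follows. The toy vectors are made up for illustration and are not context vectors from the thesis; the function names are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cos(u, v); raises ZeroDivisionError for an all-zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(vectors):
    """Symmetric matrix of pairwise cosine distances."""
    return [[cosine_distance(u, v) for v in vectors] for u in vectors]


# Toy vectors: the first two point in the same direction.
vectors = [[1.0, 2.0], [2.0, 4.0], [0.0, 1.0]]
matrix = distance_matrix(vectors)
```

In scikit-learn the same matrix can be obtained with `sklearn.metrics.pairwise.cosine_distances`; the hand-written version only makes the computation explicit.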
4.3.2 Clustering
Flat UPGMA, MCL, Link clustering, Affinity propagation
LingPy
θ
θ= 0.2
θ= 0.2
4.4 Summary
Chapter 5
Phylogenetic word prediction
5.1 Beyond pairwise word prediction
phylogenetic word prediction
protoforms
5.2 Method
5.2.1 Network architecture
feed-forward
Recursive neural networks
5.2.2 Weight sharing and protoform inference
from to
protoforms
5.2.3 Implementation details
f(x) = max(0, x)
5.2.4 Training and prediction
5.2.5 Experiments
((nld, deu), eng)
((nld, eng), deu)
((deu, eng), nld)
5.3 Results
((nld, deu), eng)
((nld, eng), deu)
((deu, eng), nld)
v= 1.0
0.5528
0.5900 0.6807 0.5647
0.6889 0.5806
0.8945
r
r
i33n inslap3
5.4 Discussion
5.5 Summary
phylogenetic word prediction task
Chapter 6
Conclusion and discussion
word prediction
phylogenetic word prediction
deep
neural network as a model of sound correspondences in historical linguistics
cognacy prior loss
embedding encoding
visualize learned patterns by a neural network by comparing clusterings
new method to infer cognate judgments
phylogenetic word prediction
protoform reconstruction from a neural network
toy data
Acknowledgments
Chapter 7
Appendix