Archived project

MSc thesis: Reconstructing language ancestry by performing word prediction with neural networks

Goal: In this thesis, I apply machine learning methods to historical linguistics. By predicting words between languages, several tasks in historical linguistics can be performed, such as phylogenetic tree reconstruction, sound correspondence identification and cognate detection.

Methods: machine learning, phylogenetic tree construction, cognate detection, deep learning, recurrent neural networks


Project log

Peter Dekker
added an update
The source code of my MSc thesis "Reconstructing language ancestry by performing word prediction with neural networks", including my implementation of an encoder-decoder RNN for historical linguistics, can now be found on Bitbucket: https://bitbucket.org/pdekker/wordprediction/
 
Peter Dekker
added a research item
In this presentation, I show how machine learning can be applied to historical linguistics. The presentation describes the progress of my thesis up to March 2017.
Peter Dekker
added 2 research items
In recent years, computational methods have led to new discoveries in the field of historical linguistics. In my thesis, I applied the machine learning paradigm, successful in many computing tasks, to historical linguistics. I proposed the task of word prediction: a machine learning model trained on pairs of words in two languages learns the sound correspondences between those languages and should be able to predict unseen words. I used two machine learning models, a recurrent neural network (RNN) encoder-decoder and a structured perceptron, to perform this task. I have shown that, by performing the task of word prediction, results for multiple tasks in historical linguistics can be obtained, such as phylogenetic tree reconstruction, identification of sound correspondences and cognate detection. In addition, I showed that the task of word prediction can be extended to phylogenetic word prediction, in which information is shared between language pairs, based on the assumed structure of the ancestry tree. This task could be used for protoform reconstruction and could in the future lead to the direct reconstruction of the optimal tree at prediction time.
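To make the link between word prediction and tree reconstruction more concrete, the following is a minimal sketch of one plausible way to turn prediction errors into a distance between two languages. It is an illustration under stated assumptions, not necessarily the procedure used in the thesis: the function names (`levenshtein`, `normalized_distance`, `language_distance`), the choice of length-normalized edit distance, and the toy word lists are all hypothetical.

```python
from statistics import mean


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute; cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, free if symbols match
            ))
        prev = curr
    return prev[-1]


def normalized_distance(predicted, target):
    """Edit distance divided by the length of the longer word, in [0, 1]."""
    longest = max(len(predicted), len(target))
    return levenshtein(predicted, target) / longest if longest else 0.0


def language_distance(predictions, targets):
    """Mean normalized distance between predicted and attested words.

    Assumption: a lower value means the model predicts the target language
    well, which is taken here as a proxy for closer relatedness.
    """
    return mean(normalized_distance(p, t) for p, t in zip(predictions, targets))


# Hypothetical toy data: model output for two words versus the attested forms.
predicted_words = ["waser", "nacht"]
attested_words = ["wasser", "nacht"]
distance = language_distance(predicted_words, attested_words)
```

Computing such a value for every language pair would give a distance matrix, which could then be passed to a standard clustering method such as UPGMA or neighbor joining to obtain a tree.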
Peter Dekker
added a project goal
In this thesis, I apply machine learning methods to historical linguistics. By predicting words between languages, several tasks in historical linguistics can be performed, such as phylogenetic tree reconstruction, sound correspondence identification and cognate detection.