Ali Basirat

Linköping University | LiU · Department of Computer and Information Science (IDA)

Doctor of Philosophy
Looking for a new position

About

42 Publications
7,555 Reads
102 Citations
Introduction
Ali Basirat is a machine learning researcher with a deep interest in analysing textual data. His research spans algorithms, artificial intelligence, and computational linguistics.
Additional affiliations
January 2019 - present: Uppsala University, Postdoc
March 2014 - September 2018: Uppsala University, PhD Student
January 2010 - January 2014: University of Tehran, Researcher

Publications

Publications (42)
Article
Full-text available
The two main classes of grammars are (a) hand-crafted grammars, which are developed by language experts, and (b) data-driven grammars, which are extracted from annotated corpora. This paper introduces a statistical method for mapping the elementary structures of a data-driven grammar onto the elementary structures of a hand-crafted grammar in order...
Article
Dependency-based approaches to syntactic analysis assume that syntactic structure can be analyzed in terms of binary asymmetric dependency relations holding between elementary syntactic units. Computational models for dependency parsing almost universally assume that an elementary syntactic unit is a word, while the influential theory of Lucien Tes...
Preprint
Standard models for syntactic dependency parsing take words to be the elementary units that enter into dependency relations. In this paper, we investigate whether there are any benefits from enriching these models with the more abstract notion of nucleus proposed by Tesnière. We do this by showing how the concept of nucleus can be defined in th...
Article
This study conducts an experimental evaluation of two hypotheses about the contributions of formal and semantic features to the grammatical gender assignment of nouns. One of the hypotheses (Corbett and Fraser 2000) claims that semantic features dominate formal ones. The other hypothesis, formulated within the optimal gender assignment theory (Rice...
Preprint
We explore the transferability of a multilingual neural machine translation model to unseen languages when the transfer is grounded solely on the cross-lingual word embeddings. Our experimental results show that the translation knowledge can transfer weakly to other languages and that the degree of transferability depends on the languages' relatedn...
Preprint
Full-text available
The vector representation of words, known as word embeddings, has opened a new research approach in the study of languages. These representations can capture different types of information about words. The grammatical gender of nouns is a typical classification of nouns based on their formal and semantic properties. The study of grammatical gender...
Preprint
Full-text available
We analyze the information provided by the word embeddings about the grammatical gender in Swedish. We wish that this paper may serve as one of the bridges to connect the methods of computational linguistics and general linguistics. Taking nominal classification in Swedish as a case study, we first show how the information about grammatical gender...
Preprint
We generalize principal component analysis for embedding words into a vector space. The generalization is made in two major levels. The first is to generalize the concept of the corpus as a counting process which is defined by three key elements vocabulary set, feature (annotation) set, and context. This generalization enables the principal word em...
Preprint
We study the effect of rich supertag features in greedy transition-based dependency parsing. While previous studies have shown that sparse boolean features representing the 1-best supertag of a word can improve parsing accuracy, we show that we can get further improvements by adding a continuous vector representation of the entire supertag distribu...
Preprint
Full-text available
We extend the randomized singular value decomposition (SVD) algorithm (Halko et al., 2011) to estimate the SVD of a shifted data matrix without explicitly constructing the matrix in the memory. With no loss in the accuracy of the original algorithm, the extended algorithm provides for a more efficient way of matrix factorization. The algorithm...
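The idea in this abstract can be sketched in a few lines of numpy: the shifted matrix B = A - 1·muᵀ never needs to be formed, because both stages of the randomized SVD only require products with B. This is an illustrative reconstruction, not the paper's code; the function name and the oversampling parameter are assumptions.

```python
import numpy as np

def shifted_randomized_svd(A, mu, k, p=10, seed=0):
    """Approximate rank-k SVD of B = A - 1·muᵀ without materializing B.

    A  : (n, d) data matrix (only matrix products with A are used)
    mu : (d,) shift vector, e.g. column means when centering for PCA
    k  : target rank; p extra oversampled columns improve accuracy
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    ones = np.ones(n)
    Omega = rng.standard_normal((d, k + p))
    # Range finder: Y = B @ Omega = A @ Omega - 1·(muᵀ Omega)
    Y = A @ Omega - np.outer(ones, mu @ Omega)
    Q, _ = np.linalg.qr(Y)
    # Projection: C = Qᵀ B = Qᵀ A - (Qᵀ 1)·muᵀ
    C = Q.T @ A - np.outer(Q.T @ ones, mu)
    Uc, s, Vt = np.linalg.svd(C, full_matrices=False)
    return (Q @ Uc)[:, :k], s[:k], Vt[:k]
```

When B has low rank (or decays quickly), the oversampled sketch captures its range and the factorization matches a direct SVD of the explicitly shifted matrix.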
Article
Full-text available
We introduce a word embedding method that generates a set of real-valued word vectors from a distributional semantic space. The semantic space is built with a set of context units (words) which are selected by an entropy-based feature selection approach with respect to the certainty involved in their contextual environments. We show that the most p...
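As an illustration of the entropy-based selection idea in this abstract, a minimal sketch (the helper name is hypothetical, and taking low entropy as "certain" is an assumption; the paper's exact criterion may differ): score each candidate context unit by the Shannon entropy of its co-occurrence distribution and keep the most certain units.

```python
import numpy as np
from collections import Counter

def select_context_units(cooc, n_select):
    """cooc: dict mapping a candidate context word to a Counter of the
    words it co-occurs with. Rank candidates by the Shannon entropy of
    their co-occurrence distribution and keep the n_select units with
    the most predictable (lowest-entropy) contextual environments."""
    def entropy(counter):
        counts = np.array(list(counter.values()), dtype=float)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())
    ranked = sorted(cooc, key=lambda w: entropy(cooc[w]))
    return ranked[:n_select]
```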
Chapter
We study the presence of linguistically motivated information in the word embeddings generated with statistical methods. The nominal aspects of uter/neuter, common/proper, and count/mass in Swedish are selected to represent respectively grammatical, semantic, and mixed types of nominal categories within languages. Our results indicate that typical...
Conference Paper
Full-text available
Word embeddings are fundamental objects in neural natural language processing approaches. Despite the fact that word embedding methods follow the same principles, we see in practice that most of the methods that use PCA are not as successful as the methods that are developed in the area of language modelling and make use of neural networks to train...
Conference Paper
Full-text available
We study the presence of information provided by word embeddings from real-valued syntactic word vectors for determining the grammatical gender of nouns in Swedish. Our investigation reveals that regardless of being a frequently used word or not, real-valued syntactic word vectors are highly informative for identifying the grammatical gender of nou...
Poster
Full-text available
We study the presence of information provided by word embeddings from real-valued syntactic word vectors for determining the grammatical gender of nouns in Swedish. Our investigation reveals that regardless of being a frequently used word or not, real-valued syntactic word vectors are highly informative for identifying the grammatical gender of nou...
Poster
Full-text available
Word embeddings are fundamental objects in neural natural language processing approaches. Even though word embedding methods follow the same principles, we see in practice that most of the methods that use PCA are not as successful as the methods that are developed in the area of language modelling and make use of neural networks to train word embe...
Conference Paper
Full-text available
Word embeddings are fundamental objects in neural natural language processing approaches. Even though word embedding methods follow the same principles, we see in practice that most of the methods that use PCA are not as successful as the methods that are developed in the area of language modelling and make use of neural networks to train word embe...
Thesis
Full-text available
Word embedding is a technique for associating the words of a language with real-valued vectors, enabling us to use algebraic methods to reason about their semantic and grammatical properties. This thesis introduces a word embedding method called principal word embedding, which makes use of principal component analysis (PCA) to train a set of word e...
Conference Paper
Full-text available
We apply real-valued word vectors combined with two different types of classifiers (linear discriminant analysis and feed-forward neural network) to scrutinize whether basic nominal categories can be captured by simple word embedding models. We also provide a linguistic analysis of the errors generated by the classifiers. The targeted language is S...
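The linear-discriminant half of such a setup can be sketched as a plain two-class Fisher LDA in numpy. This is a generic stand-in, not the paper's implementation, and the 0/1 labels here are hypothetical placeholders for nominal categories such as uter/neuter.

```python
import numpy as np

def lda_classifier(X, y):
    """Two-class linear discriminant analysis.
    X: (n, d) word vectors; y: (n,) labels in {0, 1}.
    Returns a predict(X_new) -> labels function."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    # Pooled within-class covariance, lightly regularized for stability
    Xc = np.vstack([X[y == 0] - m0, X[y == 1] - m1])
    S = Xc.T @ Xc / len(X) + 1e-6 * np.eye(X.shape[1])
    w = np.linalg.solve(S, m1 - m0)       # discriminant direction
    b = -w @ (m0 + m1) / 2                # threshold at the midpoint
    return lambda Xn: (Xn @ w + b > 0).astype(int)
```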
Presentation
Full-text available
The slides presented at the 10th International Conference on Agents and Artificial Intelligence
Conference Paper
Full-text available
We present the Uppsala submission to the CoNLL 2017 shared task on parsing from raw text to universal dependencies. Our system is a simple pipeline consisting of two components. The first performs joint word and sentence segmentation on raw text; the second predicts dependency trees from raw words. The parser bypasses the need for part-of-speech ta...
Conference Paper
Full-text available
We show that a set of real-valued word vectors formed by right singular vectors of a transformed co-occurrence matrix are meaningful for determining different types of dependency relations between words. Our experimental results on the task of dependency parsing confirm the superiority of the word vectors to the other sets of word vectors generated...
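A generic version of this pipeline (transform a co-occurrence matrix, then keep the top right singular vectors as word features) might look like the sketch below. The PPMI transform is one common choice and is an assumption here, not necessarily the transformation used in the paper.

```python
import numpy as np

def svd_word_vectors(C, dim):
    """C: (n_contexts, n_words) raw co-occurrence counts.
    Apply a PPMI-style transformation and return one dim-dimensional
    vector per word, taken from the top right singular vectors."""
    total = C.sum()
    pc = C.sum(axis=1, keepdims=True) / total   # context marginals
    pw = C.sum(axis=0, keepdims=True) / total   # word marginals
    with np.errstate(divide="ignore"):
        pmi = np.log((C / total) / (pc * pw))
    ppmi = np.maximum(pmi, 0.0)                 # clip negatives to 0
    _, _, Vt = np.linalg.svd(ppmi, full_matrices=False)
    return Vt[:dim].T                           # shape (n_words, dim)
```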
Conference Paper
Full-text available
A set of continuous feature vectors formed by right singular vectors of a transformed co-occurrence matrix are used with the Stanford neural dependency parser to train parsing models for a limited number of languages in the corpus of universal dependencies. We show that the feature vector can help the parser to remain greedy and be as accurate as (...
Article
Using statistical approaches alongside the traditional methods of natural language processing could significantly improve both the quality and performance of several natural language processing (NLP) tasks. The effective usage of these approaches is subject to the availability of the informative, accurate and detailed corpora on which the learners are...
Article
Full-text available
A hybrid algorithm specifically designed to work with optimised support vector machine with genetic algorithm (GA-SVM) was developed for determining the relationships between soil properties and plant distribution and vegetation cover densities in a protected area (Ghomeshlu, central Iran). The bulk density, porosity, silt, total nitrogen and chlor...
Conference Paper
Treebanks, as sets of syntactically annotated sentences, are among the most widely used language resources in natural language processing applications. The occurrence of errors in automatically created treebanks is one of the main obstacles limiting the use of these resources in real-world applications. This paper aims to introduce...
Article
LTAG is a rich formalism for performing NLP tasks such as semantic interpretation, parsing, machine translation and information retrieval. Depending on the specific NLP task, different kinds of LTAGs for a language may be developed. Each of these LTAGs is enriched with specific features such as semantic representation and statistical information...
Conference Paper
Though the lack of semantic representation in automatically extracted LTAGs is an obstacle to using these formalisms, these grammars have received more attention than before due to the advent of powerful statistical parsers trained on them. In contrast to this grammatical class, there are some widely used, manually crafted LTA...
Thesis
Full-text available
Grammars are mathematical tools used to model languages. Parsing is the process of searching a grammar's elementary structures to find suitable syntactic descriptions for an input sentence. Depending on the grammar's coverage of the language, parsing can be carried out fully or partially. In older approaches, under the assumption that the grammar covers all linguistic phenomena, parsers...
Conference Paper
This paper discusses two Hidden Markov Models (HMMs) for linking the linguistically motivated XTAG grammar and the automatically extracted LTAG used by the MICA parser. The former is a detailed LTAG enriched with feature structures, while the latter is a very large LTAG that, due to its statistical nature, is well suited for statistical app...
Conference Paper
Full-text available
MICA is a fast and accurate dependency parser for English that uses an LTAG automatically derived from the Penn Treebank (PTB) using Chen's approach. However, no semantic representation is associated with its grammar. On the other hand, the XTAG grammar is a hand-crafted LTAG whose elementary trees were enriched with semantic representation by...

Network

Cited By

Projects

Projects (5)
Project
This project aims at the linguistic study of word embeddings.
Project
To extend existing dependency-based parsing models to better cope with typological diversity and adapt them to the representations of Universal Dependencies.
Project
We approach the linguistic diversity of languages through an analysis of nominal classification systems. Our current focus involves topics such as grammatical gender and numeral classifiers.