About
71 Publications
21,342 Reads
355 Citations
Publications (71)
Recent advancements in speech recognition technology have significantly enhanced accessibility and functionality across various sectors. Nonetheless, recognizing children’s speech presents considerable challenges. Children from different age groups exhibit distinct speech characteristics, including variations in intonation, articulation, and...
Due to the rapid growth of information on the Internet and social networks, research in the field of computational linguistics has become very relevant. The volume of information that people and machines create in natural language needs to be processed, analyzed and verified. Information retrieval systems, dialog systems, and machine translation to...
One application of NLP is the question answering (QA) system. There is considerable research and many applications in this field, and competitions are organized to improve system performance. Turkish is a language designed as if by a scientific committee: its rules are clear and transparent. In order to prove these featur...
The number of possible word forms in agglutinative languages is theoretically unlimited. This, in turn, creates the problem of part-of-speech (POS) tagging of out-of-vocabulary (OOV) words in agglutinative languages. In agglutinative languages, words are formed by adding suffixes to the stem. Due to the occurrence of phonetic harmony and disharmony...
Markov models are among the most widely used machine learning methods for natural language processing. Markov chains and hidden Markov models are stochastic (random) methods used to model dynamic systems, in which the current state of the system is predicted based on previous states. The Markov chain, which correctly generates a sequence of words in the...
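The idea that the current state is predicted from previous states can be illustrated with a minimal sketch of a first-order Markov chain over words. This is not the model from the paper; the function names `build_chain` and `generate` and the toy corpus are assumptions made for illustration.

```python
import random
from collections import defaultdict

def build_chain(tokens):
    """Map each word to the list of words observed immediately after it."""
    chain = defaultdict(list)
    for current, nxt in zip(tokens, tokens[1:]):
        chain[current].append(nxt)
    return chain

def generate(chain, start, length, seed=0):
    """Walk the chain: each next word depends only on the current word."""
    rng = random.Random(seed)
    words = [start]
    while len(words) < length:
        followers = chain.get(words[-1])
        if not followers:  # dead end: this word was never followed by another
            break
        words.append(rng.choice(followers))
    return words

# Hypothetical toy corpus, used only to exercise the sketch.
corpus = "the cat sat on the mat and the cat slept".split()
chain = build_chain(corpus)
sample = generate(chain, "the", 6)
```

Because followers are stored with repetition, a word that follows "the" twice in the corpus is twice as likely to be chosen, which is how the transition probabilities are represented here.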
The educational corpus is a language corpus based on school textbooks and dictionaries and is a structural type of the Uzbek National Corpus. According to the "Concept of the National Corpus of the Uzbek language", the educational corpus of the Uzbek language was created in the framework of the practical project AM-FZ-201908172 "Creation of the edu...
In this work, we report large-scale semantic role annotation of arguments in the Turkish dependency treebank, and present the first comprehensive Turkish semantic role labeling (SRL) resource: Turkish Proposition Bank (PropBank). We present our annotation workflow that harnesses crowd intelligence, and discuss the procedures for ensuring annotation...
Turkish belongs to the Turkic family of languages and these languages exhibit tremendous similarity when it comes to morphological and grammatical structure but have somewhat different lexicons owing to various historical, geographical, and cultural interactions with neighboring languages. In this chapter we briefly cover the similarities and diffe...
Formal grammar, introduced by Chomsky, is one of the most important developments in Natural Language Processing, a branch of Artificial Intelligence. Formal grammars make a mathematical representation of languages possible. Almost all natural languages have word classes such as noun, adjective, and verb. In addition to this one sentence...
This paper describes the implementation of rule-based tagging for Uyghur (spoken in Sin Kiang, China) verbs. We hope this paper will contribute to advanced studies of the Uyghur language in machine translation and natural language processing. Like all Turkic languages, Uyghur is an agglutinative language that has product...
An intrusion detection system (IDS) monitors the network traffic looking for suspicious or malicious activities or policy violations, which could represent an attack or unauthorized access. Traditional systems were designed to detect known attacks but cannot identify unknown threats. They most commonly detect known threats based on predefined rules...
University automation systems are distributed information systems. Managing such a system is quite difficult precisely because it is distributed. Although the importance of information security and the significance of security attacks have increased in Turkey today, there is no mandatory security policy defined for universities. However, in the coming period...
In this paper, we present a novel method for Document Classification that uses semantic matrix representation of Turkish sentences by concentrating on the sentence phrases and their concepts in text. Our model has been designed to find phrases in a sentence, identify their relations with specific concepts, and represent the sentences as coarse-grai...
This paper presents an exploration and evaluation of a diverse set of features that influence word-sense disambiguation (WSD) performance. WSD, one of the most crucial steps in natural language processing (NLP), has the potential to improve many NLP tasks. It is known that exploiting effective features and removing redundant ones help i...
Keywords: grammatical analysis, semantic analysis, predicate compatibility of the phrase-concept, vector representation of the sentence. Abstract: The semantic and grammatical analysis of a sentence is one of the main topics of Natural Language Processing (NLP). In our study, in order to detect the basic grammatical and semantic errors in a sentence, the predicate...
In this paper we present a framework for extraction of Turkish phrases and their concepts. The objective of the study is to meet the need for resources for Turkish semantic extraction and to represent a Turkish sentence at the phrase-concept level. The semantic and grammatical analysis of a sentence is a basic topic of Natural Language Processing (...
There is an emerging interest in developing bio-functionalisation routes serving as platforms for assembling diverse enzymes onto material surfaces. Specifically, the fabrication of next-generation, laboratory-on-a-chip-based sensing and energy-harvesting systems requires controlled orientation and organisation of the proteins at the inorganic inte...
In this study, comments about technology brands are collected from a popular Turkish website, Ekşi Sözlük, and classified as positive or negative. Turkish text is preprocessed with different kinds of filters and then modeled with 1-gram, 2-gram and 3-gram language models. Naive Bayes (NB), Support Vector Machines (SVM) and K-nearest neighbor (...
In this paper we present a framework for extraction of inference rules from Turkish documents, such as "A solved B ~ A found a solution to B". Many natural language processing tasks, such as question answering, information retrieval and machine translation can benefit tremendously from inference rules. Our framework consists of three layers: constr...
The drastic increase in the number of documents on the Web requires semantic web applications in order to lead the Web to its full potential. Extracting important phrases in a document facilitates finding expected information. In this paper, a new approach that labels the main subject, main predicate, main location and main date of an electronic document is introdu...
In this paper, the effect of different windowing schemes on word sense disambiguation accuracy is presented. The Turkish Lexical Sample Dataset has been used in the experiments. We took the samples of ambiguous verbs and nouns from the dataset and used bag-of-words properties as context information. The experiments have been repeated for different window...
Word Sense Disambiguation (WSD) has become an even more important research area in recent years with the widespread usage of Natural Language Processing (NLP) applications. The WSD task has two variants: the “Lexical Sample” and “All Words” approaches. The Lexical Sample approach disambiguates the occurrences of a small sample of target words that were previously...
Word Sense Disambiguation (WSD) is the task of choosing the most appropriate sense of a word having multiple senses in a given context. Collocational features acquired from the words neighboring the ambiguous word are one of the important knowledge sources in this area. This paper explores the effective sets of collocational features in Tu...
This paper presents the results of main part-of-speech tagging of Turkish sentences using Conditional Random Fields (CRFs). Although CRFs are applied to many different languages for part-of-speech (POS) tagging, Turkish poses interesting challenges to be modeled with them. The challenges include issues related to the statistical model of the proble...
The K-means algorithm is quite sensitive to the cluster centers selected initially and can perform different clusterings depending on these initialization conditions. Within the scope of this study, a new method based on the Fuzzy ART algorithm which is called Improved Fuzzy ART (IFART) is used in the determination of initial cluster centers. By us...
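The sensitivity to initial centers described above can be illustrated with a small sketch. It is plain Lloyd-style K-means in pure Python, not the IFART method of the study; the function name `kmeans` and the four-point data set are assumptions chosen for illustration.

```python
def kmeans(points, centers, iterations=10):
    """Plain Lloyd iteration on 2-D points, starting from the given centers."""
    centers = [tuple(map(float, c)) for c in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to its nearest center (ties go to the lowest index).
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers

# Four corners of a wide rectangle (hypothetical data).
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
good = kmeans(points, [(0, 0), (4, 0)])  # one initial center on each short side
poor = kmeans(points, [(0, 0), (0, 1)])  # both initial centers on the same short side
```

With the first initialization the centers settle on the left and right pairs of points. With the second they settle on the bottom and top rows, a stable but much looser clustering, which is the kind of outcome a better choice of initial centers is meant to avoid.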
In this paper, we present a rule-based model for morphological disambiguation of the Uyghur language. Morphological ambiguity is a challenging problem for agglutinative languages because a word can take an unlimited number of suffixes. The more suffixes a language has, the more complex the ambiguity problem becomes. Uyghur language is...
This paper describes the differences between Uyghur (spoken in Sin Kiang, China) and Turkish grammar at the sentence level. There is little research on natural language processing for Turkic languages other than Turkish. Uyghur is one of the old and rich languages in the Turkic language family. Even though both of these languages be...
In this work, we present an MT system from Turkmen to Turkish. Our system exploits the similarity of the languages by using a modified version of the direct translation method. However, the complex inflectional and derivational morphology of the Turkic languages necessitates special treatment for a word-by-word translation model. We also employ morphology-...
The Uyghur language is widely used in East Turkestan. Beyond that, it is also used by many people in Central Asia and in countries such as Afghanistan and Turkey. The Uyghurs have used many alphabets up to the present day. Today, Uyghurs use different alphabets depending on the region where they live. Uyghurs living in Central Asia use the Cyrillic alphabet...
We present an approach to MT between Turkic languages and present results from an implementation of an MT system from Turkmen to Turkish. Our approach relies on ambiguous lexical and morphological transfer augmented with target side rule-based repairs and rescoring with statistical language models.
This paper describes the implementation of a two-level morphological analyzer for the Turkmen Language. Like all Turkic languages, the Turkmen Language is an agglutinative language that has productive inflectional and derivational suffixes. In this work, we implemented a finite-state two-level morphological analyzer for Turkmen Language by using Xe...
This paper presents a statistical lexical ambiguity resolution method in direct transfer machine translation models in which the target language is Turkish. Since direct transfer MT models do not have full syntactic information, most of the lexical ambiguity resolution methods are not very helpful. Our disambiguation model is based on statistical l...
In data mining and knowledge discovery, similarity between objects is one of the central concepts. A measure of similarity can be user-defined, but an important problem is defining similarity on the basis of data. In this paper we introduce the problem of finding the pair-wise similarities of quantitative valued sequences where each sequence is a l...
This paper presents the design and the implementation of a morphological analyzer for Turkish. A new methodology is proposed for doing the analysis of Turkish words with an affix stripping approach and without using any lexicon. The rule-based and agglutinative structure of the language allows Turkish to be modeled with finite sta...
TEZDIL is a programming language which has been designed for use with CNC tools. In this paper, TEZDIL is described as it is implemented as the software part of a project which was done for an automobile factory (TOFAS) in Turkey.
The combination of a personal computer and a printer provides remarkable opportunities for text editing and storing. The main idea of this project was to construct a pantograph with features similar to those of a text-editing computer. To realise a microcomputer-driven pantograph, an X-Y table and an Amstrad CPC 6128 computer have been used. All n...
There are three generations of railroad control: 1) early electromechanical control; 2) central computer control; and 3) distributed computer control. In the second generation, tracks were divided into a number of track sections, and applications aimed to detect, for each individual track section, whether or not there is a train on the tra...
Abstract: Among the world's languages, Turkish holds a distinctive position owing to its rule-governed and regular structure. In this study, possible new word roots were generated by computer software developed using the grammar rules of Turkish. In the work carried out, using the Turkish lexicons developed for the PC-Kimmo software, approximately...
We evaluate music composer classification using an approximation of the Kolmogorov distance between different music pieces. The distance approximation has recently been suggested by Vitanyi and his colleagues. They use a clustering method to evaluate the distance metric. However, the clustering is too slow for large (>60) data sets. We suggest...
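A compression-based approximation of this kind can be sketched as follows, assuming the approximation meant is the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The use of `zlib` as the compressor, the helper names `csize` and `ncd`, and the byte strings are assumptions made for illustration, not details taken from the paper.

```python
import zlib

def csize(data: bytes) -> int:
    """Compressed size in bytes, standing in for the uncomputable Kolmogorov complexity."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical stand-ins for encoded music pieces.
a = b"abc" * 200        # highly repetitive piece
b = bytes(range(256))   # piece that shares no structure with `a`

same = ncd(a, a)        # a piece compared with itself
different = ncd(a, b)   # two unrelated pieces
```

A piece compared with itself compresses almost as well as the piece alone, so `same` is small, while unrelated pieces give a value close to 1.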