Hossein Hassani
University of Kurdistan Hewlêr (UKH) · Department of Computer Science and Engineering
PhD (Computer Science)
Natural Language Processing and Computational Linguistics (focusing on Kurdish)
About
55 Publications
40,274 Reads
203 Citations
Introduction
Additional affiliations
March 2019 - present
January 2007 - March 2019
Publications (55)
This research suggests a method for machine translation between two Kurdish dialects. We chose two widely spoken dialects, Kurmanji and Sorani, which are considered to be mutually unintelligible. Moreover, despite being spoken by about 30 million people in several countries, Kurdish is among the less-resourced languages. The research used bi-dialectal...
Automatic dialect identification is a necessary language technology for processing multi-dialect languages whose dialects are linguistically far from each other. It becomes particularly crucial where the dialects are mutually unintelligible. Therefore, to perform computational activities on these languages, the system needs to identify the...
This paper suggests a method for proper noun identification in Kurdish texts. Kurdish proper nouns are not capitalized, and they can also take other part-of-speech roles, which leads to a broad ambiguity that Kurdish proper noun recognition applications must address. Kurdish is also among the less-resourced languages. We developed an application...
The Kurdish language is spoken by almost 30 million people. Alongside other problems, illiteracy and various types of disability hinder information accessibility in the Kurdish community. This paper introduces a Kurdish Text-to-Speech software, which has been developed as the first attempt at using assistive technology in the Iraqi Kurdis...
Research methods are essential to conducting any research project. Although they have been theorized and summarized based on best practices, every field of science requires an adaptation of the overall approaches to perform research activities. In addition, any specific research needs a particular adjustment to the generalized approach and sp...
Hawrami, a dialect of Kurdish, is classified as an endangered language as it suffers from the scarcity of data and the gradual loss of its speakers. Natural Language Processing projects can partially compensate for the lack of available data for endangered languages/dialects through a variety of approaches, such as machine translation, language m...
Many languages have vast amounts of handwritten texts, such as ancient manuscripts of folktales and historical narratives, or contemporary documents and letters. Digitizing those texts has various applications, such as everyday tasks, cultural studies, and historical research. Syriac is an ancient, endangered, and low-resourced language that...
Kurdish libraries hold many historical publications that were printed in the early days after printing devices were brought to Kurdistan. A good Optical Character Recognition (OCR) system that helps process these publications and contributes to Kurdish language resources is crucial, as Kurdish is considered a low-resource language. Curren...
Classifying Sorani Kurdish subdialects poses a challenge due to the lack of publicly available datasets or reliable resources, such as social media or websites, for data collection. To address this issue, we conducted field visits to various cities and villages, connecting with native speakers from different age groups, genders, academic backgrounds, an...
Text-to-speech (TTS) synthesis is the technique of generating synthetic speech from input text. Developing a TTS system for Sorani (Central) Kurdish is a challenge due to the lack of resources for the language. In this research, we assess the development of a storytelling TTS system in Sorani Kurdish for children aged five to ten by comparing two d...
Sentiment analysis is widely used in various areas and has versatile applications. For example, it is used in market research, customer retention strategies, and product analysis, to name a few. Although a few works on the topic exist for the Kurdish language, similar to other fields in Kurdish processing, it is not well-studied, and particularly i...
Sentiment analysis is widely used in various areas and has versatile applications. For example, it is used in market research, customer retention strategies, and product analysis, to name a few. Although some previous work on the topic exists for the Kurdish language, similar to other fields in Kurdish processing, it is not well-studied, and partic...
Accurate demand forecasting is a key factor for business success, particularly in complex markets like the pharmaceutical sector. This study explores the application of two machine learning techniques, Random Forest (RF) and Support Vector Machine (SVM), to enhancing demand forecasting within a local pharmaceutical company based in t...
Textual data continues to multiply over time. Alongside the exponential growth of textual information, an increase in anonymous material has also been observed. Authorship detection has significant potential in numerous applications of authorship analysis, such as history and literary science, forensic examination, or plagiarism detection. We...
Kurdish Sign Language (KuSL) is the natural language of the Kurdish Deaf people. We work on automatic translation between spoken Kurdish and KuSL. Sign languages evolve rapidly and follow grammatical rules that differ from those of spoken languages. Consequently, those differences should be considered during any translation. We proposed an avatar-based autom...
Named Entity Recognition (NER) is one of the essential applications of Natural Language Processing (NLP). It is also an instrument that plays a significant role in many other NLP applications, such as Machine Translation (MT), Information Retrieval (IR), and Part of Speech Tagging (POST). Kurdish is an under-resourced language from the NLP perspect...
Music has different styles, and they are categorized into genres by musicologists. Nonetheless, non-musicologists categorize music differently, for example, by finding similarities and patterns in instruments, harmony, and style of the music. For instance, in addition to popular music genre categorization, such as classic, pop, and modern folkloric...
Machine translation has been a major motivation for development in natural language processing. Despite the burgeoning achievements in creating more efficient machine translation systems thanks to deep learning methods, parallel corpora have remained indispensable for progress in the field. In an attempt to create parallel corpora for the Kurdish l...
Spell checkers have become regular features of most word processing applications. They help us write more correctly in various digital environments. However, this assistance is not equally available for all languages. The Kurdish language, which is still considered a less-resourced language, currently lacks well-known and well-tested spell ch...
Tagged corpora play a crucial role in a wide range of Natural Language Processing tasks. Part of Speech Tagging (POST) is essential in developing tagged corpora. It is time- and effort-consuming and costly; therefore, automating it could make it more affordable. The Kurdish language currently lacks publicly available tagged corpora of proper si...
Musicologists use various labels to classify similar music styles under a shared title. However, non-specialists may categorize music differently, for example by finding patterns in the harmony, instruments, and form of the music. People usually identify a music genre solely by listening, but now computers and Artificial Intelligence (AI) can automa...
Whether Hawrami and Zaza (Zazaki) are standalone languages or dialects of a language has been discussed and debated for a while among linguists who study Iranian languages. The question of whether those languages/dialects belong to the Kurdish language or are independent descendants of Iranian languages was answered by MacKenzie (...
Kurdish is written in different scripts. The two most popular scripts are Latin and Persian-Arabic. However, not all Kurdish readers are familiar with both scripts, a problem that automatic transliterators could resolve. So far, the developed tools mostly transliterate Persian-Arabic scripts into Latin. We present a transliterator to translit...
Applications based on Long Short-Term Memory (LSTM) require large amounts of data for their training. Tesseract LSTM is a popular Optical Character Recognition (OCR) engine that has been trained and used for various languages. However, its training becomes obstructed when the target language is not well-resourced. This research suggests a remedy for th...
Machine translation has been a major motivation for development in natural language processing. Despite the burgeoning achievements in creating more efficient machine translation systems thanks to deep learning methods, parallel corpora have remained indispensable for progress in the field. In an attempt to create parallel corpora for the Kurdish la...
Morphological analysis is the study of the formation and structure of words. It plays a crucial role in various tasks in Natural Language Processing (NLP) and Computational Linguistics (CL) such as machine translation and text and speech generation. Kurdish is a less-resourced multi-dialect Indo-European language with highly inflectional morphology...
Resources and technologies for sign language processing of well-resourced languages are emerging, while low-resource languages are falling behind. Kurdish is a multi-dialect language, and it is considered a low-resource language. It is spoken by approximately 30 million people in several countries, which denotes that it has a large community w...
Kurdish poetry and prose narratives were historically transmitted mainly orally rather than in written form. Being an essential medium of oral narration and literature, Kurdish lyrics have become a vital resource for different types of studies, including Digital Humanities, Computational Folkloristics and Computational Linguisti...
Segmentation is a fundamental step for most Natural Language Processing tasks. The Kurdish language is a multi-dialect, under-resourced language which is written in different scripts. The lack of various segmented corpora is one of the major bottlenecks in Kurdish language processing. We used Punkt, an unsupervised machine learning method, to segme...
We present an experimental dataset, Basic Dataset for Sorani Kurdish Automatic Speech Recognition (BD-4SK-ASR), which we used in the first attempt at developing an automatic speech recognition system for Sorani Kurdish. The objective of the project was to develop a system that could automatically recognize simple sentences based on the vocabulary which is...
This paper describes the development of lexicographic resources for Kurdish and provides a lexical model for this language. Kurdish is considered a less-resourced language and currently lacks machine-readable lexical resources. The unique potential which Linked Data and the Semantic Web offer to e-lexicography enables interoperability across lexi...
Kurdish is a less-resourced language consisting of different dialects written in various scripts. Approximately 30 million people in different countries speak the language. The lack of corpora is one of the main obstacles in Kurdish language processing. In this paper, we present KTC (the Kurdish Textbooks Corpus), which is composed of 31 K-12 textboo...
This research suggests a framework, the Digital Humanities Readiness Assessment Framework (DHuRAF), to assess the maturity level of the infrastructure required for Digital Humanities (DH) studies in different communities. We use an approach similar to the Basic Language Resource Kit (BLARK) in developing the suggested framework. DH as a fairly new field...
Final Year Projects (FYPs) play a significant role in undergraduate education in the computing field of study, and most of the related university departments and schools consider them an essential contribution to this study. However, issues such as whether to assign the projects individually or to a group of students, the procedures followed in the...
Currently, no offline tool is available for Optical Character Recognition (OCR) in Kurdish. Kurdish is spoken in different dialects and uses several scripts for writing. The Persian/Arabic script is widely used among these dialects. The Persian/Arabic script is written from right to left (RTL), is cursive, and uses unique diacritics. These fe...
In this paper, we introduce the Kurdish BLARK (Basic Language Resource Kit). The original BLARK did not consider multi-dialect characteristics and generally targeted reasonably well-resourced languages. To address these two features, we extended BLARK and applied the proposed extension to Kurdish. The Kurdish language not only faces a paucity in...
Dialect identification/classification is an important step in many language processing activities, particularly for multi-dialect languages. Kurdish is a multi-dialect language spoken by a large population in different countries. Some of the Kurdish dialects, for example, Kurmanji and Sorani, have significant grammatical differen...
XML has become an important aspect of computing. It plays a crucial role in data interchange and manipulation. Several technologies have been developed around XML to make it a powerful tool for interchanging data, manipulating semistructured data, and retrieving information. Having knowledge about this broad range of technologies enables s...
Every year, many Computer Science and IT students need to prepare for their final year projects. This final project plays a significant role in demonstrating the effectiveness of the learning outcomes of the modules the students have taken during their studies. Once the time comes, a thousand questions arise: What kind of project should I do? What steps s...