Hossein Hassani

University of Kurdistan Hewlêr (UKH) · Department of Computer Science and Engineering

PhD (Computer Science)
Natural Language Processing and Computational Linguistics (focusing on Kurdish)

About

55
Publications
40,274
Reads
203
Citations
Additional affiliations
March 2019 - present
University of Kurdistan Hewlêr (UKH)
Position
  • Professor (Assistant)
January 2007 - March 2019
University of Kurdistan Hewlêr (UKH)
Position
  • Lecturer

Publications (55)
Conference Paper
This research suggests a method for machine translation between two Kurdish dialects. We chose the two widely spoken dialects, Kurmanji and Sorani, which are considered to be mutually unintelligible. Also, despite being spoken by about 30 million people in different countries, Kurdish is among the less-resourced languages. The research used bi-dialectal...
Conference Paper
Full-text available
Automatic dialect identification is a necessary Language Technology for processing multidialect languages in which the dialects are linguistically far from each other. Particularly, this becomes crucial where the dialects are mutually unintelligible. Therefore, to perform computational activities on these languages, the system needs to identify the...
Article
This paper suggests a method for proper noun identification in Kurdish texts. Kurdish proper nouns are not capitalized and they also assume other part-of-speech roles, which leads to a broad ambiguity that should be addressed in Kurdish proper noun recognition applications. Kurdish is also among less-resourced languages. We developed an application...
Conference Paper
Full-text available
The Kurdish language is spoken by almost 30 million people. Alongside other problems, illiteracy and different types of disability are hindering information accessibility in the Kurdish community. This paper introduces Kurdish Text-to-Speech software, which has been developed as the first attempt at using assistive technology in the Iraqi Kurdis...
Article
Full-text available
Research methods are essential parts in conducting any research project. Although they have been theorized and summarized based on best practices, every field of science requires an adaptation of the overall approaches to perform research activities. In addition, any specific research needs a particular adjustment to the generalized approach and sp...
Article
Hawrami, a dialect of Kurdish, is classified as an endangered language as it suffers from the scarcity of data and the gradual loss of its speakers. Natural Language Processing projects can be used to partially compensate for data availability for endangered languages/dialects through a variety of approaches, such as machine translation, language m...
Preprint
Full-text available
Hawrami, a dialect of Kurdish, is classified as an endangered language as it suffers from the scarcity of data and the gradual loss of its speakers. Natural Language Processing projects can be used to partially compensate for data availability for endangered languages/dialects through a variety of approaches, such as machine translation, language m...
Preprint
Full-text available
Many languages have vast amounts of handwritten texts, such as ancient scripts about folktale stories and historical narratives or contemporary documents and letters. Digitization of those texts has various applications, such as daily tasks, cultural studies, and historical research. Syriac is an ancient, endangered, and low-resourced language that...
Article
Many languages have vast amounts of handwritten texts, such as ancient scripts about folktale stories and historical narratives or contemporary documents and letters. Digitization of those texts has various applications, such as daily tasks, cultural studies, and historical research. Syriac is an ancient, endangered, and low-resourced language that...
Article
Kurdish libraries have many historical publications that were printed back in the early days when printing devices were brought to Kurdistan. A good Optical Character Recognition (OCR) system would help process these publications and contribute to Kurdish language resources, which is crucial as Kurdish is considered a low-resource language. Curren...
Article
Full-text available
Classifying Sorani Kurdish subdialects poses a challenge due to the need for publicly available datasets or reliable resources like social media or websites for data collection. We conducted field visits to various cities and villages to address this issue, connecting with native speakers from different age groups, genders, academic backgrounds, an...
Preprint
Full-text available
Text-to-speech (TTS) synthesis is the technique of generating synthetic speech from input text. Developing a TTS system for Sorani (Central) Kurdish is a challenge due to the lack of resources for the language. In this research, we assess the development of a storytelling TTS system in Sorani Kurdish for children aged five to ten by comparing two d...
Article
Text-to-speech (TTS) synthesis is the technique of generating synthetic speech from input text. Developing a TTS system for Sorani (Central) Kurdish is a challenge due to the lack of resources for the language. In this research, we assess the development of a storytelling TTS system in Sorani Kurdish for children aged five to ten by comparing two d...
Article
Full-text available
Sentiment analysis is widely used in various areas and has versatile applications. For example, it is used in market research, customer retention strategies, and product analysis, to name a few. Although a few works on the topic exist for the Kurdish language, similar to other fields in Kurdish processing, it is not well-studied, and particularly i...
Preprint
Full-text available
Sentiment analysis is widely used in various areas and has versatile applications. For example, it is used in market research, customer retention strategies, and product analysis, to name a few. Although some previous work on the topic exists for the Kurdish language, similar to other fields in Kurdish processing, it is not well-studied, and partic...
Thesis
Accurate demand forecasting is a key factor for business success, particularly in complex markets like the pharmaceutical sector. This study explores the application of two machine learning techniques, Random Forest (RF) and Support Vector Machine (SVM) models, in enhancing demand forecasting within a local pharmaceutical company based in t...
Article
Full-text available
Textual data continues to multiply with time. Alongside the exponential growth of textual information, an increase in anonymous material has also been seen. Authorship detection has significant potential for usage in numerous applications of authorship analysis, such as history and literary science, forensic examination, or plagiarism detection. We...
Preprint
Full-text available
Kurdish Sign Language (KuSL) is the natural language of the Kurdish Deaf people. We work on automatic translation between spoken Kurdish and KuSL. Sign languages evolve rapidly and follow grammatical rules that differ from spoken languages. Consequently, those differences should be considered during any translation. We proposed an avatar-based autom...
Article
Kurdish Sign Language (KuSL) is the natural language of the Kurdish Deaf people. We work on automatic translation between spoken Kurdish and KuSL. Sign languages evolve rapidly and follow grammatical rules that differ from spoken languages. Consequently, those differences should be considered during any translation. We proposed an avatar-based autom...
Preprint
Full-text available
Named Entity Recognition (NER) is one of the essential applications of Natural Language Processing (NLP). It is also an instrument that plays a significant role in many other NLP applications, such as Machine Translation (MT), Information Retrieval (IR), and Part of Speech Tagging (POST). Kurdish is an under-resourced language from the NLP perspect...
Article
Named Entity Recognition (NER) is one of the essential applications of Natural Language Processing (NLP). It is also an instrument that plays a significant role in many other NLP applications, such as Machine Translation (MT), Information Retrieval (IR), and Part of Speech Tagging (POST). Kurdish is an under-resourced language from the NLP perspect...
Conference Paper
Full-text available
Music has different styles, and they are categorized into genres by musicologists. Nonetheless, non-musicologists categorize music differently, for example, by finding similarities and patterns in instruments, harmony, and style of the music. For instance, in addition to popular music genre categorization, such as classic, pop, and modern folkloric...
Article
Machine translation has been a major motivation of development in natural language processing. Despite the burgeoning achievements in creating more efficient machine translation systems, thanks to deep learning methods, parallel corpora have remained indispensable for progress in the field. In an attempt to create parallel corpora for the Kurdish l...
Conference Paper
Full-text available
Spell checkers have become regular features of most word processing applications. They assist us in writing more correctly in various digital environments. However, this assistance does not exist for all languages equally. The Kurdish language, which is still considered a less-resourced language, currently lacks well-known and well-tested spell ch...
Article
Tagged corpora play a crucial role in a wide range of Natural Language Processing. The Part of Speech Tagging (POST) is essential in developing tagged corpora. It is time-and-effort-consuming and costly, and therefore, it could be more affordable if it is automated. The Kurdish language currently lacks publicly available tagged corpora of proper si...
Preprint
Full-text available
Tagged corpora play a crucial role in a wide range of Natural Language Processing. The Part of Speech Tagging (POST) is essential in developing tagged corpora. It is time-and-effort-consuming and costly, and therefore, it could be more affordable if it is automated. The Kurdish language currently lacks publicly available tagged corpora of proper si...
Article
Musicologists use various labels to classify similar music styles under a shared title. But non-specialists may categorize music differently. That could be through finding patterns in harmony, instruments, and form of the music. People usually identify a music genre solely by listening, but now computers and Artificial Intelligence (AI) can automa...
Preprint
Full-text available
Musicologists use various labels to classify similar music styles under a shared title. But non-specialists may categorize music differently. That could be through finding patterns in harmony, instruments, and form of the music. People usually identify a music genre solely by listening, but now computers and Artificial Intelligence (AI) can automa...
Article
Whether to consider Hawrami and Zaza (Zazaki) standalone languages or dialects of a language has been discussed and debated for a while among linguists active in studying Iranian languages. The question of whether those languages/dialects belong to the Kurdish language or if they are independent descendants of Iranian languages was answered by MacKenzie (...
Preprint
Full-text available
Whether to consider Hawrami and Zaza (Zazaki) standalone languages or dialects of a language has been discussed and debated for a while among linguists active in studying Iranian languages. The question of whether those languages/dialects belong to the Kurdish language or if they are independent descendants of Iranian languages was answered by MacKenzie (...
Article
Kurdish is written in different scripts. The two most popular scripts are Latin and Persian-Arabic. However, not all Kurdish readers are familiar with both scripts, a problem that automatic transliterators could resolve. So far, the developed tools mostly transliterate Persian-Arabic scripts into Latin. We present a transliterator to translit...
Preprint
Full-text available
Kurdish is written in different scripts. The two most popular scripts are Latin and Persian-Arabic. However, not all Kurdish readers are familiar with both scripts, a problem that automatic transliterators could resolve. So far, the developed tools mostly transliterate Persian-Arabic scripts into Latin. We present a transliterator to translit...
Article
Full-text available
Applications based on Long-Short-Term Memory (LSTM) require large amounts of data for their training. Tesseract LSTM is a popular Optical Character Recognition (OCR) engine that has been trained and used in various languages. However, its training becomes obstructed when the target language is not resourceful. This research suggests a remedy for th...
Preprint
Full-text available
Machine translation has been a major motivation of development in natural language processing. Despite the burgeoning achievements in creating more efficient machine translation systems thanks to deep learning methods, parallel corpora have remained indispensable for progress in the field. In an attempt to create parallel corpora for the Kurdish la...
Preprint
Full-text available
Morphological analysis is the study of the formation and structure of words. It plays a crucial role in various tasks in Natural Language Processing (NLP) and Computational Linguistics (CL) such as machine translation and text and speech generation. Kurdish is a less-resourced multi-dialect Indo-European language with highly inflectional morphology...
Conference Paper
The resources and technologies for sign language processing of resourceful languages are emerging, while the low-resource languages are falling behind. Kurdish is a multi-dialect language, and it is considered a low-resource language. It is spoken by approximately 30 million people in several countries, which denotes that it has a large community w...
Conference Paper
Kurdish poetry and prose narratives were historically transmitted orally and less in a written form. Being an essential medium of oral narration and literature, Kurdish lyrics have had a unique attribute in becoming a vital resource for different types of studies, including Digital Humanities, Computational Folkloristics and Computational Linguisti...
Conference Paper
Full-text available
Segmentation is a fundamental step for most Natural Language Processing tasks. The Kurdish language is a multi-dialect, under-resourced language which is written in different scripts. The lack of various segmented corpora is one of the major bottlenecks in Kurdish language processing. We used Punkt, an unsupervised machine learning method, to segme...
Preprint
Full-text available
Segmentation is a fundamental step for most Natural Language Processing tasks. The Kurdish language is a multi-dialect, under-resourced language which is written in different scripts. The lack of various segmented corpora is one of the major bottlenecks in Kurdish language processing. We used Punkt, an unsupervised machine learning method, to segme...
Article
We present an experimental dataset, Basic Dataset for Sorani Kurdish Automatic Speech Recognition (BD-4SK-ASR), which we used in the first attempt at developing an automatic speech recognition system for Sorani Kurdish. The objective of the project was to develop a system that could automatically recognize simple sentences based on the vocabulary which is...
Preprint
Full-text available
We present an experimental dataset, Basic Dataset for Sorani Kurdish Automatic Speech Recognition (BD-4SK-ASR), which we used in the first attempt at developing an automatic speech recognition system for Sorani Kurdish. The objective of the project was to develop a system that could automatically recognize simple sentences based on the vocabulary which is...
Conference Paper
This paper describes the development of lexicographic resources for Kurdish and provides a lexical model for this language. Kurdish is considered a less-resourced language and currently lacks machine-readable lexical resources. The unique potential which Linked Data and the Semantic Web offer to e-lexicography enables interoperability across lexi...
Preprint
Full-text available
Kurdish is a less-resourced language consisting of different dialects written in various scripts. Approximately 30 million people in different countries speak the language. The lack of corpora is one of the main obstacles in Kurdish language processing. In this paper, we present KTC, the Kurdish Textbooks Corpus, which is composed of 31 K-12 textboo...
Conference Paper
Full-text available
Kurdish is a less-resourced language consisting of different dialects written in various scripts. Approximately 30 million people in different countries speak the language. The lack of corpora is one of the main obstacles in Kurdish language processing. In this paper, we present KTC, the Kurdish Textbooks Corpus, which is composed of 31 K-12 textboo...
Preprint
Full-text available
This research suggests a framework, Digital Humanities Readiness Assessment Framework (DHuRAF), to assess the maturity level of the required infrastructure for Digital Humanities studies (DH) in different communities. We use a similar approach to the Basic Language Resource Kit (BLARK) in developing the suggested framework. DH as a fairly new field...
Article
This research suggests a framework, Digital Humanities Readiness Assessment Framework (DHuRAF), to assess the maturity level of the required infrastructure for Digital Humanities studies (DH) in different communities. We use a similar approach to the Basic Language Resource Kit (BLARK) in developing the suggested framework. DH as a fairly new field...
Article
Full-text available
Final Year Projects (FYPs) play a significant role in undergraduate education in the computing field of study, and most of the related university departments and schools consider them an essential contribution to this study. However, issues such as whether to assign the projects individually or to a group of students, the procedures followed in the...
Article
Full-text available
Currently, no offline tool is available for Optical Character Recognition (OCR) in Kurdish. Kurdish is spoken in different dialects and uses several scripts for writing. The Persian/Arabic script is widely used among these dialects. The Persian/Arabic script is written from Right to Left (RTL), it is cursive, and it uses unique diacritics. These fe...
Article
Currently, no offline tool is available for Optical Character Recognition (OCR) in Kurdish. Kurdish is spoken in different dialects and uses several scripts for writing. The Persian/Arabic script is widely used among these dialects. The Persian/Arabic script is written from Right to Left (RTL), it is cursive, and it uses unique diacritics. These fe...
Article
In this paper we introduce the Kurdish BLARK (Basic Language Resource Kit). The original BLARK has not considered multi-dialect characteristics and generally has targeted reasonably well-resourced languages. To consider these two features, we extended BLARK and applied the proposed extension to Kurdish. Kurdish language not only faces a paucity in...
Conference Paper
Dialect identification/classification is an important step in many language processing activities particularly with regard to multi-dialect languages. Kurdish is a multi-dialect language which is spoken by a large population in different countries. Some of the Kurdish dialects, for example, Kurmanji and Sorani, have significant grammatical differen...
Book
Full-text available
XML has become an important aspect of computing. It plays a crucial role in data interchange and manipulation. Several technologies have been developed around XML in order to make it a powerful tool for interchanging data, manipulating semistructured data, and information retrieval. Having knowledge about this broad range of technologies enables s...
Book
Full-text available
Every year, many Computer Science and IT students need to prepare themselves for their final year projects. This final project plays a great role in showing the efficiency of learning outcomes of modules that the students have taken during their studies. Once the time comes, a thousand questions arise: What kind of project should I do? What steps s...
