Conference Paper

Linguistic intellectual analysis methods for Ukrainian textual content processing

Article
Full-text available
ChatGPT, an artificial intelligence model, has garnered significant interest within education. This study examined public sentiment regarding ChatGPT's influence on education by utilizing web mining and natural language processing (NLP) techniques. By adopting an empirical approach and leveraging machine learning models to process 2003 web articles, the study extracts valuable insights. The results indicate that ChatGPT has emerged as a crucial educational tool, offering advantages for both students and educators. Notably, the study emphasized ChatGPT's role in enhancing students' writing abilities and fostering dynamic, interactive learning environments. ChatGPT's capacity to address a broad spectrum of questions demonstrates its versatility and adaptability, contributing to more inclusive and personalized educational experiences. However, the study also uncovered challenges tied to academic integrity, such as plagiarism and cheating, which stem from incorporating AI-driven tools like ChatGPT into education. This raises concerns regarding ethical aspects, including responsible AI usage and data privacy, and highlights the need for institutions to develop guidelines and policies for AI tool implementation in education. This study's findings hold theoretical and practical implications for integrating ChatGPT into educational settings. It is the first to employ web mining and NLP techniques to analyze public opinions on ChatGPT's impact on education comprehensively.
Article
Full-text available
Zipf's Law of Abbreviation - the idea that more frequent symbols in a code are simpler than less frequent ones - has been shown to hold at the level of words in many languages. We tested whether it holds at the level of individual written characters. Character complexity is similar to word length in that it requires more cognitive and motor effort to produce and process more complex symbols. We built a dataset of character complexity and frequency measures covering 27 different writing systems. According to our data, Zipf's Law of Abbreviation holds for every writing system in our dataset - the more frequent characters have lower degrees of complexity and vice versa. This result provides further evidence of optimization mechanisms shaping communication systems.
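For illustration, a minimal sketch (not from the paper) of how the law can be tested at the word level: a negative Spearman correlation between frequency and length supports it. The toy token list stands in for a real corpus.

```python
# Minimal sketch: testing Zipf's Law of Abbreviation at the word level.
# A negative Spearman correlation between frequency and length supports it.
from collections import Counter
from scipy.stats import spearmanr

def abbreviation_correlation(tokens):
    """Spearman rho between word frequency and word length."""
    freq = Counter(tokens)
    frequencies = list(freq.values())
    lengths = [len(word) for word in freq]
    return spearmanr(frequencies, lengths)

tokens = ("the law predicts that the most frequent words of the language "
          "are also the shortest words of the language").split()
rho, p = abbreviation_correlation(tokens)
print(f"Spearman rho = {rho:.3f} (negative supports the law), p = {p:.3f}")
```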
Article
Full-text available
The article develops a technology for finding tweet trends based on clustering, which forms a data stream in the form of short representations of clusters and their popularity for further research of public opinion. The accuracy of the result is affected by the natural-language features of the tweet stream. An effective approach to tweet collection, filtering, cleaning and pre-processing based on a comparative analysis of Bag of Words, TF-IDF and BERT algorithms is described. The impact of stemming and lemmatization on the quality of the obtained clusters was determined: stemming and lemmatization reduce the input vocabulary of Ukrainian words by 40.21% and 32.52%, respectively. Optimal combinations of clustering methods (K-Means, Agglomerative Hierarchical Clustering and HDBSCAN) and tweet vectorization were found based on the analysis of 27 clusterings of one data sample. A method of presenting clusters of tweets in a short format was selected. Algorithms based on the Levenshtein distance, i.e. fuzz sort, fuzz set and Levenshtein, showed the best results: they perform checks quickly and show a greater difference in similarities, so the similarity threshold can be determined more accurately. According to the clustering results, the optimal solutions are to use the HDBSCAN clustering algorithm with BERT vectorization for the most accurate results, and K-Means with TF-IDF for the best speed with an acceptable result. Stemming can be used to reduce execution time. In this study, the optimal options for comparing cluster fingerprints were found experimentally among the following similarity search methods: Fuzz Sort, Fuzz Set, Levenshtein, Jaro-Winkler, Jaccard, Sorensen, Cosine, and Sift4. For some algorithms, the average fingerprint similarity exceeds 70%. Three effective tools were found for comparing similarity, as they show a sufficient difference (> 20%) between comparisons of similar and different clusters. The experimental testing was conducted on 90,000 tweets over 7 days for 5 different weekly topics: President Volodymyr Zelenskyi, Leopard tanks, Boris Johnson, Europe, and the bright memory of the deceased. The research used combinations of K-Means and TF-IDF, Agglomerative Hierarchical Clustering and TF-IDF, and HDBSCAN and BERT for the clustering and vectorization processes. Additionally, fuzz sort was implemented for comparing cluster fingerprints with a similarity threshold of 55%. For comparing fingerprints, the most optimal methods were fuzz sort, fuzz set, and Levenshtein. In terms of execution speed, the best result was achieved with the Levenshtein method; the other two methods were three times slower, but nearly 13 times faster than Sift4. The fastest method is Jaro-Winkler, but it has a 19.51% difference in similarities. The method with the best difference in similarities is fuzz set (60.29%); fuzz sort (32.28%) and Levenshtein (28.43%) took second and third place, respectively. These methods use the Levenshtein distance in their work, indicating that such an approach works well for comparing sets of keywords. Other algorithms fail to show significant differences between different fingerprints, suggesting that they are not adapted to this type of task.
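As a concrete illustration of one combination evaluated above (K-Means with TF-IDF), here is a minimal sketch; it is an assumed setup with toy tweets, not the authors' pipeline.

```python
# Minimal sketch (assumed setup, not the authors' pipeline): clustering short
# texts with TF-IDF vectors and K-Means, one of the combinations evaluated above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

tweets = [
    "tanks delivered to the front line",
    "new tank deliveries announced today",
    "the president addresses parliament",
    "the president gives an evening address",
]

X = TfidfVectorizer().fit_transform(tweets)      # sparse TF-IDF document vectors
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

for tweet, label in zip(tweets, labels):
    print(label, tweet)
```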
Article
Full-text available
A machine learning model for correcting errors in Ukrainian texts has been developed (Lytvyn, V.; Pukach, P.; Vysotska, V.; Vovk, M.; Kholodna, N. Identification and Correction of Grammatical Errors in Ukrainian Texts Based on Machine Learning Technology. Mathematics 2023, 11, 904). It was established that the neural network is able to correct simple sentences written in Ukrainian; however, the development of a full-fledged system requires spell-checking with dictionaries and the checking of rules, both simple ones and those based on the result of dependency parsing or other features. To save computing resources, a pre-trained BERT (Bidirectional Encoder Representations from Transformers) type neural network was used. Such neural networks have half as many parameters as other pre-trained models and show satisfactory results in correcting grammatical and stylistic errors. Among the ready-made neural network models, the pre-trained model mT5 (a multilingual variant of T5, the Text-to-Text Transfer Transformer) showed the best performance according to the BLEU (bilingual evaluation understudy) and METEOR (metric for evaluation of translation with explicit ordering) metrics.
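For orientation, a hedged sketch of loading a pretrained mT5 checkpoint with the Hugging Face transformers library for text-to-text inference. The checkpoint name, Ukrainian prompt, and generation settings are illustrative assumptions; a checkpoint fine-tuned for error correction, as in the paper, would be needed to get real corrections.

```python
# Hedged sketch: text-to-text inference with a pretrained mT5 checkpoint.
# NOTE: the raw pretrained model is not fine-tuned for grammatical error
# correction; this only shows the mechanics of the setup described above.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

text = "виправ помилки: Я пишу текст з помилками."  # hypothetical GEC prompt
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```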
Article
Full-text available
Due to the exponential growth of Internet users and traffic, information seekers depend heavily on search engines to extract relevant information. With access to large amounts of textual, audio, video and other content, the responsibility of search engines has increased. A search engine provides relevant information to Internet users concerning their query, based on content, link structure, etc.; however, it does not guarantee the correctness of the information. The performance of a search engine depends heavily on its ranking module, which in turn depends on the link structure of web pages, analyzed through Web structure mining (WSM), and their content, analyzed through Web content mining (WCM). Web mining plays a vital role in computing the rank of web pages. This article presents web mining types, techniques, tools, algorithms, and their challenges. Further, it provides a comprehensive critical survey for researchers by presenting different features of web pages that are essential for checking their quality. The authors present different approaches/techniques, algorithms and evaluation approaches from previous research and identify some critical issues in page ranking and web mining, which provide future directions for researchers working in the area.
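As a pointer to the kind of link-structure ranking the survey covers, here is an illustrative PageRank power-iteration sketch on a tiny hypothetical link graph (not code from the article).

```python
# Illustrative sketch: the PageRank idea, computed by power iteration
# on a small hypothetical link graph of three pages.
import numpy as np

# Column j holds page j's outgoing links: links[i][j] = 1 if j links to i.
links = np.array([
    [0, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
], dtype=float)
M = links / links.sum(axis=0)           # normalize columns to probabilities

n = M.shape[0]
damping = 0.85
rank = np.full(n, 1.0 / n)
for _ in range(100):                    # power iteration until convergence
    rank = (1 - damping) / n + damping * M @ rank

print(rank / rank.sum())                # normalized PageRank scores
```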
Chapter
Full-text available
Speech has been the most popular form of human communication, while a keyboard or a mouse is the most common way of entering data into a computer. It would be wonderful if computers could understand and carry out human commands. Automatic speech recognition (ASR) is the method of obtaining the transcription (word sequence) of an utterance from the speech waveform. Over the last few decades, speech technology and systems in human-computer interaction have progressed steadily and significantly. This chapter offers a comprehensive review of ASR systems and their most recent developments. It aims to outline and explain some of the popular approaches in speech recognition systems at various stages and to highlight selected systems' unique and innovative characteristics.
Conference Paper
Full-text available
This paper deals with different methods, particularly statistical analysis and text mining, which help in stylistic research. The examination of the lexical and semantic features of meiosis and litotes in the novel The Catcher in the Rye by Jerome David Salinger is presented as an example. The examination in question has been carried out with the help of the programming language R. To ensure high-quality research, the specific features of litotes and meiosis have been explored thoroughly. The broad range of possible scientific views has been described and, subsequently, we have made a general assumption about typical linguistic patterns of meiosis and litotes. Using the obtained insights, it is possible to apply different tools of text mining in stylistic research. The present paper outlines in detail the creation of concordances, word frequencies and sentiment analysis. To reach our goal, we have used the programming language R and the R packages distributed by members of the community. Within the scope of concordances, the concept of Key Word in Context has been discussed as well, and the advantages of using concordances in stylistic research have been introduced. The possible implementation of statistical analysis in the research of litotes has been proposed and discussed. Within the framework of sentiment analysis, we have focused on negation and how it affects opinion orientation. Thus, the present paper also aims to validate the importance of litotes in sentiment analysis, as litotes are directly linked to the effects of negation. The results of each stage of the research have been provided and meticulously discussed.
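The paper itself works in R; as a language-neutral illustration of the Key Word in Context idea it discusses, here is a small Python sketch with a toy sentence.

```python
# Small sketch of a Key Word in Context (KWIC) concordance: print every
# occurrence of a keyword together with a window of surrounding tokens.
def kwic(tokens, keyword, window=3):
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            print(f"{left:>30} [{token}] {right}")

text = "it was not bad at all and he was not unhappy about it".split()
kwic(text, "not")
```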
Conference Paper
Full-text available
The paper describes the first phase of semantic annotation implemented in the General Regionally Annotated Corpus of Ukrainian (GRAC) using the Ukrainian Semantic Lexicon (USL) and the TagText tagger for Ukrainian. Over 1,000 most frequent lemmas were supplied with semantic tags, creating the foundation for the lexicon. In the process of developing the USL, the original semantic tagset underwent changes and was expanded. The revised tagset is presented, and the linguistic aspects of practical semantic annotation are analyzed. The TagText tagger was updated to enable both morphological and semantic annotation of Ukrainian texts. The current versions of the USL and TagText are released and available for download. Text coverage by semantic tags in GRAC is discussed, and examples of semantic and complex searches in the GRAC corpus are provided. Plans for future work on the USL are outlined.
Article
Full-text available
The importance of text mining in services management is increasing as access to big data grows across the digital platforms enabling such services. This study adopts a systematic literature review of the application of text mining in services management. First, we analyzed the literature in reputed business management journals that has used text mining methods such as sentiment analysis, topic modeling, and natural language processing (NLP). Further, we applied visualization tools for text mining and topic association to understand the dominant themes and their relationships. The analysis highlighted that social media analysis, market analysis, and competitive intelligence are the most dominant themes, while other themes such as risk management and fake content detection are also explored. Further, based on the analysis, a future research agenda for text mining in services management is outlined.
Conference Paper
Full-text available
A semantic tagset for the semantic annotation of Ukrainian-language texts is presented, and the use of the taxonomic approach is substantiated. The categorization scheme implemented in the tagset takes into account the cognitive-linguistic perspective on categorization, specifically the basic level of categorization. Semantic tags are to be assigned to lemmas in the existing Large Electronic Dictionary of Ukrainian (VESUM) yielding a semantic lexicon that will be used by the TagText tagger (both tools developed by the r2u team) to add semantic annotation to the GRAC corpus. Used in conjunction with POS tags, semantic tags will serve as a powerful tool for the linguistic exploration of corpus data and for solving NLP tasks involving Ukrainian.
Article
Full-text available
Recent advances in text mining have provided new methods for capitalizing on the voluminous natural language text data created by organizations, their employees, and their customers. While often overlooked, decisions made during text preprocessing affect whether the content and/or style of language are captured, the statistical power of subsequent analyses, and the validity of insights derived from text mining. Past methodological papers have described the general process of obtaining and analyzing text data, but recommendations for preprocessing text data were inconsistent. Further, primary studies use and report different preprocessing techniques. To address this, we conduct two complementary reviews of computational linguistics and organizational text mining research to provide empirically grounded text preprocessing decision-making recommendations that account for the type of text mining conducted (i.e., open or closed vocabulary), the research question under investigation, and the dataset’s characteristics (i.e., corpus size and average document length). Notably, deviations from these recommendations will be appropriate and, at times, necessary due to the unique characteristics of one’s text data. We also provide recommendations for reporting text mining to promote transparency and reproducibility.
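To make the decision points concrete, here is a minimal example pipeline of the kind the review discusses (tokenization, lowercasing, stopword removal, optional stemming), sketched with NLTK; the specific choices shown are illustrative, not the review's recommendations.

```python
# Illustrative preprocessing pipeline: tokenize, lowercase, drop
# non-alphabetic tokens and stopwords, optionally stem.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)
nltk.download("stopwords", quiet=True)

def preprocess(text, stem=True):
    stop = set(stopwords.words("english"))
    tokens = [t.lower() for t in word_tokenize(text) if t.isalpha()]
    tokens = [t for t in tokens if t not in stop]
    if stem:  # stemming shrinks the vocabulary at some cost to interpretability
        stemmer = PorterStemmer()
        tokens = [stemmer.stem(t) for t in tokens]
    return tokens

print(preprocess("The reviewers compared preprocessing choices carefully."))
```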
Article
Full-text available
Context: Identifying emerging research fronts is critical, as it aids policy makers and funding agencies in their research-policy decisions; it is also a useful tool for guiding young researchers' choice of research direction. Studies have successfully used techniques such as co-citation and co-word analysis of data retrieved from ISI databases to investigate emerging or dying research trends. Aim: With the advent of publicly available preprint databases such as BioRxiv, it becomes necessary to investigate whether the state-of-the-art techniques used to identify emerging research areas can be transferred to a dataset extracted from these public archives. Methods and Materials: A cluster analysis of keyword bursts and author bursts in data extracted from BioRxiv is used to investigate the suitability of the BioRxiv dataset for studying emerging or dying research trends. Results: The results showed that although the data retrieved from BioRxiv may not yet be mature enough for reliable analyses, the increased awareness shown in the exponential growth of preprint submissions suggests that this data source will become a valuable resource, and the techniques described in this research can be used to discover interesting trends in emerging or dying research fronts from preprint databases.
Article
Full-text available
As the amount of generated information grows, reading and summarizing texts of large collections turns into a challenging task. Many documents do not come with descriptive terms, thus requiring humans to generate keywords on-the-fly. The need to automate this kind of task demands the development of keyword extraction systems with the ability to automatically identify keywords within the text. One approach is to resort to machine-learning algorithms. These, however, depend on large annotated text corpora, which are not always available. An alternative solution is to consider an unsupervised approach. In this article, we describe YAKE!, a light-weight unsupervised automatic keyword extraction method which rests on statistical text features extracted from single documents to select the most relevant keywords of a text. Our system does not need to be trained on a particular set of documents, nor does it depend on dictionaries, external corpora, text size, language, or domain. To demonstrate the merits and significance of YAKE!, we compare it against ten state-of-the-art unsupervised approaches and one supervised method. Experimental results carried out on top of twenty datasets show that YAKE! significantly outperforms other unsupervised methods on texts of different sizes, languages, and domains.
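A short usage sketch of the open-source yake package (pip install yake), which implements the method; the parameter values shown are assumptions for illustration.

```python
# Sketch: extracting keywords from a single document with the yake package.
import yake

text = (
    "YAKE! is a light-weight unsupervised automatic keyword extraction "
    "method which rests on statistical text features extracted from "
    "single documents."
)

extractor = yake.KeywordExtractor(lan="en", n=3, top=5)  # up to 3-gram keywords
for keyword, score in extractor.extract_keywords(text):
    print(f"{score:.4f}  {keyword}")  # lower score = more relevant in YAKE!
```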
Article
Full-text available
A linguometric method of algorithmic support for content-monitoring processes has been developed to solve the problem of automatically identifying the author of Ukrainian-language textual content, based on the technology of statistical analysis of language diversity coefficients. The author identification method was decomposed based on the analysis of such speech coefficients as lexical diversity, the degree (measure) of syntactic complexity, speech coherence, and the indices of text exclusivity and concentration. Parameters of the author's style were also analyzed, such as the number of words in a given text, the total word count of that text, the number of sentences, the numbers of prepositions and conjunctions, the number of words with frequency 1, and the number of words with frequency 10 or higher. A distinctive feature of the developed method is the adaptation of the morphological and syntactic analysis of lexical units to the peculiarities of the construction of Ukrainian-language words/texts. That is, when analyzing word-level linguistic units, the part of speech and the inflection within that part of speech were taken into account. To this end, the inflections of these words were analyzed for classification and stem extraction, in order to build the corresponding alphabetical-frequency dictionaries. The contents of these dictionaries were then used in the subsequent steps of determining text authorship, namely the calculation of the parameters and coefficients of the author's speech. Function (stop or anchor) words are particularly indicative of a writer's individual style, since they are in no way connected to the topic and content of the publication. The results were compared on a set of 200 single-authored technical works by about 100 different authors over the period 2001-2017 to determine whether, and how, the text diversity coefficients of these authors change over different time intervals. It was found that, for the chosen experimental base of over 200 works, the best results by the density criterion are achieved by analyzing an article without its mandatory front matter (abstracts and keywords in different languages) and without the reference list. Keywords: NLP, content monitoring, stop words, content analysis, statistical linguistic analysis, quantitative linguistics.
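A minimal sketch of the kind of coefficients analyzed above; the authors' exact formulas are not given here, so these definitions (type-token ratio, hapax-based exclusivity index, concentration index over words with frequency >= 10) are common quantitative-linguistics stand-ins.

```python
# Hedged sketch: language diversity coefficients of the kind analyzed above.
# The formulas are standard assumptions, not the authors' exact definitions.
from collections import Counter

def diversity_coefficients(tokens):
    freq = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(freq)
    return {
        "lexical_diversity": n_types / n_tokens,   # type-token ratio
        "exclusivity_index": sum(1 for c in freq.values() if c == 1) / n_types,
        "concentration_index": sum(1 for c in freq.values() if c >= 10) / n_types,
    }

tokens = "to be or not to be that is the question".split()
print(diversity_coefficients(tokens))
```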
Article
Full-text available
The study has solved the task of comparative analysis and the choice of an optimal statistical method for determining stable word combinations when identifying keywords in the processing of English-language and Ukrainian-language web resources. The effectiveness of the method is directly proportional to the quality of the linguistic analysis of Ukrainian and English texts, respectively, based on Web Mining and NLP technology. A decomposition of the methods of linguistic analysis was performed to determine their impact on the quality of forming stable word combinations as keywords. A feature of the method is the adaptation of the morphological and syntactic analyses of lexical units to the peculiarities of Ukrainian-language words/texts. To determine stable word combinations effectively, it is essential to exclude function words (stop or anchor words), pronouns, numerals and verbs, because they are not related to the subject and content of a published work. A set of stable word combinations as keywords is determined by qualitative morphological and syntactic analyses of the relevant texts. The set of identified stable word combinations is then used to compare and determine the degree of a text's relevance to a specific topic or user request. The internal "dynamics" of forming a set of stable word combinations as keywords was investigated depending on the statistical method applied to the texts, and the obtained results were verified. The study presents the results of experimental testing of the proposed content-monitoring method for determining stable word combinations to identify keywords in the processing of English-language and Ukrainian-language technical web resources based on Web Mining technology. It was determined that the keywords specified by the authors of published works often differ considerably from those identified in the texts. It was also shown that the quality of the result is influenced by the quality of the linguistic analysis of the texts and the subsequent filtering. Further experimental research requires approbation of the proposed method for determining keywords for other categories of texts: scientific, humanitarian, belletristic, journalistic, etc.
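As a rough stand-in for the statistical step (not the authors' exact method), stable word combinations can be scored with a standard association measure such as pointwise mutual information; the sketch below uses NLTK's collocation finder on toy tokens.

```python
# Minimal sketch (assumed approach, not the authors' method): scoring candidate
# stable word combinations (bigrams) with pointwise mutual information (PMI).
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

tokens = ("web mining technology supports web mining research "
          "and web mining applications").split()

finder = BigramCollocationFinder.from_words(tokens)
finder.apply_freq_filter(2)             # keep bigrams seen at least twice
measures = BigramAssocMeasures()
for bigram, score in finder.score_ngrams(measures.pmi):
    print(bigram, round(score, 3))
```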
Article
In this paper, an end-to-end multi-task deep neural network was proposed for simultaneous script identification and Keyword Spotting (KWS) in multi-lingual handwritten and printed document images. We introduced a unified approach which addresses both challenges cohesively by designing a novel CNN-BLSTM architecture. The script identification stage involves local and global feature extraction to allow the network to cover more relevant information. In contrast to traditional feature fusion approaches, which build a linear feature concatenation, we employed compact bi-linear pooling to capture pairwise correlations between these features. The script identification result is then injected into the KWS module to eliminate characters of irrelevant scripts and perform the decoding stage in single-script mode. All the network parameters were trained in an end-to-end fashion using multi-task learning that jointly minimizes the NLL loss for script identification and the CTC loss for KWS. Our approach was evaluated on a variety of public datasets of different languages and writing types. Experiments proved the efficacy of our deep multi-task representation learning compared to state-of-the-art systems for both keyword spotting and script identification tasks.
Article
Scenario development is an established foresight method. However, scenario processes require much time, and the integration of a balanced set of initial information (reports, expert interviews, etc.) remains a challenge. One of the key tasks in scenario development is to capture the topic and identify its key influences, and this has potential for improvement. In times of big data, far more options exist for the rapid exploration of a topic than manual literature analysis alone. Hence, this work examines web and text mining for their usability in data retrieval and aggregation to improve scenario development. In this article, a new scenario process is proposed and described using the topic of the quantified self as an example. As the results show, web and text mining present a very good starting point for discussing the scenario content. The rapid overview with the visualizations remarkably reduces the reading effort. Still, future projections need to be searched manually, but the results from the automatic analysis comprehensively guide this step.
Article
The number of received citations has been used as an indicator of the impact of academic publications. Developing tools to find papers that have the potential to become highly cited has recently attracted increasing scientific attention. Topics of concern to scholars may change over time in accordance with research trends, resulting in changes in received citations. Author-defined keywords, the title and the abstract provide valuable information about a research article. This study applies the latent Dirichlet allocation technique to extract topics and keywords from articles; five keyword popularity (KP) features are defined as indicators of emerging trends of articles. Binary classification models are utilized to predict papers that were highly cited or less highly cited by a number of supervised learning techniques. We empirically compare the KP features of articles with other commonly used journal-related and author-related features proposed in previous studies. The results show that, with KP features, the prediction models are more effective than those with journal and/or author features, especially in the management information systems discipline.
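An illustrative sketch of the LDA step described above, using scikit-learn on placeholder documents; the corpus, number of topics, and other parameters are assumptions.

```python
# Illustrative sketch: extracting topics with latent Dirichlet allocation (LDA).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "citation prediction with machine learning features",
    "keyword popularity trends in information systems",
    "supervised learning for citation count prediction",
    "topic models extract keywords from abstracts",
]

counts = CountVectorizer().fit(docs)
X = counts.transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:][::-1]]  # top-5 topic terms
    print(f"topic {k}: {', '.join(top)}")
```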
Article
The authors evaluated supervised automatic classification algorithms for determining the compliance of health-related web pages with individual HONcode criteria of conduct, using character n-gram vectors of varying length to represent healthcare web page documents. The training/testing collection comprised web page fragments extracted by HONcode experts during the manual certification process. The authors compared the automated classification performance of n-gram tokenization with that of document words and Porter-stemmed document words, using a Naive Bayes classifier and DF (document frequency) dimensionality reduction metrics. The study attempted to determine whether the automated, language-independent approach might safely replace word-based classification. Using 5-grams as document features, the authors also compared the baseline DF reduction function to chi-square and Z-score dimensionality reductions. Overall, the study results indicate that n-gram tokenization provides a potentially viable alternative to document word stemming.
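A sketch of the kind of pipeline evaluated above (character 5-gram features, chi-square feature selection, Naive Bayes), assembled with scikit-learn on toy data; it mirrors the described setup but is not the authors' code.

```python
# Sketch: character 5-gram features + chi-square selection + Naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["contact our medical editor", "advertising policy disclosed here",
        "editorial board of physicians", "sponsored content and ads"]
labels = [1, 0, 1, 0]                    # 1 = meets criterion (toy labels)

pipeline = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(5, 5)),  # 5-gram tokens
    SelectKBest(chi2, k=20),                               # dimensionality reduction
    MultinomialNB(),
)
pipeline.fit(docs, labels)
print(pipeline.predict(["our medical editorial team"]))
```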
Article
This issue's expert guest column is by Eric Allender, who has just taken over the Structural Complexity Column in the Bulletin of the EATCS. Regarding "Journals to Die For" (SIGACT News Complexity Theory Column 16), Joachim von zur Gathen, ...
Article
The string-to-string correction problem is to determine the distance between two strings as measured by the minimum cost sequence of “edit operations” needed to change the one string into the other. The edit operations investigated allow changing one symbol of a string into another single symbol, deleting one symbol from a string, or inserting a single symbol into a string. An algorithm is presented which solves this problem in time proportional to the product of the lengths of the two strings. Possible applications are to the problems of automatic spelling correction and determining the longest subsequence of characters common to two strings.
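The algorithm described is the classic dynamic-programming solution; a compact sketch follows, with unit costs assumed for all three edit operations.

```python
# Worked sketch of the algorithm described above: minimum-cost edit distance
# (substitute / delete / insert, unit costs) in O(len(a) * len(b)) time.
def edit_distance(a, b):
    # dist[i][j] = cost of changing a[:i] into b[:j]
    dist = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dist[i][0] = i                   # delete all of a[:i]
    for j in range(len(b) + 1):
        dist[0][j] = j                   # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            substitute = dist[i - 1][j - 1] + (a[i - 1] != b[j - 1])
            delete = dist[i - 1][j] + 1
            insert = dist[i][j - 1] + 1
            dist[i][j] = min(substitute, delete, insert)
    return dist[len(a)][len(b)]

print(edit_distance("kitten", "sitting"))  # 3
```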
Article
This report indicates the level of computer development and application in each of the thirty countries of Europe, most of which were recently visited by the author.
Article
In this paper we wish to show that the fundamental problem of determining the utility of a communication channel in conveying information can be interpreted as a problem within the framework of multistage decision processes of stochastic type, and as such may be treated by means of the theory of dynamic programming. We shall begin by formulating some aspects of the general problem in terms of multistage decision processes, with brief descriptions of stochastic allocation processes and learning processes. Following this, as a simple example of the applicability of the techniques of dynamic programming, we shall discuss in detail a problem posed recently by Kelly. In that paper, Kelly shows that under certain conditions the rate of transmission, as defined by Shannon, can be obtained from a certain multistage decision process with an economic criterion. Here we shall complete Kelly's analysis on some essential points, using functional equation techniques, and considerably extend his results.
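For context, Kelly's classic result referenced above can be stated in its standard textbook form (an assumption that this simple binary variant matches the one treated in the paper):

```latex
% Kelly's result in its simplest setting: a gambler receives binary signals
% over a noisy channel that are correct with probability p (q = 1 - p) and
% bets at fair even odds. Betting the fixed fractions (p, q) of capital on
% the two outcomes maximizes the exponential growth rate G of capital, and
% the maximum coincides with Shannon's rate of transmission:
\[
  G_{\max} \;=\; 1 + p \log_2 p + q \log_2 q \;=\; C .
\]
```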
The Role of Keyword Language in the Database of World Slavic Linguistics "iSybislaw"
  • A Taran
A. Taran, The Role of Keyword Language in the Database of World Slavic Linguistics "iSybislaw", CEUR Workshop Proceedings 3171 (2022) 266-276.
Keyword-based Study of Thematic Vocabulary in British Weather News
  • N Bondarchuk
N. Bondarchuk, et al., Keyword-based Study of Thematic Vocabulary in British Weather News, CEUR Workshop Proceedings 3171 (2022) 451-460.
The Applicability of Zipf's Law in Report Text
  • Z Yang
  • Z Xiangyi
Z. Yang, Z. Xiangyi, The Applicability of Zipf's Law in Report Text, Lecture Notes on Language and Literature 6(10) (2023) 57-64.
Linguistic analysis method of Ukrainian commercial textual content for data mining
  • O Bisikalo
  • V Vysotska
O. Bisikalo, V. Vysotska, Linguistic analysis method of Ukrainian commercial textual content for data mining, CEUR Workshop Proceedings 2608 (2020) 224-244.
VESUM: A Large Morphological Dictionary of Ukrainian As a Dynamic Tool
  • V Starko
  • A Rysin
V. Starko, A. Rysin, VESUM: A Large Morphological Dictionary of Ukrainian As a Dynamic Tool, CEUR Workshop Proceedings 3171 (2022) 61-70.
Ukrainian Feminine Personal Nouns in Online Dictionaries and Corpora
  • O Synchak
  • V Starko
O. Synchak, V. Starko, Ukrainian Feminine Personal Nouns in Online Dictionaries and Corpora, CEUR Workshop Proceedings 3171 (2022) 775-790.
A comparative analysis for English and Ukrainian texts processing based on semantics and syntax approach
  • V Vysotska
  • S Holoshchuk
  • R Holoshchuk
V. Vysotska, S. Holoshchuk, R. Holoshchuk, A comparative analysis for English and Ukrainian texts processing based on semantics and syntax approach, CEUR Workshop Proceedings 2870 (2021) 311-356.
Correlation Analysis of Text Author Identification Results Based on N-Grams Frequency Distribution in Ukrainian Scientific and Technical Articles
  • V Vysotska
  • O Markiv
  • S Teslia
  • Y Romanova
  • I Pihulechko
V. Vysotska, O. Markiv, S. Teslia, Y. Romanova, I. Pihulechko, Correlation Analysis of Text Author Identification Results Based on N-Grams Frequency Distribution in Ukrainian Scientific and Technical Articles, CEUR Workshop Proceedings 3171 (2022) 277-314.
Machine Learning Model for Paraphrases Detection Based on Text Content Pair Binary Classification
  • N Kholodna
  • V Vysotska
  • O Markiv
  • S Chyrun
N. Kholodna, V. Vysotska, O. Markiv, S. Chyrun, Machine Learning Model for Paraphrases Detection Based on Text Content Pair Binary Classification, CEUR Workshop Proceedings 3312 (2022) 283-306.
Using Topic Modeling for Automation Search to Reviewer
  • Y Hlavcheva
  • O Kanishcheva
  • M Vovk
  • M Glavchev
Y. Hlavcheva, O. Kanishcheva, M. Vovk, M. Glavchev, Using Topic Modeling for Automation Search to Reviewer, CEUR Workshop Proceedings 3171 (2022) 81-90.
Automatic Multilingual Ontology Generation Based on Texts Focused on Criminal Topic
  • N Khairova
  • A Kolesnyk
  • O Mamyrbayev
  • G Ybytayeva
  • Y Lytvynenko
N. Khairova, A. Kolesnyk, O. Mamyrbayev, G. Ybytayeva, Y. Lytvynenko, Automatic Multilingual Ontology Generation Based on Texts Focused on Criminal Topic, CEUR Workshop Proceedings 2870 (2021) 108-117.
Lexical Diversity Parameters Analysis for Author's Styles in Scientific and Technical Publications
  • V Motyka
  • Y Stepaniak
  • M Nasalska
  • V Vysotska
V. Motyka, Y. Stepaniak, M. Nasalska, V. Vysotska, Lexical Diversity Parameters Analysis for Author's Styles in Scientific and Technical Publications, CEUR Workshop Proceedings 3403 (2023) 595-617.
The Game Method for Orthonormal Systems Construction
  • P Kravets
P. Kravets, The Game Method for Orthonormal Systems Construction, in Proceedings of the 9th International Conference - The Experience of Designing and Applications of CAD Systems in Microelectronics, 2007. doi: 10.1109/cadsm.2007.4297555.