Article

Methodological Aspects of Semantic Relationship Extraction for Automatic Thesaurus Generation


... To determine the amount of information at the level of its semantic content (semantic level), the thesaurus measure is used (Groot et al., 2016; Lagutina et al., 2016; LeCun et al., 2015; Mai et al., 2017; Mai et al., 2018; Wilson et al., 2019). This characteristic determines semantic properties through the student's ability to accept (perceive and assimilate) the information received (Chernigovskaya et al., 2016; Kiselev, 2018; Popova, 2012). ...
Conference Paper
Full-text available
One direction of innovation in pedagogy is the search for ways to make educational texts more accessible, given the decline in students' reading competence. To overcome the fragmentary perception of educational texts, it is necessary to draw on the results of current research in various sciences. A preliminary analysis showed that the efficiency of perceiving and assimilating information can be increased by engaging the unconscious component of the psyche, by fixing attention on the most frequently repeated terms and concepts. One solution is to develop a mathematical model and a software product for the didactic analysis of the compatibility of educational information by determining an extended thesaurus, which, with respect to educational information, treats terminology as the main fixer of its meanings. The software was developed on the basis of the Django framework and a mathematical model implemented in Python 3.7 using the scikit-learn library. The developed tool is relevant in the context of continuing education for analyzing and processing educational texts without taking into account their semantic compatibility and the patterns of perception of educational information. A gradual increment of the thesaurus within the required level of perception will thus make it possible to offset negative trends in the quality of students' perception of information. Source: Software For Sense Compatibility Analysis Of Educational Texts. Available from: https://www.researchgate.net/publication/346343658_Software_For_Sense_Compatibility_Analysis_Of_Educational_Texts [accessed Jan 25 2022].
... In such training, an important aspect is the ability to navigate professionally relevant information, quickly determine its optimal volume, content and key aspects, if necessary, constantly updating the knowledge base, taking into account changes in all normative documents (Popova, 2012; Schenk, 2012). The modern digital space provides opportunities and potential for optimizing the work with a large amount of information (Andreeva & Ushakov, 2019; Golitsyna et al., 2016; Groot et al., 2016; Kiselev et al., 2018; Kutuzov & Kuzmenko, 2019; Lagutina et al., 2016; LeCun et al., 2015; Mai et al., 2017, 2018; Metcalfe, 2017; Pushkareva & Kalitina, 2018; Rybakova et al., 2015). ...
Article
The paper reviews the existing Russian-language thesauri in digital form and the methods of their automatic construction and application. The authors analyzed the main characteristics of open-access thesauri for scientific research, and evaluated trends in their development and their effectiveness in solving natural language processing tasks. The statistical and linguistic methods of thesaurus construction that make it possible to automate development and reduce the labor costs of expert linguists were studied. In particular, the authors considered algorithms for extracting keywords and semantic thesaurus relationships of all types, as well as the quality of thesauri generated with these tools. To illustrate the features of various methods for constructing thesaurus relationships, the authors developed a combined method that generates a specialized thesaurus fully automatically, taking into account a text corpus in a particular domain and several existing linguistic resources. With the proposed method, experiments were conducted on two Russian-language text corpora from two subject areas: articles about migrants and tweets. The resulting thesauri were assessed using an integrated assessment, developed in the authors' previous study, that makes it possible to analyze various aspects of a thesaurus and the quality of the generation methods. The analysis revealed the main advantages and disadvantages of various approaches to constructing thesauri and extracting semantic relationships of different types, and made it possible to determine directions for future study.
Conference Paper
Full-text available
One of the major problems of modern Information Retrieval (IR) systems is the vocabulary problem, which concerns the discrepancies between the terms used for describing documents and the terms used by searchers to describe their information need. One way of handling the vocabulary problem is to use a thesaurus, which shows (usually semantic) relationships between terms. Three approaches for automatically creating thesauri are presented in this paper: statistical co-occurrence analyses, the concept space approach, and Bayesian networks.
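As an illustration of the first of these approaches, the sketch below links terms that tend to appear in the same documents. It is a minimal example, not the method of the paper: the Dice coefficient, the whitespace tokenizer, the `min_dice` threshold, the function name `cooccurrence_thesaurus`, and the toy documents are all assumptions chosen for brevity.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_thesaurus(docs, min_dice=0.5):
    """Link terms whose document-level Dice coefficient reaches min_dice.

    Assumption: terms are lowercase whitespace-separated tokens; a real
    system would add stemming, stop-word removal, and phrase detection.
    """
    term_df = Counter()  # number of documents containing each term
    pair_df = Counter()  # number of documents containing both terms of a pair
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))

    related = {}
    for (a, b), both in pair_df.items():
        # Dice coefficient: 2 * |A and B| / (|A| + |B|)
        dice = 2 * both / (term_df[a] + term_df[b])
        if dice >= min_dice:
            related.setdefault(a, {})[b] = dice
            related.setdefault(b, {})[a] = dice
    return related


# Hypothetical toy corpus
docs = [
    "car engine repair",
    "automobile engine repair",
    "car automobile dealer",
    "river bank flood",
]
thesaurus = cooccurrence_thesaurus(docs, min_dice=0.6)
print(thesaurus["engine"])
```

Document-level counting keeps the example short; counting within a sliding window of words is a common alternative when documents are long.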
Conference Paper
Full-text available
Existing graph-based ranking methods for keyphrase extraction compute a single importance score for each word via a single random walk. Motivated by the fact that both documents and words can be represented by a mixture of semantic topics, we propose to decompose traditional random walk into multiple random walks specific to various topics. We thus build a Topical PageRank (TPR) on word graph to measure word importance with respect to different topics. After that, given the topic distribution of the document, we further calculate the ranking scores of words and extract the top ranked ones as keyphrases. Experimental results show that TPR outperforms state-of-the-art keyphrase extraction methods on two datasets under various evaluation metrics.
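The idea described in this abstract can be sketched in a few lines: run one biased PageRank per topic on the word graph, then mix the per-topic scores by the document's topic distribution. The sketch below is illustrative only. The toy graph, the topic-word and document-topic probabilities, the damping value, and the choice of P(word | topic) as the teleport preference are assumptions, not values or settings taken from the paper.

```python
def biased_pagerank(graph, preference, damping=0.85, iterations=50):
    """Power iteration for PageRank with a non-uniform teleport vector.

    Assumptions: every node has at least one outgoing edge, and the
    preference vector has positive total mass over the graph's words.
    """
    words = list(graph)
    total = sum(preference.get(w, 0.0) for w in words)
    pref = {w: preference.get(w, 0.0) / total for w in words}
    out_weight = {w: sum(graph[w].values()) for w in words}
    score = dict(pref)
    for _ in range(iterations):
        new_score = {}
        for w in words:
            incoming = sum(
                score[v] * graph[v][w] / out_weight[v]
                for v in words
                if w in graph[v]
            )
            new_score[w] = damping * incoming + (1 - damping) * pref[w]
        score = new_score
    return score


def topical_pagerank(graph, word_given_topic, topic_given_doc):
    """Mix topic-specific PageRank scores by the document's topic weights."""
    combined = {w: 0.0 for w in graph}
    for topic, weight in topic_given_doc.items():
        topic_scores = biased_pagerank(graph, word_given_topic[topic])
        for w in graph:
            combined[w] += weight * topic_scores[w]
    return combined


# Hypothetical word co-occurrence graph (symmetric edge weights)
graph = {
    "graph": {"ranking": 1, "walk": 1},
    "ranking": {"graph": 1, "walk": 1, "topic": 1},
    "walk": {"graph": 1, "ranking": 1},
    "topic": {"ranking": 1, "document": 1},
    "document": {"topic": 1},
}
# Hypothetical topic model output, e.g. from LDA
word_given_topic = {
    "graphs": {"graph": 0.4, "ranking": 0.3, "walk": 0.3},
    "text": {"topic": 0.5, "document": 0.4, "ranking": 0.1},
}
topic_given_doc = {"graphs": 0.7, "text": 0.3}

scores = topical_pagerank(graph, word_given_topic, topic_given_doc)
for word, value in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word}: {value:.3f}")
```

In a full pipeline the ranked words would then be used to score candidate phrases; that step is omitted here.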
Conference Paper
The paper addresses the task of automatically extracting keyphrases from a text corpus relating to a specific domain so that the texts linked by common keyphrases form a well-connected graph. The authors developed a new method that combines a well-known keyphrase extraction algorithm (e.g., TextRank, Topical PageRank, KEA, Maui) with a thesaurus-based procedure that improves the text-via-keyphrase graph connectivity and simultaneously raises the quality of the extracted keyphrases in terms of precision and recall. The effectiveness of the proposed method is demonstrated on the text corpus of the Open Karelia tourist information system.
Conference Paper
While automatic keyphrase extraction has been examined extensively, state-of-the-art performance on this task is still much lower than that on many core natural language processing tasks. We present a survey of the state of the art in automatic keyphrase extraction, examining the major sources of errors made by existing systems and discussing the challenges ahead.
Article
The conceptualization of knowledge required for efficient processing of textual data is usually represented as ontologies. Depending on the knowledge domain and tasks, different types of ontologies are constructed: formal ontologies, which involve axioms and detailed relations between concepts; taxonomies, which are hierarchically organized concepts; and informal ontologies, such as Internet encyclopedias created and maintained by user communities. Manual construction of ontologies is a time-consuming and costly process requiring the participation of experts; therefore, in recent years, many systems have appeared that automate this process to a greater or lesser degree. This paper provides an overview of methods for automatic construction and enrichment of ontologies, with a focus on informal ontologies.
Loukachevitch N. V., Dobrov B. V., "Developing Linguistic Ontologies in Broad Domains", Ontology of Designing, 5:1(15) (2015), 47-69 (in Russian).
Wiemer-Hastings P., Wiemer-Hastings K., Graesser A., "Latent semantic analysis", Proceedings of the 16th international joint conference on Artificial intelligence, 2004, 1-14.
Lefever E., Van de Kauter M., Hoste V., "Evaluation of automatic hypernym extraction from technical corpora in English and Dutch", 9th International Conference on Language Resources and Evaluation (LREC), 2014, 490-497.
Oakes M. P., "Using Hearst's Rules for the Automatic Acquisition of Hyponyms for Mining a Pharmaceutical Corpus", RANLP Text Mining Workshop, 5 (2005), 63-67.
Noh S., Kim S., Jung C., "A Lightweight Program Similarity Detection Model using XML and Levenshtein Distance", FECS, 2006, 3-9.
Mozzherina E. S., "Automatic construction of an ontology from a collection of text documents", Ehlektronnye biblioteki: Perspektivnye Metody i Tekhnologii, Ehlektronnye kollekcii-RCDL, 2011, 293-298 (in Russian).
Mittelu V. B., "Automatic Extraction of Patterns Displaying Hyponym-Hypernym Co-Occurrence from Corpora", Proceedings of First Central European Student Conference in Linguistics, 2006, 21, 8 pp.