FIGURE 7 - available via license: Creative Commons Attribution 4.0 International
Feature contribution analysis performed by ELI5 for the classification of different textual genres (SGD model evaluated).
Source publication
The analysis of discourse and the study of what characterizes it in terms of communicative objectives is essential to most tasks of Natural Language Processing. Consequently, research on textual genres as expressions of such objectives presents an opportunity to enhance both automatic techniques and resources. To conduct an investigation of this ki...
Contexts in source publication
Context 1
... libraries devoted to improving model interpretability, we selected ELI5 (Explain Like I'm Five), a Python tool that explains the weights assigned to each feature and the predictions made by scikit-learn models. ELI5 was then used to evaluate the ML Selected group of features against one of the models in the BMs set, in this case the SGD model. Fig. 7 displays the results provided by ELI5. Each column corresponds to a genre, and the weights therein show how much a feature contributed to the classification of that genre. Their absolute values indicate the relevance of the feature, whether its contribution is positive or negative. A thorough study of such information was also conducted ...
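The per-genre weight columns that ELI5 reports for a linear model can be read directly off the model's coefficient matrix. The sketch below, which uses synthetic data and hypothetical genre labels (only the feature names NNPXsent, c_NNP, and NER_length_2 come from the paper), illustrates the idea with a scikit-learn SGD classifier; it is not the study's actual pipeline.

```python
# Sketch: reading per-genre feature weights from a linear SGD model,
# mirroring the per-genre columns ELI5 displays in Fig. 7.
# Data is synthetic; feature/genre names beyond the paper's are invented.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
feature_names = ["NNPXsent", "c_NNP", "NER_length_2", "sent_length", "c_ADJ"]
genres = ["news", "review", "tale"]

X = rng.normal(size=(300, len(feature_names)))
y = rng.integers(0, 3, size=300)
X[y == 0, 0] += 2.0   # make NNPXsent high in the "news" class
X[y == 0, 1] += 1.5   # make c_NNP high in the "news" class

clf = SGDClassifier(random_state=0).fit(X, y)

# One weight vector per genre: |weight| = relevance, sign = direction.
for gi, genre in enumerate(genres):
    top = sorted(zip(feature_names, clf.coef_[gi]),
                 key=lambda t: -abs(t[1]))[:3]
    print(genre, [(name, round(w, 2)) for name, w in top])
```

In a real run, `eli5.explain_weights(clf, feature_names=...)` renders this same information as the weight table shown in the figure.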
Context 2
... relevant features of the 95 selected by the different ML strategies, after sorting them by their importance coefficient, which results from a combination of the votes (i.e., the number of methods that selected the feature) and the averaged importance score also retrieved from those methods. ... and the named entities mentioned in news. This is reflected in Fig. 7 by the high, positive weight reported for the frequency of proper nouns per sentence in news documents (NNPXsent), as well as by the total number of proper nouns per document (c_NNP). Moreover, this is also confirmed by the high weight assigned to named entities made up of two words (NER_length_2), which include proper nouns, ...
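The vote-plus-averaged-score ranking described in this excerpt can be sketched in a few lines. The selection methods and scores below are invented for illustration; the paper itself combines 95 features chosen by several ML strategies whose exact scores are not reproduced here.

```python
# Sketch of an importance coefficient combining votes (how many selection
# methods picked a feature) with the averaged importance score from those
# methods. Method names and scores are illustrative assumptions.
from collections import defaultdict

# method -> {feature: importance score}
selections = {
    "chi2":          {"NNPXsent": 0.9, "c_NNP": 0.7},
    "mutual_info":   {"NNPXsent": 0.8, "NER_length_2": 0.6},
    "rf_importance": {"NNPXsent": 0.3, "c_NNP": 0.5, "NER_length_2": 0.4},
}

votes = defaultdict(int)
scores = defaultdict(list)
for feats in selections.values():
    for feat, score in feats.items():
        votes[feat] += 1
        scores[feat].append(score)

# Rank by votes first, then by averaged importance score.
ranking = sorted(
    ((f, votes[f], sum(s) / len(s)) for f, s in scores.items()),
    key=lambda t: (-t[1], -t[2]),
)
for feat, v, avg in ranking:
    print(f"{feat}: votes={v}, avg_importance={avg:.2f}")
```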
Similar publications
Topic modeling is a probabilistic graphical model for discovering latent topics in text corpora by using multinomial distributions of topics over words. Topic labeling is used to assign meaningful labels for the discovered topics. In this paper, we present a new topic labeling method that uses automatic term recognition to discover and assign relev...
Citations
... On the other hand, textbooks with disorganized or fragmented themes may hinder students' ability to connect concepts and grasp the material effectively. The findings from semantic analysis techniques provide actionable insights for authors and publishers [9]. High semantic similarity and accurate named entity recognition suggest that textbooks are well-aligned with educational objectives and present relevant information effectively. ...
This study explores innovations in semantic analysis of textbooks using Natural Language Processing (NLP) to enhance the evaluation and development of educational materials. Semantic analysis focuses on understanding the meaning and context within texts, providing insights into how effectively textbooks convey concepts and support learning objectives. By employing advanced NLP techniques such as semantic similarity measurement, named entity recognition, and context-aware topic modeling, this research offers a detailed examination of textbook content. Semantic similarity measurement reveals the alignment of textbook content with key educational themes and concepts, while named entity recognition identifies and categorizes important entities such as historical figures, scientific terms, and key concepts. Context-aware topic modeling uncovers the underlying themes and structures of textbooks, providing a comprehensive view of how information is organized and presented. The results demonstrate how these semantic analysis techniques can identify gaps in content coverage, assess the coherence and relevance of instructional material, and improve the alignment of textbooks with curriculum standards. The study also highlights the potential of semantic analysis to enhance the development of more effective educational resources by providing actionable insights for authors and publishers. Ultimately, this research showcases the transformative potential of NLP in advancing the evaluation and creation of textbooks, contributing to improved educational outcomes through more precise and contextually relevant content analysis.
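The semantic similarity measurement this abstract describes can be approximated with a simple TF-IDF cosine comparison. The sketch below is a stand-in under that assumption: the example texts are invented, and the cited study's actual models and corpora are not specified here.

```python
# Minimal sketch: scoring how well a textbook passage aligns with a
# curriculum theme via TF-IDF cosine similarity. Texts are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

textbook_passage = "Photosynthesis converts light energy into chemical energy."
curriculum_theme = "Plants turn light into chemical energy."
off_topic = "The stock market closed higher on Friday."

vec = TfidfVectorizer().fit_transform(
    [textbook_passage, curriculum_theme, off_topic]
)
sims = cosine_similarity(vec[0], vec[1:])  # passage vs. the two references
print(sims)  # the aligned curriculum theme scores higher
```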
... Textbooks that convey encouragement and positivity are likely to contribute to better student engagement and a more constructive approach to learning. The evaluation of content alignment with educational standards using NLP tools has highlighted the effectiveness of automated content checks in identifying gaps and ensuring that textbooks meet curricular requirements [9]. ...
This study explores the use of Natural Language Processing (NLP) to uncover pedagogical patterns within educational textbooks, aiming to enhance the understanding of instructional content and its effectiveness. By applying advanced NLP techniques, including readability assessment, thematic analysis, sentiment analysis, and content alignment checks, the research provides a detailed examination of textbook content. Readability assessments reveal how well textbooks cater to different student levels by analyzing sentence structure, vocabulary, and overall complexity. Thematic analysis, utilizing topic modeling, uncovers the primary educational themes and patterns within textbooks, assessing their alignment with curricular objectives. Sentiment analysis evaluates the emotional tone of the text, offering insights into how the tone may influence student engagement and motivation. Content alignment checks compare textbook material against educational standards, identifying gaps and ensuring that the content supports key learning outcomes. The study also includes a comparative analysis of textbooks from various publishers and educational levels, highlighting variations in pedagogical approaches and content quality. The results provide valuable feedback for educators, authors, and publishers, aiming to improve the design and effectiveness of educational materials. This research demonstrates the potential of NLP to reveal pedagogical patterns and enhance the development of textbooks that are both engaging and aligned with educational goals, ultimately contributing to better learning experiences for students.
... This preference for reviews may be due to the motivation of the reviewer, who voluntarily writes these comments to share experiences and help others, either by criticizing or praising the product. In any case, the user normally tries to include as much information as possible, both in the evaluation part (e.g., "the laptop is incredibly light") and in the product description (e.g., "the room was not insulated") [62]. Therefore, users of online reviews will generally make use of this feature, depending on the amount of information they want to include in the narration of their experience and the evaluation of the product. ...
We present a data-driven approach to discover and extract patterns in textual genres, with the aim of identifying whether there is an interesting variation of linguistic features among different narrative genres depending on their respective communicative purposes. We pursue this goal by performing a multilevel discourse analysis according to (1) the type of feature studied (shallow, syntactic, semantic, and discourse-related); (2) the texts at a document level; and (3) the textual genres of news, reviews, and children's tales. To accomplish this, several corpora from the three textual genres were gathered from different sources to ensure a heterogeneous representation, paying attention to the presence and frequency of a series of features extracted with computational tools. This deep analysis aims at obtaining more detailed knowledge of the different linguistic phenomena that directly shape each of the genres included in the study, thereby showing the particularities that lead them to be considered individual genres while still placing them within the narrative typology. The findings suggest that this type of multilevel linguistic analysis could be of great help for areas of research within natural language processing such as computational narratology, as it allows a better understanding of the fundamental features that define each genre and its communicative purpose. Likewise, this approach could also boost the creation of more consistent automatic story generation tools in areas of language generation.
In this paper, we investigate whether there is a standardized writing composition for articles in Wikipedia and, if so, what it entails. By employing a Neural Gas approximation to the topology of our dataset, we generate a graph that represents various prevalent textual compositions adopted by the texts in our dataset. Subsequently, we examine significantly attractive regions within our graph by tracking the evolution of articles over time. Our observations reveal the coexistence of different stable compositions and the emergence and disappearance of certain unstable compositions over time.