December 2024
The assessment of text complexity is a significant applied problem, with potential uses in drafting legal documents, editing textbooks, and selecting books for extracurricular reading. Different task formulations give rise to different types of text complexity that are only weakly correlated with one another. Despite this, researchers typically overlook cross-domain complexity assessment. This study evaluates the applicability of various linguistic features to assessing the complexity of Russian-language texts, adding two new groups of features (rhythmic and cohesion) to those previously studied and introducing a new group of features for lexical complexity. We compare the features in both in-domain and cross-domain settings. Our findings indicate that syntactic features are the most significant in terms of Mutual Information. In the in-domain setting, lexical and morphological features are the most beneficial, whereas in the cross-domain setting, syntactic, morphological, and lexical features are the most effective. By contrast, rhythmic and cohesion features do not significantly affect the quality of the assessment algorithms.
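As a rough illustration of ranking features by Mutual Information, the sketch below scores discrete features against complexity labels. It is not the procedure used in the study: the feature names and toy data are hypothetical, and real linguistic features are usually continuous, so they would need binning or a dedicated estimator first.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(features, labels):
    """Return (feature name, MI score) pairs, highest score first."""
    scores = {name: mutual_information(values, labels)
              for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: four texts, binary complexity label (0 = easy, 1 = hard).
labels = [0, 0, 1, 1]
features = {
    "deep_syntax_tree": [0, 0, 1, 1],  # assumed binarized syntactic feature
    "regular_rhythm": [0, 1, 0, 1],    # assumed binarized rhythmic feature
}

for name, score in rank_features(features, labels):
    print(f"{name}: {score:.3f} bits")
```

In this toy case the first feature determines the label exactly and the second is independent of it, so the ranking places the first feature on top. With real data, scores fall between these two extremes.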