Inter-rater agreement of human assessors and technicality-based choice for side-by-side sessions.

Source publication
Preprint
Generative poetry systems require effective tools for data engineering and automatic evaluation, particularly to assess how well a poem adheres to versification rules, such as the correct alternation of stressed and unstressed syllables and the presence of rhymes. In this work, we introduce the Russian Poetry Scansion Tool library designed for stre...

Context in source publication

Context 1
... investigate this, we calculated the inter-rater agreement between the annotators' selections and a hypothetical selection based solely on technicality (i.e., choosing the poem with the higher technicality score in each pair). The results, shown in Table 2, indicate moderate agreement for Session 1, where the range of poem quality was wider, and weak agreement for Session 2, where the compared texts were of similar quality. This suggests that other factors beyond technicality influence the annotators' decisions, especially when evaluating poems of comparable quality. ...
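The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the excerpt does not name the agreement statistic, so Cohen's kappa is used here as one common choice for two raters, and the function names `technicality_choice` and `cohen_kappa`, the "A"/"B" labels, and the tie-breaking rule are all hypothetical rather than taken from the publication.

```python
from collections import Counter


def technicality_choice(score_a, score_b):
    """Hypothetical rater: pick the poem with the higher technicality score.

    Ties go to "A"; this tie-breaking rule is an assumption, not from the source.
    """
    return "A" if score_a >= score_b else "B"


def cohen_kappa(labels_1, labels_2):
    """Cohen's kappa for two raters labelling the same items.

    Assumed statistic: the source only says "inter-rater agreement".
    """
    if len(labels_1) != len(labels_2) or not labels_1:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_1)
    # Observed agreement: share of pairs where both raters chose the same poem.
    observed = sum(x == y for x, y in zip(labels_1, labels_2)) / n
    # Chance agreement: derived from each rater's marginal label frequencies.
    counts_1, counts_2 = Counter(labels_1), Counter(labels_2)
    expected = sum(
        counts_1[k] * counts_2[k] for k in set(counts_1) | set(counts_2)
    ) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up illustration data (not from the publication):
# annotator picks and (technicality of A, technicality of B) per pair.
human_picks = ["A", "B", "A", "A"]
pair_scores = [(0.9, 0.5), (0.4, 0.8), (0.3, 0.7), (0.6, 0.2)]
technical_picks = [technicality_choice(a, b) for a, b in pair_scores]
kappa = cohen_kappa(human_picks, technical_picks)
```

A kappa near 1 would mean the annotators almost always preferred the more technical poem, while a value near 0 would mean their choices match the technicality-based choice no better than chance. With more than one human annotator per pair, the same function could be applied per annotator, or a multi-rater statistic could be substituted.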