Maria Shvedova's scientific contributions

Citations

... The program addresses multiple issues, ranging from low-level noise reduction in raw OCR-ed text to rather fine-grained, language-sensitive rules such as excising Russian text or correcting the use of the letter і. A separate paper by the authors [14] is dedicated to more language-specific tasks, such as resolving the problem of multiple orthographies, which has been characteristic of Ukrainian throughout its history. This challenge is also associated with pre-processing Ukrainian texts, as their linguistically relevant characteristics need to be kept intact for automatic linguistic analysis. ...
... A significant problem for the construction of a reference corpus of the Ukrainian language with a historical part covering the Early Modern literary language is the inaccurate reflection of linguistic features in the "modernized" editions of the 20th and 21st centuries. Western Ukrainian authors are redacted the most, although, as can be seen from the examples quoted in Table 1 from [15], texts from other regions are also affected. The western Ukrainian version of the literary language is found in texts of the 19th and the first half of the 20th century. ...