The paper describes three corpora of different varieties of Balkan Slavic (BS) that are currently being developed with the goal of providing data for the analysis of diatopic and diachronic variation in non-standard BS. The corpora include spoken materials from Torlak and Macedonian dialects, as well as manuscripts of pre-standardized Bulgarian. A...


... However, following the recent global trend to represent languages with corpora, several projects have attempted to remedy this lack of publicly available resources for the written register (Erjavec, 2012; Ljubešić and Klubička, 2014; Miličević and Ljubešić, 2016). Resources for spoken standard Serbian in interaction are still lacking, although there is some work on Serbian dialectal corpora (Vuković et al., 2019) and repositories for speech recognition (Suzić et al., 2014). Meanwhile, tools and resources for spoken registers that are being created for related South Slavic languages such as Croatian (Kuvač Kraljević and Hržica, 2016: hrAL) and Slovenian (Verdonik et al., 2013: GOS) are gaining in popularity. ...