
Automatic language profiling of a dialect speaker: the case of the Timok variety spoken in the village of Berčinovac (Eastern Serbia). In: Acta Linguistica Petropolitana.


In a previously published paper [Konior et al. 2019], which thematically led up to the present article, we explored the possibility of developing a quantitative tool for assessing the intrasystemic dialectal coherence and the degree of dialectal authenticity (preservation) for a particular variety of Slavic (and more broadly Balkan) dialectal speech. In order to do so, we analyzed and manually counted all cases of presence or absence of specific phonemes, direct and indirect object reduplication, ways of expressing peripheral case meanings, presence of a postpositive article, and some other language features. The data used for that purpose was extracted from the “Linguistic Atlas of Eastern Serbia and Western Bulgaria” [SAOSWB]; an idiolect of a native speaker of the Timok dialect spoken in the village of Berčinovac (near the town of Knjaževac in the Zaječar district, Eastern Serbia) was chosen for analysis. Subsequently, the following question arose: how can the use of modern technologies for automatic text processing increase the efficiency of dialectologists’ work, and what technical obstacles must be overcome in this regard? In the article, we present a method of (semi-)automatic analysis of phonetic and morphosyntactic features in a dialect text with the use of morphological annotation (the tagger model is based on the ReLDI tagger [Ljubešić et al. 2016] and user Python scripts). An algorithm searching for some important dialect features is described and exemplified. Trying to imitate and automate historical and structural linguistic analysis, we open a discussion about the advantages and disadvantages of computer analysis of dialect data as compared with manual analysis. In the future, the automatic method is expected to be helpful in managing larger amounts of dialect data.
Acta Linguistica Petropolitana. 2020. Vol. 16.2. P. 160–180
DOI 10.30842/alp2306573716207
 
   
* Funding: 18-512-76002; EraNet Rus Plus grant/Swiss National Science Foundation
IZRPZ0_177557/1 (TraCeBa project, https://traceba.net/); SNF100015_176378/1
(‘Ill-bred sons’, family and friends: tracing the multiple affiliations of Balkan Slavic).
      161
   (  «»)  
    .
 :   ,  ,
 ,  ,  ,  -
,   ,  ,  .
Automatic language profiling of a dialect speaker:
the case of the Timok variety spoken
in the village of Berčinovac (Eastern Serbia)
A. L. Makarova
University of Zurich (Switzerland); anastasia.makarova@uzh.ch
D. V. Konior
Institute for Linguistic Studies, Russian Academy of Sciences, St. Petersburg;
dsuetina@yandex.ru
T. Vuković
University of Zurich (Switzerland); teodora.vukovic2@uzh.ch
A. N. Sobolev
Institute for Linguistic Studies, Russian Academy of Sciences, St. Petersburg;
sobolev@staff.uni-marburg.de
O. Winistörfer
University of Zurich (Switzerland); olivier-andreas.winistoerfer@uzh.ch
Abstract. In a previously published paper [Konior et al. 2019], which thematically
led up to the present article, we explored the possibility of developing a quantitative tool
for assessing the intrasystemic dialectal coherence and the degree of dialectal authenticity
(preservation) for a particular variety of Slavic (and more broadly Balkan) dialectal
speech. In order to do so, we analysed and manually counted all cases of presence or
absence of specific phonemes, direct and indirect object reduplication, ways of expressing
peripheral case meanings, presence of a postpositive article, and some other language
features. The data used for that purpose was extracted from the “Linguistic Atlas of Eastern
Serbia and Western Bulgaria” [SAOSWB]; an idiolect of a native speaker of the Timok
dialect spoken in the village of Berčinovac (near the town of Knjaževac in the Zaječar
district, Eastern Serbia) was chosen for analysis. Subsequently, the following question
arose: how can the use of modern technologies for automatic text processing increase
the efficiency of dialectologists’ work, and what technical obstacles must be overcome
in this regard? In the article, we present a method of (semi-)automatic analysis
of phonetic and morphosyntactic features in a dialect text with the use of morphological
annotation (the tagger model is based on the ReLDI tagger [Ljubešić et al. 2016]
and user Python scripts). An algorithm searching for some important dialect features is
described and exemplified. Trying to imitate and automate historical and structural
linguistic analysis, we open a discussion about the advantages and disadvantages of
computer analysis of dialect data as compared with manual analysis. In the future, the
automatic method is expected to be helpful in managing larger amounts of dialect data.
Keywords: statistical methods in linguistics, machine text analysis, linguistic profiling,
dialect speakers, Balkan Slavic languages, Serbian dialects, Timok dialect, idiolect
of dialect speaker, village of Berčinovac, Eastern Serbia.
1. 
      -
   (
    ,  «-
»    « -
») 1   (  
 ).
   [ . 2019], 
 ,   ,    -
     
 «  » ( «-
»)       ( )
 .   ,    -
    [ . 2019: 30–31],   -
 :     -
      
,      -
     ?   
     .  
    :   -
   ,  
1        
  . [,  2020].
      163
  ,       -
.  2      -
 ,     ;  3
        ,
     -
    .  4, -
   ,  -
     
    . .  -
 ,    .
  «  :  -
 » (18–20  2018 .,   ) 
«     -
 »      
     -
     
.        
 (1906 . .),    2 . 
 .      [-
 . 2019].  . ,  1990- .,  -
   «   -
   » [ 1998] ( SAOSWB
 ).        -
   ,  -
    , , ,  
        -
   (4453 ).    -
      
  ( 20 . )  ,  -
      -
 .     :
2       
- ,      -
.      - -
   ,   -
 (    
    ) 
  .
1)    ; 2)   -
,  « »; 3)   -
      
     ; 4)  -
  .
 ,       -
 ()     
   3 (   -
   [Vuković et al. 2019]   ReLDI-
[Ljubešić et al. 2016];      )
 Python-.  SAOSWB  -
   « » 4,  -
   , , , 2015–2017 .
     
 »)   5:
1)  . *tj   : *vtje > vee’,
*svtja > svea’,   e;
2)  . *dj    : *medja >
mea’, *tjudje > uo ’, *vidj- > vi- ‘-’;
3)     
       :
*sn > sn ’, *vš > vška’, *dsky > dska’;
 *-v (takv ()’); *dn > dn’;
4)  :
.   (IO)   (POSS): i na
tuj ovcu ( .. ..) se toj dade prvo i venc
3      -
       
,  .
4  ,      -
       
   : http://balksrv2012.sanu.ac.rs/webdict/timok/index.
5         
    -
  , a     -
   [ . 2019].
      165
    (. «») ’; tg u mo-
je znae i na mojega tatu ( .. ..) odnela
vodenicu,  , []  
 ’;
.      -
: ot sviu ( ..) ostala samo glava -
   ’, orala sam ss pluk ( ..)
 ’, on bil u vojsku ( ..)  
’;
.   ()   ():
toj na nas ( 1.) priala baba jedna  -
   ’;
.      -
: pokraj u ( . 3.)  ’, ss ega
( .3.) ‘ ’, da peemo leb u u ( . 3.) ‘
  ’;
5)     : dojde do
nas voda-ta (-..) do ovdeka    ’;
6)   .  -
  po   :
pomlad,  , ’;
7)  :
.  : tebe te stra (2. 2. ..)
 ’;
.  : tep ti je dobro (2. 2. ..3
) ‘ ’;
8)     
  : sad u # priam ( .1 #
..1) ‘ ’, . sad u da priam (-
 .1  ..1) [ . 2019: 21–22].
       -
 ,    -
  .    ,  -
     
    ,   -
   .
2.  .  
2.1.   
     
   SAOSWB  
:     
,    -
 ,       -
   .  
  :   
      
   (, dan / dan / dan / dn ’).
       
.txt    OCR Transkribus;   -
 .      -
,   ()    
    (,  -
)  ,     
    (.    1).
2.2. 
     .  -
  ,   
    (
)    ().  -
     . , 
« »     ,
         .
      
  ReLDI,     -
  6.    , 
    MULTEXT-East V5, -
        
6 https://github.com/bravethea/Torlak-ReLDI-Tagger-2019
      167
   : , 
žena,   Ncfsny: Noun, common, femi-
nine, singular, nominative, animate (yes).   
    ,  
   -v, -t  -n (     -
). ,  ženata ‘[]   
Ncfsny-t,    (-t)   -
  -ta.  1    
 .
 1.      
Table 1. Examples of the transcription and additional annotation



  
pa pa Cc pa
dójde dOjde Vma3s doi
na na Sa ma
nás nAs Pp1-pa mi
vodáta vodAta Npmsa-t voda
doovdéka doovdEka Rgp dovde
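The tag format shown for žena (Ncfsny: Noun, common, feminine, singular, nominative, animate) and the -v, -t, -n extension for the postposed article can be illustrated with a short Python sketch. This is not the authors' script: the names `Token`, `split_article`, `decode_noun` and `parse_row` are hypothetical, the value tables cover only an assumed subset of the noun categories, and the suffix check is a heuristic that would need verifying against the full tagset.

```python
from typing import NamedTuple, Optional, Tuple

# Noun positions as spelled out in the text for "Ncfsny"; the value tables
# are a partial, assumed subset of the MULTEXT-East noun categories.
NOUN_FIELDS = [
    ("type", {"c": "common", "p": "proper"}),
    ("gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("number", {"s": "singular", "p": "plural"}),
    ("case", {"n": "nominative", "g": "genitive", "d": "dative",
              "a": "accusative", "v": "vocative", "l": "locative",
              "i": "instrumental"}),
    ("animate", {"y": "yes", "n": "no"}),
]
ARTICLE_SUFFIXES = {"t", "v", "n"}  # extension for the postposed article


class Token(NamedTuple):
    original: str    # transcription with accent marks, e.g. "vodáta"
    simplified: str  # simplified transcription, e.g. "vodAta"
    tag: str         # morphosyntactic tag, e.g. "Npmsa-t"
    lemma: str


def split_article(tag: str) -> Tuple[str, Optional[str]]:
    """Split 'Ncfsny-t' into ('Ncfsny', 't'); other tags come back unchanged."""
    base, sep, suffix = tag.rpartition("-")
    if sep and base and suffix in ARTICLE_SUFFIXES:
        return base, suffix
    return tag, None


def decode_noun(tag: str) -> dict:
    """Spell out the positions of a noun tag, keeping unknown codes as-is."""
    base, article = split_article(tag)
    if not base.startswith("N"):
        raise ValueError(f"not a noun tag: {tag}")
    decoded = {"pos": "Noun", "article": article}
    for (name, values), code in zip(NOUN_FIELDS, base[1:]):
        decoded[name] = values.get(code, code)
    return decoded


def parse_row(line: str) -> Token:
    """Parse one whitespace-separated row in the four-column layout of Table 1."""
    return Token(*line.split())


row = parse_row("vodáta vodAta Npmsa-t voda")
print(row.lemma, decode_noun(row.tag))
print(decode_noun("Ncfsny-t"))
```

Note that a tag such as Pp1-pa also contains a hyphen, which is why the sketch only treats a final -t, -v or -n as an article marker.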
3.   .
 
       -
,       
  () , -
  .  « »,  
   ,    -
  .   -
     ,   -
  .    
: ,     
  ,     /   -
 [Birkner 2015; Dash 2018].    
     .
  1 ( . *tj), 2 ( . *dj),
3 (    
     ).   -
  ,     
  .    -
   ,     -
      
.        
 -     
,  ,    , -
    ,    
       
-  . « » -
       
 MULTEXT-east    [Erjavec et al. 2003] 7. -
,        -
  [Dash, Hussain 2013],    
   ,    -
  .
  ,   , -
        ,
  c   , , , , 
      
  ,    
  *tj *dj   ,  
 ,     .   -
    *dj. ,   -
    : ( -
) (  ). ,  2,
   (  -
)  .    -
       -
  .
       -
 ,  /,      -
   .     *tj
  (   )
7 http://nl.ijs.si/ME/V6/msd/html/msd-hbs.html
      169
    .  / (-
  *tj)     , -
,     
    .
 ,        , 
     8:     
 *tj, *dj     -
,        -
    .  SAOSWB
       
(    )   *tti    *tj
 *sn *dn    «  -
» (   *došl
*jedn 9).
 3     
   «  -
»:      «» (*dn,
 dan)     
  «» ( ii)    -l . . . .,
   .
8        (vs -
)    [Vuković et al. 2020].
9  ,     , 
 , ,       -
  ,      -
    , , ,     -
  .
 2.  *dj:   
Table 2. Reflexes of *dj: dialect vs standard realization




*dj > dž *dj >
lédža 
prédžu 
ena 
govée 
80 % 20 %
 3.  :
  
Table 3. Reflexes of the reduced vowels:
dialect vs standard realization
   








('vzdn', 'RGP', 'vzdan')
('dna', 'Ncmsa', 'dan')
('dOšo', 'Vmp-sm', 'doći')
('dOšo', 'Vmp-sm', 'doći')
('dOšl', 'Vmp-sm', 'doći')
('dAn', 'Ncmsn', 'dan')
('Išal', 'Vmp-sm', 'ići')
88 % 12 %
     
 12 %,  — 88 %.     -
      (37 % 63 %
), ,   ,  
      -
 .
    
 (   )  
 .  4   -
   ,   -
   ( ).
  4.   -
  10  .
  4.   -
     . -
   ,    -
  (   ), 
.
10 «  ,   ,  -
     na, ( -
      100 %  -
)» [ . 2019: 22].
      171
 4.  
Table 4. Analytic case-marking
   
4.
tg u mojE znAnje i na mojEga
tAtu ( . .) odnEla
vodenIcu
,  , []
   ’.
  
  
, 
.
100 % 0 %
4.
boluvAla sam nEšto u glAvu
( ..)
 -  ’.
a mI smo si u selO ( ..)
   ’.
ne znAm kOje gOdine (...
 ..) bIlo
 ,    ’.
99 % 1 %
4.
pa dOjde na nAs ( 1.) vodAta
doovdEka
   
tOj na nAs ( 1.) priAla bAba
jednA
   
 ’.
p nIšta, On dadE mEne 11 parU, jA
njEmu (.3.) nEšto
,    ,
 -
80 % 20 %
4.
s njU / ss njU (c . 3.) ‘ 
pOkraj njU ( . 3.) ‘

s njEga / ss njEga (c .3.)

s nAs (c 1.) ‘ 
ss njI (c 3.) ‘ 
  
  ,
 .
100 % 0 %
11  mEne        (-
  *men > mene   
   *men > meni),      
: mEne — Pp1-sa 'ja'.
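The pronominal rows of Table 4 contrast the analytic pattern (na followed by an accusative pronoun, as in na nAs) with a synthetic dative form (njEmu). A minimal sketch of how such a count could be taken over (token, tag, lemma) triples is given below. It is not the authors' script: the function names and the tags given to jA, njEmu and nEšto are hypothetical, and it assumes that the case letter is the sixth character of a personal-pronoun tag, as in Pp1-pa and Pp1-sa. The heuristic also does not separate indirect objects from spatial uses of na, so its hits would still need manual checking.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (token, tag, lemma)


def pronoun_case(tag: str) -> str:
    """Case letter of a personal-pronoun tag such as 'Pp1-pa' ('' otherwise)."""
    return tag[5] if tag.startswith("Pp") and len(tag) >= 6 else ""


def count_io_marking(sentence: List[Triple]) -> Tuple[int, int]:
    """Return (analytic, synthetic) counts for one sentence."""
    analytic = synthetic = 0
    for i, (_form, tag, _lemma) in enumerate(sentence):
        case = pronoun_case(tag)
        prev = sentence[i - 1] if i > 0 else None
        # Analytic marking: preposition "na" directly before the pronoun.
        after_na = (prev is not None and prev[0].lower() == "na"
                    and prev[1].startswith("S"))
        if case == "a" and after_na:
            analytic += 1
        elif case == "d":
            synthetic += 1
    return analytic, synthetic


sentences = [
    # Tags as in Table 1; lemmas are filled in for illustration only.
    [("pa", "Cc", "pa"), ("dOjde", "Vma3s", "doći"), ("na", "Sa", "na"),
     ("nAs", "Pp1-pa", "mi"), ("vodAta", "Npmsa-t", "voda"),
     ("doovdEka", "Rgp", "dovde")],
    # Hypothetical tags for the fragment "jA njEmu nEšto".
    [("jA", "Pp1-sn", "ja"), ("njEmu", "Pp3msd", "on"),
     ("nEšto", "Pi3n-a", "nešto")],
]

analytic_total = sum(count_io_marking(s)[0] for s in sentences)
synthetic_total = sum(count_io_marking(s)[1] for s in sentences)
share = 100 * analytic_total / (analytic_total + synthetic_total)
print(analytic_total, synthetic_total, f"{share:.0f} % analytic")
```

On a full corpus the two totals would feed the dialect vs standard percentages of the kind reported in the table; the two sample sentences here only demonstrate the mechanics.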
  4.   -
  ()   () 12.
  4.   -
    . 
  ,     -
 (   ),  -
.
  5.   
 .  ,   , —
 ,     -
.      
      ,  -
    .   , 
  115    ().   -
      -
», . coreference annotation [Deemter, Kibble 1999]), -
 ,        ,
  .
  6.   ( -
  po   ).
     : 'pOmladu'.
 ()   .
  7.   
 (    ,  -
      , ,
 ).
      -
      -
 .     -
   [Escher 2021] ,   
      -
  . ,    
 « »    / -
 . ,  ,  (  ) -
 .  ,    -
    
12      ()  -
.
      173
   /  ,   
     (). 
        -
      ,  
 .       / 
,     
    ,   
   .   -
   ,   -
  jA ga vIdim tUj mOjega Iu
(1. .. ..1  ... ..)
 (. «»)     
.     ,   -
    ,   .
  7.   -
 .
     .
  8.     -
     ( 5).
 5.  :   
Table 5. Conjunctive particle: dialect vs. standard realization
   
Ekaj sAd u prIam ( .1
..1)
,   ’.
on e da poglEda (.3. .3
 ..3) ‘ 
37 % 63 %
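Table 5 contrasts future constructions in which the clitic is followed directly by a present-tense verb with those that keep the particle da. The sketch below shows one way such pairs could be counted in tagged sentences. It is an illustration under several assumptions, not the authors' algorithm: the clitic forms are restored here as ću and će, the tags Var1s, Var3s, Vmr1s, Vmr3s and Cs and all lemmas are assumed, and the function names are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (token, tag, lemma)

# Assumed clitic forms of the future auxiliary.
FUTURE_CLITICS = {"ću", "ćeš", "će", "ćemo", "ćete"}


def is_present_verb(tag: str) -> bool:
    """True for main-verb tags whose third character is 'r' (assumed: present)."""
    return tag.startswith("Vm") and len(tag) > 2 and tag[2] == "r"


def count_future(sentence: List[Triple]) -> Tuple[int, int]:
    """Return (without_da, with_da) for clitic + present-tense verb."""
    without_da = with_da = 0
    for i, (form, _tag, _lemma) in enumerate(sentence):
        if form.lower() not in FUTURE_CLITICS:
            continue
        following = sentence[i + 1:i + 3]  # at most the next two tokens
        if following and is_present_verb(following[0][1]):
            without_da += 1
        elif (len(following) == 2 and following[0][0].lower() == "da"
              and is_present_verb(following[1][1])):
            with_da += 1
    return without_da, with_da


bare = [("čEkaj", "Vmm2s", "čekati"), ("sAd", "Rgp", "sad"),
        ("ću", "Var1s", "hteti"), ("prIčam", "Vmr1s", "pričati")]
with_particle = [("on", "Pp3msn", "on"), ("će", "Var3s", "hteti"),
                 ("da", "Cs", "da"), ("poglEda", "Vmr3s", "pogledati")]

print(count_future(bare))           # (1, 0)
print(count_future(with_particle))  # (0, 1)
```

Counting per sentence keeps a clitic at the end of one sentence from being paired with a verb at the start of the next.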
4.   
       -
,     (
    )
 ,     -
, . .    . ,   -
       -
      
  ,   « -
   »   
     ,  -
    .
 1.   
   
Figure 1. Dialect profile of the informant
based on the results of the automatic analysis
     
 « ».   
        
   . ,   
 3,    ,  -
     - -
 .      -
    .
      175
 2.       
Figure 2. Dialect profile of the informant based on the results of the manual analysis
5. 
,      /
       -
,    ,   
.  ,     -
 .      
:     , -
   ( -
 /  ,  -
 ).    :   , 
     
        -
      . -
,        , -
        .
  ,  ,  ,  -
    . -
      -
 ( )    
 .
 « » [Vukovi et al. 2019],  -
    SAOSWB,   -
    »),
    ,   -
   .  
   ,    -
    (,  
  ,     ).
 -   -
  .  
    ,   
 .    -
        -
  ,    
 13.       -
      
 .      -
    , -
      ,
      -
  .   --
 ()  [Goedertier et al. 2000], -
        ,
      
  .    
      
13     , ,  
https://www.clarin.eu/resource-families/spoken-corpora.
      177
 .   ,   -
   ,     -
  « ».
  
1, 2, 3 —   , , , 
 ,   , ,  ,
 , ,  , 
,  ,  , SAOSWB — -
     , -
 ,    ,  .

 . 2019 — . . , . . , . . . 
     (  -
   ) //   
. . 2019. 58. . 17–33. DOI: 10.17223/19986645/58/2.
,  2020 . , . . .  
    
(       
) //    . .
2020. 66. C. 158–176. DOI: 10.17223/19986645/66/9.
 1998 — . . .    
  // . .  (. .).   
. . 5. .:   , 1998. . 106–167.
Birkner 2015 — V. Birkner. The advantages and disadvantages of employing corpus ev-
idence in sociolinguistic studies // The Teacher Magazine. 2015. Vol. 2. P. 11–17.
Dash 2012 — N. S. Dash. Etymological Annotation: a New Concept of Corpus Anno-
tation // Proceedings of the 34th All India Conference of Linguists (34-AICL). Shil-
long, India, 2012. P. 100–104.
Dash, Arulmozi 2018 N. S. Dash, S. Arulmozi. Limitations of language corpora //
N. Dash, S. Arulmozi. History, features, and typology of language corpora. Singa-
pore: Springer Singapore, 2018. P. 259–272.
Dash, Hussain 2013 — N. S. Dash, M. M. Hussain. Designing a Generic Scheme for Ety-
mological Annotation: a New Type of Language Corpora Annotation // P. Bhattacha-
rayya, K.-S. Choi (eds.). Proceedings of the 11th Workshop on Asian Language Re-
sources. Nagoya: Asian Federation of Natural Language Processing, 2013. P. 64–71.
Deemter, Kibble 1999 K. van Deemter, R. Kibble. What is coreference, and what
should coreference annotation be? // A. Bagga, B. Baldwin, S. Shelton (eds.).
178 . . , . . , .  . ALP 16.2
Proceedings of the Workshop on Coreference and Its Applications. Stroudsburg,
PA: Association for Computational Linguistics, 1999. P. 90–96.
Erjavec et al. 2003 — T. Erjavec, C. Krstev, V. Petkevic, K. Simov, M. Tadic, D. Vitas.
The MULTEXT-east morphosyntactic speci cations for Slavic languages // T. Er-
javec, D. Vitas (eds.). Proceedings of the Workshop on Morphological Processing
of Slavic Languages, EACL 2003. Stroudsburg, PA: Association for Computation-
al Linguistics, 2003. P. 25–32.
Escher 2021 — A. L. Escher. Double argument marking in Timok dialect texts (in Bal-
kan Slavic context). Zeitschrift für Slawistik. Forthcoming.
Goedertier et al. 2000 — W. Goedertier, S. Goddijn, J.-P. Martens. Orthographic tran-
scription of the spoken Dutch corpus // M. Gavrilidou, G. Carayannis, S. Markan-
tonatou, S. Piperidis, G. Stainhouer (eds.). Proceedings of the Second International
Conference on Language Resources and Evaluation (LREC 2000), Athens, Greece.
Athens: National Technical University of Athens Press, 2000. P. 909–914.
Ljubeši et al. 2016 — N. Ljubeši, F. Klubika, Ž. Agi, I.-P. Jazbec. New In ectional
Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croa-
tian and Serbian // N. Calzolari, Kh. Choukri, Th. Declerck, S. Goggi, M. Grobelnik,
B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis (eds.). Proceed-
ings of the Tenth International Conference on Language Resources and Evaluation
(LREC 2016). Paris: European Language Resources Association, 2016. P. 4264–4270.
Vukovi et al. 2019 — T. Vukovi, N. Muheim, O. Winistörfer, I. Simko, A. Makarova,
S. Bradjan. Corpora and Processing Tools for Non-Standard Contemporary and Dia-
chronic Balkan Slavic // I. Temnikova, I. Nikolova, N. Konstantinova (eds.). Pro-
ceedings of the Student Research Workshop associated with The 12th International
Conference on Recent Advances in Natural Language Processing (RANLP 2019).
Shoumen: Incoma, 2019. P. 62–68.
Vukovi et al. 2020 T. Vukovi, B. Sonnenhauser, A. Escher. Degrees of non-stan-
dardness. Feature-based analysis of variation in a Torlak dialect corpus. Manuscript.

SAOSWB — A. N. Sobolev. Sprachatlas Ostserbiens und Westbulgariens. Bd. I. Pro-
blemstellung, Materialen und Kommentare, Kartenanalyse. Bd. II. Sprachkarten.
Bd. III. Texte. Marburg; Lahn: Biblion Verlag, 1998.
References
Birkner 2015 — V. Birkner. The advantages and disadvantages of employing corpus
evidence in sociolinguistic studies. The Teacher Magazine. 2015. Vol. 2. P. 11–17.
Dash 2012 — N. S. Dash. Etymological Annotation: a New Concept of Corpus Annotation. Proceedings of the 34th All India Conference of Linguists (34-AICL). Shillong, India, 2012. P. 100–104.
      179
Dash, Arulmozi 2018 — N. S. Dash, S. Arulmozi. Limitations of language corpora. N. Dash, S. Arulmozi. History, features, and typology of language corpora. Singapore: Springer Singapore, 2018. P. 259–272.
Dash, Hussain 2013 — N. S. Dash, M. M. Hussain. Designing a Generic Scheme for Etymological Annotation: a New Type of Language Corpora Annotation. P. Bhattacharyya, K.-S. Choi (eds.). Proceedings of the 11th Workshop on Asian Language Resources. Nagoya: Asian Federation of Natural Language Processing, 2013. P. 64–71.
Deemter, Kibble 1999 — K. van Deemter, R. Kibble. What is coreference, and what should coreference annotation be? A. Bagga, B. Baldwin, S. Shelton (eds.). Proceedings of the Workshop on Coreference and Its Applications. Stroudsburg, PA: Association for Computational Linguistics, 1999. P. 90–96.
Erjavec et al. 2003 — T. Erjavec, C. Krstev, V. Petkevič, K. Simov, M. Tadić, D. Vitas. The MULTEXT-East morphosyntactic specifications for Slavic languages. T. Erjavec, D. Vitas (eds.). Proceedings of the Workshop on Morphological Processing of Slavic Languages, EACL 2003. Stroudsburg, PA: Association for Computational Linguistics, 2003. P. 25–32.
Escher 2021 — A. L. Escher. Double argument marking in Timok dialect texts (in Balkan Slavic context). Zeitschrift für Slawistik. Forthcoming.
Goedertier et al. 2000 — W. Goedertier, S. Goddijn, J.-P. Martens. Orthographic transcription of the spoken Dutch corpus. M. Gavrilidou, G. Carayannis, S. Markantonatou, S. Piperidis, G. Stainhouer (eds.). Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000), Athens, Greece. Athens: National Technical University of Athens Press, 2000. P. 909–914.
Konior et al. 2019 — D. V. Konior, A. L. Makarova, A. N. Sobolev. Statisticheskiy metod yazykovogo profilirovaniya nositelya dialekta (na materiale vostochnoserbskogo idioma sela Berchinovats) [Quantitative method of language profiling of a dialect speaker (based on the material of the East Serbian idiom of the village of Berčinovac)]. Tomsk State University Journal of Philology. 2019. No. 58. P. 17–33. DOI: 10.17223/19986645/58/2.
Ljubešić et al. 2016 — N. Ljubešić, F. Klubička, Ž. Agić, I.-P. Jazbec. New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian. N. Calzolari, Kh. Choukri, Th. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis (eds.). Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Paris: European Language Resources Association, 2016. P. 4264–4270.
Sikimić, Sobolev 2020 — B. Sikimić, A. N. Sobolev. Processy divergentcii v razdelennom gosudarstvennoy granitcey zapadnoyuzhnoslavyanskom dialekte (na materiale sovremennoy dialektnoy rechi Vostochnoy Serbii i Zapadnoy Bolgarii) [Divergence Processes in the West South Slavic Dialect Divided by the State Border (Based on the Modern Dialect Speech of Eastern Serbia and Western Bulgaria)]. Tomsk State University Journal of Philology. 2020. No. 66. P. 158–176. DOI: 10.17223/19986645/66/9.
Sobolev 1998 — A. N. Sobolev. O dialektologicheskom atlase Vostochnoy Serbii i Zapadnoy Bolgarii [On the dialectological atlas of Eastern Serbia and Western Bulgaria]. G. P. Klepikova (ed.). Issledovaniya po slavyanskoy dialektologii [Studies in Slavic Dialectology]. Iss. 5. Moscow: Institute of Slavic Studies RAS, 1998. P. 106–167.
Vuković et al. 2019 — T. Vuković, N. Muheim, O. Winistörfer, I. Simko, A. Makarova, S. Bradjan. Corpora and Processing Tools for Non-Standard Contemporary and Diachronic Balkan Slavic. I. Temnikova, I. Nikolova, N. Konstantinova (eds.). Proceedings of the Student Research Workshop associated with The 12th International Conference on Recent Advances in Natural Language Processing (RANLP 2019). Shoumen: Incoma, 2019. P. 62–68.
Vuković et al. 2020 — T. Vuković, B. Sonnenhauser, A. Escher. Degrees of non-standardness. Feature-based analysis of variation in a Torlak dialect corpus. Manuscript.
Sources
SAOSWB — A. N. Sobolev. Sprachatlas Ostserbiens und Westbulgariens. Bd. I. Problemstellung, Materialen und Kommentare, Kartenanalyse. Bd. II. Sprachkarten. Bd. III. Texte. Marburg; Lahn: Biblion Verlag, 1998.