All content in this area was uploaded by Adrian Jan Zasina on Jul 13, 2020.
Infrastructure of the Polish Learner Corpus PoLKo
Adrian Jan Zasina & Elżbieta Kaczmarska
Charles University | University of Warsaw
adrian.zasina@ff.cuni.cz | e.h.kaczmarska@uw.edu.pl
We would like to express our gratitude to Alexandr Rosen for his extraordinary technical support and to Urszula Sajkowska from the Linguae Mundi Foundation.
Motivation
•Learner corpora have become a popular source for analysing L2 learners’ language (Gilquin, Granger, & Paquot, 2007: 322–323)
•Relatively few learner corpora exist for languages other than English
(https://uclouvain.be/en/research-institutes/ilc/cecl/learner-corpora-around-the-world.html)
•Lack of a learner corpus of Polish (Zasina, 2019)
•Need for a modern source of learning materials
•Polish is taught all over the world, on every continent
Polish language
•About 50 million speakers (Achtelik et al., 2018)
•One of the 25 most commonly spoken languages in the world (Council for the Polish Language, 2007)
•Ranked 15th in the Power Language Index (Chan, 2016)
[Figure (source: Ministerstwo Spraw Zagranicznych, 2014)]
Project goal
•The primary goal of the project is to collect texts written by learners of Polish as a foreign language at various levels of proficiency
•The collected material will serve as a basis for:
  •analysing L2 learners’ language
  •identifying the most common language errors
  •creating classroom materials
  •improving modern teaching methods
Corpus PoLKo
•The corpus is built in the TeiTok environment (Janssen, 2016)
•TeiTok is used for transcribing texts and managing all collected learner writings
•TeiTok also serves as the search engine
•Metadata are divided into two groups:
  •information on the respondent (e.g. age, sex, L1)
  •information on the text (e.g. title, word count, topic)
•Morphosyntactic annotation by the MorphoDiTa tagger with the Polish language model
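TeiTok works with TEI-based XML files in which tokens are marked up as elements that can carry annotation attributes, such as the lemma and tag added by a tagger like MorphoDiTa. The minimal sketch below reads such a file with Python's standard library. It assumes a simplified layout (`<tok>` elements with `lemma` and `pos` attributes, no TEI namespace); the sample sentence, the NKJP-style class labels and the helper names `read_tokens` and `pos_frequencies` are hypothetical and not taken from PoLKo.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, simplified TeiTok-style document (assumption): each token is a
# <tok> element whose lemma/pos attributes stand in for the tagger output.
SAMPLE = """<TEI>
  <text>
    <s id="s1">
      <tok id="w1" lemma="mieszkać" pos="fin">Mieszkam</tok>
      <tok id="w2" lemma="w" pos="prep">w</tok>
      <tok id="w3" lemma="Warszawa" pos="subst">Warszawie</tok>
      <tok id="w4" lemma="." pos="interp">.</tok>
    </s>
  </text>
</TEI>"""


def read_tokens(xml_string):
    """Return (form, lemma, pos) triples for every <tok> in document order."""
    root = ET.fromstring(xml_string)
    return [
        (tok.text or "", tok.get("lemma", ""), tok.get("pos", ""))
        for tok in root.iter("tok")
    ]


def pos_frequencies(tokens):
    """Count how often each part-of-speech label occurs."""
    return Counter(pos for _, _, pos in tokens)


if __name__ == "__main__":
    tokens = read_tokens(SAMPLE)
    for form, lemma, pos in tokens:
        print(f"{form}\t{lemma}\t{pos}")
    print(pos_frequencies(tokens).most_common(2))
```

Keeping the tagger output as attributes on the token element, rather than in a separate file, lets the transcription, the annotation and any later corrections stay aligned token by token.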
Error taxonomy
Following the error taxonomy used by the State Commission for the Certification of Proficiency in Polish as a Foreign Language (Markowski, 2008), we distinguish five error levels:
•grammatical
•lexical
•stylistic
•spelling
•punctuation
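One way to use the five levels for identifying the most common language errors is to attach a level to every annotated error and count them. The sketch below assumes a hypothetical record type (`ErrorAnnotation` with a learner form and a corrected form); it illustrates the counting idea only and is not the PoLKo annotation scheme. The sample errors are invented.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class ErrorLevel(Enum):
    """The five error levels listed in the taxonomy."""
    GRAMMATICAL = "grammatical"
    LEXICAL = "lexical"
    STYLISTIC = "stylistic"
    SPELLING = "spelling"
    PUNCTUATION = "punctuation"


@dataclass(frozen=True)
class ErrorAnnotation:
    """Hypothetical record for one annotated learner error."""
    text_id: str
    learner_form: str
    corrected_form: str
    level: ErrorLevel


def most_common_levels(annotations, n=3):
    """Rank error levels by frequency; ties keep first-seen order."""
    counts = Counter(a.level for a in annotations)
    return counts.most_common(n)


if __name__ == "__main__":
    # Invented examples: case error, ó/u spelling, missing comma before "że".
    sample = [
        ErrorAnnotation("t1", "w Warszawa", "w Warszawie", ErrorLevel.GRAMMATICAL),
        ErrorAnnotation("t1", "ktury", "który", ErrorLevel.SPELLING),
        ErrorAnnotation("t2", "Myślę że", "Myślę, że", ErrorLevel.PUNCTUATION),
    ]
    for level, count in most_common_levels(sample):
        print(level.value, count)
```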
[Figure panels: Corpus searching; Text view]
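Searching a learner corpus typically combines token-level conditions (e.g. a lemma) with the respondent and text metadata described for PoLKo. The toy sketch below shows that combination as a keyword-in-context (KWIC) search; it is not how TeiTok implements search, and the classes `Respondent` and `Text`, the function `search` and all field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Respondent:
    """Respondent metadata (hypothetical fields modelled on age, sex, L1)."""
    age: int
    sex: str
    l1: str


@dataclass
class Text:
    """Text metadata plus tokens stored as (form, lemma) pairs."""
    title: str
    topic: str
    respondent: Respondent
    tokens: List[Tuple[str, str]] = field(default_factory=list)

    @property
    def word_count(self) -> int:
        # Approximated here as the token count, punctuation included.
        return len(self.tokens)


def search(texts, lemma: str, l1: Optional[str] = None, window: int = 2):
    """Return (title, left, hit, right) KWIC rows for `lemma`,
    optionally restricted to texts by respondents with a given L1."""
    hits = []
    for text in texts:
        if l1 is not None and text.respondent.l1 != l1:
            continue
        forms = [form for form, _ in text.tokens]
        for i, (_, tok_lemma) in enumerate(text.tokens):
            if tok_lemma == lemma:
                left = " ".join(forms[max(0, i - window):i])
                right = " ".join(forms[i + 1:i + 1 + window])
                hits.append((text.title, left, forms[i], right))
    return hits


if __name__ == "__main__":
    texts = [
        Text("Mój dom", "home", Respondent(24, "F", "Czech"),
             [("Mam", "mieć"), ("duży", "duży"), ("dom", "dom"), (".", ".")]),
        Text("Moja rodzina", "family", Respondent(31, "M", "Spanish"),
             [("Mam", "mieć"), ("brata", "brat"), (".", ".")]),
    ]
    for row in search(texts, "mieć", l1="Czech"):
        print(row)
```

Filtering on metadata before scanning tokens keeps the query cheap when only one learner group is of interest; a real corpus engine would use an index instead of a linear scan.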
Project websites
•http://utkl.ff.cuni.cz/teitok/polko/
•https://www.researchgate.net/project/The-Polish-Learner-Corpus
•http://slawistyka.uw.edu.pl/pl/the-polish-learner-corpus/
Corpus data collection
•Students’ texts from the School and Foundation Linguae Mundi
•Texts by private students
References
•Achtelik, A., et al. (2018). Nauczanie i promocja języka polskiego w świecie. Diagnoza – stan – perspektywy [Teaching and promotion of the Polish language worldwide: Diagnosis, state, prospects]. Katowice: Wydawnictwo Uniwersytetu Śląskiego.
•Chan, K. L. (2016). Power Language Index. Which are the world’s most influential languages? Retrieved from http://www.kailchan.ca/wp-content/uploads/2016/12/Kai-Chan_Power-Language-Index-full-report_2016_v2.pdf
•Council for the Polish Language. (2007). The Polish Language. Retrieved from http://www.rjp.pan.pl/images/stories/pliki/broszury/jp_angielski.pdf
•Gilquin, G., Granger, S., & Paquot, M. (2007). Learner corpora: The missing link in EAP pedagogy. Journal of English for Academic Purposes, 6(4), 319–335. doi:10.1016/j.jeap.2007.09.007
•Janssen, M. (2016). TEITOK: Text-Faithful Annotated Corpora. In N. Calzolari et al. (Eds.), Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16) (pp. 4037–4043). Portorož: ELRA.
•Markowski, A. (2008). Kultura języka polskiego. Teoria. Zagadnienia leksykalne [Polish language culture: Theory. Lexical issues]. Warszawa: Wydawnictwo Naukowe PWN.
•Ministerstwo Spraw Zagranicznych. (2014). Atlas polskiej obecności za granicą [Atlas of Polish presence abroad]. Retrieved from https://issuu.com/msz.gov.pl/docs/atlas_polskiej_obecnosci_za_granica
•Zasina, A. J. (2019). Podejście korpusowe w nauczaniu języka polskiego jako obcego na przykładzie rzeczownikowych alternacji ó:o [A corpus-based approach to teaching Polish as a foreign language, exemplified by nominal ó:o alternations]. In K. Zioło-Pużuk (Ed.), Panorama glottodydaktyki polonistycznej. Wyzwania, pytania, kierunki [A panorama of Polish glottodidactics: Challenges, questions, directions] (pp. 181–199). Warszawa: Wydawnictwo Naukowe Uniwersytetu Kardynała Stefana Wyszyńskiego.