    ABSTRACT: The paper attempts to answer the question: Which XML standard(s) should be used for multilevel corpus annotation? Various more or less specific standards and best practices are reviewed: TEI P5, XCES, work within ISO TC 37 / SC 4, TIGER-XML and PAULA. The conclusion of the paper is that the approach with the best claim to following text encoding standards consists in 1) using TEI-conformant schemata that are 2) designed in a way compatible with other standards and data models.
    Human Language Technology. Challenges for Computer Science and Linguistics - 4th Language and Technology Conference, LTC 2009, Poznan, Poland, November 6-8, 2009, Revised Selected Papers; 01/2009
    ABSTRACT: In search of a method of automatic measurement of readability of informational texts. Summary: Many texts in the Polish public space are incomprehensible to some of their intended recipients. In many cases, the problem is related to the structure of the text, which can be measured objectively. Methods of determining the degree of readability of texts have been developed for many languages. In recent years, the importance of this problem has also been noticed by certain Polish institutions, publishing houses and offices. In Polish academia, the problem is being investigated primarily by two centres: Pracownia Prostej Polszczyzny (Plain Polish Laboratory) at the University of Wrocław and the University of Social Sciences and Humanities (SWPS) in Warsaw. This paper presents the state of research on this problem within the project “Mierzenie stopnia zrozumiałości polskich tekstów użytkowych (pozaliterackich)” (“Measuring the degree of readability of nonliterary Polish texts”), carried out at SWPS. First, the problem and the state of research on it in Poland and abroad are described. Next, the paper presents Jasnopis, a tool for measuring the readability of Polish texts that is currently under construction and that uses both established methods, such as the Pisarek index, and new ones, such as an automatic Taylor test. The paper closes with conclusions from the work performed so far and ideas for further research.