Richard Změlík’s scientific contributions

Publications (1)


Fig. 2: In the left panel, strings that are irrelevant to the text analysis are deleted from the text (see lines 11-17). A specific text is retrieved via an ID that is identical to the eBook number in the Gutenberg database (cf. line 10 in Fig. 1 and line 5 in the left panel here).
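The retrieval and cleaning step described in Fig. 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's own code: the download URL pattern stored in `GUTENBERG_URL`, the helper names `fetch_gutenberg_text` and `strip_gutenberg_boilerplate`, and the assumption that the "irrelevant strings" are Project Gutenberg's standard header and licence footer (delimited by `*** START OF ...` and `*** END OF ...` lines) are all hypothetical.

```python
import re
import urllib.request

# Assumed URL pattern for Gutenberg plain-text files; the eBook number is the text's ID.
GUTENBERG_URL = "https://www.gutenberg.org/cache/epub/{id}/pg{id}.txt"

# Assumption: the irrelevant material is the standard header and licence footer,
# delimited by marker lines such as "*** START OF THE PROJECT GUTENBERG EBOOK ... ***".
START_RE = re.compile(
    r"^\*\*\* ?START OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE | re.IGNORECASE
)
END_RE = re.compile(
    r"^\*\*\* ?END OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE | re.IGNORECASE
)


def fetch_gutenberg_text(ebook_id: int) -> str:
    """Download the raw plain text of one eBook (requires network access)."""
    url = GUTENBERG_URL.format(id=ebook_id)
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def strip_gutenberg_boilerplate(raw: str) -> str:
    """Keep only the text between the START and END marker lines, if they are present."""
    start = START_RE.search(raw)
    end = END_RE.search(raw)
    begin = start.end() if start else 0
    finish = end.start() if end else len(raw)
    return raw[begin:finish].strip()
```

Typical use would be `strip_gutenberg_boilerplate(fetch_gutenberg_text(ebook_id))`; older Gutenberg files use other marker wordings, so the patterns may need extending.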
Fig. 5: Table of titles that match the given condition, i.e. that were first published between 1830 and 1880. Below the table, the total number of texts retrieved and the size of the resulting corpus in tokens are given.
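The selection in Fig. 5 amounts to filtering records by year of first publication and summing their token counts. The sketch below assumes a hypothetical `Work` record with a `first_published` field, treats both year bounds as inclusive, and uses a crude regular-expression tokenizer; the database's actual schema and tokenization are not described here.

```python
import re
from dataclasses import dataclass


@dataclass
class Work:
    # Hypothetical record layout; the real database fields may differ.
    title: str
    author: str
    first_published: int
    text: str


# Crude tokenizer: \w is Unicode-aware for str patterns, so Czech letters are kept.
TOKEN_RE = re.compile(r"\w+")


def count_tokens(text: str) -> int:
    return len(TOKEN_RE.findall(text))


def select_by_period(works, start=1830, end=1880):
    """Return works first published in [start, end], their number and total token count."""
    selected = [w for w in works if start <= w.first_published <= end]
    total_tokens = sum(count_tokens(w.text) for w in selected)
    return selected, len(selected), total_tokens
```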
Fig. 6: Example of saving text data into JSON format.
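A minimal way to serialize one text record to JSON, in the spirit of Fig. 6, is shown below. The field names in the example record and the helper names `work_to_json` and `save_work_as_json` are hypothetical; `ensure_ascii=False` keeps Czech diacritics readable in the output instead of turning them into `\u` escapes.

```python
import json
from pathlib import Path


def work_to_json(record: dict) -> str:
    """Serialize a text record; ensure_ascii=False leaves letters such as č, ř, á unescaped."""
    return json.dumps(record, ensure_ascii=False, indent=2)


def save_work_as_json(record: dict, path: Path) -> None:
    """Write the record to a UTF-8 encoded .json file."""
    path.write_text(work_to_json(record), encoding="utf-8")


# Hypothetical record layout; the real database fields may differ.
example = {
    "id": 1,
    "author": "Karel Hynek Mácha",
    "text": "Text body goes here.",
}

# Usage (writes a file): save_work_as_json(example, Path("work_0001.json"))
```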
Fig. 7: Taking into account the length of some of Karel Hynek Mácha's prose texts, we calculated the relatedness between the texts with respect to the 100 most frequent words. The 100 most frequent lemmas were selected from each text in the database and merged into a common set of lemmas. For each lemma in this set, its relative frequency in each text was calculated, and a dendrogram was then constructed from these values. Distances between texts correspond to the height along the y-axis at which their branches join.
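The procedure in Fig. 7 (the 100 most frequent lemmas per text, their union, relative frequencies, then a dendrogram) can be approximated with the standard library alone. This is a sketch under assumptions: the texts are taken to be already lemmatized, and it uses Euclidean distance with average linkage, which may differ from the distance measure and linkage used in the study. The function names are hypothetical; the returned merge heights are the values a dendrogram plots on its y-axis.

```python
import math
from collections import Counter
from itertools import combinations


def mfw_profiles(lemmatized_texts, n=100):
    """Union of each text's n most frequent lemmas, and each text's relative frequencies over it."""
    counts = {name: Counter(lemmas) for name, lemmas in lemmatized_texts.items()}
    vocabulary = sorted({lemma for c in counts.values() for lemma, _ in c.most_common(n)})
    profiles = {}
    for name, c in counts.items():
        total = sum(c.values())  # text length in lemma tokens (assumed non-empty)
        profiles[name] = [c[lemma] / total for lemma in vocabulary]
    return vocabulary, profiles


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(profiles):
    """Naive agglomerative clustering; each merge is (cluster_a, cluster_b, height)."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Average linkage: mean distance over all cross-cluster pairs of texts.
            d = sum(euclidean(profiles[x], profiles[y]) for x in a for y in b) / (len(a) * len(b))
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, _ = best
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append(best)
    return merges
```

Average linkage never produces a merge lower than an earlier one, so the heights can be drawn directly as a dendrogram.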
Fig. 8: Principal component analysis (PCA) of a sub-corpus consisting of 40 selected texts (see Fig. 5).
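For Fig. 8, a dependency-free PCA sketch is given below: it centres the feature vectors, builds the covariance matrix and extracts the leading components by power iteration with deflation. The choice of features (for example, relative lemma frequencies per text) and any scaling are assumptions, as the study's actual settings are not stated here; for real corpora a numerical library would normally be used instead. All function names are hypothetical.

```python
import math
import random


def _center(rows):
    """Subtract the column means so that every feature has mean zero."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def _covariance(x):
    """Sample covariance matrix (p x p) of already centred data (needs at least 2 rows)."""
    n, p = len(x), len(x[0])
    return [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(p)] for i in range(p)]


def _matvec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def _power_iteration(m, iterations=500, seed=0):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in m]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = _matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, _matvec(m, v)))
    return eigenvalue, v


def pca_scores(rows, k=2):
    """Project each row (one text's feature vector) onto the first k principal components.

    The sign of each component is arbitrary, as in any PCA implementation.
    """
    x = _center(rows)
    cov = _covariance(x)
    components = []
    for _ in range(k):
        value, vec = _power_iteration(cov)
        components.append(vec)
        # Deflation: remove the found component so the next iteration finds the next one.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(len(vec))] for i in range(len(vec))]
    return [[sum(r[j] * c[j] for j in range(len(c))) for c in components] for r in x]
```

Each returned row holds one text's coordinates on the first two components, which is what a PCA scatter plot displays.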
On the question of the acquisition of literary texts and the method of their processing for the needs of independent literary research
  • Article

April 2025

Bohemistyka

Richard Změlík

The study deals with the acquisition of digital literary data, specifically prose texts of Czech literature, intended to serve independent scientific research in the digital humanities or computational literary studies. In the first part, we focus on selected foreign text databases that are currently available, which we characterize with respect to the stated goal, i.e. the existence of a digital data collection that is internally structured and machine-readable. We then turn to the Czech context and present an emerging database of Czech prose texts. We describe its basic structure, the advantages of such structuring, and concrete examples of how the database can be used in the statistical analysis of literary texts. We conclude that, given the current development of the digital humanities (DH), demand can be expected to grow not only for specialized web applications built on digital literary corpora but, above all, for access to this and similar databases, since these allow for highly varied and individually designed research.
