Abstract

The article will focus on H. Beam Piper’s classical story Omnilingual (1957). This Piper-esque writing has entered the records of the science fiction prose for the ‘Martian’ periodic table of elements, being synonymous with a scientific ‘Rosetta-like stone’ in the decipherment area. The work, while having a search potential in text analysis and stylistics, may add in a parallel fashion some lustre to the validity of science as a communicative channel in non-conventional circumstances. In order to capture stylistic features of the novelette, a number of quantitative indicators are drawn in. The study will concentrate on vocabulary-richness indexes (TTR, entropy, RR, RRMc, G, ATL, HL, MATTR, and Lambda), a complex assessment of activity (Busemann’s coefficient, the chi-square testing classification), and a sketch of the Belza chain analysis. The goal of the article is to find distinctive features of the piece in question, and point out ways for further research.
Journal of Quantitative Linguistics
On Stylometric Features of H. Beam Piper’s Omnilingual
Tomi S. Melka & Michal Místecký
To cite this article: Tomi S. Melka & Michal Místecký (2020) On Stylometric Features
of H. Beam Piper’s Omnilingual, Journal of Quantitative Linguistics, 27:3, 204-243
Published online: 12 Feb 2019.
On Stylometric Features of H. Beam Piper’s Omnilingual
Tomi S. Melka and Michal Místecký
Department of Humanities, Parkland College, Champaign, IL, USA;
Researcher, Las Palmas de G.C., Spain;
Department of Czech Language, Faculty of Arts,
University of Ostrava, Ostrava, Czech Republic
1. Introduction
The study of style is probably the most complex problem in text analysis.
There are numerous definitions and accounts in this respect, but hardly any
of them is thoroughly operational (cf., in chronological order, Swift,
1930 [1907, 1721]; Coleridge, 1914 [1907, 1818]; Wackernagel, 1873/1888;
Cooper, 1907/1930; Sebeok, 1960; Frye, 1963; Baker, 1966; Freeman, 1970;
Chatman, 1971; Sanders, 1977; and other past and recent scholars, too
many to list here).
In point of fact, we have to face, after all, the elementary and legitimate question ‘what is it?’ (cf. Chatman, 1971; Sanders, 1977). In the quantitative approach, which is based upon pragmatic principles, one can consider style to be ‘a latent property of language that cannot be measured directly’ (Bell, Berridge, & Rayson, 2009, p. 3), but the discrepancies of which can be mathematically captured. Basically, researchers in the field focus on the differences in the styles of various pieces of language, and not on the style as an isolated phenomenon. In the present study, the main attention will be paid to the spheres of vocabulary richness, computable cohesion, and activity/descriptiveness of a text, with individual indexes discerned, evaluated, and criticized on the basis of their interpretative potential (cf. Argamon et al., 2007).
Note: This is a revised and expanded version of Melka (2018), published in the journal Glottometrics, no. 43.
© 2019 Informa UK Limited, trading as Taylor & Francis Group
If one develops the problem further, there appear questions that need to be answered: ‘How does it behave?’ and ‘Why does it behave (in) that way?’ The hypotheses set up in this domain will never be ideally answered/tested because of the polysemy of the notion (cf. Carter & Simpson, 1989; Eckert & Rickford, 2001; Sebeok, 1960). Nevertheless, one can try to approximate the theoretical level by setting up simple, quantified indicators or functions, test them and seek the answer to the ‘why?’ question by finding some other properties with which the given one is correlated.
Style, just as any other structural phenomenon, is the result of a self-regulation, based on ‘. . . its own internal and self-sufficient rules’ (Hawkes, 1977/2004, p. 6). If an observed property A behaves in a certain manner, it may be thereby queried: what is the cause of this behaviour? The other property associated with A must be quantified too, and their relation must be expressed in the form of a testable hypothesis. Procedures of this kind are well known from synergetic linguistics (cf. Köhler, 2005a, p. 766), and have already been applied in text analysis, too.
Methodologically, there must be a possibility of comparing at least two disparate texts, in order to show that there is a stylistic difference. Such a difference will relate to the personality of the author, thematic choices, or one of her/his characters if the author is the same. However, although the differences may be interpreted by and large intuitively, they must be supported statistically. The number of statistical methods used for this purpose is sufficient and may be (should be) used for any examination.
The question of ‘what are the properties of a text?’, and the commitment to unlocking its answers on a microcosmic scale, are perhaps not unlike those of ‘what are the properties of the world?’ on a macrocosmic one.
And yet, one should not be oblivious to the fact that every inspection, however useful, is only a reflection of a restricted view of the text, not the ‘capturing of truth’ (cf. Bunge, 1983, p. 269; Popescu, Altmann et al., 2009, p. 250).
2. H. Beam Piper and his Story Omnilingual
Technically, it would be not only appropriate but also prudent to shed light on some aspects of the author and Omnilingual (1957). The writer does his bit to offer a view on the interpretation of unknown data (deciphering, in this case), using clever and ad hoc mechanisms of expression as well as a cache of words that exceeds the daily language output of many humans. In this sense, useful assumptions can be made and clues can possibly be found about the author and the deployed vocabulary richness.
H. Beam Piper (1904–1964) was a US fantasy and science fiction (sf)
writer. The first name, assumed to be Henry or Horace, is just another
indication of the puzzle surrounding many facets of his life, and the
untimely death by suicide. H. Beam Piper (HBP) did not have much literary
success during his lifetime: his first story Time and Time Again was published
in 1947 when he was 42 years old (Beam Piper, 1947). Having a full-time
position as a night watchman at Pennsylvania Railroad Company, he was
able to support his self-education and passion for writing during a non-
negligible period of time. The mystery novel Murder in the Gunroom
(1953); the sf novels Little Fuzzy (1962), Space Viking (1963), and Lord
Kalvan of Otherwhen (1965) are considered literary highlights of his career.
Biography writers tend to think that a marriage gone awry, an unfriendly
divorce, the death of his literary agent Kenneth White, together with dire
economic woes, caused HBP to self-destruct (Carr, 2008; Hines, 2002). In
later times, scepticism about the value of his work has been supplanted by
a cult-following movement, by several reprints of the major works, and by a
growing acclaim of his originality and insightfulness.
Omnilingual, published as a novelette in Astounding Science Fiction magazine (February 1957; Figure 1), subsequently known as Analog, was later collected in Federation (1981), a compilation of short stories by HBP.
Omnilingual deals with a human survey party (archaeologists included) looking for clues and/or indigenous relics among the ruins of a very ancient Martian city. Consider that we are in the realm of fiction and such things do occur accordingly. On the other hand, science fact suggests that the red planet’s surface water had likely disappeared before rocks formed about 3.9 billion to 4.6 billion years ago (cf. Carr & Head, 2010; Kramer, 2014). A local culture given to building Martian universities and complex research facilities could hardly thrive in the aftermath amidst the cold and barren environment and the absence of a protective atmosphere against UV showering and other radiations. It should also be stated at this point that Piper’s is old-fashioned science fiction, with the Mars theme and its inhabitants (‘natives’ and/or Earth immigrants) regularly used in several storylines by his predecessors or contemporaries. Unusual methods of travelling and mis/adventures, in concert with philosophical, technological, and socioethical quagmires of various kinds, are chronologically shown, e.g. in Greg (1880); Lasswitz (1897/1971); Wells (1898); Serviss (2006 [1898]); Bogdanov (1908/1984); Burroughs (1912/1917); Tolstoy (1922/1950); or Weinbaum (1934); for more, see Bleiler (1990).
The tradition of the ‘Mars fever’ (Fergus, 2013) continues later with Bradbury (1950); Clarke (1951); Brown (1954); Heinlein (1961); Dick (1964); Disch (1976); Pohl (1976); and Shiner (1984/2010), whose spectacular and polyvalent tales are linked one way or another with our planetary neighbour. Next, the Mars-related literary patterns turn to more hard science and specific topics. In such works, human endurance and resourcefulness are strongly tested in one or more quandaries, with the nearly far-fetched outcomes upping one’s imagination, or with the realization of extraordinary feats, e.g. Robinson (1992, 1993, 1996); Bear (1993); McAuley (1994); Landis (2000); Weir (2011/2014); or Pratchett and Baxter.
Figure 1. Cover of Astounding Science Fiction (February 1957; edited by John W. Campbell, Jr.) featuring the fictional characters M. Dane, H. Penrose, and S. von Ohlmhorst. In the far background, a mural found amidst the University ruins shows a ‘heroic-sized Martian’ handling a theodolite-like apparatus. Illustration by Frank Kelly Freas; reprinted after Wikipedia (2018), public domain image.
The human expedition, part of which is the protagonist of Piper’s story,
Martha Dane, uncovers some strange writings while muddling through the
remnants of a large-scale habitat. In an act of scientific devotion, Martha
makes assumptions, as she confronts the assertive and ego-driven associate
Anthony Lattimer. While the story is set in the 1990s, Omnilingual itself was
written in the 1950s, with Anthony Lattimer, PhD, not taking well to being
upstaged by his young female colleague. With the ‘mystery’ being pursued, the
breakthrough occurs when an ancient Martian University is located. Within its confines, scores of books are discovered in what appears to be the Mars University library. At some moment, a diagram of a simple atom and a table of words and numbers are seen on one of the walls of the Department of Physics/Chemistry. Given the occurrence of 92 slots, Martha speculates that the structural layout corresponds to a periodic table. Hence, she builds up the chart in a piecemeal fashion, computing the best alignment between Earth and Martian tables: Hydrogen, No. 1: Sarafaldsorn; Helium, No. 2: Tirfaldsorn; etc. The deconstruction of the Martian words into base roots and affixes helps her in grasping the meaning of extra words in a chain-like reaction. The fact is initially suggested in Section 4, where the archaeologist deduces some similarity between Martian and the German language, both taking as their model the generation of new words by pasting existing ones together. Wikipedia (2018) keenly dwells on that idea, and suggests by analogy the string of chemical elements hydrogen: Wasserstoff; carbon: Kohlenstoff; nitrogen: Stickstoff; and oxygen: Sauerstoff, each sharing the root word stoff (‘stuff, matter, substance’), and a different prefix. The ensuing Greek-based patterns (Nitrogen, Hydrogen, Oxygen, Cyanogen, etc.) are comparable to some extent to the German-based ones: the common root is -gen, i.e. ‘generating, producing, issuing’, with N: ‘generating nitrous gas, or nitrate substances’; H: ‘generating water’; O: ‘generating acid’; CN: ‘generating the gaseous compound of carbon and nitrogen’; etc.
With the evidence getting stronger, the interpretation of a long-lost language begins to fall into place. As the known universe is mostly packed with the element hydrogen, and hydrogen is in retrospect hydrogen whether on Earth, on Jupiter, on the super-massive Pistol star (constellation of Sagittarius) or elsewhere (cf. also Beam Piper, 1957, Section 22), we tend to think that this kind of Rosetta Stone is, if not fully warrantable, then conceivable. Without the assistance of an ‘all-language’ text (the periodic table of chemical elements), it would have taken perhaps more than a lifetime to crack the Martian writings. Looking at some of the real historical decipherments, e.g. the names of Egypt-related rulers in the Rosetta Stone (Pope, 1975/1999), or the names of Minoan towns for Linear B (Chadwick, 1958/2000), it must be concluded that before penning Omnilingual H. Beam Piper was already familiar with the referents.
2.1. Rosetta Stones as Serendipitous Assistants in Real-Life Settings
It should be mentioned at the outset that the discussions of decipherment (Subsections 2.1, 2.2), rather than being a distraction from understanding the story and the original idea of HBP, are intended to come to the aid of the readers. Here, we concentrate on one basic criterion for a feasible decipherment of an unknown script/language: the acquisition of bilingual inscriptions presumably encoding speech in linearly arranged symbols (Gelb & Whiting, 1975, pp. 98–99).
The annals of decipherment have shown not infrequently records whose original content is duplicated, paralleled, or slightly paraphrased in other languages (see Daniels, 1996; Friedrich, 1957/1971, p. 153; Gelb & Whiting, 1975; Knight & Sproat, 2009). As language contact situations in pre-modern civilizations are considered matter-of-fact, the phenomenon of bi- and multilingualism should not come as a surprise. For ‘dead’ languages, however, the primary evidence about bilingualism is different from that on which modern linguists investigating bilingualism in spoken languages can call (cf. Adams, 2003, p. 3; Campanile, Cardona, & Lazzeroni, 1988). For the decipherer of ancient languages, written data are the necessary medium. In order to tackle the problem, the precondition is that scholars should be acquainted, as a sine qua non, with one of the languages. Given the nature of different and/or incomplete sound encodings in two or more scripts (on the iconic, morphemic, syllabic, and segmental levels), consider that a bilingual text is not a type of external clue that instantly produces a unique solution to the retrieved portions of ancient writings (cf. Gelb & Whiting, 1975, p. 99). Overall, by virtue of a true (or quasi) bilingual text (a duplicate, e.g. ‘well done!’ in English v ‘bien hecho!’ in Spanish) or of a virtual bilingual one (place names and proper names, e.g. a-mi-ni-so, Amnisos; ko-no-so, Knossos; tu-li-so, Tu/ylissos, after the Linear B decipherment; Pope, 1975/1999, p. 174; Robinson, 2002, p. 99), epigraphers/script analysts get a foothold allowing them to advance in the elucidation or decipherment of the available material.
Several instances corroborate the above: the Palmyrene (a form of Aramaic) script was cracked in 1754 thanks to a bilingual (Parkinson, 1999, p. 16); the decipherment of the Luwian language was ascertained later through the Phoenician-Luwian bilingual records of Karatepe hill (Gordon, 1968/1987, pp. 100–101; Hawkins & Morpurgo Davies, 1978); the Phoenician version of a Cypriot bilingual provided the key to the Cypriot syllabary (Friedrich, 1957/1971, pp. 136–139, Fig. 60; Steele, 2013, p. 202); the Thugga (modern Dougga) bilingual text in Tunisia, written in Punic and a variant of the ‘Numidian’ (the so-called Libyco-Berber) alphabet (Friedrich, 1957/1971, pp. 118–121, Fig. 57; O’Connor, 1996, p. 113), also adds to this list; the so-called ‘Landa’s alphabet’, a fragment of a copy manuscript written by the Spanish bishop of Yucatán Diego de Landa about the Maya people and their civilization, assisted Yuri Knorozov in carrying out the initial substantiation of Maya script on credible phonetic values (Pope, 1975/1999, pp. 199–200; Robinson, 2002, pp. 119–125).
Still, the renowned multi-text inscription held responsible for starting the decipherment of a real-world script is the Rosetta Stone, as it is often known today. The artefact shows three systems (Egyptian hieroglyphic, local Egyptian demotic, Greek) coded in two languages (Egyptian and Greek), displaying a decree in honour of King Ptolemy V Epiphanes related to the year 196 BCE. The literature on this momentous artefact and its role in the understanding and interpretation of hieroglyphic writings is large; suffice to take note of some references at this juncture: Friedrich (1957/1971, pp. 17–25); Pope (1975/1999, p. 61); Ray (2007); Robinson (2002, pp. 56–60). While the French scholar Jean-François Champollion is generally credited with using the Rosetta Stone to decipher the ancient Egyptian script, the intricacies of the decipherment and the respective contributions are still open to debate (cf. Daniels, 1996, p. 145; Gordon, 1968/1987; Pope, 1975/1999; Robinson, 2011).
In a narrow sense, the NP-collocation Rosetta Stone has come to indicate by antonomasia any artefact designed to convey parallel and repeated information about unknown entities, events, or cultural phenomena, which assists in explaining and restoring their original structure and meaning.
2.2. Cosmic Rosetta Stones
So far, unidentified scripts involving human-related settings have been mentioned. The cumulative knowledge brings on many challenges of a different nature: unidentified signals, sign sequences, and visual-like data streams derive similarly from other-than-human sources, whether Earth-bound or not. With each passing year, establishing contact with non-human entities is viewed as more than plausible, raising scientific and philosophical concerns of the highest order.
The actuality of corresponding with other intelligences or sentient life-forms, plus its complex ramifications, is discussed in different sources (cf. Dick, 1998; Drake & Sobel, 1992; Engdahl, 2001/2006; Golomb, 1961/1968; Heidmann, 1995; Michaud, 2007; Vakoch, 2014). The coordinated efforts are mainly based hitherto on the current understanding of human needs, of the physics of space, and of communication. Therefore, adjudging consciously or unconsciously human characteristics to the contact ‘language’ or ‘channel’ is regarded as a drawback (cf. Harrison, 2014; Michaud, 2007).
The designed languages and/or devices vary from naturally developed human languages, to the mathematical ones, radio signals, visual-symbolic codes, or to the dispatch of robotic space probes carrying messages in multiple ways. In this context, the most celebrated Rosetta Stone intended to be intercepted by any scientifically educated being in outer space is the coded message of the Pioneer 10 space probe of 1972 (see Davies, 1995, pp. 55–56; Gombrich, 1982, pp. 150–151). Despite the outcome of Pioneer 10’s mission (e.g. falling short of achieving its goal for multiple reasons), the message stands for a deeply symbolic human effort in contacting intelligences beyond Earth. In a similar manner, it paved the way for additional communication experiments, each a reminder of humankind’s drive to expand the frontiers of knowledge in the physical and metaphysical sense.
3. Analysis of Text
Omnilingual (1957) is an autonomous dataset of c. 16,730 tokens (Figure 2) partitioned in 22 sections, with each of them far from being a sufficiently large text. At this juncture, we cannot do better than direct readers, for purported text lengths and reliable results, to Eder (2010/2013), Popescu, Altmann et al. (2009, p. 3), and Tuldava (2005, p. 370), whereas for definitions of ‘text’ in modern linguistics and quantitative studies, one should refer to Juola (2008, p. 252); Kubát (2014, p. 105); Nekula (2002, p. 489); Popescu, Čech, & Altmann (2011, p. 98); and Yesypenko (2008, p. 18). Similarly, observe that in information science, one definition of ‘text’ is formulated as ‘. . . a collection of signs purposefully structured by a sender with the intention of changing the image-structure of its recipient’ (Belkin & Robertson, 1976, p. 201). Douglas Raber (2003, p. 6) points out that if, for the moment, what it means ‘to be informed’ is left aside, a ‘collection of signs’, according to Belkin and Robertson (1976), can appear in a variety of formats and media, including but not limited to writing. Yet, we shall adhere in our paper to ‘writing’ as a standard textual format for the purpose of extracting data for quantitative analyses.
The simple arithmetic mean would be c. 760 words per corpus section at this moment, but clearly Figure 2 highlights that the distribution is not uniform in nature. One realizes that comparing it in favourable terms with other popular works of H. Beam Piper (1962, 1963, 1965) adds up to a tenuous practice. Creative literary samples are (apparently) not written by smart automata with a quasi-perfect rhythm and disposition; they are not invariant and are characterized by an inherent lack of homogeneity (cf. Bell et al., 2009; Strauß, Grzybek, & Altmann, 2006). This notion finds support elsewhere. As part of the human thought processes, which are hierarchical and discontinuous, writing itself is a ‘reflection’ of them, contrasting, for example, with spaces distinguished by continuity and connectivity, such as the physical ones (cf. Khrennikov, 2014). The point, however, does not suggest that intra-authorial analyses should not be made one day; it means that given the topic and the lexical specificity of Omnilingual, observations and results cannot be rid of (some) arbitrariness. A similar argument can be raised with regard to fiction works of other authors. If we hypothetically consider weighing the Mars-themed Omnilingual against another storybook, e.g. the New Grub Street of George Gissing (2016 [2008, 1891]), the discrepancy is pretty obvious: the reliance on sample size may cause bias-related conclusions (i.e. the sampling variation problem). Specifically, Gissing crafted a three-volume novel of c. 220,000 tokens. Another concern is that the plot revolves around two contrasting characters in the late 19th century of London’s literary world, and their hardships, attitudes and ethical (or not) choices regarding professional and social life. In this sense, attributions to these dissimilar spheres of action and location (Earth’s late Victorian London vs Mars, plus the involved cognitive challenges) seem to build more corpus-based gaps than bridges. In sheer size, paralleling or overlaying statistics between H. Beam Piper’s and George Gissing’s works may lead to some implausible claims and encroachment of textual realities. What is also of note is the remark found in the thesis of Jack W. Grieve (2005, p. 21) when reviewing measures of vocabulary richness:
‘… the vocabulary of a text depends far more on its subject than on its author. For while every word in a text must be drawn from its author’s vocabulary, different subjects will activate different sections of an author’s vocabulary, and different sections of any author’s vocabulary will not all be equally rich.’
The raw data of Omnilingual (1957) are retrieved from the public domain, formatted in txt files, and arranged separately into the 22 sections. The language data are then processed with statistical software packages designed for such matters:
(1) QUITA: Quantitative Index Text Analyzer (Kubát, Matlach, & Čech);
(2) LancsBox: Lancaster University corpus toolbox (© Vaclav Březina, 2018), for the part addressing Busemann’s coefficient.
3.1. Properties of the Vocabulary
Vocabulary richness in an organized text can be measured:
(a) Directly, by naming the individual types (not tokens, because the number of tokens is automatically greater in synthetic languages). The types (i.e. distinct words) can be ordered according to some principle, e.g. ranked according to their frequency, their length, etc. (see e.g. Baayen, 2001; Čech, Popescu, & Altmann, 2014; COCA, 2018; Herdan, 1964; Johnson, 2008; Köhler & Galle, 1993; Kubát et al., 2014; Leech, Rayson, Strauß et al., 2006).
(b) Indirectly, by performing some classifications of the types and setting up new distributions: they can be classified according to the parts-of-speech (PoS) to which they belong, or according to the role they play in the sentences. On the other hand, some special classes, e.g. adnominals, verb valencies, etc., should be separated (cf. Fortis & Fagard, 2010; Helbig & Schenkel, 2011 [1991, 1969]; Herbst, Heath, Roe, & Götz, 2004; Köhler, 2005b; Pan & Liu, 2014).
(c) By means of indicators which can be established either directly or from the existing approaches (a) and (b).
Figure 2. The graph conveys the number of tokens per individual section in HBP’s Omnilingual (1957). The N-size shows various steep slopes, especially through sections 10 to 20, spiking at # 13.
It is re-emphasized that the aim in this article is to study the vocabulary of
H. Beam Piper (1957), and along this line some of its properties are shown.
As expected, the frequencies of distinct words in individual sections are
counted, evaluated and, in addition, the development of the text is studied.
As stated earlier, H. Beam Piper’s (1957) novelette has 22 sections/chapters,
and for the sake of cross-checking and further examination, they are all
listed in Tables 1 and 5.
3.1.1. TTR
As to vocabulary richness, many indicators are available; among these, the classical type-token ratio (TTR) is widely used, though one needs to bear in mind that it substantially depends on the text length. Since, in Omnilingual, the sections are of a comparable size, it was included in the present research; the count states:
\[ \mathrm{TTR} = \frac{V}{N}, \tag{1} \]
where V stands for the number of types and N for the total of the words in a text. The resulting value relates to vocabulary diversity: the more diverse the V, the higher the TTR, see Table 5. The numbers of types and tokens found across the sections are listed in Table 1.
Table 1. Types and tokens as recorded through each section of Omnilingual.
Section          Types  Tokens   Section          Types  Tokens
Omnilingual_1      280     520   Omnilingual_12     261     478
Omnilingual_2      398     844   Omnilingual_13     622    1670
Omnilingual_3      274     557   Omnilingual_14     310     602
Omnilingual_4      281     608   Omnilingual_15     297     545
Omnilingual_5      362     761   Omnilingual_16     464    1136
Omnilingual_6      371     884   Omnilingual_17     399     818
Omnilingual_7      409     869   Omnilingual_18     410     867
Omnilingual_8      425     976   Omnilingual_19     340     747
Omnilingual_9      322     615   Omnilingual_20     214     478
Omnilingual_10     266     487   Omnilingual_21     298     712
Omnilingual_11     419     835   Omnilingual_22     309     700
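The type-token ratio is simply V divided by N, so it can be recomputed from the counts in Table 1. The sketch below does this for every section and also derives the total and the mean number of tokens per section; the dictionary and function names are illustrative choices, and only the figures themselves are taken from Table 1.

```python
# (types V, tokens N) per section, copied from Table 1.
TABLE_1 = {
    1: (280, 520), 2: (398, 844), 3: (274, 557), 4: (281, 608),
    5: (362, 761), 6: (371, 884), 7: (409, 869), 8: (425, 976),
    9: (322, 615), 10: (266, 487), 11: (419, 835), 12: (261, 478),
    13: (622, 1670), 14: (310, 602), 15: (297, 545), 16: (464, 1136),
    17: (399, 818), 18: (410, 867), 19: (340, 747), 20: (214, 478),
    21: (298, 712), 22: (309, 700),
}


def ttr(v, n):
    """Type-token ratio: number of types divided by number of tokens."""
    return v / n


total_tokens = sum(n for _, n in TABLE_1.values())
mean_tokens = total_tokens / len(TABLE_1)
ttr_by_section = {s: round(ttr(v, n), 3) for s, (v, n) in TABLE_1.items()}

if __name__ == "__main__":
    print(f"total tokens = {total_tokens}, mean per section = {mean_tokens:.1f}")
    for section, value in ttr_by_section.items():
        print(f"Omnilingual_{section}: TTR = {value}")
```

Summed this way, the section counts come to 16,709 tokens, i.e. about 760 per section. The longest section (13) also shows the lowest TTR, which illustrates the dependence on text length mentioned above.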
3.1.2. Entropy
Next, the lexical wealth of a text can be evaluated on the basis of entropy. Derived from the original notion introduced by Shannon (1948), linguistic entropy measures the degree of vocabulary dispersion in a text; it can also be interpreted as a measure of its monotony. Its formula is as follows:
\[ H = -\sum_{i=1}^{K} p_i \log p_i, \tag{2} \]
with K standing for the inventory size, and p_i for the relative frequency of a given word. It needs to be pointed out that, as in TTR, the text size has a considerable impact on the values; moreover, the linguistic type plays its role, too, as it has been found that entropy gets higher with the level of analytical character of a language (cf. Strauß, Fan, & Altmann, 2008, p. 96). It makes sense, since these tongues tend to use more words in general, which increases the figure of the measurement.
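A minimal sketch of the entropy computation is given below. It assumes a base-2 logarithm (Shannon's bits); the base is not recoverable from the formula as printed here, so this is an assumption, and the function name is illustrative.

```python
import math
from collections import Counter


def entropy(tokens):
    """Shannon entropy of the word-frequency distribution.

    Assumption: logarithm to base 2, so the result is in bits.
    """
    n = len(tokens)
    if n == 0:
        raise ValueError("entropy is undefined for an empty text")
    counts = Counter(tokens)
    # p_i is the relative frequency of the i-th type; K = len(counts).
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    monotonous = ["mars"] * 8
    dispersed = ["mars", "table", "atom", "book"] * 2
    print(entropy(monotonous))  # one type only: no dispersion
    print(entropy(dispersed))   # four equally frequent types
```

A text that repeats a single word has zero entropy, while K equally frequent types give the maximum, log2(K). This is the sense in which the index measures dispersion or, read inversely, monotony.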
3.1.3. Repeat Rate and RRMc
If there is a necessity to measure the repetitiveness of individual words, one can make use of the repeat rate (RR). George U. Yule’s (1944/2014) characteristic K indicates through inversion that the richer the text is, the smaller the repetition of words is. Its basic formula reads:
\[ RR = \sum_{r=1}^{V} p_r^2, \tag{3} \]
where r means a rank, V is the number of distinct words (types), and p_r^2 are the squares of individual relative frequencies. The RR formula can be relativized, transformed in entropy, or chi-squared (cf. Altmann & Köhler, 2015, p. 38), etc.
This procedure was normalized by McIntosh (1967; see also Popescu, Altmann et al., 2009), yielding:
\[ RR_{Mc} = \frac{1 - \sqrt{RR}}{1 - 1/\sqrt{V}}. \tag{4} \]
The point of McIntosh’s change to the original formula was to link it to the size of the vocabulary, which is expressed in the number of types. Thanks to this amendment, it is not needed to count the minimal value of RR to find a springboard for comparisons; two texts can thus be contrasted directly on the basis of the RRMc counts. In the present analysis, both indexes have been calculated; the results are listed in Table 2.
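The two indexes can be sketched as follows. The repeat rate is the sum of squared relative frequencies. For McIntosh's variant the sketch assumes the form (1 − √RR)/(1 − 1/√V), with V the number of types; this normalisation is an assumption, chosen because it reproduces the values of Table 2 to within rounding when fed the counts of Table 1. The function names are illustrative.

```python
import math
from collections import Counter


def repeat_rate(tokens):
    """RR: sum of squared relative frequencies of all types."""
    n = len(tokens)
    if n == 0:
        raise ValueError("RR is undefined for an empty text")
    return sum((c / n) ** 2 for c in Counter(tokens).values())


def rr_mcintosh(rr, v):
    """McIntosh-normalised repeat rate.

    Assumption: normalisation by the number of types V, so that a
    uniform distribution (RR = 1/V) yields the maximum value 1.
    """
    if v < 2:
        raise ValueError("at least two types are needed")
    return (1 - math.sqrt(rr)) / (1 - 1 / math.sqrt(v))


if __name__ == "__main__":
    tokens = ["a", "a", "b", "c"]
    rr = repeat_rate(tokens)
    print(rr, rr_mcintosh(rr, len(set(tokens))))
    # Section 2 of Omnilingual: RR = 0.010 (Table 2), V = 398 (Table 1).
    print(round(rr_mcintosh(0.010, 398), 3))
```

Because RR is rounded to three decimals in Table 2, a recomputed RRMc can differ from the tabulated one in the third decimal place.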
The sequence of RR is not monotonic, with the steep jump revealingly shown in Section 14 (Figure 3). Apparently, the boundary conditions lead H. Beam Piper to write the given section in a slightly different manner. The observation strikes well a chord with the ‘change-point’ notion, as shown in the Subsection 5 ‘Time series analysis’ of F. J. Tweedie (2005, pp. 390–391).
Figure 3. Repeat-rate development through the Omnilingual (1957) sections. If the whole text was concentrated in one uniformly replicated word, we would acquire a theoretical RR = 1, and that is hardly the case in point.
Table 2. RR and RRMc figures for the Omnilingual sections.
Section          RR     RRMc    Section          RR     RRMc
Omnilingual_1    0.018  0.922   Omnilingual_12   0.013  0.946
Omnilingual_2    0.010  0.946   Omnilingual_13   0.011  0.932
Omnilingual_3    0.011  0.951   Omnilingual_14   0.020  0.910
Omnilingual_4    0.014  0.939   Omnilingual_15   0.012  0.947
Omnilingual_5    0.010  0.949   Omnilingual_16   0.008  0.958
Omnilingual_6    0.011  0.943   Omnilingual_17   0.013  0.934
Omnilingual_7    0.009  0.954   Omnilingual_18   0.013  0.933
Omnilingual_8    0.009  0.949   Omnilingual_19   0.012  0.941
Omnilingual_9    0.014  0.934   Omnilingual_20   0.013  0.949
Omnilingual_10   0.012  0.950   Omnilingual_21   0.011  0.952
Omnilingual_11   0.011  0.941   Omnilingual_22   0.010  0.957
3.1.4. Gini’s Coefficient
One of the many possibilities to account for the richness of text is Gini’s coefficient (Kubát et al., 2014; Popescu, Altmann et al., 2009, pp. 54–63). Similarly, the indicator can be used in other scientific areas that are not concerned with stylometric experiments (cf. Damgaard & Weiner, 2000, or Gastwirth, 2017).
Gini’s coefficient is the area between the Lorenz curve and the straight line joining <0;1> in the two-dimensional coordinate system (Gini, 1921; cf. Ceriani & Verme, 2012). The Lorenz curve is the stepwise adding of relative frequencies beginning from the lowest up to the highest (Popescu, Altmann et al., 2009, p. 56, Fig. 3.11). Since this constitutes an area, one needs to figure out all individual areas between the two lines. Nevertheless, there are easily computable approximations at our disposal. One of them is given as:
\[ G = \frac{1}{V}\left(V + 1 - \frac{2}{N}\sum_{r=1}^{V} r f_r\right), \tag{5} \]
and rendered as 1 + 1/V − 2μ/V, where f_r is the frequency of the word at rank r and μ = (1/N) Σ r f_r is the mean of the rank-frequency distribution. For comparative purposes, one can use the variance of G consistent with:
\[ \mathrm{Var}(G) = \frac{4\sigma^2}{V^2 N}, \tag{6} \]
where σ² is the variance of the rank frequencies. The values for individual sections are listed as fractions in Table 3.
On the whole, Gini’s coefficient tells us that the smaller its values, the greater will be the vocabulary richness (e.g. Popescu & Altmann, 2006). The sequence of Gini’s coefficients could be captured by a straight line, but Section 13 involves a climactic value, plus the variation among the other chapters is clearly perceptible.
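The approximation can be sketched in a few lines. The code assumes the rank-based form G = (V + 1 − (2/N) Σ r·f_r)/V, which is algebraically the same as 1 + 1/V − 2μ/V with μ the mean rank. Ranks are assigned in descending order of frequency, and σ² is taken as the population variance of the ranks; both conventions, like the function names, are assumptions of this example.

```python
def _ranked(freqs):
    """Sort frequencies in descending order (rank 1 = most frequent)."""
    f = sorted(freqs, reverse=True)
    if not f or f[-1] <= 0:
        raise ValueError("frequencies must be positive and non-empty")
    return f


def gini(freqs):
    """Rank-based approximation of Gini's coefficient."""
    f = _ranked(freqs)
    v, n = len(f), sum(f)
    weighted = sum(r * fr for r, fr in enumerate(f, start=1))
    return (v + 1 - 2 * weighted / n) / v


def gini_variance(freqs):
    """Var(G) = 4 * sigma^2 / (V^2 * N), sigma^2 = variance of the ranks."""
    f = _ranked(freqs)
    v, n = len(f), sum(f)
    mu = sum(r * fr for r, fr in enumerate(f, start=1)) / n
    second = sum(r * r * fr for r, fr in enumerate(f, start=1)) / n
    return 4 * (second - mu ** 2) / (v * v * n)


if __name__ == "__main__":
    print(gini([1, 1, 1, 1]))    # every word used once: maximal richness
    print(gini([97, 1, 1, 1]))   # one word dominates the text
    print(gini_variance([1, 1, 1, 1]))
```

A text in which every type occurs equally often gives G = 0, and the value grows as a few words come to dominate, in line with the reading that smaller values indicate greater richness.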
3.1.5. Hapax Legomena and Average Tokens Length
To make the picture complete, two more measures are figured out. First, the
hapax legomena (HL) count is a simple ratio of all the words that occur only
once in a text to the total of them (Popescu, Mačutek et al. 2009, p. 99). As
there is no empirically attested number of hapax legomena to be found in a
text, the proportion is not exclusively exploited as an indicator for richness;
its importance lies rather in comparisons. The development of HL ratios across
the investigated text is illustrated in Figure 4.
Table 3. Results of Gini’s coefficient on individual sections.
Section  Gini    Section  Gini    Section  Gini    Section  Gini
1        0.412   8        0.483   15       0.401   19       0.459
2        0.461   9        0.422   16       0.494   20       0.454
3        0.434   10       0.392   17       0.453   21       0.467
4        0.471   11       0.443   18       0.467   22       0.459
5        0.456   12       0.397
6        0.489   13       0.543
7        0.451   14       0.437
Second, the average tokens length (ATL) is estimated considering the
mean of a word size in characters. The characters were chosen as they
seem to be the steadiest unit, with the phonemes being slightly different
in individual varieties of English, and the system of syllabic divisions not
unified. Mathematically, it is expressed as:
$$ATL = \frac{1}{N}\sum_{i=1}^{N} p_i$$
where N = number of tokens, and $p_i$ = individual word size.
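Both measures can be illustrated with a short Python sketch. The function names and the toy token list are assumptions made for the example; the HL ratio is taken here relative to the number of tokens, which is an interpretation of "the total of them" suggested by the HL values in Table 5 being lower than the TTR values.

```python
from collections import Counter


def hapax_ratio(tokens):
    """Share of word forms occurring exactly once, relative to all tokens.

    Dividing by the token count (rather than by the number of types) is
    an assumption of this sketch.
    """
    counts = Counter(tokens)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(tokens)


def average_token_length(tokens):
    """ATL: mean word size in characters, (1/N) * sum of p_i."""
    return sum(len(t) for t in tokens) / len(tokens)


# Hypothetical toy input, already tokenized and lower-cased.
tokens = ["the", "cat", "saw", "the", "dog"]
print(hapax_ratio(tokens), average_token_length(tokens))
```

Tokenization, case folding and the treatment of punctuation all affect both figures, so they would have to be fixed before sections are compared.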
The length may be directly linked to complexity or style, as, for
instance, the English vocabulary tends to manifest itself in two or even
three layers (e.g. spell – enchantment [n.]; own – possess [v.]; edgy –
excitable [adj.]; fire – flame – conflagration [n.]; ask – question –
interrogate [v.]; clear – pellucid – transparent [adj.]). These content words,
having originated from different sources (Germanic, French, and Latin),
do not only have variegated meanings, but are felt to be situated at various
stylistic levels, the highest ones being reserved for words of the Romance
provenance (cf. Jackson & Amvela, 2000).
Figure 4. HL ratios across the twenty-two sections of Omnilingual.
3.1.6. The Lambda Indicator (Λ)
Every frequency distribution of words in a text, whether ranked or presented as a
spectrum, displays a number of properties which can be measured, compared
and tested. One among the many others developed in the last years is the so-called
lambda indicator, defined on the basis of Euclidean distances between
neighbouring/ranked frequencies (cf. Popescu et al., 2011, p. 3). It may be
relativized in such a way that text size does not have an apparent influence.
Still, under the premises, it can be approximated in a simple form (cf. Popescu &
Altmann, 2015). As the underlying Euclidean distance can be approximated by:
$$L = V + f_1 - h - 1$$
where V is the vocabulary size (or the highest rank), $f_1$ is the frequency of a
unit at rank 1 (the most commonly used word) and h is the h-point defined as:
$$h = \begin{cases} r, & \text{if there is an } r = f(r) \\ \dfrac{f(i)\,r_j - f(j)\,r_i}{r_j - r_i + f(i) - f(j)}, & \text{if there is no } r = f(r) \end{cases}$$
we obtain for (1) an estimated lambda in the form of
$$\Lambda = \frac{L \log(N)}{N} = \frac{(V + f_1 - h - 1)\log(N)}{N}$$
In explicit terms, h can be described as a fixed point along the rank–frequency
distribution, where r and f(r) of a specific linguistic unit concur during the
counting (cf. Popescu & Altmann, 2006, 2007). For cases where no rank satisfies
r = f(r) and the point is unattainable, h can be decided by the point where the
product of rank and frequency reaches its maximum (Kelih, Rovenchak, & Buk,
2014, p. 84). The h-point has found a use in quantitative text analysis (e.g. in
the measurement of vocabulary richness) and in cross-linguistic comparisons,
enfolding synthetic and analytic languages (e.g. Popescu et al., 2011, pp. 10–11).
This premise, however, is all too often subject to text size in its applications,
where texts of a similar length, or other indicators that rely on the h-point and
are normalized as to the size of text, are preferable during the analyses (Kelih
et al., 2014, p. 85).
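The h-point and lambda can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reimplementation of QUITA: the function names are hypothetical, the interpolation uses the two neighbouring ranks between which the frequency falls below the rank, and the logarithm is taken to base 10, which the text above does not specify.

```python
import math


def h_point(ranked):
    """h-point of a descending frequency list (ranks start at 1)."""
    previous = None
    for rank, freq in enumerate(ranked, start=1):
        if freq == rank:
            return float(rank)
        if freq < rank:
            if previous is None:
                raise ValueError("frequencies must be positive counts")
            r_i, f_i = previous      # last rank with f(r) > r
            r_j, f_j = rank, freq    # first rank with f(r) < r
            return (f_i * r_j - f_j * r_i) / (r_j - r_i + f_i - f_j)
        previous = (rank, freq)
    raise ValueError("no h-point: every frequency exceeds its rank")


def arc_length(ranked):
    """Sum of Euclidean distances between neighbouring ranked frequencies."""
    return sum(math.sqrt((a - b) ** 2 + 1) for a, b in zip(ranked, ranked[1:]))


def lambda_indicator(frequencies, approximate=False):
    """Lambda = L * log10(N) / N; base 10 is an assumption of this sketch."""
    ranked = sorted(frequencies, reverse=True)
    n_tokens = sum(ranked)
    if approximate:
        # Approximation L = V + f_1 - h - 1 from the text above.
        length = len(ranked) + ranked[0] - h_point(ranked) - 1
    else:
        length = arc_length(ranked)
    return length * math.log10(n_tokens) / n_tokens


# Hypothetical frequency list, used only to exercise the functions.
freqs = [10, 7, 5, 3, 2, 1, 1]
print(h_point(freqs), lambda_indicator(freqs), lambda_indicator(freqs, True))
```

For very short or very flat frequency lists no crossing of rank and frequency exists, and the sketch raises an error instead of guessing a value.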
In view of the aforementioned objection by Kubát (2014) concerning the
dependence of lambda upon text length, Pearson and Kendall correlation
coefficients (cf. Zaid, 2015) have been counted; their values (−0.37 and −0.26)
have shown that there is a feeble inverse dependence, but it does not seem
prominent within the studied context. It may thus be concluded that lambda is
a valid indicator of vocabulary richness for Omnilingual.
Further work can be done if one considers comparing the variance (Var Λ*) of
each section, and performing a normal u-test (Table 4). Nevertheless, given the
projected extension which may include other-than-Omnilingual texts, we would
prefer treating the topic in a study of its own.
The results are listed in Table 5.
3.1.7. Busemann’s Coefficient
Another property of text which can be measured is activity. The simplest way
of its quantitative evaluation is Busemann’s coefficient (1925), which is the
division of the number of verbs to the sum of verbs and adjectives; namely:
$$Q = \frac{V}{V + A}$$
where V is the number of verbs and A the number of adjectives.
Table 4. Values of h-point and lambda according to QUITA text analyser; cf. also Figure 5
and Figure 6.
Text section                        H-point  Lambda
01. Martha Dane paused...             7.00   1.675
02. There were ten people...          9.83   1.482
03. Selim von Ohlmhorst...            9.00   1.442
04. Photographs...                   10.00   1.366
05. Sachiko was speaking...          10.00   1.460
06. Michael Ventris...               11.00   1.383
07. Three men had come in...         10.50   1.458
08. The library, which was also...   11.00   1.405
09. They all got out...               8.00   1.644
10. The door, one of the double...    7.50   1.593
11. The hallway, too, was thick...   10.00   1.608
12. The sixth floor was...            7.00   1.602
13. They made their way...           16.00   1.387
14. Lunch at the huts...              7.00   1.678
15. They worked up...                 8.00   1.607
16. The next day...                  13.00   1.302
17. Ivan Fitzgerald...                9.00   1.573
18. Martha remembered...             10.00   1.562
19. She was halfway...                9.50   1.472
20. Ninety-two!                       9.00   1.325
21. Tranter hesitated...              9.50   1.299
22. Sachiko Koremitsu...             10.50   1.315
Table 5. Integrated results of eight stylometric indicators regarding the text of
Omnilingual.
Text TTR Entropy RR RRMc G ATL HL Λ (Lambda)
Omnilingual_1 0.538 7.301 0.018 0.922 0.412 4.562 0.398 1.675
Omnilingual_2 0.472 7.782 0.010 0.946 0.461 4.344 0.333 1.482
Omnilingual_3 0.492 7.403 0.011 0.951 0.434 4.176 0.341 1.442
Omnilingual_4 0.462 7.268 0.014 0.939 0.471 4.610 0.327 1.366
Omnilingual_5 0.476 7.698 0.010 0.949 0.456 4.281 0.336 1.460
Omnilingual_6 0.420 7.666 0.011 0.943 0.489 4.256 0.273 1.383
Omnilingual_7 0.471 7.908 0.009 0.954 0.451 4.377 0.314 1.458
Omnilingual_8 0.435 7.870 0.009 0.949 0.483 4.403 0.294 1.405
Omnilingual_9 0.524 7.532 0.014 0.934 0.422 4.498 0.387 1.644
Omnilingual_10 0.546 7.429 0.012 0.950 0.392 4.314 0.392 1.593
Omnilingual_11 0.502 7.855 0.011 0.941 0.443 4.623 0.372 1.608
Omnilingual_12 0.546 7.367 0.013 0.946 0.397 4.529 0.406 1.602
Omnilingual_13 0.372 8.103 0.011 0.932 0.543 4.480 0.238 1.387
Omnilingual_14 0.515 7.303 0.020 0.910 0.437 4.711 0.380 1.678
Omnilingual_15 0.545 7.513 0.012 0.947 0.401 4.464 0.400 1.607
Omnilingual_16 0.408 8.025 0.008 0.958 0.494 4.402 0.256 1.302
Omnilingual_17 0.488 7.722 0.013 0.934 0.453 4.581 0.346 1.573
Omnilingual_18 0.473 7.735 0.013 0.933 0.467 4.314 0.337 1.562
Omnilingual_19 0.455 7.602 0.012 0.941 0.459 4.278 0.297 1.472
Omnilingual_20 0.448 7.061 0.013 0.949 0.454 4.153 0.293 1.325
Omnilingual_21 0.419 7.499 0.011 0.952 0.467 4.253 0.249 1.299
Omnilingual_22 0.441 7.564 0.010 0.957 0.459 4.186 0.273 1.315
Figure 5. The graph shows the development of lambda (vocabulary use) across the
sections of Omnilingual. The course of lambda is not even, suggesting changes in the
structure of text. The changes may be related with particular properties of each
section, with pauses, with the thematic yarn, or a posteriori different presentations,
once HBP finished writing the preceding section.
Figure 6. Graphic distribution of h-point and lambda across the 22 sections of the text.
Further investigations on H. Beam Piper’s work and other inter-authorial comparisons
are needed for a reliable statement on the possible correlation of these two indicators.
The verb–adjective ratio has been studied in several works: e.g. Altmann
(1978, 2018a); Altmann and Köhler (2015); Antosch (1969); Bakker (1965);
Boder (1940); Místecký (2018); Těšitelová (1987/1992). One reason to
include activity in the assessment of Omnilingual is that, unlike other
indicators (TTR, RR, or entropy), text length has no appreciable impact on it
(cf. Zörnig et al., 2015). According to the results, texts can
be classified into active (Q > 0.5), neutral (Q = 0.5), and descriptive ones
(Q < 0.5). As such division is quite rough, a chi-square test may be
introduced, which states whether activity/descriptiveness of a text is
statistically significant. Its formula (cf. Altmann & Köhler, 2015; Zörnig et al.,
2015) is rendered as:
$$\chi^2 = \frac{(V - A)^2}{V + A}$$
where V and A are again the numbers of verbs and adjectives.
Given the results, the test ranges the texts on the basis of the following scale:
(1) SA – significantly active (Q > 0.55, χ² > 3.84);
(2) AC – active (Q > 0.55, χ² < 3.84);
(3) N – neutral (0.45 < Q < 0.55);
(4) DE – descriptive (Q < 0.45, χ² < 3.84);
(5) SD – significantly descriptive (Q < 0.45, χ² > 3.84).
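The five-way scale can be sketched in Python as follows. The function names are hypothetical, the chi-square statistic is taken as (V − A)²/(V + A) with one degree of freedom (which reproduces the chi-square column of Table 6 from its verb and adjective counts), and the handling of the boundary values Q = 0.45 and Q = 0.55, which the scale leaves open, is an assumption of this sketch (they are treated as neutral).

```python
def busemann(verbs, adjectives):
    """Return Busemann's Q = V / (V + A) and chi-square = (V - A)^2 / (V + A)."""
    total = verbs + adjectives
    if total == 0:
        raise ValueError("at least one verb or adjective is required")
    q = verbs / total
    chi2 = (verbs - adjectives) ** 2 / total
    return q, chi2


def classify(q, chi2, critical=3.84):
    """Map (Q, chi-square) onto SA / AC / N / DE / SD.

    3.84 is the critical chi-square value used in the scale above;
    Q values of exactly 0.45 or 0.55 are treated as neutral here.
    """
    significant = chi2 > critical
    if q > 0.55:
        return "SA" if significant else "AC"
    if q < 0.45:
        return "SD" if significant else "DE"
    return "N"


# Verb and adjective counts of Sections 1, 2 and 14 as given in Table 6.
for verbs, adjectives in [(53, 40), (94, 62), (50, 46)]:
    q, chi2 = busemann(verbs, adjectives)
    print(verbs, adjectives, round(q, 2), round(chi2, 2), classify(q, chi2))
```

The counts themselves depend on the tagging decisions described in the text, so the sketch only covers the arithmetic that follows the tagging.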
In the present study, the research in activity has been operated both via
the LancsBox (2018) software and manually, as the programme is incapable
of discerning verbs and predicates. Moreover, the sf novelette contains
adjectives which may not be deep-rooted in standard language. For possible
objections to be prevented, a meticulous attention was paid to defining both
word classes. Verbs like ‘be’ and ‘have’ are not separately counted;
nevertheless, in serving as auxiliaries in compound constructions, or embedded
in fixed sets and idioms, they are treated as a single unit, e.g. you’re not
going to insist on; you want to be a big shot; was ahead of him; we have to
risk, that was right; he had been afraid of; she was trying to think. Modal
verbs are not independently counted but rather treated as subsidiary
parts, e.g. must have been carried on; I ought to mention; it would mean
something... Adjectives are easier to classify: as autonomously standing
units, e.g. small houses; flaky stuff; unshaded light; or compound forms,
either with a blank (frontal modifiers, e.g. the Space Force officers; a bar and
lunch counter), or hyphenated, e.g. purple-tinged; brush-grown;
proton-and-neutron; high-level; long-chain; cases where they are recognized as
alphanumeric combinations, e.g. carbon-14 dating; as acronyms, e.g.
A-bomb mushrooms; or a mixture of them, e.g. the 4000-f.s. bullet.
Modifiers as part of fixed collocations are not tagged as adjectives, e.g.
‘Rosetta Stone’; ‘to be a big shot’; ‘the old dog’; ‘the Dark Ages’ in Europe;
‘Stone Age’; ‘Syrtis Depression’; ‘the Wicked Witch’ in the Wizard of Oz.
Accordingly, there are exceptional circumstances when H. Beam Piper
places in the text invented Martian words. We considered such fragments
of the Martian lingo to be ‘unknown’ and disregarded them for tagging.
Last but not least, if one pursues the scaling of verbs in terms of activity
(e.g. go – jog – run – sprint; with the last one bringing about more activity
than the others), the analysis can be more comprehensive. Another line of
research is to subdivide verbs in keeping with the various semantic
categories of Dixon (1991/2005) or Kipper, Korhonen, Ryant, and Palmer
(2008), and study their ranked frequencies (cf. Levickij & Lučak, 2005).
The same can be said in assigning any ‘static’, ‘semi-static’, and ‘vibrant’
adjectival quality along a similarly graduated series (e.g. a dead person – a
lethargic person – an awkward person – a dramatic person – a choleric
person). Further refinements can be carried out on the basis of the semantic
orientation (polarity), or not, e.g. adjectives that involve a desirable state
(graceful, precise, exultant) vs adjectives that often represent a negative state
(broken, furious, tiresome); and those adjectives that have no orientation as
per binary properties, e.g. green, glasslike (cf. Lyons, 1977/1996, pp. 270–291;
Wiebe, 2000). In the article, however, for the sake of simplicity and
traditional pragmatism, we comply with the basic division.
The results are summarized in Table 6. It is noteworthy that almost all
sections in Omnilingual are significantly active, which may have to do with
the general tendency of modern fiction to avoid rich adjectival
embellishments. Moreover, because of the situational development of the
storyline, with members of the Martian expedition constantly investigating,
debating, deploying and re-deploying, dynamic and contingent motions (plus
arguments) are to be expected, finding their direct linguistic expression in
verbs. Otherwise, Těšitelová (1987/1992) mentions that, compared to the
number of different verbs, non-fictional texts have a higher number of
different adjectives. Such a ruling seems to be related to the strongly
nominal (substantive-based) character of these texts, given their descriptive
and informational nature.
Table 6. Calculations concerning activity in the Omnilingual novelette.
Text Verbs Adjectives Activity Chi-square Text type
Omnilingual_1 53 40 0.57 1.81 A
Omnilingual_2 94 62 0.60 6.56 SA
Omnilingual_3 80 18 0.81 39.22 SA
Omnilingual_4 78 40 0.66 12.24 SA
Omnilingual_5 100 49 0.67 17.46 SA
Omnilingual_6 114 49 0.70 25.92 SA
Omnilingual_7 125 50 0.71 32.14 SA
Omnilingual_8 150 55 0.73 44.02 SA
Omnilingual_9 87 50 0.64 9.99 SA
Omnilingual_10 76 18 0.81 35.79 SA
Omnilingual_11 96 58 0.62 9.38 SA
Omnilingual_12 56 30 0.65 7.86 SA
Omnilingual_13 228 99 0.70 50.89 SA
Omnilingual_14 50 46 0.52 0.17 N
Omnilingual_15 84 23 0.79 34.78 SA
Omnilingual_16 190 41 0.82 96.10 SA
Omnilingual_17 98 61 0.61 8.61 SA
Omnilingual_18 114 40 0.74 35.56 SA
Omnilingual_19 100 45 0.69 20.86 SA
Omnilingual_20 60 27 0.69 12.52 SA
Omnilingual_21 74 30 0.71 18.62 SA
Omnilingual_22 102 20 0.83 55.11 SA
The only exception in the overall activity-infused novelette seems to be
Section 14, where the description of a part of the Martian university is
provided; it is thus much more of an academic-like report than a piece of
fiction. Furthermore, Section 14 deviates from the expected numbers in
more than one respect, which is going to receive attention in the Discussion
section of this article.
3.1.8. MATTR (Moving-Average Type-Token Ratio)
Given the fact that TTR is dependent on text length, there have been
attempts to create a vocabulary-richness measure that would be freed
from this restriction. This age-long ambition among quantitative linguists
has been stated and documented in numerous publications (see Popescu et
al., 2011, p. 1; Tweedie, 2005, pp. 389–390). After various tries, the study by
Covington and McFall (2010) developed a normalized TTR formula called
MATTR (Moving-Average Type-Token Ratio). It is based upon the idea of
a text chunk (a ‘window’) which moves along the text, always by one
token at a time; the overall figure for the text is then the average of these
window TTRs. Mathematically, the aforementioned is conveyed through
the formula:
$$MATTR(L) = \frac{\sum_{i=1}^{N-L+1} V_i}{L\,(N-L+1)}$$
here, N stands for the length of the text, $V_i$ for the number of types in one
window, and L for the arbitrarily chosen length of the window. In the
present analysis, the standard number of 100 tokens has been opted for.
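The moving-window procedure can be sketched in Python as follows. The function name `mattr` is hypothetical; the sketch assumes that a text of N tokens yields N − L + 1 windows of length L, and falling back to the plain TTR for texts shorter than one window is a choice of this example, not something stated in the article.

```python
def mattr(tokens, window=100):
    """Moving-Average Type-Token Ratio with a window of `window` tokens."""
    n_tokens = len(tokens)
    if n_tokens == 0:
        raise ValueError("empty text")
    if n_tokens < window:
        # Assumption of this sketch: fall back to the plain TTR.
        return len(set(tokens)) / n_tokens
    n_windows = n_tokens - window + 1
    # Number of distinct types in each window, moving one token at a time.
    types_per_window = (
        len(set(tokens[i:i + window])) for i in range(n_windows)
    )
    return sum(types_per_window) / (window * n_windows)


# Hypothetical toy input with a deliberately tiny window.
tokens = ["a", "a", "b", "b"]
print(mattr(tokens, window=2))
```

Recounting the types of every window from scratch keeps the sketch short; for long texts an incremental count that adds the entering token and drops the leaving one would be faster.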
To date, the method has been used in several studies (cf. Kubát, 2016;
Kubát & Milička, 2013; Savoy, 2017), and is also included in a new book on
statistics in corpus linguistics (Březina, 2018).
For the text at hand, the MATTR results are listed in Table 7.
The subsequent graph shows the MATTR trend throughout the sections
of the story (Figure 7).
Even though the MATTR-based vocabulary richness is a more suitable
tool when it comes to comparative analysis, an interpretation may be
proposed even here. It is symptomatic that the vocabulary richness drops
in Sections 4 and 14, where the text gets very analytical, as it deals, in the
former, with linguistic issues and in the latter, with a dry scientific
description. In general, there is no single trend to be determined on the basis
of the numbers. The results of MATTR thus confirmed what has been found out.
Table 7. MATTR results of Omnilingual by H. Beam Piper (1957).
Section MATTR Section MATTR
Omnilingual_1 0.748957 Omnilingual_12 0.742845
Omnilingual_2 0.755573 Omnilingual_13 0.764581
Omnilingual_3 0.752369 Omnilingual_14 0.720137
Omnilingual_4 0.715745 Omnilingual_15 0.773378
Omnilingual_5 0.768555 Omnilingual_16 0.779495
Omnilingual_6 0.745605 Omnilingual_17 0.757484
Omnilingual_7 0.772853 Omnilingual_18 0.740429
Omnilingual_8 0.783753 Omnilingual_19 0.759548
Omnilingual_9 0.75763 Omnilingual_20 0.737869
Omnilingual_10 0.776663 Omnilingual_21 0.744984
Omnilingual_11 0.763979 Omnilingual_22 0.751442
Figure 7. MATTR development in the Omnilingual (1957) sections.
3.1.9. Belza Chains
Belza chains are a numeric attempt at delimiting the degree of cohesion in a
text. ‘Belza chain’ is a later coinage following the original work of the author
(Belza, 1971). Unlike the qualitative approaches, useful as they are (cf.
Dontcheva-Navratilova, Jančaříková, Miššíková, & Povolná, 2017;
Beaugrande & Dressler, 1972/1981; Halliday & Hasan, 1976; Van Dijk,
1977/1992), the method tries to figure out the level of cohesion with a strictly
defined exactitude within which the phenomenon can be encompassed. The study
of Belza chains has gained momentum recently (Altmann, 2018b; Chen
& Altmann, 2015; Roelcke, Popescu, & Altmann, 2017), and various indexes
have been implemented to make the chain analysis practical for stylometric
purposes (cf. Místecký, 2018; Místecký, Yiang, & Altmann, 2018).
As to the notion, a Belza chain is a string of the same idea that stretches over
neighbouring sentences (lines in poetry). For example, the devised mini-text:
I was dancing with Annie at the ball. She was wearing a stylish blue dress.
Once I touched it, I knew I was in love with her.
contains two Belza chains; the first is represented by the
sequence (Annie; she; her), and the other one by the string (dress; it). In
other words, there are two concepts that are elaborated in the passage
(‘Annie’ and ‘dress’), with the first one being more ‘outstretched’ than the
other. If a sentence is unlinked to its neighbours, it is assigned the Belza
chain value of 1.
As to the whole of the text, it may be assessed on the basis of several
indicators. First, there is an average Belza chain length, which is defined as:
$$A = \frac{\sum L_B}{B}$$
where $\sum L_B$ stands for the sum of the Belza chain lengths, and B for the
total of them. The result, which in the present case is 2.5, may be taken
as a measure of text association (A). Another, more sophisticated way of
evaluating a text on the basis of Belza chains presupposes a weighting of its
elements. Chen and Altmann (2015) proposed a system which is presented in
Table 8; it was developed in order to rank the elements from the nearest to
the most distant to the core notion. The lower the final value, the more
closely linked the individual elements of a Belza chain are supposed to be.
For the sake of the present analysis, the category of indefinite pronouns has
been added to weight-class 5.
Table 8. Weighting of Belza chain elements according to Chen and Altmann (2015).
Weight Chain Element
1 Main word, head of the chain
2 Synonym, metaphor, variant
3 Hyponym (=specification)
4 Hypernym (=generalization, class)
5 Relative pronoun, relative phrase, rhetoric question, rhetoric answer,
article, interrogative pronoun, demonstrative pronoun, indefinite pronoun
6 Personal pronoun
7 Possessive pronoun
8 Grammatical affix or introflection referring to the head
9 Derivation or composition containing the head;
conversion of head to other POS
10 Suppletion
In the example, the first chain contains a girl’s name and two
pronouns referring to her; they are thus weighted [1; 6; 6], with the
average weight of the chain being 4.33. The second chain comprises
the weights [1; 6], its average thus totals 3.5. As to the entire text, a
formula has been designed to calculate the degree of weight richness;
it reads:
$$CL = \frac{1}{N}\sum_{i} f(W_i)^2$$
where $f(W_i)$ stands for a frequency of a given weight, and N for the
number of sentences (lines) in a text. In the ball story, weight 1 occurs
twice and weight 6 three times across three sentences, so the count yields:
$$CL = \frac{2^2 + 3^2}{3} = 4.33$$
It should be noted that the figures of the chain length count are of use
mostly in comparisons.
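The three figures used above (average chain length A, average chain weight, and CL) can be reproduced with a short Python sketch. The function name `belza_summary` is hypothetical, chains are assumed to be given as lists of weights from Table 8, and CL is computed as the sum of squared weight frequencies divided by the number of sentences, in line with the worked ball example.

```python
from collections import Counter


def belza_summary(chains, n_sentences):
    """Summarize Belza chains given as lists of element weights.

    Returns (A, average weight per chain, CL), where A is the mean chain
    length and CL = sum(f(W_i)^2) / N over the weight frequencies f(W_i).
    """
    if not chains or n_sentences <= 0:
        raise ValueError("need at least one chain and one sentence")
    association = sum(len(chain) for chain in chains) / len(chains)
    average_weights = [sum(chain) / len(chain) for chain in chains]
    weight_frequencies = Counter(w for chain in chains for w in chain)
    cl = sum(f ** 2 for f in weight_frequencies.values()) / n_sentences
    return association, average_weights, cl


# The ball mini-text: (Annie; she; her) and (dress; it), three sentences.
ball_chains = [[1, 6, 6], [1, 6]]
a, weights, cl = belza_summary(ball_chains, n_sentences=3)
print(a, [round(w, 2) for w in weights], round(cl, 2))
```

Identifying the chains and assigning the weights remains a manual, interpretive step; the sketch covers only the arithmetic applied afterwards.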
Considering the shortage of space, the present investigation will focus on
two Omnilingual sections only, with the goal being to present the method,
further to be elaborated in articles to come. The results of the research are
collated in Tables 9 and 10 below.
The measure of associativity indicates that most chains in both
sections are two-member, which is a standard situation in many
instances of language that have been analysed so far (cf. Místecký,
2018). Moreover, Section 2 tends to be richer in both the types and
the lengths of the chains explored; this may be attributable to its
discrepant character, as it opens with a record of the people present
on the spot (Belza chain 2), but most of it is covered by the dialogue
between two scientists, which holds together much more thanks to
logical, non-linguistic coherence than because of formal cohesive
devices. All in all, a careful scrutiny of the results will only be possible
after many various texts are processed.
Table 9. The results of the Belza chain analysis of Omnilingual’s Section 1.
Number String Length Weights Average chain weight
1 [Martha; she; she] 3 [1; 6; 6] 4.33
2 [] 1 [1] 1
3 [streets; streets] 2 [1; 1] 1
4 [she; she; she] 3 [1; 1; 1] 1
5 [machinery; bulldozers, shovels, draglines] 2 [1; 3] 2
6 [she; she] 2 [1; 1] 1
7 [pickmen; pickmen] 2 [1; 1] 1
8 [native; native] 2 [1; 1] 1
9 [laborer; labor] 2 [1; 9] 5
10 [something; jack-hammer] 2 [5; 1] 3
11 [she; she] 2 [1; 1] 1
A 2.09
CL 18.39
4. Discussion
4.1. Quantitative Terms
Most assuredly, interpretation of data is as good as the performed statistical
measures, plus the authenticity, size, and characteristics of a sample. Whilst
not offering pat solutions, statistics can be symptomatic of underlying
Table 10. The results of the Belza chain analysis of Omnilingual’s Section 2.
Number String Length Weights Average chain weight
1 [she; she; her] 3 [1; 1; 7] 4.5
2 [people; them; Selim; officer; Colonel; a couple of...; Sir...] 7 [1; 6; 3; 3; 3; 3; 3]
3 [girls; them] 2 [1; 6] 3.5
4 [Sir; he; his] 3 [1; 6; 7] 4.67
5 [she; she] 2 [1; 1] 1
6 [Sachiko; Japanese girl; she; her] 4 [1; 2; 6; 6] 3.75
7[] 1 [1] 1
8[] 1 [1] 1
9[] 1 [1] 1
10 [] 1 [1] 1
11 [I; I] 2 [1; 1] 1
12 [this; it] 2 [1; 6] 3.5
13 [book; it] 2 [1; 6] 3.5
14 [] 1 [1] 1
15 [] 1 [1] 1
16 [] 1 [1] 1
17 [] 1 [1] 1
18 [] 1 [1] 1
19 [] 1 [1] 1
20 [] 1 [1] 1
21 [] 1 [1] 1
22 [] 1 [1] 1
23 [] 1 [1] 1
24 [] 1 [1] 1
25 [] 1 [1] 1
26 [] 1 [1] 1
27 [] 1 [1] 1
28 [] 1 [1] 1
29 [] 1 [1] 1
30 [] 1 [1] 1
31 [Martha; Martha] 2 [1; 1] 1
32 [It; It] 2 [1; 1] 1
33 [] 1 [1] 1
34 [] 1 [1] 1
35 [] 1 [1] 1
36 [] 1 [1] 1
37 [] 1 [1] 1
38 [] 1 [1] 1
39 [] 2 [1] 2
A 1.54
CL 51.33
appears equally supportive of a qualitative approach of the text, i.e. close
reading. As already noted, the smaller Gini’s coefficient, the greater the
vocabulary richness; this is revealed in Sections 10 and 12, with
values below the 0.4 threshold. In the lowest supposed value, i.e. G = 0.0,
all distinct words in Omnilingual (Beam Piper, 1957) would have been
used with the same frequency. The quoted sections show a number of
features bolstering that property: carefully described scenarios strewn
with techno-parlance, quite often falling next to an ‘enumerative’ and
informational style; sparingly used dialogues (mostly bearing the mark of
a silent monologue); and brevity in terms of tokens. The substantial use
of technical word-forms in line with the size of text section offsets the
lexical dearth. On the other side, Section 13 (the longest in the
novelette) shows the highest jump in Gini’s coefficient: 0.543. The relative
diminishing of richness in vocabulary could hint at more embedded
dialogues, where colloquial/informal domestic-like speech may ‘taint’ to
a degree the pool of scientific hapax legomena, V(1, N), or dislegomena,
V(2, N). The other sections do not exhibit the change noticed in Section
13: fluctuations are strong, between c. 0.4 and 0.49, but not that striking
(see Table 3). The data collectively suggest that Beam Piper (1957) might
have taken a respite before writing the section in question. The
outcome is a more ‘relaxed’ and ‘protracted’ text, or words to that effect.
Nonetheless, generalizing on the basis of a single written segment should
be cautiously avoided, as it may stand only for a subset or a frame of
a writer’s linguistic skills. Overall, the indicator does not suggest a poor
acquisition or management of English vocabulary. This can mean that
H. Beam Piper was sedulously consuming historical/scientific material
about archaeological decipherment, bio-chemistry, interplanetary
travel and exploration, and gadget engineering. The following
standard and non-standard words burst across the chapters (neologisms,
rare or common portmanteaus), and they come in different flavours:
<spraygun>, <airdyne pilot>, <airsealing>, <viviparous>, <gamogenetic>,
<Photostat>, <stenophone>, <oxyacetylene torch>, <spectroscope>,
<loess>, <vibratool>, <tarpaulins>, <radiophone>, <jetticopters>,
<nuclear-electric jackhammer>, <transuranics>, <beryllo-silver alloys>,
the diverse intellectual concerns and the inventive strain of the author.
The observation finds justification in J. F. Carr (2008), with Piper’s
up-to-date information accomplished by dint of relatable literature or
participation in sf conventions.
The repeat rate data in Table 2 show that the flow of narrative in terms
of vocabulary wealth does not manifest dull or relatively dull uniformity.
The fact fits well with the mixed nature of text samples (cf. Bell et al., 2009,
p. 3; Oakes, 2009, p. 1071), and may have to do with the time axis through
which Omnilingual (Beam Piper, 1957) was written. It may be theorized
that fluctuating values do not only act in response to the required
situations/subplots along the sections, but also to the prevailing emotional mood
of the author himself. An interesting observation relates to Section 14,
where the repeat rate (RR) value is doubled or nearly doubled in
comparison, for example, with Sections 2, 5, 8, 21, 22, pointing at lower vocabulary
richness. In opposition, Gini’s coefficient registers 0.437 for # 14, whereas
the numeric values for # 2, 5, 8, 21, 22 swing within the range 0.456–0.483
(#5–#8).
The doubling of repeat rate (# 14) contrasts with the value of Section 13,
the longest in the story and the one with several instances of up-close dialogues.
Section 14, in turn, is a dry and technical report on a sector of the Martian
University and the measures taken by the deployed international team for
camping and its further exploration. The writing at this point, besides being
‘underprivileged’ in number of tokens, lacks dialogues and is loaded with past
tenses and passive constructions. Furthermore, in referencing Baroni and Evert
(2009, p. 778) on the passivization as a cue of formality, the cross-over with the
prior comment on the ‘dry’ and ‘technical’ informational style along Section 14 is
Whether consulting Gini’s coefficient for Sections # 12: 0.397, # 13: 0.543, # 14:
0.437, or the repeat rate, # 12: 0.0130, # 13: 0.0112, # 14: 0.0202, the figures reveal
certain conspicuous and ‘anomalous’ behaviour nearby Section 13. Although
there is divergence in the way these indicators perform (Gini’s shows relative
lexical richness for # 14, while counter-posed by the repeat rate showing decrease
in richness), it may be stated with some confidence that Section 13 (or the
circumstances that led to its conception) act/s as a breaking point in the lexical
set-up. The discrepancies in the results of the two indexes may be explained by their
different focuses: whereas RR takes into account the relative frequencies of
words only, Gini’s coefficient works with the rank–frequency distribution of
them; its elevated value thus indicates a diminished number of especially frequent
words in a text, and a lot of those that occur very rarely. As to Section 13, this is in
line with the finding in the domain of hapax legomena, the proportion of which is
largely present here (almost 41%).
We would tend to reconcile the observations with pauses/breaks that the
author took in the interim. Such pauses might have conditioned a slightly
different creative impulse in HBP, or affected him psychologically, with the
result of a discrepant use of vocabulary (see especially Fiebelkorn, Pinsk, &
Kastner (2018) on rhythmic brain cycles and alternating attention-related
Next, a brief comment should also be paid to the figures of ATL.
Although the indicator seems independent of text length, it has been proved
that the two may be interconnected (Zörnig & Místecký, 2018); this is
probably due to the fact that the texts which are lexically richer tend to use
longer words as well. In case of Omnilingual, the highest value of ATL has
been measured in Section 14, which is, in its aforementioned technicality,
prone to the employment of long, scientific expressions. The fact that it also
ranks high in repeat rate may, on the other hand, shake the presupposition
that ATL rises with vocabulary richness, as a genre may play a role in the
matter, too.
As to lambda, the results are not easy to interpret; it seems that
there is a passage with high figures (Sections 9–12), which means that the
distances between neighbouring ranked words are constantly high; this is
broken by Section 13, where the definite article ‘the’ substantially prevails
over all the other words. This part of the novelette focuses on the
description of the Martian premises, where objects and events are treated from
different viewpoints, and many speculations are made. On the other hand,
the following Section 14 continues in the trend of the part with elevated
lambda figures.
To finish, the text was analysed as to its activity. Here, the core finding is
that most chapters are significantly active, which may lead to various
interpretations: it can be a feature of the Martian subgenre’s conventions,
of the greater part of the 20th-century science fiction, or a matter of Piper’s
personal preference. More light will be cast on the issue when the results are
compared to other pieces within the aforementioned domains, or when the
V/A ratio is identified through a different method.
4.2. Qualitative Terms
A few observations that may escape the computer-assisted quantitative
analysis follow.
First, if present-day readers have one quibble with Piper’s story, that may
regard the words ‘smoking’ on more than one occasion and ‘having libations’
on planet Mars by way of ‘cocktail pitchers’ and (counterfeit) ‘Martinis’.
Admittedly, these ‘catchwords’ are not accidental: they served to perk the
atmosphere up and may be adduced to the private baggage of the author
and the time in which he lived. Qualitatively speaking, such lexical choices
function as ‘shibboleths’ (a peculiarity of speech/writing; cf. Edelstein, 2003,
p. 19; Juola, 2008, pp. 237–238), by which inferences on specific stylistic or
broad social habits can be made. Second, in Section 4, while examining the
string of letters of a few words, Martha Dane deduces that when Martians
‘had needed a new word; they had just pasted a couple of existing words
together.’ And H. Beam Piper, who is impersonating the main character,
states without delay, ‘It would probably turn out to be a grammatical
horror.’ Examples from languages the morphology of which is based on
the agglutinating features abound. As highly synthetic languages go,
members of the Finno-Ugric family attach affixes to a stem to create many
grammatical forms, e.g. the Hungarian word <legeslegmagasabb> (‘the very
highest’) has the stem ‘magas’ (‘tall, high’) at its core, the prefix ‘legesleg’ (‘in
an exaggerated way’), and the suffix ‘abb’ (as a link vowel). Or the German
tongue (of the Indo-European family), e.g. <Bestandsbuchführung>
translated as ‘inventory’, with the constituents lining up in that order,
<Bestand>, ‘stock, supplies’, <buch> ‘book’, and <führung> ‘direction,
conduct’. In this sense, the morphological complexity of Hungarian, and to a
lesser degree of German, might have been somewhat intrusive to Piper’s
eyes (a native speaker of English), eventually preconceiving these foreign
Third, in Section 6, through the character of von Ohlmhorst,
Piper voices as a passing detail that ‘Cretan language’ (i.e. Linear B) was
not read until the finding of the Greek-Cretan bilingual in 1953. Considering
the time of writing, and the fact that Piper was not a professional linguist or
an archaeologist, he took a risk in creating a scientific-like Martian story, and
any risk-taking involves certain mistakes. In May 1953, Carl W. Blegen,
who was excavating at the town of Pylos (Greece), made use of the earlier
suggested sign-list of Michael Ventris to read a freshly found clay tablet,
confirming the correctness of the decipherment (Chadwick, 2000 [1958],
pp. 81–84; Robinson, 2002, pp. 14, 100–101). Strictly speaking, this
particular Pylos document is not a ‘bilingual’ text. It is rather new evidence that
validates independently the hypothesis of M. Ventris regarding an old form
of Greek language underlying the Linear B script. Similarly, in Section 16,
the author says through Anthony Lattimer that the decipherment of the
‘Hittite language’ (Anatolian hieroglyphs) was done when they found
‘Hittite-Assyrian bilinguals’. This moment rings false as the decipherment
was confirmed through the Hittite[Luwian]-Phoenician bilinguals of
Karatepe hill (in Osmaniye Province, modern Turkey), dated from the
late part of 8th century BCE (see Friedrich, 1957/1971, pp. 98–101; Pope,
1975/1999, pp. 141–142). It so happens that in the selfsame section the
referenced German epigrapher is under the appellation ‘that distinguished
Hittitologist, Johannes Friedrich’.
On the other hand, a point in favour of the
Pennsylvanian author is the choice of Martha Dane as the
protagonist of the story: a bright, observant, and strong-willed woman.
5. Conclusions
The study attempts a corpus-based statistical inspection in order to
extract style-related features of H. Beam Piper’s Omnilingual. The feature of
relevance of the written text, vocabulary richness, may shed light on
characteristic traits of Piper’s style and/or his socio-psychological background.
It must be borne in mind, however, that style is a complex
phenomenon, and cannot be captured on the basis of a few indicators only
(cf. Tweedie, 2005, p. 390). In addition, given the neurological structure of
the human brain, it is quite improbable that the present research lays bare
the ‘soul’ (Raleigh, 1897/1904, pp. 126–127) of the author. How much of the
writer’s meta-knowledge, i.e. of his psychological and sociological inclinations
(cf. Daelemans, 2013), such an investigation reveals thus remains an
open question. Further developments in quantitative methods that accurately
correlate with some intuitions, together with cutting-edge breakthroughs
in neurosciences and AI, will help in gaining a significant
advantage on meta-knowledge.
Both statistical indicators, the repeat rate and Gini’s coefficient, show that
H. Beam Piper (1957) has on the whole an estimable level of vocabulary
richness. The observation suggests that despite his lack of academic
training, Piper was an assiduous reader of fiction and non-fiction literature
(see, for instance, the discrepancy between Sections 13/14 and the rest of the
text). As to MATTR, the piece yields very similar results, as the type-token
ratio seems to oscillate throughout the text; the only two exceptions are
Sections 4 and 14, the former treating in a repetitive, explanatory manner
the workings of the Martian language, the latter describing the premises of the
local university. What is striking, however, are the results of the activity
analysis, as nearly all sections tend to be verb-infused; the role of the classical,
adjective-based description is thus considerably diminished.
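For readers who wish to reproduce indicators of this kind on their own texts, the following is a minimal sketch, not the QUITA implementation used in the study. It assumes the textbook definitions: the repeat rate as the sum of squared relative type frequencies, MATTR as the mean type-token ratio over all windows of a fixed length, and the activity indicator in the commonly used form Q = V/(V + A). The function names, the naive whitespace tokenization, and the toy sentence are illustrative assumptions only.

```python
from collections import Counter


def repeat_rate(tokens):
    """Repeat rate: sum of squared relative frequencies of word types."""
    n = len(tokens)
    if n == 0:
        raise ValueError("empty token list")
    return sum((freq / n) ** 2 for freq in Counter(tokens).values())


def mattr(tokens, window):
    """Moving-average type-token ratio over all windows of a fixed length."""
    if window <= 0 or window > len(tokens):
        raise ValueError("window must be between 1 and the number of tokens")
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


def activity_q(verbs, adjectives):
    """Activity indicator Q = V / (V + A); assumed form, counts supplied by hand."""
    if verbs + adjectives == 0:
        raise ValueError("no verbs or adjectives counted")
    return verbs / (verbs + adjectives)


if __name__ == "__main__":
    # Toy sentence with naive whitespace tokenization (an assumption; QUITA
    # segments differently, e.g. it splits contracted forms into two tokens).
    sample = "the martians had a word and the word had a meaning".split()
    print("RR    =", round(repeat_rate(sample), 4))
    print("MATTR =", round(mattr(sample, window=5), 4))
    print("Q     =", activity_q(verbs=12, adjectives=4))
```

A lower repeat rate and a higher MATTR both point to a more varied vocabulary, while Q above 0.5 indicates that verbs outnumber adjectives, which is the sense in which a section can be called verb-infused.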
The vocabulary richness in various sections of Omnilingual appears
sensitive to a time axis. The novelette very likely was not written at one
sitting, but over different sessions, some more non-linear than the others.
We may largely concede that HBP did not plan in advance the length of the
whole story or that of each chapter (cf. Popescu, Altmann et al., 2009, p.
70). He was an ingenious and spontaneous writer, and was not forced to
create according to fixed instructions or space-constrained norms.
In any event, fixing the temporal gap (hours, days, weeks) among sessions,
i.e. building a time-structured succession, is far beyond the capability
of the applied quantitative measures. All that said, we can only speculate
about such perceived distances and the reasons behind them.
While the quantitative approach is regarded as a potential discriminant
for meta-knowledge, we are unwilling to dismiss salient qualitative aspects
(cf. also Tuldava, 2005, p. 370). Although the application of quantitative
indicators takes priority in this study, downplaying the importance of
qualitative observations is not advisable on our part. For all practical
purposes, a complementary methodology would be more useful;
there are things that quantitative and qualitative approaches can and cannot
do alone (cf. Creswell & Plano Clark, 2007).
Comparative studies involving Omnilingual (Beam Piper, 1957) and other
stories of the author/other authors/different languages are scheduled for the
near future. The condition for a comparison can be theoretically satisfied if
the range of lexis is near, or comparatively near, Piper’s, with the
inter-authorial tests made with some vintages of text. At first, English as the chosen
language comes in handy, though it would be both interesting and advantageous
to explore science fiction cross-linguistically, too, e.g. in French, Czech,
German, Italian, Spanish (inflectional), or in Finnish, Hungarian (agglutinative),
etc. In this vein, the obtained picture may help in checking if the present
results hold true in various languages, or in defining some kind of difference
as a function of all genre differences (cf. Popescu et al., 2011, p. 49).
From a non-stylometric position, if there is praise for Omnilingual, it
should concern the assumption of science as a universal code of communication
among intelligent cultures. Algis Budrys (1967, p. 168), for example,
postulates that the translation of an alien dead language by analysing its
periodic table sounds like a perfectly valid proposition, and that archaeologists
(i.e. decipherers) ought to keep this notation on file.
Now, despite the different perception, organization, and rationalization of
science by other non-human vectors (cf. e.g. Rescher, 1985), the basic scientific
tenet still stands. For instance, the decimal system used to date by many
of the Earth’s cultures
is due to the fact that humans have anatomically 10
fingers. A non-terrestrial society, whose members evolved under pressure
of ‘natural selection’ and happen to have 12 standard fingers, may choose a
duodecimal system of counting, i.e. computing by 12s. At any rate, the
structure or ‘language’ spanning these sentient living systems is currently
mathematics, which suggests, in theory, some form of interaction or exchange.
The case in point is as simple and graspable as it can be, though we should be
aware that exploring and confirming these matters is much more complex
(e.g. Ellis, 2004/2005, or Traphagan, 2014, pp. 161–162).
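The point that a quantity stays the same while only its notation changes can be illustrated with a short sketch. It is a hypothetical illustration, not part of the original analysis: the helper name `to_base` and the use of the letters A and B for the digits ten and eleven are assumptions made here for convenience.

```python
DIGITS = "0123456789AB"  # 'A' = ten, 'B' = eleven (a notational assumption)


def to_base(n, base):
    """Write a non-negative integer n in the given base (2 to 12)."""
    if n < 0 or not 2 <= base <= len(DIGITS):
        raise ValueError("n must be >= 0 and base between 2 and 12")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))


if __name__ == "__main__":
    for quantity in (10, 12, 26, 144):
        # The same quantity, written by a ten-fingered and a twelve-fingered counter.
        print(quantity, "->", to_base(quantity, 10), "(decimal),",
              to_base(quantity, 12), "(duodecimal)")
```

Reading the duodecimal string back with Python's built-in `int(text, 12)` returns the original quantity, which is the sense in which the underlying mathematics is shared even when the written numerals differ.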
Notes
1. E.g. B. Gray (1969, p. 7), ‘Few problems in literary scholarship continue to
generate so much endeavor and so much conflict as the problem of style.’
2. For additional statistical models, reviews, and problems in stylometry,
authorship, and/or forensic linguistics, see Argamon et al. (2007); Bell et al.
(2009); Holmes (1998); Juola (2008); McMenamin (2002); Oakes (2009);
Rudman (1998); Thisted and Efron (1987); Tuldava (2005); Tweedie (2005),
and the literature referenced therein.
3. It needs to be pointed out that the raw count was performed by the QUITA
software (see later in the text), which treats contracted forms as two separate
tokens; for example, ‘hadn’t’ is segmented as ‘had’ and ‘not’. This is why
other token counters may yield slightly different results, though the discrepancies
have not proved to be large (cf. Melka, 2018). It is hard to say that
our way of counting is more correct, or less correct, than those of other
counters; it is, as long as consistently carried out, a possible way of
segmenting and evaluating the English text of Omnilingual (cf. also Popescu
et al., 2011, p. 14).
4. Cf. for instance, Strauß, Grzybek, & Altmann (2006, p. 293) with regard to
undersized samples, ‘short texts have the disadvantage of not allowing a
property to take appropriate shape.’
5. Examples include Brunet’s W (1978); Orlov’s Z (in Orlov & Chitashvili,
1983); Simpson’s ‘Diversity index’ (1949)/Yule’s ‘characteristic K’ (1944/2014),
or entropy, as a measure of uniformity (Cover & Thomas, 1991/2006). Yet,
the debate among experts over their real discriminative power is hardly
6. For a summary and literature on the subject, see Esteban and Morales (1995);
cf. also Cover and Thomas (1991/2006); Popescu et al. (2011).
7. M. Kubát (2014, p. 105), however, differs in opinion.
8. On the suggested modification of the coefficient, see G. Altmann (1978,
p. 93).
9. The assumed break could have corresponded to any physical or personal
recreational activity of H. Beam Piper: sleeping, sipping coffee/drinking rum,
lighting up his pipe, hiking for a non-determined period of time, cleaning
firearms, hunting, and so forth (e.g. Anonymous, 1953, p. 7).
10. In The Pennsy (Anonymous, 1953, p. 7), it is clearly reported that Mr. Beam
Piper used to drink black Jamaica rum at home and light up his pipe with
Serene tobacco, having smoked that brand for the last 30 years.
11. The linguistic bias towards such a complex morphology finds in particular a
humorous expression in Mark Twain’s (1880) essay The awful German
12. For several different systems of counting and related historical trivia, see T.
Dantzig (1930/2005), J. S. Petersson’s (1996) Numerical Notation, and A.
Robinson (2007).
Acknowledgements
The online repository sites Project Gutenberg,, and The Library Service
of Parkland College, Champaign, IL. (USA) have been of assistance with several
reference sources.
Disclosure statement
No potential conflict of interest was reported by the authors.
References
Adams, J. N. (2003). Bilingualism and the Latin language. Cambridge, UK:
Cambridge University Press.
Altmann, G. (1978). Zur Anwendung der Quotiente in der Textanalyse [About the
application of the quotient in text analysis]. Glottometrika, 1, 91–106.
Altmann, G. (2018a). Some properties of adjectives in texts. Glottometrics, 41, 67–79.
Altmann, G. (2018b). The nature and hierarchy of Belza chain. Glottometrics, 42, 75–85.
Altmann, G., & Köhler, R. (2015). Forms and degrees of repetitions in texts: Detection and
Anonymous. (1953, September 7). Typewriter Killer: Altoona’s H. Beam Piper.
Watchman Mystery writer, finds job helps plots. The Pennsy, 2(9),
Pennsylvania Railroad Company, Philadelphia, PA. Retrieved from http://www.
Antosch, F. (1969). The diagnosis of literary style with the verb-adjective ratio. In L.
Doležel & R. W. Bailey (Eds.), Statistics and style (pp. 57–65). New York:
American Elsevier.
Argamon, S., Whitelaw, C., Chase, P., Hota, S. R., Garg, N., & Levitan, S. (2007).
Stylistic text classification using functional lexical features. Journal of the
American Society for Information Science and Technology, 58(6), 802–822.
Retrieved from
Baayen, R. H. (2001). Word frequency distributions. Text, speech and language
technology, Vol. 18. Ide, N., & Véronis, J. (Series Eds.). Dordrecht, Netherlands:
Kluwer Academic Publishers.
Baker, S. (1966). The complete stylist. New York: Thomas Y. Crowell Company.
Bakker, F. J. (1965). Untersuchungen zur Entwicklung des Aktionsquotienten
[Investigations on the development of actions quotient]. Archiv für die
Gesamte Psychologie, 117, 78–101.
Baroni, M., & Evert, S. (2009). Statistical methods for corpus exploitation. In A.
Lüdeling & M. Kytö (Eds.), Corpus linguistics: An international handbook.
Handbücher zur Sprach- und Kommunikations-wissenschaft/Handbooks of
Linguistics and Communication Science, Band 29/2 (pp. 777–802). Berlin:
Mouton de Gruyter.
Beam Piper, H. (1947, April). Time and time again. Astounding Science Fiction, 39(2).
Retrieved from
Beam Piper, H. (1953). Murder in the Gunroom. New York: Alfred A. Knopf.
Retrieved from
Beam Piper, H. (1957). Omnilingual. Originally published in Astounding Science
Fiction, 58(6), February 1957, pp. 8–46; with cover and interior illustration by
Frank Kelly Freas. Retrieved from
Beam Piper, H. (1962). Little Fuzzy. New York: Avon. Retrieved from http://www.
Beam Piper, H. (1963). Space Viking. New York: Ace Books. Retrieved from https://
Beam Piper, H. (1965). Lord Kalvan of otherwhen. New York: Ace Books. Retrieved
Beam Piper, H. (1981). Federation. Preface by Jerry Pournelle. New York: Ace Books.
Bear, G. (1993). Moving Mars. New York: Tor Books.
Beaugrande, R.-A. de, & Dressler, W. U. (1972/1981). Introduction to text linguistics.
R. de Beaugrande, Trans. London: Longman Group Limited. German
edition © Max Niemeyer Verlag, Tübingen.
Belkin, N. J., & Robertson, S. E. (1976). Information science and the phenomenon of
information. Journal of the American Society for Information Science, 27(4), 197–204.
Retrieved from
Bell, E. J. L., Berridge, D., & Rayson, P. (2009). Measuring style with the authorship
ratio: An invariant metric of lexical similarity. Retrieved from http://ucrel.lancs.
Belza, M. I. (1971). K voprosu o nekotorych osobennostjach semanticheskoj struk-
tury svjaznych tekstov [On some features of the semantic structure of coherent
texts]. In Skorokhodko, É. F. (Ed.), Semanticheskie problemy avtomatizacii i
informacionnogo poiska [Semantic problems of automation and information
search] (pp. 58–73). Kyiv: Naukova dumka.
Bleiler, E. F. (1990). Science-fiction: The early years. Kent, OH: Kent State University
Boder, D. P. (1940). The adjective-verb quotient: A contribution to the psychology
of language. Psychological Record, 3, 310–343.
Bogdanov, A. (1908/1984). Red Star. Engineer Menni. A Martian stranded on Mars.
Ch. Rougle, Trans. Bloomington and Indianapolis: Indiana University Press.
Retrieved from
Bradbury, R. (1950). The Martian chronicles. New York: Doubleday.
Březina, V. (2018). Statistics in corpus linguistics: A practical guide. Cambridge, UK:
Cambridge University Press.
Brown, F. (1954). Martians, go home. Illustrations by Freas. In Astounding Science
Fiction (New York), 44(1), 9–55. Retrieved from
Brunet, E. (1978). Vocabulaire de Jean Giraudoux: Structure et Évolution; Statistique
et Informatique Appliquées à l’Étude des Textes, à partir du Trésor de la Langue
Française. [The vocabulary of Jean Giraudoux: Structure and evolution; statistics
and informatics applied to the study of texts, based on the thesaurus of the
French language]. Paris: Slatkine.
Budrys, A. (1967). Great science fiction stories about Mars, T. E. Dikty (Ed.).
Review by Algis Budrys. In F. Pohl (Ed.). Galaxy bookshelf; cover by Douglas
Chaffee. Galaxy Magazine, April 1967, 25(4), 166–169. New York: Galaxy
Publishing Corporation.
Bunge, M. A. (1983). Treatise on basic philosophy. Vol. 6. Epistemology and meth-
odology II: Understanding the world. Dordrecht/Boston/Lancaster: D. Reidel
Publishing Company/Kluwer Academic Publishers Group.
Burroughs, E. R. (1912/1917). A princess of Mars [Original title, Under the moons
of Mars]. Chicago, IL: A. C. McClurg & Co. Retrieved from https://www.guten
Busemann, A. (1925). Die Sprache der Jugend als Ausdruck der
Entwicklungsrhythmik [Youth’s speech as an imprint of the rhythm of
development]. Jena: Fischer.
Campanile, E., Cardona, G. R., & Lazzeroni, R., Eds. (1988). Bilinguismo e
biculturalismo nel mondo antico. Atti del Colloquio interdisciplinare tenuto a
Pisa il 28 e 29 settembre 1987. Testi Linguistici 13. Pisa: Giardini Editori e
Carr, J. F. (2008). H. Beam Piper: A biography. Series Editors, Palumbo, D. E. &
Sullivan III, C. W. Critical Explorations in Science Fiction and Fantasy, 8.
Jefferson, NC: McFarland & Company, Inc.
Carr, M. H., & Head, J. W. (2010). Acquisition and history of water on Mars. In N.
A. Cabrol & E. A. Grin (Eds.), Lakes on Mars (pp. 31–67). Amsterdam: Elsevier
Science. Retrieved from
Carter, R., & Simpson, P. (Eds.). (1989). Language, discourse and literature: An
introductory reader in discourse stylistics. London: Unwin Hyman, Ltd.
Čech, R., Popescu, -I.-I., & Altmann, G. (2014). Metody kvantitativní analýzy
(nejen) básnických textů [Methods of quantitative analysis of (not only) poetic
texts]. Olomouc: Univerzita Palackého v Olomouci.
Ceriani, L., & Verme, P. (2012). The origins of the Gini index: Extracts from
Variabilità e Mutabilità (1912) by Corrado Gini. The Journal of Economic
Inequality (Springer), 10(3), 421–443.
Chadwick, J. (1958/2000). The decipherment of linear B. Cambridge: The Press
Syndicate of the Cambridge University.
Chatman, S. B. (Ed.). (1971). Literary style: A symposium. London & New York:
Oxford University Press.
Chen, R., & Altmann, G. (2015). Conceptual inertia in texts. Glottometrics, 30, 73–88.
Clarke, A. C. (1951). The sands of Mars. London: Sidgwick & Jackson.
COCA (2018). Word frequency data Corpus of Contemporary American English
(COCA). Retrieved from
Coleridge, S. (1914 [1907, 1818]). Coleridge’s essays & lectures on Shakspeare &
some other old poets & dramatists. London: J. M. Dent & Sons/New York: E. P.
Dutton & Co. Retrieved from
Cooper, L. (1907/1930). Theories of style, with especial reference to prose composi-
tion. New York: The Macmillan Company. Retrieved from
Cover, T. M., & Thomas, J. A. (1991/2006). Elements of information theory. New
York: John Wiley & Sons, Inc. Retrieved from
Covington, M. A., & McFall, J. D. (2010). Cutting the Gordian Knot: The Moving
Average Type-Token Ratio (MATTR). Journal of Quantitative Linguistics, 17(2),
94–100. Retrieved from
Creswell, J. W., & Plano Clark, V. (2007). Designing and conducting mixed methods
research. Thousand Oaks, CA: Sage Publications, Inc.
Daelemans, W. (2013). Explanation in computational stylometry. In A. Gelbukh (Ed.),
Computational linguistics and intelligent text processing. CICLing 2013. Lecture notes
in computer science, 7817 (pp. 451–464). Berlin, Heidelberg: Springer. Retrieved from
Damgaard, C., & Weiner, J. (2000). Describing inequality in plant size or fecundity.
Ecology, 81, 1139–1142. Retrieved from
Daniels, P. T. (1996). Methods of decipherment. In P. T. Daniels & W. Bright
(Eds.), The world’s writing systems (pp. 141–159). New York: Oxford University
Daniels, P. T., & Bright, W. (Eds.). (1996). The world’s writing systems. Oxford, NY:
Oxford University Press.
Dantzig, T. (1930/2005). Number: The language of science. J. Mazur (Ed.), The
Masterpiece Science Edition. New York: Pi Press/An imprint of Pearson
Education, Inc.
Davies, P. (1995). Are we alone?: Philosophical implications of the discovery of
extraterrestrial life. New York: Basic Books/Harper Collins Publishers.
Dick, P. K. (1964). Martian time-slip. New York: Ballantine Books/Random House.
Dick, S. J. (1998). Life on other worlds: The 20th century extraterrestrial debate.
Cambridge, UK: Cambridge University Press.
Dijk, T. A. V. (1977/1992). Text and context. Sixth Impression. London and New
York: Longman Group UK Limited.
Disch, T. M. (1976). Echo round his bones. New York: Berkley Medallion Books/
Penguin Group.
Dixon, R. M. W. (1991/2005). A semantic approach to English grammar. Revised
and enlarged second edition. Oxford Textbooks in Linguistics. Oxford: Oxford
University Press.
Dontcheva-Navratilova, O., Jančaříková, R., Miššíková, G., & Povolná, R. (2017).
Coherence and cohesion in English discourse. Brno: Masaryk University Press.
Drake, F., & Sobel, D. (1992). Is anyone out there? The scientific search for
extraterrestrial intelligence. New York: Delacorte Press.
Eckert, P., & Rickford, J. R. (Eds.). (2001). Style and sociolinguistic variation.
Cambridge, UK: Cambridge University Press.
Edelstein, S. (2003). Dubious doublets: A delightful compendium of unlikely word
pairs of common origin. Hoboken, NJ: John Wiley & Sons, Inc.
Eder, M. (2010/2013). Does size matter? Authorship attribution, small samples, big
problem. Literary and Linguistic Computing, 30(2), 167–182. Based on a previous
draft DH 2010: DIGITAL HUMANITIES, Conference Abstracts. King’s College
London, pp. 132–135. Retrieved from
Ellis, G. F. R. (2004/2005). True complexity and its associated ontology. In J. D.
Barrow, P. C. W. Davies, & C. L. Harper Jr. (Eds.), Science and ultimate reality:
Quantum theory, cosmology, and complexity (pp. 607–636). Cambridge:
Cambridge University Press.
Engdahl, S., Ed. (2001/2006). Extraterrestrial life. Contemporary Issues
Companion. Detroit: Greenhaven Press/An imprint of Thomson Gale.
Esteban, M. D., & Morales, D. (1995). A summary of entropy statistics. Kybernetika,
31(4), 337–346.
Fergus, C. (2013, May 1). Beyond Earth: Mars fever. Penn State News. Pennsylvania
State University. Retrieved from
Fiebelkorn, I. C., Pinsk, M. A., & Kastner, S. (2018). A dynamic interplay within the
frontoparietal network underlies rhythmic spatial attention. Neuron, 99(4), 842–853.
Fortis, J.-M., & Fagard, B. (2010). Space in language. Part IV: Adnominals.
Adnominals: Topological-functional adpositions, spatial phrases and spatial
cases. DGfS-CNRS Summer School on Linguistic Typology. Leipzig, August
15–September 3, 2010. Retrieved from
Freeman, D. C. (Ed.). (1970). Linguistics and literary style. New York: Holt,
Rinehart and Winston, Inc.
Friedrich, J. (1957/1971). Extinct languages. F. Gaynor, Trans. Westport, CT:
Greenwood Press, Publishers.
Frye, N. (1963). The well-tempered critic. Bloomington, IN: Indiana University Press.
Gastwirth, J. L. (2017). Is the Gini index of inequality overly sensitive to changes in
the middle of the income distribution? Statistics and Public Policy, 4(1), 1–11.
Gelb, I. J., & Whiting, R. M. (1975). Methods of decipherment. Journal of the Royal
Asiatic Society of Great Britain and Ireland, 107, 95–104.
Gini, C. (1921). Measurement of inequality of incomes. The Economic Journal, 31(121),
124–126. Retrieved from
Gissing, G. (2016 [2008, 1891]). New Grub Street (Vol. 3, 2nd ed.). London: Smith, Elder,
Golomb, S. W. (1961/1968). Extraterrestrial linguistics. Word Ways, 1(4/5),
202–205. Retrieved from
Gombrich, E. H. (1982). The image and the eye: Further studies in the psychology of
pictorial representation. Ithaca, NY: Cornell University Press/Phaidon Books.
Gordon, C. H. (1968/1987). Forgotten scripts. New York: Dorset Press/Marboro
Books Co.
Gray, B. (1969). Style: The problem and its solution. The Hague: Mouton.
Greg, P. (1880). Across the zodiac: The story of a wrecked record. London: Trübner
& Co. Retrieved from
Grieve, J. W. (2005). Quantitative authorship attribution: A history and an
evaluation of techniques (Master’s thesis). Simon Fraser University, Burnaby, BC,
Canada. Retrieved from
Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. London: Routledge.
Harrison, A. A. (2014). Speaking for Earth: Projecting cultural values across deep space
and time. In D. A. Vakoch (Ed.), Archaeology, anthropology, and interstellar
communication. The NASA History Series (pp. 175–191). Washington, DC: Office of
Communications, National Aeronautics and Space Administration.
Hawkes, T. (1977/2004). Structuralism and semiotics. London: Routledge/Taylor &
Francis Group.
Hawkins, J. D., & Morpurgo Davies, A. (1978). On the problems of Karatepe: The
hieroglyphic text. Anatolian Studies (British Institute at Ankara), 28, 103–119.
Retrieved from
Heidmann, J. (1995). Extraterrestrial intelligence (2nd ed.). Cambridge: Cambridge
University Press.
Heinlein, R. (1961). Stranger in a strange land. New York: G. P. Putnam’s Sons.
Helbig, G., & Schenkel, W. (2011 [1991, 1969]). Wörterbuch zur Valenz und
Distribution deutscher Verben [Lexicon of the valency and distribution of
German verbs]. Berlin: de Gruyter.
Herbst, T., Heath, D., Roe, I. F., & Götz, D. (2004). A valency dictionary of English:
A corpus based analysis of the complementation patterns of English verbs, nouns
and adjectives. Berlin: Mouton de Gruyter.
Herdan, G. (1964). Quantitative linguistics. London: Butterworths.
Hines, D. (2002). H. Beam Piper. Retrieved from
Holmes, D. (1998). The evolution of stylometry in humanities scholarship. Literary
and Linguistic Computing, 13, 111–117.
Jackson, H., & Amvela, E. Z. (2000). Words, meaning and vocabulary: An introduc-
tion to modern English lexicology. London: A & C Black.
Johnson, K. (2008). Quantitative methods in linguistics. Malden, MA: Blackwell.
Juola, P. (2008). Author attribution. Foundations and Trends® in Information
Retrieval, 1(3), 233–334. Retrieved from
Kelih, E., Rovenchak, A., & Buk, S. (2014). Analyzing h-point in lemmatized and
non-lemmatized texts. In G. Altmann, R. Čech, I. Mačutek, & L. Uhlířová (Eds.),
Empirical approaches to text and language analyses Dedicated to Luděk Hřebíček
on the occasion of his 80th birthday (pp. 81–94). Lüdenscheid: RAM-Verlag.
Khrennikov, A. Y. (2014). Cognitive processes of the brain: An ultrametric model of
information dynamics in unconsciousness. P-Adic Numbers, Ultrametric
Analysis, and Applications, 6(4), 293–302.
Kipper, K., Korhonen, A., Ryant, N., & Palmer, M. (2008). A large-scale classification
of English verbs. Language Resources and Evaluation Journal, 42, 21–40.
Retrieved from
Knight, K., & Sproat, R. (2009). Writing systems, transliteration and decipherment.
Retrieved from
Köhler, R. (2005a). Synergetic linguistics. In R. Köhler, G. Altmann, & R. G. Piotrowski
(Eds.), Quantitative Linguistik/Quantitative linguistics: Ein Internationales Handbuch/
An international handbook. Handbücher zur Sprach- und Kommunikations-wissenschaft,
Band 27 (pp. 760–774). Berlin: Walter de Gruyter.
Köhler, R. (2005b). Quantitative Untersuchungen zur Valenz deutscher Verben
[Quantitative investigations on the valency of German verbs]. Glottometrics, 9, 13–20.
Köhler, R., & Galle, M. (1993). Dynamic aspects of text characteristics. In L.
Hřebíček & G. Altmann (Eds.), Quantitative text analysis. Quantitative
Linguistics, 52 (pp. 46–53). Trier: Wissenschaftlicher Verlag Trier.
Kramer, M. (2014, December 16). Curiosity Rover drills into Mars rock, finds
com, Retrieved from
Kubát, M. (2014). Moving window type-token ratio and text length. In G. Altmann,
R. Čech, I. Mačutek, & L. Uhlířová (Eds.), Empirical approaches to text and
language analyses dedicated to Luděk Hřebíček on the occasion of his 80th
birthday (pp. 105–114). Lüdenscheid: RAM-Verlag.
Kubát, M. (2016). Kvantitativní analýza žánrů [Quantitative analysis of genres].
Ostrava: FF OU.
Kubát, M., Matlach, V., & Čech, R. (2014). QUITA Quantitative index text
analyzer. Lüdenscheid: RAM-Verlag. Retrieved from
Kubát, M., & Milička, J. (2013). Vocabulary richness measure in genres. Journal of
Quantitative Linguistics, 20(4), 339–349.
LancsBox (2018). Lancaster University corpus toolbox. © Vaclav Březina, Lancaster
University. Retrieved from
Landis, G. A. (2000). Mars crossing. New York: Tor Books.
Lasswitz, K. (1897/1971). Auf Zwei Planeten [Two planets]. Weimar: Emil Felber.
(H. H. Rudnick, Trans.). Carbondale: Southern Illinois University Press.
Retrieved from
Leech, G., Rayson, P., & Wilson, A. (2001). Word frequencies in written and spoken
English: Based on the British National Corpus. London: Longman.
Levickij, V., & Lučak, M. (2005). Category of tense and verb semantics in the
English language. Journal of Quantitative Linguistics, 12(2–3), 212–238.
Lyons, J. (1977/1996). Semantics. Vol. 1. Cambridge, UK: Cambridge University Press.
Malvern, D., Richards, B. J., Chipere, N., & Durán, P. (2004). Lexical diversity and
language development: Quantification and assessment. Basingstoke, UK: Palgrave
McAuley, P. J. (1994). Red dust. New York: Avon/HarperCollins.
McIntosh, R. P. (1967). An index of diversity and the relation of certain concepts to
diversity. Ecology, 48(3), 392–404. Retrieved from
McMenamin, G. R. (2002). Forensic linguistics: Advances in forensic stylistics. Boca
Ratón, FL: CRC Press LLC.
Melka, T. S. (2018). Stylistic study of Omnilingual by H. Beam Piper. Glottometrics,
Michaud, M. A. G. (2007). Contact with alien civilizations: Our hopes and fears
about encountering extraterrestrials. New York: Copernicus Books/Springer
Science + Business Media LLC.
Místecký, M. (2018). Belza chains in Machar’s Letní sonety. Glottometrics, 41, 46–56.
Místecký, M., Yiang, J., & Altmann, G. (2018). Belza chain analysis: Weighting
elements. Glottometrics, 43, 68–76.
Nekula, M. (2002). Text. In P. Karlík, M. Nekula, & J. Pleskalová (Eds.),
Encyklopedický slovník češtiny [New encyclopaedic dictionary of Czech language]
(p. 489). Praha: Nakladatelství Lidové noviny.
O’Connor, M. (1996). The Berber scripts. In P. T. Daniels & W. Bright (Eds.), The
world’s writing systems (pp. 112–116). New York: Oxford University Press.
Oakes, M. P. (2009). Corpus linguistics and stylometry. In A. Lüdeling & M. Kytö
(Eds.), Corpus linguistics: An international handbook. Handbücher zur Sprach- und
Kommunikations-wissenschaft/Handbooks of Linguistics and Communication
Science, Band 29/2 (pp. 1070–1090). Berlin: Mouton de Gruyter.
Orlov, J. K., & Chitashvili, R. Y. (1983). Generalized z-distribution generating the
well-known rank-distributions. Bulletin of the Academy of Sciences, Georgia, 110(2), 269
Pan, X., & Liu, H. (2014). Adnominal constructions in Modern Chinese and their
distribution properties. Glottometrics, 29, 1–30.
Parkinson, R. B. (1999). Cracking codes: The Rosetta stone and decipherment.
Berkeley: The University of California Press.
Petersson, J. S. (1996). Numerical notation. In P. T. Daniels & W. Bright (Eds.), The
world’s writing systems (pp. 795–806). New York: Oxford University Press.
Pohl, F. (1976). Man plus. New York: Random House.
Pope, M. (1975/1999). The story of decipherment: From Egyptian hieroglyphs to
Maya script. Rev. ed. London: Thames & Hudson.
Popescu, -I.-I., & Altmann, G. (2006). Some aspects of word frequencies. Glottometrics,
Popescu, -I.-I., & Altmann, G. (2007). Writer’s view of text generation. Glottometrics,
Popescu, -I.-I., & Altmann, G. (2015). A simplified lambda indicator in text analysis.
Popescu, -I.-I., Altmann, G., Grzybek, P., Jayaram, B. D., Köhler, R., Krupa, V., . . .
Vidya, M. N. (2009). Word frequency studies. Berlin: Mouton de Gruyter.
Popescu, -I.-I., Čech, R., & Altmann, G. (2011). The lambda-structure of texts.
Studies in Quantitative Linguistics 10. Lüdenscheid: RAM-Verlag.
Popescu, -I.-I., Mačutek, J., & Altmann, G. (2009). Aspects of word frequencies.
Studies in Quantitative Linguistics 3. Lüdenscheid: RAM-Verlag.
Pratchett, T., & Baxter, S. (2014). The long Mars. Series The Long Earth. New York:
Raber, D. (2003). The problem of information. Library and Information Science.
Lanham, MD: Scarecrow Press, Inc.
Raleigh, W. (1897/1904). Style. Fifth Impression. London: Edward Arnold. Retrieved
Ray, J. (2007). The Rosetta stone and the rebirth of ancient Egypt. London: Profile.
Rescher, N. (1985). Extraterrestrial science. In E. Regis Jr. (Ed.), Extraterrestrials: Science
and alien intelligence (pp. 83–116). Cambridge: Cambridge University Press.
Robinson, A. (2002). Lost lang