Written Norwegian comes in two standards, Bokmål and Nynorsk, each allowing much variation in stem forms and inflections. Actual texts show intricate interdependencies among the alternative options. This chapter reports on the use of correspondence analysis and implicational analysis applied to a newspaper corpus in order to chart the patterns of variation. The aim is to test the utility of the method itself, and the study therefore concentrates on a small selection of phenomena. The long-term aim of a future extended project is to monitor actual usage in relation to the official standards, thereby identifying possible emerging subvarieties. This will be of value in the maintenance of the official written standards. The results indicate that the approach can yield valuable insights.
Norm clusters in written Norwegian
Helge Dyvik
University of Bergen
1.  Introduction¹
The two written standards for Norwegian, Bokmål and Nynorsk, are linguistically
exceptional, not only because there are two of them, but also because each of them
allows an unusual amount of variation among alternative stem forms² and inflectional
endings. This is a result of the official language policy of the 20th century,
where the original idea was to pave the way for the merger of the two standards by
allowing extensive variation while excluding many traditional forms. This policy
failed and has been abandoned, and much of the variation that was allowed in
the official norms, but never taken up in actual usage, has been removed. But
still the policy has left its mark on the language as it is actually written, in the
form of a fair amount of variation. The variation is correlated in rather complex
ways with stylistic and sociolinguistic variables, e.g. on a scale from ‘traditional’
or ‘moderate’ to ‘radical’, or on a scale from ‘high style’ to ‘folksy’. The choice
between those two scales for classification may also be politically loaded, since
some would claim that ‘radical’ forms should not be seen as confined to ‘folksy’
style. Also, the stylistic perceptions change over time; thus, to some the stylistic
effect of extensive use of ‘radical’ forms in Bokmål today (e.g. extensive feminine
gender), at least in texts of an abstract, discursive nature, may be to make the
text look quaint and old-fashioned rather than modern, reminiscent of the
social-democratic, reform-optimistic fifties.
1.  I am grateful to the Norwegian Language Council and the Faculty of Humanities,
University of Bergen, for funding the research reported here. About 90% of the funding came from
the Language Council.
2.  The variations in stem forms, such as hånd vs. hand ‘hand’, are frequently referred to as
‘spelling variants’. This is slightly misleading since they are not alternative ways of spelling the
same sequence of phonemes, but rather capture alternative spoken variants.
Present language policy is to base the official norms for Bokmål and
Nynorsk, not on a future goal, but on developments in observed usage, which
hence needs to be systematically examined. One important difference between
the official, prescribed norms and the operative norms of actual usage is that
while the former impose few constraints on how alternative forms are combined
in a text, actual usage displays intricate patterns of dependency among the
alternatives.³ These dependencies frequently seem to take the form of unilateral or
bilateral implications. For example, one dimension of variation in Bokmål
concerns the past and past participle endings in a subclass of weak verbs, which
may be either -et (traditional, moderate) or -a (radical): kastet or kasta ‘threw’,
‘thrown’. Similarly, while Bokmål has three grammatical genders, a two-gender
option is possible, since all feminine nouns may optionally (and individually)
be masculine. Apart from agreement, gender is reflected in the ending of the
definite singular. Thus a noun like hytte ‘cabin’ in the definite singular may have
either the form hytten masc. (traditional, moderate) or hytta fem. (radical, or in
the case of this word, neutral).⁴ A Bokmål dictionary describes these options on
a word-by-word level, thereby imposing no constraints on the combination of
choices across lemmas in a text. However, in actual usage the choice of a-forms
of weak verbs is generally perceived as more radical than the choice of feminine
forms of most nouns. A pattern resulting from this is that a text using the form
kasta will very probably use the feminine form hytta as well, while hytta is
perceived as stylistically compatible with both kasta and kastet. If this turns out to
be the actual pattern, then there is a unilateral implication between kasta and
hytta: in the domain of Bokmål texts, kasta implies hytta, but not vice versa.
Conversely, the more traditional masculine hytten will then imply the
traditional kastet, but not vice versa. In a pattern like this, then, the implying,
mutually exclusive, forms – kasta and hytten – emerge as more ‘marked’ or ‘special’,
while the implied forms – kastet and hytta – emerge as more ‘unmarked’ or
‘neutral’, typical of more texts.
3.  In practical teaching of the norm, however, there tends to be a strong emphasis on
conveying information about a number of such dependencies, in order to achieve consistency
within a subnorm.
4.  Since gender by definition is a matter of agreement, the use of the ending -a does not
imply feminine gender with absolute necessity. There is a common variety of Bokmål where -a
occurs on some nouns, while there is no masculine/feminine distinction in the agreeing forms
(e.g. the indefinite article en(m) in en jente ‘a girl’ occurs together with jenta ‘the girl’), and
constructions that would force such agreement are avoided (e.g. postposed possessive in phrases
like jenta mi(f) ‘my girl’, min(m) jente with preposed possessive being used instead). In this
variety the -en/-a variation is simply a case of allomorphy within common gender. However,
this fact does not reduce the value of the -en/-a variation as an indicator of subnorm. For
simplicity, we will continue to refer to the -a-forms as ‘feminines’ in this chapter.
The research questions addressed in the present chapter concern the extent
to which the form choices in actual texts display implicational patterns of this
kind, and, as a corollary, the extent to which more or less well-defined subvarieties
of Bokmål and Nynorsk may be identified.⁵ Among earlier attempts to specify
the properties of subvarieties of Bokmål is the work of Koenraad De Smedt and
Victoria Rosén (Rosén & De Smedt 2000, De Smedt & Rosén 2000) in connection
with the SCARRIE proofreading project. Their experimental system allowed
users to choose among five subvarieties, thus constraining the set of proposals
made by the system. While De Smedt and Rosén’s individuation of subvarieties
was informant-based, the aim of the present project is to derive information about
subvarieties from corpus data.
5.  The phenomena studied here are confined to cases of variation in inflectional endings,
although the approach is easily extended to form variation in general, such as the variation
between the stem forms hånd and hand ‘hand’, etc.
2.  A pilot study
A ‘morphosyntactic word’ (MSW) is taken to be a combination of a lemma form
and a set of values of morphological categories which the lemma can realize. Thus,
hytte +Noun +Def +Sg is the MSW which can be realized as either hytta or
hytten in Bokmål. An MSW with more than one possible realization may be called
a ‘variable MSW’. Each variable MSW may be seen as a ‘dimension’ along which
texts may vary (the ordering of forms within each dimension is not important –
it may be taken to be alphabetical). The official norms for Bokmål and Nynorsk
(in conjunction with possible non-official forms in actual usage) will then define
a multidimensional space, and a given text will be located as a point within the
subspace defined by the variable MSWs occurring within it (or as a set of points if
at least one variable MSW is inconsistently realized).⁶
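
The notion of a text as a point in the space of variable MSWs can be illustrated with a minimal Python sketch. It is not part of the study: the two-entry inventory, the function names `locate` and `is_single_point`, and the direct lookup from word form to MSW (which ignores tagging and ambiguity) are assumptions made purely for illustration.

```python
# Hypothetical toy inventory: each variable MSW (one 'dimension') with its
# possible realizations. The labels follow the notation used in the text.
VARIABLE_MSWS = {
    "hytte +Noun +Def +Sg": ("hytta", "hytten"),
    "kaste +Verb +Past": ("kasta", "kastet"),
}

# Reverse lookup from a word form to the MSW it realizes (a simplification:
# a real corpus needs morphological tagging to decide this).
FORM_TO_MSW = {form: msw for msw, forms in VARIABLE_MSWS.items() for form in forms}


def locate(tokens):
    """Return, for each variable MSW occurring in the text, the set of forms used."""
    location = {}
    for token in tokens:
        if token in FORM_TO_MSW:
            location.setdefault(FORM_TO_MSW[token], set()).add(token)
    return location


def is_single_point(location):
    """A text is one point only if every MSW in it is realized consistently."""
    return all(len(forms) == 1 for forms in location.values())


consistent = locate(["hytta", "og", "kasta", "hytta"])
inconsistent = locate(["hytta", "hytten"])
print(consistent, is_single_point(consistent))
print(inconsistent, is_single_point(inconsistent))
```

A text that mixes hytta and hytten occupies two positions on the hytte dimension, which corresponds to the ‘set of points’ case mentioned above.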
Since the official norms involve few constraints on the combination of possible
forms within a text, they suggest as a null hypothesis that texts will spread out
approximately evenly throughout the space. However, experience tells us that this
is not the case, as indicated above. To the extent that implicational relations of the
exemplified kind hold, certain combinations of forms will not occur, thus leaving
the space empty in certain regions, e.g. in the region where the forms hytten and
kasta are combined. Consequently our expectation is that the texts will tend to
form more or less clearly defined clusters within the space. I will refer to such
clusters as ‘norm clusters’. Relatively clearly defined norm clusters within the total
text universe will then approach the status of emerging subvarieties of Bokmål
and Nynorsk. Hence empirically based charting of norm clusters will provide
important information for further work towards the standardization of written
Norwegian.
I have conducted a pilot study based on newspaper texts in order to
investigate the viability of an approach along the lines suggested. In the preliminary
investigation reported here I concentrate on a small set of linguistic
phenomena, subjecting the extracted material to two kinds of analysis: correspondence
analysis in order to visualize the clustering of forms and texts, and a more direct
analysis of the implicational relations among the forms, visualized as directed
graphs.
6.  A multidimensional space may be an unaccustomed concept to some readers. Since three
dimensions are as many as we can easily visualize, we may exemplify with an imagined cube
c whose breadth dimension has the points handa – handen – hånda – hånden, whose depth
dimension has the points framtida – framtiden – fremtida – fremtiden, and whose height
dimension has the points kona – konen. A text with the forms framtida, hånda, kona will then
correspond to a given point within c, a text with fremtiden, hånden, kona to another point, etc.,
while we will expect certain other locations in c not to be filled by any texts, e.g. the location
of the combination framtida, handen, konen.
3.  The texts
The pilot study is based on texts extracted from the web editions of four
Bokmål and six Nynorsk newspapers.⁷ Two newspapers, Bergens Tidende and
Klassekampen, occur in both lists because they contain articles in both language
varieties.⁸ The Bokmål newspapers are:
Aftenposten, an Oslo-based daily newspaper of nation-wide distribution, and
one of Norway’s two largest newspapers in terms of circulation.
Bergens Tidende, a Bergen-based daily newspaper and the major newspaper in
Western Norway, with about 35% of Aftenposten’s circulation.
Dagbladet, an Oslo-based daily newspaper of nation-wide distribution and
about 45% of Aftenposten’s circulation.
Klassekampen, an Oslo-based daily newspaper of nation-wide distribution
and around 5% of Aftenposten’s circulation.
The Nynorsk newspapers are:
Bergens Tidende (see above).
Dag og Tid, an Oslo-based weekly newspaper of nation-wide distribution and
around 3% of Aftenposten’s circulation.
Hallingdølen, a local newspaper published in Ål in Hallingdal, with about 4%
of Aftenposten’s circulation.
Klassekampen (see above).
Nationen, an Oslo-based daily newspaper of nation-wide distribution and
around 6% of Aftenposten’s circulation.
Sogn Avis, a local newspaper published in Leikanger in Sogn, with about 4%
of Aftenposten’s circulation.
7.  I am indebted to Knut Hofland, Uni Digital, for collecting, structuring and
morphologically tagging the corpus, and for organizing it into texts according to the criteria described
below.
The higher number of Nynorsk newspapers was chosen in order to compensate to
some extent for the smaller text volume in the Nynorsk sources. According to the
criteria specified below a selection of 55,000 Bokmål articles and 35,000 Nynorsk
articles was made, yielding a Bokmål corpus of 22 million words and a Nynorsk
corpus of 12 million words. The articles are taken from the period between the
years 2000 and 2009, but the overwhelming majority is from 2008 and 2009.
As the aim of the project is to study the clustering of texts in the space
defined by the variable MSWs, an appropriate ‘text’ concept had to be defined.
The individual newspaper article is too small to yield sufficient data for
informative comparison with other texts. Therefore, based on the assumption that the
two most important determinants of the form options chosen are the journalist/
author and the newspaper, a ‘text’ is defined for present purposes as the sum of
what a given author has written in a given newspaper. For each newspaper (or
for the Bokmål and Nynorsk parts of the paper, respectively, in the case of
newspapers using both Bokmål and Nynorsk) the 20 most productive authors were then
selected, yielding a text inventory of 80 Bokmål and 120 Nynorsk texts. These
are the texts that constitute the corpus of 22 + 12 = 34 million words mentioned
above. In the graphs each text is coded by an index identifying the author, with
the initials of the newspaper suffixed. Thus, ‘71KK’ is the text consisting of what
author no. 71 has written in the Klassekampen part of the corpus. The texts were
automatically tagged with lemma forms and morphological categories by means
of the Oslo-Bergen Tagger.⁹
8.  This is also true of Nationen, although only its Nynorsk parts have been sampled here.
9.  See Johannesen et al. this volume.
4.  Data extraction and processing
The lexical database Norsk Ordbank, in which (officially and non-officially)
possible inflectional forms are registered, allows the automatic identification of
variable MSWs in a text. Based on this information, for each phenomenon studied
all forms of variable MSWs in each text – i.e. all word forms that could have been
different according to Norsk Ordbank – were registered in a table with the text ids
along one axis and the forms along the other.¹⁰ Distances between texts based on
the differences between their rows of forms, and distances between forms based
on the differences between their columns of texts, could now be calculated by
means of correspondence analysis, of which the implementation in the program
R (Baayen 2008: 128 ff.) was used, or more precisely the function corres.fnc in the
package languageR.¹¹ In correspondence analysis the row and column maps (i.e.
the plottings of texts and word forms, respectively) are superimposed on each
other. The result is represented as a common multidimensional space in which
both texts and word forms are distributed. The necessarily two-dimensional
diagrams below show the projection of this space on its two most informative
dimensions, i.e. the two dimensions responsible for most of the information
about distances within the multidimensional space. The most common forms,
typical of most texts, will tend to occur close to the centre of the diagram, while
more rare forms tend to occur more peripherally, and near opposite edges to
the extent that they tend to be mutually exclusive. Therefore, as we move from a
peripheral form towards the centre, the sequence of forms which we pass on our
way roughly suggests an implicational relation holding between peripheral, rare
forms and more central and widespread forms, which are ‘implied’ in the sense
that choice of a more peripheral form (usually) implies the choice of a more
central form along the same line, but not vice versa.
10.  Second elements of compounds that are listed in the lexical resources are also included
in the data.
11.  I am indebted to Øystein Reigem, Uni Digital, and Christer Johansson, University of
Bergen, for help in carrying out the correspondence analysis.
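
The table of texts and forms described in this section, and the kind of distance that correspondence analysis represents, can be sketched in plain Python. This is a hedged illustration only: it does not reproduce corres.fnc, the counts are invented, the text id ‘12AP’ is hypothetical, and the function names are assumptions. The distance shown is the chi-square distance between row profiles, which is the quantity a correspondence analysis plot approximates in few dimensions.

```python
from collections import Counter, defaultdict


def build_table(observations):
    """Build a text-by-form count table.

    observations: iterable of (text_id, form) pairs, one per token of a variable MSW.
    Returns (text_ids, forms, rows) where rows[i][j] counts form j in text i.
    """
    counts = defaultdict(Counter)
    for text_id, form in observations:
        counts[text_id][form] += 1
    text_ids = sorted(counts)
    forms = sorted({form for c in counts.values() for form in c})
    rows = [[counts[t][f] for f in forms] for t in text_ids]
    return text_ids, forms, rows


def chi_square_distance_sq(rows, i, k):
    """Squared chi-square distance between the profiles of rows i and k."""
    total = sum(map(sum, rows))
    # Column masses: share of each form in the whole table.
    col_mass = [sum(r[j] for r in rows) / total for j in range(len(rows[0]))]
    # Row profiles: relative frequencies of the forms within each text.
    profile_i = [v / sum(rows[i]) for v in rows[i]]
    profile_k = [v / sum(rows[k]) for v in rows[k]]
    return sum((a - b) ** 2 / m for a, b, m in zip(profile_i, profile_k, col_mass))


# Invented counts for two texts.
observations = (
    [("71KK", "hytta")] * 3 + [("71KK", "kasta")] * 2
    + [("12AP", "hytta")] * 1 + [("12AP", "kastet")] * 4
)
text_ids, forms, rows = build_table(observations)
print(text_ids, forms, rows)
print(chi_square_distance_sq(rows, 0, 1))
```

Dividing by the column mass gives rare forms more weight per unit of difference than common ones, which is one reason rare forms end up far from the centre of the plot.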
Several caveats are in order here. In the first place, it is important to bear in
mind that much information about distances may be hidden in non-projected
dimensions, somewhat like the actual distances between the visible stars in a
night sky. In the second place, the dimensions of the space calculated are the
result of the correspondence analysis’ attempt to structure the information
about distances as simply as possible, which means that the dimensions do not
correspond exactly to the imagined dimensions defined by the MSWs, as briefly
discussed in Section 2 above. In the third place, and as a corollary of this, the clustering
of forms and texts will not only be influenced by the strictly linguistic
parameters of choice among a set of alternative inflectional forms, but also by
corresponding vocabulary: texts with a shared topic, and hence a shared vocabulary,
will tend to be closer together than texts with different topics, other things being
equal. However, the latter problem is to some extent counteracted by limiting
the analysis to high-frequency items.
Since the positions of the forms in the correspondence analysis diagrams only
give rough hints about implicational relations among the forms, primarily because
differences in vocabulary across texts also influence the plots of texts and forms,
some of the implicational relations (IRs) have in addition been subjected to a more
precise analysis whose results are represented as directed graphs.¹² The IR-analysis
is based on the following normalizations and assumptions:
1. Quite a few texts are ‘inconsistent’ in the sense that for some MSW, more than
one of its alternative forms occur in the text. This may partly be the result
of tagging errors, partly the result of quotations in the text, and partly real
inconsistencies. In such cases the alternatives tend to have clearly different
frequencies of occurrence, and I have therefore made the simplifying move of
disregarding the form with the lowest frequency of occurrence in such cases,
classifying the texts according to the most frequent form expressing a given
MSW. Thus, an ‘a-text’ will be a text in which a is the most frequent expression
of the relevant MSW.
2. A form a is said to imply a form b if all a-texts which contain some form of
the MSW of which b is a possible expression, are also b-texts. For example,
consider MSW1 hytte +Noun +Def +Sg (‘cabin’) with the possible
forms hytten masc. and hytta fem., and MSW2 gate +Noun +Def +Sg
(‘street’) with the possible forms gaten masc. and gata fem. hytten is said
to imply gaten if and only if all hytten-texts containing MSW2 are also
gaten-texts. This is compatible with the converse not holding, i.e. that some
gaten-texts which also contain MSW1 are hytta-texts. If so, the implication is
unilateral; if not, it is bilateral.
3. I assume that the implicational relation is transitive, i.e. that if a implies b and
b implies c, then a implies c. This extends the relation to form pairs which
never occur together in the same text in the corpus, and it is hence somewhat
risky and probably leads to some spurious properties of the graphs, in
particular given the sparseness of some of the relevant data. Nevertheless I find
reason to assume that the graphs generally give a valid picture of the situation.
It should be stressed that these normalizations and assumptions apply to the
IR-analysis only, and not to the correspondence analysis.
12.  I am indebted to Øystein Reigem, Uni Digital, for implementing the implication analysis
and its graphics according to gradually developing specifications.
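
The three normalizations and assumptions listed above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation used in the study: the data are invented, the function names are hypothetical, ties between equally frequent forms are resolved arbitrarily, and an implication is only recorded when at least one relevant text exists (the text does not say how empty cases were handled).

```python
from itertools import product


def dominant_forms(counts, msw_of):
    """Step 1: per text and MSW, keep only the most frequent form.

    counts: {text_id: {form: frequency}}; msw_of: {form: msw}.
    """
    result = {}
    for text_id, freq in counts.items():
        best = {}
        for form, n in freq.items():
            msw = msw_of[form]
            if msw not in best or n > freq[best[msw]]:
                best[msw] = form
        result[text_id] = best  # {msw: dominant form}
    return result


def implications(dominant, msw_of):
    """Step 2: a implies b if every a-text containing b's MSW is a b-text."""
    forms = sorted(msw_of)
    implied = set()
    for a, b in product(forms, forms):
        if msw_of[a] == msw_of[b]:
            continue  # alternatives of the same MSW are not compared
        relevant = [d for d in dominant.values()
                    if d.get(msw_of[a]) == a and msw_of[b] in d]
        # Assumption: require at least one relevant text.
        if relevant and all(d[msw_of[b]] == b for d in relevant):
            implied.add((a, b))
    return implied


def transitive_closure(pairs):
    """Step 3: add (a, c) whenever (a, b) and (b, c) are present."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and a != d and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


# Invented toy data: three texts, two variable MSWs.
msw_of = {"hytta": "hytte", "hytten": "hytte", "kasta": "kaste", "kastet": "kaste"}
counts = {
    "t1": {"kasta": 5, "hytta": 3, "hytten": 1},  # inconsistent: hytten is disregarded
    "t2": {"kastet": 4, "hytta": 2},
    "t3": {"kastet": 6, "hytten": 2},
}
dominant = dominant_forms(counts, msw_of)
implied = transitive_closure(implications(dominant, msw_of))
print(sorted(implied))
```

On these toy data the sketch yields the two unilateral implications used as the running example in the Introduction: kasta implies hytta, and hytten implies kastet, with neither converse holding.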
5.  Case I: Feminine nouns in Bokmål
I first consider the choice of feminine vs. masculine gender¹³ for the set of Bokmål
nouns allowing this option. This set comprises all nouns that may be feminine in
the language, i.e. feminine gender is optional in Bokmål. From this set I select the
subset whose lemmas have a frequency of 500 or above in the 22-million-word Bokmål
part of the corpus. This yields an inventory of 60 lemmas.
13.  See Footnote 4 above.
5.1  Correspondence analysis
Figure 1, in which the individual items are too small to be readable, is included
in order to show the shape of the ‘cloud’ resulting from a correspondence analysis
of the varying definite singular forms of the 60 nouns and the texts in which they
occur. The two maximally informative dimensions are shown in Figure 1, but
they give only 33.9% of the total information about distances in the
multidimensional space (x-axis: 21%, y-axis: 12.9%). Still, some clear patterns emerge. As later
figures zooming in on parts of the plot show, the radical feminine a-forms
distribute towards the left in the plot, while the moderate masculine en-forms distribute
towards the right. Furthermore, with some overlaps we find the texts from the four
newspapers distributed from left to right in the order Klassekampen – Dagbladet –
Aftenposten – Bergens Tidende.
Figure 1. The ‘cloud’ displaying the distribution of masc. and fem. noun forms
The pointed shape of the ‘cloud’ towards the right is noteworthy. Dense
clusters occur when relatively many texts share a number of choices; this is the
intended meaning of the term ‘norm cluster’. Choices which less consistently
are correlated with other choices across texts lead to a more diffuse
distribution. A claim which is sometimes made is that moderate Bokmål is more of a
real, identifiable subvariety of Bokmål than radical Bokmål, which may be more
of an abstraction in the sense that it refers to the sum of possible departures
from moderate Bokmål without itself being a subvariety about which language
users tend to have consistent intuitions. The pointed shape towards the
moderate -en end of the plot, and the higher density within it, is compatible with such
a claim, although it must of course be borne in mind that the plot covers only
one phenomenon and considers only a limited number of texts. It should also
be noted that low frequency in itself typically leads to a more sparse distribution
because there will be fewer shared cooccurrence partners across texts among
low-frequency forms. This is therefore an alternative possible explanation of the
sparseness in the left-hand -a end of the plot.
Figure 2 is a close-up of the right-hand side of the plot, in which we find texts
from Bergens Tidende (BT), Aftenposten (AP) and a few from Dagbladet (DB).
Figure 2. The rightmost part of the plot in Figure 1
At the far right we see the forms jenten ‘the girl’ masc. and konen ‘the wife’/‘the
woman’ masc. These are special for Bergens Tidende and reflect a consistent
two-gender variety of Bokmål (which means that the gender here should properly
be called ‘common’ rather than ‘masculine’). As we move leftwards towards the
centre, we find the feminine alternative jenta among the first a-forms to show up.
The forms kona and klokka ‘the clock’ also occur to the right of the centre, in the denser part
of the plot (but outside Figure 2; see, however, Figure 4). This suggests that konen
and jenten are marked forms at the top of the implicational scale: a text with
these forms will most probably have all the other relevant nouns in the masculine,
too. Within the denser norm cluster we find the a-forms jenta and kona
occurring together with mostly en-forms of other nouns, such as ulykken ‘the accident’,
etterforskningen ‘the investigation’, turneringen ‘the tournament’, finanskrisen ‘the
financial crisis’, døren ‘the door’, avisen ‘the newspaper’, natten ‘the night’, luften
‘the air’, kvinnen ‘the woman’ etc. The last noun is worth noticing, since it shows
that biological gender has limited relevance here. The nouns kvinne and kone
have different stylistic properties, and the feminine form kvinna is perceived as
markedly more radical than kona. The form kvinna occurs near the middle of the
bottom left quadrant in Figure 1. It should also be noted that the total frequency
of the form kvinna in the corpus is 14, as against 3,049 for kvinnen. The numbers
for the forms of ‘kone’ are 80 for konen and 934 for kona.
Figure 3 is a close-up of a region in the far left of the plot.
Figure 3. The leftmost part of the plot in Figure 1
In this part of the plot only the newspaper Klassekampen is represented.¹⁴ At
the left edge we expect to find marked forms, typical of relatively few texts, and
probably implying a-forms of other relevant nouns as well. However, the
sparseness of this part of the plot indicates that there is less consistency in the choice
of a-forms across the texts departing from the norm in the direction of -a than
there is in the choice of en-forms across the texts near the other edge (Figure 2).
The nouns in Figure 3 are ordninga ‘the arrangement’, venstresida ‘the (political)
left’, høyresida ‘the (political) right’, kirka ‘the church’, utviklinga ‘the development’,
tida ‘the time’, avisa ‘the newspaper’, framtida ‘the future’, handa ‘the hand’, where
the leftmost members tend to denote abstract concepts typical of texts of
discursive type, which is compatible with the common impression that these are the last
nouns to get the radical a-forms.
14.  This does not preclude that the forms may occur in the other newspapers as well, but
other choices pull those texts further to the right in the plot.
Figure 4 is a close-up of the central region of the plot in Figure 1.
Figure 4. The central part of the plot in Figure 1
It is noticeable that the Klassekampen texts occur exclusively on the left-hand
side of the middle line, although also quite close to it, while the Aftenposten and
Bergens Tidende texts occur exclusively on the right-hand side. Only Dagbladet
distributes on both sides, placing the newspaper in the central region where we
expect to find the forms that are typical of most of the texts. However, the central
part of the plot is not the maximally dense part, indicating that the texts deviating
to the right are more consistent in their choices than the others are, thus forming
a norm cluster.
The nouns with a-forms in the central region are typically concretes or
other words characteristic of everyday language, such as skylda ‘the blame’, uka
‘the week’, hånda ‘the hand’, gata ‘the street’, natta ‘the night’, døra ‘the door’,
klokka ‘the clock’, and kona ‘the wife’. The en-forms in Figure 4 comprise abstracts
typical of discursive prose stretching into the left part, such as venstresiden ‘the
(political) left’, utfordringen ‘the challenge’, sannheten ‘the truth’, utviklingen
‘the development’, makten ‘the power’, løsningen ‘the solution’, etterforskningen
‘the investigation’, høyresiden ‘the (political) right’, ordningen ‘the arrangement’
(corroborating the impression that the en-form is most persistent in such words
even in texts where other nouns get -a), and further to the right muligheten
‘the possibility’, pressen ‘the press’, behandlingen ‘the treatment’, undersøkelsen
‘the investigation’, nyheten ‘the piece of news’, framtiden ‘the future’, kirken ‘the
church’, stillingen ‘the position’, tiden ‘the time’, årsaken ‘the cause’, moren ‘the
mother’, and jakten ‘the hunt’.
The en-forms even further to the left than the area shown in Figure 4, in the bottom
left quadrant, are historien ‘the history’, ytringsfriheten ‘the freedom of expression’,
teksten ‘the text’, virkeligheten ‘the reality’, offentligheten ‘the public sphere’, and
forestillingen ‘the performance’/‘the idea’. For ytringsfriheten, teksten and virkeligheten
the corresponding a-forms do not occur at all in the corpus, while the numbers of
occurrences in the other three cases are: forestillingen: 757, forestillinga: 7, historien:
2,540, historia: 16, offentligheten: 828, offentligheta: 2. The reason why the strongly
dominant en-forms in these cases still do not occur more centrally in the diagram is
probably related to vocabulary: these ‘intellectual’ concepts seem to be more typical
of the Klassekampen and Dagbladet journalists in their vicinity than they are of the
writers in Aftenposten and Bergens Tidende.
5.2  Implication analysis
As described in Section 4 above I have performed a more direct analysis of the
implicational relations between the form choices in the corpus as a whole,
disregarding the distribution across different newspapers. In the graphs that follow, the
forms which imply each other mutually according to the criteria in Section 4 are placed
within the same oval, while unilateral implications are marked with arrows between
ovals. The frequency of the forms is roughly indicated by the thickness of the oval
line, according to a logarithmic scale (log base 5). When the oval contains more than one
form, the thickness of the line indicates average frequency.
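
The layout conventions just described can be sketched in Python as follows. This is an illustration under assumptions, not the graphics code used for the figures: the function names and the toy data are invented, the set of implications is assumed to be transitively closed already, and the text does not state how the logarithmic value is turned into an actual line width.

```python
import math


def ovals(forms, implied):
    """Group forms that imply each other mutually into one oval.

    `implied` is a set of (a, b) pairs, assumed to be transitively closed, so
    that mutual implication is an equivalence relation and comparing a form
    with one representative of each oval is sufficient.
    """
    groups = []
    for form in sorted(forms):
        for group in groups:
            rep = group[0]
            if (form, rep) in implied and (rep, form) in implied:
                group.append(form)
                break
        else:
            groups.append([form])
    return groups


def line_thickness(group, freq):
    """Log base 5 of the average corpus frequency of the forms in one oval."""
    mean = sum(freq[f] for f in group) / len(group)
    return math.log(mean, 5)


# Invented toy data: a and b imply each other; both imply c unilaterally.
forms = ["a", "b", "c"]
implied = {("a", "b"), ("b", "a"), ("a", "c"), ("b", "c")}
freq = {"a": 5, "b": 45, "c": 625}
for group in ovals(forms, implied):
    print(group, round(line_thickness(group, freq), 3))
```

With base 5, a form needs five times the frequency of another to gain one unit of thickness, which keeps very frequent forms from dominating the drawing.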
Figure 5 shows the bottom of the implicational graph, directly or indirectly
dominated by all other forms, both a-forms and en-forms. These forms, then,
emerge as the maximally ‘unmarked’ forms, expected to occur across the texts.
Figure 5. The bottom of the implicational graph with maximally ‘unmarked’ forms
The forms in Figure 5 are all en-forms, with the exception of kona. Also, many
of them denote abstracts typical of discursive prose.
As indicated in Section 4 the graph has a few spurious properties resulting
from the low frequency of some forms in combination with our assumption of
transitivity of the implicational relation. For example, the rare forms utfordringa
and konen occur high up in the graph (see Figures 7 and 8), and hence appear
to imply their alternatives utfordringen and kona in Figure 5, which obviously is
not actually the case. The same is the case with the forms utstillingen, treningen,
stillingen, kirken, kvinnen and datteren in Figure 5, whose alternatives also occur
higher up in the graph. I assume that less sparse data would have moved these
forms out of the bottom oval into positions dominated by only subsets of the rest
of the forms.
Figure 6 shows the left part of the graph, with kvinna ‘the woman’ at the top,
indicating its position as a strongly marked choice implying most other a-forms.
Unexpectedly, it also dominates the en-forms natten ‘the night’ and tiden ‘the
time’ (which could have been seen more easily in the undivided version of the
graph, divided up between Figures 6, 7 and 8 for reasons of space), which
indicates that the implicational relations found for low-frequency items (kvinna
has 14 occurrences) must be taken with a grain of salt: there is exactly one text
in which the form kvinna cooccurs with natten, and the same is the case with
kvinna and tiden. Similar, probably spurious, implications occur with løsninga
‘the solution’, utfordringa ‘the challenge’, stillinga ‘the position’, utstillinga
‘the exhibition’, målinga ‘the measuring’, meldinga ‘the message’, treninga ‘the
exercise’ and dattera ‘the daughter’ in Figure 7. For the more frequent and hence
more reliable a-cases in Figures 6 and 7 we notice that a-forms of more abstract
words tend to imply a-forms of more concrete or everyday words. Thus, makta ‘the
power’, grensa ‘the limit’/‘the border’, kirka ‘the church’, and further down
venstresida ‘the left’, høyresida ‘the right’, skylda ‘the blame’ and finanskrisa
‘the financial crisis’ dominate the more everyday concepts lufta ‘the air’, avisa
‘the newspaper’, døra ‘the door’, gata ‘the street’, boka ‘the book’, framtida
‘the future’, and even further down uka ‘the week’ and jenta ‘the girl’. Some of
the exceptions to this pattern should probably be attributed to stylistic
properties associated with individual words. We may notice that some of the
a-forms of concretes that occur higher up and hence appear to be more marked
choices denote female humans, such as kvinna ‘the woman’ and dattera ‘the
daughter’. Another example is mora ‘the mother’, which does not occur in the
graph because it does not seem to cooccur with any of the other 59 nouns
Figure 6. The left part of the implication graph
Figure 7. The middle part of the implication graph
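The caveat about low-frequency items can be illustrated with a small sketch. It assumes, for illustration only, that a form A counts as implying a form B when B is the form chosen in every text where A cooccurs with some form of B's lexeme; the definition actually used in this chapter may differ. The toy texts, the helper names `support` and `implies`, and the threshold `min_support` are all hypothetical.

```python
# Hypothetical toy corpus: each text is the set of noun forms it contains.
texts = [
    {"kvinna", "natten"},       # the only text where kvinna meets a form of 'natt'
    {"kvinna", "gata"},
    {"gata", "boka", "natta"},
    {"gata", "boka", "uka"},
    {"boka", "uka", "natta"},
]

# Maps each form to its lexeme, so that alternative forms can be grouped.
lexeme = {
    "natten": "natt", "natta": "natt", "kvinna": "kvinne",
    "gata": "gate", "boka": "bok", "uka": "uke",
}


def relevant_texts(a, b):
    """Texts in which form a cooccurs with any form of b's lexeme."""
    return [
        t for t in texts
        if a in t and any(lexeme[f] == lexeme[b] for f in t)
    ]


def support(a, b):
    """Number of texts on which a claimed implication a -> b would rest."""
    return len(relevant_texts(a, b))


def implies(a, b, min_support=1):
    """Assumed definition: a implies b if b is chosen in all relevant texts,
    and there are at least min_support such texts."""
    relevant = relevant_texts(a, b)
    return len(relevant) >= min_support and all(b in t for t in relevant)


print(support("kvinna", "natten"))                 # evidence rests on one text
print(implies("kvinna", "natten"))                 # accepted with min_support=1
print(implies("kvinna", "natten", min_support=3))  # rejected with a stricter threshold
```

With a minimum support of one text the implication from kvinna to natten is accepted, although it rests on a single cooccurrence; a stricter threshold removes it. Where to set such a threshold is a judgment call that the sketch does not settle.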
Figures 7 and 8 show the en-forms, which, as expected, tend to display
the same implicational hierarchy as the a-forms, but turned upside-down. On
the en-side the everyday words are on top: if you choose the en-form of
concretes like gaten ‘the street’, boken ‘the book’, døren ‘the door’, klokken
‘the clock’, avisen ‘the newspaper’, luften ‘the air’ etc., then you are likely
to choose also the en-forms of more abstract nouns like venstresiden ‘the left’,
høyresiden ‘the right’, finanskrisen ‘the financial crisis’, makten ‘the power’,
grensen ‘the limit’/‘the border’, etc. At the very top (apart from the spurious
løsninga) we find the rare forms jenten ‘the girl’ and konen ‘the wife’. It is
interesting that nouns denoting female humans stand apart from other concretes,
but in two diametrically opposed ways. There are two subclasses of them, each
occurring high up in its own implicational hierarchy: those whose en-forms are
clearly marked choices (jenten, konen), and those whose a-forms are clearly
marked choices (kvinna, dattera, mora).
Figure 8. The right part of the implication graph
6.  Case II: Weak verbs in Bokmål
The largest and most productive class of weak verbs in Bokmål has the
alternative endings -et (traditional, moderate) and -a (radical) in past and
past participle forms. Thus, the MSWs kaste +Verb +Past and kaste +Verb
+PastPart both have the alternative forms kastet and kasta. I have registered
all past and past participle forms of verbs of this class whose lemmas have a
frequency of occurrence equal to or greater than 500 in the 22 million word
Bokmål part of the corpus. This yields an inventory of 58 verbs.
Figure 9 shows the shape of the ‘cloud’ resulting from a correspondence
analysis of this material projected on the two most informative dimensions,
which are jointly responsible for 31% of the information about distances in the
space (x-axis: 18.6%; y-axis: 12.4%).
Figure 9. The ‘cloud’ displaying the distribution of et- and a-forms of weak verbs
In the plot in Figure 9, unlike in the masculine/feminine case, there is no
transitional area where the two form categories mingle. The oval encloses only
et-forms (with the exception of the form rykka ‘moved quickly’ and a couple
of peripheral forms mentioned below), while all the forms outside the oval are
a-forms (except a few forms which are neither, for verbs which allow further
options, such as lagde, past tense of lage ‘make’). This suggests that there may
be less of an implicational hierarchy among the verbal et-forms or a-forms
than there was in the case of the nouns: the tendency is to use either one or
the other ending consistently, irrespective of verb. Still, the a-forms spread
out much more sparsely than the et-forms. This sparseness is probably not the
result of less consistency in the choice of a-forms as against et-forms across
the texts, then, but rather the result of the extremely low frequency of the
a-forms. The strong tendency is that the a-forms have fewer than ten
occurrences, while the et-forms have a three-digit number of occurrences in the
corpus. Therefore the a-forms cooccur with a much lower number of the other
forms than do the et-forms, a circumstance which gives rise to a greater
distance between them in the plot, since the a-forms will share fewer
cooccurrence partners than the et-forms.
With one exception (one Dagbladet text), Klassekampen is the only newspaper
whose texts occur outside the oval.15 Among the Klassekampen texts,
only one is located near the upper left corner, as indicated by the leftmost
arrow in Figure 9. The next Klassekampen text is located at the rightmost arrow.
The forms in the sparse area from the left inwards are: jobba ‘worked’, laga
‘made’, bekrefta ‘confirmed’, handla ‘shopped’/‘acted’, henta ‘fetched’, snakka
‘talked’, venta ‘waited’/‘expected’, samla ‘collected’, overraska ‘surprised’,
mista ‘lost’, endra ‘changed’, regna ‘calculated’/‘rained’, sikra ‘secured’,
flytta ‘moved’, varsla ‘notified’, fjerna ‘removed’. Thus the indications are
that one single writer among the 80 Bokmål writers is mainly responsible for
this a-form protuberance from the central area in the plot.
Figure 10 shows the central and densest part of the plot in Figure 9.16 All
the four newspapers are densely represented in this region (although with
Klassekampen near the edge of the et-area), indicating that the et-forms
constitute a clear norm cluster for this corpus, and the only cluster in the verbal -et/-a
As shown in Figure 9, the et-forms (and two a-forms) also have a protuberance
into the top right quadrant. All three newspapers Aftenposten, Bergens
Tidende and Dagbladet are represented here, but not Klassekampen. Apparently
this departure from the main cluster is not explained by the choice of
inflectional forms, but by vocabulary. The forms from the top down are: scoret,
scora ‘scored’, mista ‘lost’, trent ‘exercised’, rykte ‘moved quickly’,17 byttet
‘changed’, laget ‘made’, reddet ‘saved’, sørget ‘secured’/‘grieved’, virket
‘worked’/‘seemed’, skuffet ‘disappointed’, klarte ‘managed’, sikret ‘secured’,
havnet ‘ended up’, hentet ‘fetched’, ledet ‘led’. As this vocabulary already
suggests, the writers represented in this area are sports journalists. The
absence of Klassekampen fits well with the fact that this newspaper does not
have sports pages. The reason why the sports terminology leads to this kind of
departure from the main cluster may be that the sports pages use comparatively
little of the rest of the vocabulary of the language, and possibly also,
conversely, that the verbs typical of the sports pages are not frequent in other
text types.
Figure 10. The central region of the plot in Figure 9
15.  This does not exclude the possibility of a-forms in other texts, but if so, the
preponderance of other forms still places such texts within the oval.
16.  When a form occurs twice in the plot, one occurrence is the past tense form and the
other the past participle form. The past tense forms are printed in bolder type.
17.  The forms trent and rykte, a past participle and a past tense form, respectively, do
not end in -a or -et, but are included because the verbs in question also allow the -a/-et
As expected from the fact that et- and a-forms hardly mingle at all in the plot,
the implicational analysis of this material indicates no clear implicational
hierarchy among these forms across different verbs. The tendency is for a text to
use one or the other ending consistently across verbs.
In order to inspect patterns of cooccurrence across the noun and verb forms
of cases I and II, the two tables were combined into one and the result subjected
to correspondence analysis again. The resulting plot has the same general shape
as Figure 1, with roughly the same distribution of the noun forms, but now with
the verb forms interspersed. All the verbal a-forms occur peripherally, and the
vast majority on the left side, in the general area of the more marked nominal
a-forms (kvinna, mora etc.). As we move towards the centre, the et-forms start
showing up at the same time as the nominal a-forms typical of everyday language
(gata, uka, natta, skylda etc.), i.e. as we enter the area shown in Figure 4.
This supports the impression that the choice of verbal a-forms clusters with the
choice of the most markedly ‘radical’ nominal a-forms.
7.  Case III: Infinitives in Nynorsk
Innitives in Nynorsk may end in -a or in -e. Unlike the case of feminine and
masculine nouns in Bokmål, in this case the ocial norm to some extent pre-
scribes the distribution of the two endings across verbs, according to three
options: (i)consistent -a in all verbs, (ii) consistent -e in all verbs, (iii) ‘split
innitive’ (‘kløyvd innitiv’). Option (iii) involves -a in some innitives and -e
in others based on historically rooted patterns of variation in certain dialects in
the Eastern and middle part of Norway, excluding most of North Norway (see
e.g. Faarlund et al. 1997: 476 f.; Skjekkeland 1997: 69). e historical explana-
tion is related to syllable quantity. In Modern Norwegian, with the exception
of a small dialectal area in Gudbrandsdalen, accented syllables are always long,
i.e. they have a long vowel, or a short vowel plus a long consonant or conso-
nant cluster.18 Accented syllables in Old Norse could also be short, with a short
1.  is is the traditional analysis. ere are alternative phonological analyses of these
phenomena which I will not go into for present purposes.
Norm clusters in written Norwegian 1
vowel plus a short consonant. In modern dialects such syllables generally have
been lengthened either by lengthening of the vowel (typical of the West) or by
lengthening of the consonant (typical of the East). But already in late Old Norse,
before the changes in syllable quantity, we see evidence that unstressed [a] was
reduced to an [e]- or [æ]-like sound aer long syllables, but not aer short ones,
in manuscripts from the Eastern part of the country. e split innitive is a
reex in modern dialects of this quantitatively conditioned reduction. However,
aer the lengthening of the old short accented syllables there is no quantitative
conditioning of this variation from a synchronic point of view. is means that
unless you have either split innitive in your own dialect or expert knowledge of
Old Norse and language history, there is no way to predict that bite = ‘bite’ (with
originally long /i:/) should have -e while vita = ‘know’ (with originally short /i/)
should have -a according to the rules of the split innitive. As a consequence,
the split innitive option in written Nynorsk is recommended only for people
who have this phenomenon in their own dialect. As long as such writers distrib-
ute the -a and the -e according to their own dialect they are within the ocial
norm. is still opens up for some variation, since the split innitive dialects
are not consistent among themselves as to how many of the originally short-
syllabic verbs get -a rather than -e in the innitive. Hence charting the variation
in Nynorsk innitives is also of some interest.
7.1  Correspondence analysis
I have registered all innitival forms of verbs of this class whose lemmas have
a frequency of occurrence equal to or greater than 500 in the 12 million word
Nynorsk part of the corpus. is yields an inventory of 83 verb lemmas. Figure 11
shows the shape of the ‘cloud’ plotting the e- and a-innitives in the six Nynorsk
newspapers. e two dimensions in the graph contain 48.7% of the information
about distances in the space (x-axis: 41.9%, y-axis: 6.8%).
In Figure 11 most of the a-infinitives cluster densely in the far left, while
most of the e-infinitives spread out vertically in the far right. In the
left-hand a-cluster we find all the writers of Sogn Avis, exactly half of the
writers of Dag og Tid, 8 out of the 20 writers of Bergens Tidende, 6 out of the
20 writers of Klassekampen, 2 out of the 20 writers of Nationen, and none from
Hallingdølen. Thus, across the six newspapers 38% of the writers use consistent
a-infinitives.
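The 38% figure can be checked with a few lines of arithmetic. The sketch assumes 20 writers per newspaper; this number is stated above only for Bergens Tidende, Klassekampen and Nationen, and is an assumption for the other three.

```python
WRITERS_PER_PAPER = 20  # assumption for Sogn Avis, Dag og Tid and Hallingdølen

# Writers found in the left-hand a-cluster, per newspaper.
consistent_a = {
    "Sogn Avis": WRITERS_PER_PAPER,        # all writers
    "Dag og Tid": WRITERS_PER_PAPER // 2,  # exactly half
    "Bergens Tidende": 8,
    "Klassekampen": 6,
    "Nationen": 2,
    "Hallingdølen": 0,
}

in_cluster = sum(consistent_a.values())
total_writers = WRITERS_PER_PAPER * len(consistent_a)
share = in_cluster / total_writers

print(f"{in_cluster} of {total_writers} writers = {share:.1%}")
```

Under this assumption 46 of 120 writers fall in the a-cluster, which rounds to the 38% given above.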
The remaining writers mostly cluster with the e-forms, the most notable
exception being Hallingdølen. Figure 12 shows part of the lower right quadrant
of Figure 11, with most of the Hallingdølen writers spreading out between the two
Figure 11. The ‘cloud’ displaying the distribution of e- and a-infinitives
Figure 12. Part of the lower right quadrant of Figure 11, showing writers between the
Hallingdølen is published in an area where the dialects use split infinitive,
but the newspaper does not impose the use of split infinitive on its journalists.
The plot indicates that the writers of Hallingdølen, plus a couple from Nationen
and Dag og Tid, use split infinitive to varying degrees, from almost consistent
e-forms through a gradually increasing number of a-forms.
The rules of the split infinitive, prescribing -a in originally short-syllabic
verbs, have consequences for the vertical distribution of the forms. Thus the
lower left quadrant of Figure 11, shown in Figure 13, shows a ‘tail’ of a-forms
departing from the main cluster in the direction of the Hallingdølen writers,
and they are all short-syllabic with the single exception of styrka
‘strengthen’.
Figure 13. The lower left quadrant of Figure 11; a tail of the a-cluster
There are two forms of the infinitive vera = ‘be’, vera and væra, shown in
Figure 13, with the latter apparently mostly restricted to the users of split
infinitive, as appears from its position near the centre of the graph.
On the right-hand side the split infinitive writers lead to a corresponding
preponderance of long-syllabic forms in the lower half of the e-cluster, with
most of the short-syllabic forms in the upper half. However, it seems that other
factors also influence the distribution of the e-forms. It is not immediately
clear why the e-forms spread out so much more in the vertical dimension than the
a-forms (see Figure 11), but it seems likely that it is related to the
distribution of vocabulary across writers. There is a noticeable tendency for
the e-infinitives in the upper and lower halves to belong to different semantic
fields. The verbs in the upper region, from the top, are (with originally
long-syllabic forms marked with an asterisk, since the split infinitive favours
short-syllabic forms in this region, making long-syllabic forms more marked):
lese ‘read’, *skrive ‘write’, *snakke ‘talk’, leve ‘live’, *kjenne ‘feel’,
fortelje ‘tell’, *tenkje ‘think’, spele ‘play’, spørje ‘ask’, vite ‘know’, vere
‘be’, *lære ‘learn’/‘teach’, *kalle ‘call’, gjere ‘do’, lage ‘make’, tene
‘earn’/‘serve’, komme ‘come’, *vise ‘show’, *høyre ‘hear’, sitje ‘sit’, velje
‘choose’, oppleve ‘experience’, seie ‘say’. Thus, the majority of these verbs
are related to the general sphere of communication or intellectual activities.
The verbs in the lower three quarters of the lower half, on the other hand, are,
from the bottom upwards (this time with originally short-syllabic forms
asterisked): køyre ‘drive’, bygge ‘build’, koste ‘cost’, vurdere ‘evaluate’,
styrke ‘strengthen’, legge ‘lay’, sikre ‘secure’, starte ‘start’, innføre
‘import’/‘introduce’, byggje ‘build’, satse ‘invest’, etablere ‘establish’,
melde ‘report’, søkje ‘apply’, auke ‘increase’, redusere ‘reduce’, løyse
‘solve’, rekne ‘calculate’, *betale ‘pay’, opne ‘open’, kjøpe ‘buy’, hente
‘fetch’, flytte ‘move’, jobbe ‘work’, *selje ‘sell’, liggje ‘lie’ (in a
position), vente ‘wait’/‘expect’, samle ‘collect’, gjennomføre ‘carry through’,
drive ‘run’ (transitive), sende ‘send’, leggje ‘lay’, følgje ‘follow’, skaffe
‘provide’, møte ‘meet’, bruke ‘use’, arbeide ‘work’, *klare ‘manage’, greie
‘manage’, utvikle ‘develop’. Thus, the general sphere here seems to be industry,
trade and economy.
The vertical distribution distinguishing two semantic fields among the e-forms
indicates that the writers tend to specialize in certain domains, such as
culture vs. economic news. This accords well with the fact that all the Dag og
Tid writers and most of the Klassekampen writers within the e-cluster are
located in its upper half. Both these newspapers have an emphasis on culture and
political commentary. The majority of the e-cluster writers of Nationen, a
newspaper with an emphasis on regional politics and agriculture, are located in
the lower region, while the Bergens Tidende writers within the e-cluster spread
out evenly between the two regions. This indicates that the journalists of this
major newspaper can be more specialized than what is possible for the
journalists of the much smaller Sogn Avis, which dominates the left-hand
a-cluster. This circumstance may be the reason why we do not see a corresponding
division of the a-cluster into semantically motivated subregions: each
journalist in a small newspaper has to cover several domains, and some domains
may also be covered to a lesser extent.
7.2  Implication analysis
The semantic factors discussed above to some extent obscure the patterns of
variation within the choices of e-forms and a-forms of the infinitives. We have
therefore subjected the data to an implication analysis of the kind described in
Section 4 above. The analysis reveals an implicational pattern among the users
of split infinitive concerning the choice of infinitive forms. Figure 14 shows
the right half of the resulting graph, comprising the a-forms. The left half,
comprising the e-forms, is almost exactly like the right half turned upside
down.
Most of the originally short-syllabic a-forms occur below the large set of
a-forms with mutual implications. The pattern indicates a relatively clear
ranking of the short-syllabic a-forms used by writers who use split infinitive.
The position of vera ‘be’ at the bottom indicates that at least this form is
used by all such writers, and some will use only this a-form while using -e in
all other infinitives. The position of gjera ‘do’ above it indicates that some
will use only these two a-forms, etc. This is matched by the fact that vere
dominates gjere etc. at the top of the other half of the graph (not shown here),
i.e. if a writer chooses vere, then all infinitives are e-forms.
Figure 14. Implicational relations among the a-infinitives
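A ranking of this kind means that the sets of a-forms used by the individual writers are nested: whoever uses gjera also uses vera, and so on. The sketch below tests whether a collection of such sets forms a nested chain and ranks the forms by how many writers use them. The writers and their inventories are invented for illustration, and the function names are hypothetical.

```python
# Hypothetical split-infinitive writers and the a-infinitives each one uses.
writers = {
    "w1": {"vera"},
    "w2": {"vera", "gjera"},
    "w3": {"vera", "gjera", "vita"},
}


def is_implicational_scale(form_sets):
    """True if the sets can be ordered so that each contains the previous one."""
    ordered = sorted(form_sets, key=len)
    return all(small <= large for small, large in zip(ordered, ordered[1:]))


def rank_forms(form_sets):
    """Order forms from most to least widely used (ties broken alphabetically)."""
    counts = {}
    for forms in form_sets:
        for form in forms:
            counts[form] = counts.get(form, 0) + 1
    return sorted(counts, key=lambda form: (-counts[form], form))


print(is_implicational_scale(writers.values()))  # nested sets: a clean scale
print(rank_forms(writers.values()))              # vera first, as at the bottom of the graph

# A writer using gjera and vita without vera would break the scale.
deviant = list(writers.values()) + [{"gjera", "vita"}]
print(is_implicational_scale(deviant))
```

Real data will rarely be perfectly nested, so a practical version would have to tolerate a certain number of deviations; the sketch only shows the ideal case.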
The graph must be read with the caveats mentioned in Sections 4 and 5.2
above. Still, it strongly suggests a rather consistent pattern of implications
across split-infinitive writers with regard to the most frequent verbs. Breaking
this pattern would then probably be perceived as a violation of the operative
norm for Nynorsk. At the same time it is clearly unrealistic to expect
non-expert writers without this phenomenon in their own dialects to acquire
mastery of this system, including the implicational relationships. This might be
taken as an argument against keeping the split infinitive option as part of the
standard language, assuming that a standard in a strict sense is the aim.
8.  Conclusion
The primary goal of the pilot project reported here was to test the utility of
correspondence analysis and implication analysis in the investigation of norm
clusters in Bokmål and Nynorsk. The techniques will have to be applied to a much
larger and more representative corpus, and to a much wider range of linguistic
phenomena, before fairly safe conclusions about the situation of written
Norwegian and its possible emergent subvarieties can be reached. Still, the
analyses indicate that the approach is able to yield plausible insights about
clusterings and implicational relationships. The following tendencies were among
the ones indicated:
• In the alternation between masculine and feminine gender (in the sense
  of Footnote 4) of nouns in Bokmål, the choice of masculine gender forms
  the densest cluster, indicating that this choice may be correlated with more
  consistency across texts, i.e. a more clearly defined subnorm, than the
  alternative choice.
• There is an implicational relationship between the choice of feminine gender
  for nouns denoting abstract concepts from the academic sphere and the
  choice of feminine gender for concretes and everyday words, where the
  former choice implies the latter, but not vice versa.
• In the alternation between -et and -a in weak verbs in Bokmål the et-forms
  form a solid norm cluster, while pervasive use of a-forms seems to be typical
  of a tiny fraction of the writers involved.
• The use of a-forms of weak verbs clusters with the choice of feminine gender
  for abstract words from the academic sphere.
• In the alternation between -a and -e in infinitives in Nynorsk the writers
  using the split infinitive show a fair amount of variation, but still display
  a comparatively clear implicational relationship in their choice of ending for
  individual verbs. The complexity of this system is not conducive to having it
  captured by clear normative rules.
These results in themselves are not very novel or surprising. But they
corroborate and provide a further articulation of common existing assumptions,
and this is done on the basis of empirical corpus data rather than on that of
intuitive judgments. This supports the conclusion that the approach tested here,
applied to a much larger corpus, may provide useful information about emerging
norm patterns for the long-term work towards the adaptation of the written
standards to developments in the operative norms revealed in real texts.
References
Baayen, R.H. 2008. Analyzing Linguistic Data. A Practical Introduction to Statistics Using
R. Cambridge: CUP.
Faarlund, Jan Terje, Lie, Svein & Vannebo, Kjell Ivar. 1997. Norsk referansegrammatikk. Oslo:
Rosén, Victoria & De Smedt, Koenraad 2000. *Er korrekturlesningsevnen di god? Resultater
fra SCARRIE. In Nordlyd: Tromsø University Working Papers on Language and Linguistics
28, Olaf Jansen Westvik, Toril Swan, Endre Mørck & Ove Lorentz (eds), 214–228. Tromsø:
University of Tromsø.
De Smedt, Koenraad & Rosén, Victoria. 2000. Automatic proofreading for Norwegian: The
challenges of lexical and grammatical variation. In NODALIDA’99: Proceedings from the 12th
“Nordiske datalingvistikkdager”, Trondheim, December 9–10, 1999, Torbjørn Nordgård
(ed), 206–215. Trondheim: NTNU.
Norsk Ordbank. (20 March, 2011).
SCARRIE. http// (19 March, 2011).
Skjekkeland, Martin. 1997. Dei norske dialektane. Kristiansand: Høyskoleforlaget.