Language Testing
2024, Vol. 41(3) 506–529
© The Author(s) 2024
Article reuse guidelines:
sagepub.com/journals-permissions
DOI: 10.1177/02655322231222596
journals.sagepub.com/home/ltj
Korean Syntactic Complexity
Analyzer (KOSCA): An NLP
application for the analysis of
syntactic complexity in second
language production
Haerim Hwang
The Chinese University of Hong Kong, Hong Kong
Hyunwoo Kim
Yonsei University, Republic of Korea
Abstract
Given the lack of computational tools available for assessing second language (L2) production in
Korean, this study introduces a novel automated tool called the Korean Syntactic Complexity
Analyzer (KOSCA) for measuring syntactic complexity in L2 Korean production. As an open-
source graphical user interface (GUI) developed in Python, KOSCA provides seven indices of
syntactic complexity, including traditional and Korean-specific ones. Its validity was tested by
investigating whether the syntactic complexity indices measured by it in L2 written and spoken
production could explain the variability of L2 Korean learners’ proficiency. The results of mixed-
effects regression analyses showed that all seven indices significantly accounted for learner
proficiency in Korean. Subsequent stepwise multiple regression analyses revealed that the
syntactic complexity indices explained 56.0% of the total variance in proficiency for the written
data and 54.4% for the spoken data. These findings underscore the validity of the syntactic
complexity indices measured by KOSCA as reliable indicators of L2 Korean proficiency, which
can serve as a valuable resource for researchers and educators in the field of L2 Korean learning
and assessment.
Keywords
Korean, natural language processing, spoken production, syntactic complexity, written
production
Corresponding author:
Hyunwoo Kim, Department of English Language and Literature, Yonsei University, 50 Yonsei-ro,
Seodaemun-gu, Seoul 03722, Republic of Korea.
Email: hyunwoo2@yonsei.ac.kr
Introduction
The advent of natural language processing (NLP) techniques has contributed to a grow-
ing interest in the application of NLP tools for the automatic assessment of second lan-
guage (L2) use in various production contexts. A prominent strand of automated
approaches in this field encompasses the measurement of multiple dimensions of L2 text
complexity, such as lexical sophistication (e.g., Kyle et al., 2018), syntactic complexity
(e.g., Kyle & Crossley, 2017; Lu, 2011), and text cohesion (e.g., McNamara et al., 2010).
The NLP tools developed in these studies have offered significant opportunities for
researchers and practitioners to understand the roles of diverse language-related and
learner-specific features in characterizing text quality.
Among various linguistic properties that can be measured by NLP tools, this study
focuses on linguistic complexity at the syntactic level. Linguistic complexity, alongside
accuracy and fluency, is considered a crucial subcomponent in assessing L2 competence
(Housen & Kuiken, 2009). In particular, syntactic complexity, defined as the variety and
sophistication of phrasal/clausal/sentential structures in production (Lu, 2011; Ortega,
2003), has been widely adopted as a significant indicator of L2 production abilities.
Previous studies have shown positive relationships between L2 proficiency and various
syntactic complexity indices, including length of production, amount of coordination
and subordination, and number of particular structures, such as verb phrases (e.g., Lu,
2011; Ortega, 2015; Wolfe-Quintero et al., 1998).
However, the current NLP applications are limited to certain languages, such as
English (e.g., Lu, 2011), French (e.g., François & Fairon, 2012), German (e.g., Weiss
et al., 2021), and Russian (e.g., Kisselev, Klimov, & Kopotev, 2022). This limitation
makes it difficult to extend earlier findings on the correlation between L2 proficiency
and syntactic complexity to other languages. To achieve the generalizability of syntactic
complexity in language assessment, the need for comparable NLP applications measur-
ing syntactic complexity for a wider range of languages is evident. Furthermore, there is
growing interest in Asian languages, which necessitates the development of automated
applications for assessing learners’ production abilities in these languages. Korean is one
such language that deserves further investigation, given the stark increase in the number of Korean learners around the globe (M. Lee, 2019).
Due to the lack of automated NLP tools capable of processing extensive Korean
data (e.g., Vu et al., 2022), however, prior attempts to assess the syntactic complexity
of Korean learner data have predominantly relied on manual coding (e.g., Y. Kim
et al., 2016; Seo, 2009). This manual approach can place a significant burden on
researchers and educators during data analysis. To address this challenge and enhance
the current automated approaches to measuring syntactic complexity in Korean, we
developed an NLP application called the “Korean Syntactic Complexity Analyzer”
(KOSCA). KOSCA incorporates both traditional and novel syntactic complexity indi-
ces and measures language-general and Korean-specific indices in both written and
spoken production. Specifically, Korean-specific indices pertain to the use of parti-
cles and eojeols, which refer to basic spacing units, each being able to contain multi-
ple base morphemes and functional morphemes (e.g., po-y-e-cwu-ta see-causative
marker-connector-give-sentence ender “show”), as will be further discussed below.
These indices can contribute to linguists’ and educators’ better understanding of L2
Korean production because both particles and eojeols constitute the most fundamen-
tal properties in Korean and have consistently posed challenges to L2 learners of
Korean.
The main goal of our study is to introduce KOSCA as a publicly available resource for
applied linguists and language educators to evaluate the syntactic complexity of Korean
texts and speech samples, and to test its validity across different modalities. To this end,
we tested whether the KOSCA indices can account for learner proficiency in Korean
using two datasets, consisting of 20,223 written essays and 1,102 speech samples pro-
duced by L2 learners of Korean. Through the validation of the proposed language-general
and Korean-specific indices in assessing Korean data, our study is expected to enhance
the field of syntactic complexity by extending the well-established measures used in pre-
vious studies (primarily focused on English) to Korean, while, at the same time, highlight-
ing the importance of incorporating language-specific indices. Furthermore, we share our
dataset, analysis script, and application on an open data platform, which is expected to
foster an open science culture in the field. In doing so, KOSCA can contribute to the use
of automated applications for assessing Korean learners’ production abilities, facilitating
research and practice in the field of language learning and assessment.
Korean syntactic complexity analyzer and its indices
Application
KOSCA is an open-source graphical user interface (GUI) developed in Python using the
packages KoNLPy (Park, 2014) and Kivy (2023). The development process involved the
creation of a Python script, followed by multiple rounds of experimentation with differ-
ent datasets over a period of three months. After addressing any issues with the Python
script, we transformed it into a GUI application. KOSCA is designed to be installed on
the user’s hard drive and function offline, ensuring the protection of confidential and
private information. It measures seven syntactic complexity indices, including the mean
number of eojeols per sentence, the mean number of morphemes per sentence, the mean
number of coordinate phrases per sentence, the mean number of relative clauses per
sentence, the mean number of adjunct clauses per sentence, the mean number of verbs
per sentence, and the mean number of particle types per sentence. Additionally, the tool
measures two fluency indices: the number of sentences per sample and the number of
eojeols per sample (see Hwang, 2023a, for KOSCA’s download link for Windows
and Mac platforms along with the user guide).
The application calculates the indices through a five-step process: (a) it reads each text
file in a directory designated by the user; (b) it tokenizes each text into individual sen-
tences; (c) it parses each sentence for part-of-speech (POS) categories, such as nominative
case particle and verb; (d) it extracts particular categories to count their frequency per
sentence; and (e) it saves the output as a CSV file in the same directory. Among many
parsers (e.g., Hannanum, Stanza, UDPipe), Kkma of KoNLPy was selected for the follow-
ing reasons. First, it is compatible with Python, the language in which KOSCA is built, thus ensur-
ing the application’s stable operation. Second, it provides comprehensive POS categories
that are suitable for tagging morphemes in Korean data (D. J. Lee et al., 2010). Lastly, it
achieves high tagging accuracies of up to 93.88% for morphemes and 93.85% for sen-
tences in Korean (Min et al., 2022).
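To make this pipeline concrete, the following is a minimal sketch of steps (a) through (e), not KOSCA’s actual source code. It assumes KoNLPy with its Kkma backend is installed; the directory name, file pattern, and output columns are illustrative.

```python
# A minimal sketch of the five-step pipeline described above, not KOSCA's
# actual source code. Assumes KoNLPy (with its Kkma backend) is installed;
# the directory name, file pattern, and output columns are illustrative.
import csv
import glob
import os

from konlpy.tag import Kkma

kkma = Kkma()

def analyze_directory(directory):
    rows = []
    for path in glob.glob(os.path.join(directory, "*.txt")):   # (a) read each text file
        with open(path, encoding="utf-8") as f:
            text = f.read()
        sentences = kkma.sentences(text)                       # (b) tokenize into sentences
        n_sent = max(len(sentences), 1)
        n_morph = n_rel = 0
        for sent in sentences:
            tagged = kkma.pos(sent)                            # (c) parse for POS categories
            n_morph += len(tagged)                             # (d) count target categories,
            n_rel += sum(tag == "ETD" for _, tag in tagged)    #     e.g., relativizers (ETD)
        rows.append({
            "file": os.path.basename(path),
            "morphemes_per_sentence": n_morph / n_sent,
            "relative_clauses_per_sentence": n_rel / n_sent,
        })
    if rows:                                                   # (e) save a CSV in the directory
        out_path = os.path.join(directory, "kosca_output.csv")
        with open(out_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)

analyze_directory("samples")
```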
As it is widely acknowledged that NLP applications tend to exhibit poorer performance
when dealing with out-of-domain data, such as L2 data, compared to in-domain data,
such as native language (L1) data (Kyle, 2021), we conducted further assessment of
KOSCA’s tagging accuracy using a subset of L2 data that were used in this study (see
Section “Data”). The first author manually annotated 1,893 POS tags obtained from a
random sample of 100 sentences (50 from the written corpus and 50 from the spoken
corpus) and compared them with those generated by the Kkma parser. The results of this
comparison revealed a relatively high accuracy, with an F1 score of 0.91 (written data:
0.97; spoken data: 0.84).
Index selection
The automated approaches to measuring complexity are categorized into two types:
analysis based on sentences and analysis based on T-units (i.e., units comprising an
independent main clause and its dependent clauses). In our NLP analysis, we opted for
the sentence-based approach because a sentence involves “a certain degree of psycho-
logical reality in that it allows researchers to glimpse how the learner views [its] struc-
ture” (Bardovi-Harlig, 1992, p. 391; see also Biber et al., 2020). In contrast, a
T-unit-based approach artificially divides sentences with coordination, which can lead
to less reliable analyses when measuring syntactic complexity stemming from con-
junction. Consequently, this approach has often yielded somewhat inconsistent results
across previous studies (for further discussion, see Crossley, 2020; Ortega, 2003).
Regarding the analysis of speech data, we employed an utterance as the basic unit of
analysis, which represents a speech act conveying a single idea in the spoken data (e.g.,
Georgila et al., 2009; Hwang et al., 2020; H. Kim & Hwang, 2022). In the context of
formal spoken data in Korean, this unit primarily corresponds to a sentence, with clear
indicators provided by sentence enders, such as -yo (politeness marker) and -ta (declar-
ative marker), thus minimizing the need to rely on prosodic boundaries.
The indices in KOSCA are categorized into four groups, as presented in Table 1 (for
English, see Lu, 2011). The selection of these indices was conducted by referring to (a)
the well-established descriptions of syntactic complexity indices in English (Lu, 2011)
and (b) the representative features of Korean.

Table 1. Korean syntactic complexity indices in the Korean Syntactic Complexity Analyzer.

Category                 Index
Complexity of sentence   Number of eojeols per sentence
                         Number of morphemes per sentence
Coordination             Number of coordinate phrases per sentence
Subordination            Number of relative clauses per sentence
                         Number of adjunct clauses per sentence
Particular structures    Number of verbs per sentence
                         Number of particle types per sentence

Given that our research is the first to pro-
pose and validate syntactic indices capable of accounting for learner proficiency in
Korean, we incorporated a set of deep-rooted indices that are considered language-inde-
pendent between English and Korean (e.g., the mean number of morphemes per sen-
tence, the mean number of coordinate phrases per sentence, the mean number of relative
clauses per sentence, the mean number of adjunct clauses per sentence, the mean number
of verbs per sentence).
At the same time, we tested new indices specific to Korean. One of these novel indices
is the eojeol-based index, which measures the number of eojeols per sentence. We selected
this index to explore its potential as a word-based measure, analogous to its function in
Lu’s (2011) study. Additionally, we introduced the number of particle types per sentence,
designed to replace the index measuring the number of complex nominals in English as
described in Lu’s study. In contrast to English, where complex nominals often involve
prepositional phrases (e.g., the girl with a hat), Korean employs relative clauses to serve
the same function (e.g., moca-lul ssu-n sonye hat-accusative case particle wear-relativizer
girl). This cross-linguistic distinction may lead to the potential conflation of the nominal-
based index with the number of relative clauses per sentence in the context of Korean.
Thus, to avoid this confusion, we propose the number of particle types per sentence as an
alternative index for assessing the complexity of nominals in Korean (see Section
“Number of particle types per sentence”). In the subsequent sections, we provide detailed
characterizations of each of the seven indices of our choice, along with their functional
attributes in the context of L2 acquisition and/or assessment.
Number of eojeols per sentence. The mean number of words in a sentence, a clause, or a
T-unit has been found to contribute to a text’s syntactic complexity. A greater number of
words within a particular syntactic unit is typically associated with increases in the
degree of informativity, the complexity of argument structure patterns (e.g., Subject-
Verb vs. Subject-Verb-Object), and/or inter-clausal modification (e.g., subordination).
Previous research on English has identified the number of words in a sentence, a clause,
or a T-unit as one of the strongest indicators of syntactic complexity (e.g., Ai & Lu,
2013; Lu, 2011). For example, Ai and Lu (2013) found that higher-proficiency L2 learn-
ers of English were more likely to use a greater number of words per sentence and per
T-unit in writing.
Building on these findings, we included the mean number of eojeols per sentence,
which may correspond to the mean number of words per sentence in English research.
The eojeol in Korean serves as a basic spacing unit that encompasses lexical and mor-
phosyntactic information for a phrase. It is separated by space, which is roughly analo-
gous to a word unit in English. Unlike an English word, however, an eojeol can comprise
“one or more [base] morphemes and a series of functional morphemes” (G. G. Lee et al.,
2002), such as case particles. For instance, (1) is made up of four eojeols.
(1) Mina-ka John-eykey sakwa-namwu-lul po-y-e-cwu-ess-ta.
Mina-NOM John-DAT apple-tree-ACC see-CAU-connector-give-PST-SE
“Mina showed an apple tree to John.”
(ACC = Accusative case particle; CAU = Causative marker; DAT = Dative case par-
ticle; NOM = Nominative case particle; PST = Past tense marker; SE = Sentence
ender)
KOSCA computes the number of eojeols per sentence by dividing the total number of
eojeols by the total number of sentences.
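Because eojeols are delimited by spaces, this computation reduces to whitespace tokenization; here is a minimal sketch, illustrative rather than KOSCA’s implementation:

```python
# Hedged sketch: because an eojeol is a space-delimited unit, the index can be
# approximated by splitting each sentence on whitespace.
def eojeols_per_sentence(sentences):
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)

# Example (1) above is a single sentence consisting of four eojeols:
print(eojeols_per_sentence(["Mina-ka John-eykey sakwa-namwu-lul po-y-e-cwu-ess-ta."]))  # 4.0
```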
Number of morphemes per sentence. A morpheme, as the smallest unit of language that
carries meaning or function, has been found to be linked to the development of morpho-
syntax in both monolingual children (Brown, 1973) and L2 learners. Previous studies
have shown that the number of morphemes produced by L2 learners is a reliable predic-
tor of their proficiency in both English (e.g., Iwashita et al., 2008) and Korean learning
contexts (e.g., Seo, 2009). Extending these findings to the automated analysis of Korean
data, we included the number of morphemes per sentence as a potential index of syntac-
tic complexity. The morphemes analyzed in this study included both (a) content mor-
phemes, such as nouns (e.g., haksayng “student”), verb stems (e.g., ka- “go”), adjectival
verb stems (e.g., coh- “good”), and adverbs (e.g., maywu “very”), and (b) function mor-
phemes, such as determiners (e.g., i “this”), particles (e.g., nominative case particle
-i/ka), tense markers (e.g., past tense marker -ss), and sentence enders (e.g., declarative
marker -ta). To compute the number of morphemes per sentence, KOSCA uses kkma.pos() in Kkma to parse each sentence for part-of-speech tags and tokenize it into morphemes. It then divides the total number of morphemes by the total number of sentences in each text/speech sample.
Number of coordinate phrases per sentence. Coordination (e.g., and-coordination, but-
coordination) is commonly employed to connect ideas. The number of coordinate phrases
within a sentence is related to both the sentence’s length and the use of additional phrases
for conjunction. Previous research has produced mixed findings regarding the role of
coordination in explaining L2 proficiency. Some studies reported an increase in the use of
coordination in written production among English learners as their proficiency was higher
(e.g., Kyle & Crossley, 2017; Lu, 2011), while others showed that lower-proficiency
learners used coordination more frequently than higher-proficiency learners (e.g., Bar-
dovi-Harlig, 1992; Norris & Ortega, 2009). When it comes to a modality effect, both L1
and L2 English speakers tend to use coordination more frequently in spoken language as
opposed to written production (e.g., Hwang et al., 2020). This can be attributed to the fact
that speaking is considered a more cognitively demanding process, and the use of coordi-
nation allows speakers to add clauses without imposing much cognitive burden. In this
study, we tested whether the amount of coordination significantly contributes to explain-
ing Korean learners’ proficiency in both written and spoken production. To calculate this
index, KOSCA segments a sentence into separate coordinate phrases by identifying con-
junctive markers (e.g., -ko “and,” -[u]na “but”) using the tag ECE in the Kkma parser, and
then divides the number of coordinate phrases by the total number of sentences.
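A hedged sketch of this tag-counting step is given below; the same pattern carries over to the next two indices by swapping in the relativizer (ETD) or adjunct (ECD) tag.

```python
# Illustrative sketch, not KOSCA's source: count morphemes bearing a target
# POS tag and normalize by the number of sentences. Swapping {"ECE"} for
# {"ETD"} or {"ECD"} yields the relative-clause and adjunct-clause indices.
from konlpy.tag import Kkma

def tag_count_per_sentence(sentences, target_tags, tagger):
    if not sentences:
        return 0.0
    n = sum(
        sum(tag in target_tags for _, tag in tagger.pos(sent))
        for sent in sentences
    )
    return n / len(sentences)

kkma = Kkma()
# e.g., the coordination index for a toy one-sentence sample:
coordination = tag_count_per_sentence(["저는 공부하고 잤어요."], {"ECE"}, kkma)
```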
Number of relative clauses per sentence. The use of relative clauses has been identified as
a reliable index of language development. For example, Lu (2011) found that L2 learners
of English produced a greater number of relative clauses as their proficiency increased.
In the domain of L2 Korean comprehension, L2 learners with limited proficiency have
been reported to show persistent difficulty in processing relative clauses. For instance,
O’Grady et al. (2003) found that L1-English L2 learners of Korean with beginner to
intermediate proficiency struggled to match relative clauses in Korean with correspond-
ing pictures. The results of their picture-selection task revealed only 22.6% accuracy in
matching direct object relative clauses (e.g., namca-ka cohaha-nun yeca, man-nomina-
tive case particle like-relativizer woman “the woman whom the man likes”) with the
appropriate pictures. KOSCA calculates the number of relative clauses per sentence by
extracting the relativizer, such as -(u)n/nun, -l, -ten (tagged as ETD in the Kkma parser),
from a text or speech sample and dividing the number of relative clauses by the total
number of sentences.
Number of adjunct clauses per sentence. An adjunct clause refers to a type of subordinate
clause headed by an adverbializer, such as because and if. Its use demonstrates a learn-
er’s ability to refine a central idea and connect various ideas in a logical manner. Using
an oral picture description task, Lambert and Nakamura (2019) found that L2 learners of
English with higher proficiency used a greater number of adjunct clauses (see also
Iwashita et al., 2008). To measure the number of adjunct clauses per sentence, KOSCA
extracts the subordinating morphemes (e.g., -e/aseo “because,” -(u)myen “if”) using the
ECD tag in Kkma and then divides the total number of those morphemes by the total number
of sentences in each text/speech sample.
Number of verbs per sentence. Research on English has primarily focused on verb phrases
as one of the key measures of syntactic complexity. As a verb phrase contains crucial
information regarding the syntactic and semantic aspects of its head’s argument structure
and the eventuality conveyed by a sentence (Pinker, 1989), a greater number of verb
phrases may indicate a higher level of syntactic complexity. For example, in the analysis
of 500 speech samples obtained from TOEFL iBT speaking tasks, Iwashita et al. (2008) found that the number of verb phrases per T-unit significantly contributed to explaining the variance in learner proficiency, with more proficient learners producing a greater number of verb phrases in their responses.
Based on these findings, we included the number of verbs per sentence as a measure
of syntactic complexity. In contrast to English, Korean often instantiates a serial verb, as
in pwul-le-cwu-ta (sing-connector-give-sentence ender “sing a song for someone”).
Therefore, we counted the number of verbal stems present in each sentence. To accom-
plish this, KOSCA employs a two-step process. Initially, it counts the total number of
verb stems, including verbs (e.g., ka- ‘go’; Kkma tag: VV), adjectival verbs (e.g., swip-
‘easy’; Kkma tag: VA), auxiliary verbs (e.g., -bo- ‘try’; Kkma tag: VXV), and copulas
(e.g., -i- ‘be’; Kkma tag: VCP). Subsequently, the application divides the total number of
verbs by the total number of sentences in a text/speech sample.
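Under the tag set named above, this two-step computation can be sketched as follows (an illustration that mirrors the paper’s description, not its source):

```python
# Illustrative sketch of the verb-stem count; the tag set follows the
# categories listed above (verbs, adjectival verbs, auxiliary verbs, copulas).
from konlpy.tag import Kkma

VERB_TAGS = {"VV", "VA", "VXV", "VCP"}

def verbs_per_sentence(sentences, tagger):
    if not sentences:
        return 0.0
    n = sum(sum(tag in VERB_TAGS for _, tag in tagger.pos(s)) for s in sentences)
    return n / len(sentences)

kkma = Kkma()
print(verbs_per_sentence(["미나가 학교에 간다."], kkma))  # toy one-sentence input
```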
Number of particle types per sentence. As an agglutinative language with a rich particle
system, Korean allows for a relatively flexible word order through scrambling (Sohn,
1999). Within this system, particles, including case markers, play a crucial role in
sentence production and comprehension. They convey morphosyntactic and/or semantic
information about the linguistic elements, thereby helping to establish their grammatical
relations within a sentence (M. Lee, 2019). In (1), for example, the particles ka (nomina-
tive case particle), eykey (dative case particle), and lul (accusative case particle) deter-
mine the grammatical roles of the nominals to which they are attached. This explicit
information clarifies that the sentence represents a ditransitive event comprising a sub-
ject, an indirect object, and a direct object.
In addition to their linguistic significance, Korean particles are regarded as a critical fea-
ture for explaining learners’ language development, displaying a systematic development
order in production (for a review, see No, 2012). In L1 acquisition, initial production pre-
dominantly involves the nominative case particle -ka and the locative particle -ey, followed
by the use of the delimiter -to and the dative case particle -hantey, and then the accusative
case particle -lul. On the other hand, studies on L2 acquisition have primarily focused on the
considerable learning difficulties that particles pose for learners (e.g., Ahn, 2015; Ji, 2006;
Shin, 2016). For example, Ahn (2015) showed that L1-English L2 learners of Korean were
only able to identify the incorrect usage of the accusative case particle 51% of the time and
the nominative case particle 60% of the time in an acceptability judgment task.
Given the linguistic significance of particles and their role in language development, we expect the use of particles to be a significant contributor to pre-
dicting proficiency in Korean. Therefore, we chose to measure the number of particle types
per sentence as an index of L2 syntactic complexity. We specifically predict that as learners
progress toward higher levels of proficiency, they will exhibit a greater diversity of parti-
cles in their production. The particles measured by KOSCA include: (a) the nominative
case particle -i/ka (Kkma tag: JKS), (b) the accusative case particle -ul/lul (Kkma tag:
JKO), (c) the genitive case particle -uy (Kkma tag: JKG), (d) the complement particle -i/ka
(Kkma tag: JKC), (e) the adverbial particles, including the locative particle -eyse and -(u)lo
and the comparative particle -pota (Kkma tag: JKB), (f) the conjunction particle -wa/kwa
(Kkma tag: JKB), (g) the vocative marker -ya (Kkma tag: JKV), (h) the quotation particle
-lako (Kkma tag: JKQ), and (i) the auxiliary particles, including the topic marker -un/nun
and the delimiter (or focus marker) -to (Kkma tag: JX). KOSCA computes this index by
dividing the number of particle types by the number of sentences in a text/speech sample.
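A minimal sketch of this index follows; treating a “type” as a unique (form, tag) pair within a sample is our assumption about the granularity, since the exact counting scheme is not spelled out above.

```python
# Illustrative sketch: collect the unique particle types in a sample and
# divide by its sentence count. Treating a "type" as a unique (form, tag)
# pair is an assumption; KOSCA's exact granularity may differ.
from konlpy.tag import Kkma

PARTICLE_TAGS = {"JKS", "JKO", "JKG", "JKC", "JKB", "JKV", "JKQ", "JX"}

def particle_types_per_sentence(sentences, tagger):
    if not sentences:
        return 0.0
    types = {
        (morph, tag)
        for sent in sentences
        for morph, tag in tagger.pos(sent)
        if tag in PARTICLE_TAGS
    }
    return len(types) / len(sentences)

kkma = Kkma()
print(particle_types_per_sentence(["미나가 사과를 먹어요."], kkma))  # toy input
```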
The present study
This study is guided by the following research questions:
1. Does each of the KOSCA syntactic complexity indices extracted from written
and spoken learner data significantly account for proficiency in Korean?
2. How strongly does the holistic set of KOSCA indices contribute to accounting for
proficiency in Korean?
Data
This study utilized the Korean Learner Corpus (The National Institute of Korean
Language, 2020), which comprises both written and spoken samples produced by L2
learners of Korean. We selected our written data from various types of essays, such as
argumentative essays, autobiographies, explorer essays, life statements, narratives, and
travelogues, while our spoken data were obtained from interviews, which constituted the
largest proportion of the corpus. The data included in our analysis consisted of (a) written
essays from 20,223 learners, containing a total number of 284,993 sentences (M = 14.09;
SD = 5.64) and 2,234,775 eojeols (M = 110.51; SD = 48.46) and (b) oral interview sam-
ples from 1,102 learners, containing 67,125 utterances (M = 60.91; SD = 41.72) and
596,696 eojeols (M = 541.47; SD = 374.47). These data were produced by L2 learners
from diverse L1 backgrounds, including Mandarin Chinese, Japanese, Vietnamese,
English, Cantonese, Russian, Mongolian, Thai, French, Spanish, Indonesian, Swedish,
and Arabic (ordered by sample size). As L1 and essay type can influence the use of the
target language (Ortega, 2015), we factored in learners’ L1s and the essay types (written
data only) in our analysis models by including them as random effects (see Section
“Analysis”).
In the context of the project conducted by the National Institute of Korean Language
(2020), learners were given approximately 60 min to compose written descriptions on a
range of topics, such as future plans, the significance of wisdom, environmental conser-
vation and economic development, shopping, water pollution, and leisure activities. For
the interviews, learners were given approximately 10 min to articulate their opinions on
various subjects during the interview session, including daily routines, family, happiness,
hopes, life in Korea, preferred locations, and plans for the upcoming year. All spoken
responses were transcribed verbatim for analysis.
The proficiency of learners was evaluated through the Test of Proficiency in Korean
(TOPIK), a standardized test designed to assess non-native speakers’ Korean profi-
ciency (National Institute for International Education, 2023). The TOPIK test evaluates
learners’ overall receptive (listening and reading) and expressive (writing) skills in
Korean. Its proficiency level is determined based on the overall performance in all the
test skills, following the Common European Framework of Reference for Languages
(CEFR; Council of Europe, 2020), which categorizes learners into six proficiency levels,
ranging from Level 1 for low beginners (= A1 CEFR level) to Level 6 for highly advanced
learners (= C2 CEFR level). Supplemental Appendix A provides holistic descriptors of
each proficiency level in TOPIK. The distribution of learners across the six levels, along
with detailed information on the corpus data used in this study, can be found in
Supplemental Appendix B.
Analysis
The proficiency levels were transformed into continuous scores ranging from 1 to 6, in
accordance with the convention of treating ordinal variables with more than four levels
as continuous (Robitzsch, 2020). Two statistical analyses were performed for both the
written and spoken datasets to investigate the relationship between the syntactic com-
plexity indices measured by KOSCA and learner proficiency. Prior to analysis, we tested
the assumptions of normal distribution for the regression models (e.g., Gries, 2021). We
identified and removed outliers using the criterion of ± 3 standard deviations (e.g.,
Kisselev, Soyan, Pastushenkov, & Merrill, 2022) and inspected histograms and box
plots. This process resulted in the removal of 0.964% of the written data and 1.413% of
the spoken data.
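As a hedged illustration of the ±3 SD criterion (the actual screening was done with the R script shared on OSF; the column name and data here are toy):

```python
# Toy illustration of the ±3 SD outlier criterion; the authors' actual
# screening used R (script at https://osf.io/yjcze/).
import numpy as np
import pandas as pd

def remove_outliers(df, col, k=3.0):
    """Drop rows where col lies more than k standard deviations from its mean."""
    z = (df[col] - df[col].mean()) / df[col].std()
    return df[z.abs() <= k]

rng = np.random.default_rng(0)
sample = pd.DataFrame({"eojeols_per_sentence": rng.normal(8.0, 2.0, 1000)})
trimmed = remove_outliers(sample, "eojeols_per_sentence")
print(f"removed {len(sample) - len(trimmed)} of {len(sample)} rows")
```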
The first research question aimed to determine whether each index significantly con-
tributed to predicting learner proficiency. To address this question, linear mixed-effects
regression models were constructed using the lme4 package (Bates et al., 2015) in R.
These models included each of the seven syntactic complexity indices extracted from the
written and spoken data as the fixed effect, learners’ L1 as the random effect, and profi-
ciency scores as the dependent variable. For written data, essay types were included as
an additional random effect. Initially, we constructed a mixed-effects regression model
with the maximal random effects structure including random intercepts and slopes of L1s
and essay types (for the written data) for all the fixed effects. However, due to the issue
of model convergence, the random slope of L1s and the random intercept and slope of
essay types (for the written data) were removed for the models (e.g., Barr et al., 2013).
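These models were fit with lme4 in R (script on OSF); for readers working in Python, a rough statsmodels analogue of the per-index model, on toy data, would look like this:

```python
# Rough Python analogue of lmer(Proficiency ~ Index + (1 + Index | L1));
# the paper's actual models were fit with lme4 in R. Toy data only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "proficiency": rng.integers(1, 7, n).astype(float),
    "index_value": rng.normal(2.0, 0.5, n),   # e.g., relative clauses per sentence
    "L1": rng.choice(["Chinese", "Japanese", "English"], n),
})

# Random intercept and slope of index_value by L1
model = smf.mixedlm("proficiency ~ index_value", df,
                    groups=df["L1"], re_formula="~index_value").fit()
print(model.summary())
```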
The second research question aimed to ascertain the extent to which the KOSCA
indices collectively account for learner proficiency. To address this question, we con-
ducted a stepwise multiple regression analysis for both written and spoken data, includ-
ing the syntactic complexity indices as predictors. Following Kyle and Crossley (2017),
we went through several preliminary steps to select appropriate indices for inclusion in
this analysis. To control for Type I error, we set a minimum statistical relationship
between each index value and the proficiency scores at the correlation threshold of
r = .100 and at the alpha level of p = .001. Indices that did not meet these criteria were
excluded from further analyses. We also checked for multicollinearity to ensure that the
final model only included unique explanatory indices. We removed any highly correlated
indices with a variance inflation factor (VIF) value exceeding 4 (e.g., Fox, 1991).
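A hedged Python sketch of this VIF screen follows; the column names are hypothetical and the published analysis ran in R.

```python
# Hedged Python analogue of the VIF screening; the paper's analysis was run
# in R (script at https://osf.io/yjcze/), and these column names are toy.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

index_names = ["eojeols", "morphemes", "coordination",
               "relatives", "adjuncts", "verbs", "particle_types"]
rng = np.random.default_rng(4)
df = pd.DataFrame(rng.normal(size=(200, len(index_names))), columns=index_names)

X = sm.add_constant(df)                     # constant term for well-defined VIFs
vif = pd.Series(
    [variance_inflation_factor(X.to_numpy(), i) for i in range(1, X.shape[1])],
    index=index_names,
)
retained = vif[vif <= 4].index.tolist()     # drop any index with VIF > 4
print(vif.round(2))
print(retained)
```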
The indices remaining after these procedures were entered into the stepwise regres-
sion model using the Akaike information criterion (AIC) method (Akaike, 1974). The
final model was determined in a stepwise manner using the package “olsrr” in R by run-
ning multiple analyses until the model did not include any variables resulting in the sup-
pression of effect sizes. To validate the model, we conducted 10-fold cross-validation
using the package “caret” in R by dividing the dataset into 10 folds and testing nine folds
as the training set against the remaining fold as the testing set. The prediction model was
repeatedly tested on the testing set until all folds had served as the testing set. All of the
R codes that were used for our analyses are available at https://osf.io/yjcze/.
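The 10-fold cross-validation step also has a direct scikit-learn analogue (the paper used caret in R); a sketch on toy data:

```python
# Hedged sketch of the 10-fold cross-validation step; the paper used the R
# package caret, and this scikit-learn version runs on toy data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 4))                  # four surviving complexity indices
y = X @ np.array([0.5, 0.2, 0.2, 0.05]) + rng.normal(scale=0.5, size=500)

r2_scores = cross_val_score(LinearRegression(), X, y, cv=10, scoring="r2")
print(f"mean cross-validated R^2 = {r2_scores.mean():.3f}")
```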
Results
Written data
Our linear mixed-effects regression models showed that all seven indices significantly
predicted learner proficiency scores, as shown in Table 2 and Figure 1. These outcomes
suggest that as the learners’ proficiency increased, their essays displayed a rise in all
tested indices, including the number of eojeols per sentence, the number of morphemes
per sentence, the number of coordinate phrases per sentence, the number of relative
clauses per sentence, the number of adjunct clauses per sentence, the number of verbs per sentence, and the number of particle types per sentence.

Table 2. Summary of the linear mixed-effects models for the written data.

Category                 Index                                        B      95% CI          SE     t       p       R²C    R²M
Complexity of sentence   Number of eojeols per sentence               0.385  [0.366, 0.404]  0.010  39.829  < .001  0.464  0.491
                         Number of morphemes per sentence             0.194  [0.186, 0.203]  0.004  46.611  < .001  0.476  0.499
Coordination             Number of coordinate phrases per sentence    1.881  [1.714, 2.049]  0.085  22.033  < .001  0.167  0.225
Subordination            Number of relative clauses per sentence      1.726  [1.652, 1.800]  0.038  45.899  < .001  0.470  0.492
                         Number of adjunct clauses per sentence       2.156  [2.043, 2.268]  0.058  37.493  < .001  0.316  0.351
Particular structures    Number of verbs per sentence                 0.897  [0.858, 0.937]  0.020  44.479  < .001  0.374  0.407
                         Number of particle types per sentence        1.627  [1.525, 1.730]  0.052  22.940  < .001  0.336  0.367

Note. Model formula: lmer(Proficiency ~ Index + (1 + Index | L1)). CI = confidence interval; B = unstandardized coefficient; SE = standard error; R²C = conditional R²; R²M = marginal R².

Figure 1. Relationship between the mean scores of the KOSCA indices and the proficiency levels in the written data. Error bars indicate 95% confidence intervals.
Subsequently, we conducted the stepwise regression analysis to assess the contribu-
tion of the KOSCA indices as predictors of proficiency. Prior to the analysis, we elimi-
nated three indices that exceeded the VIF value of 4: the number of eojeols per sentence
(VIF = 34.781), the number of morphemes per sentence (VIF = 42.021), and the number
of verbs per sentence (VIF = 9.778). The remaining four indices (the number of coordi-
nate phrases per sentence, the number of relative clauses per sentence, the number of
adjunct clauses per sentence, and the number of particle types per sentence) were entered
into a stepwise regression model. The final model was statistically significant, F(4,
19628) = 6811.209, p < .001, as summarized in Table 3. This model accounted for 56.0% (R² = 0.560) of the variance in the proficiency scores, with all four indices identified as significant predictors. A follow-up 10-fold cross-validation showed that the constructed model explained 56% of the variance in the proficiency scores (R² = 0.560), confirming
the stability of the stepwise regression model across the dataset. Among the four indices
included in the final model, the number of relative clauses per sentence exhibited the
highest standardized coefficient (β), indicating its strongest contribution to the prediction model.

Table 3. Summary of stepwise multiple regression model for the written data.

Entry  Predictors included                         R²     Adjusted R²  B      SE     β
1      Number of relative clauses per sentence     0.512  0.512        1.229  0.019  0.474
2      Number of adjunct clauses per sentence      0.542  0.542        0.671  0.024  0.175
3      Number of particle types per sentence       0.559  0.559        0.511  0.019  0.179
4      Number of coordinate phrases per sentence   0.560  0.560        0.124  0.027  0.027

Note. B = unstandardized coefficient; SE = standard error; β = standardized coefficient.
To summarize, the mixed-effects regression analyses conducted on the written data
demonstrated that all syntactic complexity indices made a significant contribution to
predicting proficiency scores. The stepwise regression analysis indicated that the model
including four predictors significantly predicted proficiency, with the number of relative
clauses per sentence being the most important factor in the model.
Spoken data
The linear mixed-effects regression analyses using the spoken data identified all seven
indices as significant predictors of learner proficiency scores (see Table 4 and Figure 2).
This result suggests that the syntactic complexity values for all seven indices increased
in accordance with the learner proficiency, which is reminiscent of the outcomes obtained
from the analysis of the written data.
Before conducting the stepwise multiple regression analysis on the spoken data,
we removed the indices that caused a multicollinearity issue, which included the num-
ber of eojeols per sentence (VIF = 27.632), the number of morphemes per sentence
(VIF = 45.842), and the number of verbs per sentence (VIF = 7.284).

Table 4. Summary of the linear mixed-effects models for the spoken data.

Category                 Index                                        B      95% CI          SE     t       p       R²C    R²M
Complexity of sentence   Number of eojeols per sentence               0.258  [0.222, 0.294]  0.018  13.984  < .001  0.307  0.402
                         Number of morphemes per sentence             0.166  [0.147, 0.186]  0.010  16.729  < .001  0.357  0.477
Coordination             Number of coordinate phrases per sentence    2.555  [2.077, 3.034]  0.244  10.458  < .001  0.247  0.332
Subordination            Number of relative clauses per sentence      2.965  [2.584, 3.345]  0.194  15.258  < .001  0.512  0.592
                         Number of adjunct clauses per sentence       2.701  [2.317, 3.086]  0.196  13.776  < .001  0.364  0.429
Particular structures    Number of verbs per sentence                 0.738  [0.650, 0.826]  0.045  16.454  < .001  0.294  0.349
                         Number of particle types per sentence        1.703  [1.291, 2.116]  0.210  8.098   < .001  0.198  0.349

Note. Model formula: lmer(Proficiency ~ Index + (1 + Index | L1)). CI = confidence interval; B = unstandardized coefficient; SE = standard error; R²C = conditional R²; R²M = marginal R².

Figure 2. Relationship between the mean scores of the KOSCA indices and the proficiency levels in the spoken data. Error bars indicate 95% confidence intervals.

Consequently,
the remaining four indices, including the number of relative clauses per sentence, the
number of adjunct clauses per sentence, the number of coordinate phrases per sen-
tence, and the number of particle types per sentence were entered into the stepwise
regression model. The final model was statistically significant, F(3, 1053) = 414.293,
p < .001, as shown in Table 5. This model consisted of three variables and explained 54.1% (R² = 0.541) of the total variance in the proficiency scores. A cross-validated model explained 54.4% of the total variance (R² = 0.544), confirming the stability of the final model across the dataset. Among the three indices, the number of relative clauses per sentence had the strongest contribution to the model, followed by the number of adjunct clauses per sentence, and then by the number of coordinate phrases per sentence.

Table 5. Summary of stepwise multiple regression model for the spoken data.

Entry  Predictors included                         R²     Adjusted R²  B      SE     β
1      Number of relative clauses per sentence     0.506  0.505        1.997  0.126  0.503
2      Number of adjunct clauses per sentence      0.539  0.538        1.030  0.143  0.227
3      Number of coordinate phrases per sentence   0.541  0.540        0.386  0.162  0.072

Note. B = unstandardized coefficient; SE = standard error; β = standardized coefficient.
In summary, all indices measured using the spoken data were found to be significant
in predicting learner proficiency, consistent with the result obtained from the written
data. Furthermore, the stepwise regression analysis showed that the model consisting of
three predictors significantly accounted for proficiency, with the number of relative
clauses per sentence showing the strongest explanatory power.
Discussion
Individual contributions of the syntactic complexity indices to predicting
proficiency
The results of the linear mixed-effects regression analyses indicated that all the proposed
indices reliably predicted learner proficiency for both written and spoken data. It is
noteworthy that these indices showed effects of similar magnitude in the two modalities,
suggesting that KOSCA can effectively measure syntactic complexity in both written
and spoken production.
Our study aligns with corpus-based literature on L2 English (e.g., Biber et al., 2011;
Kyle & Crossley, 2017; Lu, 2011), as we observed that the indices identified in this lit-
erature also served as significant predictors of L2 Korean proficiency. They include the
number of morphemes per sentence, the number of coordinate phrases per sentence, the
number of relative clauses per sentence, the number of adjunct clauses per sentence, and
the number of verbs per sentence. The consistent findings across English and Korean
suggest a cross-linguistic uniformity in the capacity of these indices to explain
proficiency in each language. These results raise an important question regarding what
specific aspect of these indices in English and Korean has led to their universal impact
on predicting proficiency. Notably, the indices under discussion focus on measuring fea-
tures that are present in both languages, such as separable elements like morphemes and
common syntactic features like coordinate phrases, relative clauses, adjunct clauses, and
verbs. This observation suggests that when two languages share parallel linguistic prop-
erties, indices based on those properties are likely to operate similarly in their capacity to
predict learner proficiency across the languages. To establish the generalizability of this
possibility, further studies should explore the effects of diverse syntactic complexity
indices in a wider range of languages to enhance our understanding of language-inde-
pendent measures.
We also identified the indices unique to Korean, specifically the number of eojeols per
sentence and the number of particle types per sentence, as significant predictors of
learner proficiency. It is noteworthy that the eojeol-related index was as effective as the
morpheme-based index (i.e., number of morphemes per sentence) in accounting for pro-
ficiency, underscoring the validity of both Korean-specific and language-independent
indices in explaining learner proficiency. Furthermore, higher-proficiency learners pro-
duced a greater variety of particle types per sentence in their production. Despite the
well-known challenges posed by particles for L2 learners during comprehension (e.g.,
Ahn, 2015; Ji, 2006; Shin, 2016), their roles in production, particularly in different
modes of production, have remained largely unknown. Therefore, our study represents
the first demonstration that the diversity of particles can serve as a crucial factor in
explaining Korean proficiency in both written and spoken modalities.
Moreover, as discussed in “Index selection,” the current study adopted the sentence-
based analysis, rather than the T-unit-based analysis. Our results demonstrate the validity
of using the sentence as the fundamental syntactic unit for automated analysis.
Consequently, they lend support to the perspective that complexity analysis based on real
linguistic structures can offer more meaningful insights into L2 development (Biber
et al., 2020).
Holistic contributions of the syntactic complexity indices to predicting
proficiency
Our stepwise regression analyses produced significant prediction models with substan-
tial effect sizes for both written and speech data. In the written data, the final model,
which included four predictors—that is, the number of relative clauses per sentence, the
number of adjunct clauses per sentence, the number of particle types per sentence, and
the number of coordinate phrases per sentence—explained 56.0% of the variance in
learner proficiency. Similarly, the final model for the speech data explained 54.1% of the
variance in proficiency. As with the model for the written data, the number of relative
clauses per sentence was the most contributive predictor in this model, consolidating its
pivotal role as a crucial indicator of L2 Korean development. These results also align
with previous observations that relative clauses in Korean are particularly vulnerable for
L2 learners (e.g., O’Grady et al., 2003). Additionally, similar to the written model, the
spoken model identified the number of adjunct clauses per sentence and the number of
coordinate phrases per sentence as significant predictors. The substantial predictive
power of these indices suggests that learners, as their proficiency increased, exhibited an
improved ability to structure their production by effectively connecting ideas through
different types of clauses and phrases (see also Jiang et al., 2019).
Crucially, we found both consistent and inconsistent results when compared to prior
research on L1 English production regarding the significant indices for the written and
spoken models. For instance, in the analyses of English written and spoken production,
Biber et al. (2011) demonstrated that the complexity of noun phrases was the primary
characteristic of writing, while the number of finite dependent clauses, such as the number
of subordinate clauses, best characterized speaking. They explained this discrepancy
based on Halliday’s (1989) descriptions of spoken and written language wherein “the
complexity of written language is lexical, while that of spoken language is grammatical”
(p. 63). In line with this explanation, we observed the number of particle types per sen-
tence to be a robust predictor for the written model in our study. As particles rely on noun
phrases, the increased number of particle types in higher proficiency levels may be associ-
ated with lexical features. However, unlike Biber et al.’s (2011) study, where clausal fea-
tures were identified as a characteristic of spoken language, we found that these features
emerged as significant indices for both the written and spoken models. These divergences
could be attributed to cross-linguistic differences between English and Korean. Korean
allows for easy coordination and adjunction by appending a marker at the end of the
phrase/clause (e.g., na-nun kongpwuha-myen ca-yo I-topic marker study-if sleep “if I
study, I sleep”), resulting in lower cognitive effort needed to sustain speech (as well as
writing). If this conjecture holds true, we anticipate that the production of Japanese, which
shares similar characteristics with Korean, may exhibit a comparable pattern to our study.
Overall, our results suggest the modality effect in L2 Korean production. This effect
can be attributed to the distinct characteristics of written and spoken modes: Whereas the
former entails an iterative procedure enabling learners to plan, organize, and rework their
outcome, the latter necessitates a linear process in real time (Kellogg, 1996; Levelt,
1989). The greater cognitive resources permitted in the written mode may have facili-
tated our learners in producing a more diverse set of particle types in their output. To
ensure the generalizability of the present discussion, future studies should explore the
modality effects and test the most powerful syntactic predictors of learner proficiency
across a wider range of languages.
Implications for assessment and teaching
The effectiveness and validity of KOSCA can make a valuable contribution to the field
of L2 Korean assessment and education. To our knowledge, there are no publicly avail-
able GUI applications for evaluating syntactic features in Korean. This gap has posed a challenge for research and education in Korean, where complexity indices have long been calculated manually rather than computed automatically (e.g., Y.
Kim et al., 2016; Seo, 2009). Given the laborious process of manual coding, which can
limit the scope of research and may force researchers to focus primarily on a limited
number of easily calculated indices, the development of KOSCA can be regarded as a
crucial step in addressing this issue. Furthermore, the application provides a valuable
resource for researchers interested in measuring the component of syntactic complexity
in both writing and speaking modalities in Korean.
KOSCA also holds the potential for assisting L2 educators in their assessment and
curriculum development. The tool can be utilized by teachers to assess their students’
linguistic performance and language development, specifically focusing on their sen-
tence-level production abilities (for automated assessment of global learner proficiency,
see Gaillat et al., 2022). For instance, teachers can gauge their students’ current syntactic
proficiency levels by comparing their syntactic complexity in production, as measured
by KOSCA, against the syntactic complexity values reported in this study for various
proficiency levels. Also, teachers can use the tool to track the syntactic growth patterns
in their students’ output. Moreover, KOSCA can help identify specific syntactic elements
that students may underuse or struggle with. This information can serve as a useful
guideline for teachers who are dedicated to designing teaching materials and organizing
instruction based on their students’ developmental levels and needs.
While KOSCA seems promising, we acknowledge that our study did not directly
examine its effects on automated assessment or education. Further empirical research
on this topic can help us better understand how the tool can be effectively utilized in
practice.
Limitations and future directions
We recognize some limitations that should be addressed in future research. One limita-
tion pertains to the discrepancy in the dataset sizes between our written and spoken data,
along with the comparatively lower tagging accuracy of KOSCA for the spoken dataset.
To address these concerns, further research should include a balanced amount of data
from both modalities to enhance the reliability of our findings, while improving tagging
accuracy for speech data.
Another limitation concerns the limited range of genres explored in our study, without
controlling for potentially influential factors, such as topic. Extensive research on L2
production has provided ample evidence demonstrating the significant roles of various
task-related and participant-related factors, including genre, topic, learners’ L1, and gen-
der (e.g., Alexopoulou et al., 2017). Therefore, future investigations should consider
incorporating these variables to increase the validity of the KOSCA indices across a
broader range of production contexts with diverse populations.
Furthermore, the KOSCA indices focus solely on features at the syntactic level while
disregarding indices at other levels, such as phonology, morphology, and lexicon. In
addition, our indices mostly rely on research conducted in English, leading to an insuf-
ficient exploration of Korean-specific features, such as honorific markers, topic markers,
and sentence enders. Complexity is a multifaceted construct that differs across linguistic
levels and languages, with different languages conveying information in distinct ways.
Therefore, to increase the accuracy and validity of our application, it is crucial to include
indices for various linguistic levels, including Korean-specific ones, to measure more
comprehensive aspects of L2 proficiency. Also, by refining and expanding our indices
using the notion of positive and negative criteria features (Hawkins & Buttery, 2010), we
may further identify nuanced linguistic properties that are indicative of each learner’s
proficiency level, which is expected to enrich our understanding of L2 Korean
development.
Lastly, while we utilized the TOPIK test as a proficiency measure, which evaluates
the learners’ listening, reading, and writing abilities, it does not directly measure speak-
ing skills or syntactic complexity in learner production. Therefore, it is essential to con-
duct an independent assessment of speaking ability and incorporate measures of syntactic
complexity in the proficiency assessment to scrutinize the precise relationship between
language proficiency and the syntactic indices derived from both written and spoken
data.
Conclusion
Through an analysis of a learner corpus in the current study, we found that all of the
syntactic complexity indices measured by KOSCA, from both written and spoken data,
significantly predicted learner proficiency in Korean. Our analyses further showed that
the stepwise regression models for the written and spoken data included slightly different
indices as predictors for learner proficiency, indicating the influence of modality on
learner production. All in all, these findings highlight the reliability of the syntactic com-
plexity indices measured by KOSCA as indicators of L2 Korean development. The appli-
cation and findings presented in this paper are expected to be useful for future research,
assessment, and pedagogical practice.
Acknowledgements
We sincerely appreciate the anonymous reviewers and editors for their valuable feedback on this
paper.
Author contribution(s)
Haerim Hwang: Conceptualization; Data curation; Methodology; Software; Visualization;
Writing—original draft; Writing—review & editing.
Hyunwoo Kim: Conceptualization; Investigation; Methodology; Resources; Validation;
Writing—review & editing.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship,
and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this
article.
ORCID iDs
Haerim Hwang https://orcid.org/0000-0003-0888-590X
Hyunwoo Kim https://orcid.org/0000-0003-4810-6333
Open Practice
This article has received badges for Open Data and Open Materials. More information about the
Open Practices badges can be found at https://osf.io/tvyxz/wiki/home/.
Supplemental material
Materials for this paper are publicly available via the Open Science Framework (OSF; Hwang,
2023b) and in Supplemental Appendixes A and B at the following link: sj-pdf-1-ltj-10.1177_02655322231222596.pdf. In addition, an Accessible Summary of this research,
entitled “Korean Syntactic Complexity Analyzer (KOSCA): An NLP application for the analysis
of syntactic complexity in second language production,” is available on the Open Accessible
Summaries in Language Studies (OASIS) database (Hwang & Kim, 2024).
References
Ahn, H. (2015). Second language acquisition of Korean case by learners with different first lan-
guages [Doctoral dissertation, University of Washington]. ResearchWorks Archive. https://
digital.lib.washington.edu/researchworks/handle/1773/34000
Ai, H., & Lu, X. (2013). A corpus-based comparison of syntactic complexity in NNS and NS uni-
versity students’ writing. In A. Díaz-Negrillo, N. Ballier & P. Thompson (Eds.), Automatic
treatment and analysis of learner corpus data (pp. 249–264). John Benjamins. http://doi.org/10.1075/scl.59.15ai
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on
Automatic Control, 19, 716–723. http://doi.org/10.1109/TAC.1974.1100705
Alexopoulou, T., Michel, M., Murakami, A., & Meurers, D. (2017). Task effects on linguistic
complexity and accuracy: A large-scale learner corpus analysis employing natural language
processing techniques. Language Learning, 67, 180–208. https://doi.org/10.1111/lang.12232
Bardovi-Harlig, K. (1992). A second look at T-unit analysis: Reconsidering the sentence. TESOL
Quarterly, 26, 390–395. http://doi.org/10.2307/3587016
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirma-
tory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68, 255–278.
http://doi.org/10.1016/j.jml.2012.11.001
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models
using lme4. Journal of Statistical Software, 67, 1–48. http://doi.org/10.18637/jss.v067.i01
Biber, D., Gray, B., & Poonpon, K. (2011). Should we use characteristics of conversation to meas-
ure grammatical complexity in L2 writing development? TESOL Quarterly, 45, 5–35. http://doi.org/10.5054/tq.2011.244483
Biber, D., Gray, B., Staples, S., & Egbert, J. (2020). Investigating grammatical complexity in l2
English writing research: Linguistic description versus predictive measurement. Journal of
English for Academic Purposes, 46, 100869. https://doi.org/10.1016/j.jeap.2020.100869
Brown, R. (1973). A first language: The early stages. Harvard University Press. http://doi.org/10.4159/harvard.9780674732469
Council of Europe. (2020). Common European framework of reference for languages: Learning,
teaching, assessment. Council of Europe Publishing. https://rm.coe.int/common-european-
framework-of-reference-for-languages-learning-teaching/16809ea0d4
Crossley, S. A. (2020). Linguistic features in writing quality and development: An overview.
Journal of Writing Research, 11, 415–443. https://doi.org/10.17239/jowr-2020.11.03.01
Fox, J. (1991). Regression diagnostics. Sage.
François, T., & Fairon, C. (2012, July). An “AI readability” formula for French as a foreign
language. In J. Tsujii, J. Henderson, & M. Paşca (Eds.), Proceedings of the 2012 Joint
Conference on Empirical Methods in Natural Language Processing and Computational
Natural Language Learning (pp. 466–477). Association for Computational Linguistics.
https://aclanthology.org/D12-1043
Gaillat, T., Simpkin, A., Ballier, N., Stearns, B., Sousa, A., Bouyé, M., & Zarrouk, M. (2022).
Predicting CEFR levels in learners of English: The use of microsystem criterial fea-
tures in a machine learning approach. ReCALL, 34, 130–146. https://doi.org/10.1017/
S095834402100029X
Georgila, K., Lemon, O., Henderson, J., & Moore, J. D. (2009). Automatic annotation of context
and speech acts for dialogue corpora. Natural Language Engineering, 15, 315–353. http://
doi.org/10.1017/S1351324909005105
Gries, S. T. (2021). (Generalized linear) mixed-effects modeling: A learner corpus example.
Language Learning, 71, 757–798. http://doi.org/10.1111/lang.12448
Halliday, M. A. K. (1989). Spoken and written language. Oxford University Press.
Hawkins, J. A., & Buttery, P. (2010). Criterial features in learner corpora: Theory and illustrations.
English Profile Journal, 1, Article e5. https://doi.org/10.1017/S2041536210000103
Housen, A., & Kuiken, F. (2009). Complexity, accuracy, and fluency in second language acquisi-
tion. Applied Linguistics, 30, 461–473. https://doi.org/10.1093/applin/amp048
Hwang, H. (2023a, November 21). Korean Syntactic Complexity Analyzer. https://haerimhwang.
github.io/tools/Korean-syntactic-complexity-analyzer
Hwang, H. (2023b, November 21). Korean Syntactic Complexity Analyzer (KOSCA): An NLP
application for the analysis of syntactic complexity in second language production. https://
doi.org/10.17605/OSF.IO/YJCZE
Hwang, H., Jung, H., & Kim, H. (2020). Effects of written versus spoken production modalities on
syntactic complexity measures in beginning-level child EFL learners. The Modern Language
Journal, 104, 267–283. http://doi.org/10.1111/modl.12626
Hwang, H., & Kim, H. (2024). An automated tool for assessing sentence complexity in second
language Korean production. OASIS Summary of Hwang, H., & Kim, H. (2024) in Language
Testing. https://oasis-database.org/
Iwashita, N., Brown, A., McNamara, T., & O’Hagan, S. (2008). Assessed levels of second language
speaking proficiency: How distinct? Applied Linguistics, 29, 24–49. http://doi.org/10.1093/
applin/amm017
Ji, H. S. (2006). The error analysis of spoken grammar in Korean interview tests. Journal of Korean
Language Education, 17, 301–323. http://uci.kci.go.kr/resolution/result.do?res_cd=G704-
000597.2006.17.3.001
Jiang, J., Bi, P., & Liu, H. (2019). Syntactic complexity development in the writings of EFL learn-
ers: Insights from a dependency syntactically-annotated corpus. Journal of Second Language
Writing, 46, 100666. https://doi.org/10.1016/j.jslw.2019.100666
Kellogg, R. T. (1996). A model of working memory in writing. In C. M. Levy & S. Ransdell
(Eds.), The science of writing: Theories, methods, individual differences and applications
(pp. 57–71). Lawrence Erlbaum Associates.
Kim, H., & Hwang, H. (2022). Assessing verb-construction integration in young learners of English
as a foreign language. Language Learning, 72, 497–533. http://doi.org/10.1111/lang.12480
Kim, Y., Nam, J., & Lee, S.-Y. (2016). Correlation of proficiency with complexity, accuracy, and
fluency in spoken and written production: Evidence from L2 Korean. Journal of the National
Council of Less Commonly Taught Languages, 19, 147–181. https://doaj.org/article/5b33f84
d53354f8d8ac35353aa2c107e
Kisselev, O., Klimov, A., & Kopotev, M. (2022). Syntactic complexity measures as linguistic
correlates of proficiency level in learner Russian. In A. Leńko-Szymańska & S. Götz (Eds.),
Complexity, accuracy and fluency in learner corpus research (pp. 51–80). John Benjamins.
https://doi.org/10.1075/scl.104.03kis
Kisselev, O., Soyan, R., Pastushenkov, D., & Merrill, J. (2022). Measuring writing development
and proficiency gains using indices of lexical and syntactic complexity: Evidence from lon-
gitudinal Russian learner corpus data. The Modern Language Journal, 106, 798–817. https://
doi.org/10.1111/modl.12808
Kivy. (2023). Kivy: The open source Python app development framework. https://kivy.org/
Kyle, K. (2021). Natural language processing for learner corpus research. International Journal of
Learner Corpus Research, 7, 1–16. http://doi.org/10.1075/ijlcr.00019.int
Kyle, K., & Crossley, S. A. (2017). Assessing syntactic sophistication in L2 writing: A usage-
based approach. Language Testing, 34, 513–535. http://doi.org/10.1177/0265532217712554
Kyle, K., Crossley, S. A., & Berger, C. (2018). The tool for the automatic analysis of lexical
sophistication (TAALES): Version 2.0. Behavior Research Methods, 50, 1030–1046. http://
doi.org/10.3758/s13428-017-0924-4
Lambert, C., & Nakamura, S. (2019). Proficiency-related variation in syntactic complexity: A
study of English L1 and L2 oral descriptive discourse. International Journal of Applied
Linguistics, 29, 248–264. http://doi.org/10.1111/ijal.12224
Lee, D. J., Yeon, J. H., Hwang, I. B., & Lee, S. G. (2010). KKMA: A tool for utilizing Sejong
corpus based on relational database. The KIISE Transactions on Computing Practice, 16,
1046–10. http://kkma.snu.ac.kr/
Lee, G. G., Cha, J., & Lee, J. H. (2002). Syllable-pattern-based unknown-morpheme segmentation
and estimation for hybrid part-of-speech tagging of Korean. Computational Linguistics, 28,
53–70. http://doi.org/10.1162/089120102317341774
Lee, M. (2019). Effects of case-marking on the anticipatory processing of Korean sentences.
Journal of Cognitive Science, 20, 339–364. https://doi.org/10.17791/jcs.2019.20.3.339
Levelt, W. (1989). Speaking: From intention to articulation. MIT Press.
Lu, X. (2011). A corpus-based evaluation of syntactic complexity measures as indices of col-
lege-level ESL writers’ language development. TESOL Quarterly, 45, 36–62. https://doi.
org/10.5054/tq.2011.240859
McNamara, D. S., Louwerse, M. M., McCarthy, P. M., & Graesser, A. C. (2010). Coh-Metrix:
Capturing linguistic features of cohesion. Discourse Processes, 47, 292–330. http://doi.
org/10.1080/01638530902959943
Min, M., Lee, J. J., & Lee, K. (2022). Detecting illegal online gambling (IOG) services in the
mobile environment. Security and Communication Networks, 2022, 3286623. https://doi.
org/10.1155/2022/3286623
National Institute for International Education. (2023). Test of Proficiency in Korean. https://www.
topik.go.kr/
National Institute of Korean Language. (2020). The Korean Learner Corpus. https://kcorpus.
korean.go.kr/
No, G. (2012). Acquisition of case markers and grammatical functions. In C. Lee (Ed.), The hand-
book of East Asian psycholinguistics (pp. 50–62). Cambridge University Press.
Norris, J. M., & Ortega, L. (2009). Towards an organic approach to investigating CAF in instructed
SLA: The case of complexity. Applied Linguistics, 30, 555–578. https://doi.org/10.1093/
applin/amp044
O’Grady, W., Lee, M., & Choo, M. (2003). A subject-object asymmetry in the acquisition of rela-
tive clauses in Korean as a second language. Studies in Second Language Acquisition, 25,
433–448. http://doi.org/10.1017/S0272263103000172
Ortega, L. (2003). Syntactic complexity measures and their relationship to L2 proficiency: A
research synthesis of college-level L2 writing. Applied Linguistics, 24, 492–518. http://doi.
org/10.1093/applin/24.4.492
Ortega, L. (2015). Syntactic complexity in L2 writing: Progress and expansion. Journal of Second
Language Writing, 29, 82–94. http://doi.org/10.1016/j.jslw.2015.06.008
Park, E. (2014). KoNLPy: Korean natural language processing in Python. In E. L. Park & S.
Cho (Eds.), Proceedings of the 26th Annual Conference on Human & Cognitive Language
Technology (pp. 133–136). Korean Institute of Information Scientists and Engineers, The
Korean Society for Cognitive Science.
Pinker, S. (1989). Learnability and cognition: The acquisition of argument structure. MIT Press.
http://doi.org/10.7551/mitpress/9700.001.0001
Robitzsch, A. (2020). Why ordinal variables can (almost) always be treated as continuous variables:
Clarifying assumptions of robust continuous and ordinal factor analysis estimation methods.
Frontiers in Education, 5, Article 589965. https://doi.org/10.3389/feduc.2020.589965
Seo, S. J. (2009). Study on the interlanguage development of Korean language learners by syntac-
tic proficiency assessment: Based on analyzing syntactic features of writings [Master’s the-
sis, Yonsei University]. Yonsei University Library. https://library.yonsei.ac.kr/search/detail/
CATTOT000000732514
Shin, S. C. (2016). English L1-Korean L2 learners’ cognitive knowledge and difficulty of gram-
matical error items. Language Facts and Perspectives, 37, 173–201.
Sohn, H. M. (1999). The Korean language. Cambridge University Press.
Vu, D. T., Yu, G., Lee, C., & Kim, J. (2022). Text data augmentation for the Korean language.
Applied Sciences, 12(7), 3425. https://doi.org/10.3390/app12073425
Weiss, Z., Chen, X., & Meurers, D. (2021). Using broad linguistic complexity modeling for
crosslingual readability assessment. In D. Alfter, E. Volodina, I. Pilan, J. Graën, & L. Borin
(Eds.), Proceedings of the 10th Workshop on Natural Language Processing for Computer
Assisted Language Learning (NLP4CALL 2021) (pp. 38–54). LiU Electronic Press. https://
aclanthology.org/2021.nlp4call-1.4
Wolfe-Quintero, K., Inagaki, S., & Kim, H.-Y. (1998). Second language development in writing:
Measures of fluency, accuracy & complexity. University of Hawaii Press.