Journal of Quantitative Linguistics (J QUANT LINGUIST)

Description

This journal is the only refereed publication devoted exclusively to Quantitative Linguistics and its growing international readership.

  • Impact factor
    0.33
  • 5-year impact
    0.00
  • Cited half-life
    0.00
  • Immediacy index
    0.00
  • Eigenfactor
    0.00
  • Article influence
    0.00
  • Website
    Journal of Quantitative Linguistics website
  • Other titles
    Journal of quantitative linguistics (Online)
  • ISSN
    1744-5035
  • OCLC
    42679044
  • Material type
    Document, Periodical, Internet resource
  • Document type
    Internet Resource, Computer File, Journal / Magazine / Newspaper

Publications in this journal

  • ABSTRACT: The incidence of different components of language in natural language texts is not arbitrary but tends to obey particular laws, which enable us to explain characteristic features of human language. The present paper is an attempt to analyse and model the pattern of occurrence of words in the Hindi language. Various kinds of corpora have been selected from different sources for the study, and the occurrence of words in these corpora has been observed for a variety of properties, such as frequencies, vocabulary measures, and the pattern of initials of words relative to the subsequent matra.
    Journal of Quantitative Linguistics 02/2013; 20(1):1-12.
  • ABSTRACT: This study investigated latent coherence and discrepancy between listening and reading comprehension. A total of 460 Taiwanese children in the first or second grade participated in this study. Each child was assessed by test materials that contained both spoken and written Chinese tests. Specifically, multiple categorical latent variables (MCLV) models were proposed to assess latent coherence and discrepancy between Mandarin listening and Chinese reading comprehension. Although notable coherent relations between the two abilities were found, results of this study also indicated that the discrepancy between them should not be ignored. It was concluded that coherence and discrepancy between Mandarin listening skills and Chinese reading performance were quantitatively assessable by the proposed methods. In addition, the discrepancy between the two abilities could be used to assess reading disabilities. An empirical demonstration showed that the assessment of Chinese reading disabilities was practically feasible.
    Journal of Quantitative Linguistics 01/2013;
  • ABSTRACT: Rongorongo, the undeciphered writing system of Rapanui (Easter Island), has received a lot of attention in the last 12 months, with new studies tackling the “Mamari” (see Horley, 2009a; Melka, 2010a) and the “Keiti” tablets (see Horley, 2010; Wieczorek, 2011). The “Mamari” section is a potential “lunar series” (see Barthel, 1958a; Barthel, 1971, p. 1183; Guy, 1990); however, more work is needed to ascertain whether “Keiti” reflects to some extent the same genre. In this study we look to other inscriptions in the corpus for the possible presence of the îka and timo genres (see Routledge, 1919; Fischer, 1997). After a review of the ethnographic data, in combination with a statistical analysis, we propose that one group of tablets that may reflect these genres comprises Gv, Ia, and Ta. The analysis focuses on a number of identified sequences that show examples of glyph /700/ across the rongorongo corpus. A mixed-methods approach has been adopted since it has the potential to coalesce advantages in terms of ethnography, text analysis and statistics. This is especially true when one has a lot of factors to consider, and errors tend to build up in the company of the unidentified data, of possibly contaminated folkloric and fragmented informants' material, of abstruse glyphic combinations, and of an imperfect system of transliteration (see Guy, 2006, p. 53).
    Journal of Quantitative Linguistics 05/2011; 18(2):122-173.
  • Journal of Quantitative Linguistics 01/2011;
  • ABSTRACT: We use rank-frequency analysis to determine the kernel vocabulary size within specific corpora of Ukrainian. The high-rank behavior is extrapolated to estimate the total vocabulary size. The entropy has been calculated for different functional styles.
    Journal of Quantitative Linguistics 08/2010; 11(3):161-171.
  • ABSTRACT: The entire, long history of linguistics is characterised by one type of reductionism: the context of the analysed language formants is mostly reduced to sentences and/or to lower units which are constituents of sentences. The extraction of analysed language objects from their broader connections in texts was one of the ways to make language description less complicated. The notion of ‘text’ did not occur among the analytical instruments used for descriptive aims, and even in stylistics it was applied as a free, non-specified term. This is quite understandable: the task of linguistics (or better: philology) was and still is to offer such a sorting of language units as can serve for the cultivation of mother languages and for learning foreign languages. Regardless of all the new developments in language cognition, these aims, in our opinion, will remain valid in the future for specialists dealing with languages, and classical approaches with their classificatory achievements will continue to be a significant part of this branch of intellectual activity.
    Journal of Quantitative Linguistics 08/2010; 6(1):41-45.
  • ABSTRACT: The aim of this paper is to report for the first time the 1000 most common words and lemmas of Modern Greek and some of their quantitative characteristics. The frequency word list produced is based on the Hellenic National Corpus (HNC), a corpus of Modern Greek language consisting of about 13 million words of written texts. In particular, we investigate the application of Zipf’s law in both the 1000 most common words and lemmas. In addition we examine the frequency distribution of the grammatical categories in the 1000 most common words and lemmas as well as the average word length in the whole HNC and the growth of the average word length as a function of the number of the most common words.
    Journal of Quantitative Linguistics 08/2010; 8(3):175-185.
  • ABSTRACT: Some statistical characteristics of Korean texts are analyzed by experiments on large corpora. We obtain the number of occurrences of syllables and of words in Korean texts. The entropy of syllables is estimated using a finite-context model. Digram and trigram entropies of syllables are also estimated. The entropy of words is estimated using the same model. We examine how well Korean text obeys the well-known Zipf’s law. Two mathematical models are constructed by modifying the Mandelbrot distribution and are simulated for Korean texts. The coefficient B in the Mandelbrot distribution is determined for our models by experiment. We compare Zipf’s law in Korean text with that in English and in French. According to Mandelbrot, B > 1 in all the usual cases; however, we obtain B < 1 in some range of the rank-frequency distribution of Korean text. We also verified that the coefficient B depends not on the kind or size of the corpus but on the language.
    Journal of Quantitative Linguistics 08/2010; 7(1):19-30.
  • ABSTRACT: This paper describes and explains some regularities in the frequency of numbers in text. An analysis of number frequencies in text corpora in Dutch, English, German, and French confirms the expectation that frequency is highly dependent on two factors: magnitude and roundness. Roundness (defined as number frequency in an approximation context) proves to be related to three arithmetical properties: ‘10-ness’, ‘2-ness’, and ‘5-ness’. In predicting the frequency of numbers irrespective of their context ‘2½-ness’ should be added to these factors, as is suggested in the work of Sigurd (1988). The role of the four number characteristics found in this study can be explained by the preference of the language user for using base numbers, and for doubling and halving quantities.
    Journal of Quantitative Linguistics 08/2010; 8(3):187-201.
  • ABSTRACT: The problem of disputed authorship resolution is solved here by the formal analysis of texts. The method of analysis is based on a Markov model for the sequence of letters in text. We assume that the frequencies of letter pairs are very specific to an author. This assumption is checked in a large statistical experiment carried out on 386 text samples (stories, novels, and their combination) from stories and novels of 82 Russian fiction writers.
    Journal of Quantitative Linguistics 08/2010; 7(3):201-207.
  • ABSTRACT: This paper attempts to show that the formalism of quantum mechanics can be successfully applied to language as a (self-organising) system in discrete state spaces. It is shown that the typical ‘long tails’ of frequency distributions correspond to ‘high energy’ states, which has to be taken into account as a necessary condition for stabilising the distribution.
    Journal of Quantitative Linguistics 08/2010; 9(2):125-185.
  • ABSTRACT: Recently, statistical models for the identification of word senses in English text have been suggested, such as Latent Semantic Analysis (LSA), which is based on dimensionality reduction. While this approach has yielded promising results, it makes many assumptions about the underlying semantic structure. In this paper, the goal is to use cluster analysis to group word senses objectively on the basis of their co-occurrence with other words. This method does not make any a priori assumptions about the group to which a case might be assigned: it is an arbitrary classification made on the basis of a specific number of groups, which are then classified on the basis of their metric distance from one another in a high-dimensional space. The results of classifying two senses of the word BANK indicate high classification accuracy for primary word senses, but poor classification accuracy for secondary word senses. A role for using cluster analysis to determine highly discriminating items in text is discussed.
    Journal of Quantitative Linguistics 08/2010; 9:77-86 (April 2002).
  • Journal of Quantitative Linguistics 08/2010; 6:269-270 (December 1999).
  • ABSTRACT: The article presents a collection of linguistic hypotheses from the work of G. K. Zipf (1902–1950) and re-interprets them as belonging to a systemic conception of language. Frequency of linguistic units as the central concept is linked with the units’ size, age, polylexy, semantic specificity and degree of crystallisation by self-regulative mechanisms which are functional with regard to certain system requirements. The economic constituency principle affects the characteristics of linguistic units and the order parameters of linguistic levels.
    Journal of Quantitative Linguistics 08/2010; 6:78-84 (April 1999).
  • ABSTRACT: Phonological distance can be measured computationally using formally specified algorithms. This work investigates two such measures, one developed by Nerbonne and Heeringa (1997) based on Levenshtein distance (Levenshtein, 1965) and the other an adaptation of Dunning's (1994) language classifier that uses maximum likelihood distance. These two measures are compared against naïve transcriptions of the speech of pediatric cochlear implant users. The new measure, maximum likelihood distance, correlates highly with Levenshtein distance and naïve transcriptions; results from this corpus are easier to obtain since cochlear implant speech has a lower intelligibility than the usually high intelligibility of the speech of a different dialect.
    Journal of Quantitative Linguistics 02/2009; 16(1):96-114.
  • ABSTRACT: The occurrences of graphemes in a text are generally determined by Zipf's law. In an attempt to develop a theoretical model for grapheme frequencies, Grzybek and Kelih have tested different distribution models and have come to the conclusion that rank frequency distribution for Slavic languages can be expressed in the form of the negative hypergeometric distribution. The application of this distribution to different corpora has led us to derive a functional relationship between ranks and letters of the English language alphabet and thus has formed a platform for the present study. In order to identify the patterns of letters in the corpus, we have applied group theoretical aspects and have observed that different rings are generated corresponding to ranks 1, 2 having values in the range 23–26, fields for ranks in ranges 3–9 and 10–22. Applications of these rings and fields reveal that frequency distribution can always be fitted by locally adopting an equation in the sets. It has led us to generate a general model for rank frequency distribution of English texts.
    Journal of Quantitative Linguistics 01/2009; 16:307-326.
  • ABSTRACT: Disambiguating word meanings is an essential part of natural language processing. This paper offers a method of determining the meaning of polysemous words in text. We first provide a semantic network, automatically created from a corpus, as a tool for representing the context. We then describe the process of resolving lexical ambiguities using the network. Finally, we report an experimental result with a success rate of over 85 per cent, which demonstrates the effectiveness of the method.
    Journal of Quantitative Linguistics 08/2008; 3(3):244-251.
