About
11 Publications · 9,856 Reads
211 Citations (since 2017)
Additional affiliations
January 1999 - March 2006
Publications (11)
The enormous amount of data flow has made Relational Database Management Systems the most important and popular tools for data persistence. Although open-source RDBMS systems are not as widely used as proprietary systems such as Oracle databases, over the years systems like PostgreSQL have gained massive popularity. High-availability database cluste...
Experiments show that for a large corpus, Zipf's law does not hold for all ranks of words: the frequencies fall below those predicted by Zipf's law for ranks greater than about 5,000 word types in the English language and about 30,000 word types in the inflected languages Irish and Latin. It also does not hold for syllables or words in the syllable...
Zipf's law states that the frequency of word tokens in a large corpus of natural language is inversely proportional to the rank. The law is investigated for two languages, English and Mandarin, and for n-gram word phrases as well as for single words. The law for single words is shown to be valid only for high-frequency words. However, when single wor...
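The rank–frequency relation described in this abstract is easy to check on any tokenized corpus: count word types, rank them by descending frequency, and see whether frequency times rank stays roughly constant. A minimal sketch, using a toy corpus purely for illustration (a real test needs millions of tokens, as in these papers):

```python
from collections import Counter

# Toy corpus; any tokenized text works here.
text = ("the cat sat on the mat and the dog sat on the log "
        "the cat and the dog").split()

freq = Counter(text)
# Rank word types by descending frequency (rank 1 = most frequent).
ranked = freq.most_common()

# Under Zipf's law, count * rank should be roughly constant across ranks.
for rank, (word, count) in enumerate(ranked, start=1):
    print(rank, word, count, count * rank)
```

On a corpus this small the constancy is poor; the papers above show it also breaks down at the other extreme, for very high ranks in very large corpora.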
Statistical language models should improve as the size of the n-grams increases from 3 to 5 or higher. However, the number of parameters and calculations, and the storage requirement increase very rapidly if we attempt to store all possible combinations of n-grams. To avoid these problems, the reduced n-grams' approach previously developed by O'Boy...
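The storage blow-up mentioned here comes from the number of possible n-grams growing as V^n for a vocabulary of V word types, while the n-grams actually observed are far fewer. A small sketch of extracting and counting n-grams (the helper name `ngrams` is illustrative, not from the paper):

```python
from collections import Counter

def ngrams(tokens, n):
    """Yield successive n-grams (as tuples) from a token list."""
    return zip(*(tokens[i:] for i in range(n)))

tokens = "the cat sat on the mat".split()
trigram_counts = Counter(ngrams(tokens, 3))

# A sequence of T tokens yields T - n + 1 n-grams, so observed n-grams
# grow linearly with corpus size even though the space of possible
# n-grams grows as V ** n.
print(sum(trigram_counts.values()))  # 4 trigrams from 6 tokens
```

Reduced n-gram approaches such as the one extended in this paper keep only a selected subset of these counts to contain the parameter and storage costs.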
The Zipf curves of log of frequency against log of rank for a large English corpus of 500 million word tokens, 689,000 word types and for a large Spanish corpus of 16 million word tokens, 139,000 word types are shown to have the usual slope close to –1 for rank less than 5,000, but then for a higher rank they turn to give a slope close to –2. This...
Zipf's law states that the frequency of word tokens in a large corpus of natural language is inversely proportional to the rank. The law is investigated for two languages, English and Mandarin, and for n-gram word phrases as well as for single words. The law for single words is shown to be valid only for high-frequency words.
It is shown that the enormous improvement in the size of disk storage space in recent years can be used to build individual word-domain statistical language models, one for each significant word of a language. Each of these word-domain language models is a precise domain model for the relevant significant word and when combined appropriately they p...
The Zipf curve of log of frequency against log of rank for a large English corpus of 500 million word tokens and 689,000 word types is shown to have the usual slope close to –1 for rank less than 5,000, but then for a higher rank it turns to give a slope close to –2. This is apparently mainly due to foreign words and place names. The Zipf curve for...
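The slopes reported here (close to –1 up to rank 5,000, then close to –2) are gradients of log frequency against log rank. A hedged sketch of how such a slope can be measured by least-squares fitting, using synthetic counts that follow an exact power law (the data here are illustrative, not the paper's corpora):

```python
import numpy as np

# Synthetic rank-frequency data obeying f = C / r exactly,
# purely to illustrate the slope measurement.
ranks = np.arange(1, 1001)
freqs = 1_000_000 / ranks

# Slope of the Zipf curve = gradient of log(f) versus log(r).
slope, intercept = np.polyfit(np.log(ranks), np.log(freqs), 1)
print(round(slope, 3))  # close to -1 for pure Zipfian data
```

Fitting the low-rank and high-rank regions separately, as the paper's analysis implies, would recover the two distinct slopes.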
Projects (2)