Dictionary supported generation of English text from Pitman Shorthand scripted phonetic text
ABSTRACT The Pitman Shorthand Language (PSL) is a recording medium practised in all organizations where English is the medium of transaction. Its practical advantage is a high recording speed of more than 180 words per minute, which accounts for its wide acceptance. It remains in use despite considerable developments in speech processing systems, which are not yet universally established. To exploit the vast transcribing potential of PSL, a new area of research into the automation of PSL processing is conceived. This paper describes the substitution of equivalent English words for the phonetic compositions of transcribed words during the automatic generation of English text from a PSL document. Transcription uses two new types of dictionary developed and implemented specifically for this purpose: a phonetic dictionary, in which words are sequenced in phonetic order, and an extended conventional dictionary, in which words are annotated with additional details such as use domain and verb forms. The proposed approach was tested with a limited number of words in both dictionaries and was found to perform satisfactorily. There remains scope for adding new words to these dictionaries.
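The two-dictionary lookup described in the abstract can be pictured with a minimal sketch. Everything concrete here is an assumption made for illustration: the phonetic key notation, the sample entries, the `domain` and `forms` fields, and the function names `candidates`, `choose` and `transcribe` are hypothetical and are not taken from the paper. The sketch only shows the general idea of a phonetically ordered dictionary searched by key, with an extended dictionary used to pick among homophones.

```python
from bisect import bisect_left

# Hypothetical phonetic dictionary: (phonetic key, English word) pairs kept in
# sorted (phonetic) order. The key notation is an assumption for illustration.
PHONETIC_DICT = sorted([
    ("k ae t", "cat"),
    ("r ay t", "right"),
    ("r ay t", "write"),
    ("s iy", "sea"),
    ("s iy", "see"),
])

# Hypothetical extended dictionary: each word carries extra details such as a
# use domain and verb forms (field names are assumptions).
EXTENDED_DICT = {
    "cat": {"domain": "general", "forms": []},
    "right": {"domain": "general", "forms": []},
    "write": {"domain": "office", "forms": ["writes", "wrote", "written"]},
    "sea": {"domain": "maritime", "forms": []},
    "see": {"domain": "general", "forms": ["sees", "saw", "seen"]},
}


def candidates(phonetic_key):
    """Return all English words stored under a phonetic key (binary search)."""
    i = bisect_left(PHONETIC_DICT, (phonetic_key, ""))
    words = []
    while i < len(PHONETIC_DICT) and PHONETIC_DICT[i][0] == phonetic_key:
        words.append(PHONETIC_DICT[i][1])
        i += 1
    return words


def choose(phonetic_key, domain="general"):
    """Pick one word for a key, preferring a candidate from the given domain."""
    words = candidates(phonetic_key)
    if not words:
        return f"<{phonetic_key}>"  # leave unknown keys visibly untranscribed
    for word in words:
        if EXTENDED_DICT.get(word, {}).get("domain") == domain:
            return word
    return words[0]


def transcribe(phonetic_keys, domain="general"):
    """Substitute an English word for each phonetic key in sequence."""
    return " ".join(choose(key, domain) for key in phonetic_keys)
```

For example, `transcribe(["s iy", "k ae t"])` picks "see" over "sea" because only "see" is tagged with the default domain in this toy data. A real system would need a far richer disambiguation step than this single-field match.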
Conference Proceeding: Dynamic word based text compression
ABSTRACT: We propose a dynamic text compression technique with a back searching algorithm and a new storage protocol. Codes being encoded are divided into three types: copy, literal and hybrid codes. Multiple dictionaries are adopted, and each of them has a linked sub-dictionary. Each dictionary has a portion of pre-defined words, i.e. the most frequent words, and the rest of the entries depend on the message. A hashing function developed by Pearson (1990) is adopted. It serves two purposes. Firstly, it is used to initialize the dictionary. Secondly, it is used as a quick search for a particular word. With this scheme, the spaces between words do not need to be considered. At the decoding side, a space character is appended after each word is decoded. Therefore, the redundancy of spaces can also be compressed. The results show that the original message will not be expanded even with a poor dictionary design.
Document Analysis and Recognition, 1997., Proceedings of the Fourth International Conference on; 09/1997
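As a rough illustration of two ideas in this abstract, the sketch below uses Pearson-style hashing for quick word lookup and lets the decoder append a space after every word so that spaces are never stored. It is not the authors' scheme: the permutation table is an arbitrary seeded shuffle rather than the table from Pearson's paper, the token shapes `("ref", ...)` and `("raw", ...)` are hypothetical, and the copy, literal and hybrid codes, the sub-dictionaries and the back searching algorithm are not modelled.

```python
import random

# Pearson hashing needs a permutation of 0..255. This one is an arbitrary
# seeded shuffle (an assumption), not the table published by Pearson.
TABLE = list(range(256))
random.Random(0).shuffle(TABLE)


def pearson_hash(word):
    """8-bit Pearson hash: fold each byte through the permutation table."""
    h = 0
    for byte in word.encode("utf-8"):
        h = TABLE[h ^ byte]
    return h


def build_dictionary(words):
    """Initialize 256 hash buckets with a list of pre-defined words."""
    buckets = [[] for _ in range(256)]
    for word in words:
        bucket = buckets[pearson_hash(word)]
        if word not in bucket:
            bucket.append(word)
    return buckets


def encode(message, buckets):
    """Turn a message into tokens; spaces between words are not stored."""
    tokens = []
    for word in message.split():
        h = pearson_hash(word)
        if word in buckets[h]:
            # Known word: store only its bucket number and position.
            tokens.append(("ref", h, buckets[h].index(word)))
        else:
            # Unknown word: store it verbatim.
            tokens.append(("raw", word))
    return tokens


def decode(tokens, buckets):
    """Rebuild the text, appending a space after each decoded word."""
    out = []
    for token in tokens:
        word = buckets[token[1]][token[2]] if token[0] == "ref" else token[1]
        out.append(word + " ")
    return "".join(out)
```

With `buckets = build_dictionary(["the", "cat"])`, decoding the tokens of `"the cat sat"` yields the same words, each followed by a single space. Note that `split()` collapses runs of whitespace, so this sketch only round-trips single-space-separated text.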
- 01/1993; Computer Science Press., ISBN: 978-0-7167-8250-6
- ABSTRACT: the lexicon, on the other, the grammar operates nondeterministically to yield a set of morphologically well-formed words that together express the situation. Up until the latter half of the twentieth century, Pāṇini's grammar was the most rigorous and comprehensive ever written for any language. Moreover, many, if not most, of the key ideas in modern linguistic theory have their origin in the ideas of his grammar (e.g., sandhi) or were anticipated by it (e.g., the theta criterion). The tradition that the grammar initiated made Sanskrit, until this century, the most thoroughly studied human language. Though the languages of the Indian subcontinent belong to two distinct language families, Indo-European (those descended from Sanskrit) and Dravidian, they have much in common: in particular, they tend to be highly inflected. Hence, the treatment of inflection, and not word order, must play the most important role in the processing of such languages. English is a language in which infle…
07/2002