Conference Paper

Dictionary supported generation of English text from Pitman Shorthand scripted phonetic text

Dept. of Studies in Comput. Sci., Mysore Univ., India
DOI: 10.1109/LEC.2002.1182289
Conference: Language Engineering Conference, 2002. Proceedings
Source: IEEE Xplore

ABSTRACT Pitman Shorthand Language (PSL) is a recording medium practised in all organizations where English is the medium of transaction. Its practical advantage is a high recording speed, more than 180 words per minute, which accounts for its wide acceptance. The medium continues to be used despite considerable developments in speech processing systems, which are not yet universally established. To exploit the vast transcribing potential of PSL, a new area of research into the automation of PSL processing is conceived. This paper describes how equivalent English words are substituted for the phonetic compositions of transcribed words during the automatic generation of English text from a PSL document. Transcription makes use of two new types of dictionary developed and implemented specifically for this purpose: a phonetic dictionary, in which the words are sequenced in phonetic order, and an extended conventional dictionary, in which the words are appended with additional details such as use domain and verb forms. The proposed approach is tested with a limited number of words in both dictionaries and is found to perform satisfactorily. However, scope remains for adding new words to these dictionaries.
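The two-dictionary lookup described in the abstract can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration, not the authors' implementation: the phonetic key notation, the sample entries, the field names `domain` and `verb_forms`, and the functions `candidates`, `choose`, and `transcribe` are all hypothetical, and plain string ordering stands in for whatever phonetic ordering the paper uses.

```python
from bisect import bisect_left

# Hypothetical phonetic dictionary: (phonetic key, English word) pairs,
# kept sorted so that a key can be located by binary search.
# The key notation is invented for this sketch.
PHONETIC_DICT = sorted([
    ("n oU", "know"),
    ("n oU", "no"),
    ("r aI t", "right"),
    ("r aI t", "write"),
    ("s iI", "sea"),
    ("s iI", "see"),
])

# Hypothetical extended dictionary: word -> additional details
# (use domain and verb forms, as mentioned in the abstract).
EXTENDED_DICT = {
    "know": {"domain": "general", "verb_forms": ["know", "knew", "known"]},
    "no": {"domain": "general", "verb_forms": []},
    "right": {"domain": "legal", "verb_forms": []},
    "write": {"domain": "office", "verb_forms": ["write", "wrote", "written"]},
    "sea": {"domain": "shipping", "verb_forms": []},
    "see": {"domain": "general", "verb_forms": ["see", "saw", "seen"]},
}


def candidates(key):
    """Return every English word stored under the given phonetic key."""
    # (key,) sorts before any (key, word) pair, so this finds the first match.
    i = bisect_left(PHONETIC_DICT, (key,))
    words = []
    while i < len(PHONETIC_DICT) and PHONETIC_DICT[i][0] == key:
        words.append(PHONETIC_DICT[i][1])
        i += 1
    return words


def choose(key, domain):
    """Pick one word for a key, preferring a word whose use domain matches."""
    words = candidates(key)
    if not words:
        return None
    for word in words:
        if EXTENDED_DICT.get(word, {}).get("domain") == domain:
            return word
    return words[0]  # fall back to the first candidate in dictionary order


def transcribe(keys, domain):
    """Substitute a word for each phonetic key; unknown keys are bracketed."""
    out = []
    for key in keys:
        word = choose(key, domain)
        out.append(word if word is not None else f"[{key}]")
    return " ".join(out)
```

In this sketch, homophones such as "right" and "write" share one phonetic key, and the use-domain field of the extended dictionary is what separates them. A key with no entry is left bracketed, which mirrors the abstract's remark that the dictionaries still need more words.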

  • ABSTRACT: The computer transcription of handwritten Pitman's shorthand has enormous potential as a means of rapid text entry to today's handheld devices. Recognition errors arising in pattern segmentation and classification raise the incidence of ambiguous interpretation in existing systems, and the paper proposes a well-established unigram technique and an efficient heuristic method to reduce ambiguity in a linguistic post-processor. The heuristics applied in our transcription system are: firstly, incorporating visual stimulus as used by human readers; secondly, applying knowledge of the most common words of Pitman shorthand; and finally, adding knowledge of collocation. An experiment using a phonetic lexicon of 5000 entries shows the distribution of ambiguity in a shorthand lexicon due to the similarity of outlines and estimates a transcription accuracy of 94%.
    Document Analysis Systems VI, 6th International Workshop, DAS 2004, Florence, Italy, September 8-10, 2004, Proceedings; 01/2004
  • ABSTRACT: The paper proposes the computer transcription of handwritten Pitman shorthand as a means of rapid text entry to handheld devices. Handwritten outlines inevitably vary from writer to writer, which makes pattern recognition prone to errors; however, these imperfections can be corrected by a heuristic approach in the interpretation stage. Transcription accuracy can be improved by combining three factors: firstly, incorporating contextual knowledge as used by human readers; secondly, applying knowledge of the most frequent words of Pitman shorthand; and finally, adding knowledge of collocation. A statistical analysis of a shorthand lexicon is presented, and the distribution of transcription accuracy as a function of segmentation accuracy is discussed. Experiments using a phonetic lexicon with 5000 entries show that the approach is efficient and produces a satisfactory transcription accuracy of 94%.
  • ABSTRACT: An innovative solution to the lexical post-processing of handwritten Pitman's shorthand, denoted Bayesian Network (BN) based transcription, is discussed along with experimental results. Unlike the conventional phonetic-based transliteration approach, the paper presents a novel primitive-based transcription method along with the creation of a new machine-readable Pitman's shorthand lexicon. The Bayesian Network representation is shown to be robust against stroke variation and highly effective for handling the major ambiguities of handwritten Pitman's shorthand, including unpredictable vowel omissions and unclear differences in thickness between similar consonant strokes. Unigram-based rejection strategies specific to Pitman's shorthand are also introduced; these are highly effective in finding the most likely candidate words for a given outline.
    TENCON 2012 - 2012 IEEE Region 10 Conference; 01/2012