Conference Paper

Deterministic Techniques for Efficient Non-Deterministic Parsers

DOI: 10.1007/3-540-06841-4_65 Conference: Automata, Languages and Programming, 2nd Colloquium, University of Saarbrücken, July 29 - August 2, 1974, Proceedings
Source: DBLP


A general study of parallel non-deterministic parsing and translation à la Earley is developed formally, based on non-deterministic pushdown acceptor-transducers. Several results (complexity and efficiency) are established, some new and others previously proved only in special cases. As an application, we show that for every family of deterministic context-free pushdown parsers (e.g. precedence, LR(k), LL(k), ...) there is a family of general context-free parallel parsers that have the same efficiency in most practical cases (e.g. analysis of programming languages).
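The "parsing à la Earley" that the abstract builds on can be sketched with a minimal Earley recognizer, which explores all parse paths of an ambiguous grammar in parallel. The grammar and all names below are illustrative, not taken from the paper.

```python
# Minimal Earley recognizer: simulates all derivations in parallel,
# in the spirit of the paper's non-deterministic pushdown simulation.
# The toy grammar S -> S + S | a is ambiguous, so several paths coexist.

GRAMMAR = {
    "S": [("S", "+", "S"), ("a",)],
}

def recognize(tokens, start="S"):
    # An item is (lhs, rhs, dot, origin); chart[i] holds items ending at i.
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in GRAMMAR[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(len(tokens) + 1):
        added = True
        while added:                       # iterate chart[i] to a fixed point
            added = False
            for lhs, rhs, dot, origin in list(chart[i]):
                if dot < len(rhs):
                    sym = rhs[dot]
                    if sym in GRAMMAR:     # predictor
                        for prod in GRAMMAR[sym]:
                            if (sym, prod, 0, i) not in chart[i]:
                                chart[i].add((sym, prod, 0, i)); added = True
                    elif i < len(tokens) and tokens[i] == sym:  # scanner
                        chart[i + 1].add((lhs, rhs, dot + 1, origin))
                else:                      # completer
                    for l2, r2, d2, o2 in list(chart[origin]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            new = (l2, r2, d2 + 1, o2)
                            if new not in chart[i]:
                                chart[i].add(new); added = True
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[-1])
```

Every chart cell grows to a fixed point before the scanner advances, which is what makes the exploration "parallel" rather than backtracking.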



Available from: Bernard Lang, Mar 17, 2014
  • Source
    • "The sbp package is an implementation of the Lang-Tomita Generalized LR Parsing Algorithm [2] [3], employing Johnstone & Scott's RNGLR algorithm [13] for handling ε-productions and circularities. "
    ABSTRACT: Scannerless generalized parsing techniques allow parsers to be derived directly from unified, declarative specifications. Unfortunately, in order to uniquely parse existing programming languages at the character level, disambiguation extensions beyond the usual context-free formalism are required. This paper explains how scannerless parsers for boolean grammars (context-free grammars extended with intersection and negation) can specify such languages unambiguously, and can also describe other interesting constructs such as indentation-based block structure.
    Electronic Notes in Theoretical Computer Science 10/2006; 164(2):97-102. DOI:10.1016/j.entcs.2006.10.007
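The intersection operator of boolean grammars mentioned in the abstract above can be illustrated with the textbook example {aⁿbⁿcⁿ}, which is not context-free but is the intersection of two context-free languages. The sketch below uses simple hand-written recognizers for the two conjuncts rather than a full boolean-grammar parser; all function names are my own.

```python
import re

# {a^n b^n c^n} = {a^i b^i c^j} & {a^i b^j c^j}: a conjunct of the
# boolean grammar is recognized by running both component recognizers
# and intersecting their answers (negation would complement one).

def matches_anbnck(s):  # a^i b^i c^j
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return bool(m) and len(m.group(1)) == len(m.group(2))

def matches_akbncn(s):  # a^i b^j c^j
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return bool(m) and len(m.group(2)) == len(m.group(3))

def matches_anbncn(s):
    # Intersection of the two context-free conjuncts.
    return matches_anbnck(s) and matches_akbncn(s)
```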
  • Source
    • "Members of this class are widely agreed to be expressive enough to accommodate reasonable structures for natural language sentences while still ruling out some conceivable alternatives (Frank, 2004; Joshi, Vijay-Shanker, & Weir, 1991). Section 3 answers this question in the affirmative, showing how the entropy reduction idea can be extended to mildly context-sensitive languages by applying two classical ideas in (probabilistic) formal language theory: Grenander's (1967) closed-form solution for the entropy of a nonterminal in a probabilistic grammar, and Lang's (1974, 1988) insight that an intermediate parser state is itself a specification of a grammar. Sections 4 through 10 assert the feasibility of this extension by examining the implications of two alternative relative clauses analyses for a proposed linguistic universal. "
    ABSTRACT: A word-by-word human sentence processing complexity metric is presented. This metric formalizes the intuition that comprehenders have more trouble on words contributing larger amounts of information about the syntactic structure of the sentence as a whole. The formalization is in terms of the conditional entropy of grammatical continuations, given the words that have been heard so far. To calculate the predictions of this metric, Wilson and Carroll's (1954) original entropy reduction idea is extended to infinite languages. This is demonstrated with a mildly context-sensitive language that includes relative clauses formed on a variety of grammatical relations across the Accessibility Hierarchy of Keenan and Comrie (1977). Predictions are derived that correlate significantly with repetition accuracy results obtained in a sentence-memory experiment (Keenan & Hawkins, 1987).
    Cognitive Science 07/2006; 30(4):643-72. DOI:10.1207/s15516709cog0000_64 · 2.38 Impact Factor
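Grenander's closed-form entropy solution mentioned in the quote amounts to solving the linear system H(A) = h(A) + Σ_r p(r) Σ_{B ∈ rhs(r)} H(B), where h(A) is the entropy of A's rule choice. The sketch below solves it by fixed-point iteration on a toy PCFG of my own devising, not one from the paper.

```python
import math

# Entropy of a PCFG nonterminal, Grenander-style: the derivation
# entropy H(A) satisfies a linear system combining the local rule-choice
# entropy h(A) with the entropies of nonterminals on right-hand sides.

PCFG = {  # rule: (probability, rhs tuple); an illustrative toy grammar
    "S": [(0.7, ("a",)), (0.3, ("S", "+", "S"))],
}

def rule_entropy(nt):
    # h(A): entropy of the choice among A's rules, in bits.
    return -sum(p * math.log2(p) for p, _ in PCFG[nt])

def derivation_entropy(iters=200):
    # Fixed-point iteration on H(A) = h(A) + sum_r p(r) * sum_B H(B).
    # Converges when the expected number of nonterminal children < 1
    # (here 2 * 0.3 = 0.6, so the grammar is consistent).
    H = {nt: 0.0 for nt in PCFG}
    for _ in range(iters):
        H = {nt: rule_entropy(nt)
                 + sum(p * sum(H[s] for s in rhs if s in PCFG)
                       for p, rhs in PCFG[nt])
             for nt in PCFG}
    return H
```

For this grammar the closed form is H(S) = h(S) / (1 − 0.6), and the iteration converges to it geometrically.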
  • Source
    • "An obvious way to extend the standard LR parsing approach to incorporate non-determinism is to replicate the stack when a point of non-determinism is reached, and to explore all the possible traversals of the DFA. An efficient algorithm for exploring all traversals of a non-deterministic PDA which performs at most one stack pop and one stack push at each step, was given by Lang [11]. Tomita [15] gave an algorithm aimed explicitly at LR DFAs (which in their standard form can pop multiple stack symbols at each step). "
    ABSTRACT: Reduction Incorporated (RI) recognisers and parsers deliver high performance by suppressing the stack activity except for those rules that generate fully embedded recursion. Automaton constructions for RI parsing have been presented by Aycock and Horspool [John Aycock and Nigel Horspool. Faster generalised LR parsing. In Compiler Construction, 8th Intnl. Conf, CC'99, volume 1575 of Lecture Notes in Computer Science, pages 32–46. Springer-Verlag, 1999] and by Scott and Johnstone [Adrian Johnstone and Elizabeth Scott. Generalised regular parsers. In Görel Hedin, editor, Compiler Construction, 12th Intnl. Conf, CC'03, volume 2622 of Lecture Notes in Computer Science, pages 232–246. Springer-Verlag, Berlin, 2003] but both can yield very large tables. An unusual aspect of the RI automaton is that the degree of stack activity suppression can be varied in a fine-grained way, and this provides a large family of potential RI automata for real programming languages, some of which have manageable table size but still show high performance. We give examples drawn from ANSI-C, Cobol and Pascal; discuss some heuristics for guiding manual specification of stack activity suppression; and describe work in progress on the automatic construction of RI automata using profiling information gathered from running parsers: in this way we propose to optimise our parsers' table size against performance on actual parsing tasks.
    Electronic Notes in Theoretical Computer Science 12/2005; 141(4):143-160. DOI:10.1016/j.entcs.2005.02.060
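The naive stack-replication strategy described in the quote above (copy the whole stack at each point of non-determinism and explore every branch) can be sketched as a breadth-first simulation of a non-deterministic PDA. Lang's contribution was to *share* these stacks so the exploration stays polynomial; this sketch deliberately omits that sharing. The palindrome PDA and all names below are illustrative.

```python
# Breadth-first simulation of a non-deterministic PDA with naive stack
# copying: each configuration carries its own immutable stack tuple,
# so every non-deterministic choice effectively replicates the stack.

def npda_accepts(word, delta, start, bottom, max_steps=10_000):
    """delta maps (state, symbol_or_None, stack_top) to a set of
    (next_state, push_string) pairs; push_string replaces the popped
    top (last character is the new top). Accepts by empty stack at
    end of input. None keys are epsilon moves."""
    configs = {(start, 0, (bottom,))}
    seen = set()
    for _ in range(max_steps):
        if not configs:
            return False
        nxt = set()
        for state, pos, stack in configs:
            if pos == len(word) and not stack:
                return True                # empty stack, input consumed
            if not stack or (state, pos, stack) in seen:
                continue
            seen.add((state, pos, stack))
            top = stack[-1]
            options = [(None, pos)]        # epsilon move
            if pos < len(word):
                options.append((word[pos], pos + 1))
            for sym, npos in options:
                for nstate, push in delta.get((state, sym, top), ()):
                    # stack[:-1] + tuple(push) copies the stack: this is
                    # the replication that Lang's graph sharing avoids.
                    nxt.add((nstate, npos, stack[:-1] + tuple(push)))
        configs = nxt
    return False

# Example PDA: even-length palindromes over {a, b}. In state q0 the
# machine pushes symbols; an epsilon move guesses the midpoint; in q1
# it pops on matching symbols, then pops the bottom marker Z.
delta = {}
for X in "Zab":
    delta[("q0", "a", X)] = {("q0", X + "a")}
    delta[("q0", "b", X)] = {("q0", X + "b")}
    delta[("q0", None, X)] = {("q1", X)}
delta[("q1", "a", "a")] = {("q1", "")}
delta[("q1", "b", "b")] = {("q1", "")}
delta[("q1", None, "Z")] = {("q1", "")}
```

The guess of the midpoint is the point of non-determinism: a separate stack copy survives for every candidate midpoint until mismatches prune it.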