Parallel non-deterministic bottom-up parsing

ACM SIGPLAN Notices (Impact Factor: 0.66). 12/1971; 6(12):56-57. DOI: 10.1145/942582.807982


The development of translator writing systems and extensible languages has led to a simultaneous development of more efficient and general syntax analyzers, usually for context-free (CF) syntax. Our paper describes a type of parser that can be used with reasonable efficiency for any CF grammar, even one which is ambiguous. Our parser, like most others, is based on the pushdown automaton model, and thus its generality requires it to be non-deterministic (in the automata theoretic sense). An actual implementation of a non-deterministic automaton requires that we explore every computational path that could be followed by the theoretical automaton. This can be done serially [5], following successively every computational path whenever a choice occurs. It leads to the algorithms described in [1], whose time bounds can be exponential functions of the length of the string to be parsed. We can also use a parallel implementation which consists in following “simultaneously” all the possible computational paths whenever a non-deterministic choice occurs. Then we can merge paths that have ceased to be different after a certain point. Those mergings reduce drastically the amount of computation. The best known example is the top-down algorithm described in [2]. The algorithm we describe in the first part of our paper is a basic bottom-up parser, similar to [2] in its organization. Both parsers can be shown to work within time bounds which are at most the cube of the length of the input string, and are often a linear function of it. The space bounds are at most the square of that length. The second part of our paper deals with the optimization of the basic parallel bottom-up algorithm, using the properties of weak precedence relations [3]. The various optimization techniques further reduce the amount of computation required and cause frequent occurrence of a “sparse determinism” phenomenon which allows determination of part of the parse-tree before the non-deterministic analysis is ended. These optimizations also considerably lessen the space requirements. Full details can be found in [6]. The parser is easy to generate for any CF grammar, requiring only the computation of precedence tables. It is slower than most deterministic parsers (on the order of ten times); but all examples we have tried showed it to be more efficient than any other parser having the same generality. We consider that the inefficiency of our parser is reasonable as we intend it as a research tool for the language designer rather than as a part of an industrial compiler. Languages designers, or extensible languages users, are often more preoccupied with semantics than syntax and fix the syntax of their language in a “nice” form only when they have found what kind of semantics it is to represent. So they can easily come up with ambiguous syntax [4]. Our parser parses ambiguous languages and detects ambiguities in programs, thus enabling the programmer to eliminate them.

164 Reads
  • Source
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: A general study of parallel non-deterministic parsing and translation à la Earley is developped formally, based on non-deterministic pushdown acceptor-transducers. Several results (complexity and efficiency) are established, some new and other previously proved only in special cases. As an application, we show that for every family of deterministic context-free pushdown parsers (e.g. precedence, LR(k), LL(k), ...) there is a family of general context-free parallel parsers that have the same efficiency in most practical cases (e.g. analysis of programming languages).
    Automata, Languages and Programming, 2nd Colloquium, University of Saarbrücken, July 29 - August 2, 1974, Proceedings; 01/1974
  • [Show abstract] [Hide abstract]
    ABSTRACT: The experience in designing large software systems shows that even the definitions of a programming language can be seen as the application of specific implementation and formalization techniques rather than as the result of a «designing art». In this context, a formalism is proposed here as a tool for defining conventional high level programming languages. The formalism is an algebraic model supporting the definition of representational entities such as types. A Type is a set of data and operations on them. An isomorphism between language components and types realizes an isomorphism between the language and its specification on the model. Then, the definition of a language can be performed by stepwise definitions of the types representing the language components. So the model is also a language development tool. Stepwise definition methodologies are also investigated and two are the proposed ones: the horizontal, methodology and the vertical methodology. In particular, the vertical methodology defines languages by the development of a hierarchy of abstraction levels, each corresponding to one class of languages: The last class only contains the language being defined.
    Calcolo 08/1981; 18(3):219-254. DOI:10.1007/BF02576358 · 1.20 Impact Factor