Deterministic Techniques for Efficient Non-Deterministic Parsers.
ABSTRACT: A general study of parallel non-deterministic parsing and translation à la Earley is developed formally, based on non-deterministic pushdown acceptor-transducers. Several results on complexity and efficiency are established, some new and others previously proved only in special cases. As an application, we show that for every family of deterministic context-free pushdown parsers (e.g. precedence, LR(k), LL(k), ...) there is a family of general context-free parallel parsers that have the same efficiency in most practical cases (e.g. analysis of programming languages).
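For readers unfamiliar with the Earley style of parsing referred to above, the following is a minimal sketch of the classic chart-based Earley recognizer (predict, scan, complete). It is not Lang's construction from pushdown acceptor-transducers; it only illustrates the "all alternatives in parallel" idea that the paper generalizes. Assumptions of this sketch: the grammar is a Python dict mapping each nonterminal to its right-hand sides, any symbol that is not a key is a terminal, and there are no empty (epsilon) productions.

```python
from typing import Dict, List, Set, Tuple

# Assumption: nonterminal -> list of right-hand sides; no epsilon productions.
Grammar = Dict[str, List[Tuple[str, ...]]]
# An Earley item: (lhs, rhs, dot position, index of the set where it started).
Item = Tuple[str, Tuple[str, ...], int, int]


def earley_recognize(grammar: Grammar, start: str, tokens: List[str]) -> bool:
    n = len(tokens)
    chart: List[List[Item]] = [[] for _ in range(n + 1)]
    seen: List[Set[Item]] = [set() for _ in range(n + 1)]

    def add(i: int, item: Item) -> None:
        # Each item is added at most once per set, which guarantees termination.
        if item not in seen[i]:
            seen[i].add(item)
            chart[i].append(item)

    for rhs in grammar[start]:
        add(0, (start, rhs, 0, 0))

    for i in range(n + 1):
        j = 0
        while j < len(chart[i]):  # chart[i] grows while it is processed
            lhs, rhs, dot, origin = chart[i][j]
            j += 1
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:
                    # Predict: start every alternative of the expected nonterminal.
                    for alt in grammar[sym]:
                        add(i, (sym, alt, 0, i))
                elif i < n and tokens[i] == sym:
                    # Scan: the next token matches the expected terminal.
                    add(i + 1, (lhs, rhs, dot + 1, origin))
            else:
                # Complete: advance items in the origin set that waited for lhs.
                for l2, r2, d2, o2 in chart[origin]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(i, (l2, r2, d2 + 1, o2))

    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[n])


# Left-recursive expression grammar, which a deterministic LL parser rejects.
EXPR: Grammar = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("a",)],
}
```

For example, `earley_recognize(EXPR, "E", list("(a+a)*a"))` accepts, while `list("a+*a")` is rejected. Because every viable item is kept in the chart, left recursion and ambiguity need no special treatment.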
Available from: Bernard Lang, Mar 17, 2014
- Available from: Fernando Berzal
ABSTRACT: Formal languages let us define the textual representation of data with precision. Formal grammars, typically in the form of BNF-like productions, describe the language syntax, which is then annotated for syntax-directed translation and completed with semantic actions. When, apart from the textual representation of data, an explicit representation of the corresponding data structure is required, the language designer has to devise the mapping between the suitable data model and its proper language specification, and then develop the conversion procedure from the parse tree to the data model instance. Unfortunately, whenever the format of the textual representation has to be modified, changes have to be propagated throughout the entire language processor tool chain. These updates are time-consuming, tedious, and error-prone. Besides, in case different applications use the same language, several copies of the same language specification have to be maintained. In this paper, we introduce a model-based parser generator that decouples language specification from language processing, hence avoiding many of the problems caused by grammar-driven parsers and parser generators.
Article: Scannerless Boolean Parsing
ABSTRACT: Scannerless generalized parsing techniques allow parsers to be derived directly from unified, declarative specifications. Unfortunately, in order to uniquely parse existing programming languages at the character level, disambiguation extensions beyond the usual context-free formalism are required. This paper explains how scannerless parsers for boolean grammars (context-free grammars extended with intersection and negation) can specify such languages unambiguously, and can also describe other interesting constructs such as indentation-based block structure.
Electronic Notes in Theoretical Computer Science 10/2006; 164(2):97-102. DOI:10.1016/j.entcs.2006.10.007
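To make the role of intersection and negation concrete, here is a small sketch of the classic scannerless disambiguation problem: an identifier is any word that is not a keyword. It assumes a deliberately naive model in which each language is a membership predicate on strings, so it shows only what the boolean operators mean. It is not the parsing algorithm of the paper, and its running time is exponential. All function names (`lit`, `conj`, `neg`, and so on) are hypothetical.

```python
from typing import Callable

# Assumption: a language is modelled as a membership predicate on strings.
Lang = Callable[[str], bool]


def lit(s: str) -> Lang:
    """The singleton language {s}."""
    return lambda w: w == s


def char_in(chars: str) -> Lang:
    """Single characters drawn from `chars`."""
    return lambda w: len(w) == 1 and w in chars


def alt(*langs: Lang) -> Lang:
    """Union, i.e. ordinary context-free alternation."""
    return lambda w: any(l(w) for l in langs)


def conj(*langs: Lang) -> Lang:
    """Intersection: w must belong to every conjunct."""
    return lambda w: all(l(w) for l in langs)


def neg(lang: Lang) -> Lang:
    """Complement with respect to all strings."""
    return lambda w: not lang(w)


def seq(a: Lang, b: Lang) -> Lang:
    """Concatenation: some split w = u + v with u in a and v in b."""
    return lambda w: any(a(w[:k]) and b(w[k:]) for k in range(len(w) + 1))


def plus(lang: Lang) -> Lang:
    """One or more non-empty pieces, each belonging to `lang`."""
    def member(w: str) -> bool:
        if w == "":
            return False
        return any(lang(w[:k]) and (k == len(w) or member(w[k:]))
                   for k in range(1, len(w) + 1))
    return member


letter = char_in("abcdefghijklmnopqrstuvwxyz")
word = plus(letter)
keyword = alt(lit("if"), lit("then"), lit("else"))
# Identifiers: words, intersected with the complement of the keywords.
identifier = conj(word, neg(keyword))
# A toy statement form: the keyword "if", a space, then an identifier.
if_stmt = seq(seq(lit("if"), lit(" ")), identifier)
```

With these definitions `identifier("iffy")` holds while `identifier("if")` does not, so `if_stmt("if x")` is accepted and `if_stmt("if if")` is rejected without any separate scanner deciding token boundaries.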
ABSTRACT: A word-by-word human sentence processing complexity metric is presented. This metric formalizes the intuition that comprehenders have more trouble on words contributing larger amounts of information about the syntactic structure of the sentence as a whole. The formalization is in terms of the conditional entropy of grammatical continuations, given the words that have been heard so far. To calculate the predictions of this metric, Wilson and Carroll's (1954) original entropy reduction idea is extended to infinite languages. This is demonstrated with a mildly context-sensitive language that includes relative clauses formed on a variety of grammatical relations across the Accessibility Hierarchy of Keenan and Comrie (1977). Predictions are derived that correlate significantly with repetition accuracy results obtained in a sentence-memory experiment (Keenan & Hawkins, 1987).
Cognitive Science 07/2006; 30(4):643-72. DOI:10.1207/s15516709cog0000_64
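The entropy-reduction idea can be illustrated on a finite toy case. The sketch below assumes a small, hypothetical probability distribution over complete sentences, each standing in for a single grammatical analysis; the paper's contribution is precisely to extend the computation to infinite languages defined by a grammar, which this sketch does not attempt. After each word it renormalizes over the analyses still consistent with the prefix, computes their entropy in bits, and reports the drop from the previous word, counting only decreases.

```python
import math
from typing import Dict, List, Tuple

Sentence = Tuple[str, ...]


def conditional_entropy(dist: Dict[Sentence, float], prefix: Sentence) -> float:
    """Entropy in bits of the analyses consistent with `prefix`, renormalized."""
    consistent = [p for s, p in dist.items() if s[:len(prefix)] == prefix and p > 0]
    total = sum(consistent)
    if total == 0:
        raise ValueError("prefix has zero probability under this distribution")
    return -sum((p / total) * math.log2(p / total) for p in consistent)


def entropy_reductions(dist: Dict[Sentence, float], sentence: Sentence) -> List[float]:
    """Per-word reduction in conditional entropy; increases are counted as 0."""
    reductions = []
    h_prev = conditional_entropy(dist, ())
    for i in range(1, len(sentence) + 1):
        h = conditional_entropy(dist, sentence[:i])
        reductions.append(max(0.0, h_prev - h))
        h_prev = h
    return reductions


# Hypothetical toy distribution: four equally likely analyses.
TOY: Dict[Sentence, float] = {
    ("the", "dog", "barked"): 0.25,
    ("the", "dog", "that", "i", "saw", "barked"): 0.25,
    ("the", "cat", "slept"): 0.25,
    ("a", "cat", "slept"): 0.25,
}
```

For "the dog barked" the entropy falls from 2 bits to log2(3), then to 1 bit, then to 0, so the final word, which rules out the relative-clause continuation, carries the largest reduction (1 bit).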