Conference Paper

Functional Parsers


Abstract

The 'list of successes' method for writing parsers in a lazy functional language (Gofer) is described informally. The library of higher-order functions that is developed (known as 'parser combinators') is used to write parsers for nested parentheses and for operator expressions with an arbitrary number of priorities. The method is then applied to itself to write a parser for grammars, which yields a parser for the language of the grammar. Exercises are provided throughout the text, and their solutions are given at the end of the article.

1 Introduction

This article is an informal introduction to writing parsers in a lazy functional language using 'parser combinators'. Most of the techniques have been described by Burge [2], Wadler [5] and Hutton [3]. Recently, the use of so-called monads has become quite popular in connection with parser combinators [6, 7]. We will not use them in this article, however, in order to show that no magic is involved in using parser combinators. You are ne...
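To make the 'list of successes' idea concrete, here is a minimal sketch in modern Haskell (GHC) rather than Gofer. It assumes a parser type that maps the input to a list of (remaining input, result) pairs: the empty list signals failure, and several elements signal that the input can be parsed in several ways. The combinator names (`symbol`, `succeed`, `<*>`, `<|>`, `<@`), their fixities, the `Tree` type and the helper `parseAll` are illustrative choices and need not match the article's own definitions. The grammar for nested parentheses is assumed to be S -> '(' S ')' S | empty.

```haskell
import Prelude hiding ((<*>))

-- A parser maps its input to a list of successes: each success pairs the
-- remaining input with a result. [] means failure; several elements mean
-- the input can be parsed in several ways.
type Parser s r = [s] -> [([s], r)]

infixr 6 <*>
infixl 5 <@
infixr 4 <|>

-- Recognise one given symbol.
symbol :: Eq s => s -> Parser s s
symbol a (x:xs) | a == x = [(xs, x)]
symbol _ _               = []

-- Succeed without consuming any input.
succeed :: r -> Parser s r
succeed v xs = [(xs, v)]

-- Sequence: run q on every remainder left by p and pair the results.
(<*>) :: Parser s a -> Parser s b -> Parser s (a, b)
(p <*> q) xs = [ (zs, (v, w)) | (ys, v) <- p xs, (zs, w) <- q ys ]

-- Choice: collect the successes of both alternatives.
(<|>) :: Parser s a -> Parser s a -> Parser s a
(p <|> q) xs = p xs ++ q xs

-- Apply a semantic function to every result.
(<@) :: Parser s a -> (a -> b) -> Parser s b
(p <@ f) xs = [ (ys, f v) | (ys, v) <- p xs ]

-- Keep only the parses that consume the whole input.
parseAll :: Parser s r -> [s] -> [r]
parseAll p xs = [ r | ([], r) <- p xs ]

-- Nested parentheses:  S -> '(' S ')' S | empty
data Tree = Nil | Bin Tree Tree
  deriving (Show, Eq)

parens :: Parser Char Tree
parens = (symbol '(' <*> parens <*> symbol ')' <*> parens)
           <@ (\(_, (x, (_, y))) -> Bin x y)
     <|> succeed Nil
```

Because every alternative contributes its successes, `parens "()"` also reports the parse that consumes nothing; `parseAll` keeps only the parses that consume the whole input.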
... Parser combinators are unable to deal directly with left-recursion. Most parser combinator libraries provide a chain combinator (Fokker 1995) that captures the left-recursive design pattern without obliging the user to rewrite a left-recursive grammar to be right-recursive (Aho et al. 1986). ...
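A chain combinator can be sketched on top of the same list-of-successes representation. In the version below (a sketch; `chainl`, `table`, `expr` and the other names are illustrative assumptions, not a specific library's API), `chainl p op` parses `p (op p)*` and folds the results from the left, which gives the same values as the left-recursive rule E -> E op p | p. Folding `chainl` over a table of operator levels, lowest priority first, then yields operator expressions with an arbitrary number of priorities.

```haskell
import Prelude hiding ((<*>))
import Data.Char (isDigit, digitToInt)

-- List-of-successes parser: (remaining input, result) pairs.
type Parser s r = [s] -> [([s], r)]

infixr 6 <*>
infixl 5 <@
infixr 4 <|>

satisfy :: (s -> Bool) -> Parser s s
satisfy p (x:xs) | p x = [(xs, x)]
satisfy _ _            = []

symbol :: Eq s => s -> Parser s s
symbol a = satisfy (== a)

succeed :: r -> Parser s r
succeed v xs = [(xs, v)]

failp :: Parser s r
failp _ = []

(<*>) :: Parser s a -> Parser s b -> Parser s (a, b)
(p <*> q) xs = [ (zs, (v, w)) | (ys, v) <- p xs, (zs, w) <- q ys ]

(<|>) :: Parser s a -> Parser s a -> Parser s a
(p <|> q) xs = p xs ++ q xs

(<@) :: Parser s a -> (a -> b) -> Parser s b
(p <@ f) xs = [ (ys, f v) | (ys, v) <- p xs ]

-- Zero or more repetitions, longest first.
many :: Parser s a -> Parser s [a]
many p = (p <*> many p) <@ uncurry (:) <|> succeed []

-- chainl p op parses  p (op p)*  and folds the results from the left,
-- so it accepts E -> E op p | p without any left-recursive call.
chainl :: Parser s a -> Parser s (a -> a -> a) -> Parser s a
chainl p op = (p <*> many (op <*> p)) <@ \(x, rest) -> foldl (\acc (f, y) -> f acc y) x rest

-- One or more decimal digits.
number :: Parser Char Int
number = (satisfy isDigit <*> many (satisfy isDigit))
           <@ \(d, ds) -> foldl (\n c -> 10 * n + digitToInt c) 0 (d : ds)

-- Priority table, lowest priority first; every operator is left-associative.
table :: [[(Char, Int -> Int -> Int)]]
table = [ [('+', (+)), ('-', (-))], [('*', (*))] ]

-- One chainl per priority level; the innermost level parses factors.
expr :: Parser Char Int
expr = foldr level factor table
  where
    level ops next = chainl next (foldr (<|>) failp [ symbol c <@ const f | (c, f) <- ops ])
    factor = number <|> (symbol '(' <*> expr <*> symbol ')') <@ \(_, (e, _)) -> e

parseAll :: Parser s r -> [s] -> [r]
parseAll p xs = [ r | ([], r) <- p xs ]
```

Adding a priority level only requires adding a row to `table`; the parser definition itself does not change.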
Preprint
Strings are ubiquitous in code. Not all strings are created equal: some contain structure that makes them incompatible with other strings. CSS units are an obvious example. Worse, type checkers cannot see this structure; this is the latent structure problem. We introduce SafeStrings to solve this problem and expose the latent structure in strings. Once this structure is visible, operations can leverage it to manipulate strings efficiently; further, SafeStrings permit the establishment of closure properties. SafeStrings harness the subtyping and inheritance mechanisms of their host language to create a natural hierarchy of string subtypes. SafeStrings define an elegant programming model over strings: the front-end use of a SafeString is clear and uncluttered, with complexity confined to the definition of a particular SafeString. They are lightweight, language-agnostic and deployable, as we demonstrate by implementing SafeStrings in TypeScript. SafeStrings reduce the surface area for cross-site scripting and argument selection defects, and they can facilitate fuzzing and analysis.
... Crucially, this parser works because it has been left-factored to remove left-recursion [Aho et al. 1986]. An alternative approach to solving left-recursion is by using so-called chain combinators, which act like folds at the parser level [Fokker 1995]. This deviates from the original form of the grammar, but can be more performant. ...
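For comparison with the chain-combinator approach, the rewriting alternative can be sketched for a hypothetical grammar of single-digit subtractions. A direct transcription of the left-recursive rule E -> E '-' d | d would make the parser call itself without consuming input and loop, so the rule is rewritten into right-recursive form (standard left-recursion elimination). Passing the value parsed so far as an accumulator keeps subtraction left-associative. The names `exprLF`, `rest`, `digit` and `parseAll` are illustrative.

```haskell
import Data.Char (isDigit, digitToInt)

-- List-of-successes parser: (remaining input, result) pairs.
type Parser s r = [s] -> [([s], r)]

digit :: Parser Char Int
digit (c:cs) | isDigit c = [(cs, digitToInt c)]
digit _                  = []

symbol :: Eq s => s -> Parser s s
symbol a (x:xs) | a == x = [(xs, x)]
symbol _ _               = []

-- Left-recursive grammar:  E  -> E '-' d | d   (would loop if transcribed)
-- Rewritten grammar:       E  -> d E'
--                          E' -> '-' d E' | empty
exprLF :: Parser Char Int
exprLF xs = [ r | (ys, n) <- digit xs, r <- rest n ys ]

-- E' receives the value computed so far, so 9-3-2 means (9-3)-2.
rest :: Int -> Parser Char Int
rest acc xs =
  [ r | (ys, _) <- symbol '-' xs, (zs, n) <- digit ys, r <- rest (acc - n) zs ]
  ++ [(xs, acc)]   -- the empty alternative: stop here

parseAll :: Parser s r -> [s] -> [r]
parseAll p xs = [ r | ([], r) <- p xs ]
```

The rewritten grammar no longer mirrors the original rule, which is the cost the snippet above alludes to; the chain combinator hides the same accumulation inside a fold.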
Conference Paper
Parser combinators are a clean and powerful abstraction which can provide reasonably efficient means of parsing a grammar into a form which the programmer desires. They remain close to the domain of grammars whilst at the same time offering enormous flexibility. In Haskell, the Parsec library is a prime example of such a library. However, a direct translation to Scala proves to be unbearably slow. This paper describes the semantics and design of a new library, called Parsley, which retains a close resemblance to Parsec style whilst providing very competitive performance.
... Instead of using a dedicated tool, a parser can be defined in a functional language with parser combinators [Wad85; Fok95; HM98]. Each parser is a first-class entity that can be created dynamically and combined with others to form bigger, more complex parsers. ...
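Because parsers are ordinary function values, they can be built from data at run time. The sketch below assumes the same list-of-successes type as the other examples; `token`, `failp` and `keywords` are illustrative names. It constructs a parser from a keyword list that could, for instance, be read from a configuration file.

```haskell
-- List-of-successes parser: (remaining input, result) pairs.
type Parser s r = [s] -> [([s], r)]

infixr 4 <|>

-- Recognise a fixed sequence of symbols, such as a keyword.
token :: Eq s => [s] -> Parser s [s]
token k xs
  | k == take n xs = [(drop n xs, k)]
  | otherwise      = []
  where n = length k

-- The parser that always fails; the neutral element of choice.
failp :: Parser s r
failp _ = []

-- Choice: keep the successes of both alternatives.
(<|>) :: Parser s a -> Parser s a -> Parser s a
(p <|> q) xs = p xs ++ q xs

-- Build a parser at run time from a list of keywords.
keywords :: Eq s => [[s]] -> Parser s [s]
keywords ks = foldr (<|>) failp (map token ks)
```

With the keywords `"in"` and `"int"`, the input `"int"` yields two successes, one per keyword; this is how the list-of-successes representation reports ambiguity.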
Article
Domain-specific languages are becoming increasingly important. Almost every application touches multiple domains. But how should one define, use, and combine multiple DSLs within the same application? The most common approach is to split the project along the domain boundaries into multiple pieces and files. Each file is then compiled separately. Alternatively, multiple languages can be embedded in a flexible host language: within the same syntax, new domain semantics are provided. In this paper we follow a less explored route, that of metamorphic languages. These languages are able to modify their own syntax and semantics on the fly, thus becoming a more flexible host for DSLs. Our language allows for the dynamic creation of grammars and for switching languages where needed. We achieve this through a novel concept of Syntax-Directed Execution. A language grammar includes semantic actions, which are pieces of functional code executed immediately during parsing. By avoiding an additional intermediate representation, connecting actions from different languages and domains is greatly simplified. Still, actions can generate highly specialized code through lambda encapsulation and Dynamic Staging.
... [217-233] Implementing the various transducers poses no particular problem. One can, for example, either write them from scratch, in which case existing techniques and/or libraries [4,5] can be used, or make judicious use of existing tools such as XSLT [9]. An example of a parser (one of the subroutines of the transducer) written in Java and using an XSLT stylesheet is given in Section 6.1 (appendix). ...
Article
Cross-fertilization is a technique for pooling the expertise and resources of at least two sectors in order to make the best of each. In this paper, we present a programming protocol based on the cross-fertilization of two programming languages (Haskell and Java) that belong to two different programming paradigms: the functional paradigm and the object paradigm. Pooling the strengths of each type of language makes it possible to develop more secure applications in less time, with functional code that is concise, easily understandable and therefore easily maintainable by a third party. We present the meta-architecture of applications developed following this approach and an instantiation of it for the implementation of a prototype asynchronous collaborative editor.
... One domain where embedding has been applied using language designs that already exist is language processing. Quite a bit of work has been done to express lexical analysis and parsing problems as combinator programs in functional languages [7] [12] [13]. The aim is to write expressions using a syntax based on the well-known regular expression and context-free grammar notations for expressing lexical and syntactic properties of languages. ...
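The correspondence with regular-expression notation can be sketched with three derived combinators: `many` for r*, `many1` for r+ and `option` for r?. These names, and the `identifier` and `integer` examples, are illustrative assumptions layered on a minimal list-of-successes core; because every alternative is kept, longer matches are listed before shorter ones.

```haskell
import Prelude hiding ((<*>))
import Data.Char (isAlpha, isAlphaNum, isDigit)

-- List-of-successes parser: (remaining input, result) pairs.
type Parser s r = [s] -> [([s], r)]

infixr 6 <*>
infixl 5 <@
infixr 4 <|>

satisfy :: (s -> Bool) -> Parser s s
satisfy p (x:xs) | p x = [(xs, x)]
satisfy _ _            = []

succeed :: r -> Parser s r
succeed v xs = [(xs, v)]

(<*>) :: Parser s a -> Parser s b -> Parser s (a, b)
(p <*> q) xs = [ (zs, (v, w)) | (ys, v) <- p xs, (zs, w) <- q ys ]

(<|>) :: Parser s a -> Parser s a -> Parser s a
(p <|> q) xs = p xs ++ q xs

(<@) :: Parser s a -> (a -> b) -> Parser s b
(p <@ f) xs = [ (ys, f v) | (ys, v) <- p xs ]

-- r* : zero or more repetitions, longest first
many :: Parser s a -> Parser s [a]
many p = (p <*> many p) <@ uncurry (:) <|> succeed []

-- r+ : one or more repetitions
many1 :: Parser s a -> Parser s [a]
many1 p = (p <*> many p) <@ uncurry (:)

-- r? : zero or one occurrence, returned as a list of length 0 or 1
option :: Parser s a -> Parser s [a]
option p = p <@ (: []) <|> succeed []

-- identifier ::= letter (letter | digit)*
identifier :: Parser Char String
identifier = (satisfy isAlpha <*> many (satisfy isAlphaNum)) <@ uncurry (:)

-- integer ::= '-'? digit+
integer :: Parser Char Int
integer = (option (satisfy (== '-')) <*> many1 (satisfy isDigit))
            <@ \(sign, ds) -> (if null sign then id else negate) (read ds)
```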
Article
Experiences are presented from a new case study of embedding domain-specific languages in the lazy functional language Haskell. The domain languages come from the Odin software build system. Thus, in contrast to most previous embedding projects, a design and implementation of the domain languages existed when the project began. Consequently, the design could not be varied to suit the target language and it was possible to evaluate the success or otherwise of the embedding process in more detail than if the languages were designed from scratch. Experiences were mostly positive. The embedded implementation is significantly smaller than its Odin equivalent. Many benefits are obtained from having the full power of an expressive programming language available to the domain programmer. The project also demonstrates in a practical software engineering setting the utility of modern functional programming techniques such as lazy evaluation and monads for structuring programs. On the down side, the efficiency of the embedded version compares unfavourably to the original system.
... Make parse tree types with positions in them an instance of HasPos. The ABR.Parser module provides a framework for lexical analysis and parsing using parser combinators [3,4].

   module ABR.Parser (
         Msg, Could(Fail, Error, OK), Analyser, succeedA, epsilonA, failA, errorA,
         ( <|> ), ( <*> ), ( @> ), ( #> ), cons, some, many, optional, someUntil,
         manyUntil, ( *> ), ( <* ), alsoSat, alsoNotSat, dataSatisfies,
         dataSatisfies', total, nofail, nofail', preLex, Lexeme, Tag, Lexer, TLP,
         TLPs, satisfyL, literalL, ( %> ), ( <**> ), ( <++> ), ( *%> ), soft,
         tagFilter, tokenL, endL, listL, Parser, tagP, lineNo, literalP, errMsg,
         warnMsg
      ) where ...
Article
This document lists and describes the libraries developed for, and common to, the various systems I have developed in Haskell, hiding the implementation details of all module definitions unless exported.
Article
Rogue behaviors are behavioral anomalies that can occur in human activities and can therefore be retrieved from human-generated data. In this paper, we aim to show that NoSQL graph databases are a useful tool for this purpose. These database engines exploit property graphs, which can easily represent interactions between people and objects whatever the volume and complexity of the data. These interactions form fraud rings in the graphs: sophisticated chains of indirect links between fraudsters, representing successive transactions (money, communications, etc.), from which rogue behaviors are detected. Our work is based on two extensions of such NoSQL graph databases. The first extension handles time-variant data, while the second manages imprecise queries with a DSL (for defining flexible operators and operations in Scala) and with Cypherf, a declarative flexible query language over NoSQL graph databases. These extensions make it possible to better address and describe sophisticated frauds. The feasibility of our proposal has been studied.
Article
This paper presents a purely declarative approach to artifact-centric collaborative systems, a model which we introduce in two stages. First, we assume that the workspace of a user is given by a mindmap, shortened to a map, which is a tree used to visualize and organize the tasks in which he or she is involved, together with the information used to resolve these tasks. We introduce a model of guarded attribute grammar, or GAG, to help automate the updating of such a map. A GAG consists of an underlying grammar, which specifies the logical structure of the map, with semantic rules that are used both to govern the evolution of the tree structure (how an open node may be refined to a subtree) and to compute the value of some of its attributes (which derives from contextual information). The map enriched with this extra information is termed an active workspace. Second, we define collaborative systems by making the active workspaces of the various users communicate with each other. The communication uses message passing without shared memory, thus enabling convenient distribution on an asynchronous architecture. We present some formal properties of the model of guarded attribute grammars, then a language for their specification, and we illustrate the approach on a case study for a disease surveillance system.
Conference Paper
In these lectures we will introduce an interactive system that supports writing simple functional programs. Using this system, students learning functional programming:
– develop their programs incrementally,
– receive feedback about whether or not they are on the right track,
– can ask for a hint when they are stuck,
– see how a complete program is stepwise constructed,
– get suggestions about how to refactor their program.
The system itself is implemented as a functional program, and uses fundamental concepts such as rewriting, parsing, strategies, program transformations and higher-order combinators such as the fold. We will introduce these concepts, and show how they are used in the implementation of the interactive functional programming tutor.
Article
We derive a combinator library for non-deterministic parsers with a monadic interface, by means of successive refinements starting from a specification. The choice operator of the parser implements a breadth-first search rather than the more common depth-first search, and can be seen as a parallel composition between two parsing processes. The resulting library is simple and efficient for “almost deterministic” grammars, which are typical for programming languages and other computing science applications.
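The idea of a breadth-first choice can be sketched with an explicit type of parsing processes. In the sketch below (our own simplification, not the cited library; `P`, `+++`, `bindP`, `sym` and `run` are illustrative names), a process either fails, delivers a result, or waits for the next symbol. The choice operator advances both alternatives one symbol at a time, so neither has to be run to exhaustion before the other is tried. The naive `bindP` is correct but makes no attempt at the efficiency the abstract refers to.

```haskell
-- A parsing process over symbols s producing results a.
data P s a
  = Fail                -- no (further) results
  | Result a (P s a)    -- a result at the current position, then more
  | Get (s -> P s a)    -- wait for the next input symbol

infixr 5 +++

-- Breadth-first choice: both processes consume each symbol together.
(+++) :: P s a -> P s a -> P s a
Fail       +++ q          = q
p          +++ Fail       = p
Result x p +++ q          = Result x (p +++ q)
p          +++ Result y q = Result y (p +++ q)
Get f      +++ Get g      = Get (\c -> f c +++ g c)

-- Sequencing: feed every result of the first process to the continuation.
bindP :: P s a -> (a -> P s b) -> P s b
bindP Fail         _ = Fail
bindP (Result x p) k = k x +++ bindP p k
bindP (Get f)      k = Get (\c -> bindP (f c) k)

returnP :: a -> P s a
returnP x = Result x Fail

-- Accept exactly the symbol c.
sym :: Eq s => s -> P s s
sym c = Get (\x -> if x == c then returnP x else Fail)

-- Run a process, returning every result with the input left over.
run :: P s a -> [s] -> [(a, [s])]
run Fail         _      = []
run (Result x p) cs     = (x, cs) : run p cs
run (Get f)      (c:cs) = run (f c) cs
run (Get _)      []     = []

-- Example: "ab" | "ac"; both branches read the shared 'a' in one step.
abOrAc :: P Char String
abOrAc = (sym 'a' `bindP` \_ -> sym 'b' `bindP` \_ -> returnP "ab")
     +++ (sym 'a' `bindP` \_ -> sym 'c' `bindP` \_ -> returnP "ac")
```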
Conference Paper
Should special features for exception handling, backtracking, or pattern matching be included in a programming language? This paper presents a method whereby some programs that use these features can be re-written in a functional language with lazy evaluation, without the use of any special features. This method may be of use for practicing functional programmers; in addition, it provides further evidence of the power of lazy evaluation. The method itself is straightforward: each term that may raise an exception or backtrack is replaced by a term that returns a list of values. In the special case of pattern matching without backtracking, the method can be applied even if lazy evaluation is not present. The method should be suitable for applications such as theorem proving using tacticals, as in ML/LCF.
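A minimal sketch of the method in Haskell: an operation that may fail returns a list with zero or one element instead of raising an exception, and a backtracking search becomes a list comprehension in which each generator is a choice point. The functions `safeDiv`, `average` and `twoSquares` are illustrative examples, not taken from the paper.

```haskell
-- Failure: instead of raising an exception, return no values.
safeDiv :: Int -> Int -> [Int]
safeDiv _ 0 = []
safeDiv x y = [x `div` y]

-- Failure propagates without special handling: no divisor value, no result.
average :: [Int] -> [Int]
average xs = [ q | q <- safeDiv (sum xs) (length xs) ]

-- Backtracking: each generator is a choice point and a failed test prunes
-- the branch. Here: all ways to write n as a^2 + b^2 with a <= b.
twoSquares :: Int -> [(Int, Int)]
twoSquares n = [ (a, b) | a <- [0 .. n], b <- [a .. n], a * a + b * b == n ]
```

Thanks to lazy evaluation, `take 1 (twoSquares n)` stops after the first solution instead of computing all of them.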
Conference Paper
The use of monads to structure functional programs is described. Monads provide a convenient framework for simulating effects found in other languages, such as global state, exception handling, output, or non-determinism. Three case studies are looked at in detail: how monads ease the modification of a simple evaluator; how monads act as the basis of a datatype of arrays subject to in-place update; and how monads can be used to build parsers.
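Parsing with a monad can be sketched as follows, in modern Haskell where a `Monad` instance also requires `Functor` and `Applicative` instances. The parser type again returns a list of successes, here as (result, rest) pairs. The names `item`, `zero`, `+++`, `sat`, `many0`, `many1`, `number` and `sumP` are illustrative choices for this sketch and may differ from those in the paper. Sequencing uses `do` notation, so the tuple plumbing of the combinator style disappears.

```haskell
import Data.Char (isDigit)

-- A parser returns every (result, remaining input) pair.
newtype Parser a = Parser { runParser :: String -> [(a, String)] }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> [ (f a, r) | (a, r) <- p s ]

instance Applicative Parser where
  pure x = Parser $ \s -> [(x, s)]
  Parser pf <*> Parser pa = Parser $ \s -> [ (f a, r2) | (f, r1) <- pf s, (a, r2) <- pa r1 ]

instance Monad Parser where
  Parser p >>= k = Parser $ \s -> concat [ runParser (k a) r | (a, r) <- p s ]

-- Consume one character.
item :: Parser Char
item = Parser $ \s -> case s of
  (c:cs) -> [(c, cs)]
  []     -> []

-- Failure and (non-deterministic) choice.
zero :: Parser a
zero = Parser (const [])

(+++) :: Parser a -> Parser a -> Parser a
Parser p +++ Parser q = Parser $ \s -> p s ++ q s

-- A character satisfying a predicate.
sat :: (Char -> Bool) -> Parser Char
sat pr = do { c <- item; if pr c then return c else zero }

-- One or more / zero or more repetitions.
many1 :: Parser a -> Parser [a]
many1 p = do { x <- p; xs <- many0 p; return (x : xs) }

many0 :: Parser a -> Parser [a]
many0 p = many1 p +++ return []

number :: Parser Int
number = read <$> many1 (sat isDigit)

-- A sum of numbers such as "1+2+3".
sumP :: Parser Int
sumP = do
  n  <- number
  ns <- many0 (sat (== '+') >> number)
  return (n + sum ns)
```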
Mark Jones. Gofer 2.30 release notes.
Philip Wadler. Monads for functional programming. In Program design calculi, proc...