Conference Paper

Less is more: Unparser-completeness of metalanguages for template engines

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

A code generator is a program translating an input model into code. In this paper we focus on template-based code generators in the context of the model-view-controller architecture (MVC). The language in which the code generator is written is known as a metalanguage in code generation parlance. The metalanguage should be, on the one hand, expressive enough to be of practical value and, on the other hand, restricted enough to enforce the separation between the view and the model, according to the MVC. In this paper we advocate the notion of unparser-complete metalanguages as providing the right level of expressivity. An unparser-complete metalanguage is capable of expressing an unparser, a code generator that translates any legal abstract syntax tree into an equivalent sentence of the corresponding context-free language. A metalanguage not able to express an unparser will fail to produce all sentences belonging to the corresponding context-free language. A metalanguage able to express more than an unparser will also be able to implement code violating the model/view separation. We further show that a metalanguage with the power of a linear deterministic tree-to-string transducer is unparser-complete. Moreover, this metalanguage has been successfully applied in a non-trivial case study where an existing code generator is refactored using templates.
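To make the notion of an unparser concrete, here is a minimal Python sketch of a rule-driven tree-to-string translation. It is an illustration under stated assumptions, not the paper's metalanguage: the `Node` type, the `RULES` table, and the tiny expression grammar are all hypothetical. Each rule mentions every child exactly once (linear) and each production has exactly one rule (deterministic), which is the flavour of restriction the abstract describes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical AST node: a production label, children, and a leaf lexeme."""
    label: str
    children: Tuple["Node", ...] = ()
    text: str = ""


# One rule per production (deterministic). A rule is a sequence of literal
# strings and child indices; every child index occurs exactly once (linear).
RULES = {
    "add": [0, " + ", 1],
    "mul": [0, " * ", 1],
    "paren": ["(", 0, ")"],
}


def unparse(node: Node) -> str:
    """Translate an AST into a sentence by expanding the rule for its label."""
    if node.label == "num":  # leaves carry their own lexeme
        return node.text
    parts = []
    for item in RULES[node.label]:
        if isinstance(item, int):
            parts.append(unparse(node.children[item]))  # recurse into a child
        else:
            parts.append(item)  # literal text from the rule
    return "".join(parts)


def num(n: int) -> Node:
    return Node("num", text=str(n))


tree = Node("add", (num(1), Node("mul", (Node("paren", (Node("add", (num(2), num(3))),)), num(4)))))
print(unparse(tree))
```

The rules only concatenate fixed text with the output of subtrees; they cannot compute on model data, which is the kind of restriction that keeps view logic from leaking into business logic.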


... Parsing is a well-established research field [1]; in fact, its maturity has already become its own enemy: new results are specialised refinements published at a handful of venues with a critical mass of experts to appreciate them. Unparsing is a less active field: there are no books on unparsing techniques and there is no general terminological agreement (printing, pretty-printing, unparsing, formatting), but this family of mappings has nevertheless been studied well [27,13,41,47,29,11,2,4,18,33]. Parsing research concerns recognising grammatically formed sentences, providing error-correcting feedback, constructing graph-based representations, as well as optimising such algorithms on time, memory and lookahead. ...
Conference Paper
Having multiple representations of the same instance is common in software language engineering: models can be visualised as graphs, edited as text, serialised as XML. When mappings between such representations are considered, the terms “parsing” and “unparsing” are often used with incompatible meanings and varying sets of underlying assumptions. We investigate 12 classes of artefacts found in software language processing, present a case study demonstrating their implementations and state-of-the-art mappings among them, and systematically explore the technical research space of bidirectional mappings to build on top of the existing body of work and discover as yet unused relationships.
... Sending information works analogously, by unparsing the data provided by the developer within the AST into a document. We follow the definition by Danielsson [23] and Arnoldus et al. [28] of a correct (un)parse round-trip, i.e., the (un)parsing process for a given language is a correct round-trip if parse(unparse(x)) = x holds for every AST x. However, we extend this property as follows: ...
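The round-trip property parse(unparse(x)) = x can be checked mechanically. The following Python sketch does so for a toy language of nested lists; the language, the `parse` and `unparse` functions, and the `round_trips` helper are assumptions made for illustration and are not taken from the cited works. Atoms are assumed to be non-empty strings without whitespace or parentheses, and malformed input is not handled.

```python
import re


def unparse(ast):
    """Turn an AST (a str atom or a tuple of ASTs) into text."""
    if isinstance(ast, str):
        return ast  # written verbatim: no encoding is applied
    return "(" + " ".join(unparse(child) for child in ast) + ")"


def parse(text):
    """Recursive-descent parser for the text produced by unparse."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def expr():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(expr())
            pos += 1  # consume the closing parenthesis
            return tuple(items)
        return tok

    result = expr()
    if pos != len(tokens):
        raise ValueError("trailing input")
    return result


def round_trips(ast):
    """True when parsing the unparsed text yields the original AST."""
    return parse(unparse(ast)) == ast


print(round_trips(("a", ("b", "c"), ())))  # well-formed atoms
print(round_trips(("b c",)))               # atom containing a space
```

The second call returns False: because the atom is written out unencoded, the embedded space changes the structure of the document, so the tree that comes back differs from the one that went in. This is a small-scale version of the failure that a correct unparser has to rule out.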
Article
To exchange complex data structures in distributed systems, documents written in context-free languages are exchanged among communicating parties. Unparsing these documents correctly is as important as parsing them correctly because errors during unparsing result in injection vulnerabilities such as cross-site scripting (XSS) and SQL injection. Injection attacks are not limited to the web world. Every program that uses input to produce documents in a context-free language may be vulnerable to this class of attack. Even for widely used languages such as HTML and JavaScript, there are few approaches that prevent injection attacks by context-sensitive encoding, and those approaches are tied to the language. Therefore, the aim of this paper is to derive context-sensitive encoders from context-free grammars to provide correct unparsing of maliciously crafted input data for all context-free languages. The presented solution integrates encoder definition into context-free grammars and provides a generator for context-sensitive encoders and decoders that are used during (un)parsing. This unparsing process results in documents where the input data neither influences the structure of the document nor changes its intended semantics. By defining encoding during language definition, developers who use the language are provided with a clean interface for writing and reading documents written in that language, without the need to care about security-relevant encoding.
... When using templates to generate source code, the next question is which template language is used. It is often required to have a somewhat restricted template language compared to Turing-complete programming languages (Arnoldus 2011; Parr 2004). A restricted template language enforces separation of the template and the model and makes the templates inherently cleaner and easier to read. ...
Thesis
OPC Unified Architecture is an industrial communication specification that introduces information modeling capabilities. These capabilities allow modeling the communicated data with an object model similar to object-oriented programming languages. However, using the information modeling capabilities is not developer-friendly in the current state of Prosys OPC UA Java SDK. In this thesis, it is identified how the usage of information models could be made easier. First, requirements for source code generation from OPC UA information models are elicited. After that, a type instantiation algorithm is designed to support the generated code. Finally, a design for the source code generation tool is constructed. Functional prototypes are constructed for both the type instantiation algorithm and the source code generation tool. The elicited requirements indicated that the type instantiation algorithm should be separated from the source code generation. The designed type instantiation algorithm creates instances of OPC UA types by reading the server address space at run-time. The designed source code generation tool generates Java classes that use the instances created by the algorithm. The results of this thesis are used in the future development of the Prosys OPC UA Java SDK. The prototypes are developed further by implementing missing requirements and the elicited requirements are used for validating the final product.
... These questions are not trivial and require investigation. Unparser-completeness has recently been studied in the context of template engines [ABS11]. ...
Article
This document is a case study in aggressive self-archiving. It collects all initiatives undertaken by its author in 2012, including unpublished ones, and explains their relevance and relation with one another. Discussed topics include guided convergence of formal grammars in a broad sense, programmable grammar transformation operator suites, metasyntactic specifications and methods of their manipulation, tolerant (soft computing) methods in parsing theory, megamodelling as modelling linguistic architecture of software systems, repositories of grammatical knowledge, open notebook computer science, as well as a number of minor topics (new parsing algorithms, visualisation techniques, etc.). A brief overview of involved venues is also included in the report.
Article
Templates are used to generate all kinds of text, including computer code. Over the last decade, the use of templates has gained a lot of popularity due to the increase of dynamic web applications. Templates are a tool for programmers, and implementations of template engines are most often based on practical experience rather than on a theoretical background. This book reveals the mathematical background of templates and shows interesting findings for improving the practical use of templates. First, a framework to determine the necessary computational power for the template metalanguage is presented. The template metalanguage does not need to be Turing-complete to be useful. A non-Turing-complete metalanguage enforces separation of concerns between the view and model. Second, syntactical correctness of all languages of the templates and generated code is ensured. This includes the syntactical correctness of the template metalanguage and the output language. Third, case studies show that the achieved goals are applicable in practice. It is even shown that syntactical correctness helps to prevent cross-site scripting attacks in web applications. The target audience of this book is twofold. The first group consists of researchers interested in the mathematical background of templates. The second group consists of users of templates. This includes designers of template engines on one side and programmers and web designers using templates on the other side.
Article
This thesis discusses the notion of software templates and their formal inner working mechanism. It explains how grammars can be used to guarantee syntactic correctness of the output of a template engine and how syntactic errors can be found before a template is used. The thesis also shows that the metalanguage of templates does not need to be Turing-complete. The result of a limited metalanguage is a technically enforced separation of model and view. Finally, some thoughts about programming, abstraction and concrete examples are shared with the reader, and a beautiful painting by Rubens shows that creating art and software/system engineering are maybe more related than expected.
Article
A relationship between parallel rewriting systems and two-way machines is investigated. Restrictions on the “copying power” of these devices endow them with rich structuring and give insight into the issues of determinism, parallelism, and copying. Among the parallel rewriting systems considered are the top-down tree transducer, the generalized syntax-directed translation scheme and the ETOL system, and among the two-way machines are the tree-walking automaton, the two-way finite-state transducer, and (generalizations of) the one-way checking stack automaton. The relationship of these devices to macro grammars is also considered. An effort is made to provide a systematic survey of a number of existing results.
Conference Paper
Invertible programming occurs in the area of data conversion where it is required that the conversion in one direction is the inverse of the other. For that purpose, we introduce bidirectional arrows (bi-arrows). The bi-arrow class is an extension of Haskell's arrow class with an extra combinator that changes the direction of computation. The advantage of the use of bi-arrows for invertible programming is the preservation of invertibility properties using the bi-arrow combinators. Programming with bi-arrows in a polytypic or generic way exploits this the most. Besides bidirectional polytypic examples, including invertible serialization, we give the definition of a monadic bi-arrow transformer, which we use to construct a bidirectional parser/pretty printer.
Conference Paper
Several real-world problems call for more parsing power than is offered by the widely used and well-established deterministic parsing techniques. These techniques also create an artificial divide between lexical and context-free analysis phases, at the cost of significant complexity at their interface. In this paper we present the fusion of generalized LR parsing and scannerless parsing. This combination supports syntax definitions in which all aspects (lexical and context-free) of the syntax of a language are defined explicitly in one formalism. Furthermore, there are no restrictions on the class of grammars, thus allowing a natural syntax tree structure. Ambiguities that arise through the use of unrestricted grammars are handled by explicit disambiguation constructs, instead of implicit defaults that are taken by traditional scanner and parser generators. Hence, a syntax definition becomes a full declarative description of a language. Disambiguation constructs can be interpreted as filters on parse forests. Depending on the kind of disambiguation, filters can be applied at parser generation time, at parse time, or after parsing. Scannerless generalized LR parsing is a viable technique that has been applied in various industrial and academic projects.
Conference Paper
Program transformation is used in many areas of software engineering. Examples include compilation, optimization, synthesis, refactoring, migration, normalization and improvement [15]. Rewrite rules are a natural formalism for expressing single program transformations. However, using a standard strategy for normalizing a program with a set of rewrite rules is not adequate for implementing program transformation systems. It may be necessary to apply a rule only in some phase of a transformation, to apply rules in some order, or to apply a rule only to part of a program. These restrictions may be necessary to avoid non-termination or to choose a specific path in a non-confluent rewrite system. Stratego is a language for the specification of program transformation systems based on the paradigm of rewriting strategies. It supports the separation of strategies from transformation rules, thus allowing careful control over the application of these rules. As a result of this separation, transformation rules are reusable in multiple different transformations and generic strategies capturing patterns of control can be described independently of the transformation rules they apply. Such strategies can even be formulated independently of the object language by means of the generic term traversal capabilities of Stratego. In this short paper I give a description of version 0.5 of the Stratego system, discussing the features of the language (Section 2), the library (Section 3), the compiler (Section 4) and some of the applications that have been built (Section 5). Stratego is available as free software under the GNU General Public License from http://www.stratego-language.org.
Conference Paper
Templates are a very common solution for generating code. They are used for different tasks like rendering webpages, creating Java Beans and so on. Most template systems have no notion of the object language and just generate text. The drawback of this approach is the possibility of generating syntactically incorrect code. This can lead to all kinds of annoying errors. In this paper we present an approach for a syntax-safe template engine. Syntax safety guarantees that the generated code can be correctly parsed. To ensure this we use the object language grammar to evaluate the template.
Conference Paper
The mantra of every experienced web application developer is the same: thou shalt separate business logic from display. Ironically, almost all template engines allow violation of this separation principle, which is the very impetus for HTML template engine development. This situation is due mostly to a lack of formal definition of separation and fear that enforcing separation emasculates a template's power. I show that not only is strict separation a worthy design principle, but that we can enforce separation while providing a potent template engine. I demonstrate my StringTemplate engine, used to build jGuru.com and other commercial sites, at work solving some nontrivial generational tasks. My goal is to formalize the study of template engines, thus, providing a common nomenclature, a means of classifying template generational power, and a way to leverage interesting results from formal language theory. I classify three types of restricted templates analogous to Chomsky's type 1..3 grammar classes and formally define separation including the rules that embody separation. Because this paper provides a clear definition of model-view separation, template engine designers may no longer blindly claim enforcement of separation. Moreover, given theoretical arguments and empirical evidence, programmers no longer have an excuse to entangle model and view.
Conference Paper
Many automated software engineering tools require tight integration of techniques for source code analysis and manipulation. State-of-the-art tools exist for both, but the domains have remained notoriously separate because different computational paradigms fit each domain best. This impedance mismatch hampers the development of new solutions because the desired functionality and scalability can only be achieved by repeated and ad hoc integration of different techniques. RASCAL is a domain-specific language that takes away most of this boilerplate by integrating source code analysis and manipulation at the conceptual, syntactic, semantic and technical level. We give an overview of the language and assess its merits by implementing a complex refactoring.
Article
Abstract syntax trees are a very common data structure in language related tools. For example, compilers, interpreters, documentation generators and syntax-directed editors use them extensively to extract, transform, store and produce information that is key to their functionality. The authors present a Java back-end for ApiGen, a tool that generates implementations of abstract syntax trees. The generated code is characterised by strong typing combined with a generic interface and maximal sub-term sharing for memory efficiency and fast equality checking. The goal of this tool is to obtain safe and more efficient programming interfaces for abstract syntax trees. The contribution of this work is the combination of generating a strongly typed data-structure with maximal sub-term sharing in Java. Practical experience shows that this approach is beneficial for extremely large as well as smaller data types.
Article
Program transformation is used in many areas of software engineering. Examples include compilation, optimization, synthesis, refactoring, migration, normalization and improvement [7]. Rewrite rules are a natural formalism for expressing single program transformations. However, using a standard strategy for normalizing a program with a set of rewrite rules is not adequate for implementing automatic program transformation systems. It may be necessary to apply a rule only in some phase of the transformation, or to apply rules in some order, or to apply a rule only to part of a program. These restrictions may be necessary to avoid non-termination or to choose a certain path in a non-confluent rewrite system. Stratego is a language for program transformation based on the paradigm of rewriting strategies. It supports the separation of strategies from transformation rules, thus allowing careful control over the application of these rules. As a result of this separation, transfor..
Article
syntax trees for (5+5)*(4-2) and 5+5*4-2 according to the syntax for expressions in Example 5. contain equations which insert these brackets automatically. For each nonterminal occurring in the right hand side of a context-free grammar rule which is used in the priorities section and/or is extended with an associativity attribute the text formatter contains the functions non-assocN (N,N) → BOOL, rightN (N,N) → BOOL, leftN (N,N) → BOOL, gtrN (N,N) → BOOL. Example 6. The rules in the SDF definition for the expressions from Example 5 are translated to the equations presented in Figure 9. The generated text formatting rule for the +-operator defined in Example 5 is shown in Figure 10. The functions l bracketsExp and r bracketsExp in Figure 10 transform the leftmost and rightmost argument, respectively, into a string and add brackets if needed. If the formatter contains a context-free grammar rule like "(" N ")" → N {bracket} and for N there exists also a priority or associativity definiti...
Article
Unparsing is the problem of transforming an internal representation of a program into an external, concrete syntax. In conjunction with prettyprinting, it is useful for generating readable programs from internal representations. If the target language uses prefix and postfix operators, the problem is nontrivial. This paper shows how to unparse expressions using a simple, bottom‐up tree walk, which keeps track of the least tightly binding operator not enclosed by parentheses. During the tree walk, this operator is compared with the operator of the parent expression, and parentheses are inserted based on the precedence, associativity, and fixity (infix, prefix, or postfix) of the two operators. The paper is a literate program. It includes code for the unparser and for its inverse parser, both of which can handle operators of dynamically chosen precedence and associativity. Supporting such operators is useful for languages like ML, in which programmers may assign precedence and associativity to their own functions. © 1998 John Wiley & Sons, Ltd.
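The bottom-up idea described in this abstract can be sketched in a few lines of Python. This is a simplified illustration restricted to binary infix operators, not the paper's literate program: the `Op` type, the tuple encoding of expressions, and the `needs_parens` rule are assumptions made for the example, and prefix and postfix operators (the case the paper treats as nontrivial) are left out.

```python
from typing import NamedTuple


class Op(NamedTuple):
    """Hypothetical infix operator: text, precedence, associativity."""
    text: str
    prec: int
    assoc: str  # "LEFT", "RIGHT" or "NONASSOC"


def needs_parens(child_op, parent_op, side):
    """Decide whether a child expression must be parenthesized."""
    if child_op is None:  # atoms never need parentheses
        return False
    if child_op.prec != parent_op.prec:
        return child_op.prec < parent_op.prec  # looser child must be wrapped
    # Equal precedence: safe only if both operators associate toward this side.
    return not (child_op.assoc == parent_op.assoc == side)


def walk(expr):
    """Return (text, op), where op is the operator left outside parentheses."""
    if isinstance(expr, str):  # atom
        return expr, None
    left, op, right = expr
    ltext, lop = walk(left)    # children are unparsed first (bottom-up)
    rtext, rop = walk(right)
    if needs_parens(lop, op, "LEFT"):
        ltext = "(" + ltext + ")"
    if needs_parens(rop, op, "RIGHT"):
        rtext = "(" + rtext + ")"
    return f"{ltext} {op.text} {rtext}", op


def unparse(expr):
    return walk(expr)[0]


PLUS, MINUS = Op("+", 1, "LEFT"), Op("-", 1, "LEFT")
TIMES, POW = Op("*", 2, "LEFT"), Op("^", 3, "RIGHT")

print(unparse((("5", PLUS, "5"), TIMES, ("4", MINUS, "2"))))
print(unparse(("a", MINUS, ("b", MINUS, "c"))))
print(unparse(("a", POW, ("b", POW, "c"))))
```

With infix operators only, the operator reported upward is always the root of the subexpression. Once prefix and postfix operators are allowed, the least tightly binding exposed operator can differ from the root, which is why the paper tracks it explicitly.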
Conference Paper
Modern Software Engineering practice advocates the development of domain-specific specification languages to characterize formally the idioms of discourse and jargon of specific problem domains. With poorly-understood domains it is best to construct an abstract syntax to characterize the domain concepts and abstractions before developing a concrete syntax. Often, however, a good concrete syntax exists a priori: sometimes in sophisticated formal languages characterizing (often mathematical) domains but more often in miniature, legacy-code languages, sorely in need of reverse engineering. In such cases, it is necessary to derive an appropriate abstract syntax - or its first cousin, an object-oriented model - from the concrete syntax. This report describes a transformation process that produces a good abstract representation from a low-level concrete syntax specification.
Conference Paper
Parsers and pretty-printers for a language are often quite similar, yet both are typically implemented separately, leading to redundancy and potential inconsistency. We propose a new interface of syntactic descriptions, with which both parser and pretty-printer can be described as a single program. Whether a syntactic description is used as a parser or as a pretty-printer is determined by the implementation of the interface. Syntactic descriptions enable programmers to describe the connection between concrete and abstract syntax once and for all, and use these descriptions for parsing or pretty-printing as needed. We also discuss the generalization of our programming technique towards an algebra of partial isomorphisms.
Conference Paper
We present a method for automatic program inversion in a first-order functional programming language. We formalize the transformation and illustrate it with several examples including the automatic derivation of a program for run-length decoding from a program for run-length encoding. This derivation is not possible with other automatic program inversion methods. One of our key observations is that the duplication of values and testing of their equality are two sides of the same coin in program inversion. This leads us to the design of a new self-inverse primitive function that considerably simplifies the automatic inversion of programs.
Conference Paper
The value of automated code generation is increasingly recognized, and the application model becomes the central artefact in the software development process. Model-driven development requires a rapid and flexible code generation mechanism. This paper discusses code generation based on templates that actively access UML model information to fill an implementation skeleton. Different templates result in different generated code, providing a highly flexible generation mechanism. Along with a discussion of the potential of such code generation, an existing framework for code generation with templates is presented.
Article
A systematic treatment of the relationship between parallel rewriting systems (top-down tree transducer, ETOL system) and two-way machines (2-way gsm, tree-walking automaton, checking stack automaton) is given. Particular attention is paid to the effect of restricting the copying power of these devices. The results are employed to show that the iteration of nondeterministic top-down tree transducers, of nondeterministic 2-way gsm's and of control on ETOL systems each gives rise to a proper hierarchy.
Article
Massachusetts Institute of Technology. Dept. of Electrical Engineering. Thesis. 1969. Ph.D. MICROFICHE COPY ALSO AVAILABLE IN BARKER ENGINEERING LIBRARY. Vita. Bibliography: leaves 213-215. Ph.D.
Article
Thesis (Ph. D.)--Massachusetts Institute of Technology, 1969. Includes bibliographical references. Microfilm.
Article
The book is an introduction to the idea of design patterns in software engineering, and a catalog of twenty-three common patterns. The nice thing is, most experienced OOP designers will find out they've known about patterns all along. It's just that they've never considered them as such, or tried to centralize the idea behind a given pattern so that it will be easily reusable.
Conference Paper
this paper into several sections. As an overview, in Section 2, I try and classify meta-programs into groups. The purpose of this is to provide a common vocabulary which we can use to describe meta-programming systems in the rest of the paper
Article
and concrete syntax for infix operators. Type rator represents a binary infix operator, which has a text representation, a precedence, and an associativity:

    ⟨infix⟩≡
    type precedence = int
    datatype associativity = LEFT | RIGHT | NONASSOC
    type rator = string * precedence * associativity

This ML code uses simple integers (int) to represent precedence, an enumeration to represent associativity, and a triple to represent an operator. (In the context of an ML type definition, a * does not represent multiplication; it connects elements of a tuple.) The more general unparser, presented below, shows how to use an arbitrary type, not just string, as an operator's concrete representation. Precedence and associativity determine how infix expressions are parsed into trees, or equivalently, how they are parenthesized. For example, if operator Ω has higher precedence than operator Φ, then x Φ y Ω z = x Φ (y Ω z) and x Ω y Φ z = (x Ω y) Φ z. When two operato...
Article
This essay describes the Model-View-Controller (MVC) programming paradigm and methodology used in the Smalltalk-80™ programming system. MVC programming is the application of a three-way factoring, whereby objects of different classes take over the operations related to the application domain, the display of the application's state, and the user interaction with the model and the view. We present several extended examples of MVC implementations and of the layout of composite application views. The Appendices provide reference materials for the Smalltalk-80 programmer wishing to understand and use MVC better within the Smalltalk-80 system.
Article
Computer program input generally has some structure; in fact, every computer program that does input can be thought of as defining an "input language" which it accepts. An input language may be as complex as a programming language, or as simple as a sequence of numbers. Unfortunately, usual input facilities are limited, difficult to use, and often are lax about checking their inputs for validity. Yacc provides a general tool for describing the input to a computer program. The Yacc user specifies the structures of his input, together with code to be invoked as each such structure is recognized. Yacc turns such a specification into a subroutine that handles the input process; frequently, it is convenient and appropriate to have most of the flow of control in the user's application handled by this subroutine. The input subroutine produced by Yacc calls a user-supplied routine to return the next basic input item. Thus, the user can specify his input in terms of individual input characters, or in terms of higher level constructs such as names and numbers. The user-supplied routine may also handle idiomatic features such as comment and continuation conventions, which typically defy easy grammatical specification. Yacc is written in portable C. The class of specifications accepted is a very general one: LALR(1) grammars with disambiguating rules. In addition to compilers for C, APL, Pascal, RATFOR, etc., Yacc has also been used for less conventional languages, including a phototypesetter language, several desk calculator languages, a document retrieval system, and a Fortran debugging system.