Conference Paper

DEFIne: A Fluent Interface DSL for Deep Learning Applications

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

Recent years have seen a surge of interest in deep learning models that outperform other machine learning algorithms on benchmarks across many disciplines. Most existing deep learning libraries facilitate the development of neural nets by providing a mathematical framework that helps users implement their models more efficiently. This still represents a substantial investment of time and effort, however, when the intention is to compare a range of competing models quickly for a specific task. We present DEFIne, a fluent interface DSL for the specification, optimisation and evaluation of deep learning models. The fluent interface is implemented through method chaining. DEFIne is embedded in Python and is build on top of its most popular deep learning libraries, Keras and Theano. It extends these with common operations for data pre-processing and representation as well as visualisation of datasets and results. We test our framework on three benchmark tasks from different domains: heart disease diagnosis, hand-written digit recognition and weather forecast generation. Results in terms of accuracy, runtime and lines of code show that our DSL achieves equivalent accuracy and runtime to state-of-the-art models, while requiring only about 10 lines of code per application.

No full-text available

Request Full-text Paper PDF

To read the full-text of this research,
you can request a copy directly from the authors.

... In the last years, we have started to see works presenting some kind of DSL to help in ML tasks. We have proposals aimed at facilitating DevOps approaches for ML pipelines such as OptiML [26] or ScalOps [29]; proposals targeting the creation of ML components such as DeepDSL [30], DEFine [7] and MD4DSPRR [15] for describing deep neural networks and cross-platform ML applications; or proposals like ThingML2 [17] that look to integrate IoT components in ML pipelines. ...
Preprint
Full-text available
Datasets play a central role in the training and evaluation of machine learning (ML) models. But they are also the root cause of many undesired model behaviors, such as biased predictions. To overcome this situation, the ML community is proposing a data-centric cultural shift where data issues are given the attention they deserve, and more standard practices around the gathering and processing of datasets start to be discussed and established. So far, these proposals are mostly high-level guidelines described in natural language and, as such, they are difficult to formalize and apply to particular datasets. In this sense, and inspired by these proposals, we define a new domain-specific language (DSL) to precisely describe machine learning datasets in terms of their structure, data provenance, and social concerns. We believe this DSL will facilitate any ML initiative to leverage and benefit from this data-centric shift in ML (e.g., selecting the most appropriate dataset for a new project or better replicating other ML results). The DSL is implemented as a Visual Studio Code plugin, and it has been published under an open source license.
... CAPH 1 is a famous DSL for implementing dataflow applications on FPGAs. For ML, [268], [53] use these languages to propose hardware accelerators. Besides these two works and few others, the use of DSL is not frequent in CNN design. ...
Thesis
Full-text available
Since the early days of the DARPA challenge, the design of self-driving cars is catching increasing interest. This interest is growing even more with the recent successes of Machine Learning algorithms in perception tasks. While the accuracy of thesealgorithms is irreplaceable, it is very challenging to harness their potential. Realtime constraints as well as reliability issues heighten the burden of designing efficient platforms.We discuss the different implementations and optimization techniques in this work. We tackle the problem of these accelerators from two perspectives: performance and reliability. We propose two acceleration techniques that optimize time and resource usage. On reliability, we study the resilience of Machine Learning algorithms. We propose a tool that gives insights whether these algorithms are reliable enough forsafety critical systems or not. The Resistive Associative Processor accelerator achieves high performance due to its in-memory design which remedies the memory bottleneck present in most Machine Learning algorithms. As for the constant multiplication approach, we opened the door for a new category of optimizations by designing instance specific accelerators. The obtained results outperforms the most recent techniques in terms of execution time and resource usage. Combined with the reliability study we conducted, safety-critical systems can profit from these accelerators without compromising its security.
... DEFIne [9] mainly focuses on liberating data scientists from the necessity to deal with general-purpose languages, such as Python, when describing the networks. The proposed syntax focuses exclusively on the machine learning primitives, and the accompanying tools take care of performance and portability, still using state of the art machine learning frameworks as a backend. ...
Preprint
Full-text available
Modern machine learning frameworks are complex: they are typically organised in multiple layers each of which is written in a different language and they depend on a number of external libraries, but at their core they mainly consist of tensor operations. As array-oriented languages provide perfect abstractions to implement tensor operations, we consider a minimalistic machine learning framework that is shallowly embedded in an array-oriented language and we study its productivity and performance. We do this by implementing a state of the art Convolutional Neural Network (CNN) and compare it against implementations in TensorFlow and PyTorch --- two state of the art industrial-strength frameworks. It turns out that our implementation is 2 and 3 times faster, even after fine-tuning the TensorFlow and PyTorch to our hardware --- a 64-core GPU-accelerated machine. The size of all three CNN specifications is the same, about 150 lines of code. Our mini framework is 150 lines of highly reusable hardware-agnostic code that does not depend on external libraries. The compiler for a host array language automatically generates parallel code for a chosen architecture. The key to such a balance between performance and portability lies in the design of the array language; in particular, the ability to express rank-polymorphic operations concisely, yet being able to do optimisations across them. This design builds on very few assumptions, and it is readily transferable to other contexts offering a clean approach to high-performance machine learning.
Conference Paper
Full-text available
Two Domain Specific Languages (DSLs) have been developed to improve the development of a power control component of interventional X-ray systems of Philips. Configuration files and test cases are generated from instances of these DSLs. To increase the confidence in these instances and the generators, formal models have been generated to analyse DSL instances and to crosscheck the results of the generators. A DSL instance serves as a single source from which the implementation and the formal analysis models are generated. In this way, it is easy to maintain the formal support in case of changes and for new product releases. We report about our experiences with this approach in a real development project at Philips.
Article
Full-text available
Theano is a Python library that allows to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano is being actively and continuously developed since 2008, multiple frameworks have been built on top of it and it has been used to produce many state-of-the-art machine learning models. The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently-introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.
Article
Full-text available
TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.
Article
Full-text available
The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
Article
Full-text available
There is an increase in death rate yearly as a result of heart diseases. One of the major factors that cause this increase is misdiagnoses on the part of medical doctors or ignorance on the part of the patient. Heart diseases can be described as any kind of disorder that affects the heart. In this research work, causes of heart diseases, the complications and the remedies for the diseases have been considered. An intelligent system which can diagnose heart diseases has been implemented. This system will prevent misdiagnosis which is the major error that may occur by medical doctors. The dataset of statlog heart disease has been used to carry out this experiment. The dataset comprises attributes of patients diagnosed for heart diseases. The diagnosis was used to confirm whether heart disease is present or absent in the patient. The datasets were obtained from the UCI Machine Learning. This dataset was divided into training, validation set and testing set, to be fed into the network. The intelligent system was modeled on feed forward multilayer perceptron, and support vector machine. The recognition rate obtained from these models were later compared to ascertain the best model for the intelligent system due to its significance in medical field. The results obtained are 85%, 87.5% for feedforward multilayer perceptron, and support vector machine respectively. From this experiment we discovered that support vector machine is the best network for the diagnosis of heart disease.
Article
Full-text available
Domain-Specific Languages are used in software engineering in order to enhance quality, flexibility, and timely delivery of software systems, by taking advantage of specific properties of a particular application domain. This survey covers terminology, risks and benefits, examples, design methodologies, and implementation techniques of domain-specific languages as used for the construction and maintenance of software systems. Moreover, it covers an annotated selection of 75 key publications in the area of domain-specific languages. 1998 ACM Computing Classification System: D.3 Keywords and Phrases: Example DSLs, DSL design, DSL implementation, survey. Note: Work carried out under project SEN 1.5, Domain-Specific Languages, sponsored by the Telematica Instituut. 1 Introduction In all branches of science and engineering one can distinguish between approaches that are generic and those that are specific. A generic approach provides a general solution for many problems in a certain ar...
Article
Full-text available
Internal Domain-Specific Languages (DSL) provide an elegant mechanism to reduce the code complexity of complex systems simulations programming. Many complex systems problems manifest themselves as networks. Graph analysis techniques can be employed to count the number of: separate components; circuits; paths; or distances in the network. Several other properties such as: mean and peak degree; component or community size; adjacency eigen-spectra, and so forth can also be calculated. Calculation of these properties gives a signature that can help classify a network as belonging to a particular category with known behaviours. In practice however, importing applications data into graph analysis software and managing these calculations can be complex. Domain-specific language techniques allow a high-level graph calculations language to be developed that invokes software components in a graph manipulations framework. We describe a prototype graph generation and analysis domain-specific language built using fluent interface techniques and the Java programming language. We report on: attainable code complexity reduction, framework computational performance, and software engineering directions for internals DSLs for this sort of applications problem.
Article
Full-text available
Modern high-level programming languages are making it practical to develop internal domain-specific languages (DSL) for even computationally intensive applications such as complex systems simulations. We present a prototype DSL to implement lattice-based operations for a whole family of simulation models on different lattices and with different neighbourhood localities. We use closure techniques to switch between lattice geometries and neighbourhoods at run-time. We show how a framework can be implemented in both C++ and Java and that a fluent interface DSL command-line layer and a graphical interface layer can both use this same framework. We find that we can drastically speed up the development and testing of a new simulation model that fits our DSL pattern and demonstrate this with significantly reduced lines of code needed for an incrementally added new model. We present results based on two-dimensional lattice models with discrete cell states for topologically square, hexagonal and triangular geometries and describe how these ideas generalise to higher dimensional problems. We also discuss how the use of internal domain specific languages and tools can facilitate further rapid development of other simulation codes and models.
Article
Full-text available
Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs by CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU (approx 2 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment from prototyping machines to cloud environments. Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.
Article
Full-text available
Using domain-specific languages, scientific codes can let users work directly with equations and benefit from optimizations not available with general compilers.
Article
Full-text available
We show how compiler technology can generate fast and efficient yet human-readable data-parallel simulation code for solving certain partial differential equation (PDE) based problems. We present a code parser and generator based on an ANTLR grammar and tree walking approach that transforms a mathematical formulation of an equation such as the Cahn-Hilliard family into simulation software in C++ or in NVIDIA's Compute Unified Device Architecture (CUDA) language for programming Graphical Processing Units (GPUS). We present software architectural ideas, generated specimen code and detailed performance data on modern GPUs. We discuss how code generation techniques can be used to speed up code development and computational run time for related complex system simulation problems.
Article
Full-text available
Many domain-specific languages, that try to bring feasible alternatives for existing solutions while simplifying programming work, have come up in recent years. Although, these little languages seem to be easy to use, there is an open issue whether they bring advantages in comparison to the application libraries, which are the most commonly used implementation approach. In this work, we present an experiment, which was carried out to compare such a domain-specific language with a comparable application library. The experiment was conducted with 36 programmers, who have answered a questionnaire on both implementation approaches. The questionnaire is more than 100 pages long. For a domain-specific language and the application library, the same problem domain has been used – construction of graphical user interfaces. In terms of a domain-specific language, XAML has been used and C# Forms for the application library. A cognitive dimension framework has been used for a comparison between XAML and C# Forms.
Article
Full-text available
Learning algorithms related to artificial neural networks and in particular for Deep Learning may seem to involve many bells and whistles, called hyper-parameters. This chapter is meant as a practical guide with recommendations for some of the most commonly used hyper-parameters, in particular in the context of learning algorithms based on back-propagated gradient and gradient-based optimization. It also discusses how to deal with the fact that more interesting results can be obtained when allowing one to adjust many hyper-parameters. Overall, it describes elements of the practice used to successfully and efficiently train and debug large-scale and often deep multi-layer neural networks. It closes with open questions about the training difficulties observed with deeper architectures.
Conference Paper
Full-text available
Previous work has shown that the dicul- ties in learning deep generative or discrim- inative models can be overcome by an ini- tial unsupervised learning step that maps in- puts to useful intermediate representations. We introduce and motivate a new training principle for unsupervised learning of a rep- resentation based on the idea of making the learned representations robust to partial cor- ruption of the input pattern. This approach can be used to train autoencoders, and these denoising autoencoders can be stacked to ini- tialize deep architectures. The algorithm can be motivated from a manifold learning and information theoretic perspective or from a generative model perspective. Comparative experiments clearly show the surprising ad- vantage of corrupting the input of autoen- coders on a pattern classification benchmark suite.
Conference Paper
Full-text available
This paper describes the experience of evolving a domain-specific language embedded in Java over several generations of a test framework. We describe how the framework changed from a library of classes to an embedded language. We describe the lessons we have learned from this experience for framework developers and language designers. Categories and Subject Descriptors
Conference Paper
Full-text available
Many interesting simulation problems in compu-tational physics and engineering are posed on a regular data structure, such as a lattice, in two or three dimensions. There is increasing interest however in studying systems on less regular graphs that are embedded in Euclidean spaces of dimen-sions higher than three. We report on our experi-ences in attempting to formulate a highly general object-oriented framework in Java for simulating complex model systems on a generalised mesh in ar-bitrary dimensions and connectivity geometry. We discuss the performance and scalability issues that arose for managing large-scale complex model sim-ulations and exploring their phase-spaces.
Conference Paper
Full-text available
A wide range of domain-specific languages (DSLs) has been implemented successfully by embedding them in general purpose lan- guages. This paper reviews embedding, and summarizes how two alter- native techniques—staged interpreters and templates—can be used to overcome the limitations of embedding. Both techniques involve a form of generative programming. The paper reviews and compares three pro- gramming languages that have special support for generative program- ming. Two of these languages (MetaOCaml and Template Haskell) are research languages, while the third (C++) is already in wide industrial use. The paper identifies several dimensions that can serve as a basis for comparing generative languages.
Conference Paper
Full-text available
I consider the problem of the domain-specic optimization of programs. I review dieren t approaches, discuss their potential, and sketch instances of them from the practice of high-performance paral- lelism. Readers need not be familiar with high-performance computing.
Conference Paper
Full-text available
The process of E-type software development and evolution has proven most difficult to improve, possibly due to the fact that the process is a multi-input, multi-output system involving feedback at many levels. This observation, first recorded in the early 1970s during an extended study of OS/360 evolution, was recently captured in a FEAST (Feedback, Evolution And Software Technology) hypothesis: a hypothesis being studied in on-going two-year project, FEAST/1. Preliminary conclusions based on a study of a financial transaction system-Logica's Fastwire (FW)-are outlined and compared with those reached during the earlier OS/360 study. The new analysis supports, or better does not contradict, the laws of software evolution, suggesting that the 1970s approach to metric analysis of software evolution is still relevant today. It is hoped that FEAST/1 will provide a foundation for mastering the feedback aspects of the software evolution process, opening up new paths for process modelling and improvement
So‐called little, or domain‐specific languages (DSLs), have the potential to make software maintenance simpler: domain experts can directly use the DSL to make required routine modifications. On the negative side, however, more substantial changes may become more difficult: such changes may involve altering the domain‐specific language. This will require compiler technology knowledge, which not every commercial enterprise has easily available. Based on experience taken from industrial practice, we discuss the role of DSLs in software maintenance, the dangers introduced by using them, and techniques for controlling the risks involved. © 1998 John Wiley & Sons, Ltd.
Conference Paper
Torch7 is a versatile numeric computing framework and machine learning library that extends Lua. Its goal is to provide a flexible environment to design and train learning machines. Flexibility is obtained via Lua, an extremely lightweight scripting language. High performance is obtained via efficient OpenMP/SSE and CUDA implementations of low-level numeric routines. Torch7 can easily be in- terfaced to third-party software thanks to Lua's light interface.
Technical Report
TensorFlow [1] is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.
Article
Learning algorithms related to artificial neural networks and in particular for Deep Learning may seem to involve many bells and whistles, called hyper-parameters. This chapter is meant as a practical guide with recommendations for some of the most commonly used hyper-parameters, in particular in the context of learning algorithms based on back-propagated gradient and gradient-based optimization. It also discusses how to deal with the fact that more interesting results can be obtained when allowing one to adjust many hyper-parameters. Overall, it describes elements of the practice used to successfully and efficiently train and debug large-scale and often deep multi-layer neural networks. It closes with open questions about the training difficulties observed with deeper architectures.
Article
The papers in this special section focus on the technology and applications supported by deep learning. Deep learning is a growing trend in general data analysis and has been termed one of the 10 breakthrough technologies of 2013. Deep learning is an improvement of artificial neural networks, consisting of more layers that permit higher levels of abstraction and improved predictions from data. To date, it is emerging as the leading machine-learning tool in the general imaging and computer vision domains. In particular, convolutional neural networks (CNNs) have proven to be powerful tools for a broad range of computer vision tasks. Deep CNNs automatically learn mid-level and high-level abstractions obtained from raw data (e.g., images). Recent results indicate that the generic descriptors extracted from CNNs are extremely effective in object recognition and localization in natural images. Medical image analysis groups across the world are quickly entering the field and applying CNNs and other deep learning methodologies to a wide variety of applications.
Article
Deep learning methods for image classification and object detection are overviewed. In particular we consider such deep models as autoencoders, restricted Boltzmann machines and convolutional neural networks. Existing software packages for deep learning problems are compared.
Article
Torch7 is a versatile numeric computing framework and machine learning library that extends Lua. Its goal is to provide a flexible environment to design and train learning machines. Flexibility is obtained via Lua, an extremely lightweight scripting language. High performance is obtained via efficient OpenMP/SSE and CUDA implementations of low-level numeric routines. Torch7 can easily be in-terfaced to third-party software thanks to Lua's light interface.
Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it performs can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using reinforcement learning methods to learn task-specific policies. We evaluate our model on several image classification tasks, where it significantly outperforms a convolutional neural network baseline on cluttered images, and on a dynamic visual control problem, where it learns to track a simple object without an explicit training signal for doing so.
Article
Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms for new data sets. Our analysis casts some light on why recent "High Throughput" methods achieve surprising success--they appear to search through a large number of hyper-parameters because most hyper-parameters do not matter much. We anticipate that growing interest in large hierarchical models will place an increasing burden on techniques for hyper-parameter optimization; this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.
Conference Paper
Concept-to-text generation refers to the task of automatically producing textual output from non-linguistic input. We present a joint model that captures content selection ("what to say") and surface realization ("how to say") in an unsupervised domain-independent fashion. Rather than breaking up the generation process into a sequence of local decisions, we define a probabilistic context-free grammar that globally describes the inherent structure of the input (a corpus of database records and text describing some of them). We represent our grammar compactly as a weighted hypergraph and recast generation as the task of finding the best derivation tree for a given input. Experimental evaluation on several domains achieves competitive results with state-of-the-art systems that use domain specific constraints, explicit feature engineering or labeled data.
Article
This paper shows how Long Short-term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time. The approach is demonstrated for text (where the data are discrete) and online handwriting (where the data are real-valued). It is then extended to handwriting synthesis by allowing the network to condition its predictions on a text sequence. The resulting system is able to generate highly realistic cursive handwriting in a wide variety of styles.
Article
The development of software for wireless sensor networks is involved and complex. This does not only impose much work on pro- grammers but also prevents domain experts from directly contributing parts of the software. Domain-specic languages may help with these problems|if they are inexpensive to dene, have a syntax that domain experts understand, and creating simulations for them is easy. We pro- pose a language engineering approach that meets these requirements and thus allows rapid prototyping of domain-specic languages. We present our implementation based on Scheme and Eclipse EMF and present rst experiences from the prototyping of a stream-oriented language for the description of earthquake detection algorithms.
Chapter
This chapter describes the design of Hume: a domain-specific language targeting real-time embedded systems. Hume provides a number of high level features including higher-order functions, polymorphic types, arbitrary but sized user-defined data structures, asynchronous processes, lightweight exception handling, automatic memory management and domain-specific meta-programming features, whilst seeking to guarantee strong space/time behaviour and maintaining overall determinacy.
Article
To meet computing needs and overcome power density limitations, the computing industry has entered the era of parallelization. However, highly parallel, general-purpose computing systems face serious challenges in terms of performance, energy, heat dissipation, space, and cost. We believe that there is significant opportunity to look beyond parallelization and focus on domain-specific customization to bring significant power-performance efficiency improvement.
Article
As machines and programs have become more complex, the process of programming applications that can exploit the power of high-performance systems has become more difficult and correspondingly more labor-intensive. This has substantially widened the software gap—the discrepancy between the need for new software and the aggregate capacity of the workforce to produce it. This problem has been compounded by the slow growth of programming productivity, especially for high-performance programs, over the past two decades. One way to bridge this gap is to make it possible for end users to develop programs in high-level domain-specific programming systems. In the past, a major impediment to the acceptance of such systems has been the poor performance of the resulting applications. To address this problem, we are developing a new compiler-based infrastructure, called TeleGen, that will make it practical to construct efficient domain-specific high-level languages from annotated component libraries. We call these languages telescoping languages, because they can be nested within one another. For programs written in telescoping languages, high performance and reasonable compilation times can be achieved by exhaustively analyzing the component libraries in advance to produce a language processor that recognizes and optimizes library operations as primitives in the language. The key to making this strategy practical is to keep compile times low by generating a custom compiler with extensive built-in knowledge of the underlying libraries. The goal is to achieve compile times that are linearly proportional to the size of the program presented by the user, rather than to the aggregate size of that program plus the base libraries.
Article
The realisation of domain-specific languages (dsls) differs in fundamental ways from that of traditional programming languages. We describe eight recurring patterns that we have identified as being used for dsl design and implementation. Existing languages can be extended, restricted, partially used, or become hosts for dsls. Simple dsls can be implemented by lexical processing. In addition, dsls can be used to create front-ends to existing systems or to express complicated data structures. Finally, dsls can be combined using process pipelines. The patterns described form a pattern language that can be used as a building block for a systematic view of the software development process involving dsls.
Conference Paper
Recurrent Neural Networks (RNNs) are very powerful sequence models that do not enjoy widespread use because it is extremely difficult to train them properly. Fortunately, recent advances in Hessian-free optimization have been able to overcome the difficulties associated with training RNNs, making it possible to apply them successfully to challenging sequence problems. In this paper we demonstrate the power of RNNs trained with the new Hessian-Free optimizer (HF) by applying them to character-level language modeling tasks. The standard RNN architecture, while effective, is not ideally suited for such tasks, so we introduce a new RNN variant that uses multiplicative (or “gated”) connections which allow the current input character to determine the transition matrix from one hidden state vector to the next. After training the multiplicative RNN with the HF optimizer for five days on 8 high-end Graphics Processing Units, we were able to surpass the performance of the best previous single method for characterlevel language modeling – a hierarchical nonparametric sequence model. To our knowledge this represents the largest recurrent neural network application to date. 1.
Conference Paper
We present a simple, robust generation system which performs content selection and surface realization in a unified, domain-independent framework. In our approach, we break up the end-to-end generation process into a se- quence of local decisions, arranged hierar- chically and each trained discriminatively. We deployed our system in three different domains—Robocup sportscasting, technical weather forecasts, and common weather fore- casts, obtaining results comparable to state-of- the-art domain-specific systems both in terms of BLEU scores and human evaluation.