Book

Engineering a Compiler

Authors:

Abstract

This work is a textbook for an undergraduate course in compiler construction.
... Static single assignment (SSA) intermediate representations (IRs) are widely used across modern research and industrial compilers (e.g., LLVM [52], GCC [3], Cranelift [1], xDSL [14]), thanks to the broadly accepted benefits that explicit data flow information offers [22,28]. SSA simplifies and improves analyses by forgoing the need to prove facts at every code location [22]. ...
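For readers unfamiliar with SSA, the following minimal Python sketch (our own illustration, not code from the cited works; the tuple format and names are hypothetical) renames a straight-line three-address sequence into SSA form, so that every name is defined exactly once and the data flow becomes explicit.

def to_ssa(instrs):
    """instrs: list of (dest, op, arg1, arg2) tuples of straight-line code."""
    version = {}                                   # current SSA version of each name
    out = []
    for dest, op, a1, a2 in instrs:
        use = lambda v: f"{v}{version[v]}" if v in version else v
        a1, a2 = use(a1), use(a2)                  # uses refer to the latest definition
        version[dest] = version.get(dest, 0) + 1   # every definition gets a fresh version
        out.append((f"{dest}{version[dest]}", op, a1, a2))
    return out

# x = a + b; x = x * c; y = x - a
print(to_ssa([("x", "+", "a", "b"), ("x", "*", "x", "c"), ("y", "-", "x", "a")]))
# [('x1', '+', 'a', 'b'), ('x2', '*', 'x1', 'c'), ('y1', '-', 'x2', 'a')]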
... Operations organized in blocks correspond to straight-line code (i.e., basic blocks [28]). Blocks can represent bodies of functions or for loops, and may have values as arguments. ...
... Register allocation in generic compiler backends is typically performed on code at a low level of abstraction, where analysis of basic block graphs is needed to reconstruct liveness and control flow [28,77]. The handling of unstructured control flow is out of scope when the input to the compiler is guaranteed to be in a linear algebra DSL. ...
Preprint
Full-text available
High-performance micro-kernels must fully exploit today's diverse and specialized hardware to deliver peak performance to DNNs. While higher-level optimizations for DNNs are offered by numerous compilers (e.g., MLIR, TVM, OpenXLA), performance-critical micro-kernels are left to specialized code generators or handwritten assembly. Even though widely-adopted compilers (e.g., LLVM, GCC) offer tuned backends, their CPU-focused input abstraction, unstructured IR, and general-purpose best-effort design inhibit tailored code generation for innovative hardware. We think it is time to widen the classical hourglass backend and embrace progressive lowering across a diverse set of structured abstractions to bring domain-specific code generation to compiler backends. We demonstrate this concept by implementing a custom backend for a RISC-V-based accelerator with hardware loops and streaming registers, leveraging knowledge about the hardware at levels of abstraction that match its custom ISA. We use incremental register allocation over structured IRs, while dropping classical spilling heuristics, and show up to 90% FPU utilization across key DNN kernels. By breaking the backend hourglass model, we reopen the path from domain-specific abstractions to specialized hardware.
... The assessments include theoretical exams and a practical compiler-construction project. In this project, students develop a programming language, implementing the lexical, syntactic, and semantic analyzers, culminating in the creation of an intermediate code generator, specifically for three-address code (TAC) [2], [3]. ...
... Among them is three-address code (TAC), commonly used in compiler courses for its simplicity and ease of didactic application. As noted, this representation serves as a bridge between semantic analysis and code generation, facilitating the translation of high-level programs into machine code or lower-level code [3]. ...
... More details about TAC can be found in the literature, for example in the books by Aho et al. [2], Torczon et al. [3], and Sunita [10]. In this work, an extended version of three-address code, ETAC, described in Section V-A, is used. ...
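As a rough illustration of what such a TAC generator does (a sketch of our own, not the ETAC used in the cited work), the following Python function lowers a small expression tree into three-address instructions, introducing one temporary per operator.

from itertools import count

def gen_tac(node, code, fresh=None):
    """node: a variable name (str) or a tuple (op, left, right); code collects TAC lines."""
    fresh = fresh or count(1)                 # temporary-name generator
    if isinstance(node, str):
        return node
    op, left, right = node
    l = gen_tac(left, code, fresh)
    r = gen_tac(right, code, fresh)
    t = f"t{next(fresh)}"
    code.append(f"{t} = {l} {op} {r}")        # one operator per instruction
    return t

code = []
gen_tac(("+", "a", ("*", "b", "c")), code)    # a + b * c
print("\n".join(code))                        # t1 = b * c
                                              # t2 = a + t1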
Article
Full-text available
The article describes the development and evaluation of a didactic tool called Celestial Suite, intended for teaching compilers. The tool converts programs written in three-address code into MIPS assembly and also allows them to be emulated and executed on a virtual machine. The evaluation carried out by students indicated that the tool is effective as a teaching resource, but suggested improvements such as more precise error messages, hints in the editor, and better documentation.
... Some important compiler optimization problems, such as instruction scheduling and register allocation, are NP-hard problems [1]. An NP-hard problem has no known polynomial-time algorithm that computes the exact solution to every instance of the problem. ...
... Balancing ILP and RP is a challenging problem, because executing more independent instructions in parallel (maximizing ILP) tends to increase the demand for registers. Even optimizing one of these two objectives (ILP or RP) is NP-hard [1]. Current production compilers solve this problem using heuristics (usually greedy heuristics). ...
... In many compilers, including the LLVM compiler used in this work, scheduling is done within a basic block [1]. The input to the instruction scheduler is an instruction sequence with dependencies represented by a data dependence graph (DDG). ...
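To make this setup concrete, here is a hedged, self-contained sketch (not LLVM's scheduler; the instruction encoding is hypothetical) that builds a DDG over one basic block, restricted to read-after-write dependences, and list-schedules it greedily by critical-path height.

from collections import defaultdict

def build_ddg(block):
    """block: list of (dest, [srcs]). Returns successor map and in-degrees (RAW deps only)."""
    succs, indeg, last_def = defaultdict(set), defaultdict(int), {}
    for i, (dest, srcs) in enumerate(block):
        for s in srcs:
            if s in last_def and i not in succs[last_def[s]]:
                succs[last_def[s]].add(i)     # edge from the defining instruction to the use
                indeg[i] += 1
        last_def[dest] = i
    return succs, indeg

def critical_path(succs, n):
    height = [1] * n
    for i in reversed(range(n)):              # RAW edges go forward, so a reverse sweep works
        for j in succs[i]:
            height[i] = max(height[i], 1 + height[j])
    return height

def list_schedule(block):
    n = len(block)
    succs, indeg = build_ddg(block)
    height = critical_path(succs, n)
    ready = {i for i in range(n) if indeg[i] == 0}
    order = []
    while ready:
        i = max(ready, key=lambda k: height[k])   # prefer instructions on long critical paths
        ready.remove(i)
        order.append(i)
        for j in succs[i]:
            indeg[j] -= 1
            if indeg[j] == 0:
                ready.add(j)
    return order

block = [("t1", ["a", "b"]), ("t2", ["t1", "c"]), ("t3", ["a", "d"]), ("t4", ["t2", "t3"])]
print(list_schedule(block))                   # e.g. [0, 1, 2, 3] or [0, 2, 1, 3]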
... Using these Co-FGs we have developed a methodology to statically analyze and compare the traditional system with the modified hardware-offloaded system using data flow analysis techniques [50], [51]. ...
... ICFG analysis is used to identify potential problems that can arise when multiple functions or procedures interact in real-time systems, such as synchronization issues, deadlocks, race conditions, etc. CFG analysis is an important technique for optimizing the performance and reliability of real-time systems, and it is commonly used in fields such as software engineering, computer science, and electrical engineering [51], [163]. 4.2.1 Interprocedural Control Flow Graph (ICFG): An Interprocedural Control Flow Graph (ICFG) [16], [17] is a type of control flow graph that is used to represent the control flow between different functions or procedures in a SW program. Unlike a traditional control flow graph, which represents the control flow within a single function or procedure, an ICFG represents the control flow that occurs between different functions or procedures. An ICFG is a comprehensive representation of a program's control flow that considers all functions and procedures called during program execution. ...
Thesis
Full-text available
High-Frequency Trading Systems (HFTS) require optimal performance and reliability to avoid losses and maximize profits. The primary reason for this necessity is that operators in the HFTS realm have recognized that the current software implementations of these trading systems are not quick enough for the workload they have to handle in order to remain competitive: effectively, they are too slow in responding to the computational demands put on the system. One approach to meet this requirement is the use of hardware/software (HW/SW) co-design, where performance-critical software is implemented in hardware and integrated with the rest of the software. The TCP/IP stack is identified as a performance bottleneck in HFTS. Normally, all TCP/IP transactions in HFTS are handled using general-purpose processors (GPPs) running a commodity operating system, possibly with few modifications. Transferring the TCP/IP stack into hardware has been proposed as a way to improve performance. As these systems are mission-critical, it is essential to ensure functional correctness, preferably through the use of formal verification, because testing, though helpful, may not be adequate to provide the necessary level of guarantee. Further, since the systems are HW/SW co-designed, the need for HW/SW co-verification arises. Formal models for software and hardware verification have been developed separately over the years. Also, in the context of HW/SW co-design, HW/SW co-verification tools are often limited due to the semantic differences between asynchronous event-driven, sequentially executed software and synchronous clock-driven, concurrent hardware. These semantic gaps make it challenging to ensure that the hardware and software components of a system function together correctly. To address these challenges, specialized techniques and tools are needed, in particular a modular compositional approach that breaks down the complex verification process into manageable tasks. In such an approach, depending on the performance requirements, more and more software components could be moved to hardware progressively. In this thesis we propose a novel construct known as a Co-Flow Graph (Co-FG) which is expressive enough to capture the semantics of software, hardware, and their interactions. The Co-FG is capable of capturing the concurrency and delayed execution present in hardware designs, and it can capture the communication between hardware and software. Further, this representation allows for a modular analysis of all components, including software, hardware, and hardware-software interactions. Even though there is no common program compiler for the entire complex software system and hardware, the Co-FG can be used for formal analysis of the entire system from a software perspective. Furthermore, this thesis puts forward a plan for transferring computationally expensive components of the Linux TCP implementation to hardware using a modular, progressive approach. We demonstrate how the system can be modeled using the Co-FG and analyze the system using constant propagation, a commonly used data flow analysis method. Keywords: HW/SW Co-design; Co-Flow Graph; Formal Verification; Control Flow Graph; Data Flow Analysis; System Equivalence; Constant Propagation
... The operator is used between the expressions, as in expression_1 , expression_2 , expression_3, and not at the end of each expression, as in expression_1 ; expression_2 ; expression_3 ;. The context-free grammar for the described language is expressed using Backus-Naur notation [15] in Table 7. ...
... Line [13] creates fragmentos by joining fragmento1 and fragmento2. In line [15], fragmentos is used to generate the MIDI output. ...
Article
This article presents a programming language that organizes fragments of a musical corpus into new, coherent compositions according to the instructions encoded by the user. To this end, the syntax of the language consists of a collection of high-level operators for systematically controlling the harmonic and rhythmic aspects of the automatically composed music. The main operators are based on a Markov chain model, which makes the composition process the result of several random decisions. This allows multiple different compositions to be produced from the same code. The article gives a perspective on the theoretical aspects of the language and its syntax. Additionally, some code examples and the compositions they produce are included and analyzed.
... Linearized syntax trees capture more structural information from the program since they include the parsing rules [8]. Linearization enables the trees to be treated as token strings. ...
... Linearized parse trees are similar to linearized syntax trees, except that the former consider all parsing rules, including intermediate rules used in compilation [8]. They therefore tend to be larger than the latter. ...
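As a toy, hedged example of the linearization idea (not the exact encoding used in the cited work), a parse tree can be flattened into a token string in which rule names act as bracketing markers.

def linearize(node):
    """node: a terminal token (str) or a pair (rule_name, children)."""
    if isinstance(node, str):
        return [node]
    rule, children = node
    tokens = [f"({rule}"]                     # opening marker carries the rule name
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(f"){rule}")                 # closing marker, so structure stays recoverable
    return tokens

tree = ("expr", [("term", ["x"]), "+", ("term", ["y"])])
print(" ".join(linearize(tree)))              # (expr (term x )term + (term y )term )expr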
... Profiling tools provide insights into how much time and resources each function or section of code consumes during execution. This information helps developers pinpoint performance bottlenecks and focus their optimization efforts on the most critical areas (Cooper & Torczon, 2011). There are two main types of profiling: ...
... Loop unrolling aims at improving the efficiency of loops by reducing the loop control overhead and exploiting instruction-level parallelism (Thomas, 2004). In this transformation, the compiler replicates the loop body's contents, effectively increasing the granularity of computation within each iteration (Cooper & Torczon, 2011). Unrolling a loop allows multiple iterations to be grouped together, enabling more efficient use of available resources, such as registers and pipeline stages. ...
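A hand-written, hedged illustration of 4-way unrolling (in practice the compiler performs this transformation on lower-level code, not on Python source): the unrolled loop does four elements' worth of work per iteration and an epilogue handles the remainder.

def dot(a, b):
    s = 0.0
    for i in range(len(a)):                  # original loop: one element per iteration
        s += a[i] * b[i]
    return s

def dot_unrolled4(a, b):
    s0 = s1 = s2 = s3 = 0.0
    n = len(a)
    i = 0
    while i + 4 <= n:                        # unrolled body: four elements per iteration,
        s0 += a[i]     * b[i]                # fewer loop-control checks and more
        s1 += a[i + 1] * b[i + 1]            # independent work per iteration
        s2 += a[i + 2] * b[i + 2]
        s3 += a[i + 3] * b[i + 3]
        i += 4
    for j in range(i, n):                    # epilogue handles the leftover iterations
        s0 += a[j] * b[j]
    return s0 + s1 + s2 + s3

assert abs(dot([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
           - dot_unrolled4([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])) < 1e-9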
Thesis
Full-text available
Compiler-level code optimization is essential for improving speed and memory usage but traditionally relies on manual tuning by experts, which is time-consuming and requires significant domain-specific knowledge. Recent advancements have explored automating this process using algorithms and heuristics, with Reinforcement Learning (RL) showing promise due to its successes in fields like robotics and biology. However, applying RL to compiler optimization faces challenges, including the need for specialized systems, effective state representations, suitable reward functions, and scalable RL agents. We present an RL-based solution for optimizing code within the Multi-Level Intermediate Representation (MLIR) compiler, which enhances the performance of machine learning models and other computationally intensive applications. Our approach introduces a Hierarchical Action Space, representing the optimization search space as the Cartesian product of smaller subspaces, and a Hierarchical Policy Network to explore extensive action spaces efficiently. We developed a comprehensive RL ecosystem for MLIR code optimization and a Python API for experimental research. We explored various configurations, including an Unrolled Action Space and two reward functions: "Immediate Reward" and "Final Reward." Our method significantly outperforms traditional heuristics, improving both individual MLIR operation optimization and full MLIR code optimization.
... As an example, live variable analysis [18] calculates the set of live variables in a given basic block. A variable is live if its value may be read in subsequent blocks. ...
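For reference, the standard backward dataflow formulation of liveness found in compiler textbooks is:

\[ \mathrm{LIVE}_{\mathrm{out}}[B] = \bigcup_{S \in \mathrm{succ}(B)} \mathrm{LIVE}_{\mathrm{in}}[S] \]
\[ \mathrm{LIVE}_{\mathrm{in}}[B] = \mathrm{USE}[B] \cup \left( \mathrm{LIVE}_{\mathrm{out}}[B] \setminus \mathrm{DEF}[B] \right) \]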
... The incoming reaching definitions [18] set of B (REACH_in[B]) is the set of definitions reaching B. The outgoing reaching definitions set of B (REACH_out[B]) is the incoming set minus the definitions killed by B, plus the definitions generated by B. We can define reaching definitions with the following dataflow equations: ...
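The equations themselves are cut off in the excerpt; in standard textbook form, with GEN[B] and KILL[B] the sets of definitions generated and killed by B, they read:

\[ \mathrm{REACH}_{\mathrm{in}}[B] = \bigcup_{P \in \mathrm{preds}(B)} \mathrm{REACH}_{\mathrm{out}}[P] \]
\[ \mathrm{REACH}_{\mathrm{out}}[B] = \mathrm{GEN}[B] \cup \left( \mathrm{REACH}_{\mathrm{in}}[B] \setminus \mathrm{KILL}[B] \right) \]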
Preprint
The lack of sound, concise, and comprehensive error reports emitted by a static analysis tool can increase fixing costs, create a bottleneck at the availability of experts, and may even undermine trust in static analysis as a method. This paper presents novel techniques to improve the quality of bug reports for static analysis tools that employ symbolic execution. With the combination of data and control dependency analysis, we can identify the relevance of particular code snippets that were previously missing from the report. We demonstrated the benefits of our approach by implementing an improved bug report generator algorithm for the Clang Static Analyzer. After being tested by the open-source community, our solution was enabled by default in the tool.
... The scope of each analysis is flow-sensitive and intra-procedural (globally within a single function) [33]. The analysis is field-sensitive (it traces the addresses of different member variables of an object) but not array-index sensitive (it does not distinguish different subscripts of array references). Figure 12 shows the resulting connectivity graphs from using this algorithm to analyze the pushback function at lines 4-7 of Figure 1A. ...
... Both use standard program analysis algorithms available in compiler books [33]. Our main extension is that when considering modifications to indirectly referenced objects, the object connectivity graphs are used to help resolve pointer aliasing issues. In particular, each memory reference is mapped to a node in the connectivity graphs, and if the node is tagged as unknown, it may be aliased with all the other unknown nodes. ...
Article
Full-text available
This paper presents an extended version of our previous work on using compiler technology to automatically convert sequential C++ data abstractions, for example, queues, stacks, maps, and trees, to concurrent lock‐free implementations. A key difference between our work and existing research in software transactional memory (STM) is that our compiler‐based approach automatically selects the best state‐of‐the‐practice nonblocking synchronization method for the underlying sequential implementation of the data structure. The extended material includes a broader collection of the state‐of‐the‐practice lock‐free synchronization techniques, additional formal correctness proofs of the overall integration of the different synchronizations in our system, and a more comprehensive experimental study of the integrated techniques. We evaluate our compiler‐generated nonblocking data structures both by using a collection of micro‐benchmarks, including the Synchrobench suite, and by using a multi‐threaded application Dedup from PARSEC. Our automatically synchronized code attains performance competitive to that of concurrent data structures manually‐written by experts and much better performance than heavier‐weight support by STM.
... Given the inherent naturalness of software (Hindle et al., 2016) and the fact that our information-theoretic definition of token and AST entropy is language agnostic, we believe our results are not impacted by the choice of language. Given that the Shannon entropy needs only the relative frequencies of the symbols involved in the communication process, and that the Abstract Syntax Tree (AST) nodes produced during scanning and parsing are assigned types that correspond to the production rules in the grammar of the respective language (Aho et al., 1986; Torczon and Cooper, 2007), we can use the frequencies of these node types to compute the structural entropy of programs written in any high-level language that is parsed into an AST. We could, for example, use the ANTLR parser generator (Parr, 2013) to build parse trees for source code written in multiple languages and measure their Shannon entropies. ...
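A minimal sketch of the computation hinted at above (our own example, not the authors' code): Shannon entropy over the relative frequencies of token or AST-node types.

import math
from collections import Counter

def shannon_entropy(symbols):
    """symbols: iterable of token or AST-node type labels."""
    counts = Counter(symbols)
    total = sum(counts.values())
    # H = -sum p_i * log2(p_i) over the relative frequencies p_i
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

node_types = ["BinOp", "Name", "Name", "Call", "Name", "BinOp", "Constant"]
print(shannon_entropy(node_types))            # structural entropy in bits per node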
Preprint
The code base of software projects evolves essentially through inserting and removing information to and from the source code. We can measure this evolution via the elements of information - tokens, words, nodes - of the respective representation of the code. In this work, we approach the measurement of the information content of the source code of open-source projects from an information-theoretic standpoint. Our focus is on the entropy of two fundamental representations of code: tokens and abstract syntax tree nodes, from which we derive definitions of textual and structural entropy. We proceed with an empirical assessment where we evaluate the evolution patterns of the entropy of 95 actively maintained open source projects. We calculate the statistical relationships between our derived entropy metrics and classic methods of measuring code complexity and learn that entropy may capture different dimensions of complexity than classic metrics. Finally, we conduct entropy-based anomaly detection of unusual changes to demonstrate that our approach may effectively recognise unusual source code change events with over 60% precision, and lay the groundwork for improvements to information-theoretic measurement of source code evolution, thus paving the way for a new approach to statically gauging program complexity throughout its development.
... Other existing Computational linguistic Models include the Recursive-Descent Parsing Technique (RDPT) for input string process execution [7]. This implemented a Cebuano Parse Tree to generate a syntactic structure of a sentence. ...
Conference Paper
The Philippines, having over 170 languages, faces language barriers and diminished language fluency due to heavy English influence and the effects of globalization. This work focuses on Cebuano, the second most spoken language in the country, used mainly in Central Visayas. Existing Cebuano parsers have been built with Recursive Descent Parsing (RDP) to produce a Cebuano Parse Tree that conveys grammar through a syntactic structure. However, limitations were identified, including low-level sentences, missing grammar codes, low metric scores, and stemmer evaluation errors. To address this gap, the researchers proposed a Snowball-based stemmer approach along with the Cocke-Kasami-Younger algorithm and additional grammar codes for better syntax tree optimization. Due to the lack of a dataset of Cebuano sentences, the Cebuano Corpus was created by the researchers via web scraping and data augmentation techniques such as language dialect filtration and sentence range variation. The dataset comprises at least 500 collected phrases to be annotated by linguist experts for the Gold Standard Parse Treebank (GSPT). The GSPT serves as the parser's benchmark through syntax constituents, known as the PARSEVAL metric system. The improved parser attained an accuracy of 82.14%, suggesting a high rate of correct Cebuano constituents from a Cebuano sentence. The findings suggest that the developed system is more effective than its predecessor. By leveraging advanced Snowball-based algorithms and updated Grammar Codes, the study demonstrates the potential of parsers in preserving cultural language.
... Within the context of C++, a data model was developed to detect dead code (Chen et al., 1998). Historically, the detection and elimination of dead code have been compiler-driven optimization processes (Torczon and Cooper, 2007). Piranha, a recent tool, tackles feature debt and dead code by identifying and removing outdated and unused flags (Ramanathan et al., 2020). ...
Preprint
Full-text available
Context: Previous research on software aging is limited, focusing on dynamic runtime indicators like memory and performance, often neglecting evolutionary indicators like source code comments, and narrowly examining legacy issues within the TD context. Objective: We introduce the concept of Aging Debt (AD), representing the increased maintenance efforts and costs needed to keep software updated. We study AD through Self-Admitted Aging Debt (SAAD) observed in source code comments left by software developers. Method: We employ a mixed-methods approach, combining qualitative and quantitative analyses to detect and measure AD in software. This includes framing SAAD patterns from the source code comments after analysing the source code context, then utilizing the SAAD patterns to detect SAAD comments. In the process, we develop a taxonomy for SAAD that reflects the temporal aging of software and its associated debt. Then we utilize the taxonomy to quantify the different types of AD prevalent in OSS repositories. Results: Our proposed taxonomy categorizes temporal software aging into Active and Dormant types. Our extensive analysis of over 9,000 Open Source Software (OSS) repositories reveals that more than 21% of repositories exhibit signs of SAAD as observed from our gold standard SAAD dataset. Notably, Dormant AD emerges as the predominant category, highlighting a critical but often overlooked aspect of software maintenance. Conclusion: As software volume grows annually, so do evolutionary aging and maintenance challenges; our proposed taxonomy can aid researchers in detailed software aging studies and help practitioners develop improved and proactive maintenance strategies.
... The notion of local optimality proposed in this paper is related to peephole optimization techniques from the classical compilers literature [6,14,29]. Peephole optimizers typically optimize a small number of instructions, e.g., rewriting a sequence of three addition operations into a single multiplication operation. Our notion of local optimality applies to segments of quantum circuits, without making any assumption about segment sizes (in our experiments, our segments typically contained over a thousand gates). ...
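To make the peephole idea concrete, here is a hedged toy sketch of our own (not the paper's circuit optimizer): a window over a three-address sequence that rewrites a chain of additions of the same operand into one multiplication, assuming the intermediate temporaries are dead afterwards.

def peephole(instrs):
    """instrs: list of (dest, op, src1, src2) three-address tuples."""
    out, i = [], 0
    while i < len(instrs):
        dest, op, s1, s2 = instrs[i]
        if op == "+" and s1 == s2:            # chain start: t = a + a
            a, count, last, j = s1, 2, dest, i + 1
            while (j < len(instrs) and instrs[j][1] == "+"
                   and instrs[j][2] == last and instrs[j][3] == a):
                count, last, j = count + 1, instrs[j][0], j + 1
            if count > 2:                     # rewrite is only safe if the intermediate
                out.append((last, "*", a, str(count)))   # temporaries are not used later
                i = j
                continue
        out.append(instrs[i])
        i += 1
    return out

prog = [("t1", "+", "a", "a"), ("t2", "+", "t1", "a"), ("t3", "+", "t2", "a")]
print(peephole(prog))                         # [('t3', '*', 'a', '4')]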
Preprint
Full-text available
Recent advances in quantum architectures and computing have motivated the development of new optimizing compilers for quantum programs or circuits. Even though steady progress has been made, existing quantum optimization techniques remain asymptotically and practically inefficient and are unable to offer guarantees on the quality of the optimization. Because many global quantum circuit optimization problems belong to the complexity class QMA (the quantum analog of NP), it is not clear whether quality and efficiency guarantees can both be achieved. In this paper, we present optimization techniques for quantum programs that can offer both efficiency and quality guarantees. Rather than requiring global optimality, our approach relies on a form of local optimality that requires each and every segment of the circuit to be optimal. We show that the local optimality notion can be attained by a cut-and-meld circuit optimization algorithm. The idea behind the algorithm is to cut a circuit into subcircuits, optimize each subcircuit independently by using a specified "oracle" optimizer, and meld the subcircuits by optimizing across the cuts lazily as needed. We specify the algorithm and prove that it ensures local optimality. To prove efficiency, we show that, under some assumptions, the main optimization phase of the algorithm requires a linear number of calls to the oracle optimizer. We implement and evaluate the local-optimality approach to circuit optimization and compare with the state-of-the-art optimizers. The empirical results show that our cut-and-meld algorithm can outperform existing optimizers significantly, by more than an order of magnitude on average, while also slightly improving optimization quality. These results show that local optimality can be a relatively strong optimization criterion and can be attained efficiently.
... However, the order in which these phases are executed significantly impacts the performance of the generated code. There is no universally optimal order for all programs due to the complex interactions between different passes and the specific characteristics of the target hardware and application [2]. Some studies [3], [4] have addressed this critical issue by introducing a novel reinforcement learning model designed to revolutionize compiler optimization. ...
Article
Full-text available
In this study, we reviewed the applications of reinforcement learning (RL) models to optimize compiler phase ordering, a crucial aspect of compiler optimization. The study examines several prominent RL-based models, including Machine Learning Guided Optimization (MLGO), Autophase, DeepTune, NeuroVectorizer, and COBAYN, highlighting their key contributions, methodologies, limitations, and potential improvements. While RL-based approaches have demonstrated significant advancements in optimizing compiler tasks, such as phase ordering and loop vectorization, the review identifies common challenges, including task-specific optimization, dependency on predefined sequences, limited adaptability, and lack of interpretability. The paper also discusses the gaps in the existing literature, emphasizing the need for more generalizable models, dynamic learning capabilities, and enhanced transparency in optimization decisions. Future research should focus on developing scalable, adaptable, and interpretable RL models that can seamlessly integrate into modular compiler frameworks, paving the way for more efficient and adaptive compiler optimization techniques in real-world applications.
... Without them, every piece of software would have to be tailored to a specific hardware platform, significantly slowing the development and adoption of new technologies. As computing systems became more sophisticated, so did the need for more flexible and efficient compilation processes [6][7][8][9]. ...
Article
Full-text available
Intermediate representations (IRs) are fundamental to classical and quantum computing, bridging high-level quantum programming languages and the hardware-specific instructions required for execution. This paper reviews the development of quantum IRs, focusing on their evolution and the need for abstraction layers that facilitate portability and optimization. Monolithic quantum IRs, such as QIR (Lubinski et al. in Front Phys 10:940293, 2022. https://doi.org/10.3389/fphy.2022.940293), QSSA (Peduri et al. in Proceedings of the 31st ACM SIGPLAN international conference on compiler construction. CC 2022. Association for Computing Machinery, New York, 2022), or Q-MLIR (McCaskey and Nguyen in Proceedings-2021 IEEE International Conference on Quantum Computing and Engineering, QCE, 2021), their effectiveness in handling abstractions, and their hybrid support between quantum-classical operations are evaluated. However, a key limitation is their inability to address qubit locality, an essential feature for distributed quantum computing (DQC). To overcome this, InQuIR (Nishio and Wakizaka in InQuIR: Intermediate Representation for Interconnected Quantum Computers, 2023. https://arxiv.org/abs/2302.00267) was introduced as an IR specifically designed for distributed systems, providing explicit control over qubit locality and inter-node communication. While effective in managing qubit distribution, InQuIR’s dependence on manual manipulation of communication protocols increases complexity for developers. NetQIR (Vázquez-Pérez et al. in NetQIR: An Extension of QIR for Distributed Quantum Computing, 2024. https://arxiv.org/abs/2408.03712), an extension of QIR for DQC, emerges as a solution to achieve the abstraction of quantum communications protocols. This review emphasizes the need for further advancements in IRs for distributed quantum systems, which will play a crucial role in the scalability and usability of future quantum networks.
... Basic block: A basic block is a maximal length sequence of branch-free code [25]. Unless an exception happens, the instructions in a BB always execute together. ...
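A hedged sketch (our own, with a hypothetical instruction encoding) of how a linear instruction stream is partitioned into basic blocks by computing leaders, i.e., the first instruction, every branch target, and every instruction following a branch.

def basic_blocks(instrs):
    """instrs: list of dicts like {"op": "add"}, {"op": "br", "target": 4}, {"op": "cbr", "target": 2}."""
    leaders = {0}
    for i, ins in enumerate(instrs):
        if ins["op"] in ("br", "cbr"):
            leaders.add(ins["target"])        # the branch target starts a block
            if i + 1 < len(instrs):
                leaders.add(i + 1)            # so does the fall-through instruction
    starts = sorted(leaders)
    return [instrs[s:e] for s, e in zip(starts, starts[1:] + [len(instrs)])]

code = [{"op": "load"}, {"op": "cbr", "target": 4},
        {"op": "add"},  {"op": "br", "target": 5},
        {"op": "sub"},  {"op": "ret"}]
for bb in basic_blocks(code):
    print([ins["op"] for ins in bb])
# ['load', 'cbr'] / ['add', 'br'] / ['sub'] / ['ret']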
Preprint
Branch predictor (BP) is an essential component in modern processors since high BP accuracy can improve performance and reduce energy by decreasing the number of instructions executed on wrong-path. However, reducing latency and storage overhead of BP while maintaining high accuracy presents significant challenges. In this paper, we present a survey of dynamic branch prediction techniques. We classify the works based on key features to underscore their differences and similarities. We believe this paper will spark further research in this area and will be useful for computer architects, processor designers and researchers.
... ExceLint extracts reference vectors by first gathering data dependencies for every formula in the given spreadsheet. It obtains dependence information by parsing a sheet's formulas and building the program's dataflow graph [Barowy et al. 2014;Cooper and Torczon 2005]. ExceLint can analyze all Excel functions. ...
Preprint
Spreadsheets are one of the most widely used programming environments, and are widely deployed in domains like finance where errors can have catastrophic consequences. We present a static analysis specifically designed to find spreadsheet formula errors. Our analysis directly leverages the rectangular character of spreadsheets. It uses an information-theoretic approach to identify formulas that are especially surprising disruptions to nearby rectangular regions. We present ExceLint, an implementation of our static analysis for Microsoft Excel. We demonstrate that ExceLint is fast and effective: across a corpus of 70 spreadsheets, ExceLint takes a median of 5 seconds per spreadsheet, and it significantly outperforms the state of the art analysis.
... The program point immediately after the single definition of a temporary t is denoted as start(t), whereas the program point immediately before the last use of t is denoted as end(t). In our setup, the live range of a temporary t is indeed a range (or interval) (start(t), end(t)), and can be enforced straightforwardly in single basic blocks by local value numbering [32]. Live ranges being simple intervals is essential for modeling register assignment as Section 4.2 shows. ...
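As a hedged illustration of why interval-shaped live ranges are convenient (the cited paper models register assignment with constraint programming; the greedy linear-scan-style sketch below is only our own simplification), assigning registers reduces to checking interval overlap.

def assign_registers(ranges, num_regs):
    """ranges: dict temp -> (start, end) program points. Returns temp -> register or 'spill'."""
    active = []                               # (end, reg) pairs for ranges still live
    free = list(range(num_regs))
    result = {}
    for t, (start, end) in sorted(ranges.items(), key=lambda kv: kv[1][0]):
        # Expire ranges that ended before this one starts, freeing their registers.
        for e, r in [a for a in active if a[0] < start]:
            active.remove((e, r))
            free.append(r)
        if free:
            r = free.pop()
            active.append((end, r))
            result[t] = f"r{r}"
        else:
            result[t] = "spill"               # no register available in this simple sketch
    return result

ranges = {"t1": (0, 4), "t2": (1, 2), "t3": (3, 6), "t4": (5, 7)}
print(assign_registers(ranges, 2))
# {'t1': 'r1', 't2': 'r0', 't3': 'r0', 't4': 'r1'}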
Preprint
This paper introduces a combinatorial optimization approach to register allocation and instruction scheduling, two central compiler problems. Combinatorial optimization has the potential to solve these problems optimally and to exploit processor-specific features readily. Our approach is the first to leverage this potential in practice: it captures the complete set of program transformations used in state-of-the-art compilers, scales to medium-sized functions of up to 1000 instructions, and generates executable code. This level of practicality is reached by using constraint programming, a particularly suitable combinatorial optimization technique. Unison, the implementation of our approach, is open source, used in industry, and integrated with the LLVM toolchain. An extensive evaluation confirms that Unison generates better code than LLVM while scaling to medium-sized functions. The evaluation uses systematically selected benchmarks from MediaBench and SPEC CPU2006 and different processor architectures (Hexagon, ARM, MIPS). Mean estimated speedup ranges from 1.1% to 10% and mean code size reduction ranges from 1.3% to 3.8% for the different architectures. A significant part of this improvement is due to the integrated nature of the approach. Executing the generated code on Hexagon confirms that the estimated speedup results in actual speedup. Given a fixed time limit, Unison solves optimally functions of up to 946 instructions, nearly an order of magnitude larger than previous approaches. The results show that our combinatorial approach can be applied in practice to trade compilation time for code quality beyond the usual compiler optimization levels, identify improvement opportunities in heuristic algorithms, and fully exploit processor-specific features.
... A term describing the conversion of code from one language to another [Keith D. Cooper 2011]. ...
Conference Paper
Full-text available
This article proposes an explicit two-tier approach for reasoning and querying on Knowledge Bases. It separates the conceptual fraction from the model elements, enabling clear connections between the components. This method may be more effective than known one-tier solutions. Besides a theoretical view, we explore a case study on actual data about legal knowledge to illustrate how well our strategy may work in practice.
... Connecting the front end and the back end, a compiler uses an intermediate representation (IR) to capture the semantics of the source code in a more abstract and optimizable form. As the output of the compiler's front end, an IR of the source code must be generated for use in the compiler's back end to generate the target code [11]. ...
... This figure demonstrates a serial execution of the taken and not-taken paths, and subsequent reconvergence at line 9. Line 9 is a point called the Immediate Post-Dominator (IP-Dom) [6], which is the nearest point in the program where two diverging paths are guaranteed to converge again. Although the execution model does not specify which divergent path [1] has priority, in this example the taken path is assumed to have higher priority than the not-taken one. ...
Preprint
Full-text available
In GPUs, the control flow management mechanism determines which threads in a warp are active at any point in time. This mechanism monitors the control flow of scalar threads within a warp to optimize thread scheduling and plays a critical role in the utilization of execution resources. The control flow management mechanism can be controlled or assisted by software through instructions. However, GPU vendors do not disclose details about their compiler, ISA, or hardware implementations. This lack of transparency makes it challenging for researchers to understand how the control flow management mechanism functions, is implemented, or is assisted by software, which is crucial when it significantly affects their research. It is also problematic for performance modeling of GPUs, as one can only rely on traces from real hardware for control flow and cannot model or modify the functionality of the mechanism altering it. This paper addresses this issue by defining a plausible semantic for control flow instructions in the Turing native ISA based on insights gleaned from experimental data using various benchmarks. Based on these definitions, we propose a low-cost mechanism for efficient control flow management named Hanoi. Hanoi ensures correctness and generates a control flow that is very close to real hardware. Our evaluation shows that the discrepancy between the control flow trace of real hardware and our mechanism is only 1.03% on average. Furthermore, when comparing the Instructions Per Cycle (IPC) of GPUs employing Hanoi with the native control flow management of actual hardware, the average difference is just 0.19%.
... Examples of some functions are shown in Table 2. The functions each return one value, written into the variable that is the first operand (as said, an accumulator does not exist in this VM), and there may be up to 15 other operands. Note that such an order is different from that in the Static Single Assignment form of dataflow graphs used in typical compilers [27]. Contrary to functions, system procedures do not return values directly. ...
Article
Full-text available
The paper outlines a methodology for validating the accuracy of a control system’s runtime implementation. The runtime takes the form of a virtual machine executing portable code compliant with IEC 61131-3 standards. A formal model, comprising denotational semantics equations, has been devised to specify machine instruction decoding and operations, including arithmetic functions across various data types, arrays, and subprogram calls. The model also encompasses exception-handling mechanisms for runtime errors, such as division by zero and invalid array index access. This denotational model is translated into executable form using the functional F# language. Verification involves comparing the actual implementation of the virtual machine against this executable model. Any disparities between the model and implementation indicate deviations from the specification. Implemented within the CPDev engineering environment, this approach ensures consistent and predictable control program execution across different target platforms.
... Before an artificial neural network (ANN) can be effectively used for any practical application, it must undergo a training process using a dataset known as the training set [7][8][9][10][11][12][13][14][15][16][17][18]. This training set should ideally contain a diverse range of training cases that fully represent the problem being addressed [19]. ...
Article
Full-text available
Artificial neural networks are inspired by biological processes. Artificial neural networks are important because they can be used to deduce a function from observations; in other words, artificial neural networks can learn from experience. An artificial neural network simulator, created to meet the growing interest in neural network education, is introduced in this study. The NeuroQuick Laboratory simulator is implemented using object-oriented programming in Delphi, and its classes can be used to create a standalone application with artificial neural networks. The NeuroQuick Laboratory Simulator is designed for a broad range of users, including beginning graduate/advanced undergraduate students, engineers, and scientists. It is particularly well-suited for use in individual student projects or as a simulation tool in one- or two-semester neural network-related courses at universities.
... In memory-oriented scientific approaches, there are studies on reducing program code size and saving memory by using compiler optimization flags [18][19][20][21]. In addition, there are some studies in the literature on saving memory through methods, such as reducing code size by memory optimization, eliminating sub-expressions, and cleaning up dead code [22,23]. ...
Conference Paper
Full-text available
The effort to optimize all available resources has always been an important goal of mankind. This is also true for compiler technology. The purpose of a compiler is to produce code that leads to improved processor performance. Processor architectures play an essential role in this production and influence the runtime of the application or the required memory space. In this study, the sizes of assembly codes generated by the latest version of the GCC g++ compiler for six different processor architectures, namely ARM64, MIPS64, POWER64, s390x, x86_64, and RISC-V rv64gc, were investigated for the first time in the literature. In this context, a tool called Compiler Explorer was used. In the first phase of the study, a comparison dataset containing 24 different benchmarks was created in the C++ language, with known control and data structures, and functions. In the second phase, the Compiler Explorer tool was utilized to create six different assembly codes for each benchmark, using the appropriate compiler and processor architectures. In the final phase, the assembly code sizes were determined using the same tool. Experimental analysis showed that the x86_64 and RISC-V rv64gc processor architectures generated the most memory-efficient assembly codes, while the MIPS64 architecture produced the least efficient code in terms of the code size for all benchmarks among all architectures.
... This optimization technique can significantly improve the performance of the resulting code. For example, choosing a machine code instruction that performs multiple operations in a single instruction can reduce the number of instructions executed and improve performance [10]. ...
Article
Full-text available
The research paper represents a novel approach to the design and optimization of a compiler for a domain-specific language (DSL) focused on geometric shape creation and manipulation. The primary objective is to develop a compiler capable of generating efficient machine code while offering users a high level of abstraction. The paper begins with an overview of DSLs and compilers, emphasizing their importance in software development. Next, it outlines the specific requirements of the geometric shape DSL and proposes a compiler design that addresses them. This innovative approach considers DSL's unique features, such as shape creation and manipulation, and aims to generate high-quality machine code. The paper also discusses optimization techniques to enhance the generated code's quality and performance, including loop unrolling and instruction scheduling. These optimizations are particularly suited to the DSL, which focuses on geometric shape creation and manipulation and are integral to achieving efficient machine code generation. In conclusion, the paper emphasizes the novelty of this approach to DSL compiler design and anticipates exciting results from testing the compiler developed for the geometric shape DSL.
... The functions each return one value, written into the variable that is the first operand (as said, an accumulator does not exist in this VM), and may have up to 15 other operands. Note that such an order is different from that in the Static Single Assignment form of dataflow graphs used in typical compilers [11]. Arithmetic operations are executed in limited ranges, depending on the type. ...
Article
Computer education is often as daunting for the professor as it is for the student, as the subject matter can be very demanding. This walk-through shows what it takes to develop and organize an advanced computer course from scratch.
Preprint
Quantum computers do not run in isolation; rather, they are embedded in quantum-classical hybrid architectures. In these setups, a quantum processing unit communicates with a classical device in near-real time. To enable efficient hybrid computations, it is mandatory to optimize quantum-classical hybrid code. To the best of our knowledge, no previous work on the optimization of hybrid code nor on metrics for which to optimize such code exists. In this work, we take a step towards optimization of hybrid programs by introducing seven optimization routines and three metrics to evaluate the effectiveness of the optimization. We implement these routines for the hybrid quantum language Quil and show that our optimizations improve programs according to our metrics. This lays the foundation for new kinds of hybrid optimizers that enable real-time collaboration between quantum and classical devices.
Chapter
This chapter describes how to generate code for the MicroJava Virtual Machine. The MicroJava VM is a simplified version of the Java VM. It is a stack machine with its own bytecode instruction set, which is executed by an interpreter. After discussing the instructions and the memory areas of the MicroJava VM we look at how to generate code for loading data, computing expressions, transforming control flow statements into jumps, and finally for calling methods including parameter passing. During code generation, we also need to check the semantic correctness of MicroJava programs, which is expressed by context conditions. All translation steps are described concisely by attributed grammars.
Technical Report
Full-text available
Compiler error signals play a crucial role in the software development process, as they guide developers in identifying and resolving issues within their code. However, traditional compiler error messages are often vague and lacking in actionable information, making it challenging for developers, especially those who are inexperienced, to promptly identify and fix errors. Additionally, inconsistent function naming conventions, such as the use of camelCase and snake_case within the same codebase, can impair code readability and maintainability. This research paper presents the implementation of an intelligent error message system and an automated function renaming refactoring tool within the context of a basic functional programming language. The enhanced error messages provide clear descriptions of the errors and suggestions for potential solutions, while the refactoring tool systematically detects and renames functions from camelCase to snake_case, improving code consistency and readability. The project aims to enhance developer productivity, reduce debugging time, and promote better coding practices.
Article
Full-text available
A compiler is a tool that converts a high-level language into assembly code after enabling relevant optimizations. The automatic selection of suitable optimizations from an ample optimization space is a non-trivial task mainly accomplished through hardware profiling and application-level features. These features are then passed through an intelligent algorithm to predict the desired optimizations. However, collecting these features requires executing the application beforehand, which involves high overheads. With the evolution of Natural Language Processing (NLP), the performance of an application can be predicted solely at compile time via source code analysis. There has been substantial work in source code analysis using NLP, but most of it is focused on offloading the computation to suitable devices or detecting code vulnerabilities. Therefore, it has yet to be used to identify the best optimization sequence for an application. Similarly, most works have focused on finding the best machine learning or deep learning algorithms, hence ignoring the other important phases of the NLP pipeline. This paper pioneers the use of NLP to predict the best set of optimizations for a given application at compile time. Furthermore, this paper uniquely studies the impact of four vectorization and seven regression techniques in predicting the application performance. For most applications, we show that tfidf vectorization and huber regression result in the best outcomes. On average, the proposed technique predicts the optimal optimization sequence with a performance drop of 18%, achieving a minimum drop of merely 0.5% compared to the actual best combination.
Book
Full-text available
Preprint
Full-text available
The Halting Problem, introduced by Alan Turing in 1936, is a foundational concept in theoretical computer science. It posits that no general algorithm can determine whether any given program will halt or run indefinitely. While this has provided significant insights into the limits of computation, this paper explores the practical impacts of removing the Halting Problem from history. We assess how its absence would affect current software development practices, operating systems, software reliability and testing tools, compilers, and security analysis. Additionally, we consider alternative paths and advancements that might have emerged, such as more adaptive algorithms, enhanced debugging methodologies, and new theoretical frameworks. By focusing on practical applications and tools used in computing today, we aim to understand whether expunging the Halting Problem could have led to further progress in these areas. Our analysis highlights the resilience of practical computing practices and the potential for further innovation, emphasizing the importance of balancing theoretical exploration with empirical methods. Keywords: Halting Problem, Alan Turing, theoretical computer science, software development, debugging tools, operating systems, process management, software reliability, testing tools, compilers, code optimization, security analysis, adaptive algorithms, empirical methods, theoretical frameworks, innovation, practical computing, heuristic methods, real-world applications.
Chapter
High-level synthesis (HLS) is the process of compiling a software program into a digital circuit. This chapter provides a view into the HLS design flow and presents algorithms, tools, and methods to generate digital circuits from software descriptions. It details FPGA-oriented HLS techniques, discusses recent HLS advancements, and outlines the current challenges of HLS for FPGAs.
Conference Paper
Low-code / no-code platforms (LCNCP) are on the rise, a trend that represents a step towards the decade-long goal to automate coding. The quality of an LCNCP can be measured by external attributes such as functionality, reliability, usability, flexibility, efficiency, and maintainability. When businesses decide on using an LCNCP, they place significant importance on efficiency, which can be evaluated based on factors like performance and scalability. This paper describes the performance and scalability aspects in the space of business decisions specified using the Decision Model And Notation (DMN) language, an industry standard for the specification of business decisions, and the execution engine jDMN. Furthermore, it outlines a range of efficient optimization techniques within the DMN domain.