Figure 1 - uploaded by Anita Sabo

Source publication
Conference Paper
Full-text available
The task of programming concurrent systems is substantially more difficult than the task of programming sequential systems with respect to both correctness and efficiency. Nowadays multi-core processors are common, and the development of embedded hardware and processors is shifting to multi-core and multiprocessor setups as well. This mean...

Contexts in source publication

Context 1
... Shared memory communication: Concurrent components communicate by altering the contents of shared memory locations. This style of concurrent programming usually requires the application of some form of locking (e.g., mutexes (mutual exclusion locks), semaphores, or monitors) to coordinate between threads. Shared memory communication can be achieved with the use of Software Transactional Memory (STM) [1][2][3]. Software Transactional Memory is an abstraction for a concurrent communication mechanism analogous to database transactions for controlling access to shared memory. The main benefits of STM are composability and modularity. That is, by using STM one can write concurrent abstractions that can be easily composed with any other abstraction built using STM, without exposing the details of how the abstraction ensures safety.

B. Message Passing Communication: Concurrent components communicate by exchanging messages. The exchange of messages may be carried out asynchronously (sometimes referred to as "send and pray"), or one may use a rendezvous style in which the sender blocks until the message is received. Message-passing concurrency tends to be far easier to reason about than shared-memory concurrency, and is typically considered a more robust, although slower, form of concurrent programming. The most basic feature of concurrent programming is illustrated in Figure 1. The numbered nodes represent instructions that need to be performed, and as seen in the figure certain nodes must be executed simultaneously. Since the intermediate results of the node operations are usually part of the same calculation, this presents a great challenge for practical systems. A wide variety of mathematical theories for understanding and analyzing message-passing systems are available, including the Actor model [4]. In computer science, the Actor model is a mathematical model of concurrent computation that treats "actors" as the universal primitives of concurrent digital computation: in response to a message that it receives, an actor can make local decisions, create more actors, send more messages, and determine how to respond to the next message received. Figure 1 demonstrates the most basic but essential problem in concurrent programming. Each number represents one process or one operation to be performed. The main goal is not to find the resources for parallel computing but to find a way to pass intermediate results between the numbered nodes.

Concurrent programming also offers increased application throughput (the number of tasks done in a certain time period will increase) and high responsiveness for input/output: input/output intensive applications mostly wait for input or output operations to complete, and concurrent programming allows the time that would be spent waiting to be used for another task. It can be stated that there are more appropriate program structures - some problems and problem domains are well suited to representation as concurrent tasks or processes.
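The composability claim above can be made concrete with a short sketch. The following is a minimal illustration using GHC's stm library, not code from the paper; the account TVars and the withdraw/deposit/transfer names are assumptions made for the example.

```haskell
import Control.Concurrent.STM

-- Two independently written abstractions over shared TVars.
withdraw :: TVar Int -> Int -> STM ()
withdraw account amount = do
  balance <- readTVar account
  writeTVar account (balance - amount)

deposit :: TVar Int -> Int -> STM ()
deposit account amount = do
  balance <- readTVar account
  writeTVar account (balance + amount)

-- Composition: both smaller transactions commit as one atomic step,
-- without exposing how either abstraction protects its data.
transfer :: TVar Int -> TVar Int -> Int -> IO ()
transfer from to amount = atomically (withdraw from amount >> deposit to amount)

main :: IO ()
main = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  transfer a b 30
  balances <- atomically ((,) <$> readTVar a <*> readTVar b)
  print balances   -- (70,30)
```

Because the two smaller transactions are composed inside a single atomically block, no other thread can observe the withdrawal without the matching deposit, and neither abstraction has to expose how it guards its data.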
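The difference between asynchronous ("send and pray") and rendezvous-style message passing can also be sketched briefly. This is a generic Haskell illustration rather than the paper's mechanism; the mailbox, slot and ack names are invented for the example, and the rendezvous is modelled with an explicit acknowledgement.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

main :: IO ()
main = do
  -- Asynchronous send: writeChan never blocks; the sender continues
  -- immediately, whether or not anyone has read the message yet.
  mailbox <- newChan
  _ <- forkIO $ do
    msg <- readChan mailbox
    putStrLn ("worker received: " ++ msg)
  writeChan mailbox "intermediate result"    -- "send and pray"

  -- Rendezvous-style send: the sender hands over a value and then
  -- blocks on an acknowledgement, so it proceeds only once the
  -- receiver has actually taken the message.
  slot <- newEmptyMVar
  ack  <- newEmptyMVar
  _ <- forkIO $ do
    v <- takeMVar slot
    putStrLn ("worker received (rendezvous): " ++ show (v :: Int))
    putMVar ack ()
  putMVar slot 42
  takeMVar ack                               -- sender blocks until receipt
  threadDelay 100000                         -- give the async worker time to print
```

The asynchronous sender runs ahead immediately after writeChan, while the rendezvous sender stays blocked on takeMVar ack until the receiver has actually consumed the value.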
II. COMMUNICATION
In the case of distributed systems the performance of parallelization largely depends on the performance of the communication between the peers of the system. Two peers communicate by sending data to each other, therefore the performance of the peers depends on the processing of the data sent and received. The communication data contains the application data as well as the transfer layer data. It is important for the transfer layer to operate with small overhead and to provide fast processing. Embedded systems have specific requirements, and it is important that the communication meets them. The design of the presented method is focused around the possibility to support and execute high level optimizations and abstractions on the whole program. The graph-based software layout of the method provides the possibility to execute graph algorithms on the software architecture itself. The graph algorithms operate on the software's logical graph, not the execution graph. This provides the possibility for higher level optimizations (super optimization). The architecture is designed to be easily modelable with a domain specific language. This domain specific language eases the development of the software, but its primary purpose is to provide information for higher level optimizations. It can be viewed as the logical description and documentation of the software. Based on the description language it is possible to generate the low level execution of the software, which means that it is not necessary to work at a low level during development. The development is concentrated around the logic of the application: it focuses on what is to be achieved instead of the small steps that need to be taken in order to get there.

III. REALIZATION IN EMBEDDED SYSTEMS
The architecture of modern embedded systems is based on multi-core or multi-processor setups. This makes concurrent computing an important problem in the case of these systems as well. The existing algorithms and solutions for concurrency were not designed for embedded systems with resource constraints. In the case of real-time embedded systems it is necessary to meet time and resource constraints, so it is important to create algorithms which prioritize these requirements. It is also vital to take the human factor into consideration, to simplify the development of concurrent applications as much as possible, and to help the transition from the sequential world to the parallel world. It is also important to have the possibility to trace and verify the created concurrent applications. The traditional methods used ...
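As a generic illustration of the transfer-layer overhead mentioned in the excerpt above (not the paper's own protocol), the sketch below frames every application message with a fixed four-byte length prefix, so the transfer-layer cost per message stays constant and small; frame and unframe are hypothetical helpers.

```haskell
import qualified Data.ByteString as BS
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- Transfer-layer data: a fixed four-byte, big-endian length prefix.
frame :: BS.ByteString -> BS.ByteString
frame payload = BS.append (BS.pack header) payload
  where
    len    = BS.length payload
    header = [ fromIntegral ((len `shiftR` s) .&. 0xff) :: Word8 | s <- [24, 16, 8, 0] ]

-- Peel one framed message off a byte stream, returning the payload and
-- whatever bytes follow it; Nothing means the message is not complete yet.
unframe :: BS.ByteString -> Maybe (BS.ByteString, BS.ByteString)
unframe bytes
  | BS.length bytes < 4 = Nothing
  | otherwise =
      let (header, rest) = BS.splitAt 4 bytes
          len = foldl (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0 (BS.unpack header)
      in if BS.length rest < len
           then Nothing
           else Just (BS.splitAt len rest)
```

Here unframe (frame msg) yields Just (msg, BS.empty), and partial input simply returns Nothing until the remaining bytes arrive.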
Context 2
... It is also important to have the possibility to trace and verify the created concurrent applications. The traditional methods used for parallel programming are not suitable for embedded systems because of the possibility of dead-locks. Dead-locks pose a serious problem for embedded systems [5], because they can cause huge losses. The methods presented in [6] (Actor model and STM), which do not have dead-locks, have increased memory and processing requirements; this also means that achieving real-time execution becomes harder due to the use of garbage collection. Using these methods and taking into account the requirements of embedded systems one can create a method which is easier to use than low-level threading and the resource requirements are negligible.
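To see why mailbox-based designs of the kind referenced above avoid lock-ordering dead-locks, a minimal actor-style sketch in Haskell follows. It illustrates the general idea only, not the method of [6] or of this paper; Msg and counterActor are invented names. Each piece of state is owned by exactly one thread and other threads only send messages, so there is no nested lock acquisition that could dead-lock.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (replicateM_)

-- Messages the counter actor understands; Get carries a reply slot.
data Msg = Increment | Get (MVar Int)

-- The actor owns its state (a plain Int) and processes one message at
-- a time from its mailbox, so no locks are needed anywhere.
counterActor :: Chan Msg -> Int -> IO ()
counterActor mailbox n = do
  msg <- readChan mailbox
  case msg of
    Increment   -> counterActor mailbox (n + 1)
    Get replyTo -> putMVar replyTo n >> counterActor mailbox n

main :: IO ()
main = do
  mailbox <- newChan
  _ <- forkIO (counterActor mailbox 0)
  replicateM_ 5 (writeChan mailbox Increment)
  reply <- newEmptyMVar
  writeChan mailbox (Get reply)
  takeMVar reply >>= print   -- prints 5
```

The trade-off noted in the excerpt still applies: the mailbox and the extra thread consume memory and scheduling time, which matters on resource-constrained embedded targets.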
In the development of concurrent software the primary affecting factor is not the method used for ...

Similar publications

Article
Full-text available
Design of hardware accelerators for neural network (NN) applications involves walking a tightrope amid the constraints of low power, high accuracy and throughput. NVIDIA's Jetson is a promising platform for embedded machine learning which seeks to achieve a balance between the above objectives. In this paper, we provide a survey of works that ev...
Article
Full-text available
The present work proposes to evaluate, compare, and determine software alternatives that present good detection performance and low computational cost for the plant segmentation operation in computer vision systems. In practical aspects, it aims to enable low-cost and accessible hardware to be used efficiently in real-time embedded systems for dete...
Preprint
Full-text available
Batch-normalization (BN) layers are thought to be an integrally important layer type in today's state-of-the-art deep convolutional neural networks for computer vision tasks such as classification and detection. However, BN layers introduce complexity and computational overheads that are highly undesirable for training and/or inference on low-power...
Article
Full-text available
A pendular platform is a robotic structure commonly used in the design of controllers given its nonlinear dynamics. This work presents the modeling, design and implementation of an optimal LQR controller and a Sliding Mode (SMC) controller applied to two commercial platforms, the Quanser rotary inverted pendulum (RotPen) and the Lego mobile inverted pe...