Article

Parallel and Distributed Programming Using C++

Authors: Cameron Hughes, Tracey Hughes

Abstract

From the Publisher: Parallel and Distributed Programming Using C++ provides an up-close look at how to build software that can take advantage of multiprocessor computers. Simple approaches for programming parallel virtual machines are presented, and the basics of cluster application development are explained. Through an easy-to-understand overview of multithreaded programming, this book also shows you how to write software components that work together over a network to solve problems and do work. Parallel and Distributed Programming Using C++ provides an architectural approach to parallel programming for computer programmers, software developers, designers, researchers, and software architects. It will also be useful for computer science students. The book:
- demonstrates how agents and blackboards can be used to make parallel programming easier;
- shows object-oriented approaches to multitasking and multithreading;
- demonstrates how the UML is used to document designs that require parallel or distributed programming; and
- contains the new POSIX/UNIX IEEE Standard for the Pthreads library.


... Concurrency [2,61,67,68,69]: A set of tasks or threads is said to be concurrent iff they execute and make progress at the same time or within the same time interval. By "same time or same time interval" we mean that they make progress together, but they may or may not be executing at the same time instant. ...
... A mutex is a programming abstraction for implementing the concept of mutual exclusion [66,69]. It is the process of preventing concurrently executing threads or tasks from simultaneously accessing a shared resource or critical section. In other words, atomic statements from the critical sections of two or more processes must not be interleaved. ...
... Parallelism [2,61,67,68,69] Two or more tasks or threads are said to be executing in parallel if they execute simultaneously. Simultaneous execution means making progress together at every instant of time. ...
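A minimal C++11 sketch of these ideas, protecting a critical section with std::mutex (the counter and thread count are illustrative):

    #include <iostream>
    #include <mutex>
    #include <thread>
    #include <vector>

    std::mutex counter_mutex;  // enforces mutual exclusion on the shared counter
    long shared_counter = 0;   // the shared resource / critical-section data

    void worker(int iterations) {
        for (int i = 0; i < iterations; ++i) {
            std::lock_guard<std::mutex> lock(counter_mutex); // enter critical section
            ++shared_counter;  // statements here are never interleaved across threads
        }                      // lock released when 'lock' goes out of scope
    }

    int main() {
        std::vector<std::thread> threads;        // concurrent (possibly parallel) tasks
        for (int t = 0; t < 4; ++t) threads.emplace_back(worker, 100000);
        for (auto& th : threads) th.join();
        std::cout << shared_counter << "\n";     // always 400000 thanks to the mutex
    }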
Article
Full-text available
Irregular applications are one area of computing that poses a significant performance-scalability challenge on Chip Multiprocessors (CMPs). Such applications have very little computation and unpredictable memory access patterns, making them memory-bound in contrast to compute-bound applications. Since the gap between processor and memory performance persists, the difficulty of hiding and decreasing this gap is one of the important factors behind the poor performance of these applications on CMPs. The goal of this thesis is to overcome many such challenges posed during the performance acceleration of an irregular graph algorithm called Triad Census. We accelerated the Triad Census algorithm on two significantly different chip multiprocessors: a dual-socket Intel Xeon multicore (8 hardware threads per socket) and a 240-core NVIDIA Tesla C1060 GPGPU (128 hardware threads per core). The experimental results obtained on the Intel multicore Xeon system show performance speedups (w.r.t. the sequential baseline) of maximum 56x, average 33x, and minimum 8.3x for real-world graph data sets. On the NVIDIA Tesla C1060 GPGPU, we were able to almost match the multicore results: 58.4x maximum, 32.8x average, and 4.2x minimum speedups w.r.t. the sequential baseline. In terms of raw performance, for the graph data set called the Patents network, our results on the Intel Xeon multicore (16 hardware threads) were 1.27x faster than previous results on the Cray XMT (16 hardware threads), while the results achieved on the GPGPU were comparatively slower (0.72x). To the best of our knowledge, this algorithm had previously been accelerated only on a supercomputer-class machine, the Cray XMT, and no work exists that demonstrates performance evaluation and comparison of this algorithm on relatively lower-cost multicore and GPGPU-based platforms.
... Still, in complex software, this can even exhaust a processor's capacity, demanding a faster processor or even some form of distribution (e.g., dual-core) [3][4][5][6]. Indeed, optimization-oriented programming could avoid such drawbacks and related costs [3][4][5][6][7]. ...
... Pascal, C/C++, and Java) present no real facilities for developing optimized and truly distributable code, particularly in terms of fine-grained decoupling of code [2,3,18,19]. This happens due to the structure and execution nature imposed by their paradigm [6,8,9]. ...
Article
Full-text available
This paper presents a new programming paradigm named Notification Oriented Paradigm (NOP) and analyses performance aspects of NOP programs by means of an experiment. NOP provides a new manner to conceive, structure, and execute software, which allows better performance, causal-knowledge organization, and entity decoupling than standard solutions based upon current paradigms, essentially the Imperative Paradigm (IP) and the Declarative Paradigm (DP). In short, DP solutions are considered easier to use than IP solutions thanks to the concept of high-level programming; however, they are considered slower to execute and less flexible to program than IP. Moreover, both paradigms present similar drawbacks, such as causal-evaluation redundancies and strongly coupled entities, which decrease software performance and the feasibility of distributing processing. These problems exist due to an orientation toward a monolithic inference mechanism based upon sequential evaluation by means of searches over passive computational entities. NOP proposes another manner to structure software and make its inferences, based upon small, smart, and decoupled collaborative entities whose interactions happen by means of precise notifications. This paper discusses NOP as a paradigm and presents a comparison of NOP against IP. Performance is evaluated by means of IP and NOP programs for the same application, which allows demonstrating NOP's superiority.
... Still, in complex software, this can even exhaust a processor's capacity, demanding a faster processor or even some form of distribution (e.g., dual-core) [1,4,7]. Indeed, optimization-oriented programming could avoid such drawbacks and related costs [1,4,8]. ...
... The code redundancies may result, for example, in the need for a more powerful processor than is really required [1,4,7]. Also, they may result in the need to distribute code across processors, thereby implying other problems such as module splitting and synchronization. ...
... These problems, even if solvable, are additional issues in software development whose complexity increases as fine-grained code distribution is demanded, particularly in terms of logical-causal ("if-then") calculation [1,4,7,9]. ...
Article
Full-text available
This paper presents a new programming paradigm named Notification-Oriented Paradigm (NOP) and analyses the performance aspects of NOP programs by means of an experiment. NOP provides a new manner to conceive, structure, and execute software, which would allow better causal-knowledge organization and decoupling than standard solutions based upon current paradigms, essentially the Imperative Paradigm (IP) and the Declarative Paradigm (DP). In short, DP solutions are considered easier to use than IP solutions due to the concept of high-level programming; however, they are considered slower in execution and less flexible in development. Moreover, both paradigms present similar drawbacks, such as redundant causal evaluation and strongly coupled entities, which decrease software performance and the feasibility of distributing processing. These problems exist due to an orientation toward a monolithic inference mechanism based on sequential evaluation searching over passive computational entities. NOP proposes another way to structure software and make its inferences, based on small, collaborative, and decoupled computational entities whose interactions happen through precise notifications. This paper presents a quantitative comparison between two equivalent implementations of a sale system, one developed according to the principles of the Object-Oriented Paradigm (OOP/IP) in C++ and the other developed according to the principles of NOP, based on a NOP framework in C++. The results showed that the NOP implementation obtained results quite equivalent to the OOP implementation. This happened because the NOP framework uses considerably expensive data structures over C++. Thus, a new compiler for NOP is necessary in order to actually exploit its potential.
... The peer thread model [NBF96] (also the peer-to-peer thread model [HH03]) is based on equal peer threads that coordinate themselves autonomously (see Figure 5.2). Only at the start of the work process does one thread have to take care of creating the other threads; once the initialization phase is complete, it joins the list of threads to take part in the work process itself as an equal worker. Figure 5.2: The peer thread model (also peer-to-peer thread model). 5.2.3 The pipeline thread model: In the pipeline thread model [NBF96, HH03], the tasks to be completed are divided among the individual threads, which then carry out the individual processing steps one after another in an assembly-line fashion (see Figure 5.3). To increase performance, all workers should always have enough work to really be able to work simultaneously, which can be achieved through provisions and careful program planning. ...
... To increase performance, all workers should always have enough work to really be able to work simultaneously, which can be achieved through provisions and careful program planning. Additional information can be found in [But97]. Figure 5.3: The pipeline thread model. 5.2.4 The producer-consumer thread model: In the producer-consumer thread model [HH03] (also the client/server thread model [But97]), a client requests that the server perform a particular operation. The server performs this operation independently of the client, so the client can wait for the server or carry out other tasks in the meantime and query the result at a later point in time (see Figure 5.4). ...
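A minimal C++11 sketch of the producer-consumer thread model described above, using a mutex-protected queue and a condition variable (queue contents and counts are illustrative):

    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <thread>

    std::queue<int> tasks;            // work items produced for the consumers
    std::mutex queue_mutex;           // protects the queue and the 'done' flag
    std::condition_variable queue_cv; // lets consumers sleep until work arrives
    bool done = false;

    void producer(int n) {
        for (int i = 0; i < n; ++i) {
            { std::lock_guard<std::mutex> lock(queue_mutex); tasks.push(i); }
            queue_cv.notify_one();    // wake one waiting consumer
        }
        { std::lock_guard<std::mutex> lock(queue_mutex); done = true; }
        queue_cv.notify_all();        // let all consumers drain the queue and exit
    }

    void consumer() {
        for (;;) {
            std::unique_lock<std::mutex> lock(queue_mutex);
            queue_cv.wait(lock, [] { return !tasks.empty() || done; });
            if (tasks.empty()) return;            // finished: nothing left to consume
            int item = tasks.front(); tasks.pop();
            lock.unlock();
            (void)item;                           // ... process item outside the lock ...
        }
    }

    int main() {
        std::thread c1(consumer), c2(consumer), p(producer, 100);
        p.join(); c1.join(); c2.join();
    }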
Article
In this Master's thesis, an object tracking system is realized that is intended to achieve real-time capability for video material with resolutions of up to 768 × 576 pixels and 3 color channels (definition: processing 25 frames per second; all algorithms must finish computing within 40 milliseconds). State-of-the-art algorithms are used that achieve real-time capability on single-core CPUs only at lower video resolutions. The goal is the performance improvement of an object tracking system. The motivation for a performance increase is that the less time is needed to compute the processing steps of an object tracking system, the higher the video resolutions and frame rates that can be handled (resulting in better tracking results), and the more time remains for further processing steps (e.g., more sophisticated data association, behaviour detection). The computation steps that are performed in the developed system and that are to run in real time as a complete package are data acquisition, motion detection including shadow and reflection detection and removal, connected components analysis, management of the objects to be tracked (initialization, tracking, data association, deletion), and graphical output. Real-time capability is achieved, on the one hand, by offloading parallelizable algorithms or algorithm steps to the graphics card and, on the other hand, by exploiting multi-core processors through multi-threading, in which the different phases of the system's processing chain are distributed across all available CPU cores so that computations run simultaneously on different cores (data acquisition, motion detection, object tracking, visualization). With these concepts, the implemented object tracking system achieves a performance gain of more than a factor of 9 compared with an optimized single-core CPU variant.
... Still, in complex software, this can even exhaust a processor's capacity, demanding a faster processor or even some form of distribution (e.g., dual-core) [1,4,7]. Indeed, optimization-oriented programming could avoid such drawbacks and related costs [1,4,8]. ...
... The code redundancies may result, for example, in the need for a more powerful processor than is really required [1,4,7]. Also, they may result in the need to distribute code across processors, thereby implying other problems such as module splitting and synchronization. ...
... These problems, even if solvable, are additional issues in software development whose complexity increases as fine-grained code distribution is demanded, particularly in terms of logical-causal (i.e. "if-then") calculation [1,4,7]. ...
Article
Full-text available
This paper presents a new programming paradigm named Notification-Oriented Paradigm (NOP) and analyses the performance aspects of NOP programs by means of an experiment. NOP provides a new manner to conceive, structure, and execute software, which would allow better performance, causal-knowledge organization, and decoupling than standard solutions based upon usual paradigms, essentially the Imperative Paradigm (IP) and the Declarative Paradigm (DP). In short, DP solutions are considered easier to use than IP solutions due to the concept of high-level programming; however, they are considered slower in execution and less flexible in development. Moreover, both paradigms present similar drawbacks, such as redundant causal evaluation and strongly coupled entities, which decrease software performance and the feasibility of distributing processing. These problems exist due to an orientation toward a monolithic inference mechanism based upon sequential evaluation by searching over passive computational entities. NOP proposes another way to structure software and make its inferences, based upon small, collaborative, and decoupled computational entities whose interactions happen through precise notifications. In this context, this paper presents a quantitative comparison between two equivalent implementations of a computer game simulator (a Pacman simulator), one developed according to the principles of the Object-Oriented Paradigm (OOP/IP) in C++ and the other developed according to the principles of NOP. The results obtained from the experiments demonstrate, however, a considerably lower performance for the NOP implementation. This happened because NOP applications are still developed using a framework based on C++. Besides, the paper shows that optimizations in the NOP framework improve NOP program performance, thereby evidencing the necessity of developing a NOP language/compiler.
... As modularity is a main design goal, the client can not only cooperate with the neurosimulator but also offers functionalities that make it possible to hook up other programs. For the basic programming paradigms used, see [1,17,18,22,24,35,40,41]. ...
... For simple parallelization purposes, specialized programming packages can be used. For general information, [22] and [10] are a good start. The most common packages are the Message Passing Interface (MPI) and the Parallel Virtual Machine (PVM); see [12,16] for more information. ...
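A minimal MPI sketch in C++, using the C bindings that most MPI implementations provide (the output text is illustrative):

    #include <mpi.h>
    #include <iostream>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);               // start the MPI runtime
        int rank = 0, size = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); // this process's id within the job
        MPI_Comm_size(MPI_COMM_WORLD, &size); // total number of cooperating processes
        std::cout << "Process " << rank << " of " << size << std::endl;
        MPI_Finalize();                       // shut the runtime down cleanly
        return 0;
    }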
Article
The famous game of two cars is a pursuit-evasion dynamic game. In the extended version presented here, a correct driver (evader) on a freeway detects a wrong-way driver (pursuer in a worst-case scenario), i.e., a car driving on the wrong lanes of the road or in the wrong direction. The correct driver must try to avoid collision against all possible maneuvers of the wrong-way driver. Additionally, he must try to stay on the freeway lanes. Analytically, the game is not fully solvable. The state space is cut by various singular manifolds, e.g., barriers, universal, and dispersal manifolds. Here, discretized Stackelberg games are solved numerically for many positions in the state space. The resulting trajectories and their adherent information are used to synthesize optimal strategies with artificial neural networks. These networks learn the optimal turn rates and optimal velocity change rates. The networks are trained with the high-end neurosimulator FAUN (Fast Approximation with Universal Neural Networks). A grid computing implementation is used, which allows significantly shorter computing times. This implementation runs on low-budget, idle PC clusters; moreover, power saving allows computers to be woken up and shut down automatically. Parallelization on cheap hardware is one of the key benefits of the presented approach, as it leads to fast but nonetheless good results. The computed artificial neural networks approximate the Stackelberg strategies accurately. The approach presented here is applicable to many other complex dynamic games which are not (fully) solvable analytically.
... MPI.NET is an open source, high-performance, easy-to-use implementation of the Message Passing Interface (MPI) for Microsoft's .NET environment [17]. Most MPI implementations provide support for writing MPI programs in C, C++, and FORTRAN [18]. MPI.NET provides support for all of the .NET languages (especially C#), and includes significant extensions (such as automatic serialization of objects) that make it far easier for us to build parallel programs that run on clusters [19]. ...
... The implemented Program 1 is shown in Fig. 4. WCCS allows easy management of clusters, which is an important feature, as discussed in [18]. With WCCS we can divide and distribute the workload between different nodes in a desired manner [12,13]. ...
Article
Full-text available
We have implemented multifarious aspects of multi-scale modeling on various HPC (High Performance Computing) setups. The distribution of jobs from macro to nano scale has been shown. This distribution is substantiated with MPI (Message Passing Interface) and PVM (Parallel Virtual Machine) on MATLAB, Linux, and WCCS (Windows Compute Cluster Server) environments. In this paper we have shown the connections and a novel way of implementing multi-scale computations on an HPC setup. We have also compared MPI- and PVM-based HPC setups for the MATLAB, Linux, and WCCS environments. The selection criteria for the identification and proposition of the tool, protocol, and environment for an HPC setup are corroborated. The advantages and disadvantages of each of the methodologies are compared, so that, depending on the need, the correct choice can be made. MPI.NET was used under WCCS, where C# was used. The latest versions were used for the PVM Linux-based setup, where openSUSE Linux was used as the operating system. The two main criteria, user-friendliness and performance, were compared, and recommendations are made for striking the right balance between them.
... MPI.NET is an open source, high-performance, easy-to-use implementation of Message Passing Interface (MPI) for Microsoft's .NET environment [17]. Most MPI implementations provide support for writing MPI programs in C, C++, and FORTRAN [18]. MPI.NET provides support for all of the .NET languages (especially C#), and includes significant extensions (such as automatic serialization of objects) that make it far easier for us to build parallel programs that run on clusters [19]. ...
... Given below is the very general C# program code for distributing the computations of nanotechnology (the excerpt begins inside a switch over the process rank):

        break;
    case 2:
        /* Code to be executed at this level goes down */
        /* Molecular Dynamics - Newton */
        initMolecularDynamics();
        break;
    case 3:
        /* Code to be executed at this level goes down */
        /* Quantum Mechanics - Schrodinger */
        initQuantumMechanics();
        break;
    default:
        Console.WriteLine("Process " + comm.Rank + " status: Idle");
        break;
    }
    /* All processes join here */
    comm.Barrier();
    /* All processes completed */
    if (comm.Rank == 0) { Console.WriteLine("All processes finished"); }
    } /* End of MPI Environment namespace */

The implemented Program 1 is shown in Fig. 2. WCCS allows easy management of clusters, which is an important feature, as discussed in [18]. With WCCS we can divide and distribute the workload between different nodes in a desired manner [12,13]. ...
Conference Paper
Full-text available
We have implemented multifarious aspects of nano simulation using multi-scale modeling on various HPC (High Performance Computing) setups. The distribution of jobs from macro to nano scale has been shown, which holds the essence of simulation at the nano scale. This distribution is substantiated with MPI (Message Passing Interface) and PVM (Parallel Virtual Machine) on MATLAB, Linux, and WCCS (Windows Compute Cluster Server) environments. In this paper we have shown the connections and a novel way of implementing multi-scale computations on an HPC setup. We have also compared the implementation of MPI- and PVM-based HPC setups for the MATLAB, Linux, and WCCS environments. The selection criteria for the identification and proposition of the tool, protocol, and environment for an HPC setup play an important role in deciding the tool to be used. The advantages and disadvantages of each of the methodologies are compared. MPI.NET was used under WCCS, where C# was used. The latest versions were used for the PVM Linux-based setup, where openSUSE Linux was used as the operating system. The two main criteria, user-friendliness and performance, were compared, and recommendations are made for striking the right balance between them.
... At its core, distributed programming is about developing applications in which two or more processes cooperate to solve a given task, where the processes may or may not reside on the same computer [6]. This makes it possible to leverage the computational capabilities of several computers to solve problems that would otherwise be out of reach for a single workstation. ...
Article
Full-text available
In this paper, we introduce MARVEL, a system designed to simplify the teaching of MapReduce, a popular distributed programming paradigm, through software visualization. At its core, it allows a teacher to describe and recreate a MapReduce application by interactively requesting, through a graphical interface, the execution of a sequence of MapReduce transformations that target an input dataset. Then, the execution of each operation is illustrated on the screen by playing an appropriate graphical animation stage, highlighting aspects related to its distributed nature. The sequence of all animation stages, played back one after the other in a sequential order, results in a visualization of the whole algorithm. The content of the resulting visualization is not simulated or fictitious, but reflects the real behavior of the requested operations, thanks to the adoption of an architecture based on a real instance of a distributed system running on Apache Spark. On the teacher’s side, it is expected that by using MARVEL he/she will spend less time preparing materials and will be able to design a more interactive lesson than with electronic slides or a whiteboard. To test the effectiveness of the proposed approach on the learner side, we also conducted a small scientific experiment with a class of volunteer students who formed a control group. The results are encouraging, showing that the use of software visualization guarantees students a learning experience at least equivalent to that of conventional approaches.
... Parallel computers can be classified, for example, by the type of memory architecture [2]: shared, distributed, and hybrid memory systems exist. ...
Article
Full-text available
Efficient codes can take advantage of multiple threads and/or processing nodes to partition work that can be processed concurrently. This can reduce the overall run-time or make the solution of a large problem feasible. This paper deals with the evaluation of different parallelization strategies for the assembly of global vectors and matrices, which is one of the critical operations in any finite element software. Different assembly strategies for systems with a shared memory model are proposed and evaluated, using Open Multi-Processing (OpenMP), Portable Operating System Interface (POSIX) threads, and C++11 threads. The considered strategies are based on simple synchronization directives, various block locking algorithms and, finally, on smart lock-free processing based on a colouring algorithm. The different strategies were implemented in OOFEM [1], a free finite element code with an object-oriented architecture.
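A minimal OpenMP sketch of one such strategy, assembling element contributions into a global vector with atomic updates (the data layout is illustrative, not the OOFEM code):

    #include <omp.h>
    #include <vector>

    // Assemble per-element contributions into a global vector.
    // elem_dofs[e] lists the global degrees of freedom touched by element e;
    // elem_vals[e] holds the corresponding local contributions.
    void assemble(const std::vector<std::vector<int>>& elem_dofs,
                  const std::vector<std::vector<double>>& elem_vals,
                  std::vector<double>& global_vec) {
        #pragma omp parallel for
        for (int e = 0; e < (int)elem_dofs.size(); ++e)
            for (std::size_t i = 0; i < elem_dofs[e].size(); ++i) {
                #pragma omp atomic   // serialize only the conflicting update
                global_vec[elem_dofs[e][i]] += elem_vals[e][i];
            }
    }

A colouring-based strategy would instead partition the elements into colours whose degrees of freedom do not overlap, so each colour can be assembled entirely without atomics or locks.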
... This also makes it difficult to distribute code in the case of a system with parallelism, and it is particularly difficult to distribute code with fine granularity. This happens in the usual programming languages due to the structure and nature of execution enforced by their respective paradigms [15,18,46]. Furthermore, it is important that the process of distributable code development be agile and practical, since distribution is reliably necessary in certain contexts. ...
Article
Full-text available
Since the 1960s, artificial neural networks (ANNs) have been implemented and applied in various areas of knowledge. Most of these implementations had their development guided by imperative programming (IP), usually resulting in highly coupled programs. Thus, even though intrinsically parallel in theory, ANNs do not easily achieve effective distribution over multiple processors when developed under IP. As an alternative, the notification-oriented paradigm (NOP) emerges as a new programming technique. NOP facilitates the development of decoupled and distributed systems, using abstraction of knowledge through logical-causal rules, as well as the generation of optimized code. Both features are possible by means of a notification-oriented inference process, which avoids structural and temporal redundancies in the logical-causal evaluations. These advantages are relevant to systems that have parts decoupled in order to run in parallel, such as ANNs. In this sense, this work presents the development of a multilayer perceptron ANN using the backpropagation training algorithm based on the concepts of a NOP implementation. Such an implementation allows, transparently from high-level programming, parallel code generation that runs on multicore platforms. Furthermore, the solution based on NOP, when compared against the equivalent in IP, presents a high level of decoupling and explicit use of logical-causal elements, which are useful, respectively, for distribution and for understanding and improving the application.
... The runtime systems for low-level programming models (C++ std::thread, CUDA, OpenCL, and PThreads) can be simpler than those for more comprehensive models such as OpenMP, Cilk Plus, OpenACC, C++ std::future, and TBB. The C++11 standard enables users to make the most of the available hardware directly, using interfaces that are similar to the PThreads library [14]. The implementation of the std::thread interfaces can be a simple mapping to PThreads APIs, and thus has minimal scheduling in the runtime. ...
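A sketch of that close correspondence: the same worker launched through C++11 std::thread and directly through the PThreads API (the worker and its argument are illustrative):

    #include <pthread.h>
    #include <thread>
    #include <cstdint>
    #include <cstdio>

    void worker(int id) { std::printf("worker %d running\n", id); }

    void* pthread_worker(void* arg) {  // PThreads requires the void* (*)(void*) signature
        worker(static_cast<int>(reinterpret_cast<std::intptr_t>(arg)));
        return nullptr;
    }

    int main() {
        std::thread t(worker, 1);      // C++11: a thin wrapper over the native thread
        t.join();

        pthread_t p;                   // roughly what std::thread maps to underneath
        pthread_create(&p, nullptr, pthread_worker,
                       reinterpret_cast<void*>(std::intptr_t{1}));
        pthread_join(p, nullptr);
    }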
Conference Paper
Full-text available
In this paper, we provide a comparison of the language features and runtime systems of commonly used threading parallel programming models for high performance computing, including OpenMP, Intel Cilk Plus, Intel TBB, OpenACC, Nvidia CUDA, OpenCL, C++11, and PThreads. We then report our performance comparison of OpenMP, Cilk Plus, and C++11 for data and task parallelism on CPUs using benchmarks. The results show that the performance varies with respect to factors such as runtime scheduling strategies, the overhead of enabling parallelism and synchronization, load balancing, and the uniformity of task workload among threads in applications. Our study summarizes and categorizes the latest development of threading programming APIs for supporting existing and emerging computer architectures, and provides tables that compare all the features of the different APIs. It can be used as a guide for users to choose the APIs for their applications according to the features, interfaces, and performance reported.
... However, these requirements must then be taken into consideration at an early stage of development. Among the most important of these requirements are the following [4,10,11]: the ability for run-time coupling between MATLAB/Simulink and one or more ESP-r(s) to run on a heterogeneous network, such as on Windows and Unix OSs; the ability to support data exchange over a network that is either unidirectional or bidirectional; the ability to support different data-exchange formats, including ASCII, binary, and Extensible Markup Language (XML); the ability to support different communication modes, including synchronous, asynchronous, and partially synchronous (or asynchronous); and the possibility for run-time coupling between MATLAB/Simulink and ESP-r to enable simulations with either a real building (e.g., a building emulator) or a control test-rig (e.g., hardware-in-the-loop testing), in which case the Inter-process Communication (IPC) must be platform independent. ...
Article
Full-text available
The use of computer-based automation and control systems for smart sustainable buildings, often called Automated Buildings (ABs), has become an effective way to automatically control, optimize, and supervise a wide range of building performance applications over a network while achieving the minimum energy consumption possible; this approach is generally referred to as the Building Automation and Control Systems (BACS) architecture. Instead of costly and time-consuming experiments, this paper focuses on using distributed dynamic simulations to analyze the real-time performance of network-based building control systems in ABs and improve the functions of the BACS technology. The paper also presents the development and design of a distributed dynamic simulation environment with the capability of representing the BACS architecture in simulation by run-time coupling two or more different software tools over a network. The application and capability of this new dynamic simulation environment are demonstrated by an experimental design in this paper.
... Operating systems: include real-time operating systems; and software engineering: include concurrency and real-time issues. Note that concurrent programming with C++ is rapidly evolving, thanks to Internet client-server concurrency [8,9]. Since SystemC is a concurrent version of the C++ language, one can take advantage of the concepts developed in such books. ...
Chapter
Full-text available
A typical graduate of computer engineering (CE) program pursues a career in the computer industry or with a company that integrates computers into complex products. The Bachelor’s degree curriculum in CE needs to focus in the future on system aspects and the integration of the hardware with software. The current curriculum introduces the hardware concepts with courses on processors, computer architecture, VLSI, electronics, and design automation [1]. Similarly, software concepts are addressed with courses on data structures and algorithms, operating systems, and software engineering. Though these courses met the earlier needs of the industry, we need to re-orient the courses based on the current and future industry requirements and job opportunities that are cross-disciplinary: A current embedded system warrants a seamless integration of software and hardware into a system that meets ever expanding functional and quality metrics. Under this notion, a system is more than the sum of its parts, that is, software and hardware. This requires a holistic approach and a constant dialog between software and hardware practitioners.
... The data transfer corresponds to the sending of results calculated by one process and their receipt by another process, whereas the transfer of control is the execution of one particular function to be performed remotely in another process. Most attempts to couple codes together rely on the use of a shared file system, which is not an efficient way to exchange data at an adequate level of abstraction (Hughes and Hughes 2003, Ranganathan et al. 1996). ...
Article
Full-text available
Communication software and distributed applications for control and building performance simulation software must be reliable, efficient, flexible, and reusable. This paper reports on the progress of a project which aims to achieve better integrated building and systems control modeling in building performance simulation by run-time coupling of distributed computer programs. These requirements motivate the use of the Common Object Request Broker Architecture (CORBA), which offers significant advantages over communication through simple abstractions. However, setting up highly available applications with CORBA is hard: neither control modeling software nor building performance environments have a simple interface to CORBA objects. Therefore, this paper describes an architectural solution for distributing control and building performance software tools with CORBA objects. It then explains how difficult the development of CORBA-based distributed building control simulation applications is. The paper finishes by giving some recommendations.
... In fact, the running time of a parallel program on a parallel computer is a function of several factors, including the data dependencies of the parallel program, which relate to the serial portion of the computation that cannot be parallelised, and the interconnection network. This serial portion of the program has long been shown by Amdahl (1967) and Gustafson to govern the overall speedup of parallel computations [6,9]. ...
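For reference, the two classical bounds in their standard formulations, with $p$ the parallelizable fraction of the work and $N$ the number of processors:

\[
S_{\text{Amdahl}}(N) = \frac{1}{(1 - p) + p/N},
\qquad
S_{\text{Gustafson}}(N) = N - (N - 1)(1 - p).
\]

Amdahl's bound fixes the problem size, so the serial fraction $(1 - p)$ dominates as $N$ grows; Gustafson's bound lets the problem size scale with $N$, which is why it is more optimistic.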
Article
Full-text available
The basic idea of parallel computing is to put independent processing units together to collectively solve a task. However, the amount of speedup attained by this collection of processing units is a function of several factors, one of which is the interconnection network. This paper focuses on measuring the performance of parallel programs deployed on wired and wireless networks. Our experiments were conducted on Beowulf clusters: parallel computers built using a collection of everyday personal computers. This paper shows empirically that distributed-memory parallel programs (MPI) written for Beowulf clusters on a wireless LAN (IEEE 802.11g) do not gain appreciable speedup as the number of processing nodes increases, compared to the same parallel programs written for the same Beowulf clusters on a wired LAN. It further shows the impact the kind of network has on the overall performance of parallel programs when a multiprogramming approach is used to achieve speedup.
... Of the many possible ways to run-time couple more than one ESP-r with Matlab/Simulink at the same time, the Portable Operating System Interface (POSIX) standard for threads has been the most widely adopted [15]. The use of POSIX threads is very advantageous because of its standardization, flexibility, and portability, as well as the fact that POSIX threads provide a standardized programming interface for the dynamic creation and destruction of threads (i.e. ...
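A sketch of the dynamic creation and destruction pattern this describes, with one POSIX thread per coupled simulation instance (the function and count are hypothetical, not the actual coupling code):

    #include <pthread.h>
    #include <cstdio>

    // Hypothetical per-coupling work: exchange data with one ESP-r instance.
    void* couple_one_simulation(void* arg) {
        int id = *(int*)arg;
        std::printf("coupling simulation %d\n", id);
        return nullptr;
    }

    int main() {
        const int kSimulations = 3;               // e.g., three ESP-r instances
        pthread_t threads[kSimulations];
        int ids[kSimulations];
        for (int i = 0; i < kSimulations; ++i) {  // dynamic creation
            ids[i] = i;
            pthread_create(&threads[i], nullptr, couple_one_simulation, &ids[i]);
        }
        for (int i = 0; i < kSimulations; ++i)    // join before threads are destroyed
            pthread_join(threads[i], nullptr);
    }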
Article
Distributed simulation between control systems and building performance applications is becoming an increasingly invaluable tool in the analysis of Automated Buildings (ABs) for better operation and design. Instead of costly and time-consuming experiments, distributed simulations can simultaneously fulfill the occupants' needs while reducing energy consumption and greenhouse gas emissions. Distributed simulations are used to investigate the impact of advanced control systems on building performance applications. This paper describes the development and implementation of a framework for distributed simulations involving different software tools over a network. The main role of this framework is to run-time couple one or more building performance simulation tool(s) with a control systems environment over a network, within a Building Automation and Control Systems (BACS) architecture. Finally, the paper ends with some conclusions and perspectives for future work.
... Several methods exist for running automata concurrently, including the use of multiple processes or threads, or multiple computers or CPUs. In the system proposed in this paper, multiple threads are used for running automata in parallel, because doing so saves memory and computer resources and because automata run by multiple threads can easily communicate with each other [18,19]. Multiple threads are created and assigned to high- and low-level automata separately, which run on their own threads concurrently. ...
Article
This paper describes the application of discrete event systems theory to the design of an automated laboratory system. Current automated laboratory systems typically consist of several interacting processes that must be carefully sequenced to avoid any possible process conflicts. Discrete Event Systems (DES) theory and Supervisory Control Theory (SCT) can be applied together as effective methods of modeling the system dynamics and designing supervisory controllers to precisely sequence the many processes that such systems might involve. Classical approaches to supervisory controller design tend to result in complex controller structures that are difficult to implement, maintain, and upgrade. In this paper, a new approach to designing supervisory controllers for automated laboratory systems is introduced. This new approach uses a modular controller structure that is easier to implement, maintain, and upgrade, and deals with "state explosion" issues in a novel and efficient way.
... The data transfer corresponds to the sending of the results calculated by one of the processes and to the receiving by another process, whereas the transfer of control is the execution of one particular function to be performed remotely in another process. Most of the attempts to run-time couple codes rely on the use of shared (or intermediate) files, which is not a very efficient way to exchange data at an adequate level of abstraction (Hughes and Hughes 2003, Ranganathan et al. 1996). In (Yahiaoui et al. 2003), we described and compared various other options for inter-application data transfer facilities. ...
Article
Full-text available
This paper reports on progress of an ongoing research project, which aims to achieve better control modeling in building performance simulation by integrating distributed computer programs. Recent developments show that there is a need to enhance building performance assessments by integrating new simulation features in order to predict the overall effect of innovative control strategies for integrated building systems. However, both domain independent control modeling environments and domain specific building performance simulation, have their own restrictions. For example, certain control features are represented in one simulation environment while others are only available in other simulation software. To alleviate these practical problems, this paper describes a mechanism that can be used to allow a building simulation environment to exchange data with an external control simulation environment. In particular, this paper focuses on the problem of developing run-time coupling of control and building performance environments over TCP/IP using Internet sockets. The socket implementation is analyzed in terms of minimizing overhead, communication efficiency, and the integration into existing software tools. Perspectives for a run-time coupling specification are given to enable connection-oriented sockets to easily exchange data as well as coupling software. Data requirements in view of integration in real building control protocols (BACnet and LonWorks) are discussed. An early implementation of run-time coupling is demonstrated with a case-study, and the paper finishes with some conclusions and directions for future work.
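A minimal sketch of the connection-oriented (TCP) socket exchange described above, using the POSIX sockets API on the client side (the port, address, and exchanged values are illustrative):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        int fd = socket(AF_INET, SOCK_STREAM, 0);   // connection-oriented TCP socket
        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_port   = htons(5000);              // illustrative port
        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

        if (connect(fd, (sockaddr*)&addr, sizeof addr) != 0) {
            std::perror("connect");                 // coupled simulator not listening
            return 1;
        }
        double setpoint = 21.5;                     // e.g., a control value to send
        send(fd, &setpoint, sizeof setpoint, 0);    // send to the coupled environment
        double feedback = 0.0;
        recv(fd, &feedback, sizeof feedback, 0);    // receive the simulated response
        std::printf("feedback = %f\n", feedback);
        close(fd);
    }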
... For instance, design principles for real-time systems is the focus of a book written by Kopetz [50], which includes the fundamentals on real-time processing as well as deeper aspects of system and application design. Lea [51] and Hughes and Hughes [52] have described principles and patterns for developing parallel and concurrent applications in Java and in C++, targeting mainly practitioners. Along these lines, there has also been academic work on design principles with examples in automation systems [53] and distributed data analysis middleware design [54]. ...
Article
Stream processing applications are used to ingest, process, and analyze continuous data streams from heterogeneous sources of live and stored data, generating streams of output results. These applications are, in many cases, complex, large-scale, low-latency, and distributed in nature. In this paper, we describe the design principles and architectural underpinnings for stream processing applications. These principles are distilled from our experience in building real-world applications both for internal use as well as with customers from several industrial and academic domains. We provide principles, guidelines, as well as appropriate implementation examples to highlight the different aspects of stream processing application design and development. Copyright © 2010 John Wiley & Sons, Ltd.
... The object-oriented programming language C++, in particular, has obtained a considerable amount of success in this field, see e.g. [2, 3, 15, 4, 13]. In comparison with C++, the modern object-oriented programming language Python is known for its even richer expressiveness and flexibility. ...
Chapter
Full-text available
This chapter aims to answer the following question: Can the high-level programming language Python be used to develop sufficiently efficient parallel solvers for partial differential equations (PDEs)? We divide our investigation into two aspects, namely (1) the achievable performance of a parallel program that extensively uses Python programming and its associated data structures, and (2) the Python implementation of generic software modules for parallelizing existing serial PDE solvers. First of all, numerical computations need to be based on the special array data structure of the Numerical Python package, either in pure Python or in mixed-language Python-C/C++ or Python/Fortran setting. To enable high-performance message passing in parallel Python software, we use the small add-on package pypar, which provides efficient Python wrappers to a subset of MPI routines. Using concrete numerical examples of solving wave-type equations, we will show that a mixed Python-C/Fortran implementation is able to provide fully comparable computational speed in comparison with a pure C or Fortran implementation. In particular, a serial legacy Fortran 77 code has been parallelized in a relatively straightforward manner and the resulting parallel Python program has a clean and simple structure.
... There are a number of parallel architectures, such as single-instruction multiple-data (SIMD), multiple-instruction multiple-data (MIMD), and single-program multiple-data (SPMD). The parallel FDTD implements a non-supervised SIMD structure [5]. Each processor in the cluster uses the same algorithm to calculate the neighbor locations and the length of the process boundary based on its own tags. The workers also transfer the data and communicate with each other directly. This architecture has three advantages: it does not require a master to coordinate and dispatch the work load ...
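A sketch of the 1-D domain decomposition pattern this describes, exchanging ghost (boundary) layers between neighbouring ranks with MPI_Sendrecv (the field size and single exchange step are illustrative):

    #include <mpi.h>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0, size = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int n = 1024;                           // width of this rank's slab
        std::vector<double> field(n + 2, 0.0);        // +2 ghost layers at the ends

        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        // One step: swap ghost layers with both neighbours, then update locally.
        MPI_Sendrecv(&field[1],     1, MPI_DOUBLE, left,  0,
                     &field[n + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&field[n],     1, MPI_DOUBLE, right, 1,
                     &field[0],     1, MPI_DOUBLE, left,  1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        // ... the FDTD update of field[1..n] would go here ...

        MPI_Finalize();
    }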
Conference Paper
Full-text available
A parallel FDTD algorithm has been realized based on 1-D domain decomposition. Data communication among adjacent processors is handled by a self-defined C++ class (MPI_FDTD), in which portable functions of the Message Passing Interface (MPI) library are used. The details of the implementation are discussed. EMC applications of the code, such as crosstalk of traces, cavities, vias, as well as an IC package, are provided to demonstrate the parallel efficiency.
Article
Full-text available
Background. It is proposed to develop agent-based network metacomputer systems and applications based on logical methods and related conceptual graphical models, which allows combining imperative and declarative methods when designing the functional architecture and software of a metacomputer. Formalized specifications for creating agent-based network applications based on conceptual and logical models of artificial intelligence are proposed. The term "metacomputer" is chosen to denote the network environment in which the action script is deployed. Another name is a cloud-network application; in principle it means the same thing, but it additionally takes into account, in an explicit form, the terminology of modern network technologies. In connection with the growing importance of global computer networks in science and education, the problem of creating large-scale applications is relevant. A functional organization of metacomputer agent-based network distributed computing is proposed, which implements the main structures of distributed programming, where the network is actually considered as a computer with distributed program control based on the message-driven computing paradigm, and not as a means of implementing simple client-server or master-slave applications. The aim of the work is to increase the level of parallelism in data processing in metacomputer systems by organizing the pipelined movement of messages over the network. Materials and methods. Conceptual models, logical-algebraic operating models, and logical Petri nets are used as the main methods. Results. Conceptual graphs of distributed algorithms and logical-algebraic operational expressions suitable for use as directly executable specifications are proposed, and a method is developed for moving from conceptual graphs to executable specifications that define the functional architecture of a metacomputer. Simulation models for distributed algorithms have been developed. Conclusions. The practical implementation of the above concepts and models will increase the level of parallelism in the operation of agent-based virtual metacomputer systems due to the pipelined organization of message passing.
Article
Full-text available
The generality of a method is the main question that arises when studying the quality of iterative methods. The efficiency of solving systems using iterative methods depends directly on the assumptions made about the system of equations to be solved. Preconditioners are used to provide a more efficient solution. Many types of preconditioners are currently known, for example, preconditioners based on an approximation of the system matrix: ILU, IQR, and ILQ; and preconditioners based on an approximation of the inverse matrix: polynomial preconditioners, sparse approximations of the inverse matrix (e.g., AINV), and approximations of the inverse matrix in factorized form (e.g., FSAI, SPAI, etc.). This article analyzes the CG method and the CG method with the ILU(0) preconditioner, using the example of solving the two-dimensional Poisson equation. The CG method is commonly used to solve systems of linear equations. ILU(0) was selected as the preconditioner for this article. The incomplete LU decomposition ILU(0) is an efficient preconditioner and is easily implemented. It yields a system that can be solved to speed up the convergence of CG and other iterative methods, that is, to reduce the number of iterations. The ILU(0) preconditioner is very easy to obtain from the LU decomposition. Since the system matrix is sparse, the CSR format was used to store the matrix in memory. ILU(0) + CG, i.e. the algorithm with the preconditioner, converged 5-8 times faster than the plain CG algorithm. Data on the number of iterations to convergence of the method without a preconditioner and with the ILU(0) preconditioner were obtained and analyzed.
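A minimal sketch of the CSR (compressed sparse row) storage mentioned above, together with the sparse matrix-vector product that dominates each CG iteration (the struct and function names are illustrative):

    #include <vector>

    // CSR storage: non-zero values, their column indices, and row start offsets.
    struct CsrMatrix {
        std::vector<double> val;  // non-zero values, row by row
        std::vector<int>    col;  // column index of each stored value
        std::vector<int>    row;  // row[i]..row[i+1]-1 index the entries of row i
    };

    // y = A * x, the kernel executed once per CG iteration.
    std::vector<double> spmv(const CsrMatrix& A, const std::vector<double>& x) {
        std::vector<double> y(A.row.size() - 1, 0.0);
        for (std::size_t i = 0; i + 1 < A.row.size(); ++i)
            for (int k = A.row[i]; k < A.row[i + 1]; ++k)
                y[i] += A.val[k] * x[A.col[k]];
        return y;
    }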
Article
Computer-based automation and control systems are becoming increasingly important in smart sustainable buildings, often referred to as automated buildings (ABs), in order to automatically control, optimize, and supervise a wide range of building performance applications over a network while minimizing energy consumption and the associated greenhouse gas emissions. This technology generally refers to the building automation and control systems (BACS) architecture. Instead of costly and time-consuming experiments, this paper focuses on the development and design of a distributed dynamic simulation environment with the capability to represent the BACS architecture in simulation by run-time coupling two or more different software tools over a network. This involves using distributed dynamic simulations as a means to analyze the performance of, and enhance, networked real-time control systems in ABs, and to improve the functions of real BACS technology. The application and capability of this new dynamic simulation environment are demonstrated by an experimental design in this paper.
Thesis
Full-text available
Convolutional neural networks (CNNs) are a variant of deep neural networks (DNNs) optimized for visual pattern recognition, which are typically trained using first-order learning algorithms, particularly stochastic gradient descent (SGD). Training deeper CNNs (deep learning) using large data sets (big data) has led to the concept of distributed machine learning (ML), contributing to state-of-the-art performances in solving computer vision problems. However, there are still several outstanding issues to be resolved with currently defined models and learning algorithms. Propagations through a convolutional layer require flipping of kernel weights, thus increasing the computation time of a CNN. Sigmoidal activation functions suffer from the gradient diffusion problem that degrades training efficiency, while others cause numerical instability due to unbounded outputs. Common learning algorithms converge slowly and are prone to the hyperparameter overfitting problem. To date, most distributed learning algorithms are still based on first-order methods that are susceptible to various learning issues. This thesis presents an efficient CNN model, proposes an effective learning algorithm to train CNNs, and maps it onto parallel and distributed computing platforms for improved training speedup. The proposed CNN consists of convolutional layers with correlation filtering, and uses novel bounded activation functions for faster performance (up to 1.36x), improved learning performance (up to 74.99% better), and better training stability (up to 100% improvement). The bounded stochastic diagonal Levenberg-Marquardt (B-SDLM) learning algorithm is proposed to encourage fast convergence (up to 5.30% faster and 35.83% better than first-order methods) while having only a single hyperparameter. B-SDLM also supports a mini-batch learning mode for high parallelism. Based on known previous works, this is among the first successful attempts at mapping a stochastic second-order learning algorithm to be deployed in distributed ML platforms. Running the distributed B-SDLM on a 16-core cluster achieves up to 12.08x and 8.72x faster times to reach a certain convergence state and accuracy on the Mixed National Institute of Standards and Technology (MNIST) data set. All three complex case studies tested with the proposed algorithms give comparable or better classification accuracies compared to those provided in previous works, but with better efficiency. As an example, the proposed solutions achieved 99.14% classification accuracy for the MNIST case study, and 100% for face recognition using the AR Purdue data set, which proves the feasibility of the proposed algorithms in visual pattern recognition tasks. URL: http://eprints.utm.my/60714/
Article
The analysis of innovative designs that distribute control in buildings over a network is currently a challenging task, as existing building performance simulation tools do not offer sufficient capabilities and flexibility to fully respond to the complexity of Automated Buildings (ABs). For that reason, this paper deals with the design and development of a middleware for distributed control and building performance simulations, carried out to study and analyze the impact of control systems on building performance applications (i.e., building indoor environments) over a network, rather than through costly and time-consuming experiments. The paper also presents a model-based Systems Engineering (SE) methodology for the development and design of distributed control and building performance simulations involving two or more different software tools over a network. The main objective of this framework is to run-time couple one or multiple building performance simulation tool(s) with a control modelling environment over a network in order to similarly represent the Building Automation and Control Systems (BACS) architecture in simulation.
Article
The aim of this paper is to evaluate the efficiency of different approaches to the solution of large, sparse, non-symmetric systems of linear equations on high-performance machines, as found in any finite element software. Approaches based on direct and iterative algorithms for the solution of linear equations are compared: in particular, a direct solver using Skyline sparse storage, the direct solver from the SuperLU library, and an iterative solver from the Iterative Method Library (IML). SuperLU is a general-purpose library for the direct solution of large, sparse, non-symmetric systems of linear equations. Additionally, the performance and scalability of the parallel SuperLU solver, based on OpenMP, are studied. The paper shows that parallelization can efficiently exploit the power of modern hardware, significantly reducing the required computation time. The different strategies were implemented in OOFEM, a free finite element code with an object-oriented architecture for solving mechanical, transport, and fluid mechanics problems on various platforms.
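The kernel at the heart of the iterative solvers compared here is the sparse matrix-vector product, which also parallelizes naturally with OpenMP. A minimal CSR-based sketch follows; it is a generic illustration, not code from SuperLU, IML, or OOFEM:

#include <omp.h>
#include <vector>
#include <cstdio>

// Sparse matrix in compressed sparse row (CSR) form.
struct CSR {
    std::vector<int>    row_ptr;  // size n+1
    std::vector<int>    col;      // column index of each nonzero
    std::vector<double> val;      // nonzero values
};

// y = A * x, parallelized over rows with OpenMP. Each row is written by
// exactly one thread, so no synchronization is needed.
void spmv(const CSR& A, const std::vector<double>& x, std::vector<double>& y) {
    const int n = static_cast<int>(A.row_ptr.size()) - 1;
    #pragma omp parallel for schedule(dynamic, 64)
    for (int i = 0; i < n; ++i) {
        double s = 0.0;
        for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            s += A.val[k] * x[A.col[k]];
        y[i] = s;
    }
}

int main() {
    // 2x2 example: [[4, 1], [0, 3]]
    CSR A{{0, 2, 3}, {0, 1, 1}, {4.0, 1.0, 3.0}};
    std::vector<double> x{1.0, 2.0}, y(2);
    spmv(A, x, y);
    std::printf("y = [%g, %g]\n", y[0], y[1]);  // expect [6, 6]
}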
Thesis
Full-text available
This thesis presents a hybrid parallelization methodology applied to the Discrete Element Method (DEM) that combines MPI and OpenMP to improve computational performance. The methodology uses domain decomposition strategies to distribute the computation of large-scale models across a cluster. The proposed technique also partitions the workload of each subdomain among threads; this additional step aims at higher computational performance by tuning the balance between message passing across processes and thread-level parallelism. The main objective of the technique is to reduce the high inter-process communication times in shared-memory computing environments such as modern processors. The division of work among threads employs the Hilbert space-filling curve (HSFC) to improve data locality and to avoid the overheads of repeatedly sorting the particle array. The numerical simulations presented allow the evaluation of domain decomposition methods, partitioning techniques, and memory access control mechanisms, among others. Different partitioning algorithms and parallel solution strategies are addressed for distributed-memory and shared-memory environments, as well as for a hybrid model involving both. The developed methodology and the computational tool used in the implementations, the DEMOOP software, provide resources that can be applied to several engineering problems involving large-scale particle models. In this thesis some of these problems are addressed, in particular those related to particle flow on ramps, in discharge hoppers, and in real landslide scenarios. The results show that the hybrid execution strategies generally achieve better computational performance than those based solely on message passing. The developed hybrid parallelization technique also achieves good load balancing among threads. The case studies presented show good scalability and parallel efficiency. The proposed method allows a configurable execution of DEM numerical models and introduces a combined strategy that improves data locality together with iterative load balancing.
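The hybrid pattern the thesis builds on, MPI between subdomains and OpenMP within each, can be outlined in a short skeleton. The sketch below is generic (toy particle update, invented diagnostic) and is not code from DEMOOP:

#include <mpi.h>
#include <omp.h>
#include <vector>
#include <cstdio>

// Minimal hybrid MPI+OpenMP skeleton: each MPI rank owns one subdomain of
// particles and advances them with an OpenMP-parallel loop; a reduction then
// combines a global quantity across ranks.
int main(int argc, char** argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Toy subdomain: each rank holds its own block of particle positions.
    std::vector<double> x(100000, static_cast<double>(rank));
    const double dt = 1e-3, v = 1.0;

    // Thread-level parallelism inside the rank's subdomain.
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(x.size()); ++i)
        x[i] += v * dt;  // trivial stand-in for the DEM force/integration step

    // Process-level combination, e.g. a global diagnostic.
    double local_sum = 0.0, global_sum = 0.0;
    for (double xi : x) local_sum += xi;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) std::printf("global sum = %f (%d ranks)\n", global_sum, size);
    MPI_Finalize();
    return 0;
}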
Article
The paper considers the task of constructing schedules for processing data of various types in a conveyor system under time restrictions on its operation and under the condition that sets are formed from the processing results. The paper is dedicated to the first stage of solving this problem, which is associated with the rationale for a multilevel model of decision making on the composition of data batches, the composition of groups of batches processed within the specified time intervals of the system, and the scheduling of data batch processing within groups, taking into account the conditions of forming sets of various types from the processing results.
Article
We are developing a tool named Kaira, intended for the modelling, simulation, and generation of parallel applications. Modelling is based on a variant of Coloured Petri Nets, which provide the theoretical background; we use their syntax and semantics. Moreover, our tool can automatically generate standalone parallel applications from the model. In this paper we present how to develop parallel applications in Kaira. As an example we use the two-dimensional heat flow problem solved by the Jacobi finite difference method. We present different aspects of, and different approaches to, modelling this problem in Kaira at different levels of abstraction.
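For reference, the Jacobi finite difference update that the example application computes can be written directly in C++ as below; the grid size, boundary values, and tolerance are arbitrary choices for illustration, and the sequential loop stands in for whatever parallel code Kaira would generate:

#include <cstdio>
#include <utility>
#include <vector>

// Jacobi iteration for the steady 2D heat equation: each interior point
// becomes the average of its four neighbours, repeated until the largest
// change falls below a tolerance.
int main() {
    const int n = 64;
    const double tol = 1e-6;
    std::vector<double> u(n * n, 0.0);
    for (int j = 0; j < n; ++j) u[j] = 100.0;  // hot top edge; other edges stay 0
    std::vector<double> v(u);                  // second buffer, same boundaries

    double diff = tol + 1.0;
    int iters = 0;
    while (diff > tol) {
        diff = 0.0;
        for (int i = 1; i < n - 1; ++i)
            for (int j = 1; j < n - 1; ++j) {
                double nv = 0.25 * (u[(i - 1) * n + j] + u[(i + 1) * n + j]
                                  + u[i * n + (j - 1)] + u[i * n + (j + 1)]);
                double d = nv > u[i * n + j] ? nv - u[i * n + j] : u[i * n + j] - nv;
                if (d > diff) diff = d;
                v[i * n + j] = nv;
            }
        std::swap(u, v);
        ++iters;
    }
    std::printf("converged in %d iterations (max change < %g)\n", iters, tol);
}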
Chapter
Full-text available
The sixth chapter describes methods of decomposing computations at the component level of MEMS design. It presents the foundations of domain decomposition and the parallelization of computations. The chapter is devoted to a striking example of such methods, the relatively young finite element tearing and interconnecting (FETI) method. The main emphasis is placed on the geometric interpretation of the method, through consideration of the relationships between the spaces of linear operators and the vectors lying in them. To this end, the questions of finding pseudoinverse matrices and of their geometric meaning are raised.
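Since the chapter turns on pseudoinverse matrices, it is worth recalling the standard definition (a textbook fact, not a result of the chapter): the Moore-Penrose pseudoinverse A^+ of a matrix A is the unique matrix satisfying the four Penrose conditions

\[
A A^{+} A = A, \qquad A^{+} A A^{+} = A^{+}, \qquad (A A^{+})^{\top} = A A^{+}, \qquad (A^{+} A)^{\top} = A^{+} A.
\]

Geometrically, \(A A^{+}\) is the orthogonal projector onto the range of \(A\) and \(A^{+} A\) is the orthogonal projector onto the range of \(A^{\top}\), which is exactly the geometric reading the chapter emphasizes; for a full-column-rank \(A\), \(A^{+} = (A^{\top} A)^{-1} A^{\top}\).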
Thesis
Programming languages and distributed systems have long influenced each other. Naturally, every programming language has its strengths and weaknesses; consequently, it can be difficult to decide which language should be chosen for a software project. However, the selection of the right programming language can be crucial to the success of a project or a software system. This research project compares C++, Java and C# in an open distributed systems environment with respect to the following technical and economic language comparison criteria: concurrency, scalability, reliability, security, portability/platform, simplicity and usage, efficiency, high integrity, reusability and maintainability. These criteria are chosen so as to make a comparative study of the three candidate programming languages and to find out how a programming language is best selected for a project based on distributed systems. At the end, the evaluation and findings are presented in the form of a comparison table and a bar chart, providing evidence and analysis of why Java is better than the other languages, or has an advantage over C++ and C#, according to some criteria.
Article
Abstract: This paper describes how to use the multi-agent system (MAS) model for the design of parallel programs. One of the objectives is to identify the characteristics that a parallel programming language based on a multi-agent systems model should possess. In addition, one of the few languages with these characteristics is evaluated. MAS research has concentrated on providing an abstract model whose intention is that the programmer engages more with modelling the problem and the program logic than with the details of the parallel execution itself. Developments in parallelism have been oriented more toward efficiency, the majority being numerical computation programs. In this work we show how the areas of MAS and parallelism can complement each other and offer solutions to problems in each. The focus is on non-numerical processing fields, where interactions are less predictable or generalizable. Keywords: Parallel programming. Multi-agent systems. Programming languages.
Article
Full-text available
This paper shows the application of fuzzy sets to select the optimal sizes of analog integrated circuits that meet target specifications established by linguistic variables, namely gain "closer to" unity and "large" bandwidth. The cases of study are three unity-gain cells and a current-feedback operational amplifier (CFOA), whose performance characteristics are evaluated using 0.35μm and 180nm integrated circuit technologies. Every circuit is codified by the width (W) and length (L) of every metal-oxide-semiconductor field-effect transistor (MOSFET) and by the bias current source, in order to generate a population of feasible solutions by applying the non-dominated sorting genetic algorithm NSGA-II, from which the optimal W and L sizes are selected through the intersection of fuzzy sets. We present the results of the proposed fuzzy selection approach, which has been implemented in both a sequential and a distributed system, applying a zoom technique to search for the optimal sizes in a more detailed or refined way.
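The selection step, intersecting fuzzy sets for "gain closer to unity" and "large bandwidth", reduces to taking the minimum of membership values. The C++ sketch below uses invented membership shapes and thresholds purely for illustration; the paper's actual functions may differ:

#include <algorithm>
#include <cmath>
#include <cstdio>

// "Gain closer to unity": triangular membership peaking at gain = 1.
double mu_gain(double gain) {
    double d = std::abs(gain - 1.0);
    return std::max(0.0, 1.0 - d / 0.05);  // zero once |gain - 1| >= 0.05
}

// "Large bandwidth": ramps from 0 at 100 MHz to 1 at 1 GHz (assumed range).
double mu_bw(double bw_hz) {
    return std::clamp((bw_hz - 1e8) / (1e9 - 1e8), 0.0, 1.0);
}

int main() {
    // Candidate circuit performances (gain, bandwidth), invented values.
    struct { double gain, bw; } candidates[] = {
        {0.99, 6e8}, {1.02, 9e8}, {0.97, 2e8}
    };
    for (auto& c : candidates) {
        // Fuzzy intersection: the joint degree of satisfying both criteria.
        double joint = std::min(mu_gain(c.gain), mu_bw(c.bw));
        std::printf("gain=%.2f bw=%.0e  ->  membership %.3f\n",
                    c.gain, c.bw, joint);
    }
}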
Conference Paper
A distributed simulation between control systems and building performance applications is increasingly becoming an important enabler in the analysis of Automated Buildings (ABs) for better design and operation. Tackling the problem of fulfilling occupants' needs while reducing energy use and greenhouse gas emissions requires distributed simulations to investigate the impact of advanced control systems on building performance applications through virtual representations rather than experimental trials, which are usually time-consuming and cost-prohibitive. For this reason, this paper describes the development and implementation of a framework for distributed simulations involving different software tools over a network. The main role of this framework is analogous to a cooperative middleware that couples one or more building performance simulation tool(s) and a control systems environment at run time over a network, so as to resemble the Building Automation and Control Systems (BACS) architecture. The paper ends by giving an outlook on future work and further developments, mainly the analysis of emergent properties in multi-physical co-simulation.
Conference Paper
The paper deals with the problems of building CAD systems using the methodology of problem adaptation. It shows a method for the creation of adaptors, artificial intelligent program components, using the paradigm of object-oriented programming. It also discusses some general scientific aspects of building intelligent objects and the development of intelligent information environments on the platform of agent technologies.
Conference Paper
Explosive growth in data size, data complexity, and data rates, triggered by the emergence of high-throughput technologies such as remote sensing, crowd-sourcing, social networks, or computational advertising, has in recent years led to an increasing availability of data sets of unprecedented scale, with billions of high-dimensional data examples stored on hundreds of terabytes of memory. In order to make use of this large-scale data and extract useful knowledge, researchers in the machine learning and data mining communities face numerous challenges, since classification algorithms designed for standard desktop computers cannot address these problems due to memory and time constraints. As a result, there is an evident need for novel, more scalable algorithms that can handle large data sets. In this paper we propose one such method, named AROW-MR, a linear SVM solver for efficient training of recently proposed confidence-weighted (CW) classifiers. Linear CW models maintain a Gaussian distribution over parameter vectors, allowing a user to estimate, in addition to the separating hyperplane between two classes, the parameter confidence as well. The proposed method employs the MapReduce framework to train the CW classifier in a distributed way, obtaining significant improvements in both training time and accuracy. This is achieved by training local CW classifiers on each mapper, then optimally combining the local classifiers on the reducer to obtain an aggregated, more accurate CW linear model. We validated the proposed algorithm on synthetic data, and further showed that the AROW-MR algorithm outperforms the baseline classifiers on an industrial, large-scale task of Ad Latency prediction, with nearly one billion examples.
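The reducer-side combination of local CW classifiers can be approximated by a precision-weighted average of the mappers' weight vectors, where low-variance (high-confidence) weights count more. The sketch below is an illustrative stand-in, not necessarily AROW-MR's exact aggregation rule:

#include <vector>
#include <cstdio>

// Each mapper produces a local linear model with per-weight means and
// variances (the CW confidence information).
struct LocalModel {
    std::vector<double> mean;      // local weight means
    std::vector<double> variance;  // local per-weight variances
};

// Reducer: precision-weighted average per coordinate, so confident local
// estimates dominate. Illustrative only; the paper's rule may differ.
std::vector<double> reduce_models(const std::vector<LocalModel>& models) {
    const size_t d = models[0].mean.size();
    std::vector<double> combined(d, 0.0);
    for (size_t j = 0; j < d; ++j) {
        double num = 0.0, den = 0.0;
        for (const auto& m : models) {
            double prec = 1.0 / m.variance[j];  // precision = 1 / variance
            num += prec * m.mean[j];
            den += prec;
        }
        combined[j] = num / den;
    }
    return combined;
}

int main() {
    std::vector<LocalModel> models = {
        {{1.0, 0.0}, {0.1, 1.0}},   // confident about w0, unsure about w1
        {{0.0, 2.0}, {1.0, 0.1}},   // confident about w1, unsure about w0
    };
    auto w = reduce_models(models);
    std::printf("combined w = [%.3f, %.3f]\n", w[0], w[1]);
}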
Article
With the arrival of the Internet Age, more and more companies have begun to operate in E-commerce to seek profit. This paper first analyzes the connotation and influencing factors of the enterprise E-commerce Business Process (BP). It then derives the synergetic elements of enterprise E-commerce BPR using synergetic ideas and methods. On this basis, we propose an information synergy degree model based on target and process.
Article
In this paper we discuss the architectural characteristics of multi-core and multi-processor systems, for which multithreaded algorithms are well suited. In order to make full use of multiple cores and processors, threads must be allocated to appropriate cores, so the thread allocation algorithm is very important. In this paper we propose a multithreaded allocation algorithm based on a greedy strategy. This algorithm allocates threads reasonably and makes effective use of each core.
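The abstract does not spell out the greedy rule, but a common greedy heuristic for this problem assigns each thread, longest estimated run time first, to the currently least-loaded core. A sketch under that assumption:

#include <algorithm>
#include <cstdio>
#include <queue>
#include <utility>
#include <vector>

// Greedy longest-processing-time-first mapping of thread workloads to
// cores: sort workloads in descending order, then repeatedly give the next
// thread to the least-loaded core. Illustration of the greedy idea; the
// paper's exact rule is not specified in the abstract.
std::vector<double> greedy_allocate(std::vector<double> work, int cores) {
    std::sort(work.begin(), work.end(), std::greater<double>());
    using Core = std::pair<double, int>;  // (current load, core id)
    std::priority_queue<Core, std::vector<Core>, std::greater<Core>> pq;
    for (int c = 0; c < cores; ++c) pq.push({0.0, c});
    std::vector<double> load(cores, 0.0);
    for (double w : work) {
        auto [l, c] = pq.top();  // least-loaded core
        pq.pop();
        load[c] = l + w;
        pq.push({load[c], c});
    }
    return load;
}

int main() {
    std::vector<double> threads{8, 7, 6, 5, 4, 3, 2, 2};  // estimated run times
    auto load = greedy_allocate(threads, 4);
    for (size_t c = 0; c < load.size(); ++c)
        std::printf("core %zu: load %.1f\n", c, load[c]);
}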
Article
A parallel computer is a set of processors able to work cooperatively to solve a computational problem. Parallelism can be achieved by executing multiple processes on different processors. A distributed operating system is a special kind of software used to manage the shared resources of a distributed system, the process scheduling activity, and the communication and synchronization mechanisms it implements. Basically, it represents the extension to multiprocessor architectures of multitasking and multiprogramming operating systems. Four categories of distributed operating systems can be identified by combining loosely coupled and tightly coupled hardware and software.
Article
This report describes an overview and the progress to date of the Distributed Episodic Exploratory Planning (DEEP) project. DEEP is a mixed-initiative decision support system that utilizes past experiences to suggest courses of action for new situations. It has been designed as a distributed multi-agent system, using agents to maintain and exploit the experiences of individual commanders as well as to transform suggested past plans into potential solutions for new problems. The system is mixed-initiative in the sense that a commander, through his or her agent, can view and modify the contents of the shared repository as needed. The agents interact through a common knowledge repository, represented by a blackboard in the initial architecture. The blackboard architecture is well suited for dealing with ill-defined, complex situations such as warfare.
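The blackboard pattern described here, agents interacting only through a shared repository, can be reduced to a minimal thread-safe sketch. Class and function names below are invented for illustration; this is not DEEP's code:

#include <mutex>
#include <string>
#include <thread>
#include <vector>
#include <cstdio>

// Generic blackboard: agents post and read entries on a shared repository
// guarded by a mutex, so concurrent posts never corrupt the entry list.
class Blackboard {
public:
    void post(const std::string& entry) {
        std::lock_guard<std::mutex> lock(m_);
        entries_.push_back(entry);
    }
    std::vector<std::string> snapshot() const {
        std::lock_guard<std::mutex> lock(m_);
        return entries_;  // return a copy so readers hold the lock briefly
    }
private:
    mutable std::mutex m_;
    std::vector<std::string> entries_;
};

int main() {
    Blackboard bb;
    // Two "commander agents" post suggestions concurrently.
    std::thread a([&] { bb.post("agent A: reuse plan #17"); });
    std::thread b([&] { bb.post("agent B: adapt plan #42"); });
    a.join();
    b.join();
    for (const auto& e : bb.snapshot()) std::printf("%s\n", e.c_str());
}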
Article
Distributed computing allows combining the computing power of miscellaneous computers. These computers may be at different locations as long as they are connected via a network, e.g. the Internet or an intranet. In this paper the development of a distributed computing version of the neurosimulator FAUN (Fast Approximation with Universal Neural Networks) is described. This offers the opportunity to use free computing resources, e.g. of a student and staff computer cluster. An easy-to-install client is part of the development work. The combined computation power is necessary for, e.g., fast forecasting or fast simulation problems to be solved with FAUN, which would otherwise take hours or days on a single-processor computer. Problems whose computation time can be shortened significantly by distributed computing with FAUN include, but are not limited to, dynamic games, robust optimal reentry guidance of a space shuttle, and currency forecasting.