Article · PDF available

Reevaluating Amdahl's Law

Authors:
  • John L. Gustafson

... which is sometimes referred to as scaled speedup [20]. Parallel efficiency η_N can be defined as the ratio of the total resource use time per amount of work on N_ref cores to that on N cores, ...
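Written out in the snippet's own notation (with T_N denoting the wall-clock time and W_N the amount of work completed on N cores, symbols assumed here rather than taken from the citing paper), the definition reads

    η_N = (N_ref · T_{N_ref} / W_{N_ref}) / (N · T_N / W_N),

so that η_N = 1 corresponds to perfect scaling and η_N < 1 indicates that each unit of work costs more core-seconds on N cores than on the N_ref-core reference run.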
Article
While substantial research effort has recently been devoted to the development of computational liquid-metal magnetohydrodynamics (MHD) solvers, this has typically been confined to closed-source and commercial codes. This work aimed to investigate some open-source alternatives. Two OpenFOAM-based MHD solvers, mhdFoam and epotFoam, were found to show strong scaling profiles typical of fluid dynamics codes, while weak scaling was impeded by an increase in iterations per timestep with increasing resolution. Both were found to solve the Shercliff and Hunt flow problems accurately for Hartmann numbers from 20 to 1000, except for mhdFoam, which failed in the Hunt flow case at Ha = 1000. An inductionless MHD solver was implemented in the Proteus MOOSE application as a proof of concept, using two methods referred to as the kernel method and the material method. The material method was found to converge with a wider range of preconditioners than the kernel method; however, the kernel method was found to be significantly more accurate. Future work will aim to build on these studies, exploring more advanced OpenFOAM MHD solvers as well as improving the Proteus MHD solver.
... This phenomenon is not observed on the CPU and is related to properly saturating the GPU's compute capacity. Clearly, large problems benefit the most from GPU acceleration, which is consistent with Gustafson's law (Gustafson, 1988). max has relatively insignificant influence on the scaling of the PC and PI schemes, but NR sees slightly increased speedup on the GPU when max = 200 s (Fig. 4b and Table 1). ...
... The proportion of the parallelisable part in the total execution time grows, often more significantly than the sequential part. Another well-known theorem, Gustafson's law [23], takes this increase in workload into account and thereby provides a less pessimistic and more pragmatic view of the theoretically achievable speedup. Equation (2.2) states Gustafson's law, where s is the proportion of execution time spent on the sequential part of the workload, p is the proportion of execution time spent on the parallel part of the workload, and N is the number of processors present in a parallel computing resource. ...
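Although Equation (2.2) is not reproduced in the excerpt, Gustafson's law in the notation defined there (with s + p = 1) takes the standard scaled-speedup form, which is presumably what the cited equation expresses:

    S(N) = s + p · N = N - (N - 1) · s.

For example, a workload with a 10% sequential share (s = 0.1, p = 0.9) run on N = 100 processors has a predicted scaled speedup of 0.1 + 0.9 · 100 = 90.1.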
Thesis
Full-text available
Industrial Cyber-Physical Systems (CPS) drive industry sectors worldwide, combining physical and software components into sophisticated interconnected systems. Distributed CPS (dCPS) further enhance these systems by interconnecting multiple distributed subsystems through intricate, complex networks. Researchers and industrial designers need to carefully consider various design options that have the potential to impact system behaviour, cost, and performance during the development of dCPS. However, the increased size and complexity present manufacturing companies with new challenges when designing their next-generation machines. Furthermore, objectively evaluating these machines' vast number of potential arrangements can be resource-intensive. One of the approaches designers can utilise to aid themselves with early directions in the design process is Design Space Exploration (DSE). Nevertheless, the vast number of potential design points (a single system configuration) in the design space (the collection of all possible design points) poses a significant challenge to reaching an exact or reasonable solution scalably and efficiently during the design process. This thesis addresses the scalability challenge in the design process employed by researchers and designers of next-generation complex dCPS. A baseline understanding is constructed of the state of the art, its complexity, research directions, and challenges in the context of DSE for dCPS and related research fields. To facilitate scalable and efficient DSE for dCPS, an evaluation environment is proposed, implemented, and evaluated. The research considers key design considerations for developing a distributed evaluation workflow that can be dynamically adapted to enable efficient and scalable exploration of the vast design space of complex, distributed Cyber-Physical Systems. Evaluation of the proposed environment employs a set of system models, representing design points within a DSE process, to assess the solution and its behaviour, performance, capability, and applicability in addressing the scalability challenge in the context of DSE for dCPS. During the evaluation, the performance and behaviour are investigated in three areas: (i) Simulation Campaign, (ii) Task Management Configuration, and (iii) Parallel Discrete-Event Simulation (PDES). Throughout the evaluation, it is demonstrated that the proposed environment is capable of providing scalable and efficient evaluation of design points in the context of DSE for dCPS. Furthermore, the proposed solution enables designers and researchers to tailor it to their environment through dynamic complex workflows and interactions, workload-level and task-level parallelism, and simulator and compute-environment agnosticism. The outcomes of this research contribute to advancing the research field towards scalable and efficient evaluation for DSE of dCPS, supporting designers and researchers developing their next-generation dCPS. Nevertheless, further research can be conducted on the impact of a system's behavioural characteristics on the performance and behaviour of the proposed solution when using the PDES methodology. Additionally, the interaction between external applications and the proposed solution could be investigated to support and enable further complex interactions and requirements.
Article
The numerical simulation of flow in oil reservoirs has become, over decades, a standard tool applied by the oil and gas industry to forecast the behavior of a hydrocarbon-producing field. To reduce the computational effort of these simulations, which in general demand more time as the case studied becomes more realistic, we, like other researchers, use high-performance computing techniques in reservoir simulation. In this context, the main contribution of this work is the proposal of a strategy for the numerical simulation of non-isothermal flow in oil reservoirs using an operator splitting, OpenMP, a coprocessor, and a one-equation model for temperature without the need to consider local thermal equilibrium. Throughout the development of this work, we used a non-isothermal flow model in porous media, the control-volume finite-difference method for discretization, and a linearization of the non-linear algebraic equations by Picard's method. Our main objective was the parallelization of the numerical code using OpenMP and, by varying the number of threads used in the simulations, we were able to reach speedups higher than 45 in some cases.
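As a rough illustration of the loop-level OpenMP parallelism described in this abstract, the sketch below parallelises one explicit update sweep inside an outer Picard-style iteration and reports the wall time; the one-dimensional heat-like update, array sizes, and variable names are illustrative assumptions, not the authors' reservoir model or discretization (compile with, e.g., gcc -O2 -fopenmp).

    /* Illustrative OpenMP sketch: the physics and names below are assumptions,
       not the reservoir model used in the cited article. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    #define NCELLS  1000000
    #define NSWEEPS 50

    int main(void) {
        double *t_old = malloc(NCELLS * sizeof *t_old);
        double *t_new = malloc(NCELLS * sizeof *t_new);
        for (long i = 0; i < NCELLS; ++i) t_old[i] = 300.0;  /* initial temperature field */

        double start = omp_get_wtime();
        for (int sweep = 0; sweep < NSWEEPS; ++sweep) {       /* outer Picard-style sweeps */
            /* Each cell update is independent, so the loop is shared among threads. */
            #pragma omp parallel for schedule(static)
            for (long i = 1; i < NCELLS - 1; ++i)
                t_new[i] = t_old[i] + 0.25 * (t_old[i-1] - 2.0 * t_old[i] + t_old[i+1]);
            t_new[0] = t_old[0];
            t_new[NCELLS-1] = t_old[NCELLS-1];
            double *tmp = t_old; t_old = t_new; t_new = tmp;  /* swap old and new fields */
        }
        printf("threads=%d  wall time=%.3f s\n",
               omp_get_max_threads(), omp_get_wtime() - start);

        free(t_old);
        free(t_new);
        return 0;
    }

Speedup can then be estimated by dividing the single-thread wall time by the multi-thread wall time while varying OMP_NUM_THREADS.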
Chapter
Reports have emerged of the vast amounts of data being transmitted across the Internet by Internet of Things (IoT) devices. The number of connected IoT devices continues to grow at a very rapid pace, yet over the last five years peak Internet utilization has remained just under 50%. While applications have been processing increasingly massive amounts of data, users have been demanding faster or more acceptable response times, and others have touted the economic value of faster response times. In this paper we argue for adding more fungible paths to IoT and other smart devices to obtain a theoretical speedup in network data transmission, and we present equations to compute the number of fungible paths needed to complete transmission of a set amount of data within a required minimum time frame, along with the resulting speedup achieved. Our proposed model can assist IoT and smart-device manufacturers, network administrators, and networked-application developers in deciding how many physical and logical layer-one communication units to build into the devices they manufacture, and how many of these communication paths should be utilized during a communication session to achieve the desired data-transfer/response times.
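The chapter's equations are not reproduced above, but under the simplifying assumptions of equal-rate, independent paths and perfectly divisible data, a back-of-the-envelope version of the computation it describes might look like the following sketch (all numeric values and names are made-up placeholders, not figures from the chapter).

    /* Hypothetical sketch: equal-rate paths, perfectly divisible transfer. */
    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double data_bits  = 8e9;    /* D: amount of data to transmit (bits), assumed */
        double rate_bps   = 50e6;   /* r: throughput of a single path (bits/s), assumed */
        double deadline_s = 20.0;   /* T: required completion time (s), assumed */

        /* Smallest k with k * r * T >= D, i.e. enough aggregate capacity to meet the deadline. */
        int paths = (int)ceil(data_bits / (rate_bps * deadline_s));

        double t_single = data_bits / rate_bps;            /* transfer time over one path */
        double t_multi  = data_bits / (paths * rate_bps);  /* transfer time over k paths */

        printf("paths needed = %d, time = %.1f s, speedup = %.1fx\n",
               paths, t_multi, t_single / t_multi);
        return 0;
    }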
Conference Paper
For over a decade prophets have voiced the contention that the organization of a single computer has reached its limits and that truly significant advances can be made only by interconnection of a multiplicity of computers in such a manner as to permit cooperative solution. Variously the proper direction has been pointed out as general purpose computers with a generalized interconnection of memories, or as specialized computers with geometrically related memory interconnections and controlled by one or more instruction streams.
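The fixed-size bound usually distilled from this argument, and the formula that "Reevaluating Amdahl's Law" revisits, is Amdahl's law: with a serial fraction s and a parallel fraction p = 1 - s executed on N processors,

    S(N) = 1 / (s + p / N) ≤ 1 / s,

so a serial fraction of s = 0.1 caps the fixed-size speedup at 10 no matter how many processors are added, whereas the scaled-speedup view above yields nearly 90 on 100 processors for the same s.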
CR Categories and Subject Descriptors: C.1.2 [Processor Architectures]: Multiple Data Stream Architectures (Multiprocessors) - parallel processors. General Terms: Theory. Additional Key Words and Phrases: Amdahl's law, massively parallel processing, speedup.
Development and analysis of scientific application programs on a 1024-processor hypercube
  • R E Benner
  • J L Gustafson
  • G R Montry
Benner, R.E., Gustafson, J.L., and Montry, G.R. Development and analysis of scientific application programs on a 1024-processor hypercube. SAND 88-0317, Sandia National Laboratories, Albuquerque, N.M., Feb. 1988.