Developments in computer architectures are changing the way we do computational physics. Data-parallel programming languages eliminate unnecessary serialization, which obscures the parallelism inherent in nature, and allow scientists to program computers at a higher level. Massively parallel hardware designs deliver both higher performance and better performance per price. In this talk, I discuss data-parallel programming, the design of the Connection Machine (CM), and selected applications from computational condensed-matter physics on Connection Machine supercomputers.
Lattice gas techniques, which are becoming widely accepted for simulating fluid flow in complex geometries, are well suited for studying parallel computing because they offer opportunities for parallelization on four different levels. These four levels (instruction, statement, loop and process) represent the four levels at which parallel algorithms can be designed. In this paper we discuss how lattice gas algorithms are developed with these levels of parallelism in mind, and we present specific realizations on the following machines: the SUN SPARC-10, the IBM-9000, the NEC-SX3/11, the Cray-YMP/8, the CM-2 and the Intel iPSC/860 hypercube. Finally, we demonstrate that this programming effort was not in vain: results for two-dimensional systems with up to 200 million particles show finite-size effects which may be interpreted as arising from the mean free path, λ, of the fluid particles. Empirically, these finite-size effects are found to be proportional to 1 + 7λ/R, where R is the characteristic size of the system.
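To make the bit-level parallelism concrete, the following is a minimal sketch of one lattice gas update step in Python/NumPy. For brevity it uses the four-direction square-lattice HPP rule rather than the hexagonal models usually run in production; the function name, grid size and fill density are illustrative assumptions, not code from the paper.

import numpy as np

def hpp_step(e, w, n, s):
    """One collision + streaming step of the HPP lattice gas, with one
    boolean occupation array per velocity direction."""
    # Collision: an isolated head-on pair rotates by 90 degrees.
    ew = e & w & ~n & ~s          # exactly an east-west pair at a site
    ns = n & s & ~e & ~w          # exactly a north-south pair at a site
    e, w = (e & ~ew) | ns, (w & ~ew) | ns
    n, s = (n & ~ns) | ew, (s & ~ns) | ew
    # Streaming: every particle hops one site along its velocity.
    e = np.roll(e, 1, axis=1)
    w = np.roll(w, -1, axis=1)
    n = np.roll(n, -1, axis=0)
    s = np.roll(s, 1, axis=0)
    return e, w, n, s

rng = np.random.default_rng(0)
e, w, n, s = (rng.random((64, 64)) < 0.2 for _ in range(4))
for _ in range(100):
    e, w, n, s = hpp_step(e, w, n, s)

Because every site executes the same boolean operations, the update maps directly onto the instruction- and loop-level parallelism discussed above.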
A relaxation neural network model is proposed to solve the binary image representation problem. This network iteratively minimizes a computational energy, defined by the quantization error over neighboring picture elements, using local and parallel computations. To obtain binary representations that adapt to local features such as edges, a second relaxation network is proposed in which interactions between binary processes and line processes represent discontinuities in the image. It is shown that the proposed neural models can generate high-quality binary images. It is also shown that the proposed models can be efficiently implemented on loosely coupled hypercube multiple-instruction, multiple-data-stream (MIMD) and single-instruction, multiple-data-stream (SIMD) parallel computers using parallel computational models and programming methods.
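The relaxation scheme can be sketched as follows. This is a hypothetical reconstruction, not the paper's network: it assumes a 3 × 3 neighbourhood for the quantization error and a feedback gain eta, and it omits the line processes for brevity.

import numpy as np

def relax_binary(x, iters=50, eta=0.8):
    """Relax a gray-level image x in [0, 1] toward a binary image b that
    keeps the neighbourhood-averaged quantization error (x - b) small."""
    b = (x > 0.5).astype(float)
    for _ in range(iters):
        err = x - b
        # 3x3 neighbourhood mean of the local quantization error.
        nbr = sum(np.roll(np.roll(err, dy, 0), dx, 1)
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
        # Each unit thresholds its gray level plus the fed-back error.
        b = (x + eta * nbr > 0.5).astype(float)
    return b

Every pixel updates simultaneously from purely local information, which is what makes such a network natural for the SIMD and MIMD machines mentioned above.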
The aim of the first part of this paper is to obtain a simple rational description of trickle flow which may be extended to a radial dispersion study. Indeed, a straightforward extension of the results derived previously on the basis of a thermodynamic analogy (P. Marchot, M. Crine and G. A. L'Homme, Chem. Eng. J., 36 (1987) 141) has not been possible until now. This is the reason why, in part I, we examine carefully the probabilistic content of our method. On this basis, the results are then generalized in part II. Simple probabilistic reasoning leads to analytical relations between the liquid flow-rate distribution, the irrigated fraction of the bed and the operating conditions. In the limit, these relations can also be derived by maximizing the informational entropy of the system. This method provides numerical algorithms which are used to simulate the flow structure within beds of realistic length. The theoretical results are compared with numerical simulations and with experimental distributions measured on biofilters using four different types of packing.
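In outline, the entropy-maximization route mentioned above is the standard constrained variational problem. The notation below is generic, assuming only a normalization constraint and a mean flow-rate constraint rather than the paper's exact constraint set:

\[
\max_{\{p_i\}} \; S = -\sum_i p_i \ln p_i
\quad \text{subject to} \quad \sum_i p_i = 1, \qquad \sum_i p_i q_i = \bar{q},
\]

whose solution, by Lagrange multipliers, is the exponential family

\[
p_i = \frac{e^{-\beta q_i}}{\sum_j e^{-\beta q_j}},
\]

with the multiplier \(\beta\) fixed by the prescribed mean flow rate \(\bar{q}\).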
The Connection Machine supercomputer system is described with emphasis on the solution of large-scale physics problems. Numerous parallel algorithms, together with their implementations, are given to demonstrate the use of the Connection Machine for physical simulations. Applications discussed include classical mechanics, quantum mechanics, electromagnetism, fluid flow, statistical physics and quantum field theories. The visualization of physical phenomena is also discussed, and video tapes demonstrating this capability are shown in the lectures. Connection Machine performance and I/O characteristics are described as well, along with the CM-2 software.
Massively parallel computers will play an increasingly dominant role in
hydrological computing. One such computer is the Connection Machine
model CM-2, a single-instruction stream, multiple-data stream computer
with up to 65,536 processors, as much as 8 gigabytes (Gbyte) of random
access memory distributed among the processors, and a FORTRAN compiler
based on the proposed FORTRAN-90 standard. One-, two-, and
three-dimensional examples from hydrology are used in this paper to
present a tutorial on programming for the CM-2. The problem of
saturated, steady flow in a randomly heterogeneous three-dimensional
porous medium is explored here in some detail. A diagonally
preconditioned conjugate gradient (DPCG) iterative solver is applied to
this problem for up to 128³ nodes. Less than 1 min of CM-2
time is required to reduce the error by a factor of 10⁻⁶ for
a 128 × 128 × 128 grid with heterogeneous hydraulic
conductivity. Measured CPU times for the DPCG method are significantly
smaller than those reported in the literature for a polynomial PCG
solver applied to the same domain with different boundary conditions and
executed on a Cray X-MP/48 and an Alliant FX/8. The measured performance
is also much greater than that reported in the literature for a strongly
implicit procedure solver applied to a similar problem on a Cray 2. The
need for continued development of massively parallel algorithms,
including effective iterative solution of linear systems of equations
and problems with irregular domains, is indicated.
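A diagonally preconditioned conjugate gradient solver is compact enough to sketch in full. The Python/NumPy version below is a serial illustration using an explicit matrix; a CM-2 code for a 128³ grid would instead apply a 7-point stencil in place of the matrix-vector products. The function name and defaults are assumptions for this sketch.

import numpy as np

def dpcg(A, b, tol=1e-6, max_iter=1000):
    """Conjugate gradients with a diagonal (Jacobi) preconditioner,
    for symmetric positive-definite A."""
    d_inv = 1.0 / np.diag(A)          # the diagonal preconditioner
    x = np.zeros_like(b)
    r = b - A @ x
    z = d_inv * r
    p = z.copy()
    rz = r @ z
    r0 = np.linalg.norm(r)
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * r0:  # e.g. tol = 1e-6, as in the text
            return x
        z = d_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

On a data-parallel machine, every operation in the loop (stencil application, vector updates and the reductions for the dot products) parallelizes over one grid node per virtual processor.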
A molecular dynamics simulation study of MgSiO₃ has been performed using a large sample containing 4096 unit cells. Thermodynamic properties have been extracted using a semiclassical approximation to the correct quantum mechanical treatment, based on the calculated density of states and the quantum harmonic formalism for the thermodynamic functions. Simulations performed at different temperatures and volumes have given an estimate of the relative contributions of thermal expansion (quasi-harmonic effects) and direct anharmonic interactions. Comparison of results for mean-square atomic displacements with results on smaller samples has shown the limitations of smaller sample sizes.
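The quantum harmonic formalism referred to above is presumably the standard one: given a vibrational density of states g(ω) computed from the simulation (assumed here to be normalized to the 3N modes of the sample), the Helmholtz free energy is

\[
F(T) = k_B T \int_0^\infty g(\omega)\,
\ln\!\left[ 2 \sinh\!\left( \frac{\hbar \omega}{2 k_B T} \right) \right] d\omega ,
\]

from which the remaining thermodynamic functions follow by differentiation. Evaluating such expressions at the different volumes and temperatures of the runs is what separates the quasi-harmonic contribution from the directly anharmonic one.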
We describe the solution of very large systems of dense linear equations using the CM-2/DataVault system. The block LU decomposition algorithm is presented and its implementation on the CM-2 is described. It is shown that the computation time is dominated by matrix multiplication. Performance results are given for the factorization of a dense complex 20k × 20k double-precision matrix.
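The block LU idea can be sketched in a few lines. This serial Python/NumPy version omits pivoting and the distribution of blocks across processors; the block size nb = 64 is an arbitrary illustrative choice.

import numpy as np

def block_lu(A, nb=64):
    """Right-looking block LU factorization of A, in place, no pivoting."""
    n = A.shape[0]
    for k in range(0, n, nb):
        e = min(k + nb, n)
        # Unblocked LU of the diagonal block.
        for j in range(k, e):
            A[j+1:e, j] /= A[j, j]
            A[j+1:e, j+1:e] -= np.outer(A[j+1:e, j], A[j, j+1:e])
        if e < n:
            # Triangular solves for the block row and block column.
            L = np.tril(A[k:e, k:e], -1) + np.eye(e - k)
            A[k:e, e:] = np.linalg.solve(L, A[k:e, e:])
            A[e:, k:e] = np.linalg.solve(np.triu(A[k:e, k:e]).T,
                                         A[e:, k:e].T).T
            # Trailing update: one large matrix multiplication.
            A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]
    return A

The trailing update carries the O(n³) bulk of the work as a single large matrix multiplication, which is why the factorization time is dominated by matrix multiply.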
The subject of this report is the design of a specially organized, general-purpose computer which performs matrix operations efficiently. The content of the report is summarized as follows. First, the relevant work on microcellular and macrocellular techniques is reviewed. Second, the discrete Kalman filter is described as an example of the type of problem for which this computer is efficient. Third, a detailed design for a cellular, array-structured computer is presented. Fourth, a computer program which simulates the cellular computer is described. Fifth, it is recommended that one cell and the associated control circuits be constructed to determine the feasibility of producing a hardware realization of the entire computer.
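To give a sense of why the discrete Kalman filter is a natural benchmark for such a machine, one predict/update cycle is nothing but matrix products, sums and a single inverse. The sketch below is the textbook formulation, not the report's design:

import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of the discrete Kalman filter."""
    # Predict: propagate the state estimate and its covariance.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update: blend in the measurement z.
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P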
A cellular automaton (CA) recently developed by D.H. Rothman and J.M. Keller (1988) simulates the flow of two incompressible, immiscible, viscous fluids in two dimensions. This automaton has been simulated on the CM-2 Connection Machine using a sequence of logical operations and table lookups to determine the state of a CA site from its old state and those of its neighbors. The logical operations are performed in parallel by each of the Connection Machine processors, while the table lookups use the indirect addressing capabilities among groups of 32 processors. A description is given of CA fluids, including the issue of isotropy, the choice of a rule set, and the averaging procedure used to obtain hydrodynamical quantities. The CM-2 Connection Machine is then described, with emphasis on the indirect addressing capabilities of the machine. A complete description is also given of the Rothman-Keller model for two-phase flow. It is shown how the indirect addressing is used in the simulation algorithm, and how a symmetry in the dynamics is used to reduce the size of the lookup tables by a factor of six. A time sequence of results showing the separation of two immiscible phases from an initially homogenized state is presented.
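The factor-of-six reduction can be illustrated on a single-species hexagonal gas, where rotating the lattice by 60° cyclically permutes the six velocity bits of a site state. The sketch below is a simplified stand-in for the Rothman-Keller tables, which carry additional color and rest-particle bits:

def rot6(state, k):
    """Rotate a 6-bit site state by k * 60 degrees."""
    return ((state << k) | (state >> (6 - k))) & 0b111111

def canonical(state):
    """Smallest rotation of the state, and the rotation that reaches it."""
    return min((rot6(state, k), k) for k in range(6))

def lookup(state, table):
    """Collide via the reduced table: canonicalize, look up, rotate back."""
    rep, k = canonical(state)
    return rot6(table[rep], (6 - k) % 6)

Because the collision rule commutes with rotations, only one representative per rotation orbit needs a stored result, and since most orbits have six members the table shrinks by roughly a factor of six.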
The Connection Machine is a massively parallel architecture with 65 536 single-bit processors and 32 Mbytes of memory, organized as a high-dimensional hypercube. A sophisticated router system provides efficient communication between remote processors. A rich software environment, including a parallel extension of COMMON LISP, provides access to the processors and network. Virtual processor capability extends the degree of fine-grained parallelism beyond 1 000 000. We describe the hardware and the parallel programming environment. We then present implementations of SOR, Multigrid and Conjugate Gradient algorithms for solving partial differential equations on the Connection Machine. Measurements of computational efficiency are provided, as well as an analysis of opportunities for achieving better performance. Despite the lack of floating-point hardware, computation rates above 100 Mflops have been achieved in PDE solution. Virtual processors prove to be a real advantage, easing the effort of software development while improving system performance significantly.
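Of the three solvers, SOR shows the data-parallel style most directly when organized as a red-black sweep, in which every point of one color is updated simultaneously. The Python/NumPy sketch below solves a 2-D Poisson problem with fixed boundary values; omega = 1.8 is an illustrative relaxation factor, not a tuned one.

import numpy as np

def redblack_sor(u, f, h, omega=1.8, sweeps=100):
    """Red-black SOR sweeps for -laplace(u) = f on a uniform grid with
    spacing h; boundary values of u are held fixed."""
    ii, jj = np.indices(u.shape)
    interior = np.zeros(u.shape, dtype=bool)
    interior[1:-1, 1:-1] = True
    red = interior & ((ii + jj) % 2 == 0)
    black = interior & ((ii + jj) % 2 == 1)
    for _ in range(sweeps):
        for mask in (red, black):
            nbrs = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                    np.roll(u, 1, 1) + np.roll(u, -1, 1))
            gs = 0.25 * (nbrs + h * h * f)     # Gauss-Seidel value
            u[mask] += omega * (gs[mask] - u[mask])
    return u

On the Connection Machine each grid point maps to a (virtual) processor, and the two half-sweeps become two fully synchronous updates.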
We describe an implementation of the Cooley-Tukey complex-to-complex FFT on the Connection Machine. The implementation is designed to make effective use of the communication bandwidth of the architecture, its memory bandwidth, and storage, with precomputed twiddle factors. The peak data motion rate achieved for the interprocessor communication stages is in excess of 7 Gbytes/s for a Connection Machine system CM-200 with 2048 floating-point processors. The peak rate of FFT computations local to a processor is 12.9 Gflops/s in 32-bit precision and 10.7 Gflops/s in 64-bit precision. The same FFT routine is used to perform both one- and multi-dimensional FFTs without any explicit data rearrangement. The peak performance for a one-dimensional FFT on data distributed over all processors is 5.4 Gflops/s in 32-bit precision and 3.2 Gflops/s in 64-bit precision. The peak performance for square two-dimensional transforms is 3.1 Gflops/s in 32-bit precision, and for cubic three-dimensional transforms the peak is 2.0 Gflops/s in 64-bit precision. Certain oblong shapes yield better performance. The number of twiddle factors stored in each processor is P/(2N) + log₂N for an FFT on P complex points uniformly distributed among N processors. To achieve this level of storage efficiency we show that a decimation-in-time FFT is required for input in normal order, and a decimation-in-frequency FFT is required for input in bit-reversed order.
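For reference, the decimation-in-time recurrence underlying such an implementation is compact. The sketch below is a generic serial radix-2 Cooley-Tukey FFT, not the Connection Machine routine, and it recomputes the twiddle factors instead of storing P/(2N) + log₂N of them per processor:

import numpy as np

def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a
    power of two.  The exp(-2*pi*i*k/n) terms are the twiddle factors."""
    n = len(x)
    if n == 1:
        return x
    even = fft_dit(x[0::2])
    odd = fft_dit(x[1::2])
    t = np.exp(-2j * np.pi * np.arange(n // 2) / n) * odd
    return np.concatenate([even + t, even - t])

x = np.random.rand(8) + 1j * np.random.rand(8)
assert np.allclose(fft_dit(x), np.fft.fft(x))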