349 publications · 21,625 reads · 13,942 citations
Publications (349)
Students and researchers from all fields of mathematics are invited to read and treasure this special Proceedings. A conference was held 25–29 September 2017 at Noah’s On the Beach, Newcastle, Australia, to commemorate the life and work of Jonathan M. Borwein, a mathematician extraordinaire whose untimely passing in August 2016 was a sorry loss to...
We give new bounds on the error in the asymptotic approximation of the log-Gamma function $\ln\Gamma(z)$ (also $\ln\Gamma(z+\frac{1}{2})$) for complex $z$ in the right half-plane. These improve on bounds by Hare (1997) and Spira (1971). We show that $|R_{k+1}(z)| < \sqrt{\pi k}\,|T_k(z)|$ for nonzero $z$ in the right half-plane, where $T_k(z)$ is t...
Mathematical research is undergoing a transformation from a mostly theoretical enterprise to one that involves a significant amount of experimentation. Indeed, computational and experimental mathematics is now a full-fledged discipline within mathematics, and the larger field of computational science is now taking its place as an experimental discipl...
We show that a well-known asymptotic series for the logarithm of the central binomial coefficient is strictly enveloping, so the error incurred in truncating the series is of the same sign as the next term, and is bounded in magnitude by that term. We also consider some closely related asymptotic series, and make some historical remarks.
Paper 2: Richard P. Brent, “Fast multiple-precision evaluation of elementary functions,” Journal of the Association of Computing Machinery, vol. 23 (1976), no. 11, pg. 713–735. ©1976 Association of Computing Machinery, Inc. Reprinted by permission. http://doi.acm.org/10.1145/321941.321944
Synopsis: As mentioned in the preface to Paper 1, Eug...
We consider several families of binomial sum identities whose definition
involves the absolute value function. In particular, we consider centered
double sums of the form \[S_{\alpha,\beta}(n) :=
\sum_{k,\;\ell}\binom{2n}{n+k}\binom{2n}{n+\ell}
|k^\alpha-\ell^\alpha|^\beta,\] obtaining new results in the cases $\alpha =
1, 2$. We show that there is...
We consider discretisations of the Macdonald--Mehta integrals from the theory of finite reflection groups. For the classical groups, $A_{r-1}$, $B_r$ and $D_r$, we provide closed-form evaluations in those cases for which the Weyl denominators featuring in the summands have exponents $1$ and $2$. Our proofs for the exponent-1 cases rely on identities for cl...
We consider discretisations of the Macdonald--Mehta integrals from the theory of finite reflection groups. For the classical groups, $\mathrm{A}_{r-1}$, $\mathrm{B}_r$ and $\mathrm{D}_r$, we provide closed-form evaluations in those cases for which the Weyl denominators featuring in the summands have exponents $1$ and $2$. Our proofs for the exponen...
Let D(n) be the maximal determinant for n × n {±1}-matrices, and R(n) = D(n)/n^{n/2} be the ratio of D(n) to the Hadamard upper bound. Using the probabilistic method, we prove new lower bounds on D(n) and R(n) in terms of d = n − h, where h is the order of a Hadamard matrix and h is maximal subject to h ≤ n. For example, (Formula Presented) By a recen...
We give upper and lower bounds on the determinant of a small perturbation of the identity matrix. The lower bounds are best possible, and in most cases they are stronger than well-known bounds due to Ostrowski and other authors. The upper bounds are best possible if a skew-Hadamard matrix of the same order exists.
We present a new method for algebraic independence results in the context of
Mahler's method. In particular, our method uses the asymptotic behaviour of a
Mahler function $f(z)$ as $z$ goes radially to a root of unity to deduce
algebraic independence results about the values of $f(z)$ at algebraic numbers.
We apply our method to the canonical examp...
Tuenter [Fibonacci Quarterly 40 (2002), 175-180] and other authors have
considered centred binomial sums of the form \[S_r(n) = \sum_k
\binom{2n}{k}|n-k|^r,\] where $r$ and $n$ are non-negative integers. We
consider sums of the form \[U_r(n) = \sum_k \binom{n}{k}|n/2-k|^r\] which are a
generalisation of Tuenter's sums as $S_r(n) = U_r(2n)$ but $U_r...
We give both lower and upper bounds on the determinant of a perturbation of
the identity matrix or, more generally, a perturbation of a nonsingular
diagonal matrix. The matrices considered are, in most cases, diagonally
dominant. The lower bounds are best possible, and are sharper than bounds
obtained from Gerschgorin's theorem or Ostrowski's inequ...
The Brent-McMillan algorithm B3 (1980), when implemented with binary
splitting, is the fastest known algorithm for high-precision computation of
Euler's constant. However, no rigorous error bound for the algorithm has ever
been published. We provide such a bound and justify the empirical observations
of Brent and McMillan. We also give bounds on th...
We prove a double binomial sum identity which differs from most binomial sum
identities in that the summands involve the absolute value function. The
identity is of interest because it can be used in proofs of lower bounds for
the Hadamard maximal determinant problem. Our proof of the identity uses a
two-variable variant of the method of telescopin...
This is a letter to the editor concerning Semjon Adlaj's article "An eloquent
formula for the perimeter of an ellipse", AMS Notices 59, 8 (2012), 1094-1099.
The binary Euclidean algorithm is a variant of the classical Euclidean
algorithm. It avoids multiplications and divisions, except by powers of two, so
is potentially faster than the classical algorithm on a binary machine. We
describe the binary algorithm and consider its average case behaviour. In
particular, we correct some errors in the literatu...
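To make the idea concrete, the following is a minimal Python sketch of a textbook binary GCD, which uses only shifts, subtractions and parity tests. It is a standard formulation offered for illustration and is not necessarily the exact variant analysed in the paper; the function name `binary_gcd` is an assumption.

```python
def binary_gcd(a: int, b: int) -> int:
    """Greatest common divisor using only shifts, subtraction and parity tests."""
    a, b = abs(a), abs(b)
    if a == 0:
        return b
    if b == 0:
        return a
    # Factor out the power of two common to both arguments.
    shift = 0
    while (a | b) & 1 == 0:
        a >>= 1
        b >>= 1
        shift += 1
    # Make a odd; remaining factors of two in a alone do not affect the gcd.
    while a & 1 == 0:
        a >>= 1
    while b != 0:
        # Strip factors of two from b (a is odd, so they cannot be common).
        while b & 1 == 0:
            b >>= 1
        # Keep a <= b, then subtract: odd - odd is even, so b shrinks again.
        if a > b:
            a, b = b, a
        b -= a
    # Restore the common power of two.
    return a << shift
```

For example, `binary_gcd(1071, 462)` returns 21, the same value the classical algorithm gives, without performing any general division.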
The general number field sieve (GNFS) is the most efficient algorithm known
for factoring large integers. It consists of several stages, the first one
being polynomial selection. The quality of the chosen polynomials in polynomial
selection can be modelled in terms of size and root properties. In this paper,
we describe some algorithms for selectin...
We show that the maximal determinant $D(n)$ for $n \times n$ $\{\pm 1\}$-matrices
satisfies $R(n) := D(n)/n^{n/2} \ge \kappa_d > 0$. Here $n^{n/2}$ is the
Hadamard upper bound, and $\kappa_d$ depends only on $d := n-h$, where $h$ is
the maximal order of a Hadamard matrix with $h \le n$. Previous lower bounds on
R(n) depend on both $d$ and $n$. Our boun...
By an old result of Cohn (1965), a Hadamard matrix of order n has no proper
Hadamard submatrices of order m > n/2. We generalise this result to maximal
determinant submatrices of Hadamard matrices, and show that an interval of
length asymptotically equal to n/2 is excluded from the allowable orders. We
make a conjecture regarding a lower bound for...
We prove an upper bound on sums of squares of minors of {+1, -1} matrices.
The bound is sharp for Hadamard matrices, a result due to de Launey and Levin
(2009), but our proof is simpler. We give several corollaries relevant to
minors of Hadamard matrices, and generalise a result of Turan on determinants
of random {+1,-1} matrices.
Let $D(n)$ be the maximal determinant for $n \times n$ $\{\pm 1\}$-matrices,
and ${\mathcal R}(n) = D(n)/n^{n/2}$ be the ratio of $D(n)$ to the Hadamard
upper bound. Using the probabilistic method, we prove new lower bounds on
$D(n)$ and ${\mathcal R}(n)$ in terms of the distance $d$ to the nearest
(smaller) Hadamard matrix, defined by $d = n-h$, w...
We consider the distribution of $\arg\zeta(\sigma+it)$ on fixed lines $\sigma
> \frac12$, and in particular the density \[d(\sigma) = \lim_{T \rightarrow
+\infty}
\frac{1}{2T}
|\{t \in [-T,+T]: |\arg\zeta(\sigma+it)| > \pi/2\}|\,,\] and the closely
related density \[d_{-}(\sigma) = \lim_{T \rightarrow +\infty}
\frac{1}{2T}
|\{t \in [-T,+T]: \Re\zet...
We consider the real part Re(zeta(s)) of the Riemann zeta-function zeta(s) in
the half-plane Re(s) >= 1. We show how to compute accurately the constant
sigma_0 = 1.19... which is defined to be the supremum of sigma such that
Re(zeta(sigma+it)) can be negative (or zero) for some real t. We also consider
intervals where Re(zeta(1+it)) <= 0 and show t...
We show that a certain weighted mean of the Liouville function lambda(n) is
negative. In this sense, we can say that the Liouville function is negative "on
average".
The Hadamard maximal determinant (maxdet) problem is to find the maximum
determinant D(n) of a square {+1, -1} matrix of given order n. Such a matrix
with maximum determinant is called a saturated D-optimal design. We consider
some cases where n > 2 is not divisible by 4, so the Hadamard bound is not
attainable, but bounds due to Barba or Ehlich an...
A saturated D-optimal design is a {+1,-1} square matrix of given order with
maximal determinant. We search for saturated D-optimal designs of orders 19 and
37, and find that known matrices due to Smith, Cohn, Orrick and Solomon are
optimal. For order 19 we find all inequivalent saturated D-optimal designs with
maximal determinant, 2^30 x 7^2 x 17,...
This work considers the deployment of pseudo-random number generators (PRNGs)
on graphics processing units (GPUs), developing an approach based on the
xorgens generator to rapidly produce pseudo-random numbers of high statistical
quality. The chosen algorithm has configurable state size and period, making it
ideal for tuning to the GPU architecture...
We consider the computation of Bernoulli, Tangent (zag), and Secant (zig or
Euler) numbers. In particular, we give asymptotically fast algorithms for
computing the first n such numbers in O(n^2.(log n)^(2+o(1))) bit-operations.
We also give very short in-place algorithms for computing the first n Tangent
or Secant numbers in O(n^2) integer operatio...
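As an illustration of what a very short in-place algorithm of this kind can look like, here is a Python sketch that computes the first n Tangent numbers with O(n^2) integer operations in a single array. The recurrence is a sketch written from memory and checked against the known values 1, 2, 16, 272, 7936; that it matches the algorithm in the paper is an assumption, and the function name `tangent_numbers` is hypothetical.

```python
def tangent_numbers(n: int) -> list[int]:
    """Return [T_1, ..., T_n], where tan(x) = sum_k T_k x^(2k-1) / (2k-1)!."""
    t = [0] * (n + 1)  # index 0 is unused so indices match the mathematics
    if n >= 1:
        t[1] = 1
    # Initialise t[k] = (k-1)!.
    for k in range(2, n + 1):
        t[k] = (k - 1) * t[k - 1]
    # In-place update; after pass k, t[k] holds the k-th Tangent number.
    for k in range(2, n + 1):
        for j in range(k, n + 1):
            t[j] = (j - k) * t[j - 1] + (j - k + 2) * t[j]
    return t[1:]
```

Python integers are arbitrary precision, so the rapid growth of the Tangent numbers does not cause overflow here.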
The best known algorithm to compute the Jacobi symbol of two n-bit integers runs in time O(M(n) log n), using Schönhage’s fast continued fraction algorithm combined with an identity due to Gauss. We give a different O(M(n) log n) algorithm based on the binary recursive gcd algorithm of D. Stehlé and P. Zimmermann [Algorithmic number theory. ANTS-VI. Le...
We outline some of Chris Wallace's contributions to pseudo-random number generation. In particular, we consider his recent
idea for generating normally distributed variates without relying on a source of uniform random numbers and compare it with
more conventional methods for generating normal random numbers. Implementations of Wallace's idea can b...
We describe a search for primitive trinomials of high degree and its
interaction with the Great Internet Mersenne prime search (GIMPS). The search
is complete for trinomials whose degree is the exponent of a Mersenne prime,
for all 47 currently known Mersenne primes.
A pseudo-random number generator (RNG) might be used to generate w-bit random samples in d dimensions if the number of state bits is at least dw. Some RNGs perform better than others and the concept of equidistribution has been introduced in the literature in order to rank different RNGs. We define what it means for a RNG to be (d,w)-equidistribute...
We discuss the error reconciliation phase in quantum key distribution (QKD) and analyse a simple scheme in which blocks with bad parity (that is, blocks containing an odd number of errors) are discarded. We predict the performance of this scheme and show, using a simulation, that the prediction is accurate. Comment: 19 pages. Presented at the 53rd...
We describe von Neumann's elegant idea for sampling from the exponential distribution, Forsythe's generalization for sampling from a probability distribution whose density has the form exp(-G(x)), where G(x) is easy to compute (e.g. a polynomial), and my refinement of these ideas to give an efficient algorithm for generating pseudo-random numbers w...
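Von Neumann's idea can be sketched in a few lines of Python. The version below is a textbook rendering of his comparison method only, not Forsythe's generalization or the refinement described in the paper; the function name `von_neumann_exponential` is an assumption. It draws a descending run of uniforms, accepts the first uniform as the fractional part when the run length is odd, and otherwise adds 1 to the integer part and retries.

```python
import random


def von_neumann_exponential(rng: random.Random) -> float:
    """Sample from the exponential distribution (mean 1) using only
    uniform variates and comparisons."""
    k = 0  # integer part: number of rejected rounds so far
    while True:
        first = rng.random()
        prev = first
        run = 1  # length of the strictly descending run starting at `first`
        while True:
            u = rng.random()
            if u >= prev:
                break
            prev = u
            run += 1
        # Given first = x, the run length is odd with probability exp(-x).
        if run % 2 == 1:
            return k + first
        k += 1
```

No logarithm or exponential is evaluated: the acceptance probability exp(-x) arises from the alternating series 1 - x + x^2/2! - ..., whose terms are the probabilities that the descending run reaches each length.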
We survey the numerical stability of some fast algorithms for solving systems of linear equations and linear least squares problems with a low displacement-rank structure. For example, the matrices involved may be Toeplitz or Hankel. We consider algorithms which incorporate pivoting without destroying the structure, and describe some recent results...
Many matrices that arise in the solution of signal processing problems have a special displacement structure. For example, adaptive filtering and direction-of-arrival estimation yield matrices of Toeplitz type. A recent method of Gohberg, Kailath and Olshevsky (GKO) allows fast Gaussian elimination with partial pivoting for such structured matrices...
We consider the problem of computing ratings using the results of games played between a set of n players, and show how this problem can be reduced to computing the positive eigenvectors corresponding to the dominant eigenvalues of certain n by n matrices. There is a close connection with the stationary probability distributions of certain Markov c...
For odd square-free n > 1 the n-th cyclotomic polynomial satisfies an identity of Gauss. There are similar identities of Aurifeuille, Le Lasseur and Lucas. These identities all involve certain polynomials with integer coefficients. We show how these coefficients can be computed by simple algorithms which require O(n^2) arithmetic operations and work...
This report contains a numerical stability analysis of factorization algorithms for computing the Cholesky decomposition of symmetric positive definite matrices of displacement rank 2. The algorithms in the class can be expressed as sequences of elementary downdating steps. The stability of the factorization algorithms follows directly from the num...
We determine the probability that a random n x n symmetric matrix over {1, 2, ... , m} has determinant divisible by m. Comment: 8 pages. An old Technical Report, submitted for archival purposes. For further details, see http://wwwmaths.anu.edu.au/~brent/pub/pub101.html
This is a draft of a book about algorithms for performing arithmetic, and their implementation on modern computers. We are concerned with software more than hardware - we do not cover computer architecture or the design of computer hardware. Instead we focus on algorithms for efficiently performing arithmetic operations such as addition, multiplica...
We describe some "unrestricted" algorithms which are useful for the computation of elementary and special functions when the precision required is not known in advance. Several general classes of algorithms are identified and illustrated by examples. The topics include: power series methods, use of halving identities, asymptotic expansions, continu...
We survey some results on linear-time algorithms for systolic arrays. In particular, we show how the greatest common divisor (GCD) of two polynomials of degree n over a finite field can be computed in time O(n) on a linear systolic array of O(n) cells; similarly for the GCD of two n-bit binary numbers. We show how n * n Toeplitz systems of linear e...
In studying the complexity of iterative processes it is usually assumed that the arithmetic operations of addition, multiplication, and division can be performed in certain constant times. This assumption is invalid if the precision required increases as the computation proceeds. We give upper and lower bounds on the number of single-precision o...
For scientific computations on a digital computer the set of real numbers is usually approximated by a finite set F of "floating-point" numbers. We compare the numerical accuracy possible with different choices of F having approximately the same range and requiring the same word length. In particular, we compare different choices of base (or radix)...
We consider methods for finding high-precision approximations to simple zeros of smooth functions. As an application, we give fast methods for evaluating the elementary functions log(x), exp(x), sin(x) etc. to high precision. For example, if x is a positive floating-point number with an n-bit fraction, then (under rather weak assumptions) an n-bit...
We consider pseudo-random number generators suitable for vector processors. In particular, we describe vectorised implementations of the Box-Muller and Polar methods, and show that they give good performance on the Fujitsu VP2200. We also consider some other popular methods, e.g. the Ratio method of Kinderman and Monahan (1977) (as improved by Leva...
Marsaglia recently introduced a class of xorshift random number generators (RNGs) with periods 2^n - 1 for n = 32, 64, etc. Here we give a generalisation of Marsaglia's xorshift generators in order to obtain fast and high-quality RNGs with extremely long periods. RNGs based on primitive trinomials may be unsatisfactory because a trinomial has very sma...
This Report updates the tables of factorizations of a^n +- 1 for 13 < a < 100, previously published as CWI Report NM-R9212 (June 1992) and updated in CWI Report NM-R9419 (Update 1, September 1994) and CWI Report NM-R9609 (Update 2, March 1996). A total of 951 new entries in the tables are given here. The factorizations are now complete for n < 76,...
MP is a package of ANSI Standard Fortran (ANS X3.9-1966) subroutines for performing multiple-precision floating-point arithmetic and evaluating elementary and special functions. The subroutines are machine independent and the precision is arbitrary, subject to storage limitations. The User's Guide describes the routines and their calling sequences,...
The best known algorithm to compute the Jacobi symbol of two n-bit integers runs in time O(M(n) log n), using Sch\"onhage's fast continued fraction algorithm combined with an identity due to Gauss. We give a different O(M(n) log n) algorithm based on the binary recursive gcd algorithm of Stehl\'e and Zimmermann. Our implementation - which to our kn...
A method which uses one-sided Jacobi to solve the symmetric eigenvalue problem in parallel is presented. We describe a parallel ring ordering for one-sided Jacobi computation. One distinctive feature of this ordering is that it can sort
Modern Computer Arithmetic focuses on arbitrary-precision algorithms for efficiently performing arithmetic operations such as addition, multiplication and division, and their connections to topics such as modular arithmetic, greatest common divisors, the Fast Fourier Transform (FFT), and the computation of elementary and special functions. Brent an...
A wireless ad hoc network consists of mobile nodes that are powered by batteries. The limited battery lifetime imposes a severe constraint on network performance; energy conservation in such a network is thus of paramount importance, and energy-efficient operations are critical to prolonging the lifetime of the network. All-to-all multicasting is...
We exhibit ten new primitive trinomials over GF(2) of record degrees 24036583, 25964951, 30402457, and 32582657. This completes the search for the currently known Mersenne prime exponents. Primitive trinomials of degree up to 6972593 were previously known [4]. We have
Fast and reliable pseudo-random number generators are required for simulation and other applications in Scientific Computing. Because of Moore’s law, random number generators that were satisfactory in the past may be inadequate today. We outline some requirements for good uniform random number generators, and describe a class of generators...
In this paper, we discuss an implementation of various algorithms for multiplying polynomials in
: variants of the window methods, Karatsuba’s, Toom-Cook’s, Schönhage’s and Cantor’s algorithms. For most of them, we propose improvements that lead to practical speedups.
Pollard's rho method is a randomized algorithm for computing discrete logarithms. It works by defining a pseudo-random sequence and then detecting a match in the sequence. Many improvements have been proposed, while few evaluation results and efficiency suggestions have been reported. This paper is devoted to a detailed study of the efficiency issues...
We give a new algorithm for performing the distinct-degree factorization of a polynomial P(x) over GF(2), using a multi-level blocking strategy. The coarsest level of blocking replaces GCD computations by multiplications, as suggested by Pollard (1975), von zur Gathen and Shoup (1992), and others. The novelty of our approach is that a finer level o...
Prospective readers can quickly determine whether a document is relevant to their information need if the significant phrases (or keyphrases) in this document are provided. Although keyphrases are useful, not many documents have keyphrases assigned to them, and manually assigning keyphrases to existing documents is costly. Therefore, there is a nee...
Extending the idea of our previous algorithm (17, 18) we developed a new sequential quartet-based phylogenetic tree construction method. This new algorithm reconstructs the phylogenetic tree iteratively by examining at each merge step every possible super-quartet which is formed by four subtrees instead of simple quartet in our previous algorithm....
This paper describes a parallel implementation of our recently developed algorithm for phylogenetic analysis on the IBM BlueGene/L cluster. This algorithm constructs evolutionary trees for a given set of DNA or protein sequences based on the topological information of every possible quartet trees. Our experimental results showed that it has several...
Recently we developed a new quartet-based algorithm for phylogenetic analysis (22). This algorithm constructs a limited number of trees for a given set of DNA or protein sequences and the initial experimental results show that the probability for the correct tree to be included in this small set of trees is very high. In this paper we further exten...
An interesting and important, but largely ignored, question associated with the ML method is whether there exists only a single maximum likelihood point for a given phylogenetic tree. Mike Steel presented a simple analytical result to argue that the ML point is not unique (11). However, his view has so far attracted little attention. Though many res...
Fast and reliable pseudo-random number generators are required for simulation and other applications in Scientific Computing. We outline the requirements for good uniform random number generators, and describe a class of generators having very fast vector/parallel implementations with excellent statistical properties. We also discuss the problem of...
In this paper we introduce a new quartet-based method. This method makes use of the Bayes (or quartet) weights of quartets as those used in the quartet puzzling. However, all the weights from the related quartets are accumulated to form a global quartet weight matrix. This matrix provides integrated information and can lead us to recursively merge...
We describe a search for primitive trinomials of degree 6972593 over GF(2). The only primitive trinomials found were x + 1 and its reciprocal. This completes the search for primitive trinomials whose degree is a Mersenne exponent less than ten million.
Given floating-point arithmetic with $t$-digit base-$\beta$ significands in which all arithmetic operations are performed as if calculated to infinite precision and rounded to a nearest representable value, we prove that the product of complex values $z_0$ and $z_1$ can be computed with maximum absolute error $\abs{z_0} \abs{z_1} \frac{1}{2} \beta^...
In this paper we mainly study the parallelization of the CGLS method, a basic iterative method for large and sparse least squares problems in which the conjugate gradient method is applied to solve normal equations. On modern parallel architectures its parallel performance is always limited because of the global communication required for inner pro...
Marsaglia (2003) has described a class of Xorshift random number generators (RNGs) with periods 2^n - 1 for n = 32, 64, etc. We show that the sequences generated by these RNGs are identical to the sequences generated by certain linear feedback shift register (LFSR) generators using "exclusive or" (xor) operations on n-bit words, with a recurrence d...
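For readers unfamiliar with these generators, here is a minimal Python sketch of one step of a 32-bit xorshift generator. The shift triple (13, 17, 5) is one of the choices given by Marsaglia; the function name `xorshift32` and the explicit masking (needed because Python integers are unbounded) are assumptions made for this illustration.

```python
MASK32 = 0xFFFFFFFF  # keep values within 32 bits, as a C uint32_t would


def xorshift32(x: int) -> int:
    """Advance a 32-bit xorshift state by one step.

    The state x must be a nonzero 32-bit integer; zero is a fixed point.
    """
    x ^= (x << 13) & MASK32  # left shift, truncated to 32 bits
    x ^= x >> 17             # right shift
    x ^= (x << 5) & MASK32   # left shift, truncated to 32 bits
    return x
```

Each step is a composition of invertible linear maps over GF(2) acting on the 32-bit state vector, which is why a nonzero state never becomes zero and why such a generator can be related to a linear feedback shift register.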
Fast and reliable pseudo-random number generators are required for simulation and other applications in Scientific Computing. We outline the requirements for good uniform random number generators, and describe a class of generators having very fast vector/parallel implementations with excellent statistical properties. We also discuss the proble...
With VLSI architecture, the chip area and design regularity represent a better measure of cost than the conventional gate count. We show that addition of n-bit binary numbers can be performed on a chip with a regular layout in time proportional to log n and with area proportional to n.
Consider polynomials over GF(2). We define almost irreducible and almost primitive polynomials, explain why they are useful, and give some examples and conjectures relating to them.
A parallel sorting algorithm is presented for storage-efficient internal sorting on MIMD machines. The algorithm first sorts the elements within each node using a serial sorting algorithm, then uses a two-phase parallel merge. The algorithm is comparison-based and requires additional storage of order the square root of the number of elements in each node...
The problem of merging two sequences of elements which are stored separately in two processing elements (PEs) occurs in the implementation of many existing sorting algorithms. We describe efficient algorithms for the merging problem on asynchronous distributed-memory machines. The algorithms reduce the cost of the merge operation and of communication,...
We consider the requirements for uniform pseudo-random number generators on modern vector and parallel supercomputers, consider the pros and cons of various classes of methods, and outline what is currently available. We propose a class of random number generators which have good statistical properties and can be implemented efficiently on vector pro...
As an extension of the "Cunningham" tables, we present tables of factorizations of a^n ± 1 for 13 ≤ a < 100. The exponents n satisfy a^n < 10^255 if a < 30, and n ≤ 100 if a ≥ 30. The factorizations are complete for n ≤ 46, and the tables contain no composite numbers smaller than 10^80.
Dedicated to Hugh Cowie Williams on the occasion of his 60th birthday. Abstract. Consider polynomials over GF(2). We describe efficient algorithms for finding trinomials with large irreducible (and possibly primitive) factors, and give examples of trinomials having a primitive factor of degree r for all Mersenne exponents r = ±3 mod 8 in the range...
Applying gang scheduling can alleviate the blockade problem caused by exclusively used space-sharing strategies for parallel processing. However, the original form of gang scheduling is not practical as there are several fundamental problems associated with it. Recently many researchers have developed new strategies to alleviate some of these probl...
In this paper we mainly study the parallelization of the CGLS method, a basic iterative method for large and sparse least squares problems in which the conjugate gradient method is applied to solve normal equations. On modern parallel architectures its parallel performance is always limited because of the global communication required for inner pro...
The standard algorithm for testing reducibility of a trinomial of prime degree r over GF(2) requires 2r+O(1) bits of memory. We describe a new algorithm which requires only 3r/2+O(1) bits of memory and significantly fewer memory references and bit-operations than the standard algorithm. If 2^r - 1 is a Mersenne prime, then an irreducible trinomial o...
Pseudo-random numbers with long periods and good statistical properties are often required for applications in computational finance. We consider the requirements for good uniform random number generators, and describe a class of generators whose period is a Mersenne prime or a small multiple of a Mersenne prime. These generators are based on almost...
Jacobi-based algorithms have attracted attention as they have a high degree of potential parallelism and may be more accurate than QR-based algorithms. In this paper we discuss how to design efficient Jacobi-like algorithms for eigenvalue decomposition of a real normal matrix. We introduce a block Jacobi-like method. This method uses only real arit...
In this paper, we would like to summarize the recent advances on the improved Krylov subspace methods for the solutions of large and sparse linear systems of equations with unsymmetric coefficient matrices. The proposed methods combine elements of numerical stability and parallel algorithm design without increasing much computational costs. The met...
Pseudo-random numbers with long periods and good statistical properties are often required for applications in computational finance. We consider the requirements for good uniform random number generators, and describe a class of generators whose period is a Mersenne prime or a small multiple of a Mersenne prime. These generators are based on "al...
Motivated by a connection with block iterative methods for solving linear systems over finite fields, we consider the probability that the Krylov space generated by a fixed linear mapping and a random set of elements in a vector space over a finite field equals the space itself.
Gang scheduling is currently the most popular scheduling scheme for parallel processing in a time shared environment. In this paper we first describe the ideas of job re-packing and workload tree for efficiently allocating resources to enhance the performance of gang scheduling. We then present some experimental results obtained by implementing four different r...
Pseudo-random numbers are often required for simulations performed on parallel computers. The requirements for parallel random number generators are more stringent than those for sequential random number generators. As well as passing the usual sequential tests on each processor, a parallel random number generator must give different, independent seq...
Applying gang scheduling can alleviate the blockade problem caused by exclusively space-sharing scheduling. To simply allow jobs to run simultaneously on the same processors as in the conventional gang scheduling, however, may introduce a large number of time slots in the system. In consequence the cost of context switches will be greatly increased...
For the solutions of linear systems of equations with unsymmetric coefficient matrices, we have proposed an improved version of the quasi-minimal residual (IQMR) method [Proceedings of The International Conference on High Performance Computing and Networking (HPCN-97) (1997); IEICE Trans Inform Syst E80-D (9) (1997) 919] by using the Lanczos proces...
In this paper, an improved version of the BiCGStab (IBiCGStab) method for the solutions of large and sparse linear systems of equations with unsymmetric coefficient matrices is proposed. The method combines elements of numerical stability and parallel algorithm design without increasing the computational costs. The algorithm is derived such that al...
For the solutions of large and sparse linear systems of equations with unsymmetric coefficient matrices, we propose an improved version of the BiConjugate Gradient (IBiCG) method based on [5, 6] by using the Lanczos process as a major component combining elements of numerical stability and parallel algorithm design. For Lanczos process, stab...
Applying gang scheduling can alleviate the blockade problem caused by exclusively used space-sharing strategies for parallel processing. However, the original form of gang scheduling is not practical as there are several fundamental problems associated with it. Recently many researchers have developed new strategies to alleviate some of these...