Parallel FEM Simulation of Crack Propagation
Challenges, Status, and Perspectives
List of Authors (in alphabetical order):
Bruce Carter, Chuin-Shan Chen, L. Paul Chew, Nikos Chrisochoides, Guang R. Gao,
Gerd Heber, Antony R. Ingraffea, Roland Krause, Chris Myers, Demian Nave,
Keshav Pingali, Paul Stodghill, Stephen Vavasis, Paul A. Wawrzynek
Cornell Fracture Group, Rhodes Hall, Cornell University, Ithaca, NY 14853
CS Department, Upson Hall, Cornell University, Ithaca, NY 14853
CS Department, University of Notre Dame, Notre Dame, IN 46556
EECIS Department, University of Delaware, Newark, DE 19716
1 Introduction
Understanding how fractures develop in materials is crucial to many disciplines, e.g., aeronautical engineering, material sciences, and geophysics. Fast and accurate computer simulation of crack propagation in realistic 3D structures would be a valuable tool for engineers and scientists exploring the fracture process in materials. In this paper, we will describe a next generation crack propagation simulation software that aims to make this potential a reality.
Within the scope of this paper, it is sufficient to think of crack propagation as a dynamic process of creating new surfaces within a solid. During the simulation, crack growth causes changes in the geometry and, sometimes, in the topology of the model. Roughly speaking, with the tools in place before the start of this project, a typical fracture analysis at a resolution of degrees of freedom, using boundary elements, would take about 100 hours on a state-of-the-art single-processor workstation. The goal of this project is to create a parallel environment which allows the same analysis to be done, using finite elements, in 1 hour at a resolution of degrees of freedom.
In order to attain this level of performance, our system will have two features that are not found in current fracture
analysis systems:
Parallelism. Current trends in computer hardware suggest that in the near future, high-end engineering workstations will be 8- or 16-way SMP "nodes", and departmental computational servers will be built by combining a number of these nodes using a high-performance network switch. Furthermore, the performance of each processor in these nodes will continue to grow, not only because of faster clock speeds, but also because finer-grain parallelism will be exploited via multi-way (or superscalar) execution and multi-threading.
Adaptivity. Cracks are (hopefully) very small compared with the dimensions of the structure, and their growth is very dynamic in nature. Because of this, it is impossible to know a priori how fine a discretization is required to accurately predict crack growth. While it is possible to over-refine the discretization, this is undesirable, as it tends to dramatically increase the required computational resources. A better approach is to choose the discretization refinement adaptively: initially, a coarse discretization is used, and, if this induces a large error in certain regions of the model, then the discretization is refined in those regions.
The dynamic nature of crack growth and the need to do adaptive refinement make crack propagation simulation a highly irregular application. Exploiting parallelism and adaptivity presents us with three major research challenges:

- developing algorithms for parallel mesh generation for unstructured 3D meshes with automatic element size control and provably good element quality,
- implementing fast and robust parallel sparse solvers, and
- determining efficient schemes for automatic, hybrid h-p refinement.
To tackle the challenges of developing this system, we have assembled a multi-disciplinary and multi-institutional
team that draws upon a wide-ranging pool of talent and the resources of 3 universities.
2 System Overview
Figure 1 gives an overview of a typical simulation. During pre-processing, a solid model is created, problem-specific boundary conditions (displacements, tractions, etc.) are imposed, and flaws (cracks) are introduced. In the next step, a volume mesh is created, and (linear elasticity) equations for the displacements are formulated and solved. An error estimator determines whether the desired accuracy has been reached, or whether further iterations, after subsequent adaptation, are necessary. Finally, the results are fed back into a fracture analysis tool for post-processing and crack propagation.

Figure 1: Simulation loop. (Stages: solid model, boundary conditions, and flaw introduction in FRANC3D; volume meshing; finite element formulation; iterative solution; error estimation; structured or unstructured refinement, or an increase in the order of the basis functions, until the error is acceptable; crack propagation prediction.)
Figure 1 presents the simulation loop of our system in its final and most advanced form. Currently, we have sequential and parallel implementations of the outer simulation loop (i.e., not the inner refinement loop) running with the following restriction: the parallel mesher can handle only polygonal (non-curved) boundaries, although curved boundaries can be handled by the sequential meshers (see Section 3). We have not yet implemented unstructured h-refinement and adaptive p-refinement, although the parallel formulator can handle elements of arbitrary p-order.
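To make the control flow concrete, the following Python sketch outlines the outer simulation loop of Figure 1. It is purely schematic: the stage functions are passed in as callables, and all names are illustrative stand-ins rather than the actual interfaces of FRANC3D or our parallel testbed.

```python
def simulation_loop(model, mesh_volume, formulate, solve, estimate_error,
                    refine, propagate_crack, tol, num_crack_steps):
    """Schematic driver for the loop in Figure 1 (illustrative names only)."""
    for _ in range(num_crack_steps):
        mesh = mesh_volume(model)              # volume meshing
        solution = solve(formulate(mesh))      # FE formulation + iterative solve
        while estimate_error(mesh, solution) > tol:
            # Adaptation: structured/unstructured h-refinement,
            # or an increase in the basis-function order (p).
            mesh = refine(mesh, solution)
            solution = solve(formulate(mesh))
        model = propagate_crack(model, solution)   # fracture analysis step
    return model
```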
3 Geometric Modeling and Mesh Generation
The solid modeler used in the project is called OSM. OSM, as well as the main pre- and post-processing tool, FRANC3D, is freely available from the Cornell Fracture Group's website [12]. FRANC3D, a workstation-based FRacture ANalysis Code for simulating arbitrary non-planar 3D crack growth, has been under development since 1987, with hydraulic fracture and crack growth in aerospace structures as the primary application targets since its inception. While there are a few 3D fracture simulators available, and a number of other software packages can model cracks in 3D structures, these are severely limited in the crack geometries they can represent (typically planar elliptical or semi-elliptical only). FRANC3D differs by providing a mechanism for representing the geometry and topology of 3D structures with arbitrary non-planar cracks, along with functions for 1) discretizing or meshing the structure, 2) attaching boundary conditions at the geometry level and allowing the mesh to inherit these values, and 3) modifying the geometry to allow crack growth, with only local re-meshing required to complete the model. The simulation process is controlled by the user via a graphical user interface, which includes windows for the display of the 3D structure and a menu/dialogue-box system for interacting with the program.
The creation of volume meshes for crack growth studies is quite challenging. The geometries tend to be complicated because of internal boundaries (cracks). The simulation requires smaller elements near each crack front in order to accurately model high stresses and curved geometry. On the other hand, larger elements may be sufficient away from the crack front. There is a considerable difference between these two scales of element size, which amounts to three orders of magnitude in real-life applications. A mesh generator must therefore provide automatic element size control and give certain quality guarantees for elements. The mesh generators we have studied so far are QMG by Steve Vavasis [15], JMESH by Joaquim Neto [11], and DMESH by Paul Chew [13]. These meshers represent three different approaches: octree-based (QMG), advancing front (JMESH), and Delaunay mesh (DMESH). QMG and DMESH come with quality guarantees for elements in terms of aspect ratio. All of these mesh generators are sequential and give us insight into the generation of large "engineering quality" meshes.
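As an illustration, the following Python sketch shows one common way such an aspect-ratio style quality measure can be computed for a tetrahedron, as the scaled ratio of inradius to circumradius. This is only one of several definitions used in the meshing literature and is not necessarily the measure used by QMG or DMESH.

```python
import numpy as np

def tet_quality(verts):
    # verts: 4x3 array of tetrahedron vertex coordinates.
    # Quality = 3 * inradius / circumradius, so a regular tetrahedron
    # scores 1 and degenerate (sliver) elements approach 0.
    p = np.asarray(verts, dtype=float)
    # Circumcenter c satisfies |p_i - c| = |p_0 - c| for i = 1..3.
    A = 2.0 * (p[1:] - p[0])
    b = (p[1:] ** 2).sum(axis=1) - (p[0] ** 2).sum()
    c = np.linalg.solve(A, b)
    R = np.linalg.norm(p[0] - c)                      # circumradius
    vol = abs(np.linalg.det(p[1:] - p[0])) / 6.0      # volume
    faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
    area = sum(0.5 * np.linalg.norm(np.cross(p[j] - p[i], p[k] - p[i]))
               for i, j, k in faces)
    r = 3.0 * vol / area                              # inradius
    return 3.0 * r / R
```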
We decided to pursue the Delaunay mesh based approach first for a parallel implementation, which is described in [5]. Departing from traditional approaches, we do mesh generation and partitioning simultaneously, in parallel. This not only eliminates most of the overhead of the traditional approach; it is almost a necessary condition for doing crack growth simulations at this scale, where keeping up with the geometry changes by doing structured h-refinement is not always possible, or is too expensive. The implementation is a parallelization of the so-called Bowyer-Watson algorithm (see the references in [5]): given an initial Delaunay triangulation, we add a new point to the mesh, determine the simplex containing this point and the point's cavity (the union of simplices whose circumspheres contain the new point), and, finally, retriangulate this cavity. One of the challenges for a parallel implementation is that this cavity might extend across several submeshes (and processors). What looks like a problem turns out to be the key element in unifying mesh generation and partitioning: the newly created elements, together with an adequate cost function, are the best candidates for "partitioning on the fly". We compared our results with Chaco and MeTis in terms of equidistribution of elements, relative quality of mesh separators, data migration, I/O, and total performance. Table 1 shows a runtime comparison between ParMeTis with PartGeomKway (PPGK) and our implementation, called SMGP, on 16 processors of an IBM SP2 for meshes of up to 2000K elements. The labels SMGP0 through SMGP3 refer to different cost functions used in driving the partitioning [5]; a sketch of the underlying insertion kernel follows Table 1.
Mesh Size PPGK SMGP0 SMGP1 SMGP2 SMGP3
200K 90 42 42 42 42
500K 215 65 87 64 62
1000K 439 97 160 91 94
2000K 1232 133 310 110 135
Table 1: Total run time in seconds on 16 processors.
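The following minimal Python sketch illustrates the sequential Bowyer-Watson kernel described above, in 2D for brevity (the mesher itself works with tetrahedra and circumspheres in 3D). All names and data structures are illustrative, not those of the implementation in [5]; in particular, the cavity search here is a simple scan, whereas the parallel mesher must handle cavities that span submesh boundaries.

```python
def incircle(tri, p, pts):
    # True if point p lies strictly inside the circumcircle of tri.
    # Assumes the triangle's vertices are in counter-clockwise order.
    (ax, ay), (bx, by), (cx, cy) = (pts[v] for v in tri)
    px, py = p
    rows = [(x, y, x * x + y * y)
            for x, y in ((ax - px, ay - py), (bx - px, by - py), (cx - px, cy - py))]
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    return det > 0.0

def ccw(tri, pts):
    # Counter-clockwise orientation test (positive signed area).
    (ax, ay), (bx, by), (cx, cy) = (pts[v] for v in tri)
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax) > 0.0

def bowyer_watson_insert(triangles, pts, p):
    # Insert one point: find the cavity (all triangles whose circumcircle
    # contains p), delete it, and retriangulate by connecting p to the
    # cavity's boundary edges. In the parallel mesher this cavity may
    # span several submeshes; here everything is local.
    pts.append(p)
    pi = len(pts) - 1
    cavity = [t for t in triangles if incircle(t, p, pts)]
    edge_count = {}
    for t in cavity:
        triangles.remove(t)
        for e in ((t[0], t[1]), (t[1], t[2]), (t[2], t[0])):
            key = tuple(sorted(e))
            edge_count[key] = edge_count.get(key, 0) + 1
    for (u, v), count in edge_count.items():
        if count == 1:                       # edge on the cavity boundary
            t = (u, v, pi)
            triangles.append(t if ccw(t, pts) else (v, u, pi))
```

The elements created when the cavity is retriangulated are exactly the ones that, weighted by a cost function, drive the on-the-fly partitioning.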
4 Equation Solving and Preconditioning
We chose PETSc [2,14] as the basis for our equation solver subsystem. PETSc provides a number of Krylov space solvers and a number of widely used preconditioners. We have augmented the basic library with third-party packages, including BlockSolve95 [8] and Barnard's SPAI [3]. In addition, we have implemented a parallel version of the Global Extraction Element-By-Element (GEBE) preconditioner [7] (which is unrelated to the EBE preconditioner of Winget and Hughes) and added it to the collection using PETSc's extension mechanisms. The central idea of GEBE is to extract the subblocks of the global stiffness matrix associated with the elements and invert them, which is a highly parallel operation.
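The following serial NumPy sketch captures this central idea. The precise GEBE formulation in [7] includes details (such as scaling of the element blocks) that are omitted here, and the names are illustrative; in our system, the corresponding apply routine is hooked into PETSc through its preconditioner extension mechanism.

```python
import numpy as np

def gebe_setup(K, element_dofs):
    # For each element, extract the subblock of the global stiffness
    # matrix K associated with that element's degrees of freedom and
    # invert it. The inversions are independent, hence highly parallel.
    return [np.linalg.inv(K[np.ix_(dofs, dofs)]) for dofs in element_dofs]

def gebe_apply(r, element_dofs, inv_blocks):
    # Preconditioner application z = M^{-1} r: accumulate the
    # contributions of the inverted element blocks.
    z = np.zeros_like(r)
    for dofs, Kinv in zip(element_dofs, inv_blocks):
        z[dofs] += Kinv @ r[dofs]
    return z
```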
The ICC preconditioner is frequently used in practice and is considered a good preconditioner for many elasticity problems. However, we were concerned that it would not scale well to the large number of processors required for our final system. We believed that GEBE would provide a more scalable implementation, and we hoped that it would converge nearly as well as ICC.
In order to test our hypothesis, we ran several experiments on the Cornell Theory Center SP-2. The preliminary performance results for the gear2 and tee2 models are shown in Tables 2 and 3, respectively. (gear2 is a model of a power transmission gear with a crack in one of its teeth; tee2 is a model of a steel T-profile.) For each model, we ran the Conjugate Gradient solver with both BlockSolve95's ICC preconditioner and our own parallel implementation of GEBE on 8 to 64 processors. (The iteration counts in Tables 2 and 3 correspond to a reduction of the residual error, which is completely academic at this point.) The experimental results confirm our hypothesis:

- GEBE converges nearly as quickly as ICC for the problems that we tested.
- Our naive GEBE implementation scales much better than BlockSolve95's sophisticated ICC implementation.
Prec. Type   Nodes   Prec. Time (s)   Time per Iter. (s)   Iters.
ICC             8        17.08             0.20              416
GEBE            8         9.43             0.19              487
ICC            16        15.47             0.27              422
GEBE           16         6.71             0.11              486
ICC            32         8.51             0.32              539
GEBE           32         3.73             0.08              485
ICC            64        11.00             0.28              417
GEBE           64         4.74             0.07              485

Table 2: Gear2 (79,656 unknowns)

Prec. Type   Nodes   Prec. Time (s)   Time per Iter. (s)   Iters.
ICC            32        30.00             0.29             2109
GEBE           32        35.70             0.21             2421
ICC            64        23.60             0.29             2317
GEBE           64         7.60             0.12             2418

Table 3: Tee2 (319,994 unknowns)
5 Adaptivity
Understanding the cost and impact of the different adaptivity options is the central point of our current activities. We are in the process of integrating structured (hierarchical) h-refinement into the parallel testbed, and the final version of this paper will contain more results on that. Our implementation follows the approach of Biswas and Oliker [4] and currently handles tetrahedra, while allowing enough flexibility for an extension to non-tetrahedral element types.
Error Estimation and Adaptive Strategies. For relatively simple, two-dimensional problems, stress intensity factors can be computed to an accuracy sufficient for engineering purposes with little mesh refinement by proper use of singularly enriched elements. There are many situations, though, when functionals other than stress intensity factors are of interest, or when the singularity of the solution is not known a priori. In any case, the engineer should be able to evaluate whether the data of interest have converged to some level of accuracy considered appropriate for the computation. It is generally sufficient to show that the data of interest are converging sequences with respect to increasing degrees of freedom. Adaptive finite element methods are the most efficient way to achieve this goal, and at the same time they are able to provide estimates of the remaining discretization error. We define the error of the finite element solution u_h as e = u - u_h, where u is the exact solution, and a possible measure for the discretization error is the energy norm ||e||_E. Following an idea of Babuška and Miller [1], the error estimator introduced by Kelly et al. [9,6] is derived by inserting the finite element solution into the original differential equation system and calculating a norm of the residual using interpolation estimates. An error indicator, computable from the local results of one element of the finite element solution, is then derived, and the corresponding error estimator is computed by summing the contributions of the error indicators over the entire domain. The error indicator is computed from a contribution of the interior residual of the element and a contribution of the stress jumps on the faces of the element. Details on the computation of the error estimator from the finite element solution can be found in [10].
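In schematic form, an indicator of this type combines the two contributions roughly as in the Python sketch below. This is a sketch only: the exact weights and constants depend on the formulation and are given in [9,6,10], and the parameter names are ours.

```python
import numpy as np

def kelly_element_indicator(h, interior_residual_norm_sq, face_jump_norm_sqs):
    # Kelly-type indicator for one element: an interior-residual term
    # plus stress-jump terms on the element faces, each weighted by a
    # power of the element size h (weights here are illustrative).
    eta_sq = h ** 2 * interior_residual_norm_sq
    eta_sq += 0.5 * h * sum(face_jump_norm_sqs)
    return np.sqrt(eta_sq)

def error_estimate(indicators):
    # The global estimator sums the squared element indicators
    # over the entire domain.
    return np.sqrt(sum(eta ** 2 for eta in indicators))
```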
Control of a Mesh Generator. For a sequence of adaptively refined and quasi-optimal meshes, the rate of convergence is independent of the smoothness of the exact solution. A mesh is called quasi-optimal if the error associated with each element is nearly the same. The goal of an adaptive finite element algorithm is to generate a sequence of quasi-optimal meshes by equilibrating the estimated error until a prescribed accuracy criterion is reached. Starting from an initial mesh, the error indicators and the error estimator are computed in the post-processing step of the solution phase. The idea is then to compute, for each element, the new element size from the estimated error, the present element size, and the expected rate of convergence.
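A minimal sketch of this size-control rule, under the assumption that the local error scales like the element size raised to the expected convergence rate (the function and symbol names are ours, for illustration):

```python
def new_element_size(h_old, eta_elem, eta_target, rate):
    # Equilibrating the error means every element should carry roughly
    # the same share eta_target of the admissible error (a quasi-optimal
    # mesh). If the local error scales like h**rate, the element size
    # expected to meet the target is:
    return h_old * (eta_target / eta_elem) ** (1.0 / rate)
```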
6 Future Work
The main focus of our future work will be on improving the performance of the existing system. The Cornell Fracture Group is continuously extending our test suite with new real-world problems. We are considering introducing special elements at the crack tip, and non-tetrahedral elements (hexes, prisms, pyramids) elsewhere. The linear solver, after proving its robustness, will be wrapped into a Newton-type solver for nonlinear problems.

Among the newly introduced test problems are some that can be made arbitrarily ill-conditioned (long thin plate or tube models with cracks in them) in order to push the iterative methods to their limits. We are exploring new preconditioners (e.g., support tree preconditioning), multigrid, and sparse direct solvers to make our environment more effective and robust.
We have not yet done any specific performance tuning, such as locality optimization. Such tuning is not only highly platform dependent, but must also be weighed against forthcoming runtime optimizations, such as dynamic load balancing. We are following with interest the growing importance of latency-tolerant architectures in the form of multithreading, and we are exploring for which parts of the project multithreaded architectures are the most beneficial.

Finally, a port of our code base to the new 256-node NT cluster at the Cornell Theory Center is underway.
7 Conclusions
At present, our project can claim two major contributions. The first is our parallel mesher/partitioner, which is the first practical implementation of its kind with quality guarantees. This technology makes it possible, for the first time, to fully automatically solve problems using unstructured h-refinement in a parallel setting.

The second major contribution is to show that GEBE outperforms ICC, at least for our problem class. We have shown that not only does GEBE converge almost as quickly as ICC, it is also much more scalable in a parallel setting than ICC. We believe that GEBE, not ICC, is the yardstick against which other parallel preconditioners should be measured.

And finally, our first experiments indicate that we should be able to meet our project's performance goals. We are confident that, as we run our system on larger and faster machines, as we further optimize each of the subsystems, and as we incorporate adaptive h- and p-refinement, we will reach our performance goals.
References
[1] I. Babuška and A. Miller, "A-posteriori error estimates and adaptive techniques for the finite element method", Technical Report BN-968, Institute for Physical Science and Technology, University of Maryland, 1981.
[2] S. Balay, W.D. Gropp, L. Curfman McInnes, and B.F. Smith, "Efficient management of parallelism in object-oriented numerical software libraries", in E. Arge, A.M. Bruaset, and H.P. Langtangen, editors, Modern Software Tools in Scientific Computing, Birkhäuser Press, 1997.
[3] S.T. Barnard and R. Clay, "A portable MPI implementation of the SPAI preconditioner in ISIS++", Eighth SIAM Conference on Parallel Processing for Scientific Computing, March 1997.
[4] R. Biswas and L. Oliker, "A new procedure for dynamic adaption of three-dimensional unstructured grids", Applied Numerical Mathematics, 13:437–452, 1994.
[5] N. Chrisochoides and D. Nave, "Simultaneous mesh generation and partitioning for Delaunay meshes", in 8th Int'l. Meshing Roundtable, 1999.
[6] J.P. de S.R. Gago, D.W. Kelly, O.C. Zienkiewicz, and I. Babuška, "A posteriori error analysis and adaptive processes in the finite element method: Part II, Adaptive mesh refinement", International Journal for Numerical Methods in Engineering, 19:1621–1656, 1983.
[7] I. Hladik, M.B. Reed, and G. Swoboda, "Robust preconditioners for linear elasticity FEM analyses", International Journal for Numerical Methods in Engineering, 40:2109–2127, 1997.
[8] M.T. Jones and P.E. Plassmann, "BlockSolve95 users manual: Scalable library software for the parallel solution of sparse linear systems", Technical Report ANL-95/48, Argonne National Laboratory, December 1995.
[9] D.W. Kelly, J.P. de S.R. Gago, O.C. Zienkiewicz, and I. Babuška, "A posteriori error analysis and adaptive processes in the finite element method: Part I, Error analysis", International Journal for Numerical Methods in Engineering, 19:1593–1619, 1983.
[10] R. Krause, "Multiscale Computations with a Combined h- and p-Version of the Finite Element Method", PhD thesis, Universität Dortmund, 1996.
[11] J.B.C. Neto et al., "An Algorithm for Three-Dimensional Mesh Generation for Arbitrary Regions with Cracks", submitted for publication.
[12] http://www.cfg.cornell.edu/
[13] http://www.cs.cornell.edu/People/chew/chew.html
[14] http://www.mcs.anl.gov/petsc/index.html
[15] http://www.cs.cornell.edu/vavasis/vavasis.html