Parallel Computing - Science method
Questions related to Parallel Computing
Hello everyone,
I am facing an issue with Abaqus and parallel computing. I am using a VUHARD subroutine with a COMMON block and VEXTERNALDB, but I am getting different results with different numbers of cores. I start each analysis with the same microstructure and the same subroutine, changing only the number of cores. The results show triangular regions where the calculations seem to be happening. For example, in the attached document, I start with an initial microstructure of 10,000 elements and run it with cpus=4, 8, and 12, and I get different results. Could someone please explain what could be going on, and how I can achieve an analysis of the full model?
Thanks,
Akanksha
I want to simulate a discrete random medium with the FDTD method.
The simulation environment is air filled with random spherical particles (small relative to the wavelength) with a defined size distribution.
What is an efficient and simple way to create large numbers of random scatterers in an FDTD code?
I have placed some random scatterers in a small area, but I have trouble generating them in large numbers.
Any tips, experiences, and suggestions would be appreciated.
Thanks in advance.
Is it possible, in some way, to parallelize a script in MATLAB that has dependent variables? Would it be possible to use parfor?
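For intuition, the distinction parfor enforces can be sketched in Python (a language-neutral sketch, not MATLAB-specific advice): iterations that depend only on the loop index parallelize cleanly, while loop-carried dependencies force sequential execution, which is exactly why parfor rejects them.

```python
from multiprocessing import Pool

def f(x):
    # Pure function of the loop index: iterations are independent,
    # so they can run in any order on any worker.
    return x * x

if __name__ == "__main__":
    # Independent iterations: safe to parallelize (parfor-style).
    with Pool(4) as pool:
        squares = pool.map(f, range(10))

    # Loop-carried dependency: each step needs the previous result,
    # so this loop must stay sequential (parfor would reject it too).
    acc = [0]
    for i in range(1, 10):
        acc.append(acc[i - 1] + i)

    print(squares[3], acc[3])  # 9 6
```

If the dependency is only a reduction (a running sum, a minimum), parfor can still handle it as a reduction variable; a true chain of state updates cannot be split this way.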
I am currently studying the application of simulated annealing techniques to optimization problems. In particular, I am interested in applying the method to the study of Ising models. It is well known that simulated annealing is an intrinsically sequential method, yet over the past 20 years many parallel algorithms based on it have emerged. At the moment I do not understand their pros and cons at all, so I would appreciate a recommendation for a good, clear overview article or book chapter on parallel methods based on simulated annealing.
I am a researcher who works with Abaqus, and I am using a parallel computing system (HPC).
I have very heavy files, and I just noticed that when I run them, Abaqus is not using the full capacity of the memory.
It is so slow that it might take months to run a single file!
Note: I have around 10 million elements in the mesh.
Is there anything I can do to make Abaqus actually use the full capacity of the memory?
Note: the parallel computation option is turned on, and Abaqus does use several cores, but it is not using the full capacity of each core.
See the screenshot: it is using only around 50 GB out of 376 GB.
How can I increase this value?
Any help is appreciated!
Best,
Yasmeen
Hello,
I am trying to enable parallel computing in Abaqus using my AMD GPU with OpenCL, but it is not working.
I have this error in the log file:
Error: Call of undefined function
Compiler log: There is a call to an undefined label
Error: HSAIL program is not finalized successfully
Codegen phase failed compilation
BUILD LOG
*************************
There is a call to an undefined label
Error: HSAIL program is not finalized successfully
Codegen phase failed compilation
Error: BRIG finalization to ISA failed.
**************************
Error: Kernel compilation Failure
Warning: GPUAcceleration disabled
My GPU is an AMD Radeon PRO WX 3100.
Could someone help me?
I am working on a research project in which we are doing a comparative analysis of reinforcement learning (RL) with evolutionary algorithms in solving a nonconvex and nondifferentiable optimization problem with respect to solution quality and computation time.
We are using Python implementations, but one difficulty is that, although we can use GPUs to run the reinforcement learning algorithm, there is not much support for running evolutionary algorithms on GPUs in Python.
On the other hand, if we want to compare the algorithms with respect to computation time, we have to execute them on the same hardware (parallel computing system).
However, we cannot run the RL algorithm on a CPU-based parallel system because of our resource constraints.
Can anyone tell us how to establish equivalent parallel computing systems, one based on CPUs and GPUs (for the RL algorithm) and the other based on CPUs only (for the evolutionary algorithms), so that we can compare them with respect to computation time?
Thanks in advance,
Best Regards
I am running various Abaqus simulations with a model size of about 1.5 million degrees of freedom in total. To speed up the calculations I am trying to decide what number of CPUs would be optimal and what the influencing factors are (model size, steps, time steps, outputs, hardware, etc.). I am interested in the point at which the writing and merging of partial results between the different cores outweighs the benefit of using more CPUs.
I want to run an ODE system while varying a parameter, say \alpha. I have divided the job among workers, and for each value of \alpha I need to write a certain measure of the system to a data file. From searching and reading documentation, I learned that having all workers write to a single shared file is a recipe for a corrupt file; instead, one should create multiple files. Is there a way to create files with specific names (in parallel) and then scan all of them to build a single data file containing all the results?
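One common pattern, sketched here in Python with placeholder file names and a dummy "measure" (MATLAB's parfor supports the same idea): each worker writes its own uniquely named file, and a serial pass afterwards merges them into one data file.

```python
import glob
import os
from multiprocessing import Pool

def run_one(alpha):
    # Placeholder for the ODE solve; here the "measure" is just alpha**2.
    measure = alpha ** 2
    # Each parameter value writes its own uniquely named file, so
    # workers never contend for a single shared (corruptible) file.
    fname = f"result_alpha_{alpha:.4f}.dat"
    with open(fname, "w") as fh:
        fh.write(f"{alpha} {measure}\n")
    return fname

if __name__ == "__main__":
    alphas = [0.1, 0.2, 0.3, 0.4]
    with Pool(4) as pool:
        pool.map(run_one, alphas)

    # After all workers finish, scan the per-run files and merge them.
    with open("all_results.dat", "w") as out:
        for fname in sorted(glob.glob("result_alpha_*.dat")):
            out.write(open(fname).read())
            os.remove(fname)
```

The merge step is serial and cheap; only the expensive solves run in parallel.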
Hello everyone,
I have a VUMAT code for a plasticity model, and I want to run a job for this model on a cluster with Abaqus 6.12. Because my VUMAT code is time-consuming, I have to run the model on the cluster with 16 processors. I receive the following error when running the model, and the job is terminated.
***ERROR: Floating Point Exception detected in increment 86485. Job exiting.
86485 1.469E-01 1.469E-01 08:58:41 1.699E-06 436 NaN NaN 3.230E+04
INSTANCE WITH CRITICAL ELEMENT: BILLET-1
A special ODB Field Frame with all relevant variables was written at 1.469E-01
THE ANALYSIS HAS NOT BEEN COMPLETED
Many Thanks,
ALI
Hello everyone,
I've implemented an FEM code in MATLAB, and now I'd like to make it faster. Since I have to perform nonlinear dynamic analysis, I have many iterations, each one requiring assembly of the full stiffness matrix K.
What I would like to do is parallelize the assembly of K. So far I have tried parpool and spmd; the latter gave poor results, while the former performed nicely (a speedup factor of about 2, despite using 10 cores) but only below a certain number of elements. Beyond that threshold, the parallel computation (14 cores) takes as much as 10 times longer than the single-core version.
I understand this may be related to communication overhead between the master and the workers and/or to slicing, but I cannot seem to get the hang of it.
Does anyone have suggestions and/or can point me to some useful material on this specific matter?
Thank you all in advance,
Jacopo
I have a problem. Measurements show the opposite of what convention assumes.
I test soil specimens. We try to decode how much deformation a certain loading (force) sequence will generate.
After 6 years of testing, I noticed that the reaction force behaves as a function of deformation, not the way convention describes it.
It's a big problem. All software is designed to model deformation as a function of applied force. But in reality the stiffness hysteresis loops behave such that force (the reaction) is a function of deformation.
The observations and empirical evidence pointed to a theory abandoned 40 years ago (strain-space plasticity, by P. J. Yoder). His theory seems to be not only compatible with the observed physical behaviour but also amenable to GPU parallel computation.
So, we have something that is both:
1. Potentially more physically correct.
2. For the first time, supercomputer compatible.
I am stuck building robots for industrial projects, which are meant to provide "quick profit" to the faculty. The research is not financed. All observations were made in spare time, at times using life savings.
When experts are shown the test results, they become red in the face, angry, and say they "have not seen anything like it". After an hour of questions they cannot find any flaws in the testing machines. And then they leave, never to be heard from again.
The theory of P. J. Yoder was defended in public defenses multiple times in the 1980s. No one found flaws in it or proved it wrong. But it was forgotten, ignored, and abandoned.
Industry asks for code compatible with existing software (return on investment), and I alone cannot code a full software package. Frankly, I would rather keep testing and try to prove my assumptions wrong, but the more I test, the more anomalies and paradoxes are observed, exposed, and resolved on the topic.
How do I run NAMD 2.14 on cores from multiple nodes under Slurm? I need a batch file, please.
This is the one I am using:
#!/bin/sh
#SBATCH --job-name=grp_prod
#SBATCH --partition=cpu
#SBATCH --ntasks=48
#SBATCH --cpus-per-task=1
#SBATCH --time=30-50:20:00
# Note: the NAMD_2.14_Linux-x86_64-multicore build is a shared-memory
# binary and can only use the cores of a single node; "mpirun -np 48"
# on it launches 48 independent copies rather than one 48-way run.
# Multi-node runs need an MPI (or netlrts/verbs) NAMD build launched
# with charmrun, or mpirun on the MPI build.
mpirun -np 48 /lfs01/workdirs/cairo029u1/namd2_14/NAMD_2.14_Linux-x86_64-multicore/namd2 +auto-provision step4_equilibration.inp > eq.log
This batch file produces the attached log file (only the first part is attached).
I am looking for a user-friendly neuro-evolution package with good parallelization. I wish to quickly explore some ideas (using a short learning curve); preferably in R or Python programming language.
Hello,
I am running a finite element (FE) simulation using the FE software PAMSTAMP. I have noticed that when I run the solver in parallel with 4 cores, the repeatability of the results is poor; when I press 'run' for the same model twice, it gives me inconsistent answers.
However, when I run the simulation on a single core, it gives me consistent answers for multiple runs of the same model.
This has led me to believe that the simulation is divided differently between the cores each time the solver is run.
Is there a way to still run the simulation in parallel (multiple cores) but have the solver divide the calculation in the same manner each time, to ensure consistency for multiple runs?
Thanks,
Hamid
Parallel computing in MATLAB is not supported for the fminsearch (Nelder-Mead simplex search) method. We would be grateful if you could share your experience running this method in parallel in order to speed up the optimisation of expensive problems.
Hello,
I'm trying to run my Abaqus simulation on GPUs. I have a PC with an AMD Ryzen 5 2400G with Radeon RX Vega 11 graphics.
Calling the GPU from CAE and from the command window doesn't work. I have found a possible solution using CUDA, which I have not tried yet since it applies to NVIDIA hardware. Other posts suggest using OpenCL, but I cannot find where to download it.
Any ideas would be helpful!
In the literature I find many papers that discuss the problems of parallel computing, but I have not found anything relevant on how to determine the optimal number of compute nodes that will give the maximum speedup.
Thanks for your help.
#HPC #parallel_computing #scientific_simulation #graph_partitioning
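As a rough illustration (a toy model, not a definitive method), Amdahl's law extended with a linear communication-cost term already predicts that an optimal node count exists: beyond it, adding nodes costs more in communication than it saves in compute. The parallel fraction p and per-node cost c below are made-up values.

```python
def speedup(n, p=0.95, c=0.002):
    # Amdahl's law plus a linear communication-overhead term:
    # speedup(n) = 1 / ((1 - p) + p/n + c*n), where p is the parallel
    # fraction and c the per-node communication cost (assumptions).
    return 1.0 / ((1.0 - p) + p / n + c * n)

def optimal_nodes(max_n=256, p=0.95, c=0.002):
    # The maximum speedup is reached where the marginal compute saving
    # of one more node no longer covers its communication cost.
    return max(range(1, max_n + 1), key=lambda n: speedup(n, p, c))

if __name__ == "__main__":
    n_best = optimal_nodes()
    print(n_best, round(speedup(n_best), 2))  # 22 7.29
```

In practice p and c are not known in advance; they are usually fitted from a few timing runs at different node counts (a small strong-scaling study) before extrapolating.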
I am currently interested in solving large linear systems on a distributed parallel computing grid using the ScaLAPACK library. I wonder if there is a quadruple-precision version (or any other library that provides quadruple precision for parallel distributed computing)?
Hi, I am trying to compile Quantum ESPRESSO for an AMD Ryzen 7 4800H CPU, which is in my personal laptop. Since Intel MKL isn't well optimised for AMD processors, I was wondering if there are other alternatives. For instance, is OpenBLAS a good option, or should I just go with the internal libraries already bundled with QE? Has anyone tested AOCL 2.2 with QE? I read recently on the QE website that AOCL 2.1 gave some strange results. Finally, which MPI should I choose for parallel computing: is Open MPI fine, or is there a better alternative anyone would suggest? Thank you.
I am using Fluent 14.5 with a parallel computing system, with a UDF for property calculation as well as nucleation and growth in a subdomain/cell zone.
After solving one iteration it gives errors like "primitive error", and the Cortex error 'Ansys.Fluent.Cortex.CortexNotAvailableException' was thrown.
Can anyone please tell me why this is happening?
Hi, I am looking for some good online (preferably free) resources where I can learn parallel computing from scratch using the Python or R programming languages. Thanks.
I want to apply parallel computing techniques to decrease the calibration time. Using the code below, I can evaluate 10 sets of 11 parameters simultaneously using a PARFOR loop, but I cannot get the cost function value simultaneously for each parameter set during the GA calibration. Your helpful comments and suggestions are highly appreciated.
% GA parameters
MaxIt = 20;                % Maximum number of iterations
nPop  = 10;                % Population size
NoVar = 11;                % Number of parameters to be calibrated
delete(gcp('nocreate'))    % Shut down any previous parallel pool
parpool('local')           % Activate parallel computing for parfor
parfor i = 1:nPop
    pop(i).Position = unifrnd(VarMin, VarMax, VarSize);
    pop(i).Cost = CostFunction(pop(i).Position);
end
Hello everyone,
I'm running MM-PBSA using the g_mmpbsa tool for GROMACS. However, it runs extremely slowly, even though it is claimed that by default it uses all the processors: https://rashmikumari.github.io/g_mmpbsa/How-to-Run.html
How can I be sure if it really uses all of the processors? And how can I increase it if possible/necessary?
Thanks in advance.
Hi,
I am trying to run a Quantum ESPRESSO SCF calculation on an Intel Xeon Gold 5120 CPU @ 2.20 GHz (2 processors). It has 56 cores and 96 GB of RAM.
I am trying to do a parallel calculation on this workstation by using:
mpirun -np (no of cores) pw.x -npool (no of pools) -inp si.pw.in
Following internet sources, I have tried to improve performance by setting OMP_NUM_THREADS=1 and I_MPI_PIN_DOMAIN=1.
Can anyone please guide me on how to choose the optimum number of cores and the number of pools for the calculation?
The input file is attached below.
The FFT grid dimensions are (48 48 48) and the maximum number of k-points is 40,000.
Subsidiary questions:
1. Should the subspace diagonalization in the iterative solution of the eigenvalue problem use a serial algorithm or the ELPA distributed-memory algorithm?
2. Should the maximum dynamical memory per process be high or low for better performance?
I have built a DEM model in Abaqus in which three cubes fall onto the floor. The particles in two of the cubes (named Solid-1 and Solid-2 in the model) are tied with Beam-MPC constraints, while the third cube (named Particle-3 in the model) has no constraints among its particles. The three cubes fall onto a rigid floor under gravity.
The purpose of the example is to test whether a DEM model with both particle clusters and free particles can run in parallel.
The example runs on 1 CPU, but it fails in parallel. Attached are the .inp file and a snapshot of the error message. I tried to use *DOMAIN DECOMPOSITION to decompose the Particle-3 domain, but that failed too.
I would really appreciate it if anyone could help me fix this problem.
We are undergraduate students and would like to work on simulating tsunamis. Could you suggest study material/ebooks/lectures to learn SPH basics from?
Hi,
I'm trying to run my Abaqus simulation using GPUs. I've downloaded the NVIDIA CUDA Toolkit, but my simulation still doesn't seem to run on the GPU.
If I run the following command in Abaqus Command:
"abaqus job=Job-8 inp=Job-8.inp user=umat_1d_linear_viscoelastic.for cpus=2 gpus=1"
Abaqus starts the simulation but never finishes it, and the .lck file remains.
Does anybody know how I can enable Abaqus to use my GPU?
I want to launch a parallel simulation of the CQN model on OMNeT++ using multiple instances of OMNeT++ on different PCs. Does anyone have any ideas about this?
I am trying to run an SCF calculation on a small cluster of 3 CPUs with a total of 12 processors. But when I run the calculation with the command mpirun -np 12 /home/ufscu/qe-6.5/bin/pw.x -in scf.in > scf.out, it takes longer than it did on a single CPU with 4 processors. It would be really great if you could guide me on this, since I am new to this area. Thank you so much for your time.
Hi,
I want to set up a 4-node (each with 4 cores) Raspberry Pi cluster and install Quantum ESPRESSO on it for simulations. Does anyone know how well it performs compared with a Core i7 PC for simulating nanostructures?
Are there any tests of this?
thanks
In many research papers, I read that dependency between iterations or tasks can be identified using dependency tests such as Banerjee's test. Can anyone provide a real example, applied to a for loop, that shows how such a dependency test works?
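As a sketch, the GCD test, a simpler classical relative of Banerjee's test, fits in a few lines of Python; the loop and coefficients below are illustrative.

```python
from math import gcd

def may_depend(c1, c2, c3, c4):
    """GCD test for the loop  'for i: a[c1*i + c2] = ... a[c3*i + c4] ...'.

    A cross-iteration dependence requires integers i, j with
    c1*i + c2 == c3*j + c4, which is solvable iff gcd(c1, c3)
    divides (c4 - c2). Like Banerjee's test, this is conservative:
    True means "maybe dependent", False means "provably independent".
    """
    return (c4 - c2) % gcd(c1, c3) == 0

# a[2*i] = a[2*i + 1]: writes touch even indices, reads odd ones;
# gcd(2, 2) = 2 does not divide 1 -> no dependence, parallelizable.
print(may_depend(2, 0, 2, 1))   # False

# a[2*i] = a[2*i + 2]: gcd 2 divides 2 -> a dependence may exist.
print(may_depend(2, 0, 2, 2))   # True
```

Banerjee's test refines this by also using the loop bounds to bound the achievable values of c1*i - c3*j, so it can rule out dependences the GCD test cannot.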
Hi dear,
I want to use MPICH2 for parallel computing with MCNP. I tried it, but mpiexec runs multiple copies of the program instead of running it in parallel.
Can anyone help me?
My command is:
mpiexec -n 2 -noprompt mcnpx.exe i=Sp1w_05.TXT
I installed AmberTools18 and the MPI libraries as well. Everything works, but when I want to run sander in parallel like this:
mpirun -np 32 sander.MPI -O -i min.in -o min.out -p ras-raf_solvated.prmtop -c ras-raf_solvated.inpcrd \
-r min.rst -ref ras-raf_solvated.inpcrd
it doesn't work. I then realized that sander.MPI is not present in my AmberTools package. So how can I run it, or, if it wasn't installed, how can I install it?
I wrote a shallow-water-equation code in MATLAB, but it solves sequentially and takes a lot of time, so I was wondering how I can transform my code into a parallel one that utilizes multiple cores.
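Before going multi-core, it is usually worth checking that the per-cell loops are vectorized; here is a hedged NumPy sketch (the update is a generic explicit stencil standing in for a shallow-water substep, not any particular scheme). Each interior cell depends only on the previous time level, so all cells can be computed at once; that same property is what makes the spatial loop parallelizable even though time stepping stays serial.

```python
import numpy as np

def step_loop(h, c):
    # Cell-by-cell update (what a nested MATLAB loop does):
    # a simple centered-difference update as a stand-in substep.
    out = h.copy()
    for i in range(1, len(h) - 1):
        out[i] = h[i] - c * (h[i + 1] - h[i - 1])
    return out

def step_vectorized(h, c):
    # The same update as one whole-array (sliced) expression; in
    # MATLAB this corresponds to h(2:end-1) - c*(h(3:end) - h(1:end-2)).
    out = h.copy()
    out[1:-1] = h[1:-1] - c * (h[2:] - h[:-2])
    return out

if __name__ == "__main__":
    h = np.linspace(0.0, 1.0, 101)
    print(np.allclose(step_loop(h, 0.1), step_vectorized(h, 0.1)))  # True
```

Only after vectorizing does it pay to split the spatial domain across workers (parfor over subdomains with halo exchange), since the per-iteration work is then large enough to amortize communication.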
I detect faces in video with the help of machine learning. Now I want to accelerate the running time of my algorithm via some abstraction layer (a high-level language).
Is there a high-level language on top of OpenCL for doing this?
Does anyone have experience with SYCL? Would it be possible to implement this with it?
Many thanks in advance for your help.
Special issue
Parallel computing and co-simulation in railway research
International Journal of Rail Transportation
Submissions are due on 15th April 2019.
Minuscule devices need to process huge amounts of data. Currently, data from such devices is offloaded to larger machines such as GPUs to be processed. However, this setup poses many issues such as exposure of human tissue to radiation, as well as distance/security issues. Thus, such data needs to be processed at the source.
I have read some scientific papers, and most of them use data dependency tests to analyse code for parallel optimization; one such test is Banerjee's test. Are there other tests that give better results for testing data dependency? Also, is it hard to test code with control dependencies, and if it is possible, what techniques can we use?
Thank you
Hi,
I need to solve an eigenvalue problem of the form [A]{x} = e [B]{x}, where A and B are sparse matrices. Since my matrices are quite large, around 20,000 x 20,000, the only practical approach is a parallel code. Can anyone suggest such a solver or point me in the right direction?
Thanks in advance !
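For matrices of that size, a parallel sparse package such as SLEPc (or scipy.sparse.linalg.eigsh with M=B, for shared memory) is the practical route; as a minimal illustration of the underlying idea only, here is a dense power-iteration sketch for the largest generalized eigenvalue, on toy sizes.

```python
import numpy as np

def generalized_power_iteration(A, B, iters=500):
    """Largest eigenvalue of A x = e B x via power iteration on B^{-1} A.

    Dense toy sketch only: for 20,000 x 20,000 sparse systems you would
    factor B once (or use a Krylov eigensolver such as SLEPc's, or
    scipy.sparse.linalg.eigsh with M=B) instead of solving from scratch.
    """
    x = np.ones(A.shape[0])
    for _ in range(iters):
        x = np.linalg.solve(B, A @ x)   # one application of B^{-1} A
        x = x / np.linalg.norm(x)
    # The generalized Rayleigh quotient gives the eigenvalue estimate.
    return (x @ (A @ x)) / (x @ (B @ x))

if __name__ == "__main__":
    A = np.diag([1.0, 2.0, 5.0])
    B = np.eye(3)
    print(round(generalized_power_iteration(A, B), 6))  # 5.0
```

Real solvers replace the single vector with a Krylov subspace (Lanczos/Arnoldi) and parallelize the matrix-vector products and the solves with B, which is exactly what SLEPc distributes over MPI.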
Hello,
I am currently working on a fluid-structure interaction (FSI) problem using Ansys 18.1. After a lot of effort I succeeded in getting a working simulation, but the computation takes very long. With Ansys Fluent, a simple simulation with around 500k elements in the CFD model and 30k elements in Mechanical takes 67 h for only 10 time steps (75 iterations).
I want to improve the speed of the computation, but I cannot find any setting for this. In Fluent I selected parallel computing with 4 cores; even with GPGPU support there is no significant improvement. During the run, the maximum CPU utilisation is 20%. The CFD simulation in Fluent alone, without FSI, takes around 1 h.
Does anyone have an idea for speeding up an FSI simulation? I use the pressure-based solver with the coupled scheme and a second-order implicit transient formulation. The dynamic mesh is generated by smoothing (diffusion).
More information is available on request.
Kind regards,
I want to run a stereo vision algorithm on two computers; what is the best way to write such a parallel algorithm?
Is there a way that I can access a Hadoop Map/Reduce cluster available for researchers for free to run some experiments?
Hello, my research topic is resource management in cloud computing. My supervisor asked me to relate it to parallel computing. Can anyone tell me how I can do so?
We live in a world of computations; there is no doubt about it. Computational demand increases every day. What do you think is the future of solving large-scale numerical models? Some say OpenMP, others MPI, and the romantics say cloud computing. What is your opinion, and why?
PS: Any other methods used to solve in parallel or other techniques are more than welcome to be presented.
Does anyone have experience in deploying a Windows HPC private cluster, or in executing an Abaqus job on two computers without a queuing server using MPI parallelisation?
I have managed to set up the cluster and to join the head node running Windows Server 2012 R2 with one compute node, "W510", running Windows 7 64-bit Ultimate.
Unfortunately, I keep getting the following error message:
Failed to execute "hostname" on host "w510". Please verify that you can execute commands remotely on "w510" without a password. Command used: "C:\Program Files\Microsoft MPI\Bin\mpiexec.exe -wdir C:\Windows -hosts 1 w510 1 -env PATH C:\SIMULIA\Abaqus\6.13-4\code\bin;C:\Windows\system32 C:\SIMULIA\Abaqus\6.13-4\code\bin\dmpCT_rsh.exe hostname".
I have a project on atherosclerosis with a huge amount of computation. I don't know how to use parallel computing in ADINA, or how to use a cluster or distributed computation. The information on the internet is confusing. Does anyone know an introductory book?
I read many books and searched the internet, but I did not find any answer.
Please share the steps; it would be highly appreciated.
More explanation of the implementation would help.
As I understand it, a YARN container specifies a set of resources (vCores, memory, etc.), and we can allocate containers of a size matched to the task they execute. In the same sense, what does a map or reduce slot specify in MRv1? Does it specify a CPU core, some amount of memory, or both? And is it possible to specify the size of map/reduce slots?
Further, suppose we have two nodes with the same CPU power but different disk bandwidth, one with an HDD and the other with an SSD. Can we say that a slot on the HDD node is less powerful than one on the SSD node?
I want to find the total memory access time for a matrix program. The formula is
MAT = HitRate × CacheAccessTime + MissRate × RAMAccessTime
I have calculated the cache hit and miss rates with cachegrind, but I don't know how to find the cache access time and the RAM access time. Is there a tool, like perf or cachegrind, that measures cache and RAM access times on Ubuntu? I would appreciate your help.
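Once latencies are assumed, whether taken from vendor documentation or measured with a pointer-chasing microbenchmark such as lmbench's lat_mem_rd, the formula itself is a one-liner; the 2 ns and 80 ns below are made-up illustrative values, not measurements.

```python
def memory_access_time(hit_rate, cache_ns, ram_ns):
    # MAT = HitRate * CacheAccessTime + MissRate * RAMAccessTime,
    # exactly the formula above, with MissRate = 1 - HitRate.
    return hit_rate * cache_ns + (1.0 - hit_rate) * ram_ns

# Assumed latencies: 2 ns cache, 80 ns DRAM; 97% hit rate as
# reported by cachegrind for the matrix program.
print(round(memory_access_time(0.97, 2.0, 80.0), 2))  # 4.34
```

Note the result is dominated by the miss term even at a 97% hit rate, which is why reducing the miss rate usually matters more than shaving cache latency.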
I produced my results with the CacheBench benchmark. Each test performs repeated accesses to data items over varying vector lengths, with timings taken for each vector length over a number of iterations. The product of iterations and vector length gives the total amount of data accessed in bytes; dividing this total by the total time gives a bandwidth figure in megabytes per second (here a megabyte is 1024^2 = 1,048,576 bytes). In addition, the average access time in nanoseconds per data item is computed and reported.
But I get results in the following form. Why is there a sudden decrease in the access time as the vector size increases?
output:
C Size Nanosec
256 11715.798191
336 11818.694309
424 11812.024314
512 11819.551479
.... ......
..... .....
4194309 9133.330071
I need the results in the following form; how can I get this output?
The desired output looks like this:
Read Cache Test
C Size Nanosec MB/sec % Change
------- ------- ------- -------
4096 7.396 515.753 1.000
6144 7.594 502.350 1.027
8192 7.731 493.442 1.018
12288 17.578 217.015 2.274
I was wondering whether anyone knows about an automated tool to collect GPU kernels features, i.e., stencil dimension, size, operations, etc. Such tools are widely available for CPU kernels.
I am using Ubuntu 64-bit 14.04 and 32-bit 8.04. I am running a Fortran source code and get different results on the two machines. What could be the reasons?
I tried both gfortran and Intel compilers; the two compilers give identical results on their respective machines, but the results differ between the two machines.
I am using OpenMP for parallel programming, and I want to execute my C code (with OpenMP) in gem5, but I was unable to do so. I tried to install m5threads, but I get many errors when installing it. Can anybody explain how to install m5threads successfully and how to execute C code with OpenMP in gem5?
Please help.
Hello,
I am using GROMACS 5 and want to choose how many processors to use in a simulation. For choosing GPUs, the mdrun documentation page makes it easy, but for CPUs I haven't found how to do it.
regards
Hi,
We have several isolated GPU units and plan to build a cluster with a queuing system. Does anyone have relevant experience? Can you tell me the software and hardware prerequisites and how to carry out the installation step by step?
Thanks for your time!
Cheng
Suppose I have nodes with 8 cores each, and the nodes support hyper-threading. We found that running the code on 3 nodes with 16 threads is faster than on 4 nodes with 16 threads, and that 5 nodes with 16 threads is faster than 3 or 4 nodes! Any explanation?
I have a problem with parallel computing: I can't open matlabpool. When I type "matlabpool 2", I get the following error:
Error using matlabpool (line 144) Failed to open matlabpool. (For information in addition to the causing error, validate the profile 'local' in the Cluster Profile Manager.)
Caused by: Error using distcomp.interactiveclient/start (line 61) Failed to locate and destroy old interactive jobs.
This is caused by: The storage metadata file does not exist or is corrupt
Hi all,
I would like to do parallel computing on the GPU of my NVIDIA M2000 graphics card. I was wondering whether this works for all software, in my case bioinformatics software such as PASTA (a Python platform), or whether it is limited to particular programming platforms or to software/libraries built on parallel computing APIs (e.g. CUDA)?
If it is limited, is there a way to achieve GPU parallel computing without being restricted to certain software?
Thanks.
Is it possible to load the operating system on a Jetson TK1/TX1 from an SD card? To the best of our knowledge, it is only possible from the internal 16 GB flash memory.
We have already used Trimaran, but it seems hard to work with.
Is there any alternative to the Trimaran compiler?
We need the following features:
Very Long Instruction Word Processors (VLIW)
Explicitly Parallel Instruction Computing (EPIC)
Instruction-Level Parallelism (ILP)
I know that you compile MPI programs with mpicc, e.g. "mpicc abc.cpp", and that you use pgc++ to compile OpenACC directives. Is there a way to compile MPI+OpenACC, i.e. to combine pgc++ and mpicc? Thanks.
How can I create a small cloud computing lab? I am working on task scheduling and load balancing in cloud computing, so I want to know the requirements: the number of physical machines and the operating systems, how to install a hypervisor and which type to use, how to create virtual machines and send tasks to the VMs, and where to add my task scheduling algorithm. Is there a site or video that explains this?
I would like to emulate various n-bit binary floating-point formats, each with a specified e_max and e_min, and p bits of precision. I would like these formats to emulate subnormal numbers, faithful to the IEEE-754 standard.
Naturally, my search has led me to the MPFR library, which is IEEE-754 compliant and able to support subnormals via the mpfr_subnormalize() function. However, I've run into some confusion using mpfr_set_emin() and mpfr_set_emax() to correctly set up a subnormal-enabled environment. I will use IEEE double precision as an example format, since this is the example used in the MPFR manual:
mpfr_set_default_prec (53);
mpfr_set_emin (-1073); mpfr_set_emax (1024);
The above code is from the MPFR manual; note that neither e_max nor e_min is equal to the expected value for double. Here, p is set to 53, as expected for the double type, but e_max is set to 1024 rather than the format's 1023, and e_min is set to -1073, well below the format's -1022. I understand that setting the exponent bounds too tightly causes overflow/underflow in intermediate MPFR computations, but I have found that setting e_min exactly is critical for obtaining correct subnormal numbers; too high or too low causes a subnormal MPFR result (updated with mpfr_subnormalize()) to differ from the corresponding double result.
My question is how should one decide which values to pass to mpfr_set_emax() and (especially) mpfr_set_emin(), in order to guarantee correct subnormal behaviour for a floating-point format with exponent bounds e_max and e_min? There doesn't seem to be any detailed documentation or discussion on the matter.
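For what it's worth, the values that do work appear to follow a simple pattern, consistent with MPFR keeping significands in [0.5, 1) rather than [1, 2); this is my guess from the manual's double example, not documented behaviour.

```python
def mpfr_exp_bounds(e_min, e_max, p):
    """Map IEEE-754 exponent bounds to MPFR's, following the pattern of
    the manual's double example (p=53 -> emin=-1073, emax=1024).

    MPFR normalizes significands into [0.5, 1) rather than [1, 2),
    shifting every exponent up by 1; emin is additionally lowered by
    p - 1 so that the full p-bit subnormal range survives before
    mpfr_subnormalize() rounds it back to the emulated format.
    """
    return e_min - p + 2, e_max + 1

# double: reproduces the manual's values.
print(mpfr_exp_bounds(-1022, 1023, 53))  # (-1073, 1024)
# float: emin = -148, matching the empirically working value below.
print(mpfr_exp_bounds(-126, 127, 24))    # (-148, 128)
```

This also suggests e_max = 128 for float rather than 127; the test program below only exercises the subnormal end of the range, so e_max = 127 happens to pass there too.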
Here is a small program which demonstrates the choice of e_max and e_min for single-precision numbers.
#include <iostream>
#include <cmath>
#include <float.h>
#include <mpfr.h>
using namespace std;
int main (int argc, char *argv[]) {
    cout.precision(120);

    // Actual float emin and emax values don't work at all
    //mpfr_set_emin (-126);
    //mpfr_set_emax (127);

    // Not quite
    //mpfr_set_emin (-147);
    //mpfr_set_emax (127);

    // Not quite
    //mpfr_set_emin (-149);
    //mpfr_set_emax (127);

    // These float emin and emax values work in the subnormal range
    mpfr_set_emin (-148);
    mpfr_set_emax (127);

    cout << "emin: " << mpfr_get_emin() << " emax: " << mpfr_get_emax() << endl;

    float f = FLT_MIN;
    for (int i = 0; i < 3; i++) f = nextafterf(f, INFINITY);

    mpfr_t m;
    mpfr_init2 (m, 24);
    mpfr_set_flt (m, f, MPFR_RNDN);

    for (int i = 0; i < 6; i++) {
        f = nextafterf(f, 0);
        mpfr_nextbelow(m);
        cout << i << ": float: " << f << endl;
        //cout << i << ": mpfr: " << mpfr_get_flt (m, MPFR_RNDN) << endl;
        mpfr_subnormalize (m, 1, MPFR_RNDN);
        cout << i << ": mpfr: " << mpfr_get_flt (m, MPFR_RNDN) << endl;
    }

    mpfr_clear (m);
    return 0;
}
With thanks,
James
Dear all
- Would anyone give me a brief and useful review of VASP and Quantum ESPRESSO? I need to know about their performance, speed, parallel computing, free licensing, and tools. Which one is more applicable in solid-state physics and computational nanoscience? Please rank them 0-100.
I am looking for work or research where the CPU actually outperforms the GPU.