The University of Manchester
Question
Asked 5 June 2016
Software for high-performance cellular automata simulations?
Greetings everyone,
I keep looking for software for high-performance cellular automata simulations, but I can't find anything specific. I need something that takes advantage of multi-core processors.
What software do researchers use? A Matlab toolbox or an R library, for example? And can I somehow extract accurate measurements of its performance?
Thank you in advance for your help!
All Answers (12)
University of Tours
Compiled languages are the fastest, especially when combined with multispin coding techniques.
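To make the multispin idea concrete, here is a minimal sketch of my own (the rule choice and sizes are illustrative, not anything prescribed in this thread): for elementary rule 90, packing 64 cells into one machine word lets a single XOR update all 64 sites at once.

```c
#include <stdint.h>
#include <stdio.h>

/* Multispin coding sketch for elementary CA rule 90
 * (next state = left neighbour XOR right neighbour).
 * The ring is a single 64-bit word with periodic boundaries,
 * so one XOR updates 64 cells in parallel. */
static uint64_t rule90_step(uint64_t w) {
    uint64_t left  = (w << 1) | (w >> 63);  /* each cell's left neighbour  */
    uint64_t right = (w >> 1) | (w << 63);  /* each cell's right neighbour */
    return left ^ right;
}

int main(void) {
    uint64_t w = 1ULL << 32;                /* a single live cell */
    for (int t = 0; t < 5; t++) {
        printf("%016llx\n", (unsigned long long)w);
        w = rule90_step(w);
    }
    return 0;
}
```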
2 Recommendations
The University of Manchester
Thank you both very much.
Mr Meyerhenke, I have seen many authors using implementations in C, but none of them used any parallel programming techniques (OpenMP, MPI, etc.). I will look for the paper you suggested, thank you very much!
I find it really peculiar that there isn't anything like this, though - everyone does indeed write their own code. I need to compare a CPU's performance with my hardware implementation's performance, but the rules are quite complicated and I am not sure I have enough time to write an optimal parallel program for this.
University of Warsaw
If you are talking about classical synchronous CA, consider NVIDIA CUDA (it is in fact massively parallel C).
Nothing has such efficiency on a single machine.
3 Recommendations
The University of Manchester
Thank you very much! You are right about CUDA, but what I am looking for is a way to benchmark the CPU! I am sorry, I wasn't clear at all about that!
I will try to find some time to adjust stencil computing packages and figure out if they can be used for this. Thank you all for your interesting, useful remarks!
1 Recommendation
Queensland University of Technology
Depending on your CA application, you may want to consider the Golly program (which implements the Hashlife algorithm); it is highly optimized for standard synchronous, deterministic CA.
1 Recommendation
The University of Manchester
Dear Mr Warne,
thank you very much for your answer! Golly is a fantastic tool indeed, very powerful! Unfortunately, since Hashlife is really hard to parallelize, Golly currently uses a single CPU core. Still, Golly is perfect for benchmarking Hashlife on a single CPU core!
For those interested: Hashlife is a fascinating algorithm which can advance the CA by many generations in a single step - it never explicitly computes the intermediate generations, it just jumps n generations ahead directly!
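To make that concrete, here is a toy sketch of my own, covering only Hashlife's level-2 base case (the real algorithm composes such results recursively through a hashed quadtree, so a block of size 2^k jumps 2^(k-2) generations in one memoised call): a table mapping every 4x4 Game of Life block to its 2x2 centre one generation ahead, so any block seen before costs a single lookup.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy sketch of the level-2 base case that Hashlife memoises:
 * a 4x4 Game of Life block is encoded in 16 bits (bit 4*r + c),
 * and result[b] holds its 2x2 centre advanced one generation
 * (4 bits).  Full Hashlife composes such results recursively. */
static uint8_t result[1 << 16];

static int cell(uint16_t b, int r, int c) { return (b >> (4 * r + c)) & 1; }

static void build_table(void) {
    for (uint32_t b = 0; b < (1u << 16); b++) {
        uint8_t out = 0;
        for (int r = 1; r <= 2; r++)
            for (int c = 1; c <= 2; c++) {
                int n = 0;                       /* live neighbours */
                for (int dr = -1; dr <= 1; dr++)
                    for (int dc = -1; dc <= 1; dc++)
                        if (dr || dc)
                            n += cell((uint16_t)b, r + dr, c + dc);
                int next = (n == 3) || (cell((uint16_t)b, r, c) && n == 2);
                out |= next << (2 * (r - 1) + (c - 1));
            }
        result[b] = out;
    }
}

int main(void) {
    build_table();
    /* Vertical blinker in rows 0-2 of column 1: its 2x2 centre one
     * generation later is "##" over "..", i.e. the bits 0b0011. */
    uint16_t blinker = (1 << 1) | (1 << 5) | (1 << 9);
    printf("centre after one step: 0x%x\n", result[blinker]);
    return 0;
}
```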
Queensland University of Technology
I agree that getting a good performance boost from parallelism for Hashlife may be difficult (with the exception of, say, performing a Monte Carlo-like experiment over many initial configurations). Hashlife essentially trades memory for compute, so a fair GPU vs CPU comparison is a bit awkward.
In comparing algorithms (particularly if you also have different hardware architectures in the mix), it is very important to have a clear idea of what you are defining as optimal. Is it the fastest algorithm (e.g., the most cell updates or generations per second or per cycle)? Or is it the fastest within a certain memory footprint or power budget? Etc.
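For what it's worth, a minimal harness along the following lines (the naive Game of Life kernel, grid size, and step count are arbitrary choices of mine) is one way to report the cell-updates-per-second figure mentioned above:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Time a naive synchronous Game of Life loop on an N x N torus and
 * report throughput as total cell updates divided by wall-clock time. */
#define N 1024
#define STEPS 100

static unsigned char grid[N][N], next[N][N];

static void step(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int n = 0;
            for (int di = -1; di <= 1; di++)
                for (int dj = -1; dj <= 1; dj++)
                    if (di || dj)
                        n += grid[(i + di + N) % N][(j + dj + N) % N];
            next[i][j] = (n == 3) || (grid[i][j] && n == 2);
        }
    memcpy(grid, next, sizeof grid);
}

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            grid[i][j] = rand() & 1;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int t = 0; t < STEPS; t++)
        step();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    printf("%.3e cell updates per second\n", (double)N * N * STEPS / secs);
    return 0;
}
```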
I am of the opinion that if the task is to compare an optimised GP-GPU implementation with a CPU version, then the fastest CPU version should be used (even if not parallel).
1 Recommendation
The University of Manchester
I totally agree with everything you wrote. Thank you again for your answer, it helps me put certain things in perspective!
Excuse me if I come across as stubborn and ignorant; I am an undergrad trying to learn things beyond the curriculum. Let me rephrase my question and be more specific, since I think we have covered only part of my concern:
Hashlife is a great algorithm, the best we can do as far as deterministic CA rules of classes I and II are concerned (stepping into the dangerous grounds of CA classification).
As you said, Hashlife is good depending on the application. It doesn't perform well on every deterministic CA rule - if we come across a deterministic rule that eventually produces large (pseudo)chaotic patterns, it would probably perform worse than parallel software (run on many CPU cores, a GPU, etc.).
What would that parallel software be if you had to choose?
I think CA can be easily parallelized - the simplest form would be for every processing core (of whatever kind) to process a different part of the CA's grid. There are no dependencies between the cores, only a few shared boundary cells, and these cause no problem beyond, say, a memory bandwidth bottleneck - few cells are shared when the neighborhood is small. I think stencil computing packages must take advantage of this feature, but I still have no idea.
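A minimal sketch of the domain decomposition you describe, assuming OpenMP and a naive synchronous Game of Life kernel (both illustrative choices of mine): rows are split across cores, threads write disjoint slices of the next grid while only reading the current one, and the implicit barrier at the end of the parallel loop is the per-generation synchronisation.

```c
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define W 2048
#define H 2048

/* One synchronous generation: OpenMP splits the rows across cores.
 * Each thread writes a disjoint slice of "next" and only reads "cur",
 * so no locks are needed; the loop's implicit barrier is the
 * per-generation synchronisation point. */
static void ca_step(const unsigned char *cur, unsigned char *next) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < H; i++)
        for (int j = 0; j < W; j++) {
            int n = 0;
            for (int di = -1; di <= 1; di++)
                for (int dj = -1; dj <= 1; dj++)
                    if (di || dj)
                        n += cur[((i + di + H) % H) * W + (j + dj + W) % W];
            next[i * W + j] = (n == 3) || (cur[i * W + j] && n == 2);
        }
}

int main(void) {
    unsigned char *a = malloc(W * H), *b = malloc(W * H);
    for (int i = 0; i < W * H; i++)
        a[i] = rand() & 1;
    for (int t = 0; t < 100; t++) {           /* 100 generations */
        ca_step(a, b);
        unsigned char *tmp = a; a = b; b = tmp;
    }
    printf("ran on up to %d threads\n", omp_get_max_threads());
    free(a); free(b);
    return 0;
}
```

(Compile with something like gcc -O2 -fopenmp.)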
Thank you again for your contribution and your help!
Queensland University of Technology
Well, in terms of Wolfram's classification, Game of Life would be Class IV (as would many CA of interest for Hashlife). There may be some rare Class III rules with a chaotic attractor such that Hashlife is reduced to its worst case.
Have you got an example in mind of a CA that will perform poorly with Hashlife? I believe the time complexity of Hashlife is worst case O(n) (where n is the number of cells), so you don't really lose much in time (of course, memory usage is greater than for a simple CA implementation).
I think the biggest parallelisation challenge for a simple CA implementation is giving each core enough work to do in each timestep. Essentially, each cell only needs one neighbourhood lookup and then stores the result, and at each timestep all threads need to synchronise, which has an overhead. The number of cells you would need to outweigh the threading overheads will probably be very large. That's just my intuition on it; I'm not saying that it is not worth pursuing...
In practice I have never needed a parallel CA implementation; rather, I would use the cores to process many evolutions of the CA under different initial conditions, to compute entropies etc. The software I used was GCALab (I wrote it, so I'm biased). It is not a polished tool, but it is very efficient for that task.
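To illustrate that ensemble pattern (a sketch in the spirit of what is described above, not GCALab's actual code; the rule, sizes, and statistic are placeholders of mine): each core evolves whole, independent CA runs from different initial conditions, so there is no per-timestep synchronisation at all.

```c
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define CELLS 4096
#define STEPS 10000
#define RUNS  64

/* Evolve one elementary rule 110 CA from a seeded random initial
 * condition and return the final density of live cells. */
static double evolve(unsigned seed) {
    unsigned char a[CELLS], b[CELLS];
    unsigned char *cur = a, *nxt = b;
    for (int i = 0; i < CELLS; i++)
        a[i] = rand_r(&seed) & 1;
    for (int t = 0; t < STEPS; t++) {
        for (int i = 0; i < CELLS; i++) {
            int l = cur[(i - 1 + CELLS) % CELLS];
            int c = cur[i];
            int r = cur[(i + 1) % CELLS];
            nxt[i] = (110 >> ((l << 2) | (c << 1) | r)) & 1;  /* rule table */
        }
        unsigned char *tmp = cur; cur = nxt; nxt = tmp;
    }
    int live = 0;
    for (int i = 0; i < CELLS; i++)
        live += cur[i];
    return (double)live / CELLS;
}

int main(void) {
    double density[RUNS];
    /* Parallelism over independent runs: no synchronisation inside a run. */
    #pragma omp parallel for schedule(dynamic)
    for (int r = 0; r < RUNS; r++)
        density[r] = evolve(1234u + (unsigned)r);
    double mean = 0;
    for (int r = 0; r < RUNS; r++)
        mean += density[r];
    printf("mean final density over %d runs: %.4f\n", RUNS, mean / RUNS);
    return 0;
}
```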
1 Recommendation
The University of Manchester
Thank you very much once again! You have helped me gain perspective on all this - I think we are on the same page! Not only on sizing the problem, but also on what something like this is needed for, and why (or why not).
GCALab looks like incredible work!!! (for those interested, https://github.com/davidwarne/GCALab )