The code makes relatively heavy use of the texture cache and hardly uses the CPU at all. Although one of the main goals of the code is to scale well on GPU clusters, I used only one GPU here because I wanted to compare the performance of the individual GPUs.
For comparison, I ran the CPU version of LAMMPS (which uses exactly the same algorithms) on a conventional node with two quad-core Nehalem CPUs (Xeon X5550 @ 2.66 GHz).
I have tested three different systems:
lj-melt: the lowest number of computations per memory access
(for anyone familiar with MD: it's a plain LJ system with a 2.5 sigma cutoff, reduced density 0.84, 850k atoms)
silicate/long: half of the time is spent on a 3D FFT (using cuFFT), and the rest is much more compute-intensive than lj-melt
(lithium silicate glass, Buckingham potential + long-range Coulomb via PPPM, ~12k atoms)
silicate/cut: also more compute-intensive than lj-melt, but no FFT
(lithium silicate glass, Buckingham potential + cutoff Coulomb (10 Å), ~100k atoms)
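To make the "computations per memory access" difference between the systems concrete, here is a minimal sketch of the per-pair work, not the actual LAMMPS GPU kernels. It assumes a plainly truncated (unshifted) LJ potential in reduced units for lj-melt, and for silicate/cut a Buckingham term plus truncated Coulomb in LAMMPS "metal" units. The Buckingham cutoff default and all Buckingham/charge parameter values below are hypothetical, not the Li-silicate force field. Both functions handle one neighbor (i.e. one position fetch), but the second does noticeably more arithmetic, including an `exp()`.

```python
import math
from typing import Tuple

# Coulomb constant e^2/(4*pi*eps0) in eV*Angstrom (LAMMPS "metal" units).
COULOMB_K = 14.399645


def lj_cut_pair(r: float, epsilon: float = 1.0, sigma: float = 1.0,
                cutoff: float = 2.5) -> Tuple[float, float]:
    """12-6 Lennard-Jones pair in reduced units, plainly truncated (not shifted).

    Returns (energy, force) with force = -dE/dr (positive = repulsive).
    Only a handful of multiplications and divisions per neighbor: little
    arithmetic relative to the cost of fetching the neighbor's position.
    """
    if r >= cutoff:
        return 0.0, 0.0
    sr6 = (sigma / r) ** 6
    sr12 = sr6 * sr6
    energy = 4.0 * epsilon * (sr12 - sr6)
    force = 24.0 * epsilon * (2.0 * sr12 - sr6) / r
    return energy, force


def buck_coul_cut_pair(r: float, A: float, rho: float, C: float,
                       qi: float, qj: float,
                       buck_cut: float = 10.0,
                       coul_cut: float = 10.0) -> Tuple[float, float]:
    """Buckingham + plainly truncated Coulomb pair.

    E = A*exp(-r/rho) - C/r^6 + K*qi*qj/r, each term inside its own cutoff.
    buck_cut = 10.0 is an assumed (hypothetical) default; only the 10 A
    Coulomb cutoff is given for the benchmark. Returns (energy, -dE/dr).
    """
    energy = 0.0
    force = 0.0
    if r < buck_cut:
        repulsion = A * math.exp(-r / rho)   # the expensive exp() per pair
        r6inv = 1.0 / r ** 6
        energy += repulsion - C * r6inv
        force += repulsion / rho - 6.0 * C * r6inv / r
    if r < coul_cut:
        ecoul = COULOMB_K * qi * qj / r
        energy += ecoul
        force += ecoul / r
    return energy, force


if __name__ == "__main__":
    # The LJ minimum sits at r = 2^(1/6) sigma, where the force vanishes.
    print(lj_cut_pair(2.0 ** (1.0 / 6.0)))
    # Made-up illustrative parameters, not the actual force field.
    print(buck_coul_cut_pair(2.0, A=1000.0, rho=0.3, C=10.0, qi=1.0, qj=-1.2))
```

For silicate/long, the truncated Coulomb term would instead be split into a real-space part and the PPPM k-space part, which is where the 3D FFT comes in.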
Single precision (run time, lower is better):

                8xCPU  C1060  GTX470  GTX480  C2050  C2050ECC
lj-melt           293    114     143     116    131       155
silicate/long     212   63.6    37.2    31.7   38.4      41.4
silicate/cut      580    123    84.2    69.8   88.9      91.5

Double precision (run time, lower is better):

                8xCPU  C1060  GTX470  GTX480  C2050  C2050ECC
lj-melt           293    237     183     152    167       206
silicate/long     212    200    80.9    67.4   80.5      94.0
silicate/cut      580    536     285     221    260       353
As you can see, in the first example the Fermi cards are held back by insufficient texture throughput, so in single precision the C1060 actually beats the Fermi GPUs. In the other examples, which are much less dominated by texture reads, the Fermi GPUs are significantly faster than the C1060. As more or less expected, the GTX470 is about as fast as the C2050, since both have the same number of cores (in the texture-heavy case the C2050 is better; am I remembering correctly that it has one more texture unit than the GTX470?). The GTX480 is roughly 20% faster than the GTX470.
It is interesting that, while the Fermi cards are generally much better than the C1060 in double precision (even more so than in single precision), the C2050 cannot show off its much higher double-precision performance compared to the GeForce GPUs.
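But anyway, it's nice to see that the Fermi GPUs beat a full node of modern Intel CPUs by roughly a factor of 2-3 even in double precision (less for lj-melt, where the GTX480 is about 1.9x faster and the C2050 with ECC about 1.4x).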
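A small script for recomputing the ratios discussed above (GTX480 vs GTX470, and GPU vs the 8-core CPU node) from the two sets of timings. It assumes, as the discussion implies, that the first set of numbers is single precision and the second double precision, and that the values are run times (lower is better); the 8xCPU column is identical in both. The names `SINGLE`, `DOUBLE` and `speedups` are just illustrative.

```python
# Timings transcribed from the benchmark tables (assumed run times, lower is better).
COLUMNS = ["8xCPU", "C1060", "GTX470", "GTX480", "C2050", "C2050ECC"]

SINGLE = {
    "lj-melt":       [293, 114, 143, 116, 131, 155],
    "silicate/long": [212, 63.6, 37.2, 31.7, 38.4, 41.4],
    "silicate/cut":  [580, 123, 84.2, 69.8, 88.9, 91.5],
}

DOUBLE = {
    "lj-melt":       [293, 237, 183, 152, 167, 206],
    "silicate/long": [212, 200, 80.9, 67.4, 80.5, 94.0],
    "silicate/cut":  [580, 536, 285, 221, 260, 353],
}


def speedups(table, reference="8xCPU"):
    """Speedup of every column relative to `reference`: time_ref / time."""
    ref = COLUMNS.index(reference)
    return {
        system: {name: times[ref] / t for name, t in zip(COLUMNS, times)}
        for system, times in table.items()
    }


if __name__ == "__main__":
    for label, table in (("single", SINGLE), ("double", DOUBLE)):
        vs_gtx470 = speedups(table, reference="GTX470")
        vs_cpu = speedups(table)
        for system in table:
            print(f"{label:6} {system:14} "
                  f"GTX480 vs GTX470: {vs_gtx470[system]['GTX480']:.2f}x  "
                  f"GTX480 vs CPU node: {vs_cpu[system]['GTX480']:.2f}x  "
                  f"C2050ECC vs CPU node: {vs_cpu[system]['C2050ECC']:.2f}x")
```

Running it shows the GTX480 between about 1.17x and 1.29x faster than the GTX470, and in double precision between about 1.4x (C2050 with ECC, lj-melt) and 3.1x (GTX480, silicate/long) faster than the CPU node.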
I thought these numbers might be interesting to you.