Hi
I am currently writing my PhD thesis. Part of it is the development of LAMMPScuda, a comprehensive USER-package for the widely used molecular dynamics code LAMMPS. My code already supports many different material classes (metals, granular systems, coarse-grained systems, semiconductors, biomolecules, polymers, inorganic glasses and so on) and runs effectively on GPU clusters with several hundred GPUs.
While I already have a lot of benchmarks in my thesis, I would like to add a graph showing the relative performance of the various available GPUs. I already have results for the GTX280, GTX295, C1060, GTX470 and C2050. I’d like to add all the other GPUs with compute capability (CC) 1.3 or higher (GTX260, GTX275, GTX460, GTX570, GTX580) as well. And that’s where I need your help.
What do you need to help me?
-a Linux machine with a CC1.3 or higher GPU
-g++ and make need to be available (but I guess that’s a given for a reader of the GPU Computing board with a Linux machine)
How does it work?
-download the program from http://code.google.com/p/gpulammps/
-If you have installed CUDA in a folder other than /usr/local/cuda, you need to modify “trunk/src/USER-CUDA/Makefile.common” accordingly.
-do the following procedure:
EDIT: I have added a file src/USER-CUDA/Examples/PHD-Thesis-Tests/run_test which should do everything automatically.
Just go to src/USER-CUDA/Examples/PHD-Thesis-Tests and run “sh run_test”. Otherwise follow the steps below.
(i) go to trunk/src/STUBS and type “make”
(ii) go to trunk/src and type: “make yes-KSPACE” and “make yes-USER-CUDA” (in that order)
(iii) go to trunk/src/USER-CUDA and type “make precision=1”, or “make precision=1 arch=20” if you have a Fermi GPU
(iv) go to trunk/src and type “make serial precision=1”, or “make serial precision=1 arch=20” if you have a Fermi GPU
(v) go to trunk/src/USER-CUDA/Examples/PHD-Thesis-Tests and run the tests with
"../../../lmp_serial < in.melt.cuda > out.melt"
"../../../lmp_serial < in.silicate-buckingham.cuda > out.silicate"
(vi) go to trunk/src/USER-CUDA/Examples/PHD-Thesis-Tests and report back the output of “grep Loop *” here
(vii) delete all *.o files in src/USER-CUDA and all files in src/Obj_serial, then repeat steps (iii) to (vi) with precision=2 instead of precision=1
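For anyone who prefers to see the manual steps in one place, here is a minimal sketch of steps (i) to (vii) as shell functions. It is not the run_test script from the repository; the function names print_steps, print_cleanup and print_all are made up for illustration. The functions only print the commands, so you can read them before running anything, and they assume you start in the directory that contains trunk/.

```shell
#!/bin/sh
# Hypothetical helpers: they PRINT the commands of steps (i)-(vii)
# instead of running them, so the list can be reviewed first.

# print_steps PRECISION [ARCH]
# Steps (iii) to (vi) for one precision setting.
print_steps() {
    step_flags="precision=$1"
    # arch=20 is only wanted on Fermi GPUs, see step (iii)
    if [ -n "$2" ]; then
        step_flags="$step_flags arch=$2"
    fi
    cat <<EOF
(cd trunk/src/USER-CUDA && make $step_flags)
(cd trunk/src && make serial $step_flags)
(cd trunk/src/USER-CUDA/Examples/PHD-Thesis-Tests && ../../../lmp_serial < in.melt.cuda > out.melt && ../../../lmp_serial < in.silicate-buckingham.cuda > out.silicate && grep Loop *)
EOF
}

# Cleanup required by step (vii) before switching precision.
print_cleanup() {
    cat <<'EOF'
rm -f trunk/src/USER-CUDA/*.o
rm -f trunk/src/Obj_serial/*
EOF
}

# print_all [ARCH]
# The whole procedure: single precision first, then double precision.
print_all() {
    all_arch=$1
    echo "(cd trunk/src/STUBS && make)"
    echo "(cd trunk/src && make yes-KSPACE && make yes-USER-CUDA)"
    print_steps 1 "$all_arch"
    print_cleanup
    print_steps 2 "$all_arch"
}
```

To actually run the printed commands, pipe them into a shell: “print_all | sh -e” on a CC1.3 GPU, or “print_all 20 | sh -e” on a Fermi GPU. The “grep Loop *” output appears once per precision, because out.melt and out.silicate are overwritten by the second pass.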
If you have any problems following this procedure, let me know.
If you have any other questions, feel free to ask.
Thanks for helping
Ceearem
Results so far:
Device     melt(single)  melt(double)  silicate(single)  silicate(double)
GTX260     26.6          67.9          50.9              136.8
GTX280     23.8          61.2          44.2              118.3
GTX295     23.5          57.1          44.2              113.8
C1060      26.1          61.2          48.0              122.0
GTS450     36.5          79.3          67.4              143.4
GTX470     14.3          27.4          25.8              58.1
GTX480     11.9          22.6          21.4              44.3
C2050      14.7          23.8          26.0              44.6
C2050ECC   17.6          27.4          32.1              53.0
M2050ECC   17.5          27.0          31.8              52.4
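The relative performance mentioned at the top can be derived from any column of the table above. Here is a minimal sketch, assuming the numbers are the loop times reported by “grep Loop *” (so lower is better) and taking the GTX280 as an arbitrary reference. The function names relative_performance and melt_single are made up for illustration.

```shell
#!/bin/sh
# relative_performance REFERENCE
# Reads "device time" pairs on stdin and prints, for each device,
# t_reference / t_device. Assumption: the values are loop times,
# so a result above 1 means faster than the reference device.
relative_performance() {
    awk -v ref="$1" '
        NF >= 2 { n++; name[n] = $1; t[n] = $2 }
        $1 == ref { tref = $2 }
        END {
            for (i = 1; i <= n; i++)
                printf "%s %.2f\n", name[i], tref / t[i]
        }'
}

# The melt(single) column, copied from the results table.
melt_single() {
    cat <<'EOF'
GTX260 26.6
GTX280 23.8
GTX295 23.5
C1060 26.1
GTS450 36.5
GTX470 14.3
GTX480 11.9
C2050 14.7
C2050ECC 17.6
M2050ECC 17.5
EOF
}
```

Running “melt_single | relative_performance GTX280” prints one line per device, for example “GTX480 2.00”, since 23.8 / 11.9 = 2. Any other device from the column can be passed as the reference instead.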
CPU results (xN is the number of processes run, d indicates a dual-processor board, h indicates use of hyperthreading):
CPU               melt   silicate
i7 950 (x4)       157.5  347.8
i7 950 (x8h)      137.5  305.2
X5550 (x4)        165.7  376.1
X5550 (x8d)       82.2   214.4
AMD 6128 (x4)     270.5  552.4
AMD 6128 (x8)     136.0  309.7
AMD 6128 (x16d)   74.0   182.7