Help with hardware benchmarks for my PhD thesis: I am looking for people who can run my code on other GPUs

Hi

I am currently writing my PhD thesis. Part of it is the development of LAMMPScuda, a comprehensive USER package for the widely used molecular dynamics code LAMMPS. My code already supports many different material classes (metals, granular media, coarse-grained systems, semiconductors, biomolecules, polymers, inorganic glasses and so on) and runs effectively on GPU clusters with several hundred GPUs.

While I already have a lot of benchmarks in my thesis, I would like to add a graph showing the relative performance of the various available GPUs. I already have data for the GTX280, GTX295, C1060, GTX470 and C2050. I’d like to add all the other CC 1.3 or higher GPUs (GTX260, GTX275, GTX460, GTX570, GTX580) as well, and that’s where I need your help.

What do you need to help me?

-a Linux machine with a CC 1.3 or higher GPU

-g++ and make need to be available (but I guess that’s a given for a reader of the GPU Computing board with a Linux machine)

How does it work?

-download the program from http://code.google.com/p/gpulammps/

-If you have installed CUDA in a folder other than /usr/local/cuda, you need to modify “trunk/src/USER-CUDA/Makefile.common” accordingly.

-do the following procedure (a consolidated script sketch appears after step (vii)):

EDIT: I have added a file src/USER-CUDA/Examples/PHD-Thesis-Tests/run_test which should do everything automatically.

Just go to src/USER-CUDA/Examples/PHD-Thesis-Tests/ and run “sh run_test”. Otherwise, follow the steps below.

(i) go to trunk/src/STUBS and type “make”

(ii) go to trunk/src and type: “make yes-KSPACE” and “make yes-USER-CUDA” (in that order)

(iii) go to trunk/src/USER-CUDA and type “make precision=1”, or “make precision=1 arch=20” if you have a Fermi GPU

(iv) go to trunk/src and type “make serial precision=1”, or “make serial precision=1 arch=20” if you have a Fermi GPU

(v) go to trunk/src/USER-CUDA/Examples/PHD-Thesis-Tests and run the tests with

   "../../../lmp_serial < in.melt.cuda > out.melt"

   "../../../lmp_serial < in.silicate-buckingham.cuda > out.silicate"

(vi) go to trunk/src/USER-CUDA/Examples/PHD-Thesis-Tests and report back the output of “grep Loop *” here

(vii) repeat steps (iii) to (vi) with precision=2 instead of precision=1, after deleting all *.o files in src/USER-CUDA and all files in src/Obj_serial
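For reference, here is the whole manual sequence as one script. This is only a sketch of the steps above: it assumes you start in the trunk directory and have a Fermi card (drop arch=20 otherwise), and it gives the output files a precision suffix so that the second pass does not overwrite the first.

#!/bin/sh
# Sketch of steps (i)-(vii); run from the "trunk" directory.
# Drop "arch=20" on non-Fermi (CC 1.3) cards.
set -e

( cd src/STUBS && make )                              # step (i): build the STUBS library
cd src
make yes-KSPACE                                       # step (ii): install KSPACE,
make yes-USER-CUDA                                    #            then USER-CUDA

for prec in 1 2; do                                   # step (vii): repeat in double precision
    rm -f USER-CUDA/*.o Obj_serial/*                  # clean objects before switching precision
    ( cd USER-CUDA && make precision=$prec arch=20 )  # step (iii)
    make serial precision=$prec arch=20               # step (iv)
    cd USER-CUDA/Examples/PHD-Thesis-Tests            # steps (v) and (vi)
    ../../../lmp_serial < in.melt.cuda > out.melt.$prec
    ../../../lmp_serial < in.silicate-buckingham.cuda > out.silicate.$prec
    grep Loop out.*
    cd ../../..
done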

If you have any problems following this procedure let me know.

If you have any other questions feel free to ask.

Thanks for helping

Ceearem

Results so far (loop times in seconds; lower is better):

Device           melt (single)   melt (double)   silicate (single)   silicate (double)
GTX260                26.6            67.9              50.9               136.8
GTX280                23.8            61.2              44.2               118.3
GTX295                23.5            57.1              44.2               113.8
C1060                 26.1            61.2              48.0               122.0
GTS450                36.5            79.3              67.4               143.4
GTX470                14.3            27.4              25.8                58.1
GTX480                11.9            22.6              21.4                44.3
C2050                 14.7            23.8              26.0                44.6
C2050 (ECC on)        17.6            27.4              32.1                53.0
M2050 (ECC on)        17.5            27.0              31.8                52.4

CPU results (loop times in seconds; xN indicates how many processes were run, d indicates a dual-processor board, h indicates use of hyperthreading):

CPU                 melt      silicate
i7 950 (x4)        157.5       347.8
i7 950 (x8h)       137.5       305.2
X5550 (x4)         165.7       376.1
X5550 (x8d)         82.2       214.4
AMD 6128 (x4)      270.5       552.4
AMD 6128 (x8)      136.0       309.7
AMD 6128 (x16d)     74.0       182.7

Is there a command-line switch to select which device to use? I could provide data points for a C2070 (ECC on, no chance to reboot within the next week since other people are using the CPUs in that box) and a low-end GTS 450.

While there is no command-line switch, LAMMPScuda tries to figure out by itself which device to use. More precisely: it generates a list of CUDA devices and sorts it by multiprocessor count, then tries to request devices in that order. If the devices are not in exclusive mode, it will just take the first device in the list; otherwise it will try them all until it finds a free one.

If you need to override that behaviour, you can provide a list of devices to use within the input script. While providing multiple GPUs is only useful if LAMMPScuda is compiled with MPI support, the same mechanism can also be used to specify a single device: just add “gpu/node special 1 ID” as options to the “accelerator cuda” command, where ID is your desired device ID as reported by deviceQuery from the SDK.
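As a sketch (assuming the “accelerator cuda” command starts at the beginning of a line in your input script, and using device ID 1 purely as an example), you could patch an input file like this:

# pin the run to device 1 (ID as reported by deviceQuery)
sed -i 's|^accelerator cuda.*|accelerator cuda gpu/node special 1 1|' in.melt.cuda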

Cheers

Ceearem

P.S. The GTS450 should be able to run the test as well, so it would be nice if you ran the tests twice: one time with

“accelerator cuda gpu/node special 1 0”

and the second time with

“accelerator cuda gpu/node special 1 1”

Just replace the old “accelerator cuda” lines in the two in.* files.
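A possible way to script both runs (again just a sketch, reusing the sed idea from above and assuming the accelerator line starts at the beginning of a line in the two input files):

for dev in 0 1; do
    # rewrite the accelerator line to pin the run to device $dev
    sed -i "s|^accelerator cuda.*|accelerator cuda gpu/node special 1 $dev|" \
        in.melt.cuda in.silicate-buckingham.cuda
    ../../../lmp_serial < in.melt.cuda > out.melt.dev$dev
    ../../../lmp_serial < in.silicate-buckingham.cuda > out.silicate.dev$dev
done
grep Loop out.*.dev*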

P.P.S. It would be nice if you could add information about the clock speeds of consumer cards, since there are a lot of non-reference-design devices out there.

Ah, as you might have noticed, I misunderstood your intentions: you wanted to know right away how to use the GTS450. You will find the answer in the P.S. of my previous post.

Ceearem

The build system in svn trunk I just pulled is broken. Trying to run stage 2 of your instructions gives me this:

~/build/gpulammps-read-only/src$ make yes-USER-CUDA

Installing package USER-CUDA

[: 420: 1: unexpected operator

[: 420: 1: unexpected operator

Further to that, your Makefiles in USER-CUDA needed some modification before I could get the library to build. After I fixed that I couldn’t get the next steps to build, failing with this

pair_morse_coul_long.cpp: In member function ‘virtual void* LAMMPS_NS::PairMorseCoulLong::extract(char*, int&)’:

pair_morse_coul_long.cpp:615: error: ‘strcmp’ was not declared in this scope

make[1]: *** [pair_morse_coul_long.o] Error 1

make[1]: Leaving directory `/home/david/build/gpulammps-read-only/src/Obj_serial'

make: *** [serial] Error 2

so I gave up…

OK, I fixed the issue with the missing reference. Do a “make no-all” before updating from the svn with “svn update” in the src folder.

To install the CUDA package, it should also be possible to go to src/USER-CUDA and run “./Install.sh 1”.
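In commands (a sketch, starting from the src folder; you may also need to reinstall the other packages from step (ii) afterwards):

make no-all                          # uninstall all packages before updating
svn update                           # pull the fixed sources
( cd USER-CUDA && ./Install.sh 1 )   # reinstall USER-CUDA (alternative to "make yes-USER-CUDA")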

Ceearem

This was on an Ubuntu 9.04 64-bit machine (so gcc 4.3.3) with CUDA 3.2.

OK, sorry to all, I forgot one more thing for repeating the test in double precision: you need to delete the *.o files in the USER-CUDA folder and all files in src/Obj_serial/ before recompiling.
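That is, from the trunk directory:

rm -f src/USER-CUDA/*.o   # single-precision object files of the CUDA library
rm -f src/Obj_serial/*    # all files in the serial build directory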

Sorry again
Ceearem

I see we already had some problems with Ubuntu earlier, which were fixed at some point though. Thanks for the info. I have a friend who might be able to reproduce the problem.

Ceearem

On a stock GTX 470 (607 MHz graphics clock, 1674 MHz memory clock, 1215 MHz processor clock):

precision=1

build/gpulammps-read-only/src/USER-CUDA/Examples/PHD-Thesis-Tests/out.melt:Loop time of 681.948 on 1 procs for 2000 steps with 256000 atoms

in.silicate-buckingham.cuda didn’t run for the precision=1 case. I don’t know whether that is expected or not.

precision=2 running now…

I think the USER-CUDA package didn’t install correctly. The time is roughly what a CPU core needs. If the USER-CUDA package is installed, there should be files named *_cuda.cpp and *_cuda.h in the src folder. Maybe you could try “./Install.sh 1” in the USER-CUDA folder and compile again?
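A quick check (a sketch, run from the src folder) would be:

ls *_cuda.cpp *_cuda.h 2>/dev/null | wc -l   # should be well above zero if USER-CUDA is installed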

I have now added a “run_test” script in src/USER-CUDA/Examples/PHD-Thesis-Tests/ which should do everything automatically (at least it worked for me on two different machines with a fresh download).

Btw. I really really appreciate your help.

Thanks

Ceearem

I don’t think I can waste any more time messing around with this, sorry. Perhaps someone more adroit than I can get you some result.

Sure, no problem. I appreciate you giving it a try.

I added results of the GPUs I have access to in the first post.

Ceearem

I cannot find lmp_serial, only lmp_mpi_mpd_cpu and lmp_mpi_mpd_cuda, in the directory gpulammps-read-only/src/USER-CUDA/Examples.

After compiling, lmp_serial should exist in gpulammps-read-only/src

(that’s why there is ../../../ in front of lmp_serial).

Regards

Ceearem

Added CPU results in first post.

Using device 0: GeForce GTX 260

Using device 0: GeForce GTX 260

Using device 0: GeForce GTX 260

Using device 0: GeForce GTX 260

Binary file lmp_serial-d matches

Binary file lmp_serial-s matches

log.lammps:Loop time of 136.785 on 1 procs for 2000 steps with 11664 atoms

out.melt-d:Loop time of 67.9529 on 1 procs for 2000 steps with 256000 atoms

out.melt-s:Loop time of 26.5719 on 1 procs for 2000 steps with 256000 atoms

out.silicate-d:Loop time of 136.785 on 1 procs for 2000 steps with 11664 atoms

out.silicate-s:Loop time of 50.9264 on 1 procs for 2000 steps with 11664 atoms

run_test:grep Loop *

Binary file lmp_serial-d matches

Binary file lmp_serial-s matches

log.lammps:Pair time (%) = 90.7013 (66.3096)

out.melt-d:Pair time (%) = 47.8504 (70.417)

out.melt-s:Pair time (%) = 16.2782 (61.261)

out.silicate-d:Pair time (%) = 90.7013 (66.3096)

out.silicate-s:Pair time (%) = 24.4041 (47.9203)

run_test:grep ‘Pair time’ *

Binary file lmp_serial-d matches

Binary file lmp_serial-s matches

log.lammps:Neigh time (%) = 4.46387 (3.26343)

out.melt-d:Neigh time (%) = 16.3246 (24.0233)

out.melt-s:Neigh time (%) = 7.2138 (27.1483)

out.silicate-d:Neigh time (%) = 4.46387 (3.26343)

out.silicate-s:Neigh time (%) = 1.77486 (3.48514)

run_test:grep ‘Neigh time’ *

Sorry, doesn’t compile for me. It seems to expect nvcc at /usr/local/cuda/bin/nvcc, but on my machines CUDA is installed elsewhere. I unfortunately don’t have time to figure out how your build system works.

Also, I am getting the “unexpected operator” errors avidday reported.

Hi

The install path can be changed in line 17 of src/USER-CUDA/Makefile.common. We were now also able to reproduce the “unexpected operator” behaviour which avidday reported: in a virtual Ubuntu box we saw the same thing and tracked it down to the shell not recognizing ‘==’ as a comparison operator. It expects a single ‘=’, which is now changed in the repository.
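For example, if your toolkit lives in /opt/cuda (just an example path, and assuming line 17 is where the default /usr/local/cuda path appears), a one-liner from the trunk directory should do it:

# replace the default CUDA path on line 17 with your own install location
sed -i '17s|/usr/local/cuda|/opt/cuda|' src/USER-CUDA/Makefile.common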

Thanks to avidday for identifying the problem btw.

There were some other compiler problems in that virtual box, though. We specifically had to fall back to GCC 4.3, but that is probably due to the CUDA version.

Cheers

Ceearem