I tried the simpleMultiGPU sample provided with the CUDA SDK and found that running it on 4 GPUs is slower than running it on only 1 GPU. Does anybody know why?
That project is only designed to show how to use multiple GPUs; it is not designed to be a good benchmark. I explain more in this post (though you should probably scan the whole thread):
The Official NVIDIA Forums | NVIDIA
I have changed DATA_N to 1048576*128, but the GPU times are:
1 GPU: 388 ms
4 GPUs: 1093 ms
Do you have a better multi-GPU example?
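To put the two timings above on a common scale, here is a minimal C++ sketch that turns them into a speedup and a parallel efficiency. The function names (`speedup`, `efficiency`, `report`) are made up for illustration and are not part of the SDK sample.

```cpp
#include <cassert>
#include <cstdio>

// Speedup of the multi-GPU run relative to the single-GPU run.
// A value below 1.0 means the multi-GPU run was slower.
double speedup(double t_single_ms, double t_multi_ms) {
    assert(t_single_ms > 0.0 && t_multi_ms > 0.0);
    return t_single_ms / t_multi_ms;
}

// Parallel efficiency: speedup divided by the GPU count (1.0 is ideal).
double efficiency(double t_single_ms, double t_multi_ms, int num_gpus) {
    assert(num_gpus > 0);
    return speedup(t_single_ms, t_multi_ms) / num_gpus;
}

// Prints a one-line summary for one pair of measurements.
void report(double t_single_ms, double t_multi_ms, int num_gpus) {
    std::printf("%d GPUs: speedup %.2fx, efficiency %.1f%%\n",
                num_gpus,
                speedup(t_single_ms, t_multi_ms),
                100.0 * efficiency(t_single_ms, t_multi_ms, num_gpus));
}
```

Calling `report(388.0, 1093.0, 4)` with the numbers above gives a speedup well below 1, i.e. the 4-GPU run takes almost three times as long as the single-GPU run.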
Huh, there must be a lot of thread creation overhead (are you on Windows, out of curiosity?). I don’t know of any good multi-GPU benchmarks, aside from major applications like HOOMD-Blue:
[url="http://codeblue.umich.edu/hoomd-blue/benchmarks.html"]http://codeblue.umich.edu/hoomd-blue/benchmarks.html[/url]
You can see from the tests that they get everything from no improvement (that’s odd) to double speed with two GPUs.
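One rough way to test the thread-creation guess is to time how long it takes just to create and join a handful of threads that do nothing. The sketch below uses standard C++ threads rather than whatever threading helper the SDK sample uses, and `spawn_join_ms` is a hypothetical name. It only gives a lower bound: it does not include any per-thread CUDA initialization, which may well cost more than the thread creation itself.

```cpp
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Returns the wall-clock time in milliseconds needed to create and then
// join `num_threads` threads that perform no work at all.
double spawn_join_ms(int num_threads) {
    assert(num_threads > 0);
    const auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back([] { /* intentionally empty */ });
    }
    for (auto& t : threads) {
        t.join();
    }

    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

If `spawn_join_ms(4)` comes out tiny compared with the measured gap between the 1-GPU and 4-GPU runs, plain thread creation is probably not the whole story, and per-thread setup work is the next thing to time.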
I am on Windows Vista. I have now tried using a thread pool. The times are:
1 GPU: 388 ms
4 GPUs: 528 ms
That is faster than before, but 4 GPUs are still slower than 1 GPU.
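For anyone reading along, the thread-pool idea is roughly this: create the worker threads once, keep them alive, and hand them jobs, so the per-run cost of creating threads disappears from the timing. The post does not show the actual pool, so the following is only a minimal sketch in standard C++; `WorkerPool`, `submit` and `wait_all` are hypothetical names. In a multi-GPU program one would presumably use one worker per GPU and have each worker select its device once at start-up (for example with `cudaSetDevice`), which is left out here so the sketch stays plain C++.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: workers are created once and then reused.
class WorkerPool {
public:
    explicit WorkerPool(int num_workers) {
        assert(num_workers > 0);
        for (int i = 0; i < num_workers; ++i) {
            workers_.emplace_back([this] { run(); });
        }
    }

    // Lets the workers drain the queue, then joins them.
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (auto& w : workers_) {
            w.join();
        }
    }

    // Queues a job; an idle worker picks it up.
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
            ++pending_;
        }
        wake_.notify_one();
    }

    // Blocks until every job submitted so far has finished.
    void wait_all() {
        std::unique_lock<std::mutex> lock(mutex_);
        done_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    // Worker loop: sleep until there is a job or the pool is shutting down.
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) {
                    return;  // shutting down and nothing left to do
                }
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so workers execute in parallel
            {
                std::lock_guard<std::mutex> lock(mutex_);
                --pending_;
            }
            done_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::condition_variable done_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> workers_;
    int pending_ = 0;
    bool stopping_ = false;
};
```

Typical use would be to construct `WorkerPool pool(4);` before starting the timer, `submit` one job per GPU inside the timed region, and call `pool.wait_all()` before stopping the timer, so that only the work itself is measured.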