simpleMultiGPU processing time slower on dual than single?

I’m going to do some dual-GPU processing, if possible.
So I started out with the simpleMultiGPU example in the SDK and tested it with both one and two cards.

CUDA-capable device count: 1
main(): generating input data…
main(): waiting for GPU results…
GPU Processing time: 75.172005 (ms)
Checking the results…
CPU Processing time: 171.246002 (ms)
GPU sum: 16779778.000000; CPU sum: 16779776.312309
Relative difference: 1.005789E-007

CUDA-capable device count: 2
main(): generating input data…
main(): waiting for GPU results…
GPU Processing time: 155.711899 (ms)
Checking the results…
CPU Processing time: 171.246002 (ms)
GPU sum: 16779776.000000; CPU sum: 16779776.312309
Relative difference: 1.005789E-007

So… it’s actually slower on 2 cards…
The sum is closer to the CPU result on 2 cards, but it takes more than twice as long as on one card.

Can anyone else confirm this… and maybe explain why?
Something about the time spent transferring the data, or…

I’m using the 2 cards on a P35 motherboard… i.e. not an SLI motherboard.
I’m using the new beta drivers…

KingGuru

I believe this program is just an example of how to use more than 1 GPU.

Well… I can follow that… I was just wondering why it was this much slower.

I think what he means is that the example is designed to show how multi-GPU programming works, but the task being performed does not actually get faster with multiple GPUs because of the thread-initialization overhead. I explain more in this post:

http://forums.nvidia.com/index.php?s=&…st&p=455131
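For what it’s worth, the basic pattern that sample demonstrates is roughly the sketch below: one host thread per GPU, where each thread binds to its device, copies its slice of the input over, launches a kernel, and copies the result back. This is not the SDK code itself; it’s a minimal sketch using plain pthreads, and the trivial scaleKernel stands in for the sample’s actual reduction. The point is that the thread creation, per-thread context setup, and the host-to-device / device-to-host copies are fixed per-GPU costs, so on a workload this small they can easily eat up whatever you gain by splitting the kernel work.

```cuda
#include <cuda_runtime.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define N_PER_GPU (1 << 20)
#define MAX_GPUS  8   /* sketch assumes at most 8 devices */

/* Hypothetical kernel standing in for the sample's reduction. */
__global__ void scaleKernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * 2.0f;
}

struct GpuJob {
    int          device;   /* which GPU this host thread drives */
    const float *hostIn;   /* this GPU's slice of the input     */
    float       *hostOut;  /* this GPU's slice of the output    */
    int          n;
};

/* One host thread per GPU: bind device, copy slice over, launch, copy back.
 * All of this setup is per-GPU overhead that the kernel has to amortize. */
static void *gpuWorker(void *arg)
{
    struct GpuJob *job = (struct GpuJob *)arg;
    cudaSetDevice(job->device);

    float *dIn, *dOut;
    size_t bytes = job->n * sizeof(float);
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, job->hostIn, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks  = (job->n + threads - 1) / threads;
    scaleKernel<<<blocks, threads>>>(dIn, dOut, job->n);

    cudaMemcpy(job->hostOut, dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return NULL;
}

int main(void)
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    printf("CUDA-capable device count: %d\n", deviceCount);

    int total = deviceCount * N_PER_GPU;
    float *in  = (float *)malloc(total * sizeof(float));
    float *out = (float *)malloc(total * sizeof(float));
    for (int i = 0; i < total; ++i)
        in[i] = (float)i;

    pthread_t     threads[MAX_GPUS];
    struct GpuJob jobs[MAX_GPUS];
    for (int d = 0; d < deviceCount; ++d) {
        jobs[d].device  = d;
        jobs[d].hostIn  = in  + d * N_PER_GPU;  /* each GPU gets its own slice */
        jobs[d].hostOut = out + d * N_PER_GPU;
        jobs[d].n       = N_PER_GPU;
        pthread_create(&threads[d], NULL, gpuWorker, &jobs[d]);
    }
    for (int d = 0; d < deviceCount; ++d)
        pthread_join(threads[d], NULL);

    free(in);
    free(out);
    return 0;
}
```

With two cards you pay the worker-thread startup, context creation, and the memcpys twice, which is why a tiny benchmark like this can come out slower on two GPUs even though the per-GPU kernel work is halved.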

Well, thanks… that was the kind of thing I was looking for but hadn’t been able to find.

KG