I have a machine with a Quadro FX 5600 and a QuadroPlex Model IV, which gives three CUDA devices in total.
I get different results from bandwidthTest when I run it on each device individually. When I run bandwidthTest with --device=all, the results are not the sum of the individual results.
Instead, the --device=all results appear to be 3x the results for device 0. If I modify the code to run the tests starting from the last device, the results appear to be 3x the results for device 2.
This makes me suspect that the cudaSetDevice calls after the first one have no effect, so every test runs on whichever device was selected first.
Here’s what I get when I run the CUDA 2.0 bandwidthTest individually on each device with the 177.70 driver:
Running on......
device 0:Quadro FX 5600
Quick Mode
Host to Device Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1870.7
Quick Mode
Device to Host Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1109.7
Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 59341.7
&&&& Test PASSED
Press ENTER to exit...
Running on......
device 1:Quadro FX 5600
Quick Mode
Host to Device Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 860.8
Quick Mode
Device to Host Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 940.4
Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 60497.2
&&&& Test PASSED
Press ENTER to exit...
Running on......
device 2:Quadro FX 5600
Quick Mode
Host to Device Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 859.5
Quick Mode
Device to Host Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 940.3
Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 60491.5
&&&& Test PASSED
Press ENTER to exit...
Then, I run it using --device=all:
!!!!!Cumulative Bandwidth to be computed from all the devices !!!!!!
Running on......
device 0:Quadro FX 5600
device 1:Quadro FX 5600
device 2:Quadro FX 5600
Quick Mode
Host to Device Bandwidth for Pageable memory
...
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5547.9
Quick Mode
Device to Host Bandwidth for Pageable memory
...
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 3329.1
Quick Mode
Device to Device Bandwidth
...
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 178438.9
&&&& Test PASSED
Press ENTER to exit...
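These totals do not match the sum of the individual runs. For example, the individual device-to-host results add up to 2990.4 MB/s, but the combined run reports 3329.1 MB/s, which is exactly 3 x 1109.7 MB/s (device 0's individual result).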
I then modified bandwidthTest to iterate from the last device to the first (instead of first to last) and to print the value obtained on each device. Now I get this:
!!!!!Cumulative Bandwidth to be computed from all the devices !!!!!!
Running on......
device 0:Quadro FX 5600
device 1:Quadro FX 5600
device 2:Quadro FX 5600
Quick Mode
Host to Device Bandwidth for Pageable memory
Device 2: 859.498840 MB/s
Device 1: 858.382996 MB/s
Device 0: 858.544067 MB/s
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2576.4
Quick Mode
Device to Host Bandwidth for Pageable memory
Device 2: 940.211365 MB/s
Device 1: 940.203125 MB/s
Device 0: 939.982117 MB/s
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2820.4
Quick Mode
Device to Device Bandwidth
Device 2: 60468.632812 MB/s
Device 1: 61597.687500 MB/s
Device 0: 61579.910156 MB/s
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 183646.2
&&&& Test PASSED
Press ENTER to exit...
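From this, it seems that cudaSetDevice has no effect the second and third times I call it (to switch to device 1 and then device 0). All three readings match device 2's individual results, and the line labelled Device 0 shows about 858.5 MB/s host to device rather than the 1870.7 MB/s that device 0 achieves on its own.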