bandwidthTest bug?

I have a machine with a Quadro FX 5600 and a Quadro Plex Model IV (three CUDA devices in total).

I get different results from bandwidthTest when I run it on each device individually. When I run bandwidthTest with --device=all, the results are not the sum of the individual results.

It looks as if the cudaSetDevice calls after the first one have no effect.

The combined results appear to be 3x the results for device 0. If I modify the code to run the tests starting from the last device, they appear to be 3x the results for device 2. Hence my suspicion that cudaSetDevice does not work properly when it is called more than once.

Which driver are you using?
Please post the actual output that you’re seeing.

Here’s what I get when I run the CUDA 2.0 bandwidthTest individually on each device with the 177.70 driver:

Running on......

      device 0:Quadro FX 5600

Quick Mode

Host to Device Bandwidth for Pageable memory

.

Transfer Size (Bytes)   Bandwidth(MB/s)

 33554432               1870.7

Quick Mode

Device to Host Bandwidth for Pageable memory

.

Transfer Size (Bytes)   Bandwidth(MB/s)

 33554432               1109.7

Quick Mode

Device to Device Bandwidth

.

Transfer Size (Bytes)   Bandwidth(MB/s)

 33554432               59341.7

&&&& Test PASSED

Press ENTER to exit...
Running on......

      device 1:Quadro FX 5600

Quick Mode

Host to Device Bandwidth for Pageable memory

.

Transfer Size (Bytes)   Bandwidth(MB/s)

 33554432               860.8

Quick Mode

Device to Host Bandwidth for Pageable memory

.

Transfer Size (Bytes)   Bandwidth(MB/s)

 33554432               940.4

Quick Mode

Device to Device Bandwidth

.

Transfer Size (Bytes)   Bandwidth(MB/s)

 33554432               60497.2

&&&& Test PASSED

Press ENTER to exit...
Running on......

      device 2:Quadro FX 5600

Quick Mode

Host to Device Bandwidth for Pageable memory

.

Transfer Size (Bytes)   Bandwidth(MB/s)

 33554432               859.5

Quick Mode

Device to Host Bandwidth for Pageable memory

.

Transfer Size (Bytes)   Bandwidth(MB/s)

 33554432               940.3

Quick Mode

Device to Device Bandwidth

.

Transfer Size (Bytes)   Bandwidth(MB/s)

 33554432               60491.5

&&&& Test PASSED

Press ENTER to exit...

Then, I run it using --device=all:

!!!!!Cumulative Bandwidth to be computed from all the devices !!!!!!

Running on......

      device 0:Quadro FX 5600

      device 1:Quadro FX 5600

      device 2:Quadro FX 5600

Quick Mode

Host to Device Bandwidth for Pageable memory

...

Transfer Size (Bytes)   Bandwidth(MB/s)

 33554432               5547.9

Quick Mode

Device to Host Bandwidth for Pageable memory

...

Transfer Size (Bytes)   Bandwidth(MB/s)

 33554432               3329.1

Quick Mode

Device to Device Bandwidth

...

Transfer Size (Bytes)   Bandwidth(MB/s)

 33554432               178438.9

&&&& Test PASSED

Press ENTER to exit...

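This does not match the sum of the individual runs. For host to device, for example, the individual results add up to 3591.0 MB/s (1870.7 + 860.8 + 859.5), but --device=all reports 5547.9 MB/s, which is much closer to three times the device 0 result (5612.1 MB/s).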

I modified bandwidthTest to iterate from the last device to the first instead of first to last, and to print the value obtained from each run. Now I get this:

!!!!!Cumulative Bandwidth to be computed from all the devices !!!!!!

Running on......

      device 0:Quadro FX 5600

      device 1:Quadro FX 5600

      device 2:Quadro FX 5600

Quick Mode

Host to Device Bandwidth for Pageable memory

Device 2: 859.498840 MB/s

Device 1: 858.382996 MB/s

Device 0: 858.544067 MB/s

Transfer Size (Bytes)   Bandwidth(MB/s)

 33554432               2576.4

Quick Mode

Device to Host Bandwidth for Pageable memory

Device 2: 940.211365 MB/s

Device 1: 940.203125 MB/s

Device 0: 939.982117 MB/s

Transfer Size (Bytes)   Bandwidth(MB/s)

 33554432               2820.4

Quick Mode

Device to Device Bandwidth

Device 2: 60468.632812 MB/s

Device 1: 61597.687500 MB/s

Device 0: 61579.910156 MB/s

Transfer Size (Bytes)   Bandwidth(MB/s)

 33554432               183646.2

&&&& Test PASSED

Press ENTER to exit...

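From this it seems that cudaSetDevice has no effect when I call it the second and third time to switch to device 1 and device 0 respectively: the value printed for device 0 here matches the standalone result for device 2, not the standalone result for device 0.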

Note that the lower-than-expected numbers for devices 1 and 2 are a separate issue with the Quadro Plex, which I have reported separately.

For now, I am only worried about bandwidthTest not reporting accurate results when run with --device=all.