I have two machines with CUDA devices. One is running Ubuntu 7.04 with a Tesla S870, and the other is running Ubuntu 8.04 with a Tesla C870. Both are using the final CUDA 2.0 release and show the same device-to-device memory bandwidth numbers for the various driver versions.
Here are the results of bandwidthTest (as best I can remember) with the various drivers:
177.67 - 31 GB/s
177.13 - 52 GB/s
174.?? and below - about 65 GB/s
I could swear I used to get closer to 70 GB/s with CUDA 1.1, but I haven’t verified that recently. Anyway, has anyone else observed this issue? I’m thinking of switching to CentOS, because I get the feeling it’s better tested by NVIDIA.
If anyone else can run bandwidthTest with CUDA 2.0 and recent drivers, I would definitely appreciate the sanity check.
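For anyone who wants to cross-check outside the SDK tool, a measurement along these lines should be in the same ballpark: time a plain cudaMemcpyDeviceToDevice with events and count both the read and the write (which, as far as I recall, is how bandwidthTest reports the device-to-device number). This is only a sketch; the 32 MB buffer matches the quick-mode transfer size, but the iteration count is an arbitrary choice, not taken from the SDK source, and error checking is omitted.

// d2d_bw.cu -- rough sketch of a device-to-device bandwidth measurement
// (32 MB buffers and 20 iterations are arbitrary choices, not the SDK's exact method)
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 32 << 20;   // 32 MB, same transfer size as quick mode
    const int    iters = 20;

    void *src, *dst;
    cudaMalloc(&src, bytes);
    cudaMalloc(&dst, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);   // warm-up copy

    cudaEventRecord(start, 0);
    for (int i = 0; i < iters; ++i)
        cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    // Each copy reads and writes 'bytes', so count the transfer size twice.
    double gbps = (2.0 * bytes * iters) / (ms / 1e3) / 1e9;
    printf("device-to-device: %.1f GB/s\n", gbps);

    cudaFree(src);
    cudaFree(dst);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}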
I just upgraded to the final 2.0 and ran the bandwidth test. My system is running Ubuntu 8.04 with the 2.6.24.19 kernel, and the CUDA display driver version is 177.67. I have an 8800 GTX and an 8600 GT in the 1st and 3rd slots of an ASUS Striker Extreme motherboard.
On the 8800 GTX, I observed that the device-to-device bandwidth is 8–21% slower (from 70 GB/s down to 55–65 GB/s), and the measurement is also less stable than before: it now ranges from 55 to 65 GB/s, whereas it stayed at 70 GB/s with 2.0 beta, and usually only 1 run out of 5 reaches 65 GB/s.
Here are the results of two runs:
./bandwidthTest --device=0 --memory=pinned
Running on......
   device 0: GeForce 8800 GTX
Quick Mode
Host to Device Bandwidth for Pinned memory
Transfer Size (Bytes)   Bandwidth(MB/s)
33554432                3149.4
Quick Mode
Device to Host Bandwidth for Pinned memory
Transfer Size (Bytes)   Bandwidth(MB/s)
33554432                2927.5
Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes)   Bandwidth(MB/s)
33554432                55315.5
&&&& Test PASSED
./bandwidthTest --device=0 --memory=pinned
Running on......
   device 0: GeForce 8800 GTX
Quick Mode
Host to Device Bandwidth for Pinned memory
Transfer Size (Bytes)   Bandwidth(MB/s)
33554432                3156.1
Quick Mode
Device to Host Bandwidth for Pinned memory
Transfer Size (Bytes)   Bandwidth(MB/s)
33554432                2844.4
Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes)   Bandwidth(MB/s)
33554432                65286.1
&&&& Test PASSED
However, on the 8600 GT, which is a G92-based core, the bandwidth is quite stable (I forgot what bandwidth number I got on the 8600 GT before, so I don’t know what percentage was lost here):
Running on......
   device 1: GeForce 8600 GT
Quick Mode
Host to Device Bandwidth for Pinned memory
Transfer Size (Bytes)   Bandwidth(MB/s)
33554432                1738.3
Quick Mode
Device to Host Bandwidth for Pinned memory
Transfer Size (Bytes)   Bandwidth(MB/s)
33554432                1688.6
Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes)   Bandwidth(MB/s)
33554432                15469.0
&&&& Test PASSED
Just wondering whether the GTX 280/GTX 260 (or other CUDA cards with higher device-to-device bandwidth) show the same bandwidth variation as my 8800 GTX… anyone?
EDIT: on second thought, maybe the bandwidth variation is because my 8800 GTX is used as the primary display?
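To put a number on the run-to-run spread, something along these lines times a plain device-to-device copy repeatedly and reports min/max/mean. It is only a sketch: the buffer size, run count, and iteration count are arbitrary choices, not what the SDK tool uses, and error checking is omitted.

// d2d_variation.cu -- sketch to quantify run-to-run spread of device-to-device bandwidth
// (buffer size, run count, and iteration count are arbitrary, not from the SDK tool)
#include <cstdio>
#include <cfloat>
#include <cuda_runtime.h>

// Time 'iters' back-to-back device-to-device copies and return GB/s (read + write counted).
static double measure(void *dst, void *src, size_t bytes, int iters) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int i = 0; i < iters; ++i)
        cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return (2.0 * bytes * iters) / (ms / 1e3) / 1e9;
}

int main() {
    const size_t bytes = 32 << 20;
    const int runs = 20, iters = 10;

    void *src, *dst;
    cudaMalloc(&src, bytes);
    cudaMalloc(&dst, bytes);
    cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);   // warm-up

    double lo = DBL_MAX, hi = 0.0, sum = 0.0;
    for (int r = 0; r < runs; ++r) {
        double gbps = measure(dst, src, bytes, iters);
        if (gbps < lo) lo = gbps;
        if (gbps > hi) hi = gbps;
        sum += gbps;
    }
    printf("min %.1f  max %.1f  mean %.1f GB/s over %d runs\n", lo, hi, sum / runs, runs);

    cudaFree(src);
    cudaFree(dst);
    return 0;
}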
On RHEL 4, I used to get 1.5 GB/s host-to-device speed with 174.55. Now, with 177.67, I’m getting only 730 MB/s!!! When I tried CUDA 2.0 with 177.67, the results were not much different.
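For what it’s worth, a pinned host-to-device check can also be done outside the SDK tool along these lines. Again just a sketch: the 32 MB transfer mirrors the quick-mode output above, the iteration count is arbitrary, and error checking is omitted.

// h2d_pinned.cu -- sketch of a pinned host-to-device bandwidth check
// (32 MB matches the quick-mode transfer size; iteration count is arbitrary)
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 32 << 20;
    const int    iters = 20;

    void *h_buf, *d_buf;
    cudaMallocHost(&h_buf, bytes);   // page-locked (pinned) host memory
    cudaMalloc(&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);   // warm-up

    cudaEventRecord(start, 0);
    for (int i = 0; i < iters; ++i)
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    // Host-to-device only moves the data once, so no factor of two here.
    printf("host-to-device (pinned): %.1f MB/s\n",
           (double)bytes * iters / (ms / 1e3) / 1e6);

    cudaFreeHost(h_buf);
    cudaFree(d_buf);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}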
I also experienced bandwidth variation with my Tesla C870 running Fedora 8. My device-to-device bandwidth with various drivers was:
169.09 with CUDA 1.1 - 64000 MB/s
177.13 with CUDA 2.0 beta2 - 57000 MB/s
177.67 with CUDA 2.0 - 60000 MB/s
In my case, the final release of CUDA 2.0 partially fixed the decrease.
However, I didn’t see any performance change in my memory-bound kernels, so I wonder whether it’s only a change in the way the measurement is done, and not in the actual performance.
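One way to separate a measurement change from a real slowdown is to time a trivial memory-bound kernel and compute its effective bandwidth by hand. A rough sketch is below; the element count, block size, and iteration count are arbitrary choices, and error checking is omitted.

// copy_kernel_bw.cu -- sketch: effective bandwidth of a trivial memory-bound kernel
// (element count, block size, and iteration count are arbitrary choices)
#include <cstdio>
#include <cuda_runtime.h>

__global__ void copyKernel(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];   // one read + one write per element
}

int main() {
    const int    n     = 8 << 20;              // 8M floats = 32 MB per buffer
    const size_t bytes = n * sizeof(float);
    const int    iters = 50;

    float *in, *out;
    cudaMalloc((void**)&in,  bytes);
    cudaMalloc((void**)&out, bytes);

    dim3 block(256);
    dim3 grid((n + block.x - 1) / block.x);

    copyKernel<<<grid, block>>>(in, out, n);   // warm-up launch

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    for (int i = 0; i < iters; ++i)
        copyKernel<<<grid, block>>>(in, out, n);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    // Effective bandwidth: (bytes read + bytes written) / time.
    double gbps = (2.0 * bytes * iters) / (ms / 1e3) / 1e9;
    printf("copy kernel effective bandwidth: %.1f GB/s\n", gbps);

    cudaFree(in);
    cudaFree(out);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}

If that number tracks the bandwidthTest result across driver versions, the regression is real; if the kernel stays flat while the tool's number moves, it is more likely a change in how the tool measures.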
Will this be fixed in future releases, or was it a correctness issue? What I mean is, I would like to be able to write in my final report on CUDA that performance is getting better with every new release ;)