Hi, everyone!
I used CUDA 4.0 to test the CPU->GPU, GPU->CPU, and GPU<->GPU bandwidth in a system with Tesla C2050 GPUs. I ran the SDK sample programs bandwidthTest.cu and simpleP2P.cu. The results show that the bandwidth between the CPU and a GPU is around 2.9 to 3.0 GB/s, while the bandwidth between the two GPUs is only about 2.4 GB/s. Why is the transfer rate between GPUs even lower than between the CPU and a GPU? Can anyone explain that? Has anyone else run these tests and seen a similar result?
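
For anyone who wants to reproduce the GPU<->GPU number outside the SDK sample, here is a minimal sketch of the kind of measurement I mean. It is not the code from simpleP2P.cu. The device IDs (0 and 1), the 64 MB buffer size, the repetition count, and the CHECK macro are my own assumptions for illustration. It also prints whether peer access is available between the two devices; as far as I understand, when it is not, cudaMemcpyPeer stages the copy through host memory, so that may be worth checking when the number looks low.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not from the SDK): abort on any CUDA error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    int deviceCount = 0;
    CHECK(cudaGetDeviceCount(&deviceCount));
    if (deviceCount < 2) {
        printf("This test needs at least two GPUs.\n");
        return 0;
    }

    // Assumed values: devices 0 and 1, 64 MB per copy, 20 timed copies.
    const int src = 0, dst = 1;
    const size_t bytes = (size_t)64 << 20;
    const int reps = 20;

    // Ask the runtime whether each device can address the other's memory.
    int srcToDst = 0, dstToSrc = 0;
    CHECK(cudaDeviceCanAccessPeer(&srcToDst, src, dst));
    CHECK(cudaDeviceCanAccessPeer(&dstToSrc, dst, src));
    const bool p2p = srcToDst && dstToSrc;
    printf("Peer access between GPU %d and GPU %d: %s\n",
           src, dst, p2p ? "available" : "not available");

    // Allocate one buffer per device and enable peer access if possible.
    void *bufSrc = NULL, *bufDst = NULL;
    CHECK(cudaSetDevice(src));
    CHECK(cudaMalloc(&bufSrc, bytes));
    if (p2p) CHECK(cudaDeviceEnablePeerAccess(dst, 0));
    CHECK(cudaSetDevice(dst));
    CHECK(cudaMalloc(&bufDst, bytes));
    if (p2p) CHECK(cudaDeviceEnablePeerAccess(src, 0));

    // Time the copies with events recorded on the source device.
    CHECK(cudaSetDevice(src));
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // One untimed warm-up copy.
    CHECK(cudaMemcpyPeer(bufDst, dst, bufSrc, src, bytes));
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < reps; ++i) {
        CHECK(cudaMemcpyPeer(bufDst, dst, bufSrc, src, bytes));
    }
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));

    // Bandwidth in GB/s, using 1 GB = 1e9 bytes.
    const double gbPerSec = ((double)bytes * reps) / (ms / 1e3) / 1e9;
    printf("GPU %d -> GPU %d: %.2f GB/s\n", src, dst, gbPerSec);

    // Clean up.
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(bufSrc));
    CHECK(cudaSetDevice(dst));
    CHECK(cudaFree(bufDst));
    return 0;
}
```

This should build with something like `nvcc p2p_bw.cu -o p2p_bw` (the file name is arbitrary). The printed figure depends on the buffer size and on how the two cards are attached to the PCIe bus, so it will not necessarily match what the SDK sample reports.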