Global memory to GPU bandwidth: GM to GM bandwidth via GPU core register

Hi all !

I measured around 15 GB/s of bandwidth between global memory (GM) and the GPU on a Tesla C870.
Memory transfers are coalesced.

Does that sound like a good measurement?

The C870 has about 75 GB/s of memory bandwidth, so no, that doesn’t sound like a good measurement. You can use the bandwidth test in the CUDA SDK to get an indicative memory bandwidth number for your card, and you should be able to get pretty close to that number with one of the highly optimized numerical kernels, such as the CUBLAS SGEMM implementation.
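If you want to cross-check your own number, here is a minimal sketch of a global memory to global memory copy that passes each value through a register, timed with CUDA events. The names copyKernel, effectiveBandwidthGBs and measureCopyBandwidthGBs are made up for illustration, and the block size of 256 is an assumption, not a tuned value. Note that each element is read once and written once, so the bytes moved are twice the buffer size; counting only one direction would halve the reported figure.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread loads one float from global memory into a register and stores
// it back to global memory. Consecutive threads touch consecutive addresses,
// so the accesses are coalesced.
__global__ void copyKernel(const float* in, float* out, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) {
        float r = in[i];   // global memory -> register
        out[i] = r;        // register -> global memory
    }
}

// Effective bandwidth in GB/s (1 GB = 1e9 bytes) from bytes moved and
// elapsed time in milliseconds.
double effectiveBandwidthGBs(double bytesMoved, double elapsedMs)
{
    return bytesMoved / (elapsedMs * 1.0e6);
}

// Hypothetical helper: times `reps` launches of copyKernel over n floats.
// Returns the effective bandwidth in GB/s, or a negative value on error.
double measureCopyBandwidthGBs(size_t n, int reps)
{
    float* in = NULL;
    float* out = NULL;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc(&in, bytes) != cudaSuccess) return -1.0;
    if (cudaMalloc(&out, bytes) != cudaSuccess) { cudaFree(in); return -1.0; }
    cudaMemset(in, 0, bytes);

    const int threads = 256;  // assumed block size, not tuned
    const unsigned int blocks = (unsigned int)((n + threads - 1) / threads);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    copyKernel<<<blocks, threads>>>(in, out, n);  // warm-up launch, not timed
    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        copyKernel<<<blocks, threads>>>(in, out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    bool ok = (cudaGetLastError() == cudaSuccess) && ms > 0.0f;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(in);
    cudaFree(out);

    // One read plus one write per element per launch.
    return ok ? effectiveBandwidthGBs(2.0 * (double)bytes * reps, ms) : -1.0;
}
```

Calling measureCopyBandwidthGBs(1 << 24, 20) from main and printing the result gives a figure you can compare against the SDK bandwidth test's device-to-device number. On older cards the grid is limited to 65535 blocks in x, so keep n at or below 65535 * 256 with this launch configuration.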