I recently received a K3000M GPU for an embedded CUDA compute application, and I'm seeing surprisingly low memory bandwidth. In the CUDA 5.5 bandwidthTest sample, the internal device-to-device memory bandwidth measures only 31 GB/sec, whereas this card has an advertised bandwidth of 89.6 GB/sec. On many other cards I'm used to seeing the measured bandwidth approach the advertised figure, so something seems not quite right here.
In nvidia-smi, I noticed that the graphics and memory clocks are 324 MHz and 800 MHz respectively, instead of the maximums of 653 MHz and 1400 MHz. This is a headless application, so I've been unable to change the NVIDIA power management settings to maximum performance to rule that out.
As another data point, I swapped this MXM module for a GT 745M (GDDR5) module in the exact same system and reran the bandwidth measurement. On that card I measure 40 GB/sec, higher than the K3000M's result, so it would seem that nothing in the system itself is limiting device-to-device bandwidth to 31 GB/sec.
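In case it helps reproduce this without the SDK sample installed, the number I'm quoting is the kind of figure you get from timing repeated device-to-device copies. Here is a minimal sketch of such a measurement (buffer size and iteration count are arbitrary choices of mine, and the real bandwidthTest sample differs in its details):

```cuda
// Minimal device-to-device bandwidth check (a sketch, not the official
// bandwidthTest sample): time repeated DtoD cudaMemcpy calls with CUDA
// events and report GB/s.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 << 20;  // 64 MiB per buffer (arbitrary)
    const int iters = 100;          // arbitrary repetition count
    void *src = nullptr, *dst = nullptr;
    cudaMalloc(&src, bytes);
    cudaMalloc(&dst, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i)
        cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    // Each DtoD copy both reads and writes the buffer, so the traffic is
    // counted twice, matching how bandwidthTest reports DtoD numbers.
    double gbps = 2.0 * bytes * iters / (ms / 1000.0) / 1e9;
    printf("Device-to-device bandwidth: %.1f GB/s\n", gbps);

    cudaFree(src);
    cudaFree(dst);
    return 0;
}
```

On a card idling at reduced clocks, a sketch like this should report a number well below the advertised peak, which is consistent with what I'm seeing.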
nvidia-bug-report.log.gz (76.3 KB)