I ran samples/1_Utilities/bandwidthTest/bandwidthTest on a machine with a single 32GB Quadro GV100 installed, and the result is below:
[CUDA Bandwidth Test] - Starting…
Running on…
Device 0: Quadro GV100
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 12.3
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 13.2
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 539.0
Result = PASS
I don’t understand why the device-to-device memory bandwidth is so low compared to what the spec sheet says (around 900GB/s).
I see similarly low memory bandwidth in other programs too, so I guess the problem is not with the sample benchmark.
I also fixed the ‘Graphics’ and ‘Memory’ clocks to 1132MHz and 850MHz, which I confirmed under ‘Clocks’ in nvidia-smi -q.
nvidia-smi -q also reports a link width of 16x and PCIe generation 3.
The driver version is 455.23.05, tested under Ubuntu 18.04.5 LTS with Linux kernel 5.4.0-48.
How can I deal with this problem? Is this normal behavior?
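As a rough sanity check on the host<->device numbers above, here is a minimal Python sketch of the theoretical one-direction bandwidth of a PCIe 3.0 x16 link. It assumes the standard PCIe 3.0 figures (8 GT/s per lane, 128b/130b encoding) and 1 GB = 1e9 bytes; the function name `pcie_peak_gbs` is made up for illustration, and packet/protocol overhead is not modeled.

```python
def pcie_peak_gbs(gt_per_s, lanes, encoding=128 / 130):
    """Theoretical one-direction PCIe bandwidth in GB/s (1 GB = 1e9 bytes).

    gt_per_s: per-lane transfer rate in GT/s (8.0 for PCIe 3.0)
    lanes:    link width (16 for an x16 slot)
    encoding: line-encoding efficiency (128b/130b for PCIe 3.0)
    """
    payload_bits_per_s = gt_per_s * 1e9 * lanes * encoding
    return payload_bits_per_s / 8 / 1e9


if __name__ == "__main__":
    peak = pcie_peak_gbs(8.0, 16)
    # Measured values are the PINNED results from the bandwidthTest output above.
    for label, measured in (("Host to Device", 12.3), ("Device to Host", 13.2)):
        share = 100 * measured / peak
        print(f"{label}: {measured} GB/s = {share:.0f}% of {peak:.2f} GB/s peak")
```

The link peak works out to about 15.75 GB/s, so the 12.3 and 13.2 GB/s results look consistent with a Gen3 x16 link; the open question is only the device-to-device number.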
@Robert_Crovella I still don’t understand. I have a Tesla V100 PCIe 16GB, which has only a slightly higher clock and the same memory bus width (4096-bit) as the GV100, and it clearly reaches 700+GB/s in both Nsight Compute and bandwidthTest.
@rs277 If 900GB/s is the bandwidth of the memory subsystem (cache<=>global memory), why does the other Tesla V100 give so much higher performance on the same program? Could you share a possible clue?
I’m not sure I know what the reasons are, and I don’t happen to have a GV100 to play with.
A GV100 is a display-capable GPU, whereas most other V100 variants I am aware of are not (excepting Titan V). If you are running a display on this GPU or have the system configured to use this GPU as part of X, then I think it’s possible that the display activities might be consuming memory bandwidth on a continuous basis. If I were doing a comparison I would disable X and if possible move the console to another display device. But I don’t know if that accounts for the difference or not.
Apologies, I read too quickly and failed to notice that the last test was device<->device. I can’t offer anything beyond what you have already checked and what Robert has suggested.
NVIDIA doesn’t provide any way for you to do something like change the memory refresh settings on a GPU.
Every GPU has a gap between the measurable/achievable bandwidth and the stated (“peak theoretical”) bandwidth. This gap varies from one GPU type/design to the next. I’m reasonably sure the 870GB/s number (or whatever the stated number for the GV100 is) refers to peak theoretical bandwidth, and that is never achievable on any GPU.
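To make the “peak theoretical” figure concrete, here is a minimal Python sketch of the usual arithmetic: memory clock times transfers per clock times bus width in bytes. It uses the 850MHz memory clock and the 4096-bit bus width quoted earlier in this thread. The factor of 2 (double data rate) is an assumption about how the reported memory clock relates to the data rate, and the function name is made up for illustration.

```python
def peak_mem_bandwidth_gbs(mem_clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Peak theoretical memory bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock=2 assumes double-data-rate memory (an assumption).
    """
    bytes_per_transfer = bus_width_bits / 8
    return mem_clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


if __name__ == "__main__":
    peak = peak_mem_bandwidth_gbs(850, 4096)  # clock and bus width from this thread
    measured = 539.0                          # device-to-device result posted above
    print(f"peak theoretical: {peak:.1f} GB/s")
    print(f"measured: {measured:.1f} GB/s = {100 * measured / peak:.1f}% of peak")
```

Under those assumptions the peak comes out at 870.4 GB/s, so the posted 539.0 GB/s is roughly 62% of peak. That ratio is the number to compare across GPUs, rather than comparing the measured value against the spec sheet directly.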
I’m not aware of any Quadro GPU that was released prior to the GV100 that had that level of memory bandwidth. Quadro P6000 would have been the previous “high-end” Quadro, and it has a lower memory bandwidth than the GV100.