Low host-device bandwidth for GTX 1080 Ti

Hi,

I am using a GTX 1080 Ti and ran the bandwidthTest sample from the CUDA samples.
Here is the output; the host/device bandwidth values seem unusually low:

[CUDA Bandwidth Test] - Starting...
Running on...

Device 0: GeForce GTX 1080 Ti
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     1.5

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     1.6

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     350.3

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

I also saw this post: https://devtalk.nvidia.com/default/topic/851390/k80-bandwidth-test/
and decided to try calculating the theoretical bandwidth:
1582 MHz * 2 (DDR) * 352 bits / 8 bits per byte ≈ 140 GB/s
That doesn't match either the bandwidth on the product page or the device-to-device number above:
https://www.nvidia.com/en-us/geforce/products/10series/geforce-gtx-1080-ti/

Can someone clarify?
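Regarding the theoretical-bandwidth arithmetic first: if I'm reading the specs right, 1582 MHz is the 1080 Ti's boost core clock, not its memory clock. The card uses GDDR5X at an effective data rate of 11 Gbps per pin, so the peak memory bandwidth works out to:

11 Gbps/pin * 352 pins / 8 bits per byte = 484 GB/s

which matches the product page. The ~350 GB/s device-to-device result from bandwidthTest is a normal achieved fraction of that peak. None of this explains the 1.5 GB/s host/device numbers, though; those transfers go over PCIe, not the memory bus.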

The most likely cause of this extremely low host/device throughput is that the GPU is plugged into the wrong PCIe slot. It should go into a PCIe gen3 x16-capable slot, which should give a pinned transfer rate of 12+ GB/s.
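For reference, PCIe gen3 signals at 8 GT/s per lane with 128b/130b encoding, so the theoretical one-way rate of an x16 link is

8 GT/s * 16 lanes * 128/130 / 8 bits per byte ≈ 15.8 GB/s

of which roughly 12-13 GB/s is achievable in practice with pinned transfers. For comparison, here is bandwidthTest output from a Quadro P2000 in a gen3 x16 slot: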

[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: Quadro P2000
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     12327.1

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     12364.3

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     119536.8

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
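If you want a standalone sanity check outside the samples, a minimal sketch along these lines (error checking omitted; transfer size chosen to match the 1080 Ti runs above) times a pinned host-to-device copy:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 32 * 1000 * 1000;   // 32,000,000 bytes, same transfer size as above
    const int reps = 100;

    char *hbuf = nullptr, *dbuf = nullptr;
    cudaMallocHost(&hbuf, bytes);            // pinned host memory; pageable memory is much slower
    cudaMalloc(&dbuf, bytes);

    cudaMemcpy(dbuf, hbuf, bytes, cudaMemcpyHostToDevice);   // warm-up copy

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(dbuf, hbuf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);   // elapsed time over all reps, in milliseconds
    printf("Host to Device: %.1f GB/s\n",
           (double)bytes * reps / (ms / 1000.0) / 1e9);

    cudaFreeHost(hbuf);
    cudaFree(dbuf);
    return 0;
}

Build it with something like "nvcc -O2 bwcheck.cu -o bwcheck". On a gen3 x16 link it should print something close to the bandwidthTest numbers; on a misconfigured link it will be proportionally lower.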

Check your PCIe link configuration by looking at the output of nvidia-smi -q:

GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 3         <--------------
            Link Width
                Max                 : 16x       
                Current             : 16x       <--------------

If you check this while bandwidthTest or some other CUDA program that performs frequent host/device transfers is running (I took the snapshot above with Folding@Home running), the "Current" entries should show generation 3 and link width 16x. You can also use third-party software such as TechPowerUp's GPU-Z to monitor the link configuration.
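nvidia-smi also has a scriptable query form; if I remember the field names correctly, something like

nvidia-smi --query-gpu=pcie.link.gen.current,pcie.link.gen.max,pcie.link.width.current,pcie.link.width.max --format=csv

prints just the link state, which is convenient for polling while a transfer is running.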

Yep, the GPU was on the slow slot.

    GPU Link Info
        PCIe Generation
            Max                 : 3
            Current             : 3
        Link Width
            Max                 : 16x
            Current             : 2x

Moved the card to the x16 slot (usually the top-most slot on the motherboard) and now get 12+ GB/s:

[CUDA Bandwidth Test] - Starting...
Running on...

Device 0: GeForce GTX 1080 Ti
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     12.6

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     13.2

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     351.0

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.