Hey,
I'm getting some weird results from the bandwidth test. I ran it after I saw that the bandwidth in my own CUDA program was crappy: about 7.5 MB/s without pinned memory. In the bandwidth test I get about 400 MB/s (PINNED), so still no good. Results of bandwidthTest from the CUDA samples:
Device 0: GeForce GTX 1080
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
  Transfer Size (Bytes)    Bandwidth(MB/s)
  33554432                 380.1

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
  Transfer Size (Bytes)    Bandwidth(MB/s)
  33554432                 417.1

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
  Transfer Size (Bytes)    Bandwidth(MB/s)
  33554432                 251427.2

My best guess would be that the GPU is plugged into the wrong PCIe slot, but the measured data looks way too low even for a PCIe x4 slot (the GPU should be in a x16 slot).
Is it possible some other program is hammering the system memory, thus causing stalls as the DMA controller is trying to read from / write to the system memory?
What kind of system is this? The GTX 1080 presumably has an extra PCIe power connector (somewhere around the top edge of the card), is that plugged in? What is the output of nvidia-smi -q?
Shame on me. I figured out that the card was running at x8, not x16. But it still seems a bit too slow, doesn’t it?
Device 0: GeForce GTX 1080
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
  Transfer Size (Bytes)    Bandwidth(MB/s)
  33554432                 6574.9

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
  Transfer Size (Bytes)    Bandwidth(MB/s)
  33554432                 6573.0

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
  Transfer Size (Bytes)    Bandwidth(MB/s)
  33554432                 252131.0

@njuffa I don’t have any programs running that I would suspect of hammering the system memory.
The power connector is plugged in.
The host/device and device/host transfer rates shown are about half of what is expected for a PCIe gen3 x16 link (which is what the GTX 1080 has). You should see around 12 GB/sec in each direction.
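For anyone who wants to sanity-check numbers like these, here is a rough Python sketch of the theoretical per-direction peak for a few PCIe link configurations. It uses the published per-lane signalling rates and line encodings (8b/10b for gen1/gen2, 128b/130b for gen3). The function name `pcie_peak_mb_per_s` is just an illustrative helper, not part of any tool. Real transfers lose some of this to packet and protocol overhead, which is presumably why roughly 12 GB/s rather than the raw 15.75 GB/s is the practical expectation for gen3 x16.

```python
# Per-lane signalling rate in GT/s and line-encoding efficiency for each
# PCIe generation (gen1/gen2 use 8b/10b, gen3 uses 128b/130b).
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
}


def pcie_peak_mb_per_s(gen, lanes):
    """Theoretical peak bandwidth per direction in MB/s (1 MB = 10**6 bytes).

    This ignores packet/protocol overhead, so measured rates will be lower.
    """
    rate_gt_s, efficiency = PCIE_GENERATIONS[gen]
    # GT/s * efficiency = usable Gbit/s per lane; / 8 = GB/s; * 1000 = MB/s.
    per_lane_mb_s = rate_gt_s * efficiency / 8 * 1000
    return per_lane_mb_s * lanes


if __name__ == "__main__":
    for gen, lanes in [(2, 8), (2, 16), (3, 8), (3, 16)]:
        peak = pcie_peak_mb_per_s(gen, lanes)
        print(f"gen{gen} x{lanes}: {peak:8.1f} MB/s theoretical peak")
```

Comparing a measured bandwidthTest figure against these ceilings gives a quick hint about which generation and width the link is actually running at.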
Thanks a lot for your answers!
There are no other slots in use at the moment. One thing I see that might limit the bandwidth is the DDR3-1333 RAM. It has only 10.6 GB/s per channel. Does the data transfer run on a single channel, or can it use dual channel at that point?
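The 10.6 GB/s figure can be reproduced with a little arithmetic. The sketch below assumes the usual 64-bit (8-byte) data bus per DDR3 channel and a nominal 1333 MT/s transfer rate; `ddr_peak_mb_per_s` is a made-up helper name for illustration. Whether a given DMA transfer actually spreads across both channels depends on how the memory controller interleaves addresses, which this calculation does not model.

```python
def ddr_peak_mb_per_s(transfers_mt_s, channels=1, bus_width_bytes=8):
    """Theoretical peak DRAM bandwidth in MB/s (1 MB = 10**6 bytes).

    transfers_mt_s:   nominal transfer rate in MT/s (1333 for DDR3-1333)
    channels:         number of populated memory channels
    bus_width_bytes:  data bus width per channel (assumed 64 bits = 8 bytes)
    """
    return transfers_mt_s * bus_width_bytes * channels


if __name__ == "__main__":
    single = ddr_peak_mb_per_s(1333, channels=1)
    dual = ddr_peak_mb_per_s(1333, channels=2)
    print(f"DDR3-1333 single channel: {single} MB/s")
    print(f"DDR3-1333 dual channel:   {dual} MB/s")
```

Even the single-channel figure is above the roughly 6.5 GB/s that bandwidthTest reported, so on paper the RAM alone would not explain the result, although other traffic on the memory bus reduces what is available in practice.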
Is it possible to find out in which mode the slot is running?
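One way to check this is through nvidia-smi, which can report the current and maximum PCIe link generation and width. Below is a minimal Python sketch that runs the query and parses the CSV output. It assumes nvidia-smi is on the PATH and that the fields come back as plain integers; the helper names `parse_link_info` and `query_link_info` are invented for this example. Note that the current generation can read lower than the maximum while the GPU is idle, so it is best checked while a transfer is running.

```python
import subprocess

# Query fields understood by `nvidia-smi --query-gpu=...`.
QUERY_FIELDS = (
    "pcie.link.gen.current,pcie.link.gen.max,"
    "pcie.link.width.current,pcie.link.width.max"
)
KEYS = ("gen_current", "gen_max", "width_current", "width_max")


def parse_link_info(csv_text):
    """Parse 'csv,noheader' output into one dict per GPU.

    Assumes every field is an integer; a value such as '[N/A]' would
    raise ValueError here.
    """
    gpus = []
    for line in csv_text.strip().splitlines():
        values = [int(field.strip()) for field in line.split(",")]
        gpus.append(dict(zip(KEYS, values)))
    return gpus


def query_link_info():
    """Run nvidia-smi and return the parsed link info for all GPUs."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_link_info(result.stdout)


if __name__ == "__main__":
    for index, gpu in enumerate(query_link_info()):
        print(
            f"GPU {index}: gen{gpu['gen_current']} x{gpu['width_current']} "
            f"(max gen{gpu['gen_max']} x{gpu['width_max']})"
        )
```

The same information also appears in the PCI section of the `nvidia-smi -q` output.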
I found out that the i7 2600 doesn’t support PCIe 3.0, so that seems to be the explanation. Since I get PCIe 2.0 bandwidth when I need it, everything is working as it should.