Weird bandwidth issues

Hey,
I'm getting some weird results from the bandwidth test. I ran it after I noticed that the bandwidth in my own CUDA program was very poor: about 7.5 MB/s without pinned memory. The bandwidth test gives me about 400 MB/s (PINNED), which is still not a good result. Here is the output of bandwidthTest from the CUDA samples:

Device 0: GeForce GTX 1080
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 380.1

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 417.1

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 251427.2

Result = PASS

I would be thankful for any ideas you guys have.

Thanks in advance.
Eric

My best guess would be that the GPU is plugged into the wrong PCIe slot, but the measured data looks way too low even for a PCIe x4 slot (the GPU should be in a x16 slot).

Is it possible some other program is hammering the system memory, thus causing stalls as the DMA controller is trying to read from / write to the system memory?

What kind of system is this? The GTX 1080 presumably has an extra PCIe power connector (somewhere around the top edge of the card), is that plugged in? What is the output of nvidia-smi -q?

Shame on me. I figured out that it was running at x8, not x16. But I guess it's still a bit too slow, isn't it?
Device 0: GeForce GTX 1080
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 6574.9

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 6573.0

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 252131.0

@njuffa I don’t have any programs running that I would suspect of hammering the system memory.
Power connector is plugged in.

Processor: Intel Core i7-2600 CPU @ 3.40GHz, 3800 MHz
Mainboard: GA-Z68A-D3H-B3
RAM: 12 GB DDR3-1333

The host/device and device/host transfer rates shown are about half of what is expected for a PCIe gen3 x16 link (which is what the GTX 1080 has). You should see around 12 GB/sec in each direction.
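For reference, the theoretical per-direction peak of a PCIe link can be worked out from the per-lane signaling rate and the line encoding. Here is a rough sketch in Python. The function name `pcie_peak_mb_s` is made up for illustration; the rates and encodings are the standard figures for PCIe gen1 to gen3, and real transfers land noticeably below these peaks because of packet overhead.

```python
# Per-lane signaling rate in GT/s and line-encoding efficiency
# for PCIe generations 1 to 3.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def pcie_peak_mb_s(gen: int, lanes: int) -> float:
    """Theoretical peak bandwidth in MB/s, one direction (1 MB = 10^6 bytes)."""
    gigatransfers, efficiency = PCIE_GENERATIONS[gen]
    bits_per_second = gigatransfers * 1e9 * efficiency * lanes
    return bits_per_second / 8 / 1e6


if __name__ == "__main__":
    for gen in (1, 2, 3):
        for lanes in (8, 16):
            print(f"gen{gen} x{lanes}: {pcie_peak_mb_s(gen, lanes):8.0f} MB/s")
```

With these numbers, gen2 x16 comes out at 8000 MB/s and gen3 x8 at roughly 7877 MB/s. A measured rate of about 6.5 GB/s fits under either ceiling, so the benchmark result alone cannot tell those two configurations apart.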

The x16 slot on this motherboard can still operate in x8 mode if multiple slots are occupied.

Thanks a lot for your answers!
There are no other slots in use at the moment. One thing that might limit the bandwidth is the DDR3-1333 RAM, which delivers only 10.6 GB/s per channel. Does the data transfer run over a single channel, or can it use dual channel?
Is it possible to find out in which mode the slot is running?
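On the DDR3-1333 figure: it follows from the transfer rate multiplied by the channel width. A quick sketch of the arithmetic is below. The helper name `ddr_peak_gb_s` is illustrative, and it assumes the nominal 1333 MT/s rate and a standard 64-bit (8-byte) channel.

```python
def ddr_peak_gb_s(mega_transfers_per_s: float, channels: int = 1,
                  bus_width_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_second = mega_transfers_per_s * 1e6 * bus_width_bytes * channels
    return bytes_per_second / 1e9


if __name__ == "__main__":
    print(f"single channel: {ddr_peak_gb_s(1333):.2f} GB/s")
    print(f"dual channel:   {ddr_peak_gb_s(1333, channels=2):.2f} GB/s")
```

Even a single channel's theoretical peak sits above the roughly 6.5 GB/s measured. DMA traffic shares that bandwidth with everything else on the system, though, so this arithmetic alone does not rule the memory in or out as a factor.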

“Is it possible to find out in which mode the slot is running?” - Yes, post the output of “nvidia-smi -q” that njuffa has been asking for.

Sorry, I forgot to run that.
==============NVSMI LOG==============

Timestamp : Thu Dec 01 17:12:49 2016
Driver Version : 369.30

Attached GPUs : 1
GPU 0000:01:00.0
Product Name : GeForce GTX 1080
Product Brand : GeForce
Display Mode : Enabled
Display Active : Enabled
Persistence Mode : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 1920
Driver Model
Current : WDDM
Pending : WDDM
Serial Number : N/A
GPU UUID : GPU-33ee6710-2e31-84a9-d333-13a6b62a50bc
Minor Number : N/A
VBIOS Version : 86.04.17.00.BA
MultiGPU Board : No
Board ID : 0x100
GPU Part Number : N/A
Inforom Version
Image Version : G001.0000.01.03
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization mode : None
PCI
Bus : 0x01
Device : 0x00
Domain : 0x0000
Device Id : 0x1B8010DE
Bus Id : 0000:01:00.0
Sub System Id : 0x042619DA
GPU Link Info
PCIe Generation
Max : 2
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays since reset : 0
Tx Throughput : 4000 KB/s
Rx Throughput : 8000 KB/s
Fan Speed : 0 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
Sync Boost : Not Active
Unknown : Not Active
FB Memory Usage
Total : 8192 MiB
Used : 431 MiB
Free : 7761 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 5 MiB
Free : 251 MiB
Compute Mode : Default
Utilization
Gpu : 4 %
Memory : 7 %
Encoder : 0 %
Decoder : 0 %
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending : N/A
Temperature
GPU Current Temp : 46 C
GPU Shutdown Temp : 99 C
GPU Slowdown Temp : 96 C
Power Readings
Power Management : Supported
Power Draw : 15.80 W
Power Limit : 270.00 W
Default Power Limit : 270.00 W
Enforced Power Limit : 270.00 W
Min Power Limit : 135.00 W
Max Power Limit : 326.00 W
Clocks
Graphics : 316 MHz
SM : 316 MHz
Memory : 405 MHz
Video : 544 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 2075 MHz
SM : 2075 MHz
Memory : 5405 MHz
Video : 1708 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None

It seems to be running at PCIe 1.0, with a max of 2.0. But my board should be able to support 3.0.
Mainboard: GA-Z68A-D3H-B3
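Picking those four values out of a long nvidia-smi -q dump by eye is error-prone, so here is a small sketch that extracts them from the saved text. The function name `pcie_link_info` is hypothetical, and it assumes the layout shown in the log above: a "PCIe Generation" and a "Link Width" header, each followed by a `Max` and a `Current` line. Other driver versions may format this section differently.

```python
import re


def pcie_link_info(log_text: str) -> dict:
    """Extract max/current PCIe generation and link width from `nvidia-smi -q` text."""
    info = {}
    section = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if line in ("PCIe Generation", "Link Width"):
            section = line
            info[section] = {}
            continue
        match = re.fullmatch(r"(Max|Current)\s*:\s*(\S+)", line)
        if section and match:
            info[section][match.group(1)] = match.group(2)
            if len(info[section]) == 2:
                section = None  # both values found; stop reading this section
    return info


# Excerpt copied from the log in this thread.
SAMPLE = """\
GPU Link Info
PCIe Generation
Max : 2
Current : 1
Link Width
Max : 16x
Current : 16x
"""

if __name__ == "__main__":
    print(pcie_link_info(SAMPLE))
```

On drivers that support the query interface, `nvidia-smi --query-gpu=pcie.link.gen.current,pcie.link.gen.max,pcie.link.width.current,pcie.link.width.max --format=csv` should report the same fields directly. Keep in mind that the current generation is typically read while the GPU is idle (performance state P8 in this log) and may be lower than the value seen during a transfer.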

I found out that the i7-2600 doesn’t support PCIe 3.0, so this seems to be the issue. Since I get PCIe 2.0 bandwidth when I need it, everything is working as it should.

Thanks a lot for your help.