Large fluctuations in PCIe bandwidth from GTX 750 Ti?

Hello, I’m trying to extract maximum performance from code that unpacks 12-bit images from main memory into GPU memory, but I’m seeing large fluctuations in PCIe bandwidth, which makes it hard to judge the impact of my changes.

I’m using a GTX 750Ti (connected via PCIe 2.0 x16, no monitor attached) and Windows 7. My workstation also has a Quadro 1800 driving 2 monitors. I’ve even tried turning off PCIe link power management in Windows.

Here’s output from bandwidthTest:
--------------------------trial 0-----------------------------
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 6017.3

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5845.9
--------------------------trial 1------------------------------
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 4562.9

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 3986.9
-------------------------trial 5---------------------------------
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 160000000.0

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5897.4

From trial 5, it seems there’s a bug in the driver causing cudaEventRecord() to record the wrong timestamps. Has anyone else experienced this?
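Until the root cause is known, one way to keep comparing code changes is to summarise several trials and throw out the impossible ones. Here is a minimal sketch in Python, not part of bandwidthTest: `robust_bandwidth` and its `ceiling_mb_s` parameter are hypothetical names, and the 8000 MB/s ceiling assumes the nominal PCIe 2.0 x16 peak in one direction.

```python
from statistics import median


def robust_bandwidth(trials_mb_s, ceiling_mb_s):
    """Return (median of plausible trials, list of rejected trials).

    ceiling_mb_s is an assumed upper bound for the link; any value above
    it, or not strictly positive, is treated as a timing glitch rather
    than a real measurement.
    """
    plausible = [t for t in trials_mb_s if 0.0 < t <= ceiling_mb_s]
    rejected = [t for t in trials_mb_s if not (0.0 < t <= ceiling_mb_s)]
    if not plausible:
        raise ValueError("no plausible trials to summarise")
    return median(plausible), rejected


# Host-to-device figures from trials 0, 1 and 5 above.
h2d = [6017.3, 4562.9, 160000000.0]
typical, glitches = robust_bandwidth(h2d, ceiling_mb_s=8000.0)
print(typical, glitches)
```

This only hides the bogus reading from the summary; it does nothing about the real spread between roughly 4500 and 6000 MB/s, so more trials per configuration are still needed.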

I have a 750 Ti running headless in a PCIe 2.0 x16 slot. The Intel IGP is driving the monitor.

I just ran bandwidthTest with pinned memory repeatedly and was about to type “It works fine here!” but … it failed after ~6 runs.

Looks like it may be time to file a bug. Is either of these cards a factory overclocked model? As best I can tell, the base clock for a reference GTX 750 Ti is 1020 MHz. Please mention in the bug report if the GPU is factory overclocked.

Mine is an EVGA 750 Ti SC. It has been rock solid stable under heavy load although not much D<>H copying is going on.

I’ll let @UncleJoe have the honor of filing any bug.

“SC” = superclocked = factory overclocked? The reason I suggested noting that in a bug report is that some issues may only be reproducible with very specific configurations. Stating the exact full name of the GPU should work equally well, of course. As far as I understand, some consumer GPUs come in both reference clock and factory overclock variants, and for a repro one would want to know which variant it is.

Tried the same bandwidthTest on my EVGA GeForce GTX 750 Ti FTW ACX (P/N: 02G-P4-3757-KR) and could not replicate it under CUDA 5.5.20 or 6.0.26 SDKs. However, I do have a display connected to the card ATM, unlike allanmac & Uncle Joe.

Running driver version 335.23, Windows 7 x64.

Edit: Connected a Zotac GT 630 driving the display and ran bandwidthTest again… sure enough I can reproduce it also.

If I switch the monitor back to the 750 Ti, I can no longer reproduce it. So it seems to be related to headless configurations, at least on Windows 7.

Great, I’m not alone. I’ll file the bug and will mention that you guys were able to reproduce it too.
I’ll also mention that this seems to happen when the GTX 750 Ti doesn’t have a monitor attached, but another NVIDIA GPU does.

I’m also using the EVGA 750 Ti SC.

I can’t reproduce it either with my overclocked GTX 750 Ti (PCIe gen2 x16) under Linux, which also has a display attached. I ran the host-to-device/device-to-host test 100 times in succession and every result was good.

Linux Version: 3.13.7-200.fc20.x86_64
NVIDIA Driver Version: 334.21
Graphics Clock: 1058 Mhz
Memory Transfer Rate: 5400 Mhz
Current PCIe Link Width: x16
Current PCIe Link Speed: 5.0 GT/s
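
As a reference point for the numbers in this thread, the link settings above imply a nominal ceiling that can be worked out by hand. A small sketch of that arithmetic, assuming the 8b/10b line encoding used by PCIe 2.0 and MB meaning 10^6 bytes; `pcie_peak_mb_s` is a hypothetical helper, not something from the CUDA samples.

```python
def pcie_peak_mb_s(gt_per_s, lanes, payload_bits=8, line_bits=10):
    """Nominal one-direction PCIe throughput in MB/s (10**6 bytes/s).

    The defaults model 8b/10b encoding, where every 10 bits on the wire
    carry 8 bits of data. Packet and protocol overhead is ignored, so
    real transfers will always come in below this figure.
    """
    data_bits_per_s = gt_per_s * 1e9 * lanes * payload_bits / line_bits
    return data_bits_per_s / 8 / 1e6


# Link width x16 at 5.0 GT/s, as reported above.
peak = pcie_peak_mb_s(5.0, 16)
print(peak)
```

Under these assumptions a healthy reading of around 6000 MB/s is roughly three quarters of the nominal peak, and anything far above the peak has to be a measurement error rather than a real transfer rate.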

I ran into the same issue using the CUDA 6.5 SDK with a GTX 750 Ti and a GTX 750. I could not measure correct values with the “bandwidthTest” sample code.
I think the implausible values from the sample are caused by cudaEventElapsedTime(): this API returned zero every time I got an implausible memory bandwidth value.
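
To illustrate why a zero elapsed time gives nonsense, here is a minimal sketch of the bandwidth calculation with a guard on the timing. It assumes bandwidth is total bytes divided by elapsed time, with MB meaning 2^20 bytes; the real sample may differ in detail, and `bandwidth_mb_s` and `min_ms` are hypothetical names.

```python
def bandwidth_mb_s(total_bytes, elapsed_ms, min_ms=0.0):
    """Return bandwidth in MB/s (MB = 2**20 bytes), or None if the
    elapsed time is unusable.

    An elapsed time at or below min_ms (zero by default) cannot come
    from a real copy, so it is reported as None instead of being
    divided by.
    """
    if elapsed_ms <= min_ms:
        return None
    megabytes = total_bytes / (1 << 20)
    seconds = elapsed_ms / 1e3
    return megabytes / seconds


good = bandwidth_mb_s(33554432, 5.0)   # one 32 MiB copy in 5 ms
bad = bandwidth_mb_s(33554432, 0.0)    # timer reported zero
print(good, bad)
```

Without the guard, a zero or near-zero elapsed time turns into a division error or an absurdly large figure, which would match the kind of outlier reported earlier in the thread.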

I checked that this issue has been resolved on the GTX 750 and GTX 750 Ti.
With the GTX 750 and 750 Ti, I could not find any problems with “bandwidthTest” in the CUDA 7.0 sample code.

But when I used a GTX TITAN X (ZOTAC), I hit the same issue.
In pinned memory mode, the memory bandwidth values were sometimes strange.
Please refer to the PDF file at the URL below.

https://www.dropbox.com/s/t2tz87n1504clut/MemoryBandWidthTest20150330.pdf?dl=0

I am sure that this issue has not been completely fixed.

The CUDA version I used is below.

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Mon_Feb_16_23:00:53_CST_2015
Cuda compilation tools, release 7.0, V7.0.27

By the way, the NVIDIA video driver is the GeForce Game Ready Driver for Grand Theft Auto V, version 350.12.

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v6.5\bin\win64\Release>bandwidthTest.exe
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GTX TITAN X
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     11624.9

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     12433.5

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     249707.4

Result = PASS

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v6.5\bin\win64\Release>bandwidthTest.exe -device=1
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 1: GeForce GTX 980
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     11592.9

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     12041.8

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     160946.8

Result = PASS

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v6.5\bin\win64\Release>

I have heard bad things about Zotac so have stuck with EVGA…