Latest driver breaks shmoo mode bandwidthTest for GTX 1080

I recently installed the latest NVIDIA driver (368.69) and now bandwidthTest fails when running in shmoo mode for the GTX 1080. When using the prior driver (368.39), it ran fine.

I’ve built bandwidthTest in Release mode from the CUDA 7.5 samples, on Win 7 (x64).

The error I get:

bandwidthTest.exe --device=0 --mode=shmoo

[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GTX 1080
 Shmoo Mode

..................................................CUDA error at bandwidthTest.cu:831 code=4(cudaErrorLaunchFailure) "cudaDeviceSynchronize()"

Interestingly, it still works fine if I run it on one of the GTX 690s in the same PC (i.e. using --device=1 rather than --device=0).
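For anyone who wants to repeat the per-device comparison, here is a minimal Python sketch that runs the shmoo test once per device and records the exit code. The helper names (`shmoo_command`, `run_shmoo`) and the default executable path are placeholders of my own; only the `--device` and `--mode=shmoo` flags come from the command above. It also assumes the sample exits with a nonzero code when a CUDA call fails, which I have not verified for every build.

```python
import subprocess


def shmoo_command(device, exe="bandwidthTest.exe"):
    """Build the command line for one shmoo run on the given device index."""
    return [exe, f"--device={device}", "--mode=shmoo"]


def run_shmoo(devices, exe="bandwidthTest.exe"):
    """Run the shmoo test on each device and return {device: exit code}.

    Assumption: a nonzero exit code means the run hit a CUDA error.
    """
    results = {}
    for device in devices:
        proc = subprocess.run(
            shmoo_command(device, exe),
            capture_output=True,  # keep the dotted progress output off the console
            text=True,
        )
        results[device] = proc.returncode
    return results
```

Calling `run_shmoo([0, 1, 2])` from the directory that contains the executable should then show which of the three devices fails.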

Output from nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 368.69                 Driver Version: 368.69                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080   WDDM  | 0000:01:00.0     Off |                  N/A |
| 27%   35C    P8     5W / 180W |    107MiB /  8192MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 690    WDDM  | 0000:04:00.0     N/A |                  N/A |
| 30%   31C    P8    N/A /  N/A |     58MiB /  2048MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 690    WDDM  | 0000:05:00.0     N/A |                  N/A |
| 30%   31C    P8    N/A /  N/A |     58MiB /  2048MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

CUDA 8RC is the recommended CUDA toolkit for use on the GTX 1080.

If the problem still happens with the CUDA 8RC toolkit, I would suggest filing a bug at developer.nvidia.com.

Can someone run these tests on a 1070?
I’m curious to know whether or not this is GDDR5X (1080) vs. GDDR5 (1070) related.

I can confirm this issue on Windows 7 x64, CUDA 8, GTX 1080:

[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GTX 1080
 Shmoo Mode

..................................................CUDA error at C:/ProgramData/NVIDIA Corporation/CUDA Samples/v8.0/1_Utilities/bandwidthTest/bandwidthTest.cu:823 code=4(cudaErrorLaunchFailure) "cudaDeviceSynchronize()"

Thanks for the report. I’ve filed a bug internally at NVIDIA, and we’ll take a look.

This is fixed with the newer driver 368.81, or at least it is fixed on my system.

I can also confirm that this issue is now resolved using the latest driver (368.81) - see post #1 for my system specs.

Can someone confirm that this driver (a) fixes the very low memory bandwidth when using large strides and (b) is still limited to around 230MB/s memory bandwidth?

Sorry, I mean 230GB/s of course.

See my post (and also @CudaaduC’s) here: https://devtalk.nvidia.com/default/topic/878455/cuda-programming-and-performance/gtx750ti-and-buffers-gt-1gb-on-win7/post/4929371/#4929371
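To put the 230GB/s figure in context, here is a rough back-of-the-envelope sketch in Python comparing it with the card's theoretical peak. The 10 Gbps per-pin data rate and 256-bit bus width are the published GTX 1080 specifications, not numbers from this thread, and the function name is just illustrative.

```python
def peak_bandwidth_gb_s(data_rate_gbps_per_pin, bus_width_bits):
    """Theoretical peak memory bandwidth in GB/s.

    Each pin moves data_rate_gbps_per_pin gigabits per second;
    dividing by 8 converts bits to bytes.
    """
    return data_rate_gbps_per_pin * bus_width_bits / 8


# Assumed GTX 1080 specs: GDDR5X at 10 Gbps per pin on a 256-bit bus.
peak = peak_bandwidth_gb_s(10.0, 256)

# Approximate measured figure quoted in the post above.
measured = 230.0

efficiency = measured / peak
print(f"peak = {peak:.0f} GB/s, measured = {measured:.0f} GB/s, "
      f"efficiency = {efficiency:.0%}")
```

Under those assumptions the theoretical peak is 320 GB/s, so roughly 230GB/s would be about 72% of it.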