OpenMP + different GTX GPUs + Driver > v391.35 (Win 10 / Win 7)


We are using multiple GPUs in our software. Each GPU is used by one thread (OpenMP). When the GPUs are different models, we get the error “cudaErrorMemoryAllocation”. With two or four identical GPUs, everything works.

Win7 / Win10 (both x64)
Driver: 398.82 -> OK with same GPUs, NOK with different GPUs
Driver: 391.35 -> OK with same and different GPUs

GPUs: GTX 680 + GTX 1080 Ti -> NOK
GTX 980 Ti + GTX 1080 -> NOK
2 x GTX Titan -> OK
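
The combinations above fail only when the installed GPUs are different models. As a rough illustration, a program could detect that condition at startup from the device names. This is a minimal sketch, not code from our software: `has_mixed_gpu_models` is a hypothetical helper, and the names are assumed to come from something like the `name` field of `cudaDeviceProp`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns true when the list contains at least two
// different model names. An empty or single-entry list is never "mixed".
bool has_mixed_gpu_models(const std::vector<std::string>& names) {
    for (std::size_t i = 1; i < names.size(); ++i) {
        if (names[i] != names[0]) {
            return true;  // found a model that differs from the first one
        }
    }
    return false;
}
```

For example, passing the names "GTX 680" and "GTX 1080 Ti" yields true, while two entries of "GTX Titan" yield false.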

The same error occurs in the “cudaOpenMP” sample.

CUDA SDK: 9.1.85 and 7.5

Best regards,

You may wish to test things on the latest CUDA version, 9.2.148.

If you witness the same error there, my suggestion would be to file a bug report at

I have tested the latest CUDA version, but I get the same error. The cudaOpenMP sample also returns this error.

When I try to report a bug, I get a JavaScript error. May I create an empty bug so that you can link it to this topic?

Yes, create an empty bug, I will help.

Bug ID: 2344747

I have the same issue.

There is already a link in the bug to this topic, and a QA engineer has already taken note of it. I don’t have any further information at this time.

I have the same issue, which results in my path tracer using only a single GPU with the newest drivers installed.

(The GPUs in my system are a GTX 1080 Ti and a GTX 980. With driver 391.35 both work just fine; with the newer drivers one of the cards fails. Which one fails depends on the exact driver and seems somewhat arbitrary.)

Nice to know, it’s at least a known bug…

So that means that (with the newest drivers) the CUDA API function for memory allocation is not thread-safe anymore? That does not sound good…

Hi TheDonsky,
Can you be more specific about your issue? I tried to reproduce it in the lab but had no luck. You wrote "with the newer ones one of the cards fails": can you share the versions of the newer drivers? And you are using the CUDA 9.2.148 toolkit, right?

Anyone having this issue may want to test driver 399.07 or newer.

I couldn’t reproduce it in the lab with a GeForce GTX 1080 Ti and a GeForce GTX 980 on Win10.

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.2\bin\win64\Release>nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:08:12_Central_Daylight_Time_2018
Cuda compilation tools, release 9.2, V9.2.148

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.2\bin\win64\Release>nvidia-smi
Thu Aug 30 13:57:00 2018
| NVIDIA-SMI 399.07                 Driver Version: 399.07                    |
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce GTX 980    WDDM  | 00000000:01:00.0  On |                  N/A |
| 26%   31C    P8    14W / 180W |    146MiB /  4096MiB |      0%      Default |
|   1  GeForce GTX 108... WDDM  | 00000000:06:00.0 Off |                  N/A |
| 23%   31C    P8    13W / 250W |    137MiB / 11264MiB |      0%      Default |

| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|    0      1236    C+G   Insufficient Permissions                   N/A      |
|    0      1640    C+G   ...3.0_x64__8wekyb3d8bbwe\WinStore.App.exe N/A      |
|    0      2424    C+G   ...dows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A      |
|    0      3892    C+G   Insufficient Permissions                   N/A      |
|    0      5148    C+G   C:\Windows\explorer.exe                    N/A      |
|    0      6028    C+G   ...t_cw5n1h2txyewy\ShellExperienceHost.exe N/A      |
|    0      8004    C+G   ...mmersiveControlPanel\SystemSettings.exe N/A      |

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.2\bin\win64\Release>cudaOpenMP.exe
cudaOpenMP.exe Starting...

number of host CPUs:    8
number of CUDA devices: 2
   0: GeForce GTX 1080 Ti
   1: GeForce GTX 980
CPU thread 0 (of 2) uses CUDA device 0
CPU thread 1 (of 2) uses CUDA device 1

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.2\bin\win64\Release>

Hello, nvswteam,

I am using CUDA 9.0 and testing on a Windows 7 machine (CUDA 9.2 needs a newer driver to even run). The code fails either on allocation or on the kernel launch. This happened a while ago and I can’t recall the exact error, because the application now has a GUI and no longer logs anything to the console. Also, the code uses some templated function pointers on top of everything, and I thought it might be an issue with mismatched addresses, but I have nothing to prove it…

The Project is and the failing code resides somewhere in the file: (lines between 82 and 107, DumbRenderer::renderBlocksGPU; yes, the one that calls absolutely everything).

The memory management system in my project is too convoluted to be worth a direct investigation, so I would not recommend digging through the code. I’ll try to reproduce the issue on a much smaller scale, in a single file or something like that, and post it here. The only problem is that this is my personal side project and not something I can get involved in right now, so it may have to wait for a couple of days.

(Note: on my Windows 10 machine everything seems to work properly, as it did with the older drivers, but that is a single-GPU system and so is irrelevant for this case.)

The latest driver, 399.07, fixes the problem.

Tested with a GTX 1050 Ti + 780 Ti on Windows 7 x64

<= 391.35 -> OK
> 391.35 and < 399.07 -> NOK
399.07 -> OK
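
The driver ranges reported above can be written as a small check, for example to warn users at startup. This is only a sketch under stated assumptions: `DriverVersion`, `parse_driver_version`, and `in_reported_bad_range` are hypothetical names, the version is assumed to arrive as a string such as "398.82", and the boundaries are simply the ones observed in this thread, not an official list of affected drivers. Versions newer than 399.07 are treated as unaffected, which has not been verified here.

```cpp
#include <cstdio>
#include <string>

// Hypothetical representation of a driver version such as 398.82.
struct DriverVersion {
    int major;
    int minor;
};

// Parses strings of the form "398.82". Returns false on malformed input,
// including trailing characters after the minor number.
bool parse_driver_version(const std::string& text, DriverVersion& out) {
    int major = 0;
    int minor = 0;
    char trailing = 0;
    // Exactly two fields must match; a third match means trailing garbage.
    if (std::sscanf(text.c_str(), "%d.%d%c", &major, &minor, &trailing) != 2) {
        return false;
    }
    if (major < 0 || minor < 0) {
        return false;
    }
    out.major = major;
    out.minor = minor;
    return true;
}

// Orders versions by major number first, then by minor number.
bool version_less(const DriverVersion& a, const DriverVersion& b) {
    return a.major != b.major ? a.major < b.major : a.minor < b.minor;
}

// True for versions strictly between 391.35 (last reported OK) and
// 399.07 (first reported fixed), the range reported as NOK in this thread.
bool in_reported_bad_range(const DriverVersion& v) {
    const DriverVersion last_good = {391, 35};
    const DriverVersion first_fixed = {399, 7};
    return version_less(last_good, v) && version_less(v, first_fixed);
}
```

With these boundaries, 398.82 falls inside the affected range, while 391.35 and 399.07 do not.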

Thank you for the fast help.

Best regards,

I can also confirm that the bug is gone after updating to 399.07, and the performance is a little better as well.

So I don’t see an obvious reason to attempt replicating the issue and will likely drop my promise, unless someone thinks we still need to work on the case.

Thanks Chaos-ThoR and TheDonsky for the feedback, I will close this ticket for now.