OpenMP + different GTX GPUs + Driver > v391.35 (Win 10 / Win 7)

Chaos-ThoR · August 21, 2018, 3:11pm

Hello.

We (http://www.lavision.de) use multiple GPUs in our software. Each GPU is driven by one OpenMP thread. When the GPUs are different models, we get the error “cudaErrorMemoryAllocation”. With two or four identical GPUs, everything works.

Specs:
Win7 / Win10 (both x64)
Driver: 398.82 → OK with same GPUs, NOK with different GPUs
Driver: 391.35 → OK with same and different GPUs

GPUs: GTX 680 + GTX 1080 Ti → NOK
GTX 980 Ti + GTX 1080 → NOK
2 x GTX Titan → OK

The same error occurs in the “cudaOpenMP” sample.

CUDA SDK: 9.1.85 and 7.5
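
The setup described in this post (one OpenMP thread per GPU, each thread allocating on its own device) can be sketched roughly as below. This is only an illustration under stated assumptions, not the LaVision code or the cudaOpenMP sample: `DeviceResult`, `AllocFn` and `allocate_on_all_devices` are hypothetical names, and the actual CUDA calls are hidden behind a callback so that the sketch compiles without the toolkit. In a real program the callback would presumably wrap `cudaSetDevice(device)` followed by `cudaMalloc(...)` and return the error name on failure.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Outcome of the allocation attempt made by one host thread for one device.
struct DeviceResult {
    int device = -1;
    bool ok = false;
    std::string error;  // empty on success
};

// Hypothetical hook: tries to allocate on `device` and returns an error name,
// or an empty string on success. It must be safe to call from several threads
// and must not throw.
using AllocFn = std::function<std::string(int device)>;

// Runs the allocation once per device and records each outcome separately,
// so a failure can be attributed to a specific GPU.
std::vector<DeviceResult> allocate_on_all_devices(int device_count,
                                                  const AllocFn& alloc) {
    assert(device_count >= 0);
    std::vector<DeviceResult> results(static_cast<std::size_t>(device_count));
    if (device_count == 0) {
        return results;
    }
    // Request one OpenMP thread per device. When built without OpenMP support
    // the pragma is ignored and the loop simply runs serially.
    #pragma omp parallel for num_threads(device_count)
    for (int d = 0; d < device_count; ++d) {
        // Each iteration writes only to its own slot, so no locking is needed.
        DeviceResult& r = results[static_cast<std::size_t>(d)];
        r.device = d;
        r.error = alloc(d);
        r.ok = r.error.empty();
    }
    return results;
}
```

Keeping one result slot per device makes it easy to log which card failed, which helps when the failing card changes from one driver version to the next.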

Best regards,
Thomas

Robert_Crovella · August 21, 2018, 3:17pm

You may wish to test things on the latest CUDA version 9.2.148

If you witness the same error there, my suggestion would be to file a bug report at developer.nvidia.com

Chaos-ThoR · August 23, 2018, 9:57am

I have tested the latest CUDA version, but I get the same error. The cudaOpenMP sample also returns this error.

When I try to report a bug, I get a JavaScript error. May I create an empty bug so that you can link it to this topic?

Robert_Crovella · August 23, 2018, 8:26pm

Yes, create an empty bug, I will help.

Chaos-ThoR · August 24, 2018, 6:45am

Bug ID: 2344747

jwill266 · August 24, 2018, 7:47am

I have the same issue.

Robert_Crovella · August 24, 2018, 1:08pm

There is already a link in the bug to this topic, and a QA engineer has already taken note of it. I don’t have any further information at this time.

TheDonsky · August 26, 2018, 10:46pm

I have the same issue; as a result, my path tracer uses only a single GPU with the newest drivers installed.

(The GPUs in my system are a GTX 1080 Ti and a GTX 980. With driver 391.35 both work just fine; with the newer drivers one of the cards fails. Which one depends on the exact driver and seems somewhat arbitrary.)

Nice to know, it’s at least a known bug…

HannesF99 · August 29, 2018, 10:00am

So that means that, with the newest drivers, the CUDA API function for memory allocation is no longer thread-safe? That does not sound good…

nvswteam · August 30, 2018, 1:57am

Hi TheDonsky,
Can you be more specific about your issue? I tried to reproduce it in the lab but had no luck. You wrote "with the newer ones one of the cards fails"; can you share the versions of the newer drivers? And you are using the CUDA 9.2.148 toolkit, right?

Robert_Crovella · August 30, 2018, 3:17am

Anyone having this issue may want to test driver 399.07 or newer.

nvswteam · August 30, 2018, 5:59am

I couldn’t reproduce it in the lab with a GeForce GTX 1080 Ti and a GeForce GTX 980 on Win10, using CUDA 9.2.148 and driver 399.07:

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.2\bin\win64\Release>nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:08:12_Central_Daylight_Time_2018
Cuda compilation tools, release 9.2, V9.2.148

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.2\bin\win64\Release>nvidia-smi
Thu Aug 30 13:57:00 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 399.07                 Driver Version: 399.07                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980    WDDM  | 00000000:01:00.0  On |                  N/A |
| 26%   31C    P8    14W / 180W |    146MiB /  4096MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108... WDDM  | 00000000:06:00.0 Off |                  N/A |
| 23%   31C    P8    13W / 250W |    137MiB / 11264MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1236    C+G   Insufficient Permissions                   N/A      |
|    0      1640    C+G   ...3.0_x64__8wekyb3d8bbwe\WinStore.App.exe N/A      |
|    0      2424    C+G   ...dows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A      |
|    0      3892    C+G   Insufficient Permissions                   N/A      |
|    0      5148    C+G   C:\Windows\explorer.exe                    N/A      |
|    0      6028    C+G   ...t_cw5n1h2txyewy\ShellExperienceHost.exe N/A      |
|    0      8004    C+G   ...mmersiveControlPanel\SystemSettings.exe N/A      |
+-----------------------------------------------------------------------------+

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.2\bin\win64\Release>cudaOpenMP.exe
cudaOpenMP.exe Starting...

number of host CPUs:    8
number of CUDA devices: 2
   0: GeForce GTX 1080 Ti
   1: GeForce GTX 980
---------------------------
CPU thread 0 (of 2) uses CUDA device 0
CPU thread 1 (of 2) uses CUDA device 1
---------------------------

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.2\bin\win64\Release>

TheDonsky · August 30, 2018, 6:28am

Hello, nvswteam,

I am using CUDA 9.0 and testing on a Windows 7 machine (CUDA 9.2 needs a newer driver to even run). The code fails either on allocation or on the kernel launch. (I ran into this a while ago and can’t recall the exact error, because the application now has a GUI and does not log anything to the console.) The code also uses some templated function pointers on top of everything, and I thought it might be an issue with mismatched addresses, but I have nothing to prove it…

The project is GitHub - TheDonsky/DumbRay, and the failing code resides somewhere in the file DumbRay/DumbRenderer.cu at master · TheDonsky/DumbRay · GitHub (lines 82 to 107, DumbRenderer::renderBlocksGPU; yes, the one that calls absolutely everything).

The memory management system in my project is somewhat too convoluted to be worth a direct investigation, so I would not exactly recommend researching the code; I’ll just try to reproduce the issue on a much smaller scale in a single file or something like that and post it here. The only problem is that this, being my personal side project, is not what I can get involved in right now and it may have to wait for a couple of days.

(Note: on my Windows 10 machine everything seems to work properly, as it did with the older drivers, but that is a single-GPU system, so it is irrelevant to this case.)

Chaos-ThoR · August 30, 2018, 12:34pm

The latest driver, 399.07, fixes the problem.

Tested with a GTX 1050 Ti + 780 Ti on Windows 7 x64.

Driver:
<= 391.35 → OK
> 391.35 and < 399.07 → NOK
399.07 → OK
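
For software that has to warn users about this problem, the driver ranges listed above could be turned into a small check along the following lines. This is a minimal sketch, assuming the version string has the `major.minor` form that nvidia-smi prints (for example `398.82`); `DriverVersion`, `parse_driver_version` and `in_reported_bad_range` are hypothetical names, and both bounds come only from the reports in this thread, not from official release notes.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Display driver version as printed by nvidia-smi, e.g. "398.82" -> {398, 82}.
struct DriverVersion {
    int major_num;
    int minor_num;
};

// Parses "major.minor"; returns false for anything else.
bool parse_driver_version(const std::string& text, DriverVersion& out) {
    int major_num = 0;
    int minor_num = 0;
    char extra = 0;
    // Exactly two integers and no trailing character must be read.
    if (std::sscanf(text.c_str(), "%d.%d%c", &major_num, &minor_num, &extra) != 2) {
        return false;
    }
    if (major_num < 0 || minor_num < 0) {
        return false;
    }
    out = DriverVersion{major_num, minor_num};
    return true;
}

// Orders versions by major number first, then by minor number.
bool older_than(const DriverVersion& a, const DriverVersion& b) {
    return a.major_num != b.major_num ? a.major_num < b.major_num
                                      : a.minor_num < b.minor_num;
}

// True for drivers strictly between 391.35 and 399.07, the range reported
// in this thread as failing when the GPUs are different models.
bool in_reported_bad_range(const DriverVersion& v) {
    assert(v.major_num >= 0 && v.minor_num >= 0);
    const DriverVersion last_good{391, 35};
    const DriverVersion first_fixed{399, 7};
    return older_than(last_good, v) && older_than(v, first_fixed);
}
```

With these bounds, 398.82 falls inside the range, while 391.35 and 399.07 fall outside it.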

Thank you for the fast help.

Best regards,
Thomas

TheDonsky · August 30, 2018, 3:12pm

I can also confirm that the bug is gone after updating to 399.07, and the performance is a little better as well.

So I don’t see an obvious reason to try to replicate the issue and will likely not follow up on my promise, unless someone thinks we still need to work on the case.

nvswteam · August 31, 2018, 1:11am

Thanks Chaos-ThoR and TheDonsky for the feedback, I will close this ticket for now.
