Error when invoking a CUDA algorithm on two devices simultaneously

I have a CUDA algorithm which works fine when I invoke it either on my Maxwell GPU (GTX 960) or Kepler GPU (GTX 770).

But when I invoke the CUDA algorithm simultaneously on both devices (the first CPU thread calls the CUDA algorithm on GPU 0, the second CPU thread calls it on GPU 1), I get a launch error, always on GPU 0 (the GTX 960), after the first CUDA kernel call (a convert function I wrote myself, which converts a uint8 image to a float image).

What am I doing wrong, and why does this error occur only in this specific setting?

One guess of mine is that when several GPUs are involved, allocated images may get mapped into different GPU memory ranges, and perhaps my convert routine has trouble with images allocated in a certain memory range.
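For context, the launch pattern looks roughly like this (a simplified sketch, not my actual code; `runAlgorithmOnDevice` and the buffer size are placeholders). The key point is that the current device is a per-host-thread setting, so each CPU thread selects its own device first:

```cuda
#include <cuda_runtime.h>
#include <thread>

// Placeholder for the real algorithm; each CPU thread runs this on its own GPU.
void runAlgorithmOnDevice(int device)
{
    // cudaSetDevice is per host thread; if one thread forgets this,
    // both threads end up issuing work to device 0.
    cudaSetDevice(device);

    float* d_img = nullptr;
    cudaMalloc(&d_img, 1024 * 1024 * sizeof(float)); // illustrative size
    // ... launch the uint8-to-float convert kernel and the rest ...
    cudaDeviceSynchronize();
    cudaFree(d_img);
}

int main()
{
    std::thread t0(runAlgorithmOnDevice, 0); // GTX 960
    std::thread t1(runAlgorithmOnDevice, 1); // GTX 770
    t0.join();
    t1.join();
    return 0;
}
```

Each thread gets its own allocations on its own device; nothing is shared between the two GPUs.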

Running the program with the cuda-memcheck command-line tool gives me:

Program hit cudaErrorLaunchFailure (error 4) due to “unspecified launch failure” on CUDA API call to cudaDeviceS
===== Saved host backtrace up to driver entry point at error
===== Host Frame:C:\Windows\system32\nvcuda.dll (cuMemcpy2D_v2_ptds + 0xa3f99e) [0xa6360b]
===== Host Frame:K:\common\libsUsage\JRSPointTrackerTest1\bin\cudart64_70.dll (cudaDeviceSynchronize + 0xf9) [0x1a699]
===== Host Frame:K:\common\libsUsage\JRSPointTrackerTest1\bin\CudaCVCore3.1_w64_vc120d.dll (cucv_Convert + 0x9e57) [0x1d887]
===== Host Frame:K:\common\libsUsage\JRSPointTrackerTest1\bin\CudaCVCore3.1_w64_vc120d.dll (cucv::Convert + 0x6f) [0x4caf]
===== Host Frame:K:\common\libsUsage\JRSPointTrackerTest1\bin\CudaCVCore3.1_w64_vc120d.dll (cucv_CreateLKPlan + 0xa45) [0x2afd5]

My system is Windows 7 (64-bit), CUDA Toolkit 7.0, and the latest GeForce drivers.

I remember having had similar inexplicable startup issues when launching work simultaneously on multiple GPUs (in my case it was crypto-coin mining with cudaminer).

My fix was to stagger the initialization phases of the GPU threads, so that CUDA context creation and the memory allocations of the different threads would not overlap.
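Roughly, the idea was something like this (sketch only; `initMutex` and `initDevice` are illustrative names, not from a real codebase):

```cuda
#include <cuda_runtime.h>
#include <mutex>

// Serializes per-thread CUDA initialization so that context creation
// and the initial allocations of different threads never overlap.
std::mutex initMutex;

void initDevice(int device)
{
    std::lock_guard<std::mutex> lock(initMutex); // one thread at a time
    cudaSetDevice(device);
    cudaFree(0); // forces lazy context creation on this device
    // ... do this thread's cudaMalloc calls here, still under the lock ...
}
```

After `initDevice` returns, the threads can run their kernels concurrently; only the setup phase is serialized.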


Hi, thanks; something like that was along my thoughts too, because the error occurs at the very first kernel call on that device (the GTX 960).
I have now added a 'GPU warmup' phase at the beginning of the application, where I force CUDA context creation by doing a cudaMalloc / cudaFree on each device.
Unfortunately, that did not help; I still get the same error.
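For completeness, the warmup I added looks approximately like this (simplified, error checking omitted; `warmupAllDevices` is an illustrative name):

```cuda
#include <cuda_runtime.h>

// Force context creation on every device before the worker threads start.
void warmupAllDevices()
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    for (int d = 0; d < deviceCount; ++d)
    {
        cudaSetDevice(d);
        void* p = nullptr;
        cudaMalloc(&p, 1); // first runtime call on a device creates its context
        cudaFree(p);
    }
    cudaSetDevice(0); // restore the default device for the main thread
}
```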

Update: I upgraded my multi-GPU system (GTX 960 / GTX 770) to Windows 10 and still get the same error. Furthermore, I tested the program on two other Windows multi-GPU workstations (a mixed Kepler/Maxwell system with one Titan Black and one Titan X, and a pure Kepler system with two Quadro K6000 cards). The executable failed on both workstations as well.
I filed a bug with NVIDIA.