Get error cudaErrorLaunchOutOfResources from kernel after removing -G compiler option

jianping · March 8, 2019, 9:08am

I have an Nvidia GeForce GTX 1070 card in my desktop with Windows 10 and Visual Studio 2017. I made a CUDA C++ project to do some math computation. The project generates a DLL that exports some functions, and these exported functions are called from my main (GUI) C++ project. Both the debug and release versions of the DLL worked correctly until I changed “Device/Generate GPU Debug Information” in the CUDA C/C++ tab of the project properties from Yes (-g/-G) to No for both versions. I did so because I noticed that the major GPU function exported from the DLL runs slower in the release version than in the debug version. The timing is measured on the main project side, so it is not a thread sync problem. After the change, both versions report the error cudaErrorLaunchOutOfResources when invoking the kernel function. If the cause is that the function call takes too many registers, why was it working before the setting change? I also notice in the option for Host/Optimization is for both versions. Do you know which option I should choose in order to get accurate and high-performance results?

The kernel function looks like this:
__global__ void ToolShootTriangleKernel(const tagTriIndices *d_pTriIndices, size_t triNum,
    const tagXyz *d_pTriVertices, const tagXyz &d_toolCen, const tagToolSec *d_pToolShape,
    int toolSecNum, tagAtomicVar *d_pToolCtrlZ)

It is called like this:
int threadsPerBlock = 512;
int blocksPerGrid = (int)((triNum + threadsPerBlock - 1) / threadsPerBlock);

ToolShootTriangleKernel <<<blocksPerGrid, threadsPerBlock>>> (d_pTriIndices,
triNum, d_pTriVertices, d_toolCen, d_pToolShape, toolSecNum, d_pToolCtrlZ);

Thanks in advance for any help.

jianping · March 10, 2019, 2:34am

Figured it out.

“Register usage will also be affected if you are passing the -G switch to the compiler” (from the Stack Overflow question “Counting registers/thread in Cuda kernel”). This explains why it works in the debug version but not in the release version.
Cannot set “CUDA C/C++ – Max Used Register” to 0 unless __launch_bounds__ is used in the code.
Setting “Generate GPU Debug Information” to No doubles the speed of the release version.
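
For anyone who wants to check the register budget directly, here is a minimal sketch. It is not the original project code. DemoKernel is a hypothetical stand-in, because the body of ToolShootTriangleKernel is not shown in this thread, and THREADS_PER_BLOCK simply mirrors the 512 used in the launch above. The sketch puts __launch_bounds__ on the kernel so the compiler keeps register use low enough for a 512-thread block. It then asks the runtime how many registers the compiled kernel uses and checks the launch for errors.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 512  // assumption: same block size as the launch in the question

// Print the error and bail out of main if a runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            printf("%s failed: %s\n", #call, cudaGetErrorString(e_)); \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical stand-in kernel. __launch_bounds__ promises the compiler that the
// kernel is never launched with more than THREADS_PER_BLOCK threads per block,
// so it limits registers per thread to what such a block can be given.
__global__ void __launch_bounds__(THREADS_PER_BLOCK)
DemoKernel(const float *in, float *out, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;
}

int main()
{
    // Registers per thread as actually compiled (this changes with and without -G).
    cudaFuncAttributes attr;
    CHECK(cudaFuncGetAttributes(&attr, DemoKernel));

    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));

    // Rough estimate only: the hardware allocates registers in larger units,
    // so the real requirement can be somewhat higher than this product.
    printf("registers per thread: %d\n", attr.numRegs);
    printf("approx. registers for %d threads: %d (device limit per block: %d)\n",
           THREADS_PER_BLOCK, attr.numRegs * THREADS_PER_BLOCK, prop.regsPerBlock);
    printf("largest block size this kernel can launch with: %d\n",
           attr.maxThreadsPerBlock);

    const size_t n = 1 << 20;
    float *d_in = nullptr, *d_out = nullptr;
    CHECK(cudaMalloc(&d_in, n * sizeof(float)));
    CHECK(cudaMalloc(&d_out, n * sizeof(float)));
    CHECK(cudaMemset(d_in, 0, n * sizeof(float)));

    int blocksPerGrid = (int)((n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK);
    DemoKernel<<<blocksPerGrid, THREADS_PER_BLOCK>>>(d_in, d_out, n);

    // Launch-configuration errors such as cudaErrorLaunchOutOfResources are
    // reported by cudaGetLastError() right after the launch.
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaFree(d_in));
    CHECK(cudaFree(d_out));
    printf("launch OK\n");
    return 0;
}
```

If attr.maxThreadsPerBlock comes back smaller than the block size used in the launch, the launch will fail with the out-of-resources error. In that case the options are a smaller block, __launch_bounds__ as above, or a register cap through “Max Used Register” (which, as far as I know, maps to the nvcc option -maxrregcount).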
