CUDA Recursive Call

Hi Everyone,
I am trying to call my kernel functions recursively, but I am getting CUDA error 719 (cudaErrorLaunchFailure), even when my program makes only one recursive call. When I instead add new functions with the same content and call them inside each other, effectively making four nested calls, everything works fine. What is the problem?

Thanks in advance.
Canberk

Calling a kernel from a kernel is referred to as dynamic parallelism in CUDA. There are CUDA sample projects that demonstrate proper usage of dynamic parallelism, and there is a whole section of the programming guide dedicated to it.
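For reference, here is a minimal sketch of a device-side launch; the kernel names and launch configuration are illustrative only, and the code must be compiled with relocatable device code (e.g. nvcc -rdc=true -lcudadevrt):

```
#include <cstdio>

__global__ void child(int depth) {
    printf("child running at depth %d\n", depth);
}

// A parent kernel may launch other kernels from device code; this is
// what CUDA calls dynamic parallelism (requires -rdc=true at compile time).
__global__ void parent() {
    child<<<1, 1>>>(1);   // device-side kernel launch
}

int main() {
    parent<<<1, 1>>>();
    cudaDeviceSynchronize();   // host waits for parent and its children
    return 0;
}
```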

However, I’m not able to sort out the problem based on that description alone. My usual suggestion in these cases is to provide a short, complete example that demonstrates the problem. The most important word there is complete. I should be able to copy, paste, compile and run, and see the issue, without having to add anything or change anything. Do as you wish, of course.

On the occasions when I need recursive calls, I use the kernel function (the __global__ one) as a wrapper around a device function that implements the recursion, as sketched below. However, the parallelism I can achieve this way is very limited, as I quickly encounter the error “too many resources requested for launch” when I increase the number of threads. The number of registers required for each thread seems to increase very steeply with the recursion level.
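For illustration, a minimal sketch of that wrapper pattern (factorial is just a stand-in for whatever recursion is actually needed):

```
#include <cstdio>

// Recursive device function: the recursion lives here, not in the kernel.
__device__ long long factorial(int n) {
    if (n <= 1) return 1;            // base case
    return n * factorial(n - 1);     // each level consumes stack and registers
}

// The __global__ wrapper just dispatches into the recursive device function.
__global__ void factorialKernel(const int* in, long long* out, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) out[i] = factorial(in[i]);
}
```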
Also, I have often encountered invalid memory references, which I could solve by increasing the stack limit per thread (see the sketch below). I guess that each device call comes with a significant overhead and that calls cannot be nested as deeply as on a CPU.
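The stack limit can be raised with cudaDeviceSetLimit; a minimal sketch (the 16 KB value is arbitrary and should be sized to the worst-case recursion depth):

```
#include <cstdio>

int main() {
    size_t stackSize = 0;
    cudaDeviceGetLimit(&stackSize, cudaLimitStackSize);   // default is typically 1 KB per thread
    printf("default stack size per thread: %zu bytes\n", stackSize);

    // Raise the per-thread stack to accommodate deeper recursion.
    cudaDeviceSetLimit(cudaLimitStackSize, 16 * 1024);

    cudaDeviceGetLimit(&stackSize, cudaLimitStackSize);
    printf("new stack size per thread: %zu bytes\n", stackSize);
    return 0;
}
```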

The sort of recursion you are describing is not based on CDP (CUDA Dynamic Parallelism).

I’m not sure why that would be. In the general case, the compiler has no knowledge of the recursion level (or depth) and therefore could not possibly be making register-usage decisions based on it.

Correct, increasing recursion depth will increase stack usage, which has to be accounted for.

Hi Robert,

I was imprecise in describing this observation. The issue happens at runtime: when I check cudaGetLastError after the kernel execution, I get “too many resources requested for launch”. Googling this error was not very successful; some posts I found talk about the GPU running out of SM registers, but I have not reached a conclusive answer yet. However, the error persists even if I set the recursion level to one, so the issue is not related to recursion. I will probably create a new post about it once I have reduced it to a small example.
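For what it’s worth, a minimal launch-checking sketch (the kernel and launch configuration are placeholders); note that “too many resources requested for launch” is reported by cudaGetLastError immediately after the launch, before any synchronization:

```
#include <cstdio>

__global__ void myKernel() { }   // placeholder for the real kernel

int main() {
    myKernel<<<1, 1024>>>();                         // hypothetical launch configuration
    cudaError_t launchErr = cudaGetLastError();      // catches launch-configuration errors,
                                                     // e.g. too many resources requested
    cudaError_t execErr = cudaDeviceSynchronize();   // catches errors raised during execution
    printf("launch: %s, execution: %s\n",
           cudaGetErrorString(launchErr), cudaGetErrorString(execErr));
    return 0;
}
```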

The most common reason for “too many resources requested for launch”, in my experience, is that your GPU kernel code uses enough registers per thread that, when multiplied by the number of threads per block you are requesting at launch, the total exceeds the number of registers available on the GPU SM. There are other possibilities, though.

To assess whether my “most likely guess” applies, compile the file that contains the kernel in question with -Xptxas -v on the nvcc compile command line. Assuming the kernel code is all in a single file, that will report how many registers per thread are used by each kernel in that file. Multiply that number by the number of threads per block you are requesting at kernel launch time, and compare the product to the hardware technical specifications in the programming guide. As you can see there, all “current” GPUs offer 65536 registers per SM. Divide that number by the registers per thread reported by nvcc -Xptxas -v ... and you get the maximum number of threads that can be launched for that kernel, subject to granularity considerations. Any attempt to launch more threads than that will result in this error.
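If you would rather inspect register usage at runtime instead of (or in addition to) reading the ptxas output, cudaFuncGetAttributes reports it per kernel. A minimal sketch, with a hypothetical placeholder kernel:

```
#include <cstdio>

__global__ void myKernel() { }   // placeholder for the kernel under test

int main() {
    cudaFuncAttributes attr;
    cudaFuncGetAttributes(&attr, myKernel);

    // 65536 registers per SM on current GPUs (see the technical specifications
    // table in the programming guide); dividing by per-thread usage gives an
    // upper bound on threads per block for this kernel.
    const int regsPerSM = 65536;
    printf("registers per thread: %d\n", attr.numRegs);
    if (attr.numRegs > 0)
        printf("max threads by register budget: %d\n", regsPerSM / attr.numRegs);
    printf("maxThreadsPerBlock reported for this kernel: %d\n", attr.maxThreadsPerBlock);
    return 0;
}
```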

This general methodology, along with suggestions for remedies, is covered in various forum posts, such as here and here.