ERROR: too many resources requested for launch.

kwang · November 20, 2009, 12:41am

Hey, there,

I’m getting the following error:

Kernel execution failed!!! in file <gpuforward .cu>, line 101 : too many resources requested for launch.

A Google search suggests this can be caused by a kernel requiring too many registers or too much shared memory. I believe registers are the cause, because the error goes away when I remove some of them. However, I don’t understand why. I’m using a Tesla C1060, and deviceQuery reports 16384 registers per multiprocessor. I set each block to 512 threads, so if each multiprocessor runs two blocks, each thread should be able to use 16384/1024 = 16 registers. My kernel uses only 14 registers, yet it seems I can use only 10 of them.

Moreover, I can declare an 11th register without using it, but as soon as I use it I get the same error as above.

Did I misunderstand the limits on registers I can use?

Thank you very much.

seibert · November 20, 2009, 1:21am

Even if you used more than 16 registers/thread, the block scheduler would only run one block at a time per multiprocessor and you would still be fine.
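The register arithmetic discussed so far can be sketched in host-side C++. This is only a rough sketch: the helper name blocks_per_sm_by_registers is hypothetical, the 16384 figure is the deviceQuery value quoted in the thread, and the calculation deliberately ignores register allocation granularity and every other per-multiprocessor limit.

```cpp
#include <cstddef>

// Registers per multiprocessor, as reported by deviceQuery in this thread.
constexpr std::size_t kRegistersPerSM = 16384;

// Hypothetical helper: how many blocks fit on one multiprocessor if registers
// were the only constraint. Allocation granularity is ignored (assumption).
constexpr std::size_t blocks_per_sm_by_registers(std::size_t threads_per_block,
                                                 std::size_t regs_per_thread) {
    return kRegistersPerSM / (threads_per_block * regs_per_thread);
}

// 512 threads x 14 registers = 7168 registers per block: two blocks fit.
static_assert(blocks_per_sm_by_registers(512, 14) == 2, "two blocks fit");
// Between 17 and 32 registers per thread, only one block fits.
static_assert(blocks_per_sm_by_registers(512, 17) == 1, "one block fits");
static_assert(blocks_per_sm_by_registers(512, 32) == 1, "one block fits");
// Above 32 registers per thread, not even one block fits.
static_assert(blocks_per_sm_by_registers(512, 33) == 0, "no block fits");
```

Under these assumptions, a result of 0 is the case where a launch with 512-thread blocks cannot succeed, so it is worth comparing the register count reported by ptxas for the compiled kernel against this budget rather than counting variables in the source.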

The three possible culprits for a resource error are:

Too large a block or grid size
Too many registers
Too much shared memory

You’ve mentioned the registers, which don’t seem like a problem. And you’ve mentioned that the block size is 512 (I assume that is 1D). How big is your grid, and how much shared memory do you use?

kwang · November 20, 2009, 1:48am

Thank you for your reply.

Yes, the block size is 1D. The grid is also 1D, with 360 blocks.

I use 4096 bytes of shared memory. The Tesla C1060 has 16384 bytes of shared memory per multiprocessor, so I think that’s OK. I tried reducing the shared memory usage and the error persists, so I don’t think shared memory is the problem.

When I reduce the number of threads in each block, it works fine, but it’s slower.

Are there any restrictions on the grid size?

seibert · November 20, 2009, 4:21am

Yes, but the grid restriction is very large (65535x65535), so you are nowhere near that either.

I’m stumped now. Your grid size, block size, register usage and shared memory usage should be well within the limits of a C1060 (and even within the range of an older C870). I have no idea what other resources CUDA could be lacking…

ehodzic · December 15, 2009, 5:40pm

How many arguments do you have to the failing kernel?

I have a kernel with 32 arguments that fails to launch. Reducing arguments to 30 makes it launch successfully.

I limit the number of registers to 64, shared memory is within the limits, and so are the grid and block sizes. The only strange thing is the use of cmem[13]?! Here’s the ptxas output:

ptxas info : Used 64 registers, 320+0 bytes lmem, 240+16 bytes smem, 4084 bytes cmem[0], 140 bytes cmem[1], 20 bytes cmem[13]

The exact error is:

Cuda call at test.cu:1032 fails with: 7: too many resources requested for launch

Which resources are those?

The limit on the arguments is 256 bytes (from the programming guide: “__global__ function parameters are currently passed via shared memory to the device and limited to 256 bytes”). If 16 of those bytes are taken by the execution configuration (in some packed form?!), we are left with 240 bytes for the actual arguments. Assuming each argument takes 8 bytes (e.g. sizeof(int*) is 8 on a 64-bit machine), it follows that we can pass only 30 arguments when most of them are pointers. This is just educated speculation.
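The estimate above can be written out as a small host-side calculation. It is a sketch of the speculation in this post, not a statement of how the toolkit actually allocates the space: the 16 reserved bytes are the post’s guess, and the helper name max_args is hypothetical.

```cpp
#include <cstddef>

constexpr std::size_t kParamSpaceBytes = 256;  // limit quoted from the programming guide
constexpr std::size_t kReservedBytes   = 16;   // speculated in this post (assumption)

// Hypothetical helper: how many equally sized arguments fit in what is left.
constexpr std::size_t max_args(std::size_t bytes_per_arg) {
    return (kParamSpaceBytes - kReservedBytes) / bytes_per_arg;
}

// With 8-byte arguments, 240 / 8 = 30 arguments fit.
static_assert(max_args(8) == 30, "30 eight-byte arguments fit");
// 32 such arguments would need 256 bytes, more than the 240 available.
static_assert(32 * 8 > kParamSpaceBytes - kReservedBytes, "32 do not fit");
```

This matches the observed behaviour: the 32-argument kernel fails to launch and the 30-argument version succeeds.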

My target device is a Tesla C1060. CUDA toolkit version 2.3, driver version 190.16, running on Fedora 9, Linux kernel version 2.6.27.25-78.2.56.fc9.x86_64, 64-bit.

Edin Hodzic

LSChien · December 16, 2009, 1:24am

According to the decuda output, the formal parameters of a kernel function start at offset 0x0010 of shared memory.

The range 0x0000 to 0x000f (16 bytes) contains 8 objects of 2 bytes each:

blockDim.x, blockDim.y, blockDim.z

gridDim.x, gridDim.y, gridDim.z

blockIdx.x, blockIdx.y
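As an illustration of that layout, the 16-byte region can be mirrored by a plain struct on the host. The struct name LaunchHeader is hypothetical, and the field order simply follows the listing above; the post gives the size and the first parameter offset, not the exact position of each field.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the 16-byte region at shared memory 0x0000-0x000f.
// Field order follows the listing in the post and is an assumption.
struct LaunchHeader {
    std::uint16_t blockDim_x, blockDim_y, blockDim_z;
    std::uint16_t gridDim_x, gridDim_y, gridDim_z;
    std::uint16_t blockIdx_x, blockIdx_y;
};

// Eight 2-byte values occupy exactly 16 bytes...
static_assert(sizeof(LaunchHeader) == 16, "header is 16 bytes");

// ...so the first kernel parameter would start at offset 0x0010.
constexpr std::size_t kFirstParamOffset = sizeof(LaunchHeader);
static_assert(kFirstParamOffset == 0x0010, "parameters start at 0x0010");
```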

ehodzic · December 16, 2009, 4:17am

Very good.

It is also interesting that our 32-argument kernel works on 32-bit XP but fails to launch on 64-bit Linux. The reason is perhaps that sizeof(void*) and __alignof(void*) are 4 on the former machine and 8 on the latter. The 32-bit machine can pack 240/4 = 60 arguments, while the 64-bit machine can pack only 240/8 = 30 (pointer) arguments to a kernel.
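The 32-bit versus 64-bit comparison can be sketched the same way, this time counting from the start of the 256-byte area. The helper names are hypothetical, and the layout rule (arguments packed in order, each aligned to its own size) is an assumption for illustration.

```cpp
#include <cstddef>

constexpr std::size_t kHeaderBytes = 16;   // 0x0000-0x000f, per the decuda observation
constexpr std::size_t kLimitBytes  = 256;  // limit quoted from the programming guide

// Round offset up to the next multiple of align.
constexpr std::size_t align_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) / align * align;
}

// Hypothetical helper: offset one past the last of `count` pointer arguments
// laid out back to back after the header.
constexpr std::size_t pointer_args_end(std::size_t count, std::size_t pointer_size) {
    return align_up(kHeaderBytes, pointer_size) + count * pointer_size;
}

constexpr bool fits(std::size_t count, std::size_t pointer_size) {
    return pointer_args_end(count, pointer_size) <= kLimitBytes;
}

static_assert(fits(32, 4),  "32 four-byte pointers end at byte 144");
static_assert(!fits(32, 8), "32 eight-byte pointers would end at byte 272");
static_assert(fits(30, 8),  "30 eight-byte pointers end exactly at byte 256");

// Alignment can cost space too: a 4-byte int followed by an 8-byte pointer
// takes 16 bytes, because the pointer is padded from offset 20 to offset 24.
static_assert(align_up(kHeaderBytes + 4, 8) + 8 == 32, "int + pointer end at byte 32");
```

The last check suggests that, on a 64-bit host, mixing small scalars with pointers may use up the parameter space faster than the raw sum of the argument sizes.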

Edin Hodzic

nitin.life · December 16, 2009, 9:18am

My guess is that the PTX code is optimized further before it is executed, which may increase your register usage by 1 or 2. Try using the maxrregcount option to restrict register usage to 11. Do you still get the error?

nitin.life · December 16, 2009, 9:20am

There was some issue with Firefox, so I reposted the post.
