Can't launch 1 block with 1024 threads when maximizing shared memory using cudaFuncSetAttribute

gregoryamdg · August 11, 2023, 9:10pm

I’m hitting a strange error: I can’t launch a single block with 1024 threads once I maximize dynamic shared memory with

  returnVal = cudaFuncSetAttribute(
      processBridgesPassStructsTimed,
      cudaFuncAttributeMaxDynamicSharedMemorySize,
      csr.maxSM);

which I use to dynamically request all of the available shared memory.

The launch fails with cudaErrorLaunchOutOfResources (error 701): “too many resources requested for launch”.

The same kernel works fine with 512 threads.

Robert_Crovella · August 11, 2023, 10:13pm

Maybe the actual problem is a register problem and not a shared memory problem. I think it’s probably not practical to diagnose an issue based on one line of code or the level of information provided so far. OTOH you might get much more useful help if you provided a short, complete test case that demonstrates the issue. Note that asking for a “short, complete test case” is not the same as asking for “your whole code”.

Do as you wish, of course. Just a suggestion.

gregoryamdg · August 11, 2023, 11:23pm

The global kernel uses 92 registers:
ptxas info : Used 92 registers, 16016 bytes smem, 904 bytes cmem[0], 4 bytes cmem[2]

The global kernel in question calls 3 device kernels
K1:
ptxas info : Used 40 registers, 16016 bytes smem, 512 bytes cmem[0]
K2:
ptxas info : Used 40 registers, 16016 bytes smem, 492 bytes cmem[0], 4 bytes cmem[2]
K3:
ptxas info : Used 40 registers, 16016 bytes smem, 560 bytes cmem[0]

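Going by the global kernel’s register usage, 1024 * 92 = 94,208 registers, which is more than 64K (65,536). That appears to be the register limit per SM on Ampere, so the launch is running out of registers rather than shared memory. With 512 threads the block needs 512 * 92 = 47,104 registers, which fits.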
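For anyone who wants to check this kind of launch ahead of time, here is a minimal sketch of the register arithmetic in plain C++. The 64K limit is the figure quoted above and is treated as an assumption, and the names `kRegsPerBlockLimit`, `registersNeeded`, `fitsInRegisterFile`, and `maxRegsPerThread` are hypothetical helpers, not part of any CUDA API. The sketch also ignores register allocation granularity, so the real requirement may be somewhat higher than what it computes.

```cpp
#include <cstdint>

// Assumption: 64K 32-bit registers available to one block,
// taken from the per-SM figure quoted in the post above.
constexpr std::uint32_t kRegsPerBlockLimit = 64 * 1024;

// Registers a block needs if every thread uses regsPerThread registers.
// Allocation granularity is ignored, so this is a lower bound.
constexpr std::uint32_t registersNeeded(std::uint32_t threadsPerBlock,
                                        std::uint32_t regsPerThread) {
    return threadsPerBlock * regsPerThread;
}

// True if the block's register demand stays within the assumed limit.
constexpr bool fitsInRegisterFile(std::uint32_t threadsPerBlock,
                                  std::uint32_t regsPerThread) {
    return registersNeeded(threadsPerBlock, regsPerThread) <= kRegsPerBlockLimit;
}

// Largest per-thread register count that still fits for a given block size.
constexpr std::uint32_t maxRegsPerThread(std::uint32_t threadsPerBlock) {
    return kRegsPerBlockLimit / threadsPerBlock;
}
```

Under these assumptions, `fitsInRegisterFile(1024, 92)` is false and `maxRegsPerThread(1024)` is 64, so a 1024-thread block would need the kernel held to 64 registers per thread or fewer. The usual ways to ask the compiler for that are `__launch_bounds__(1024)` on the kernel or the nvcc option `-maxrregcount=64`; either may introduce register spills, so it is worth measuring.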

Thanks
