cudaErrorLaunchOutOfResources: too many resources requested for launch, when only 3480 registers are used per block

CUDA and cuda-gdb are both version 11.7.

ptxas info    : 834 bytes gmem, 17 bytes cmem[3]

ptxas info    : Function properties for _ZN37_INTERNAL_02f6b9f1_7_main_cu_73a658526MyMath4InitEPKhm
    56 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads
ptxas info    : Function properties for _ZN37_INTERNAL_02f6b9f1_7_main_cu_73a658526MyMath10WriteToEPKhmRm
    120 bytes stack frame, 112 bytes spill stores, 112 bytes spill loads
ptxas info    : Function properties for _ZN37_INTERNAL_02f6b9f1_7_main_cu_73a658526MyMath7DoPlusEPPmmS1_
    72 bytes stack frame, 72 bytes spill stores, 72 bytes spill loads
ptxas info    : Function properties for _ZN37_INTERNAL_02f6b9f1_7_main_cu_73a658526MyMath10FreeEPPmm
    32 bytes stack frame, 32 bytes spill stores, 32 bytes spill loads
ptxas info    : Function properties for _ZN37_INTERNAL_02f6b9f1_7_main_cu_73a6585210BytesToHexEPKhj
    16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads
ptxas info    : Function properties for _ZN37_INTERNAL_02f6b9f1_7_main_cu_73a6585212octosToBytesEPKmmPPh
    32 bytes stack frame, 32 bytes spill stores, 32 bytes spill loads
    
ptxas info    : Compiling entry function '__nv_static_28__02f6b9f1_7_main_cu_73a65852__ZN9UnitTests16MyMathTestKernelEPNS_7TestJobEj' for 'sm_52'
ptxas info    : Function properties for __nv_static_28__02f6b9f1_7_main_cu_73a65852__ZN9UnitTests16MyMathTestKernelEPNS_7TestJobEj
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 20 registers, 332 bytes cmem[0]
ptxas info    : Function properties for _ZN37_INTERNAL_02f6b9f1_7_main_cu_73a658526MyMath9appendLenEmRmS1_
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
    
Totals over the functions above:
328 bytes stack frame
300 bytes spill stores
300 bytes spill loads

The kernel itself uses 20 registers per thread.

Total number of registers available per block: 65536

Threads per block: 174, so 20 * 174 = 3480 registers per block, which is well below 65536.

The grid has 1150 blocks, which makes 3480 * 1150 = 4,002,000 registers across the whole grid.
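
The per-block figure above can be cross-checked at runtime instead of by hand. Below is a minimal sketch, assuming a stand-in kernel `dummyKernel` and a helper `registersPerBlock` (both hypothetical, not taken from the original code). It rounds the thread count up to whole warps of 32 threads, so it should be read as a rough estimate: the hardware's real allocation granularity may round the number up further.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real test kernel; substitute your own.
__global__ void dummyKernel(int *out)
{
    out[blockIdx.x * blockDim.x + threadIdx.x] = threadIdx.x;
}

// Estimated registers needed by one block. Threads are counted in whole
// warps of 32, so a partially filled warp is charged as a full one.
constexpr int registersPerBlock(int regsPerThread, int threadsPerBlock)
{
    return regsPerThread * ((threadsPerBlock + 31) / 32) * 32;
}

int main()
{
    const int threadsPerBlock = 174;  // value from the post

    // What the compiled kernel actually needs per thread.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, dummyKernel);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // What device 0 offers per block.
    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("registers per thread            : %d\n", attr.numRegs);
    printf("local memory per thread         : %zu bytes\n", attr.localSizeBytes);
    printf("max threads per block (kernel)  : %d\n", attr.maxThreadsPerBlock);
    printf("estimated registers per block   : %d of %d\n",
           registersPerBlock(attr.numRegs, threadsPerBlock), prop.regsPerBlock);
    return 0;
}
```

If `attr.maxThreadsPerBlock` comes out below the block size used in the launch, that would point at a per-block limit rather than anything related to the grid size.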

The launch fails with 'cudaErrorLaunchOutOfResources' ('too many resources requested for launch').

When the grid has fewer blocks, the error does not occur.
There is plenty of free video memory, and its limit is large.

What could cause this error?

Hi @lavshyak,
This question might be better suited to the CUDA Programming and Performance forum on the NVIDIA Developer Forums. I have moved it there.

You may be running out of local memory. A typical writeup is here. You haven’t provided enough information to diagnose. The number of blocks in the grid shouldn’t have any effect on the resource issues I am aware of (registers, local/stack memory).
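
As a starting point for collecting the missing information, here is a minimal sketch that prints the per-thread stack limit and the kernel's local memory use, then separates launch-time errors from execution-time errors. The kernel `dummyKernel` and the helper `ok` are hypothetical placeholders, not part of the original program. The block and thread counts are the ones quoted in the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the real test kernel.
__global__ void dummyKernel(unsigned int *out, unsigned int n)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Report a CUDA error with some context; returns true when the call succeeded.
static bool ok(cudaError_t err, const char *what)
{
    if (err == cudaSuccess) return true;
    fprintf(stderr, "%s: %s (%s)\n", what,
            cudaGetErrorName(err), cudaGetErrorString(err));
    return false;
}

int main()
{
    const unsigned int threads = 174, blocks = 1150;  // values from the question
    const unsigned int n = threads * blocks;

    // Current per-thread stack size limit of the device.
    size_t stackBytes = 0;
    if (!ok(cudaDeviceGetLimit(&stackBytes, cudaLimitStackSize), "cudaDeviceGetLimit"))
        return 1;
    printf("per-thread stack limit : %zu bytes\n", stackBytes);

    // Per-thread resources the compiled kernel requires.
    cudaFuncAttributes attr;
    if (!ok(cudaFuncGetAttributes(&attr, dummyKernel), "cudaFuncGetAttributes"))
        return 1;
    printf("kernel local memory    : %zu bytes per thread\n", attr.localSizeBytes);
    printf("kernel registers       : %d per thread\n", attr.numRegs);

    unsigned int *d_out = nullptr;
    if (!ok(cudaMalloc(&d_out, n * sizeof(unsigned int)), "cudaMalloc"))
        return 1;

    dummyKernel<<<blocks, threads>>>(d_out, n);
    bool launched = ok(cudaGetLastError(), "kernel launch");          // launch-time errors
    bool finished = ok(cudaDeviceSynchronize(), "kernel execution");  // errors while running

    cudaFree(d_out);
    return (launched && finished) ? 0 : 1;
}
```

If the stack limit does turn out to be the constraint, `cudaDeviceSetLimit(cudaLimitStackSize, bytes)` is the runtime call for changing it. Whether that applies here is an assumption until the numbers above are known.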