Stack overflow error with compute-sanitizer

Robert_Crovella · September 20, 2023, 8:30pm

Just like a CPU, each CUDA thread maintains a stack. The stack is of fixed size. If your code behaves in such a way that it uses more stack space than that fixed size, it will result in a stack overflow. The machine is not ordinarily guaranteed to automatically catch such an error the moment it happens (although the resultant corrupted program behavior may eventually trigger a machine fault of some kind), but the additional instrumentation in compute-sanitizer can detect it. here is a general description, not GPU-specific.

Along with the above general description, I would pay close attention to any compiler warning messages that are emitted when compiling your code. Sometimes compiler warning messages take the form of “… stack size … cannot be statically determined …” Any such messages may provide additional clues. I’m not suggesting that the compiler would always emit such a warning message, or that it is doing so in your case. However if there are any such warning messages, they may provide additional clues.

Here is an example program that triggers that stack overflow error in compute-sanitizer:

# cat t48a.cu
#include <cstdio>
#include <iostream>
#include <cstdlib>

__device__ int a(int m, int n);

__global__ void k(int x, int y){
  printf("%d\n", a(x,y));
}

int main(int argc, char *argv[]){

  int val = 9;  // 3 or higher seems to be enough to trigger the fault
  if (argc > 1) val = atoi(argv[1]);
  k<<<1,1>>>(val,val);
  cudaError_t err = cudaDeviceSynchronize();
  std::cout << cudaGetErrorString(err) << std::endl;
}
# cat t48b.cu
__device__ int a(int m, int n){
    if (m == 0)
        return n + 1;
    if ((m > 0) && (n == 0))
        return a(m-1,1);
    else
        return a(m-1,a(m,n-1));
}
# nvcc -rdc=true t48a.cu t48b.cu -o t48 -lineinfo
nvlink warning : Stack size for entry function '_Z1kii' cannot be statically determined
# compute-sanitizer ./t48
========= COMPUTE-SANITIZER
========= Stack overflow
=========     at 0x10 in /root/bobc/t48b.cu:1:a(int, int)
=========     by thread (0,0,0) in block (0,0,0)
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x390]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x390]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x310]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48b.cu:7:a(int, int) [0x1d0]
=========     Device Frame:/root/bobc/t48a.cu:6:k(int, int) [0x80]
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x32e950]
=========                in /lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame:libcudart_static_4d8b33a106dceb3c07a56e26de61f2d53bb62a68 [0x1093e]
=========                in /root/bobc/./t48
=========     Host Frame:cudaLaunchKernel [0x70b4e]
=========                in /root/bobc/./t48
=========     Host Frame:cudaError cudaLaunchKernel<char>(char const*, dim3, dim3, void**, unsigned long, CUstream_st*) [0xb14e]
=========                in /root/bobc/./t48
=========     Host Frame:__device_stub__Z1kii(int, int) [0xb024]
=========                in /root/bobc/./t48
=========     Host Frame:k(int, int) [0xb05f]
=========                in /root/bobc/./t48
=========     Host Frame:main [0xae53]
=========                in /root/bobc/./t48
=========     Host Frame: [0x29d90]
=========                in /lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:__libc_start_main [0x29e40]
=========                in /lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:_start [0xab05]
=========                in /root/bobc/./t48
=========
========= Program hit cudaErrorLaunchFailure (error 719) due to "unspecified launch failure" on CUDA API call to cudaDeviceSynchronize.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame: [0x47e786]
=========                in /lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame:cudaDeviceSynchronize [0x48a64]
=========                in /root/bobc/./t48
=========     Host Frame:main [0xae58]
=========                in /root/bobc/./t48
=========     Host Frame: [0x29d90]
=========                in /lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:__libc_start_main [0x29e40]
=========                in /lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:_start [0xab05]
=========                in /root/bobc/./t48
=========
unspecified launch failure
========= ERROR SUMMARY: 2 errors
#

(CUDA 12.2)
NOTES:

I lifted the recursive function definition from here (also see here).
The compiler seems to be quite good at converting my naive attempts at recursion into a partially unrolled loop.
In the above example, the error message printout at the end (“unspecified launch failure”) may change to some other message (e.g. “an illegal memory access was encountered”) if the code is not run under compute-sanitizer. This is an indication that compute-sanitizer instruments the code and machine in such a way that the fault is detected differently. But in the general case I know of no guarantees that a runtime fault will be triggered in the event of stack overflow, without the use of a tool like compute-sanitizer. YMMV

Topic		Replies	Views
Error cp.async.ca.shared.global CUDA Programming and Performance	12	1502	August 17, 2022
Unspecified launch failure error CUDA Programming and Performance	10	18604	January 6, 2018
Random failures in quick-sort. Compute-sanitizer says there's no error CUDA Programming and Performance	4	24	October 5, 2024
How to choose how many threads/blocks to have? CUDA Programming and Performance	43	52488	June 7, 2022
Unknow error when calling device subroutine Legacy PGI Compilers	18	10480	September 1, 2015
[SOLVED]memory leak issues CUDA Programming and Performance	12	3641	October 12, 2015
cufftPlan1D initialisation hides subsequence memory access errors CUDA Programming and Performance	8	656	November 24, 2020
Upgrading to CUDA 12.4 broke down the application GPU-Accelerated Libraries cublas , cusparse	13	1175	July 21, 2024
CUDA thrust compilation error CUDA Programming and Performance	85	396	April 4, 2025
Passing dynamically allocated memory in kernel to sub kernel via dynamic parallelism CUDA Programming and Performance	6	693	May 3, 2019

Stack overflow error with compute-sanitizer

Related topics