On an RTX 5060 with driver 576.52 and CUDA Toolkit 12.9, cuda-gdb cannot step into a kernel function. Please help!

My device is an RTX 5060, and I’m trying to debug the following demo program in Ubuntu 24.04 on WSL:

#include <cstdio>
#include <cuda_runtime.h>

__global__ void testKernel(int *data) {
    int idx = threadIdx.x;
    data[idx] += 1;
}

int main() {
    cudaSetDevice(0);

    const int N = 4;
    int h_data[N] = {1, 2, 3, 4};
    int *d_data = nullptr;

    cudaMalloc(&d_data, N * sizeof(int));
    cudaMemcpy(d_data, h_data, N * sizeof(int), cudaMemcpyHostToDevice);

    // Launch the kernel
    testKernel<<<1, N>>>(d_data);
    cudaDeviceSynchronize();

    // Copy the results back
    cudaMemcpy(h_data, d_data, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);

    printf("Results: ");
    for (int i = 0; i < N; i++)
        printf("%d ", h_data[i]);
    printf("\n");

    return 0;
}

When I start cuda-gdb, set a breakpoint on testKernel, and run the program, it fails with the internal error below:

(cuda-gdb) break testKernel
Breakpoint 1 at 0x8ec4: file /home/sameta/my-flash-attention-minimal/test.cu, line 5.
(cuda-gdb) run
Starting program: /home/sameta/my-flash-attention-minimal/cuda_test 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff5bff000 (LWP 1773)]
[New Thread 0x7ffff49e0000 (LWP 1775)]
[Detaching after fork from child process 1776]
[New Thread 0x7fffefbff000 (LWP 1785)]
[Thread 0x7fffefbff000 (LWP 1785) exited]
[New Thread 0x7fffefbff000 (LWP 1786)]
[New Thread 0x7fffee87e000 (LWP 1787)]
[Detaching after vfork from child process 1793]
cuda-gdb/14/gdb/cuda/cuda-state.c:274: internal-error: create_module: Assertion `context' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
----- Backtrace -----
0x54234b ???
0x9932e4 ???
0x993508 ???
0xb3cac1 ???
0x637ee0 ???
0x603ec4 ???
0x6044be ???
0x4afe0f ???
0x7be492 ???
0x94b0bd ???
0x7736b2 ???
0x786a34 ???
0xb3d77c ???
0xb3d895 ???
0x7d1e56 ???
0x7d38a4 ???
0x44ae64 ???
0x7f8322bb91c9 __libc_start_call_main
        ../sysdeps/nptl/libc_start_call_main.h:58
0x7f8322bb928a __libc_start_main_impl
        ../csu/libc-start.c:360
0x45e7fd ???
0xffffffffffffffff ???
---------------------
cuda-gdb/14/gdb/cuda/cuda-state.c:274: internal-error: create_module: Assertion `context' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.

nvidia-smi output:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.57.04              Driver Version: 576.52         CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5060        On  |   00000000:01:00.0  On |                  N/A |
|  0%   36C    P8             12W /  145W |     773MiB /   8151MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

I use “nvcc -g -G test.cu -o cuda_test” to build the executable.

What I have done to solve the problem:

I have been trying to solve this problem for 2 days. For now I have given up on using cuda-gdb and am debugging with print statements instead. If anyone can help, I would appreciate it very much!!! QAQ

Oh, one more piece of information: the program runs normally without cuda-gdb.
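Since the demo above ignores every return code, "runs normally" is easier to confirm with error checks in place. Below is a minimal sketch of the same demo with checks added. The CUDA_CHECK macro and the incrementOnDevice helper are hypothetical names introduced here for illustration, not part of the original program, and the sketch assumes n is small enough to fit in a single block.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not in the original post): print the failing
// call and exit if a CUDA runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__, __LINE__,  \
                    #call, cudaGetErrorString(err_));                      \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

__global__ void testKernel(int *data) {
    int idx = threadIdx.x;
    data[idx] += 1;
}

// Hypothetical wrapper: copies n ints to the device, adds 1 to each with
// testKernel (one block of n threads), and copies them back in place.
void incrementOnDevice(int *h_data, int n) {
    int *d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(int)));
    CUDA_CHECK(cudaMemcpy(d_data, h_data, n * sizeof(int), cudaMemcpyHostToDevice));

    testKernel<<<1, n>>>(d_data);
    CUDA_CHECK(cudaGetLastError());       // reports errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(h_data, d_data, n * sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_data));
}

int main() {
    CUDA_CHECK(cudaSetDevice(0));

    const int N = 4;
    int h_data[N] = {1, 2, 3, 4};
    incrementOnDevice(h_data, N);

    printf("Results: ");
    for (int i = 0; i < N; i++)
        printf("%d ", h_data[i]);
    printf("\n");
    return 0;
}
```

Built with the same "nvcc -g -G" flags, this should print the incremented values 2 3 4 5 if every call succeeds. A failure here would point at the runtime or driver setup rather than at cuda-gdb.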

Hi, @chenxlei23

Could you please try the latest published release, CUDA 13.1?
I have tried it, and it works.
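When switching toolkits, it may help to confirm which CUDA versions the program actually sees. The sketch below queries the runtime and driver versions through the CUDA runtime API. The versionMajor and versionMinor helpers are hypothetical names added for illustration. They rely on CUDA's version encoding of 1000 * major + 10 * minor.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// CUDA encodes versions as 1000 * major + 10 * minor, e.g. 12090 for 12.9.
// These two helpers are illustrative and not part of the CUDA API.
static int versionMajor(int v) { return v / 1000; }
static int versionMinor(int v) { return (v % 1000) / 10; }

int main() {
    int runtimeVersion = 0;
    int driverVersion = 0;

    // Version of the CUDA runtime library this binary is using.
    if (cudaRuntimeGetVersion(&runtimeVersion) != cudaSuccess) {
        fprintf(stderr, "cudaRuntimeGetVersion failed\n");
        return 1;
    }
    // Latest CUDA version supported by the installed driver.
    if (cudaDriverGetVersion(&driverVersion) != cudaSuccess) {
        fprintf(stderr, "cudaDriverGetVersion failed\n");
        return 1;
    }

    printf("CUDA runtime: %d.%d\n",
           versionMajor(runtimeVersion), versionMinor(runtimeVersion));
    printf("CUDA driver supports up to: %d.%d\n",
           versionMajor(driverVersion), versionMinor(driverVersion));
    return 0;
}
```

Running "nvcc --version" and "cuda-gdb --version" alongside this should show whether the compiler and debugger come from the same toolkit installation.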
