Unable to debug kernel code with cuda-gdb

louis.child · February 6, 2024, 6:03pm

I’m trying to set up VSCode + WSL as my dev environment so that I can use cuda-gdb to debug kernels. Everything is installed and I can build and run CUDA applications, but I can only debug CPU code with cuda-gdb. This isn’t limited to VSCode either: if I run my app through cuda-gdb on the command line and set a breakpoint inside the kernel code, the breakpoint is placed on the closing brace of the kernel instead of the line I picked.

tasks.json
{
    "version": "2.0.0",
    "tasks": [
        {
            "type": "cmake",
            "label": "CMake: configure",
            "command": "configure",
            "problemMatcher": [],
            "detail": "CMake template configure task"
        },
        {
            "type": "shell",
            "label": "Build",
            "command": "make dbg=1",
            "group": {
                "kind": "build",
                "isDefault": true
            },
            "problemMatcher": ["$nvcc"],
            "dependsOn": ["CMake: configure"],
            "options": {
                "cwd": "${workspaceFolder}/build"
            }
        }
    ]
}

launch.json
{
    "configurations": [
        {
            "name": "CUDA C++: Launch",
            "type": "cuda-gdb",
            "request": "launch",
            "program": "${workspaceFolder}/build/ISP_AMF"
        }
    ]
}

The output of cuda-gdb when used via the command line:

(cuda-gdb) break main.cu:6
Breakpoint 1 at 0xb0a7: file /mnt/c/Dev/ISP-AMF/main.cu, line 8.
(cuda-gdb)

The default saxpy kernel code:

#include <stdio.h>

__global__
void saxpy(int n, float a, float *x, float *y)
{
  int i = blockIdx.x*blockDim.x + threadIdx.x;
  if (i < n) y[i] = a*x[i] + y[i];
}

int main(void)
{
  int N = 1<<20;
  float *x, *y, *d_x, *d_y;
  x = (float*)malloc(N*sizeof(float));
  y = (float*)malloc(N*sizeof(float));

  cudaMalloc(&d_x, N*sizeof(float)); 
  cudaMalloc(&d_y, N*sizeof(float));

  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyHostToDevice);
  cudaMemcpy(d_y, y, N*sizeof(float), cudaMemcpyHostToDevice);

  // Perform SAXPY on 1M elements
  saxpy<<<(N+255)/256, 256>>>(N, 2.0f, d_x, d_y);

  cudaMemcpy(y, d_y, N*sizeof(float), cudaMemcpyDeviceToHost);

  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = max(maxError, abs(y[i]-4.0f));
  printf("Max error: %f\n", maxError);

  cudaFree(d_x);
  cudaFree(d_y);
  free(x);
  free(y);
}

Based on other similar posts, I have already checked the Windows registry (via regedit) for the key that is commonly missing or set to false, and it is set to 1. So I am at a bit of a loss as to what else to do to get kernel debugging working. Any help would be appreciated.
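
One thing that may be worth ruling out, offered as a sketch rather than a diagnosis: nvcc emits full device debug information when it is given -G (long form --device-debug), alongside -g for host code. Whether "make dbg=1" actually adds that flag depends on the Makefile, which is not shown here, so that part is an assumption to check. The helper below (has_device_debug is a hypothetical name) does not run the build; it only inspects build command text passed to it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit status 0) if the given build command
# text contains nvcc's device-debug flag as a standalone word.
# -G and --device-debug are the short and long forms of the same nvcc option.
has_device_debug() {
  printf '%s\n' "$1" |
    grep -Eq -- '(^|[[:space:]])(-G|--device-debug)([[:space:]]|$)'
}

# Example with a literal command line (illustrative, not taken from the project):
if has_device_debug "nvcc -g -G -o ISP_AMF main.cu"; then
  echo "device debug flag present"
else
  echo "device debug flag missing"
fi
```

In practice the text to inspect could be captured from the build itself, for instance from a dry run such as "make -n dbg=1" in the build directory, assuming the build system prints the underlying nvcc commands in that mode.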
