HPC computing in VS Code: "unable to find cuda-gdb in undefined"

vinceverspeek · May 30, 2023, 12:12pm

Hi,

I am currently working on a master's thesis in which I want to use a SLURM HPC cluster for expensive CUDA computations in the field of satisfiability solving. These are the steps I currently follow to set up a CUDA environment in VS Code:

1. Connect to the HPC and enter the project folder.
2. Set up an interactive session with a computing node that contains a GPU (NVIDIA A100).
3. Load the module CUDA/12.1.0 onto the node (I have the Nsight plugin installed in VS Code).

However, whenever I try to debug any project that has a launch.json, I receive the error message “unable to find cuda-gdb in undefined”. This happens with my own project, but also with the sample projects provided with CUDA. Running cuda-gdb in the node’s terminal does work, however, and gives me an interactive environment. Compiling a project with nvcc and running it from the terminal also works, so the CUDA toolkit is loaded correctly.

To me, the error message is not really meaningful. Does anyone know what the problem could be, or can anyone point me to where I need to look to fix it? Thanks

navyaasanan · May 30, 2023, 4:24pm

Hi!

In terms of resolution of this error - are you using the latest version of Nsight VS Code Edition? We have seen this issue in older versions but this has been resolved recently.

In terms of working around this error - I would recommend setting “debuggerPath” in your launch.json so that we know where cuda-gdb is on your system. It would look like adding the following line to your launch.json:

"debuggerPath": "/replace/with/path/to/cuda-gdb"

Hope that helps!

vinceverspeek · June 1, 2023, 10:23am

Hi nsanan,

Thanks a lot for the quick reply. Adding debuggerPath to the launch.json did indeed resolve the error message “unable to find cuda-gdb in undefined”, but debugging still does not work as it should, which has kept me occupied for the last day. The version of NVIDIA Nsight I am using is 2023.2.

The problem I am having now is as follows. When I compile my program (with a Makefile) and then run it with the command ./parafrost uf50-01.cnf --verbose=0, I get correct results. Here, “uf50-01.cnf” is a domain-specific problem file whose contents do not matter much. However, whenever I run the program with the VS Code debugger, I get the error message ERROR - no GPU(s) available that support CUDA. I believe that for some reason debugging is not correctly configured in VS Code, so the program does not recognize the device. Below is the launch.json file that I am now using.

`{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "CUDA C++: Launch",
            "type": "cuda-gdb",
            "request": "launch",
            "program": "${workspaceFolder}/parafrost",
            "args": ["${workspaceFolder}/uf50-01.cnf"],
            "stopAtEntry": false,
            "cwd": "${fileDirname}",
            "environment": [],
            "debuggerPath": "/sw/arch/RHEL8/EB_production/2022/software/CUDA/11.7.0/bin/cuda-gdb"
        },
        {
            "name": "CUDA C++: Attach",
            "type": "cuda-gdb",
            "request": "attach"
        }
    ]
}`

Do you happen to know what could be the problem? Thank you!

navyaasanan · June 1, 2023, 4:48pm

Are you able to debug with cuda-gdb on the command line? The error you mentioned is often caused by an incorrect or mismatched driver installation.

As a starting step, I would recommend opening a dummy program (maybe an NVIDIA-provided CUDA sample) and trying to debug it using the cuda-gdb CLI. If that does not work, I would recommend inspecting that further or reinstalling the CTK. Also, I noticed that you are using CUDA 11.7. We recommend upgrading to 12.1 or higher for best compatibility with Nsight VS Code Edition.

vinceverspeek · June 1, 2023, 5:42pm

So, I have done some experiments with several programs and different CUDA versions. The reason I was using CUDA/11.7.0 is that the CUDA version reported by nvidia-smi is 11.7.0. I can also use version 12.1.0.

1. With my own program and CUDA 11.7.0, cuda-gdb works in the terminal (I can step from line to line, etc.), and so does running the entire program from the terminal after compiling with nvcc. Furthermore, debugging in VS code works, but when I run the entire program with debugging in VS Code I get the ERROR - no GPU(s) available that support CUDA.
2. With my own program and CUDA 12.1.0, cuda-gdb works, as does debugging in VS Code, but running the entire program results in ERROR - no GPU(s) available that support CUDA, both in VS Code and when running from the terminal.
3. As a sample program, I picked the vectorAdd project. Debugging works from the terminal as well as in VS Code, but advancing further in the program results in Failed to allocate device vector A (error code CUDA driver version is insufficient for CUDA runtime version)!. This happens with both 11.7.0 and 12.1.0.
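
For reference, the "CUDA driver version is insufficient for CUDA runtime version" message generally means that the toolkit's runtime is newer than the highest CUDA version the installed driver supports (the "CUDA Version" shown by nvidia-smi). Below is a minimal Python sketch of that comparison. The helper names are hypothetical, and the check deliberately ignores CUDA's minor-version and forward-compatibility mechanisms, which can relax the rule in some setups.

```python
def parse_cuda_version(text):
    """Turn a version string such as '11.7' or '12.1.0' into (major, minor)."""
    parts = text.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def driver_supports_runtime(driver_cuda, runtime_cuda):
    """Conservative check: the driver's supported CUDA version must be at
    least the runtime (toolkit) version. This is an assumption-level rule
    that does not model compatibility packages."""
    return parse_cuda_version(driver_cuda) >= parse_cuda_version(runtime_cuda)
```

With the values from this thread, `driver_supports_runtime("11.7", "12.1.0")` is false, which would be consistent with the failures seen under CUDA/12.1.0. It does not by itself explain the same message under 11.7.0.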

I would also like to mention that debugging in VS Code behaves strangely. When I have a file open in a tab from the correct location and start debugging, that same file opens again in a new tab. The newly opened file is in a different location, and IntelliSense often reports several include errors for it.

Could it perhaps be the case that the version of NVIDIA Nsight is too new for the CUDA driver version that is installed?

navyaasanan · June 1, 2023, 6:21pm

> debugging in VS code works, but when I run the entire program with debugging in VS Code I get the ERROR - no GPU(s) available that support CUDA.

Can you please give a little more detail here? I am not sure I understand what you mean by debugging works but when you run the program with debugging that does not work.

Also, in the config you shared there are both launch and attach configs - which one are you using?

vinceverspeek · June 1, 2023, 6:52pm

What I mean is that debugging works in principle, so I can step over, step in, etc. However, when I continue to the end of the program while debugging (hence running the entire program), I get the error that indicates no GPU is available. This is in contrast to running the compiled program from the terminal, which works normally.

Regarding the configuration, I presume I am using the launch config…? The launch and attach part were both automatically generated by VS Code. I adapted the launch segment and just left the attach segment. Should I remove the attach segment?

navyaasanan · June 1, 2023, 10:45pm

You don’t have to delete the attach config, I would just double check that you are running the launch config. Also, if debugging starts and you are not attaching to a specific PID, that would imply that you are using launch config but it’s nice to be sure. You can double check what you’re running by looking at the top right on the ‘Run and Debug’ view in VS Code. Whatever is next to the green triangle is what you are running.

A few things:
First, can you run which cuda-gdb and make sure the path it prints is the same as the one you have in debuggerPath? It is atypical for such discrepancies to happen between cuda-gdb and Nsight Visual Studio Code Edition, so it would be nice to be sure.
Second, can you share system details: OS, CTK, driver version, VS Code version and so on? I tried out what you described, but I am unable to reproduce it on my end, so I am looking for any system information I might be missing.
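
The which cuda-gdb comparison suggested above can also be scripted. The following Python sketch is an illustration only. The function names are hypothetical, it assumes launch.json is strict JSON (VS Code also accepts comments, which json.loads rejects), and it treats two paths as equal once symlinks are resolved.

```python
import json
import os
import shutil


def configured_debugger_paths(launch_json_text):
    """Return {configuration name: debuggerPath} for entries that set one."""
    config = json.loads(launch_json_text)
    return {
        entry.get("name", "<unnamed>"): entry["debuggerPath"]
        for entry in config.get("configurations", [])
        if "debuggerPath" in entry
    }


def same_debugger(configured_path, resolved_path):
    """True when both paths point at the same file after resolving symlinks."""
    if resolved_path is None:  # cuda-gdb was not found on PATH
        return False
    return os.path.realpath(configured_path) == os.path.realpath(resolved_path)


def check_launch_file(launch_json_path=".vscode/launch.json"):
    """Compare each configured debuggerPath with the cuda-gdb found on PATH."""
    with open(launch_json_path) as handle:
        paths = configured_debugger_paths(handle.read())
    resolved = shutil.which("cuda-gdb")  # Python equivalent of `which cuda-gdb`
    return {name: same_debugger(path, resolved) for name, path in paths.items()}
```

Calling `check_launch_file()` from the project folder on the compute node, after loading the CUDA module, would show whether the loaded module and the debuggerPath entry refer to the same cuda-gdb binary.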