Next-gen debugging became very slow after upgrading to CUDA 12.5

I upgraded from CUDA 12.2 to CUDA 12.5 using cuda_12.5.0_555.85_windows.exe.

Now next-gen debugging runs very slowly compared to running the program with the VS debugger.
For example, a kernel launch and synchronize takes about 3 seconds with next-gen debugging vs 420 ms with the VS debugger as measured with breakpoints in host code. The legacy debugger does not seem to have the slowdown issue, but it does not stop at any breakpoints, so I have no firm numbers. I don’t recall any significant slowdown when doing next-gen debugging with 12.2 or earlier releases.

I have a real-time application. An 8x slowdown can interfere with its normal operation, making debugging very difficult.

Any suggestions for fixing the issue or finding its cause?

Some details:
The CUDA 12.5 install repeatedly failed at Nsight Compute, so I had to uncheck it to have the install complete. Before I figured that out, I upgraded the Nvidia driver to 555.99.

I do local debugging with a GPU that is not used for display.
The CUDA memory checker is disabled in the Nsight menu and in Nsight options.

GTX 1070 Ti
Nvidia driver 555.99
Windows 10 22H2
Visual Studio 2017
HAGS: enabling/disabling does not change the debugging behavior.
The code is compiled with debug info enabled.
The slowdown happened both before and after I rebuilt the entire solution (i.e., built with 12.2 and 12.5.)
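
To confirm which toolkit a rebuilt executable actually uses (and, after the later switch to 12.4, that nothing stale is being picked up), it may help to print the runtime and driver versions at startup. A minimal sketch follows. cudaRuntimeGetVersion() and cudaDriverGetVersion() are real CUDA Runtime API calls that return an int encoded as 1000 * major + 10 * minor. The helper name decode_cuda_version is hypothetical.

```cpp
#include <string>

// Hypothetical helper: turn a CUDA version integer, as returned by
// cudaRuntimeGetVersion() or cudaDriverGetVersion(), into "major.minor".
// The encoding is 1000 * major + 10 * minor, e.g. 12050 means 12.5.
std::string decode_cuda_version(int v) {
    const int major = v / 1000;
    const int minor = (v % 1000) / 10;
    return std::to_string(major) + "." + std::to_string(minor);
}
```

Usage, from host code in the application: declare `int rt = 0, drv = 0;`, call `cudaRuntimeGetVersion(&rt)` and `cudaDriverGetVersion(&drv)`, then print `decode_cuda_version(rt)` and `decode_cuda_version(drv)`. The runtime value reports the CUDA runtime the executable is using. The driver value reports the latest CUDA version the installed driver supports.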

Visual Studio Info (abridged):

Microsoft Visual Studio Professional 2017 (2)
Version 15.9.50
VisualStudio.15.Release/15.9.50+28307.2094
Microsoft .NET Framework
Version 4.8.09037

Installed Version: Professional

Visual C++ 2017 00369-60000-00001-AA551
Microsoft Visual C++ 2017

ASP.NET and Web Tools 2017 15.9.04012.0
ASP.NET and Web Tools 2017

Microsoft MI-Based Debugger 1.0
Provides support for connecting Visual Studio to MI compatible debuggers

Microsoft Visual Studio VC Package 1.0
Microsoft Visual Studio VC Package

MLGen Package Extension 1.0
MLGen Package Visual Studio Extension Detailed Info

NuGet Package Manager 4.6.0
NuGet Package Manager in Visual Studio. For more information about NuGet, visit http://docs.nuget.org/.

NVIDIA CUDA 11.7 Wizards 11.7
Wizards to create new NVIDIA CUDA projects and source files.

NVIDIA CUDA 12.2 Wizards 12.2
Wizards to create new NVIDIA CUDA projects and source files.

NVIDIA CUDA 12.5 Wizards 12.5
Wizards to create new NVIDIA CUDA projects and source files.

NVIDIA Nsight Visual Studio Edition 2024.2.0.24102

NVIDIA Nsight Visual Studio Edition - CUDA support 2024.2.0.24102
NVIDIA Nsight Visual Studio Edition - CUDA support provides tools for CUDA development and debugging.

Visual Studio Code Debug Adapter Host Package 1.0
Interop layer for hosting Visual Studio Code debug adapters in Visual Studio

Visual Studio Tools for CMake 1.0
Visual Studio Tools for CMake

Update:

I uninstalled most Nvidia CUDA-related software on my machine, including the driver.
I then did a custom install of 12.4, using cuda_12.4.1_windows_network.exe.

The slow next-gen debugging issue is unchanged.

Could it just be my imagination that next-gen debugging in 12.2 was not so slow?

Can anyone confirm that an 8x slowdown is to be expected with the next-gen debugger vs. the VS debugger (which does not debug device code)?

P.S. I upgraded to 12.4/12.5 because I heard that the issue of missing values in watch windows was fixed. That seems to be the case, so thanks to Nvidia for fixing that frustrating issue.

Hi, @CU_Steve

Sorry for the trouble you're having.

Did you set any breakpoints to get the 3 s vs. 420 ms timings?
If you didn't set any breakpoints in either test and still saw an 8x slowdown, I think that is abnormal.

Can you help us confirm whether this is a regression? You can get a previous version from below and install it directly (no need to reinstall CUDA).

Also, is this issue seen in other VS versions, such as VS2019 or VS2022?

I see the issue regardless of breakpoints.
Based on console output alone, running with the next-gen debugger is noticeably slower than running with the host-only Visual Studio debugger.

To measure the 3 s vs. 420 ms, I set two breakpoints in host code only:
one before a kernel launch and one after its corresponding cudaDeviceSynchronize().

So one breakpoint on the first line and one breakpoint on the last line of this example host code:

>   std::cout  <<  "\nInitializing memman..." << std::flush;
>   k_aud_memman_init<<< 1, 256 >>> ( SetupSml, SetupLrg );
> 
>   ce = cudaGetLastError();
>   if ( ce != cudaSuccess ) {
>     std::cout  <<  "\n**** CUDA Error during k_aud_memman_init() launch: "  <<  ce  <<  " ("  << cudaGetErrorString( ce ) <<  ")"  << std::endl;
>     cudaGetLastError();    // clear the error
>     return;
>   }
>   ce = cudaDeviceSynchronize();
>   std::cout  <<  "\nDone." << std::endl;
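
Breakpoint-based timing may also include whatever the debugger does while stopping and resuming, so timing the same span in code could give numbers that are easier to compare across the two debuggers. A minimal sketch, assuming a C++11-or-later host compiler. time_ms is a hypothetical helper, not an NVIDIA API.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: invoke a callable once and return the elapsed
// wall-clock time in milliseconds, using a monotonic clock so that
// system clock adjustments cannot skew the result.
template <typename F>
double time_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

In the snippet above, the launch and synchronize could be wrapped as `double ms = time_ms([&] { k_aud_memman_init<<< 1, 256 >>>( SetupSml, SetupLrg ); ce = cudaDeviceSynchronize(); });`, with `ms` printed to the console. The first CUDA work in a process may include one-time initialization, so timing a second launch as well can help separate that cost from per-launch overhead.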

I see the same issue with 12.4 and 12.5. I did not try 12.3.
I have only Visual Studio 2017 installed.

Hi, @CU_Steve

Is it possible to provide us with a repro for the 8x slowdown?
We haven't seen this internally.

My application is very big.
I will see if I can make a self-contained small demo that has the same slowdown.

I tried to make a small program that reproduces the issue.

To my surprise, the small program’s execution time was the same no matter which debugger I used.
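
A single run under each debugger can be noisy. One way to make the demo vs. application comparison more robust is to repeat the timed section several times per debugger and compare summaries. A minimal sketch, assuming the per-iteration times (in milliseconds) are collected with std::chrono into a std::vector<double>. TimingSummary and summarize are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical summary of repeated timings, all in milliseconds.
struct TimingSummary {
    double min_ms;
    double median_ms;
    double max_ms;
};

// Takes the samples by value so sorting does not reorder the caller's data.
// Returns all zeros for an empty input instead of reading out of bounds.
TimingSummary summarize(std::vector<double> samples) {
    if (samples.empty()) {
        return {0.0, 0.0, 0.0};
    }
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    const double median = (n % 2 == 1)
        ? samples[n / 2]
        : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
    return {samples.front(), median, samples.back()};
}
```

Comparing medians rather than single samples makes it less likely that one unusually slow or fast run drives the conclusion. The max still shows whether occasional stalls occur under the next-gen debugger.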

The project settings look the same for both the demo project and the large application’s project.

The debug .exe sizes are ~800 kB (demo) and ~10 MB (large application).
The release .exe for the large application is only ~4 MB.

I also notice that the slowdown seems to happen only with the debug executable, not the release executable. I.e., the release executable runs about as fast in the next-gen debugger as in the VS debugger. Of course, I cannot do much debugging with the release executable.
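
Since the slowdown appears only with the debug executable and the project settings look the same in the IDE, it may be worth having both programs print the build-mode macros they were actually compiled with, so the demo and the large application can be checked against each other. A minimal sketch, assuming it lives in a .cu file compiled by nvcc. The function name is hypothetical. nvcc documents __CUDACC_DEBUG__ as defined in device-debug mode (-G, typically labeled "Generate GPU Debug Information" in the CUDA project properties). MSVC defines _DEBUG when a debug runtime library (/MDd, /MTd, or /LDd) is selected, and NDEBUG is conventionally defined in release configurations.

```cpp
#include <string>

// Hypothetical helper: report which build-mode macros were defined when
// this translation unit was compiled. Each check is resolved at compile
// time, so the result describes this file's build, not the whole project.
std::string build_flags_report() {
    std::string r;
#ifdef _DEBUG
    r += "_DEBUG=1 ";   // MSVC debug runtime (/MDd, /MTd, /LDd)
#else
    r += "_DEBUG=0 ";
#endif
#ifdef NDEBUG
    r += "NDEBUG=1 ";   // conventionally set in release configurations
#else
    r += "NDEBUG=0 ";
#endif
#ifdef __CUDACC_DEBUG__
    r += "device-debug(-G)=1";  // nvcc device-debug mode
#else
    r += "device-debug(-G)=0";  // also the result in a non-nvcc compiler
#endif
    return r;
}
```

If the two debug builds print different results here, that difference would be a natural place to start when trying to make the small demo reproduce the slowdown.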

Sorry, we can't investigate further without a reproduction.
But we will keep an eye on debugging performance during our testing.

Thanks!

Ok, thanks.

I tried a few ideas, but I never figured out how to reproduce the issue in a small program.
If/when I do, I will re-post.

Thanks!

I’m going to close this topic. Feel free to start a new topic whenever you have new findings.
