Nsight debugger hangs on cudaMalloc

I am trying to debug a kernel that uses some surface objects and CUDA arrays. To do that I am using NVIDIA Nsight with VS2017, debugging in Next-Gen mode. However, it takes forever to get past any of the cudaMalloc calls that are needed before my kernel launches. This seems like a very basic use case. Am I doing something wrong?

The following minimal code, which does nothing but allocate, never finishes when debugging with Nsight (I waited several minutes).


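#include <cuda_runtime.h>
#include <cstdio>

int main() {
  int width = 800;   // unused in this minimal repro
  int height = 600;  // unused in this minimal repro

  void* devMemory = nullptr;
  // Under Nsight Next-Gen debugging, this call never returns.
  cudaError_t err = cudaMalloc(&devMemory, 1);
  if (err != cudaSuccess) {
    std::printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
    return 1;
  }

  cudaFree(devMemory);
  return 0;
}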

I am working with Visual Studio 2017 and CUDA 10.1, running on Windows 10 with a GTX 780.
Nsight version: 2019.2.0.19109

I’m facing similar issues on an RTX 2060 with VS2019. What’s more, “Next-Gen debugging” is only available right after VS is freshly launched. If I let the code freeze on cudaMalloc and then cancel the debugging session, “Next-Gen debugging” stays greyed out until I restart VS. This is really infuriating for such a basic use case.

Note that:

  • The Legacy Debugger supports Pascal and earlier (including the GTX 780/Kepler).
    • On that card I would expect the Next-Gen Debugger to show an error notification and/or a message in the Output window.
  • The Next-Gen Debugger supports Pascal and later (including the RTX 2060/Turing).
    • I can't reproduce this on my TITAN RTX.
    • Is your Windows 10 RS4 (version 1803) or later? That is needed for debugging under the WDDM driver model.
    • I'd recommend getting the latest driver (at least 425.25).

See the debugger support table for the supported OS/driver/GPU configurations.

I’m running Windows 10 1903 with the 431.60 driver. The rest of the CUDA stack is also at the newest versions, downloaded just yesterday.