I am trying to debug a kernel that uses surface objects and CUDA arrays. To that end I am using NVIDIA Nsight with VS2017, debugging in Next-Gen mode. However, stepping over any of the cudaMalloc calls that must run before my kernel launches takes forever. This seems like a very basic use case. Am I doing something wrong?
The following minimal code, which does nothing but allocate, never finishes under Nsight debugging (I waited several minutes).
main.cpp:
#include <cuda_runtime.h>

int main() {
    int width = 800;
    int height = 600;
    void* devMemory;
    // Under the Nsight Next-Gen debugger, this call never returns.
    cudaMalloc(&devMemory, 1);
    return 0;
}
I am working with Visual Studio 2017 and CUDA 10.1, running on Windows 10 with a GTX 780.
Nsight version: 2019.2.0.19109
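One way to narrow down where the time goes, offered as a rough sketch rather than a fix: the CUDA runtime creates its context lazily on the first runtime call, so the delay attributed to cudaMalloc in the repro above may really be context creation under the debugger. The helpers below time the two steps separately and print the returned error strings. The names elapsedMs and timeFirstAllocation are hypothetical, and the __has_include guard is only there so the file still builds on a machine without the CUDA headers.

```cpp
#include <chrono>
#include <cstdio>

// Detect the CUDA runtime header so this sketch also compiles without CUDA.
#if defined(__has_include)
#  if __has_include(<cuda_runtime.h>)
#    include <cuda_runtime.h>
#    define HAVE_CUDA_RUNTIME 1
#  endif
#endif

// Hypothetical helper: run any callable and return the elapsed
// wall-clock time in milliseconds.
template <typename F>
double elapsedMs(F&& call) {
    const auto start = std::chrono::steady_clock::now();
    call();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical helper: time context creation and a 1-byte allocation separately.
// Returns 0 on success, 1 on a CUDA error, -1 if built without the CUDA headers.
int timeFirstAllocation() {
#ifdef HAVE_CUDA_RUNTIME
    cudaError_t err = cudaSuccess;

    // cudaFree(0) is a common idiom for forcing context creation up front.
    const double initMs = elapsedMs([&] { err = cudaFree(0); });
    std::printf("context init: %.1f ms (%s)\n", initMs, cudaGetErrorString(err));
    if (err != cudaSuccess) return 1;

    // With the context already created, this measures the allocation alone.
    void* devMemory = nullptr;
    const double allocMs = elapsedMs([&] { err = cudaMalloc(&devMemory, 1); });
    std::printf("cudaMalloc:   %.1f ms (%s)\n", allocMs, cudaGetErrorString(err));
    if (err != cudaSuccess) return 1;

    cudaFree(devMemory);
    return 0;
#else
    std::printf("cuda_runtime.h not found; nothing timed\n");
    return -1;
#endif
}
```

Calling timeFirstAllocation() from main, once with and once without the debugger attached, should show whether the stall is in context creation or in the allocation itself. If the first printf never appears under Nsight, the hang is happening before any memory is allocated.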
I’m facing similar issues on an RTX 2060 with VS2019. What’s more, “next-gen debugging” is only available right after VS is freshly launched. If I let the code freeze on cudaMalloc and cancel the debugging session, “next-gen debugging” stays greyed out until I restart VS. This is really infuriating, as this is a very basic use case.