Nsight Debugger Freezes When Stepping Into/Over Shared Memory in Visual Studio 2019

I am trying to write CUDA code in Visual Studio 2019, but the Nsight debugger seems to freeze when I try to step over an extern __shared__ float s_data[]; declaration. If I remove the shared memory from the code, it seems to work fine. I can set a breakpoint later in the code, but when I take a step from there, it freezes again.

My setup:

  • Windows 10 Pro
  • EVGA RTX 3090 FTW3 Ultra
  • Visual Studio 2019 Community Edition
  • CUDA 11.2
  • Compute capability 8.6

I am not sure whether this is a bug in the debugger or whether there are special settings I need in order to debug shared memory.

Have you properly allocated the dynamic shared memory for the kernel?

I did not. Thank you!

I was doing this:

extern __shared__ float s_data[];

when I should be doing this:

extern __shared__ float s_data[256];

Not exactly.

You should study the difference between a dynamic shared memory allocation and a static one.

static:

__shared__ float s_data[CONST];

CONST must be an expression that can be evaluated to a constant at compile time.
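As a minimal sketch of the static case (the kernel name reverse_static and the size N are made up for illustration and are not from the original code), the array size is a compile-time constant and the launch needs no third parameter:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 256;  // hypothetical size; a compile-time constant can size a static array

// Reverses an N-element array inside one block, staging it in shared memory.
__global__ void reverse_static(float *d)
{
    __shared__ float s_data[N];   // static allocation: size is fixed at compile time
    int t = threadIdx.x;
    s_data[t] = d[t];
    __syncthreads();              // every thread must finish writing before anyone reads
    d[t] = s_data[N - 1 - t];
}

int main()
{
    float h[N];
    for (int i = 0; i < N; ++i) h[i] = (float)i;

    float *d = nullptr;
    cudaMalloc(&d, N * sizeof(float));
    cudaMemcpy(d, h, N * sizeof(float), cudaMemcpyHostToDevice);

    // One block of exactly N threads; no shared-memory size in the launch configuration.
    reverse_static<<<1, N>>>(d);

    cudaMemcpy(h, d, N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);

    printf("first = %.0f, last = %.0f\n", h[0], h[N - 1]);
    return 0;
}
```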

dynamic:

extern __shared__ float s_data[];

which also requires:

my_kernel<<<grid, blocks, shared_size>>>(...);

at kernel launch time. shared_size can be a value computed at runtime, and it specifies the number of bytes (not elements) of shared memory that will be available via the s_data pointer/array in the kernel.
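Putting the declaration and the launch together, a minimal sketch of the dynamic case might look like the following. The kernel name rotate_dynamic, the command-line handling, and the 1024-thread cap are assumptions made for illustration; the part that matters is that shared_size is given in bytes as the third launch parameter.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Rotates an n-element array left by one position inside a single block.
__global__ void rotate_dynamic(float *d, int n)
{
    extern __shared__ float s_data[];  // size comes from the third launch parameter
    int t = threadIdx.x;
    if (t < n) s_data[t] = d[t];
    __syncthreads();                   // kept outside the branch so all threads reach it
    if (t < n) d[t] = s_data[(t + 1) % n];
}

int main(int argc, char **argv)
{
    // n is only known at run time (hypothetical argument; capped to fit one block).
    int n = (argc > 1) ? std::atoi(argv[1]) : 256;
    if (n < 1 || n > 1024) n = 256;

    std::vector<float> h(n);
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    size_t shared_size = n * sizeof(float);       // bytes, not element count
    rotate_dynamic<<<1, n, shared_size>>>(d, n);  // third parameter sizes s_data

    cudaMemcpy(h.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);

    printf("h[0] = %.0f, h[%d] = %.0f\n", h[0], n - 1, h[n - 1]);
    return 0;
}
```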

If you omit shared_size (a fairly common oversight) in the kernel launch, then your kernel will still launch properly, but any attempt to dereference s_data in kernel code will result in a kernel crash. The debugger behavior might be unstable at that point.
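One way to catch this oversight outside the debugger is to check for errors after every launch. The sketch below is only illustrative: touch_shared and kernel_ok are hypothetical names, and the exact error reported for the second launch may vary by GPU and driver. It runs the same kernel once with shared_size and once without it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread writes to and reads back its own slot of dynamic shared memory.
__global__ void touch_shared(float *out)
{
    extern __shared__ float s_data[];
    s_data[threadIdx.x] = (float)threadIdx.x;
    __syncthreads();
    out[threadIdx.x] = s_data[threadIdx.x];
}

// Reports launch-time and execution-time errors for the most recent launch.
static bool kernel_ok(const char *label)
{
    cudaError_t launch_err = cudaGetLastError();      // bad launch configuration, etc.
    cudaError_t sync_err   = cudaDeviceSynchronize(); // faults raised while the kernel ran
    if (launch_err != cudaSuccess || sync_err != cudaSuccess) {
        cudaError_t err = (launch_err != cudaSuccess) ? launch_err : sync_err;
        printf("%s: FAILED (%s)\n", label, cudaGetErrorString(err));
        return false;
    }
    printf("%s: ok\n", label);
    return true;
}

int main()
{
    const int threads = 256;
    float *d_out = nullptr;
    cudaMalloc(&d_out, threads * sizeof(float));

    // Correct launch first: the third parameter provides threads * sizeof(float) bytes.
    touch_shared<<<1, threads, threads * sizeof(float)>>>(d_out);
    kernel_ok("with shared_size");

    // shared_size omitted: s_data has zero bytes behind it, so the accesses are
    // out of bounds and the kernel is expected to fail while running.
    touch_shared<<<1, threads>>>(d_out);
    kernel_ok("without shared_size");

    // An execution fault can leave the context unusable, so reset rather than reuse it.
    cudaDeviceReset();
    return 0;
}
```

Running the application under compute-sanitizer is another way to surface out-of-bounds shared memory accesses like this one.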

Interesting… I will do some research. Thank you!