I am trying to write CUDA code in Visual Studio 2019, but the Nsight debugger seems to freeze when I try to step over this declaration:
extern __shared__ float s_data[];
If I remove the shared memory from the code, it seems to work fine. I can set a breakpoint later in the code, but when I take a step from there, it freezes again.
My setup:
- Windows 10 Pro
- EVGA RTX 3090 FTW3 Ultra
- Visual Studio 2019 Community Edition
- CUDA 11.2
- Compute capability 8.6
I am not sure if this is a bug in the debugger or if there are special settings I need in order to debug shared memory.
Have you properly allocated the dynamic shared memory for the kernel?
I did not. Thank you!
I was doing this:
extern __shared__ float s_data[];
when I should be doing this:
extern __shared__ float s_data[256];
Not exactly.
You should study the difference between a dynamic shared memory allocation and a static one.
static:
__shared__ float s_data[CONST];
CONST must be an expression that can be evaluated to a constant at compile time.
dynamic:
extern __shared__ float s_data[];
which also requires:
my_kernel<<<grid, blocks, shared_size>>>(...);
at kernel launch time. shared_size can be a runtime-computed number; it specifies the number of bytes of shared memory that will be available via the s_data pointer/array in the kernel.
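To make the dynamic case concrete, here is a minimal sketch of a kernel that uses the extern declaration together with a launch that supplies the size in bytes. The kernel name reverse_dynamic and the host helper reverse_on_device are made up for illustration; they are not from the original post.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: reverses n floats inside a single block,
// staging them in dynamically sized shared memory.
__global__ void reverse_dynamic(float *data, int n)
{
    extern __shared__ float s_data[];   // size is set by the third launch parameter
    int t = threadIdx.x;
    if (t < n) s_data[t] = data[t];
    __syncthreads();                    // all loads finish before any thread reads back
    if (t < n) data[t] = s_data[n - 1 - t];
}

// Hypothetical host helper: returns true if the device produced the reversed array.
// Assumes n fits in one block (n <= 1024 on current GPUs).
bool reverse_on_device(int n)
{
    std::vector<float> h(n);
    for (int i = 0; i < n; ++i) h[i] = float(i);

    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return false;
    cudaMemcpy(d, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    size_t shared_size = n * sizeof(float);         // bytes, computed at run time
    reverse_dynamic<<<1, n, shared_size>>>(d, n);   // third parameter backs s_data

    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    cudaMemcpy(h.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);

    for (int i = 0; ok && i < n; ++i) ok = (h[i] == float(n - 1 - i));
    return ok;
}
```

Note that shared_size is a byte count, not an element count, so it is n * sizeof(float) here rather than n.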
If you omit shared_size (a fairly common oversight) in the kernel launch, your kernel will still launch properly, but any attempt to dereference s_data in kernel code will result in a kernel crash. The debugger behavior might be unstable at that point.
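One way to catch this kind of failure outside the debugger is to check the runtime's error codes after the launch. The following is only a sketch: the kernel touch_shared and the helper launch_and_check are hypothetical names, and whether a launch with zero dynamic shared memory actually reports an error may depend on the GPU and driver.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread writes to and reads from dynamic shared memory.
__global__ void touch_shared(float *out)
{
    extern __shared__ float s_data[];
    s_data[threadIdx.x] = float(threadIdx.x);
    __syncthreads();
    out[threadIdx.x] = s_data[threadIdx.x];
}

// Hypothetical helper: launches one block with the given dynamic shared size
// and returns the first error the runtime reports.
cudaError_t launch_and_check(int threads, size_t shared_bytes)
{
    float *d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, threads * sizeof(float));
    if (err != cudaSuccess) return err;

    touch_shared<<<1, threads, shared_bytes>>>(d_out);

    err = cudaGetLastError();            // errors in the launch configuration
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();   // errors raised while the kernel was running

    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return err;
}
```

Calling launch_and_check(256, 256 * sizeof(float)) is the correct form. Passing 0 as the second argument reproduces the oversight described above; running that case under compute-sanitizer should point at the offending shared memory access even if the plain run does not report anything.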
Interesting… I will do some research. Thank you!