k<<<4, 512, 0, stream>>> (…
First launch succeeds.
Second launch fails with “too many resources requested for launch”.
Same block-count and thread-count; same stream.
This happens only in a debug build (-g -G -O0), not in an optimized build. Reducing the thread count to 256 didn’t fix it.
I thought resource usage depended only on the shared memory size (0 here) and the number of registers required (threads per block * registers per thread), both of which should be identical for the two launches.
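To make the model I have in mind concrete, here is a minimal sketch of the check I assumed happens at launch. It is host-side arithmetic only, not the actual driver logic. The names `BlockLimits`, `KernelNeeds`, and `fitsLaunch` are hypothetical, and the sketch ignores register allocation granularity. On a real system I would expect the limits to come from `cudaGetDeviceProperties` (`regsPerBlock`, `sharedMemPerBlock`, `maxThreadsPerBlock`) and the kernel figures from `cudaFuncGetAttributes` (`numRegs`, `sharedSizeBytes`).

```cpp
// Host-side arithmetic only; compiles as plain C++ or inside a .cu file.
#include <cstddef>

// Per-block limits of a device (hypothetical struct; values are supplied by
// the caller, e.g. copied from cudaDeviceProp).
struct BlockLimits {
    int regsPerBlock;                 // registers available to one block
    std::size_t sharedBytesPerBlock;  // shared memory available to one block
    int maxThreadsPerBlock;           // hard cap on threads per block
};

// Per-kernel requirements (hypothetical struct; e.g. copied from
// cudaFuncAttributes for the debug or release build of the kernel).
struct KernelNeeds {
    int regsPerThread;                // registers used by each thread
    std::size_t staticSharedBytes;    // statically declared shared memory
};

// Assumed model: a launch fits if the thread count, the total registers, and
// the total shared memory all stay within the per-block limits.
// Note that nothing here depends on earlier launches.
constexpr bool fitsLaunch(const BlockLimits& lim, const KernelNeeds& k,
                          int threadsPerBlock, std::size_t dynamicSharedBytes) {
    return threadsPerBlock <= lim.maxThreadsPerBlock
        && k.regsPerThread * threadsPerBlock <= lim.regsPerBlock
        && k.staticSharedBytes + dynamicSharedBytes <= lim.sharedBytesPerBlock;
}
```

With made-up figures (not measured on my device), `fitsLaunch(BlockLimits{65536, 49152, 1024}, KernelNeeds{64, 0}, 512, 0)` is true, while the same call with 255 registers per thread is false. Either way, under this model two launches with identical parameters should always get the same answer, which is why the failure on the second launch surprises me.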
What am I missing?