Dear GPU/CUDA experts,
When one of my kernels uses more than 255 registers per thread (on a T4 GPU), it fails to compile (I was using clang). That is good, but I am also wondering: when does register spilling happen? I thought it would kick in once I exceed the register limit. How can I check whether my kernel is spilling registers? I guess I can check in Nsight, but which metric should I look at?
For the nvidia toolchain, see here for a description of register spilling:
For the NVIDIA toolchain, you can detect spills at compile time by passing the
-Xptxas=-v switch to the compiler (nvcc). This is covered in many forum posts.
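If you build many kernels, scanning the verbose ptxas output by hand gets tedious. Below is a minimal sketch of a helper that flags kernels with non-zero spill traffic. It assumes you have captured the compiler output as text and that the log contains the usual "Function properties for ..." and "bytes spill stores / spill loads" lines. The function name find_spills and the sample log (kernel names and byte counts) are made up for illustration and are not output from a real build.

```python
import re

# Summary line that ptxas prints per function in verbose mode.
SPILL_RE = re.compile(
    r"(\d+) bytes stack frame, (\d+) bytes spill stores, (\d+) bytes spill loads"
)
# Line that names the function the following summary belongs to.
FUNC_RE = re.compile(r"Function properties for (\S+)")


def find_spills(ptxas_log):
    """Return {function: (stack_bytes, spill_store_bytes, spill_load_bytes)}
    for every function whose spill stores or spill loads are non-zero."""
    spilling = {}
    current = None
    for line in ptxas_log.splitlines():
        func = FUNC_RE.search(line)
        if func:
            current = func.group(1)
            continue
        stats = SPILL_RE.search(line)
        if stats and current is not None:
            stack, stores, loads = map(int, stats.groups())
            if stores or loads:
                spilling[current] = (stack, stores, loads)
            current = None
    return spilling


# Illustrative log only: the names and numbers are invented for this example.
SAMPLE_LOG = """\
ptxas info    : Function properties for _Z5lightPf
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 16 registers, 360 bytes cmem[0]
ptxas info    : Function properties for _Z5heavyPf
    40 bytes stack frame, 44 bytes spill stores, 48 bytes spill loads
ptxas info    : Used 255 registers, 360 bytes cmem[0]
"""

if __name__ == "__main__":
    for name, (stack, stores, loads) in find_spills(SAMPLE_LOG).items():
        print(f"{name}: {stores} B spill stores, {loads} B spill loads")
```

A kernel that reports 0 bytes for both spill stores and spill loads is not spilling. Since you mentioned clang: I believe the equivalent there is to forward the flag with -Xcuda-ptxas -v, but please check that against your clang version.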
Thank you so much! Maybe this information could be made more accessible in the public CUDA programming documentation :)
If you’d like to see a change to CUDA documentation, you can always file a bug.