Register spilling

Dear GPU/CUDA experts,
When a kernel uses more than 255 registers per thread (on a T4 GPU), it fails to compile (I was using clang). This is good, but I am also wondering: when does register spilling happen? I thought it would kick in once I exceed the register limit. How can I check whether my kernel is spilling registers? I guess I can check in Nsight, but which metric?

Pei Sun

For the NVIDIA toolchain, see here for a description of register spilling:

For the NVIDIA toolchain, you can detect spills at compile time by passing the -Xptxas=-v switch to the compiler. This is covered in many forum posts.
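As a rough illustration of what to look for in that verbose output, here is a minimal Python sketch that scans a ptxas log for the "spill stores" / "spill loads" lines. The sample log text, the kernel name, and the helper name find_spills are all made up for illustration; the numbers are not from a real build. Non-zero spill bytes for a function mean the compiler spilled registers to local memory.

```python
import re

# Hypothetical sample in the format ptxas prints with -v.
# The kernel name and all numbers are invented for illustration.
SAMPLE = """\
ptxas info    : Compiling entry function '_Z8myKernelPf' for 'sm_75'
ptxas info    : Function properties for _Z8myKernelPf
    40 bytes stack frame, 36 bytes spill stores, 36 bytes spill loads
ptxas info    : Used 255 registers, 360 bytes cmem[0]
"""


def find_spills(log):
    """Return {function: (spill_store_bytes, spill_load_bytes, registers)}."""
    results = {}
    current = None
    for line in log.splitlines():
        # A "Function properties" line starts the report for one function.
        m = re.search(r"Function properties for (\S+)", line)
        if m:
            current = m.group(1)
            results[current] = [0, 0, None]
            continue
        # Spill traffic in bytes; anything above zero indicates spilling.
        m = re.search(r"(\d+) bytes spill stores, (\d+) bytes spill loads", line)
        if m and current:
            results[current][0] = int(m.group(1))
            results[current][1] = int(m.group(2))
            continue
        # Register count reported for the same function.
        m = re.search(r"Used (\d+) registers", line)
        if m and current:
            results[current][2] = int(m.group(1))
    return {name: tuple(values) for name, values in results.items()}


if __name__ == "__main__":
    for name, (stores, loads, regs) in find_spills(SAMPLE).items():
        status = "spills" if stores or loads else "no spills"
        print(f"{name}: {regs} registers, {stores} B stores, {loads} B loads ({status})")
```

To use it on a real build, capture the compiler's diagnostic output to a file and pass its contents to find_spills instead of SAMPLE. In my experience the ptxas messages go to stderr, so you may need to redirect that stream.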

Thank you so much! Maybe this information could be made more accessible in the public CUDA programming documentation :)

If you’d like to see a change to CUDA documentation, you can always file a bug.

Thank you for the link!