Dear GPU/CUDA experts,
When I have a kernel that uses more than 255 registers per thread (on a T4 GPU), it fails to compile (I was using clang). That is good, but I am also wondering when register spilling happens. I thought spilling would kick in once a kernel needs more registers than the limit. How can I check whether my kernel is spilling registers? I guess I can check in Nsight, but which metric should I look at?