CUDA Toolkit 12.8: what GPU is 'sm_120'?

Yes, I faced that problem too.

But I’m not sure the issue is limited to sm_100. Did you try compiling your code for Hopper (sm_90)?

When targeting sm_90, the compiler is forced to allocate registers in larger “chunks” (for example, in pairs or in fixed-size blocks) than when targeting sm_89. Even if your kernel uses only a given number of logical registers, the physical allocation on sm_90 can be higher because of these coarser allocation units. As a result, code that fits comfortably into the register file when compiled for sm_89 may exceed the available physical registers on sm_90, which leads to heavy spilling.