What are the differences between “sm_120” and “sm_120a” when using the -arch flag in NVCC? The documentation lists the options but does not explain what they mean. If I’m compiling for an RTX 5080 GPU, should I use “sm_120” or “sm_120a”? Specifically, I noticed that the PTX assembler assumes different shared memory sizes for these options (49 kB for sm_120 and 102 kB for sm_120a).
This may be helpful: Overview — NVIDIA CUTLASS Documentation
CUDA 12.0 introduced the concept of “architecture-accelerated features” whose PTX does not have forward compatibility guarantees. Several Hopper and Blackwell PTX instructions fall under this category of architecture-accelerated features, and thus require a sm_90a or sm100a target architecture (note the “a” appended).
It sounds like the ‘a’ and ‘f’ variants give you some extra goodies for your specific hardware, if supported, but compiling for these architectures also gives up some forward compatibility: going by the quote above, the resulting code is not guaranteed to run on later GPU architectures.
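If you want to see what each flag actually changes on your own machine, a small probe along these lines might help. It is only a sketch: the file name `arch_probe.cu` and the kernel name `probe` are made up for illustration, and the feature macro `__CUDA_ARCH_FEAT_SM120_ALL` is my assumption about what NVCC defines for `sm_120a` (by analogy with the macro used for `sm_90a`), so please check it against the documentation for your toolkit version. The runtime part only prints the shared memory limits the device reports; it does not by itself explain the sizes the PTX assembler assumes.

```cuda
// arch_probe.cu (hypothetical file name)
// Build it both ways and compare the output, for example:
//   nvcc -arch=sm_120  arch_probe.cu -o probe
//   nvcc -arch=sm_120a arch_probe.cu -o probe_a
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// out[0]: value of __CUDA_ARCH__ seen by the device compilation pass.
// out[1]: 1 if the architecture-specific feature macro was defined, else 0.
__global__ void probe(int* out) {
#if defined(__CUDA_ARCH__)
    out[0] = __CUDA_ARCH__;
#endif
#if defined(__CUDA_ARCH_FEAT_SM120_ALL)  // assumed macro name, see the note above
    out[1] = 1;
#else
    out[1] = 0;
#endif
}

int main() {
    // Limits reported by the device itself, independent of the -arch flag.
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    std::printf("device: %s (compute capability %d.%d)\n",
                prop.name, prop.major, prop.minor);
    std::printf("shared memory per block (default): %zu bytes\n",
                prop.sharedMemPerBlock);
    std::printf("shared memory per block (opt-in):  %zu bytes\n",
                prop.sharedMemPerBlockOptin);

    // Ask the device code which target it was compiled for.
    int h_out[2] = {0, 0};
    int* d_out = nullptr;
    CHECK(cudaMalloc(&d_out, sizeof(h_out)));
    CHECK(cudaMemset(d_out, 0, sizeof(h_out)));
    probe<<<1, 1>>>(d_out);
    CHECK(cudaGetLastError());       // catches launch failures, e.g. no matching binary
    CHECK(cudaDeviceSynchronize());
    CHECK(cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_out));

    std::printf("__CUDA_ARCH__ in device code: %d\n", h_out[0]);
    std::printf("arch-specific feature macro defined: %s\n",
                h_out[1] ? "yes" : "no");
    return 0;
}
```

If the launch check fails on a GPU other than the one you compiled for, that would be consistent with the missing forward compatibility described in the quote above.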