Choosing the right "-arch" flag for NVCC?

What is the difference between “sm_120” and “sm_120a” when using the -arch flag in NVCC? The documentation lists the options but does not explain what they mean. If I’m compiling for an RTX 5080 GPU, should I use “sm_120” or “sm_120a”? Specifically, I noticed that the PTX assembler assumes different shared memory sizes for these options (49 kB for sm_120 and 102 kB for sm_120a).

This may be helpful: Overview — NVIDIA CUTLASS Documentation

CUDA 12.0 introduced the concept of “architecture-accelerated features” whose PTX does not have forward compatibility guarantees. Several Hopper and Blackwell PTX instructions fall under this category of architecture-accelerated features, and thus require a sm_90a or sm_100a target architecture (note the “a” appended).

It sounds like the ‘a’ and ‘f’ variants give you some extra goodies for your specific hardware, if supported, but compiling for these targets also gives up forward compatibility: the resulting code is not guaranteed to run on later architectures.
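To make that trade-off concrete, here is a rough host-only sketch of the compatibility rule as I understand it. It assumes a baseline sm_XY binary runs on devices with the same major compute capability and an equal or higher minor, while an sm_XYa binary runs only on compute capability X.Y exactly. The names `Target`, `parse_target` and `cubin_runs_on` are made up for illustration and are not part of the CUDA toolkit. The ‘f’ (family-specific) targets are deliberately left out, and PTX JIT fallback is ignored.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical description of an -arch target such as "sm_120" or "sm_120a".
struct Target {
    int cc_major;        // 12 for sm_120
    int cc_minor;        // 0 for sm_120
    bool arch_specific;  // true when the name carries the "a" suffix
};

// Parse "sm_<digits>" with an optional trailing 'a'. Returns false for
// anything else, including "f" targets, which this sketch does not model.
bool parse_target(const std::string& name, Target& out) {
    const std::string prefix = "sm_";
    if (name.compare(0, prefix.size(), prefix) != 0) return false;

    std::size_t i = prefix.size();
    int number = 0;
    std::size_t digits = 0;
    while (i < name.size() &&
           std::isdigit(static_cast<unsigned char>(name[i]))) {
        number = number * 10 + (name[i] - '0');
        ++i;
        ++digits;
    }
    if (digits < 2) return false;  // need at least major and minor digits

    bool arch_specific = false;
    if (i < name.size()) {
        // Only a single trailing 'a' is accepted here.
        if (name[i] != 'a' || i + 1 != name.size()) return false;
        arch_specific = true;
    }
    // The last digit is the minor version, the rest is the major version.
    out = Target{number / 10, number % 10, arch_specific};
    return true;
}

// Simplified rule (an assumption, see the text above): "a" binaries need an
// exact compute capability match; baseline binaries run on the same major
// version with an equal or higher minor version.
bool cubin_runs_on(const Target& t, int dev_major, int dev_minor) {
    assert(dev_major >= 0 && dev_minor >= 0);
    if (t.arch_specific) {
        return dev_major == t.cc_major && dev_minor == t.cc_minor;
    }
    return dev_major == t.cc_major && dev_minor >= t.cc_minor;
}
```

Under this model, a binary built for sm_120 is accepted on both a 12.0 and a 12.1 device, while one built for sm_120a is accepted only on 12.0. So for a single known GPU the ‘a’ target costs you nothing, but it does if the same binary has to run on other cards too.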

This is covered here, with arch/family specific features here.

I believe most of these features favour data centre class GPUs.