I’ve come to realize that my code runs faster if I opt out of independent thread scheduling, and that I can still do this with tricks like -gencode arch=compute_60,code=sm_86. I’ve been told this works even on Lovelace and Hopper. I know the grace period for warp-synchronous code will end someday, but if I’m using the __CUDA_ARCH__ macro to choose launch parameters or kernel array sizes, GP100 (compute_60) is not like an RTX A6000 (compute_86), and as I understand it __CUDA_ARCH__ conveys the arch=compute_XX value (times ten, I know). Is there some macro corresponding to the code=sm_XX part that I could use to guide those same decisions while still suppressing independent thread scheduling?
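For context, here is a minimal sketch of the pattern I mean, with made-up sizes: the #if branch is selected by the virtual architecture, so under -gencode arch=compute_60,code=sm_86 the device compiler sees __CUDA_ARCH__ == 600 and takes the Pascal-era branch, even though the SASS actually targets sm_86.

```cuda
// Illustrative only: sizing a shared-memory scratch array by __CUDA_ARCH__.
// The 1024/2048 sizes are arbitrary placeholders, not tuned values.
#include <cstdio>

__global__ void kernel()
{
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 700)
    __shared__ float scratch[2048];   // Volta-and-later sizing
#else
    __shared__ float scratch[1024];   // Pascal-era sizing; chosen when
                                      // compiling with arch=compute_60
#endif
    scratch[threadIdx.x] = 0.0f;
    if (threadIdx.x == 0)
        printf("scratch bytes: %zu\n", sizeof(scratch));
}

int main()
{
    kernel<<<1, 32>>>();
    cudaDeviceSynchronize();
    return 0;
}
```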