Conditional Compilation (__CUDA_ARCH__)


I would like to compile conditionally based on the SM value (sm_20, sm_21, sm_30, sm_35, etc.) under Visual Studio.
For testing, I set the GPU Architecture(1) under Properties of the cu file and of the Project to sm_21. Left all other GPU Architecture(?) fields at 0.
The compile statement has: … -gencode=arch=compute_20,code="sm_21,compute_20" …
and the __CUDA_ARCH__ macro is set to 200, not the 210 that I was expecting.
Is there a way to check the sm_ value in the code with a #ifdef?

The compiler invocation specifies arch=compute_20, so __CUDA_ARCH__ will be defined as 200. As best I can tell, there is no compute_21 or sm_21 as a compiler-defined architecture, and therefore the predefined symbol __CUDA_ARCH__ cannot take the value 210.

Instead, the compiler uses arch=compute_20 for all platforms with compute capability 2.x. The reason is presumably that purely from an instruction set perspective there is no difference between compute capability 2.0 and compute capability 2.1.

One can, however, tweak the code generation for a GPU with compute capability 2.1, which already happens in the example above: code="sm_21,compute_20".

Thanks for the responses. That tweak comes from the standard installation. The pulldown menus in the property pages give me the 2.1 option. Yet there is no straightforward way to detect that with a simple #if WHATEVER == sm21 or something.

As far as I know, that is correct: While the “arch” setting is mapped to __CUDA_ARCH__, there is no equivalent mapping for the “code” setting.
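If a compile-time distinction were ever needed, one possible workaround (a sketch, not something the toolchain provides) would be to pass your own macro alongside the “code” setting, for example by adding -DTARGET_SM=210 to the preprocessor definitions. TARGET_SM and tuned_for_sm21 are hypothetical names chosen for illustration; the compiler does not define them.

```cpp
// TARGET_SM is a hypothetical, user-defined macro. Neither nvcc nor the
// Visual Studio integration sets it; it would have to be passed explicitly,
// e.g. with -DTARGET_SM=210 on the compiler command line.
#ifndef TARGET_SM
#define TARGET_SM 200   // assumed default when nothing is passed
#endif

#if TARGET_SM == 210
static const int kTunedForSm21 = 1;   // build was declared as targeting sm_21
#else
static const int kTunedForSm21 = 0;   // any other declared target
#endif

// Hypothetical helper that exposes the compile-time choice to other code.
inline int tuned_for_sm21() { return kTunedForSm21; }
```

The drawback is that a single -D definition applies to every -gencode target in the same compiler invocation, so the macro has to be kept in sync with the “code” setting by hand.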

What is the specific use case that would make that desirable, i.e. that requires distinguishing between sm_20 and sm_21 at the source code level?

Nothing concrete at the moment. But since the pulldown menus make that distinction, I should be able to catch that in the code. I can enclose the new Kepler code with __CUDA_ARCH__ >= 300.
I guess my question is why the pulldown menus let you specify sm<major,minor> if there are no differences?

While compute capability 2.0 and 2.1 share the same instruction set, there are differences in hardware organization between GPUs with those two compute capabilities. The “arch” flag instructs the compiler to generate instructions from the sm_2x instruction set; the “code” flag can then be used to tweak the code generation (e.g. instruction selection, instruction scheduling) differently for sm_20 and sm_21. The same concept exists with other compilers. For example, gcc has -march and -mtune flags, where -march selects an ISA to target and -mtune tweaks the code for specific CPUs using that ISA.

The pulldown menu presumably gives you these choices for programmer convenience, so if you know you have a GPU with compute capability 2.1, you can simply select that.

I have never checked how much difference there is between code tweaked for sm_20 and code tweaked for sm_21. As far as I understand the differences between these two HW architectures, I would expect the generated code for the two targets to look quite similar, and performance-wise it is likely a second-order effect.