I can specify the compute capability to the CUDA nvcc compiler, and the default is 2.0: -gencode=arch=compute_20,code="sm_20,compute_20".
I have two computers. One can do compute_20, the other can do compute_30. I am using Visual Studio. Is there a way to tell nvcc to use the maximum capability of the local card? Otherwise, I would need a separate project (.vcxproj) on each computer (specifying the maximum compute capability manually), which isn’t ideal.
So the code would be compiled for all of the specified options. Is the driver smart enough to use the highest one?
Is there a way to verify which code is picked at runtime?
The driver is smart enough to pick the best one for your device (not necessarily the “highest one”: sm_30 code cannot run on a compute capability 2.0 device, for example). If you want to learn more about it, you could read the relevant sections of the nvcc manual on the fatbinary system.
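A minimal sketch of the fatbinary approach, assuming a toolkit old enough to still target compute_20 (Fermi support was dropped after CUDA 8.0). The build line in the comment embeds sm_20 and sm_30 machine code plus compute_30 PTX in one executable, so a single project should serve both machines; in Visual Studio the same list can, as far as I know, be entered in the project's CUDA C/C++ > Device > Code Generation field as compute_20,sm_20;compute_30,sm_30. The host code then asks the runtime, via cudaFuncGetAttributes, for the binaryVersion and ptxVersion of a kernel. The names probeKernel and reportLoadedArch are hypothetical, and the assumption here is that binaryVersion reflects the image the driver actually loaded for the current device.

```cuda
// Hypothetical build line embedding several targets in one fat binary:
//   nvcc -gencode=arch=compute_20,code=sm_20
//        -gencode=arch=compute_30,code="sm_30,compute_30" -o archinfo archinfo.cu
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: it is never launched, only its loaded image is inspected.
__global__ void probeKernel(int *out) {
    if (out) *out = 0;
}

// Prints the device's compute capability and the versions the runtime reports
// for probeKernel. Returns binaryVersion (e.g. 30 for sm_30), or -1 on error.
int reportLoadedArch(int device) {
    if (cudaSetDevice(device) != cudaSuccess) return -1;

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return -1;

    // Assumption: binaryVersion = major * 10 + minor of the selected image.
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, probeKernel) != cudaSuccess) return -1;

    std::printf("device %d (%s): compute capability %d.%d\n",
                device, prop.name, prop.major, prop.minor);
    std::printf("probeKernel: binaryVersion %d, ptxVersion %d\n",
                attr.binaryVersion, attr.ptxVersion);
    return attr.binaryVersion;
}
```

Calling reportLoadedArch(0) on each machine should show a different binaryVersion from the same executable if the driver picked different embedded versions.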
If you wanted to verify at runtime, it would be fairly tedious, but you could create separate code paths based on the __CUDA_ARCH__ macro, which is also discussed in the nvcc manual.
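A sketch of the __CUDA_ARCH__ approach: each -gencode target compiles the kernel separately with its own value of the macro (200 for compute_20, 300 for compute_30), so having the kernel write that value out reveals which embedded version actually ran. The names whichArchKernel, archThatRan and deviceArch are hypothetical helpers for illustration.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Records the __CUDA_ARCH__ value of whichever compiled version executes.
// The macro is only defined during device compilation, hence the guard.
__global__ void whichArchKernel(int *out) {
#ifdef __CUDA_ARCH__
    *out = __CUDA_ARCH__;
#endif
}

// Launches the kernel on the current device and returns the __CUDA_ARCH__
// value of the code that ran, or -1 on error.
int archThatRan() {
    int *dOut = nullptr;
    int hOut = -1;
    if (cudaMalloc(&dOut, sizeof(int)) != cudaSuccess) return -1;

    whichArchKernel<<<1, 1>>>(dOut);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)  // cudaMemcpy also waits for the kernel to finish
        err = cudaMemcpy(&hOut, dOut, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dOut);
    return err == cudaSuccess ? hOut : -1;
}

// Compute capability of the current device in the same encoding (e.g. 300).
int deviceArch() {
    int dev = 0;
    cudaDeviceProp prop;
    if (cudaGetDevice(&dev) != cudaSuccess ||
        cudaGetDeviceProperties(&prop, dev) != cudaSuccess) return -1;
    return prop.major * 100 + prop.minor * 10;
}
```

On the compute_30 machine, a fat binary built for both targets would be expected to report 300 from archThatRan(); the value can never exceed deviceArch(), since code compiled for a newer architecture cannot run on an older device.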