Is there a way to instruct the NVCC compiler to compile for several compute capabilities, so that at runtime the GPU driver automatically loads the best implementation for the compute capability of the GPU?
E.g. the Intel Performance Primitives (IPP) library implements a similar mechanism (but there it is done by the library developers themselves).
So I have different implementations in the source code files, e.g. for the Fermi architecture (CC 2.0) and for prior architectures (< CC 2.0), selected via
#if (__CUDA_ARCH__ < 200)
// kernel implementation for compute capability < 2.0
#else
// kernel implementation for compute capability >= 2.0
#endif
and I would like to get rid of manually checking the CC of the device on which the kernel is executed and then calling the appropriate kernel.
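To make the manual check concrete, here is a minimal sketch of the kind of host-side dispatch I would like to avoid. The kernel names kernel_pre_fermi and kernel_fermi, the helper is_fermi_or_newer and the wrapper launch_for_device are hypothetical placeholders for illustration, and the kernel bodies are dummies; only the CUDA runtime calls (cudaGetDevice, cudaGetDeviceProperties, cudaGetLastError) are real API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the implementation targeting CC < 2.0.
__global__ void kernel_pre_fermi(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;  // dummy body
}

// Hypothetical stand-in for the implementation targeting CC >= 2.0.
__global__ void kernel_fermi(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;  // dummy body
}

// Hypothetical helper: Fermi corresponds to major compute capability 2.
static bool is_fermi_or_newer(int major)
{
    return major >= 2;
}

// Queries the compute capability of the current device and launches
// the matching kernel. This is the manual step in question.
cudaError_t launch_for_device(float *d_data, int n)
{
    if (n <= 0) return cudaSuccess;  // nothing to do, avoid a 0-block launch

    int device = 0;
    cudaError_t err = cudaGetDevice(&device);
    if (err != cudaSuccess) return err;

    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) return err;

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    if (is_fermi_or_newer(prop.major))
        kernel_fermi<<<blocks, threads>>>(d_data, n);
    else
        kernel_pre_fermi<<<blocks, threads>>>(d_data, n);

    return cudaGetLastError();  // reports launch configuration errors
}
```

Every call site has to go through a wrapper like launch_for_device, and each new architecture adds another branch, which is why I am looking for a mechanism that makes this selection automatically.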