Embedding Multiple Build Targets: Different Code for Different Compute Capabilities

Here's the problem: I have a program and I want to use some compute 1.2 specific optimizations (involving warp voting) on systems that support them, and vanilla 1.0 code otherwise. How do you go about compiling a separate code image for each target and embedding them all in a single executable? Basically, I want a cc1.0 kernel and a cc1.2 kernel, with the program picking which one to run at run time.

You could always perform a runtime check of the compute capability and set some flags that are used throughout the rest of your code (see the deviceQuery example in the SDK).
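Something along these lines might work as a starting point. It is only a rough sketch, not tested on real 1.0/1.2 hardware: the kernel names, the `supportsWarpVote` helper, and `launchBestKernel` are all made up for illustration, and the kernel bodies are placeholders that just write a marker value so you can see which path ran. The real warp-vote code would go inside the `__CUDA_ARCH__` guard.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical baseline kernel: safe on every compute capability.
__global__ void kernelBaseline(int *out)
{
    out[threadIdx.x] = 10;  // marker value: baseline path
}

// Hypothetical optimized kernel. The guard keeps cc1.2-only code out of
// any image built for a lower architecture.
__global__ void kernelVote(int *out)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 120
    // Warp-vote based optimizations would go here.
    out[threadIdx.x] = 12;  // marker value: cc >= 1.2 path
#else
    out[threadIdx.x] = 10;  // fallback body for older images
#endif
}

// Pure host-side helper (an assumption of this sketch, not a CUDA API):
// true when the device is compute capability 1.2 or newer.
static bool supportsWarpVote(int major, int minor)
{
    return major > 1 || (major == 1 && minor >= 2);
}

// Queries the current device, launches the matching kernel on n threads
// (n must fit in one block), and copies the n marker values to hostOut.
// Returns 0 on success, nonzero on any CUDA error.
static int launchBestKernel(int *hostOut, int n)
{
    int device = 0;
    cudaDeviceProp prop;
    if (cudaGetDevice(&device) != cudaSuccess ||
        cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        fprintf(stderr, "No usable CUDA device found\n");
        return 1;
    }

    int *devOut = NULL;
    if (cudaMalloc((void **)&devOut, n * sizeof(int)) != cudaSuccess)
        return 1;

    // The runtime check: pick the kernel from the reported capability.
    if (supportsWarpVote(prop.major, prop.minor))
        kernelVote<<<1, n>>>(devOut);
    else
        kernelBaseline<<<1, n>>>(devOut);

    int failed = (cudaGetLastError() != cudaSuccess) ||
                 (cudaMemcpy(hostOut, devOut, n * sizeof(int),
                             cudaMemcpyDeviceToHost) != cudaSuccess);
    cudaFree(devOut);
    return failed ? 1 : 0;
}
```

To get both code images into one executable, I believe you can pass one `-gencode` option per target to nvcc, for example `nvcc -gencode arch=compute_10,code=sm_10 -gencode arch=compute_12,code=sm_12 app.cu` (the file name is just an example). This assumes a toolkit that has `-gencode` and still accepts the 1.x targets; as far as I know, recent toolkits have dropped them. Call `launchBestKernel(buffer, 32)` from your host code and check the marker values to confirm which path was taken.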