Is there a design pattern for setting optimum launch configuration based on device?

It’s pretty typical to have to fine-tune the launch configuration based on the device type, including things like the number of threads, the size of the shared memory area, the thread coarsening factor, etc.

My ideal case is to have a configuration file somewhere that provides these parameters per kernel for each device we’ve tested on (plus a fallback). Then at kernel launch, these are used as launch parameters.

Is there a well-established pattern that I could turn to for this use case?

One important proviso: these configurations must be known at compile time since, for loop unrolling etc., some of these parameters appear as template parameters, and so each template variation needs to be compiled ahead of time.

You would need an if statement for each possible combination of template parameters.

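// grid and args are placeholders for the real grid size and kernel arguments
if (blocksize == 32)      kernel<32><<<grid, 32>>>(args);
else if (blocksize == 64) kernel<64><<<grid, 64>>>(args);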
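If the chain of if statements grows long, one possible alternative is to let the compiler generate it from a list of candidate values. The following is only a sketch, assuming C++17 (fold expressions) is available to the host compiler. `launch_kernel` and `dispatch_block_size` are hypothetical names, and `launch_kernel` merely stands in for a real templated launch.

```cpp
#include <cassert>

// Hypothetical stand-in for a templated launch. In CUDA the body would be
// something like kernel<BlockSize><<<grid, BlockSize>>>(args).
template <int BlockSize>
int launch_kernel() {
    return BlockSize;  // report which instantiation ran, for illustration
}

// Compares the runtime value against each compile-time candidate in turn
// and calls the matching instantiation. The || fold short-circuits, so at
// most one candidate is launched. Returns false if nothing matched.
template <int... Candidates>
bool dispatch_block_size(int blocksize, int& launched) {
    assert(blocksize > 0 && "block size must be positive");
    return ((blocksize == Candidates
                 ? (launched = launch_kernel<Candidates>(), true)
                 : false) || ...);
}
```

A call such as `dispatch_block_size<32, 64, 128>(blocksize, launched)` instantiates all three variants ahead of time, and the `false` return value plays the role of the final else case.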

For one of my projects, I used the following approach.

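#include <iostream>

struct LaunchParams{
    int a,b,c;
};

// Stand-in for the real templated kernel (or its host-side launcher).
template <int X, int Y, int Z>
void kernel();

// X-macro: every configuration that should be compiled, listed once.
#define FOR_EACH_CONFIG_DO_IT \
  IT(32,4,4) IT(64,8,16)

void launchkernel(LaunchParams params){
   // The macro parameters are named A,B,C rather than a,b,c: a parameter
   // called `a` would also be substituted inside `params.a`.
   #define IT(A,B,C) \
      if(params.a == A && params.b == B && params.c == C){ \
          kernel<A,B,C>(); \
      } else

   FOR_EACH_CONFIG_DO_IT
   { std::cout << "config not possible\n"; } // the final else case

   #undef IT
}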
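To tie this back to the per-device configuration with a fallback, one option is a small lookup table keyed by compute capability. This is a rough sketch, not a tested tuning: `DeviceConfig`, `kTunedConfigs`, `kFallback` and `selectParams` are hypothetical names, and the table values are placeholders taken from the two configurations listed above.

```cpp
#include <cassert>

// Repeated from above so this snippet compiles on its own.
struct LaunchParams {
    int a, b, c;
};

// Hypothetical table entry: tuned parameters for one compute capability.
struct DeviceConfig {
    int major, minor;
    LaunchParams params;
};

// Placeholder values, not measured tunings. Every entry must be one of
// the configurations compiled via FOR_EACH_CONFIG_DO_IT.
constexpr DeviceConfig kTunedConfigs[] = {
    {8, 0, {64, 8, 16}},
    {7, 5, {32, 4, 4}},
};

// Used for any device that has not been tested.
constexpr LaunchParams kFallback = {32, 4, 4};

// Returns the tuned parameters for the given compute capability, or the
// fallback if the device is not in the table.
LaunchParams selectParams(int major, int minor) {
    assert(major >= 0 && minor >= 0);
    for (const DeviceConfig& cfg : kTunedConfigs) {
        if (cfg.major == major && cfg.minor == minor) return cfg.params;
    }
    return kFallback;
}
```

In a CUDA program, the `major` and `minor` fields of the `cudaDeviceProp` filled in by `cudaGetDeviceProperties` could be passed in, and the result handed to `launchkernel`. Because the table only picks among configurations that are already compiled, it could also be loaded from a file at startup instead of being hard-coded.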