It’s pretty typical to have to fine-tune the launch configuration based on the device type, including things like the number of threads, the size of the shared memory area, the thread coarsening factor, etc.
Ideally, I’d have a configuration file somewhere that provides these parameters per kernel for each device we’ve tested on (plus a fallback), and at kernel launch these values would be used as the launch parameters.
Is there a well established pattern that I could turn to for this use case?
One important proviso: these configurations must be known at compile time, since some of these parameters appear as template parameters (e.g. for loop unrolling), so each template variation needs to be compiled ahead of time.