Hi,
An optimization exercise on several Tesla GPUs (K20 through K80, and P100) showed that, at least for my application, no single value of the compiler option maxregcount gives the best performance on every card.
I know that one can build a unified binary containing code for each card's compute capability (e.g. tesla:cc30,cc35,cc60), but I was wondering whether it is also possible to associate a specific maxregcount value with each compute capability.
The workaround would be to compile one binary per card and use a wrapper script that queries the installed GPU type and then launches the appropriate binary.
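For reference, the wrapper workaround could look roughly like the sketch below. It assumes that every GPU in a node is the same model, that `nvidia-smi` is on the PATH, and that each per-card binary was built beforehand with its own tuned maxregcount. The binary names (`./app_k20`, `./app_p100`, and so on), the `DEFAULT_BINARY` fallback, and the function names are all hypothetical placeholders, not part of any compiler or driver API.

```python
import os
import subprocess
import sys
from typing import List, Optional

# Hypothetical binaries, one per card, each built with the maxregcount value
# that performed best on that card during tuning. Names are placeholders.
BINARIES = {
    "K20": "./app_k20",
    "K40": "./app_k40",
    "K80": "./app_k80",
    "P100": "./app_p100",
}
DEFAULT_BINARY = "./app_generic"  # hypothetical fallback for unrecognized GPUs


def select_binary(gpu_name: str) -> str:
    """Return the binary tuned for the GPU whose product name is gpu_name."""
    for model, binary in BINARIES.items():
        if model in gpu_name:  # e.g. "Tesla K80" contains "K80"
            return binary
    return DEFAULT_BINARY


def query_gpu_name(index: int = 0) -> str:
    """Return the product name that nvidia-smi reports for GPU `index`."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader", "-i", str(index)],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi fails (e.g. no driver loaded)
    )
    return result.stdout.strip()


def launch(args: Optional[List[str]] = None) -> None:
    """Replace the current process with the binary tuned for GPU 0."""
    if args is None:
        args = sys.argv[1:]  # forward the wrapper's own arguments
    binary = select_binary(query_gpu_name())
    os.execv(binary, [binary, *args])
```

A launcher script would import these functions and call `launch()` as its last statement. Using `os.execv` rather than spawning a child process means the selected binary inherits the wrapper's PID and exit status, so batch schedulers see it as the job itself.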
Thanks,
LS