questions about unified binary

Hi,

An optimization exercise on several Tesla GPUs (K20 to K80, P100) showed that, at least for my application, no single value of the compiler option maxregcount gives the best performance on every card.
I know that one can create a unified binary containing code targeted at each card's supported compute capability (e.g. tesla:cc30,cc35,cc60), but I was wondering whether it is also possible to associate a specific value of maxregcount with each compute capability.

The workaround would be to compile a separate binary for each card, each with its own maxregcount value, and use a wrapper script that queries the available GPU type and then selects the appropriate binary.
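
A minimal sketch of such a wrapper, in Python, might look like the following. It assumes that nvidia-smi is on the PATH and that only the first GPU matters. The binary names (app_k20, app_k40, app_k80, app_cc60, app_generic) are hypothetical placeholders for executables built separately, each with its own maxregcount.

```python
#!/usr/bin/env python3
"""Wrapper sketch: run the binary that was tuned for the detected GPU."""
import os
import subprocess
import sys

# Hypothetical table: substring of the GPU name -> binary built with the
# maxregcount value that performed best on that card.
BINARIES = [
    ("P100", "./app_cc60"),
    ("K80", "./app_k80"),
    ("K40", "./app_k40"),
    ("K20", "./app_k20"),
]
# Hypothetical fallback for cards that are not in the table.
DEFAULT_BINARY = "./app_generic"


def select_binary(gpu_name, table=BINARIES, default=DEFAULT_BINARY):
    """Return the first binary whose key occurs in gpu_name, else the default."""
    for key, path in table:
        if key in gpu_name:
            return path
    return default


def query_gpu_name():
    """Ask nvidia-smi for the product name of the first GPU it lists."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    lines = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return lines[0] if lines else ""


if __name__ == "__main__":
    binary = select_binary(query_gpu_name())
    # Replace the wrapper process with the selected binary, forwarding arguments.
    os.execv(binary, [binary] + sys.argv[1:])
```

Keeping the selection logic in a pure function (select_binary) means the mapping can be checked without a GPU present; only query_gpu_name depends on the driver tools.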

Thanks,
LS

Hi LS,

> I was wondering whether it is also possible to associate a specific value of maxregcount with each compute capability.

No, sorry. “maxregcount” would be applied to all of the created device binaries.

-Mat