I am compiling some code with the flag -maxrregcount=32, but I get the following message from the compiler:
“Overriding global maxrregcount 32 with entry-specific value 63 computed using thread count”.
What is that supposed to mean?
I am using CUDA 5.0.35 on CentOS 6.5 with an NVIDIA Tesla K20.
Do you have a launch bounds directive in your code?
“Register usage can also be controlled for all __global__ functions in a file using the maxrregcount compiler option. The value of maxrregcount is ignored for functions with launch bounds.”
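To make that concrete, here is a minimal sketch of what a launch bounds directive looks like. The kernel names, the helper scale_on_device, and the bound of 256 threads per block are all made up for this example and are not taken from your code or from Thrust.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel with a launch bounds directive.
// __launch_bounds__(256) tells the compiler this kernel is never launched
// with more than 256 threads per block. The register limit for this kernel
// is then derived from that thread count, and -maxrregcount is ignored for it.
__global__ void __launch_bounds__(256)
scale_bounded(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Same kernel without launch bounds: -maxrregcount applies here.
__global__ void scale_plain(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical host helper: scales a host vector on the GPU with either
// kernel and returns true if every CUDA call succeeded.
bool scale_on_device(std::vector<float> &v, float factor, bool bounded)
{
    if (v.empty()) return true;  // nothing to do, avoid a zero-block launch

    int n = static_cast<int>(v.size());
    size_t bytes = v.size() * sizeof(float);
    float *d = 0;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return false;
    if (cudaMemcpy(d, v.data(), bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d);
        return false;
    }

    int threads = 256;  // must not exceed the declared launch bound
    int blocks = (n + threads - 1) / threads;
    if (bounded)
        scale_bounded<<<blocks, threads>>>(d, factor, n);
    else
        scale_plain<<<blocks, threads>>>(d, factor, n);

    cudaError_t launch_err = cudaGetLastError();
    cudaError_t copy_err = cudaMemcpy(v.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return launch_err == cudaSuccess && copy_err == cudaSuccess;
}
```

Assuming the file is saved as launch_bounds_demo.cu (a hypothetical name), compiling it with nvcc -c -maxrregcount=32 --ptxas-options=-v launch_bounds_demo.cu prints the register usage of each kernel. I would expect an override message like the one you quoted for scale_bounded only, though the exact wording and the computed value may depend on the CUDA version and target architecture.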
CUDA 5.0.35 is pretty old, by the way.
I’m working on a cluster, and I found a way to load CUDA 6.5, but I’m still getting the same message.
I’m not using any such directives in my own code, but I am using the Thrust library, and I noticed in the compiler output (with the flag --ptxas-options=-v) that the kernels using more than 32 registers are all in the thrust namespace.
So maybe the Thrust library uses launch bounds directives, which would prevent the compiler from limiting those kernels to 32 registers.
Thanks for your answer.