Is there any way to limit the number of registers an OpenCL kernel uses, like we can do in CUDA with -maxrregcount?
I have a kernel that uses 57 registers and I need it to use 32 at most, or my occupancy will suck!



Take a look at http://developer.download.nvidia.com/compu…ler_options.txt

You need to pass it in the options string when calling clBuildProgram.


I think that doc is not correct though. It’s not “-cl-nv-maxrregcount 32” but “-cl-nv-maxrregcount=32”.

If I pass “-cl-nv-maxrregcount 32” without the equals sign, CL just crashes.

Btw… there is something strange with that option. If I pass as options this:

clBuildProgram ( program, 1, dev, "-cl-fast-relaxed-math -cl-nv-maxrregcount=32 -cl-nv-verbose", NULL, NULL );

The CL compiler reports that it is not reducing the registers to 32; it keeps them at 57, so it seems “-cl-nv-maxrregcount=32” is not taking effect. Curiously, the ocg compiler recognizes the option, because it shows this:

Also, wouldn't it be better if maxrregcount were applied per kernel function instead of to the whole program?

And for -cl-nv-opt-level … what’s the max “N”, please? 9?