Register count for an OpenCL kernel

I’m trying to use NVIDIA’s OpenCL profiler from the command line, but I can’t get it to report the register count.

I use

and the config file contains regperworkitem

When I look at the *.log files, however, I get this:
“NV_Warning: Ignoring the invalid profiler config option: regperworkitem”
so I only get the values for the default counters (method, gputime, cputime, occupancy).

It seems that the NVIDIA OpenCL profiler 3.0 does not accept “regperworkitem” as a profiler option.

Fix: put “regperthread” in the config file instead of “regperworkitem”. The profiler output will then show a field named “regperworkitem”:


OPENCL_DEVICE 0 Tesla T10 Processor

TIMESTAMPFACTOR fffff711e90e7498

timestamp=[ 2125626.000 ] method=[ memcpyHtoAasync ] gputime=[ 373.344 ] cputime=[ 1238.000 ] streamid=[ 1 ]

In my opinion, this is a bug.
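
If you want to pull the register count out of the log automatically, something like the following Python sketch might help. It assumes every record uses the name=[ value ] layout seen in the line above, and that kernel records carry the “regperworkitem” field once “regperthread” is in the config. The function names are made up for illustration and are not part of any NVIDIA tool.

```python
import re

# Matches fields written as name=[ value ], the layout seen in the log line above.
FIELD_RE = re.compile(r"(\w+)=\[\s*(.*?)\s*\]")


def parse_profiler_line(line):
    """Return a dict mapping field name to its (string) value for one log line."""
    return dict(FIELD_RE.findall(line))


def register_counts(lines):
    """Yield (method, registers) for every line that reports regperworkitem.

    Lines without that field (memcpy records, header lines) are skipped.
    """
    for line in lines:
        fields = parse_profiler_line(line)
        if "regperworkitem" in fields:
            yield fields.get("method", ""), int(fields["regperworkitem"])


if __name__ == "__main__":
    sample = ("timestamp=[ 2125626.000 ] method=[ memcpyHtoAasync ] "
              "gputime=[ 373.344 ] cputime=[ 1238.000 ] streamid=[ 1 ]")
    print(parse_profiler_line(sample))
```

To use it on a real log, pass the open file to register_counts(), for example list(register_counts(open(path))) with path pointing at whichever *.log file the profiler wrote.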


Wouldn’t it be great if OpenCL used CUDA terminology… I find it confusing when reading CUDA and OpenCL stuff as I always forget which terms mean the same thing.

Did you log this in the NVDeveloper bug report system?