When I use the “-cl-nv-verbose” option in clBuildProgram, the build output sometimes shows no info about register usage or memory used.
I think that’s because you use some kind of kernel hashing/binary blob cache.
Please change this behavior or add another option like “-cl-nv-no-blob” so I can force the OpenCL JIT compiler to show the register/memory usage info. That’s very important because, as there is no way to precompile an OpenCL kernel and see the PTX, I need to modify my kernel until I’m happy with the register count/occupancy… but with the blob cache I can’t see the register count reliably (only once, when the kernel is NOT cached)…
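A possible workaround to try in the meantime, offered as an assumption and not as confirmed behavior of this driver: NVIDIA's CUDA documentation describes a CUDA_CACHE_DISABLE environment variable that turns off its JIT compute cache. If the OpenCL compiler shares that cache, setting the variable before the first clBuildProgram call might force a fresh compile (and a fresh verbose log) every time. A minimal sketch in C; the helper name is made up for illustration:

```c
/* Ask for the POSIX declaration of setenv() if nothing has done so already. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <stdlib.h>

/* Hypothetical helper: sets CUDA_CACHE_DISABLE=1 for the current process.
 * ASSUMPTION: the OpenCL JIT honors the same variable as the CUDA driver;
 * this is not confirmed anywhere in this thread.
 * Returns 0 on success, -1 on failure (same convention as setenv). */
static int disable_nv_compute_cache(void)
{
    /* Third argument 1 = overwrite any existing value. */
    return setenv("CUDA_CACHE_DISABLE", "1", 1);
}
```

Call disable_nv_compute_cache() once at program start, before creating the OpenCL context, then build with “-cl-nv-verbose” as usual. Setting the variable in the shell before launching the program should have the same effect.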
Another bug: if I pass “-cl-nv-verboseX” instead of the correct “-cl-nv-verbose” option, the clBuildProgram call just hangs indefinitely instead of ignoring the option or returning an error code.
And btw… what does “60+16 bytes smem” mean? 60 bytes of shared (local) memory per thread + 16 bytes per thread block?
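For comparing register counts between kernel revisions, the numbers can be pulled out of the log text automatically. The sketch below assumes the verbose line has the form “Used N registers, A+B bytes smem”; only the “60+16 bytes smem” part is taken from the output quoted above, the rest of the format is a guess. The struct and function names are hypothetical, and the code does not say what the two smem numbers mean, it only extracts them:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the figures found in one verbose log line. */
typedef struct {
    int registers;    /* N in "Used N registers"            */
    int smem_first;   /* A in "A+B bytes smem" (e.g. 60)    */
    int smem_second;  /* B in "A+B bytes smem" (e.g. 16)    */
} KernelUsage;

/* Parses a line such as "... Used 10 registers, 60+16 bytes smem".
 * ASSUMPTION: this line format is guessed, not taken from documentation.
 * Returns 1 if all three numbers were found, 0 otherwise. */
static int parse_usage_line(const char *line, KernelUsage *out)
{
    /* Skip any prefix in front of the "Used ..." part. */
    const char *used = strstr(line, "Used ");
    if (used == NULL)
        return 0;

    /* sscanf returns the number of values it converted. */
    return sscanf(used, "Used %d registers, %d+%d bytes smem",
                  &out->registers, &out->smem_first, &out->smem_second) == 3;
}
```

Feed it each line of the string returned by clGetProgramBuildInfo with CL_PROGRAM_BUILD_LOG; lines that don't match are simply rejected.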
And a request… could you please add an option to output the PTX code (like -cl-nv-show-ptx)? Or make a tool like ATI’s Stream Kernel Analyzer. I need to see what the silly compiler does with my code, please!