bug: -cl-nv-verbose

Hi,

When I use the “-cl-nv-verbose” option in clBuildProgram, sometimes no info about register usage and used memory is shown in the output.
I think that’s because you use some kind of kernel hashing/binary blob cache.
Please modify this behavior or add another option such as “-cl-nv-no-blob” so I can force the OpenCL JIT compiler to show the register/memory usage info. That’s very important: since there is no way to precompile an OpenCL kernel and see the PTX, I need to modify my kernel until I’m happy with the register count/occupancy. With the blob cache I can’t see the register count reliably (only once, when the kernel is NOT cached).

Another bug: if I pass “-cl-nv-verboseX” instead of the correct “-cl-nv-verbose” option, the clBuildProgram call hangs indefinitely instead of ignoring the option or returning an error code.

And by the way, what does “60+16 bytes smem” mean? 60 bytes of shared (local) memory per thread + 16 bytes per thread block?

And a request: could you add an option to output the PTX code, please (like -cl-nv-show-ptx)? Or make a tool like ATI’s Stream Kernel Analyzer. I need to see what the silly compiler does to my code, please!

thx

Have you seen what the new OpenCL 1.1 clGetKernelWorkGroupInfo() query CL_KERNEL_PRIVATE_MEM_SIZE correlates to in the -cl-nv-verbose output, if anything? The new CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE looks like an ATI-inspired option to handle a wavefront size that varies by GPU, not just by vendor.

Bump, I second these requests!

Caching of OpenCL binaries should really be optional!

Hi, it seems that the compiler does some caching and won’t compile the code again if it isn’t necessary (that’s probably why you don’t see the register usage every time). I found a workaround: I also pass -cl-nv-maxrregcount to the compiler and change its value on every build, so the compiler has to recompile and reports the register usage. I admit it is an annoying feature.
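
A minimal sketch of that workaround in C, in case it helps. The helper name format_build_options is made up for illustration, and the "=" between -cl-nv-maxrregcount and its value is an assumption (check the cl_nv_compiler_options extension text for the exact syntax your driver accepts). It only formats the option string; the caller picks a different register cap for each build.

```c
#include <stdio.h>

/* Hypothetical helper: formats the build options for one build attempt.
 * Passing a different max_regs on each build changes the option string,
 * which is what the workaround relies on to avoid a cached result.
 * Returns 0 on success, -1 if the buffer is too small. */
static int format_build_options(char *buf, size_t len, int max_regs)
{
    /* ASSUMPTION: "-cl-nv-maxrregcount=<N>" syntax; not confirmed here. */
    int n = snprintf(buf, len, "-cl-nv-verbose -cl-nv-maxrregcount=%d", max_regs);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}
```

The resulting string would go into the options argument of clBuildProgram, and the register/memory report is then read back with clGetProgramBuildInfo and CL_PROGRAM_BUILD_LOG. It is probably wise to keep the cap well above what the kernel actually needs, so the changed value does not itself alter the register allocation you are trying to measure.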

The compiler output is described in nvcc_2.0.pdf. It says that the + 16 represents the amount of system-allocated data in these memory segments: the device function parameter block (in shared memory) and thread/grid index information (in local memory).
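
If you want to pull those numbers out of the build log programmatically, something like the following might work. It is only a sketch: the struct and function names are invented, and it assumes the log line looks like "Used 12 registers, 60+16 bytes smem", which may differ between driver versions. As far as I understand the nvcc document, the first number is the memory declared in that segment and the second is the system-allocated part.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the values found in one verbose log line. */
struct kernel_usage {
    int registers;    /* registers per thread */
    int smem_declared; /* first smem number: declared shared memory, bytes */
    int smem_system;   /* second smem number: system-allocated, bytes (0 if absent) */
};

/* Parses a line such as "... Used 12 registers, 60+16 bytes smem, ...".
 * ASSUMPTION: this line format; it is not guaranteed by any spec.
 * Returns 0 on success, -1 if the expected fields are not found. */
static int parse_usage_line(const char *line, struct kernel_usage *out)
{
    const char *used = strstr(line, "Used ");
    if (used == NULL || sscanf(used, "Used %d registers", &out->registers) != 1)
        return -1;

    const char *end = strstr(line, " bytes smem");
    if (end == NULL)
        return -1;

    /* Walk back over the "N" or "N+M" token that precedes " bytes smem". */
    const char *start = end;
    while (start > line && (isdigit((unsigned char)start[-1]) || start[-1] == '+'))
        start--;

    out->smem_system = 0;
    int fields = sscanf(start, "%d+%d", &out->smem_declared, &out->smem_system);
    return fields >= 1 ? 0 : -1;
}
```

For the line in the first post this would give 60 bytes declared plus 16 bytes system-allocated shared memory.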

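(Hijackui showed me how to get the PTX code.) The PTX code can be obtained with clGetProgramInfo: first query the sizes with clGetProgramInfo(cpProgram, CL_PROGRAM_BINARY_SIZES, num_devices * sizeof(size_t), binary_sizes, NULL);, then fetch the binaries themselves with CL_PROGRAM_BINARIES. You can find a demo in the SDK project oclUtils.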
