bug: -cl-nv-verbose

santyhyammer · July 31, 2010, 9:45pm

Hi,

when I use the “-cl-nv-verbose” option in clBuildProgram, sometimes no info about register usage and memory usage is shown in the output.
I think that’s because you use some kind of kernel hashing/binary blob cache.
Please modify the behavior or add another option like “-cl-nv-no-blob” so I can force the OpenCL JIT compiler to show the register/memory usage info. That’s very important because, as there is no way to precompile an OpenCL kernel and see the PTX, I need to modify my kernel until I’m happy with the register count/occupancy… but with the blob cache I won’t be able to see the register count properly (only once, when it’s NOT cached)…

Other bug: if I pass “-cl-nv-verboseX” instead of the correct “-cl-nv-verbose” option, then the clBuildProgram call just hangs indefinitely instead of ignoring the option or returning an error code.

And btw… what does “60+16 bytes smem” mean? 60 bytes of shared (local) memory per thread + 16 bytes per thread block?

And a petition… could you add an option to output the PTX code, pls (like -cl-nv-show-ptx)? Or make a tool like ATI’s Stream Kernel Analyzer. I need to see what the silly compiler does to my code, pls!

thx

jcpalmer · August 1, 2010, 3:49pm

Have you seen what the new OpenCL 1.1 clGetKernelWorkGroupInfo() cl_kernel_work_group_info value CL_KERNEL_PRIVATE_MEM_SIZE correlates to in the -cl-nv-verbose output, if anything? The new CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE looks like an ATI-inspired option to handle a wavefront size that varies by GPU, not just by vendor.

enjalot · October 20, 2010, 6:23pm

Bump, I second these requests!

Caching of OpenCL binaries should really be optional!

karbous · October 21, 2010, 7:49pm

Hi, it seems that the compiler is doing some caching and won’t compile the code again if it isn’t necessary (that’s probably why you don’t see the register usage every time). I found a workaround: I also pass -cl-nv-maxrregcount to the compiler and change its value every time, so the compiler has to recompile and reports the register usage. I admit it is an annoying feature.
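
A minimal sketch of that workaround, in plain C so it compiles without the OpenCL headers. The helper name make_build_options is made up for illustration, and the "=N" separator after -cl-nv-maxrregcount is an assumption (check the cl_nv_compiler_options extension text for your driver). The resulting string would be passed as the options argument of clBuildProgram.

```c
#include <stdio.h>

/* Build an options string that always asks for the verbose log and
 * carries a register cap. Changing max_regs between builds changes the
 * string, which (per the post above) seems to be enough to make the
 * compiler rebuild instead of using its cache.
 * ASSUMPTION: the "=N" form of -cl-nv-maxrregcount; verify for your driver.
 * Returns the string length, or -1 if buf is too small. */
int make_build_options(char *buf, size_t buf_size, int max_regs)
{
    int n = snprintf(buf, buf_size,
                     "-cl-nv-verbose -cl-nv-maxrregcount=%d", max_regs);
    if (n < 0 || (size_t)n >= buf_size)
        return -1;
    return n;
}

/* Intended use (not compiled here, needs CL/cl.h):
 *   char opts[64];
 *   make_build_options(opts, sizeof opts, 32);
 *   clBuildProgram(program, 1, &device, opts, NULL, NULL);
 *   ...then read CL_PROGRAM_BUILD_LOG via clGetProgramBuildInfo.
 */
```

Note that the cap also limits the registers the compiler may use, so alternating between two values that are both above the kernel's real register count is probably the least intrusive choice.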

The compiler output is described in nvcc_2.0.pdf. It says that the “+ 16” represents the amount of system-allocated data in these memory segments: the device function parameter block (in shared memory) and thread/grid index information (in local memory).
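
To make the arithmetic concrete: in “60+16 bytes smem” the first number is what the kernel itself declares and the second is the system-allocated part, so the kernel occupies 76 bytes of shared memory. Shared memory is allocated per thread block (work-group), so neither number is per thread. A small sketch that splits such a fragment follows. The function name is made up, and the fragment format is taken from the question as written, so the real log line may look different.

```c
#include <stdio.h>

/* Split a fragment such as "60+16 bytes smem" into the user-declared
 * part (60) and the system-allocated part (16).
 * ASSUMPTION: the "A+B" layout is copied from the question above.
 * Returns the total in bytes, or -1 if no "A+B" pair was found. */
int smem_total_bytes(const char *fragment, int *user_bytes, int *system_bytes)
{
    int user = 0, sys = 0;
    if (sscanf(fragment, "%d +%d", &user, &sys) != 2)
        return -1;
    if (user_bytes)
        *user_bytes = user;
    if (system_bytes)
        *system_bytes = sys;
    return user + sys;
}
```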

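(Hijackui showed me how to get the PTX code.) The PTX code can be retrieved with clGetProgramInfo: first get the sizes with clGetProgramInfo(cpProgram, CL_PROGRAM_BINARY_SIZES, num_devices * sizeof(size_t), binary_sizes, NULL);, then fetch the binaries themselves with CL_PROGRAM_BINARIES. You can find a demo in the SDK project oclUtils.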
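
The part that is easy to get wrong is the buffer handling between the two queries: CL_PROGRAM_BINARY_SIZES only returns sizes, and CL_PROGRAM_BINARIES expects an array of pointers to buffers you have already allocated. Below is a sketch of just that step in plain C. The helper names are made up, and the OpenCL calls are shown only in comments so the block compiles without CL/cl.h.

```c
#include <stdlib.h>

/* Free what alloc_binaries() returned; n is the number of devices. */
void free_binaries(unsigned char **bins, size_t n)
{
    if (!bins)
        return;
    for (size_t i = 0; i < n; ++i)
        free(bins[i]);
    free(bins);
}

/* Allocate one buffer per device, sized from the CL_PROGRAM_BINARY_SIZES
 * result. A size of 0 means no binary for that device; its slot stays NULL.
 * Returns NULL on allocation failure. */
unsigned char **alloc_binaries(const size_t *sizes, size_t num_devices)
{
    unsigned char **bins = calloc(num_devices, sizeof *bins);
    if (!bins)
        return NULL;
    for (size_t i = 0; i < num_devices; ++i) {
        if (sizes[i] == 0)
            continue;
        bins[i] = malloc(sizes[i]);
        if (!bins[i]) {
            free_binaries(bins, num_devices);
            return NULL;
        }
    }
    return bins;
}

/* Intended sequence (not compiled here, needs CL/cl.h):
 *   clGetProgramInfo(cpProgram, CL_PROGRAM_BINARY_SIZES,
 *                    num_devices * sizeof(size_t), binary_sizes, NULL);
 *   unsigned char **bins = alloc_binaries(binary_sizes, num_devices);
 *   clGetProgramInfo(cpProgram, CL_PROGRAM_BINARIES,
 *                    num_devices * sizeof(unsigned char *), bins, NULL);
 *   fwrite(bins[0], 1, binary_sizes[0], stdout);  // the PTX, per the post above
 *   free_binaries(bins, num_devices);
 */
```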
