indeterministic output of compiler with option -cl-nv-verbose

karbous · October 12, 2010, 1:56pm

What causes the indeterministic output of the compiler with the option -cl-nv-verbose? I’m compiling several kernels serially, and for some of them the compiler log is very short, like:

: Considering profile 'compute_13' for gpu='sm_13' in 'cuModuleLoadDataEx_7'

For others it is more detailed:

: Considering profile 'compute_13' for gpu='sm_13' in 'cuModuleLoadDataEx_8'
: Retrieving binary for 'cuModuleLoadDataEx_8', for gpu='sm_13', usage mode=' --verbose '
: Considering profile 'compute_13' for gpu='sm_13' in 'cuModuleLoadDataEx_8'
: Control flags for 'cuModuleLoadDataEx_8' disable search path
: Ptx binary found for 'cuModuleLoadDataEx_8', architecture='compute_13'
: Ptx compilation for 'cuModuleLoadDataEx_8', for gpu='sm_13', ocg options=' --verbose '
ptxas info : Compiling entry function 'Intersection' for 'sm_13'
ptxas info : Used 48 registers, 48+16 bytes smem, 232 bytes cmem[0], 32 bytes cmem[1]

Every time I run my program, the kernel for which I get the detailed output changes. What’s wrong? Specifying or omitting #pragma OPENCL EXTENSION cl_nv_compiler_options : enable in the kernel code makes no difference. I do have a mutex in my host code to make sure that clBuildProgram for the next kernel isn’t called earlier than the previous kernel’s callback function, which calls clGetProgramBuildInfo.

I’m testing it under Ubuntu 10.04, on a GTX 275, with driver 256.40.

karbous · October 12, 2010, 9:19pm

I’ve got a workaround (but it is still annoying). Also specify -cl-nv-maxrregcount and you get the complete information for each kernel. However, it works only for the first run; then I have to change -cl-nv-maxrregcount to a different value, and again it works as expected only the first time. A bug? Or am I missing something?
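
A minimal sketch of the options string this workaround implies, written in Python only for brevity. The helper name `verbose_build_options` is hypothetical, and the `=` form of `-cl-nv-maxrregcount` is an assumption about the accepted syntax; the resulting string is what would be passed as the options argument of clBuildProgram.

```python
def verbose_build_options(max_regs):
    """Build an options string combining -cl-nv-verbose with a register limit.

    max_regs is the value for -cl-nv-maxrregcount; changing it between runs
    is what appears to bring the detailed log back, per the post above.
    """
    if max_regs <= 0:
        raise ValueError("max_regs must be positive")
    # Assumption: the driver accepts the "-cl-nv-maxrregcount=<N>" spelling.
    return f"-cl-nv-verbose -cl-nv-maxrregcount={max_regs}"


# Example: pass this string as the build options for the program.
options = verbose_build_options(64)
print(options)
```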

enjalot · October 20, 2010, 7:45pm

Thank you!

That is annoying, but it gets me unstuck for now! I added some code to set my max registers to a random number between 100 and 200, which is way more than I’ll need but lets me quickly see a register count every time I run.

I’ve also run into a similar “caching” problem on Apple’s platform (but related to using include statements). I really wish OpenCL compilers wouldn’t hide the caching from the coder.
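
The random-register trick described above might look roughly like this. It is a sketch, not the poster's actual code: the function name `random_verbose_options` is hypothetical, and the `=` spelling of the flag is an assumption. Note that two runs can still draw the same number, in which case the cached result would presumably be reused.

```python
import random


def random_verbose_options(low=100, high=200, rng=random):
    """Return build options with a randomly chosen register limit.

    The limit is drawn from [low, high] inclusive, so the options string
    usually differs from the previous run and the build is not served
    from the cache.
    """
    max_regs = rng.randint(low, high)
    # Assumption: the driver accepts the "-cl-nv-maxrregcount=<N>" spelling.
    return f"-cl-nv-verbose -cl-nv-maxrregcount={max_regs}"


# Example: generate fresh options for this run.
print(random_verbose_options())
```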

karbous · October 21, 2010, 8:31pm

Glad you found it useful. There seems to be another workaround: deleting the temporary cached files.
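
The post does not say where the cached files live. As a sketch, assuming the NVIDIA driver keeps its compile cache in `~/.nv/ComputeCache` on Linux (an assumption; check your own system before deleting anything), a small helper could clear it before each run. The name `clear_compute_cache` is hypothetical.

```python
import shutil
from pathlib import Path

# Assumption: default location of the NVIDIA compute cache on Linux.
DEFAULT_CACHE_DIR = Path.home() / ".nv" / "ComputeCache"


def clear_compute_cache(cache_dir=DEFAULT_CACHE_DIR):
    """Delete the given cache directory and everything inside it.

    Returns True if a directory was removed, False if none existed.
    """
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return False
    shutil.rmtree(cache_dir)
    return True
```

Calling `clear_compute_cache()` before building should force a full compile, and with it the full verbose log, if the cache really is the cause. Setting the environment variable `CUDA_CACHE_DISABLE=1` may have the same effect, but whether it applies to the OpenCL compiler in this driver version is untested here.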
