Nondeterministic output of compiler with option -cl-nv-verbose

What causes the nondeterministic compiler output with the option -cl-nv-verbose? I’m compiling several kernels serially, and for some of them the compiler log is very short, like:

: Considering profile ‘compute_13’ for gpu=‘sm_13’ in ‘cuModuleLoadDataEx_7’

for others it is more detailed:
: Considering profile ‘compute_13’ for gpu=‘sm_13’ in ‘cuModuleLoadDataEx_8’
: Retrieving binary for ‘cuModuleLoadDataEx_8’, for gpu=‘sm_13’, usage mode=’ --verbose ’
: Considering profile ‘compute_13’ for gpu=‘sm_13’ in ‘cuModuleLoadDataEx_8’
: Control flags for ‘cuModuleLoadDataEx_8’ disable search path
: Ptx binary found for ‘cuModuleLoadDataEx_8’, architecture=‘compute_13’
: Ptx compilation for ‘cuModuleLoadDataEx_8’, for gpu=‘sm_13’, ocg options=’ --verbose ’
ptxas info : Compiling entry function ‘Intersection’ for ‘sm_13’
ptxas info : Used 48 registers, 48+16 bytes smem, 232 bytes cmem[0], 32 bytes cmem[1]

Every time I run my program, the kernel for which I get the detailed output changes. What’s wrong? Specifying or omitting #pragma OPENCL EXTENSION cl_nv_compiler_options : enable in the kernel code makes no difference. I do have a mutex in my host code to make sure that clBuildProgram for the next kernel isn’t called before the previous kernel’s callback function has called clGetProgramBuildInfo.

I’m testing it under Ubuntu 10.4, on a GTX 275, driver 256.40.

I’ve got a workaround (but it is still annoying). Also specify -cl-nv-maxrregcount and you get complete information for each kernel. However, it works only for the first run; then I have to change -cl-nv-maxrregcount to a different value, and again it works as expected only the first time. A bug? Or am I missing something?

Thank you!

That is annoying, but it gets me unstuck for now! I added some code to set my max registers to a random number between 100 and 200, which is way more than I’ll need, but it lets me quickly see a register count every time I run.

I’ve also run into a similar “caching” problem on Apple’s platform (but related to using include statements). I really wish OpenCL compilers wouldn’t hide the caching from the coder.
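
The randomized register cap described in this post could look roughly like the sketch below. It is only an illustration, not the poster's actual code: the helper names random_maxrregcount and make_build_options are hypothetical, and the `=` between -cl-nv-maxrregcount and its value is an assumption (some documentation shows the value separated by a space instead).

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: pick a register cap in [100, 200], the range
 * mentioned in the post. Seed rand() once (e.g. with srand) so the value
 * differs between runs; otherwise the same sequence repeats. */
int random_maxrregcount(void)
{
    return 100 + rand() % 101;
}

/* Hypothetical helper: write the build option string into buf.
 * Returns 0 on success, -1 if buf is too small.
 * Assumption: the "=N" form of -cl-nv-maxrregcount is accepted. */
int make_build_options(char *buf, size_t size, int maxrregcount)
{
    int n = snprintf(buf, size,
                     "-cl-nv-verbose -cl-nv-maxrregcount=%d", maxrregcount);
    return (n > 0 && (size_t)n < size) ? 0 : -1;
}
```

The resulting string would then be passed as the options argument of clBuildProgram. Because the option string changes from run to run, a cached binary from an earlier run presumably no longer matches, which would explain why the full ptxas output shows up again.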

Glad you found it useful. There seems to be another workaround: deleting the temporary cached files.
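
As an alternative to deleting the cached files by hand, it may be possible to switch the cache off from the host program. The sketch below sets the CUDA_CACHE_DISABLE environment variable, which NVIDIA documents for the CUDA JIT cache. That the OpenCL compiler of this particular driver version honors it is an assumption, and disable_nv_compute_cache is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* Hypothetical helper: ask the NVIDIA driver not to use its on-disk
 * compile cache. Assumption: the driver reads CUDA_CACHE_DISABLE when it
 * is first initialized, so this must run before any OpenCL call.
 * Returns 0 on success, -1 on failure (as setenv does). */
int disable_nv_compute_cache(void)
{
    return setenv("CUDA_CACHE_DISABLE", "1", 1);
}
```

If the variable has no effect on a given driver, deleting the cached files as described above remains the fallback.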
