How to use CUDA_Occupancy_calculator.xls

Hi, does anyone know how to use “CUDA_Occupancy_calculator.xls”? I wonder how to work out “Threads Per Block” for OpenCL.

“Threads Per Block” means the block size (the work-group size) in OpenCL, doesn’t it? I just wonder how to find out the register usage. I tried passing -cl-nv-verbose when calling clBuildProgram, but with no luck.
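To make the spreadsheet inputs a bit more concrete, here is a rough sketch in C of the kind of calculation it performs. The limits are the per-multiprocessor figures for compute capability 1.0/1.1 (8192 registers, 16 KB shared memory, 24 warps, 8 blocks). The function names are made up for illustration, and the register allocation granularity that the real spreadsheet applies is ignored, so treat the result as an estimate only.

```c
#include <assert.h>

/* Per-multiprocessor limits for compute capability 1.0/1.1 (e.g. sm_11). */
enum {
    WARP_SIZE         = 32,
    MAX_WARPS_PER_SM  = 24,    /* 768 resident threads */
    MAX_BLOCKS_PER_SM = 8,
    REGS_PER_SM       = 8192,
    SMEM_PER_SM       = 16384  /* bytes */
};

static int min_int(int a, int b) { return a < b ? a : b; }

/* Warps needed for one work-group, rounded up to whole warps. */
static int warps_per_block(int threads_per_block)
{
    return (threads_per_block + WARP_SIZE - 1) / WARP_SIZE;
}

/* Hypothetical helper: how many work-groups fit on one multiprocessor.
   Simplification: allocation granularity for registers is ignored. */
int active_blocks_per_sm(int threads_per_block, int regs_per_thread,
                         int smem_per_block)
{
    assert(threads_per_block > 0 && regs_per_thread >= 0 && smem_per_block >= 0);

    int warps  = warps_per_block(threads_per_block);
    int blocks = MAX_BLOCKS_PER_SM;

    /* Limit from the number of resident warps. */
    blocks = min_int(blocks, MAX_WARPS_PER_SM / warps);

    /* Limit from the register file (registers are used by whole warps). */
    if (regs_per_thread > 0)
        blocks = min_int(blocks, REGS_PER_SM / (regs_per_thread * warps * WARP_SIZE));

    /* Limit from shared (local) memory. */
    if (smem_per_block > 0)
        blocks = min_int(blocks, SMEM_PER_SM / smem_per_block);

    return blocks;
}

/* Occupancy = resident warps / maximum resident warps. */
double occupancy(int threads_per_block, int regs_per_thread, int smem_per_block)
{
    int blocks = active_blocks_per_sm(threads_per_block, regs_per_thread,
                                      smem_per_block);
    return (double)(blocks * warps_per_block(threads_per_block)) / MAX_WARPS_PER_SM;
}
```

With a work-group size of 256, going from 10 to 16 registers per thread drops the number of resident work-groups from 3 to 2 in this model, which is why the register count is one of the inputs the spreadsheet asks for.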

Aaah, finally I got it. To obtain the number of registers, use:
clBuildProgram(cpPrograms[i], 0, NULL, "-cl-nv-verbose", &pfn_notify, NULL);

and in the callback function call clGetProgramBuildInfo with CL_PROGRAM_BUILD_LOG. Also, don’t forget to put #pragma OPENCL EXTENSION cl_nv_compiler_options : enable in your OpenCL kernel code.
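Once the build log has been fetched as a string, the register count can be pulled out of it. The sketch below assumes the log contains a ptxas line of the form "ptxas info : Used 8 registers, ..."; that format and the function name parse_register_count are assumptions, so adjust the pattern to whatever your driver actually prints.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the register count from the first
   "Used N registers" occurrence in the build log, or -1 if none is found. */
int parse_register_count(const char *build_log)
{
    assert(build_log != NULL);

    const char *p = strstr(build_log, "Used ");
    while (p != NULL) {
        int regs = 0;
        int consumed = 0;
        /* %n is only stored if the literal "registers" matched as well. */
        if (sscanf(p, "Used %d registers%n", &regs, &consumed) == 1 && consumed > 0)
            return regs;
        p = strstr(p + 1, "Used ");
    }
    return -1;
}
```

A return value of -1 means the log carried no register line at all, which is useful to tell apart from a kernel that really uses very few registers.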

Thanks a lot.
For register usage, I save the PTX code and compile it with ptxas, which reports the register and memory usage. Your way is better.
For cmem, ptxas reports the memory used in cmem0 and cmem1. Do you know what the difference between cmem0 and cmem1 is?

You’re welcome :-) More info about cmem[0] and cmem[1] is on the last page of nvcc_2.0.pdf. The numbers refer to constant memory banks.

How did you manage to save the PTX code?

The PTX code can be obtained with clGetProgramInfo(cpProgram, CL_PROGRAM_BINARY_SIZES, num_devices * sizeof(size_t), binary_sizes, NULL); that call returns the sizes, and a second clGetProgramInfo call with CL_PROGRAM_BINARIES then fetches the binaries themselves. You can find a demo in the SDK project oclUtils.
I tried your method but can’t get the register usage; I always get output like ": Considering profile 'compute_11' for gpu='sm_11' in 'cuModuleLoadDataEx_24'". Can you post your callback function code for reference?

There’s nothing wrong with your code, I bet. I get similar output unless I specify -cl-nv-maxrregcount and set it to a different maximum each time. It seems that the compiler won’t compile the code again if neither the code nor the maximum register count has changed. (I asked why at http://forums.nvidia.com/index.php?showtopic=182744, so far with no luck.)

Thanks for showing me how to get the PTX code.
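As a small illustration of that workaround, the options string can be assembled with a register limit that is changed from run to run. This is only a sketch: the helper name make_build_options is made up, and the "=N" spelling of -cl-nv-maxrregcount is an assumption that should be checked against the cl_nv_compiler_options extension text.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes "-cl-nv-verbose -cl-nv-maxrregcount=N" into buf.
   Returns the string length on success, or -1 if buf is too small. */
int make_build_options(char *buf, size_t size, int max_regs)
{
    assert(buf != NULL && size > 0 && max_regs > 0);

    int n = snprintf(buf, size, "-cl-nv-verbose -cl-nv-maxrregcount=%d", max_regs);

    /* snprintf reports the length it wanted to write; reject truncation. */
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

The resulting buffer would be passed as the options argument of clBuildProgram in place of the fixed "-cl-nv-verbose" string.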

The build log sometimes contains the register usage, but in most cases it is missing. It seems we get the same result.

It seems that the OpenCL compiler caches compiled code and compares new code against what it compiled last time.
If I delete the directory C:\Documents and Settings\Administrator\Application Data\NVIDIA\ComputeCache before compiling, the build log is dumped in full.
