Compute Profiler - Confused by Work Group Information

Basilios · March 10, 2011, 1:20pm

I have just started using the Compute Profiler for OpenCL, and I’m a little bit baffled by the numbers I’m seeing in the Kernel Table.

nd range size [9 9]
work group size [8 8 1]
local work group size 1

There are a few things I don’t understand:

My clEnqueueNDRangeKernel() call supplies a 3D range: {9, 9, 64}. I realise that CUDA doesn’t support 3D ranges, but the NVIDIA OpenCL implementation seems to allow it. Am I mistaken?
What is the “local” work group size, and why is it 1 for all of my kernels?
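
For context, here is a minimal sketch of how the global_work_size and local_work_size arguments of clEnqueueNDRangeKernel() relate under OpenCL 1.x, where an explicitly supplied local size must evenly divide the global size in every dimension. It does not call OpenCL and does not explain how the profiler derives its columns. The helper names sizes_are_compatible and total_work_groups are hypothetical and are not part of any NVIDIA or Khronos API.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every global size is a non-zero
 * multiple of the matching local size, which is the OpenCL 1.x rule
 * when local_work_size is passed explicitly; returns 0 otherwise. */
int sizes_are_compatible(size_t work_dim,
                         const size_t *global,
                         const size_t *local)
{
    for (size_t d = 0; d < work_dim; ++d) {
        if (local[d] == 0 || global[d] % local[d] != 0)
            return 0;
    }
    return 1;
}

/* Hypothetical helper: total number of work-groups in the NDRange,
 * i.e. the product over all dimensions of global[d] / local[d].
 * Only meaningful when sizes_are_compatible() returned 1. */
size_t total_work_groups(size_t work_dim,
                         const size_t *global,
                         const size_t *local)
{
    size_t total = 1;
    for (size_t d = 0; d < work_dim; ++d)
        total *= global[d] / local[d];
    return total;
}
```

With the global range {9, 9, 64} from the question, a local size of {8, 8, 1} fails this check because 9 is not a multiple of 8, whereas {1, 1, 64} passes and yields 81 work-groups. If local_work_size is passed as NULL, the implementation chooses the work-group size itself.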

Basilios · March 14, 2011, 8:15pm

I find it hard to believe that nobody else has encountered these problems.

I’m most concerned by what “local work group size” means.

philipjfry · March 16, 2011, 8:56pm

/usr/local/cuda/doc/Compute_Profiler.txt says:

The keywords of the low-level profiler and the column titles in the visual profiler are not fully identical, but I believe they coincide with the CSV export titles.

PS: If you are not sure what this means, look into the OpenCL specs for the parameters to enqueueNDRangeKernel…

PPS: What starts with RT and ends with FM?

Basilios · March 17, 2011, 10:39am

The Compute Visual Profiler’s help file only has this to say on the matter:

“The kernel option ‘localworkgroupsize’ is valid only for OpenCL. If this option is selected for a CUDA program a column ‘localblocksize’ is added to the profiler table, but this column is hidden by default.”

In my past experience with the profiler, the explanations of profiler counters provided by this help file have been adequate, and I wasn’t aware that there was extra or different documentation available in the /doc directory.

I don’t appreciate your tone, but thank you for bringing the /doc directory to my attention.
