Originally published at: https://developer.nvidia.com/blog/cuda-pro-tip-occupancy-api-simplifies-launch-configuration/
CUDA programmers often need to decide on a block size to use for a kernel launch. For key kernels, it's important to understand the constraints of the kernel and the GPU it is running on to choose a block size that will result in good performance. One common heuristic used to choose a good block…
Nice. That looks quite useful!
Cooooooool!
How does this work when we use 2D or even 3D blocks?
For now you will need to compute your own 2D/3D block dimensions from the 1D thread counts suggested by the API.
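One possible way to do that conversion, sketched here as a host-side helper (`splitBlockSize2D` is a hypothetical function, not part of the CUDA API), assuming the suggested 1D block size is a multiple of the warp size (32): keep `blockDim.x` warp-aligned and put the remaining factor in `blockDim.y`.

```cpp
#include <utility>

// Hypothetical helper: split a 1D block size suggested by
// cudaOccupancyMaxPotentialBlockSize into 2D block dimensions.
// Keeps x a multiple of the warp size (32) for coalesced access,
// growing it while it still divides the suggested size evenly.
std::pair<int, int> splitBlockSize2D(int blockSize) {
    int x = 32;  // start with one full warp in x
    while (x < 128 && x * 2 <= blockSize && blockSize % (x * 2) == 0) {
        x *= 2;  // cap x at 128 as an arbitrary width limit
    }
    return {x, blockSize / x};  // pass as dim3(x, y) at launch
}
```

For example, a suggested block size of 768 splits into a 128x6 block, and 256 into 128x2; the total thread count per block, and hence the occupancy calculation, is unchanged.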
Hello Mark,
This API looks great. I compiled the example you provided above using a CUDA 6.5 installation. I also wanted to mention that I got a warning concerning the method signature for the kernel parameter.
$ nvcc example_occupancy.cu
/usr/local/cuda-6.5/bin/../targets/x86_64-linux/include/cuda_runtime.h(1394): warning: argument of type "void (*)(int *, int)" is incompatible with parameter of type "const void *"
detected during:
instantiation of "cudaError_t <unnamed>::cudaOccupancyMaxPotentialBlockSizeVariableSMem(int *, int *, T, UnaryFunction, int) [with UnaryFunction=<unnamed>::__cudaOccupancyB2DHelper, T=void (*)(int *, int)]"
(1278): here
instantiation of "cudaError_t <unnamed>::cudaOccupancyMaxPotentialBlockSize(int *, int *, T, size_t, int) [with T=void (*)(int *, int)]"
example_occupancy.cu(19):
Nevertheless, the code runs fine. I just wanted to mention it in case someone else experiences this. I should also note that my compiler is gcc:
$ gcc --version
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Cheers,
Launched blocks of size 768. Theoretical occupancy: 0.000000
GPU - Tesla C2075
Why do I get 0 occupancy when I use cudaSetDevice with the GPU above?
What are you using to measure Theoretical occupancy? What are the resources used by your kernel (registers per thread, shared memory per block)?
Hi, very helpful, thanks! However, I have a kernel where the amount of shared memory depends on the block dimensions. What can I do in this case?
There's a C++ version of the API which takes a unary function callback as an argument. You define this function to take a block size and return a dynamic shared memory size in bytes, and the API uses this in its calculations. See http://docs.nvidia.com/cuda...
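A sketch of how that callback version might be used. The kernel body and the shared-memory relationship (one float per thread) are assumptions for illustration; substitute your kernel's real mapping from block size to dynamic shared memory, and add error checking as needed.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void myKernel(float *data) {
    extern __shared__ float smem[];  // dynamic shared memory
    smem[threadIdx.x] = data[blockIdx.x * blockDim.x + threadIdx.x];
    // ... kernel body elided ...
}

// Maps a candidate block size to the dynamic shared memory this
// kernel would need for it: here, one float per thread (an
// assumption for illustration).
struct SMemForBlock {
    size_t operator()(int blockSize) const {
        return blockSize * sizeof(float);
    }
};

int main() {
    int minGridSize = 0, blockSize = 0;
    cudaOccupancyMaxPotentialBlockSizeVariableSMem(
        &minGridSize, &blockSize, myKernel, SMemForBlock());
    printf("Suggested block size: %d (min grid size %d)\n",
           blockSize, minGridSize);
    return 0;
}
```

The API calls your functor (or lambda) for each candidate block size it considers, so the shared-memory cost is accounted for in the occupancy calculation automatically.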
Is it possible for these values to change at runtime?
float occupancy = (maxActiveBlocks * blockSize / props.warpSize) /
(float)(props.maxThreadsPerMultiProcessor /
props.warpSize);
Why do we divide twice by props.warpSize? It's a redundant operation that can be mathematically simplified:
occupancy = maxActiveBlocks * blockSize / props.maxThreadsPerMultiProcessor;
Your calculation is semantically different because it ignores integer division. Remember that blockSize might not be a multiple of warpSize (although that's generally not a good idea, it's legal).