I’m a beginner at CUDA programming, so please forgive any misunderstandings! Thanks.
First, the algorithm I am trying to implement works as follows:
- transfer two images (roughly 10 MP each) from the CPU to the GPU
- from given seed points in the two images, extract two NxN sub-images (say N = 32)
- perform an element-wise multiplication between the two sub-images and store the temporary result in shared memory (shared memory usage = NxNxsizeof(float))
- perform a box filter on the temporary result (once in direction x, once in direction y) -> note that this constrains me to thread blocks of N threads, or at least a multiple of N threads, to keep all threads busy
- perform a correlation, given a formula that is not relevant here
- copy the result back from the GPU to the host
(One would also have to handle border effects.)
My real issue here is that the amount of shared memory is proportional to N^2, which I fear will result in extremely poor occupancy…
How would you approach this kind of problem? Or is this kind of algorithm simply not well suited to CUDA?
Any help would be greatly appreciated. Thanks!