Can we relate occupancy to performance in the case of a memory-bound application?

I mean, is there a linear relation, or is it not defined?

For a completely memory-bound kernel (i.e. one that does no calculations), there is no connection between occupancy and performance.

Occupancy can only help when you have a mix of calculations and memory reads (i.e. tens to hundreds of FLOPs for each float read). In this situation performance is proportional to occupancy, to a zeroth-order estimate. There are lots of other variables that also change performance as you change block size, so the only way to truly know the optimal block size is to run benchmarks at 32, 64, 96, …, up to the maximum allowed by resource usage.
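Here is a rough sketch of what such a block-size sweep could look like. Everything in it is made up for illustration: `mixedKernel`, `itersPerRead`, the array size and the repeat count are assumptions, not anything from a real application. It also assumes a toolkit recent enough to provide `cudaOccupancyMaxActiveBlocksPerMultiprocessor`, which is used only to print the theoretical occupancy next to each timing.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e = (call);                                       \
        if (e != cudaSuccess) {                                       \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(e));                           \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// Hypothetical kernel: one float read per thread, then a tunable amount
// of arithmetic, so the compute-to-memory ratio can be varied.
__global__ void mixedKernel(const float* in, float* out, int n, int itersPerRead)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float x = in[i];
    for (int k = 0; k < itersPerRead; ++k)
        x = x * 1.0001f + 0.5f;  // one multiply-add (2 FLOPs) per iteration
    out[i] = x;
}

int main()
{
    const int n = 1 << 22;         // assumed problem size
    const int itersPerRead = 100;  // assumed compute intensity
    const int reps = 20;           // timed launches per block size

    float *in, *out;
    CHECK(cudaMalloc(&in, n * sizeof(float)));
    CHECK(cudaMalloc(&out, n * sizeof(float)));
    CHECK(cudaMemset(in, 0, n * sizeof(float)));

    // Largest block size this kernel's resource usage allows.
    cudaFuncAttributes attr;
    CHECK(cudaFuncGetAttributes(&attr, mixedKernel));
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    for (int block = 32; block <= attr.maxThreadsPerBlock; block += 32) {
        int grid = (n + block - 1) / block;

        // Theoretical occupancy: resident threads per SM / max threads per SM.
        int blocksPerSM = 0;
        CHECK(cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocksPerSM, mixedKernel, block, 0));
        float occupancy =
            (float)(blocksPerSM * block) / prop.maxThreadsPerMultiProcessor;

        mixedKernel<<<grid, block>>>(in, out, n, itersPerRead);  // warm-up
        CHECK(cudaEventRecord(start));
        for (int r = 0; r < reps; ++r)
            mixedKernel<<<grid, block>>>(in, out, n, itersPerRead);
        CHECK(cudaEventRecord(stop));
        CHECK(cudaEventSynchronize(stop));
        CHECK(cudaGetLastError());

        float ms = 0.0f;
        CHECK(cudaEventElapsedTime(&ms, start, stop));
        printf("block %4d  occupancy %.2f  %.3f ms/launch\n",
               block, occupancy, ms / reps);
    }

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(in));
    CHECK(cudaFree(out));
    return 0;
}
```

Setting `itersPerRead` to 0 turns the kernel into almost pure memory traffic, which gives you something to compare against for the memory-bound case. The printed occupancy is only the theoretical figure, so treat the timings as the real answer.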

Hi Mister Anderson… What do you mean by zero order estimate?

It’s a phrase we use often in physics. It means the crudest possible approximation, and comes from the idea of fitting polynomials to a function: higher order estimates are better, just like higher order polynomials can fit data better.
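Written out, with a generic function f(x) and coefficients a_0, a_1, a_2 picked purely for illustration, the successive estimates look roughly like this:

```latex
\begin{align*}
f(x) &\approx a_0 && \text{(zeroth order: a constant)} \\
f(x) &\approx a_0 + a_1 x && \text{(first order)} \\
f(x) &\approx a_0 + a_1 x + a_2 x^2 && \text{(second order)}
\end{align*}
```

Each extra term is a correction on top of the cruder estimate before it.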

As an example, Earth is a sphere in a first order estimate. To zeroth order, Earth is flat :)

Got it… :biggrin: thanks… nice example