Constant memory or function parameters?

Usually when I have a bunch of parameters that are the same for every thread, I put them in constant memory. For example: the number of points, the dimensions of a grid in x, y, z, the scale of the grid, etc…

What I assume is that on the first read, the structure containing this information will be fetched and stored in the constant cache; every subsequent read will then come from cached data and be “as fast as registers”.
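As a rough sketch of the constant-memory approach (the struct layout, names, and setup code here are hypothetical, just for illustration):

```cuda
// Hypothetical parameter struct; field names are illustrative.
struct GridParams {
    int   numPoints;
    int   dimX, dimY, dimZ;
    float scale;
};

// Lives in constant memory; reads are served from the constant cache
// and broadcast when all threads in a warp read the same address.
__constant__ GridParams d_params;

__global__ void scalePoints(float *points)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < d_params.numPoints)
        points[i] *= d_params.scale;  // same cached value for every thread
}

// Host side: copy the struct into constant memory before each launch
// (or only when the values actually change).
void launch(float *d_points, const GridParams &h_params, int blocks, int threads)
{
    cudaMemcpyToSymbol(d_params, &h_params, sizeof(GridParams));
    scalePoints<<<blocks, threads>>>(d_points);
}
```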

The other way to do this would be to pass the struct as a parameter to the kernel call, where it would reside in shared memory and would benefit from the broadcast mechanism, since every thread should access the same element on every read from that structure.
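The parameter-passing alternative would look something like this (again, the struct and names are hypothetical; the struct is simply passed by value at launch, with no separate copy step):

```cuda
// Same hypothetical struct as above.
struct GridParams {
    int   numPoints;
    int   dimX, dimY, dimZ;
    float scale;
};

// The struct arrives as a kernel argument; no cudaMemcpyToSymbol needed.
__global__ void scalePoints(GridParams p, float *points)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < p.numPoints)
        points[i] *= p.scale;
}

// Host side: pass the struct directly in the launch.
// scalePoints<<<blocks, threads>>>(h_params, d_points);
```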

Does anyone see a clear winner between these two approaches?

When I last did benchmarks of this (way back in CUDA 0.8), using constant memory for values (like pointers) that stay the same for many kernel calls in a row won out over parameters by a few percent. However, the programming challenge involved with requesting updates to these values (say, many class instances could be calling the kernel with different pointers) made me go with just using parameters.

I don’t know whether one would win vs the other if you updated the values every kernel call.