Usually, when I have a bunch of parameters that are the same for every thread, I put them in constant memory: for example, the number of points, the dimensions of a grid in x, y, z, the scale of the grid, etc.
My assumption is that on the first read, the structure containing this information is fetched and stored in the constant cache, and every subsequent read is then served from the cache and is “as fast as registers”.
The other way to do this would be to pass the struct as a parameter to the kernel call, where it would reside in shared memory (at least on compute capability 1.x devices) and would benefit from the broadcast mechanism, since every thread accesses the same element on every read from that structure.
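To make the comparison concrete, here is a minimal sketch of the two variants I have in mind. The `GridParams` struct, its fields, the kernel names, and the `runBothVariants` helper are all made up for illustration; the only thing the sketch checks is that both ways of delivering the parameters give the same result, not which one is faster.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical parameter block; the field names are illustrative only.
struct GridParams {
    int   numPoints;
    int   dimX, dimY, dimZ;
    float scale;
};

// Variant 1: the parameters live in __constant__ memory.
__constant__ GridParams c_params;

__global__ void scaleFromConstant(const float* in, float* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < c_params.numPoints)
        out[i] = in[i] * c_params.scale;   // every thread reads the same address
}

// Variant 2: the same struct is passed by value as a kernel argument.
__global__ void scaleFromArgument(const float* in, float* out, GridParams p)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < p.numPoints)
        out[i] = in[i] * p.scale;
}

// Runs both kernels on the same input and reports whether the results agree.
bool runBothVariants()
{
    const int    n     = 1024;
    const size_t bytes = n * sizeof(float);
    GridParams   h     = { n, 16, 8, 8, 0.5f };

    float hostIn[n], outConst[n] = {}, outArg[n] = {};
    for (int i = 0; i < n; ++i) hostIn[i] = static_cast<float>(i);

    float *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) { cudaFree(dIn); return false; }

    bool ok = cudaMemcpy(dIn, hostIn, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;

    // Variant 1: upload the struct once, then launch without extra arguments.
    ok = ok && cudaMemcpyToSymbol(c_params, &h, sizeof(GridParams)) == cudaSuccess;
    scaleFromConstant<<<blocks, threads>>>(dIn, dOut);
    ok = ok && cudaMemcpy(outConst, dOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    // Variant 2: the struct travels with the launch itself.
    scaleFromArgument<<<blocks, threads>>>(dIn, dOut, h);
    ok = ok && cudaMemcpy(outArg, dOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    // Both variants should produce in[i] * scale for every element.
    for (int i = 0; ok && i < n; ++i)
        ok = (outConst[i] == outArg[i]) && (outArg[i] == hostIn[i] * h.scale);

    cudaFree(dIn);
    cudaFree(dOut);
    return ok;
}

int main()
{
    std::printf("%s\n", runBothVariants() ? "results match" : "mismatch or CUDA error");
    return 0;
}
```

This should build with a plain `nvcc` invocation. To actually compare speed, the two launches would need to be timed (for instance with `cudaEvent` timers) on a kernel that reads the parameters much more heavily than this toy one does.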
Does anyone see a clear winner between these two approaches?