Comparing Shaders & CUDA Calls

I have just a simple question which might be answered easily.

In the case of shaders, there are program parameters which can be set prior to program execution.
Now I am curious where these parameters are stored, and whether they are equivalent to the kernel parameters in CUDA, as in
cudaFunc<<< grid, threads, smem >>>( params … )

The programming manual says: __global__ function parameters are currently passed via shared memory to the device and are limited to 256 bytes.

Does this mean that more passed parameters = less shared mem?
Is there an advantage to putting parameters in registers?

Only slightly so. You can only pass up to 256 bytes of parameters, so that isn’t a huge chunk of the shared mem.
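For a concrete sense of scale, here is a sketch (the struct and kernel names are made up) of a parameter list that sits right at the documented 256-byte cap:

```cuda
// Hypothetical example: 64 floats * 4 bytes = 256 bytes, right at the
// documented parameter limit for a __global__ function.
struct Params {
    float data[64];
};

__global__ void cudaFunc(Params p)   // p travels to the device via shared memory
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // ... use p.data[i] here ...
}

// The third <<<>>> argument reserves *additional* dynamic shared memory
// on top of what the parameters already consume:
// cudaFunc<<<grid, threads, smem>>>(params);
```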

And waste all those precious resources! You wouldn’t want to do that. Well, effectively the compiler will do it where it is needed: if you use a parameter a dozen times in a row, it may choose to cache it in a register. Accessing shared memory is not slow, so you shouldn’t have to worry about it.
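If you want to make that caching explicit, you can copy the parameter into a local variable yourself; a minimal sketch (kernel name is made up):

```cuda
__global__ void scale(float *out, const float *in, float a, int n)
{
    // Copying the parameter into a local variable makes the register
    // caching explicit -- though the compiler usually does this on its
    // own when "a" is reused often enough.
    float s = a;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = s * in[i];
}
```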

If you have a series of constant values that are going to be the same across many kernel launches, you can put them in constant memory (this is probably the closest match to what you used in shader land). As long as every thread in a warp is reading the same value from constant memory simultaneously (like you would a function parameter), there won’t be a performance difference compared to registers/shared mem. And you then get the minuscule benefit of not having CUDA copy those parameters over before every kernel call.
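A minimal sketch of that pattern (symbol and kernel names are made up): copy the values once with cudaMemcpyToSymbol, then launch as many kernels as you like without re-passing them.

```cuda
// Values shared across many kernel launches live in constant memory.
__constant__ float coeffs[4];

__global__ void polyEval(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = data[i], y = 0.0f;
        // Every thread in the warp reads the same coeffs[k] at the same
        // time, so the constant cache broadcasts it and the read costs
        // about the same as a register access.
        for (int k = 0; k < 4; ++k)
            y = y * x + coeffs[k];
        data[i] = y;
    }
}

// Host side: upload once, reuse across launches.
// float h_coeffs[4] = {1.0f, 0.0f, -2.0f, 3.0f};
// cudaMemcpyToSymbol(coeffs, h_coeffs, sizeof(h_coeffs));
// polyEval<<<grid, threads>>>(d_data, n);
```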