When to use constant memory...

Is there any heuristic on when to use constant memory and when to pass constant values as kernel parameters (which I believe are stored in shared memory)?

For example, my application is not bound by shared memory and has only a few hundred bytes of constant parameters. I first copied my data to constant memory, but later ran into unrelated issues and started passing these as parameters. How expensive is copying to constant memory? Should constant memory be used often, or only when shared memory must be reserved for other purposes?


You might be interested in the section “STAGING COEFFICIENTS” in these slides from a workshop that Paulius Micikevicius presented at SC’08 late last year. Basically, if the parameters are constants that every thread reads, the constant cache and its broadcast mechanism are as fast as any other way of getting data into a running kernel, and they save shared memory and registers in the process.
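
To make the two options concrete, here is a minimal sketch that runs the same computation both ways: once reading coefficients from `__constant__` memory and once receiving them as a by-value kernel argument. It is not taken from the slides. The `Params` struct, the kernel names, `run_demo`, and the 32-coefficient size are all made up for illustration, and error handling is cut down to an early return.

```cuda
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical parameter block: 32 floats = 128 bytes of read-only coefficients.
constexpr int kNumCoeffs = 32;
struct Params { float coeff[kNumCoeffs]; };

// Option 1: coefficients live in constant memory, filled by cudaMemcpyToSymbol.
__constant__ Params c_params;

__global__ void apply_constant(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float acc = 0.0f;
    // All threads of a warp read the same coeff[k] in the same iteration,
    // which is the uniform access pattern the constant cache can broadcast.
    for (int k = 0; k < kNumCoeffs; ++k)
        acc += c_params.coeff[k] * in[i];
    out[i] = acc;
}

// Option 2: the same struct passed by value as a kernel argument.
__global__ void apply_param(Params p, const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float acc = 0.0f;
    for (int k = 0; k < kNumCoeffs; ++k)
        acc += p.coeff[k] * in[i];
    out[i] = acc;
}

// Sketch-level error handling: bail out of the calling function with code 1.
#define CUDA_OK(call) do { if ((call) != cudaSuccess) return 1; } while (0)

// Runs both kernels on the same input; returns 0 if their outputs agree.
int run_demo() {
    const int n = 1 << 16;
    const size_t bytes = n * sizeof(float);

    Params h_params;
    for (int k = 0; k < kNumCoeffs; ++k) h_params.coeff[k] = 0.01f * (k + 1);

    std::vector<float> h_in(n), h_a(n), h_b(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i % 100);

    float *d_in = nullptr, *d_a = nullptr, *d_b = nullptr;
    CUDA_OK(cudaMalloc(&d_in, bytes));
    CUDA_OK(cudaMalloc(&d_a, bytes));
    CUDA_OK(cudaMalloc(&d_b, bytes));
    CUDA_OK(cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice));

    // One small host-to-device copy; only needed again when the values change.
    CUDA_OK(cudaMemcpyToSymbol(c_params, &h_params, sizeof(Params)));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    apply_constant<<<blocks, threads>>>(d_in, d_a, n);
    apply_param<<<blocks, threads>>>(h_params, d_in, d_b, n);
    CUDA_OK(cudaGetLastError());

    // Blocking copies back to the host also wait for the kernels to finish.
    CUDA_OK(cudaMemcpy(h_a.data(), d_a, bytes, cudaMemcpyDeviceToHost));
    CUDA_OK(cudaMemcpy(h_b.data(), d_b, bytes, cudaMemcpyDeviceToHost));
    cudaFree(d_in); cudaFree(d_a); cudaFree(d_b);

    for (int i = 0; i < n; ++i)
        if (std::fabs(h_a[i] - h_b[i]) > 1e-3f * (1.0f + std::fabs(h_a[i])))
            return 2;
    return 0;
}

int main() {
    int rc = run_demo();
    std::printf(rc == 0 ? "outputs match\n" : "failed (code %d)\n", rc);
    return rc;
}
```

Built with something like `nvcc demo.cu -o demo`, this lets you time the two kernels against each other on your own hardware. Keep in mind that the answer depends on the architecture: on compute capability 1.x devices kernel arguments are passed through shared memory and limited to 256 bytes, while on 2.0 and later they are passed through constant memory (limited to 4 KB), so for a small block of uniformly read values the two versions may end up behaving much alike.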