__constant__ Strangeness

I’ve been experimenting with __constant__ variables.

The results have proven very curious and I’m wondering if anyone else has had similar experiences:

  • I can’t get __constant__ variables declared as pointers and then allocated with cudaMalloc() to work. Doing this slows things down to speeds equivalent to global memory.

  • __constant__ variables (which now have to be declared as arrays, e.g. __constant__ float mydata[64];) apparently can’t be written with cudaMemcpy without similar results: it works, but things become incredibly slow.

  • As far as I can figure out, the only ways to get data into a __constant__ variable are to assign it at compile time or to copy to it with cudaMemcpyToSymbol at runtime.
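For reference, here is a minimal sketch of the two approaches from the last bullet, assuming a reasonably recent CUDA toolkit. The names scale and apply are made up for illustration; mydata is the array from the bullet above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Approach 1: initialized at compile time (hypothetical example values).
__constant__ float scale[4] = {1.0f, 2.0f, 3.0f, 4.0f};

// Approach 2: left uninitialized and filled at runtime from the host.
__constant__ float mydata[64];

// Hypothetical kernel that reads both constant arrays.
__global__ void apply(float *out)
{
    int i = threadIdx.x;
    out[i] = mydata[i] * scale[i % 4];
}

int main()
{
    float host_in[64], host_out[64];
    for (int i = 0; i < 64; ++i) host_in[i] = (float)i;

    // Pass the symbol itself; the runtime resolves its device location.
    cudaError_t err = cudaMemcpyToSymbol(mydata, host_in, sizeof(host_in));
    if (err != cudaSuccess) {
        printf("cudaMemcpyToSymbol failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float *dev_out = nullptr;
    cudaMalloc(&dev_out, sizeof(host_out));
    apply<<<1, 64>>>(dev_out);
    cudaMemcpy(host_out, dev_out, sizeof(host_out), cudaMemcpyDeviceToHost);
    cudaFree(dev_out);

    printf("out[5] = %f\n", host_out[5]);
    return 0;
}
```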

Are these results consistent with what everybody else has been seeing or am I doing something wrong? I don’t have postable code at the moment but I can probably whip something up if anyone wants to see.

Thanks in advance!

Ben Weiss

Oregon State University Graphics

Edit: toned down the post a bit…

I am using cudaMemcpyToSymbol to work with constants, and everything is perfect. Not sure why you don’t like it.

It’s not that I don’t like cudaMemcpyToSymbol, it just seems awkward to use when cudaMemcpy should do the job just fine. It certainly works for me, and I’m not complaining, just wondering.

Well, cudaMemcpyToSymbol is meant for constant memory, and cudaMemcpy is meant for global memory :P

(it certainly looks like you are complaining ;))

cudaMemcpyToSymbol is needed because __constant__ variables exist only on the device. You can’t just copy directly to them with cudaMemcpy because the host cannot take the address of a variable that only exists on the device! If you are dead set on using cudaMemcpy, you can call cudaGetSymbolAddress to get the device address and then copy to it with cudaMemcpy. This isn’t specific to constant memory; you need to do the same with __device__ variables.
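A rough sketch of that cudaGetSymbolAddress route, in case it helps. The names coeffs and copy_out are hypothetical, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical constant array, filled at runtime.
__constant__ float coeffs[64];

// Hypothetical kernel: copies constant memory out so the host can inspect it.
__global__ void copy_out(float *out)
{
    out[threadIdx.x] = coeffs[threadIdx.x];
}

int main()
{
    float host_in[64], host_out[64];
    for (int i = 0; i < 64; ++i) host_in[i] = 0.5f * i;

    // Ask the runtime for the device address of the symbol...
    void *dev_addr = nullptr;
    cudaGetSymbolAddress(&dev_addr, coeffs);

    // ...then an ordinary cudaMemcpy can target that address.
    cudaMemcpy(dev_addr, host_in, sizeof(host_in), cudaMemcpyHostToDevice);

    float *dev_out = nullptr;
    cudaMalloc(&dev_out, sizeof(host_out));
    copy_out<<<1, 64>>>(dev_out);
    cudaMemcpy(host_out, dev_out, sizeof(host_out), cudaMemcpyDeviceToHost);
    cudaFree(dev_out);

    printf("coeffs[10] read back as %f\n", host_out[10]);
    return 0;
}
```

The same two calls should work unchanged if coeffs is declared __device__ instead of __constant__.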

As for cudaMalloc’ing an array and stuffing the pointer into constant memory, why did you expect it to be faster? The array is still in global memory, so all the performance limitations of global memory still apply. Of course, you can access the pointer to that global memory slightly faster this way, since you don’t need to pass it as an argument to the kernel, potentially shaving a few percent off the launch overhead (I’ve done it in the past and it works, but it isn’t worth the code complexity in my application).
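To make the distinction concrete, here is a sketch of the pointer-in-constant-memory setup (table and read_table are hypothetical names, not anyone’s actual code). Only the pointer value lives in constant memory; every table[i] read still goes to global memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// The pointer itself is stored in constant memory;
// the floats it points to are in global memory.
__constant__ float *table;

// Hypothetical kernel: no pointer argument needed for the table,
// but each table[i] access is still a global memory read.
__global__ void read_table(float *out)
{
    int i = threadIdx.x;
    out[i] = table[i] * 2.0f;
}

int main()
{
    const int n = 64;
    float host_in[n], host_out[n];
    for (int i = 0; i < n; ++i) host_in[i] = (float)i;

    // Allocate and fill the array in global memory.
    float *dev_table = nullptr;
    cudaMalloc(&dev_table, n * sizeof(float));
    cudaMemcpy(dev_table, host_in, n * sizeof(float), cudaMemcpyHostToDevice);

    // Copy the pointer value (not the data) into the constant symbol.
    cudaMemcpyToSymbol(table, &dev_table, sizeof(dev_table));

    float *dev_out = nullptr;
    cudaMalloc(&dev_out, n * sizeof(float));
    read_table<<<1, n>>>(dev_out);
    cudaMemcpy(host_out, dev_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("out[3] = %f\n", host_out[3]);

    cudaFree(dev_out);
    cudaFree(dev_table);
    return 0;
}
```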

Thank you for the clarification. The syntactic similarities and some ambiguity in the programming manual led me to the incorrect conclusion that constant and device variables should behave similarly. My bad.