Quick swapping of constant memory possible? Is it possible to declare an area of global memory as co


My application needs lots of constant memory, more than the available 64 KB. Therefore I’d like to change the contents of the constant memory between kernel calls. This should happen as fast as possible, so I want to avoid copying data back and forth between host and device. Is it possible to copy all the data (say 256 KB) to global memory during startup and later declare a certain region of global memory as my constant memory? Or is copying from host to device the only way to change constant memory between kernel calls? (I’d rather not use texture memory, for several reasons.)

You can use the cudaMemcpyToSymbol function with a cudaMemcpyDeviceToDevice argument. This should be fast, although I haven’t timed it. (Let me know if you do.)
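To make the suggestion concrete, here is a minimal sketch of how it could look. It assumes the 256 KB from the question are split into four 64 KB banks that are uploaded to global memory once at startup; the names c_table, uploadBanks, swapBank and firstValueAfterSwap are made up for illustration, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define BANK_FLOATS (16 * 1024)  // 16K floats = 64 KB, the constant memory limit
#define NUM_BANKS   4            // 4 x 64 KB = 256 KB staged in global memory

__constant__ float c_table[BANK_FLOATS];  // the currently active 64 KB bank

// Trivial kernel: reads one value from constant memory.
__global__ void readFirst(float *out) { *out = c_table[0]; }

// Startup: upload all banks to global memory once. Every element of bank b
// is set to the value b so the active bank is easy to identify later.
float *uploadBanks()
{
    const size_t total = (size_t)NUM_BANKS * BANK_FLOATS;
    float *h_all = (float *)malloc(total * sizeof(float));
    for (size_t i = 0; i < total; ++i) h_all[i] = (float)(i / BANK_FLOATS);
    float *d_staging = NULL;
    cudaMalloc(&d_staging, total * sizeof(float));
    cudaMemcpy(d_staging, h_all, total * sizeof(float), cudaMemcpyHostToDevice);
    free(h_all);
    return d_staging;
}

// Between kernel calls: overwrite constant memory with one 64 KB slice of
// the staging buffer. The source is a device pointer, so no host transfer
// takes place.
cudaError_t swapBank(const float *d_staging, int bank)
{
    return cudaMemcpyToSymbol(c_table, d_staging + (size_t)bank * BANK_FLOATS,
                              BANK_FLOATS * sizeof(float), 0,
                              cudaMemcpyDeviceToDevice);
}

// Swap in a bank, run the kernel, and return what it read from c_table[0].
float firstValueAfterSwap(const float *d_staging, int bank)
{
    float *d_out = NULL;
    float h_out = -1.0f;
    cudaMalloc(&d_out, sizeof(float));
    swapBank(d_staging, bank);
    readFirst<<<1, 1>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}

int main()
{
    float *d_staging = uploadBanks();
    for (int bank = 0; bank < NUM_BANKS; ++bank)
        printf("bank %d -> c_table[0] = %.0f\n", bank,
               firstValueAfterSwap(d_staging, bank));
    cudaFree(d_staging);
    return 0;
}
```

The copy and the kernel launch both go through the default stream, so the kernel only starts after the new bank is in place.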

Thanks for the hint. I did a little benchmark with both arguments. cudaMemcpyDeviceToDevice is indeed a lot faster, as expected. For the test I copied a random 64 KB array 10000 times. Here are my results on a GTX 280 with an Intel QuadCore Q9450 2.66GHz running Ubuntu 8.04:


Total time : 544.484009 msec

Average: 0.054448 msec for one copy

1147.875768 MB/sec

HOST TO DEVICE COPY (cudaMemcpyToSymbol)

Total time : 551.583008 msec

Average: 0.055158 msec for one copy

1133.102346 MB/sec

DEVICE TO DEVICE COPY (cudaMemcpyToSymbol)

Total time : 60.993999 msec

Average: 0.006099 msec for one copy

10246.909619 MB/sec

Conclusion: with cudaMemcpyToSymbol, cudaMemcpyDeviceToDevice is more than 9 times faster than the host-to-device copy (0.006099 msec vs. 0.055158 msec per copy).

As a reference, the Host to Device test from the SDK bandwidthTest reaches 1864.5 MB/s on my setup.
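For anyone who wants to repeat the measurement, here is a minimal sketch of such a benchmark. It is not the code used for the numbers above; it assumes CUDA events for timing, and the names c_data, mbPerSec, timeCopies and report are made up for illustration. The MB/sec figures above are consistent with 1 MB = 1024 * 1024 bytes: 10000 copies of 64 KB are 625 MB, and 625 MB in 544.484 msec gives 1147.88 MB/sec, so the sketch uses the same convention.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N_BYTES (64 * 1024)  // size of one copy: 64 KB
#define N_ITER  10000        // number of copies per measurement

__constant__ unsigned char c_data[N_BYTES];

// Throughput in MB/sec, with 1 MB = 1024 * 1024 bytes.
double mbPerSec(size_t bytes, int iters, float totalMs)
{
    double mb = (double)bytes * iters / (1024.0 * 1024.0);
    return mb / (totalMs / 1000.0);
}

// Time N_ITER copies into constant memory; src is a host or device pointer
// depending on kind. Returns the total time in milliseconds.
float timeCopies(const void *src, cudaMemcpyKind kind)
{
    cudaEvent_t start, stop;
    float ms = 0.0f;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    for (int i = 0; i < N_ITER; ++i)
        cudaMemcpyToSymbol(c_data, src, N_BYTES, 0, kind);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);  // wait until all copies have finished
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

void report(const char *label, float ms)
{
    printf("%s\n", label);
    printf("Total time : %f msec\n", ms);
    printf("Average: %f msec for one copy\n", ms / N_ITER);
    printf("%f MB/sec\n\n", mbPerSec(N_BYTES, N_ITER, ms));
}

int main()
{
    // Random source data on the host, plus a copy of it in global memory.
    unsigned char *h_src = (unsigned char *)malloc(N_BYTES);
    for (int i = 0; i < N_BYTES; ++i) h_src[i] = (unsigned char)(rand() & 0xFF);
    unsigned char *d_src = NULL;
    cudaMalloc(&d_src, N_BYTES);
    cudaMemcpy(d_src, h_src, N_BYTES, cudaMemcpyHostToDevice);

    report("HOST TO DEVICE COPY (cudaMemcpyToSymbol)",
           timeCopies(h_src, cudaMemcpyHostToDevice));
    report("DEVICE TO DEVICE COPY (cudaMemcpyToSymbol)",
           timeCopies(d_src, cudaMemcpyDeviceToDevice));

    cudaFree(d_src);
    free(h_src);
    return 0;
}
```

The host buffer here is ordinary pageable memory from malloc; using pinned memory (cudaMallocHost) for the host-side source would likely change the host-to-device numbers.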