If you manage to get the memory address of the constant, you can just use the second. But the last time I tried, this wasn’t possible when using the CUDA runtime, only when using the API directly.
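For reference, the CUDA runtime API as currently documented provides `cudaGetSymbolAddress`, which returns the device address of a `__constant__` variable. Once you have that address, an ordinary `cudaMemcpy` can write to it. The sketch below assumes that "the second" method means a plain `cudaMemcpy` to a device pointer. The names `coeffs`, `readCoeffs` and `uploadViaSymbolAddress` are hypothetical and exist only for this illustration.

```cuda
#include <cuda_runtime.h>

// Hypothetical constant array for illustration: 256 floats = 1 KiB.
__constant__ float coeffs[256];

// Copies the constant array out to global memory so the host can inspect it.
__global__ void readCoeffs(float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = coeffs[i];
}

// Resolves the device address of the __constant__ symbol, then uploads with a
// plain cudaMemcpy on that address instead of calling cudaMemcpyToSymbol.
cudaError_t uploadViaSymbolAddress(const float* host, size_t n) {
    if (n > 256) return cudaErrorInvalidValue;  // do not overrun coeffs
    void* devAddr = nullptr;
    cudaError_t err = cudaGetSymbolAddress(&devAddr, coeffs);
    if (err != cudaSuccess) return err;
    return cudaMemcpy(devAddr, host, n * sizeof(float), cudaMemcpyHostToDevice);
}
```

After the upload, kernels read `coeffs` exactly as if the data had been written with `cudaMemcpyToSymbol`. `readCoeffs` is included only so that the host can check the result.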
It may be that cudaMemcpyToSymbol is the first CUDA call, so the card still has to be initialized, and that can cause a large slowdown. I’m using cudaMemcpyToSymbol like this:
Constant memory is limited to 64 KB, so I don’t see how a slower memcpy could affect overall performance. Nor do I see why copying to constant space should be slower than copying to global space: constant memory is just global memory with some caching.
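Both points above can be checked by measuring. The sketch below first forces context creation with `cudaFree(0)`, a common idiom, so that initialization cost is not counted in the first copy. It then times the same host buffer copied to constant memory and to global memory using CUDA events. The buffer size (32 KiB), the names `constBuf`, `timeCopyMs` and `compareConstVsGlobal`, and the untimed warm-up copies are assumptions made for this illustration.

```cuda
#include <cuda_runtime.h>

// Hypothetical buffer size: 8192 floats = 32 KiB, safely under the 64 KiB limit.
constexpr size_t kFloats = 8192;
constexpr size_t kBytes = kFloats * sizeof(float);
static_assert(kBytes <= 64 * 1024, "__constant__ data is limited to 64 KiB");

__constant__ float constBuf[kFloats];

// Times one host-to-device copy with CUDA events on the default stream.
// Returns the elapsed milliseconds, or -1 if the copy itself failed.
template <typename CopyFn>
float timeCopyMs(CopyFn copy) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cudaError_t err = copy();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = -1.0f;
    if (err == cudaSuccess) cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Copies the same host buffer to constant memory and to global memory and
// stores both timings. Returns false if any CUDA call fails.
bool compareConstVsGlobal(const float* host, float* constMs, float* globalMs) {
    // cudaFree(0) forces lazy context creation, so initialization cost is
    // paid here instead of inside the first timed copy.
    if (cudaFree(0) != cudaSuccess) return false;

    float* dGlobal = nullptr;
    if (cudaMalloc(&dGlobal, kBytes) != cudaSuccess) return false;

    auto toConst = [&] { return cudaMemcpyToSymbol(constBuf, host, kBytes); };
    auto toGlobal = [&] {
        return cudaMemcpy(dGlobal, host, kBytes, cudaMemcpyHostToDevice);
    };

    // Untimed warm-up copies so any remaining one-time setup is excluded.
    toConst();
    toGlobal();

    *constMs = timeCopyMs(toConst);
    *globalMs = timeCopyMs(toGlobal);
    cudaFree(dGlobal);
    return *constMs >= 0.0f && *globalMs >= 0.0f;
}
```

Pass a host array of `kFloats` elements and compare `*constMs` with `*globalMs`. If a large gap appears only when the warm-up is removed, initialization is the likely cause rather than the copy to constant space itself. Results depend on the system, so measure rather than assume.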