const vs shared speed

What do you think is faster…??

  1. To read values directly from const memory

  2. To copy const values to shader memory using

if (threadIdx.x==0)


   //copy from const to shared


read shared data


I guess it depends on 2 things: access pattern and data size.

By using shared memory, you’re consuming a scarce resource that will limit the number of blocks per MP. Also, you need to be careful about bank conflicts.

Constant memory, in the end, relies on device memory and is only fast if you have good spatial locality to effectively use it’s cache. You can fit more data there, if it is used by all blocks of your kernel. Else, the shared memory should be faster.

Since you’re spending a read from const to shared, you should access your shared memory at least more than once, to justify the setup phase. However, it should pay off if you make one single coherent copy from const to shared in the beginning, and then several incoherent accesses to shared memory.

I guess it all comes down to experimenting…

On a side note, to get fast transfer from constant (and global) to shared, you should make each thread copy a block of data to effectively make the transfer in parallel.

const memory is very fast when the constants fit in the cache. The size of the cache is 8k per multiprocessor, which is half the size of the sram (so not that small). However, nvidia hasn’t told us anything more about it such as whether it’s prone to bank conflicts. We can be sure, tho, that it doesn’t rely on spacial locality (the way the texture cache does) because constants don’t have any spacial locality even in D3D. That means that probably as long as you’re not using more than 8k worth of constants (check in the cubin), you’re good.

anyway, the point is that it’s pretty much as good as shared memory as far as we can tell. (it might even be faster… there’s no instructions spent loading it, and it might not suffer from bank conclicts.) I’d say, if you can use the space, then feel free to use constants.

Even if you use more than 8k (you can have a max of 64k constants), remember that constants are almost always a better choice than global memory because global memory has 0kb cache.

Textures also have an 8k cache. But when you use 2D textures, it caches squares.

P.S. did you guys ever look at the stars and think “gosh, what could those old, puny computers with just 32kb of memory do?” i wonder if it’s karma that we get to find out