Constant memory

Hi all,
I remember that for read-only data, constant memory is very fast when the data is in the cache. How much constant memory do I have? (I haven't found it in any NVIDIA reference.) Is it partitioned across the SMs (multiprocessors) like shared memory? Also, how much texture cache do I have?

I also remember that on G80, blocks running on the same SM share 16 KB of shared memory. Does that mean that if my graphics card has 30 multiprocessors, it has 16*30=460 KB of shared memory in total?

Regards

See Appendix A of the programming guide: 64 KB of constant memory in total across the whole GPU.

Yes, 16 KB per MP (also in Appendix A of the programming guide), so 480 KB for a GT200 (not 460). But remember that shared memory is local to a block/MP, so any one block effectively only has 16 KB.
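
If you would rather ask the card than look the numbers up, here is a minimal sketch that queries them at runtime with cudaGetDeviceProperties(). It assumes device 0 is the GPU you care about. The "all MPs" line just repeats the 16*30 arithmetic from above (shared memory per block times multiprocessor count), which matches the hardware discussed in this thread; other devices may report shared memory per multiprocessor separately.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    // Assumption: device 0 is the GPU of interest.
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    std::printf("Device:                  %s\n", prop.name);
    std::printf("Constant memory (total): %zu bytes\n", prop.totalConstMem);
    std::printf("Shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
    std::printf("Multiprocessors:         %d\n", prop.multiProcessorCount);

    // Aggregate figure only; a single block can never use more than
    // sharedMemPerBlock, as noted in the reply above.
    size_t aggregate = prop.sharedMemPerBlock *
                       static_cast<size_t>(prop.multiProcessorCount);
    std::printf("Shared memory, all MPs:  %zu bytes\n", aggregate);
    return 0;
}
```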

Thanks for your reply!
With 64 KB I can fit quite a bit of data. The question is, how can I put it in constant memory without a pointer? (For example, a 50x50 constant matrix.)

cudaMemcpyToSymbol() lets you copy data into a variable declared with __constant__. There is no dynamic allocation for constant memory, however, so the size has to be fixed at compile time.
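
For the 50x50 case, a minimal sketch could look like the following. The names d_matrix and transformVectors, the identity matrix, and the test data are all made up for illustration. The kernel gives each thread its own input vector, so at every step all threads read the same matrix element.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define N 50  // matrix dimension from the question

// Statically sized: 50 * 50 * sizeof(float) = 10,000 bytes, well under 64 KB.
__constant__ float d_matrix[N * N];

// Hypothetical kernel: thread v computes out_v = M * in_v for its own vector.
// At each (row, col) step every thread reads the same d_matrix element.
__global__ void transformVectors(const float* in, float* out, int numVectors) {
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= numVectors) return;
    for (int row = 0; row < N; ++row) {
        float sum = 0.0f;
        for (int col = 0; col < N; ++col)
            sum += d_matrix[row * N + col] * in[v * N + col];
        out[v * N + row] = sum;
    }
}

int main() {
    const int numVectors = 1024;

    // Host copy of the matrix; identity, so the output should equal the input.
    std::vector<float> h_matrix(N * N, 0.0f);
    for (int i = 0; i < N; ++i) h_matrix[i * N + i] = 1.0f;

    // Copy by symbol: no device pointer to the constant array is needed.
    cudaError_t err = cudaMemcpyToSymbol(d_matrix, h_matrix.data(),
                                         N * N * sizeof(float));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemcpyToSymbol failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    std::vector<float> h_in(numVectors * N), h_out(numVectors * N);
    for (size_t i = 0; i < h_in.size(); ++i) h_in[i] = static_cast<float>(i % 7);

    size_t bytes = h_in.size() * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 128;
    const int blocks = (numVectors + threads - 1) / threads;
    transformVectors<<<blocks, threads>>>(d_in, d_out, numVectors);
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    // Compare against the input (the matrix is the identity).
    int mismatches = 0;
    for (size_t i = 0; i < h_in.size(); ++i)
        if (h_out[i] != h_in[i]) ++mismatches;
    std::printf("mismatches: %d\n", mismatches);

    cudaFree(d_in);
    cudaFree(d_out);
    return mismatches == 0 ? 0 : 1;
}
```

The kernel refers to d_matrix by name, so nothing has to be passed as a kernel argument.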

How about the bandwidth? Is the bandwidth of constant memory identical to the bandwidth of shared memory? (Which access is faster?)

I’m not sure about bandwidth, but I believe that when multiple threads concurrently access the same location in constant memory, the reads aren’t serialized as they would be if the memory were non-constant. Not 100% sure on this, however.

Constant memory is optimised for broadcast - everything’s fast if all threads read the same location. If different threads read different locations, then the warp gets serialised.
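
To see the two access patterns side by side, here is a rough sketch with two made-up kernels, uniformRead and scatteredRead, reading a hypothetical 256-entry table. Both compute the same sum; only the indexing differs. The event timings are only indicative (no warm-up runs, and the compiler is free to optimise), so measure on your own card before drawing conclusions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TABLE_SIZE 256  // hypothetical lookup table size

__constant__ float d_table[TABLE_SIZE];

// All threads in a warp read the same address in each iteration (broadcast).
__global__ void uniformRead(float* out) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float sum = 0.0f;
    for (int i = 0; i < TABLE_SIZE; ++i)
        sum += d_table[i];
    out[tid] = sum;
}

// Neighbouring threads read different addresses in each iteration.
__global__ void scatteredRead(float* out) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float sum = 0.0f;
    for (int i = 0; i < TABLE_SIZE; ++i)
        sum += d_table[(i + tid) % TABLE_SIZE];
    out[tid] = sum;
}

int main() {
    float h_table[TABLE_SIZE];
    for (int i = 0; i < TABLE_SIZE; ++i) h_table[i] = 1.0f;
    cudaMemcpyToSymbol(d_table, h_table, sizeof(h_table));

    const int threads = 256, blocks = 64;
    float* d_out = nullptr;
    cudaMalloc(&d_out, threads * blocks * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float msUniform = 0.0f, msScattered = 0.0f;

    // Time the broadcast-friendly kernel.
    cudaEventRecord(start);
    uniformRead<<<blocks, threads>>>(d_out);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&msUniform, start, stop);

    // Time the kernel where threads in a warp hit different addresses.
    cudaEventRecord(start);
    scatteredRead<<<blocks, threads>>>(d_out);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&msScattered, start, stop);

    std::printf("uniform:   %.3f ms\n", msUniform);
    std::printf("scattered: %.3f ms\n", msScattered);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return 0;
}
```

If your data is naturally indexed per thread, as in scatteredRead, shared memory or plain global memory may be a better fit than constant memory.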