Hi all,
I remember that for read-only data, constant memory is very fast when the access hits the cache. How much constant memory do I have? (I haven't found it in any NVIDIA reference.) Is it partitioned across the SMs (multiprocessors) like shared memory? Otherwise, how much texture cache do I have?
I also remember that on G80, blocks running on the same SM share 16 KB of shared memory. Does that mean that if my graphics card has 30 multiprocessors, it has 16*30 = 460 KB of shared memory in total?
Constant memory: Appendix A of the programming guide lists 64 KB total across the whole GPU.
Shared memory: yes, 16 KB per MP (also in Appendix A of the programming guide), so 480 KB for a GT200 (not 460). But remember that shared memory is local to a block/MP, so any one block effectively only has 16 KB.
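If you would rather ask the card than look it up, the runtime can report these limits. Below is a minimal sketch, assuming device 0 and a reasonably recent toolkit; the helper name `print_memory_limits` is made up for illustration. The "summed over MPs" figure simply repeats the 16*30 arithmetic from the question and assumes the per-block limit equals the per-MP amount, which holds for the G80/GT200 parts discussed here but not necessarily for newer ones.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): queries device 0, prints the
// memory limits discussed in this thread, and returns true on success.
bool print_memory_limits(cudaDeviceProp *prop)
{
    cudaError_t err = cudaGetDeviceProperties(prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return false;
    }

    printf("Device:                      %s\n", prop->name);
    printf("Constant memory (whole GPU): %zu bytes\n", prop->totalConstMem);
    printf("Shared memory per block:     %zu bytes\n", prop->sharedMemPerBlock);
    printf("Multiprocessors:             %d\n", prop->multiProcessorCount);

    // Assumption: per-block limit == per-MP amount (true on G80/GT200).
    // This total is only bookkeeping; no single block can use more than
    // sharedMemPerBlock.
    size_t summed = prop->sharedMemPerBlock * (size_t)prop->multiProcessorCount;
    printf("Shared memory summed over MPs: %zu bytes\n", summed);
    return true;
}
```

Call it from `main` with a `cudaDeviceProp` on the stack, for example `cudaDeviceProp p; print_memory_limits(&p);`.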
Thanks for your reply!
With 64 KB I can fit a fair amount of data. The question is how to put it into constant memory without a pointer (for example, a 50x50 constant matrix).
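One way to do this, as far as I know, is to declare the array at file scope with `__constant__` and fill it from the host with `cudaMemcpyToSymbol`, so no device pointer or `cudaMalloc` is involved for the matrix itself. The sketch below is only an illustration: the names `c_matrix`, `apply_matrix`, and `run_constant_matrix_demo`, the test data, and the batch layout are all my assumptions. Each thread applies the same 50x50 matrix to its own vector, so at any moment all threads of a warp read the same matrix element.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

constexpr int N = 50;  // the matrix is N x N

// File-scope constant array: 50 * 50 * 4 = 10,000 bytes, well under 64 KB.
__constant__ float c_matrix[N * N];
static_assert(sizeof(float) * N * N <= 64 * 1024, "must fit in constant memory");

// Each thread multiplies the constant matrix by its own vector.
// Vectors are interleaved: element i of vector v lives at [i * batch + v].
__global__ void apply_matrix(const float *x, float *y, int batch)
{
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= batch) return;
    for (int row = 0; row < N; ++row) {
        float sum = 0.0f;
        for (int col = 0; col < N; ++col)
            // Same c_matrix address for every thread in the warp.
            sum += c_matrix[row * N + col] * x[col * batch + v];
        y[row * batch + v] = sum;
    }
}

// Hypothetical driver: returns the max abs error against a CPU reference,
// or -1 if a CUDA call fails.
float run_constant_matrix_demo(int batch)
{
    std::vector<float> h_m(N * N), h_x(N * batch), h_y(N * batch);
    for (int i = 0; i < N * N; ++i)     h_m[i] = 0.01f * (i % 7);  // made-up data
    for (int i = 0; i < N * batch; ++i) h_x[i] = 0.5f * (i % 5);

    // Copy the host matrix into the __constant__ symbol.
    if (cudaMemcpyToSymbol(c_matrix, h_m.data(), sizeof(float) * N * N)
            != cudaSuccess) return -1.0f;

    float *d_x = nullptr, *d_y = nullptr;
    size_t bytes = h_x.size() * sizeof(float);
    if (cudaMalloc(&d_x, bytes) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&d_y, bytes) != cudaSuccess) { cudaFree(d_x); return -1.0f; }
    cudaMemcpy(d_x, h_x.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 64;
    apply_matrix<<<(batch + threads - 1) / threads, threads>>>(d_x, d_y, batch);
    cudaError_t err = cudaMemcpy(h_y.data(), d_y, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
    if (err != cudaSuccess) return -1.0f;

    float max_err = 0.0f;
    for (int v = 0; v < batch; ++v)
        for (int row = 0; row < N; ++row) {
            float ref = 0.0f;
            for (int col = 0; col < N; ++col)
                ref += h_m[row * N + col] * h_x[col * batch + v];
            max_err = fmaxf(max_err, fabsf(ref - h_y[row * batch + v]));
        }
    return max_err;
}
```

Calling `run_constant_matrix_demo(1000)` from `main` should return a small non-negative error. If instead each thread walked its own row of the matrix, neighbouring threads would hit different constant addresses, which is the slow case for constant memory.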
I'm not sure about bandwidth, but I believe that when multiple threads concurrently access the same location in constant memory, the reads aren't serialized as they would be if the memory were non-constant. Not 100% sure on this, however.
Constant memory is optimised for broadcast: everything is fast if all threads in a warp read the same location. If different threads read different locations, the accesses within the warp get serialised.
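To make the two access patterns concrete, here is a rough sketch with one kernel of each kind plus a small timing harness built on CUDA events. The names `c_table`, `read_uniform`, `read_per_thread`, and `compare_access_patterns` are invented for this example, and the table contents are arbitrary small integers chosen so both kernels produce exactly the same sum. I am not quoting any timings, since they depend heavily on the card; treat it as something to run on your own hardware.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int TABLE = 256;
__constant__ float c_table[TABLE];

// Broadcast-friendly: the index depends only on the loop counter, so every
// thread in a warp reads the same constant address at each step.
__global__ void read_uniform(float *out, int n)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= n) return;
    float sum = 0.0f;
    for (int i = 0; i < TABLE; ++i)
        sum += c_table[i];
    out[t] = sum;
}

// Divergent: the index depends on the thread id, so neighbouring threads
// read different constant addresses at each step.
__global__ void read_per_thread(float *out, int n)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= n) return;
    float sum = 0.0f;
    for (int i = 0; i < TABLE; ++i)
        sum += c_table[(i + t) % TABLE];
    out[t] = sum;
}

// Hypothetical driver: times both kernels (milliseconds) and returns true
// if both produced the expected sum for every thread.
bool compare_access_patterns(float *ms_uniform, float *ms_per_thread)
{
    const int n = 1 << 16;
    std::vector<float> h_table(TABLE), h_out(n);
    float expected = 0.0f;
    for (int i = 0; i < TABLE; ++i) {
        h_table[i] = float(i % 4);  // small integers keep float sums exact
        expected += h_table[i];
    }

    if (cudaMemcpyToSymbol(c_table, h_table.data(), sizeof(float) * TABLE)
            != cudaSuccess) return false;
    float *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) return false;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    bool ok = true;
    float *ms[2] = { ms_uniform, ms_per_thread };
    for (int k = 0; k < 2; ++k) {
        cudaEventRecord(start);
        if (k == 0) read_uniform<<<n / 256, 256>>>(d_out, n);
        else        read_per_thread<<<n / 256, 256>>>(d_out, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        cudaEventElapsedTime(ms[k], start, stop);

        if (cudaMemcpy(h_out.data(), d_out, n * sizeof(float),
                       cudaMemcpyDeviceToHost) != cudaSuccess) ok = false;
        for (int t = 0; t < n && ok; ++t)
            ok = (h_out[t] == expected);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return ok;
}
```

Both kernels do the same arithmetic, so any difference between the two reported times comes from how the constant reads are issued.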