Where is Constant Memory Physically Located?

Hi, I’ve got a quick question about the following line from the CUDA 1.1 Programming Guide:

“The total amount of constant memory is 64 KB”

Is this constant memory situated in SRAM on the GPU die, or is it found on one of the DRAM chips on the board?

Other posts I’ve seen seem to suggest that it is in DRAM, but because it’s such a small amount of memory, I thought it might actually be on-chip.

It is in the DRAM, but backed by a very small on-chip cache.
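For anyone who wants to see what this looks like from the programmer's side, here is a minimal sketch. The table name `coeffs`, its size, and the `scale` kernel are made up for illustration. The data is copied into the constant segment in device DRAM with `cudaMemcpyToSymbol`, and kernel reads then go through the on-chip constant cache. The cache works best when all threads in a warp read the same address, which is the access pattern used here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical coefficient table: 256 floats = 1 KB, well under the 64 KB limit.
__constant__ float coeffs[256];

// All threads of a block read the same element of coeffs, so the read
// can be served to the whole warp from the constant cache.
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= coeffs[blockIdx.x % 256];
}

int main()
{
    const int n = 1024;
    float h_coeffs[256];
    float h_data[n];
    for (int i = 0; i < 256; ++i) h_coeffs[i] = 2.0f;
    for (int i = 0; i < n; ++i)   h_data[i] = 1.0f;

    // Copy the table into the constant memory segment (resides in device DRAM).
    cudaError_t err = cudaMemcpyToSymbol(coeffs, h_coeffs, sizeof(h_coeffs));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpyToSymbol: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float *d_data = 0;
    cudaMalloc((void **)&d_data, n * sizeof(float));
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    scale<<<(n + 255) / 256, 256>>>(d_data, n);

    cudaMemcpy(h_data, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("h_data[0] = %f\n", h_data[0]);

    cudaFree(d_data);
    return 0;
}
```

Nothing in the code says where the data physically lives; the `__constant__` qualifier only selects the address space, and the caching happens behind the scenes.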

Hi thanks for getting back to me.

OK, so there’s the 64 KB portion of DRAM, then there are the 8 KB caches for each multiprocessor, and a little on-chip cache between them? What size is this cache?

I have a follow-up question too :-) As I understand, there is an on-chip cache for every two physical banks of DRAM. How big are these caches?

I’m trying to get a feel for how much SRAM there is on the GPU, in all the different caches, memory blocks and registers. There doesn’t seem to be one nice handy diagram anywhere…

Global memory is not cached, and the on-chip cache might very well be the per-multiprocessor cache.

I’m looking at the attached block diagram. I’d quite like to understand what is physically on the chip in terms of SRAM. Does anyone know what the sizes of the L1 and L2 caches in the picture are?

I’ve seen this picture before, and have always been confused by it. It looks like the exact transpose of how the G80 is described in the programmer’s manual. The manual says there are 16 multiprocessors, each containing 8 stream processors. But that picture shows 8 large blocks (presumably multiprocessors), with 16 smaller blocks inside with “SP” stamped on them. Does someone know if this picture is inaccurate, or does the physical organization of the chip differ from the high-level view of it from CUDA?

Given that the image is from 2006, this might well be how someone understood it back then. There are plenty of other pictures from NVIDIA sources (e.g. in one of the many PowerPoint presentations) that paint a completely different picture.

Here’s a couple more pictures from the same guy.

To seibert: It seems that the multiprocessors (with their eight streaming processors) are arranged in pairs, so each multiprocessor pair has shared resources and contains 16 streaming processors.

As Denis is saying, we can’t be sure of the accuracy of these diagrams, because they represent one person’s interpretation of the G80 architecture. The most recent one is from 2007.

Can anyone make a categorical statement about the accuracy of these diagrams?



I can state only three things:

  1. Those are not official NVIDIA documentation.
  2. I assume the author is referring to texture caches in the first diagram.
  3. There is no global memory cache on G8X/G9X, only texture and constant caches.


Strictly speaking, I think this statement is wrong. The G8x/G9x can address up to 16 segments of 64 KB of constants (in DirectX 10, for example).

In CUDA, two (or three) of these segments are used. One is for kernel-local constants, the other for global constants within the .cu file.

If you run up against the 64 KB limit, it will usually be because the global constant segment is full. Anyway, it is an addressing limitation, not a physical one.
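If you want to check the limit that CUDA itself reports, a small sketch like the one below queries the device properties. It assumes the card of interest is device 0. Note that `totalConstMem` is the size of the constant segment visible to a CUDA program, not the size of any on-chip cache, so it does not answer the SRAM question by itself.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;

    // Device 0 is an assumption; change the index for multi-GPU systems.
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Limits as exposed through the CUDA runtime.
    printf("Device:                  %s\n", prop.name);
    printf("Constant memory:         %zu bytes\n", prop.totalConstMem);
    printf("Shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
    printf("Registers per block:     %d\n", prop.regsPerBlock);
    printf("Multiprocessors:         %d\n", prop.multiProcessorCount);
    return 0;
}
```

The shared memory and register figures are on-chip resources, so they give at least part of the picture of how much SRAM each multiprocessor has.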

Thanks everyone for your replies. I’m probably more confused now than ever by the G80 memory hierarchy, but that’s better than thinking I understand when I clearly don’t.