There is a certain amount of on-chip memory in an integrated GPU (e.g., 64 MB for the GeForce 2Go 200). I have a question about the usage of such memory.
The “NVIDIA CUDA C Programming Best Practices Guide” 2.3 mentions the following when discussing zero-copy:
“On integrated GPUs, mapped pinned memory is always a performance gain because it avoids superfluous copies as integrated GPU and CPU memory are physically the same.”
This sentence seems to suggest that all memory used through CUDA is off-chip memory (e.g., host memory). Is the on-chip GPU memory usable for programming? If so, how can it be accessed in a CUDA program? If not, why not? (Is it reserved for graphics display?)
Thanks!
When talking about “global” or “device” memory on a CUDA device (which is the memory size quoted in the card specifications), that memory is always off-chip. The GPU die itself only contains a small amount of memory, in the form of shared memory and the texture/constant caches. Depending on the GPU, global memory is connected either through a dedicated memory bus between the GPU and its own graphics memory (as is the case on higher-performance discrete GPUs) or through the motherboard chipset to a portion of host memory. The latter is the technique used on cheaper integrated GPUs: the integrated GPU effectively gets a chunk of host memory mapped into its address space.
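For what it's worth, that small on-chip memory is directly usable from CUDA: shared memory is exposed as `__shared__` arrays declared inside a kernel. A minimal sketch, assuming a single block of `n` threads with `n` at most 64 (the kernel and variable names here are just illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __shared__ arrays live in the GPU's small on-chip memory. Each block gets
// its own copy; it is fast but limited to a few tens of KB per multiprocessor.
__global__ void reverseInShared(int *data, int n)
{
    __shared__ int s[64];          // on-chip shared memory, one copy per block
    int t = threadIdx.x;
    s[t] = data[t];                // stage from global (off-chip) memory
    __syncthreads();               // wait until the whole array is staged
    data[t] = s[n - 1 - t];        // write back in reversed order
}

int main()
{
    const int n = 64;
    int h[n], *d;
    for (int i = 0; i < n; ++i) h[i] = i;

    cudaMalloc(&d, n * sizeof(int));
    cudaMemcpy(d, h, n * sizeof(int), cudaMemcpyHostToDevice);
    reverseInShared<<<1, n>>>(d, n);             // one block of n threads
    cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);

    printf("h[0] = %d, h[%d] = %d\n", h[0], n - 1, h[n - 1]);  // prints 63 and 0
    return 0;
}
```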
All that quote is suggesting is that, with zero-copy and pinned memory, some virtual memory mapping trickery can make the same chunk of host memory visible directly to both the CPU and the GPU, with no need to actually copy bytes from one location to another. On an integrated GPU, accessing this zero-copy memory is just as fast as accessing normal global memory, because global memory is itself just another part of host memory.
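If you want to try the zero-copy path yourself, the mapped pinned memory the guide is talking about is set up with `cudaSetDeviceFlags(cudaDeviceMapHost)`, `cudaHostAlloc(..., cudaHostAllocMapped)` and `cudaHostGetDevicePointer`. A rough sketch (error checking omitted; the `scale` kernel is just a placeholder):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *p, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] *= 2.0f;           // reads/writes go straight to the mapped host memory
}

int main()
{
    const int n = 1 << 20;
    float *h_ptr, *d_ptr;

    cudaSetDeviceFlags(cudaDeviceMapHost);            // must be set before the context is created

    // Allocate pinned host memory that is also mapped into the device address space.
    cudaHostAlloc(&h_ptr, n * sizeof(float), cudaHostAllocMapped);
    cudaHostGetDevicePointer(&d_ptr, h_ptr, 0);       // device-side alias of the same memory

    for (int i = 0; i < n; ++i) h_ptr[i] = 1.0f;

    scale<<<(n + 255) / 256, 256>>>(d_ptr, n);        // no cudaMemcpy needed
    cudaDeviceSynchronize();

    printf("h_ptr[0] = %f\n", h_ptr[0]);              // the CPU sees the result directly
    cudaFreeHost(h_ptr);
    return 0;
}
```

On a discrete GPU the kernel's accesses in this sketch would go over the PCIe bus, but on an integrated GPU they are just ordinary accesses to host memory, which is why the guide calls mapped pinned memory a guaranteed win there.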