Global memory? Need to have Global Memory cleared up

Hi there,

I’ve been working through the developer documentation for CUDA over the past few days, and some parts of it still aren’t quite clear to me. The “Global Memory” feature is described as giving the developer full scatter/gather reads from the memory pool, but which memory pool does this refer to? The GPU video RAM or the main system RAM? The Parallel Data Cache is a memory pool that the ALUs use cooperatively, but if global memory refers to the main RAM, then what is the term for the video RAM? Is that the Local Memory?

Thank you in advance,

  • Andreas Eklöv

“Global memory” is the memory on your graphics card. You can copy to and from it using the various cudaMemcpy functions.

All of the CUDA memory types mentioned are on the GPU board. The global (or device) memory consists of memory chips on the board next to the GPU chip. The shared memory is inside the GPU chip. Accessing shared memory is therefore much faster than going to device memory, but it is very limited in size. This should sound familiar if you think of the caches a CPU has. The main (or host) memory is the source of all data. You can transfer data from host to device memory by calling cudaMemcpy in the CPU program. You can transfer data from device memory to shared memory with a plain assignment (=) between two appropriately typed variables in a CUDA kernel.
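To make the two transfer steps concrete, here is a minimal sketch, assuming a single block of 64 threads and a made-up task (reversing an array). The names reverseViaShared, reverseOnGpu, tile and N are hypothetical and chosen only for illustration; cudaMalloc, cudaMemcpy and cudaFree are the regular CUDA runtime calls.

```cuda
#include <cuda_runtime.h>

#define N 64  // hypothetical element count; one block of N threads

// Each thread copies one element from global (device) memory into
// on-chip shared memory with a plain assignment, then writes the
// mirrored element back to global memory.
__global__ void reverseViaShared(const float *in, float *out)
{
    __shared__ float tile[N];   // shared memory, visible to the whole block
    int i = threadIdx.x;
    tile[i] = in[i];            // device memory -> shared memory, just "="
    __syncthreads();            // wait until every thread has loaded its element
    out[i] = tile[N - 1 - i];   // shared memory -> device memory
}

// Host-side helper (hypothetical): host -> device, run kernel, device -> host.
// Returns true when every CUDA call reported success.
bool reverseOnGpu(const float *hostIn, float *hostOut)
{
    float *devIn = nullptr;
    float *devOut = nullptr;
    size_t bytes = N * sizeof(float);

    bool ok = cudaMalloc(&devIn, bytes) == cudaSuccess &&
              cudaMalloc(&devOut, bytes) == cudaSuccess &&
              cudaMemcpy(devIn, hostIn, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        reverseViaShared<<<1, N>>>(devIn, devOut);
        // This copy waits for the kernel to finish before reading the result.
        ok = cudaMemcpy(hostOut, devOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(devIn);   // freeing a null pointer is a harmless no-op
    cudaFree(devOut);
    return ok;
}
```

Calling reverseOnGpu(in, out) with two host arrays of N floats should leave out holding in back to front. The only place host memory is touched is cudaMemcpy; inside the kernel, the move between device and shared memory is an ordinary assignment.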


Thank you ever so much, guys. I had to get that cleared up in order to wrap my head around the concept of CUDA. So in other words, the GPU video RAM is accessed as global memory, and then each of the ALUs in the multi-ALU clusters has its own Parallel Data Cache. The texture memory is, from what I’ve gathered, also outside the GPU. But what about the constant memory (or local memory) described in the documentation: are those also separate memory resources outside the GPU, or are they inside the GPU?

Local and constant memory are on the card (outside the GPU). Local memory is just device memory seen from a different angle: it is visible only to a single thread. Constant memory is read-only for all threads (writable by the host), but access to it is cached.
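As a rough illustration of the constant memory part, the sketch below has the host write a small coefficient table that every thread then reads. The names coeffs, TAPS, evalPoly and evalOnGpu are hypothetical, and the polynomial evaluation is only a stand-in workload; cudaMemcpyToSymbol is the runtime call for writing a __constant__ variable from the host.

```cuda
#include <cuda_runtime.h>

#define TAPS 4  // hypothetical number of polynomial coefficients

// Constant memory: read-only inside kernels, written by the host.
__constant__ float coeffs[TAPS];

// Evaluates coeffs[0] + coeffs[1]*x + coeffs[2]*x^2 + coeffs[3]*x^3
// for each input element, using Horner's scheme.
__global__ void evalPoly(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // acc is private to this thread. The compiler keeps such variables in
    // registers when it can and places them in local memory otherwise.
    float acc = coeffs[TAPS - 1];
    for (int k = TAPS - 2; k >= 0; --k)
        acc = acc * in[i] + coeffs[k];   // every thread reads the same table
    out[i] = acc;
}

// Host-side helper (hypothetical). Returns true when every call succeeded.
bool evalOnGpu(const float *hostCoeffs, const float *hostIn, float *hostOut, int n)
{
    float *devIn = nullptr;
    float *devOut = nullptr;
    size_t bytes = n * sizeof(float);

    // The host is the only side that can write constant memory.
    bool ok = cudaMemcpyToSymbol(coeffs, hostCoeffs, TAPS * sizeof(float)) == cudaSuccess &&
              cudaMalloc(&devIn, bytes) == cudaSuccess &&
              cudaMalloc(&devOut, bytes) == cudaSuccess &&
              cudaMemcpy(devIn, hostIn, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;  // round up to cover all n elements
        evalPoly<<<blocks, threads>>>(devIn, devOut, n);
        ok = cudaMemcpy(hostOut, devOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(devIn);
    cudaFree(devOut);
    return ok;
}
```

A kernel cannot assign to coeffs; only the host can change its contents, via cudaMemcpyToSymbol. Local memory, by contrast, never appears as a keyword: it is simply where per-thread variables end up when they do not fit in registers.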

Btw, I wouldn’t talk about video memory, since with CUDA you cannot acquire a graphics context, i.e. you cannot draw onscreen anyway.