Global memory vs device memory

Inkj · March 25, 2023, 8:46am

Hi everyone, I’ve recently started using Nsight Compute to profile my CUDA programs, but I have some questions about its “Memory Workload Analysis” section:

1. What’s the difference between “global memory” and “device memory”? I’ve read the Nsight Compute Kernel Profiling Guide and learned that global memory is a logical concept, so it must have a corresponding physical unit, which I assumed was “device memory”. BUT the Kernel Profiling Guide describes device memory as “On-chip device (GPU) memory of the CUDA device that executes the kernel”, whereas I thought global memory was off-chip. I profiled a kernel in Nsight Compute and got the following result. As you can see, the data transfer between L1/TEX and L2 is quite different from the transfer between L2 and Device Memory. Moreover, the former reports severely uncoalesced memory access, while the latter doesn’t.
It’s frustrating because I haven’t found any useful references. Could anyone help me clarify this?
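For anyone who wants to reproduce the L1/TEX vs. L2 difference, here is a minimal sketch of two hypothetical kernels (`copy_coalesced`, `copy_strided`) and a hypothetical host helper `run_access_demo`. In the strided kernel, neighbouring threads in a warp read floats `stride` elements apart, so one warp-wide load touches many separate sectors, which Nsight Compute reports as uncoalesced L1/TEX traffic. Assumption: because the index wraps modulo `n`, different warps re-read the same lines, and with a small `n` those lines can be served from L2, so L2 to device memory traffic may stay close to the coalesced case. That is one plausible reason the two tables can disagree, not a definitive diagnosis of the original kernel.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Coalesced: thread i reads element i, so a warp reads one contiguous chunk.
__global__ void copy_coalesced(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided: thread i reads element (i * stride) % n, so consecutive threads
// in a warp hit addresses `stride` floats apart (uncoalesced loads).
__global__ void copy_strided(const float* in, float* out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int j = static_cast<int>((static_cast<long long>(i) * stride) % n);
        out[i] = in[j];
    }
}

// Hypothetical helper: fills in[k] = k, runs both kernels, copies results back.
// Returns 0 on success, 1 on any CUDA error.
int run_access_demo(int n, int stride, float* h_coalesced, float* h_strided) {
    assert(n > 0 && stride > 0);
    size_t bytes = n * sizeof(float);
    std::vector<float> h_in(n);
    for (int k = 0; k < n; ++k) h_in[k] = static_cast<float>(k);

    float *d_in = nullptr, *d_a = nullptr, *d_b = nullptr;
    cudaError_t err = cudaMalloc(&d_in, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_a, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_b, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int blocks = (n + 255) / 256;
        copy_coalesced<<<blocks, 256>>>(d_in, d_a, n);
        copy_strided<<<blocks, 256>>>(d_in, d_b, n, stride);
        err = cudaDeviceSynchronize();
    }
    if (err == cudaSuccess) err = cudaMemcpy(h_coalesced, d_a, bytes, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) err = cudaMemcpy(h_strided, d_b, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_a);
    cudaFree(d_b);
    return err == cudaSuccess ? 0 : 1;
}
```

Calling this from a small `main` and running it under `ncu --set full` should let you compare the two kernels’ memory tables side by side.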

rs277 · March 25, 2023, 6:19pm

I would have said there’s no difference, if “on-chip” means resident on the GPU die. The definitions in the Nsight Compute documentation seem ambiguous: in addition to the Device Memory quote you posted, “Peer Memory” is also described as “on-chip” memory of other devices.

Maybe “Device DRAM” would be a better description.

The “Best Practices Guide” depicts the memory hierarchy in a diagram that clearly distinguishes “on-chip” memory from “DRAM”.

Robert_Crovella · March 25, 2023, 8:35pm

Device memory means the DRAM attached to a GPU: the memory that is accessed over the GPU’s external memory bus. It can be thought of as a “physical” space.
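As a rough sketch, the physical properties of device memory can be queried through the CUDA runtime. The struct `DeviceMemoryInfo` and the helper `query_device_memory` are hypothetical names; `cudaMemGetInfo` and `cudaDeviceGetAttribute` are standard runtime calls.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical summary of one GPU's physical device memory.
struct DeviceMemoryInfo {
    size_t total_bytes;   // capacity of device memory
    size_t free_bytes;    // currently unallocated device memory
    int bus_width_bits;   // width of the global memory bus, in bits
    int integrated;       // 1 on integrated GPUs (e.g. Jetson) that share physical memory with the CPU
};

// Hypothetical helper. Returns 0 on success, 1 on any CUDA error.
int query_device_memory(int dev, DeviceMemoryInfo* info) {
    assert(info != nullptr);
    if (cudaSetDevice(dev) != cudaSuccess) return 1;
    if (cudaMemGetInfo(&info->free_bytes, &info->total_bytes) != cudaSuccess) return 1;
    if (cudaDeviceGetAttribute(&info->bus_width_bits,
                               cudaDevAttrGlobalMemoryBusWidth, dev) != cudaSuccess) return 1;
    if (cudaDeviceGetAttribute(&info->integrated,
                               cudaDevAttrIntegrated, dev) != cudaSuccess) return 1;
    return 0;
}
```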

“global memory” is a logical space. It is the memory you get when you do a host cudaMalloc operation, a device malloc operation, or a cudaHostAlloc operation. All 3 of these types of allocations live in the logical global space. “global” memory is distinguished from the other common logical spaces, “local”, “shared”, and “constant”.
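A minimal sketch of two of those allocation routes, assuming a GPU that supports in-kernel `malloc`: the buffer from host-side `cudaMalloc` and the scratch buffer from device-side `malloc` are both plain global-space pointers, accessed with the same loads and stores. The kernel `fill_via_device_heap` and helper `run_global_alloc_demo` are hypothetical names. Device-side `malloc` draws from a device heap (8 MB by default, adjustable with `cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...)`).

```cuda
#include <cassert>
#include <cuda_runtime.h>

// `out` comes from host cudaMalloc; `scratch` comes from in-kernel malloc.
// Both live in the logical global space.
__global__ void fill_via_device_heap(int* out, int n) {
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        int* scratch = static_cast<int*>(malloc(n * sizeof(int)));
        if (scratch == nullptr) return;            // device heap exhausted
        for (int i = 0; i < n; ++i) scratch[i] = i * i;
        for (int i = 0; i < n; ++i) out[i] = scratch[i];
        free(scratch);                             // device-side free
    }
}

// Hypothetical helper. Returns 0 on success, 1 on any CUDA error.
int run_global_alloc_demo(int n, int* h_out) {
    assert(n > 0 && h_out != nullptr);
    int* d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, n * sizeof(int));
    if (err != cudaSuccess) return 1;
    err = cudaMemset(d_out, 0xFF, n * sizeof(int)); // -1 marks "never written"
    if (err == cudaSuccess) {
        fill_via_device_heap<<<1, 32>>>(d_out, n);
        err = cudaDeviceSynchronize();
    }
    if (err == cudaSuccess) err = cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```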

Note that host memory, if allocated via a pinned allocator such as cudaHostAlloc, lives in the logical global space and is “global memory”.
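A hedged sketch of that case: pinned host memory from `cudaHostAlloc` is dereferenced by a kernel through an ordinary global pointer, while `cudaPointerGetAttributes` reports its physical backing as host memory. It assumes a 64-bit platform with unified virtual addressing, where `cudaHostAllocMapped` allocations are mapped without first calling `cudaSetDeviceFlags(cudaDeviceMapHost)`. The names `scale_in_place` and `run_pinned_demo` are hypothetical.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// The kernel sees `data` as a global-space pointer, even though the
// bytes physically reside in pinned host memory.
__global__ void scale_in_place(float* data, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= s;
}

// Hypothetical helper. Returns 0 on success; *kind receives the
// runtime's classification of the allocation's backing.
int run_pinned_demo(int n, float* first, float* last, cudaMemoryType* kind) {
    assert(n > 0 && first && last && kind);
    float* h = nullptr;
    cudaError_t err = cudaHostAlloc(reinterpret_cast<void**>(&h),
                                    n * sizeof(float), cudaHostAllocMapped);
    if (err != cudaSuccess) return 1;
    for (int i = 0; i < n; ++i) h[i] = static_cast<float>(i);

    cudaPointerAttributes attr{};
    err = cudaPointerGetAttributes(&attr, h);
    if (err == cudaSuccess) *kind = attr.type;   // expected: cudaMemoryTypeHost

    float* d = nullptr;
    if (err == cudaSuccess) err = cudaHostGetDevicePointer(reinterpret_cast<void**>(&d), h, 0);
    if (err == cudaSuccess) {
        scale_in_place<<<(n + 255) / 256, 256>>>(d, n, 2.0f);
        err = cudaDeviceSynchronize();           // host reads are safe after this
    }
    if (err == cudaSuccess) { *first = h[0]; *last = h[n - 1]; }
    cudaFreeHost(h);
    return err == cudaSuccess ? 0 : 1;
}
```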

Device memory is also one of the possible backings for the logical “local” space. Device memory can be off-chip in the case of discrete GPUs, and it can be “on-chip” in the case of Jetson devices. I personally don’t think on-chip vs. off-chip is a defining distinction for device memory. It can reside on-chip, and it can reside off-chip.
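To see device memory backing the logical “local” space, here is a minimal sketch: a per-thread array indexed with a run-time value usually cannot be kept in registers, so the compiler places it in local memory, and `cudaFuncGetAttributes` reports the reserved per-thread size in `localSizeBytes`. Whether the array actually lands in local memory is up to the compiler, so treat the reported size as illustrative. The names `local_array_lookup` and `run_local_demo` are hypothetical.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

// `buf` is indexed with a value known only at run time, which usually
// forces it into the logical local space (backed by device memory,
// cached in L1/L2).
__global__ void local_array_lookup(const int* sel, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int buf[64];
    for (int k = 0; k < 64; ++k) buf[k] = k * k;
    out[i] = buf[sel[i] & 63];
}

// Hypothetical helper. Returns 0 on success; *local_bytes receives the
// per-thread local memory the compiler reserved for the kernel.
int run_local_demo(const int* h_sel, int* h_out, int n, size_t* local_bytes) {
    assert(n > 0 && local_bytes != nullptr);
    cudaFuncAttributes fa{};
    cudaError_t err = cudaFuncGetAttributes(&fa, local_array_lookup);
    if (err == cudaSuccess) *local_bytes = fa.localSizeBytes;

    int *d_sel = nullptr, *d_out = nullptr;
    size_t bytes = n * sizeof(int);
    if (err == cudaSuccess) err = cudaMalloc(&d_sel, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_out, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_sel, h_sel, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        local_array_lookup<<<(n + 127) / 128, 128>>>(d_sel, d_out, n);
        err = cudaDeviceSynchronize();
    }
    if (err == cudaSuccess) err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_sel);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

Compiling with `nvcc -Xptxas -v` also prints each kernel’s stack frame and spill sizes, which is another way to check whether local memory is in use.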

The 3 entities on the right hand side of that diagram could be thought of as physical backings. The entities all the way on the left can be thought of as distinct logical spaces.

Inkj · March 26, 2023, 1:29am

Thank you for your prompt reply and the link.

Inkj · March 26, 2023, 1:40am

Thank you for clarifying. May I summarize my understanding of your reply?

Global memory is a logical space. It may correspond to physical memory backings on the device and on the host.
Device memory means the DRAM attached to the GPU. It can be the physical backing for “global memory”, as well as for other logical memory spaces such as “local memory”, “constant memory”, “texture memory”, and “surface memory”.
Technically, global memory and device memory are not equivalent.
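To tie that summary to code, here is a hedged sketch of one kernel that touches three logical spaces: global (`in`/`out`), constant (`c_scale`), and shared (`tile`). Per the discussion above, the first two are backed by device memory (constant data is additionally cached on-chip), whereas `__shared__` memory is on-chip and not backed by device memory. The names `spaces_demo`, `c_scale`, and `run_spaces_demo` are hypothetical.

```cuda
#include <cassert>
#include <cuda_runtime.h>

__constant__ float c_scale;   // logical constant space, backed by device memory

// in/out: logical global space. tile: logical shared space (on-chip).
__global__ void spaces_demo(const float* in, float* out, int n) {
    __shared__ float tile[256];   // assumes blockDim.x <= 256
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();
    if (i < n) out[i] = c_scale * tile[threadIdx.x];
}

// Hypothetical helper. Returns 0 on success, 1 on any CUDA error.
int run_spaces_demo(const float* h_in, float* h_out, int n, float scale) {
    assert(n > 0);
    float *d_in = nullptr, *d_out = nullptr;
    size_t bytes = n * sizeof(float);
    cudaError_t err = cudaMemcpyToSymbol(c_scale, &scale, sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc(&d_in, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_out, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        spaces_demo<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
        err = cudaDeviceSynchronize();
    }
    if (err == cudaSuccess) err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```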

I hope I’ve understood it correctly. Thank you again.

Robert_Crovella · March 26, 2023, 7:00pm

Yes, correct.

system · April 9, 2023, 7:00pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
