Jetson system architecture (CPU/GPU/Memory)

Hi,
I am using a Jetson Nano, and I am not sure how the CPU/GPU/memory is structured in the system.
As far as I know, the CPU and GPU share the main memory, and I drew a simple diagram to express my understanding. Could you please check whether my understanding of the system architecture is correct?

This is almost complete: you need to place a memory controller between the DRAM and the other components. The GPU is wired directly to the memory controller, making it an integrated GPU (iGPU) rather than a discrete GPU (dGPU).

I'm not positive whether there is a cache between the GPU and the memory controller. I think there is a cache between the CPU and the memory controller (though perhaps it sits between the memory controller and the DRAM), but that's something NVIDIA might be able to clarify.

Thank you for the reply!
Based on your comments, I redrew the diagram as follows. Is it more accurate now?

Can I ask a couple more questions?

  • What is the role of the memory controller? Does it manage the memory addresses used by both the CPU and the GPU?
  • What does "(though perhaps it sits between the memory controller and the DRAM)" mean? It seems to contradict the previous sentence, so I am confused. Isn't between the CPU and the memory controller the more appropriate place for a cache?

That looks correct, but I don’t know enough to say if it requires other changes to the diagram, e.g., I don’t know if the GPU has a cache or not.

Every computer out there, short of a simple microcontroller, has the ability to use both "physical" memory (at a physical address) and "virtual" memory (either physical memory translated to some other address, or else swap or some other "address-based" data operation). When the computer first boots, all of the hardware sits at some exact physical address; some of the operations during boot are hard-wired to use those addresses, while others are named via firmware (such as the device tree).

Once you introduce user-space programs, you typically see the memory in a program starting at "0x0" and moving upwards. The real memory sits at some odd physical offset, and is perhaps fragmented into chunks. The memory controller, during boot, can be told to simply use a named address without translation; for user space, however, it creates a map so that everything looks contiguous from "0x0" up to whatever range of addresses the program uses (while in reality the RAM or storage backing those addresses sits at some different physical address).
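To make that concrete, here is a minimal sketch (plain Linux code, nothing Jetson-specific, and the variable names are just illustrative) that asks the kernel which physical page frame is backing one of its own virtual addresses. It reads the standard /proc/self/pagemap interface (one 64-bit entry per virtual page; bit 63 says whether the page is present, bits 0-54 hold the frame number); you generally need to run it as root to see a non-zero frame number.

```
/* Minimal sketch: ask the kernel for the physical page frame behind one of
 * our own virtual addresses, via the standard /proc/self/pagemap interface.
 * One 64-bit entry per virtual page: bit 63 = page present, bits 0-54 = PFN. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>

int main(void)
{
    long page_size = sysconf(_SC_PAGESIZE);

    int *data = (int *)malloc(sizeof(int));
    *data = 42;                              /* touch the page so it is really mapped */
    uintptr_t vaddr = (uintptr_t)data;

    FILE *pm = fopen("/proc/self/pagemap", "rb");
    if (pm == NULL) { perror("pagemap"); return 1; }

    uint64_t entry = 0;
    fseek(pm, (long)((vaddr / page_size) * sizeof(entry)), SEEK_SET);
    if (fread(&entry, sizeof(entry), 1, pm) != 1) { perror("read"); return 1; }
    fclose(pm);

    uint64_t pfn = entry & ((1ULL << 55) - 1); /* page frame number, if present */
    printf("virtual address : 0x%lx\n", (unsigned long)vaddr);
    printf("page present    : %s\n", ((entry >> 63) & 1) ? "yes" : "no");
    printf("physical address: 0x%llx (reads as 0 unless run as root)\n",
           (unsigned long long)(pfn * (uint64_t)page_size + vaddr % page_size));

    free(data);
    return 0;
}
```

The printed virtual address will be some ordinary process address, while the physical address behind it is whatever frame the kernel happened to hand out; run it twice and the mapping can differ even though the program is identical.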

Also, when there are two or more processes, each process must be protected against another process, whether through bugs or intentional access, touching memory it does not own; it is the job of the memory controller to allow or deny access to a given range. This is possible because other hardware and software does not have direct access to the RAM. The memory controller is thus also a mediator for memory security.
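As a small illustration of that protection (strictly speaking the CPU-side check is done by the MMU, with the memory controller/SMMU doing the equivalent job for the other bus masters, but the effect is what matters here), the sketch below maps a page, removes all access rights, and shows that the next read is refused and delivered back to the process as SIGSEGV. Generic Linux calls only; nothing here is Jetson-specific.

```
/* Sketch: the hardware refuses an access the kernel has not allowed.
 * We map one page, drop all permissions on it, and the next read arrives
 * back at the process as SIGSEGV instead of returning data. */
#include <stdio.h>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

static void on_segv(int sig)
{
    (void)sig;
    /* Only async-signal-safe calls in a handler. */
    write(STDOUT_FILENO, "access denied (SIGSEGV)\n", 24);
    _exit(0);
}

int main(void)
{
    signal(SIGSEGV, on_segv);

    long page = sysconf(_SC_PAGESIZE);
    char *p = (char *)mmap(NULL, page, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    p[0] = 'x';                              /* allowed: page is readable/writable */
    printf("first read : %c\n", p[0]);

    mprotect(p, page, PROT_NONE);            /* drop all access rights on the page */
    printf("second read: %c\n", p[0]);       /* refused: the handler above fires   */
    return 1;                                /* not reached */
}
```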

RAM is just any volatile memory. DRAM is a specific type of RAM, Dynamic RAM. DRAM is cheap and fast, but it requires a periodic "refresh" to keep its content (thus the "dynamic" part; "static" RAM simply holds its content until power is lost, and tends to be faster, but is also much more expensive). Cache RAM tends to be static RAM, whereas main system memory tends to be dynamic RAM.

There is more than one way cache can be implemented, and I don't know the details on a Jetson. Typically, cache is controlled differently than system memory. System memory is read at specific addresses, whereas cache is filled in "lines" or blocks. When something uses memory that is not in cache, the first use fills the entire line, even though only a subset of that line is needed. There may be many lines, and prefetch logic trying to "predict" usage will retrieve more memory than what was requested; that extra data fills in further lines. From those lines the specific memory is presented to the CPU. If the lines of cache already contain the next request, then no system RAM access is needed, and the answer comes from the relevant subset of the cache line.

Writing is similar for cache: only the relevant subset of the cache line is modified when the CPU writes back. Depending on cache policy, the change might start writing to system RAM in the background, or it might not reach system memory until something forces the write (roughly, write-through versus write-back). Reading from cache without touching system memory is a cache hit; reading from system memory because the cache does not hold what is needed is a cache miss. Any time a cache line must change due to constraints and main memory must be read, the line is said to be invalidated, and the operation makes it valid again. Cache is very fast, and if several operations can read from one line, it saves a lot of time. Even so, something as simple as a different process needing memory not in cache will cause a miss, and the invalidated line takes extra time (beyond just accessing system memory) to become valid again. There are different hardware and policy configurations for different caches, and I don't know what the details are on a Jetson.
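A rough way to see the "lines, not bytes" behaviour from user space is to time the same buffer being touched with different strides. This sketch assumes a 64-byte line (which happens to match the Cortex-A57 cores in the Nano); absolute numbers will vary with board, clocks and load, but the strided loop performs 1/64th of the accesses and takes nowhere near 1/64th of the time, because both loops pull in the same number of cache lines.

```
/* Sketch: cache is filled in lines.  Touch a 64 MiB buffer once per byte
 * (stride 1) and once per assumed 64-byte line (stride 64).  The strided
 * loop performs 1/64th of the accesses, yet takes nowhere near 1/64th of
 * the time, because both loops fill the same number of cache lines. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64 * 1024 * 1024)                 /* 64 MiB working set */

static double touch(volatile char *buf, size_t stride)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < (size_t)N; i += stride)
        buf[i]++;                            /* one read-modify-write per step */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    volatile char *buf = (char *)calloc(N, 1);
    if (buf == NULL) return 1;

    printf("stride  1: %.3f s for %d accesses\n", touch(buf, 1),  N);
    printf("stride 64: %.3f s for %d accesses\n", touch(buf, 64), N / 64);

    free((char *)buf);
    return 0;
}
```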

In summary:

  • The memory controller mediates access to system RAM. Perhaps it has knowledge of and assists with cache RAM, though I think it is separate (someone might comment on that).
  • Physical addresses are the address selection mechanism directly on the bus of any addressable device. A memory controller can function in a direct pass-through mode to do this. Or it can use a 1:1 map which emulates direct physical address pass-through.
  • Virtual addresses are a map between the contiguous set of addresses presented to a process and the physical addresses on the bus.
  • Boot always starts with physical addresses.
  • Security of memory is enforced by the memory controller.
  • Cache is fast static RAM (SRAM), but it takes time to fill it; thereafter, if that content satisfies subsequent requests, access is fast enough to make up for the time spent filling the cache line.
  • Quite often, firmware (such as device tree) contains a physical address specification for different hardware, and the memory controller typically sends that address to drivers, but user space access is indirect and maps some other address to that physical address (sanity checking and security are then possible).

Hi,
Thanks to linuxdev for providing the information.

For block diagrams of the Jetson Nano (TX1), please check the Technical Reference Manual of the TX1:
Jetson Download Center | NVIDIA Developer

On the software side, we have the NvBuffer APIs, and you can access the same memory from the CPU and the GPU without a memory copy. Please check the jetson_multimedia_api samples and documentation.
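As a rough illustration of the shared-memory idea (this is plain CUDA managed memory, not the NvBuffer API itself; cudaMallocManaged is a standard CUDA call that the Nano supports, and the kernel/variable names are just illustrative), a single allocation can be written by the CPU, processed by the GPU, and read back by the CPU with no cudaMemcpy:

```
// Sketch: one allocation visible to both the CPU and the integrated GPU.
// cudaMallocManaged() returns a single pointer; the CPU fills it, a kernel
// scales it in place, and the CPU reads the result with no cudaMemcpy.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;                   // GPU writes straight into shared DRAM
}

int main()
{
    const int n = 1 << 20;
    float *data = nullptr;
    cudaMallocManaged((void **)&data, n * sizeof(float));

    for (int i = 0; i < n; ++i)              // CPU fills the buffer in place
        data[i] = 1.0f;

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);
    cudaDeviceSynchronize();                 // let the kernel finish before the CPU reads

    printf("data[0] = %.1f (expected 2.0)\n", data[0]);
    cudaFree(data);
    return 0;
}
```

Build it with nvcc. On Jetson, the CPU should only touch a managed buffer while no kernel is using it, which is why the final read comes after cudaDeviceSynchronize().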

One thing I forgot to mention: the GPU can use only physical RAM; it cannot use virtual memory. It does share system memory, and most drivers refer to physical addresses. Swap memory won't help when you need more memory for the GPU. The memory controller provides the GPU with access to ranges of memory via physical addresses, but won't provide a virtual address for that device.
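A sketch of that point with standard CUDA calls (cudaHostAlloc and cudaHostGetDevicePointer; the kernel and variable names are just illustrative): pinned, mapped host memory is page-locked so the kernel cannot swap it out, and the GPU is handed a pointer to those same physical pages (zero-copy). Ordinary pageable memory cannot be given to the GPU this way.

```
// Sketch: the GPU works on real, resident pages.  cudaHostAlloc() gives
// page-locked (pinned) host memory that the kernel may not swap out, and
// cudaHostGetDevicePointer() hands the GPU its view of those same pages.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void fill(int *buf, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        buf[i] = i;                          // GPU writes through its mapping
}

int main()
{
    cudaSetDeviceFlags(cudaDeviceMapHost);   // allow mapping pinned memory into the GPU

    const int n = 1024;
    int *host_ptr = nullptr;
    int *dev_ptr  = nullptr;

    cudaHostAlloc((void **)&host_ptr, n * sizeof(int), cudaHostAllocMapped);
    cudaHostGetDevicePointer((void **)&dev_ptr, host_ptr, 0);

    fill<<<(n + 255) / 256, 256>>>(dev_ptr, n);
    cudaDeviceSynchronize();

    printf("host_ptr[100] = %d (written by the GPU)\n", host_ptr[100]);
    cudaFreeHost(host_ptr);
    return 0;
}
```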

Thank you for your detailed response!
It helped me a lot in understanding the architecture of the Jetson Nano and the general concepts.
