Hello,
In the Jetson AGX Orin Technical Brief, Figure 2 shows the 4 MB system-level cache as accessible from both the CPU and the GPU:
However, Figure 4 shows the GPU with no access to the system-level cache:
Figure 8 shows that the system-level cache is part of the CPU complex:
Then, in Figure 9, the GPU has direct access to both the system-level cache and the memory controller interface:
I found an older question (orin-system-cache) where it turned out that the system-level cache is actually an L4 cache for the CPU. But what about the GPU side? Is it an L3 cache for the GPU?
Another question: if we allocate memory with cudaMalloc(), does a GPU access to that memory from a CUDA application go GPU L2 → LPDDR5, or GPU L2 → system-level cache → LPDDR5?
If it does not go through the system-level cache, is there any way to make it do so?
I would appreciate it if someone familiar with this could clarify these points.
Thanks in advance. Best regards.