Memory Architecture Differences in x86 and SoC GPUs

Hi, I have been developing CUDA programs on x86 platforms for the past year. I am aware that GPUs have different types of memory, such as local memory, global memory, shared memory, etc. However, I have encountered some difficulties while developing CUDA programs on SoCs like Jetson Orin.
I would like to understand the differences in memory architecture between discrete GPUs on x86 and the integrated GPUs on Arm SoCs. I know that the CPU and GPU on Orin share the same physical memory, but I'm unsure whether those special memory spaces still exist.
By the way, are there any guidelines or sample code developed specifically for Jetson that you could point me to? I am particularly interested in using techniques like zero-copy to reduce the runtime of memcpy operations and in acquiring new skills for developing CUDA programs on SoCs, but I don't know where to start.
Thanks for your help


The user-space CUDA API is the same.
For example, unified memory and page-locked (pinned) memory are also available on Jetson.
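As a minimal sketch of what "the API is the same" means in practice, the following uses `cudaMallocManaged` exactly as you would on x86; the kernel name and sizes here are just illustrative. On Jetson, the managed allocation lives in the single shared DRAM, so no copy over a PCIe bus is involved (error checking omitted for brevity):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *data = nullptr;

    // Managed memory is accessible from both CPU and GPU; on Jetson it
    // resides in the shared system DRAM.
    cudaMallocManaged(&data, n * sizeof(float));

    for (int i = 0; i < n; ++i) data[i] = 1.0f;

    scale<<<(n + 255) / 256, 256>>>(data, n);
    cudaDeviceSynchronize();  // required before the CPU touches the data again

    printf("data[0] = %f\n", data[0]);
    cudaFree(data);
    return 0;
}
```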
However, the physical memory is shared between the CPU and GPU, which means CPU allocations also reduce the memory available to the GPU.
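For the zero-copy technique you mentioned, a common pattern is mapped pinned memory via `cudaHostAlloc` with `cudaHostAllocMapped`: the GPU accesses the host allocation directly, so no explicit `cudaMemcpy` is needed. A rough sketch (kernel and sizes are illustrative, error checking omitted):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void increment(int *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] += 1;
}

int main() {
    const int n = 1024;
    int *hostPtr = nullptr, *devPtr = nullptr;

    // Allow mapping of page-locked host memory into the GPU address space.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Page-locked host allocation, mapped for direct GPU access; on Jetson
    // both pointers ultimately refer to the same shared DRAM.
    cudaHostAlloc(&hostPtr, n * sizeof(int), cudaHostAllocMapped);
    cudaHostGetDevicePointer(&devPtr, hostPtr, 0);

    for (int i = 0; i < n; ++i) hostPtr[i] = i;

    // The kernel reads and writes the host buffer directly -- no memcpy.
    increment<<<(n + 255) / 256, 256>>>(devPtr, n);
    cudaDeviceSynchronize();

    printf("hostPtr[0] = %d\n", hostPtr[0]);
    cudaFreeHost(hostPtr);
    return 0;
}
```

Whether zero-copy is actually faster than pinned-memory copies or managed memory depends on the access pattern, so it is worth benchmarking both on your workload.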

Here is the documentation explaining the memory types on Jetson:


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.