1.
The physical memory is shared by the CPU and GPU, so data can be shared between the CPU and the GPU.
Please note that the CPU and GPU each use their own memory addresses, so a special type of allocation is required for this feature.
2.
There is new hardware on Xavier for IO coherency.
It allows the GPU to snoop the CPU cache, but the CPU cannot snoop the device’s cache.
3.
Our L3 cache is configured to be shared across CPU clusters.
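
As a rough illustration of the "special type of allocation" from point 1, here is a minimal CUDA sketch. It assumes the allocation meant is mapped pinned (zero-copy) host memory obtained with `cudaHostAlloc`; the reply above does not name a specific API, and managed memory via `cudaMallocManaged` would be another candidate. The kernel `increment` and the helper `zero_copy_increment` are hypothetical names used only for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to each element through the device-side pointer.
__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: returns the number of wrong elements, or -1 on a CUDA error.
int zero_copy_increment(int n) {
    int *host_ptr = nullptr;  // address the CPU uses
    int *dev_ptr = nullptr;   // address the GPU uses for the same physical memory

    // Assumption: mapped pinned memory is the "special allocation" in question.
    // Older CUDA setups may need cudaSetDeviceFlags(cudaDeviceMapHost) first.
    if (cudaHostAlloc((void **)&host_ptr, n * sizeof(int),
                      cudaHostAllocMapped) != cudaSuccess) {
        return -1;
    }
    if (cudaHostGetDevicePointer((void **)&dev_ptr, host_ptr, 0) != cudaSuccess) {
        cudaFreeHost(host_ptr);
        return -1;
    }

    // The CPU fills the buffer directly; there is no cudaMemcpy anywhere.
    for (int i = 0; i < n; ++i) host_ptr[i] = i;

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    increment<<<blocks, threads>>>(dev_ptr, n);

    // Wait for the kernel so the CPU reads the GPU's results.
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFreeHost(host_ptr);
        return -1;
    }

    int wrong = 0;
    for (int i = 0; i < n; ++i) {
        if (host_ptr[i] != i + 1) ++wrong;
    }
    cudaFreeHost(host_ptr);
    return wrong;
}

int main() {
    int wrong = zero_copy_increment(1 << 20);
    if (wrong < 0) {
        printf("CUDA error\n");
        return 1;
    }
    printf("mismatched elements: %d\n", wrong);
    return wrong == 0 ? 0 : 1;
}
```

The two pointers `host_ptr` and `dev_ptr` correspond to the separate CPU and GPU addresses mentioned in point 1. Whether accesses to such a buffer are served through the IO coherency hardware from point 2 depends on the platform and is not something this sketch can show.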
Thank you for the info! I had a few follow-up questions.
1. So there would be no memcopies, but there would be an address translation and potentially page faults?
If I understand this correctly, the CPU caches are not flushed upon kernel launch, and when the GPU accesses that cached data, a coherence mechanism gets it from the CPU’s cache.
However, when the kernel finishes, the GPU’s caches are flushed. Is this correct?
Tying in with the above: the GPU cannot access the L3, but if there is data in the L3, the new IO coherence mechanism will get it for the GPU from the L3?