CUDA unified memory and concurrent read-only accesses

Consider the following scenario:

  1. There are two host threads, T1 and T2, and two CUDA streams, S1 and S2, each owned and used exclusively by the corresponding host thread.
  2. T2 is initially blocked on a semaphore and idle.
  3. T1 allocates managed memory and attaches it to S1 in single-stream mode (i.e. cudaStreamAttachMemAsync with cudaMemAttachSingle).
  4. T1 launches some kernels on S1 that populate the memory. T1 then synchronizes on S1 to ensure the memory is fully populated.
  5. T1 signals the semaphore, unblocking T2.
  6. From now on, T1 and T2 run concurrently. Both perform read-only accesses to the managed memory: T1 from host code, while T2 launches a GPU kernel on S2.

As I understand it, there is no cache-coherency or consistency hazard, because all write accesses are explicitly synchronized before the concurrent phase begins. However, there is concurrent access to the same managed memory from the host (T1) and the GPU (via T2's kernel), which, AFAIK, the unified memory model does not allow while a kernel is running, even though every access is read-only.

This code runs on a Jetson AGX Xavier, so the question concerns that specific CUDA implementation. But since it is still a generic CUDA question, I decided to post it in the generic CUDA forum. Please move the thread if it is better suited for the Xavier forum.

Q1: Is this access pattern valid under the unified memory model?
Q2: If not, how can such code be run efficiently, without violating CUDA's memory constraints but still with zero-copy access?