Yes.
I’ve tried using cudaStreamSynchronize() or cudaDeviceSynchronize() after the kernel call and it works fine.
But synchronizing after the kernel call is not what I want, because we want to read the next parameter into the same memory while the kernel is running on the GPU. As you can see from the code, it is designed so that the host and the kernel never access the same memory address.
Also, if what you quoted says that Jetson managed memory cannot be accessed concurrently, why is it possible to enable concurrent access using the cudaMemAttachHost flag? Shouldn’t this flag be used?