It is my understanding that when using unified memory, data is first created on the host and then copied to the device when it needs to be accessed there. Unified memory provides an implicit mechanism for copying data from host to device. It is also my understanding that, absent any special treatment, such copies are executed on the default stream. This leads me to the following question: in a multi-process/multi-threaded environment, if unified memory is not associated with a specific user-defined stream, is that a potential risk for the whole system?
As a general term, unified memory means that different kinds of memory addresses or pointers live in a single address space.
[When it covers only shared, local, and global memory, those pointers are called generic pointers.]
More specifically, unified memory refers to pointers that are unified between host and device.
There are different kinds: zero-copy memory lives only on the host, and the GPU reaches it over the PCIe bus on every access; managed memory is copied/migrated page-wise as it is accessed on the GPU or CPU. I believe the latter is the kind of memory you are asking about.
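To make the distinction concrete, here is a minimal sketch of how the two allocations look in code (names and sizes are just for illustration): zero-copy memory via pinned, mapped host allocation, and managed memory via `cudaMallocManaged`:

```cuda
#include <cuda_runtime.h>

int main() {
    // Zero-copy (mapped, pinned host memory): the data stays in host DRAM,
    // and every GPU access travels over the PCIe bus.
    float *h_zero = nullptr, *d_zero = nullptr;
    cudaHostAlloc(&h_zero, 1024 * sizeof(float), cudaHostAllocMapped);
    cudaHostGetDevicePointer(&d_zero, h_zero, 0);  // device-side alias

    // Managed memory: a single pointer valid on both host and device;
    // pages migrate on demand to whichever processor touches them.
    float *managed = nullptr;
    cudaMallocManaged(&managed, 1024 * sizeof(float));

    cudaFreeHost(h_zero);
    cudaFree(managed);
    return 0;
}
```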
What I believe you are looking for is cudaStreamAttachMemAsync, which associates managed memory with a specific stream.
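A minimal sketch of the attachment pattern (kernel and variable names are made up for illustration): attaching a managed allocation to one stream with `cudaMemAttachSingle` tells the runtime to synchronize that allocation only with work in that stream, so other streams and host threads are not affected:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 1 << 20;
    int *data = nullptr;

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Managed memory is initially attached globally (visible to all streams).
    cudaMallocManaged(&data, n * sizeof(int));

    // Attach the allocation to this stream only. The CPU may then safely
    // touch the memory while OTHER streams are running, since the runtime
    // only has to synchronize this allocation with work in `stream`.
    // (length = 0 means the entire allocation.)
    cudaStreamAttachMemAsync(stream, data, 0, cudaMemAttachSingle);
    cudaStreamSynchronize(stream);  // attachment takes effect at the next sync

    for (int i = 0; i < n; ++i) data[i] = i;  // host-side initialization

    increment<<<(n + 255) / 256, 256, 0, stream>>>(data, n);
    cudaStreamSynchronize(stream);  // wait before touching data on the CPU again

    printf("data[0] = %d\n", data[0]);

    cudaFree(data);
    cudaStreamDestroy(stream);
    return 0;
}
```

On pre-Pascal GPUs in particular, this per-stream attachment is what allows concurrent CPU access to other managed allocations while a kernel is running.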