I have multiple processes sharing a single device. Each process transfers some data to the GPU and then launches a kernel. I understand that CUDA creates a separate context for each process and that kernels in different contexts are executed serially. Is the same true for memory transfers as well?
In other words, are memory transfers from multiple processes to a GPU executed serially or concurrently?
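For concreteness, here is a minimal sketch of the pattern each process follows: one host-to-device copy followed by one kernel launch. The kernel `scale`, the buffer size, and the `CHECK` macro are placeholders I made up for illustration, not my real code. The copy is timed with CUDA events so the per-process transfer time can be observed.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails (illustrative helper).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel standing in for the real per-process work.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 24;                 // assumed size: 16M floats (64 MiB)
    const size_t bytes = n * sizeof(float);

    float *h = (float *)malloc(bytes);
    if (h == NULL) {
        fprintf(stderr, "host allocation failed\n");
        return EXIT_FAILURE;
    }
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d = NULL;
    CHECK(cudaMalloc(&d, bytes));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Step 1: transfer the input to the GPU, timed with events.
    CHECK(cudaEventRecord(start));
    CHECK(cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));

    float copy_ms = 0.0f;
    CHECK(cudaEventElapsedTime(&copy_ms, start, stop));

    // Step 2: launch the kernel on the transferred data.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d, n, 2.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    printf("host-to-device copy: %.3f ms\n", copy_ms);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(d));
    free(h);
    return 0;
}
```

Running one instance alone and then several instances at the same time, and comparing the reported copy times, is how I would expect to observe whether the transfers overlap.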