For example, two GPUs (Kepler, or compute capability 3.0+) launch kernels concurrently, and:
GPU-1 writes to a[0], a[155], a[1000] (will never collide with indices 1, 200, 3, 5)
GPU-2 writes to a[1], a[200], a[3], a[5] (will never collide with indices 0, 155, 1000)
After the streams are synchronized on the host (no memcpy, just Unified Memory), can the data be trusted specifically on the CPU side, where reads will touch indices between 0 and 1000?
I don’t care whether one GPU sees another GPU’s writes. I’m asking only about what the CPU will see.
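To make the scenario concrete, here is a minimal sketch of what I mean. The kernel name `writeAt`, the `CHECK` macro, the index arrays, and the written values are all made up for illustration, and I assume `a` holds 8-byte `double` elements. I also use `cudaDeviceSynchronize()` on each device instead of per-stream synchronization, just to keep the sketch short. It only shows the access pattern; it is not meant as proof that the pattern is safe.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel: thread i writes `value` to a[idx[i]].
__global__ void writeAt(double *a, const int *idx, int n, double value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[idx[i]] = value;
}

int main()
{
    int deviceCount = 0;
    CHECK(cudaGetDeviceCount(&deviceCount));
    if (deviceCount < 2) {
        printf("This sketch needs at least two GPUs.\n");
        return 0;
    }

    const int N = 1001;            // indices 0..1000
    double *a = nullptr;
    int *idx1 = nullptr, *idx2 = nullptr;
    CHECK(cudaMallocManaged(&a, N * sizeof(double)));
    CHECK(cudaMallocManaged(&idx1, 3 * sizeof(int)));
    CHECK(cudaMallocManaged(&idx2, 4 * sizeof(int)));

    // Host initialization happens before any kernel is launched.
    for (int i = 0; i < N; ++i) a[i] = 0.0;
    idx1[0] = 0; idx1[1] = 155; idx1[2] = 1000;           // GPU-1 targets
    idx2[0] = 1; idx2[1] = 200; idx2[2] = 3; idx2[3] = 5; // GPU-2 targets

    // Launch on both GPUs; the two sets of indices never overlap.
    CHECK(cudaSetDevice(0));
    writeAt<<<1, 32>>>(a, idx1, 3, 1.0);
    CHECK(cudaGetLastError());
    CHECK(cudaSetDevice(1));
    writeAt<<<1, 32>>>(a, idx2, 4, 2.0);
    CHECK(cudaGetLastError());

    // Wait for both GPUs before the CPU touches the managed array.
    CHECK(cudaSetDevice(0));
    CHECK(cudaDeviceSynchronize());
    CHECK(cudaSetDevice(1));
    CHECK(cudaDeviceSynchronize());

    // CPU-side reads: this is the part the question is about.
    printf("a[0]=%g a[155]=%g a[1000]=%g\n", a[0], a[155], a[1000]);
    printf("a[1]=%g a[200]=%g a[3]=%g a[5]=%g\n", a[1], a[200], a[3], a[5]);

    CHECK(cudaFree(a));
    CHECK(cudaFree(idx1));
    CHECK(cudaFree(idx2));
    return 0;
}
```

The host only touches the managed allocations before the launches and after both devices have been synchronized, since as far as I know the CPU must not access managed memory while a kernel is running on Kepler-class GPUs.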
If there is no problem, what kind of performance degradation can be expected? For example, fully randomized writes (but again, no collisions on 8-byte-wide regions) to a 50 MB array, using 3 GPUs concurrently.