I’m trying to clarify some details about GPUDirect 2.0.
Let’s say I have an array A. I want to store A[0:10] on the CPU, A[11:50] on GPU1, and A[51:99] on GPU2. Since GPUDirect 2.0 gives me a unified virtual address space, my understanding is that a kernel executing on GPU1 can access data resident on any of the three devices.
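To make the layout concrete, here is a minimal sketch of what I have in mind, assuming CUDA 4.0’s UVA plus peer access on a two-GPU Fermi system; the partition sizes, variable names, and kernel are my own invention, not taken from any sample:

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: with UVA, a kernel on GPU1 should be able to
// dereference all three pointers directly, provided the host buffer is
// pinned+mapped and peer access to GPU2 has been enabled.
__global__ void readAll(const float *cpuPart, const float *gpu1Part,
                        const float *gpu2Part, float *out) {
    out[0] = cpuPart[0] + gpu1Part[0] + gpu2Part[0];
}

int main() {
    float *hostPart, *gpu1Part, *gpu2Part, *out;

    // A[0:10] in pinned, mapped host memory (zero-copy)
    cudaHostAlloc(&hostPart, 11 * sizeof(float), cudaHostAllocMapped);

    cudaSetDevice(0);                            // GPU1
    cudaMalloc(&gpu1Part, 40 * sizeof(float));   // A[11:50]
    cudaMalloc(&out, sizeof(float));
    cudaDeviceEnablePeerAccess(1, 0);            // let GPU1 read GPU2 memory

    cudaSetDevice(1);                            // GPU2
    cudaMalloc(&gpu2Part, 49 * sizeof(float));   // A[51:99]

    cudaSetDevice(0);                            // launch on GPU1
    readAll<<<1, 1>>>(hostPart, gpu1Part, gpu2Part, out);
    cudaDeviceSynchronize();
    return 0;
}
```

Is this roughly the intended usage pattern, or am I misunderstanding what the unified address space covers?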
I heard that GPUDirect 2.0 uses the old SLI infrastructure. SLI replicated the same block of memory on both GPUs (i.e., A[11:99] on each) and divided only the work across them. So, with GPUDirect 2.0, should I assume GPU1 stores all of A[11:99] in global device memory (with an identical copy occupying the same space on GPU2), or has this been improved so that GPU1 stores only its own subset, A[11:50]?
GPUDirect 2.0 is supported on Tesla 20-series (Fermi) cards. Is it also supported on a dual Quadro 6000 (also Fermi-class) SLI rig?
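In case it helps narrow down the answer: I believe support can be probed at runtime with cudaDeviceCanAccessPeer. The sketch below assumes CUDA 4.0+ and two visible devices; device ordinals 0 and 1 are just my assumption for the two Quadros:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    // Ask the driver whether each GPU can directly access
    // the other's memory (peer-to-peer).
    int canAccess01 = 0, canAccess10 = 0;
    cudaDeviceCanAccessPeer(&canAccess01, 0, 1);
    cudaDeviceCanAccessPeer(&canAccess10, 1, 0);
    printf("GPU0 -> GPU1 peer access: %d\n", canAccess01);
    printf("GPU1 -> GPU0 peer access: %d\n", canAccess10);
    return 0;
}
```

Would a result of 1 in both directions on the Quadro rig be enough to confirm full GPUDirect 2.0 support, or are there additional requirements?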