GPUDirect2 Questions

I’m trying to clarify some details about GPUDirect2.

Let's say I have an array A[100]. I want to store A[0:10] on the CPU, A[11:50] on GPU1, and A[51:99] on GPU2. Since GPUDirect2 gives me a unified memory space, I should be able to access data on any of the three devices while executing a kernel on GPU1.

  1. I heard that GPUDirect2 uses the old SLI infrastructure. SLI used to store the same block of memory on both GPU1 and GPU2 (i.e., A[11:99]) but divide the work between them. So, with GPUDirect2, should we assume GPU1 stores A[11:99] in its global device memory (with an equal amount occupied on GPU2 as well), or has this been improved so that GPU1 only stores the subset A[11:50]?

  2. GPUDirect2 is supported on Tesla 20-series (Fermi) cards. Is it also supported on a dual Quadro 6000 (Fermi-class) SLI rig?

Thanks

GPUDirect2 doesn’t give you the unified memory space. It describes peer-to-peer reading and writing of unified memory, which is different. You’re thinking of UVA (Unified Virtual Addressing), which is also part of CUDA 4.0. GPUDirect2 uses UVA, but you can have UVA without GPUDirect2.
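
To make the distinction concrete, here’s a rough sketch (assuming a 64-bit CUDA 4.0 system with two Fermi GPUs; the device IDs and sizes are just made up for illustration). UVA by itself puts every allocation into one virtual address space, so a `cudaMemcpyDefault` copy works without you naming a direction; the GPUDirect2 peer-to-peer part is the extra step of letting one GPU touch the other’s memory directly, which you turn on with `cudaDeviceEnablePeerAccess`.

```
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    float *d0, *d1;
    cudaSetDevice(0);
    cudaMalloc(&d0, 1024 * sizeof(float));
    cudaSetDevice(1);
    cudaMalloc(&d1, 1024 * sizeof(float));

    // UVA alone: both pointers live in one virtual address space, so the
    // runtime can infer the copy direction from the pointer values.
    cudaMemcpy(d1, d0, 1024 * sizeof(float), cudaMemcpyDefault);

    // GPUDirect2 (peer-to-peer) on top of UVA: allow device 1 to read and
    // write device 0's memory directly, e.g. from a kernel.
    cudaSetDevice(1);
    cudaError_t err = cudaDeviceEnablePeerAccess(0, 0);
    printf("peer access: %s\n", cudaGetErrorString(err));

    cudaFree(d1);
    cudaSetDevice(0);
    cudaFree(d0);
    return 0;
}
```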

With UVA, you request allocations from CUDA, and it hands back the appropriate pointers. You can’t give CUDA pointers of your own choosing and ask it to make them valid in whatever topology you like. So it’s not practical to manually assemble a single contiguous array spanning multiple memory domains the way you describe. The closest you can get is sketched below.
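
For your A[100] example, it would look roughly like this sketch (again assuming two Fermi GPUs and CUDA 4.0; the kernel and variable names are invented). You end up with three separate allocations and three separate base pointers, each indexed relative to its own start; CUDA won’t place them so that they line up as one contiguous A.

```
#include <cuda_runtime.h>

__global__ void touch(float *gpu1_part, float *gpu2_part, float *host_part)
{
    // Runs on GPU1: with UVA plus peer access enabled, all three pointers
    // can be dereferenced here, but each must be indexed from its own base,
    // not as one big A[100].
    gpu1_part[0] = host_part[0];   // pinned host memory, read over PCIe
    gpu1_part[1] = gpu2_part[0];   // peer GPU memory (the GPUDirect2 path)
}

int main(void)
{
    float *a_host, *a_gpu1, *a_gpu2;

    cudaMallocHost(&a_host, 11 * sizeof(float));   // A[0:10], pinned host memory

    cudaSetDevice(1);                              // "GPU2" in the question
    cudaMalloc(&a_gpu2, 49 * sizeof(float));       // A[51:99]

    cudaSetDevice(0);                              // "GPU1" in the question
    cudaMalloc(&a_gpu1, 40 * sizeof(float));       // A[11:50]
    cudaDeviceEnablePeerAccess(1, 0);              // let kernels on GPU1 touch GPU2's memory

    touch<<<1, 1>>>(a_gpu1, a_gpu2, a_host);       // launched on GPU1
    cudaDeviceSynchronize();

    cudaSetDevice(1); cudaFree(a_gpu2);
    cudaSetDevice(0); cudaFree(a_gpu1);
    cudaFreeHost(a_host);
    return 0;
}
```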

GPUDirect2 has nothing to do with SLI, and SLI has nothing to do with GPUDirect2. SLI lives entirely on the graphics side and is about coordinating rendering; it’s not related to CUDA.

UVA means each valid address has a single location where its data is stored. There are no addresses where the memory is automatically duplicated and stored on two devices at once. (One caveat: caches may transparently hold duplicates of the same data.)
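
You can see the "one home per address" rule by asking the runtime where a pointer lives, e.g. with `cudaPointerGetAttributes`. A rough sketch (the attribute field names shown are the CUDA 4.x ones; device IDs are assumed):

```
#include <cuda_runtime.h>
#include <stdio.h>

static void where_is(const void *p)
{
    struct cudaPointerAttributes attr;
    if (cudaPointerGetAttributes(&attr, p) == cudaSuccess) {
        if (attr.memoryType == cudaMemoryTypeDevice)
            printf("%p lives on device %d\n", p, attr.device);
        else
            printf("%p lives in host memory\n", p);
    }
}

int main(void)
{
    float *d0, *d1, *h;
    cudaSetDevice(0); cudaMalloc(&d0, 256);
    cudaSetDevice(1); cudaMalloc(&d1, 256);
    cudaMallocHost(&h, 256);

    where_is(d0);  // reports device 0 -- and only device 0
    where_is(d1);  // reports device 1
    where_is(h);   // reports host memory

    cudaFreeHost(h);
    cudaFree(d1);
    cudaSetDevice(0); cudaFree(d0);
    return 0;
}
```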

Yes, GPUDirect2 is supported on your Tesla cards, and on your dual Quadro 6000s as well. All Fermi boards of the same architecture can share their addresses and do peer-to-peer. Here "architecture" means the specific chip: GF100, GF104, GF110.
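
If you want to confirm it on your particular rig, something like this sketch (device IDs 0 and 1 assumed) queries and enables peer access in both directions:

```
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);  // can device 0 access device 1?
    cudaDeviceCanAccessPeer(&can10, 1, 0);  // and the reverse direction?
    printf("0 -> 1: %s, 1 -> 0: %s\n",
           can01 ? "yes" : "no", can10 ? "yes" : "no");

    if (can01) {                            // access is granted per direction
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);
    }
    if (can10) {
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);
    }
    return 0;
}
```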