Usefulness of GPUDirect to transfer a rendered scene to the host

I am wondering whether GPUDirect would be useful for my project. I downloaded the package from NVIDIA and looked at the example, mpi_pinned.c. It appears that data can be transferred directly from the GPU to system memory via cudaMemcpy. From the example:

// allocate host memory

cudaMallocHost( (void **) &pHostMemory, NBYTES);

// allocate device memory

cudaMalloc( (void **) &pDeviceMemory, NBYTES);

// transfer data...from host to gpu

cudaMemcpy(pDeviceMemory, pHostMemory, NBYTES, cudaMemcpyHostToDevice);

My questions are as follows:

Is the transfer bidirectional? Can I transfer data from the GPU to the host just as fast?
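To make the first question concrete, here is a minimal sketch of how I would compare the two directions. It reuses the pinned allocation from the example and times one copy each way with CUDA events. The buffer size and the helper `timeCopy` are my own assumptions for illustration and are not part of the NVIDIA package, and I have not confirmed what the two figures look like on my hardware.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Assumed size for illustration only; the NVIDIA example defines its own NBYTES.
#define NBYTES (64u * 1024u * 1024u)

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Hypothetical helper: time a single cudaMemcpy and return milliseconds.
static float timeCopy(void *dst, const void *src, size_t n, cudaMemcpyKind kind)
{
    cudaEvent_t start, stop;
    float ms = 0.0f;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    CHECK(cudaEventRecord(start, 0));
    CHECK(cudaMemcpy(dst, src, n, kind));
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms;
}

int main(void)
{
    void *pHostMemory = NULL;
    void *pDeviceMemory = NULL;

    // Pinned (page-locked) host memory, as in the example.
    CHECK(cudaMallocHost(&pHostMemory, NBYTES));
    CHECK(cudaMalloc(&pDeviceMemory, NBYTES));
    memset(pHostMemory, 0, NBYTES);

    // Host to GPU, as in the example, then the reverse direction.
    float h2d = timeCopy(pDeviceMemory, pHostMemory, NBYTES, cudaMemcpyHostToDevice);
    float d2h = timeCopy(pHostMemory, pDeviceMemory, NBYTES, cudaMemcpyDeviceToHost);

    // bytes / (ms * 1e6) gives GB/s
    printf("host->device: %.3f ms (%.2f GB/s)\n", h2d, NBYTES / (h2d * 1.0e6));
    printf("device->host: %.3f ms (%.2f GB/s)\n", d2h, NBYTES / (d2h * 1.0e6));

    CHECK(cudaFree(pDeviceMemory));
    CHECK(cudaFreeHost(pHostMemory));
    return 0;
}
```

This should build with nvcc as a .cu file. A single timed copy is noisy, so averaging several repetitions would give a fairer comparison.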

How can I use this with OpenGL? I want to render a scene to a PBO or FBO, transfer it to host memory while avoiding the CPU, and push it around a network using, say, MPI.

CD
