Can someone please explain to me how GPUDirect works at a high level? From what I have gathered, it allows certain devices to bypass main memory and have data piped directly to the GPU. Is this the case? Any information would be awesome, as I am very new to CUDA, but this looks promising for my application, which needs to process large amounts of data in near real time. The data will be coming in over a GigE connection, so bypassing main memory would be awesome.
I was also looking for the same feature (see my earlier post on the NVIDIA forums).
But currently there does not seem to be a way to bypass the memory transactions. GPUDirect 2.0 provides a direct path between GPUs, but it is not yet open for other devices to access directly.
GPUDirect 1.0 (which removes the extra copy in host memory between the network device's buffer and the GPU's pinned buffer) might also work for your application, as it takes the burden of copying off the CPU. It uses pinned memory shared by both the GPU and the device, and there are InfiniBand cards (QLogic/Mellanox) that use this functionality.
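Just to illustrate the pinned-memory idea, here is a minimal sketch (not from the original post; the buffer size and flags are placeholders): the host buffer is allocated page-locked with cudaHostAlloc(), a third-party device driver could DMA incoming data into it, and the GPU copies from that same buffer without an extra CPU-side staging copy.

```c
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    const size_t size = 1 << 20;   // 1 MB placeholder buffer
    void *hostBuf = NULL;

    // Pinned (page-locked) host allocation; cudaHostAllocPortable pins it
    // for all CUDA contexts, which suits a buffer shared with another device.
    cudaError_t err = cudaHostAlloc(&hostBuf, size, cudaHostAllocPortable);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaHostAlloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // A NIC/IB driver would DMA incoming data into hostBuf here (device-specific,
    // outside CUDA). The GPU then reads from the same pinned buffer directly.
    void *devBuf = NULL;
    cudaMalloc(&devBuf, size);
    cudaMemcpyAsync(devBuf, hostBuf, size, cudaMemcpyHostToDevice, 0);
    cudaDeviceSynchronize();

    cudaFree(devBuf);
    cudaFreeHost(hostBuf);
    return 0;
}
```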
With the CUDA 4.0 release, there are new functions that appear to do this. (Disclaimer: I have never used them in a practical situation.)
Check the CUDA_C_Programming_Guide.pdf that comes with the SDK.
Look for cudaHostRegister(), which page-locks a range of memory allocated by malloc().
So if your device can dump data into the same malloc()'d memory, the GPU will be able to read it and use it as page-locked memory. You could use any external card to do that (not just the IB cards I mentioned), even one you design yourself.
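Here is a minimal, untested sketch of what registering an existing host buffer might look like; the size and flags are my own assumptions. Note that older CUDA releases required the registered pointer and size to be page-aligned, so this sketch uses posix_memalign() rather than plain malloc().

```c
#include <cuda_runtime.h>
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    const size_t size = 1 << 20;          // 1 MB placeholder, multiple of 4 KB
    void *buf = NULL;

    // Page-aligned host allocation (older CUDA versions required the
    // registered range to be aligned to the host page size).
    if (posix_memalign(&buf, 4096, size) != 0)
        return 1;

    // Page-lock the existing allocation; cudaHostRegisterPortable pins it
    // for all CUDA contexts. An external card could then write into buf
    // while the GPU reads it as page-locked memory.
    cudaError_t err = cudaHostRegister(buf, size, cudaHostRegisterPortable);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaHostRegister failed: %s\n", cudaGetErrorString(err));
        free(buf);
        return 1;
    }

    void *devBuf = NULL;
    cudaMalloc(&devBuf, size);
    cudaMemcpyAsync(devBuf, buf, size, cudaMemcpyHostToDevice, 0);
    cudaDeviceSynchronize();

    cudaFree(devBuf);
    cudaHostUnregister(buf);              // undo the pinning before freeing
    free(buf);
    return 0;
}
```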
Thanks for the clarification.
I have been looking for more information on this. Can I use this with any external device? If we design an external PCIe card, would I be able to use this to share the same pinned memory between the GPU and the external card? Or is this flag intended only for the Mellanox IB cards?
Another question: can I get a sneak peek at whether and when GPUDirect 2.0 (peer-to-peer) might be available between a GPU and a third-party PCIe card? That would let me get rid of the host-memory access for data loading.