Hi,
We are exploring the possibility of using the GPUDirect RDMA feature to transfer data from GPU to PCIe devices directly. Is it possible? If so, can anyone provide some pointers to the same?
Regards,
Kumar.
Hi kumar81,
Regarding GPUDirect, it’s not supported in the current BSP.
Thanks
Hello Kayccc,
Thanks a lot for your reply.
Actually, we want to transfer data generated by the GPU to a PCIe device. Since the Tegra K1 has unified physical memory, all we need is DMA-able (physically contiguous) memory that is allocated either by the ARM CPU or by the GPU and is accessible to the GPU through CUDA. How can this be done? Is there a mechanism for it in CUDA (like cudaMallocHost or extensions)? Or is it possible using the NVMAP driver?
I am sure a similar kind of mechanism must be used by GRALLOC in Android, but there are no references for it anywhere :(
Looking forward to your answers.
Thanks in advance.
Regards,
Kumar.
You probably need to use a kernel function like vmalloc_to_page() or walk_page_range() to trace the userspace virtual pointer to the physical bus address (similarly, you might find virt_to_page() or get_user_pages() useful). Then once you obtain this physical address of the cudaMallocHost() buffer, you can send it over to your PCIe device. See [url]http://stackoverflow.com/a/28987409[/url] for an example from userspace.
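For reference, here is a minimal sketch of the userspace route, assuming a Linux kernel that exposes /proc/self/pagemap. The helper names virt_to_phys() and print_pinned_pages() are made up for this illustration and are not part of CUDA or the BSP. Note that newer kernels report the page frame number as 0 unless the process has CAP_SYS_ADMIN, and that the sketch assumes the PCIe bus address equals the CPU physical address, which you would need to confirm for your board.

```cuda
#include <cuda_runtime.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Hypothetical helper: translate a userspace virtual address to a physical
// address by reading this process's /proc/self/pagemap entry.
// Returns 0 if the page is not present or the kernel hides the frame number.
static uint64_t virt_to_phys(const void *vaddr)
{
    const uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
    const uint64_t vpn = (uint64_t)(uintptr_t)vaddr / page_size;
    uint64_t entry = 0;

    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return 0;
    // Each virtual page has one 64-bit entry, indexed by virtual page number.
    ssize_t got = pread(fd, &entry, sizeof(entry), (off_t)(vpn * sizeof(entry)));
    close(fd);
    if (got != (ssize_t)sizeof(entry))
        return 0;

    if (!(entry & (1ULL << 63)))                 // bit 63: page present in RAM
        return 0;
    uint64_t pfn = entry & ((1ULL << 55) - 1);   // bits 0-54: page frame number
    if (pfn == 0)                                // hidden without CAP_SYS_ADMIN
        return 0;
    return pfn * page_size + (uint64_t)(uintptr_t)vaddr % page_size;
}

// Hypothetical helper: allocate a pinned buffer with cudaMallocHost() and
// print the physical address of every page in it.
// Returns the number of pages that could be resolved, or -1 on a CUDA error.
static int print_pinned_pages(size_t bytes)
{
    void *buf = NULL;
    if (cudaMallocHost(&buf, bytes) != cudaSuccess)
        return -1;
    memset(buf, 0, bytes);   // touch the pages so they are populated

    const size_t page = (size_t)sysconf(_SC_PAGESIZE);
    int resolved = 0;
    for (size_t off = 0; off < bytes; off += page) {
        char *va = (char *)buf + off;
        uint64_t phys = virt_to_phys(va);
        if (phys != 0)
            ++resolved;
        printf("virt %p -> phys 0x%llx\n", (void *)va, (unsigned long long)phys);
    }
    cudaFreeHost(buf);
    return resolved;
}
```

Calling print_pinned_pages(4 * 4096) as root would list one address per page. Nothing here guarantees that consecutive pages are physically contiguous, so the PCIe device may need one address per page (a scatter list) rather than a single base address.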
Hi,
Thanks a lot for the info; we could successfully work with cudaMallocHost memory.
When we allocate memory using cudaMalloc, we get a device pointer. Since this is mapped in a different virtual address space from the CPU’s, it is not accessible from the CPU (accessing it returns a bus error).
Now, how can we get the physical address of cudaMalloc-allocated memory? And what performs this virtual-to-physical mapping? The SMMU? Is that how the current implementation works?
Regards,
Kumar.
Hi kumar81,
All memory accessible to the TK1 GPU needs to be page-locked.
Please check the CUDA documentation for details on how to use it:
[url]http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#page-locked-host-memory[/url]
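As a rough illustration of the mapped page-locked memory described on that page, the sketch below allocates a pinned buffer with cudaHostAlloc(), obtains the GPU-side pointer with cudaHostGetDevicePointer(), lets a kernel write into it, and reads the result back on the CPU without any cudaMemcpy(). The names fill_indices and zero_copy_demo are invented for this example. It only covers host-allocated pinned memory and does not show how to obtain a physical address for cudaMalloc memory.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Each thread writes its own global index into the buffer.
__global__ void fill_indices(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = i;
}

// Hypothetical demo (expects n > 0): returns 0 if the CPU sees the values
// the GPU wrote through the mapped pointer, 1 on a mismatch, -1 on a CUDA error.
int zero_copy_demo(int n)
{
    int *host_ptr = NULL;   // CPU view of the pinned buffer
    int *dev_ptr = NULL;    // GPU view of the same buffer

    // Older toolkits need this flag before the context is created. The call
    // fails if a context already exists, so the result is deliberately
    // ignored and the sticky error is cleared.
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaGetLastError();

    if (cudaHostAlloc((void **)&host_ptr, n * sizeof(int),
                      cudaHostAllocMapped) != cudaSuccess)
        return -1;
    if (cudaHostGetDevicePointer((void **)&dev_ptr, host_ptr, 0) != cudaSuccess) {
        cudaFreeHost(host_ptr);
        return -1;
    }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    fill_indices<<<blocks, threads>>>(dev_ptr, n);

    // Check for launch errors first, then wait for the kernel to finish so
    // that its writes are visible to the CPU.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();

    int result = (err == cudaSuccess) ? 0 : -1;
    for (int i = 0; result == 0 && i < n; ++i)
        if (host_ptr[i] != i)
            result = 1;

    cudaFreeHost(host_ptr);
    return result;
}
```

Calling zero_copy_demo(1024) from main() should return 0 on a device that supports mapped host memory. Because host_ptr is ordinary CPU-visible pinned memory, it is the kind of buffer whose pages could then be handed to a PCIe device, subject to the contiguity caveats discussed earlier in the thread.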
Thanks