I’m having some trouble porting my GPU RDMA application to Tegra (a Xavier AGX). With a discrete GPU, I successfully allocate GPU memory using cuMemAlloc and pass it to the kernel, where I pin the memory (nvidia_p2p_get_pages) and make it accessible to the device I want to DMA to/from (nvidia_p2p_dma_map_pages). I then use those DMA addresses (dma_mapping->dma_addresses[page]) with my device’s DMA engine, while also using the GPU memory’s physical addresses (page_table->pages[page]->physical_address) to set up a userspace mapping for application-compatibility reasons.
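For reference, the discrete-GPU kernel path boils down to something like this (a minimal sketch against the standard dGPU nv-p2p.h; pin_and_map, pdev and the trimmed error handling are placeholders, not my actual code):

```c
#include <nv-p2p.h>

static struct nvidia_p2p_page_table *page_table;
static struct nvidia_p2p_dma_mapping *dma_mapping;

/* Called by the driver if the GPU allocation is revoked underneath us. */
static void free_callback(void *data)
{
	nvidia_p2p_free_dma_mapping(dma_mapping);
	nvidia_p2p_free_page_table(page_table);
}

static int pin_and_map(struct pci_dev *pdev, u64 gpu_vaddr, u64 size)
{
	u32 i;
	int ret;

	ret = nvidia_p2p_get_pages(0, 0, gpu_vaddr, size,
				   &page_table, free_callback, NULL);
	if (ret)
		return ret;

	ret = nvidia_p2p_dma_map_pages(pdev, page_table, &dma_mapping);
	if (ret)
		return ret;

	/* On the dGPU both tables have the same number of entries. */
	for (i = 0; i < page_table->entries; i++) {
		dma_addr_t bus  = dma_mapping->dma_addresses[i];          /* -> DMA engine */
		u64        phys = page_table->pages[i]->physical_address; /* -> later mmap */
		/* program the device / record phys for the userspace mapping */
	}
	return 0;
}
```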
To port this to Tegra, I followed the docs: I changed the allocation to cuMemAllocHost and adapted my use of the RDMA APIs. I can no longer access the pages’ physical addresses, so I’m using dma_to_phys to convert the handles (now in dma_mapping->hw_address[page]) to physical addresses; these addresses look OK (identical to the DMA handles, but I assume that’s expected). However, passing these physical addresses to remap_pfn_range when setting up the mmap instantly hangs the kernel, without any debug message. The documentation doesn’t mention any incompatibility, only that vm_page_prot should be adjusted when using cudaHostAllocWriteCombined, which I’m not.
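For completeness, the Tegra-side conversion and remap that hangs looks roughly like this (a sketch; dev is the struct device of my DMA peer, map_one_entry is a placeholder, and dma_mapping comes from nvidia_p2p_dma_map_pages built against the Jetson headers):

```c
#include <linux/dma-direct.h>   /* dma_to_phys() */
#include <linux/mm.h>           /* remap_pfn_range() */

static int map_one_entry(struct vm_area_struct *vma, struct device *dev,
			 struct nvidia_p2p_dma_mapping *dma_mapping,
			 u32 i, size_t len)
{
	/* hw_address[] replaces dma_addresses[] in the Jetson nv-p2p.h. */
	dma_addr_t  bus  = dma_mapping->hw_address[i];
	phys_addr_t phys = dma_to_phys(dev, bus);   /* comes back identical to bus */

	/* This call hangs the kernel, with no output whatsoever. */
	return remap_pfn_range(vma, vma->vm_start, phys >> PAGE_SHIFT,
			       len, vma->vm_page_prot);
}
```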
Would you mind sharing the source so we can reproduce this issue in our environment?
We want to check this further and see whether any update is required, since the API differs slightly between the host and Jetson.
It’s not trivial to isolate the code into something that can be executed without our target device, but I can try.
In the meantime, I’ve stumbled on an issue that might explain the crashes. For a given GPU allocation from user space, I need both the DMA bus addresses (for use with my device) and the physical addresses (for use with mmap). On non-Tegra hardware, I iterate the entries from nvidia_p2p_get_pages and nvidia_p2p_dma_map_pages together, since I always seem to get the same number of entries. I can then easily get the DMA address from dma_mapping->dma_addresses[...] and the physical address from page_table->pages[...]->physical_address.
On Tegra, I’m not getting the same number of entries in the page table and the DMA mapping (e.g., for a 4 MB allocation I get 1024 page-table entries but only 2 DMA-mapping entries), and I’m not sure how to get the correct DMA handles and physical addresses from either. I’ve tried two approaches (sketched in the code below):
1. Iterate the page table, get phys_addr using page_to_phys, then get dma_addr using phys_to_dma.
2. Iterate the DMA mapping, get dma_addr directly from hw_address, then get phys_addr using dma_to_phys.
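In code, the two approaches look roughly like this (a sketch against the Jetson headers; dev is the struct device I pass to nvidia_p2p_dma_map_pages, dump_addresses is a placeholder, and I’m assuming the mapping exposes an entries count alongside hw_address):

```c
#include <linux/mm.h>
#include <linux/dma-direct.h>   /* phys_to_dma(), dma_to_phys() */

static void dump_addresses(struct device *dev,
			   struct nvidia_p2p_page_table *page_table,
			   struct nvidia_p2p_dma_mapping *dma_mapping)
{
	u32 i;

	/* Approach 1: walk the page table (1024 entries for the 4 MB buffer). */
	for (i = 0; i < page_table->entries; i++) {
		phys_addr_t phys = page_to_phys(page_table->pages[i]);
		dma_addr_t  bus  = phys_to_dma(dev, phys);

		pr_info("pt[%u]:  phys=%pa bus=%pad\n", i, &phys, &bus);
	}

	/* Approach 2: walk the DMA mapping (2 entries for the same buffer). */
	for (i = 0; i < dma_mapping->entries; i++) {
		dma_addr_t  bus  = dma_mapping->hw_address[i];
		phys_addr_t phys = dma_to_phys(dev, bus);

		pr_info("map[%u]: phys=%pa bus=%pad\n", i, &phys, &bus);
	}
}
```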
The two approaches yield different addresses, although within each approach the DMA address and the physical address are always equal to each other. The fact that they differ means I’m probably doing something wrong, and using the wrong addresses with mmap is likely what crashes the system.
Any thoughts? What’s the correct way of getting valid DMA bus addresses as well as physical addresses for use with mmap out of the NVIDIA P2P APIs?
Thanks for your patience.
Here are some suggestions:
1. The input physical address for io_remap_pfn_range should be the address of the struct page on Jetson.
2. Please make sure to compile the sources with the Jetson version of nv-p2p.h.
I’ve read that blog post and have adapted my code accordingly (using cuMemAllocHost, and not setting write-combined, so nothing needs to change while remapping).
I need the physical address, not the PFN, since I correct for that when mapping (shifting addresses by PAGE_SHIFT). But since you mention page_to_pfn here, I take it that my use of page_to_phys is correct, and that I shouldn’t be doing the inverse (recovering the physical address from the hw_address in the DMA mapping). I’m left wondering, then, if and how I should use the DMA mapping from nvidia_p2p_dma_map_pages at all, given that it contains far fewer entries than the page table (see my previous post). Is that API not supported on Tegra, and, for that matter, why should I ever use it instead of just calling phys_to_dma on the physical addresses from the page table?
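To make sure we’re talking about the same thing, this is roughly how my Jetson mmap path looks after adapting to your suggestion (a sketch; my_mmap is a placeholder, page_table is what nvidia_p2p_get_pages returned, and this is the part that hangs):

```c
#include <linux/mm.h>   /* remap_pfn_range(), page_to_phys() */

static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
	unsigned long uaddr = vma->vm_start;
	u32 i;
	int ret;

	for (i = 0; i < page_table->entries; i++) {
		/* page_to_phys() >> PAGE_SHIFT is equivalent to page_to_pfn();
		 * I keep the physical address around for other bookkeeping
		 * and only shift it here. */
		phys_addr_t phys = page_to_phys(page_table->pages[i]);

		ret = remap_pfn_range(vma, uaddr, phys >> PAGE_SHIFT,
				      PAGE_SIZE, vma->vm_page_prot);
		if (ret)
			return ret;
		uaddr += PAGE_SIZE;
	}
	return 0;
}
```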