Issues porting desktop RDMA app to Tegra: mmap hangs kernel

tbesard · February 28, 2022, 1:16pm

Hi,

I’m having some trouble porting my GPU RDMA application to Tegra (on a Xavier AGX). Using a discrete GPU, I was successfully allocating GPU memory using cuMemAlloc, passing it to the kernel where I pin the memory (nvidia_p2p_get_pages) and make it accessible to the device I want to DMA to/from (nvidia_p2p_dma_map_pages). I then use those DMA addresses (dma_mapping->dma_addresses[page]) with my device’s DMA engine, while also using the GPU memory’s physical addresses (page_table->pages[page]->physical_address) to set up a userspace mapping for application compatibility reasons.

To port this to Tegra, I followed the docs by changing the allocation to cuMemAllocHost and adapting my use of the RDMA APIs. I couldn’t seem to access the pages’ physical addresses anymore, so I’m using dma_to_phys to convert the handles (now in dma_mapping->hw_address[page]) to physical addresses; these addresses look OK (identical to the DMA handles, but I assume that’s to be expected). However, passing these physical addresses to remap_pfn_range when setting up the mmap instantly hangs the kernel without any debug message. The documentation doesn’t mention any incompatibility, only that vm_page_prot should be adjusted when using cudaHostAllocWriteCombined, which I’m not.

Any thoughts on what might be the issue here?

AastaLLL · March 1, 2022, 2:58am

Hi,

Below is an example for RDMA on Jetson.
Would you mind checking it to see if it can meet your requirement?

Thanks.

tbesard · March 1, 2022, 6:38am

Yes, I’m already looking at that example code (which was very handy in porting my use of the RDMA APIs). However, my problem isn’t with the RDMA APIs, it’s with the mmap I do on the physical addresses I get out of it (see xtrx_julia/main.c at 07e7985d0c8a3a5ceecd7d794e8fd075941f87e5 · JuliaComputing/xtrx_julia · GitHub). The example code doesn’t do any userspace mapping.

AastaLLL · March 7, 2022, 3:47am

Hi,

Would you mind sharing the source to reproduce this issue in our environment?
We want to check this further to see if any update is required, since the API differs slightly between host and Jetson.

Thanks.

tbesard · March 7, 2022, 6:35am

It’s not trivial to isolate the code to something that can be executed without our target device. I can try though.

In the meantime, I’ve stumbled on an issue that might explain the crashes. For a given GPU allocation from user space, I need both the DMA bus addresses for use with my device and the physical addresses for use with mmap. On non-Tegra hardware, I iterate the entries from nvidia_p2p_get_pages and nvidia_p2p_dma_map_pages together, since I always seem to get the same number of entries. I can then easily get the DMA address by looking at dma_mapping->dma_addresses[...] and the physical address from page_table->pages[...]->physical_address.

On Tegra, I’m not getting the same number of entries in the page table and DMA mapping (e.g., for a 4MB allocation I get 1024 page table entries and 2 dma mapping entries). However, I’m not sure how to get the correct DMA handles and physical addresses from either. I’ve tried two approaches:

1. iterate the page table, get phys_addr using page_to_phys, get dma_addr using phys_to_dma
2. iterate the dma mapping, get dma_addr directly from hw_address, get phys_addr using dma_to_phys

These two approaches yield different addresses, although within each approach the DMA addresses and physical addresses are always equal to each other. The fact that there is a difference means that I’m probably doing something wrong though, and using those wrong addresses with mmap is likely to crash the system.

Any thoughts? What’s the correct way of getting valid DMA bus addresses as well as physical addresses for use with mmap out of the NVIDIA P2P APIs?

tbesard · March 14, 2022, 4:01pm

@AastaLLL Any thoughts on my latest post?

AastaLLL · March 21, 2022, 5:54am

Hi,

Sorry for the late update.
We are checking this internally and will share more information with you soon.

Thanks.

AastaLLL · March 23, 2022, 5:09am

Hi,

Thanks for your patience.
Here are some of the suggestions:

1.

On Jetson, the address passed to io_remap_pfn_range should be derived from the struct page.
Please make sure to compile the sources with the Jetson version of nv-p2p.h.

Something like:

#if (defined(CONFIG_ARM64) && defined(CONFIG_ARCH_TEGRA))
    dmachan->reader_addr[j] = page_to_pfn((struct page*)nvp);
#else
    dmachan->reader_addr[j] = (uint32_t*)(nvp->physical_address + offset);
#endif

It looks like the nv-p2p.h header has different page_table struct fields for desktop and Jetson:

Desktop:

typedef
struct nvidia_p2p_page {
    uint64_t physical_address;
    union nvidia_p2p_request_registers {
        struct {
            uint32_t wreqmb_h;
            uint32_t rreqmb_h;
            uint32_t rreqmb_0;
            uint32_t reserved[3];
        } fermi;
    } registers;
} nvidia_p2p_page_t;

typedef
struct nvidia_p2p_page_table {
    uint32_t version;
    uint32_t page_size; /* enum nvidia_p2p_page_size_type */
    struct nvidia_p2p_page **pages;
    uint32_t entries;
    uint8_t *gpu_uuid;
} nvidia_p2p_page_table_t;

Jetson:

typedef struct nvidia_p2p_page_table {
    u32 version;
    u32 page_size;
    u64 size;
    u32 entries;
    struct page **pages;
  
    u64 vaddr;
    u32 mapped;
  
    struct mm_struct *mm;
    struct mmu_notifier mn;
    struct mutex lock;
    void (*free_callback)(void *data);
    void *data;
} nvidia_p2p_page_table_t;

2.

Make sure to follow “Modification to Kernel API” from https://developer.nvidia.com/blog/gpudirect-rdma-nvidia-jetson-agx-xavier/.
The mapping size should be a multiple of 4K, and there is a write-combine requirement when remapping.
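
As a rough illustration of the 4K size requirement, the sketch below rounds a requested size up to the next 4 KiB multiple and checks whether a size already satisfies it. The names MAP_GRANULE, round_up_4k and is_4k_multiple are made up for this example and are not part of any NVIDIA API; in kernel code the PAGE_ALIGN macro plays a similar role for the system page size.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the mapping granule is 4 KiB, as stated in the post above. */
#define MAP_GRANULE 4096u

/* Round size up to the next multiple of 4 KiB (0 stays 0). */
uint64_t round_up_4k(uint64_t size)
{
    return (size + MAP_GRANULE - 1) & ~(uint64_t)(MAP_GRANULE - 1);
}

/* Return 1 if size is already a multiple of 4 KiB, 0 otherwise. */
int is_4k_multiple(uint64_t size)
{
    return (size & (MAP_GRANULE - 1)) == 0;
}
```

A request for 4097 bytes would therefore have to be mapped as 8192 bytes.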

Thanks.

tbesard · March 25, 2022, 7:36am

Thanks for your reply.

I’ve read that blog post, and have adapted my code (using cuMemAllocHost, not setting write-combined so not having to do anything while remapping).

I need the physical address, not the PFN, since I correct for that when mapping (shifting addresses by PAGE_SHIFT). But since you mention page_to_pfn here, I understand that my use of page_to_phys is correct, and that I shouldn’t be doing the inverse (recovering the physical address from the hw_address in the dma mapping). I’m left wondering then if and how I should use the dma mapping from nvidia_p2p_dma_map_pages, which gives me far fewer entries than the page table contains (see previous post). Is this API not supported on Tegra, and in fact, why should I ever use it instead of just doing phys_to_dma on the physical addresses from the page table?
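
For reference, the relationship between a PFN and a physical address discussed here can be sketched in plain C. This is a userspace model, not kernel code: it assumes 4 KiB pages (a shift of 12), and the helper names are hypothetical. remap_pfn_range itself takes a PFN, so a physical address has to be shifted right by the page shift before it is passed in.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, i.e. a page shift of 12. */
#define SKETCH_PAGE_SHIFT 12

/* Physical address of the start of the page with the given frame number. */
uint64_t pfn_to_phys_addr(uint64_t pfn)
{
    return pfn << SKETCH_PAGE_SHIFT;
}

/* Frame number of the page containing the given physical address. */
uint64_t phys_addr_to_pfn(uint64_t phys)
{
    return phys >> SKETCH_PAGE_SHIFT;
}

/* Byte offset of the physical address within its page. */
uint64_t phys_page_offset(uint64_t phys)
{
    return phys & ((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1);
}
```

Converting a page-aligned physical address to a PFN and back is lossless; any in-page offset is dropped by the shift and has to be carried separately.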

AastaLLL · March 31, 2022, 3:52am

Thanks for your reply.

We are checking this with our internal team.
Will share more information with you later.

AastaLLL · April 1, 2022, 5:35am

Hi,

Below is some advice for you.

Please do this to get the physical address:

#if (defined(CONFIG_ARM64) && defined(CONFIG_ARCH_TEGRA))
    dmachan->reader_addr[j] = (uint32_t*)((page_to_pfn((struct page*)nvp) << PAGE_SHIFT) + offset);
#else
    dmachan->reader_addr[j] = (uint32_t*)(nvp->physical_address + offset);
#endif
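
One detail worth keeping in mind with expressions of this shape is where the offset sits relative to the cast: adding to a uint32_t* advances in 4-byte units, while adding to the integer address before the cast advances in bytes. The userspace sketch below only demonstrates that C rule; buf and both function names are hypothetical and have nothing to do with the driver's actual data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t buf[64]; /* hypothetical stand-in for a mapped region */

/* Offset added AFTER the cast: pointer arithmetic in uint32_t units. */
size_t bytes_advanced_outside_cast(size_t offset)
{
    assert(offset <= 64);
    uint32_t *p = (uint32_t *)buf + offset;
    return (size_t)((char *)p - (char *)buf);
}

/* Offset added BEFORE the cast: integer arithmetic in bytes. */
size_t bytes_advanced_inside_cast(size_t offset)
{
    assert(offset <= sizeof(buf) && offset % sizeof(uint32_t) == 0);
    uint32_t *p = (uint32_t *)((uintptr_t)buf + offset);
    return (size_t)((char *)p - (char *)buf);
}
```

With an offset of 8, the first form lands 32 bytes into the buffer and the second lands 8 bytes in, so a byte offset belongs inside the cast.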

Note the differences in the nvidia_p2p_page_table_t for desktop and Jetson.
Check nv-p2p.h for both. On Jetson, there is no struct nvidia_p2p_page.

Physical addresses and DMA addresses may not be one-to-one.
It may be that the IOMMU is mapping multiple physical addresses to a single DMA range.
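
To make this concrete: if a 4MB allocation yields 1024 page table entries but only 2 DMA mapping entries, each DMA entry presumably describes a contiguous bus range covering many pages. The sketch below is a userspace model of walking such ranges to recover one bus address per page. It assumes each entry carries a length alongside its start address, which this thread does not show; struct dma_segment and expand_segments are hypothetical names, and the 4 KiB page size is an assumption as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Hypothetical stand-in for one DMA mapping entry: a contiguous bus range. */
struct dma_segment {
    uint64_t bus_addr; /* start of the range, page aligned */
    uint64_t len;      /* length in bytes, a multiple of the page size */
};

/*
 * Write one bus address per page into out[].
 * Returns the number of pages written, or 0 if out[] is too small
 * or a segment is not page aligned.
 */
size_t expand_segments(const struct dma_segment *segs, size_t nsegs,
                       uint64_t *out, size_t out_cap)
{
    size_t n = 0;

    assert(segs != NULL || nsegs == 0);
    for (size_t i = 0; i < nsegs; i++) {
        if (segs[i].bus_addr % SKETCH_PAGE_SIZE || segs[i].len % SKETCH_PAGE_SIZE)
            return 0;
        for (uint64_t off = 0; off < segs[i].len; off += SKETCH_PAGE_SIZE) {
            if (n == out_cap)
                return 0;
            out[n++] = segs[i].bus_addr + off;
        }
    }
    return n;
}
```

Two 2 MiB segments expand to 1024 per-page addresses, which matches the entry counts reported earlier in the thread. The physical addresses for mmap would still come from the page table, not from these bus addresses.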

Check here for how dma_map gets used in the picoevb-rdma sample:

DMA maps the physical pages. This will give the DMA/PCIe addresses:
jetson-rdma-picoevb/picoevb-rdma.c at master · NVIDIA/jetson-rdma-picoevb · GitHub
Uses the DMA/PCIe addresses for PCI-PCI transfer:
jetson-rdma-picoevb/picoevb-rdma.c at master · NVIDIA/jetson-rdma-picoevb · GitHub

Thanks.

system · April 27, 2022, 3:53am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
