GPUDirect RDMA on Jetson Orin (nvidia_p2p_dma_map_pages)

Hi,

I have followed the link below and resolved the nvidia-p2p library issue.

When I perform GPUDirect RDMA to iGPU memory with a size of 33177600 bytes, I find that nvidia_p2p_dma_map_pages returns only 1 entry (page), as shown below:

[27983.043212] xdma:pevb_get_userbuf_cuda: before nvidia_p2p_dma_map_pages
[27983.043749] xdma:pevb_get_userbuf_cuda: ubuf->map->entries = 1
[27983.043751] xdma:pevb_get_userbuf_cuda: cusurf->offset = 0
[27983.043752] xdma:pevb_get_userbuf_cuda: cusurf->len = 33177600

When I perform GPUDirect RDMA (x86 PC) to RTX 4000 Quadro memory with the same size of 33177600 bytes, nvidia_p2p_dma_map_pages returns the correct number of entries (pages). Everything works fine on the x86 system using the GPUDirect RDMA functions.
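
For context, this is roughly what my pevb_get_userbuf_cuda path does (a simplified sketch; pin_and_map and free_cb are illustrative names, error paths are trimmed, and the struct fields should be checked against the nv-p2p.h shipped with the BSP):

#include <linux/dma-mapping.h>
#include <linux/printk.h>
#include "nv-p2p.h"

/* Called by the driver if the pinned pages are revoked underneath us. */
static void free_cb(void *data)
{
}

/* Pin a CUDA allocation and DMA-map it (Jetson/Tegra API flavor). */
static int pin_and_map(struct device *dev, u64 vaddr, u64 len)
{
	struct nvidia_p2p_page_table *pt = NULL;
	struct nvidia_p2p_dma_mapping *map = NULL;
	u32 i;
	int ret;

	ret = nvidia_p2p_get_pages(vaddr, len, &pt, free_cb, NULL);
	if (ret)
		return ret;

	ret = nvidia_p2p_dma_map_pages(dev, pt, &map, DMA_BIDIRECTIONAL);
	if (ret) {
		nvidia_p2p_put_pages(pt);
		return ret;
	}

	/* On Orin this reports entries == 1 for the 33177600-byte buffer. */
	pr_info("entries = %u\n", map->entries);
	for (i = 0; i < map->entries; i++)
		pr_info("seg %u: addr=%pad len=%llu\n",
			i, &map->hw_address[i], map->hw_len[i]);
	return 0;
}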

Any idea how to resolve this?

Regards
YE

Hi,

Have you tried the same on Xavier before?
The APIs between Jetson and desktop GPUs are slightly different.
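
From memory, the prototypes differ roughly as below; please verify against the nv-p2p.h header in your kernel sources, as the arguments may not match your release exactly:

/* Desktop dGPU flavor: */
int nvidia_p2p_get_pages(uint64_t p2p_token, uint32_t va_space,
                         uint64_t virtual_address, uint64_t length,
                         struct nvidia_p2p_page_table **page_table,
                         void (*free_callback)(void *data), void *data);
int nvidia_p2p_dma_map_pages(struct pci_dev *peer,
                             struct nvidia_p2p_page_table *page_table,
                             struct nvidia_p2p_dma_mapping **dma_mapping);

/* Jetson (Tegra) flavor: */
int nvidia_p2p_get_pages(u64 vaddr, u64 size,
                         struct nvidia_p2p_page_table **page_table,
                         void (*free_callback)(void *data), void *data);
int nvidia_p2p_dma_map_pages(struct device *dev,
                             struct nvidia_p2p_page_table *page_table,
                             struct nvidia_p2p_dma_mapping **dma_mapping,
                             enum dma_data_direction direction);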

Thanks.

Hi,

I do not have a Xavier board, only an Orin.

I changed the API to the Tegra variant, as shown below.

I followed the code closely using this link.

One thing I noticed is that once the hardware changes to Jetson, the GPU page size changes to 4K, as shown below:

#ifdef NV_BUILD_DGPU
#define GPU_PAGE_SHIFT 16 /* dGPU: 64 KiB GPU pages */
#else
#define GPU_PAGE_SHIFT 12 /* Jetson iGPU: 4 KiB pages */
#endif
#define GPU_PAGE_SIZE (((u64)1) << GPU_PAGE_SHIFT)
#define GPU_PAGE_OFFSET (GPU_PAGE_SIZE - 1)
#define GPU_PAGE_MASK (~GPU_PAGE_OFFSET)
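
For example, these macros round the pinning request to GPU page boundaries (a sketch; vaddr and len stand for the user pointer and length passed in via the ioctl):

/* Round the start down and the end up to GPU page boundaries. */
u64 aligned_vaddr = vaddr & GPU_PAGE_MASK;
u64 aligned_end = (vaddr + len + GPU_PAGE_OFFSET) & GPU_PAGE_MASK;
u64 aligned_len = aligned_end - aligned_vaddr;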

Regards
YE

Hi,

Thanks for the details.

Let us check with the dev team about this issue.
Will share more information with you later.

Hi,

It’s expected that RDMA works on Orin just as it does on Xavier.
Since there are some differences in the API between dGPU and iGPU, please check whether you have applied all the requirements shown in the porting document below:

Thanks.

Hi,

Ok will check again.

Can I confirm that dma_mapping->entries (the entries field of the struct nvidia_p2p_dma_mapping returned by nvidia_p2p_dma_map_pages) cannot be 1 for a 32 MB transfer size?

Thanks

Regards
YE

Hi,

The mapping size should be a multiple of 4K.
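For a 33177600-byte buffer that works out to exactly 33177600 / 4096 = 8100 iGPU pages (with the 64 KiB dGPU page size the same buffer spans about 507 pages). Also note that on Jetson the pages are mapped through the SMMU, so physically scattered pages may be coalesced into fewer IOVA-contiguous entries; it is safer to walk hw_len[] per entry than to assume one entry per page.
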
You can find some discussion in the topic below:

Thanks.

Hi,

I managed to perform RDMA on the Orin board with an FPGA sending a 4K RGBA image. Here are some preliminary performance numbers.

Both directions show the same performance: around 21 ms per frame (about 47 FPS).

Allocation of GPU buffer passed: 0
cuPointerSetAttribute(buf) passed: 0
ioctl(PIN_CUDA buf) passed: ret=0 errno=17
Allocation of GPU buffer passed: 0
cuPointerSetAttribute(buf) passed: 0
ioctl(PIN_CUDA buf) passed: ret=0 errno=17
c2h Bytes:33177600 usecs:20897 MB/s:1587.672872
h2c Bytes:33177600 usecs:20799 MB/s:1595.153613
ioctl(UNPIN_CUDA buf) passed: 0
ioctl(UNPIN_CUDA buf) passed: 0

When executing RDMA on the dGPU, the value is 13.8 ms in both directions.

I also used zero copy/unified memory on the Orin hardware, and the transfer rate is also around 47 FPS.
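
For reference, the two non-RDMA cases were allocated roughly like this (a minimal sketch using standard CUDA runtime calls; error checking omitted):

#include <cuda_runtime.h>

int main(void)
{
	size_t len = 33177600; /* 3840 x 2160 x 4 bytes (4K RGBA) */
	void *host_ptr, *dev_ptr, *managed_ptr;

	/* Zero copy: pinned host memory mapped into the GPU address space. */
	cudaHostAlloc(&host_ptr, len, cudaHostAllocMapped);
	cudaHostGetDevicePointer(&dev_ptr, host_ptr, 0);

	/* Unified memory: one pointer usable from both CPU and iGPU. */
	cudaMallocManaged(&managed_ptr, len, cudaMemAttachGlobal);

	/* ... run the same transfer benchmark on dev_ptr / managed_ptr ... */

	cudaFreeHost(host_ptr);
	cudaFree(managed_ptr);
	return 0;
}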

Can I conclude that RDMA, zero copy, and unified-memory data transfers on Orin all give the same performance?

Thank you for your help and assistance.

Regards
YE

Hi,

We are double-checking this with the internal team.

Thanks.

Hi,

I tried loading nvidia.ko and nvidia-p2p.ko together, using the suggested modification, so that the display kernel driver is available as well.

I ran the RDMA test again. This time it takes ~13.8 ms, comparable to the dGPU on x86.

Allocation of GPU buffer passed: 0
cuPointerSetAttribute(buf) passed: 0
ioctl(PIN_CUDA buf) passed: ret=0 errno=17
Allocation of GPU buffer passed: 0
cuPointerSetAttribute(buf) passed: 0
ioctl(PIN_CUDA buf) passed: ret=0 errno=17
c2h Bytes:33177600 usecs:13895 MB/s:2387.736596
h2c Bytes:33177600 usecs:13814 MB/s:2401.737368
ioctl(UNPIN_CUDA buf) passed: 0
ioctl(UNPIN_CUDA buf) passed: 0

Thanks again for your help

Regards
YE

Hi,

Good to know you can get comparable performance now.

We also confirmed this with our internal team.
Since the Orin iGPU has no dedicated VRAM, RDMA, zero copy, and unified memory all end up in the same system memory, so similar performance is expected.

Thanks.

Hi,

Ok thanks for the confirmation.

I need some clarification on the iGPU and dGPU options on Jetson Orin hardware.

  1. Orin can connect a dGPU over the PCIe interface, correct?
  2. If a dGPU is used on Jetson Orin, the iGPU cannot be used, right?
  3. If we have an external PCIe device with video traffic accessing GPU memory via RDMA, which is better: iGPU or dGPU?
  4. Most dGPUs expose only a 256 MB PCIe BAR1 aperture, which may not be enough for performing RDMA on GPU memory. Can the BAR1 size be increased?

Thank you.

Regards
YE

Hi,

Please check the topic below for more information:

Thanks
