Hi,
I have followed the link below and resolved the nvidia-p2p lib issue.
This is a follow-up of PCIe DMA driver can not be loaded
I did a fresh install of JetPack 5.0.2 on the Jetson Orin.
The file /etc/nv_tegra_release has the following content: # R35 (release), REVISION: 1.0, GCID: 31346300, BOARD: t186ref, EABI: aarch64, DATE: Thu Aug 25 18:41:45 UTC 2022.
I built my custom kernel module, which uses direct DMA transfers from the PCIe card to the memory space of the GPU (GPUDirect RDMA).
But it is not possible to insert that module, as the following e…
When I perform GPUDirect RDMA to iGPU memory with a size of 33177600 bytes, I find that the nvidia_p2p_dma_map_pages function returns only 1 entry (page), as shown below:
[27983.043212] xdma:pevb_get_userbuf_cuda: before nvidia_p2p_dma_map_pages
[27983.043749] xdma:pevb_get_userbuf_cuda: ubuf->map->entries = 1
[27983.043751] xdma:pevb_get_userbuf_cuda: cusurf->offset = 0
[27983.043752] xdma:pevb_get_userbuf_cuda: cusurf->len = 33177600
When I perform GPUDirect RDMA (on an x86 PC) to Quadro RTX 4000 memory with the same size of 33177600 bytes, nvidia_p2p_dma_map_pages returns the correct number of entries (pages). Everything works fine on the x86 system using the GPUDirect RDMA functions.
Any idea how to resolve this?
Regards
YE
Hi,
Have you tried the same on Xavier before?
The APIs on Jetson and on desktop GPUs are slightly different.
Thanks.
Hi,
I do not have Xavier board, only Orin.
I changed the API to the Tegra version as shown below.
I followed the code in this link closely:
/*
* Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*/
#include <linux/cdev.h>
#include <linux/idr.h>
#include <linux/interrupt.h>
#include <linux/module.h>
#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/pci.h>
(rest of the file truncated)
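The Tegra branch of the pin/map sequence in my module looks roughly like the sketch below (simplified, with error handling, cleanup, and the surrounding xdma structures omitted; the variable and callback names are placeholders, and the exact prototypes should be checked against the Jetson version of nv-p2p.h):
/* Simplified sketch of the Tegra pin + DMA-map path (not the full driver code). */
struct nvidia_p2p_page_table *page_table = NULL;
struct nvidia_p2p_dma_mapping *map = NULL;
int ret;

/* Pin the CUDA allocation backing [vaddr, vaddr + len) */
ret = nvidia_p2p_get_pages(vaddr, len, &page_table, free_callback, callback_data);
if (ret)
        return ret;

/* Map the pinned pages for DMA by the PCIe endpoint */
ret = nvidia_p2p_dma_map_pages(&pdev->dev, page_table, &map, DMA_BIDIRECTIONAL);
if (ret)
        return ret;

/* map->entries is the value logged above as ubuf->map->entries */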
One thing I note is that once the hardware changes to Jetson, the GPU page size changes to 4K, as shown below:
#ifdef NV_BUILD_DGPU
#define GPU_PAGE_SHIFT 16
#else
#define GPU_PAGE_SHIFT 12
#endif
#define GPU_PAGE_SIZE (((u64)1) << GPU_PAGE_SHIFT)
#define GPU_PAGE_OFFSET (GPU_PAGE_SIZE - 1)
#define GPU_PAGE_MASK (~GPU_PAGE_OFFSET)
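With these definitions, simple arithmetic gives the page counts I would expect for my 33177600-byte buffer (my own calculation, not something from the header):
/* Expected GPU page counts for len = 33177600 bytes:
 *   dGPU, 64 KiB pages: (33177600 + 65535) >> 16 = 507
 *   iGPU,  4 KiB pages:  33177600 >> 12          = 8100 (the size is exactly 4 KiB aligned)
 * which is why the single dma_mapping entry surprised me.
 */
static inline u32 expected_gpu_pages(u64 len)
{
        return (u32)((len + GPU_PAGE_SIZE - 1) >> GPU_PAGE_SHIFT);
}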
Regards
YE
Hi,
Thanks for the details.
Let us check with the dev team about this issue.
Will share more information with you later.
Hi,
RDMA is expected to work on Orin just as it does on Xavier.
Since there are some differences in API between dGPU and iGPU, please check if you have applied all the requirements shown in the porting document below:
Thanks.
Hi,
Ok will check again.
Can I confirm that nvidia_p2p_dma_mapping->entries cannot be 1 for a 32 MByte transfer size?
Thanks
Regards
YE
Hi,
The mapping size should be a multiple of 4K.
You can find some discussion in the below topic:
Hi,
Thanks for your patience.
Here are some of the suggestions:
1.
On Jetson, the input physical address for io_remap_pfn_range should be taken from the struct page.
Please ensure to compile the sources with the Jetson version of nv-p2p.h.
Something like:
#if (defined(CONFIG_ARM64) && defined(CONFIG_ARCH_TEGRA))
dmachan->reader_addr[j] = page_to_pfn((struct page*)nvp);
#else
dmachan->reader_addr[j] = (uint32_t*)(nvp->physical_address + offset);
#endif
It looks like the nv-p2p.h header…
Thanks.
Hi,
I managed to perform RDMA on the Orin board using an FPGA sending a 4K RGBA image. Here are the preliminary performance values.
Both directions give the same performance: around 21 ms (47 FPS).
Allocation of GPU buffer passed: 0
cuPointerSetAttribute(buf) passed: 0
ioctl(PIN_CUDA buf) passed: ret=0 errno=17
Allocation of GPU buffer passed: 0
cuPointerSetAttribute(buf) passed: 0
ioctl(PIN_CUDA buf) passed: ret=0 errno=17
c2h Bytes:33177600 usecs:20897 MB/s:1587.672872
h2c Bytes:33177600 usecs:20799 MB/s:1595.153613
ioctl(UNPIN_CUDA buf) passed: 0
ioctl(UNPIN_CUDA buf) passed: 0
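For reference, these numbers are self-consistent: 33177600 bytes / 20897 µs ≈ 1587.7 MB/s, and one 33177600-byte frame every ~20.9 ms works out to roughly 1 / 0.0209 s ≈ 47.8 frames per second.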
When executing RDMA on the dGPU, the value is 13.8 ms in both directions.
I also used zero-copy/unified memory on the Orin hardware, and the transfer rate was likewise around 47 FPS.
Can I conclude that RDMA, zero-copy, and unified-memory data transfers on Orin will all give the same performance?
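For clarity, the three paths I am comparing are allocated roughly as follows (a simplified sketch rather than my exact test code; in the RDMA case the buffer is additionally pinned through the driver's PIN_CUDA ioctl shown in the log, and error checking is omitted):
#include <cuda.h>
#include <cuda_runtime.h>

/* Sketch of the three allocation paths being compared (error checks omitted). */
static void allocate_test_buffers(size_t len /* 33177600 */)
{
        /* 1. GPUDirect RDMA: device memory, later pinned via the driver's PIN_CUDA ioctl */
        CUdeviceptr rdma_buf;
        unsigned int sync_memops = 1;
        cuMemAlloc(&rdma_buf, len);
        cuPointerSetAttribute(&sync_memops, CU_POINTER_ATTRIBUTE_SYNC_MEMOPS, rdma_buf);

        /* 2. Zero-copy: mapped pinned host memory with a device-side alias pointer */
        void *zc_host, *zc_dev;
        cudaHostAlloc(&zc_host, len, cudaHostAllocMapped);
        cudaHostGetDevicePointer(&zc_dev, zc_host, 0);

        /* 3. Unified memory: a single managed pointer usable from both CPU and GPU */
        void *um_buf;
        cudaMallocManaged(&um_buf, len, cudaMemAttachGlobal);
}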
Thank you for your help and assistance.
Regards
YE
Hi,
We are double-confirming this with the internal team.
Thanks.
Hi,
I tried loading nvidia.ko and nvidia-p2p.ko together, using the suggested modification, to have the display kernel driver as well.
I ran the RDMA test again. This time it is ~13.8 ms, comparable to the dGPU on x86.
Allocation of GPU buffer passed: 0
cuPointerSetAttribute(buf) passed: 0
ioctl(PIN_CUDA buf) passed: ret=0 errno=17
Allocation of GPU buffer passed: 0
cuPointerSetAttribute(buf) passed: 0
ioctl(PIN_CUDA buf) passed: ret=0 errno=17
c2h Bytes:33177600 usecs:13895 MB/s:2387.736596
h2c Bytes:33177600 usecs:13814 MB/s:2401.737368
ioctl(UNPIN_CUDA buf) passed: 0
ioctl(UNPIN_CUDA buf) passed: 0
Thanks again for your help
Regards
YE
Hi,
Good to know you can get comparable performance now.
We also confirmed this with our internal team.
Since RDMA, ZeroCopy, and Unified Memory all use system memory, the performance is expected to be similar.
Thanks.
Hi,
Ok thanks for the confirmation.
I need some clarification on the iGPU and dGPU with the Jetson Orin hardware.
Orin can connect a dGPU on the PCIe interface, correct?
If a dGPU is used on the Jetson Orin, the iGPU cannot be used, right?
If we have an external PCIe device with video traffic accessing GPU memory using RDMA, which is better: iGPU or dGPU?
Most dGPUs have only 256 MB of PCIe BAR1 memory, which may not be enough when performing RDMA to GPU memory. Can the BAR1 size be increased?
Thank you.
Regards
YE
Hi,
Please check the topic below for more information:
Of course I would need to provide external power for the GPU, but I was wondering if it is even possible to use the PCI slot to connect an external GPU to the Orin for potentially unlimited AI power.
Thanks
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.