Suppose we have a standard PC with an Intel CPU running Windows, rendering graphics on an Nvidia GPU card, with an Orin AGX installed as a PCIe endpoint. Can we transfer rendered frames from the GPU to Orin RAM via GPUDirect RDMA?
If yes, which side should control the transfer: the Windows host or Linux on the Orin? Are there existing drivers or user-space APIs for that?
Do you mean this PCIe endpoint mode?
On Jetson, the RDMA mechanism should look like the example below:
Yes, I mean Orin as a PCIe endpoint.
However, the jetson-rdma-picoevb project appears to differ from my needs: it uses an FPGA as the DMA controller, never mentions Windows, and does not make clear what controls the dGPU.
I was told that Orin cannot control dGPU and iGPU at the same time, right?
Since I am using iGPU, then my Orin cannot control dGPU.
So I was hoping that maybe the Windows host can control the dGPU, as it normally does, and then transfer buffers to Orin. Can this be done?
Orin cannot be connected to a desktop machine through PCIe.
So it’s not possible to share a dGPU buffer with Orin.
I will add that when PCIe is used to share memory, it is typically with a discrete GPU (dGPU) on the PCIe bus itself. All Jetsons have a GPU that is integrated directly with the memory controller (iGPU). Even if PCIe memory is shared, it wouldn’t be as simple as with a dGPU.
“Orin cannot be connected to a desktop machine through PCIe.”
This statement seems to contradict the documentation:
"To use PCIe endpoint support, you need the following hardware:
Any Jetson Xavier NX, AGX Xavier, or Orin series device, which is running Jetson Linux, to act as the PCIe endpoint.
Another computer system to act as the PCIe root port … any standard x86-64 PC that is running Linux."
So, when you wrote that Orin cannot be connected to a desktop machine, did you specifically mean a Windows PC and not a Linux PC?
Will a Linux PC as the root port and Orin as the endpoint work, as the documentation says?
Second, assuming Linux PC root / Orin endpoint work in general, is there any way to transfer frame buffers between GPUs?
We were planning to use Orin’s iGPU for camera ISP and PC/Windows dGPU for UI rendering. And they need to merge at some point.
Can GPUDirect RDMA be used somehow, either on PC to transfer buffer from Orin or vice versa?
By default, the Orin developer kit cannot be plugged into a PCIe slot in an x86 host PC; an adapter is required. Would using a Linux host work for you?
Please also note that the Linux host can access PCIe memory but cannot control the GPU engine in Orin.
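To make "the Linux host can access PCIe memory" a bit more concrete, here is a minimal sketch of mapping a PCIe BAR from Linux user space through sysfs. It assumes the host has already enumerated the endpoint and that the process is allowed to open the resource file (usually root). The device address in the comment is a hypothetical placeholder, and `map_region`, `write_word`, and `read_word` are illustrative helper names, not part of any NVIDIA API. This only moves data with the host CPU; it does not involve either GPU or RDMA. Python does not guarantee the access width used on the bus, so treat it as a way to poke at shared memory, not at width-sensitive registers.

```python
import mmap
import os
import struct


def map_region(path, length=None):
    """Map a file read/write and return the mmap object.

    On a Linux host, `path` would be a sysfs PCI resource file such as
    /sys/bus/pci/devices/0000:01:00.0/resource0 (hypothetical address).
    """
    fd = os.open(path, os.O_RDWR | os.O_SYNC)
    try:
        if length is None:
            # sysfs resource files report the BAR size as the file size
            length = os.fstat(fd).st_size
        return mmap.mmap(fd, length, mmap.MAP_SHARED,
                         mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        # the mapping remains valid after the descriptor is closed
        os.close(fd)


def write_word(region, offset, value):
    """Store a 32-bit little-endian word at `offset` in the mapped region."""
    struct.pack_into("<I", region, offset, value)


def read_word(region, offset):
    """Load a 32-bit little-endian word from `offset` in the mapped region."""
    return struct.unpack_from("<I", region, offset)[0]


# Example use on a real host (not executed here):
#   bar = map_region("/sys/bus/pci/devices/0000:01:00.0/resource0")
#   write_word(bar, 0, 0x12345678)
#   print(hex(read_word(bar, 0)))
```

The same helpers work on an ordinary file, which is a convenient way to try the logic without hardware.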
I never mentioned a developer kit in this question.
We are trying to make our own board (in PCIe daughterboard form) based on the Orin AGX module, and I am trying to understand what is possible and what is not before finalizing the hardware design.
We would prefer to use a Windows host because WPF appears to be superior to Linux-based libraries for GUI design, and we have a significant code base for Windows.
What would happen if I plug an Orin-based daughterboard into a Windows host? Will it be recognized by the default Windows drivers? Could a custom Windows driver, in theory, be developed to map Orin’s memory?
I see a similar technology “NVIDIA GPUDirect for Video”:
“NVIDIA® GPUDirect® for Video technology helps IO board manufacturers write device drivers that efficiently transfer video frames in and out of NVIDIA GPU memory.”
This seems to describe our architecture perfectly: Orin is the “IO board” and it should transfer data to/from the Nvidia GPU under Windows control.
Can this work?
For now there is no Windows driver for this use case, so please consider using a Linux host. We will check whether this use case can be supported in the future.
OK, suppose I use a Linux host.
How would that work? Will the Linux host control the dGPU and initiate GPUDirect DMA transfers to/from Orin? Or vice versa?
Also, suppose there is a second PCIe endpoint device in the same PC, like the FPGA board described in the picoevb project. Can that FPGA endpoint access the Orin endpoint using the PCIe Peer-to-Peer option, without needing a Windows host driver?
RDMA allows PCIe devices to directly access the GPU memory.
For DMA, you can check if GPUDirect Storage, which is not available on Jetson, can meet your requirements.
For the second question, do you mean both FPGA and Orin are connected to a host through PCIe?
I am just trying to imagine all possible configurations and see what is possible.
If the main PC CPU is running Windows and cannot access Orin on a PCIe daughterboard,
then can another daughterboard access Orin via PCIe Peer-to-Peer option?
The jetson-rdma-picoevb project, which you mentioned above, is for FPGA daughterboard, right?
So, what would happen if I plug these into the PCIe bus of a standard PC:
- Orin daughterboard
- FPGA daughterboard
- Nvidia dGPU/graphics card
Can that FPGA access Orin and/or dGPU in this configuration?
On second thought, forget about FPGA.
What if I plug two Orin PCIe daughterboards into a PC, both configured as endpoints?
Can they access each other via PCIe Peer-to-Peer option?
Can they access a dGPU, which is also a PCIe daughter board?
We need to check this with our internal team.
Will share feedback with you later.
Thanks for your patience.
While we are still discussing this with our internal team, would a software solution work for you?
Or do you need a direct-access solution without copying?
I am not sure what you mean by a software solution.
Reserving an Orin core to copy frames in a loop, one word at a time? Or calling an API like cudaMemcpy?
We are trying to finalize the design of our Orin-based board and to understand which features are possible
before committing them to hardware, where any later change will be very costly.
So, if PCIe is not really supported in the way we thought, we will need to abandon it
and use some other hardware transfer, such as a display-port-MIPI-capture-card, Ethernet, or something like that.
It is really unfortunate that Nvidia does not maintain a simple support/compatibility table
for all possible combinations, such as Orin-pcie-root vs. Orin-pcie-endpoint,
Linux-pcie-host vs. Windows-pcie-host, dGPU vs. iGPU, RDMA vs. cudaMemcpy, etc.
Without one, it takes days or weeks to determine the feasibility of each particular use case,
and the results are often contradictory. For example, “GPUDirect for Video” is supposed to work on a Windows host,
but you are saying it cannot work with Orin, and so on.
NvStreams supports streaming data between Orin<->Orin and x86<->Orin.
The flow looks like this:
- Set up an NvStream CUDA producer on the desktop and a CUDA consumer on the Jetson.
- Allocate CUDA memory on the dGPU and run a kernel on the dGPU that operates on that memory.
- Once that is complete, send the data over NvStream (this would be a copy over PCIe).
- Read the buffer on the Jetson (this could be CUDA on the iGPU or anything else).
Does this workflow meet your requirements?
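For readers trying to picture the four steps above, here is a toy sketch of the same producer/consumer shape in plain Python. It makes no NvSci or CUDA calls: a bounded `queue.Queue` stands in for the stream over PCIe, a `bytearray` stands in for dGPU memory, and every name in it (`producer`, `consumer`, `run_stream`, `FRAME_BYTES`) is made up for illustration. The point is only that each frame is copied across the link, not shared in place.

```python
import queue
import threading

FRAME_BYTES = 16  # tiny stand-in for a real frame buffer


def producer(link, frames):
    """Desktop side: fill a local buffer, then send a copy over the link."""
    for i in range(frames):
        frame = bytearray([i]) * FRAME_BYTES  # step 2: "render" into local memory
        link.put(bytes(frame))                # step 3: send a copy across the link
    link.put(None)                            # end-of-stream marker


def consumer(link, received):
    """Jetson side: read each buffer as it arrives."""
    while True:
        frame = link.get()                    # step 4: read the buffer
        if frame is None:
            break
        received.append(frame)


def run_stream(frames=3):
    link = queue.Queue(maxsize=2)             # step 1: set up the stream
    received = []
    t_cons = threading.Thread(target=consumer, args=(link, received))
    t_prod = threading.Thread(target=producer, args=(link, frames))
    t_cons.start()
    t_prod.start()
    t_prod.join()
    t_cons.join()
    return received
```

Calling `run_stream(3)` returns three 16-byte frames in the order they were produced. The bounded queue means the producer blocks when the consumer falls behind, which is the back-pressure behaviour one would want from any real streaming path.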
Maybe it will work, thank you for the information.
I found it hard to find documentation and code samples for NvStreams except in the Drive OS SDK,
but is the Drive OS documentation applicable to Jetson? Or to the Windows desktop Nvidia libraries?
And then there is the document that you suggested: https://developer.nvidia.com/embedded/downloads#?search=nvsci
Is it applicable to both the Jetson and desktop APIs?
On Jetson, NvStream is included in the NvSCI library.
There is no online documentation right now. Please use the PDF document instead.
The NvSCI package for both x86 and Jetson can be found in SDK Manager or at the link below: