Host Machine Version
native Ubuntu Linux 24.04 Host installed with DRIVE OS Docker Containers
other
Issue Description
Hello,
I’m looking into the NVIDIA DRIVE Thor platform and how it works with the Holoscan SDK, especially for high-speed, low-latency data transfer.
From what I understand, one of Holoscan’s key strengths is its use of GPUDirect RDMA for fast, zero-copy I/O. However, as far as I know, GPUDirect RDMA typically requires a Smart NIC such as the ConnectX series that can offload network processing.
I have a few questions that I hope someone here might be able to clarify:
1. The DRIVE Thor platform has multiple network interfaces — are any of them considered Smart NICs? In other words, do any of these interfaces perform hardware offload for network processing (similar to ConnectX)?
2. If the network interfaces are not Smart NICs, can we still expect GPUDirect RDMA-level performance? Specifically, can the network drivers feed data directly to Thor's single GPU without any intermediate copies?
3. Even if full RDMA isn’t supported, is there any mechanism (such as GPUDirect v1 / Gen1) to reduce data copies between the network driver and the GPU driver?
4. I’m also interested in Magnum IO, especially direct data transfers between NVMe SSDs and GPUs. Is this feature already integrated into Holoscan on DRIVE Thor, or are there plans to include it?
Any insights, documentation links, or experiences from others working with DRIVE Thor + Holoscan would be really helpful.
Thank you very much for your prompt reply.
I had been under the impression that DRIVE Thor and Jetson Thor are quite similar products.
As you wrote, there is no mention of DRIVE Thor in the Holoscan SDK documentation.
Isn’t that simply because DRIVE Thor is quite new to the market and the documentation is out of date?
Is there any clear reason why Holoscan SDK can’t be used on DRIVE Thor?
For example, lack of mandatory hardware/firmware/driver, NVIDIA’s marketing policy, etc.
Even if Holoscan SDK can’t be used on DRIVE Thor, I’d like to ask the same questions about Jetson Thor.
Thank you for clarification about the difference between DRIVE Thor and Jetson Thor.
I had assumed they were similar products aimed at different markets, but I was wrong.
I would appreciate it very much if you could answer my final questions.
Q1.
Is “GPUDirect RDMA” available on DRIVE Thor?
From the “1. Overview” section of the GPUDirect RDMA 13.0 documentation, I learned the following:
a. GPUDirect RDMA has supported DRIVE AGX Xavier since CUDA 11.2.
b. GPUDirect RDMA on the Tegra family differs somewhat from GPUDirect RDMA on PCIe-based x86_64 platforms, with which I have some experience.
c. NV-P2P API support is deprecated for NVIDIA Blackwell-based SoCs, and new GPUDirect RDMA implementations should use the Linux upstream kernel API instead.
Q2.
Does the native network driver for DRIVE Thor employ any GPUDirect-based technology?
For example, can the network driver feed a data stream directly to the GPU without copying?
This is what I wanted to ask in questions 2 and 3 of my original post.
Q3.
Is “GPUDirect Storage” (part of Magnum IO) available on DRIVE Thor?
I understand that DRIVE Thor’s CPU and GPU share the same physical memory and can share page tables (unified memory).
With that architecture, I suspect the benefit of GPUDirect Storage is smaller than on a PCIe-based x86_64 system, but there should still be room to reduce copying between the storage driver and the GPU.