Issue Description: Dear Team,
We are currently working on establishing a high-speed data interface between the Jetson Thor board and our FPGA-based custom board using a QSFP28 (100G) link.
On the FPGA side, we are using Vivado with the CMAC IP core to implement the 100G QSFP28 link, and the design has been successfully programmed into the FPGA.
Our intended workflow is as follows:
- Stream raw IQ sample data from the FPGA board to the Jetson Thor over the QSFP28 interface.
- Perform live 5G physical layer (L1) inference on the Jetson Thor.
- Send the updated model or processed results back over the same QSFP28 interface to the FPGA board, where the 5G L1 stack is running.
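To make the intended data flow concrete, here is a minimal sketch of one pass through that loop. It only illustrates the logic and is not a 100G-capable implementation. The packet layouts (interleaved little-endian int16 I/Q samples in, a uint32 sequence number plus a float32 value out), the function names, and the mean-power calculation standing in for the real L1 model are all assumptions made for illustration.

```python
import struct


def unpack_iq(payload: bytes) -> list[complex]:
    """Decode interleaved little-endian int16 I/Q pairs (assumed layout)."""
    n = len(payload) // 4  # 2 bytes I + 2 bytes Q per sample
    vals = struct.unpack(f"<{2 * n}h", payload[: 4 * n])
    return [complex(vals[2 * k], vals[2 * k + 1]) for k in range(n)]


def run_inference(samples: list[complex]) -> float:
    """Placeholder for the L1 model: returns the mean sample power."""
    if not samples:
        return 0.0
    return sum(abs(s) ** 2 for s in samples) / len(samples)


def pack_result(seq: int, value: float) -> bytes:
    """Encode a result as uint32 sequence number + float32 (assumed layout)."""
    return struct.pack("<If", seq, value)


def process_packet(seq: int, payload: bytes) -> bytes:
    """One pass of the loop: FPGA payload in, result payload out."""
    return pack_result(seq, run_inference(unpack_iq(payload)))
```

In the real system the payload would arrive from, and the result would leave through, whatever receive/transmit path is recommended for the QSFP28 link; that part is exactly what we are asking about.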
We are seeking guidance on the most efficient method to:
- Capture and store high-throughput raw IQ samples arriving from the FPGA to the Jetson Thor.
- Run real-time inference on the Thor platform.
- Transmit the processed data or updated model back to the FPGA via the same QSFP28 interface.
We have explored DPDK as a potential solution for high-speed packet capture and transmission. However, DPDK's poll-mode drivers keep dedicated CPU cores busy and reserve large hugepage memory pools, so it consumes a significant portion of CPU and RAM resources. We would therefore like to evaluate alternative approaches.
Specifically, we would like to know:
- Are there any NVIDIA kernel-level drivers or libraries optimized for high-throughput packet processing on Jetson platforms?
- Is there a GPUDirect-based solution that would allow direct data movement to GPU memory for inference?
- What would be the recommended architecture for achieving low-latency, high-bandwidth streaming and bidirectional data exchange in this setup?
We would greatly appreciate your recommendations on the most efficient and scalable solution for this application.