I get a segfault with l2fwd-nv when using options -m 1 and -w 0 (or -w 2).
I want to test GPUDirect RDMA between the GPU and the ConnectX-4 Lx: the Rx/Tx packets should be DMAed directly to/from GPU memory. Everything works fine when split buffer mode (-s) is used. The segfault occurs when DPDK sends the packet: for some reason it dereferences a GPU memory address, and that crashes the process.
My understanding is that the CPU should only be used for orchestration and control, so I don't understand why the current code accesses GPU memory at all. The entire packet (header and payload) is located in GPU memory, so the PMD (poll mode driver) should only manipulate the Tx buffer descriptors, and the NIC should perform the DMA by accessing GPU memory directly. Is that right?
Command line used:
sudo ./l2fwdnv -l 0-9 -n 4 -a 03:00.1,txq_inline_max=0 -a 55:00.0 -- -m 1 -w 0 -b 64 -p 1 -v 0 -z 0
Device info:
(base) robert@delta:~$ lspci | grep NVIDIA
55:00.0 VGA compatible controller: NVIDIA Corporation AD107GL [RTX 2000 Ada Generation] (rev a1)
55:00.1 Audio device: NVIDIA Corporation Device 22be (rev a1)
a2:00.0 VGA compatible controller: NVIDIA Corporation AD107GL [RTX 2000 Ada Generation] (rev a1)
a2:00.1 Audio device: NVIDIA Corporation Device 22be (rev a1)
(base) robert@delta:~$ lspci | grep Mel
03:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
03:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]