Problem with DOCA GPU packet processing

I tried to run the gpu_packet_processing example application described in the DOCA GPU Packet Processing Application Guide, but I could not confirm that the CUDA kernels were actually processing any packets.

I am using an x86_64 system with A100X converged cards. I set them up following the “Application on Host CPU / DPU Converged Accelerator in NIC mode” option for BlueField-2, referring to the DOCA Installation Guide for Linux and the DOCA GPUNetIO documentation.
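For reference, here is roughly how the NIC-mode configuration can be verified (a sketch; the mst device name below is a placeholder for my system, and the expected values are my reading of the installation guide):

$ sudo mst start
$ sudo mlxconfig -d /dev/mst/mt41686_pciconf0 q | grep -i internal_cpu
# NIC mode should report (my assumption, per the guide):
#   INTERNAL_CPU_MODEL             EMBEDDED_CPU(1)
#   INTERNAL_CPU_PAGE_SUPPLIER     EXT_HOST_PF(1)
#   INTERNAL_CPU_ESWITCH_MANAGER   EXT_HOST_PF(1)
#   INTERNAL_CPU_IB_VPORT0         EXT_HOST_PF(1)
#   INTERNAL_CPU_OFFLOAD_ENGINE    DISABLED(1)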

The app launched successfully and appeared to be waiting for packets in legacy nvidia-peermem mode:

$ sudo ./doca_gpu_packet_processing -g 24:00.0 -n 21:00.0 -q 1 -l 70 
[09:12:48:729502][5156][DOCA][INF][gpu_packet_processing.c:296][main] ===========================================================
[09:12:48:729558][5156][DOCA][INF][gpu_packet_processing.c:297][main] DOCA version: 2.9.0072
[09:12:48:729572][5156][DOCA][INF][gpu_packet_processing.c:298][main] ===========================================================
[09:12:48:729607][5156][DOCA][INF][gpu_packet_processing.c:319][main] Options enabled:
        GPU 24:00.0
        NIC 21:00.0
        GPU Rx queues 1
        GPU HTTP server enabled No
EAL: Detected CPU lcores: 112
EAL: Detected NUMA nodes: 2
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
TELEMETRY: No legacy callbacks, legacy socket not created
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:21:00.0 (socket 0)
EAL: Probe PCI driver: gpu_cuda (10de:20b8) device: 0000:24:00.0 (socket 0)
[09:12:49:618255][5156][DOCA][WRN][engine_model.c:92][adapt_queue_depth] adapting queue depth to 128.
[09:12:50:399201][5156][DOCA][INF][udp_queues.c:57][create_udp_queues] Creating UDP Eth Rxq 0
[09:12:50:399898][5156][DOCA][INF][udp_queues.c:133][create_udp_queues] Mapping receive queue buffer (0x0x7f2980000000 size 536870912B dmabuf fd 582) with dmabuf mode
[09:12:50:413059][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:415103][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:416707][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:417510][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:422292][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:447916][5156][DOCA][DBG][flow.c:279][create_udp_pipe] Created Pipe GPU_RXQ_UDP_PIPE
[09:12:50:447939][5156][DOCA][INF][tcp_queues.c:65][create_tcp_queues] Creating TCP Eth Rxq 0
[09:12:50:448431][5156][DOCA][INF][tcp_queues.c:141][create_tcp_queues] Mapping receive queue buffer (0x0x7f295e000000 size 536870912B dmabuf fd 588) with dmabuf mode
[09:12:50:459677][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:462302][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:464176][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:465091][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:468951][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:493035][5156][DOCA][DBG][flow.c:531][create_tcp_gpu_pipe] Created Pipe GPU_RXQ_TCP_PIPE
[09:12:50:493056][5156][DOCA][INF][icmp_queues.c:58][create_icmp_queues] Creating ICMP Eth Rxq 0
[09:12:50:493400][5156][DOCA][INF][icmp_queues.c:134][create_icmp_queues] Mapping receive queue buffer (0x0x7f297ea00000 size 8388608B dmabuf fd 594) with dmabuf mode
[09:12:50:498954][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:500767][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:502475][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:503365][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:507151][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:510793][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:533412][5156][DOCA][DBG][flow.c:641][create_icmp_gpu_pipe] Created Pipe GPU_RXQ_ICMP_PIPE
[09:12:50:563379][5156][DOCA][DBG][flow.c:863][create_root_pipe] Created Pipe ROOT_PIPE
[09:12:50:563977][5156][DOCA][INF][gpu_packet_processing.c:452][main] Warm up CUDA kernels
[09:12:50:578529][5156][DOCA][INF][gpu_packet_processing.c:467][main] Launching CUDA kernels
[09:12:50:578649][5156][DOCA][INF][gpu_packet_processing.c:505][main] Waiting for termination
[09:12:50:578643][5164][DOCA][INF][gpu_packet_processing.c:139][stats_core] Core 1 is reporting filter stats

Seconds 5
[UDP] QUEUE: 0 DNS: 0 OTHER: 0 TOTAL: 0
[TCP] QUEUE: 0 HTTP: 0 HTTP HEAD: 0 HTTP GET: 0 HTTP POST: 0 TCP [SYN: 0 FIN: 0 ACK: 0] OTHER: 0 TOTAL: 0

However, when I sent ICMP echo requests from an external host, the application did not send any responses. I also tested UDP and TCP using dig and curl, but got no response either.
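For completeness, the tests from the external host looked roughly like this (the IP is a placeholder for the address assigned to the BF2 uplink port):

$ ping 192.168.100.2               # ICMP: the app should send echo replies from the CUDA kernel
$ dig @192.168.100.2 example.com   # UDP: should at least bump the [UDP] DNS counter
$ curl http://192.168.100.2/       # TCP: should at least bump the [TCP] SYN counter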

The above output is from Linux kernel 5.25 + nvidia-peermem mode, but the situation was the same with Linux kernel 6.2 + DMA-BUF mode.
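For anyone checking the same thing, each mode's prerequisites can be verified roughly as follows (module and config names are my assumption from the GPUNetIO docs):

$ lsmod | grep nvidia_peermem                               # legacy mode: peer-memory module must be loaded
$ grep CONFIG_DMABUF_MOVE_NOTIFY /boot/config-$(uname -r)   # DMA-BUF mode: kernel prerequisite
$ modinfo nvidia -F license                                 # "Dual MIT/GPL" indicates the open kernel modules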

The only difference from the installation guide is that I couldn't clear SrcValid in ACSCtl (as shown below, it stays SrcValid+ even after setpci). I also failed to find any Access Control Services setting in the BIOS menu.

$ lspci -vvvt | egrep 'Mellanox|NVIDIA'
 |           \-02.0-[1f-24]----00.0-[20-24]--+-00.0-[21]--+-00.0  Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
 |                                           |            +-00.1  Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
 |                                           |            \-00.2  Mellanox Technologies MT42822 BlueField-2 SoC Management Interface
 |                                           \-01.0-[22-24]----00.0-[23-24]----08.0-[24]----00.0  NVIDIA Corporation Device 20b8
$ sudo setpci -s 20:00.0 ECAP_ACS+6.w=0:fc
$ sudo lspci -s 20:00.0 -vvvv | grep -i ACSCtl
                ACSCtl: SrcValid+ TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
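The same check can be repeated on every bridge between the GPU and the NIC (a sketch; the BDFs are as I read them off the lspci tree above):

$ for bdf in 1f:00.0 20:00.0 20:01.0 22:00.0 23:08.0; do echo "== $bdf =="; sudo lspci -s $bdf -vvv | grep -i acsctl; done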

However, I have confirmed that GPUDirect RDMA works correctly using ib_write_bw with --use_cuda, both with and without --use_cuda_dmabuf, so I believe direct communication between the BF2 (CX-6 Dx) and the A100 has no issue.
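For reference, the verification was roughly as follows (the RDMA device name and server address are placeholders):

# server:
$ ib_write_bw -d mlx5_0 --use_cuda=0
# client (add --use_cuda_dmabuf to also test the DMA-BUF path):
$ ib_write_bw -d mlx5_0 --use_cuda=0 <server_ip>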

Can anyone help?

doca_gpu_packet_processing_logs.zip (23.7 KB)

1. GPUDirect requires the NIC and the GPU to sit under the same PCIe bridge, with ACS disabled. Try the following (adjust the bridge address to your topology):

sudo setpci -s 03:00.0 ECAP_ACS+0x6.w=0000

2. Check whether the IOMMU is on; it is better to turn it off.

3. If you use DMA-BUF, you need the open-source GPU driver; check NVIDIA's GitHub.

Also, please check that there is no routing issue for the NIC's IP (see the quick checks after this list).
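Quick checks for the IOMMU and routing points (a sketch; the address is a placeholder for the external host):

$ dmesg | grep -i -e DMAR -e IOMMU   # did the IOMMU come up at boot?
$ cat /proc/cmdline                  # intel_iommu=off / iommu=pt would show here
$ ip route get 192.168.100.1         # confirm traffic to the external host leaves via the NIC port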