I tried to run the gpu_packet_processing example application described in the DOCA GPU Packet Processing Application Guide, but I could not confirm that any packets were actually processed by the CUDA kernels.
I am using an x86_64 system with A100X converged cards. I have set them up following the “Application on Host CPU / DPU Converged Accelerator in NIC mode” option for BlueField-2, referring to the DOCA Installation Guide for Linux and DOCA GPUNetIO.
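In case it is relevant, this is roughly how I double-checked the NIC-mode configuration with mlxconfig (a sketch; 21:00.0 is my NIC's PCI address, and I am only grepping for the INTERNAL_CPU_* parameters the installation guide mentions):

$ sudo mst start
# Query the BlueField-2 operation mode parameters on the converged card
$ sudo mlxconfig -d 21:00.0 q | grep -i INTERNAL_CPU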
The application launched successfully and appeared to be waiting to receive packets (this run used the legacy nvidia-peermem mode):
$ sudo ./doca_gpu_packet_processing -g 24:00.0 -n 21:00.0 -q 1 -l 70
[09:12:48:729502][5156][DOCA][INF][gpu_packet_processing.c:296][main] ===========================================================
[09:12:48:729558][5156][DOCA][INF][gpu_packet_processing.c:297][main] DOCA version: 2.9.0072
[09:12:48:729572][5156][DOCA][INF][gpu_packet_processing.c:298][main] ===========================================================
[09:12:48:729607][5156][DOCA][INF][gpu_packet_processing.c:319][main] Options enabled:
GPU 24:00.0
NIC 21:00.0
GPU Rx queues 1
GPU HTTP server enabled No
EAL: Detected CPU lcores: 112
EAL: Detected NUMA nodes: 2
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
TELEMETRY: No legacy callbacks, legacy socket not created
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:21:00.0 (socket 0)
EAL: Probe PCI driver: gpu_cuda (10de:20b8) device: 0000:24:00.0 (socket 0)
[09:12:49:618255][5156][DOCA][WRN][engine_model.c:92][adapt_queue_depth] adapting queue depth to 128.
[09:12:50:399201][5156][DOCA][INF][udp_queues.c:57][create_udp_queues] Creating UDP Eth Rxq 0
[09:12:50:399898][5156][DOCA][INF][udp_queues.c:133][create_udp_queues] Mapping receive queue buffer (0x0x7f2980000000 size 536870912B dmabuf fd 582) with dmabuf mode
[09:12:50:413059][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:415103][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:416707][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:417510][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:422292][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:447916][5156][DOCA][DBG][flow.c:279][create_udp_pipe] Created Pipe GPU_RXQ_UDP_PIPE
[09:12:50:447939][5156][DOCA][INF][tcp_queues.c:65][create_tcp_queues] Creating TCP Eth Rxq 0
[09:12:50:448431][5156][DOCA][INF][tcp_queues.c:141][create_tcp_queues] Mapping receive queue buffer (0x0x7f295e000000 size 536870912B dmabuf fd 588) with dmabuf mode
[09:12:50:459677][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:462302][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:464176][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:465091][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:468951][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:493035][5156][DOCA][DBG][flow.c:531][create_tcp_gpu_pipe] Created Pipe GPU_RXQ_TCP_PIPE
[09:12:50:493056][5156][DOCA][INF][icmp_queues.c:58][create_icmp_queues] Creating ICMP Eth Rxq 0
[09:12:50:493400][5156][DOCA][INF][icmp_queues.c:134][create_icmp_queues] Mapping receive queue buffer (0x0x7f297ea00000 size 8388608B dmabuf fd 594) with dmabuf mode
[09:12:50:498954][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:500767][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:502475][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:503365][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:507151][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:510793][5156][DOCA][WRN][linux_devx_adapter.cpp:324][umem_reg] devx adapter 0x55ca2c7d5270: Registration using dmabuf is not supported, falling back to legacy registration
[09:12:50:533412][5156][DOCA][DBG][flow.c:641][create_icmp_gpu_pipe] Created Pipe GPU_RXQ_ICMP_PIPE
[09:12:50:563379][5156][DOCA][DBG][flow.c:863][create_root_pipe] Created Pipe ROOT_PIPE
[09:12:50:563977][5156][DOCA][INF][gpu_packet_processing.c:452][main] Warm up CUDA kernels
[09:12:50:578529][5156][DOCA][INF][gpu_packet_processing.c:467][main] Launching CUDA kernels
[09:12:50:578649][5156][DOCA][INF][gpu_packet_processing.c:505][main] Waiting for termination
[09:12:50:578643][5164][DOCA][INF][gpu_packet_processing.c:139][stats_core] Core 1 is reporting filter stats
Seconds 5
[UDP] QUEUE: 0 DNS: 0 OTHER: 0 TOTAL: 0
[TCP] QUEUE: 0 HTTP: 0 HTTP HEAD: 0 HTTP GET: 0 HTTP POST: 0 TCP [SYN: 0 FIN: 0 ACK: 0] OTHER: 0 TOTAL: 0
However, when I sent ICMP requests from an external host, the application did not send any responses. I also tested the UDP and TCP paths using dig and curl, but there was no response either.
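For completeness, the traffic I generated from the external host looked like this (192.168.1.1 is a placeholder for the IP assigned to the converged card's port):

# ICMP echo requests, which the application should answer
$ ping 192.168.1.1
# UDP/DNS traffic for the DNS filter
$ dig @192.168.1.1 example.com
# TCP/HTTP traffic for the HTTP filter
$ curl http://192.168.1.1/index.html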
The run above used Linux kernel 5.25 with the legacy nvidia-peermem mode, but the situation was the same with Linux kernel 6.2 in DMA-BUF mode.
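To confirm which path was actually active in each run, I checked along these lines (my own quick checks, not from the guide):

# Legacy path: the nvidia-peermem module must be loaded
$ lsmod | grep nvidia_peermem
# DMA-BUF path: needs a recent kernel and, as far as I understand,
# the open-source flavor of the NVIDIA kernel modules
$ uname -r
$ modinfo nvidia | grep -i license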
The only difference from the installation guide is that I could not clear SrcValid in ACSCtl, and I also could not find an Access Control Services (ACS) option in the BIOS menu.
$ lspci -vvvt | egrep 'Mellanox|NVIDIA'
| \-02.0-[1f-24]----00.0-[20-24]--+-00.0-[21]--+-00.0 Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
| | +-00.1 Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
| | \-00.2 Mellanox Technologies MT42822 BlueField-2 SoC Management Interface
| \-01.0-[22-24]----00.0-[23-24]----08.0-[24]----00.0 NVIDIA Corporation Device 20b8
$ sudo setpci -s 20:00.0 ECAP_ACS+6.w=0:fc
$ sudo lspci -s 20:00.0 -vvvv | grep -i ACSCtl
ACSCtl: SrcValid+ TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
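In case a single bridge is not enough, here is a sketch of clearing ACS on every bridge between the NIC and the GPU (the BDFs are read off the lspci tree above; note that writing =0000 also clears bit 0, SrcValid, which the =0:fc mask leaves untouched):

# Bridges on the path from the CX-6 Dx (21:00.0) to the A100 (24:00.0);
# bridges without an ACS capability will simply report an error
$ for bdf in 1f:00.0 20:00.0 20:01.0 22:00.0 23:08.0; do
    sudo setpci -s "$bdf" ECAP_ACS+6.w=0000
    sudo lspci -s "$bdf" -vvvv | grep -i ACSCtl
  done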
On the other hand, I have confirmed that GPUDirect RDMA works correctly using ib_write_bw with --use_cuda, both with and without --use_cuda_dmabuf, so I believe that direct communication between the BF2 (CX-6 Dx) and the A100 itself has no issue.
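The test looked roughly like this (mlx5_0 and GPU index 0 are placeholders for my actual device names):

# On the host with the converged card (server side)
$ ib_write_bw -d mlx5_0 --use_cuda=0
$ ib_write_bw -d mlx5_0 --use_cuda=0 --use_cuda_dmabuf
# On the external host (client side)
$ ib_write_bw -d mlx5_0 <server_address>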
Can anyone help?
doca_gpu_packet_processing_logs.zip (23.7 KB)