Dear NVIDIA support team,
I am following the tutorial to run gpu_packet_processing.
My hardware specs are as follows:
GPU: RTX A6000
DPU: BlueField-2 MBF2H532C-AECOT
CPU: 13th Gen Intel Core i7-13700K
Motherboard: ASUS ROG STRIX B660-A GAMING WIFI D4
The OS is Ubuntu Server 22.04 with kernel 5.15.0-1042-nvidia-lowlatency.
My CUDA version is 12.5 and my DOCA version is 2.5.0.
gpu_packet_processing runs, but it receives nothing when I ping the interface at PCIe address 04:00.0. The printout is as follows.
sudo ./doca_gpu_packet_processing -g 01:00.0 -n 04:00.0 -q 2 -l 60
[07:48:12:287398][8984][DOCA][INF][gpu_packet_processing.c:279][main] ===========================================================
[07:48:12:287422][8984][DOCA][INF][gpu_packet_processing.c:280][main] DOCA version: 2.5.0108
[07:48:12:287427][8984][DOCA][INF][gpu_packet_processing.c:281][main] ===========================================================
[07:48:12:287440][8984][DOCA][INF][gpu_packet_processing.c:302][main] Options enabled:
GPU 01:00.0
NIC 04:00.0
GPU Rx queues 2
GPU HTTP server enabled No
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
TELEMETRY: No legacy callbacks, legacy socket not created
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:04:00.0 (socket -1)
EAL: Probe PCI driver: gpu_cuda (10de:2230) device: 0000:01:00.0 (socket -1)
[07:48:14:390985][8984][DOCA][WRN][engine_model.c:72][adapt_queue_depth] adapting queue depth to 128.
[07:48:14:920072][8984][DOCA][WRN][engine_port.c:321][port_driver_process_properties] detected representor used in VNF mode (driver port id 0)
[07:48:14:956873][8984][DOCA][INF][udp_queues.c:40][create_udp_queues] Creating UDP Eth Rxq 0
[07:48:14:956995][8984][DOCA][INF][udp_queues.c:85][create_udp_queues] Mapping receive queue buffer (0x0x7fd75c000000 size 536870912B) with nvidia-peermem mode
[07:48:15:048133][8984][DOCA][INF][udp_queues.c:40][create_udp_queues] Creating UDP Eth Rxq 1
[07:48:15:048189][8984][DOCA][INF][udp_queues.c:85][create_udp_queues] Mapping receive queue buffer (0x0x7fd73a000000 size 536870912B) with nvidia-peermem mode
[07:48:15:149374][8984][DOCA][DBG][flow.c:196][create_udp_pipe] Created Pipe GPU_RXQ_UDP_PIPE
[07:48:15:149379][8984][DOCA][INF][tcp_queues.c:46][create_tcp_queues] Creating TCP Eth Rxq 0
[07:48:15:149465][8984][DOCA][INF][tcp_queues.c:91][create_tcp_queues] Mapping receive queue buffer (0x0x7fd718000000 size 536870912B) with nvidia-peermem mode
[07:48:15:236185][8984][DOCA][INF][tcp_queues.c:46][create_tcp_queues] Creating TCP Eth Rxq 1
[07:48:15:236239][8984][DOCA][INF][tcp_queues.c:91][create_tcp_queues] Mapping receive queue buffer (0x0x7fd6f6000000 size 536870912B) with nvidia-peermem mode
[07:48:15:336600][8984][DOCA][DBG][flow.c:368][create_tcp_gpu_pipe] Created Pipe GPU_RXQ_TCP_PIPE
[07:48:15:336604][8984][DOCA][INF][icmp_queues.c:42][create_icmp_queues] Creating ICMP Eth Rxq 0
[07:48:15:336656][8984][DOCA][INF][icmp_queues.c:88][create_icmp_queues] Mapping receive queue buffer (0x0x7fd716a00000 size 8388608B) with nvidia-peermem mode
[07:48:15:371549][8984][DOCA][DBG][flow.c:436][create_icmp_gpu_pipe] Created Pipe GPU_RXQ_ICMP_PIPE
[07:48:15:393726][8984][DOCA][DBG][flow.c:567][create_root_pipe] Created Pipe ROOT_PIPE
[07:48:15:393878][8984][DOCA][INF][gpu_packet_processing.c:415][main] Warm up CUDA kernels
[07:48:15:398773][8984][DOCA][INF][gpu_packet_processing.c:430][main] Launching CUDA kernels
[07:48:15:398789][8996][DOCA][INF][gpu_packet_processing.c:125][stats_core] Core 1 is reporting filter stats
[07:48:15:398791][8984][DOCA][INF][gpu_packet_processing.c:463][main] Waiting for termination
Seconds 5
[UDP] QUEUE: 0 DNS: 0 OTHER: 0 TOTAL: 0
[UDP] QUEUE: 1 DNS: 0 OTHER: 0 TOTAL: 0
[TCP] QUEUE: 0 HTTP: 0 HTTP HEAD: 0 HTTP GET: 0 HTTP POST: 0 TCP [SYN: 0 FIN: 0 ACK: 0] OTHER: 0 TOTAL: 0
[TCP] QUEUE: 1 HTTP: 0 HTTP HEAD: 0 HTTP GET: 0 HTTP POST: 0 TCP [SYN: 0 FIN: 0 ACK: 0] OTHER: 0 TOTAL: 0
I have checked the BAR1 size and the hugepage setup; they both look okay, as shown below.
nvidia-smi -q | grep -i bar -A 3
BAR1 Memory Usage
Total : 65536 MiB
Used : 1 MiB
Free : 65535 MiB
grep -i huge /proc/meminfo
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 16
HugePages_Free: 15
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
Hugetlb: 16777216 kB
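In case the way the hugepages are reserved matters: the counts above correspond to 16 x 1 GiB pages, which would typically be set on the kernel command line, for example
GRUB_CMDLINE_LINUX_DEFAULT="default_hugepagesz=1G hugepagesz=1G hugepages=16"
followed by sudo update-grub and a reboot. I can attach my exact GRUB configuration if that is useful.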
I also notice that when gpu_packet_processing is NOT running, pinging the NIC works fine. When gpu_packet_processing IS running, ping stops receiving responses, yet I see no packets being processed by gpu_packet_processing either.
As soon as gpu_packet_processing stops, ping works normally again.
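To be explicit about the ping test: from a directly connected peer machine I run something like ping 192.168.100.2 (a placeholder address here; the actual target is whatever IP is configured on the 04:00.0 port), and the replies stop only while gpu_packet_processing is running.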
I noticed another post describing a similar problem, but it was using dmabuf, whereas I use nvidia_peermem. My PCIe setup is as follows.
lspci -tvvv
-[0000:00]-+-00.0 Intel Corporation Device a703
+-01.0-[01]--+-00.0 NVIDIA Corporation GA102GL [RTX A6000]
| \-00.1 NVIDIA Corporation GA102 High Definition Audio Controller
+-02.0 Intel Corporation Device a780
+-06.0-[02]----00.0 Kingston Technology Company, Inc. Device 5013
+-0a.0 Intel Corporation Device a77d
+-0e.0 Intel Corporation Device a77f
+-14.0 Intel Corporation Device 7a60
+-14.2 Intel Corporation Device 7a27
+-14.3 Intel Corporation Device 7a70
+-15.0 Intel Corporation Device 7a4c
+-15.1 Intel Corporation Device 7a4d
+-15.2 Intel Corporation Device 7a4e
+-16.0 Intel Corporation Device 7a68
+-17.0 Intel Corporation Device 7a62
+-1a.0-[03]--
+-1c.0-[04]--+-00.0 Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
| +-00.1 Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
| \-00.2 Mellanox Technologies MT42822 BlueField-2 SoC Management Interface
+-1d.0-[05]----00.0 Intel Corporation Device 125c
sudo lspci -s 04:00.0 -vvvv | grep -i ACSCtl
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
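In addition to the endpoint ACS capability shown above, these are extra checks I can run on my side (00:1c.0 is the root port above the DPU in the topology above; I am not sure which registers are the relevant ones, so please let me know):
sudo lspci -s 00:1c.0 -vvvv | grep -i ACS
cat /proc/cmdline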
Is this PCIe setup correct?
I also compiled the GPUNetIO samples "gpunetio_send_wait_time" and "gpunetio_simple_receive".
gpunetio_simple_receive just hangs as shown below and receives nothing when I ping the NIC interface.
sudo ./doca_gpunetio_simple_receive -g 01:00.0 -n 04:00.0
[08:45:43:970661][9241][DOCA][INF][gpunetio_simple_receive_main.c:159][main] Starting the sample
[08:45:45:925724][9241][DOCA][INF][gpunetio_simple_receive_main.c:189][main] Sample configuration:
GPU 01:00.0
NIC 04:00.0
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
TELEMETRY: No legacy callbacks, legacy socket not created
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:04:00.0 (socket -1)
[08:45:46:067284][9241][DOCA][WRN][engine_model.c:72][adapt_queue_depth] adapting queue depth to 128.
[08:45:46:588660][9241][DOCA][WRN][engine_port.c:321][port_driver_process_properties] detected representor used in VNF mode (driver port id 0)
EAL: Probe PCI driver: gpu_cuda (10de:2230) device: 0000:01:00.0 (socket -1)
[08:45:46:627461][9241][DOCA][INF][gpunetio_simple_receive_sample.c:425][create_rxq] Creating Sample Eth Rxq
[08:45:46:627552][9241][DOCA][INF][gpunetio_simple_receive_sample.c:466][create_rxq] Mapping receive queue buffer (0x0x7f388a000000 size 33554432B) with nvidia-peermem mode
[08:45:46:683412][9241][DOCA][INF][gpunetio_simple_receive_sample.c:610][gpunetio_simple_receive] Launching CUDA kernel to receive packets
[08:45:46:688556][9241][DOCA][INF][gpunetio_simple_receive_sample.c:614][gpunetio_simple_receive] Waiting for termination
For gpunetio_send_wait_time, time synchronization and the phc2sys service are confirmed to be running. The printout also looks okay (see below), but I still capture nothing with tcpdump while the sample is sending packets; the capture command I use is shown after the printout.
sudo ./doca_gpunetio_send_wait_time -g 01:00.0 -n 04:00.0 -t 500000
[08:49:07:309016][9288][DOCA][INF][gpunetio_send_wait_time_main.c:195][main] Starting the sample
[08:49:09:246703][9288][DOCA][INF][gpunetio_send_wait_time_main.c:224][main] Sample configuration:
GPU 01:00.0
NIC 04:00.0
Timeout 500000ns
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
TELEMETRY: No legacy callbacks, legacy socket not created
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:04:00.0 (socket -1)
[08:49:09:414376][9288][DOCA][INF][gpunetio_send_wait_time_sample.c:633][gpunetio_send_wait_time] Wait on time supported mode: DPDK
EAL: Probe PCI driver: gpu_cuda (10de:2230) device: 0000:01:00.0 (socket -1)
[08:49:09:419372][9288][DOCA][INF][gpunetio_send_wait_time_sample.c:278][create_tx_buf] Mapping send queue buffer (0x0x7efda2693000 size 262144B) with legacy nvidia-peermem mode
[08:49:09:420426][9288][DOCA][INF][gpunetio_send_wait_time_sample.c:683][gpunetio_send_wait_time] Launching CUDA kernel to send packets
[08:49:09:425648][9288][DOCA][INF][gpunetio_send_wait_time_sample.c:690][gpunetio_send_wait_time] Waiting 10 sec for 256 packets to be sent
[08:49:19:444473][9288][DOCA][INF][gpunetio_send_wait_time_sample.c:710][gpunetio_send_wait_time] Sample finished successfully
[08:49:19:444476][9288][DOCA][INF][gpunetio_send_wait_time_main.c:239][main] Sample finished successfully
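For completeness, the tcpdump command I run while the sample is sending is along these lines (the interface name is a placeholder for the port connected to the DPU):
sudo tcpdump -i <interface> -nn -e -vv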
It seems that the DPU and the GPU cannot communicate with each other. My questions are as follows.
- Is my PCIe setup correct?
- Are there specific logs I can check to diagnose the issue?
- The DPU is flashed with the DOCA_2.5.0_BSP_4.5.0_Ubuntu_22.04-1.23-10.prod.bfb image. Is this image okay?
- Are there any hardware requirements, or specific settings needed in the BIOS, GRUB, or kernel?
- Any other debugging suggestions would be greatly appreciated.
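In case it helps, here are additional commands whose output I can attach (just the standard environment checks I am aware of):
nvidia-smi
lsmod | grep -E 'nvidia_peermem|nv_peer_mem'
dmesg | grep -i -E 'mlx5|peermem|iommu'
ofed_info -s
sudo mst status -v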
Please also let me know if more information is needed. Thanks in advance for your help!