DOCA GPUNetIO gpu_packet_processing receives no packets

hsinlichiu1 · February 6, 2025, 9:14am

Dear Nvidia support team,

I am following the tutorial to run gpu_packet_processing.

My hardware spec is given as follows:
GPU: RTX A6000
DPU: BlueField2 MBF2H532C-AECOT
CPU: 13th Gen intel core i7-13700K
Motherboard: ASUS ROG STRIX B660-A GAMING WIFI D4

OS is ubuntu server 22.04, and kernel is 5.15.0-1042-nvidia-lowlatency.
My CUDA version is 12.5, DOCA version is 2.5.0.

gpu_packet_processing can run, but it just receives nothing when I ping the interface that has PCIe address 04:00.0. The printouts are given as follows.

sudo ./doca_gpu_packet_processing -g 01:00.0 -n 04:00.0 -q 2 -l 60
[07:48:12:287398][8984][DOCA][INF][gpu_packet_processing.c:279][main] ===========================================================
[07:48:12:287422][8984][DOCA][INF][gpu_packet_processing.c:280][main] DOCA version: 2.5.0108
[07:48:12:287427][8984][DOCA][INF][gpu_packet_processing.c:281][main] ===========================================================
[07:48:12:287440][8984][DOCA][INF][gpu_packet_processing.c:302][main] Options enabled:
        GPU 01:00.0
        NIC 04:00.0
        GPU Rx queues 2
        GPU HTTP server enabled No
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
TELEMETRY: No legacy callbacks, legacy socket not created
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:04:00.0 (socket -1)
EAL: Probe PCI driver: gpu_cuda (10de:2230) device: 0000:01:00.0 (socket -1)
[07:48:14:390985][8984][DOCA][WRN][engine_model.c:72][adapt_queue_depth] adapting queue depth to 128.
[07:48:14:920072][8984][DOCA][WRN][engine_port.c:321][port_driver_process_properties] detected representor used in VNF mode (driver port id 0)
[07:48:14:956873][8984][DOCA][INF][udp_queues.c:40][create_udp_queues] Creating UDP Eth Rxq 0
[07:48:14:956995][8984][DOCA][INF][udp_queues.c:85][create_udp_queues] Mapping receive queue buffer (0x0x7fd75c000000 size 536870912B) with nvidia-peermem mode
[07:48:15:048133][8984][DOCA][INF][udp_queues.c:40][create_udp_queues] Creating UDP Eth Rxq 1
[07:48:15:048189][8984][DOCA][INF][udp_queues.c:85][create_udp_queues] Mapping receive queue buffer (0x0x7fd73a000000 size 536870912B) with nvidia-peermem mode
[07:48:15:149374][8984][DOCA][DBG][flow.c:196][create_udp_pipe] Created Pipe GPU_RXQ_UDP_PIPE
[07:48:15:149379][8984][DOCA][INF][tcp_queues.c:46][create_tcp_queues] Creating TCP Eth Rxq 0
[07:48:15:149465][8984][DOCA][INF][tcp_queues.c:91][create_tcp_queues] Mapping receive queue buffer (0x0x7fd718000000 size 536870912B) with nvidia-peermem mode
[07:48:15:236185][8984][DOCA][INF][tcp_queues.c:46][create_tcp_queues] Creating TCP Eth Rxq 1
[07:48:15:236239][8984][DOCA][INF][tcp_queues.c:91][create_tcp_queues] Mapping receive queue buffer (0x0x7fd6f6000000 size 536870912B) with nvidia-peermem mode
[07:48:15:336600][8984][DOCA][DBG][flow.c:368][create_tcp_gpu_pipe] Created Pipe GPU_RXQ_TCP_PIPE
[07:48:15:336604][8984][DOCA][INF][icmp_queues.c:42][create_icmp_queues] Creating ICMP Eth Rxq 0
[07:48:15:336656][8984][DOCA][INF][icmp_queues.c:88][create_icmp_queues] Mapping receive queue buffer (0x0x7fd716a00000 size 8388608B) with nvidia-peermem mode
[07:48:15:371549][8984][DOCA][DBG][flow.c:436][create_icmp_gpu_pipe] Created Pipe GPU_RXQ_ICMP_PIPE
[07:48:15:393726][8984][DOCA][DBG][flow.c:567][create_root_pipe] Created Pipe ROOT_PIPE
[07:48:15:393878][8984][DOCA][INF][gpu_packet_processing.c:415][main] Warm up CUDA kernels
[07:48:15:398773][8984][DOCA][INF][gpu_packet_processing.c:430][main] Launching CUDA kernels
[07:48:15:398789][8996][DOCA][INF][gpu_packet_processing.c:125][stats_core] Core 1 is reporting filter stats
[07:48:15:398791][8984][DOCA][INF][gpu_packet_processing.c:463][main] Waiting for termination

Seconds 5
[UDP] QUEUE: 0 DNS: 0 OTHER: 0 TOTAL: 0
[UDP] QUEUE: 1 DNS: 0 OTHER: 0 TOTAL: 0
[TCP] QUEUE: 0 HTTP: 0 HTTP HEAD: 0 HTTP GET: 0 HTTP POST: 0 TCP [SYN: 0 FIN: 0 ACK: 0] OTHER: 0 TOTAL: 0
[TCP] QUEUE: 1 HTTP: 0 HTTP HEAD: 0 HTTP GET: 0 HTTP POST: 0 TCP [SYN: 0 FIN: 0 ACK: 0] OTHER: 0 TOTAL: 0

I have checked the setups of BAR1 size and hugepages. They seem okay as well as follows.
nvidia-smi -q | grep -i bar -A 3
BAR1 Memory Usage
Total : 65536 MiB
Used : 1 MiB
Free : 65535 MiB

grep -i huge /proc/meminfo
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 16
HugePages_Free: 15
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
Hugetlb: 16777216 kB

I also notice that, when gpu_packet_processing is NOT running, ping the NIC is okay. When gpu_packet_processing IS running, ping just stops, receieves no response. But I see no packets being processing by gpu_packet_processing either.

As soon as gpu_packet_processing stops, ping becomes normal again.

I notice there was one post facing the similar problem, but it was using dmabuf. I use nvidia_peermem in my case. The PCIe setup is given as follows.

lspci -tvvv
-[0000:00]-+-00.0  Intel Corporation Device a703
           +-01.0-[01]--+-00.0  NVIDIA Corporation GA102GL [RTX A6000]
           |            \-00.1  NVIDIA Corporation GA102 High Definition Audio Controller
           +-02.0  Intel Corporation Device a780
           +-06.0-[02]----00.0  Kingston Technology Company, Inc. Device 5013
           +-0a.0  Intel Corporation Device a77d
           +-0e.0  Intel Corporation Device a77f
           +-14.0  Intel Corporation Device 7a60
           +-14.2  Intel Corporation Device 7a27
           +-14.3  Intel Corporation Device 7a70
           +-15.0  Intel Corporation Device 7a4c
           +-15.1  Intel Corporation Device 7a4d
           +-15.2  Intel Corporation Device 7a4e
           +-16.0  Intel Corporation Device 7a68
           +-17.0  Intel Corporation Device 7a62
           +-1a.0-[03]--
           +-1c.0-[04]--+-00.0  Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
           |            +-00.1  Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
           |            \-00.2  Mellanox Technologies MT42822 BlueField-2 SoC Management Interface
           +-1d.0-[05]----00.0  Intel Corporation Device 125c


sudo lspci -s 04:00.0 -vvvv | grep -i ACSCtl
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-

Is this PCIe setup correct?

I also compiled the GPUNetIO samples of “gpunetio_send_wait_time” and “gpunetio_simple_receive”.

For gpunetio_simple_receive, it just stops there as follows and receives nothing when I ping the NIC interface.

sudo ./doca_gpunetio_simple_receive -g 01:00.0 -n 04:00.0
[08:45:43:970661][9241][DOCA][INF][gpunetio_simple_receive_main.c:159][main] Starting the sample
[08:45:45:925724][9241][DOCA][INF][gpunetio_simple_receive_main.c:189][main] Sample configuration:
        GPU 01:00.0
        NIC 04:00.0

EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
TELEMETRY: No legacy callbacks, legacy socket not created
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:04:00.0 (socket -1)
[08:45:46:067284][9241][DOCA][WRN][engine_model.c:72][adapt_queue_depth] adapting queue depth to 128.
[08:45:46:588660][9241][DOCA][WRN][engine_port.c:321][port_driver_process_properties] detected representor used in VNF mode (driver port id 0)
EAL: Probe PCI driver: gpu_cuda (10de:2230) device: 0000:01:00.0 (socket -1)
[08:45:46:627461][9241][DOCA][INF][gpunetio_simple_receive_sample.c:425][create_rxq] Creating Sample Eth Rxq

[08:45:46:627552][9241][DOCA][INF][gpunetio_simple_receive_sample.c:466][create_rxq] Mapping receive queue buffer (0x0x7f388a000000 size 33554432B) with nvidia-peermem mode
[08:45:46:683412][9241][DOCA][INF][gpunetio_simple_receive_sample.c:610][gpunetio_simple_receive] Launching CUDA kernel to receive packets
[08:45:46:688556][9241][DOCA][INF][gpunetio_simple_receive_sample.c:614][gpunetio_simple_receive] Waiting for termination

For gpunetio_send_wait_time, the synchronization and phc2sys service are confirmed. The printouts also look okay as follows, but still capture nothing with tcpdump when sending packets.

sudo ./doca_gpunetio_send_wait_time -g 01:00.0 -n 04:00.0 -t 500000
[08:49:07:309016][9288][DOCA][INF][gpunetio_send_wait_time_main.c:195][main] Starting the sample
[08:49:09:246703][9288][DOCA][INF][gpunetio_send_wait_time_main.c:224][main] Sample configuration:
        GPU 01:00.0
        NIC 04:00.0
        Timeout 500000ns
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
TELEMETRY: No legacy callbacks, legacy socket not created
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:04:00.0 (socket -1)
[08:49:09:414376][9288][DOCA][INF][gpunetio_send_wait_time_sample.c:633][gpunetio_send_wait_time] Wait on time supported mode: DPDK
EAL: Probe PCI driver: gpu_cuda (10de:2230) device: 0000:01:00.0 (socket -1)
[08:49:09:419372][9288][DOCA][INF][gpunetio_send_wait_time_sample.c:278][create_tx_buf] Mapping send queue buffer (0x0x7efda2693000 size 262144B) with legacy nvidia-peermem mode
[08:49:09:420426][9288][DOCA][INF][gpunetio_send_wait_time_sample.c:683][gpunetio_send_wait_time] Launching CUDA kernel to send packets
[08:49:09:425648][9288][DOCA][INF][gpunetio_send_wait_time_sample.c:690][gpunetio_send_wait_time] Waiting 10 sec for 256 packets to be sent
[08:49:19:444473][9288][DOCA][INF][gpunetio_send_wait_time_sample.c:710][gpunetio_send_wait_time] Sample finished successfully
[08:49:19:444476][9288][DOCA][INF][gpunetio_send_wait_time_main.c:239][main] Sample finished successfully

It seems that the DPU and GPU cannot communicate with each other. My questions are given as follows.

Is my PCIe setup correct?
Are there specific logs I can check to diagnose the issue?
The DPU is installed with DOCA_2.5.0_BSP_4.5.0_Ubuntu_22.04-1.23-10.prod.bfb image. Is this image file okay?
Are there any hardware requirements? Or specific setups in BIOS, grub/kernel?
Any other debugging suggestions would be greatly appreciated.

Please also let me know if more information is needed. Thanks in advance for your help!

hsinlichiu1 · February 6, 2025, 9:47am

Hello,

I notice this error in the attached log file:

[63752.719764] ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210730/psparse-529)
[63793.272980] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210730/dsfield-184)
[63793.274258] ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210730/dswload2-477)

These 3 rows keep showing up everytime I do something to GPU.

Is it the reason why the gpu_packet_processing cannot receive packets?

nvidia-bug-report.log.gz (291.7 KB)

cs25mtech02005 · February 18, 2025, 3:50pm

Hey,
could you please run dpdk-testpmd and check whether it gets killed due to oom-killer, you can find the dpdk-testpmd binary at /opt/mellanox/doca/dpdk/bin

hsinlichiu1 · February 19, 2025, 3:41am

Hello,

Thank you for your reply. Enclosed please find the returned error message after running dpdk-testpmd.

sudo ./dpdk-testpmd
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
EAL: Probe PCI driver: gpu_cuda (10de:2230) device: 0000:01:00.0 (socket -1)
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:04:00.0 (socket -1)
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:04:00.1 (socket -1)
Warning: NUMA should be configured manually by using --port-numa-config and --ring-numa-config parameters along with --numa.
testpmd: create a new mbuf pool <mb_pool_0>: n=8558592, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
EAL: Error - exiting with code: 1
  Cause: Creation of mbuf pool for socket 0 failed: Cannot allocate memory
Port 0 is closed
Port 1 is closed

Then I google it and found it is perhaps due to my hugepages configuration. So I ran the following commands to update and increase the hugepage size.

$ echo 0 | sudo tee /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
$ echo 1024 | sudo tee /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
$ echo 0 | sudo tee /proc/sys/vm/nr_hugepages
$ echo 1024 | sudo tee /proc/sys/vm/nr_hugepages
0
1024

$ grep -i huge /proc/meminfo
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:      53
HugePages_Free:       52
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:        57671680 kB

It is limited by 53 GB, probably because my host has only 64GB RAM. Then I got the following outcome after running dpdk-testpmd.

$ sudo ./dpdk-testpmd
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: 1024 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size
EAL: VFIO support initialized
EAL: Probe PCI driver: gpu_cuda (10de:2230) device: 0000:01:00.0 (socket -1)
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:04:00.0 (socket -1)
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:04:00.1 (socket -1)
Warning: NUMA should be configured manually by using --port-numa-config and --ring-numa-config parameters along with --numa.
testpmd: create a new mbuf pool <mb_pool_0>: n=8558592, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
Port 0: B8:CE:F6:49:64:EA
Configuring Port 1 (socket 0)
Port 1: B8:CE:F6:49:64:EB
Checking link statuses...
Done
No commandline core given, start packet forwarding
io packet forwarding - ports=2 - cores=1 - streams=2 - NUMA support enabled, MP allocation mode: native
Logical Core 1 (socket 0) forwards packets on 2 streams:
  RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
  RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00

  io packet forwarding packets/burst=32
  nb forwarding cores=1 - nb forwarding ports=2
  port 0: RX queue number: 1 Tx queue number: 1
    Rx offloads=0x0 Tx offloads=0x10000
    RX queue: 0
      RX desc=256 - RX free threshold=64
      RX threshold registers: pthresh=0 hthresh=0  wthresh=0
      RX Offloads=0x0
    TX queue: 0
      TX desc=256 - TX free threshold=0
      TX threshold registers: pthresh=0 hthresh=0  wthresh=0
      TX offloads=0x10000 - TX RS bit threshold=0
  port 1: RX queue number: 1 Tx queue number: 1
    Rx offloads=0x0 Tx offloads=0x10000
    RX queue: 0
      RX desc=256 - RX free threshold=64
      RX threshold registers: pthresh=0 hthresh=0  wthresh=0
      RX Offloads=0x0
    TX queue: 0
      TX desc=256 - TX free threshold=0
      TX threshold registers: pthresh=0 hthresh=0  wthresh=0
      TX offloads=0x10000 - TX RS bit threshold=0
Press enter to exit

Telling cores to stop...
Waiting for lcores to finish...

  ---------------------- Forward statistics for port 0  ----------------------
  RX-packets: 15             RX-dropped: 0             RX-total: 15
  TX-packets: 0              TX-dropped: 0             TX-total: 0
  ----------------------------------------------------------------------------

  ---------------------- Forward statistics for port 1  ----------------------
  RX-packets: 0              RX-dropped: 0             RX-total: 0
  TX-packets: 15             TX-dropped: 0             TX-total: 15
  ----------------------------------------------------------------------------

  +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
  RX-packets: 15             RX-dropped: 0             RX-total: 15
  TX-packets: 15             TX-dropped: 0             TX-total: 15
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Done.
Stopping ports...
Port 0 is stopped
Port 1 is stopped
Done
Closing ports...
Port 0 is closed
Port 1 is closed
Done

Bye...

I kept ping the interface that has PCIe address 04:00.0 when I ran dkdp-testpmd. So I can receive some packets. The results seem okay? Please let me know if I do anything wrong.

However, I still received no packets when running “sudo ./doca_gpu_packet_processing -g 01:00.0 -n 04:00.0 -q 2 -l 60”.

$ sudo ./doca_gpu_packet_processing -g 0000:01:00.0 -n 0000:04:00.0 -q 2
[02:45:49:155492][9664][DOCA][INF][gpu_packet_processing.c:279][main] ===========================================================
[02:45:49:155514][9664][DOCA][INF][gpu_packet_processing.c:280][main] DOCA version: 2.5.0108
[02:45:49:155518][9664][DOCA][INF][gpu_packet_processing.c:281][main] ===========================================================
[02:45:49:155532][9664][DOCA][INF][gpu_packet_processing.c:302][main] Options enabled:
        GPU 0000:01:00.0
        NIC 0000:04:00.0
        GPU Rx queues 2
        GPU HTTP server enabled No
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: 1024 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size
EAL: VFIO support initialized
TELEMETRY: No legacy callbacks, legacy socket not created
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:04:00.0 (socket -1)
EAL: Probe PCI driver: gpu_cuda (10de:2230) device: 0000:01:00.0 (socket -1)
[02:45:51:181510][9664][DOCA][WRN][engine_model.c:72][adapt_queue_depth] adapting queue depth to 128.
[02:45:51:750680][9664][DOCA][WRN][engine_port.c:321][port_driver_process_properties] detected representor used in VNF mode (driver port id 0)
[02:45:51:788080][9664][DOCA][INF][udp_queues.c:40][create_udp_queues] Creating UDP Eth Rxq 0
[02:45:51:788531][9664][DOCA][INF][udp_queues.c:85][create_udp_queues] Mapping receive queue buffer (0x0x7fd11e000000 size 536870912B) with nvidia-peermem mode
[02:45:51:893077][9664][DOCA][INF][udp_queues.c:40][create_udp_queues] Creating UDP Eth Rxq 1
[02:45:51:893293][9664][DOCA][INF][udp_queues.c:85][create_udp_queues] Mapping receive queue buffer (0x0x7fd0fc000000 size 536870912B) with nvidia-peermem mode
[02:45:52:010840][9664][DOCA][INF][tcp_queues.c:46][create_tcp_queues] Creating TCP Eth Rxq 0
[02:45:52:011130][9664][DOCA][INF][tcp_queues.c:91][create_tcp_queues] Mapping receive queue buffer (0x0x7fd0da000000 size 536870912B) with nvidia-peermem mode
[02:45:52:111961][9664][DOCA][INF][tcp_queues.c:46][create_tcp_queues] Creating TCP Eth Rxq 1
[02:45:52:112166][9664][DOCA][INF][tcp_queues.c:91][create_tcp_queues] Mapping receive queue buffer (0x0x7fd0b8000000 size 536870912B) with nvidia-peermem mode
[02:45:52:226525][9664][DOCA][INF][icmp_queues.c:42][create_icmp_queues] Creating ICMP Eth Rxq 0
[02:45:52:226754][9664][DOCA][INF][icmp_queues.c:88][create_icmp_queues] Mapping receive queue buffer (0x0x7fd0d8a00000 size 8388608B) with nvidia-peermem mode
[02:45:52:289346][9664][DOCA][INF][gpu_packet_processing.c:415][main] Warm up CUDA kernels
[02:45:52:295576][9664][DOCA][INF][gpu_packet_processing.c:430][main] Launching CUDA kernels
[02:45:52:295595][9676][DOCA][INF][gpu_packet_processing.c:125][stats_core] Core 1 is reporting filter stats
[02:45:52:295598][9664][DOCA][INF][gpu_packet_processing.c:463][main] Waiting for termination

Seconds 5
[UDP] QUEUE: 0 DNS: 0 OTHER: 0 TOTAL: 0
[UDP] QUEUE: 1 DNS: 0 OTHER: 0 TOTAL: 0
[TCP] QUEUE: 0 HTTP: 0 HTTP HEAD: 0 HTTP GET: 0 HTTP POST: 0 TCP [SYN: 0 FIN: 0 ACK: 0] OTHER: 0 TOTAL: 0
[TCP] QUEUE: 1 HTTP: 0 HTTP HEAD: 0 HTTP GET: 0 HTTP POST: 0 TCP [SYN: 0 FIN: 0 ACK: 0] OTHER: 0 TOTAL: 0

Seconds 10
[UDP] QUEUE: 0 DNS: 0 OTHER: 0 TOTAL: 0
[UDP] QUEUE: 1 DNS: 0 OTHER: 0 TOTAL: 0
[TCP] QUEUE: 0 HTTP: 0 HTTP HEAD: 0 HTTP GET: 0 HTTP POST: 0 TCP [SYN: 0 FIN: 0 ACK: 0] OTHER: 0 TOTAL: 0
[TCP] QUEUE: 1 HTTP: 0 HTTP HEAD: 0 HTTP GET: 0 HTTP POST: 0 TCP [SYN: 0 FIN: 0 ACK: 0] OTHER: 0 TOTAL: 0

I did notice that “EAL: 1024 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size” is new. This warning is also there when running dpdk-testpmd. Does this warning matter?

gpu_packet_processing still receives no pakets. What other reason might be? Is it because of the NUMA configuration? Enclosed please find the outcome of mst status.

$ sudo mst status -v
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module is not loaded
PCI devices:
------------
DEVICE_TYPE             MST      PCI       RDMA            NET                                     NUMA
BlueField2(rev:1)       NA       04:00.1   mlx5_1          net-enp4s0f1np1                         -1

BlueField2(rev:1)       NA       04:00.0   mlx5_0          net-enp4s0f0np0                         -1

I am not familiar with NUMA and feel worried that the NUMA indicators for BlueField2 are -1. Please also let me know if NUMA is needed to use GPUNetIO. Any clues or suggestions to check why gpu_packet_processing still receives nothing are appreciated.

Thank you so much for your reply again.

cs25mtech02005 · February 19, 2025, 8:29am

Hey,
I see that you have only 1 numa node and your device is bound to none (as it shows -1) , id suggest you bind it via running
# echo 0 | sudo tee /sys/bus/pci/devices/<pci_bus_id>/numa_node as numa nodes start from 0 index
you could check which numa node your device is bound to by running
# cat /sys/bus/pci/devices/<pci_bus_id>/numa_node to confirm.

regarding hugetlbfs error you can do the following

sudo mkdir -p /dev/hugepages
sudo mount -t hugetlbfs nodev /dev/hugepages

also your dpdk-testpmd command was incomplete could you share the output by running this complete command

/opt/mellanox/dpdk/bin# ./dpdk-testpmd -l 0-3 -n 4 --socket-mem=1024 -- -i

if it gets killed ,also share output of the following command

# demsg | tail -20

hsinlichiu1 · February 19, 2025, 10:43am

Hello,

Thank you for your prompt reply. I follow your suggestions to bind the DPU and GPU to the only 1 numa node.

/opt/mellanox/dpdk/bin$ echo 0 | sudo tee /sys/bus/pci/devices/0000\:01\:00.0/numa_node
0
/opt/mellanox/dpdk/bin$ echo 0 | sudo tee /sys/bus/pci/devices/0000\:04\:00.0/numa_node
0
/opt/mellanox/dpdk/bin$ echo 0 | sudo tee /sys/bus/pci/devices/0000\:04\:00.1/numa_node
0
acarcubb@acarcubb:/opt/mellanox/dpdk/bin$
acarcubb@acarcubb:/opt/mellanox/dpdk/bin$ sudo mst status -v
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module is not loaded
PCI devices:
------------
DEVICE_TYPE             MST      PCI       RDMA            NET                                     NUMA
BlueField2(rev:1)       NA       04:00.1   mlx5_1          net-enp4s0f1np1                         0

BlueField2(rev:1)       NA       04:00.0   mlx5_0          net-enp4s0f0np0                         0

/opt/mellanox/dpdk/bin$ cat /sys/bus/pci/devices/0000\:01\:00.0/numa_node
0
/opt/mellanox/dpdk/bin$ cat /sys/bus/pci/devices/0000\:04\:00.0/numa_node
0
/opt/mellanox/dpdk/bin$ cat /sys/bus/pci/devices/0000\:04\:00.1/numa_node
0

They seem correct. I also checked GPU NUMA ID with “nvidia-smi topo -m”, but it seems not binding.

$ nvidia-smi topo -m
        GPU0    NIC0    NIC1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      PHB     PHB     0-15    0               N/A
NIC0    PHB      X      PIX
NIC1    PHB     PIX      X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_1

Then I continue on the dpdk-testpmd test. Here is the output.

$ sudo ./dpdk-testpmd -l 0-3 -n 4 --socket-mem=1024 -- -i
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: 1024 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size
EAL: VFIO support initialized
EAL: Probe PCI driver: gpu_cuda (10de:2230) device: 0000:01:00.0 (socket 0)
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:04:00.0 (socket 0)
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:04:00.1 (socket 0)
Interactive-mode selected
testpmd: create a new mbuf pool <mb_pool_0>: n=5486592, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
Port 0: B8:CE:F6:49:64:EA
Configuring Port 1 (socket 0)
Port 1: B8:CE:F6:49:64:EB
Checking link statuses...
Done
testpmd> start
io packet forwarding - ports=2 - cores=1 - streams=2 - NUMA support enabled, MP allocation mode: native
Logical Core 1 (socket 0) forwards packets on 2 streams:
  RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
  RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00

  io packet forwarding packets/burst=32
  nb forwarding cores=1 - nb forwarding ports=2
  port 0: RX queue number: 1 Tx queue number: 1
    Rx offloads=0x0 Tx offloads=0x10000
    RX queue: 0
      RX desc=256 - RX free threshold=64
      RX threshold registers: pthresh=0 hthresh=0  wthresh=0
      RX Offloads=0x0
    TX queue: 0
      TX desc=256 - TX free threshold=0
      TX threshold registers: pthresh=0 hthresh=0  wthresh=0
      TX offloads=0x10000 - TX RS bit threshold=0
  port 1: RX queue number: 1 Tx queue number: 1
    Rx offloads=0x0 Tx offloads=0x10000
    RX queue: 0
      RX desc=256 - RX free threshold=64
      RX threshold registers: pthresh=0 hthresh=0  wthresh=0
      RX Offloads=0x0
    TX queue: 0
      TX desc=256 - TX free threshold=0
      TX threshold registers: pthresh=0 hthresh=0  wthresh=0
      TX offloads=0x10000 - TX RS bit threshold=0
testpmd> start tx_first
Packet forwarding already started
testpmd>
testpmd> Telling cores to stop...
Waiting for lcores to finish...

  ---------------------- Forward statistics for port 0  ----------------------
  RX-packets: 26             RX-dropped: 0             RX-total: 26
  TX-packets: 0              TX-dropped: 0             TX-total: 0
  ----------------------------------------------------------------------------

  ---------------------- Forward statistics for port 1  ----------------------
  RX-packets: 0              RX-dropped: 0             RX-total: 0
  TX-packets: 26             TX-dropped: 0             TX-total: 26
  ----------------------------------------------------------------------------

  +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
  RX-packets: 26             RX-dropped: 0             RX-total: 26
  TX-packets: 26             TX-dropped: 0             TX-total: 26
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Done.
Stopping ports...
Port 0 is stopped
Port 1 is stopped
Done
Closing ports...
Port 0 is closed
Port 1 is closed
Done

Bye...

It seems working, not been killed. Enclosed is the output of running “sudo dmesg | tail -20”.

$ sudo dmesg | tail -20

[437957.363074] Initialized Arguments for Method [_DSM]:  (4 arguments defined for method invocation)
[437957.363074]   Arg0:   0000000099714e5a <Obj>           Buffer(16) 75 0B A5 D4 C7 65 F7 46
[437957.363078]   Arg1:   00000000128c0989 <Obj>           Integer 0000000000000102
[437957.363080]   Arg2:   000000002e89436b <Obj>           Integer 0000000000000010
[437957.363082]   Arg3:   00000000eddca523 <Obj>           Buffer(4) 00 10 52 44

[437957.363085] ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210730/psparse-529)
[439294.091082] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210730/dsfield-184)
[439294.092076] ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210730/dswload2-477)

[439294.093050] No Local Variables are initialized for Method [_DSM]

[439294.093051] Initialized Arguments for Method [_DSM]:  (4 arguments defined for method invocation)
[439294.093051]   Arg0:   000000003c0c0cf0 <Obj>           Buffer(16) 75 0B A5 D4 C7 65 F7 46
[439294.093055]   Arg1:   0000000023b48649 <Obj>           Integer 0000000000000102
[439294.093057]   Arg2:   00000000dcc91140 <Obj>           Integer 0000000000000010
[439294.093058]   Arg3:   00000000eb8b2fb2 <Obj>           Buffer(4) 00 10 52 44

[439294.093061] ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210730/psparse-529)

gpu_packet_processing still receives no packets. Do the results mean that DPDK is okay, but gpu_packet_processing/GPUNetIO is not? "Are there any additional tests or configurations I should verify? Or are there other commands I should run in the ‘testpmd>’ terminal?

Thank you again for your prompt reply.

p.s. I did follow the suggestions to mount hugepages, but the warning is still there. I am not sure if this would cause any problems.

sudo mkdir -p /dev/hugepages
sudo mount -t hugetlbfs nodev /dev/hugepages

Evotique · March 5, 2025, 12:39pm

Are you running the ping command on the same host as where you are running the program, i.e. locally pinging the interface? That won’t work. You need to send the packets from a separate host.

hsinlichiu1 · March 5, 2025, 2:55pm

Hello,

Thank you for your response.

I tried running the ping command from another host, but unfortunately, it still didn’t work.

Is there any additional information I can provide to help with the diagnosis? I would really appreciate any further guidance.

Additionally, I was wondering—does the connection between the GPU and DPU need to be in PIX mode for GPUNetIO to function? Do the PCIe slots for the GPU and DPU need to be under the same root complex to support PIX mode?

If that is the case, would this indicate a hardware limitation? Would enabling GPUNetIO require changing the motherboard, or even the CPU?

Thank you again for your time and assistance. I truly appreciate your help.

cs25mtech02005 · March 12, 2025, 11:47am

Hey ,
please share the screenshots confirming sending packets from Sender and the GPU packet processing application output
thanks

hsinlichiu1 · March 18, 2025, 6:42pm

Hello,

Apologies for the delay in recovering the system.

Attached is a screenshot of the sender, which is sending ping requests to the host server (where the NIC/DPU has the IP address 192.168.200.32). The ping replies stop when when gpu_packet_processing is initiated.

Enclosed is the screen shot of the host server, seems not processing any packets.

Please let me know if there is any other information I can provide to help diagnose the issue. Any debugging suggestions would be greatly appreciated.

Thank you again for your help!

cs25mtech02005 · March 26, 2025, 8:42am

are you using two hosts or a single host , looks like you are using the same host, id suggest you connect two seperate machines and then ping the interface IP, as if you ping fro the same host it implements weak host where packet are routed via the (lo) interface rather than the physical interface , you can still ping from the same host but you would have to set the two interfaces in different network namespaces and physically populate their arp tables., a better implementation is using two hosts

hsinlichiu1 · March 30, 2025, 5:16pm

Hello,

Thank you for your reply. I truly appreciate your time and assistance.

I indeed used two hosts for testing. I’m pleased to inform you that the issue has been resolved. After switching to a motherboard and CPU that support NUMA and ensuring that the PCIe slots for the GPU and DPU are on the same root complex, gpu_packet_processing application is now functioning well.

Once again, I sincerely appreciate your support and guidance.

palok · July 23, 2025, 9:02am

@hsinlichiu1 are you using Gpudirect Async KI in this example? Does the new motherboard has PCie Swith?

hsinlichiu1 · July 31, 2025, 5:26am

Hi,

I am using GPUDiect RDMA. I don’t think the new motherboard has PCIe switch. It seems that it can work as long as the GPU and DPU are under the same root complex of the same NUMA node.

Topic		Replies	Views
DOCA GPUNetIO does not receive packets on some machine Enterprise Networking	3	190	November 20, 2024
DOCA gpu_packet_processing runtime error Enterprise Networking networking , dpdk	5	550	June 18, 2024
Problem with DOCA GPU packet processing Getting Started & Resources a100 , gpunetio	2	253	December 24, 2024
Inline GPU Packet Processing with NVIDIA DOCA GPUNetIO Technical Blog	6	1020	July 23, 2025
Doca_gpu_packet_processing start failed Getting Started & Resources	0	66	June 2, 2025
Clarification on requirements for GPUDirect RDMA CUDA Programming and Performance	16	5243	November 7, 2023
DOCA Example App Runtime Error: errno=UNKNOWN-errno14 Getting Started & Resources	16	702	August 6, 2025
Issue with DOCA gpunetio_simple_receive – No Incoming UDP Packets Getting Started & Resources	1	92	March 10, 2025
DOCA: GPU Packet Processing - Failed to start mmap DOCA Driver call failure Getting Started & Resources	3	911	July 23, 2025
Doca_gpu_packet_processing Failed to start mmap DOCA Driver call failure Enterprise Networking	6	1340	September 29, 2023

DOCA GPUNetIO gpu_packet_processing receives no packets

Related topics