Issue with DOCA gpunetio_simple_receive – No Incoming UDP Packets

Hi all,

I am trying to run the gpunetio_simple_receive sample from DOCA library version 2.5.2. The guide I followed is: DOCA GPUNetIO - NVIDIA Docs.
After compiling the simple_receive sample on Host A, I sent UDP packets from Host B, but Host A did not receive any of them. Below are the logs and ifconfig output from both hosts:

[host A: 192.168.200.32]

$ ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        inet6 fe80::42:81ff:fec2:7e20  prefixlen 64  scopeid 0x20<link>
        ether 02:42:81:c2:7e:20  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5  bytes 526 (526.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp4s0f0np0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.200.32  netmask 255.255.255.0  broadcast 192.168.200.255
        inet6 fe80::bace:f6ff:fe49:64ea  prefixlen 64  scopeid 0x20<link>
        ether b8:ce:f6:49:64:ea  txqueuelen 1000  (Ethernet)
        RX packets 17  bytes 1438 (1.4 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 102  bytes 7484 (7.4 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp5s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.26  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::ca7f:54ff:fe67:d52b  prefixlen 64  scopeid 0x20<link>
        ether c8:7f:54:67:d5:2b  txqueuelen 1000  (Ethernet)
        RX packets 106603  bytes 50420938 (50.4 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 20416  bytes 1498307 (1.4 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0x85e00000-85efffff

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 180  bytes 21885 (21.8 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 180  bytes 21885 (21.8 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

tmfifo_net0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::21a:caff:feff:ff02  prefixlen 64  scopeid 0x20<link>
        ether 00:1a:ca:ff:ff:02  txqueuelen 1000  (Ethernet)
        RX packets 239077  bytes 10043710 (10.0 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 85  bytes 6046 (6.0 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

$ sudo LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda-12.5/lib64:/opt/mellanox/gdrcopy/src:/opt/mellanox/dpdk/lib/x86_64-linux-gnu:/opt/mellanox/doca/lib/x86_64-linux-gnu ./build/doca_gpunetio_simple_receive -n 04:00.0 -g 01:00.0
[08:49:46:775224][7122][DOCA][INF][gpunetio_simple_receive_main.c:159][main] Starting the sample
[08:49:48:697835][7122][DOCA][INF][gpunetio_simple_receive_main.c:189][main] Sample configuration:
        GPU 01:00.0
        NIC 04:00.0

EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
TELEMETRY: No legacy callbacks, legacy socket not created
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:04:00.0 (socket -1)
[08:49:48:832543][7122][DOCA][WRN][engine_model.c:72][adapt_queue_depth] adapting queue depth to 128.
[08:49:49:405936][7122][DOCA][WRN][engine_port.c:321][port_driver_process_properties] detected representor used in VNF mode (driver port id 0)
EAL: Probe PCI driver: gpu_cuda (10de:2230) device: 0000:01:00.0 (socket -1)
[08:49:49:443917][7122][DOCA][INF][gpunetio_simple_receive_sample.c:425][create_rxq] Creating Sample Eth Rxq

[08:49:49:444259][7122][DOCA][INF][gpunetio_simple_receive_sample.c:466][create_rxq] Mapping receive queue buffer (0x0x7f5774000000 size 33554432B) with nvidia-peermem mode
[08:49:49:498094][7122][DOCA][INF][gpunetio_simple_receive_sample.c:610][gpunetio_simple_receive] Launching CUDA kernel to receive packets
[08:49:49:505775][7122][DOCA][INF][gpunetio_simple_receive_sample.c:614][gpunetio_simple_receive] Waiting for termination
^C[08:51:00:069538][7122][DOCA][INF][gpunetio_simple_receive_sample.c:45][signal_handler] Signal 2 received, preparing to exit!
[08:51:00:069547][7122][DOCA][INF][gpunetio_simple_receive_sample.c:620][gpunetio_simple_receive] Exiting from sample
[08:51:00:069966][7122][DOCA][INF][gpunetio_simple_receive_sample.c:362][destroy_rxq] Destroying Rxq
[08:51:00:109067][7122][DOCA][INF][gpunetio_simple_receive_sample.c:631][gpunetio_simple_receive] Sample finished successfully
[08:51:00:109083][7122][DOCA][INF][gpunetio_simple_receive_main.c:204][main] Sample finished successfully

[host B: 192.168.200.33]

$ ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:39:f6:fa:44  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp4s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.199  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::52eb:f6ff:fe29:22a2  prefixlen 64  scopeid 0x20<link>
        ether 50:eb:f6:29:22:a2  txqueuelen 1000  (Ethernet)
        RX packets 10107738  bytes 12815873507 (12.8 GB)
        RX errors 0  dropped 15  overruns 0  frame 0
        TX packets 7292619  bytes 4793595135 (4.7 GB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0x85700000-857fffff

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.200.33  netmask 255.255.255.0  broadcast 192.168.200.255
        inet6 fe80::ba3f:d2ff:fe03:6a02  prefixlen 64  scopeid 0x20<link>
        ether b8:3f:d2:03:6a:02  txqueuelen 1000  (Ethernet)
        RX packets 6515  bytes 1613313 (1.6 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3862  bytes 385136 (385.1 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 888  bytes 96632 (96.6 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 888  bytes 96632 (96.6 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

tmfifo_net0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.100.1  netmask 255.255.255.0  broadcast 192.168.100.255
        inet6 fe80::21a:caff:feff:ff02  prefixlen 64  scopeid 0x20<link>
        ether 00:1a:ca:ff:ff:02  txqueuelen 1000  (Ethernet)
        RX packets 69518  bytes 4780373 (4.7 MB)
        RX errors 0  dropped 3  overruns 0  frame 0
        TX packets 810194  bytes 1213824341 (1.2 GB)
        TX errors 0  dropped 118 overruns 0  carrier 0  collisions 0

$ nping --udp -c 10 -p 2090 192.168.200.32 --data-length 1024 --delay 500ms

Starting Nping 0.7.80 ( https://nmap.org/nping ) at 2025-02-17 08:50 UTC
SENT (0.0012s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (0.5013s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (1.0013s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (1.5018s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (2.0018s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (2.5023s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (3.0023s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (3.5028s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (4.0028s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (4.5033s) UDP packet with 1024 bytes to 192.168.200.32:2090
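(For anyone reproducing this without nping: the same test traffic can be generated with a short Python script. The destination IP and port below mirror my nping invocation and are placeholders for your own setup; this is just a convenience sketch, not part of the sample.)

```python
import socket
import time

def send_udp(dst_ip, dst_port, payload_len=1024, count=10, delay=0.5):
    """Send `count` UDP datagrams of `payload_len` zero bytes, like
    `nping --udp -c count -p dst_port dst_ip --data-length payload_len`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"\x00" * payload_len
    sent = 0
    try:
        for _ in range(count):
            # sendto returns the number of bytes queued for transmission
            if sock.sendto(payload, (dst_ip, dst_port)) == payload_len:
                sent += 1
            time.sleep(delay)
    finally:
        sock.close()
    return sent

if __name__ == "__main__":
    # Same parameters as the nping run above (adjust IP/port to your hosts)
    print(f"sent {send_udp('192.168.200.32', 2090)} packets")
```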

Issues Encountered:

  • Is my network configuration correct for this test?
  • There are no error messages on either host, but Host A never logs any received UDP packets.
  • The NUMA node of the DPU is reported as -1 on both hosts, which differs from the logs in the guide, where two NUMA nodes are detected and each device is associated with a specific socket.

Could this discrepancy in NUMA node detection be causing the issue with simple_receive?
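(For context on that -1: the value comes from sysfs, and reading it directly is equivalent to `cat /sys/bus/pci/devices/0000:04:00.0/numa_node`. The small helper below is just a convenience sketch I used to check it; the BDF strings are examples from this post, not part of the DOCA sample.)

```python
from pathlib import Path

def pci_numa_node(bdf: str, domain: str = "0000") -> int:
    """Return the NUMA node sysfs reports for a PCI device.

    A value of -1 means the kernel/BIOS exposes no NUMA affinity for
    the device (single-node system, or NUMA disabled in the BIOS),
    which matches what `mst status -v` shows on both hosts here.
    """
    node_file = Path(f"/sys/bus/pci/devices/{domain}:{bdf}/numa_node")
    if not node_file.exists():
        return -1
    return int(node_file.read_text().strip())

if __name__ == "__main__":
    # BDFs of the BlueField-2 ports on Host A (example values)
    for bdf in ("04:00.0", "04:00.1"):
        print(bdf, "->", pci_numa_node(bdf))
```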

[log message from the documentation]

$ sudo LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/${YOUR_CUDA_VERSION}/lib64:/opt/mellanox/gdrcopy/src:/opt/mellanox/dpdk/lib/x86_64-linux-gnu:/opt/mellanox/doca/lib/x86_64-linux-gnu ./build/doca_gpunetio_simple_receive -n 17:00.1 -g ca:00.0
[11:00:30:397080][2328673][DOCA][INF][gpunetio_simple_receive_main.c:159][main] Starting the sample
[11:00:30:652622][2328673][DOCA][INF][gpunetio_simple_receive_main.c:189][main] Sample configuration:
	GPU ca:00.0
	NIC 17:00.1
 
EAL: Detected CPU lcores: 128
**EAL: Detected NUMA nodes: 2**
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
TELEMETRY: No legacy callbacks, legacy socket not created
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:17:00.1 (**socket 0**)
[11:00:31:036760][2328673][DOCA][WRN][engine_model.c:72][adapt_queue_depth] adapting queue depth to 128.
[11:00:31:928926][2328673][DOCA][WRN][engine_port.c:321][port_driver_process_properties] detected representor used in VNF mode (driver port id 0)
EAL: Probe PCI driver: gpu_cuda (10de:20b5) device: 0000:ca:00.0 (**socket 1**)
[11:00:31:977261][2328673][DOCA][INF][gpunetio_simple_receive_sample.c:425][create_rxq] Creating Sample Eth Rxq
 
[11:00:31:977841][2328673][DOCA][INF][gpunetio_simple_receive_sample.c:466][create_rxq] Mapping receive queue buffer (0x0x7f86cc000000 size 33554432B) with nvidia-peermem mode
[11:00:32:043182][2328673][DOCA][INF][gpunetio_simple_receive_sample.c:610][gpunetio_simple_receive] Launching CUDA kernel to receive packets
[11:00:32:055193][2328673][DOCA][INF][gpunetio_simple_receive_sample.c:614][gpunetio_simple_receive] Waiting for termination

MST Results from Both Hosts:

[host A]

sudo mst status -v
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module is not loaded
PCI devices:
------------
DEVICE_TYPE             MST      PCI       RDMA            NET                                     NUMA
BlueField2(rev:1)       NA       04:00.1   mlx5_1          net-enp4s0f1np1                         -1

BlueField2(rev:1)       NA       04:00.0   mlx5_0          net-enp4s0f0np0                         -1

[host B]

sudo mst status -v
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module loaded
PCI devices:
------------
DEVICE_TYPE             MST                           PCI       RDMA            NET                                     NUMA
BlueField2(rev:1)       /dev/mst/mt41686_pciconf0.1   05:00.1   rocep5s0f1      net-eth1                                -1

BlueField2(rev:1)       /dev/mst/mt41686_pciconf0     05:00.0   rocep5s0f0      net-eth0 

I would appreciate any suggestions on troubleshooting this issue. Please let me know if additional information is needed.
Thank you!

Best Regards,
JunXian

Hi,

As you mentioned, it’s possible that the discrepancy in NUMA node detection is affecting simple_receive. I suggest checking in the BIOS that NUMA is enabled, and verifying that all drivers and firmware are up to date.
If the issue still persists after that, I suggest opening a case with enterprisesupport@nvidia.com; it will be handled based on entitlement.

Thanks
Jonathan.