Issue with DOCA gpunetio_simple_receive – No Incoming UDP Packets

Hi all,

I am trying to run the gpunetio_simple_receive sample from DOCA library version 2.5.2. The guide I followed is: DOCA GPUNetIO - NVIDIA Docs.
After compiling the simple_receive sample on Host A, I sent UDP packets from Host B, but Host A did not receive any of them. Below are the logs and ifconfig output from both hosts:

[host A: 192.168.200.32]

$ ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        inet6 fe80::42:81ff:fec2:7e20  prefixlen 64  scopeid 0x20<link>
        ether 02:42:81:c2:7e:20  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5  bytes 526 (526.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp4s0f0np0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.200.32  netmask 255.255.255.0  broadcast 192.168.200.255
        inet6 fe80::bace:f6ff:fe49:64ea  prefixlen 64  scopeid 0x20<link>
        ether b8:ce:f6:49:64:ea  txqueuelen 1000  (Ethernet)
        RX packets 17  bytes 1438 (1.4 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 102  bytes 7484 (7.4 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp5s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.26  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::ca7f:54ff:fe67:d52b  prefixlen 64  scopeid 0x20<link>
        ether c8:7f:54:67:d5:2b  txqueuelen 1000  (Ethernet)
        RX packets 106603  bytes 50420938 (50.4 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 20416  bytes 1498307 (1.4 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0x85e00000-85efffff

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 180  bytes 21885 (21.8 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 180  bytes 21885 (21.8 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

tmfifo_net0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::21a:caff:feff:ff02  prefixlen 64  scopeid 0x20<link>
        ether 00:1a:ca:ff:ff:02  txqueuelen 1000  (Ethernet)
        RX packets 239077  bytes 10043710 (10.0 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 85  bytes 6046 (6.0 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

$ sudo LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda-12.5/lib64:/opt/mellanox/gdrcopy/src:/opt/mellanox/dpdk/lib/x86_64-linux-gnu:/opt/mellanox/doca/lib/x86_64-linux-gnu ./build/doca_gpunetio_simple_receive -n 04:00.0 -g 01:00.0
[08:49:46:775224][7122][DOCA][INF][gpunetio_simple_receive_main.c:159][main] Starting the sample
[08:49:48:697835][7122][DOCA][INF][gpunetio_simple_receive_main.c:189][main] Sample configuration:
        GPU 01:00.0
        NIC 04:00.0

EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
TELEMETRY: No legacy callbacks, legacy socket not created
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:04:00.0 (socket -1)
[08:49:48:832543][7122][DOCA][WRN][engine_model.c:72][adapt_queue_depth] adapting queue depth to 128.
[08:49:49:405936][7122][DOCA][WRN][engine_port.c:321][port_driver_process_properties] detected representor used in VNF mode (driver port id 0)
EAL: Probe PCI driver: gpu_cuda (10de:2230) device: 0000:01:00.0 (socket -1)
[08:49:49:443917][7122][DOCA][INF][gpunetio_simple_receive_sample.c:425][create_rxq] Creating Sample Eth Rxq

[08:49:49:444259][7122][DOCA][INF][gpunetio_simple_receive_sample.c:466][create_rxq] Mapping receive queue buffer (0x0x7f5774000000 size 33554432B) with nvidia-peermem mode
[08:49:49:498094][7122][DOCA][INF][gpunetio_simple_receive_sample.c:610][gpunetio_simple_receive] Launching CUDA kernel to receive packets
[08:49:49:505775][7122][DOCA][INF][gpunetio_simple_receive_sample.c:614][gpunetio_simple_receive] Waiting for termination
^C[08:51:00:069538][7122][DOCA][INF][gpunetio_simple_receive_sample.c:45][signal_handler] Signal 2 received, preparing to exit!
[08:51:00:069547][7122][DOCA][INF][gpunetio_simple_receive_sample.c:620][gpunetio_simple_receive] Exiting from sample
[08:51:00:069966][7122][DOCA][INF][gpunetio_simple_receive_sample.c:362][destroy_rxq] Destroying Rxq
[08:51:00:109067][7122][DOCA][INF][gpunetio_simple_receive_sample.c:631][gpunetio_simple_receive] Sample finished successfully
[08:51:00:109083][7122][DOCA][INF][gpunetio_simple_receive_main.c:204][main] Sample finished successfully

[host B: 192.168.200.33]

$ ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:39:f6:fa:44  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp4s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.199  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::52eb:f6ff:fe29:22a2  prefixlen 64  scopeid 0x20<link>
        ether 50:eb:f6:29:22:a2  txqueuelen 1000  (Ethernet)
        RX packets 10107738  bytes 12815873507 (12.8 GB)
        RX errors 0  dropped 15  overruns 0  frame 0
        TX packets 7292619  bytes 4793595135 (4.7 GB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0x85700000-857fffff

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.200.33  netmask 255.255.255.0  broadcast 192.168.200.255
        inet6 fe80::ba3f:d2ff:fe03:6a02  prefixlen 64  scopeid 0x20<link>
        ether b8:3f:d2:03:6a:02  txqueuelen 1000  (Ethernet)
        RX packets 6515  bytes 1613313 (1.6 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3862  bytes 385136 (385.1 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 888  bytes 96632 (96.6 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 888  bytes 96632 (96.6 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

tmfifo_net0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.100.1  netmask 255.255.255.0  broadcast 192.168.100.255
        inet6 fe80::21a:caff:feff:ff02  prefixlen 64  scopeid 0x20<link>
        ether 00:1a:ca:ff:ff:02  txqueuelen 1000  (Ethernet)
        RX packets 69518  bytes 4780373 (4.7 MB)
        RX errors 0  dropped 3  overruns 0  frame 0
        TX packets 810194  bytes 1213824341 (1.2 GB)
        TX errors 0  dropped 118 overruns 0  carrier 0  collisions 0

$ nping --udp -c 10 -p 2090 192.168.200.32 --data-length 1024 --delay 500ms

Starting Nping 0.7.80 ( https://nmap.org/nping ) at 2025-02-17 08:50 UTC
SENT (0.0012s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (0.5013s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (1.0013s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (1.5018s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (2.0018s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (2.5023s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (3.0023s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (3.5028s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (4.0028s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (4.5033s) UDP packet with 1024 bytes to 192.168.200.32:2090
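(For anyone reproducing this without nping: the same test traffic can be generated with a short Python script. The destination IP and port below mirror my nping invocation and are placeholders for your own setup; this is just a convenience sketch, not part of the sample.)

```python
import socket
import time

def send_udp(dst_ip, dst_port, payload_len=1024, count=10, delay=0.5):
    """Send `count` UDP datagrams of `payload_len` zero bytes, like
    `nping --udp -c count -p dst_port dst_ip --data-length payload_len`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"\x00" * payload_len
    sent = 0
    try:
        for _ in range(count):
            # sendto returns the number of bytes queued for transmission
            if sock.sendto(payload, (dst_ip, dst_port)) == payload_len:
                sent += 1
            time.sleep(delay)
    finally:
        sock.close()
    return sent

if __name__ == "__main__":
    # Same parameters as the nping run above (adjust IP/port to your hosts)
    print(f"sent {send_udp('192.168.200.32', 2090)} packets")
```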

Issues Encountered:

  • Is my network configuration correct for this test?
  • There are no error messages on either host, but Host A never logs any received UDP packets.
  • The NUMA node of the DPU is reported as -1 on both hosts, which differs from the logs in the guide, where two NUMA nodes are detected and each device is associated with a specific socket.

Could this discrepancy in NUMA node detection be causing the issue with simple_receive?
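(For context on that -1: the value comes from sysfs, and reading it directly is equivalent to `cat /sys/bus/pci/devices/0000:04:00.0/numa_node`. The small helper below is just a convenience sketch I used to check it; the BDF strings are examples from this post, not part of the DOCA sample.)

```python
from pathlib import Path

def pci_numa_node(bdf: str, domain: str = "0000") -> int:
    """Return the NUMA node sysfs reports for a PCI device.

    A value of -1 means the kernel/BIOS exposes no NUMA affinity for
    the device (single-node system, or NUMA disabled in the BIOS),
    which matches what `mst status -v` shows on both hosts here.
    """
    node_file = Path(f"/sys/bus/pci/devices/{domain}:{bdf}/numa_node")
    if not node_file.exists():
        return -1
    return int(node_file.read_text().strip())

if __name__ == "__main__":
    # BDFs of the BlueField-2 ports on Host A (example values)
    for bdf in ("04:00.0", "04:00.1"):
        print(bdf, "->", pci_numa_node(bdf))
```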

[log message from the documentation]

$ sudo LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/${YOUR_CUDA_VERSION}/lib64:/opt/mellanox/gdrcopy/src:/opt/mellanox/dpdk/lib/x86_64-linux-gnu:/opt/mellanox/doca/lib/x86_64-linux-gnu ./build/doca_gpunetio_simple_receive -n 17:00.1 -g ca:00.0
[11:00:30:397080][2328673][DOCA][INF][gpunetio_simple_receive_main.c:159][main] Starting the sample
[11:00:30:652622][2328673][DOCA][INF][gpunetio_simple_receive_main.c:189][main] Sample configuration:
	GPU ca:00.0
	NIC 17:00.1
 
EAL: Detected CPU lcores: 128
**EAL: Detected NUMA nodes: 2**
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
TELEMETRY: No legacy callbacks, legacy socket not created
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:17:00.1 (**socket 0**)
[11:00:31:036760][2328673][DOCA][WRN][engine_model.c:72][adapt_queue_depth] adapting queue depth to 128.
[11:00:31:928926][2328673][DOCA][WRN][engine_port.c:321][port_driver_process_properties] detected representor used in VNF mode (driver port id 0)
EAL: Probe PCI driver: gpu_cuda (10de:20b5) device: 0000:ca:00.0 (**socket 1**)
[11:00:31:977261][2328673][DOCA][INF][gpunetio_simple_receive_sample.c:425][create_rxq] Creating Sample Eth Rxq
 
[11:00:31:977841][2328673][DOCA][INF][gpunetio_simple_receive_sample.c:466][create_rxq] Mapping receive queue buffer (0x0x7f86cc000000 size 33554432B) with nvidia-peermem mode
[11:00:32:043182][2328673][DOCA][INF][gpunetio_simple_receive_sample.c:610][gpunetio_simple_receive] Launching CUDA kernel to receive packets
[11:00:32:055193][2328673][DOCA][INF][gpunetio_simple_receive_sample.c:614][gpunetio_simple_receive] Waiting for termination

MST Results from Both Hosts:

[host A]

sudo mst status -v
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module is not loaded
PCI devices:
------------
DEVICE_TYPE             MST      PCI       RDMA            NET                                     NUMA
BlueField2(rev:1)       NA       04:00.1   mlx5_1          net-enp4s0f1np1                         -1

BlueField2(rev:1)       NA       04:00.0   mlx5_0          net-enp4s0f0np0                         -1

[host B]

sudo mst status -v
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module loaded
PCI devices:
------------
DEVICE_TYPE             MST                           PCI       RDMA            NET                                     NUMA
BlueField2(rev:1)       /dev/mst/mt41686_pciconf0.1   05:00.1   rocep5s0f1      net-eth1                                -1

BlueField2(rev:1)       /dev/mst/mt41686_pciconf0     05:00.0   rocep5s0f0      net-eth0 

I would appreciate any suggestions on troubleshooting this issue. Please let me know if additional information is needed.
Thank you!

Best Regards,
JunXian

Hi,

As you mentioned, it’s possible that the discrepancy in NUMA node detection is affecting simple_receive. I suggest checking in the BIOS that NUMA is enabled, and verifying that all drivers and firmware are up to date.
If the issue still persists after that, I suggest opening a case with enterprisesupport@nvidia.com; it will be handled based on entitlement.

Thanks
Jonathan.