Hi all,
I am trying to run the gpunetio_simple_receive sample with DOCA 2.5.2, following the guide DOCA GPUNetIO - NVIDIA Docs.
After compiling simple_receive on Host A, I sent UDP packets from Host B, but Host A did not receive any of them. Below are the logs and ifconfig outputs from both hosts:
[host A: 192.168.200.32]
$ ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
inet6 fe80::42:81ff:fec2:7e20 prefixlen 64 scopeid 0x20<link>
ether 02:42:81:c2:7e:20 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 5 bytes 526 (526.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp4s0f0np0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.200.32 netmask 255.255.255.0 broadcast 192.168.200.255
inet6 fe80::bace:f6ff:fe49:64ea prefixlen 64 scopeid 0x20<link>
ether b8:ce:f6:49:64:ea txqueuelen 1000 (Ethernet)
RX packets 17 bytes 1438 (1.4 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 102 bytes 7484 (7.4 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp5s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.26 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::ca7f:54ff:fe67:d52b prefixlen 64 scopeid 0x20<link>
ether c8:7f:54:67:d5:2b txqueuelen 1000 (Ethernet)
RX packets 106603 bytes 50420938 (50.4 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 20416 bytes 1498307 (1.4 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device memory 0x85e00000-85efffff
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 180 bytes 21885 (21.8 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 180 bytes 21885 (21.8 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
tmfifo_net0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::21a:caff:feff:ff02 prefixlen 64 scopeid 0x20<link>
ether 00:1a:ca:ff:ff:02 txqueuelen 1000 (Ethernet)
RX packets 239077 bytes 10043710 (10.0 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 85 bytes 6046 (6.0 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
$ sudo LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda-12.5/lib64:/opt/mellanox/gdrcopy/src:/opt/mellanox/dpdk/lib/x86_64-linux-gnu:/opt/mellanox/doca/lib/x86_64-linux-gnu ./build/doca_gpunetio_simple_receive -n 04:00.0 -g 01:00.0
[08:49:46:775224][7122][DOCA][INF][gpunetio_simple_receive_main.c:159][main] Starting the sample
[08:49:48:697835][7122][DOCA][INF][gpunetio_simple_receive_main.c:189][main] Sample configuration:
GPU 01:00.0
NIC 04:00.0
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
TELEMETRY: No legacy callbacks, legacy socket not created
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:04:00.0 (socket -1)
[08:49:48:832543][7122][DOCA][WRN][engine_model.c:72][adapt_queue_depth] adapting queue depth to 128.
[08:49:49:405936][7122][DOCA][WRN][engine_port.c:321][port_driver_process_properties] detected representor used in VNF mode (driver port id 0)
EAL: Probe PCI driver: gpu_cuda (10de:2230) device: 0000:01:00.0 (socket -1)
[08:49:49:443917][7122][DOCA][INF][gpunetio_simple_receive_sample.c:425][create_rxq] Creating Sample Eth Rxq
[08:49:49:444259][7122][DOCA][INF][gpunetio_simple_receive_sample.c:466][create_rxq] Mapping receive queue buffer (0x0x7f5774000000 size 33554432B) with nvidia-peermem mode
[08:49:49:498094][7122][DOCA][INF][gpunetio_simple_receive_sample.c:610][gpunetio_simple_receive] Launching CUDA kernel to receive packets
[08:49:49:505775][7122][DOCA][INF][gpunetio_simple_receive_sample.c:614][gpunetio_simple_receive] Waiting for termination
^C[08:51:00:069538][7122][DOCA][INF][gpunetio_simple_receive_sample.c:45][signal_handler] Signal 2 received, preparing to exit!
[08:51:00:069547][7122][DOCA][INF][gpunetio_simple_receive_sample.c:620][gpunetio_simple_receive] Exiting from sample
[08:51:00:069966][7122][DOCA][INF][gpunetio_simple_receive_sample.c:362][destroy_rxq] Destroying Rxq
[08:51:00:109067][7122][DOCA][INF][gpunetio_simple_receive_sample.c:631][gpunetio_simple_receive] Sample finished successfully
[08:51:00:109083][7122][DOCA][INF][gpunetio_simple_receive_main.c:204][main] Sample finished successfully
[host B: 192.168.200.33]
$ ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:39:f6:fa:44 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp4s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.199 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::52eb:f6ff:fe29:22a2 prefixlen 64 scopeid 0x20<link>
ether 50:eb:f6:29:22:a2 txqueuelen 1000 (Ethernet)
RX packets 10107738 bytes 12815873507 (12.8 GB)
RX errors 0 dropped 15 overruns 0 frame 0
TX packets 7292619 bytes 4793595135 (4.7 GB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device memory 0x85700000-857fffff
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.200.33 netmask 255.255.255.0 broadcast 192.168.200.255
inet6 fe80::ba3f:d2ff:fe03:6a02 prefixlen 64 scopeid 0x20<link>
ether b8:3f:d2:03:6a:02 txqueuelen 1000 (Ethernet)
RX packets 6515 bytes 1613313 (1.6 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 3862 bytes 385136 (385.1 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 888 bytes 96632 (96.6 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 888 bytes 96632 (96.6 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
tmfifo_net0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.100.1 netmask 255.255.255.0 broadcast 192.168.100.255
inet6 fe80::21a:caff:feff:ff02 prefixlen 64 scopeid 0x20<link>
ether 00:1a:ca:ff:ff:02 txqueuelen 1000 (Ethernet)
RX packets 69518 bytes 4780373 (4.7 MB)
RX errors 0 dropped 3 overruns 0 frame 0
TX packets 810194 bytes 1213824341 (1.2 GB)
TX errors 0 dropped 118 overruns 0 carrier 0 collisions 0
$ nping --udp -c 10 -p 2090 192.168.200.32 --data-length 1024 --delay 500ms
Starting Nping 0.7.80 ( https://nmap.org/nping ) at 2025-02-17 08:50 UTC
SENT (0.0012s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (0.5013s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (1.0013s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (1.5018s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (2.0018s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (2.5023s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (3.0023s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (3.5028s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (4.0028s) UDP packet with 1024 bytes to 192.168.200.32:2090
SENT (4.5033s) UDP packet with 1024 bytes to 192.168.200.32:2090
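One way to check whether these datagrams reach Host A's port at all (for example, with the sample stopped) is to compare the kernel's receive counter for enp4s0f0np0 before and after running nping. Below is a minimal POSIX shell sketch, assuming the standard Linux sysfs counter /sys/class/net/<iface>/statistics/rx_packets; the helper name rx_delta is hypothetical and not part of DOCA.

```shell
#!/bin/sh
# Hypothetical helper (not part of DOCA): print how many packets the kernel
# counted on interface $1 during a window of $2 seconds.
rx_delta() {
  iface="$1"
  secs="$2"
  counter="/sys/class/net/$iface/statistics/rx_packets"
  if [ ! -r "$counter" ]; then
    echo "rx_delta: no readable counter for interface '$iface'" >&2
    return 1
  fi
  before=$(cat "$counter")
  sleep "$secs"
  after=$(cat "$counter")
  # Counters only grow, so the difference is the packets seen in the window.
  echo $((after - before))
}
```

Usage (assumed): source the script on Host A, run `rx_delta enp4s0f0np0 10`, and start the nping command on Host B within those 10 seconds. A delta of at least the 10 datagrams sent would suggest they reach the interface; a delta of 0 would suggest they are lost before reaching the host.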
Issues Encountered:
- Is my network configuration correct for this test?
- There are no error messages on either host, but I do not see any logs indicating received UDP packets on Host A.
- I noticed that the NUMA node of the DPU on both hosts is -1, which differs from the log shown in the guide, where EAL detects 2 NUMA nodes and places the NIC and the GPU on specific sockets (socket 0 and socket 1).
Could this discrepancy in NUMA node detection be causing the issue with simple_receive?
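For reference, the NUMA value shown by mst and by the EAL probe lines appears to come from the kernel's per-device sysfs entry, which can be read directly. Below is a minimal POSIX shell sketch, assuming the standard Linux file /sys/bus/pci/devices/<domain:bus:dev.fn>/numa_node; the helper name pci_numa_node is hypothetical. The kernel reports -1 there when the platform does not associate the device with a NUMA node, which is often the case on single-node machines such as Host A (EAL: Detected NUMA nodes: 1).

```shell
#!/bin/sh
# Hypothetical helper: print the NUMA node Linux reports for a PCI device.
# $1 must be the full address including the domain, e.g. 0000:04:00.0.
# Prints -1 if the platform gave the device no NUMA affinity, or "unknown"
# if the device (or its sysfs entry) does not exist.
pci_numa_node() {
  f="/sys/bus/pci/devices/$1/numa_node"
  if [ -r "$f" ]; then
    cat "$f"
  else
    echo "unknown"
  fi
}
```

Usage (assumed): on Host A, `for bdf in 0000:04:00.0 0000:01:00.0; do echo "$bdf: $(pci_numa_node "$bdf")"; done` prints the values for the BlueField port and the GPU passed to the sample with -n and -g.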
[log message from the documentation]
$ sudo LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/${YOUR_CUDA_VERSION}/lib64:/opt/mellanox/gdrcopy/src:/opt/mellanox/dpdk/lib/x86_64-linux-gnu:/opt/mellanox/doca/lib/x86_64-linux-gnu ./build/doca_gpunetio_simple_receive -n 17:00.1 -g ca:00.0
[11:00:30:397080][2328673][DOCA][INF][gpunetio_simple_receive_main.c:159][main] Starting the sample
[11:00:30:652622][2328673][DOCA][INF][gpunetio_simple_receive_main.c:189][main] Sample configuration:
GPU ca:00.0
NIC 17:00.1
EAL: Detected CPU lcores: 128
**EAL: Detected NUMA nodes: 2**
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
TELEMETRY: No legacy callbacks, legacy socket not created
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:17:00.1 (**socket 0**)
[11:00:31:036760][2328673][DOCA][WRN][engine_model.c:72][adapt_queue_depth] adapting queue depth to 128.
[11:00:31:928926][2328673][DOCA][WRN][engine_port.c:321][port_driver_process_properties] detected representor used in VNF mode (driver port id 0)
EAL: Probe PCI driver: gpu_cuda (10de:20b5) device: 0000:ca:00.0 (**socket 1**)
[11:00:31:977261][2328673][DOCA][INF][gpunetio_simple_receive_sample.c:425][create_rxq] Creating Sample Eth Rxq
[11:00:31:977841][2328673][DOCA][INF][gpunetio_simple_receive_sample.c:466][create_rxq] Mapping receive queue buffer (0x0x7f86cc000000 size 33554432B) with nvidia-peermem mode
[11:00:32:043182][2328673][DOCA][INF][gpunetio_simple_receive_sample.c:610][gpunetio_simple_receive] Launching CUDA kernel to receive packets
[11:00:32:055193][2328673][DOCA][INF][gpunetio_simple_receive_sample.c:614][gpunetio_simple_receive] Waiting for termination
MST Results from Both Hosts:
[host A]
$ sudo mst status -v
MST modules:
------------
MST PCI module is not loaded
MST PCI configuration module is not loaded
PCI devices:
------------
DEVICE_TYPE MST PCI RDMA NET NUMA
BlueField2(rev:1) NA 04:00.1 mlx5_1 net-enp4s0f1np1 -1
BlueField2(rev:1) NA 04:00.0 mlx5_0 net-enp4s0f0np0 -1
[host B]
$ sudo mst status -v
MST modules:
------------
MST PCI module is not loaded
MST PCI configuration module loaded
PCI devices:
------------
DEVICE_TYPE MST PCI RDMA NET NUMA
BlueField2(rev:1) /dev/mst/mt41686_pciconf0.1 05:00.1 rocep5s0f1 net-eth1 -1
BlueField2(rev:1) /dev/mst/mt41686_pciconf0 05:00.0 rocep5s0f0 net-eth0
I would appreciate any suggestions on troubleshooting this issue. Please let me know if additional information is needed.
Thank you!
Best Regards,
JunXian