DOCA GPUNetIO does not receive packets on some machines

Hi all,

Using the original gpunetio_simple_receive sample with DOCA 2.8, we receive packets on one machine (AMD EPYC 9124, AsusTek RS720A-E12-RS12 motherboard) but not on the other (i9-12900KS, Z790 Taichi Lite). On the failing machine the sample starts and prints its log, but no received packets are ever reported:

sudo -E ./build/doca_gpunetio_simple_receive -g 02:00.0 -n 01:00.0 -l 50 --sdk-log-level 50
[13:47:42:224995][7618][DOCA][INF][gpunetio_simple_receive_main.c:167][main] Starting the sample
[13:47:42:490730][7618][DOCA][INF][gpunetio_simple_receive_main.c:197][main] Sample configuration:
	GPU 02:00.0
	NIC 01:00.0

[13:47:42:490873][7618][DOCA][INF][doca_dev.cpp:579][doca_devinfo_create_list] Devinfo list 0x610bcfbc9288: Added device=0x610bcfbc9500 to devinfo list
[13:47:42:490881][7618][DOCA][INF][doca_dev.cpp:579][doca_devinfo_create_list] Devinfo list 0x610bcfbc9288: Added device=0x610bcfbc9580 to devinfo list
[13:47:42:490886][7618][DOCA][INF][doca_dev.cpp:588][doca_devinfo_create_list] Devinfo list 0x610bcfbc9288 was created
[13:47:42:497999][7618][DOCA][INF][doca_dev.cpp:1004][doca_dev_open] Local device 0x610bcfbc9500 was opened
[13:47:42:498022][7618][DOCA][INF][doca_dev.cpp:147][dev_put] Device 0x610bcfbc9580 was destroyed
[13:47:42:498026][7618][DOCA][INF][doca_dev.cpp:669][doca_devinfo_destroy_list] Devinfo list 0x610bcfbc9288 was destroyed
[13:47:42:498038][7618][DOCA][WRN][engine_model.c:90][adapt_queue_depth] adapting queue depth to 128.
[13:47:42:498041][7618][DOCA][INF][engine_model.c:151][engine_model_init] engine model defined with mode=vnf
[13:47:42:498043][7618][DOCA][INF][engine_model.c:152][engine_model_init] engine model defined with nr_pipe_queues=1
[13:47:42:498045][7618][DOCA][INF][engine_model.c:153][engine_model_init] engine model defined with pipe_queue_depth=128
[13:47:42:498047][7618][DOCA][INF][engine_model.c:155][engine_model_init] engine model defined in isolated mode
[13:47:42:498049][7618][DOCA][INF][engine_model.c:156][engine_model_init] engine model defined RSS with nr_queues=0
[13:47:42:498050][7618][DOCA][INF][engine_model.c:157][engine_model_init] engine model defined with nr_counters=524228
[13:47:42:498052][7618][DOCA][INF][engine_model.c:158][engine_model_init] engine model defined with nr_meters=0
[13:47:42:498054][7618][DOCA][INF][engine_model.c:159][engine_model_init] engine model defined with nr_acl_collisions=3
[13:47:42:498259][7618][DOCA][INF][engine_field_mapping.c:109][engine_field_mapping_init] Engine field mapping initialized
[13:47:42:498264][7618][DOCA][INF][engine_shared_resources.c:155][engine_shared_resources_init] Engine shared resources initialized successfully
EAL: Detected CPU lcores: 24
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
TELEMETRY: No legacy callbacks, legacy socket not created
[13:47:42:587489][7618][DOCA][INF][hws_port.c:960][hws_port_module_init] hws port module init
[13:47:42:587493][7618][DOCA][INF][hws_matcher.c:1418][hws_matcher_module_init] Initializing hws matcher successfully
[13:47:42:587494][7618][DOCA][INF][hws_flow.c:62][hws_flow_module_init] Initializing dpdk flow successfully
[13:47:42:587496][7618][DOCA][INF][hws_resource_manager.c:210][hws_resource_manager_module_init] Dpdk resource manager register completed
[13:47:42:606149][7618][DOCA][INF][hws_pipe_items.c:214][hws_pipe_items_module_init] Initialized dpdk pipe items module
[13:47:42:606159][7618][DOCA][INF][hws_pipe_geneve_opt.c:125][hws_pipe_geneve_opt_module_init] Initialized hws pipe GENEVE options module
[13:47:42:606206][7618][DOCA][INF][hws_pipe.c:199][hws_pipe_module_init] Dpdk pipe initialized successfully
[13:47:42:606208][7618][DOCA][INF][hws_layer.c:150][hws_layer_register] Dpdk layer register completed
[13:47:42:606283][7618][DOCA][INF][doca_flow_match.c:643][doca_flow_match_init] Doca flow match UDS initialized
[13:47:42:606379][7618][DOCA][INF][doca_flow_actions.c:1347][doca_flow_actions_init] Doca flow actions UDS initialized
[13:47:42:606384][7618][DOCA][INF][doca_flow_monitor.c:121][doca_flow_monitor_init] Doca flow monitor UDS initialized
[13:47:42:606388][7618][DOCA][INF][doca_flow_layer.c:94][doca_flow_layer_init] Doca flow layer initialized
[13:47:42:606390][7618][DOCA][INF][doca_flow.c:629][doca_flow_init] Doca flow initialized successfully
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:01:00.0 (socket -1)
[13:47:43:051010][7618][DOCA][INF][utils_hash_table.c:119][utils_hash_table_create] hashmatcher a_tmplt_t port 0 created
[13:47:43:051052][7618][DOCA][INF][utils_hash_table.c:119][utils_hash_table_create] hashmatcher p_tmplt_t port 0 created
[13:47:43:051265][7618][DOCA][INF][utils_hash_table.c:119][utils_hash_table_create] hashmatcher tbl_mgr port 0 created
[13:47:43:052481][7618][DOCA][INF][hws_meter_profiles.c:202][hws_meter_profiles_create] Created meter profiles on port 0 with 2 caches, 128 profiles
[13:47:43:672222][7618][DOCA][INF][hws_port.c:1197][hws_port_create] Hws port 0 initialized successfully with 2 queues
[13:47:43:708110][7618][DOCA][INF][doca_flow.c:1555][doca_flow_port_start] doca flow port with id=0 started
EAL: Probe PCI driver: gpu_cuda (10de:2531) device: 0000:02:00.0 (socket -1)
[13:47:43:709113][7618][DOCA][INF][doca_sub_dev.cpp:45][priv_doca_sub_dev_gpu_ops_set] sub_dev: gpu_ops was set to 0x7d50e4ad6d20
[13:47:43:709129][7618][DOCA][INF][gpunetio_simple_receive_sample.c:475][create_rxq] Creating Sample Eth Rxq
[13:47:43:709143][7618][DOCA][INF][doca_eth_rxq.c:1955][doca_eth_rxq_set_type] ETH_RXQ 0x610bcfd5b6c0: queue_type was set to DOCA_ETH_RXQ_TYPE_CYCLIC
[13:47:43:709165][7618][DOCA][INF][doca_mmap.cpp:618][doca_mmap_create] Mmap 0x610bcfd5bb80 was created, access_mask=0x1
[13:47:43:709554][7618][DOCA][INF][doca_mmap.cpp:1900][doca_mmap_set_memrange] Mmap 0x610bcfd5bb80: Set memrange addr=0x7d50ae000000, len=33554432
[13:47:43:709570][7618][DOCA][INF][doca_mmap.cpp:2103][doca_mmap_set_permissions] Mmap 0x610bcfd5bb80: Set permissions with access_mask=0x1
[13:47:43:715358][7618][DOCA][INF][doca_mmap.cpp:801][doca_mmap_start] Mmap 0x610bcfd5bb80: mmap was started
[13:47:43:715372][7618][DOCA][INF][doca_eth_rxq.c:2642][doca_eth_rxq_set_pkt_buf] ETH_RXQ 0x610bcfd5b6c0: mmap was set to 0x610bcfd5bb80
[13:47:43:715382][7618][DOCA][INF][doca_eth_rxq.c:2643][doca_eth_rxq_set_pkt_buf] ETH_RXQ 0x610bcfd5b6c0: offset was set to 0
[13:47:43:715391][7618][DOCA][INF][doca_eth_rxq.c:2644][doca_eth_rxq_set_pkt_buf] ETH_RXQ 0x610bcfd5b6c0: size was set to 33554432
[13:47:43:715402][7618][DOCA][INF][doca_ctx.cpp:253][doca_ctx_start] CTX 0x610bcfd5b6c0 does not require PE
[13:47:43:717751][7618][DOCA][INF][doca_buf_array.cpp:218][doca_buf_arr] buf_arr 0x610bcfcd4b20 was created
[13:47:43:717763][7618][DOCA][INF][doca_buf_array.cpp:219][doca_buf_arr]   num_elem=16384
[13:47:43:717772][7618][DOCA][INF][doca_buf_array.cpp:286][set_target_gpu] buf_arr 0x610bcfcd4b20: target_gpu was set to 0x610bcfda6d80
[13:47:43:717781][7618][DOCA][INF][doca_buf_array.cpp:236][set_params] buf_arr 0x610bcfcd4b20: elem_size was set to 2048
[13:47:43:717790][7618][DOCA][INF][doca_buf_array.cpp:237][set_params] buf_arr 0x610bcfcd4b20: start_offset was set to 0
[13:47:43:718711][7618][DOCA][INF][doca_buf_array.cpp:414][start] buf_arr 0x610bcfcd4b20: buf_arr was started
[13:47:43:719008][7618][DOCA][INF][doca_uar.cpp:207][bridge_init] UAR 0x610bcfd65440 created: page=0x7d50dee30000, reg_addr=0x7d50dee30800, base_addr=0x7d50dee30000, id=133, alloc_type=BLUEFLAME
[13:47:43:722083][7618][DOCA][INF][eth_rxq_common.c:520][eth_rxq_common_create_cq] ETH_RXQ 0x610bcfd5b6c0: Created CQ 0x4ac
[13:47:43:723885][7618][DOCA][INF][eth_rxq_common.c:787][eth_rxq_common_create_rq] ETH_RXQ 0x610bcfd5b6c0: Created RQ 0xc0004a
[13:47:43:724532][7618][DOCA][INF][eth_rxq_common.c:520][eth_rxq_common_create_cq] ETH_RXQ 0x610bcfd5b6c0: Created CQ 0x4ad
[13:47:43:726968][7618][DOCA][INF][doca_qp.cpp:1087][priv_doca_dev_qp_create] Device 0x610bcfbc9500: qp=0x610bcfd67950 was created
[13:47:43:729093][7618][DOCA][INF][doca_qp.cpp:1049][set_state] QP 0x610bcfd67950: State change INIT -> CONNECTED
[13:47:43:729779][7618][DOCA][INF][doca_dev.cpp:2829][priv_doca_dev_mapped_memory_region_create_dmabuf] Device 0x610bcfbc9500: mapped_memory_region=0x610bcff15e40 was created with dmabuf
[13:47:43:729796][7618][DOCA][INF][eth_rxq_common.c:362][eth_rxq_common_create_flush_qp] ETH_RXQ 0x610bcfd5b6c0: Created flush QP 0xdc
[13:47:43:731117][7618][DOCA][INF][doca_eth_rxq.c:1679][eth_rxq_start_gpu_ctx] ETH_RXQ 0x610bcfd5b6c0: Context was started successfully
[13:47:43:731189][7618][DOCA][INF][dpdk_pipe_common.c:889][adjust_mempool_entry_nb] entry pool 2 cache enabled, change nb_entries from 8193 to 8705
[13:47:43:751907][7618][DOCA][INF][engine_pipe.c:610][engine_pipe_create] Pipe with pipe_id 0 is created.
[13:47:43:751984][7618][DOCA][INF][dpdk_pipe_common.c:889][adjust_mempool_entry_nb] entry pool 2 cache enabled, change nb_entries from 8192 to 8704
[13:47:43:752958][7618][DOCA][INF][engine_pipe.c:610][engine_pipe_create] Pipe with pipe_id 1 is created.
[13:47:43:770049][7618][DOCA][INF][gpunetio_simple_receive_sample.c:663][gpunetio_simple_receive] Launching CUDA kernel to receive packets
[13:47:43:774358][7618][DOCA][INF][gpunetio_simple_receive_sample.c:667][gpunetio_simple_receive] Waiting for termination



^C[13:47:53:203511][7618][DOCA][INF][gpunetio_simple_receive_sample.c:51][signal_handler] Signal 2 received, preparing to exit!
[13:47:53:203519][7618][DOCA][INF][gpunetio_simple_receive_sample.c:673][gpunetio_simple_receive] Exiting from sample
[13:47:53:203853][7618][DOCA][INF][gpunetio_simple_receive_sample.c:393][destroy_rxq] Destroying Rxq
[13:47:53:215509][7618][DOCA][INF][dpdk_pipe_legacy.c:936][dpdk_pipe_entries_flush] Pipe ROOT_PIPE - all 1 entries freed
[13:47:53:215516][7618][DOCA][INF][dpdk_pipe_common.c:263][dpdk_pipe_common_pre_pipe_destroy] portid 0 destroy pipe ROOT_PIPE
[13:47:53:215518][7618][DOCA][INF][dpdk_pipe_legacy.c:936][dpdk_pipe_entries_flush] Pipe ROOT_PIPE - all 0 entries freed
[13:47:53:215696][7618][DOCA][INF][dpdk_pipe_legacy.c:936][dpdk_pipe_entries_flush] Pipe GPU_RXQ_UDP_PIPE - all 1 entries freed
[13:47:53:215699][7618][DOCA][INF][dpdk_pipe_common.c:263][dpdk_pipe_common_pre_pipe_destroy] portid 0 destroy pipe GPU_RXQ_UDP_PIPE
[13:47:53:216617][7618][DOCA][INF][dpdk_pipe_legacy.c:936][dpdk_pipe_entries_flush] Pipe GPU_RXQ_UDP_PIPE - all 0 entries freed
[13:47:53:223779][7618][DOCA][INF][doca_buf_array.cpp:427][stop] buf_arr 0x610bcfcd4b20: buf_arr was stopped
[13:47:53:225182][7618][DOCA][INF][eth_rxq_common.c:687][eth_rxq_common_destroy_rq] ETH_RXQ 0x610bcfd5b6c0: Destroyed RQ 0xc0004a
[13:47:53:225593][7618][DOCA][INF][eth_rxq_common.c:593][eth_rxq_common_destroy_cq] ETH_RXQ 0x610bcfd5b6c0: Destroyed CQ 0x4ac
[13:47:53:225595][7618][DOCA][INF][doca_qp.cpp:1100][priv_doca_dev_qp_destroy] Destroying qp=0x610bcfd67950
[13:47:53:227203][7618][DOCA][INF][eth_rxq_common.c:174][eth_rxq_common_destroy_flush_qp] ETH_RXQ 0x610bcfd5b6c0: Destroyed flush QP 0xdc
[13:47:53:227425][7618][DOCA][INF][eth_rxq_common.c:593][eth_rxq_common_destroy_cq] ETH_RXQ 0x610bcfd5b6c0: Destroyed CQ 0x4ad
[13:47:53:227428][7618][DOCA][INF][doca_dev.cpp:2933][priv_doca_memory_region_destroy] Destroying memory_region=0x610bcff15e40
[13:47:53:227942][7618][DOCA][INF][doca_eth_rxq.c:1783][eth_rxq_stop_gpu_ctx] ETH_RXQ 0x610bcfd5b6c0: Context was started successfully
[13:47:53:227945][7618][DOCA][INF][doca_ctx.cpp:656][priv_doca_ctx_set_state_to_idle] CTX 0x610bcfd5b6c0 has successfully stopped
[13:47:53:227974][7618][DOCA][INF][doca_pe.cpp:115][priv_doca_pe_ctx_destroy] Destroying progress engine ctx=0x610bcfd5b6c0
[13:47:53:241903][7618][DOCA][INF][utils_hash_table.c:158][utils_hash_table_destroy] hashmatcher destroyed
[13:47:53:241943][7618][DOCA][INF][utils_hash_table.c:158][utils_hash_table_destroy] hashmatcher destroyed
[13:47:53:242210][7618][DOCA][INF][utils_hash_table.c:158][utils_hash_table_destroy] hashmatcher destroyed
[13:47:53:242218][7618][DOCA][INF][hws_meter_profiles.c:304][hws_meter_profiles_destroy] Destroyed meter profiles on port 0
[13:47:53:729453][7618][DOCA][INF][hws_port.c:1364][hws_port_destroy] Hws port 0 destroyed successfully with 2 queues
[13:47:53:729481][7618][DOCA][INF][doca_flow.c:1585][doca_flow_port_stop] port id = 0 stopped
[13:47:53:729501][7618][DOCA][INF][doca_flow_match.c:654][doca_flow_match_destroy] Doca flow match UDS destroyed
[13:47:53:729508][7618][DOCA][INF][doca_flow_actions.c:1358][doca_flow_actions_destroy] Doca flow actions UDS destroyed
[13:47:53:729510][7618][DOCA][INF][doca_flow_monitor.c:132][doca_flow_monitor_destroy] Doca flow monitor UDS destroyed
[13:47:53:729513][7618][DOCA][INF][doca_flow_actions.c:1368][doca_flow_encap_cfg_destroy] Doca flow res_encap_cfg UDS destroyed
[13:47:53:729515][7618][DOCA][INF][doca_flow_actions.c:1378][doca_flow_decap_cfg_destroy] Doca flow res_decap_cfg UDS destroyed
[13:47:53:729520][7618][DOCA][INF][doca_flow_layer.c:110][doca_flow_layer_destroy] Doca flow layer destroyed
[13:47:53:729527][7618][DOCA][INF][hws_pipe_geneve_opt.c:131][hws_pipe_geneve_opt_module_destroy] Destroyed hws pipe GENEVE options module
[13:47:53:729533][7618][DOCA][INF][hws_pipe_actions.c:3497][dpdk_pipe_actions_module_destroy] Destroyed dpdk pipe actions module
[13:47:53:729535][7618][DOCA][INF][hws_pipe_items.c:225][hws_pipe_items_module_destroy] Destroyed dpdk pipe items module
[13:47:53:729537][7618][DOCA][INF][hws_pipe.c:207][hws_pipe_module_destroy] Dpdk pipe destroyed
[13:47:53:729543][7618][DOCA][INF][hws_resource_manager.c:221][hws_resource_manager_module_destroy] Dpdk resource manager unregister completed
[13:47:53:729550][7618][DOCA][INF][hws_flow.c:645][hws_flow_module_cleanup] Cleanup dpdk flow
[13:47:53:729556][7618][DOCA][INF][hws_matcher.c:1424][hws_matcher_module_cleanup] Cleanup hws matcher
[13:47:53:729559][7618][DOCA][INF][hws_port.c:966][hws_port_module_cleanup] hws port module cleanup
[13:47:53:729671][7618][DOCA][INF][hws_layer.c:168][hws_layer_unregister] Dpdk layer unregister completed
[13:47:53:729739][7618][DOCA][INF][engine_field_mapping.c:130][engine_field_mapping_destroy] Engine field mapping destroyed
[13:47:53:729745][7618][DOCA][INF][engine_model.c:391][engine_model_destroy] engine model destroyed
[13:47:53:729748][7618][DOCA][INF][doca_flow.c:650][doca_flow_destroy] Doca flow destroyed
[13:47:53:729750][7618][DOCA][INF][doca_mmap.cpp:635][doca_mmap_destroy] Mmap 0x610bcfd5bb80: Destroying mmap
[13:47:53:730379][7618][DOCA][INF][doca_mmap.cpp:822][doca_mmap_stop] Mmap 0x610bcfd5bb80: mmap was stopped
[13:47:53:730808][7618][DOCA][INF][doca_dev.cpp:147][dev_put] Device 0x610bcfbc9500 was destroyed
[13:47:53:730812][7618][DOCA][INF][doca_dev.cpp:1019][doca_dev_close] Local device 0x610bcfbc9500 was closed
[13:47:53:730814][7618][DOCA][INF][gpunetio_simple_receive_sample.c:684][gpunetio_simple_receive] Sample finished successfully
[13:47:53:730818][7618][DOCA][INF][gpunetio_simple_receive_main.c:213][main] Sample finished successfully

doca_flow_resource_query_entry does show some packets: query_stats_entry.counter.total_pkts is not zero.

The behavior is the same with both the A2000 and the L4 GPUs. We also observed it with a BF2 (in NIC mode) and a CX6.

Both machines have the infamous Resizable BAR. There is no error whatsoever; the second one simply does not receive any packets. Adding a print in __global__ void receive_packets shows that we enter the loop, but doca_gpu_dev_eth_rxq_receive_block always sets the number of received packets to 0.

Any idea?

Did you resize BAR1? Or, alternatively, did you decrease either MAX_PKT_NUM or MAX_PKT_SIZE?

The BAR1 memory is 8GB, more than what the GPU (A2000) needs. We tried reducing MAX_PKT_NUM to 16K or 4K, without success. Same with MAX_PKT_SIZE…
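For reference, the buffer-versus-BAR1 arithmetic behind this answer can be sketched as below. The packet count and size are the num_elem and elem_size values printed in the log above, and their product matches the logged memrange len. Treating "8GB" as 8 GiB and the buffer as a plain product with no alignment padding are assumptions; rx_buffer_bytes is a hypothetical helper, not part of the sample.

```python
# Values taken from the log above (buf_arr num_elem / elem_size).
MAX_PKT_NUM = 16384
MAX_PKT_SIZE = 2048

# Assumption: "8GB" of BAR1 is read as 8 GiB.
BAR1_BYTES = 8 * 1024**3


def rx_buffer_bytes(pkt_num: int, pkt_size: int) -> int:
    """Receive buffer size as a plain product (assumes no alignment padding)."""
    return pkt_num * pkt_size


buf = rx_buffer_bytes(MAX_PKT_NUM, MAX_PKT_SIZE)
print(f"receive buffer: {buf} bytes ({buf / 2**20:.0f} MiB)")
print(f"fits in BAR1: {buf < BAR1_BYTES}")
```

With these numbers the receive buffer is 32 MiB, far below the BAR1 size, which is consistent with shrinking MAX_PKT_NUM or MAX_PKT_SIZE making no difference.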

Did you check the ACS and PCIe configuration? DOCA GPUNetIO - NVIDIA Docs

Sometimes AMD PCIe is problematic for GPUDirect technologies.
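One way to act on the ACS suggestion is to scan the output of `sudo lspci -vvv` for ACSCtl lines that have any flag enabled, since ACS on a bridge can redirect peer-to-peer traffic between the NIC and the GPU through the root complex. The following is a minimal sketch of such a scan; devices_with_acs_enabled is a hypothetical helper, and the SAMPLE text is made up for illustration, not taken from either machine in this thread.

```python
import re

# Device header line, e.g. "00:01.1 PCI bridge: ..." (PCI domain prefix optional).
_DEVICE_RE = re.compile(
    r"^((?:[0-9a-fA-F]{4}:)?[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s"
)


def devices_with_acs_enabled(lspci_text: str) -> list[str]:
    """Return PCI addresses whose ACSCtl line has at least one '+' flag."""
    enabled = []
    current = None
    for line in lspci_text.splitlines():
        match = _DEVICE_RE.match(line)
        if match:
            current = match.group(1)
            continue
        stripped = line.strip()
        if current and stripped.startswith("ACSCtl:") and "+" in stripped:
            enabled.append(current)
    return enabled


# Made-up sample in the layout printed by `lspci -vvv` (illustration only).
SAMPLE = "\n".join([
    "00:01.1 PCI bridge: Example bridge",
    "\tCapabilities: [2a0 v1] Access Control Services",
    "\t\tACSCap:\tSrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+",
    "\t\tACSCtl:\tSrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-",
    "01:00.0 Ethernet controller: Example NIC",
    "\tCapabilities: [230 v1] Access Control Services",
    "\t\tACSCtl:\tSrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-",
])

print(devices_with_acs_enabled(SAMPLE))
```

On a real system the text would come from running `sudo lspci -vvv` (root is needed to read the extended capabilities). Any bridge between the NIC and the GPU that shows up in the result is a candidate for disabling ACS, in the BIOS or as described in the linked documentation.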