Error while running DOCA sample code, doca_gpunetio/gpunetio_simple_receive

Hi,

CUDA 12.6 (open kernel driver)
DOCA 2.9

I am facing an issue in doca_ctx_start. Could you guide me based on the error information below?
Is there any configuration that I am missing?

result = doca_ctx_start(rxq->eth_rxq_ctx);
(cuda-gdb)
warning: Cuda API error detected: cuMemHostRegister_v2 returned (0x1)

warning: Cuda API error detected: cuMemHostGetDevicePointer_v2 returned (0x1)

[DOCA][ERR][cb_ops.cpp:150][cb_doca_gpu_export_uar] Function cuMemHostRegister (err 1) & cuMemHostGetDevicePointer (err 1) failed on addr 0xfffff7fea800 size 512
[DOCA][ERR][eth_rxq_common.c:107][eth_rxq_common_create_uar] ETH_RXQ 0xaaaaab53f2e0: Failed to create UAR: failed to export UAR. err=DOCA_ERROR_DRIVER
[DOCA][ERR][eth_rxq_common.c:929][internal_doca_eth_rxq_gpu_common_create_rxq_objs] ETH_RXQ 0xaaaaab53f2e0: Failed to create eth_rxq_gpu objs:: failed to create UAR. err=DOCA_ERROR_DRIVER
[DOCA][ERR][doca_eth_rxq.c:1669][eth_rxq_start_gpu_ctx] ETH_RXQ 0xaaaaab53f2e0: Failed to start eth_rxq: failed to create receive queue objects. err=DOCA_ERROR_DRIVER
[DOCA][ERR][doca_ctx.cpp:277][doca_ctx_start] Failed to start context 0xffffffffe418 with status DOCA_ERROR_DRIVER
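
For reference, CUDA error 0x1 is CUDA_ERROR_INVALID_VALUE, and both failing calls happen inside doca_ctx_start() when DOCA exports the UAR to the GPU. Condensed, the sequence in the sample leading up to the failing call looks roughly like this (illustrative, not verbatim from the sample; error handling and the packet-buffer mmap setup are omitted, and MAX_RX_NUM_PKTS / MAX_RX_PKT_SIZE are placeholder sizes):

#include <doca_gpunetio.h>
#include <doca_eth_rxq.h>
#include <doca_ctx.h>

struct doca_gpu *gpu_dev;     /* GPU handle, created from its PCIe address */
struct doca_dev *ddev;        /* network device, already opened */
struct doca_eth_rxq *eth_rxq;
struct doca_ctx *eth_rxq_ctx;
doca_error_t result;

result = doca_gpu_create(gpu_pcie_addr, &gpu_dev);

result = doca_eth_rxq_create(ddev, MAX_RX_NUM_PKTS, MAX_RX_PKT_SIZE, &eth_rxq);
result = doca_eth_rxq_set_type(eth_rxq, DOCA_ETH_RXQ_TYPE_CYCLIC);
/* ... packet-buffer mmap setup omitted ... */

/* Route the queue's datapath to the GPU, then start the context.
 * doca_ctx_start() is where the UAR is registered with CUDA via
 * cuMemHostRegister()/cuMemHostGetDevicePointer(), which is what
 * fails above with CUDA_ERROR_INVALID_VALUE (0x1). */
eth_rxq_ctx = doca_eth_rxq_as_doca_ctx(eth_rxq);
result = doca_ctx_set_datapath_on_gpu(eth_rxq_ctx, gpu_dev);
result = doca_ctx_start(eth_rxq_ctx);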

I am also facing the same issue.

Are you using the open kernel driver? I am not sure that is OK.

Please check the dependencies:

https://docs.nvidia.com/doca/sdk/doca+gpu+packet+processing+application+guide/index.html#src-3453015876_id-.DOCAGPUPacketProcessingApplicationGuidev2.9.1-Dependencies

Before running the application you need to be sure you have the following:

  • gdrdrv kernel module – active and running on the system
  • nvidia-peermem kernel module – active and running on the system
  • Network card interface you want to use is up
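
If you want to double-check the two kernel modules programmatically (in addition to lsmod), a loaded module shows up as a directory under /sys/module. A minimal C sketch, assuming the standard sysfs layout and the module names as lsmod prints them:

#include <stdio.h>
#include <unistd.h>

/* Returns 1 if the kernel module appears loaded (its sysfs entry exists). */
static int module_loaded(const char *name)
{
    char path[128];
    snprintf(path, sizeof(path), "/sys/module/%s", name);
    return access(path, F_OK) == 0;
}

int main(void)
{
    printf("gdrdrv:         %s\n", module_loaded("gdrdrv") ? "loaded" : "MISSING");
    printf("nvidia_peermem: %s\n", module_loaded("nvidia_peermem") ? "loaded" : "MISSING");
    return 0;
}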

Yes, I have checked all of these now:

  • gdrdrv kernel module – active and running
  • nvidia-peermem kernel module – active and running
  • Network card interface – UP

The open kernel driver prerequisite is described in the link below:

DOCA GPUNetIO - NVIDIA Docs

lsmod | grep gdrdrv
gdrdrv 28672 0
nvidia 9920512 54 nvidia_uvm,nvidia_peermem,gdrdrv,nvidia_modeset

ip link show

1: enP6p1s0f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 58:a2:e1:ab:40:b8 brd ff:ff:ff:ff:ff:ff
2: enP6p1s0f1np1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 58:a2:e1:ab:40:b9 brd ff:ff:ff:ff:ff:ff

ethtool -i enP6p1s0f0np0
driver: mlx5_core
version: 24.10-1.1.4
firmware-version: 26.39.2048 (MT_0000000531)
expansion-rom-version:
bus-info: 0006:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

ethtool -i enP6p1s0f1np1
driver: mlx5_core
version: 24.10-1.1.4
firmware-version: 26.39.2048 (MT_0000000531)
expansion-rom-version:
bus-info: 0006:01:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

lspci | grep -i mell
0006:01:00.0 Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]
0006:01:00.1 Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]

@xiaofengl, the suggested solution does not work. Can you check and comment?
