Hi,
CUDA 12.6 (open kernel driver) and DOCA 2.9 are in use.
I am facing an issue in doca_ctx_start. Could you guide me based on the error information below?
Is there any configuration I am missing?
result = doca_ctx_start(rxq->eth_rxq_ctx);
(cuda-gdb)
warning: Cuda API error detected: cuMemHostRegister_v2 returned (0x1)
warning: Cuda API error detected: cuMemHostGetDevicePointer_v2 returned (0x1)
[DOCA][ERR][cb_ops.cpp:150][cb_doca_gpu_export_uar] Function cuMemHostRegister (err 1) & cuMemHostGetDevicePointer (err 1) failed on addr 0xfffff7fea800 size 512
[DOCA][ERR][eth_rxq_common.c:107][eth_rxq_common_create_uar] ETH_RXQ 0xaaaaab53f2e0: Failed to create UAR: failed to export UAR. err=DOCA_ERROR_DRIVER
[DOCA][ERR][eth_rxq_common.c:929][internal_doca_eth_rxq_gpu_common_create_rxq_objs] ETH_RXQ 0xaaaaab53f2e0: Failed to create eth_rxq_gpu objs:: failed to create UAR. err=DOCA_ERROR_DRIVER
[DOCA][ERR][doca_eth_rxq.c:1669][eth_rxq_start_gpu_ctx] ETH_RXQ 0xaaaaab53f2e0: Failed to start eth_rxq: failed to create receive queue objects. err=DOCA_ERROR_DRIVER
[DOCA][ERR][doca_ctx.cpp:277][doca_ctx_start] Failed to start context 0xffffffffe418 with status DOCA_ERROR_DRIVER
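For context, the setup leading up to the failing call follows the usual GPUNetIO receive-queue flow. Below is a rough sketch of that sequence, modeled on the DOCA GPU packet processing sample (field names follow the sample's rxq struct; buffer sizes, alignment, and error handling are placeholders, and exact signatures should be taken from the DOCA 2.9 headers):

struct rxq_queue {
    struct doca_eth_rxq *eth_rxq_cpu;
    struct doca_ctx *eth_rxq_ctx;
    struct doca_mmap *pkt_buff_mmap;
    void *gpu_pkt_addr;
};

/* ddev: opened doca_dev for the ConnectX port; gpu_dev: doca_gpu handle for the GPU */
doca_eth_rxq_create(ddev, MAX_PKT_NUM, MAX_PKT_SIZE, &rxq->eth_rxq_cpu);
doca_eth_rxq_set_type(rxq->eth_rxq_cpu, DOCA_ETH_RXQ_TYPE_CYCLIC);

/* Packet ring lives in GPU memory, exposed through a doca_mmap */
doca_mmap_create(&rxq->pkt_buff_mmap);
doca_mmap_add_dev(rxq->pkt_buff_mmap, ddev);
doca_gpu_mem_alloc(gpu_dev, buf_size, alignment, DOCA_GPU_MEM_TYPE_GPU, &rxq->gpu_pkt_addr, NULL);
doca_mmap_set_memrange(rxq->pkt_buff_mmap, rxq->gpu_pkt_addr, buf_size);
doca_mmap_start(rxq->pkt_buff_mmap);
doca_eth_rxq_set_pkt_buf(rxq->eth_rxq_cpu, rxq->pkt_buff_mmap, 0, buf_size);

/* Bind the context datapath to the GPU, then start it - this is where it fails */
rxq->eth_rxq_ctx = doca_eth_rxq_as_doca_ctx(rxq->eth_rxq_cpu);
doca_ctx_set_datapath_on_gpu(rxq->eth_rxq_ctx, gpu_dev);
result = doca_ctx_start(rxq->eth_rxq_ctx);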
I am also facing the same issue.
Are you using the open kernel driver? I am not sure whether that is OK.
Please check the dependencies:
https://docs.nvidia.com/doca/sdk/doca+gpu+packet+processing+application+guide/index.html#src-3453015876_id-.DOCAGPUPacketProcessingApplicationGuidev2.9.1-Dependencies
Before running the application you need to be sure you have the following:
gdrdrv kernel module – active and running on the system
nvidia-peermem kernel module – active and running on the system
The network card interface you want to use is up
Yes, I have checked all of these now:
gdrdrv kernel module – active and running
nvidia-peermem kernel module – active and running
Network card interface – UP
The open kernel driver prerequisite is stated in the page below:
DOCA GPUNetIO - NVIDIA Docs
lsmod | grep gdrdrv
gdrdrv 28672 0
nvidia 9920512 54 nvidia_uvm,nvidia_peermem,gdrdrv,nvidia_modeset
ip link show
1: enP6p1s0f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 58:a2:e1:ab:40:b8 brd ff:ff:ff:ff:ff:ff
2: enP6p1s0f1np1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 58:a2:e1:ab:40:b9 brd ff:ff:ff:ff:ff:ff
ethtool -i enP6p1s0f0np0
driver: mlx5_core
version: 24.10-1.1.4
firmware-version: 26.39.2048 (MT_0000000531)
expansion-rom-version:
bus-info: 0006:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
ethtool -i enP6p1s0f1np1
driver: mlx5_core
version: 24.10-1.1.4
firmware-version: 26.39.2048 (MT_0000000531)
expansion-rom-version:
bus-info: 0006:01:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
lspci | grep -i mell
0006:01:00.0 Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]
0006:01:00.1 Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]
@xiaofengl The suggested solution does not work. Can you check and comment?
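For anyone debugging the same DOCA_ERROR_DRIVER, one way to narrow it down is to check whether plain cuMemHostRegister / cuMemHostGetDevicePointer work at all under the open kernel driver on the system. Below is a minimal, hypothetical standalone test (it registers ordinary anonymous host memory; the failing DOCA path registers the NIC UAR, which is IO memory, so a pass here does not rule out a UAR-specific problem):

#include <cuda.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    CUdevice dev;
    CUcontext ctx;
    CUdeviceptr dptr;
    size_t sz = 4096;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    /* Plain anonymous page, unlike the DOCA case which registers the NIC UAR (MMIO) */
    void *p = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    CUresult rc = cuMemHostRegister(p, sz, CU_MEMHOSTREGISTER_DEVICEMAP);
    printf("cuMemHostRegister: %d\n", rc);

    rc = cuMemHostGetDevicePointer(&dptr, p, 0);
    printf("cuMemHostGetDevicePointer: %d\n", rc);

    cuMemHostUnregister(p);
    cuCtxDestroy(ctx);
    munmap(p, sz);
    return 0;
}

Error code 1 in the DOCA log corresponds to CUDA_ERROR_INVALID_VALUE, so if this test passes, the failure is more likely specific to registering the UAR (IO memory) with the open kernel module rather than a general host-registration problem.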
system closed this topic on March 7, 2025:
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.