After mode switch, still unable to expose the GPU to the DPU side

Hi,

I have a BlueField-2 DPU card and a GTX 1070 Ti GPU card. I set the DPU to ‘BlueField-X mode – the GPU is exposed to the DPU and is no longer visible on the host’. However, the GPU is still visible on the host and does not appear on the DPU.

FOR HOST SIDE

host@host:~/Desktop$ lspci
01:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070 Ti] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
02:00.0 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller (rev 01)
02:00.1 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller (rev 01)
02:00.2 DMA controller: Mellanox Technologies MT42822 BlueField-2 SoC Management Interface (rev 01)

FOR DPU SIDE

root@localhost:/home/ubuntu# mlxconfig -d /dev/mst/mt41686_pciconf0 q PCI_DOWNSTREAM_PORT_OWNER[4]

Device #1:
----------

Device type:    BlueField2      
Name:           MBF2H332A-AENO_Ax_Bx
Description:    BlueField-2 P-Series DPU 25GbE Dual-Port SFP56; PCIe Gen4 x8; Crypto Disabled; 16GB on-board DDR; 1GbE OOB management; HHHL
Device:         /dev/mst/mt41686_pciconf0

Configurations:                              Next Boot
         PCI_DOWNSTREAM_PORT_OWNER[4]        EMBEDDED_CPU(15)

root@localhost:/home/ubuntu# lspci
00:00.0 PCI bridge: Mellanox Technologies MT42822 BlueField-2 SoC Crypto disabled (rev 01)
01:00.0 PCI bridge: Mellanox Technologies MT42822 Family [BlueField-2 SoC PCIe Bridge] (rev 01)
02:00.0 PCI bridge: Mellanox Technologies MT42822 Family [BlueField-2 SoC PCIe Bridge] (rev 01)
03:00.0 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller (rev 01)
03:00.1 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller (rev 01)
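
To make the comparison between the two lspci listings explicit, here is a small helper that could be run on both sides. It is only a sketch: count_nvidia is a name made up for illustration, and it simply counts the lines that mention NVIDIA in lspci-style text read from standard input.

```shell
#!/bin/sh
# Hypothetical helper (not part of any NVIDIA tooling):
# count the lines mentioning "nvidia" (case-insensitive) in
# lspci-style text read from standard input.
count_nvidia() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when there are none, so "|| true" keeps the exit status clean.
    grep -ci 'nvidia' || true
}

# Intended usage on either side:
#   lspci | count_nvidia
```

With the listings above, this would print 2 on the host (the VGA and audio functions of the GTX 1070 Ti) and 0 on the DPU, which is the opposite of what I expect after the mode switch.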

I have already installed the recommended driver and CUDA by following the Installation Guide: installing-cuda-on-converged-accelerator. Sadly, I still cannot run nvidia-smi successfully.

root@localhost:/usr/local/cuda/bin# uname -r
5.4.0-1035-bluefield
root@localhost:/home/ubuntu# cd /usr/local/
root@localhost:/usr/local# ls
bin  cuda  cuda-11  cuda-11.6  etc  games  include  lib  man  sbin  share  src
root@localhost:/usr/local# cd cuda/bin
root@localhost:/usr/local/cuda/bin# ./nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:19:06_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0
root@localhost:/usr/local/cuda/bin# ls -l /usr/src
total 16
drwxr-xr-x 24 root root 4096 May  1 19:20 linux-bluefield-headers-5.4.0-1035
drwxr-xr-x  6 root root 4096 May  1 19:20 linux-headers-5.4.0-1035-bluefield
drwxr-xr-x  9 root root 4096 Jul  1 04:58 nvidia-510.47.03
drwxr-xr-x  3 root root 4096 May  1 19:20 ofa_kernel
root@localhost:/usr/local/cuda/bin# dkms install -m nvidia -v 510.47.03
Module nvidia/510.47.03 already installed on kernel 5.4.0-1035-bluefield/aarch64

root@localhost:/usr/local/cuda/bin# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
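
Since dkms only reports that the module is installed, not that it is loaded, a further check I can run is whether the nvidia kernel module appears in lsmod. The snippet below is a sketch under that assumption; nvidia_module_loaded is a hypothetical name, and it only inspects lsmod-style text from standard input.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if a module named exactly
# "nvidia" appears in the first column of lsmod-style input.
nvidia_module_loaded() {
    awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}

# Intended usage on the DPU:
#   lsmod | nvidia_module_loaded && echo "nvidia loaded" || echo "nvidia not loaded"
```

If the module is not loaded, I assume that would be consistent with the driver finding no GPU to bind to, given that no NVIDIA device shows up in lspci on the DPU.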

I would be really grateful if you could give me some advice.

Best Regards,
Zhaoyang.