DGX Station (V100) latest driver

Hello,

I’m trying to upgrade the NVIDIA driver on my DGX Station (V100) from nvidia-driver-470-server to nvidia-driver-525-server by following this guide, without success.
I cannot find a clear reference for driver branch compatibility on the DGX Station (V100). Is the R525 branch incompatible with my machine? What else could be the problem?
I’m using DGX OS 5.4.2.

Thanks in advance

I have the 525 series installed via your linked guide.

What is the error that you’re seeing?

When I run

apt install -y --reinstall nvidia-peer-memory-dkms

I get

nv_peer_mem.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.4.0-146-generic/updates/dkms/

depmod...

DKMS: install completed.
modprobe: ERROR: could not insert 'nv_peer_mem': Invalid argument

Note that the module is there:

❯ l  /lib/modules/5.4.0-146-generic/updates/dkms/
total 4,4M
-rw-r--r-- 1 root root  15K apr  3 10:24 auxiliary.ko
-rw-r--r-- 1 root root  87K apr  3 10:24 ib_cm.ko
-rw-r--r-- 1 root root 522K apr  3 10:24 ib_core.ko
-rw-r--r-- 1 root root 229K apr  3 10:24 ib_ipoib.ko
-rw-r--r-- 1 root root  37K apr  3 10:24 ib_umad.ko
-rw-r--r-- 1 root root 195K apr  3 10:24 ib_uverbs.ko
-rw-r--r-- 1 root root  76K apr  3 10:24 iw_cm.ko
-rw-r--r-- 1 root root 2,2M apr  3 10:24 mlx5_core.ko
-rw-r--r-- 1 root root 586K apr  3 10:24 mlx5_ib.ko
-rw-r--r-- 1 root root  21K apr  3 10:24 mlx_compat.ko
-rw-r--r-- 1 root root 176K apr  3 10:24 mlxdevm.ko
-rw-r--r-- 1 root root  40K apr  3 10:24 mlxfw.ko
-rw-r--r-- 1 root root  20K apr  5 10:21 nv_peer_mem.ko
-rw-r--r-- 1 root root 170K apr  3 10:24 rdma_cm.ko
-rw-r--r-- 1 root root  45K apr  3 10:24 rdma_ucm.ko

dmesg:

[mer apr  5 10:51:59 2023] NVRM: API mismatch: the client has the version 525.85.12, but
                           NVRM: this kernel module has the version 470.161.03.  Please
                           NVRM: make sure that this kernel module and all NVIDIA driver
                           NVRM: components have the same version.
[mer apr  5 10:51:59 2023] NVRM: API mismatch: the client has the version 525.85.12, but
                           NVRM: this kernel module has the version 470.161.03.  Please
                           NVRM: make sure that this kernel module and all NVIDIA driver
                           NVRM: components have the same version.
[mer apr  5 10:52:21 2023] nv_peer_mem: Unknown symbol nvidia_p2p_cap_persistent_pages (err -2)
[mer apr  5 10:52:21 2023] nv_peer_mem: disagrees about version of symbol nvidia_p2p_dma_unmap_pages
[mer apr  5 10:52:21 2023] nv_peer_mem: Unknown symbol nvidia_p2p_dma_unmap_pages (err -22)
[mer apr  5 10:52:21 2023] nv_peer_mem: disagrees about version of symbol nvidia_p2p_dma_map_pages
[mer apr  5 10:52:21 2023] nv_peer_mem: Unknown symbol nvidia_p2p_dma_map_pages (err -22)
[mer apr  5 10:52:21 2023] nv_peer_mem: disagrees about version of symbol nvidia_p2p_free_dma_mapping
[mer apr  5 10:52:21 2023] nv_peer_mem: Unknown symbol nvidia_p2p_free_dma_mapping (err -22)

Actually, solved: to avoid this mismatch we just had to reboot. As the dmesg output shows, the old 470.161.03 kernel module was still loaded in the running kernel, which prevented the new 525 modules from loading until the machine was restarted.
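
For anyone hitting the same "API mismatch" message, here is a rough sketch of how one might check whether the loaded NVIDIA kernel module still differs from the one installed on disk. It assumes the loaded module exposes its version at /sys/module/nvidia/version and that the installed module is named nvidia for modinfo. The function names are made up for illustration and are not part of any NVIDIA tool.

```shell
#!/bin/sh
# Version of the NVIDIA kernel module currently loaded in the running kernel
# (prints nothing if no such module is loaded). Path is an assumption.
loaded_nvidia_version() {
    cat /sys/module/nvidia/version 2>/dev/null
}

# Version of the NVIDIA kernel module installed on disk for the running kernel
# (prints nothing if modinfo cannot find a module named "nvidia").
installed_nvidia_version() {
    modinfo -F version nvidia 2>/dev/null
}

# Hypothetical helper: compare the two versions and print a one-line verdict.
check_nvidia_versions() {
    loaded=$1
    installed=$2
    if [ -z "$loaded" ]; then
        echo "no nvidia module loaded"
    elif [ "$loaded" = "$installed" ]; then
        echo "ok: $loaded"
    else
        echo "mismatch: loaded $loaded, installed $installed (reboot needed)"
    fi
}

check_nvidia_versions "$(loaded_nvidia_version)" "$(installed_nvidia_version)"
```

In the situation above this would presumably have reported 470.161.03 as loaded and a 525 version as installed. Once the two agree after a reboot, nv_peer_mem should be able to resolve the nvidia_p2p_* symbols.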