Bright 9.2 on Ubuntu: unable to update CUDA in one of the images

I manage a mid-size cluster (around 40 nodes) and have been asked to update the NVIDIA GPU drivers. I have done this before and it used to be an easy task. What I generally do is:

  1. On the head node: apt update; apt install cuda12.3-sdk cuda12.3-toolkit

  2. I clone the image I need to update, chroot into it, and run:
    chrooted image: apt update; apt install cuda-driver cuda-dcgm

  3. I change the image of the category the node belongs to and reboot the node.
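
For reference, steps 1 and 2 can be written down as a small dry-run script that only prints the commands it would run. This is a rough sketch, not the exact procedure used here: the image path under /cm/images is an assumption (the usual location for Bright software images), the helper names run and update_plan are hypothetical, and step 3 is left out because it is done through the Bright tooling.

```shell
#!/bin/sh
# Dry-run sketch of steps 1 and 2: prints each command instead of running it.
# IMAGE_DIR is an assumption; adjust it to wherever the cloned image lives.
IMAGE_DIR="${IMAGE_DIR:-/cm/images/gpu-image-2024}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

update_plan() {
    # Step 1: CUDA SDK and toolkit on the head node.
    run apt update
    run apt install cuda12.3-sdk cuda12.3-toolkit
    # Step 2: driver and DCGM inside the cloned image.
    run chroot "$IMAGE_DIR" apt update
    run chroot "$IMAGE_DIR" apt install cuda-driver cuda-dcgm
    # Step 3 (assigning the image to the category, rebooting) is not shown.
}

update_plan
```

Running it with DRY_RUN=0 would execute the commands for real, so it is worth reading the printed plan first.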

I did this in the past and it worked like a charm, but now the installation process hangs. Please have a look:

Tue Jul 2 16:21:23 2024 [notice] hnode01: Initial ramdisk for node gnode01 based on image gpu-image-2024 was generated successfully
Tue Jul 2 16:25:46 2024 [notice] hnode01: gnode01 [ INSTALLING ] (node installer started)
Tue Jul 2 16:27:10 2024 [notice] hnode01: gnode01 [ INSTALLER_CALLINGINIT ] (switching to local root)
Tue Jul 2 16:37:10 2024 [notice] hnode01: gnode01 [ INSTALLER_UNREACHABLE ] (switching to local root)
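
The gap between the last two events is exactly ten minutes, which may point to a fixed timeout while the head node waits for the node to report back after switching root, rather than a failure at a random moment. That reading is only an inference from the timestamps. A minimal sketch for computing the gaps, assuming GNU date (as shipped on Ubuntu); the function name elapsed_seconds is made up for illustration:

```shell
#!/bin/sh
# Seconds elapsed between two timestamps in the format used by the log above.
# Relies on GNU date's -d option; -u avoids any local DST adjustments.
elapsed_seconds() {
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    echo $((end - start))
}

# INSTALLING -> INSTALLER_CALLINGINIT
elapsed_seconds "Tue Jul 2 16:25:46 2024" "Tue Jul 2 16:27:10 2024"   # prints 84
# INSTALLER_CALLINGINIT -> INSTALLER_UNREACHABLE
elapsed_seconds "Tue Jul 2 16:27:10 2024" "Tue Jul 2 16:37:10 2024"   # prints 600
```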

The process never finishes, and roles (e.g. the Slurm client) aren’t applied. When I switch back to the original image I built when I installed the cluster, the node boots fine.
I have tried 5 times, and every attempt ends the same way.

Please note, these are the package versions I got from Bright:

root@gpu-image-2024:/# apt list -a cuda-driver cuda-dcgm
Listing… Done
cuda-dcgm/CM 9.2 1: amd64 [upgradable from: 1:]
cuda-dcgm/CM 9.2,now 1: amd64 [installed,upgradable to: 1:]
cuda-dcgm/CM 9.2 1: amd64
cuda-dcgm/CM 9.2 1: amd64
cuda-dcgm/CM 9.2 1: amd64
cuda-dcgm/CM 9.2 1: amd64

cuda-driver/CM 9.2 550.54.15-767-cm9.2 amd64 [upgradable from: 525.60.13-661-cm9.2]
cuda-driver/CM 9.2 535.129.03-738-cm9.2 amd64
cuda-driver/CM 9.2 530.30.02-711-cm9.2 amd64
cuda-driver/CM 9.2 530.30.02-682-cm9.2 amd64
cuda-driver/CM 9.2 525.85.12-665-cm9.2 amd64
cuda-driver/CM 9.2,now 525.60.13-661-cm9.2 amd64 [installed,upgradable to: 550.54.15-767-cm9.2]
cuda-driver/CM 9.2 520.61.05-640-cm9.2 amd64
cuda-driver/CM 9.2 515.65.01-638-cm9.2 amd64
cuda-driver/CM 9.2 515.65.01-636-cm9.2 amd64
cuda-driver/CM 9.2 515.43.04-609-cm9.2 amd64
cuda-driver/CM 9.2 510.47.03-600-cm9.2 amd64
cuda-driver/CM 9.2 510.39.01-595-cm9.2 amd64
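
To make the jump explicit (525.60.13 installed, 550.54.15 offered), the listing can be reduced to the installed and candidate versions. This is a small sketch rather than anything Bright provides: the function name pkg_versions is hypothetical, and the field positions assume the repository label contains one space ("CM 9.2"), as in the output above.

```shell
#!/bin/sh
# Print the installed and candidate versions of a package,
# reading `apt list -a` output on stdin.
# Assumes the version is field 3, which holds for the "CM 9.2" label here.
pkg_versions() {
    awk -v pkg="$1" '
        index($1, pkg "/") != 1 { next }          # skip lines for other packages
        /\[installed/           { installed = $3 }
        /\[upgradable from/     { candidate = $3 }
        END { printf "installed=%s candidate=%s\n", installed, candidate }
    '
}

pkg_versions cuda-driver <<'EOF'
cuda-driver/CM 9.2 550.54.15-767-cm9.2 amd64 [upgradable from: 525.60.13-661-cm9.2]
cuda-driver/CM 9.2 535.129.03-738-cm9.2 amd64
cuda-driver/CM 9.2,now 525.60.13-661-cm9.2 amd64 [installed,upgradable to: 550.54.15-767-cm9.2]
EOF
# prints: installed=525.60.13-661-cm9.2 candidate=550.54.15-767-cm9.2
```

Any version string from the list can be requested explicitly with apt's package=version syntax, for example apt install cuda-driver=535.129.03-738-cm9.2, which might help narrow down whether only the newest driver triggers the hang.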


Hi Davide,

It is likely a better idea to send in a support request than to post on the forum. You can just cut and paste the text you’ve included here into the ticket.

Entering a case is easy: ESPCommunity

My teams don’t monitor or provide support here on the developer forum, but this is absolutely something we can assist with.


Ken Woods
Worldwide Manager, Nvidia BCM Support
Direct: +31 61 185 8321

Thanks, I opened a case as suggested. We have another profile of nodes with AMD GPUs. I repeated the same process there and it worked as expected.