Hi NVIDIA team,
I have been using two GeForce RTX 4060 Ti cards in a server for deep learning for 4 months. When I tried using both GPUs in one Docker container, an “end kernel panic” error occurred. Since the panic now happens during boot, it is hard for me to troubleshoot on my end.
Here are the actions I have taken so far.
- Selected “Advanced options for Ubuntu” in the GRUB menu and chose 5.15.0-119 (recovery mode) → Issue reappears
- After formatting, I attempted the following steps → Issue reappears
- sudo apt-get install ubuntu-drivers-common
- ubuntu-drivers devices | grep recommended
- sudo apt-get install nvidia-driver-550
- sudo reboot now
- Upon reboot, I selected the default “Ubuntu” kernel entry instead of “Advanced options for Ubuntu” in the GRUB menu
- Initially it worked fine, but the issue reappeared after I updated the NVIDIA driver.
- I mounted ubuntu--vg-ubuntu--lv from a live USB and tried removing the NVIDIA driver after the issue occurred, but the error still reappears.
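For anyone trying to reproduce the live-USB removal described above, it can be sketched roughly as follows. This is an illustration, not a record of the exact commands that were run. The device path /dev/mapper/ubuntu--vg-ubuntu--lv, the /mnt mount point, and the helper names run and purge_nvidia are assumptions. With DRY_RUN=1 (the default here) the script only prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch: purge NVIDIA driver packages from an installed Ubuntu system
# while booted from a live USB. All paths below are assumptions.
set -u

ROOT_LV="${ROOT_LV:-/dev/mapper/ubuntu--vg-ubuntu--lv}"  # assumed root LV
MNT="${MNT:-/mnt}"                                       # assumed mount point
DRY_RUN="${DRY_RUN:-1}"                                  # 1 = print only

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

purge_nvidia() {
    run vgchange -ay                      # activate LVM volume groups
    run mount "$ROOT_LV" "$MNT"           # mount the installed root filesystem
    # If /boot is a separate partition, it must also be mounted at
    # $MNT/boot before rebuilding the initramfs (device not known here).
    for d in dev proc sys; do
        run mount --bind "/$d" "$MNT/$d"  # expose kernel filesystems to the chroot
    done
    run chroot "$MNT" apt-get purge -y 'nvidia-*' 'libnvidia-*'
    run chroot "$MNT" apt-get autoremove -y
    run chroot "$MNT" update-initramfs -u -k all  # drop NVIDIA modules from the initramfs
    for d in sys proc dev; do
        run umount "$MNT/$d"
    done
    run umount "$MNT"
}
```

Calling purge_nvidia with DRY_RUN=1 lists the commands for review. Running it as root with DRY_RUN=0 executes them. Rebuilding the initramfs is included because a stale initramfs can still load the old kernel modules at boot even after the packages are removed.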
I would appreciate any advice. Thank you!