Kernel panic after reinstalling the DGX OS

After updating the drivers from the DGX dashboard and restarting the device, it reports a kernel panic and does not boot.

I then have to reinstall the OS and update the drivers again, and the issue remains the same.
Is this a hardware issue?

Can you share the exact error messages you are seeing, and at what stage they appear?

  1. Boot corruption issues:

    • The boot sequence frequently gets corrupted.

    • Even after reinstalling the OS multiple times, the device sometimes fails to load properly and displays boot-related errors.

  2. Power failure while device is ON:

    • On several occasions, the device suddenly loses power even while it is running.

    • The screen goes completely blank, and the unit becomes unresponsive.

    • It only turns back on after performing a manual restart.

  3. Frequent freezing during workloads:

    • While running demanding tasks like image generation or AI workloads, the device often hangs.

    • It requires multiple forced restarts to function again.

  4. Overall system instability:

    • Even after following all the recommended steps, creating new OS media, and performing a clean installation, the same problems continue.

    • These issues are severely affecting usability and reliability.

If you need any specific logs, snapshots, or error details, I will be happy to provide them.

  1. Can you take a picture of the boot-related errors, or otherwise relay what they say?
  2. After the system crashes and you manually reboot the unit, can you send me an nvidia-bug-report as well as the contents of /var/log/dmesg and /var/log/kern.log?
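
For anyone gathering the files requested above, here is a minimal Python sketch that bundles them into one archive to attach. The helper name `collect_logs`, the list `DEFAULT_LOGS`, and the archive name are made up for illustration. The bug report file name is an assumption based on the default output of `nvidia-bug-report.sh`, so adjust it if yours was written elsewhere.

```python
import tarfile
from pathlib import Path

# Files requested in the reply above. The third entry is an assumption:
# nvidia-bug-report.sh normally writes nvidia-bug-report.log.gz to the
# directory it is run from.
DEFAULT_LOGS = [
    "/var/log/dmesg",
    "/var/log/kern.log",
    "nvidia-bug-report.log.gz",
]


def collect_logs(paths, archive_path):
    """Bundle readable files from `paths` into a gzip tarball.

    Returns (included, skipped) so it is obvious which logs are absent
    or unreadable and still need to be gathered by hand.
    """
    included, skipped = [], []
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in map(Path, paths):
            try:
                if not path.is_file():
                    raise FileNotFoundError(path)
                # Store only the file name so the archive has a flat layout.
                tar.add(path, arcname=path.name)
                included.append(str(path))
            except OSError:
                # Missing file or insufficient permissions (e.g. not root).
                skipped.append(str(path))
    return included, skipped
```

Calling `collect_logs(DEFAULT_LOGS, "dgx-crash-logs.tar.gz")` after the manual reboot should produce one file to attach. Reading `/var/log/kern.log` may require root, and anything listed in `skipped` was not found or could not be read.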

Hi @subhadip1524, were you able to solve your issue?