System crash

I repeatedly load a TensorRT model and then close the program. After several repetitions, the Ubuntu system crashes.
Once I reboot the PC, it works again.
Do you have any idea how to avoid the system crash?

Environment
TensorRT Version: 7.2.2 (nvcr.io/nvidia/tensorrt:20.12-py3)
GPU Type: GeForce RTX 2060
Nvidia Driver Version: 455.45.01
CUDA Version: CUDA 11.1
OS: Ubuntu 18.04

Please run nvidia-bug-report.sh as root after the issue appeared and attach the resulting nvidia-bug-report.log.gz file to your post.

nvidia-bug-report.log.gz (819.4 KB)
Here is the log.
Thanks.

Please start nvidia-persistenced and make sure it is continuously running.

command : sudo -i

command : nvidia-persistenced
message : nvidia-persistenced failed to initialize. Check syslog for more details.

command : less /var/log/syslog
message :
Jan 19 17:15:11 sa nvidia-persistenced: Failed to lock PID file: Resource temporarily unavailable
Jan 19 17:15:11 sa nvidia-persistenced: Shutdown (16047)

How do I make sure it is continuously running?
I checked with htop, but I can't see any nvidia process.

There should be a systemd service installed. Try to enable and start it.
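A minimal sketch of that suggestion, assuming the driver packaging installed a unit named nvidia-persistenced.service (the unit name and the helper function names are assumptions, not something confirmed in this thread). The status check uses pidof so that it also sees a daemon that was started by hand rather than through systemd.

```shell
#!/bin/sh
# Assumption: the unit is called nvidia-persistenced.service on this system.
UNIT=nvidia-persistenced

# Enable the service at boot and start it right away (needs root).
enable_persistence_daemon() {
    sudo systemctl enable --now "$UNIT"
}

# Print "running" if a process with the given name exists, else "not running".
# pidof is used instead of `pgrep -x`, which only matches the first
# 15 characters of a process name.
daemon_status() {
    if pidof "$1" >/dev/null 2>&1; then
        echo "running"
    else
        echo "not running"
    fi
}

daemon_status "$UNIT"
```

Calling enable_persistence_daemon once and then daemon_status again should show whether the daemon stayed up. If it reports "running" before anything was started, an instance may already be active, which could also explain the "Failed to lock PID file" message above.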

So the steps will be:
Step1.
sudo -i
nvidia-persistenced
make sure it is continuously running

Step2.
Load the TensorRT model repeatedly.

Step3.
When the system crash happens, I have to reboot.

Step4.
nvidia-bug-report.sh

Step5.
Get the nvidia-bug-report.log.gz

right?

With the persistence daemon started, the crash shouldn’t happen. In case it does, please enable journald persistence first. Currently, all logs are purged on reboot, so no errors are caught in the logs.
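One common way to enable journald persistence is sketched below. It assumes journald runs with its default Storage=auto setting, in which case logs survive a reboot only if /var/log/journal exists. The function names are hypothetical.

```shell
#!/bin/sh
# Assumption: journald uses the default Storage=auto, so the presence of
# this directory decides whether logs are kept across reboots.
JOURNAL_DIR=/var/log/journal

# Print "persistent" if the journal directory exists, else "volatile".
journal_mode() {
    if [ -d "${1:-$JOURNAL_DIR}" ]; then
        echo "persistent"
    else
        echo "volatile"
    fi
}

# Create the directory with the expected ownership and restart journald (needs root).
enable_persistent_journal() {
    sudo mkdir -p "$JOURNAL_DIR"
    sudo systemd-tmpfiles --create --prefix "$JOURNAL_DIR"
    sudo systemctl restart systemd-journald
}

journal_mode
```

If journal_mode prints "volatile", running enable_persistent_journal once should switch it. Setting Storage=persistent in /etc/systemd/journald.conf is an alternative.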

So the steps will be:

Step1.
sudo -i
nvidia-persistenced
make sure it is continuously running

Step2.
Load the TensorRT model repeatedly.

Step3.
If the system shows an error, run nvidia-bug-report.sh

OR

Step3.
After step2, directly run the nvidia-bug-report.sh

right?

Yes, if an error occurs, run nvidia-bug-report.sh immediately.

How do I make sure "/usr/bin/nvidia-persistenced" is continuously running?

I used the command below, but it cannot find nvidia-persistenced.
command:
systemctl --type=service --state=running

It should be in the package
nvidia-compute-utils
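A rough way to check this on Ubuntu is sketched below. The versioned package name nvidia-compute-utils-455 is an assumption based on the driver branch listed above, and pkg_installed is a hypothetical helper.

```shell
#!/bin/sh
# Succeed if dpkg reports the named package as fully installed.
pkg_installed() {
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q 'install ok installed'
}

# Assumption: Ubuntu names the package after the driver branch (455 here).
PKG=nvidia-compute-utils-455

if pkg_installed "$PKG"; then
    echo "$PKG is installed"
else
    echo "$PKG is not installed"
fi

# Show where the daemon binary lives, if it is on PATH at all.
command -v nvidia-persistenced || echo "nvidia-persistenced not found on PATH"
```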

Here is the log
nvidia-bug-report.log.gz (1.1 MB)

I already started “/usr/bin/nvidia-persistenced”, but it still crashes.
So I rebooted and ran “nvidia-bug-report.sh”.

Thanks

What did your analysis show?
Thanks

Actually, no NVIDIA-related errors were caught in the logs. Are you able to SSH into the system when the crash happens?
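As a sketch of how such errors could be looked for by hand: NVIDIA kernel-driver messages are usually tagged "NVRM" or "Xid" in the kernel log. The helper names below are hypothetical, and reading the previous boot assumes the journal was persistent at the time of the crash.

```shell
#!/bin/sh
# Keep only lines that look like NVIDIA kernel-driver messages.
# Returns success even when nothing matches.
filter_nvidia_errors() {
    grep -iE 'NVRM|Xid' || true
}

# Kernel messages from the boot before the current one
# (only available with a persistent journal).
previous_boot_nvidia_errors() {
    journalctl -k -b -1 --no-pager | filter_nvidia_errors
}
```

After a reboot, previous_boot_nvidia_errors prints any matching lines from the crashed session. If SSH still works while the machine appears hung, `sudo dmesg | filter_nvidia_errors` applies the same filter to the live kernel log.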

See line 3867.
When it crashes, I reboot and then run the command “/usr/bin/nvidia-persistenced”.