Unable to determine the device handle for GPU 000:35:00.0: Unknown Error

system:

OS: Ubuntu 18.04
GPUs: RTX 3090
CUDA Version: 11.7
Driver Version: 515.65.01

I am using these GPUs for large language model inference, and this error occurred:
[Image: WeCom screenshot 1705300993924]
Here is the full bug report:
nvidia-bug-report.log.gz (449.3 KB)

BTW, I set vm.max_map_count=262144 because I run an Elasticsearch service. Could that be the cause?
Also, all services run in Docker containers.
@generix can you help me? Thanks a lot.

There is no dmesg or journalctl output in your logs, so I can only guess: the PSU may be insufficient during power spikes. Please try limiting the GPU clocks with nvidia-smi -lgc 300,1500 to check whether the error still occurs.
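
For anyone landing here with the same error, here is a minimal sketch of how one might collect the kernel log lines the reply asks for and apply the suggested clock limit. The helper name find_gpu_errors is hypothetical (it is not part of any NVIDIA tool), and the two patterns are assumptions about what is worth looking for: "NVRM: Xid" is the usual prefix of NVIDIA driver error reports, and "fallen off the bus" often accompanies a GPU that has dropped out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read log text on stdin and keep only lines that look
# like NVIDIA driver error reports. "|| true" keeps the exit status at 0
# when nothing matches, so the helper is safe under "set -e".
find_gpu_errors() {
    grep -E 'NVRM: Xid|fallen off the bus' || true
}

# Typical use on the affected host (needs the NVIDIA driver; may require root):
#   sudo dmesg | find_gpu_errors
#   sudo journalctl -k -b | find_gpu_errors
#
# Clock limit suggested in the reply, and how to undo it afterwards:
#   sudo nvidia-smi -lgc 300,1500   # lock GPU clocks to the 300-1500 MHz range
#   sudo nvidia-smi -rgc            # reset GPU clocks to the default behaviour
```

If the error stops appearing while the clocks are locked, that would be consistent with the power-spike guess, though it does not prove it.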

Thanks for your reply. I updated the driver, and now everything works.