Unable to determine the device for GPU, unknown error

Hello,

I am running a CUDA program on two RTX 3090 GPUs. I have hit this error many times: it tends to appear after the program has been running for a day or longer. When I then call nvidia-smi, it reports:

Unable to determine the device for GPU 0000:08:00:0: unknown error.

I checked the temperature and it is OK. Here is my bug report:
nvidia-bug-report.log.gz (1.6 MB)

Could you please give me some help? Thanks in advance!

Please try limiting the GPU clocks to their stock values using nvidia-smi -lgc.

Thanks for your reply!
I notice that nvidia-smi -lgc needs a minimum and maximum clock value to be set. Which values should I use, or what can I refer to?

It depends on the model, so please refer to the vendor specs. For example: nvidia-smi -lgc 300,1600
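
For anyone applying this, here is a minimal sketch of a wrapper around that command. The function name lock_gpu_clocks and the APPLY variable are made up for illustration, and 300,1600 is only the example pair from this thread, not a verified stock range for the RTX 3090. By default it only prints the command so the values can be checked first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the clock-lock command for one GPU.
# Prints it by default; runs it (via sudo) only when APPLY=1 is set.

lock_gpu_clocks() {
    local gpu="$1" min_mhz="$2" max_mhz="$3"

    # Reject non-numeric input or an inverted range before touching the driver.
    if ! [[ "$min_mhz" =~ ^[0-9]+$ && "$max_mhz" =~ ^[0-9]+$ ]] \
        || (( 10#$min_mhz > 10#$max_mhz )); then
        echo "usage: lock_gpu_clocks <gpu-index> <min-mhz> <max-mhz>" >&2
        return 1
    fi

    # -i selects the GPU, -lgc takes the min,max clock pair in MHz.
    local cmd=(nvidia-smi -i "$gpu" -lgc "${min_mhz},${max_mhz}")

    if [[ "${APPLY:-0}" == "1" ]]; then
        sudo "${cmd[@]}"
    else
        echo "${cmd[*]}"
    fi
}

# Dry run with the example values from this thread (assumed, not vendor specs).
lock_gpu_clocks 0 300 1600
```

Running nvidia-smi -q -d SUPPORTED_CLOCKS beforehand lists the clock values the card accepts, and nvidia-smi -rgc resets the lock if it needs to be undone.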