Unable to determine the device handle for GPU 0000:01:00.0: Unknown Error after executing nvidia-smi

Current system:
OS: Debian GNU/Linux bookworm/sid x86_64
Kernel: 5.18.0-2-amd64
CPU: 11th Gen Intel i9-11900K (16) @ 5.100GHz
GPU: NVIDIA GeForce RTX 3080 Ti

Nvidia driver: 510.73.08
CUDA version: 11.6

After a few minutes of CNN training with PyTorch, the program hangs without raising any error in the code. Running “nvidia-smi” afterwards returns the following error: “Unable to determine the device handle for GPU 0000:01:00.0”.

I have run the training with several architectures and configurations, and every one of them eventually hung in the same way.
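To pin down when the GPU drops out, something like the following could be run in a second terminal while training. It is only a rough sketch: the function names, the 10-second polling interval, and the 30-second timeout are my own assumptions, and it does nothing more than poll `nvidia-smi` and look for the error text quoted above.

```python
import subprocess
import time
from datetime import datetime

# Error text reported by nvidia-smi once the GPU becomes unreachable.
ERROR_MARKER = "Unable to determine the device handle"


def gpu_handle_lost(smi_output: str) -> bool:
    """Return True if the nvidia-smi output contains the device handle error."""
    return ERROR_MARKER in smi_output


def query_nvidia_smi(timeout_s: float = 30.0) -> str:
    """Run nvidia-smi and return stdout and stderr combined as one string."""
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # A hung nvidia-smi is itself a useful data point, so report it.
        return "nvidia-smi timed out"
    return result.stdout + result.stderr


def watch(interval_s: float = 10.0) -> None:
    """Poll nvidia-smi until the error appears, then print a timestamp."""
    while True:
        output = query_nvidia_smi()
        if gpu_handle_lost(output):
            print(f"{datetime.now().isoformat()} GPU handle lost:\n{output}")
            break
        time.sleep(interval_s)

# Call watch() manually to start polling; it is not started on import.
```

The printed timestamp can then be compared against the kernel log and the training log to see what happened just before the GPU became unreachable.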

The only way I have found to make the GPU work again is to reboot the machine after the error.

I have attached the NVIDIA bug report log:
nvidia-bug-report.log.gz (395.5 KB)