After a prolonged stress test, the GNOME desktop crashes and the monitor shows no image

After running under full load for 82 hours, the GNOME desktop crashes and does not recover automatically, leaving the monitor with no display output. nvidia-smi also reports abnormal status values. So far, the issue occurs sporadically on one machine, with no discernible pattern.

The following GPU driver error appears in dmesg:
[294947.067674] NVRM: nvAssertOkFailedNoLog: Assertion failed: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057) returned from pRmApi->Control( pRmApi, hClient, hDevice, NV0080_CTRL_CMD_INTERNAL_MEMSYS_SET_ZBC_REFERENCED, &params, sizeof(params)) @ mem_mgr_gm107.c:283

After the stress test has ended, nvidia-smi still shows abnormal power and utilization values: power draw is 18 W and utilization is 96%. (On a normal idle machine, power draw is 4 W and utilization is 0%.)
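
For anyone trying to catch this state automatically, here is a minimal sketch of how the power and utilization readings could be checked from a script. The function name classify_sample and the 10 W / 10% thresholds are my own assumptions, picked only to separate the idle readings (4 W, 0%) from the abnormal ones (18 W, 96%) in this post. It assumes a single GPU; with several GPUs nvidia-smi prints one line per GPU.

```shell
#!/bin/sh
# Classify one "power.draw, utilization.gpu" sample as idle or abnormal.
# classify_sample and both thresholds are hypothetical, not NVIDIA guidance.
classify_sample() {
    # $1 is one CSV line such as "18.00, 96" (format csv,noheader,nounits)
    power=${1%%,*}      # text before the first comma: power draw in W
    util=${1##*, }      # text after the last ", ": GPU utilization in %
    awk -v p="$power" -v u="$util" \
        'BEGIN { if (p > 10 || u > 10) print "abnormal"; else print "idle" }'
}

# On the machine itself, a sample could be taken like this:
#   sample=$(nvidia-smi --query-gpu=power.draw,utilization.gpu --format=csv,noheader,nounits)
#   classify_sample "$sample"
```

Run after the stress test has finished, a result of "abnormal" would mean the GPU still looks busy even though nothing should be using it.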

Running the following two commands restores normal operation:
systemctl isolate multi-user.target
systemctl isolate graphical.target

Afterwards, the nvidia-smi output is normal and the monitor displays correctly.

Please help analyze the cause of this issue.

Is this on DGX Spark? If so, could you please ask in the DGX Spark forum?