Intermittent driver crashes with RTX 4090 on Ubuntu 22.04

Hi, I have an RTX 4090 that I use for various graphics and CUDA tasks. The graphics driver crashes intermittently with Xid 62, followed by a burst of Xid 45. This generally happens when compiling CUDA applications or using Blender. After the driver crashes, the display still works, but the FAN field in nvidia-smi shows ERR! and the power draw stays constant at ~40W. The temperatures seem fine before the crash happens. I am attaching the kernel logs, the nvidia-bug-report, and screenshots of nvidia-smi. I have been able to run gpu-burn for an hour without any issue. The card works fine again after a reboot.

System Details:
CPU: 13th Gen Intel(R) Core™ i9-13900F
RAM: 128GB DDR5
MB: ASUS PRIME Z790-P-CSM
GPU: Zotac Gaming GeForce RTX 4090 Trinity OC 24GB GDDR6X
PSU: Corsair 1000W Gold
OS: Ubuntu 22.04 LTS
Kernel: 5.19.0-35-generic #36~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb 17 15:17:25 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Driver: 525.85.05 (installed from additional drivers)
CUDA: 11.8

Kernel logs (grep NVRM):

Mar  5 18:57:19 chetak kernel: [    4.133806] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  525.85.05  Sat Jan 14 00:49:50 UTC 2023
Mar  5 21:11:22 chetak kernel: [ 8046.795565] NVRM: GPU at PCI:0000:01:00: GPU-76553937-c765-97b7-be41-94973e536f44
Mar  5 21:11:22 chetak kernel: [ 8046.795568] NVRM: Xid (PCI:0000:01:00): 62, pid='<unknown>', name=<unknown>, badfbadf(badfbadf) 00000000 00000000
Mar  5 21:11:22 chetak kernel: [ 8046.797030] NVRM: Xid (PCI:0000:01:00): 45, pid=74236, name=instant-ngp, Ch 00000040
Mar  5 21:11:22 chetak kernel: [ 8046.799139] NVRM: Xid (PCI:0000:01:00): 45, pid=74236, name=instant-ngp, Ch 00000041
Mar  5 21:11:22 chetak kernel: [ 8046.801183] NVRM: Xid (PCI:0000:01:00): 45, pid=74236, name=instant-ngp, Ch 00000042
Mar  5 21:11:22 chetak kernel: [ 8046.803226] NVRM: Xid (PCI:0000:01:00): 45, pid=74236, name=instant-ngp, Ch 00000043
Mar  5 21:11:22 chetak kernel: [ 8046.805258] NVRM: Xid (PCI:0000:01:00): 45, pid=74236, name=instant-ngp, Ch 00000044
Mar  5 21:11:22 chetak kernel: [ 8046.807287] NVRM: Xid (PCI:0000:01:00): 45, pid=74236, name=instant-ngp, Ch 00000045
Mar  5 21:11:22 chetak kernel: [ 8046.809326] NVRM: Xid (PCI:0000:01:00): 45, pid=74236, name=instant-ngp, Ch 00000046
Mar  5 21:11:22 chetak kernel: [ 8046.811415] NVRM: Xid (PCI:0000:01:00): 45, pid=74236, name=instant-ngp, Ch 00000047
Mar  5 21:25:13 chetak kernel: [    4.360214] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  525.85.05  Sat Jan 14 00:49:50 UTC 2023
Mar  5 21:42:45 chetak kernel: [ 1056.706412] NVRM: GPU at PCI:0000:01:00: GPU-76553937-c765-97b7-be41-94973e536f44
Mar  5 21:42:45 chetak kernel: [ 1056.706415] NVRM: Xid (PCI:0000:01:00): 62, pid='<unknown>', name=<unknown>, badfbadf(badfbadf) 00000000 00000000
Mar  5 21:42:45 chetak kernel: [ 1056.707870] NVRM: Xid (PCI:0000:01:00): 45, pid=44157, name=instant-ngp, Ch 00000020
Mar  5 21:42:45 chetak kernel: [ 1056.709912] NVRM: Xid (PCI:0000:01:00): 45, pid=44157, name=instant-ngp, Ch 00000021
Mar  5 21:42:45 chetak kernel: [ 1056.711912] NVRM: Xid (PCI:0000:01:00): 45, pid=44157, name=instant-ngp, Ch 00000022
Mar  5 21:42:45 chetak kernel: [ 1056.713919] NVRM: Xid (PCI:0000:01:00): 45, pid=44157, name=instant-ngp, Ch 00000023
Mar  5 21:42:45 chetak kernel: [ 1056.715914] NVRM: Xid (PCI:0000:01:00): 45, pid=44157, name=instant-ngp, Ch 00000024
Mar  5 21:42:45 chetak kernel: [ 1056.717917] NVRM: Xid (PCI:0000:01:00): 45, pid=44157, name=instant-ngp, Ch 00000025
Mar  5 21:42:45 chetak kernel: [ 1056.719929] NVRM: Xid (PCI:0000:01:00): 45, pid=44157, name=instant-ngp, Ch 00000026
Mar  5 21:42:45 chetak kernel: [ 1056.721940] NVRM: Xid (PCI:0000:01:00): 45, pid=44157, name=instant-ngp, Ch 00000027
Mar  5 21:48:59 chetak kernel: [    5.288245] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  525.85.05  Sat Jan 14 00:49:50 UTC 2023
Mar  7 12:13:06 chetak kernel: [138250.767277] NVRM: GPU at PCI:0000:01:00: GPU-76553937-c765-97b7-be41-94973e536f44
Mar  7 12:13:06 chetak kernel: [138250.767280] NVRM: Xid (PCI:0000:01:00): 62, pid='<unknown>', name=<unknown>, badfbadf(badfbadf) 00000000 00000000
Mar  7 12:26:51 chetak kernel: [    4.601930] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  525.85.05  Sat Jan 14 00:49:50 UTC 2023
Mar  8 07:55:47 chetak kernel: [70140.517491] NVRM: GPU at PCI:0000:01:00: GPU-76553937-c765-97b7-be41-94973e536f44
Mar  8 07:55:47 chetak kernel: [70140.517495] NVRM: Xid (PCI:0000:01:00): 62, pid='<unknown>', name=<unknown>, badfbadf(badfbadf) 00000000 00000000
Mar  8 07:55:47 chetak kernel: [70140.518982] NVRM: Xid (PCI:0000:01:00): 45, pid=2370038, name=python3.10, Ch 00000038
Mar  8 07:55:47 chetak kernel: [70140.521045] NVRM: Xid (PCI:0000:01:00): 45, pid=2370038, name=python3.10, Ch 00000039
Mar  8 07:55:47 chetak kernel: [70140.523059] NVRM: Xid (PCI:0000:01:00): 45, pid=2370038, name=python3.10, Ch 0000003a
Mar  8 07:55:47 chetak kernel: [70140.525075] NVRM: Xid (PCI:0000:01:00): 45, pid=2370038, name=python3.10, Ch 0000003b
Mar  8 07:55:47 chetak kernel: [70140.527092] NVRM: Xid (PCI:0000:01:00): 45, pid=2370038, name=python3.10, Ch 0000003c
Mar  8 07:55:47 chetak kernel: [70140.529107] NVRM: Xid (PCI:0000:01:00): 45, pid=2370038, name=python3.10, Ch 0000003d
Mar  8 07:55:47 chetak kernel: [70140.531118] NVRM: Xid (PCI:0000:01:00): 45, pid=2370038, name=python3.10, Ch 0000003e
Mar  8 07:55:47 chetak kernel: [70140.533138] NVRM: Xid (PCI:0000:01:00): 45, pid=2370038, name=python3.10, Ch 0000003f
Mar  8 07:56:15 chetak kernel: [70169.010182] NVRM: Xid (PCI:0000:01:00): 109, pid=2635, name=Xorg, Ch 00000018, errorString CTX SWITCH TIMEOUT, Info 0x8c003
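
In case it helps anyone comparing logs, here is a rough Python sketch for pulling the Xid events out of NVRM lines like the ones above and counting them by code. The regular expression only assumes the line layout seen in this log; the names `XID_RE`, `parse_xid_events` and `count_by_xid` are made up for this example and are not part of any NVIDIA tool. The three sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches NVRM Xid lines in the layout shown in the log above.
# This pattern is an assumption based on those lines only.
XID_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s.*?"
    r"NVRM: Xid \((?P<pci>[^)]+)\): (?P<xid>\d+), "
    r"pid=(?P<pid>[^,]+), name=(?P<name>[^,]+)"
)

def parse_xid_events(lines):
    """Return one dict per Xid line; lines that do not match are skipped."""
    events = []
    for line in lines:
        m = XID_RE.search(line)
        if m:
            events.append({
                "stamp": m["stamp"],
                "pci": m["pci"],
                "xid": int(m["xid"]),
                "pid": m["pid"],
                "name": m["name"],
            })
    return events

def count_by_xid(events):
    """Count how often each Xid code appears."""
    return Counter(e["xid"] for e in events)

# Three lines copied from the kernel log in this post.
SAMPLE = [
    "Mar  5 21:11:22 chetak kernel: [ 8046.795568] NVRM: Xid (PCI:0000:01:00): 62, pid='<unknown>', name=<unknown>, badfbadf(badfbadf) 00000000 00000000",
    "Mar  5 21:11:22 chetak kernel: [ 8046.797030] NVRM: Xid (PCI:0000:01:00): 45, pid=74236, name=instant-ngp, Ch 00000040",
    "Mar  8 07:56:15 chetak kernel: [70169.010182] NVRM: Xid (PCI:0000:01:00): 109, pid=2635, name=Xorg, Ch 00000018, errorString CTX SWITCH TIMEOUT, Info 0x8c003",
]

sample_events = parse_xid_events(SAMPLE)
sample_counts = count_by_xid(sample_events)
print(sample_counts)
```

To run it on a full log, pass the lines of the file (for example from `open(path)`) to `parse_xid_events` instead of `SAMPLE`.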

Bug Report:
nvidia-bug-report.log.gz (865.3 KB)

Screenshots (images not included): nvidia-smi output, temperature graph, fan speed, power draw.
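
For anyone who wants to record the same readings (temperature, fan speed, power draw) as text rather than screenshots, a minimal sketch follows. It polls `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` and treats any reading that is not a number as unreadable. That handling is an assumption: I have not checked what exact string the query mode prints when the fan reports ERR!. The function names and the sample rows in the usage note are hypothetical, not captured output.

```python
import subprocess
import time

# Fields requested from nvidia-smi's --query-gpu option.
QUERY_FIELDS = ["timestamp", "temperature.gpu", "fan.speed", "power.draw"]

def parse_sample(csv_line):
    """Turn one CSV row into a dict; non-numeric readings become None."""
    values = [v.strip() for v in csv_line.split(",")]
    sample = {"timestamp": values[0]}
    for field, raw in zip(QUERY_FIELDS[1:], values[1:]):
        try:
            sample[field] = float(raw)
        except ValueError:
            # Assumption: a failed sensor shows up as some non-numeric text.
            sample[field] = None
    return sample

def read_sample():
    """Query the first GPU once. Requires nvidia-smi on the PATH."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_sample(result.stdout.splitlines()[0])

def log_readings(interval_s=5.0):
    """Print one reading every interval_s seconds until interrupted."""
    while True:
        print(read_sample())
        time.sleep(interval_s)
```

Calling `log_readings()` and redirecting the output to a file keeps the last readings before a crash. The parser can be tried without a GPU on a made-up row such as `parse_sample("2023/03/05 21:10:00.000, 52, 30, 41.25")`.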

The issue seems to be fixed (no crashes for 3 days) after switching the vBIOS to QUIET mode. It is probably a bug in the vBIOS from Zotac.