Kernel panic when training with PyTorch & GTX1080Ti

CUDA 11.4 with driver version 470.57.02; the server has 8x GTX 1080 Ti GPUs installed.
The PyTorch version is 1.7.1+cu101, installed with pip.
I understand that the CUDA version PyTorch was built against (10.1) does not match the installed CUDA Toolkit (11.4), but a mismatch like that should at worst crash the application, not cause a kernel panic. This looks like a bug in the NVIDIA kernel module.
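
To make the mismatch concrete, here is a minimal sketch of how the CUDA version can be read out of the wheel's version string. The helper name `cuda_tag_to_version` is made up for illustration, and the toolkit version is simply hard-coded from the numbers above rather than queried from the system.

```python
import re

def cuda_tag_to_version(torch_version):
    """Return the CUDA version encoded in a wheel's local tag.

    Hypothetical helper: '1.7.1+cu101' -> (10, 1). Returns None when the
    string carries no '+cuXYZ' tag (for example a CPU-only build).
    """
    match = re.search(r"\+cu(\d+)$", torch_version)
    if match is None:
        return None
    digits = match.group(1)
    # The last digit is the minor version, everything before it is the major.
    return int(digits[:-1]), int(digits[-1])

wheel_cuda = cuda_tag_to_version("1.7.1+cu101")  # version string from this post
toolkit_cuda = (11, 4)                           # toolkit version from this post
print("wheel built for CUDA", wheel_cuda)
print("installed toolkit is CUDA", toolkit_cuda)
print("versions match:", wheel_cuda == toolkit_cuda)
```

On a machine with PyTorch installed, the same information should be available from `torch.__version__` and `torch.version.cuda`.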

Logs are pasted below.

Crash Log

The following crash log was captured with kdump and extracted with the crash command-line utility.

      KERNEL: vmlinux-5.4.0-81-generic
    DUMPFILE: dump.202109071343  [PARTIAL DUMP]
        CPUS: 40
        DATE: Tue Sep  7 21:42:22 2021
      UPTIME: 01:30:34
LOAD AVERAGE: 6.78, 6.05, 4.70
       TASKS: 1093
     RELEASE: 5.4.0-81-generic
     VERSION: #91-Ubuntu SMP Thu Jul 15 19:09:17 UTC 2021
     MACHINE: x86_64  (2400 Mhz)
      MEMORY: 191.9 GB
       PANIC: "Oops: 0000 [#1] SMP PTI" (check log for details)
         PID: 608479
     COMMAND: "python"
        TASK: ffff8e1837d91740  [THREAD_INFO: ffff8e1837d91740]
         CPU: 12

PID: 608479  TASK: ffff8e1837d91740  CPU: 12  COMMAND: "python"
 #0 [ffffb3ca29e3b758] machine_kexec at ffffffffaf66b7c3
 #1 [ffffb3ca29e3b7b8] __crash_kexec at ffffffffaf749822
 #2 [ffffb3ca29e3b888] crash_kexec at ffffffffaf74a5a9
 #3 [ffffb3ca29e3b8a0] oops_end at ffffffffaf6344a9
 #4 [ffffb3ca29e3b8c8] no_context at ffffffffaf67a19e
 #5 [ffffb3ca29e3b938] __bad_area_nosemaphore at ffffffffaf67a3b0
 #6 [ffffb3ca29e3b980] bad_area_nosemaphore at ffffffffaf67a516
 #7 [ffffb3ca29e3b990] do_user_addr_fault at ffffffffaf67aa37
 #8 [ffffb3ca29e3b9f8] __do_page_fault at ffffffffaf67af58
 #9 [ffffb3ca29e3ba20] do_page_fault at ffffffffaf67afbc
#10 [ffffb3ca29e3ba50] page_fault at ffffffffb0201284
    [exception RIP: _nv029462rm+1070]
    RIP: ffffffffc1110f7e  RSP: ffffb3ca29e3bb00  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff8e16e51e4008  RCX: 0000000000000001
    RDX: ffff8e16e51e4008  RSI: 0000000000000000  RDI: ffff8e16da5a0008
    RBP: ffff8e172bd62d30   R8: 0000000000000001   R9: ffffffffc0ce4c00
    R10: ffff8e16e51e0000  R11: 0000000000000001  R12: ffff8e16da5a0008
    R13: ffff8e16da5a0008  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#11 [ffffb3ca29e3bb28] _nv029436rm at ffffffffc0d2aa59 [nvidia]
#12 [ffffb3ca29e3bb58] _nv002278rm at ffffffffc14476a9 [nvidia]
#13 [ffffb3ca29e3bb68] _nv003733rm at ffffffffc1442f7b [nvidia]
#14 [ffffb3ca29e3bb88] _nv014655rm at ffffffffc143edd6 [nvidia]
#15 [ffffb3ca29e3bbb8] _nv037695rm at ffffffffc143d313 [nvidia]
#16 [ffffb3ca29e3bbe8] _nv037694rm at ffffffffc143d647 [nvidia]
#17 [ffffb3ca29e3bc18] _nv037689rm at ffffffffc143d9e0 [nvidia]
#18 [ffffb3ca29e3bc38] _nv037690rm at ffffffffc143db0b [nvidia]
#19 [ffffb3ca29e3bc68] _nv036056rm at ffffffffc0d5bd10 [nvidia]
#20 [ffffb3ca29e3bc88] _nv000699rm at ffffffffc167b4c8 [nvidia]
#21 [ffffb3ca29e3bca8] rm_cleanup_file_private at ffffffffc167c58a [nvidia]
#22 [ffffb3ca29e3bd78] nvidia_close at ffffffffc0cda9e9 [nvidia]
#23 [ffffb3ca29e3bde0] __fput at ffffffffaf8cc63c
#24 [ffffb3ca29e3be30] ____fput at ffffffffaf8cc83e
#25 [ffffb3ca29e3be40] task_work_run at ffffffffaf6bdb0f
#26 [ffffb3ca29e3be78] do_exit at ffffffffaf69e31e
#27 [ffffb3ca29e3bef0] do_group_exit at ffffffffaf69eb47
#28 [ffffb3ca29e3bf20] __x64_sys_exit_group at ffffffffaf69ebc8
#29 [ffffb3ca29e3bf30] do_syscall_64 at ffffffffaf603fd7
#30 [ffffb3ca29e3bf50] entry_SYSCALL_64_after_hwframe at ffffffffb020008c
    RIP: 00007fac464532c6  RSP: 00007ffd4eab0478  RFLAGS: 00000213
    RAX: ffffffffffffffda  RBX: 000055e76c3f3f90  RCX: 00007fac464532c6
    RDX: 0000000000000000  RSI: 000000000000003c  RDI: 0000000000000000
    RBP: 00007fac45fe2360   R8: 00000000000000e7   R9: ffffffffffffff80
    R10: 00000000000000a1  R11: 0000000000000213  R12: 8000000000000001
    R13: 00007faa92d321f0  R14: 00007faa92d32040  R15: 00007faa92d321e8
    ORIG_RAX: 00000000000000e7  CS: 0033  SS: 002b

Kernel module info

filename:       /lib/modules/5.4.0-81-generic/kernel/drivers/video/nvidia.ko
firmware:       nvidia/470.57.02/gsp.bin
alias:          char-major-195-*
version:        470.57.02
supported:      external
license:        NVIDIA
srcversion:     00F9E8DEACC0FB98727C03C
alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
depends:        drm
retpoline:      Y
name:           nvidia
vermagic:       5.4.0-81-generic SMP mod_unload modversions 
parm:           NvSwitchRegDwords:NvSwitch regkey (charp)
parm:           NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp)
parm:           NVreg_ResmanDebugLevel:int
parm:           NVreg_RmLogonRC:int
parm:           NVreg_ModifyDeviceFiles:int
parm:           NVreg_DeviceFileUID:int
parm:           NVreg_DeviceFileGID:int
parm:           NVreg_DeviceFileMode:int
parm:           NVreg_InitializeSystemMemoryAllocations:int
parm:           NVreg_UsePageAttributeTable:int
parm:           NVreg_RegisterForACPIEvents:int
parm:           NVreg_EnablePCIeGen3:int
parm:           NVreg_EnableMSI:int
parm:           NVreg_TCEBypassMode:int
parm:           NVreg_EnableStreamMemOPs:int
parm:           NVreg_RestrictProfilingToAdminUsers:int
parm:           NVreg_PreserveVideoMemoryAllocations:int
parm:           NVreg_EnableS0ixPowerManagement:int
parm:           NVreg_S0ixPowerManagementVideoMemoryThreshold:int
parm:           NVreg_DynamicPowerManagement:int
parm:           NVreg_DynamicPowerManagementVideoMemoryThreshold:int
parm:           NVreg_EnableGpuFirmware:int
parm:           NVreg_EnableUserNUMAManagement:int
parm:           NVreg_MemoryPoolSize:int
parm:           NVreg_KMallocHeapMaxSize:int
parm:           NVreg_VMallocHeapMaxSize:int
parm:           NVreg_IgnoreMMIOCheck:int
parm:           NVreg_NvLinkDisable:int
parm:           NVreg_EnablePCIERelaxedOrderingMode:int
parm:           NVreg_RegisterPCIDriver:int
parm:           NVreg_RegistryDwords:charp
parm:           NVreg_RegistryDwordsPerDevice:charp
parm:           NVreg_RmMsg:charp
parm:           NVreg_GpuBlacklist:charp
parm:           NVreg_TemporaryFilePath:charp
parm:           NVreg_ExcludedGpus:charp
parm:           rm_firmware_active:charp
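
In case it helps anyone compare module builds, this is a small sketch for turning the key/value listing above into a dictionary. It assumes each line has the form `key: value` as in the output above; `parse_modinfo` is a made-up name, and the sample is a handful of lines copied from the listing.

```python
from collections import defaultdict

def parse_modinfo(text):
    """Group 'key: value' lines by key (hypothetical helper).

    Keys such as 'alias' and 'parm' repeat, so every key maps to a list.
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        # Split on the first colon only; values may contain colons themselves.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()].append(value.strip())
    return dict(fields)

# A few lines copied from the module info in this post.
SAMPLE = """\
firmware:       nvidia/470.57.02/gsp.bin
version:        470.57.02
vermagic:       5.4.0-81-generic SMP mod_unload modversions
parm:           NVreg_EnableMSI:int
parm:           NVreg_RegistryDwords:charp
"""

info = parse_modinfo(SAMPLE)
# Each parm value looks like 'name:type'; keep only the name.
parm_names = [p.split(":", 1)[0] for p in info["parm"]]
print(info["version"][0], parm_names)
```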