My GPU keeps falling off the bus. Whenever I install the NVIDIA driver (open kernel module 575.57.08, aarch64), the PCIe link speed is downgraded to 2.5GT/s; after I uninstall the driver, the link comes back up at 16GT/s. Here is the relevant lspci -vv output with the driver installed:
LnkCap: Port #0, Speed 32GT/s, Width x16, ASPM L1, Exit Latency L1 unlimited
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
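
For reference, the link state can also be watched live by polling sysfs. Below is a minimal sketch (assumptions: Linux sysfs, and the GPU at BDF 0000:01:00.0, the address shown in the NVRM log; adjust the BDF and polling interval for your system):

#!/usr/bin/env python3
# Poll the PCIe link state of one device via sysfs and print every change.
# Assumption: the GPU's BDF is 0000:01:00.0 (taken from the NVRM log below).
# Reading these attributes does not require root. Stop with Ctrl-C.
import time
from pathlib import Path

DEV = Path("/sys/bus/pci/devices/0000:01:00.0")

def attr(name: str) -> str:
    try:
        return (DEV / name).read_text().strip()
    except OSError:
        # The sysfs node can vanish if the device drops off the bus.
        return "n/a (device gone?)"

last = None
while True:
    state = (attr("current_link_speed"), attr("current_link_width"))
    if state != last:
        print(f"link: {state[0]} x{state[1]}  "
              f"(capable: {attr('max_link_speed')} x{attr('max_link_width')})")
        last = state
    time.sleep(0.5)

This should show the 16GT/s x16 to 2.5GT/s x16 flip described above when the driver loads. The dmesg output below shows what happens next: the GPU falls off the bus (Xid 79) while the CUDA matrixMul sample is running.
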
[ 16.540612] NVRM: loading NVIDIA UNIX Open Kernel Module for aarch64 575.57.08 Release Build (root@ubuntu22)
[ 20.295422] NVRM: kbifInitLtr_GB202: LTR is disabled in the hierarchy
[ 147.082344] NVRM: GPU at PCI:0000:01:00: GPU-ed522521-b51c-4f5f-22c9-93158071d594
[ 147.082352] NVRM: GPU Board Serial Number: xxxxxxxx
[ 147.082353] NVRM: Xid (PCI:0000:01:00): 79, pid=1977, name=matrixMul, GPU has fallen off the bus.
[ 147.082359] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
[ 147.082360] NVRM: GPU 0000:01:00.0: GPU serial number is xxxxxxxx.
[ 147.082369] NVRM: kgspRcAndNotifyAllChannels_IMPL: RC all channels for critical error 79.
[ 147.082381] NVRM: _threadNodeCheckTimeout: API_GPU_ATTACHED_SANITY_CHECK failed!
              [ ... the API_GPU_ATTACHED_SANITY_CHECK line above repeats 15 more times (timestamps 147.082392 through 147.082484) ... ]
[ 147.091663] NVRM: nvAssertFailedNoLog: Assertion failed: !pKernelGsp->bPollingForRpcResponse @ kernel_gsp.c:2219
[ 147.091701] NVRM: _issueRpcAndWait: rpcRecvPoll failed with status 0x00000040 for fn 78!
[ 147.091716] NVRM: nvCheckOkFailedNoLog: Check failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from nvdEngineDumpCallbackHelper(pGpu, pPrbEnc, pNvDumpState, pEngineCallback) @ nv_debug_dump.c:274
              [ ... this three-line sequence (kernel_gsp.c:2219 assertion / rpcRecvPoll failure for fn 78 / nv_debug_dump.c:274 check) repeats 24 more times (timestamps 147.091956 through 147.093245) ... ]
[ 147.093737] NVRM: RmLogGpuCrash: RmLogGpuCrash: failed to save GPU crash data
[ 147.093751] NVRM: nvAssertFailedNoLog: Assertion failed: expectedFunc == pHistoryEntry->function @ kernel_gsp.c:2126
[ 147.093775] NVRM: _kgspLogRpcSanityCheckFailure: GPU0 sanity check failed 0xf waiting for RPC response from GSP. Expected function 21 (DUP_OBJECT) (0x0 0x0).
[ 147.093783] NVRM: GPU0 GSP RPC buffer contains function 78 (DUMP_PROTOBUF_COMPONENT) and data 0x0000000000000000 0x0000000000000000.
[ 147.093800] NVRM: GPU0 RPC history (CPU -> GSP):
[ 147.093806] NVRM: entry function data0 data1 ts_start ts_end duration actively_polling
[ 147.093813] NVRM: 0 78 DUMP_PROTOBUF_COMPONE 0x0000000000000000 0x0000000000000000 0x00060aafe0e7d403 0x0000000000000000 y
[ 147.093831] NVRM: -1 78 DUMP_PROTOBUF_COMPONE 0x0000000000000000 0x0000000000000000 0x00060aafe0e7d3d0 0x0000000000000000
[ 147.093847] NVRM: -2 78 DUMP_PROTOBUF_COMPONE 0x0000000000000000 0x0000000000000000 0x00060aafe0e7d39c 0x0000000000000000
[ 147.093860] NVRM: -3 78 DUMP_PROTOBUF_COMPONE 0x0000000000000000 0x0000000000000000 0x00060aafe0e7d367 0x0000000000000000
[ 147.093872] NVRM: -4 78 DUMP_PROTOBUF_COMPONE 0x0000000000000000 0x0000000000000000 0x00060aafe0e7d332 0x0000000000000000
[ 147.093883] NVRM: -5 78 DUMP_PROTOBUF_COMPONE 0x0000000000000000 0x0000000000000000 0x00060aafe0e7d2fe 0x0000000000000000
[ 147.093894] NVRM: -6 78 DUMP_PROTOBUF_COMPONE 0x0000000000000000 0x0000000000000000 0x00060aafe0e7d2c9 0x0000000000000000
[ 147.093905] NVRM: -7 78 DUMP_PROTOBUF_COMPONE 0x0000000000000000 0x0000000000000000 0x00060aafe0e7d294 0x0000000000000000
[ 147.093916] NVRM: GPU0 RPC event history (CPU <- GSP):
[ 147.093922] NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc
[ 147.093929] NVRM: 0 4124 GSP_LOCKDOWN_NOTICE 0x0000000000000000 0x0000000000000000 0x00060aafd94e2d6c 0x00060aafd94e2d6d 1us
[ 147.093948] NVRM: -1 4124 GSP_LOCKDOWN_NOTICE 0x0000000000000001 0x0000000000000000 0x00060aafd94e18d5 0x00060aafd94e18d5
[ 147.093961] NVRM: -2 4124 GSP_LOCKDOWN_NOTICE 0x0000000000000000 0x0000000000000000 0x00060aafd94e17d7 0x00060aafd94e17d7
[ 147.093974] NVRM: -3 4124 GSP_LOCKDOWN_NOTICE 0x0000000000000001 0x0000000000000000 0x00060aafd94e1401 0x00060aafd94e1402 1us
[ 147.093989] NVRM: -4 4124 GSP_LOCKDOWN_NOTICE 0x0000000000000000 0x0000000000000000 0x00060aafd94ca747 0x00060aafd94ca748 1us
[ 147.094003] NVRM: -5 4124 GSP_LOCKDOWN_NOTICE 0x0000000000000001 0x0000000000000000 0x00060aafd94ca6c3 0x00060aafd94ca6c3
[ 147.094015] NVRM: -6 4128 GSP_POST_NOCAT_RECORD 0x0000000000000002 0x0000000000000027 0x00060aafd94ca241 0x00060aafd94ca246 5us
[ 147.094030] NVRM: -7 4124 GSP_LOCKDOWN_NOTICE 0x0000000000000000 0x0000000000000000 0x00060aafd94c9770 0x00060aafd94c9771 1us
[ 147.113252] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ nv_gpu_ops.c:9558
[ 147.113297] NVRM: nvGpuOpsRetainChannel: nvGpuOpsRetainChannel:9740: GPU lost from the bus [NV_ERR_GPU_IS_LOST]
[ 147.113314] NVRM: nvAssertFailedNoLog: Assertion failed: (status == NV_OK) || (status == NV_ERR_GPU_IN_FULLCHIP_RESET) @ rs_client.c:844
[ 147.113324] NVRM: nvAssertFailedNoLog: Assertion failed: (status == NV_OK) || (status == NV_ERR_GPU_IN_FULLCHIP_RESET) @ rs_server.c:259
[ 147.113329] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ nv_gpu_ops.c:9824
[ 147.113346] NVRM: nvAssertFailedNoLog: Assertion failed: (status == NV_OK) || (status == NV_ERR_GPU_IN_FULLCHIP_RESET) @ rs_client.c:844
[ 147.113356] NVRM: nvAssertFailedNoLog: Assertion failed: (status == NV_OK) || (status == NV_ERR_GPU_IN_FULLCHIP_RESET) @ rs_server.c:259
[ 147.113360] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ nv_gpu_ops.c:9839
[ 147.113365] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ nv_gpu_ops.c:9845
[ 147.113417] NVRM: Xid (PCI:0000:01:00): 154, GPU recovery action changed from 0x0 (None) to 0x2 (Node Reboot Required)
[ 147.113452] NVRM: nvAssertOkFailedNoLog: Assertion failed: GPU lost from the bus [NV_ERR_GPU_IS_LOST] (0x0000000F) returned from pRmApi->Control(pRmApi, RES_GET_CLIENT_HANDLE(pKernelChannel), RES_GET_HANDLE(pKernelChannel), NVA06F_CTRL_CMD_STOP_CHANNEL, &stopChannelParams, sizeof(stopChannelParams)) @ nv_gpu_ops.c:10453
[  147.113484] NVRM: nvAssertOkFailedNoLog: Assertion failed: GPU lost from the bus [NV_ERR_GPU_IS_LOST] (0x0000000F) returned from pRmApi->Control(pRmApi, retainedChannel->session->handle, retainedChannel->rmSubDevice->subDeviceHandle, NV2080_CTRL_CMD_GPU_EVICT_CTX, &params, sizeof(params)) @ nv_gpu_ops.c:10473
              [ ... the same STOP_CHANNEL / EVICT_CTX failures and the accompanying rs_client.c:844 / rs_server.c:259 / nv_gpu_ops.c:9824-9845 assertion runs repeat three more times (timestamps 147.113524 through 147.113866) ... ]
[ 147.113887] NVRM: nvGpuOpsReportFatalError: uvm encountered global fatal error 0x60, requiring os reboot to recover.
              [ ... the rs_client.c:844 / rs_server.c:259 / nv_gpu_ops.c:9824-9845 assertion spam continues through teardown (timestamps 147.113946 through 147.120611) ... ]
[ 147.130981] NVRM: nvAssertFailedNoLog: Assertion failed: (status == NV_OK) || (status == NV_ERR_GPU_IN_FULLCHIP_RESET) @ vaspace_api.c:538
              [ ... ten more rs_client.c:844 / rs_server.c:259 assertion pairs (timestamps 147.131558 through 147.132749) ... ]
[ 147.133276] NVRM: nvAssertFailedNoLog: Assertion failed: (status == NV_OK) || (status == NV_ERR_GPU_IN_FULLCHIP_RESET) @ vaspace_api.c:538
[ 147.134069] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ nv_gpu_ops.c:9061
[ 147.134098] NVRM: kgmmuFaultBufferReplayableDestroy_IMPL: Unregistering Replayable Fault buffer failed (status=0x0000000f), proceeding...
[ 147.134177] NVRM: uvmTerminateAccessCntrBuffer_IMPL: Unloading UVM Access counters failed (status=0x0000000f), proceeding...
[ 148.137598] NVRM: nvAssertFailedNoLog: Assertion failed: (status == NV_OK) || (status == NV_ERR_GPU_IN_FULLCHIP_RESET) @ vaspace_api.c:538
[ 148.137618] NVRM: nvAssertFailedNoLog: Assertion failed: (status == NV_OK) || (status == NV_ERR_GPU_IN_FULLCHIP_RESET) @ vaspace_api.c:538
[ 148.137628] NVRM: nvAssertFailedNoLog: Assertion failed: (status == NV_OK) || (status == NV_ERR_GPU_IN_FULLCHIP_RESET) @ rs_client.c:844
[ 148.137634] NVRM: nvAssertFailedNoLog: Assertion failed: (status == NV_OK) || (status == NV_ERR_GPU_IN_FULLCHIP_RESET) @ rs_server.c:259
[ 148.137638] NVRM: nvAssertFailedNoLog: Assertion failed: (status == NV_OK) || (status == NV_ERR_GPU_IN_FULLCHIP_RESET) @ rs_server.c:1375
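
To correlate the Xid with what the link is doing at the time, a rough variant of the same sketch runs the workload while sampling sysfs (the ./matrixMul path is a placeholder for wherever the CUDA sample is built; same 0000:01:00.0 assumption as above):

#!/usr/bin/env python3
# Run a CUDA workload and sample the PCIe link state until it exits,
# to catch the moment the link degrades or the device disappears.
# Assumptions: GPU at BDF 0000:01:00.0; matrixMul binary at ./matrixMul.
import subprocess
import time
from pathlib import Path

DEV = Path("/sys/bus/pci/devices/0000:01:00.0")

def link_state() -> str:
    try:
        speed = (DEV / "current_link_speed").read_text().strip()
        width = (DEV / "current_link_width").read_text().strip()
        return f"{speed} x{width}"
    except OSError:
        return "device gone (fallen off the bus?)"

proc = subprocess.Popen(["./matrixMul"])  # placeholder path to the CUDA sample
while proc.poll() is None:
    print(time.strftime("%H:%M:%S"), link_state())
    time.sleep(0.2)
print("workload exited with", proc.returncode, "| final link:", link_state())

If the device vanishes from sysfs the monitor flags it; otherwise it at least gives a timeline of the link speed around the crash. Any idea why installing the driver forces the link down to 2.5GT/s, and why the GPU then falls off the bus under load? Could the "LTR is disabled in the hierarchy" message at 20s be related?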