Black screen on AGX Orin

Hi,

Our AGX Orin goes to a black screen.
The log is as follows:

Call trace:
[  126.321850]  dump_backtrace+0x0/0x1d0
[  126.321853]  show_stack+0x30/0x40
[  126.321858]  dump_stack+0xd8/0x138
[  126.321908]  os_dump_stack+0x18/0x20 [nvidia]
[  126.321947]  tlsEntryGet+0x130/0x138 [nvidia]
[  126.321981]  gpumgrGetSomeGpu+0x7c/0x90 [nvidia]
[  126.322016]  threadPriorityStateFree+0x234/0x2a0 [nvidia]
[  126.322051]  RmShutdownAdapter+0x168/0x268 [nvidia]
[  126.322085]  rm_shutdown_adapter+0x50/0x70 [nvidia]
[  126.322119]  nv_shutdown_adapter+0xb4/0x4b0 [nvidia]
[  126.322153]  nv_shutdown_adapter+0x2d8/0x4b0 [nvidia]
[  126.322187]  nvidia_dev_put+0x38/0xc40 [nvidia]
[  126.322226]  nvkms_close_gpu+0x60/0x98 [nvidia_modeset]
[  126.322255]  nvRmFreeDeviceEvo+0x8c/0x130 [nvidia_modeset]
[  126.322277]  nvkms_ioctl_common+0x180/0x1b0 [nvidia_modeset]
[  126.322311]  nvidia_frontend_unlocked_ioctl+0x5c/0x78 [nvidia]
[  126.322318]  __arm64_sys_ioctl+0xac/0xf0
[  126.322322]  el0_svc_common.constprop.0+0x80/0x1d0
[  126.322324]  do_el0_svc+0x38/0xb0
[  126.322327]  el0_svc+0x1c/0x30
[  126.322329]  el0_sync_handler+0xa8/0xb0
[  126.322331]  el0_sync+0x16c/0x180
[  184.695079] nvdec 15480000.nvdec: RISC-V desc binary name:nvhost_nvdec050_desc_prod.bin
[  184.707731] nvdec 15480000.nvdec: RISC-V booting from GSC
[  184.721465] nvdec 15480000.nvdec: RISCV boot success
[  190.337851] falcon 154c0000.nvenc: Direct firmware load for nvhost_nvenc080.fw failed with error -2
[  190.337860] falcon 154c0000.nvenc: Falling back to sysfs fallback for: nvhost_nvenc080.fw
[  190.347172] falcon 154c0000.nvenc: looking for firmware in subdirectory
[  440.284775] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080013f result 0x56:
[  440.285620] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080017e result 0x56:
[  440.289027] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080014a result 0x56:
[  440.360923] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x730190 result 0x56:
[  440.492879] NVRM gpumgrGetSomeGpu: Failed to retrieve pGpu - Too early call!.
[  440.492883] NVRM nvAssertFailedNoLog: Assertion failed: NV_FALSE @ gpu_mgr.c:296
[  440.554471] cpufreq: cpu0,cur:745000,set:883200,set ndiv:69
[  442.553246] cpufreq: cpu0,cur:1804000,set:883200,set ndiv:69
[  444.265438] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080013f result 0x56:
[  444.266310] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080017e result 0x56:
[  444.269470] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080014a result 0x56:
[  444.337732] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x730190 result 0x56:
[  444.460040] NVRM gpumgrGetSomeGpu: Failed to retrieve pGpu - Too early call!.
[  444.460049] NVRM nvAssertFailedNoLog: Assertion failed: NV_FALSE @ gpu_mgr.c:296
[  446.610909] cpufreq: cpu4,cur:1915000,set:1651200,set ndiv:129
[  448.615927] cpufreq: cpu4,cur:730000,set:960000,set ndiv:75
[  449.608637] cpufreq: cpu0,cur:1401000,set:1728000,set ndiv:135
[  529.357973] veth7999acb: renamed from eth0
[  529.388547] br-3de13ce187bc: port 1(vethbac5db2) entered disabled state
[  529.452144] br-3de13ce187bc: port 1(vethbac5db2) entered disabled state
[  529.458206] device vethbac5db2 left promiscuous mode
[  529.458216] br-3de13ce187bc: port 1(vethbac5db2) entered disabled state
[  856.156487] br-a34371b46ca3: port 2(veth7abfa9b) entered blocking state
[  856.156512] br-a34371b46ca3: port 2(veth7abfa9b) entered disabled state
[  856.156674] device veth7abfa9b entered promiscuous mode
[  856.473502] eth0: renamed from veth3d52e48
[  856.493458] IPv6: ADDRCONF(NETDEV_CHANGE): veth7abfa9b: link becomes ready
[  856.493568] br-a34371b46ca3: port 2(veth7abfa9b) entered blocking state
[  856.493574] br-a34371b46ca3: port 2(veth7abfa9b) entered forwarding state
[  984.217930] br-a34371b46ca3: port 3(veth18dc1c4) entered blocking state
[  984.217937] br-a34371b46ca3: port 3(veth18dc1c4) entered disabled state
[  984.218054] device veth18dc1c4 entered promiscuous mode
[  984.509643] eth0: renamed from vethf8517aa
[  984.545944] IPv6: ADDRCONF(NETDEV_CHANGE): veth18dc1c4: link becomes ready
[  984.546025] br-a34371b46ca3: port 3(veth18dc1c4) entered blocking state
[  984.546028] br-a34371b46ca3: port 3(veth18dc1c4) entered forwarding state
[ 1064.405120] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080013f result 0x56:
[ 1064.405907] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080017e result 0x56:
[ 1064.409170] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080014a result 0x56:
[ 1064.475123] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x730190 result 0x56:
[ 1064.589082] NVRM gpumgrGetSomeGpu: Failed to retrieve pGpu - Too early call!.
[ 1064.589087] NVRM nvAssertFailedNoLog: Assertion failed: NV_FALSE @ gpu_mgr.c:296
[ 1065.742276] cpufreq: cpu0,cur:2198000,set:1728000,set ndiv:135
[ 1066.744378] cpufreq: cpu0,cur:2075000,set:1958400,set ndiv:153
[ 1066.752119] cpufreq: cpu4,cur:988000,set:1190400,set ndiv:93
[ 1068.747613] cpufreq: cpu0,cur:985000,set:1958400,set ndiv:153
[ 1070.750335] cpufreq: cpu4,cur:1252000,set:1113600,set ndiv:87
[ 1072.761498] cpufreq: cpu8,cur:1321000,set:2035200,set ndiv:159
[ 1073.751068] cpufreq: cpu4,cur:1344000,set:1190400,set ndiv:93
[ 1074.758683] cpufreq: cpu8,cur:2032000,set:1497600,set ndiv:117
[ 1075.762317] cpufreq: cpu8,cur:673000,set:960000,set ndiv:75
[ 1081.753741] cpufreq: cpu0,cur:1107000,set:2201600,set ndiv:172
[ 1082.753649] cpufreq: cpu0,cur:1495000,set:2201600,set ndiv:172
[ 1086.755669] cpufreq: cpu0,cur:1765000,set:1881600,set ndiv:147
[ 1087.765710] cpufreq: cpu8,cur:893000,set:1728000,set ndiv:135
[ 1091.769145] cpufreq: cpu8,cur:729000,set:1497600,set ndiv:117
[ 1093.765174] cpufreq: cpu4,cur:1109000,set:2112000,set ndiv:165
[ 1093.772393] cpufreq: cpu8,cur:1334000,set:1497600,set ndiv:117
[ 1099.770440] cpufreq: cpu8,cur:1421000,set:1036800,set ndiv:81
[ 1104.767569] cpufreq: cpu0,cur:2017000,set:2201600,set ndiv:172
[ 1106.781496] cpufreq: cpu8,cur:2260000,set:2112000,set ndiv:165
[ 1107.779740] cpufreq: cpu8,cur:2086000,set:2201600,set ndiv:172
[ 1110.786328] cpufreq: cpu8,cur:368000,set:729600,set ndiv:57
[ 1111.784390] cpufreq: cpu8,cur:882000,set:1190400,set ndiv:93
[ 1137.751587] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080013f result 0x56:
[ 1137.752374] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080017e result 0x56:
[ 1137.755209] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080014a result 0x56:
[ 1137.820430] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x730190 result 0x56:
[ 1137.941394] NVRM gpumgrGetSomeGpu: Failed to retrieve pGpu - Too early call!.
[ 1137.941404] NVRM nvAssertFailedNoLog: Assertion failed: NV_FALSE @ gpu_mgr.c:296
[ 1139.130061] cpufreq: cpu0,cur:1172000,set:1728000,set ndiv:135
[ 1142.135893] cpufreq: cpu0,cur:729000,set:1804800,set ndiv:141
[ 1142.138012] cpufreq: cpu0,cur:1032000,set:1420800,set ndiv:111
[ 1144.139263] cpufreq: cpu4,cur:1370000,set:1958400,set ndiv:153
[ 1149.143731] cpufreq: cpu4,cur:2109000,set:1497600,set ndiv:117
[ 1150.135788] cpufreq: cpu0,cur:1252000,set:1804800,set ndiv:141
[ 1155.142967] cpufreq: cpu0,cur:1190000,set:1497600,set ndiv:117
[ 1155.151412] cpufreq: cpu8,cur:729000,set:960000,set ndiv:75
[ 1201.655215] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080013f result 0x56:
[ 1201.656025] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080017e result 0x56:
[ 1201.658850] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080014a result 0x56:
[ 1201.724496] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x730190 result 0x56:
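The NVRM_RPC_DCE "Failed RM ctrl call" lines above do not appear continuously. They come in short bursts of four that repeat several minutes apart. To line such bursts up against the moment the screen actually goes black in a longer capture, a small script can group them. This is a hypothetical helper, not an NVIDIA tool: the function names and the 2-second grouping window are illustrative choices, and the script does not try to interpret what result 0x56 means.

```python
import re
from typing import List, Optional, Tuple

# Matches lines such as:
# [  440.284775] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080013f result 0x56:
FAIL_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM rpcRmApiControl_dce:.*"
    r"Failed RM ctrl call cmd:(?P<cmd>0x[0-9a-fA-F]+)"
)


def failure_bursts(dmesg_text: str, gap_s: float = 2.0) -> List[Tuple[float, List[str]]]:
    """Group 'Failed RM ctrl call' lines into bursts.

    A failure less than gap_s seconds after the previous one joins the current
    burst. Returns a list of (burst_start_seconds, [cmd, ...]).
    """
    bursts: List[Tuple[float, List[str]]] = []
    last_ts: Optional[float] = None
    for line in dmesg_text.splitlines():
        m = FAIL_RE.match(line.strip())
        if m is None:
            continue  # not an NVRM ctrl-call failure line
        ts = float(m.group("ts"))
        if last_ts is None or ts - last_ts > gap_s:
            bursts.append((ts, []))  # gap too large: start a new burst
        bursts[-1][1].append(m.group("cmd"))
        last_ts = ts
    return bursts


def burst_intervals(bursts: List[Tuple[float, List[str]]]) -> List[float]:
    """Seconds between the starts of consecutive bursts."""
    return [b[0] - a[0] for a, b in zip(bursts, bursts[1:])]
```

Run on the excerpt above (for example `failure_bursts(open("dmesg.txt").read())`), this should yield five bursts starting at about 440, 444, 1064, 1137 and 1201 s, each containing the same four cmd values. Comparing those start times with the time the display went dark is one way to check whether the bursts are related to it at all.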

dmesg.txt (103.6 KB)
log.zip (6.6 MB)

The screen goes black during use, but we can still log in remotely over SSH. Could you help us figure out what is causing this? Thanks.

Devkit or custom carrier board?
Which JetPack/L4T SW?
32 GB or 64 GB Orin module?
Repro rate? Steps?

Since you have opened many topics on the forum, please remember to give a clear description in each topic so that we can help you efficiently.
Thanks


JetPack board
64 GB AGX Orin
JetPack 5.1.1
Repro rate: every seven or eight hours

We were asking whether you are using the DevKit or a custom carrier board. You answered "JetPack board"; what is that supposed to mean?
Or do you not know yourself?

Do you mean the screen was on at first and then suddenly went black partway through use?
Is there a specific way to reproduce it?

custom board
It is a board we developed ourselves.

It works fine at first. The problem appears after it has been running for a while, usually after seven or eight hours.

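Usage environment:
The device sits in a server room running our deployed algorithm, which pulls streams from 12 high-bandwidth cameras, processes them, and reports the results. After about 28 hours of running, the directly connected monitor shows nothing at all, and clicking the mouse or similar produces no output either. However, we can still SSH into the device, the algorithm keeps running normally, and the temperature is above 75°C.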
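Because the failure only shows up after many hours of heavy load at 75°C or more, it may help to log temperatures continuously so that a black-screen event can later be matched with the thermal state at that moment. A minimal sketch, assuming the standard Linux thermal sysfs layout (`/sys/class/thermal/thermal_zone*/type` and `temp`, the latter in millidegrees Celsius); zone names differ between boards, the function names are hypothetical, and nothing here shows that temperature is actually the cause.

```python
import time
from pathlib import Path
from typing import List, Tuple


def read_thermal_zones(root: str = "/sys/class/thermal") -> List[Tuple[str, str, float]]:
    """Return (zone_dir, zone_type, degrees_C) for each thermal_zone* under root.

    The Linux thermal sysfs 'temp' file holds millidegrees Celsius.
    """
    zones: List[Tuple[str, str, float]] = []
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or disabled zone: skip it
        zones.append((zone.name, zone_type, millideg / 1000.0))
    return zones


def format_sample(zones: List[Tuple[str, str, float]]) -> str:
    """One log line: time.monotonic() seconds followed by type=temp pairs."""
    parts = " ".join(f"{ztype}={celsius:.1f}C" for _, ztype, celsius in zones)
    return f"[{time.monotonic():12.3f}] {parts}"
```

A simple loop such as `while True: print(format_sample(read_thermal_zones()), flush=True); time.sleep(60)`, redirected to a file, is enough. On Linux, `time.monotonic()` reads CLOCK_MONOTONIC, which is usually close to the dmesg clock but is not guaranteed to match it exactly.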

Please verify this on a DevKit first.

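This is a customer deployment, so we cannot use a DevKit.
Also, the DevKit does not have an HDMI port, while our board uses HDMI. Can you tell from the log what is causing this? Thanks.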

[ 16692.429] (**) Option "fd" "34"
[ 16692.429] (II) event0  - YSPRINGTECH USB OPTICAL MOUSE: device removed
[ 16692.429] (**) Option "fd" "37"
[ 16692.429] (II) event1  - Dell Dell USB Entry Keyboard: device removed
[ 16692.437] (II) UnloadModule: "libinput"
[ 16692.437] (II) systemd-logind: releasing fd for 13:65
[ 16692.468] (II) UnloadModule: "libinput"
[ 16692.469] (II) systemd-logind: releasing fd for 13:64
[ 16692.666] (II) NVIDIA(GPU-0): Deleting GPU-0
[ 16692.667] (WW) xf86CloseConsole: KDSETMODE failed: Input/output error
[ 16692.667] (WW) xf86CloseConsole: VT_GETMODE failed: Input/output error
[ 16692.667] (WW) xf86CloseConsole: VT_ACTIVATE failed: Input/output error
[ 16692.670] (II) Server terminated successfully (0). Closing log file.

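Is this the point at which the crash happened?
The Xorg timestamp is 16692 seconds, but each of your dmesg files is only about 100 KB and covers only twenty-odd seconds.
Judging from dmesg, there doesn't seem to be any connection.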
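The point here is that the Xorg shutdown at 16692 s falls far outside the window the attached dmesg captures cover, so they cannot show what the kernel was doing at that moment. A quick way to check this for any pair of logs is sketched below. It assumes, as the comparison above implicitly does, that both logs stamp lines with seconds since boot on the same clock; the helper names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Both dmesg and Xorg.N.log prefix lines with "[  seconds.fraction]".
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")


def ts_range(log_text: str) -> Optional[Tuple[float, float]]:
    """Earliest and latest bracketed timestamp in a log, or None if none found."""
    stamps = [
        float(m.group(1))
        for m in (TS_RE.match(line) for line in log_text.splitlines())
        if m is not None
    ]
    if not stamps:
        return None
    return min(stamps), max(stamps)


def dmesg_covers(event_ts: float, dmesg_text: str) -> bool:
    """True if event_ts lies inside the time window the dmesg capture spans.

    ASSUMPTION: event_ts and the dmesg stamps use the same clock
    (seconds since boot); this sketch does not verify that.
    """
    window = ts_range(dmesg_text)
    return window is not None and window[0] <= event_ts <= window[1]
```

If `dmesg_covers(16692.429, dmesg_text)` is False, the dmesg capture simply ends before the Xorg event. One option is to keep `dmesg -w` running into a file (or rely on the persistent journal) so that the kernel log around the next black screen is not lost.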

After the screen goes black, can you run

lsmod | grep nvidia

and confirm whether the two drivers nvidia and nvidia_modeset are still loaded?
This looks more like an Xorg/GDM problem.
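lsmod formats the contents of /proc/modules, so the same check can be scripted, for example to run from the SSH session, or from a periodic job, right when the screen goes dark. A minimal sketch; the two module names are the ones asked about above, and the helper names are hypothetical. Unlike `grep nvidia`, it compares exact module names, so a module that merely contains "nvidia" in its name does not count as a match.

```python
from typing import Iterable, List, Set


def loaded_modules(proc_modules_text: str) -> Set[str]:
    """Module names from /proc/modules text (first field of each non-empty line)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def missing_modules(required: Iterable[str], proc_modules_text: str) -> List[str]:
    """Required module names that are NOT loaded (exact-name match)."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]


def check_nvidia_drivers(path: str = "/proc/modules") -> List[str]:
    """Read /proc/modules and report which of nvidia / nvidia_modeset are missing."""
    with open(path) as f:
        return missing_modules(("nvidia", "nvidia_modeset"), f.read())
```

An empty list from `check_nvidia_drivers()` means both drivers are still loaded, which would point away from the kernel driver and toward the display server side, as suggested above.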

If it is a GDM problem, you can try switching to a different desktop environment,
such as Unity, KDE, or LXDE.

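Also, your logs contain both Xorg.0.log and Xorg.1.log.
Does that mean you have two monitors connected? Does the problem still occur with only one connected?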

When the screen is black, the nvidia and nvidia_modeset drivers are still loaded.


Only one HDMI monitor is connected. The other file is yesterday's log.

[  184.695079] nvdec 15480000.nvdec: RISC-V desc binary name:nvhost_nvdec050_desc_prod.bin
[  184.707731] nvdec 15480000.nvdec: RISC-V booting from GSC
[  184.721465] nvdec 15480000.nvdec: RISCV boot success
[  190.337851] falcon 154c0000.nvenc: Direct firmware load for nvhost_nvenc080.fw failed with error -2
[  190.337860] falcon 154c0000.nvenc: Falling back to sysfs fallback for: nvhost_nvenc080.fw
[  190.347172] falcon 154c0000.nvenc: looking for firmware in subdirectory

What does nvhost_nvdec050_desc_prod.bin do? Does this part of the log mean the problem is caused by loading this bin?

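This person's problem is similar to yours, and it looks like a GNOME problem.
I suggest trying another desktop environment to see whether you hit the same problem.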

This is normal.
And even if NVENC/NVDEC had a problem, it should not affect the GUI.
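Kernel messages report failures as negative errno values, so "error -2" in the nvenc firmware lines above is -ENOENT (no such file or directory): the direct load did not find the file, and, as the next log line says, the driver then falls back to the sysfs fallback mechanism. A small sketch that decodes such lines with Python's standard errno module; the regex is tailored to the message text shown in this log, and the function name is hypothetical.

```python
import errno
import os
import re
from typing import Optional, Tuple

# Matches e.g. "Direct firmware load for nvhost_nvenc080.fw failed with error -2"
FW_FAIL_RE = re.compile(
    r"Direct firmware load for (?P<fw>\S+) failed with error (?P<err>-?\d+)"
)


def explain_fw_failure(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (firmware_name, errno_symbol, message) for a firmware-load failure line."""
    m = FW_FAIL_RE.search(line)
    if m is None:
        return None
    code = abs(int(m.group("err")))  # kernel logs errors as negative errno values
    symbol = errno.errorcode.get(code, f"errno {code}")
    return m.group("fw"), symbol, os.strerror(code)
```

On the nvenc line above it returns the firmware name nvhost_nvenc080.fw together with ENOENT and its "No such file or directory" message.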

For the Orin HDMI issue, please try to reproduce it with an Orin Nano/NX module on a Xavier NX devkit.

Other than that, there is no way to debug your problem.
