Black screen on AGX Orin

sorry.shao · February 1, 2024, 4:30am

Hi,

I am getting a black screen on AGX Orin.
The log is as follows:

Call trace:
[  126.321850]  dump_backtrace+0x0/0x1d0
[  126.321853]  show_stack+0x30/0x40
[  126.321858]  dump_stack+0xd8/0x138
[  126.321908]  os_dump_stack+0x18/0x20 [nvidia]
[  126.321947]  tlsEntryGet+0x130/0x138 [nvidia]
[  126.321981]  gpumgrGetSomeGpu+0x7c/0x90 [nvidia]
[  126.322016]  threadPriorityStateFree+0x234/0x2a0 [nvidia]
[  126.322051]  RmShutdownAdapter+0x168/0x268 [nvidia]
[  126.322085]  rm_shutdown_adapter+0x50/0x70 [nvidia]
[  126.322119]  nv_shutdown_adapter+0xb4/0x4b0 [nvidia]
[  126.322153]  nv_shutdown_adapter+0x2d8/0x4b0 [nvidia]
[  126.322187]  nvidia_dev_put+0x38/0xc40 [nvidia]
[  126.322226]  nvkms_close_gpu+0x60/0x98 [nvidia_modeset]
[  126.322255]  nvRmFreeDeviceEvo+0x8c/0x130 [nvidia_modeset]
[  126.322277]  nvkms_ioctl_common+0x180/0x1b0 [nvidia_modeset]
[  126.322311]  nvidia_frontend_unlocked_ioctl+0x5c/0x78 [nvidia]
[  126.322318]  __arm64_sys_ioctl+0xac/0xf0
[  126.322322]  el0_svc_common.constprop.0+0x80/0x1d0
[  126.322324]  do_el0_svc+0x38/0xb0
[  126.322327]  el0_svc+0x1c/0x30
[  126.322329]  el0_sync_handler+0xa8/0xb0
[  126.322331]  el0_sync+0x16c/0x180
[  184.695079] nvdec 15480000.nvdec: RISC-V desc binary name:nvhost_nvdec050_desc_prod.bin
[  184.707731] nvdec 15480000.nvdec: RISC-V booting from GSC
[  184.721465] nvdec 15480000.nvdec: RISCV boot success
[  190.337851] falcon 154c0000.nvenc: Direct firmware load for nvhost_nvenc080.fw failed with error -2
[  190.337860] falcon 154c0000.nvenc: Falling back to sysfs fallback for: nvhost_nvenc080.fw
[  190.347172] falcon 154c0000.nvenc: looking for firmware in subdirectory
[  440.284775] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080013f result 0x56:
[  440.285620] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080017e result 0x56:
[  440.289027] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080014a result 0x56:
[  440.360923] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x730190 result 0x56:
[  440.492879] NVRM gpumgrGetSomeGpu: Failed to retrieve pGpu - Too early call!.
[  440.492883] NVRM nvAssertFailedNoLog: Assertion failed: NV_FALSE @ gpu_mgr.c:296
[  440.554471] cpufreq: cpu0,cur:745000,set:883200,set ndiv:69
[  442.553246] cpufreq: cpu0,cur:1804000,set:883200,set ndiv:69
[  444.265438] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080013f result 0x56:
[  444.266310] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080017e result 0x56:
[  444.269470] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080014a result 0x56:
[  444.337732] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x730190 result 0x56:
[  444.460040] NVRM gpumgrGetSomeGpu: Failed to retrieve pGpu - Too early call!.
[  444.460049] NVRM nvAssertFailedNoLog: Assertion failed: NV_FALSE @ gpu_mgr.c:296
[  446.610909] cpufreq: cpu4,cur:1915000,set:1651200,set ndiv:129
[  448.615927] cpufreq: cpu4,cur:730000,set:960000,set ndiv:75
[  449.608637] cpufreq: cpu0,cur:1401000,set:1728000,set ndiv:135
[  529.357973] veth7999acb: renamed from eth0
[  529.388547] br-3de13ce187bc: port 1(vethbac5db2) entered disabled state
[  529.452144] br-3de13ce187bc: port 1(vethbac5db2) entered disabled state
[  529.458206] device vethbac5db2 left promiscuous mode
[  529.458216] br-3de13ce187bc: port 1(vethbac5db2) entered disabled state
[  856.156487] br-a34371b46ca3: port 2(veth7abfa9b) entered blocking state
[  856.156512] br-a34371b46ca3: port 2(veth7abfa9b) entered disabled state
[  856.156674] device veth7abfa9b entered promiscuous mode
[  856.473502] eth0: renamed from veth3d52e48
[  856.493458] IPv6: ADDRCONF(NETDEV_CHANGE): veth7abfa9b: link becomes ready
[  856.493568] br-a34371b46ca3: port 2(veth7abfa9b) entered blocking state
[  856.493574] br-a34371b46ca3: port 2(veth7abfa9b) entered forwarding state
[  984.217930] br-a34371b46ca3: port 3(veth18dc1c4) entered blocking state
[  984.217937] br-a34371b46ca3: port 3(veth18dc1c4) entered disabled state
[  984.218054] device veth18dc1c4 entered promiscuous mode
[  984.509643] eth0: renamed from vethf8517aa
[  984.545944] IPv6: ADDRCONF(NETDEV_CHANGE): veth18dc1c4: link becomes ready
[  984.546025] br-a34371b46ca3: port 3(veth18dc1c4) entered blocking state
[  984.546028] br-a34371b46ca3: port 3(veth18dc1c4) entered forwarding state
[ 1064.405120] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080013f result 0x56:
[ 1064.405907] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080017e result 0x56:
[ 1064.409170] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080014a result 0x56:
[ 1064.475123] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x730190 result 0x56:
[ 1064.589082] NVRM gpumgrGetSomeGpu: Failed to retrieve pGpu - Too early call!.
[ 1064.589087] NVRM nvAssertFailedNoLog: Assertion failed: NV_FALSE @ gpu_mgr.c:296
[ 1065.742276] cpufreq: cpu0,cur:2198000,set:1728000,set ndiv:135
[ 1066.744378] cpufreq: cpu0,cur:2075000,set:1958400,set ndiv:153
[ 1066.752119] cpufreq: cpu4,cur:988000,set:1190400,set ndiv:93
[ 1068.747613] cpufreq: cpu0,cur:985000,set:1958400,set ndiv:153
[ 1070.750335] cpufreq: cpu4,cur:1252000,set:1113600,set ndiv:87
[ 1072.761498] cpufreq: cpu8,cur:1321000,set:2035200,set ndiv:159
[ 1073.751068] cpufreq: cpu4,cur:1344000,set:1190400,set ndiv:93
[ 1074.758683] cpufreq: cpu8,cur:2032000,set:1497600,set ndiv:117
[ 1075.762317] cpufreq: cpu8,cur:673000,set:960000,set ndiv:75
[ 1081.753741] cpufreq: cpu0,cur:1107000,set:2201600,set ndiv:172
[ 1082.753649] cpufreq: cpu0,cur:1495000,set:2201600,set ndiv:172
[ 1086.755669] cpufreq: cpu0,cur:1765000,set:1881600,set ndiv:147
[ 1087.765710] cpufreq: cpu8,cur:893000,set:1728000,set ndiv:135
[ 1091.769145] cpufreq: cpu8,cur:729000,set:1497600,set ndiv:117
[ 1093.765174] cpufreq: cpu4,cur:1109000,set:2112000,set ndiv:165
[ 1093.772393] cpufreq: cpu8,cur:1334000,set:1497600,set ndiv:117
[ 1099.770440] cpufreq: cpu8,cur:1421000,set:1036800,set ndiv:81
[ 1104.767569] cpufreq: cpu0,cur:2017000,set:2201600,set ndiv:172
[ 1106.781496] cpufreq: cpu8,cur:2260000,set:2112000,set ndiv:165
[ 1107.779740] cpufreq: cpu8,cur:2086000,set:2201600,set ndiv:172
[ 1110.786328] cpufreq: cpu8,cur:368000,set:729600,set ndiv:57
[ 1111.784390] cpufreq: cpu8,cur:882000,set:1190400,set ndiv:93
[ 1137.751587] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080013f result 0x56:
[ 1137.752374] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080017e result 0x56:
[ 1137.755209] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080014a result 0x56:
[ 1137.820430] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x730190 result 0x56:
[ 1137.941394] NVRM gpumgrGetSomeGpu: Failed to retrieve pGpu - Too early call!.
[ 1137.941404] NVRM nvAssertFailedNoLog: Assertion failed: NV_FALSE @ gpu_mgr.c:296
[ 1139.130061] cpufreq: cpu0,cur:1172000,set:1728000,set ndiv:135
[ 1142.135893] cpufreq: cpu0,cur:729000,set:1804800,set ndiv:141
[ 1142.138012] cpufreq: cpu0,cur:1032000,set:1420800,set ndiv:111
[ 1144.139263] cpufreq: cpu4,cur:1370000,set:1958400,set ndiv:153
[ 1149.143731] cpufreq: cpu4,cur:2109000,set:1497600,set ndiv:117
[ 1150.135788] cpufreq: cpu0,cur:1252000,set:1804800,set ndiv:141
[ 1155.142967] cpufreq: cpu0,cur:1190000,set:1497600,set ndiv:117
[ 1155.151412] cpufreq: cpu8,cur:729000,set:960000,set ndiv:75
[ 1201.655215] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080013f result 0x56:
[ 1201.656025] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080017e result 0x56:
[ 1201.658850] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080014a result 0x56:
[ 1201.724496] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x730190 result 0x56:

dmesg.txt (103.6 KB)
log.zip (6.6 MB)

The screen goes black during use, but I can still log in remotely over SSH. Could you help confirm what is causing this? Thanks.
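The log above repeats the same group of `NVRM rpcRmApiControl_dce` failures at several points (around 440 s, 444 s, 1064 s, 1137 s, and 1201 s). A minimal sketch for tallying them from a saved dmesg dump might look like the following. The function name `summarize_nvrm_failures` and the `SAMPLE` text are illustrative assumptions, not part of any NVIDIA tool; the regular expression only targets the line format shown in this post.

```python
import re
from collections import Counter

# Matches lines in the format shown in the post, for example:
# [  440.284775] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080013f result 0x56:
FAILED_CTRL = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM rpcRmApiControl_dce:.*"
    r"cmd:(?P<cmd>0x[0-9a-fA-F]+)\s+result\s+(?P<result>0x[0-9a-fA-F]+)"
)


def summarize_nvrm_failures(dmesg_text):
    """Return (Counter keyed by (cmd, result), sorted timestamps in seconds)."""
    counts = Counter()
    timestamps = []
    for line in dmesg_text.splitlines():
        match = FAILED_CTRL.search(line)
        if match:
            counts[(match["cmd"], match["result"])] += 1
            timestamps.append(float(match["ts"]))
    return counts, sorted(timestamps)


# Hypothetical excerpt, copied from lines in the log above.
SAMPLE = """\
[  440.284775] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080013f result 0x56:
[  440.360923] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x730190 result 0x56:
[  440.492879] NVRM gpumgrGetSomeGpu: Failed to retrieve pGpu - Too early call!.
[  444.265438] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080013f result 0x56:
"""

counts, timestamps = summarize_nvrm_failures(SAMPLE)
for (cmd, result), n in counts.most_common():
    print(f"cmd {cmd} result {result}: {n} occurrence(s)")
print(f"first at {timestamps[0]} s, last at {timestamps[-1]} s")
```

Running the same function over the full `dmesg.txt` would show whether the failures cluster around the time the display goes dark, which is the correlation the replies below ask about.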

kayccc · February 1, 2024, 5:12am

Devkit or custom carrier board?
Which JetPack/L4T SW?
32GB or 64 GB Orin module?
Repro rate? Steps?

Since you have opened many topics on the forum, please remember to give a clear description in each topic so that we can help you efficiently.
Thanks

sorry.shao · February 1, 2024, 5:35am

JetPack board
64GB AGX Orin
JetPack 5.1.1
Repro rate: every seven or eight hours

DaveYYY · February 1, 2024, 5:39am

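We were asking whether you are using a DevKit or a custom carrier board. What is a "JetPack board" supposed to be?
Or do you not know yourself?

Do you mean the display works at first and then suddenly goes black partway through use?
Is there a specific way to reproduce it?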

sorry.shao · February 1, 2024, 5:39am

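Custom board.
It is a board we developed ourselves.

It works at first, and the problem appears after some time in use, typically after seven or eight hours.

Usage environment:
The device sits in a server room with our algorithm deployed on it. It pulls streams from 12 high-bandwidth cameras at the same time, processes them, and reports the results. After about 28 hours of running, the directly connected monitor shows nothing at all, and clicking the mouse and similar input brings nothing up. However, we can still get into the device over SSH, the algorithm runs normally, and the temperature is above 75°C.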
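Since the report mentions temperatures above 75°C, it may help to log the thermal zones over SSH while the display is black. The sketch below reads the standard Linux sysfs thermal interface, where each `thermal_zone*/temp` file holds millidegrees Celsius. The function name `read_thermal_zones` is an assumption for illustration, and the zone names that appear depend on the board and L4T release.

```python
from pathlib import Path


def read_thermal_zones(root="/sys/class/thermal"):
    """Return {zone type: temperature in degrees Celsius} for readable zones."""
    readings = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            name = (zone / "type").read_text().strip()
            millidegrees = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            # Some zones cannot be read or report no value; skip them.
            continue
        readings[name] = millidegrees / 1000.0
    return readings


# Prints nothing on systems that expose no thermal zones.
for zone_name, celsius in read_thermal_zones().items():
    print(f"{zone_name}: {celsius:.1f} C")
```

Calling this periodically and saving the output next to the dmesg timestamps would make it easier to see whether the black screen coincides with a temperature peak.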

DaveYYY · February 1, 2024, 5:43am

Please verify this on a DevKit first.

sorry.shao · February 1, 2024, 5:44am

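This is a customer deployment, so we cannot use a DevKit.
Also, the DevKit does not have an HDMI port, while our board uses HDMI. Can you tell from the log what is causing this? Thanks.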

DaveYYY · February 1, 2024, 6:08am

[ 16692.429] (**) Option "fd" "34"
[ 16692.429] (II) event0  - YSPRINGTECH USB OPTICAL MOUSE: device removed
[ 16692.429] (**) Option "fd" "37"
[ 16692.429] (II) event1  - Dell Dell USB Entry Keyboard: device removed
[ 16692.437] (II) UnloadModule: "libinput"
[ 16692.437] (II) systemd-logind: releasing fd for 13:65
[ 16692.468] (II) UnloadModule: "libinput"
[ 16692.469] (II) systemd-logind: releasing fd for 13:64
[ 16692.666] (II) NVIDIA(GPU-0): Deleting GPU-0
[ 16692.667] (WW) xf86CloseConsole: KDSETMODE failed: Input/output error
[ 16692.667] (WW) xf86CloseConsole: VT_GETMODE failed: Input/output error
[ 16692.667] (WW) xf86CloseConsole: VT_ACTIVATE failed: Input/output error
[ 16692.670] (II) Server terminated successfully (0). Closing log file.

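Is this the point in time when the crash happened?
The Xorg timestamp is 16692 seconds, but each of your dmesg files is only a bit over 100 KB and covers only twenty-odd seconds.
From the dmesg, I don't see any correlation.

After the screen goes black, can you run

lsmod | grep nvidia

and check whether the two drivers, nvidia and nvidia_modeset, are still loaded?
This looks more like an Xorg and GDM problem.

If it is a GDM problem, you could try switching to another desktop environment,
such as Unity, KDE, or LXDE.

Also, your logs contain both Xorg.0.log and Xorg.1.log.
Does that mean two monitors are connected? Does it still happen with only one connected?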
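The `lsmod` check above can also be scripted so it can run unattended over SSH when the screen goes black. This is a rough sketch that parses `/proc/modules` (the file `lsmod` itself reads); the helper names `loaded_modules` and `missing_display_modules` and the sample text are assumptions for illustration.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names from /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def missing_display_modules(proc_modules_text,
                            required=("nvidia", "nvidia_modeset")):
    """Return the required module names that are not currently loaded."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]


# Hypothetical /proc/modules excerpt; sizes and addresses are made up.
SAMPLE_PROC_MODULES = """\
nvidia_modeset 1093632 3 - Live 0x0000000000000000 (O)
nvidia 1339392 10 nvidia_modeset, Live 0x0000000000000000 (O)
"""

print(missing_display_modules(SAMPLE_PROC_MODULES))

try:
    with open("/proc/modules", encoding="utf-8") as f:
        print("missing on this machine:", missing_display_modules(f.read()))
except OSError:
    # Not a Linux system, or /proc is unavailable.
    pass
```

An empty list means both drivers are still loaded, which would point away from the kernel modules and toward Xorg or the display manager, as suggested above.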

sorry.shao · February 1, 2024, 6:40am

When the screen is black, the nvidia and nvidia_modeset drivers are both still loaded.

Only one HDMI display is connected; the other file is yesterday's log.

sorry.shao · February 1, 2024, 7:49am

[  184.695079] nvdec 15480000.nvdec: RISC-V desc binary name:nvhost_nvdec050_desc_prod.bin
[  184.707731] nvdec 15480000.nvdec: RISC-V booting from GSC
[  184.721465] nvdec 15480000.nvdec: RISCV boot success
[  190.337851] falcon 154c0000.nvenc: Direct firmware load for nvhost_nvenc080.fw failed with error -2
[  190.337860] falcon 154c0000.nvenc: Falling back to sysfs fallback for: nvhost_nvenc080.fw
[  190.347172] falcon 154c0000.nvenc: looking for firmware in subdirectory

What is nvhost_nvdec050_desc_prod.bin for? Does this part of the log mean that loading this bin is what causes the problem?

DaveYYY · February 1, 2024, 8:37am

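This person's issue is similar to yours, and it looks like a GNOME problem.
I suggest trying another desktop environment to see whether it hits the same problem.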

sorry.shao:

[  184.695079] nvdec 15480000.nvdec: RISC-V desc binary name:nvhost_nvdec050_desc_prod.bin
[  184.707731] nvdec 15480000.nvdec: RISC-V booting from GSC
[  184.721465] nvdec 15480000.nvdec: RISCV boot success
[  190.337851] falcon 154c0000.nvenc: Direct firmware load for nvhost_nvenc080.fw failed with error -2
[  190.337860] falcon 154c0000.nvenc: Falling back to sysfs fallback for: nvhost_nvenc080.fw
[  190.347172] falcon 154c0000.nvenc: looking for firmware in subdirectory

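What is nvhost_nvdec050_desc_prod.bin for? Does this part of the log mean that loading this bin is what causes the problem?

This is normal.
Besides, even if NVENC/NVDEC had a problem, it should not affect the GUI.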

WayneWWW · February 1, 2024, 8:48am

For an Orin HDMI issue, please reproduce it with an Orin Nano/NX module on a Xavier NX devkit.

Other than that, there is no way to debug your problem.

system · March 14, 2024, 4:48am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
