Boot loop with clean image from JetPack 3.2

Hi there,

My Jetson TX2 boots, but then some failures cause the board to reboot, only for it to crash again. I tried re-flashing with JetPack 3.2. Flashing the image succeeded, but the installer then failed to copy CUDA and the other packages because the board crashed. I then flashed the image directly with the “flash.sh” script, but this was no better. (I also tried “flash.sh” with JetPack 3.1 and saw similar issues.) I connected a UART console following the directions at https://www.jetsonhacks.com/2017/03/24/serial-console-nvidia-jetson-tx2/ and have copied a typical output below.

Note that the “Timeout detected @ gr_gk20a_submit_fecs_method_op+0x104/0x274” error is always present. The other errors sometimes vary, but the system always freezes and then restarts.
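
In case it helps to compare captures from different boots, here is a rough sketch of how the gk20a messages in a saved console log could be tallied to see which ones recur. It is not an NVIDIA tool. The function name `summarize_gk20a`, the regular expression, and the choice to skip the `gk20a_fecs_dump_falcon_stats` register lines are all assumptions based only on the line format seen in this post, and the sample lines are copied from the console output in this post.

```python
import re
from collections import Counter

# Assumed line format, taken from the console output in this post, e.g.
# [  113.850812] gk20a 17000000.gp10b: Timeout detected @ gr_gk20a_submit_fecs_method_op+0x104/0x274
GK20A_LINE = re.compile(r"\[\s*(\d+\.\d+)\]\s+gk20a\s+\S+:\s+(.*\S)")


def summarize_gk20a(log_text):
    """Count gk20a messages per reporting function (hypothetical helper).

    Returns (counts, first_seen): a Counter keyed by message source and a
    dict mapping each key to the kernel timestamp of its first occurrence.
    """
    counts = Counter()
    first_seen = {}
    for line in log_text.splitlines():
        # search() rather than match(): a shell prompt can precede the timestamp.
        m = GK20A_LINE.search(line)
        if m is None:
            continue
        timestamp, message = float(m.group(1)), m.group(2)
        if message.startswith("gk20a_fecs_dump_falcon_stats"):
            continue  # register dump lines, not separate errors
        if message.startswith("Timeout detected @"):
            # Key on the symbol name, dropping the +offset/size suffix.
            symbol = message.split("@", 1)[1].strip().split("+")[0]
            key = "Timeout detected @ " + symbol
        else:
            # Most messages start with "function_name: text".
            key = message.split(":", 1)[0]
        counts[key] += 1
        first_seen.setdefault(key, timestamp)
    return counts, first_seen


SAMPLE = """\
[  113.850812] gk20a 17000000.gp10b: Timeout detected @ gr_gk20a_submit_fecs_method_op+0x104/0x274
[  113.860179] gk20a 17000000.gp10b: gr_gk20a_ctx_wait_ucode: timeout waiting on ucode response
[  113.868737] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_os_r : 0
nvidia@jetson4:~$ [  122.728686] gk20a 17000000.gp10b: gr_gk20a_find_priv_offset_in_buffer: Invalid FECS local header: magic value
[  129.182750] gk20a 17000000.gp10b: Timeout detected @ gr_gk20a_submit_fecs_method_op+0x104/0x274
"""

counts, first_seen = summarize_gk20a(SAMPLE)
for key, n in counts.most_common():
    print(f"{n:3d}x  first at {first_seen[key]:.3f}s  {key}")
```

To run it on a full capture, the text of a saved log file can be read and passed to `summarize_gk20a` in place of `SAMPLE`. Comparing the tallies from several boots should show whether the FECS timeout is the only message common to every crash.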

[  110.276967] IPVS: Creating netns size=1424 id=1
[  110.449054] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[  110.465844] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[  110.528675] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[  110.539792] 
[  110.539792] Dongle Host Driver, version 1.201.82 (r)
[  110.539792] Compiled in drivers/net/wireless/bcmdhd on Mar  1 2018 at 20:46:20
[  110.571928] wl_android_wifi_on in
[  110.580058] wifi_platform_set_power = 1
[  110.864367] mmc1: queuing unknown CIS tuple 0x80 (5 bytes)
[  110.953438] sdhci-tegra 3440000.sdhci: Tuning already done, restoring the best tap value : 58
[  110.963721] F1 signature read @0x18000000=0x17214354
[  110.979525] F1 signature OK, socitype:0x1 chip:0x4354 rev:0x1 pkg:0x2
[  110.987972] DHD: dongle ram size is set to 786432(orig 786432) at 0x180000
[  111.126706] dhdsdio_write_vars: Download, Upload and compare of NVRAM succeeded.
[  111.177656] dhd_bus_init: enable 0x06, ready 0x06 (waited 0us)
[  111.184167] Enabling wake69
[  111.189118] wifi_platform_get_mac_addr
[  111.195645] Firmware up: op_mode=0x0005, MAC=00:04:4b:8d:03:1c
[  111.211358] dhd_preinit_ioctls pspretend_threshold for HostAPD failed  -23
[  111.225691] Firmware version = wl0: Dec 12 2017 15:09:35 version 7.35.221.34 (r679642) FWID 01-e35dbe99
[  111.240371] dhd_interworking_enable: failed to set WNM info, ret=-23
[  111.247189] tegra_sysfs_on
[  111.509211] CFGP2P-ERROR) wl_cfgp2p_add_p2p_disc_if : P2P interface registered
[  111.554890] WLC_E_IF: NO_IF set, event Ignored
[  113.850812] gk20a 17000000.gp10b: Timeout detected @ gr_gk20a_submit_fecs_method_op+0x104/0x274 
[  113.860179] gk20a 17000000.gp10b: gr_gk20a_ctx_wait_ucode: timeout waiting on ucode response
[  113.868737] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_os_r : 0
[  113.876248] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_cpuctl_r : 0x40
[  113.884432] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_idlestate_r : 0x1
[  113.892717] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_mailbox0_r : 0x0
[  113.900906] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_mailbox1_r : 0x0
[  113.909073] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_irqstat_r : 0x0
[  113.917172] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_irqmode_r : 0x4
[  113.925254] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_irqmask_r : 0x8704
[  113.933614] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_irqdest_r : 0x0
[  113.942154] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_debug1_r : 0x40
[  113.950406] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_debuginfo_r : 0x0
[  113.958787] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(0) : 0x10
[  113.967920] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(1) : 0x0
[  113.976922] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(2) : 0x41009
[  113.986241] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(3) : 0x20
[  113.995268] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(4) : 0x3ffd20
[  114.004686] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(5) : 0x0
[  114.013629] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(6) : 0x0
[  114.022675] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(7) : 0x0
[  114.031720] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_engctl_r : 0x0
[  114.039816] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_curctx_r : 0x0
[  114.047910] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_nxtctx_r : 0x0
[  114.055998] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_IMB : 0xbadfbadf
[  114.064997] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_DMB : 0xbadfbadf
[  114.073999] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_CSW : 0xbadfbadf
[  114.082988] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_CTX : 0xbadfbadf
[  114.091994] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_EXCI : 0xbadfbadf
[  114.101077] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
[  114.109958] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
[  114.118806] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
[  114.127738] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
[  114.136635] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
[  114.145531] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
[  114.154408] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
[  114.163303] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
[  114.172179] NV_PGRAPH_STATUS: 0xa1
[  114.175749] NV_PGRAPH_STATUS1: 0x0
[  114.179376] NV_PGRAPH_STATUS2: 0x0
[  114.182962] NV_PGRAPH_ENGINE_STATUS: 0x1
[  114.187055] NV_PGRAPH_GRFIFO_STATUS : 0x1
[  114.191251] NV_PGRAPH_GRFIFO_CONTROL : 0x10001
[  114.195888] NV_PGRAPH_PRI_FECS_HOST_INT_STATUS : 0x0
[  114.201024] NV_PGRAPH_EXCEPTION  : 0x0
[  114.204931] NV_PGRAPH_FECS_INTR  : 0x0
[  114.208862] NV_PFIFO_ENGINE_STATUS(GR) : 0x80000000
[  114.213907] NV_PGRAPH_ACTIVITY0: 0x0
[  114.217663] NV_PGRAPH_ACTIVITY1: 0x600
[  114.221570] NV_PGRAPH_ACTIVITY2: 0x0
[  114.225319] NV_PGRAPH_ACTIVITY4: 0x0
[  114.229051] NV_PGRAPH_PRI_SKED_ACTIVITY: 0x0
[  114.233547] NV_PGRAPH_PRI_GPC0_GPCCS_GPC_ACTIVITY0: 0x0
[  114.238961] NV_PGRAPH_PRI_GPC0_GPCCS_GPC_ACTIVITY1: 0x0
[  114.244432] NV_PGRAPH_PRI_GPC0_GPCCS_GPC_ACTIVITY2: 0x0
[  114.249823] NV_PGRAPH_PRI_GPC0_GPCCS_GPC_ACTIVITY3: 0x0
[  114.255216] NV_PGRAPH_PRI_GPC0_TPC0_TPCCS_TPC_ACTIVITY0: 0x0
[  114.261036] NV_PGRAPH_PRI_GPC0_TPC1_TPCCS_TPC_ACTIVITY0: 0x0
[  114.266860] NV_PGRAPH_PRI_GPC0_TPCS_TPCCS_TPC_ACTIVITY0: 0x0
[  114.272677] NV_PGRAPH_PRI_GPCS_GPCCS_GPC_ACTIVITY0: 0x0
[  114.278117] NV_PGRAPH_PRI_GPCS_GPCCS_GPC_ACTIVITY1: 0x0
[  114.283566] NV_PGRAPH_PRI_GPCS_GPCCS_GPC_ACTIVITY2: 0x0
[  114.288997] NV_PGRAPH_PRI_GPCS_GPCCS_GPC_ACTIVITY3: 0x0
[  114.294389] NV_PGRAPH_PRI_GPCS_TPC0_TPCCS_TPC_ACTIVITY0: 0x0
[  114.300217] NV_PGRAPH_PRI_GPCS_TPC1_TPCCS_TPC_ACTIVITY0: 0x0
[  114.306036] NV_PGRAPH_PRI_GPCS_TPCS_TPCCS_TPC_ACTIVITY0: 0x0
[  114.311864] NV_PGRAPH_PRI_BE0_BECS_BE_ACTIVITY0: 0x0
[  114.316998] NV_PGRAPH_PRI_BE1_BECS_BE_ACTIVITY0: 0x0
[  114.322129] NV_PGRAPH_PRI_BES_BECS_BE_ACTIVITY0: 0x0
[  114.327251] NV_PGRAPH_PRI_DS_MPIPE_STATUS: 0x0
[  114.331858] NV_PGRAPH_PRI_FE_GO_IDLE_TIMEOUT : 0x800
[  114.337040] NV_PGRAPH_PRI_FE_GO_IDLE_INFO : 0x23000700
[  114.342374] NV_PGRAPH_PRI_GPC0_TPC0_TEX_M_TEX_SUBUNITS_STATUS: 0x0
[  114.348783] NV_PGRAPH_PRI_CWD_FS: 0x201
[  114.352788] NV_PGRAPH_PRI_FE_TPC_FS: 0x3
[  114.356874] NV_PGRAPH_PRI_CWD_GPC_TPC_ID(0): 0x100
[  114.361840] NV_PGRAPH_PRI_CWD_SM_ID(0): 0x100
[  114.366356] NV_PGRAPH_PRI_FECS_CTXSW_STATUS_FE_0: 0x2000
[  114.371818] NV_PGRAPH_PRI_FECS_CTXSW_STATUS_1: 0x140
[  114.376960] NV_PGRAPH_PRI_GPC0_GPCCS_CTXSW_STATUS_GPC_0: 0x0
[  114.382776] NV_PGRAPH_PRI_GPC0_GPCCS_CTXSW_STATUS_1: 0x300
[  114.388486] NV_PGRAPH_PRI_FECS_CTXSW_IDLESTATE : 0xe
[  114.393618] NV_PGRAPH_PRI_GPC0_GPCCS_CTXSW_IDLESTATE : 0xf
[  114.399300] NV_PGRAPH_PRI_FECS_CURRENT_CTX : 0x80261f51
[  114.404697] NV_PGRAPH_PRI_FECS_NEW_CTX : 0x80261f51
[  114.409739] NV_PGRAPH_PRI_BE0_CROP_STATUS1 : 0x5f00000
[  114.415053] NV_PGRAPH_PRI_BES_CROP_STATUS1 : 0x5f00000
[  114.420360] NV_PGRAPH_PRI_BE0_ZROP_STATUS : 0x0
[  114.425062] NV_PGRAPH_PRI_BE0_ZROP_STATUS2 : 0x0
[  114.429841] NV_PGRAPH_PRI_BES_ZROP_STATUS : 0x0
[  114.434545] NV_PGRAPH_PRI_BES_ZROP_STATUS2 : 0x0
[  114.439391] NV_PGRAPH_PRI_BE0_BECS_BE_EXCEPTION: 0x0
[  114.444530] NV_PGRAPH_PRI_BE0_BECS_BE_EXCEPTION_EN: 0x0
[  114.449940] NV_PGRAPH_PRI_GPC0_GPCCS_GPC_EXCEPTION: 0x0
[  114.455340] NV_PGRAPH_PRI_GPC0_GPCCS_GPC_EXCEPTION_EN: 0x30000
[  114.461333] NV_PGRAPH_PRI_GPC0_TPC0_TPCCS_TPC_EXCEPTION: 0x0
[  114.467172] NV_PGRAPH_PRI_GPC0_TPC0_TPCCS_TPC_EXCEPTION_EN: 0x3
[  114.473263] gk20a 17000000.gp10b: gr_gk20a_fecs_ctx_image_save: save context image failed
[  114.485726] gk20a 17000000.gp10b: gr_gk20a_find_priv_offset_in_buffer: Invalid FECS local header: magic value
[  114.485726] 
[  114.499829] gk20a 17000000.gp10b: gr_gk20a_find_priv_offset_in_pm_buffer: Lookup failed for address 0x500400
[  114.518500] gk20a 17000000.gp10b: gr_gk20a_find_priv_offset_in_buffer: Invalid FECS local header: magic value
[  114.518500] 
[  114.530231] gk20a 17000000.gp10b: gr_gk20a_find_priv_offset_in_pm_buffer: Lookup failed for address 0x500400
[  114.548037] gk20a 17000000.gp10b: gr_gk20a_find_priv_offset_in_buffer: Invalid FECS local header: magic value
[  114.548037] 
[  114.559721] gk20a 17000000.gp10b: gr_gk20a_find_priv_offset_in_pm_buffer: Lookup failed for address 0x500400
[  114.574983] gk20a 17000000.gp10b: gr_gk20a_find_priv_offset_in_buffer: Invalid FECS local header: magic value
[  114.574983] 
[  114.586559] gk20a 17000000.gp10b: gr_gk20a_find_priv_offset_in_pm_buffer: Lookup failed for address 0x500400
[  114.622170] gk20a 17000000.gp10b: gr_gk20a_find_priv_offset_in_buffer: Invalid FECS local header: magic value
[  114.622170] 
[  114.633839] gk20a 17000000.gp10b: gr_gk20a_find_priv_offset_in_pm_buffer: Lookup failed for address 0x500400

Ubuntu 16.04 LTS jetson4 ttyS0

jetson4 login: nvidia (automatic login)

Last login: Tue May 22 18:56:55 UTC 2018 on ttyS0
Welcome to Ubuntu 16.04 LTS (GNU/Linux 4.4.38-tegra aarch64)

 * Documentation:  https://help.ubuntu.com/

324 packages can be updated.
0 updates are security updates.

[  117.960337] gk20a 17000000.gp10b: gk20a_set_error_notifier_locked: error notifier set to 8 for ch 507
[  117.969711] gk20a 17000000.gp10b: gk20a_fifo_handle_sched_error: fifo sched ctxsw timeout error: engine=0, tsg=0, ms=3100
[  117.969904] tegradc 15210000.nvdisplay: unblank
[  117.969956] ---- mlocks ----
[  117.969993] 
[  117.969998] ---- syncpts ----
[  117.970042] id 19 (17000000.gp10b_507) min 2 max 4 refs 1 (previous client : )
[  117.970949] 
[  117.970954] ---- channels ----
[  117.970994] 
[  117.970994] channel 1 - 15820000.se
[  117.970994] 
[  117.971000] NvHost basic channel registers:
[  117.971008] CMDFIFO_STAT_0:  00002040
[  117.971015] CMDFIFO_RDATA_0: 00824082
[  117.971025] CMDP_OFFSET_0:   00000000
[  117.971031] CMDP_CLASS_0:    00000000
[  117.971037] CHANNELSTAT_0:   00000000
[  117.971042] The CDMA sync queue is empty.
[  117.971045] 
[  117.971056] 
[  117.971056] channel 2 - 15830000.se
[  117.971056] 
[  117.971060] NvHost basic channel registers:
[  117.971072] CMDFIFO_STAT_0:  00002040
[  117.971080] CMDFIFO_RDATA_0: a1942170
[  117.971098] CMDP_OFFSET_0:   00000000
[  117.971106] CMDP_CLASS_0:    00000000
[  117.971114] CHANNELSTAT_0:   00000000
[  117.971122] The CDMA sync queue is empty.
[  117.971129] 
[  117.971143] 
[  117.971143] channel 3 - 15840000.se
[  117.971143] 
[  117.971151] NvHost basic channel registers:
[  117.971158] CMDFIFO_STAT_0:  00002040
[  117.971176] CMDFIFO_RDATA_0: 100d1142
[  117.971190] CMDP_OFFSET_0:   00000000
[  117.971197] CMDP_CLASS_0:    00000000
[  117.971204] CHANNELSTAT_0:   00000000
[  117.971212] The CDMA sync queue is empty.
[  117.971219] 
[  117.971247] 
[  117.971247] ---- host general irq ----
[  117.971247] 
[  117.971255] sync_intc0mask = 0x00000001
[  117.971263] sync_intmask = 0x50000003
[  117.971271] 
[  117.971271] ---- host syncpt irq mask ----
[  117.971271] 
[  117.971280] 
[  117.971280] ---- host syncpt irq status ----
[  117.971280] 
[  117.971293] syncpt_thresh_cpu0_int_status(0) = 0x00000000
[  117.971302] syncpt_thresh_cpu0_int_status(1) = 0x00000000
[  117.971310] syncpt_thresh_cpu0_int_status(2) = 0x00000000
[  117.971327] syncpt_thresh_cpu0_int_status(3) = 0x00000000
[  117.971335] syncpt_thresh_cpu0_int_status(4) = 0x00000000
[  117.971343] syncpt_thresh_cpu0_int_status(5) = 0x00000000
[  117.971360] syncpt_thresh_cpu0_int_status(6) = 0x00000000
[  117.971377] syncpt_thresh_cpu0_int_status(7) = 0x00000000
[  117.971385] syncpt_thresh_cpu0_int_status(8) = 0x00000000
[  117.971393] syncpt_thresh_cpu0_int_status(9) = 0x00000000
[  117.971401] syncpt_thresh_cpu0_int_status(10) = 0x00000000
[  117.971409] syncpt_thresh_cpu0_int_status(11) = 0x00000000
[  117.971417] syncpt_thresh_cpu0_int_status(12) = 0x00000000
[  117.971425] syncpt_thresh_cpu0_int_status(13) = 0x00000000
[  117.971432] syncpt_thresh_cpu0_int_status(14) = 0x00000000
[  117.971440] syncpt_thresh_cpu0_int_status(15) = 0x00000000
[  117.971448] syncpt_thresh_cpu0_int_status(16) = 0x00000000
[  117.971455] syncpt_thresh_cpu0_int_status(17) = 0x00000000
[  117.971486] 17000000.gp10b pbdma 0: 
[  117.971488] id: 0 (tsg), next_id: 0 (tsg) chan status: valid
[  117.971514] PUT: 0000001e00008254 GET: 0000001e00000044 FETCH: 00000006 HEADER: 800302ec
[  117.971522] 
[  117.971548] 17000000.gp10b eng 0: 
[  117.971554] id: 0 (channel), next_id: 0 (tsg), ctx status: load 
[  117.971558] busy 
[  117.971559] 
[  117.971576] 17000000.gp10b eng 1: 
[  117.971581] id: 0 (tsg), next_id: 0 (tsg), ctx status: valid 
[  117.971581] 
[  117.971585] 
[  117.971908] 507-17000000.gp10b, pid 734, refs: 4: 
[  117.971909] channel status:  in use on_pbdma_and_eng busy
[  117.971932] RAMFC : TOP: 0000000000000000 PUT: 0000000000000000 GET: 0000000000000000 FETCH: 0000000000000000
[  117.971932] HEADER: 20400000 COUNT: 00000000
[  117.971932] SYNCPOINT 00000000 00000000 SEMAPHORE 00000000 00000000 00000000 00000000
[  117.971936] 
[  117.971957] 508-17000000.gp10b, pid 734, refs: 2: 
[  117.971958] channel status:  in use idle not busy
[  117.971977] RAMFC : TOP: 0000000000000000 PUT: 0000000000000000 GET: 0000000000000000 FETCH: 0000000000000000
[  117.971977] HEADER: 60400000 COUNT: 00000000
[  117.971977] SYNCPOINT 00000000 00000000 SEMAPHORE 00000000 00000000 00000000 00000000
[  117.971981] 
[  117.971998] 509-17000000.gp10b, pid 734, refs: 2: 
[  117.971999] channel status:  in use idle not busy
[  117.972018] RAMFC : TOP: 0000000000000000 PUT: 0000000000000000 GET: 0000000000000000 FETCH: 0000000000000000
[  117.972018] HEADER: 60400000 COUNT: 00000000
[  117.972018] SYNCPOINT 00000000 00000000 SEMAPHORE 00000000 00000000 00000000 00000000
[  117.972022] 
[  117.972039] 510-17000000.gp10b, pid 734, refs: 2: 
[  117.972040] channel status:  in use idle not busy
[  117.972058] RAMFC : TOP: 0000000000000000 PUT: 0000000000000000 GET: 0000000000000000 FETCH: 0000000000000000
[  117.972058] HEADER: 60400000 COUNT: 00000000
[  117.972058] SYNCPOINT 00000000 00000000 SEMAPHORE 00000000 00000000 00000000 00000000
[  117.972062] 
[  117.972079] 511-17000000.gp10b, pid 734, refs: 2: 
[  117.972080] channel status:  in use idle not busy
[  117.972098] RAMFC : TOP: 0000000000000000 PUT: 0000000000000000 GET: 0000000000000000 FETCH: 0000000000000000
[  117.972098] HEADER: 60400000 COUNT: 00000000
[  117.972098] SYNCPOINT 00000000 00000000 SEMAPHORE 00000000 00000000 00000000 00000000
[  117.972101] 
[  117.972720] gk20a 17000000.gp10b: gk20a_fifo_handle_mmu_fault: fake mmu fault on engine 0, engine subid 0 (gpc), client 12 (rast), addr 0x0000ebf2:0xd0400000, type 8 (pitch mask), info 0x01122c08,inst_ptr 0x1e903a0000
[  117.972720] 
[  117.972741] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_os_r : 0
[  117.972760] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_cpuctl_r : 0x40
[  117.972792] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_idlestate_r : 0x1
[  117.972804] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_mailbox0_r : 0x0
[  117.972816] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_mailbox1_r : 0x0
[  117.972829] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_irqstat_r : 0x0
[  117.972841] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_irqmode_r : 0x4
[  117.972853] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_irqmask_r : 0x8704
[  117.972864] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_irqdest_r : 0x0
[  117.972877] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_debug1_r : 0x40
[  117.972898] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_debuginfo_r : 0x0
[  117.972911] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(0) : 0x10
[  117.972924] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(1) : 0x0
[  117.972937] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(2) : 0x41009
[  117.972949] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(3) : 0x20
[  117.972965] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(4) : 0x3ffd20
[  117.972977] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(5) : 0x0
[  117.973007] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(6) : 0x0
[  117.973024] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(7) : 0x0
[  117.973053] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_engctl_r : 0x0
[  117.973070] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_curctx_r : 0x0
[  117.973098] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_nxtctx_r : 0x0
[  117.973115] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_IMB : 0xbadfbadf
[  117.973146] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_DMB : 0xbadfbadf
[  117.973170] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_CSW : 0xbadfbadf
[  117.973189] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_CTX : 0xbadfbadf
[  117.973208] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_EXCI : 0xbadfbadf
[  117.973226] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
[  117.973245] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
[  117.973263] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
[  117.973282] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
[  117.973301] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
[  117.973319] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
[  117.973337] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
[  117.973355] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
[  117.973374] gk20a 17000000.gp10b: gk20a_fifo_handle_mmu_fault: gr_status_r : 0xa1
[  117.991845] gk20a 17000000.gp10b: gk20a_fifo_set_ctx_mmu_error_tsg: TSG 0 generated a mmu fault
[  117.991945] gk20a 17000000.gp10b: fifo_error_isr: channel reset initiated from fifo_error_isr; intr=0x00000100
nvidia@jetson4:~$ [  122.728686] gk20a 17000000.gp10b: gr_gk20a_find_priv_offset_in_buffer: Invalid FECS local header: magic value
[  122.728686] 
[  122.740334] gk20a 17000000.gp10b: gr_gk20a_find_priv_offset_in_pm_buffer: Lookup failed for address 0x500400
[  126.167446] gk20a 17000000.gp10b: gk20a_set_error_notifier_locked: error notifier set to 8 for ch 507
[  126.176828] gk20a 17000000.gp10b: gk20a_fifo_handle_sched_error: fifo sched ctxsw timeout error: engine=0, tsg=0, ms=3100
[  126.176972] tegradc 15210000.nvdisplay: unblank
[  126.177053] ---- mlocks ----
[  126.177089] 
[  126.177094] ---- syncpts ----
[  126.177138] id 19 (17000000.gp10b_507) min 6 max 8 refs 1 (previous client : 17000000.gp10b_507)
[  126.177937] 
[  126.177942] ---- channels ----
[  126.177972] 
[  126.177972] channel 1 - 15820000.se
[  126.177972] 
[  126.177977] NvHost basic channel registers:
[  126.177985] CMDFIFO_STAT_0:  00002040
[  126.177991] CMDFIFO_RDATA_0: 00824082
[  126.178000] CMDP_OFFSET_0:   00000000
[  126.178007] CMDP_CLASS_0:    00000000
[  126.178012] CHANNELSTAT_0:   00000000
[  126.178018] The CDMA sync queue is empty.
[  126.178021] 
[  126.178031] 
[  126.178031] channel 2 - 15830000.se
[  126.178031] 
[  126.178035] NvHost basic channel registers:
[  126.178042] CMDFIFO_STAT_0:  00002040
[  126.178048] CMDFIFO_RDATA_0: a1942170
[  126.178055] CMDP_OFFSET_0:   00000000
[  126.178061] CMDP_CLASS_0:    00000000
[  126.178066] CHANNELSTAT_0:   00000000
[  126.178072] The CDMA sync queue is empty.
[  126.178075] 
[  126.178085] 
[  126.178085] channel 3 - 15840000.se
[  126.178085] 
[  126.178089] NvHost basic channel registers:
[  126.178096] CMDFIFO_STAT_0:  00002040
[  126.178102] CMDFIFO_RDATA_0: 100d1142
[  126.178109] CMDP_OFFSET_0:   00000000
[  126.178115] CMDP_CLASS_0:    00000000
[  126.178120] CHANNELSTAT_0:   00000000
[  126.178125] The CDMA sync queue is empty.
[  126.178128] 
[  126.178143] 
[  126.178143] ---- host general irq ----
[  126.178143] 
[  126.178150] sync_intc0mask = 0x00000001
[  126.178156] sync_intmask = 0x50000003
[  126.178160] 
[  126.178160] ---- host syncpt irq mask ----
[  126.178160] 
[  126.178165] 
[  126.178165] ---- host syncpt irq status ----
[  126.178165] 
[  126.178173] syncpt_thresh_cpu0_int_status(0) = 0x00000000
[  126.178181] syncpt_thresh_cpu0_int_status(1) = 0x00000000
[  126.178187] syncpt_thresh_cpu0_int_status(2) = 0x00000000
[  126.178194] syncpt_thresh_cpu0_int_status(3) = 0x00000000
[  126.178200] syncpt_thresh_cpu0_int_status(4) = 0x00000000
[  126.178207] syncpt_thresh_cpu0_int_status(5) = 0x00000000
[  126.178213] syncpt_thresh_cpu0_int_status(6) = 0x00000000
[  126.178220] syncpt_thresh_cpu0_int_status(7) = 0x00000000
[  126.178226] syncpt_thresh_cpu0_int_status(8) = 0x00000000
[  126.178233] syncpt_thresh_cpu0_int_status(9) = 0x00000000
[  126.178240] syncpt_thresh_cpu0_int_status(10) = 0x00000000
[  126.178247] syncpt_thresh_cpu0_int_status(11) = 0x00000000
[  126.178253] syncpt_thresh_cpu0_int_status(12) = 0x00000000
[  126.178260] syncpt_thresh_cpu0_int_status(13) = 0x00000000
[  126.178267] syncpt_thresh_cpu0_int_status(14) = 0x00000000
[  126.178273] syncpt_thresh_cpu0_int_status(15) = 0x00000000
[  126.178280] syncpt_thresh_cpu0_int_status(16) = 0x00000000
[  126.178287] syncpt_thresh_cpu0_int_status(17) = 0x00000000
[  126.178311] 17000000.gp10b pbdma 0: 
[  126.178312] id: 0 (tsg), next_id: 0 (tsg) chan status: valid
[  126.178335] PUT: 0000001e00008254 GET: 0000001e00000044 FETCH: 00000006 HEADER: 800302ec
[  126.178339] 
[  126.178358] 17000000.gp10b eng 0: 
[  126.178363] id: 0 (tsg), next_id: 0 (tsg), ctx status: load 
[  126.178368] busy 
[  126.178369] 
[  126.178385] 17000000.gp10b eng 1: 
[  126.178389] id: 0 (tsg), next_id: 0 (tsg), ctx status: invalid 
[  126.178390] 
[  126.178394] 
[  126.178756] 507-17000000.gp10b, pid 859, refs: 4: 
[  126.178757] channel status:  in use on_pbdma_and_eng busy
[  126.178803] RAMFC : TOP: 8000001e00000044 PUT: 0000001e00008254 GET: 0000001e00000044 FETCH: 00dba81e00001480
[  126.178803] HEADER: 800302ec COUNT: 01110001
[  126.178803] SYNCPOINT 00000000 00001301 SEMAPHORE 0000000d fc002000 00000000 01100002
[  126.178811] 
[  126.178845] 508-17000000.gp10b, pid 734, refs: 2: 
[  126.178846] channel status:  in use idle not busy
[  126.178868] RAMFC : TOP: 0000000000000000 PUT: 0000000000000000 GET: 0000000000000000 FETCH: 0000000000000000
[  126.178868] HEADER: 60400000 COUNT: 00000000
[  126.178868] SYNCPOINT 00000000 00000000 SEMAPHORE 00000000 00000000 00000000 00000000
[  126.178876] 
[  126.178898] 509-17000000.gp10b, pid 734, refs: 2: 
[  126.178899] channel status:  in use idle not busy
[  126.178919] RAMFC : TOP: 0000000000000000 PUT: 0000000000000000 GET: 0000000000000000 FETCH: 0000000000000000
[  126.178919] HEADER: 60400000 COUNT: 00000000
[  126.178919] SYNCPOINT 00000000 00000000 SEMAPHORE 00000000 00000000 00000000 00000000
[  126.178927] 
[  126.178949] 510-17000000.gp10b, pid 734, refs: 2: 
[  126.178950] channel status:  in use idle not busy
[  126.178971] RAMFC : TOP: 0000000000000000 PUT: 0000000000000000 GET: 0000000000000000 FETCH: 0000000000000000
[  126.178971] HEADER: 60400000 COUNT: 00000000
[  126.178971] SYNCPOINT 00000000 00000000 SEMAPHORE 00000000 00000000 00000000 00000000
[  126.178978] 
[  126.179005] 511-17000000.gp10b, pid 734, refs: 2: 
[  126.179006] channel status:  in use idle not busy
[  126.179026] RAMFC : TOP: 0000000000000000 PUT: 0000000000000000 GET: 0000000000000000 FETCH: 0000000000000000
[  126.179026] HEADER: 60400000 COUNT: 00000000
[  126.179026] SYNCPOINT 00000000 00000000 SEMAPHORE 00000000 00000000 00000000 00000000
[  126.179034] 
[  126.179539] gk20a 17000000.gp10b: gk20a_fifo_handle_mmu_fault: fake mmu fault on engine 0, engine subid 0 (gpc), client 12 (rast), addr 0x0000ebf2:0xd0400000, type 8 (pitch mask), info 0x01122c08,inst_ptr 0x1e903a0000
[  126.179539] 
[  126.179555] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_os_r : 0
[  126.179569] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_cpuctl_r : 0x40
[  126.179583] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_idlestate_r : 0x1
[  126.179595] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_mailbox0_r : 0x0
[  126.179607] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_mailbox1_r : 0x0
[  126.179619] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_irqstat_r : 0x0
[  126.179631] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_irqmode_r : 0x4
[  126.179643] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_irqmask_r : 0x8704
[  126.179655] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_irqdest_r : 0x0
[  126.179667] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_debug1_r : 0x40
[  126.179679] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_debuginfo_r : 0x0
[  126.179692] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(0) : 0x1
[  126.179704] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(1) : 0x0
[  126.179717] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(2) : 0x90009
[  126.179729] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(3) : 0x0
[  126.179741] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(4) : 0x3ffd20
[  126.179753] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(5) : 0x1
[  126.179765] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(6) : 0x1
[  126.179776] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(7) : 0x0
[  126.179789] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_engctl_r : 0x0
[  126.179801] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_curctx_r : 0x0
[  126.179812] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_nxtctx_r : 0x0
[  126.179825] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_IMB : 0xbadfbadf
[  126.179852] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_DMB : 0xbadfbadf
[  126.179866] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_CSW : 0xbadfbadf
[  126.179879] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_CTX : 0xbadfbadf
[  126.179892] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_EXCI : 0xbadfbadf
[  126.179905] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
[  126.179918] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
[  126.179932] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
[  126.179945] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
[  126.179957] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
[  126.179970] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
[  126.179982] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
[  126.179995] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
[  126.180008] gk20a 17000000.gp10b: gk20a_fifo_handle_mmu_fault: gr_status_r : 0x200081
[  129.182750] gk20a 17000000.gp10b: Timeout detected @ gr_gk20a_submit_fecs_method_op+0x104/0x274 
[  129.208510] gk20a 17000000.gp10b: gr_gk20a_ctx_wait_ucode: timeout waiting on ucode response
[  129.233908] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_os_r : 0
[  129.258196] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_cpuctl_r : 0x40
[  129.283088] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_idlestate_r : 0x1
[  129.308140] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_mailbox0_r : 0x0
[  129.333081] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_mailbox1_r : 0x0
[  129.358023] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_irqstat_r : 0x0
[  129.382840] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_irqmode_r : 0x4
[  129.407671] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_irqmask_r : 0x8704
[  129.432744] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_irqdest_r : 0x0
[  129.457546] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_debug1_r : 0x40
[  129.482398] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_debuginfo_r : 0x0
[  129.507380] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(0) : 0x0
[  129.532967] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(1) : 0xffffffff
[  129.559232] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(2) : 0x94019
[  129.586801] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(3) : 0x0
[  129.612477] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(4) : 0x3ffd20
[  129.643352] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(5) : 0x1
[  129.669035] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(6) : 0x1
[  129.694706] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(7) : 0x0
[  129.720404] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_engctl_r : 0x0
[  129.745209] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_curctx_r : 0x0
[  129.770023] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: gr_fecs_nxtctx_r : 0x0
[  129.794817] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_IMB : 0xbadfbadf
[  129.820548] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_DMB : 0xbadfbadf
[  129.846272] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_CSW : 0xbadfbadf
[  129.871995] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_CTX : 0xbadfbadf
[  129.897722] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_EXCI : 0xbadfbadf
[  129.923550] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
[  129.949179] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
[  129.974934] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
[  130.000612] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
[  130.026267] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
[  130.051906] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
[  130.077594] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
[  130.103277] gk20a 17000000.gp10b: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
[  130.128951] NV_PGRAPH_STATUS: 0x200089
[  130.141073] NV_PGRAPH_STATUS1: 0x0
[  130.152623] NV_PGRAPH_STATUS2: 0x0
[  130.163890] NV_PGRAPH_ENGINE_STATUS: 0x1
[  130.175505] NV_PGRAPH_GRFIFO_STATUS : 0x1
[  130.186996] NV_PGRAPH_GRFIFO_CONTROL : 0x0
[  130.198381] NV_PGRAPH_PRI_FECS_HOST_INT_STATUS : 0x1
[  130.210485] NV_PGRAPH_EXCEPTION  : 0x0
[  130.221172] NV_PGRAPH_FECS_INTR  : 0x0
[  130.231624] NV_PFIFO_ENGINE_STATUS(GR) : 0xd000b000
[  130.243122] NV_PGRAPH_ACTIVITY0: 0x0
[  130.253108] NV_PGRAPH_ACTIVITY1: 0x7000
[  130.263130] NV_PGRAPH_ACTIVITY2: 0x0
[  130.272627] NV_PGRAPH_ACTIVITY4: 0x0
[  130.281872] NV_PGRAPH_PRI_SKED_ACTIVITY: 0xe01c0000
[  130.292321] NV_PGRAPH_PRI_GPC0_GPCCS_GPC_ACTIVITY0: 0xffffff
[  130.303441] NV_PGRAPH_PRI_GPC0_GPCCS_GPC_ACTIVITY1: 0xffffff
[  130.314293] NV_PGRAPH_PRI_GPC0_GPCCS_GPC_ACTIVITY2: 0x7fcf
[  130.324744] NV_PGRAPH_PRI_GPC0_GPCCS_GPC_ACTIVITY3: 0x3f
[  130.334817] NV_PGRAPH_PRI_GPC0_TPC0_TPCCS_TPC_ACTIVITY0: 0xfff
[  130.345220] NV_PGRAPH_PRI_GPC0_TPC1_TPCCS_TPC_ACTIVITY0: 0xfff
[  130.355362] NV_PGRAPH_PRI_GPC0_TPCS_TPCCS_TPC_ACTIVITY0: 0xfff
[  130.365258] NV_PGRAPH_PRI_GPCS_GPCCS_GPC_ACTIVITY0: 0xffffff
[  130.374814] NV_PGRAPH_PRI_GPCS_GPCCS_GPC_ACTIVITY1: 0xffffff
[  130.384109] NV_PGRAPH_PRI_GPCS_GPCCS_GPC_ACTIVITY2: 0x7fcf
[  130.393007] NV_PGRAPH_PRI_GPCS_GPCCS_GPC_ACTIVITY3: 0x3f
[  130.401478] NV_PGRAPH_PRI_GPCS_TPC0_TPCCS_TPC_ACTIVITY0: 0xfff
[  130.410362] NV_PGRAPH_PRI_GPCS_TPC1_TPCCS_TPC_ACTIVITY0: 0xfff
[  130.418966] NV_PGRAPH_PRI_GPCS_TPCS_TPCCS_TPC_ACTIVITY0: 0xfff
[  130.427287] NV_PGRAPH_PRI_BE0_BECS_BE_ACTIVITY0: 0x7fff
[  130.434972] NV_PGRAPH_PRI_BE1_BECS_BE_ACTIVITY0: 0x7fff
[  130.442635] NV_PGRAPH_PRI_BES_BECS_BE_ACTIVITY0: 0x7fff
[  130.450345] NV_PGRAPH_PRI_DS_MPIPE_STATUS: 0x0
[  130.457241] NV_PGRAPH_PRI_FE_GO_IDLE_TIMEOUT : 0x0
[  130.464509] NV_PGRAPH_PRI_FE_GO_IDLE_INFO : 0x0
[  130.471519] NV_PGRAPH_PRI_GPC0_TPC0_TEX_M_TEX_SUBUNITS_STATUS: 0x3f77
[  130.480555] NV_PGRAPH_PRI_CWD_FS: 0x0
[  130.486750] NV_PGRAPH_PRI_FE_TPC_FS: 0x2
[  130.493219] NV_PGRAPH_PRI_CWD_GPC_TPC_ID(0): 0x0
[  130.500373] NV_PGRAPH_PRI_CWD_SM_ID(0): 0x0
[  130.507023] NV_PGRAPH_PRI_FECS_CTXSW_STATUS_FE_0: 0x3
[  130.514658] NV_PGRAPH_PRI_FECS_CTXSW_STATUS_1: 0x4080
[  130.522312] NV_PGRAPH_PRI_GPC0_GPCCS_CTXSW_STATUS_GPC_0: 0x0
[  130.530708] NV_PGRAPH_PRI_GPC0_GPCCS_CTXSW_STATUS_1: 0x1300
[  130.538940] NV_PGRAPH_PRI_FECS_CTXSW_IDLESTATE : 0xe
[  130.546536] NV_PGRAPH_PRI_GPC0_GPCCS_CTXSW_IDLESTATE : 0xe
[  130.554760] NV_PGRAPH_PRI_FECS_CURRENT_CTX : 0x80265596
[  130.562815] NV_PGRAPH_PRI_FECS_NEW_CTX : 0x80265596
[  130.570619] NV_PGRAPH_PRI_BE0_CROP_STATUS1 : 0x5f00000
[  130.578655] NV_PGRAPH_PRI_BES_CROP_STATUS1 : 0x5f00000
[  130.586664] NV_PGRAPH_PRI_BE0_ZROP_STATUS : 0xe00000
[  130.594557] NV_PGRAPH_PRI_BE0_ZROP_STATUS2 : 0x7
[  130.602021] NV_PGRAPH_PRI_BES_ZROP_STATUS : 0xe00000
[  130.609877] NV_PGRAPH_PRI_BES_ZROP_STATUS2 : 0x7
[  130.617428] NV_PGRAPH_PRI_BE0_BECS_BE_EXCEPTION: 0x0
[  130.625365] NV_PGRAPH_PRI_BE0_BECS_BE_EXCEPTION_EN: 0x0
[  130.633513] NV_PGRAPH_PRI_GPC0_GPCCS_GPC_EXCEPTION: 0x0
[  130.641721] NV_PGRAPH_PRI_GPC0_GPCCS_GPC_EXCEPTION_EN: 0x30000
[  130.650601] NV_PGRAPH_PRI_GPC0_TPC0_TPCCS_TPC_EXCEPTION: 0x0
[  130.659291] NV_PGRAPH_PRI_GPC0_TPC0_TPCCS_TPC_EXCEPTION_EN: 0x3
[  130.668261] gk20a 17000000.gp10b: gk20a_fifo_reset_engine: failed to HALT gr pipe
[  130.686905] gk20a 17000000.gp10b: gr_gk20a_load_falcon_bind_instblk: arbiter complete timeout
[  130.702759] gk20a 17000000.gp10b: gr_gk20a_load_falcon_bind_instblk: arbiter complete timeout
[  156.266726] Watchdog detected hard LOCKUP on cpu 3
[  156.271699] ------------[ cut here ]------------
[  156.281693] WARNING: at ffffffc00013edf8 [verbose debug info unavailable]
[  156.291206] Modules linked in: bcmdhd pci_tegra bluedroid_pm
[  156.299833] 
[  156.304114] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.4.38-tegra #1
[  156.313476] Hardware name: quill (DT)
[  156.320098] task: ffffffc1ece81900 ti: ffffffc1ece94000 task.ti: ffffffc1ece94000
[  156.333707] PC is at watchdog_timer_fn+0x230/0x33c
[  156.341684] LR is at watchdog_timer_fn+0x230/0x33c
[  156.349601] pc : [<ffffffc00013edf8>] lr : [<ffffffc00013edf8>] pstate: 600001c5
[  156.363401] sp : ffffffc1ece97ae0
[  156.370059] x29: ffffffc1ece97ae0 x28: 0000000000000003 
[  156.378863] x27: ffffffc001281b30 x26: ffffffc1f5fbe278 
[  156.392463] x25: ffffffc0012502d8 x24: ffffffc1ece97dc0 
[  156.401297] x23: 0000000000000000 x22: 0000000000000000 
[  156.410157] x21: ffffffc001281000 x20: ffffffc001250000 
[  156.419046] x19: ffffffc001250260 x18: ffffffc000bfc248 
[  156.427933] x17: 000000000000000e x16: ffffffc000b88a60 
[  156.436844] x15: ffffffc000b88a60 x14: 0000000000000008 
[  156.445783] x13: ffffffc1eaa686c0 x12: 0000000000000001 
[  156.454756] x11: 00000000ffffffff x10: 0000000000aaaaaa 
[  156.463664] x9 : 0000000000000522 x8 : 0000000000000000 
[  156.472447] x7 : 0000000000000001 x6 : ffffffc001299738 
[  156.481125] x5 : 0000000000000000 x4 : 0000000000000000 
[  156.489696] x3 : 0000000000000000 x2 : 0000000000010001 
[  156.501663] x1 : ffffffc1ece94000 x0 : 0000000000000026 
[  156.513660] 
[  156.522384] ---[ end trace b92561905f843e52 ]---
[  156.531099] Call trace:
[  156.536739] [<ffffffc00013edf8>] watchdog_timer_fn+0x230/0x33c
[  156.545808] [<ffffffc000107d64>] __hrtimer_run_queues+0x140/0x350
[  156.555094] [<ffffffc0001087c4>] hrtimer_interrupt+0x9c/0x1e0
[  156.563982] [<ffffffc000936874>] tegra186_timer_isr+0x24/0x30
[  156.572820] [<ffffffc0000f5650>] handle_irq_event_percpu+0x84/0x290
[  156.582122] [<ffffffc0000f58a0>] handle_irq_event+0x44/0x74
[  156.590582] [<ffffffc0000f8ba8>] handle_fasteoi_irq+0xb4/0x188
[  156.599258] [<ffffffc0000f4c70>] generic_handle_irq+0x24/0x38
[  156.607896] [<ffffffc0000f4f78>] __handle_domain_irq+0x60/0xb4
[  156.616549] [<ffffffc000081774>] gic_handle_irq+0x5c/0xb4
[  156.624692] [<ffffffc000084740>] el1_irq+0x80/0xf8
[  156.632197] [<ffffffc000820d20>] cpuidle_enter+0x18/0x20
[  156.640258] [<ffffffc0000e8354>] call_cpuidle+0x28/0x50
[  156.648224] [<ffffffc0000e84f8>] cpu_startup_entry+0x17c/0x340
[  156.656847] [<ffffffc00008ee44>] secondary_start_kernel+0x12c/0x164
[  156.665904] [<0000000080081acc>] 0x80081acc

If you have any ideas for things to try let me know. Thanks!

Note that when adding packages with JetPack, it flashes first (unless you chose not to flash), and the Jetson then reboots before the packages install. The two steps can be separated: you can disable flashing and only install packages, or you can just flash.

So far as the crash goes I am suspicious of the rootfs. How did you set up before using command line flash.sh? Here are the details to consider:

  • Must unpack sample rootfs with sudo (root authority).
  • Must run apply_binaries.sh with root authority.
  • Underlying file system type on host must be a native Linux type, typically ext4.
  • This consumes about 35GB of host space (or more). "df -H -T" can tell you about available host disk space and underlying file system types.
  • JetPack normally does this correctly, but you may get prompted for a password and not see it.

Is all of this correct for you?
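
For the file system type and free space points above, here is a rough sketch of a host-side check. It assumes GNU coreutils "df" (for "--output"), the function names are made up for illustration, and treating only ext4 as "ok" is simply the "typically ext4" advice above, not an official rule.

```shell
#!/bin/sh
# Sketch only: report file system type and free space under a directory.
# Assumes GNU coreutils df; all function names here are hypothetical.

# Print the file system type (e.g. ext4, zfs) holding directory $1.
host_fs_type() {
    df --output=fstype "$1" | tail -n 1 | tr -d ' '
}

# Print the free space under directory $1 in units of 10^9 bytes.
host_free_gb() {
    df --output=avail -B 1000000000 "$1" | tail -n 1 | tr -d ' '
}

# Classify a type name: ext4 is the typical case, anything else needs a look.
classify_fs() {
    case "$1" in
        ext4) echo "ok" ;;
        *)    echo "check" ;;
    esac
}

dir="${1:-.}"
fstype=$(host_fs_type "$dir")
echo "$dir: type=$fstype ($(classify_fs "$fstype")), free=$(host_free_gb "$dir")GB"
```

Run it from the directory holding Linux_for_Tegra (or pass that path as the first argument) and compare the free space against the roughly 35GB mentioned above.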

A few points of clarification:
I was able to install with JetPack on this same machine originally with everything working fine a few weeks ago. This boot loop problem started so I tried to reflash, first using JetPack and then with flash.sh.

  1. I think JetPack did this for me? I can clean up and try again from the top.
  2. I just tried running apply_binaries.sh and then flash.sh and saw no change.
  3. This could be a potential issue. I am on a machine with the ZFS file system. I can try from another machine. I am not sure how this managed to succeed with JetPack earlier if this is the issue however.
  4. I have plenty of space.

I can boot the Jetson into emergency mode without running into the failure. It seems to be services which trigger this issue. Attempting to start the nv service results in the “Timeout detected @ gr_gk20a_submit_fecs_method_op” error but the system stays responsive.

When running JetPack it does do the rootfs unpack and apply_binaries.sh for you. JetPack will ask for your local system password when doing something needing sudo, so it is only something to worry about when manually using flash.sh or apply_binaries.sh. The apply_binaries.sh will fail to work correctly if you ran this manually and did not use sudo.

Often advice for multiple uses of JetPack is to delete the original install (PC side) and run it again if there are issues with JetPack itself.

The ZFS file system is one I’m not completely familiar with. I would put it high on the list of possible issues. I see information saying it uses FUSE, and FUSE (which is something used with a live DVD distribution) is not capable of preserving everything needed (or at least not the versions from a live DVD…live DVDs also lack loopback capabilities). Can you say more about why the host is using ZFS? Is there anything unusual about it? Is this a live DVD?

I just tried on another machine with a more regular file system. I used JetPack to just flash the OS but ran into the same error.

The flash produces file “Linux_for_Tegra/bootloader/system.img.raw” on the PC host. What is the exact byte size of this file (“ls -l system.img.raw”)?

Can you loopback mount and umount this file?

sudo -s
mount -o loop system.img.raw /mnt
ls /mnt
df -H -T /mnt
umount /mnt
exit

What is the exact permission set of “sudo” from the rootfs?

cd /where/ever/it/is/Linux_for_Tegra/rootfs/usr/bin
ls -l sudo

The exact file size is 30064771072 bytes.

I was able to loopback mount and unmount the file system:

# ls /mnt
bin   dev  home  lost+found  mnt  proc        root  sbin  srv  tmp  var
boot  etc  lib   media       opt  README.txt  run   snap  sys  usr
# df -H -T /mnt
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     ext4   30G  3.4G   25G  12% /mnt

It looks like the permissions are root.

$ ls -l sudo
-rwsr-xr-x 1 root root 128480 May 29  2017 sudo

The image size corresponds exactly to “-S 28GiB” size, so it is probably complete and not truncated (unless you used something like “-S 29318MiB”…then it guarantees truncation…but truncation is rarely aligned exactly with 1024 byte sizes).
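
The size arithmetic can be double-checked with a couple of shell helpers. This is only a sketch, and the function names are invented here.

```shell
#!/bin/sh
# Sketch only: GiB/byte arithmetic for checking an image size.

# Convert a whole number of GiB to bytes.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Succeed if a byte count is an exact multiple of 1 GiB.
is_whole_gib() {
    [ $(( $1 % (1024 * 1024 * 1024) )) -eq 0 ]
}

gib_to_bytes 28    # prints 30064771072, the size reported for system.img.raw
is_whole_gib 30064771072 && echo "whole number of GiB"
```

On the host, "stat -c %s system.img.raw" (GNU stat) prints the byte count to feed into is_whole_gib.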

Loopback mount shows all is good with the embedded file system with plenty of space and basically correct behavior (this doesn’t guarantee files on this are correct, but it does substantially suggest the process creating the files was valid).

The permissions of the “sudo” binary are correct as well. The sample rootfs unpacking was probably valid, although this only tests one file. The apply_binaries.sh could still have been run as non-root, but I’d expect most of boot to get further even if apply_binaries.sh was invalid.
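
If you want to script the permission check, something like the following could work. It is a sketch assuming GNU "stat", the helper names are hypothetical, and like the manual "ls -l" it only looks at one file.

```shell
#!/bin/sh
# Sketch only: check the setuid bit and owner of a binary such as rootfs/usr/bin/sudo.

# Succeed if an "ls -l"-style mode string (e.g. -rwsr-xr-x) has the setuid bit.
has_setuid() {
    case "$1" in
        ???s*|???S*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Succeed if file $1 is setuid and owned by root (GNU stat assumed).
check_sudo_binary() {
    mode=$(stat -c %A "$1") || return 1
    owner=$(stat -c %U "$1") || return 1
    has_setuid "$mode" && [ "$owner" = "root" ]
}
```

Example use: check_sudo_binary /where/ever/it/is/Linux_for_Tegra/rootfs/usr/bin/sudo && echo "sudo looks right"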

You might have an actual hardware failure, but it is hard to say. Was your original system working ok and then this happened for no apparent reason? Is it correct that you flashed because of an issue, and that flash itself was not the start of the issue?

Do you happen to have an SD card which can handle 32GB?

Yeah that’s correct. The problem happened out of the blue. Flashing was an attempt to fix the problem.

Yeah I do. Can you point me towards the directions for flashing/booting off an SD card?

Before doing the rest of the testing there is one simple test to try. Since it is GPU related you might see if it boots with no monitor attached. Serial console will still show what is going on.

What I’m hoping to do is create some test rescue disk configurations on the SD card which are derived directly from the flashed image (the system.img.raw file) or from a manually unpacked rootfs. I want to find out if reading from SD instead of eMMC changes anything, or if simplifying changes anything.

The SD card needs to be partitioned with GPT-aware tools, e.g., gdisk instead of fdisk. The first partition is where the rootfs will reside. So take your SD card and wipe all partitions. Then partition it with gdisk (or some other GPT-aware partitioning tool) to have the first partition of about 32GB (size won’t need to be an exact match…the original actually uses far less than 32GB…I was being cautious). My example will call the SD card on your host PC “/dev/sdcard”, but you’ll need to adjust for what it shows as on the desktop PC…it might be “/dev/sdc” for example (monitor dmesg while inserting into the card reader for clues). On the Jetson the partition will be “/dev/mmcblk1p1” (versus the eMMC which is “mmcblk0p1”).

Once you’ve created your “/dev/sdcard1” (or more likely it’ll show as something like “/dev/sdc1”), verify that GPT tools accept the partitioning:

sudo gdisk -l /dev/sdcard

(it should probably mention a protective MBR)
You can use gparted to resize; it is GPT-aware.

Make sure your host does not use 64-bit ext4 extensions (I believe this is already the case). Check that the host’s “/etc/mke2fs.conf” file does not have either of these in the ext4 section:

metadata_csum
64bit
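
A quick way to look for those two names is a grep over the config file. This is only a sketch with a made-up function name. It searches the whole file rather than just the ext4 section, and it also matches commented-out lines, so read the matches before changing anything.

```shell
#!/bin/sh
# Sketch only: look for the two feature names in an mke2fs.conf-style file.

# Print matching lines (with line numbers); succeeds if either name appears.
has_unwanted_ext4_features() {
    grep -nE 'metadata_csum|64bit' "$1"
}

conf="/etc/mke2fs.conf"
if [ ! -r "$conf" ]; then
    echo "cannot read $conf"
elif has_unwanted_ext4_features "$conf"; then
    echo "found in $conf: check whether these are in the ext4 section"
else
    echo "no metadata_csum or 64bit entries in $conf"
fi
```

After formatting, "sudo tune2fs -l /dev/sdcard1" lists the features actually enabled on the new partition.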

Format the partition (be very careful on getting the right partition):

sudo mkfs.ext4 /dev/sdcard1

To create a copy of the system.img.raw you’ll need to mount both the formatted SD card and system.img.raw. I’ll assume you’ve created mount points “/mnt/sdcard/” and “/mnt/image/” (adjust for your taste):

sudo -s
mkdir /mnt/sdcard
mkdir /mnt/image
mount -o loop /where/ever/it/is/system.img.raw /mnt/image
mount /dev/sdcard1 /mnt/sdcard
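# See if it says it will do what you want first ("--dry-run").
# The trailing slash on the source copies the directory contents, including any dotfiles a "*" glob would skip:
rsync --dry-run -avczrx /mnt/image/ /mnt/sdcard
# Assuming all is good:
rsync -avczrx /mnt/image/ /mnt/sdcard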
umount /mnt/image

FYI, rsync is a better copy/restore mechanism due to how it serializes content and works even with odd file types. You could use cp with correct arguments, but I just don’t trust it with entire file systems. Plus it gives you the chance to see what it would do via “--dry-run” and you can afford to make a mistake without it really happening.

Now edit “/mnt/sdcard/boot/extlinux/extlinux.conf”. It should be like this:

TIMEOUT 30
DEFAULT sdcard

MENU TITLE p2771-0000 eMMC boot options

LABEL sdcard
      MENU LABEL sdcard
      LINUX /boot/Image
      APPEND ${cbootargs} root=/dev/mmcblk1p1 rw rootwait rootfstype=ext4

LABEL emmc
      MENU LABEL eMMC
      LINUX /boot/Image
      APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4

LABEL sda1
      MENU LABEL SATA sda1
      LINUX /boot/Image
      APPEND ${cbootargs} root=/dev/sda1 rw rootwait rootfstype=ext4

LABEL ro_fsck
      MENU LABEL eMMC read only fsck
      LINUX /boot/Image
      APPEND ${cbootargs} root=/dev/mmcblk0p1 ro rootwait rootfstype=ext4 fsck.mode=force

You might want to save a copy of this somewhere safe after editing this for future use on any rescue SD card extlinux.conf. This defaults to booting to the SD card, but if you have a serial console it’ll allow you to test multiple entries.
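
To sanity check the edited file before booting, a small sketch like this prints the DEFAULT entry and the LABEL names and confirms that DEFAULT points at a label that exists. The function names are invented for illustration.

```shell
#!/bin/sh
# Sketch only: inspect an extlinux.conf for its DEFAULT entry and LABEL names.

# Print the label named on the DEFAULT line.
extlinux_default() {
    awk '$1 == "DEFAULT" { print $2 }' "$1"
}

# Print every LABEL name, one per line ("MENU LABEL" lines are not matched).
extlinux_labels() {
    awk '$1 == "LABEL" { print $2 }' "$1"
}

# Succeed only if DEFAULT names a LABEL that exists in the file.
extlinux_default_is_valid() {
    def=$(extlinux_default "$1")
    [ -n "$def" ] && extlinux_labels "$1" | grep -qx "$def"
}
```

Example use: extlinux_default_is_valid /mnt/sdcard/boot/extlinux/extlinux.conf && echo "DEFAULT entry exists"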

Depending on the flash setup there is a good chance this extlinux will be read instead of the eMMC for something flashed with R28.2…much earlier releases might bypass the SD card…don’t know for sure. It depends on the U-Boot environment order in which it tests for extlinux.conf.

Boot up. See if it gets any further running on SD card. Report what you find.

If you have serial console and SD card failed, then you might see what the “read only fsck” entry does (if you don’t have a serial console you could set the “DEFAULT” to “ro_fsck” instead of “sdcard”).

If things are still not resolved you can reuse the “/boot” (including extlinux.conf) but simply overwrite the sample rootfs onto the SD card. Assuming SD is mounted on the host at “/mnt/sdcard” it would go something like this (also assumes still in sudo shell, the “sudo -s”):

cd /mnt/sdcard
tar xvfj /where/ever/it/is/Tegra_Linux_Sample-Root-Filesystem_R28.2.0_aarch64.tbz2
cd /where/ever/it/is/Linux_for_Tegra/
./apply_binaries.sh -r /mnt/sdcard
# Verify "/mnt/sdcard/boot" still contains the content from the system.img.raw and edited extlinux.conf.
cd
umount /mnt/sdcard
exit

This should boot with the same entries as before, but the file system itself will be almost entirely from the sample rootfs plus apply_binaries.sh overlay of the NVIDIA hardware accelerated drivers. See if you can select the “sdcard” entry and get this to boot. If this works, then you know there is something related to either eMMC or the content of eMMC failing. Should SD work then eMMC needs to be tested for whether it is a software failure or a hardware failure (e.g., via cloning).

If none of the SD card variants work and they all fail in the same way, I suspect a hardware failure, but someone familiar with the GPU errors would need to comment.

There has been no monitor attached this whole time unfortunately.

It did not get any further booting from the SD card, with or without the sample rootfs. “read only fsck” didn’t help either.

My guesses at this point are either hardware or firmware.

Thanks for all your help @linuxdev!

It probably is time to RMA. One last test if you want to try: Check with an HDMI monitor.

I figured it out. I was using a 5V power cable instead of the 19V one that came with the Jetson. It works fine with the original cable.