Running Argus camera on a single core sometimes fails

Hi, I'm running an Argus capture application on a single core with maximum priority and niceness. Sometimes the stream does not start, the CPU goes to 100%, and this error appears in dmesg:

[15611.115033] BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=0 stuck for 39s!
[15611.123041] BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=-20 stuck for 37s!
[15611.131282] Showing busy workqueues and worker pools:
[15611.131284] workqueue events: flags=0x0
[15611.131286]   pwq 10: cpus=5 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[15611.131294]     pending: defense_work_handler
[15611.131317] workqueue kblockd: flags=0x18
[15611.131319]   pwq 11: cpus=5 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
[15611.131325]     pending: blk_mq_run_work_fn
[15641.835288] BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=0 stuck for 70s!
[15641.843342] BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=-20 stuck for 67s!
[15641.851569] Showing busy workqueues and worker pools:
[15641.851572] workqueue events: flags=0x0
[15641.851574]   pwq 10: cpus=5 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[15641.851582]     pending: defense_work_handler
[15641.851605] workqueue kblockd: flags=0x18
[15641.851606]   pwq 11: cpus=5 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
[15641.851613]     pending: blk_mq_run_work_fn
[15659.115529] FAN cooling trip_level:0 cur_temp:37700 trip_temps[1]:46000
[15672.555544] BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=0 stuck for 101s!
[15672.563709] BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=-20 stuck for 98s!
[15672.571943] Showing busy workqueues and worker pools:
[15672.571946] workqueue events: flags=0x0
[15672.571948]   pwq 10: cpus=5 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[15672.571956]     pending: defense_work_handler
[15672.571979] workqueue kblockd: flags=0x18
[15672.571980]   pwq 11: cpus=5 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
[15672.571987]     pending: blk_mq_run_work_fn
[15703.275801] BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=0 stuck for 132s!
[15703.283918] BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=-20 stuck for 129s!
[15703.292240] Showing busy workqueues and worker pools:
[15703.292243] workqueue events: flags=0x0
[15703.292244]   pwq 10: cpus=5 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[15703.292253]     pending: defense_work_handler
[15703.292276] workqueue kblockd: flags=0x18
[15703.292278]   pwq 11: cpus=5 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
[15703.292284]     pending: blk_mq_run_work_fn
[15733.996065] BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=0 stuck for 162s!
[15734.004214] BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=-20 stuck for 160s!
[15734.012548] Showing busy workqueues and worker pools:
[15734.012551] workqueue events: flags=0x0
[15734.012552]   pwq 10: cpus=5 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[15734.012560]     pending: defense_work_handler
[15734.012584] workqueue kblockd: flags=0x18
[15734.012586]   pwq 11: cpus=5 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
[15734.012592]     pending: blk_mq_run_work_fn
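
For anyone triaging a similar log, here is a minimal sketch that pulls the "workqueue lockup" lines out of dmesg text and reports the longest stall per worker pool. The names `LOCKUP_RE` and `summarize_lockups` are made up for illustration, and the pattern only assumes the line format shown above; other kernel versions may print it differently.

```python
import re

# Matches lines such as:
# [15611.115033] BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=0 stuck for 39s!
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: workqueue lockup - pool "
    r"cpus=(?P<cpus>\S+) node=(?P<node>\d+) flags=(?P<flags>\S+) "
    r"nice=(?P<nice>-?\d+) stuck for (?P<secs>\d+)s!"
)


def summarize_lockups(dmesg_text):
    """Return {(cpus, nice): longest stall in seconds} for each stuck pool."""
    worst = {}
    for line in dmesg_text.splitlines():
        match = LOCKUP_RE.search(line)
        if match is None:
            continue  # not a lockup report, skip it
        key = (match.group("cpus"), int(match.group("nice")))
        worst[key] = max(worst.get(key, 0), int(match.group("secs")))
    return worst


# Four lines copied from the log above, used as sample input.
sample = """\
[15611.115033] BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=0 stuck for 39s!
[15611.123041] BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=-20 stuck for 37s!
[15641.835288] BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=0 stuck for 70s!
[15641.843342] BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=-20 stuck for 67s!
"""

for (cpus, nice), secs in sorted(summarize_lockups(sample).items()):
    print(f"pool cpus={cpus} nice={nice}: stuck for up to {secs}s")
```

On a live system the same function could be fed the text of `dmesg` instead of the sample string. In the log above, both stuck pools are on cpus=5.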

Hi,
For TX2, the latest release is JetPack 4.6.3. If you are using a previous release, please upgrade to this version and try again.

I am using this version. It seems to happen when TensorRT is being initialized in parallel with libargus.

This is the dmesg output when running on a single core:

[ 1196.594294] nvgpu: 17000000.gp10b     gk20a_channel_timeout_handler:1573 [ERR]  Job on channel 502 timed out
[ 1196.604499] NV_PGRAPH_STATUS: 0x0
[ 1196.604503] NV_PGRAPH_STATUS1: 0x0
[ 1196.604506] NV_PGRAPH_STATUS2: 0x0
[ 1196.604510] NV_PGRAPH_ENGINE_STATUS: 0x0
[ 1196.604512] NV_PGRAPH_GRFIFO_STATUS : 0x1
[ 1196.604516] NV_PGRAPH_GRFIFO_CONTROL : 0x10001
[ 1196.604518] NV_PGRAPH_PRI_FECS_HOST_INT_STATUS : 0x0
[ 1196.604521] NV_PGRAPH_EXCEPTION  : 0x0
[ 1196.604524] NV_PGRAPH_FECS_INTR  : 0x0
[ 1196.604527] NV_PFIFO_ENGINE_STATUS(GR) : 0x10061006
[ 1196.604529] NV_PGRAPH_ACTIVITY0: 0x0
[ 1196.604532] NV_PGRAPH_ACTIVITY1: 0x0
[ 1196.604535] NV_PGRAPH_ACTIVITY2: 0x0
[ 1196.604537] NV_PGRAPH_ACTIVITY4: 0x0
[ 1196.604540] NV_PGRAPH_PRI_SKED_ACTIVITY: 0x0
[ 1196.604543] NV_PGRAPH_PRI_GPC0_GPCCS_GPC_ACTIVITY0: 0x0
[ 1196.604546] NV_PGRAPH_PRI_GPC0_GPCCS_GPC_ACTIVITY1: 0x0
[ 1196.604549] NV_PGRAPH_PRI_GPC0_GPCCS_GPC_ACTIVITY2: 0x0
[ 1196.604551] NV_PGRAPH_PRI_GPC0_GPCCS_GPC_ACTIVITY3: 0x0
[ 1196.604554] NV_PGRAPH_PRI_GPC0_TPC0_TPCCS_TPC_ACTIVITY0: 0x0
[ 1196.604557] NV_PGRAPH_PRI_GPC0_TPC1_TPCCS_TPC_ACTIVITY0: 0x0
[ 1196.604560] NV_PGRAPH_PRI_GPC0_TPCS_TPCCS_TPC_ACTIVITY0: 0x0
[ 1196.604563] NV_PGRAPH_PRI_GPCS_GPCCS_GPC_ACTIVITY0: 0x0
[ 1196.604566] NV_PGRAPH_PRI_GPCS_GPCCS_GPC_ACTIVITY1: 0x0
[ 1196.604569] NV_PGRAPH_PRI_GPCS_GPCCS_GPC_ACTIVITY2: 0x0
[ 1196.604571] NV_PGRAPH_PRI_GPCS_GPCCS_GPC_ACTIVITY3: 0x0
[ 1196.604574] NV_PGRAPH_PRI_GPCS_TPC0_TPCCS_TPC_ACTIVITY0: 0x0
[ 1196.604577] NV_PGRAPH_PRI_GPCS_TPC1_TPCCS_TPC_ACTIVITY0: 0x0
[ 1196.604580] NV_PGRAPH_PRI_GPCS_TPCS_TPCCS_TPC_ACTIVITY0: 0x0
[ 1196.604583] NV_PGRAPH_PRI_BE0_BECS_BE_ACTIVITY0: 0x0
[ 1196.604585] NV_PGRAPH_PRI_BE1_BECS_BE_ACTIVITY0: 0x0
[ 1196.604588] NV_PGRAPH_PRI_BES_BECS_BE_ACTIVITY0: 0x0
[ 1196.604591] NV_PGRAPH_PRI_DS_MPIPE_STATUS: 0x0
[ 1196.604594] NV_PGRAPH_PRI_FE_GO_IDLE_TIMEOUT : 0x7fffffff
[ 1196.604597] NV_PGRAPH_PRI_FE_GO_IDLE_INFO : 0x33000700
[ 1196.604600] NV_PGRAPH_PRI_GPC0_TPC0_TEX_M_TEX_SUBUNITS_STATUS: 0x0
[ 1196.604603] NV_PGRAPH_PRI_CWD_FS: 0x0
[ 1196.604606] NV_PGRAPH_PRI_FE_TPC_FS: 0x0
[ 1196.604608] NV_PGRAPH_PRI_CWD_GPC_TPC_ID(0): 0x0
[ 1196.604611] NV_PGRAPH_PRI_CWD_SM_ID(0): 0x0
[ 1196.604614] NV_PGRAPH_PRI_FECS_CTXSW_STATUS_FE_0: 0x0
[ 1196.604617] NV_PGRAPH_PRI_FECS_CTXSW_STATUS_1: 0x100
[ 1196.604620] NV_PGRAPH_PRI_GPC0_GPCCS_CTXSW_STATUS_GPC_0: 0x0
[ 1196.604623] NV_PGRAPH_PRI_GPC0_GPCCS_CTXSW_STATUS_1: 0x380
[ 1196.604625] NV_PGRAPH_PRI_FECS_CTXSW_IDLESTATE : 0xf
[ 1196.604628] NV_PGRAPH_PRI_GPC0_GPCCS_CTXSW_IDLESTATE : 0xf
[ 1196.604631] NV_PGRAPH_PRI_FECS_CURRENT_CTX : 0x1ff8b3e
[ 1196.604634] NV_PGRAPH_PRI_FECS_NEW_CTX : 0x1ff8b3e
[ 1196.604637] NV_PGRAPH_PRI_BE0_CROP_STATUS1 : 0x700000
[ 1196.604639] NV_PGRAPH_PRI_BES_CROP_STATUS1 : 0x700000
[ 1196.604642] NV_PGRAPH_PRI_BE0_ZROP_STATUS : 0x0
[ 1196.604645] NV_PGRAPH_PRI_BE0_ZROP_STATUS2 : 0x0
[ 1196.604648] NV_PGRAPH_PRI_BES_ZROP_STATUS : 0x0
[ 1196.604651] NV_PGRAPH_PRI_BES_ZROP_STATUS2 : 0x0
[ 1196.604653] NV_PGRAPH_PRI_BE0_BECS_BE_EXCEPTION: 0x0
[ 1196.604656] NV_PGRAPH_PRI_BE0_BECS_BE_EXCEPTION_EN: 0x0
[ 1196.604659] NV_PGRAPH_PRI_GPC0_GPCCS_GPC_EXCEPTION: 0x0
[ 1196.604662] NV_PGRAPH_PRI_GPC0_GPCCS_GPC_EXCEPTION_EN: 0x30000
[ 1196.604664] NV_PGRAPH_PRI_GPC0_TPC0_TPCCS_TPC_EXCEPTION: 0x0
[ 1196.604667] NV_PGRAPH_PRI_GPC0_TPC0_TPCCS_TPC_EXCEPTION_EN: 0x3
[ 1196.604686] nvgpu: 17000000.gp10b   nvgpu_set_error_notifier_locked:137  [ERR]  error notifier set to 8 for ch 502
[ 1196.616468] ---- mlocks ----

[ 1196.616498] ---- syncpts ----
[ 1196.616543] id 11 (15600000.isp_camerad_0) min 11095 max 11095 refs 1 (previous client : 15600000.isp_camerad_0)
[ 1196.616547] id 12 (15600000.isp_camerad_1) min 11063 max 11063 refs 1 (previous client : 15600000.isp_camerad_1)
[ 1196.616552] id 13 (15600000.isp_camerad_2) min 11249 max 11249 refs 1 (previous client : 15600000.isp_camerad_2)
[ 1196.616556] id 14 (15600000.isp_camerad_3) min 7502 max 7502 refs 1 (previous client : 15600000.isp_camerad_3)
[ 1196.616560] id 15 (15600000.isp_camerad_4) min 35026 max 35026 refs 1 (previous client : 15600000.isp_camerad_4)
[ 1196.616563] id 16 (15600000.isp_camerad_5) min 7502 max 7502 refs 1 (previous client : 15600000.isp_camerad_5)
[ 1196.616567] id 17 (gp10b_507) min 31215 max 31215 refs 1 (previous client : gp10b_507)
[ 1196.616570] id 18 (gp10b_506) min 3769 max 3769 refs 1 (previous client : gp10b_506)
[ 1196.616577] id 22 (15700000.vi_0) min 3592 max 3592 refs 1 (previous client : 15340000.vic__dmonitoringmod_0)
[ 1196.616580] id 23 (15700000.vi_0) min 7323 max 7323 refs 1 (previous client : 15340000.vic__modeld_0)
[ 1196.616584] id 24 (15700000.vi_1) min 11155 max 11155 refs 1 (previous client : 15340000.vic_encoderd_0)
[ 1196.616588] id 25 (15700000.vi_2) min 2511 max 2511 refs 1 (previous client : 15700000.vi_0)
[ 1196.616593] id 28 (150c0000.nvcsi_0) min 5023 max 5023 refs 1 (previous client : 15700000.vi_0)
[ 1196.616597] id 29 (15340000.vic_camerad_0) min 5013 max 5013 refs 1 (previous client : 15700000.vi_1)
[ 1196.616600] id 30 (gp10b_502) min 2543 max 2543 refs 1 (previous client : 15700000.vi_2)
[ 1196.616603] id 31 (15340000.vic_camerad_0) min 32 max 32 refs 1 (previous client : 150c0000.nvcsi_0)
[ 1196.616668] id 69 (15340000.vic_nvargus-daemon_0) min 7077 max 7077 refs 1 (previous client : 15340000.vic_WideRoadCamera_0)

[ 1196.617350] ---- channels ----
[ 1196.617361] 
               channel 2 - 15820000.se

[ 1196.617363] NvHost basic channel registers:
[ 1196.617366] CMDFIFO_STAT_0:  00002040
[ 1196.617369] CMDFIFO_RDATA_0: 088c2289
[ 1196.617373] CMDP_OFFSET_0:   00000000
[ 1196.617376] CMDP_CLASS_0:    00000000
[ 1196.617379] CHANNELSTAT_0:   00000000
[ 1196.617380] The CDMA sync queue is empty.

[ 1196.617385] 
               channel 3 - 15830000.se

[ 1196.617386] NvHost basic channel registers:
[ 1196.617388] CMDFIFO_STAT_0:  00002040
[ 1196.617391] CMDFIFO_RDATA_0: 20800a41
[ 1196.617395] CMDP_OFFSET_0:   00000000
[ 1196.617397] CMDP_CLASS_0:    00000000
[ 1196.617399] CHANNELSTAT_0:   00000000
[ 1196.617401] The CDMA sync queue is empty.

[ 1196.617405] 
               channel 4 - 15840000.se

[ 1196.617406] NvHost basic channel registers:
[ 1196.617408] CMDFIFO_STAT_0:  00002040
[ 1196.617411] CMDFIFO_RDATA_0: 08200d40
[ 1196.617414] CMDP_OFFSET_0:   00000000
[ 1196.617417] CMDP_CLASS_0:    00000000
[ 1196.617419] CHANNELSTAT_0:   00000000
[ 1196.617422] The CDMA sync queue is empty.

[ 1196.617447] 
               ---- host general irq ----

[ 1196.617450] sync_intc0mask = 0x00000001
[ 1196.617452] sync_intmask = 0x50000003
[ 1196.617453] 
               ---- host syncpt irq mask ----

[ 1196.617455] 
               ---- host syncpt irq status ----

[ 1196.617458] syncpt_thresh_cpu0_int_status(0) = 0x40000000
[ 1196.617461] syncpt_thresh_cpu0_int_status(1) = 0x00000000
[ 1196.617464] syncpt_thresh_cpu0_int_status(2) = 0x00000000
[ 1196.617466] syncpt_thresh_cpu0_int_status(3) = 0x00000000
[ 1196.617470] syncpt_thresh_cpu0_int_status(4) = 0x00000000
[ 1196.617472] syncpt_thresh_cpu0_int_status(5) = 0x00000000
[ 1196.617475] syncpt_thresh_cpu0_int_status(6) = 0x00000000
[ 1196.617478] syncpt_thresh_cpu0_int_status(7) = 0x00000000
[ 1196.617480] syncpt_thresh_cpu0_int_status(8) = 0x00000000
[ 1196.617483] syncpt_thresh_cpu0_int_status(9) = 0x00000000
[ 1196.617485] syncpt_thresh_cpu0_int_status(10) = 0x00000000
[ 1196.617488] syncpt_thresh_cpu0_int_status(11) = 0x00000000
[ 1196.617490] syncpt_thresh_cpu0_int_status(12) = 0x00000000
[ 1196.617493] syncpt_thresh_cpu0_int_status(13) = 0x00000000
[ 1196.617496] syncpt_thresh_cpu0_int_status(14) = 0x00000000
[ 1196.617498] syncpt_thresh_cpu0_int_status(15) = 0x00000000
[ 1196.617501] syncpt_thresh_cpu0_int_status(16) = 0x00000000
[ 1196.617503] syncpt_thresh_cpu0_int_status(17) = 0x00000000
[ 1196.617508] gp10b pbdma 0: 
[ 1196.617511] id: 6 (tsg), next_id: 6 (tsg) chan status: invalid
[ 1196.617547] PBDMA_PUT: 0000001f00008020 PBDMA_GET: 0000001f00008020 GP_PUT: 00000002 GP_GET: 00000002 FETCH: 00000002 HEADER: 60400000
               HDR: 00000000 SHADOW0: 00008000 SHADOW1: 0000201f

[ 1196.617552] gp10b eng 0: 
[ 1196.617554] id: 6 (tsg), next_id: 6 (tsg), ctx status: invalid 

[ 1196.617559] gp10b eng 1: 
[ 1196.617561] id: 6 (tsg), next_id: 6 (tsg), ctx status: invalid 


[ 1196.617742] 502-gp10b, pid 5277, refs 5: 
[ 1196.617745] channel status: not in use idle not busy
[ 1196.617749] RAMFC : TOP: 8000001f00008020 PUT: 0000001f00008020 GET: 0000001f00008020 FETCH: 0000001f00008020
               HEADER: 60400000 COUNT: 80000000
               SYNCPOINT 00000000 00001e01 SEMAPHORE 0000001e 00060aa0 00000000 00000002

[ 1196.617755] 503-gp10b, pid 5232, refs 2, deterministic: 
[ 1196.617757] channel status:  in use idle not busy
[ 1196.617760] RAMFC : TOP: 0000000000000000 PUT: 00000001006502d8 GET: 00000001006502d8 FETCH: 00000201006502d8
               HEADER: 60400000 COUNT: 84000000
               SYNCPOINT 00000000 00000000 SEMAPHORE 00000000 00000000 00000000 00000000

[ 1196.617765] 504-gp10b, pid 5232, refs 2, deterministic: 
[ 1196.617767] channel status:  in use idle not busy
[ 1196.617771] RAMFC : TOP: 0000000000000000 PUT: 0000000100550294 GET: 0000000100550294 FETCH: 0000020100550294
               HEADER: 60400000 COUNT: 84000000
               SYNCPOINT 00000000 00000000 SEMAPHORE 00000000 00000000 00000000 00000000

[ 1196.617775] 505-gp10b, pid 5232, refs 2, deterministic: 
[ 1196.617777] channel status:  in use idle not busy
[ 1196.617781] RAMFC : TOP: 0000000000000000 PUT: 0000000100450294 GET: 0000000100450294 FETCH: 0000020100450294
               HEADER: 60400000 COUNT: 84000000
               SYNCPOINT 00000000 00000000 SEMAPHORE 00000000 00000000 00000000 00000000

[ 1196.617785] 506-gp10b, pid 5232, refs 3, deterministic: 
[ 1196.617787] channel status:  in use idle not busy
[ 1196.617791] RAMFC : TOP: 8000001f000e0090 PUT: 0000001f000e0090 GET: 0000001f000e0090 FETCH: 0000001f000e0090
               HEADER: 60400000 COUNT: 80000000
               SYNCPOINT 00000000 00001201 SEMAPHORE 00000001 0002fff0 0000014c 00000004

[ 1196.617796] 507-gp10b, pid 5232, refs 2, deterministic: 
[ 1196.617798] channel status:  in use idle not busy
[ 1196.617801] RAMFC : TOP: 0000000000000000 PUT: 000000010025e9a0 GET: 000000010025e9a0 FETCH: 000002010025e9a0
               HEADER: 60400000 COUNT: 84000000
               SYNCPOINT 00000000 00000000 SEMAPHORE 00000001 0002ffb0 00000007 00001004

[ 1196.617806] 508-gp10b, pid 3828, refs 2: 
[ 1196.617808] channel status:  in use idle not busy
[ 1196.617811] RAMFC : TOP: 0000000000000000 PUT: 0000000000000000 GET: 0000000000000000 FETCH: 0000000000000000
               HEADER: 60400000 COUNT: 00000000
               SYNCPOINT 00000000 00000000 SEMAPHORE 00000000 00000000 00000000 00000000

[ 1196.617815] 509-gp10b, pid 3828, refs 2: 
[ 1196.617817] channel status:  in use idle not busy
[ 1196.617820] RAMFC : TOP: 0000000000000000 PUT: 0000000000000000 GET: 0000000000000000 FETCH: 0000000000000000
               HEADER: 60400000 COUNT: 00000000
               SYNCPOINT 00000000 00000000 SEMAPHORE 00000000 00000000 00000000 00000000

[ 1196.617825] 510-gp10b, pid 3828, refs 2: 
[ 1196.617827] channel status:  in use idle not busy
[ 1196.617830] RAMFC : TOP: 0000000000000000 PUT: 0000000000000000 GET: 0000000000000000 FETCH: 0000000000000000
               HEADER: 60400000 COUNT: 00000000
               SYNCPOINT 00000000 00000000 SEMAPHORE 00000000 00000000 00000000 00000000

[ 1196.617834] 511-gp10b, pid 3828, refs 2: 
[ 1196.617836] channel status:  in use idle not busy
[ 1196.617839] RAMFC : TOP: 0000000000000000 PUT: 0000000000000000 GET: 0000000000000000 FETCH: 0000000000000000
               HEADER: 60400000 COUNT: 00000000
               SYNCPOINT 00000000 00000000 SEMAPHORE 00000000 00000000 00000000 00000000

Any update? It is hanging on icapture_session->createOutputStream, and after this happens irq/52-host_status starts taking up a lot of CPU even if nothing else is running.