Hi NVIDIA,
I'm running my application on an x86 machine with an RTX 4000, and it crashes intermittently.
The kernel log is below:
[ 981.638767] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 981.638922] rcu: 11-…0: (1 GPs behind) idle=449/1/0x4000000000000000 softirq=297802/297802 fqs=4574
[ 981.639146] (detected by 1, t=21002 jiffies, g=516213, q=21077)
[ 981.639290] Sending NMI from CPU 1 to CPUs 11:
[ 981.704018] usb 2-2: Failed to query (GET_CUR) UVC control 1 on unit 3: -32 (exp. 1024).
[ 986.715562] usb 2-2: Failed to query (GET_CUR) UVC control 1 on unit 3: -32 (exp. 1024).
[ 988.726327] usb 2-2: Failed to query (GET_CUR) UVC control 1 on unit 3: -32 (exp. 1024).
[ 989.084825] cmd_mgr_queue cmd timed-out cmd_mgr->queue_sz:1
[ 989.084967] tkn[274] flags:0032 result: -4 cmd: 117-MM_GET_STA_INFO_REQ - reqcfm( 118-MM_GET_STA_INFO_CFM)
[ 989.085364] cmd queue crashed
[ 990.365017] NVRM: Xid (PCI:0000:00:02): 16, pid='', name=, Head 00000003 Count 0000fdf6
[ 991.028904] cmd queue crashed
[ 991.029008] cmd queue crashed
[ 991.640195] rcu: rcu_preempt kthread starved for 9998 jiffies! g516213 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=5
[ 991.640445] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 991.640661] rcu: RCU grace-period kthread stack dump:
[ 991.640782] task:rcu_preempt state:R running task stack: 0 pid: 13 ppid: 2 flags:0x00004000
[ 991.641018] Call Trace:
[ 991.641080]
[ 991.641134] __schedule+0x2ea/0x9b0
[ 991.641225] ? restore_regs_and_return_to_kernel+0x23/0x23
[ 991.641359] preempt_schedule_irq+0x3b/0x60
[ 991.641460] irqentry_exit+0x1c/0x50
[ 991.641550] asm_sysvec_reschedule_ipi+0x16/0x20
[ 991.641662] RIP: 0010:_raw_spin_unlock_irqrestore+0x20/0x40
[ 991.641797] Code: cc cc 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 89 f3 e8 22 8d 05 ff 80 e7 02 74 06 e8 58 24 0f ff fb 65 ff 0d 50 63 f4 48 <74> 06 5b c3 cc cc cc cc e8 eb f5 f2 fe 5b c3 cc cc cc cc 66 66 2e
[ 991.642233] RSP: 0018:ffffab9080103e20 EFLAGS: 00000286
[ 991.642359] RAX: 0000000080000001 RBX: 0000000000000296 RCX: 0000000000000040
[ 991.642529] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffb70d49e8
[ 991.642697] RBP: 0000000000000296 R08: 0000000000000000 R09: ffff94a41fb6ac70
[ 991.642866] R10: ffff949cc092c410 R11: ffff94a41fb6a2b0 R12: 0000000000000000
[ 991.643034] R13: 000000000002af40 R14: 0000000000000040 R15: ffffffffb7f3e400
[ 991.643203] ? _raw_spin_unlock_irqrestore+0x18/0x40
[ 991.643323] force_qs_rnp+0x85/0x200
[ 991.643412] ? schedule_page_work_fn+0x30/0x30
[ 991.643522] rcu_gp_fqs_loop+0x2da/0x3f0
[ 991.643618] rcu_gp_kthread+0xb0/0x120
[ 991.643709] ? rcu_gp_cleanup+0x370/0x370
[ 991.643806] kthread+0x13a/0x170
[ 991.643887] ? set_kthread_struct+0x50/0x50
[ 991.643989] ret_from_fork+0x1f/0x30
[ 991.644078]
[ 991.644133] rcu: Stack dump where RCU GP kthread last ran:
[ 991.644264] Sending NMI from CPU 1 to CPUs 5:
[ 991.644374] NMI backtrace for cpu 5 skipped: idling at intel_idle+0x5c/0xb0
[ 991.737371] usb 2-2: Failed to query (GET_CUR) UVC control 1 on unit 3: -32 (exp. 1024).
[ 1000.606761] NVRM: Xid (PCI:0000:00:02): 8, pid='', name=, Channel 00000028
[ 1007.702558] usb 2-1: Failed to query (GET_CUR) UVC control 1 on unit 3: -32 (exp. 1024).
[ 1020.716015] usb 2-1: Failed to query (GET_CUR) UVC control 1 on unit 3: -32 (exp. 1024).
[ 1024.291043] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [mivins_ros2_nai:24507]
[ 1024.291239] Modules linked in: aic8800_fdrv(O) cfg80211 aic_load_fw(O) nvidia_uvm(PO) uio_pci_generic uio nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) hid_sensor_accel_3d hid_sensor_gyro_3d hid_sensor_trigger hid_sensor_iio_common hid_sensor_hub efivarfs
[ 1024.291771] CPU: 5 PID: 24507 Comm: mivins_ros2_nai Tainted: P B O 5.15.163-acrn-perception-vm #1
[ 1024.292007] RIP: 0010:smp_call_function_many_cond+0xee/0x2a0
[ 1024.292148] Code: 33 48 89 ee e8 93 1e 5d 00 3b 05 51 c5 fd 01 89 c7 73 21 48 63 c7 49 8b 0e 48 03 0c c5 80 38 c5 b7 8b 41 08 a8 01 74 0a f3 90 <8b> 51 08 83 e2 01 75 f6 eb cd 48 83 c4 48 5b 5d 41 5c 41 5d 41 5e
[ 1024.292584] RSP: 0018:ffffab908968b940 EFLAGS: 00000202
[ 1024.292710] RAX: 0000000000000011 RBX: 0000000000000001 RCX: ffffcb907fa494a0
[ 1024.292879] RDX: 0000000000000001 RSI: ffff94a41fb6b188 RDI: 0000000000000001
[ 1024.293047] RBP: ffff94a41fb6b188 R08: 0000000000000001 R09: 0000000000000004
[ 1024.293216] R10: 0000000000000000 R11: 00006b63c0000000 R12: 0000000000000000
[ 1024.293385] R13: ffff94a41fb6b180 R14: ffff94a41fb6b180 R15: 000036ec600094a0
[ 1024.293554] FS: 00007f49a9fff000(0000) GS:ffff94a41fb40000(0000) knlGS:0000000000000000
[ 1024.293745] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1024.293881] CR2: 00007f499ae4e000 CR3: 00000002464c0006 CR4: 0000000000770ee0
[ 1024.294050] PKRU: 55555554
[ 1024.294118] Call Trace:
[ 1024.294180]
[ 1024.294232] ? watchdog_timer_fn+0x1aa/0x200
[ 1024.294337] ? softlockup_fn+0x30/0x30
[ 1024.294429] ? __hrtimer_run_queues+0x96/0x2d0
[ 1024.294537] ? hrtimer_interrupt+0x106/0x220
[ 1024.294641] ? __sysvec_apic_timer_interrupt+0x59/0x100
[ 1024.294768] ? sysvec_apic_timer_interrupt+0x65/0x90
[ 1024.294889]
[ 1024.294942]
[ 1024.294995] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 1024.295126] ? smp_call_function_many_cond+0xee/0x2a0
[ 1024.295247] ? __flush_tlb_all+0x30/0x30
[ 1024.295344] on_each_cpu_cond_mask+0x25/0x40
[ 1024.295447] __purge_vmap_area_lazy+0xc0/0x710
[ 1024.295557] ? purge_fragmented_blocks_allcpus+0x42/0x220
[ 1024.295686] ? trace_hardirqs_off+0x2a/0xd0
[ 1024.295788] _vm_unmap_aliases+0x113/0x150
[ 1024.295888] change_page_attr_set_clr+0x97/0x280
[ 1024.296001] set_pages_array_wb+0x26/0x70
[ 1024.296101] nv_free_system_pages+0x22e/0x2e0 [nvidia]
[ 1024.296511] nv_free_pages+0x8c/0x90 [nvidia]
[ 1024.296820] _nv041386rm+0x80/0xf0 [nvidia]
[ 1024.297185] ? _nv036603rm+0x27a/0x750 [nvidia]
[ 1024.297716] ? _nv036564rm+0x5f/0x130 [nvidia]
[ 1024.298057] ? _nv002832rm+0xd/0x20 [nvidia]
[ 1024.298465] ? _nv004574rm+0x1e/0xb0 [nvidia]
[ 1024.298876] ? _nv017123rm+0x59c/0x680 [nvidia]
[ 1024.299306] ? _nv045348rm+0xab/0xe0 [nvidia]
[ 1024.299662] ? _nv047053rm+0xb3/0x180 [nvidia]
[ 1024.300092] ? _nv047052rm+0x3e5/0x690 [nvidia]
[ 1024.300521] ? _nv045247rm+0xdd/0x180 [nvidia]
[ 1024.300878] ? _nv045248rm+0x41/0x70 [nvidia]
[ 1024.301234] ? _nv000571rm+0x4d/0x60 [nvidia]
[ 1024.301591] ? _nv000731rm+0x1b7/0xeb0 [nvidia]
[ 1024.301965] ? rm_ioctl+0x58/0xb0 [nvidia]
[ 1024.302322] ? nvidia_unlocked_ioctl+0x6e0/0x950 [nvidia]
[ 1024.302653] ? __x64_sys_ioctl+0x88/0xc0
[ 1024.302750] ? do_syscall_64+0x33/0x80
[ 1024.302842] ? entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[ 1024.302968]
[ 1024.303023] Kernel panic - not syncing: softlockup: hung tasks
[ 1024.303163] CPU: 5 PID: 24507 Comm: mivins_ros2_nai Tainted: P B O L 5.15.163-acrn-perception-vm #1
[ 1024.303399] Call Trace:
[ 1024.303460]
[ 1024.303511] dump_stack_lvl+0x45/0x5d
[ 1024.303602] panic+0x114/0x2d4
[ 1024.303680] watchdog_timer_fn.cold+0xc/0x16
[ 1024.303784] ? softlockup_fn+0x30/0x30
[ 1024.303875] __hrtimer_run_queues+0x96/0x2d0
[ 1024.303979] hrtimer_interrupt+0x106/0x220
[ 1024.304080] __sysvec_apic_timer_interrupt+0x59/0x100
[ 1024.304202] sysvec_apic_timer_interrupt+0x65/0x90
[ 1024.304318]
[ 1024.304371]
[ 1024.304423] asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 1024.304547] RIP: 0010:smp_call_function_many_cond+0xee/0x2a0
[ 1024.304683] Code: 33 48 89 ee e8 93 1e 5d 00 3b 05 51 c5 fd 01 89 c7 73 21 48 63 c7 49 8b 0e 48 03 0c c5 80 38 c5 b7 8b 41 08 a8 01 74 0a f3 90 <8b> 51 08 83 e2 01 75 f6 eb cd 48 83 c4 48 5b 5d 41 5c 41 5d 41 5e
[ 1024.305118] RSP: 0018:ffffab908968b940 EFLAGS: 00000202
[ 1024.305243] RAX: 0000000000000011 RBX: 0000000000000001 RCX: ffffcb907fa494a0
[ 1024.305411] RDX: 0000000000000001 RSI: ffff94a41fb6b188 RDI: 0000000000000001
[ 1024.305580] RBP: ffff94a41fb6b188 R08: 0000000000000001 R09: 0000000000000004
[ 1024.305748] R10: 0000000000000000 R11: 00006b63c0000000 R12: 0000000000000000
[ 1024.305916] R13: ffff94a41fb6b180 R14: ffff94a41fb6b180 R15: 000036ec600094a0
[ 1024.306086] ? __flush_tlb_all+0x30/0x30
[ 1024.306181] on_each_cpu_cond_mask+0x25/0x40
[ 1024.306285] __purge_vmap_area_lazy+0xc0/0x710
[ 1024.306393] ? purge_fragmented_blocks_allcpus+0x42/0x220
[ 1024.306522] ? trace_hardirqs_off+0x2a/0xd0
[ 1024.306623] _vm_unmap_aliases+0x113/0x150
[ 1024.306723] change_page_attr_set_clr+0x97/0x280
[ 1024.306835] set_pages_array_wb+0x26/0x70
[ 1024.306933] nv_free_system_pages+0x22e/0x2e0 [nvidia]
[ 1024.307263] nv_free_pages+0x8c/0x90 [nvidia]
[ 1024.307570] _nv041386rm+0x80/0xf0 [nvidia]
[ 1024.307930] ? _nv036603rm+0x27a/0x750 [nvidia]
[ 1024.308461] ? _nv036564rm+0x5f/0x130 [nvidia]
[ 1024.308800] ? _nv002832rm+0xd/0x20 [nvidia]
[ 1024.309210] ? _nv004574rm+0x1e/0xb0 [nvidia]
[ 1024.309621] ? _nv017123rm+0x59c/0x680 [nvidia]
[ 1024.310037] ? _nv045348rm+0xab/0xe0 [nvidia]
[ 1024.310384] ? _nv047053rm+0xb3/0x180 [nvidia]
[ 1024.310800] ? _nv047052rm+0x3e5/0x690 [nvidia]
[ 1024.311218] ? _nv045247rm+0xdd/0x180 [nvidia]
[ 1024.311566] ? _nv045248rm+0x41/0x70 [nvidia]
[ 1024.311909] ? _nv000571rm+0x4d/0x60 [nvidia]
[ 1024.312254] ? _nv000731rm+0x1b7/0xeb0 [nvidia]
[ 1024.312624] ? rm_ioctl+0x58/0xb0 [nvidia]
[ 1024.312979] ? nvidia_unlocked_ioctl+0x6e0/0x950 [nvidia]
[ 1024.313308] ? __x64_sys_ioctl+0x88/0xc0
[ 1024.313404] ? do_syscall_64+0x33/0x80
[ 1024.313495] ? entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[ 1024.313621]
[ 1025.382058] Shutting down cpus with NMI
Any help would be appreciated. Thanks!