TX2 kernel crash: nvgpu

Hi,

0. nvgpu crash

I get a kernel crash in nvgpu when I try to capture images with yavta or v4l2-ctl after having used gst-launch-1.0 nvarguscamerasrc. Here are the procedure and the logs:

1. v4l2

  • Start v4l2-ctl and/or yavta to test the V4L2 pipeline:
# v4l2-ctl --set-fmt-video=width=1920,height=1080,pixelformat=RG10 --stream-mmap  --stream-count=100 -d /dev/video0
# yavta -s 1920x1080 -f SRGGB10 /dev/video0 -c

This works fine.

1.1 Logs

  • kernel message output after stopping the stream (the output is the same for v4l2-ctl and yavta):
[  355.400571] vb2:   counters for queue ffffffc1eb507b58, buffer 3: UNBALANCED!
[  355.400583] vb2:     buf_init: 1 buf_cleanup: 1 buf_prepare: 26 buf_finish: 26
[  355.400588] vb2:     buf_queue: 27 buf_done: 27
[  355.400595] vb2:     alloc: 1 put: 1 prepare: 27 finish: 26 mmap: 1
[  355.400599] vb2:     get_userptr: 0 put_userptr: 0
[  355.400604] vb2:     attach_dmabuf: 0 detach_dmabuf: 0 map_dmabuf: 0 unmap_dmabuf: 0
[  355.400609] vb2:     get_dmabuf: 0 num_users: 0 vaddr: 0 cookie: 26
  • trace output:
         kworker/0:1-1198  [000] ....   274.735929: rtos_queue_peek_from_isr_failed: tstamp:8972382855 queue:0x0b4b4500
         kworker/0:1-1198  [000] ....   274.903933: rtos_queue_peek_from_isr_failed: tstamp:8977382861 queue:0x0b4b4500
         kworker/0:1-1198  [000] ....   275.071964: rtos_queue_peek_from_isr_failed: tstamp:8982382867 queue:0x0b4b4500
         kworker/0:1-1198  [000] ....   275.071980: rtos_queue_peek_from_isr_failed: tstamp:8982472165 queue:0x0b4b4500
         kworker/0:1-1198  [000] ....   346.087887: rtos_queue_peek_from_isr_failed: tstamp:11201811997 queue:0x0b4b4500
         kworker/0:1-1198  [000] ....   346.087891: rtcpu_start: tstamp:11201813344
         kworker/0:1-1198  [000] ....   346.087893: rtos_queue_send_from_isr_failed: tstamp:11201827802 queue:0x0b4a7258
         kworker/0:1-1198  [000] ....   346.087894: rtos_queue_send_from_isr_failed: tstamp:11201827919 queue:0x0b4aad68
         kworker/0:1-1198  [000] ....   346.087895: rtos_queue_send_from_isr_failed: tstamp:11201828031 queue:0x0b4ac998
         kworker/0:1-1198  [000] ....   346.087896: rtos_queue_send_from_isr_failed: tstamp:11201828137 queue:0x0b4ae518
         kworker/0:1-1198  [000] ....   346.087897: rtos_queue_send_from_isr_failed: tstamp:11201828242 queue:0x0b4af2d8
         kworker/0:1-1198  [000] ....   346.087897: rtos_queue_send_from_isr_failed: tstamp:11201828347 queue:0x0b4b0098
         kworker/0:1-1198  [000] ....   346.087898: rtos_queue_send_from_isr_failed: tstamp:11201828451 queue:0x0b4b0e58
         kworker/0:1-1198  [000] ....   346.087899: rtos_queue_send_from_isr_failed: tstamp:11201828554 queue:0x0b4b1c18
         kworker/0:1-1198  [000] ....   346.087900: rtos_queue_send_failed: tstamp:11201829016 queue:0x0b4a7258
         kworker/0:1-1198  [000] ....   346.087901: rtos_queue_send_from_isr_failed: tstamp:11201831302 queue:0x0b4a7258
         kworker/0:1-1198  [000] ....   346.087901: rtos_queue_send_from_isr_failed: tstamp:11201831418 queue:0x0b4aad68
         kworker/0:1-1198  [000] ....   346.087902: rtos_queue_send_from_isr_failed: tstamp:11201831524 queue:0x0b4ac998
         kworker/0:1-1198  [000] ....   346.087903: rtos_queue_send_from_isr_failed: tstamp:11201831630 queue:0x0b4ae518
         kworker/0:1-1198  [000] ....   346.087903: rtos_queue_send_from_isr_failed: tstamp:11201831735 queue:0x0b4af2d8
         kworker/0:1-1198  [000] ....   346.087904: rtos_queue_send_from_isr_failed: tstamp:11201831840 queue:0x0b4b0098
         kworker/0:1-1198  [000] ....   346.087904: rtos_queue_send_from_isr_failed: tstamp:11201831949 queue:0x0b4b0e58
         kworker/0:1-1198  [000] ....   346.087905: rtos_queue_send_from_isr_failed: tstamp:11201832054 queue:0x0b4b1c18
         kworker/0:1-1198  [000] ....   346.087906: rtos_queue_send_failed: tstamp:11201832997 queue:0x0b4a7258
         kworker/0:1-1198  [000] ....   346.199930: rtcpu_vinotify_event: tstamp:11206437017 tag:ATOMP_FS channel:0x00 frame:1 vi_tstamp:11206436608 data:0x00000000
         kworker/0:1-1198  [000] ....   346.199935: rtcpu_vinotify_event: tstamp:11206466511 tag:CHANSEL_PXL_SOF channel:0x00 frame:1 vi_tstamp:11206466117 data:0x00000001
         kworker/0:1-1198  [000] ....   346.199936: rtcpu_vinotify_event: tstamp:11206472568 tag:CHANSEL_LOAD_FRAMED channel:0x01 frame:1 vi_tstamp:11206472179 data:0x08000000
         kworker/0:1-1198  [000] ....   346.255976: rtos_queue_peek_from_isr_failed: tstamp:11206812428 queue:0x0b4b4500
         kworker/0:1-1198  [000] ....   346.255982: rtcpu_vinotify_event: tstamp:11207403085 tag:CHANSEL_PXL_EOF channel:0x00 frame:1 vi_tstamp:11207402298 data:0x04370002
         kworker/0:1-1198  [000] ....   346.255985: rtcpu_vinotify_event: tstamp:11207403278 tag:ATOMP_FE channel:0x00 frame:1 vi_tstamp:11207402532 data:0x00000000
         kworker/0:1-1198  [000] ....   346.311972: rtcpu_vinotify_event: tstamp:11209214803 tag:ATOMP_FS channel:0x00 frame:2 vi_tstamp:11209214380 data:0x00000000
         kworker/0:1-1198  [000] ....   346.311977: rtcpu_vinotify_event: tstamp:11209244323 tag:CHANSEL_PXL_SOF channel:0x00 frame:2 vi_tstamp:11209243890 data:0x00000001
         kworker/0:1-1198  [000] ....   346.311979: rtcpu_vinotify_event: tstamp:11209249275 tag:CHANSEL_LOAD_FRAMED channel:0x01 frame:2 vi_tstamp:11209248859 data:0x08000000
         kworker/0:1-1198  [000] ....   346.367969: rtcpu_vinotify_event: tstamp:11210180860 tag:CHANSEL_PXL_EOF channel:0x00 frame:2 vi_tstamp:11210180071 data:0x04370002
         kworker/0:1-1198  [000] ....   346.367976: rtcpu_vinotify_event: tstamp:11210181069 tag:ATOMP_FE channel:0x00 frame:2 vi_tstamp:11210180305 data:0x00000000
         kworker/0:1-1198  [000] ....   346.367980: rtos_queue_peek_from_isr_failed: tstamp:11211812412 queue:0x0b4b4500
         kworker/0:1-1198  [000] ....   346.423977: rtcpu_vinotify_event: tstamp:11211992575 tag:ATOMP_FS channel:0x00 frame:3 vi_tstamp:11211992154 data:0x00000000
         kworker/0:1-1198  [000] ....   346.423984: rtcpu_vinotify_event: tstamp:11212022068 tag:CHANSEL_PXL_SOF channel:0x00 frame:3 vi_tstamp:11212021663 data:0x00000001
         kworker/0:1-1198  [000] ....   346.423986: rtcpu_vinotify_event: tstamp:11212028024 tag:CHANSEL_LOAD_FRAMED channel:0x01 frame:3 vi_tstamp:11212027605 data:0x08000000
         kworker/0:1-1198  [000] ....   346.423989: rtcpu_vinotify_event: tstamp:11212958644 tag:CHANSEL_PXL_EOF channel:0x00 frame:3 vi_tstamp:11212957843 data:0x04370002
         kworker/0:1-1198  [000] ....   346.423991: rtcpu_vinotify_event: tstamp:11212958840 tag:ATOMP_FE channel:0x00 frame:3 vi_tstamp:11212958078 data:0x00000000
         kworker/0:1-1198  [000] ....   346.480111: rtcpu_vinotify_event: tstamp:11214770344 tag:ATOMP_FS channel:0x00 frame:4 vi_tstamp:11214769926 data:0x00000000
         kworker/0:1-1198  [000] ....   346.480118: rtcpu_vinotify_event: tstamp:11214799840 tag:CHANSEL_PXL_SOF channel:0x00 frame:4 vi_tstamp:11214799435 data:0x00000001
         kworker/0:1-1198  [000] ....   346.480120: rtcpu_vinotify_event: tstamp:11214805729 tag:CHANSEL_LOAD_FRAMED channel:0x01 frame:4 vi_tstamp:11214805316 data:0x08000000
         kworker/0:1-1198  [000] ....   346.535973: rtcpu_vinotify_event: tstamp:11215736398 tag:CHANSEL_PXL_EOF channel:0x00 frame:4 vi_tstamp:11215735617 data:0x04370002
         kworker/0:1-1198  [000] ....   346.535977: rtcpu_vinotify_event: tstamp:11215736587 tag:ATOMP_FE channel:0x00 frame:4 vi_tstamp:11215735850 data:0x00000000
         kworker/0:1-1198  [000] ....   346.535982: rtos_queue_peek_from_isr_failed: tstamp:11216812421 queue:0x0b4b4500
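The two logs above can also be checked mechanically. This is a minimal sketch that assumes a POSIX shell with awk, and both function names are hypothetical. `vb2_unbalanced_pairs` reports which begin/end counter pairs differ in a single vb2 "counters for queue" dump. Its pair list is an assumption modelled on the debug checks in videobuf2-core.c (CONFIG_VIDEO_ADV_DEBUG). `vinotify_readout` prints, for each frame, the CHANSEL_PXL_EOF `vi_tstamp` minus the CHANSEL_PXL_SOF `vi_tstamp`. The trace does not state the timestamp unit, so the difference is left in raw units.

```shell
#!/bin/sh
# Hypothetical log-reading helpers; both read log text on stdin.

# vb2_unbalanced_pairs: print "a=N b=M" for each begin/end counter pair that
# differs. Assumes the input is the dump for ONE buffer (counts are not keyed
# by buffer). The pair list is an assumption based on videobuf2-core.c.
vb2_unbalanced_pairs() {
    awk '
    BEGIN {
        pairs = "alloc:put prepare:finish get_userptr:put_userptr"
        pairs = pairs " attach_dmabuf:detach_dmabuf map_dmabuf:unmap_dmabuf"
        pairs = pairs " buf_queue:buf_done buf_prepare:buf_finish buf_init:buf_cleanup"
    }
    {
        # Remember every "name: number" token pair on the line.
        for (i = 1; i < NF; i++)
            if ($i ~ /:$/ && $(i + 1) ~ /^[0-9]+$/)
                cnt[substr($i, 1, length($i) - 1)] = $(i + 1) + 0
    }
    END {
        n = split(pairs, list, " ")
        for (p = 1; p <= n; p++) {
            split(list[p], kv, ":")
            if (cnt[kv[1]] + 0 != cnt[kv[2]] + 0)
                printf "%s=%d %s=%d\n", kv[1], cnt[kv[1]], kv[2], cnt[kv[2]]
        }
    }'
}

# vinotify_readout: per frame, CHANSEL_PXL_EOF vi_tstamp minus
# CHANSEL_PXL_SOF vi_tstamp, in raw (unconverted) timestamp units.
vinotify_readout() {
    awk '
    /rtcpu_vinotify_event/ {
        tag = ""; frame = ""; ts = ""
        for (i = 1; i <= NF; i++) {
            split($i, kv, ":")
            if (kv[1] == "tag") tag = kv[2]
            else if (kv[1] == "frame") frame = kv[2]
            else if (kv[1] == "vi_tstamp") ts = kv[2]
        }
        if (tag == "CHANSEL_PXL_SOF") sof[frame] = ts
        else if (tag == "CHANSEL_PXL_EOF" && frame in sof)
            printf "frame %s: %.0f\n", frame, ts - sof[frame]
    }'
}
```

When fed the counter dump above, `vb2_unbalanced_pairs` prints only `prepare=27 finish=26`. When fed the trace above, `vinotify_readout` prints 936181, 936181, 936180 and 936182 for frames 1 to 4.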
    
    2. NVIDIA Argus

    • Start streaming with gst-launch-1.0 nvarguscamerasrc
    • command line:
    gst-launch-1.0 nvarguscamerasrc \
    wbmode=9 \
    ! 'video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, framerate=(fraction)30/1' \
    ! nvvidconv flip-method=2 \
    ! queue ! xvimagesink
    
  • command line output (start/stop):
    Setting pipeline to PAUSED ...
    Pipeline is live and does not need PREROLL ...
    Setting pipeline to PLAYING ...
    New clock: GstSystemClock
    GST_ARGUS: Creating output stream
    CONSUMER: Waiting until producer is connected...
    GST_ARGUS: Available Sensor modes :
    GST_ARGUS: 1920 x 1080 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 0.100000, max 2.200000; Exposure Range min 34000, max 550385000;
    
    GST_ARGUS: Running with following settings:
       Camera index = 0 
       Camera mode  = 0 
       Output Stream W = 1920 H = 1080 
       seconds to Run    = 0 
       Frame Rate = 29.999999 
    GST_ARGUS: PowerService: requested_clock_Hz=27216000
    GST_ARGUS: Setup Complete, Starting captures for 0 seconds
    GST_ARGUS: Starting repeat capture requests.
    CONSUMER: Producer has connected; continuing.
    ^Chandling interrupt.
    Interrupt: Stopping pipeline ...
    Execution ended after 0:00:28.006229176
    Setting pipeline to PAUSED ...
    Setting pipeline to READY ...
    GST_ARGUS: Cleaning up
    GST_ARGUS: 
    PowerServiceHwVic::cleanupResources
    CONSUMER: Done Success
    GST_ARGUS: Done Success
    Setting pipeline to NULL ...
    Freeing pipeline ...
    

    3. nvgpu crash (log)

    After stopping gst-launch-1.0, the vb2 kernel message that is printed when a V4L2 stream is stopped does not appear.
    I can still run gst-launch-1.0 again and get images, but if I then try to capture with v4l2-ctl/yavta, I can’t: the
    V4L2 application appears to wait indefinitely, and if I cancel it (Ctrl-C), the kernel crash occurs:

    [  806.628341] ------------[ cut here ]------------
    [  806.633039] WARNING: CPU: 0 PID: 8086 at drivers/media/v4l2-core/videobuf2-core.c:1667 __vb2_queue_cancel+0x198/0x238
    [  806.643653] Modules linked in: bnep fuse bcmdhd cfg80211 nvs_bmi160 nvs mymodule nvgpu bluedroid_pm ip_tables x_tables
    
    [  806.643745] CPU: 0 PID: 8086 Comm: yavta Not tainted 4.9.140 #974
    [  806.643753] Hardware name: quill (DT)
    [  806.643764] task: ffffffc1c0fa7000 task.stack: ffffffc176800000
    [  806.643780] PC is at __vb2_queue_cancel+0x198/0x238
    [  806.643794] LR is at __vb2_queue_cancel+0x48/0x238
    [  806.643805] pc : [<ffffff8008ae4860>] lr : [<ffffff8008ae4710>] pstate: 60400045
    [  806.643812] sp : ffffffc176803ad0
    [  806.643820] x29: ffffffc176803ad0 x28: 0000000000000009 
    [  806.643839] x27: ffffffc176803de8 x26: ffffffc1df6e8cc0 
    [  806.643856] x25: ffffff8009354000 x24: 0000000000000001 
    [  806.643873] x23: ffffffc1eadfb850 x22: ffffffc1eb507018 
    [  806.643888] x21: ffffffc1eb507b58 x20: ffffffc1eb507bb8 
    [  806.643904] x19: ffffffc1eb507b58 x18: 00000000fffffffe 
    [  806.643919] x17: 0000000000000002 x16: 0000000000000000 
    [  806.643933] x15: 0000000000000001 x14: ffffffffffffffff 
    [  806.643948] x13: ffffffc176803a10 x12: ffffffc176803914 
    [  806.643963] x11: 000000000000000b x10: ffffffc1768038d0 
    [  806.643978] x9 : 0000000000000002 x8 : 0000000000000002 
    [  806.643993] x7 : ffffff8008f67290 x6 : 0000000000000090 
    [  806.644008] x5 : 000000000000008d x4 : 0000000000000001 
    [  806.644022] x3 : 0000000000000000 x2 : 0000000000000000 
    [  806.644036] x1 : 0000000000000000 x0 : 0000000000000008 
    
    [  806.644056] ---[ end trace c89faff01b7873e5 ]---
    [  806.648689] Call trace:
    [  806.648709] [<ffffff8008ae4860>] __vb2_queue_cancel+0x198/0x238
    [  806.648727] [<ffffff8008ae4e74>] vb2_core_queue_release+0x2c/0x58
    [  806.648742] [<ffffff8008ae702c>] _vb2_fop_release+0x84/0xa0
    [  806.648758] [<ffffff8008aecb40>] tegra_channel_close+0x58/0x130
    [  806.648775] [<ffffff8008ac0fc4>] v4l2_release+0x4c/0x98
    [  806.648795] [<ffffff8008256668>] __fput+0x90/0x1c8
    [  806.648808] [<ffffff8008256818>] ____fput+0x20/0x30
    [  806.648827] [<ffffff80080d89e8>] task_work_run+0xb8/0xd0
    [  806.648844] [<ffffff80080b8cc4>] do_exit+0x3ac/0xa10
    [  806.648858] [<ffffff80080b93b4>] do_group_exit+0x3c/0xa0
    [  806.648874] [<ffffff80080c6970>] get_signal+0x2a0/0x580
    [  806.648890] [<ffffff800808af64>] do_signal+0x7c/0x530
    [  806.648904] [<ffffff800808b598>] do_notify_resume+0x90/0xb0
    [  806.648918] [<ffffff8008083754>] work_pending+0x8/0x10
    

    Any idea about this issue?
    Thank you in advance for your support.

    Hi,
    Is the issue observed when running the default Bayer sensor, ov5693?

    Hi,

    Yes, the issue is also observed with the default Bayer sensor ov5693 on the latest L4T 32.1 release with kernel 4.9.140.

    The logs above were captured with my current development camera module. Here is the crash with the ov5693 (the two are basically the same):

    [   54.082007] ------------[ cut here ]------------
    [   54.086708] WARNING: CPU: 0 PID: 7619 at /dvs/git/dirty/git-master_linux/kernel/kernel-4.9/drivers/media/v4l2-core/videobuf2-core.c:1667 __vb2_queue_cancel+0x11c/0x188
    [   54.101659] Modules linked in: bnep fuse bcmdhd nvs_bmi160 nvs cfg80211 nvgpu bluedroid_pm ip_tables x_tables
    
    [   54.101746] CPU: 0 PID: 7619 Comm: yavta Not tainted 4.9.140-tegra #1
    [   54.101754] Hardware name: quill (DT)
    [   54.101764] task: ffffffc1bea81c00 task.stack: ffffffc1a9d64000
    [   54.101781] PC is at __vb2_queue_cancel+0x11c/0x188
    [   54.101796] LR is at __vb2_queue_cancel+0x34/0x188
    [   54.101808] pc : [<ffffff8008b19a24>] lr : [<ffffff8008b1993c>] pstate: 60400045
    [   54.101814] sp : ffffffc1a9d67ae0
    [   54.101822] x29: ffffffc1a9d67ae0 x28: 0000000000000008 
    [   54.101842] x27: ffffff8008f52000 x26: ffffffc1a9d67de8 
    [   54.101860] x25: ffffffc1cfd15fe8 x24: ffffffc1e39052b8 
    [   54.101876] x23: 0000000000000001 x22: ffffffc1eb53d718 
    [   54.101893] x21: ffffffc1eb53d030 x20: ffffffc1eb53db58 
    [   54.101908] x19: ffffffc1eb53db58 x18: 0000000000000001 
    [   54.101923] x17: 0000000000000002 x16: 0000000000000000 
    [   54.101938] x15: ffffffffffffffff x14: ffffffc1a9d67a20 
    [   54.101952] x13: ffffffc1a9d67925 x12: 071c71c71c71c71c 
    [   54.101967] x11: ffffffc1a9d678e0 x10: ffffffc1a9d678e0 
    [   54.101982] x9 : 0000000000000002 x8 : 0000000000000002 
    [   54.101996] x7 : ffffff8008fa6dc8 x6 : 0000000000000090 
    [   54.102011] x5 : 000000000000008d x4 : 0000000000000001 
    [   54.102025] x3 : 0000000000000000 x2 : 0000000000000000 
    [   54.102038] x1 : 0000000000000000 x0 : 0000000000000008 
    
    [   54.102060] ---[ end trace 2447401fe165a03c ]---
    [   54.106695] Call trace:
    [   54.106714] [<ffffff8008b19a24>] __vb2_queue_cancel+0x11c/0x188
    [   54.106732] [<ffffff8008b1ae64>] vb2_core_queue_release+0x2c/0x58
    [   54.106744] [<ffffff8008b1d4e4>] _vb2_fop_release+0x84/0xa0
    [   54.106759] [<ffffff8008b23808>] tegra_channel_close+0x58/0x130
    [   54.106774] [<ffffff8008af6af0>] v4l2_release+0x48/0xa0
    [   54.106794] [<ffffff800825e518>] __fput+0x90/0x1d0
    [   54.106807] [<ffffff800825e6d0>] ____fput+0x20/0x30
    [   54.106826] [<ffffff80080d9bf4>] task_work_run+0xbc/0xd8
    [   54.106843] [<ffffff80080b9674>] do_exit+0x2c4/0xa08
    [   54.106856] [<ffffff80080b9e48>] do_group_exit+0x40/0xa8
    [   54.106872] [<ffffff80080c7744>] get_signal+0x26c/0x578
    [   54.106888] [<ffffff800808b150>] do_signal+0x130/0x500
    [   54.106902] [<ffffff800808b698>] do_notify_resume+0x90/0xb0
    [   54.106915] [<ffffff8008083754>] work_pending+0x8/0x10
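The two traces differ only in timestamps, addresses, offsets and build paths. To check that they are basically the same, you can reduce each one to a signature and diff the results. This is a minimal sketch that assumes a POSIX shell with awk. `oops_signature` is a hypothetical helper name. The parsing assumes the log layout shown above: a `WARNING: ... at <path>:<line> <function>+0x...` line, and call-trace lines of the form `[<address>] function+0x.../0x...`.

```shell
#!/bin/sh
# oops_signature (hypothetical): reduce a kernel WARNING dump on stdin to
#   "WARNING <file-basename>:<line> <function>"   (first WARNING only)
# followed by one call-trace function name per line, dropping timestamps,
# addresses, "+0xoff/0xlen" offsets and build-path prefixes.
oops_signature() {
    awk '
    /WARNING: CPU:/ && !warned {
        for (i = 1; i < NF - 1; i++)
            if ($i == "at") {
                n = split($(i + 1), parts, "/")   # keep only the file basename
                fn = $(i + 2)
                sub(/\+0x.*/, "", fn)             # drop the "+0x.../0x..." offset
                print "WARNING " parts[n] " " fn
                warned = 1
                break
            }
    }
    {
        # Call-trace frame: "[<address>]" token followed by "function+0x...".
        # The "+0x" check skips the "pc : [<...>] lr : [<...>]" register line.
        for (i = 1; i < NF; i++)
            if ($i ~ /^\[<[0-9a-f]+>\]$/ && $(i + 1) ~ /\+0x/) {
                fn = $(i + 1)
                sub(/\+0x.*/, "", fn)
                print fn
                break
            }
    }'
}
```

Save each trace to a file, for example the hypothetical `custom.log` and `ov5693.log`. Then run `oops_signature < custom.log > a.sig` and `oops_signature < ov5693.log > b.sig`, and compare them with `diff a.sig b.sig`. For the two traces in this thread, the signatures are identical. Each starts with `WARNING videobuf2-core.c:1667 __vb2_queue_cancel`, followed by the same 14 call-trace functions from `__vb2_queue_cancel` down to `work_pending`.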
    

    Hi,
    I can run the command for 30+ on TX2/r32.1 without hitting any error:

    $ gst-launch-1.0 nvarguscamerasrc wbmode=9 ! 'video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, framerate=(fraction)30/1' ! nvvidconv flip-method=2 ! queue ! xvimagesink
    

    What is your failure rate? Do you have any kernel customizations? Are you able to do a clean re-flash through SDK Manager and try again?

    Hi DaneLLL,

    To reproduce the error, you must run yavta/v4l2-ctl after running and closing gst-launch-1.0:

    For this test I’m using the NVIDIA L4T 32.1 release with the original kernel.

    Kernel information:

    nvidia@nvidia-desktop:~$ uname -a
    Linux nvidia-desktop 4.9.140-tegra #1 SMP PREEMPT Wed Mar 13 00:30:11 PDT 2019 aarch64 aarch64 aarch64 GNU/Linux
    
    nvidia@nvidia-desktop:~$ lsmod
    Module                  Size  Used by
    bnep                   16562  2
    fuse                  103841  3
    nvs_bmi160             24013  0
    bcmdhd                934665  0
    nvs                    54527  1 nvs_bmi160
    cfg80211              589351  1 bcmdhd
    nvgpu                1555053  18
    bluedroid_pm           13912  0
    ip_tables              19441  0
    x_tables               28951  1 ip_tables
    

    ov5693 info:

    root@nvidia-desktop:~# dmesg | grep ov56
    [    3.551421] ov5693 2-0036: probing v4l2 sensor.
    [    3.552564] ov5693 2-0036: tegracam sensor driver:ov5693_v2.0.6
    [    4.640024] tegra-vi4 15700000.vi: subdev ov5693 2-0036 bound
    

    Command history/steps to reproduce the issue:

    $ export DISPLAY=:0
    $ gst-launch-1.0 nvarguscamerasrc wbmode=9 ! 'video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, framerate=(fraction)30/1' ! nvvidconv flip-method=2 ! queue ! xvimagesink
    $ v4l2-ctl --stream-mmap --stream-count=100 -d /dev/video0
    

    When you then run v4l2-ctl, it does not capture any frames and just waits; if you cancel it (Ctrl-C), the kernel crash is generated.
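To answer the failure-rate question, you can repeat the sequence above and count how often the capture hangs. This is a minimal sketch that assumes GNU coreutils `timeout` and a POSIX shell. The deadlines, the `ARGUS_SECS`/`CAPTURE_SECS` variables and the function names are assumptions and do not come from the thread. If a capture is still running at its deadline, the script stops it with SIGINT (the same signal as Ctrl-C) and counts it as a hang. Per this thread, each counted hang on an affected system should therefore also leave the `__vb2_queue_cancel` warning in the kernel log.

```shell
#!/bin/sh
# Hypothetical repro loop. Assumptions: GNU coreutils timeout; default
# deadlines below (a 100-frame capture still running after CAPTURE_SECS
# is treated as a hang).
: "${ARGUS_SECS:=10}"
: "${CAPTURE_SECS:=15}"

# run_for SECS CMD [ARGS...]: run CMD, send SIGINT after SECS (like Ctrl-C).
# GNU timeout exits with 124 when the deadline was hit.
run_for() {
    secs=$1
    shift
    timeout -s INT "$secs" "$@"
}

# repro_once: Argus pipeline, then plain V4L2 capture (commands from the
# thread). Tool output goes to stderr so stdout carries only the summary.
# Returns non-zero if the capture hung.
repro_once() {
    run_for "$ARGUS_SECS" gst-launch-1.0 nvarguscamerasrc wbmode=9 \
        ! 'video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, framerate=(fraction)30/1' \
        ! nvvidconv flip-method=2 ! queue ! xvimagesink >&2
    run_for "$CAPTURE_SECS" v4l2-ctl --stream-mmap --stream-count=100 -d /dev/video0 >&2
    [ "$?" -ne 124 ]
}

# repro_rate RUNS: print "hangs/runs".
repro_rate() {
    runs=$1
    hangs=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        i=$((i + 1))
        repro_once || hangs=$((hangs + 1))
    done
    echo "$hangs/$runs"
}
```

To use it, `export DISPLAY=:0` as in the steps above, source the file, and run `repro_rate 10`. It prints the number of hung captures out of 10 runs.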

    Hi,
    Please run

    $ v4l2-ctl --stream-mmap --stream-count=100 -d /dev/video0 --set-ctrl bypass_mode=0
    

    Hi DaneLLL,

    I’ve tested bypass_mode and it works: I can capture with yavta/v4l2-ctl again after capturing with gst-launch-1.0, without any kernel crash. If you forget to set bypass_mode, the crash is still there, but you can get rid of it by setting bypass_mode and starting the capture again.

    Thanks for the help
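To avoid forgetting the control, you can wrap the workaround from this thread so that `bypass_mode=0` is always set in the same v4l2-ctl invocation that streams. This is a minimal sketch, and the `v4l2_capture` name and its defaults are hypothetical. The thread does not establish whether the control value persists between separate v4l2-ctl runs, so the sketch does not set it in a separate command.

```shell
#!/bin/sh
# v4l2_capture [DEVICE] [COUNT] (hypothetical wrapper): stream COUNT frames
# from DEVICE with bypass_mode=0 passed in the same v4l2-ctl invocation,
# mirroring the command suggested in this thread.
v4l2_capture() {
    dev=${1:-/dev/video0}
    count=${2:-100}
    v4l2-ctl --stream-mmap --stream-count="$count" -d "$dev" --set-ctrl bypass_mode=0
}
```

With no arguments, `v4l2_capture` runs the same command that was suggested above for /dev/video0 and 100 frames.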