Nvv4l2h264enc stuck issue

Dear Nvidia,

Recently I ran into an encoder hang.
I see the following error traces when launching encoding pipelines:

tvmrVideoEncoderBitsAvailable_MSENC: ucode ERROR = 336
NvVideoEncTransferOutputBufferToBlock: DoWork failed line# 674
NvVideoEnc: NvVideoEncTransferOutputBufferToBlock TransferBufferToBlock failed Line=685

Could you please explain what these error messages mean?
Thanks.

BTW, I also found the following traces in syslog.
I'm not sure whether they are related.

Dec 7 10:51:20 linux kernel: [ 1275.223606] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0x1ffd7b4000, fsynr=0x10013, cb=8, sid=56(0x38 - Unassigned SID), pgd=272fee003, pud=272fee003, pmd=186af0003, pte=0
Dec 7 10:51:20 linux kernel: [ 1275.223931] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu1, iova=0x1ffd7b5000, fsynr=0x50013, cb=8, sid=56(0x38 - Unassigned SID), pgd=272fee003, pud=272fee003, pmd=186af0003, pte=0
Dec 7 10:51:20 linux kernel: [ 1275.224327] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0x1ffd785000, fsynr=0x10013, cb=8, sid=56(0x38 - Unassigned SID), pgd=272fee003, pud=272fee003, pmd=186af0003, pte=0
Dec 7 10:51:20 linux kernel: [ 1275.224606] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu1, iova=0x1ffd78f000, fsynr=0x50013, cb=8, sid=56(0x38 - Unassigned SID), pgd=272fee003, pud=272fee003, pmd=186af0003, pte=0
Dec 7 10:51:20 linux kernel: [ 1275.225029] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0x1ffd79d300, fsynr=0x10013, cb=8, sid=56(0x38 - Unassigned SID), pgd=272fee003, pud=272fee003, pmd=186af0003, pte=0
Dec 7 10:51:20 linux kernel: [ 1275.226254] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu1, iova=0x1ffd7a9000, fsynr=0x50013, cb=8, sid=56(0x38 - Unassigned SID), pgd=272fee003, pud=272fee003, pmd=186af0003, pte=0
Dec 7 10:51:20 linux kernel: [ 1275.237020] mc-err: vpr base=0:c6000000, size=20, ctrl=3, override:(a01a8340, fcee10c1, 1, 0)
Dec 7 10:51:20 linux kernel: [ 1275.245743] mc-err: (255) csw_vicswr: MC request violates VPR requirements
Dec 7 10:51:20 linux kernel: [ 1275.252498] mc-err: status = 0x0ff7406d; addr = 0xffffffff00; hi_adr_reg=008
Dec 7 10:51:20 linux kernel: [ 1275.259816] mc-err: secure: yes, access-type: write
Dec 7 10:51:20 linux kernel: [ 1275.265070] mc-err: mcerr: unknown intr source intstatus = 0x00000000, intstatus_1 = 0x00000000
Dec 7 10:51:20 linux kernel: [ 1275.277662] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0x1ff9fb4000, fsynr=0x10013, cb=8, sid=56(0x38 - Unassigned SID), pgd=272fee003, pud=272fee003, pmd=145486003, pte=0
Dec 7 10:51:20 linux kernel: [ 1275.291367] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu1, iova=0x1ff9fb5000, fsynr=0x50013, cb=8, sid=56(0x38 - Unassigned SID), pgd=272fee003, pud=272fee003, pmd=145486003, pte=0
Dec 7 10:51:20 linux kernel: [ 1275.309407] mc-err: vpr base=0:c6000000, size=20, ctrl=3, override:(a01a8340, fcee10c1, 1, 0)
Dec 7 10:51:20 linux kernel: [ 1275.317504] mc-err: (255) csw_vicswr: MC request violates VPR requirements
Dec 7 10:51:20 linux kernel: [ 1275.324654] mc-err: status = 0x0ff7406d; addr = 0xffffffff00; hi_adr_reg=008
Dec 7 10:51:20 linux kernel: [ 1275.331664] mc-err: secure: yes, access-type: write
Dec 7 10:51:20 linux kernel: [ 1275.336987] mc-err: mcerr: unknown intr source intstatus = 0x00000000, intstatus_1 = 0x00000000
Dec 7 10:51:20 linux kernel: [ 1275.351626] mc-err: Too many MC errors; throttling prints
Dec 7 10:51:20 linux kernel: [ 1275.351648] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0x1ff9da1000, fsynr=0x10013, cb=8, sid=56(0x38 - Unassigned SID), pgd=272fee003, pud=272fee003, pmd=1b9891003, pte=0
Dec 7 10:51:20 linux kernel: [ 1275.351660] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu1, iova=0x1ff9da0000, fsynr=0x50013, cb=8, sid=56(0x38 - Unassigned SID), pgd=272fee003, pud=272fee003, pmd=1b9891003, pte=0
Dec 7 10:51:20 linux dbus-run-session[5673]: [GIN] 2022/12/07 - 10:51:20 | 200 | 556.776507ms | 127.0.0.1 | POST "/mediatask/create_update"
Dec 7 10:51:21 linux kernel: [ 1276.220225] irq 92: nobody cared (try booting with the "irqpoll" option)
Dec 7 10:51:21 linux kernel: [ 1276.220391] CPU: 0 PID: 20119 Comm: rtpjitterbuffer Tainted: G O 4.9.253-tegra #2
Dec 7 10:51:21 linux kernel: [ 1276.220395] Hardware name: NVIDIA Jetson Xavier NX Developer Kit (DT)
Dec 7 10:51:21 linux kernel: [ 1276.220399] Call trace:
Dec 7 10:51:21 linux kernel: [ 1276.220422] [] dump_backtrace+0x0/0x198
Dec 7 10:51:21 linux kernel: [ 1276.220433] [] show_stack+0x24/0x30
Dec 7 10:51:21 linux kernel: [ 1276.220442] [] dump_stack+0xa0/0xc4
Dec 7 10:51:21 linux kernel: [ 1276.220449] [] __report_bad_irq+0x3c/0xf8
Dec 7 10:51:21 linux kernel: [ 1276.220456] [] note_interrupt+0x2c8/0x318
Dec 7 10:51:21 linux kernel: [ 1276.220461] [] handle_irq_event_percpu+0x50/0x60
Dec 7 10:51:21 linux kernel: [ 1276.220467] [] handle_irq_event+0x50/0x80
Dec 7 10:51:21 linux kernel: [ 1276.220473] [] handle_fasteoi_irq+0xd4/0x1c0
Dec 7 10:51:21 linux kernel: [ 1276.220478] [] generic_handle_irq+0x34/0x50
Dec 7 10:51:21 linux kernel: [ 1276.220483] [] __handle_domain_irq+0x68/0xc0
Dec 7 10:51:21 linux kernel: [ 1276.220488] [] gic_handle_irq+0x5c/0xb0
Dec 7 10:51:21 linux kernel: [ 1276.220493] [] el1_irq+0xe8/0x194
Dec 7 10:51:21 linux kernel: [ 1276.220501] [] __wake_up_sync_key+0x64/0x78
Dec 7 10:51:21 linux kernel: [ 1276.220508] [] sock_def_readable+0x48/0x80
Dec 7 10:51:21 linux kernel: [ 1276.220516] [] __sock_queue_rcv_skb+0x124/0x258
Dec 7 10:51:21 linux kernel: [ 1276.220524] [] __udp_queue_rcv_skb+0x60/0x280
Dec 7 10:51:21 linux kernel: [ 1276.220529] [] udp_queue_rcv_skb+0x354/0x540
Dec 7 10:51:21 linux kernel: [ 1276.220535] [] udp_unicast_rcv_skb+0x60/0xb8
Dec 7 10:51:21 linux kernel: [ 1276.220540] [] __udp4_lib_rcv+0x4c4/0xaa8
Dec 7 10:51:21 linux kernel: [ 1276.220546] [] udp_rcv+0x30/0x40
Dec 7 10:51:21 linux kernel: [ 1276.220553] [] ip_local_deliver_finish+0x80/0x280
Dec 7 10:51:21 linux kernel: [ 1276.220559] [] ip_local_deliver+0x54/0xf0
Dec 7 10:51:21 linux kernel: [ 1276.220564] [] ip_rcv_finish+0x1f8/0x380
Dec 7 10:51:21 linux kernel: [ 1276.220570] [] ip_rcv+0x280/0x388
Dec 7 10:51:21 linux kernel: [ 1276.220577] [] __netif_receive_skb_core+0x3b8/0xad8
Dec 7 10:51:21 linux kernel: [ 1276.220583] [] __netif_receive_skb+0x28/0x78
Dec 7 10:51:21 linux kernel: [ 1276.220590] [] netif_receive_skb_internal+0x2c/0xb0
Dec 7 10:51:21 linux kernel: [ 1276.220595] [] napi_gro_receive+0x15c/0x188
Dec 7 10:51:21 linux kernel: [ 1276.220604] [] eqos_napi_poll_rx+0x368/0x4f8
Dec 7 10:51:21 linux kernel: [ 1276.220610] [] net_rx_action+0xf4/0x358
Dec 7 10:51:21 linux kernel: [ 1276.220616] [] __do_softirq+0x13c/0x3b0
Dec 7 10:51:21 linux kernel: [ 1276.220623] [] irq_exit+0xd0/0x118
Dec 7 10:51:21 linux kernel: [ 1276.220629] [] __handle_domain_irq+0x6c/0xc0
Dec 7 10:51:21 linux kernel: [ 1276.220634] [] gic_handle_irq+0x5c/0xb0
Dec 7 10:51:21 linux kernel: [ 1276.220639] [] el0_irq_naked+0x54/0x60
Dec 7 10:51:21 linux kernel: [ 1276.220643] handlers:
Dec 7 10:51:21 linux kernel: [ 1276.220696] [] tegra_mcerr_hard_irq threaded [] tegra_mcerr_thread
Dec 7 10:51:21 linux kernel: [ 1276.220872] Disabling IRQ #92
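
In case it helps with triage, the SMMU fault lines above can be condensed with a small script. This is only a sketch: the regular expression is fitted to the fault lines exactly as they appear in this syslog excerpt, and the names `SMMU_FAULT`, `summarize_smmu_faults`, and `SAMPLE` are made up for illustration. It is not part of any NVIDIA tool.

```python
import re
from collections import Counter

# Assumption: fault lines follow the format seen in the syslog excerpt above.
SMMU_FAULT = re.compile(
    r"Unhandled context fault: (?P<smmu>smmu\d+), "
    r"iova=(?P<iova>0x[0-9a-fA-F]+), "
    r"fsynr=(?P<fsynr>0x[0-9a-fA-F]+), "
    r"cb=(?P<cb>\d+), "
    r"sid=(?P<sid>\d+)"
)


def summarize_smmu_faults(lines):
    """Return (fault count per stream ID, lowest iova, highest iova)."""
    per_sid = Counter()
    iovas = []
    for line in lines:
        match = SMMU_FAULT.search(line)
        if match is None:
            continue  # not an SMMU context-fault line
        per_sid[int(match.group("sid"))] += 1
        iovas.append(int(match.group("iova"), 16))
    if not iovas:
        return per_sid, None, None
    return per_sid, min(iovas), max(iovas)


# Two fault lines copied from the excerpt above (trailing page-table fields omitted).
SAMPLE = [
    "t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, "
    "iova=0x1ffd7b4000, fsynr=0x10013, cb=8, sid=56(0x38 - Unassigned SID)",
    "t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, "
    "iova=0x1ff9fb4000, fsynr=0x10013, cb=8, sid=56(0x38 - Unassigned SID)",
]

counts, lowest, highest = summarize_smmu_faults(SAMPLE)
for sid, n in counts.items():
    print(f"sid={sid}: {n} fault(s)")
print(f"iova range: {lowest:#x} .. {highest:#x}")
```

To run it over a whole log, pass an open file object (for example the syslog file) to `summarize_smmu_faults` instead of `SAMPLE`. Every fault line in the excerpt above reports sid=56, so the summary there would show a single stream ID.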

Hi,
It looks like the hardware encoder hangs in this condition and stops generating the next compressed bitstream. Are you using a Xavier NX production module (with embedded eMMC)? Are you on a JetPack 4 or JetPack 5 release?

It's a Jetson Xavier NX.
Release info as follows:

user@linux:~$ jetson_release

  • NVIDIA Jetson UNKNOWN
    • Jetpack 4.6 [L4T 32.6.1]
    • NV Power Mode: MODE_20W_6CORE - Type: 8
    • jetson_stats.service: active
  • Libraries:
    • CUDA: 10.2.300
    • cuDNN: 8.2.1.32
    • TensorRT: 8.0.1.6
    • Visionworks: 1.6.0.501
    • OpenCV: 4.1.1 compiled CUDA: YES
    • VPI: ii libnvvpi1 1.1.12 arm64 NVIDIA Vision Programming Interface library
    • Vulkan: 1.2.70

There has been no update from you for a while, so we assume this is no longer an issue.
We are therefore closing this topic. If you need further support, please open a new one.
Thanks

Sorry for the late response. Is this still an issue that needs support? Thanks