Nvidia_smmu_context_fault_bank errors

Hi,

I am migrating our application from JetPack 4.6 to 5.1.3.
The application seems to be running fine, but sometimes, after a while or after the application is restarted, it starts to generate the following messages every few seconds:
000014.098878: [CaptureThread]:Capture FPS = 50.123685 , total_fps_cntr = 500
[58577.696071] nvidia_smmu_context_fault_bank: 1502 callbacks suppressed
[58577.696093] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea0000, fsynr=0x280013, cbfrsynra=0xc3a, cb=12
[58577.696652] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea1400, fsynr=0x300013, cbfrsynra=0x43a, cb=12
[58577.696963] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeed32c0, fsynr=0x280013, cbfrsynra=0xc3a, cb=12
[58577.764376] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea0000, fsynr=0x280013, cbfrsynra=0xc3a, cb=12
[58577.764761] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea1400, fsynr=0x300013, cbfrsynra=0x43a, cb=12
[58577.765050] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeec3140, fsynr=0x280013, cbfrsynra=0x3a, cb=12
[58577.765339] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeedd2c0, fsynr=0x300013, cbfrsynra=0x3a, cb=12
[58577.834321] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea0000, fsynr=0x280013, cbfrsynra=0xc3a, cb=12
[58577.834688] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea1400, fsynr=0x300013, cbfrsynra=0x43a, cb=12
[58577.835011] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeec0b00, fsynr=0x280013, cbfrsynra=0xc3a, cb=12
000019.099026: [CaptureThread]:Capture FPS = 50.081894 , total_fps_cntr = 750
[58582.740958] nvidia_smmu_context_fault_bank: 1503 callbacks suppressed
[58582.740975] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea0000, fsynr=0x280013, cbfrsynra=0xc3a, cb=12
[58582.741486] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea1400, fsynr=0x300013, cbfrsynra=0x43a, cb=12
[58582.741774] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeed3980, fsynr=0x280013, cbfrsynra=0x83a, cb=12
[58582.803899] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea0000, fsynr=0x280013, cbfrsynra=0xc3a, cb=12
[58582.804294] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea1400, fsynr=0x300013, cbfrsynra=0x43a, cb=12
[58582.804629] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeec3240, fsynr=0x280013, cbfrsynra=0x3a, cb=12
[58582.876138] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea0000, fsynr=0x280013, cbfrsynra=0xc3a, cb=12
[58582.876491] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea1400, fsynr=0x300013, cbfrsynra=0x43a, cb=12
[58582.876828] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeebca80, fsynr=0x280013, cbfrsynra=0x3a, cb=12
[58582.877132] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeed9940, fsynr=0x300013, cbfrsynra=0x3a, cb=12
000024.103584: [CaptureThread]:Capture FPS = 50.049972 , total_fps_cntr = 1000
[58587.784524] nvidia_smmu_context_fault_bank: 1504 callbacks suppressed
[58587.784542] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea0000, fsynr=0x280013, cbfrsynra=0xc3a, cb=12
[58587.785104] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea1400, fsynr=0x300013, cbfrsynra=0x43a, cb=12
[58587.785417] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeed32c0, fsynr=0x280013, cbfrsynra=0xc3a, cb=12
[58587.844052] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea0000, fsynr=0x280013, cbfrsynra=0xc3a, cb=12
[58587.844419] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea1400, fsynr=0x300013, cbfrsynra=0x43a, cb=12
[58587.844742] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeec0e80, fsynr=0x280013, cbfrsynra=0x43a, cb=12
[58587.845043] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeedd300, fsynr=0x300013, cbfrsynra=0x3a, cb=12
[58587.915916] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea0000, fsynr=0x280013, cbfrsynra=0xc3a, cb=12
[58587.916283] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea1400, fsynr=0x300013, cbfrsynra=0x43a, cb=12
[58587.916623] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeec0f00, fsynr=0x280013, cbfrsynra=0x43a, cb=12
000029.104142: [CaptureThread]:Capture FPS = 50.038854 , total_fps_cntr = 1250
[58592.828460] nvidia_smmu_context_fault_bank: 1495 callbacks suppressed
[58592.828476] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea0000, fsynr=0x280013, cbfrsynra=0xc3a, cb=12
[58592.828999] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea1400, fsynr=0x300013, cbfrsynra=0x43a, cb=12
[58592.829344] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeecfe00, fsynr=0x280013, cbfrsynra=0x83a, cb=12
[58592.883562] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea0000, fsynr=0x280013, cbfrsynra=0xc3a, cb=12
[58592.883937] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea1400, fsynr=0x300013, cbfrsynra=0x43a, cb=12
[58592.884281] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeebc580, fsynr=0x280013, cbfrsynra=0xc3a, cb=12
[58592.884592] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeeda700, fsynr=0x300013, cbfrsynra=0x43a, cb=12
[58592.953395] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea0000, fsynr=0x280013, cbfrsynra=0xc3a, cb=12
[58592.953773] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea1400, fsynr=0x300013, cbfrsynra=0x43a, cb=12
[58592.954123] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeec0a80, fsynr=0x280013, cbfrsynra=0xc3a, cb=12
000034.103128: [CaptureThread]:Capture FPS = 50.034068 , total_fps_cntr = 1500
[58597.861605] nvidia_smmu_context_fault_bank: 1501 callbacks suppressed
[58597.861655] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea0000, fsynr=0x280013, cbfrsynra=0xc3a, cb=12
[58597.862190] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea1400, fsynr=0x300013, cbfrsynra=0x43a, cb=12
[58597.862518] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeed3700, fsynr=0x280013, cbfrsynra=0x43a, cb=12
[58597.928016] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea0000, fsynr=0x280013, cbfrsynra=0xc3a, cb=12
[58597.928431] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea1400, fsynr=0x300013, cbfrsynra=0x43a, cb=12
[58597.928740] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeec7140, fsynr=0x280013, cbfrsynra=0x43a, cb=12
[58597.998481] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea0000, fsynr=0x280013, cbfrsynra=0xc3a, cb=12
[58597.998849] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeea1400, fsynr=0x300013, cbfrsynra=0x43a, cb=12
[58597.999188] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeec0a80, fsynr=0x280013, cbfrsynra=0xc3a, cb=12
[58597.999503] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x3fbeedeb80, fsynr=0x300013, cbfrsynra=0xc3a, cb=12
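
For anyone trying to get an overview of a log like the one above, here is a minimal Python sketch that groups the fault lines by faulting page and by the `cbfrsynra` value. The function name `summarize_faults` and the 4 KiB page size are assumptions made for illustration; the regular expression only targets the exact line format printed above.

```python
import re
from collections import Counter

# Matches the "Unhandled context fault" lines in the format shown in the log above.
FAULT_RE = re.compile(
    r"Unhandled context fault: fsr=(0x[0-9a-f]+), iova=(0x[0-9a-f]+), "
    r"fsynr=(0x[0-9a-f]+), cbfrsynra=(0x[0-9a-f]+), cb=(\d+)"
)


def summarize_faults(lines, page_size=4096):
    """Count faults per page-aligned IOVA and per cbfrsynra value.

    page_size=4096 is an assumption, not something the log states.
    """
    by_page = Counter()
    by_stream = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match is None:
            continue  # skip FPS prints, "callbacks suppressed" lines, etc.
        iova = int(match.group(2), 16)
        by_page[hex(iova & ~(page_size - 1))] += 1
        by_stream[match.group(4)] += 1
    return by_page, by_stream


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): dmesg | python3 summarize_faults.py
    pages, streams = summarize_faults(sys.stdin)
    for page, count in pages.most_common(10):
        print(f"page {page}: {count} faults")
    for stream, count in streams.most_common():
        print(f"cbfrsynra {stream}: {count} faults")
```

Because the kernel rate-limits these prints (see the "callbacks suppressed" lines), the counts only reflect the lines that were actually logged, not every fault.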

The application captures video frames over MIPI from an HDMI-to-MIPI device. The frames arrive at 50 fps and are processed using OpenGL, NVTransform, Encoder, etc.

I am using the same nvpmodel as in 4.6: MODE_15W_6CORE
And jetson_clocks --show reports:
SOC family:tegra194 Machine:NVIDIA Jetson Xavier NX Developer Kit
Online CPUs: 0-5
cpu0: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
cpu1: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
cpu2: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
cpu3: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
cpu4: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
cpu5: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
GPU MinFreq=1109250000 MaxFreq=1109250000 CurrentFreq=1109250000
EMC MinFreq=204000000 MaxFreq=1600000000 CurrentFreq=1600000000 FreqOverride=1
DLA0_CORE: Online=1 MinFreq=0 MaxFreq=1100800000 CurrentFreq=1100800000
DLA0_FALCON: Online=1 MinFreq=0 MaxFreq=640000000 CurrentFreq=640000000
DLA1_CORE: Online=1 MinFreq=0 MaxFreq=1100800000 CurrentFreq=1100800000
DLA1_FALCON: Online=1 MinFreq=0 MaxFreq=640000000 CurrentFreq=640000000
PVA0_VPS0: Online=1 MinFreq=0 MaxFreq=819200000 CurrentFreq=819200000
PVA0_VPS1: Online=1 MinFreq=0 MaxFreq=819200000 CurrentFreq=819200000
PVA0_AXI: Online=1 MinFreq=0 MaxFreq=601600000 CurrentFreq=601600000
PVA1_VPS0: Online=1 MinFreq=0 MaxFreq=819200000 CurrentFreq=819200000
PVA1_VPS1: Online=1 MinFreq=0 MaxFreq=819200000 CurrentFreq=819200000
PVA1_AXI: Online=1 MinFreq=0 MaxFreq=601600000 CurrentFreq=601600000
CVNAS MinFreq=0 MaxFreq=576000000 CurrentFreq=576000000
FAN Dynamic Speed control=inactive hwmon4_pwm1=255
NV Power Mode: MODE_15W_6CORE

I never saw these messages on 4.6 using the exact same hardware.
What do these messages indicate?

Thanks
Amir

Hi,
It is not easy to identify the exact root cause from the prints. One thing to note is that the NvBuffer APIs are deprecated on JetPack 5 and replaced with the NvBufSurface APIs. Do you use NvBuffer APIs in your implementation?

Hi,

No, I am only using the new NvBufSurface API.
I have noticed that the issue happens much more often when I strain the system.
It happens most often when I am running 4 USB cameras that do JPEG to H265 transcoding,
together with one MIPI camera that also encodes to H265, while OpenGL runs on the GPU doing rotation and blending.
The system also does very intensive CPU processing (about 90% on each CPU; 6 CPUs are running in the power mode that is used).

I don't see any issues with the performance of the software, just these messages every few seconds.
Is it OK to ignore these messages? Can they be masked or disabled somehow?

Thanks
Amir

Hi,
The prints are about invalid memory access. To check further, we would need to replicate it on a developer kit. Please check whether you can set up a Xavier NX developer kit + USB cameras to reproduce the issue. If yes, please share the steps with us so that we can set it up and give it a try.
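
As a rough illustration of how the `fsr` and `fsynr` values in the prints can be read, here is a small Python sketch. The bit positions are taken from the mainline Linux arm-smmu driver header as best I recall them (for example TF = bit 1, MULTI = bit 31, and the write-not-read flag at bit 4 of FSYNR0); treat them as an assumption and verify against the kernel source of your release. The helper name `decode_fault` is made up for this example.

```python
# Assumed FSR bit positions (mainline Linux arm-smmu driver header; please verify).
FSR_BITS = {
    31: "MULTI",   # multiple faults recorded
    30: "SS",      # stalled
    8: "UUT",      # unsupported upstream transaction
    7: "ASF",      # address size fault
    6: "TLBLKF",   # TLB lock fault
    5: "TLBMCF",   # TLB match conflict fault
    4: "EF",       # external fault
    3: "PF",       # permission fault
    2: "AFF",      # access flag fault
    1: "TF",       # translation fault
}
FSYNR0_WNR = 1 << 4  # assumed: set for a write access, clear for a read


def decode_fault(fsr, fsynr):
    """Return (list of FSR flag names set, "write" or "read")."""
    flags = [
        name
        for bit, name in sorted(FSR_BITS.items(), reverse=True)
        if fsr & (1 << bit)
    ]
    access = "write" if fsynr & FSYNR0_WNR else "read"
    return flags, access


if __name__ == "__main__":
    # Values copied from the log in the first post.
    print(decode_fault(0x80000402, 0x280013))
    print(decode_fault(0x80000402, 0x300013))
```

Under these assumed bit definitions, the values in the log would decode as translation faults on write accesses, which fits the "invalid memory access" description, but it does not say which engine or buffer is responsible.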