ATOMP_FRAME_TRUNCATED error when encoding and doing CUDA processing

I’m having a pretty weird issue here and I cannot manage to actually find any previous threads on this.
I’m running a Gstreamer pipeline with multiple cameras (3x 2880x2160) using v4l2src with the “userptr” io-mode. This goes into a custom gstreamer element that does CUDA processing, and outputs frames that get sent to either nvv4l2h264enc or omxh264enc (both cause issues).
I am using host-allocated, pinned memory for the buffers provided to v4l2src (allocated using cudaHostAlloc).

This works fine for a while, but usually after a few minutes the pipeline gets an error from the v4l2src:

ERROR: from element /GstPipeline:pipeline0/GstV4l2Src:v4l2src1: Could not read from resource.
Additional debug info:
gstv4l2bufferpool.c(1040): gst_v4l2_buffer_pool_poll (): /GstPipeline:pipeline0/GstV4l2Src:v4l2src1:
poll error 1: Resource temporarily unavailable (11)

Checking the system log gives the generic “PXL_SOF syncpt timeout! err = -11” error.

After enabling logging, I see that the issue seems to be that the frame is truncated:

rtcpu_vinotify_event: tstamp:115378036900 tag:ATOMP_FRAME_TRUNCATED channel:0x02 frame:0 vi_tstamp:115378036485 data:0x00000000

From what I’ve managed to understand by searching through the TRM, it seems like this event is sent from the OFIF and is caused by memory controller congestion. If that is the case, what are some ways of mitigating it?

It seems the source is not stable and triggers a syncpt timeout during frame capture. Do you observe the same when running v4l2src ! fakesink?

For information, please share your release version ($ head -1 /etc/nv_tegra_release).
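A minimal capture-only test along those lines might look like the sketch below. The device path and caps string are assumptions (the original post doesn’t give them), so adjust them to your cameras; the check is guarded so it only runs where GStreamer is installed:

```shell
# Capture-only test: takes the custom CUDA element and the encoder out of
# the picture, to see whether the sensor/VI path alone is stable.
# /dev/video0 and the caps are assumptions -- adjust for your setup.
if command -v gst-launch-1.0 >/dev/null 2>&1; then
  gst-launch-1.0 -v v4l2src device=/dev/video0 io-mode=userptr \
    ! "video/x-raw,width=2880,height=2160" \
    ! fakesink sync=false
else
  echo "gst-launch-1.0 not found: run this on the target"
fi
```

If this capture-only pipeline runs cleanly for as long as the full pipeline takes to fail, that points at the downstream memory consumers rather than the sensor.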

Hi DaneLLL!

Release version:

# R32 (release), REVISION: 3.1, GCID: 18186506, BOARD: t186ref, EABI: aarch64, DATE: Tue Dec 10 07:03:07 UTC 2019

After a lot more investigation yesterday I think I found what is causing the issue.

I read a bit more through the TRM and realized I was hitting the limitations mentioned in section 27.13.2 (DVFS Limitations).

I locked the EMC frequency by running:

cd /sys/kernel/debug/bpmp/debug/clk/emc
echo 1 > mrq_rate_locked
cat max_rate > rate

and after leaving things running overnight, they’re still going without a single crash.
Without the lock they would usually crash after a few minutes.
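For reference, a quick way to confirm the lock took effect is to read back the same debugfs nodes (paths are the ones from the commands above; the debugfs layout can differ between L4T releases, and reading it needs root):

```shell
# Verify the EMC clock lock: after locking, rate should equal max_rate.
EMC=/sys/kernel/debug/bpmp/debug/clk/emc
if [ -d "$EMC" ]; then
  echo "locked: $(cat "$EMC/mrq_rate_locked")"
  echo "rate:   $(cat "$EMC/rate")"
  echo "max:    $(cat "$EMC/max_rate")"
else
  echo "$EMC not found: run on the Jetson as root"
fi
```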

This seems to confirm that the issue is caused by DVFS pauses changing the memory frequency.

Some of the other pipelines weren’t triggering this as often, especially ones where the encoder wasn’t using the memory, which would explain why this didn’t tend to happen when not encoding.

Now I only have two questions:

  1. What is the proper way of setting this?
  2. The TRM mentions LOW_WATERMARK and HIGH_WATERMARK values; are these set automatically by L4T, or is there some way of tuning them?

For a busy system, please execute sudo jetson_clocks. This also fixes EMC (and other hardware blocks) at max clocks.
You may also try MAXN mode. All modes are listed in the development guide.
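The two suggestions combine into something like the following. The mode number is an assumption: on many modules mode 0 is MAXN, but check `nvpmodel -q` and the development guide for your board first. The snippet is guarded so it is a no-op off-target:

```shell
# Pin clocks on the Jetson (needs root). Mode 0 = MAXN is an assumption;
# confirm with `nvpmodel -q` before changing modes.
if command -v nvpmodel >/dev/null 2>&1; then
  sudo nvpmodel -m 0         # select the MAXN power mode (assumed mode 0)
  sudo jetson_clocks         # fix EMC/CPU/GPU clocks at their max
  sudo jetson_clocks --show  # print the resulting clock configuration
else
  echo "nvpmodel not found: run this on the Jetson target"
fi
```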

Hi DaneLLL!

Thank you for the quick reply.

Fixing EMC at max (by running jetson_clocks) did help quite a bit, but I’m still getting this sporadically, although now it’s gone from happening once every few minutes to once every few hours. Do you have any idea what it might be?

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.

Could you try a single camera in 2880x2160? We would like to know if the error is specific to the multiple-camera case.
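A single-camera variant that keeps the encoder in the path could be sketched as below. The device, caps, and the nvvidconv/nvv4l2h264enc stages are assumptions standing in for the custom CUDA element, which is omitted here:

```shell
# One camera at full resolution, still exercising the H.264 encoder's
# memory traffic. Substitute your custom CUDA element before nvvidconv
# if you want to reproduce the original pipeline more closely.
if command -v gst-launch-1.0 >/dev/null 2>&1; then
  gst-launch-1.0 -v v4l2src device=/dev/video0 io-mode=userptr \
    ! "video/x-raw,width=2880,height=2160" \
    ! nvvidconv ! "video/x-raw(memory:NVMM)" \
    ! nvv4l2h264enc ! fakesink sync=false
else
  echo "gst-launch-1.0 not found: run this on the target"
fi
```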