VI/ISP throughput limit

Hi,

we are trying to capture video data on a Xavier NX from three cameras connected via MIPI CSI. Each camera has a resolution of 3840x2160 and should operate at 60 fps.

Everything works so far: we can capture the video data from each individual camera at the full frame rate. Capturing from two cameras simultaneously also works, but when I enable the third camera, the frame rate of all three cameras drops to about 40 fps.

So it looks like there is a limit in the Video Input pipeline of the Jetson module. I could not find anything in the Technical Reference Manual or the datasheet that would explain this. Can you please tell me whether we are exceeding a hardware limit, or whether there is a way to adjust this?

The issue is easy to reproduce, for example with gstreamer. When these commands run at the same time, each reports 60 fps until the third command starts. Then the fps drops to 40.

$ gst-launch-1.0 nvarguscamerasrc sensor-id=0  ! 'video/x-raw(memory:NVMM),width=3840,height=2160,framerate=(fraction)60/1, format=NV12' ! nvvidconv ! fpsdisplaysink video-sink=fakesink --verbose
$ gst-launch-1.0 nvarguscamerasrc sensor-id=1  ! 'video/x-raw(memory:NVMM),width=3840,height=2160,framerate=(fraction)60/1, format=NV12' ! nvvidconv ! fpsdisplaysink video-sink=fakesink --verbose
$ gst-launch-1.0 nvarguscamerasrc sensor-id=2  ! 'video/x-raw(memory:NVMM),width=3840,height=2160,framerate=(fraction)60/1, format=NV12' ! nvvidconv ! fpsdisplaysink video-sink=fakesink --verbose

The same behavior can be observed with a custom application using libargus, which just captures the frames and prints the fps.

This is part of the configuration in the device tree:

i2c_mux@70 {
compatible = "nxp,pca9548";
reg = <0x70>;
#address-cells = <1>;
#size-cells = <0>;
vcc-supply = <&p3509_vdd_1v8_cvb>;
vcc_lp = "vcc";
skip_mux_detect = "yes";
force_bus_start = <CAMERA_I2C_MUX_BUS(0)>;
i2c@0 {
	cam_imx334_a@37 {
		compatible = "framos,imx334";
		/* I2C device address */
		#address-cells = <1>;
		#size-cells = <0>;
		reg = <0x37>;

		reset-gpios = <&tegra_main_gpio CAM0_RST_L GPIO_ACTIVE_HIGH>;
		/* V4L2 device node location */
		devnode = "video0";

		avdd-reg = "avdd-cam-2v8";
		dvdd-reg = "vdd-1v8-cvb";
		iovdd-reg = "vdd_sys_en";

		/* Physical dimensions of sensor */
		physical_w = "15.0";
		physical_h = "12.5";
		sensor_model = "imx334";
		use_sensor_mode_id = "true";

		mode0 {
			mclk_khz = "24000";
			num_lanes = "4";
			tegra_sinterface = "serial_a";
			phy_mode = "DPHY";
			discontinuous_clk = "no";
			cil_settletime = "0";
			active_w = "3840";
			active_h = "2160";

			dynamic_pixel_bit_depth = "10";
			csi_pixel_bit_depth = "10";
			mode_type = "bayer";
			pixel_phase = "rggb";

			readout_orientation = "0";
			line_length = "5280";
			inherent_gain = "1";
			mclk_multiplier = "30";
			pix_clk_hz = "712800000";

			gain_factor = "10";
			min_gain_val = "0"; /* 0dB */
			max_gain_val = "720"; /* 72dB */
			step_gain_val = "3"; /* 0.3 */
			default_gain = "0";
			min_hdr_ratio = "1";
			max_hdr_ratio = "1";
			framerate_factor = "1000000";
			min_framerate = "1500000"; /* 1.5 */
			max_framerate = "60000000"; /* 60 */
			step_framerate = "1";
			default_framerate = "60000000";
			exposure_factor = "1000000";
			min_exp_time = "7";  /* us */
			max_exp_time = "15000";
			step_exp_time = "1";
			default_exp_time = "10000";/* us */
			embedded_metadata_height = "1";
		};

		ports {
			#address-cells = <1>;
			#size-cells = <0>;

			port@0 {
				reg = <0>;
				imx334_out0: endpoint {
					port-index = <0>;
					bus-width = <4>;
					remote-endpoint = <&imx334_csi_in0>;
				};
			};
		};
	};
};
num_csi_lanes = <12>;
max_lane_speed = <2500000>;
min_bits_per_pixel = <10>;
vi_peak_byte_per_pixel = <2>;
vi_bw_margin_pct = <25>;
max_pixel_rate = <800000>;
isp_peak_byte_per_pixel = <5>;
isp_bw_margin_pct = <25>;
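
As a sanity check on the mode0 timing values above, here is a minimal sketch that derives the frame length implied by pix_clk_hz, line_length and the 60 fps target. It assumes the usual relation fps = pix_clk_hz / (line_length * frame_length); the frame length itself is not part of the excerpt, so the result is only what these three numbers imply.

```shell
#!/bin/sh
# Values copied from mode0 in the device tree excerpt.
pix_clk_hz=712800000
line_length=5280
active_h=2160
target_fps=60

# Assumed relation: fps = pix_clk_hz / (line_length * frame_length)
frame_length=$((pix_clk_hz / (line_length * target_fps)))
extra_lines=$((frame_length - active_h))

echo "implied frame length: ${frame_length} lines (${extra_lines} more than active_h)"
```

With these values the division is exact and gives 2250 lines per frame, so the sensor mode is at least self-consistent with 60 fps. It says nothing about what VI or ISP can sustain for three streams.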

I’m not sure whether it helps in finding the cause, but capturing with v4l2-ctl does not seem to have this limitation. When the following commands run at the same time, the output reports 60 fps for all three streams.

$ v4l2-ctl -d /dev/video0 --stream-mmap --stream-count=1000
$ v4l2-ctl -d /dev/video1 --stream-mmap --stream-count=1000
$ v4l2-ctl -d /dev/video2 --stream-mmap --stream-count=1000

Thanks in advance.

Hi,
Please apply these steps and check whether performance improves:

  1. Run CSI/VI/ISP engines at max clock:
    Jetson/l4t/Camera BringUp - eLinux.org
  2. Run VIC engine at max clock:
    Nvvideoconvert issue, nvvideoconvert in DS4 is better than Ds5? - #3 by DaneLLL

Hi @DaneLLL,

thank you for your suggestion. I ran these commands before the gstreamer commands, but the behavior does not change: the frame rate still drops to 40 fps when the third stream is started.

# echo 1 > /sys/kernel/debug/bpmp/debug/clk/vi/mrq_rate_locked
# echo 1 > /sys/kernel/debug/bpmp/debug/clk/isp/mrq_rate_locked
# echo 1 > /sys/kernel/debug/bpmp/debug/clk/nvcsi/mrq_rate_locked
# cat /sys/kernel/debug/bpmp/debug/clk/vi/max_rate |tee /sys/kernel/debug/bpmp/debug/clk/vi/rate
460800000
# cat /sys/kernel/debug/bpmp/debug/clk/isp/max_rate | tee  /sys/kernel/debug/bpmp/debug/clk/isp/rate
576000000
# cat /sys/kernel/debug/bpmp/debug/clk/nvcsi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/nvcsi/rate
314000000
#
#
# echo on > /sys/devices/13e10000.host1x/15340000.vic/power/control
# echo userspace > /sys/devices/13e10000.host1x/15340000.vic/devfreq/15340000.vic/governor
# cat /sys/devices/13e10000.host1x/15340000.vic/devfreq/15340000.vic/available_frequencies
115200000 204800000 294400000 384000000 473600000 563200000 601600000
# echo 601600000 > /sys/devices/13e10000.host1x/15340000.vic/devfreq/15340000.vic/max_freq
# echo 601600000 > /sys/devices/13e10000.host1x/15340000.vic/devfreq/15340000.vic/userspace/set_freq

jetson_clocks is also always active.
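
To confirm that the clock settings above actually took effect, the rates can be read back and compared with max_rate. The following is a minimal sketch, not a verified tool: check_clk and CLK_ROOT are hypothetical names, it assumes the rate file can be read as well as written, and it needs root to access debugfs. The value read back may also differ slightly from max_rate if the firmware rounds it.

```shell
#!/bin/sh
# Compare rate with max_rate for one BPMP clock directory.
# $1: clock directory, e.g. /sys/kernel/debug/bpmp/debug/clk/isp
check_clk() {
    rate=$(cat "$1/rate") || return 2
    max=$(cat "$1/max_rate") || return 2
    if [ "$rate" -eq "$max" ]; then
        echo "$(basename "$1"): $rate Hz (at max)"
    else
        echo "$(basename "$1"): $rate Hz (max $max Hz)"
        return 1
    fi
}

# CLK_ROOT can be overridden, e.g. to point at a copy of the files.
CLK_ROOT=${CLK_ROOT:-/sys/kernel/debug/bpmp/debug/clk}
for clk in vi isp nvcsi; do
    if [ -d "$CLK_ROOT/$clk" ]; then
        check_clk "$CLK_ROOT/$clk" || true
    fi
done
```

Running this in a second terminal while the three streams are active shows whether VI, ISP and NVCSI stay at their maximum rate during capture.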

You can check your current nvpmodel mode:

sudo nvpmodel -q

and switch to MAXN if it is not set yet (-m2, or whichever mode ID applies on Xavier NX), then repeat the steps above (max clocks may depend on the nvpmodel mode).
Also check CPU usage (and other resources) by running sudo tegrastats from another terminal, and report the output here.
Does Argus cause CPU overhead at high pixel rates?

The output of sudo nvpmodel -q is:

NV Fan Mode:quiet
NV Power Mode: MODE_20W_6CORE
8

If I’m not mistaken, this is already the highest mode for Xavier NX on recent JetPack versions.

CPU usage does not seem to be the limit either (less than 30%), and the other resources are far from their limits as well (e.g. in tegrastats, htop, jtop).

Btw. we are currently using JetPack 4.6.

Can somebody tell me whether this is supposed to work and we only have a misconfiguration in our setup, or whether there is a hardware limit?

Your answer rules out the nvpmodel mode as the limitation.
I cannot say much more without your setup, but be aware that fpsdisplaysink uses text-overlay by default to draw the fps into the image. This is CPU based and can be slow at 60 fps, so try disabling it:

gst-launch-1.0 nvarguscamerasrc sensor-id=0  ! 'video/x-raw(memory:NVMM),width=3840,height=2160,framerate=(fraction)60/1, format=NV12' ! nvvidconv ! fpsdisplaysink text-overlay=0 video-sink=fakesink --verbose

# You may also try without nvvidconv, I think this should work
gst-launch-1.0 nvarguscamerasrc sensor-id=0  ! 'video/x-raw(memory:NVMM),width=3840,height=2160,framerate=(fraction)60/1, format=NV12' ! fpsdisplaysink text-overlay=0 video-sink=fakesink -v

I tried your suggestion of setting text-overlay=0. Unfortunately, there is no change in the frame rate drop with three streams.

You are also right that your command without nvvidconv works. But again, it makes no difference to the frame rate issue.

This seems to point to Argus as the culprit, but I cannot help much further.
@DaneLLL can probably say more about limitations, if any.

[EDIT: Is it any different if you launch the 3 pipelines from the same gst-launch process?

gst-launch-1.0  -v \
   nvarguscamerasrc sensor-id=0  ! 'video/x-raw(memory:NVMM),width=3840,height=2160,framerate=(fraction)60/1, format=NV12' ! fpsdisplaysink text-overlay=0 video-sink=fakesink \
   nvarguscamerasrc sensor-id=1  ! 'video/x-raw(memory:NVMM),width=3840,height=2160,framerate=(fraction)60/1, format=NV12' ! fpsdisplaysink text-overlay=0 video-sink=fakesink \
   nvarguscamerasrc sensor-id=2  ! 'video/x-raw(memory:NVMM),width=3840,height=2160,framerate=(fraction)60/1, format=NV12' ! fpsdisplaysink text-overlay=0 video-sink=fakesink 

]

Hi,
The bottleneck is in the ISP engine. Its maximum clock is 576 MHz, which may not be enough for 3x 4Kp60.
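
To put numbers on this, here is a rough sketch that compares the aggregate pixel rates reported in this thread with the 576 MHz ISP clock. It makes no assumption about how many pixels the ISP processes per clock cycle, which is not stated here; it only shows what the observed frame rates imply. The 40 fps figure is the approximate value reported above.

```shell
#!/bin/sh
# Frame size and rates reported in this thread.
width=3840
height=2160
isp_max_hz=576000000   # isp max_rate as printed earlier in the thread

pixels_per_frame=$((width * height))

requested=$((pixels_per_frame * 3 * 60))    # three streams at 60 fps
observed=$((pixels_per_frame * 3 * 40))     # three streams at about 40 fps
two_streams=$((pixels_per_frame * 2 * 60))  # two streams at 60 fps (works)

echo "requested 3x60: $requested pixels/s"
echo "observed  3x40: $observed pixels/s"
echo "working   2x60: $two_streams pixels/s"

# Pixels per ISP clock cycle in thousandths, using integer arithmetic only.
echo "observed per ISP clock:  $((observed * 1000 / isp_max_hz)) / 1000"
echo "requested per ISP clock: $((requested * 1000 / isp_max_hz)) / 1000"
```

Three streams at 40 fps carry the same aggregate pixel rate as two streams at 60 fps (995328000 pixels/s, about 1.73 pixels per ISP clock at 576 MHz), while 3x 4Kp60 would need about 2.59 pixels per clock. That pattern is consistent with a shared throughput ceiling rather than a per-camera problem.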
