Camera ISO BW limited as 10% of max EMC BW

JerryChang,

Currently we are using JetPack 4.2.2 on TX2, but I have confirmed that this BW limitation is present on the latest JetPack release as well.

We are able to capture from all 12 cameras simultaneously, but we see messages about the ISO BW suggesting that the bandwidth is too high for the capture subsystem.

In general we can stream 12 cameras successfully, but sometimes we get an error. We are currently debugging an issue when running 10 streams simultaneously with a custom application; occasionally (4 out of 100 test runs) we get an error on start:

[   88.802303] tegra-vi4 15700000.vi: Status:  4 channel:00 frame:0000
[   88.808664] tegra-vi4 15700000.vi:      timestamp sof 92773965056 eof 92773966624 data 0x00000100
[   88.817635] tegra-vi4 15700000.vi:      capture_id 3 stream  0 vchan  3

We wonder if this issue could be related to the BW limitation I already mentioned.
I think that 5971200 is a low bandwidth limit considering that the TX2 has three 4-lane CSI ports available with VC support.

I too am seeing the errors after starting additional camera streams.

4th camera stream:

misc tegra_camera_ctrl: tegra_camera_isomgr_request: failed to reserve 6510416 KBps
misc tegra_camera_ctrl: tegra_camera_update_isobw: failed to reserve 6510416 KBps with isomgr

5th camera stream:

misc tegra_camera_ctrl: tegra_camera_isomgr_request: failed to reserve 8138020 KBps
misc tegra_camera_ctrl: tegra_camera_update_isobw: failed to reserve 8138020 KBps with isomgr

6th camera stream:

misc tegra_camera_ctrl: tegra_camera_isomgr_request: failed to reserve 9000000 KBps
misc tegra_camera_ctrl: tegra_camera_update_isobw: failed to reserve 9000000 KBps with isomgr

Was there any resolution to this issue?

hello JDSchroeder,

we’re still discussing the camera ISO bandwidth internally.
may I know your environment setup? Are you also working with a multiple-camera board with SerDes chips, and what are the camera resolutions?
thanks

Yes, I am using 6 total cameras over 3 TI DS90UB954 SerDes chips with FPD-Link III.

All 6 cameras are identical with 1920x1080 30FPS 12-bit bayer sensors and 4 CSI-2 data lanes.

2 cameras => DS90UB954 using VC-ID=0 and VC-ID=1 to CSI0/1 with 4 lane data
2 cameras => DS90UB954 using VC-ID=0 and VC-ID=1 to CSI2/3 with 4 lane data
2 cameras => DS90UB954 using VC-ID=0 and VC-ID=1 to CSI4/5 with 4 lane data
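As a rough sanity check on this topology (a sketch only; it ignores CSI-2 packet and blanking overhead), the raw payload bit rate of one sensor can be estimated like this:

```c
#include <stdint.h>

/* Approximate payload bit rate of one sensor: width * height * fps * bits per pixel.
 * CSI-2 protocol overhead and blanking are ignored, so real link usage is higher. */
static uint64_t sensor_payload_bps(uint64_t width, uint64_t height,
                                   uint64_t fps, uint64_t bits_per_pixel)
{
        return width * height * fps * bits_per_pixel;
}
```

For a 1920x1080 30 FPS 12-bit sensor this gives ~746 Mbps, so two aggregated sensors (~1.5 Gbps) fit comfortably on a 4-lane deserializer link running 1536 Mbps per lane (~6.1 Gbps).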

My DT tegra-camera-platform properties are as such:

  num_csi_lanes = <24>;
  max_lane_speed = <1500000>;
  min_bits_per_pixel = <10>;
  vi_peak_byte_per_pixel = <2>;
  vi_bw_margin_pct = <25>;
  max_pixel_rate = <750000>;
  isp_peak_byte_per_pixel = <5>;
  isp_bw_margin_pct = <25>;

It seems like num_csi_lanes should be only “12” (3x4), but that just appears to limit my bandwidth further. Also, min_bits_per_pixel should be “12”, but all the other dtsi files appear to use “10”.

Each of my cameras have these DT clock settings:

  pix_clk_hz = "48000000";
  serdes_pix_clk_hz = "833333333";

The calculation from channel.c tegra_channel_populate_dev_info()

  /* BW in kBps */
  cdev->bw = cdev->pixel_rate * cdev->bpp / 1024;

…calculates to a bandwidth of 1627604 kBps, which seems high and not quite right: 1,627,604 kBps => 1,666,666,496 bytes per second => ~1.55 gigabytes per second, and 6 x 1.55 ≈ 9.3 gigabytes per second of total bandwidth.

I do not have six 1920x1080 30 FPS cameras that each output 1.55 gigabytes per second in my system. Worst case, I think I would have 12 lanes clocked at 1.664 Gbps = 19.968 Gbps = 2.496 gigabytes per second.
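The driver arithmetic above can be reproduced directly (a sketch; it assumes pixel_rate is taken from serdes_pix_clk_hz = 833333333 and bpp = 2, i.e. vi_peak_byte_per_pixel, which is what reproduces the 1627604 figure):

```c
#include <stdint.h>

/* Per-channel BW as computed in tegra_channel_populate_dev_info():
 * cdev->bw = cdev->pixel_rate * cdev->bpp / 1024, reported in "kBps". */
static uint64_t chan_bw_kbps(uint64_t pixel_rate, uint64_t bpp)
{
        return pixel_rate * bpp / 1024;
}
```

chan_bw_kbps(833333333, 2) evaluates to 1627604, matching the value quoted above.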

hi all,

the maximum BW value is decided by these numbers in the DT,
for example,

        tegra-camera-platform {
                num_csi_lanes = <8>;
                max_lane_speed = <1500000>;
                min_bits_per_pixel = <10>;
                vi_peak_byte_per_pixel = <2>;
                vi_bw_margin_pct = <25>;
                max_pixel_rate = <750000>;
                isp_peak_byte_per_pixel = <5>;
                isp_bw_margin_pct = <25>;

the calculation happens here:
$L4T_Sources/r32.4.3/Linux_for_Tegra/source/public/kernel/nvidia/drivers/video/tegra/camera/tegra_camera_platform.c

static int tegra_camera_isomgr_register(struct tegra_camera_info *info,
                                        struct device *dev)
{
..
        vi_iso_bw = ((num_csi_lanes * max_lane_speed) / bits_per_pixel)
                                * vi_bpp * (100 + vi_margin_pct) / 100;
        isp_iso_bw = max_pixel_rate * isp_bpp * (100 + isp_margin_pct) / 100;

the max BW is registered with the BW manager, and after that the camera can only reserve up to that max BW.
for the most part these error messages are harmless, but you may tweak the values in the DT based on the platform’s needs to increase the max registered BW.
the error seen could also be the result of VI clock rate changes when enabling a sensor while other sensors are already streaming. please experiment by fixing the VI clock to max and see if you can still reproduce the issue.
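Plugging numbers into the formula shows where the registered ceiling comes from (a sketch of the same arithmetic; with the example 8-lane DT above it gives 3000000 KBps, and with the 24-lane DT values from the earlier post it gives 9000000 KBps, which matches the “failed to reserve 9000000 KBps” message seen with 6 streams):

```c
#include <stdint.h>

/* Max VI ISO BW (KBps) registered with isomgr, mirroring the expression
 * in tegra_camera_isomgr_register(). */
static uint64_t vi_iso_bw_kbps(uint64_t num_csi_lanes, uint64_t max_lane_speed,
                               uint64_t bits_per_pixel, uint64_t vi_bpp,
                               uint64_t vi_margin_pct)
{
        return ((num_csi_lanes * max_lane_speed) / bits_per_pixel)
                        * vi_bpp * (100 + vi_margin_pct) / 100;
}
```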
thanks

It still seems like the calculation from channel.c tegra_channel_populate_dev_info()

/* BW in kBps */
cdev->bw = cdev->pixel_rate * cdev->bpp / 1024;

is not correct for the Jetson TX2, which has only 3 CSI-2 ports of 4 lanes each, with each CSI-2 port shared between two channels/cameras via virtual channel IDs.

The calculation ends up being 6x the bandwidth of one channel/camera, because all of the cdev->bw values are summed in tegra_camera_update_clknbw(), when in reality it should be just 3x, since the channels share lanes using virtual channel IDs.

Therefore, either the math in tegra_channel_populate_dev_info() needs to account for the channel using only half the bandwidth (since the port is shared), or the math in tegra_camera_update_clknbw() needs to account for the fact that, even when all streams are active, only the worst-case bandwidth from the CSI 0/1, CSI 2/3, and CSI 4/5 channels should be added to active_iso_bw; every individual active stream should not add to the total bandwidth.

hello JDSchroeder,

the BW is just the rate at which we’re writing to memory; it is simply the sum of the pixel rates of all sensors.
could you please provide the pixel rates for all the sensors used, and check whether they match the requests?
thanks

I’m not sure I understand what you mean by pixel rate. The image sensor pixel clock runs at 48 MHz, but nothing outside of the image sensor uses that clock. I have a 1920x1080 sensor that runs at 30 FPS and the pixel size is 12-bit. My CSI-2 data rate is 480 Mbps per lane and I have 4 lanes. However, that is the remote image sensor data rate. The SerDes CSI-2 data rate is 1536 Mbps per lane and I have 4 lanes connected to the NVIDIA CSI-2 port. The SerDes takes two of those 1920x1080 image sensors running at 30 FPS and aggregates them together using Virtual Channel IDs. So based on all of that information, how do you calculate my pixel rate?

hello JDSchroeder,

FYI,
we found a bug in the BW calculation: we are calculating BW in KBps, but we were not dividing the bit rate by 8.
hence, this should be: cdev->bw = cdev->pixel_rate * cdev->bpp / (1024*8);
this should fix the BW warning messages.
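For comparison (a sketch, using the same assumed pixel_rate of 833333333 and bpp = 2 as earlier in the thread), the extra division by 8 shrinks each per-channel request accordingly:

```c
#include <stdint.h>

/* Corrected per-channel BW from the fix above: the extra division by 8
 * treats pixel_rate * bpp as a bit rate when converting to KBps. */
static uint64_t chan_bw_kbps_fixed(uint64_t pixel_rate, uint64_t bpp)
{
        return pixel_rate * bpp / (1024 * 8);
}
```

chan_bw_kbps_fixed(833333333, 2) evaluates to 203450 KBps, one eighth of the 1627604 KBps requested before the fix.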

BTW,
since we are already capping the BW to the max,
the issue you are seeing might be related to CSI/VI clock switching during active streams.
may I have your confirmation: when you fix the CSI/VI clocks to maximum, does the issue still persist?
thanks

I think the only issue I have seen is a kernel error message about not being able to reserve the required memory bandwidth. If I try to reduce my numbers to what I think are reasonable values, I then frequently get cdma timeouts.

I will double check your new calculation. I still think it is only half the problem. I’ll try my best to explain below.

In my case, and many others, the problem arises with the use of a SerDes chip or any other chip that aggregates multiple camera image sensors’ MIPI CSI-2 interfaces into a single output CSI-2 interface to the NVIDIA SoM. Here, multiple sensors using Virtual Channel IDs share the same physical CSI-2 port to NVIDIA. The CSI-2 aggregator chip (i.e., the SerDes chip) will often run its CSI-2 data lanes at a much faster rate than any individual image sensor, usually to minimize latency, minimize buffer sizes, and/or to guarantee meeting the image sensors’ throughput requirements.

For example:

Sensor #0 480 Mbps x4---->Ser---->Des Port 0
                                          |---->Des 1536 Mbps x4--->SoM CSI-2 4 lanes
Sensor #1 480 Mbps x4---->Ser---->Des Port 1

Laying aside the question of how to calculate the bandwidth…
Within the system/DT you cannot just use the 480 Mbps bandwidth of the image sensors, because even if you added them together you would still not equal the peak bandwidth of the deserializer feeding the NVIDIA CSI-2 port. If only one of the two sensors is streaming you are even worse off, because you have allocated memory bandwidth based on a single stream (i.e., 480 Mbps) while your deserializer requires a higher rate. If instead you set the bandwidth of each sensor to the deserializer bandwidth, you over-allocate memory bandwidth when streaming both sensors at once; multiply this by three, because you have three deserializers, and you can exhaust the allowed memory bandwidth. Even if you ignore the kernel error message and accept the over-allocation, you have wasted memory bandwidth and pushed the clock(s) to a much higher operating point, because the requested camera bandwidth is double what you truly need.
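To make the difference concrete, here is a toy sketch with hypothetical per-stream KBps values: summing per-stream BW double-counts each shared CSI port, while taking the per-port maximum reserves only what each deserializer link needs.

```c
#include <stdint.h>

#define NUM_PORTS 3

/* Current accounting: every active stream adds its full BW to the total. */
static uint64_t total_bw_per_stream(const uint64_t bw[], int n)
{
        uint64_t total = 0;
        int i;

        for (i = 0; i < n; i++)
                total += bw[i];
        return total;
}

/* Shared-port accounting: keep only the maximum BW per CSI port, then sum. */
static uint64_t total_bw_per_port(const uint64_t bw[], const int port[], int n)
{
        uint64_t max_bw[NUM_PORTS] = { 0 };
        uint64_t total = 0;
        int i;

        for (i = 0; i < n; i++)
                if (bw[i] > max_bw[port[i]])
                        max_bw[port[i]] = bw[i];
        for (i = 0; i < NUM_PORTS; i++)
                total += max_bw[i];
        return total;
}
```

With six streams of 1627604 KBps paired onto three ports, the per-stream sum is 9765624 KBps while the per-port sum is 4882812 KBps, i.e. half.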

Therefore, what is needed is to intelligently detect that two camera devices/channels share the same CSI port and to allocate bandwidth based only on the maximum bandwidth of the channels sharing that port. There is also an active_pixel_rate summation over all streaming cameras, but I’m not sure how the NVIDIA code uses that or whether it should follow the same logic as active_iso_bw, so I have not attempted to modify it.

Below is my attempt to make the logic correct for the total bandwidth accumulation:

int tegra_camera_update_clknbw(void *priv, bool stream_on)
{
	struct tegra_camera_dev_info *cdev;
	struct tegra_camera_info *info;
	struct tegra_channel *chan;
	int ret = 0;
	struct tegra_csi_device *csi = tegra_get_mc_csi();
	u64 active_bw[csi->num_channels];
	u64 active_iso_bw = 0;
	int i;
	unsigned char csi_port;

	info = dev_get_drvdata(tegra_camera_misc.parent);
	if (!info)
		return -EINVAL;

	for (i = 0; i < ARRAY_SIZE(active_bw); i++)
		active_bw[i] = 0;

	mutex_lock(&info->device_list_mutex);
	/* Need to traverse the list twice, first to make sure that
	 * stream on is set for the active stream and then to
	 * update clocks and BW.
	 * Needed as devices could have been added in any order in the list.
	 */
	list_for_each_entry(cdev, &info->device_list, device_node) {
		if (priv == cdev->priv) {
			/* set stream on */
			cdev->stream_on = stream_on;
			if (stream_on) {
				info->active_pixel_rate += cdev->pixel_rate;
				info->num_active_streams++;
			} else {
				info->active_pixel_rate -= cdev->pixel_rate;
				info->num_active_streams--;
			}
			break;
		}
	}

	list_for_each_entry(cdev, &info->device_list, device_node) {
		if (!cdev->stream_on)
			continue;

		chan = cdev->priv;

		/* Find the CSI port for the device */
		for (i = 0; i < ARRAY_SIZE(chan->port); i++) {
			csi_port = chan->port[i];
			if (csi_port != INVALID_CSI_PORT)
				break;
		}
		if ((csi_port == INVALID_CSI_PORT) ||
				(csi_port >= ARRAY_SIZE(active_bw))) {
			dev_err(info->dev, "%s channel %d: invalid csi port %u, unable to properly assign for bw\n",
				__func__, chan->id, csi_port);
			active_iso_bw += cdev->bw; /* add bw on bad port */
			continue;
		}
		if (chan->valid_ports != 1) {
			dev_err(info->dev, "%s channel %d: unexpected number of ports %u\n",
				__func__, chan->id, chan->valid_ports);
			active_iso_bw += cdev->bw; /* add bw on unexpected */
			continue;
		}

		/* Use the maximum bw channel for each CSI port */
		if (cdev->bw > active_bw[csi_port])
			active_bw[csi_port] = cdev->bw;
	}

	/* Sum up all of the individual CSI port bandwidths */
	for (i = 0; i < ARRAY_SIZE(active_bw); i++)
		active_iso_bw += active_bw[i];

	dev_dbg(info->dev, "%s channel %d: bw %llu -> %llu\n",
		__func__, ((struct tegra_channel *)priv)->id,
		info->active_iso_bw, active_iso_bw);

	info->active_iso_bw = active_iso_bw;

	/* update clocks */
	list_for_each_entry(cdev, &info->device_list, device_node) {
		ret = calculate_and_set_device_clock(info, cdev);
		if (ret) {
			mutex_unlock(&info->device_list_mutex);
			return -EINVAL;
		}
	}
	mutex_unlock(&info->device_list_mutex);

	/* set BW */
	tegra_camera_update_isobw();

	return ret;
}

Since the forum can corrupt the coding format, I’ve also attached a patch that should cleanly apply to L4T 32.4.3 nvidia/kernel tree. camera-Fix-clk-bw-calculation-with-shared-CSI-port.patch (3.5 KB)

hello JDSchroeder,

thanks for sharing the kernel patch; it looks good, but we’ll need to have it fully tested before we can merge it into the code-line.

in the meanwhile,
could you please also confirm your multi-cam use case?
may I know whether it actually fixes the intermittent failure?
thanks

hello @enrique.ramirez,

we may also need your help to confirm the patch in post #16, thanks

See my ASCII art diagram in the first post to the topic.
6 total 1080p 30FPS cameras running at 480 Mbps 4-lane to 3 total SerDes chips that run at 1536 Mbps 4-lane to CSI 0/1 4-lane, CSI 2/3 4-lane, and CSI 4/5 4-lane on the Jetson TX2 SoM.

Yes, this seems to request the appropriate active iso bandwidth for my system instead of over allocating and running out. Additionally, it allows me to specify the SerDes bandwidth requirement for each individual camera properly.

Please note, I did not attempt to handle the active_pixel_rate accumulation. Perhaps, the author of this code could take a look to determine if the active_pixel_rate should be combined in much the same way that I have done on a per port basis with the active_iso_bw.

Second, on my Jetson TX2 system I never saw anything other than INVALID_CSI_PORT in chan->port[1] and chan->port[2]. I’m not sure I fully understand how the channel port array is used. Perhaps it is more relevant for one of the other SoC families, or I have not properly accumulated the bandwidth when a channel spans more than one port. I have added in the bandwidth regardless whenever there is an unexpected situation with the ports. The author/reviewer of this logic should take a second look, as I am not totally sure how the full channel port array would be used in a real system.

Finally, the two loops over the channel device_list should probably be combined into a single loop for efficiency. I mainly left them as they were to highlight the patch’s primary purpose and new functionality. Combining them should not be difficult; however, the function only seems to be called when streams are started/stopped, so efficiency should not be a huge concern.

Hi,

Thank you for your help.

Currently I’m not working with the required hardware, but I will ask the team to test this fix.

I’ll let you know when we have results.


Is this still an issue to support? Any result can be shared?

It still generates kernel error messages when using 6 cameras.

hello JDSchroeder,

could you please also try adding the fix in post #15 for confirmation.
thanks

There has been no update from you for a while, so we are assuming this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Hi JDSchroeder,

Would you please also try adding the fix in post #15 for confirmation?

May I have your update?