The warning doesn’t appear anymore, but the following errors do:
[ 146.944636] misc tegra_camera_ctrl: tegra_camera_isomgr_request: failed to reserve 9667968 KBps
[ 146.953400] misc tegra_camera_ctrl: tegra_camera_update_isobw: failed to reserve 9667968 KBps with isomgr
After some kernel tracing I found that the ISO BW reserved for “TEGRA_ISO_CLIENT_TEGRA_CAMERA” is limited to 10% of the EMC BW, according to the following file: sources/kernel/nvidia/drivers/platform/tegra/mc/isomgr-pre_t19x.c
static struct isoclient_info *get_iso_client_info(int *length)
{
        ...
        case TEGRA186:
                cinfo = tegra18x_isoclients;
                len = ARRAY_SIZE(tegra18x_isoclients);
                for (i = 0; i < TEGRA_ISO_CLIENT_COUNT; i++) {
                        if (i == TEGRA_ISO_CLIENT_TEGRA_CAMERA)
                                isomgr_clients[i].limit_bw_percentage = 10;
                        else
                                isomgr_clients[i].limit_bw_percentage = 100;
                }
                break;
        ...
}

...

static bool pre_t19x_iso_plat_reserve(struct isomgr_client *cp, u32 bw,
                                      enum tegra_iso_client client)
{
        ...
        bw_check = ((u64)max_emc_bw * (u64)(cp->limit_bw_percentage) / 100);
        if (bw > bw_check)
                return false;
        ...
}
So it seems that the highest ISO BW we can set is 5971200 KBps (max_emc_bw * 10%).
In my case the BW needed is around ~12000000 KBps, and capture seems to work fine regardless of the BW defined. However, I get those errors from tegra_isomgr_reserve() if I configure the tegra-camera-platform settings to request that BW, while if I set the configuration to a lower BW I get the ISO BW message instead. So either way, I get messages telling me that something is not good.
Questions
Why do those error messages appear for a BW higher than 5971200 KBps?
What is the purpose of this “isomgr” reservation, given that capture seems to work even when the BW is not correct?
Is capture affected if the ISO BW doesn’t match?
Is it possible to increase the camera ISO BW above 10%, or is there a HW limitation?
May I know your 12-camera stream sensor resolutions and also the frame rates?
We’ve had limitations on how much of the total memory bandwidth can be reserved for camera.
The camera ISO BW is used by the memory controller; if the requested ISO BW is larger than the maximum ISO BW, there is a potential risk that VI/CSI can overflow.
Thanks
I see, so there’s a VI/CSI limitation for TX2.
I also noticed that this 10%-of-EMC-BW limitation is only applied to t186, not to t210 or t19x.
Our 12-stream setup is organized as 4 streams (4 VCs) per CSI port (4 lanes each). So CSI 0, 2, and 4 each carry the same group of streams.
The following are the highest resolutions supported per VC:
0: 4208x3120
1: 1296x1444
2: 2560x722
3: 128x32
Currently we are testing with the following resolutions:
0: 640x400
1: 656x804
2: 1280x402
3: 128x32
The frame rate for all streams is always 15 FPS.
The goal is to be able to run all 12 streams at the highest resolutions.
May I know which JetPack release you’re working with?
Please also note that virtual channel support only applies to Jetson-TX2, Jetson-NX, and Jetson-Xavier series devices.
Since we have only validated VC support with two simultaneous IMX390 sensors using the same CSI port, may I know whether you were able to stream 12 cameras successfully?
Thanks
Currently we are using JetPack 4.2.2 on TX2, but I checked that this BW limitation is present in the latest JetPack release as well.
We are able to capture data from the 12 cameras simultaneously, but we see those ISO BW messages suggesting that the requested BW is too high for the capture subsystem.
In general we can stream the 12 cameras successfully, but sometimes we get an error:
Currently we are debugging an issue running 10 streams simultaneously using a custom application, and sometimes (4 out of 100 tests) we get an error on start:
We wonder if this issue could be related to the BW limitation that I already mentioned.
I think that 5971200 KBps is a low bandwidth limit considering that the TX2 has 3x 4-lane CSI ports with VC support available.
We’re still in internal discussion about the camera ISO bandwidth.
May I know what your environment setup is: are you also working with multiple camera boards with SerDes chips, and what are their resolutions?
Thanks
Yes, I am using 6 total cameras over 3 TI DS90UB954 SerDes chips with FPD-Link III.
All 6 cameras are identical with 1920x1080 30FPS 12-bit bayer sensors and 4 CSI-2 data lanes.
2 cameras => DS90UB954 using VC-ID=0 and VC-ID=1 to CSI0/1 with 4 data lanes
2 cameras => DS90UB954 using VC-ID=0 and VC-ID=1 to CSI2/3 with 4 data lanes
2 cameras => DS90UB954 using VC-ID=0 and VC-ID=1 to CSI4/5 with 4 data lanes
My DT tegra-camera-platform properties are as such:
It seems like num_csi_lanes should only be “12” (3x4), but that just appears to limit my bandwidth further. Also, min_bits_per_pixel should be “12”, but all the other dtsi files appear to use “10”.
…calculates to a bandwidth of 1627604 kBps, which seems high and not quite right: 1,627,604 kBps => 1,666,666,496 bytes per second => ~1.55 gigabytes per second, and 6 x 1.55 = ~9.3 gigabytes per second of total bandwidth.
I do not have six 1920x1080 30 FPS cameras that each output 1.55 gigabytes per second in my system. Worst case, I think I would have 12 lanes clocked at 1.664 Gbps = 19.968 Gbps = 2.496 gigabytes per second.
The max BW is registered with the BW manager, and after that we can only reserve up to that max BW for camera.
For the most part these error messages are harmless, but you may tweak the values in the DT based on the platform’s needs to increase the max registered BW.
The error seen could also be the result of VI clock rate changes when enabling a sensor while other sensors are already streaming. Please try an experiment fixing the VI clock to max and see if you can still reproduce the issue.
Thanks
This calculation is not correct for the Jetson TX2, which has only 3 CSI-2 ports of 4 lanes each, with each CSI-2 port shared between two channels/cameras using virtual channel IDs.
The calculation ends up being 6x the bandwidth of one channel/camera, because all of the cdev->bw values are summed in tegra_camera_update_clknbw(), when in reality it should be just 3x, since the channels are sharing lanes via virtual channel IDs.
Therefore, either the math in tegra_channel_populate_dev_info() needs to account for the fact that each channel utilizes only half the bandwidth of the port it shares, OR the math in tegra_camera_update_clknbw() needs to account for the fact that, even with all streams active, only the worst-case bandwidth from the CSI 0/1 channels, CSI 2/3 channels, and CSI 4/5 channels should be added to active_iso_bw; every individual active stream should not add to the total bandwidth.
The BW is just the rate at which we’re writing to memory; it’s simply the sum of the pixel rates from all sensors.
Could you please provide the pixel rates for all the sensors used, and check whether they match the requests?
Thanks
I’m not sure I understand what you mean by pixel rate. The image sensor pixel clock runs at 48 MHz, but nothing outside of the image sensor uses that clock. I have a 1920x1080 sensor that runs at 30 FPS with 12-bit pixels. My CSI-2 data rate is 480 Mbps per lane, and I have 4 lanes. However, that is the remote image sensor data rate. The SerDes CSI-2 data rate is 1536 Mbps per lane, and I have 4 lanes connected to the NVIDIA CSI-2 port. The SerDes takes two of those 1920x1080 image sensors running at 30 FPS and aggregates them together using virtual channel IDs. So, based on all of that information, how do you calculate my pixel rate?
FYI,
We found a bug in the BW calculation: we’re calculating BW in KBps, but we are not dividing the bit rate by 8.
Hence, this should be: cdev->bw = cdev->pixel_rate * cdev->bpp / (1024*8);
This should fix the BW warning messages.
BTW,
Since we are already capping the BW to the max, the issue you are seeing might be related to CSI/VI clock switching during active streams.
May I have your confirmation: when you fix the CSI/VI clocks to maximum, do you still see the issue persist?
Thanks
I think the only issue I have seen is a kernel error message about not being able to reserve the required memory bandwidth. If I try to reduce my numbers to what I think are reasonable values, I then frequently get CDMA timeouts.
I will double check your new calculation. I still think it is only half the problem. I’ll try my best to explain below.
In my case, and many others, the problem arises with the use of a SerDes chip, or any other chip that aggregates multiple camera image sensors’ MIPI CSI-2 interfaces into a single output CSI-2 interface to the NVIDIA SoM. In this case there are multiple sensors using virtual channel IDs that share the same physical CSI-2 port interface to NVIDIA. In this situation the CSI-2 aggregator chip (i.e., the SerDes chip) will often run the CSI-2 data lanes at a much faster rate than any individual image sensor. This is usually done to minimize latency, minimize buffer sizes, and/or to meet the image sensor throughput requirements.
Laying aside the question of how to calculate the bandwidth…
Within the system/DT you cannot just use the 480 Mbps bandwidth of the image sensors, because even if you added them together you would still not equal the peak bandwidth of the deserializer feeding the NVIDIA CSI-2 port. And if only one of the two sensors is streaming, you are even worse off at allocating the appropriate memory bandwidth, because you have allocated memory based on only one of the streams (i.e., 480 Mbps) while your deserializer requires a higher bandwidth. If you say, well, just set the bandwidth for each sensor to the deserializer bandwidth, then you end up over-allocating the memory bandwidth when streaming both sensors at once. Multiply this by three, because you have three deserializers, and you can exhaust the allowed memory bandwidth. Even if you ignore the kernel error message and accept the over-allocation, you have now wasted memory bandwidth and have the clock(s) running the system at a much higher operating point, because your requested camera bandwidth has been doubled from what you truly need.
Therefore, what is needed is to intelligently detect that two camera devices/channels are sharing the same CSI port and allocate bandwidth based only on the maximum bandwidth of those sharing the port. There is also an active_pixel_rate summation over all the streaming cameras, but I’m not sure how the NVIDIA code uses it, or whether it should follow the same logic as active_iso_bw, so I have not attempted to modify it.
Below is my attempt to make the logic correct for the total bandwidth accumulation:
int tegra_camera_update_clknbw(void *priv, bool stream_on)
{
        struct tegra_camera_dev_info *cdev;
        struct tegra_camera_info *info;
        struct tegra_channel *chan;
        int ret = 0;
        struct tegra_csi_device *csi = tegra_get_mc_csi();
        u64 active_bw[csi->num_channels];
        u64 active_iso_bw = 0;
        int i;
        unsigned char csi_port;

        info = dev_get_drvdata(tegra_camera_misc.parent);
        if (!info)
                return -EINVAL;

        for (i = 0; i < ARRAY_SIZE(active_bw); i++)
                active_bw[i] = 0;

        mutex_lock(&info->device_list_mutex);
        /* Need to traverse the list twice, first to make sure that
         * stream on is set for the active stream and then to
         * update clocks and BW.
         * Needed as devices could have been added in any order in the list.
         */
        list_for_each_entry(cdev, &info->device_list, device_node) {
                if (priv == cdev->priv) {
                        /* set stream on */
                        cdev->stream_on = stream_on;
                        if (stream_on) {
                                info->active_pixel_rate += cdev->pixel_rate;
                                info->num_active_streams++;
                        } else {
                                info->active_pixel_rate -= cdev->pixel_rate;
                                info->num_active_streams--;
                        }
                        break;
                }
        }

        list_for_each_entry(cdev, &info->device_list, device_node) {
                if (!cdev->stream_on)
                        continue;

                chan = cdev->priv;
                /* Find the CSI port for the device */
                for (i = 0; i < ARRAY_SIZE(chan->port); i++) {
                        csi_port = chan->port[i];
                        if (csi_port != INVALID_CSI_PORT)
                                break;
                }

                if ((csi_port == INVALID_CSI_PORT) ||
                    (csi_port >= ARRAY_SIZE(active_bw))) {
                        dev_err(info->dev, "%s channel %d: invalid csi port %u, unable to properly assign for bw\n",
                                __func__, chan->id, csi_port);
                        active_iso_bw += cdev->bw; /* add bw on bad port */
                        continue;
                }

                if (chan->valid_ports != 1) {
                        dev_err(info->dev, "%s channel %d: unexpected number of ports %u\n",
                                __func__, chan->id, chan->valid_ports);
                        active_iso_bw += cdev->bw; /* add bw on unexpected */
                        continue;
                }

                /* Use the maximum bw channel for each CSI port */
                if (cdev->bw > active_bw[csi_port])
                        active_bw[csi_port] = cdev->bw;
        }

        /* Sum up all of the individual CSI port bandwidths */
        for (i = 0; i < ARRAY_SIZE(active_bw); i++)
                active_iso_bw += active_bw[i];

        dev_dbg(info->dev, "%s channel %d: bw %llu -> %llu\n",
                __func__, ((struct tegra_channel *)priv)->id,
                info->active_iso_bw, active_iso_bw);
        info->active_iso_bw = active_iso_bw;

        /* update clocks */
        list_for_each_entry(cdev, &info->device_list, device_node) {
                ret = calculate_and_set_device_clock(info, cdev);
                if (ret) {
                        mutex_unlock(&info->device_list_mutex);
                        return -EINVAL;
                }
        }
        mutex_unlock(&info->device_list_mutex);

        /* set BW */
        tegra_camera_update_isobw();

        return ret;
}
See my ASCII art diagram in the first post to the topic.
6 total 1080p 30 FPS cameras running at 480 Mbps 4-lane into 3 total SerDes chips that run at 1536 Mbps 4-lane into CSI 0/1 (4-lane), CSI 2/3 (4-lane), and CSI 4/5 (4-lane) on the Jetson TX2 SoM.
Yes, this seems to request the appropriate active ISO bandwidth for my system instead of over-allocating and running out. Additionally, it allows me to properly specify the SerDes bandwidth requirement for each individual camera.
Please note, I did not attempt to handle the active_pixel_rate accumulation. Perhaps the author of this code could take a look to determine whether active_pixel_rate should be combined on a per-port basis in much the same way I have done with active_iso_bw.
Second, on my Jetson TX2 system I never saw anything other than INVALID_CSI_PORT in chan->port[1] and chan->port[2]. I’m not sure I fully understand how the channel port array is used. Perhaps it is more relevant for one of the other SoC families, or I have not properly accumulated the bandwidth when a channel spans more than one port; I have added in the full bandwidth whenever there is an unexpected situation with the ports. The author/reviewer of this logic should take a second look, as I am not totally sure how the full channel port array would be used in a real system.
Finally, the two loops over the channel device_list should probably be combined into a single loop for efficiency. I mainly left them as they were to highlight my patch’s primary purpose and new functionality. Combining the two loops should not be too difficult; however, the function only seems to be called when streams are started or stopped, so efficiency should not be a huge concern.