Bandwidth problem when converting mipi RAW12 to T_R16_I or T_R16 on TX2

Hello,

We work with Sony IMX264 monochrome image sensors, and up to now we have converted the 12 bpp received over the MIPI CSI link to 8 bpp T_L8 (aka GREY, aka GRAY8). This works perfectly.

v4l2-ctl -d /dev/video1 --stream-mmap=3 --stream-count=256 --set-fmt-video=width=2464,height=2056,pixelformat=GREY
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 36.00 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 35.82 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 35.76 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 35.75 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 35.72 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 35.77 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 35.75 fps
<<<<

We are now trying to use the full 12 bits of each pixel, so we are converting the RAW12 MIPI pixels to T_R16_I (aka 'Y14 ') or T_R16 (aka 'Y16 ') NVIDIA pixel formats. Unfortunately we run into bandwidth problems, as can be seen below.

v4l2-ctl -d /dev/video0 --stream-mmap=3 --stream-count=256 '--set-fmt-video=width=2464,height=2056,pixelformat=Y16 '
<< 0.50 fps
<<<<<<<<<< 3.64 fps
<<<<<<<<<<<<<<<<<<<< 5.40 fps
< 4.69 fps
< 4.48 fps
<< 3.92 fps
< 3.69 fps
< 3.51 fps
<< 3.51 fps
< 2.90 fps
<<<<<<<< 3.42 fps
<<<<<<<<<<<<<<<<<<<<<< 4.65 fps
<< 4.32 fps
<< 4.16 fps
< 3.98 fps
< 3.92 fps
<< 3.62 fps
< 3.57 fps
<< 3.40 fps
< 3.29 fps
< 3.28 fps
<<<<<<<<<<<<<<<<<<<< 3.95 fps
<<<< 3.82 fps
< 3.65 fps
< 3.38 fps
<< 3.27 fps
< 3.25 fps
< 3.15 fps
< 3.16 fps
<<<<<<<<<<<< 3.38 fps
<<<<<<< 3.16 fps
< 3.08 fps
<VIDIOC_DQBUF: failed: Input/output error

However, if I run

sudo ./jetson_clocks.sh

before starting the capture, then the same capture works perfectly.

v4l2-ctl -d /dev/video0 --stream-mmap=3 --stream-count=256 '--set-fmt-video=width=2464,height=2056,pixelformat=Y16 '
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 36.00 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 35.82 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 35.76 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 35.75 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 35.72 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 35.71 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 35.71 fps
<<<<<

Should I change something in the device tree (DT), or is there a bug somewhere in how the VI driver calculates the required bandwidth? I already tried increasing ‘pix_clk_hz’, even though the actual pixel frequency has not increased, but that did not solve the problem.

This is with JetPack 4.6.4 on a TX2.
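
For scale, here is a rough estimate of the payload that has to be written to memory in each mode. It is only a sketch: it assumes 1 byte per pixel in memory for GREY and 2 bytes per pixel for 'Y16 ', takes 36 fps from the v4l2-ctl output above, and ignores blanking, line stride and padding. The function name is made up for the illustration.

```python
# Rough memory-write payload for the two capture modes in this thread.
# Active pixels only: blanking, padding and line stride are ignored.

WIDTH, HEIGHT = 2464, 2056   # frame size passed to v4l2-ctl
FPS = 36.0                   # approximate rate reported by v4l2-ctl


def payload_bytes_per_second(bytes_per_pixel: int, fps: float = FPS) -> float:
    """Bytes per second that must be written to memory for one stream."""
    return WIDTH * HEIGHT * bytes_per_pixel * fps


grey = payload_bytes_per_second(1)   # GREY: assumed 1 byte per pixel in memory
y16 = payload_bytes_per_second(2)    # 'Y16 ': assumed 2 bytes per pixel in memory

print(f"GREY : {grey / 1e6:.1f} MB/s")
print(f"Y16  : {y16 / 1e6:.1f} MB/s")
print(f"ratio: {y16 / grey:.1f}x")
```

Under these assumptions the 16-bit mode needs roughly 365 MB/s instead of roughly 182 MB/s, i.e. twice the memory traffic for the same pixel rate.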

Does boosting the NVCSI/VI clocks help?

sudo su
echo 1 > /sys/kernel/debug/bpmp/debug/clk/vi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/isp/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/nvcsi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked
cat /sys/kernel/debug/bpmp/debug/clk/vi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/vi/rate
cat /sys/kernel/debug/bpmp/debug/clk/isp/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/isp/rate
cat /sys/kernel/debug/bpmp/debug/clk/nvcsi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/nvcsi/rate
cat /sys/kernel/debug/bpmp/debug/clk/emc/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/emc/rate

Yes, it helps.

Since the ISP is not involved at all, and NVCSI already works (as capturing in GRAY8 mode proves), I ran more specific tests:

I have tested by boosting only the VI and EMC clocks. That works too.

I have tested by boosting only the EMC clock. That works also.

I have tested by boosting only the VI clock. That does not solve the problem.

Since I saw that /sys/kernel/debug/bpmp/debug/clk/emc/rate was already equal to /sys/kernel/debug/bpmp/debug/clk/emc/max_rate, I tried writing only 1 into /sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked.

sudo bash
echo 1 > /sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked

That’s enough to make the capture work without problem.

Does that give any hint about what should be changed in the DT or in a driver?
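
To compare the state of the four clocks before and after such an experiment, a small helper along these lines can be used. It is a sketch that only reads the debugfs files already used in the commands above; it assumes each file contains a single integer, and the function names are hypothetical.

```python
from pathlib import Path

# debugfs directory used by the commands in this thread
BPMP_CLK_ROOT = Path("/sys/kernel/debug/bpmp/debug/clk")


def read_clock_state(name: str, root: Path = BPMP_CLK_ROOT) -> dict:
    """Return rate, max_rate and mrq_rate_locked for one BPMP clock.

    ASSUMPTION: each debugfs file holds one integer value.
    """
    clk_dir = root / name
    state = {}
    for attr in ("rate", "max_rate", "mrq_rate_locked"):
        state[attr] = int((clk_dir / attr).read_text().strip())
    return state


def describe(name: str, root: Path = BPMP_CLK_ROOT) -> str:
    """One-line summary: current rate, whether it is at max, whether locked."""
    s = read_clock_state(name, root)
    at_max = "at max" if s["rate"] == s["max_rate"] else "below max"
    locked = "locked" if s["mrq_rate_locked"] else "not locked"
    return f"{name}: {s['rate']} Hz ({at_max}, {locked})"


if __name__ == "__main__":
    # Needs root and a mounted debugfs on the target board.
    for clk in ("vi", "isp", "nvcsi", "emc"):
        try:
            print(describe(clk))
        except (OSError, ValueError) as exc:
            print(f"{clk}: cannot read ({exc})")
```

Run as root before and after writing to mrq_rate_locked, this shows for each clock whether the rate is at its maximum and whether the lock is set.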

I suppose adjusting pix_clk_hz should help.
Does your device tree have serdes_pix_clk_hz? If so, increase it; otherwise, increase pix_clk_hz.

I already tried increasing ‘pix_clk_hz’, but that did not help. That is expected, since ‘pix_clk_hz’ is defined as the (maximum) number of pixels per second, and that value does not change with the size of the pixels after conversion by the VI.

My device does not use any serializer/deserializer, so it has no ‘serdes_pix_clk_hz’ property in the DT.

I will try adding ‘serdes_pix_clk_hz’ and setting it higher than ‘pix_clk_hz’, but, frankly, that looks like a workaround, as the definition of ‘serdes_pix_clk_hz’ is not in any way related to the configuration of the VI either. I would really prefer a patch for the VI driver that fixes the calculation of the required bandwidth/clock.

The NVCSI/VI/EMC bandwidth request is derived from pix_clk_hz/serdes_pix_clk_hz.
You can check tegra_camera_platform.c for the details.

Hello Shane,

I have found what seems to be a big bug in drivers/media/platform/tegra/camera/vi/channel.c, caused by the following patch :

commit fcf471eaa60b2f1dda6877a9f8459e8b8937457d
Author: snchen <snchen@nvidia.com>
Date:   Mon Dec 14 16:12:12 2020 +0800

    camera: correct the BW calculations

    Correct the bandwidth unit from bit to byte.

...

diff --git a/drivers/media/platform/tegra/camera/vi/channel.c b/drivers/media/platform/tegra/camera/vi/channel.c
index 5de404504..120b91328 100644
--- a/drivers/media/platform/tegra/camera/vi/channel.c
+++ b/drivers/media/platform/tegra/camera/vi/channel.c
@@ -1739,6 +1739,7 @@ static void tegra_channel_populate_dev_info(struct tegra_camera_dev_info *cdev,
        cdev->bpp = chan->fmtinfo->bpp.numerator;
        /* BW in kBps */
        cdev->bw = cdev->pixel_rate * cdev->bpp / 1024;
+       cdev->bw /= 8;
 }

 void tegra_channel_remove_subdevices(struct tegra_channel *chan)

This is plainly wrong: fmtinfo->bpp is documented in include/media/tegra_camera_core.h as

 * @bpp: bytes per pixel fraction (when stored in memory)

As one can see, the documentation in include/media/tegra_camera_core.h says ‘bytes per pixel’, not ‘bits per pixel’, and every instance of the TEGRA_VIDEO_FORMAT macro has 1 or 2 (sometimes 4) as the numerator, never 8, 10 or 12.
Since cdev->bw has the same unit (bytes) as fmtinfo->bpp, the division by 8 is not needed.

Without this division by 8, my capture of 'Y16 ' pixels works perfectly.

Perhaps one should divide by bpp.denominator, but for my testcase and many others, bpp.denominator has the value ‘1’.
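
To make the effect of the extra division concrete, here is a small sketch that mirrors the expression from channel.c. The pixel rate is an assumption (active pixels of a 2464x2056 frame at 36 fps), not the value the driver actually uses, and the function names are invented for the illustration. The last function shows what honouring bpp.denominator could look like; it is not code from the driver.

```python
from fractions import Fraction

# ASSUMPTION: illustrative pixel rate only (active pixels at 36 fps).
# The driver's real pixel_rate value is not quoted in this thread.
PIXEL_RATE = 2464 * 2056 * 36      # pixels per second


def bw_current(pixel_rate: int, bpp_numerator: int) -> int:
    """Mirror of the code after the quoted commit: extra division by 8."""
    bw = pixel_rate * bpp_numerator // 1024
    bw //= 8
    return bw


def bw_without_div8(pixel_rate: int, bpp_numerator: int) -> int:
    """Same expression with the division by 8 removed."""
    return pixel_rate * bpp_numerator // 1024


def bw_fractional(pixel_rate: int, bpp: Fraction) -> int:
    """Hypothetical variant that also honours bpp.denominator."""
    return int(pixel_rate * bpp / 1024)


for name, bpp in (("GREY", 1), ("Y16 ", 2)):
    print(f"{name}: with /8 = {bw_current(PIXEL_RATE, bpp)} kBps, "
          f"without /8 = {bw_without_div8(PIXEL_RATE, bpp)} kBps")
```

With this assumed pixel rate, the request for a 2-byte format drops from 356202 to 44525 kBps once the division by 8 is applied, which is eight times less than the data actually written.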

Can the developers comment on this ?

Here’s my patch proposal

From 7bf0d69146a07dc313c79bb892e8d7f8a7306eef Mon Sep 17 00:00:00 2001
From: Philippe De Muyter <philippe.demuyter@macq.eu>
Date: Thu, 25 Jan 2024 16:13:00 +0100
Subject: [PATCH 1/2] Revert "camera: correct the BW calculations"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The comment justifying "camera: correct the BW calculations" is wrong.

fmtinfo->bpp is documented in include/media/tegra_camera_core.h as

 * @bpp: bytes per pixel fraction (when stored in memory)

As one sees, the documentation in include/media/tegra_camera_core.h says
‘bytes per pixel’, not ‘bits per pixel’.
As cdev->bw has the same base unit (bytes) as fmtinfo->bpp, dividing by 8
is wrong.
At the same time, fix another small error : bandwidth and clocks are
expressed with SI-prefixes, meaning powers of 1000, not 1024, so
don't divide cdev->bw by 1024 but by 1000.

Signed-off-by: Philippe De Muyter <phdm@macqel.be>
---
 drivers/media/platform/tegra/camera/vi/channel.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/media/platform/tegra/camera/vi/channel.c b/drivers/media/platform/tegra/camera/vi/channel.c
index 52e82a4c8..4857fc728 100644
--- a/drivers/media/platform/tegra/camera/vi/channel.c
+++ b/drivers/media/platform/tegra/camera/vi/channel.c
@@ -1782,8 +1782,7 @@ static void tegra_channel_populate_dev_info(struct tegra_camera_dev_info *cdev,
        cdev->pixel_bit_depth = chan->fmtinfo->width;
        cdev->bpp = chan->fmtinfo->bpp.numerator;
        /* BW in kBps */
-       cdev->bw = cdev->pixel_rate * cdev->bpp / 1024;
-       cdev->bw /= 8;
+       cdev->bw = cdev->pixel_rate * cdev->bpp / 1000;
 }

 void tegra_channel_remove_subdevices(struct tegra_channel *chan)
--
2.31.1

Please comment

I will check internally and update.

Thanks

Hello ShaneCCC,

Did you get any feedback on that patch?

ping.

I am still waiting for internal review too.

Thanks