Jetson TX2 NX PXL_SOF

Hello everyone,
I’ve been running into the error “PXL_SOF syncpt timeout! err = -11” and I’m trying to understand why I cannot get rid of it.

My first doubt is whether the width and height should include vertical and horizontal blanking or not.

I’m asking this because, even after reading the TRM descriptions for FRAME_X, FRAME_Y, OUT_X, OUT_Y and after trying both options, it’s still not clear to me which is the correct choice.

Best Regards,
Juan Pablo.

I suppose it depends on the sensor.
It would be better to consult the vendor.
You can refer to the link below for the case of some Sony sensors. You may also need to check the trace log for the root cause of the PXL_SOF error.

https://docs.nvidia.com/jetson/l4t/index.html#page/Tegra%20Linux%20Driver%20Package%20Development%20Guide/camera_sensor_prog.html#wwpID0E0F50HA

Hi @ShaneCCC,

I suppose it depends on the sensor.

No, it doesn’t. The Jetson TX2 VI/CSI blocks should always follow the same working principle regardless of the sensor.

You can refer to the link below for the case of some Sony sensors.

I already referenced that when I made the same sensor work on a Jetson Nano with the same custom baseboard.

You may also need to check the trace log for the root cause of the PXL_SOF error.

Kind of a pain in the ass, but I may have no other choice since a lot of these things don’t go to dmesg.


Now, could you please answer my original question?

I’d like to know whether the TX2 expects me to configure (for any possible sensor under the sky) the active image dimensions or the complete image dimensions (including blanks).

Those very same parameters are used in the following piece of code extracted from vi4_fops.c:

vi4_channel_write(chan, vnc_id, FRAME_X, width);
vi4_channel_write(chan, vnc_id, FRAME_Y, height);
vi4_channel_write(chan, vnc_id, SKIP_X, 0x0);
vi4_channel_write(chan, vnc_id, CROP_X, width);
vi4_channel_write(chan, vnc_id, OUT_X, width);
vi4_channel_write(chan, vnc_id, SKIP_Y, 0x0);
vi4_channel_write(chan, vnc_id, CROP_Y, height);
vi4_channel_write(chan, vnc_id, OUT_Y, height);

I already know the Nano expects the active image dimensions (or at least works fine with them).
That way doesn’t work on the TX2 NX right now and that’s why I’m asking.
I’d like to confirm if the configuration for those parameters should be the same.

Best Regards,
Juan Pablo.

The TX2 checks that the lines and frames output by the sensor match the size the driver reports, but the Nano does not perform that check. That is why a driver that reports a few lines/frames more or fewer than the sensor outputs causes no problem on the Nano, whereas the TX2 raises an error if there are fewer or more. Also, the active image dimensions are not always the same as the output lines/frames, which is why I said it depends on the sensor. Usually the output may differ from the active dimensions by a few lines/frames.

Hi @ShaneCCC,
So the TX2 dimensions should match the total number of pixels (expressed as lines and line_length) output by the sensor, regardless of the size of the active region, right?

Best Regards,
Juan Pablo.

Yes, on TX2 and Xavier the output size must be exactly what the driver reports.
But line_length has nothing to do with the output size (lines/pixels).

Hi @ShaneCCC,
I’ve been failing for two weeks with this issue, checking every possible parameter and trying every possible combination.

Things won’t work regardless of what you try unless you enable the TPG.
With the TPG enabled I can use the same configuration as the Jetson Nano.

The problem is that enabling the TPG is not exactly a solution and smells of a bug in the drivers.

Best Regards,
Juan Pablo.

You should check the trace log; it may give more information about what is going on.

Hi @ShaneCCC,
I spent a week looking at other people’s trace logs and checking about every single suggestion on this forum.
I then spent another week looking at my own trace logs and the suggestions on this forum.

I’d get CRC errors, frame short errors, line short errors, ECC errors and the list goes on.
I got to the point where it was impossible to have such errors and yet the trace would still show them.

The only thing that made any difference at all is the TPG.

You pick the exact same hardware with the exact same dts and exact same options for everything except that the TPG is enabled and it will work.
You pick the exact same hardware with the exact same dts and exact same options for everything except that the TPG is disabled and it won’t work.

So now I’m trying to find what the real difference would be for the TPG to fix something that is supposedly unrelated.

Best Regards,
Juan Pablo.

Have you applied the patch from the topic below that disables ECC/CRC, to give it a try?

Hi @ShaneCCC,
I already tried that one last week (about ten times), so I know for a fact that it doesn’t fix the issue.

Best Regards,
Juan Pablo

OK, then I have run out of ideas for this case for now.

Hi @juan.tettamanti,

Could you share the logs you are getting when activating traces?

When I’ve developed drivers, I have set active_w to the width of the active image and line_length to the full line output including horizontal blanking. For active_h, I’ve sometimes had to modify the active image size given by the sensor; this is sensor dependent, as sometimes the image will just come with more pixel lines. In theory the trace logs should report that as an error.

Best regards,
Roberto

Hi @robertogs2,
I know that the pixel width/height as well as blanks are sensor dependent (I think driver/configuration dependent should be more accurate), that’s why I’ve checked those values on both the sensor driver and sensor datasheet.

I’ve set active_w to the width of the active image, line_length to the full line length including horizontal blanking and active_h to the active image size (same as I did on the Nano).
Unfortunately that does not work as expected.

I also tried setting the active_w to the full line length including horizontal blanking and active_h to the image size including vertical blanking.
That doesn’t work either.

The thing that somewhat works is the first configuration (so, the same as you describe) with the condition that I have to enable the TPG driver.
If the TPG driver is enabled, the configuration will work. If the TPG driver is not enabled, the configuration will not work.
I know it doesn’t make any sense, it’s just what happens.

Best regards,
Juan Pablo.

I’ve attached a simple log where you can see the kernel boot, followed by a test where the camera doesn’t work, followed by a tpg insmod, followed by the same camera previously tested now working.

Best Regards,
Juan Pablo

jetson-tx2nx.log (54.7 KB)

Have you tried boosting the nvcsi/vi clocks?
Also, did you confirm whether the raw data is from the sensor or the TPG?

sudo su
echo 1 > /sys/kernel/debug/bpmp/debug/clk/vi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/isp/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/nvcsi/mrq_rate_locked
cat /sys/kernel/debug/bpmp/debug/clk/vi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/vi/rate
cat /sys/kernel/debug/bpmp/debug/clk/isp/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/isp/rate
cat /sys/kernel/debug/bpmp/debug/clk/nvcsi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/nvcsi/rate
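
The commands above repeat one pattern per clock, so they can be folded into a loop. This is only a sketch: the helper names `boost_clk` and `boost_all` and the `CLK_ROOT` variable are made up for illustration, the debugfs paths are the ones quoted above, and it has to run as root on the Jetson.

```shell
#!/bin/sh
# CLK_ROOT defaults to the debugfs directory used in the commands above.
CLK_ROOT="${CLK_ROOT:-/sys/kernel/debug/bpmp/debug/clk}"

# Hypothetical helper: lock one clock and raise it to its maximum rate.
boost_clk() {
    clk="$1"
    # Same write as "echo 1 > .../mrq_rate_locked" above.
    echo 1 > "$CLK_ROOT/$clk/mrq_rate_locked" || return 1
    # Copy max_rate into rate, as the cat | tee commands above do.
    cat "$CLK_ROOT/$clk/max_rate" > "$CLK_ROOT/$clk/rate"
}

# Hypothetical helper: apply boost_clk to the three camera-related clocks.
boost_all() {
    for clk in vi isp nvcsi; do
        boost_clk "$clk" || return 1
    done
}
```

After sourcing the snippet in a root shell, calling `boost_all` should have the same effect as the six commands above. The setting is not persistent and would need to be repeated after a reboot.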

Hi @ShaneCCC

Also, did you confirm whether the raw data is from the sensor or the TPG?

Yes, we confirmed before starting the topic, so I’m 100% certain the data comes from the sensor.

Have you tried boosting the nvcsi/vi clocks?

Looks like we hadn’t tried that. It looks like it has the same effect as inserting the tpg driver.


I’d appreciate it if you could answer these follow-up questions:

  1. Could you please explain the possible reasons why boosting those clocks fixed the observed problem?
  2. Could you explain why inserting the TPG module has the exact same effect as raising these clocks?
  3. Is it possible to set these clocks to the raised values through the dts? If so, how?

Best Regards,
Juan Pablo

It looks like the pix_clk_hz or num_csi_lanes in your dts is too small, which makes the NVCSI/VI clocks run at a low speed, so the capture fails.
And when you install the TPG, num_csi_lanes is modified by it.

Hi @ShaneCCC,
I’m not sure what you mean, after all num_csi_lanes=4 (I know this is the correct value) and pix_clk_hz=50000000 (because the sensor outputs RAW12 pixels at a 50MHz frequency).
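
For reference, the values quoted here imply the per-lane CSI bit rate computed below. This is just arithmetic on the stated figures (50 MHz pixel clock, RAW12, 4 lanes); the function name `lane_bit_rate` is made up, and the sketch says nothing about how the kernel actually derives the NVCSI/VI clock rates from the dts.

```shell
# Hypothetical helper: per-lane bit rate implied by a pixel clock.
# Arguments: pixel clock in Hz, bits per pixel, number of CSI lanes.
lane_bit_rate() {
    echo $(( $1 * $2 / $3 ))
}

lane_bit_rate 50000000 12 4   # prints 150000000, i.e. 150 Mbit/s per lane
```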

By the way, just issuing the following command (without touching any clock rate) has the same effect:

echo 1 > /sys/kernel/debug/bpmp/debug/clk/nvcsi/mrq_rate_locked

I’d still like to know if I can set this from the dts (or somewhere inside some Nvidia driver).

Best Regards,
Juan Pablo.

Check the NVCSI/VI/ISP clocks at runtime in the failing condition. Then adjust pix_clk_hz, or add serdes_pix_clk_hz and set it much bigger than pix_clk_hz, to increase the NVCSI/VI clocks.
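
One way to look at those clocks while the capture is failing is to read the `rate` files under the same debugfs directory used by the boost commands earlier in the thread. This is a sketch under that assumption: `show_clk_rates` and `CLK_ROOT` are made-up names, and it needs root on the Jetson.

```shell
#!/bin/sh
# CLK_ROOT defaults to the debugfs directory used earlier in the thread.
CLK_ROOT="${CLK_ROOT:-/sys/kernel/debug/bpmp/debug/clk}"

# Hypothetical helper: print "<clock> <current rate>" for each camera clock.
show_clk_rates() {
    for clk in vi isp nvcsi; do
        printf '%s %s\n' "$clk" "$(cat "$CLK_ROOT/$clk/rate")"
    done
}
```

Calling `show_clk_rates` once while streaming fails and once after the clocks have been locked or boosted should make it visible whether the rates actually differ between the two cases.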