Problem with IMX477 on r32.7.5 in slave mode (bug in nvarguscamerasrc?)

hello NucleoIris,

please try again with the attached binary update for testing: Topic301820_Aug09.zip (2.7 MB)
It adds a retry mechanism to restore camera streaming for the r32.7.5 release.

Hello JerryChang,

I copied the libnvscf.so you provided to /usr/lib/aarch64-linux-gnu/tegra/libnvscf.so and rebooted after applying it, but I didn’t see any improvement.
Running exactly the same pipeline fails with the same errors as in my previous comment. Running nvargus-daemon with enableCamInfiniteTimeout=1 also fails in the same way as before.

nvargus-daemon:

...
CSI_DEBUG_COUNTER_2_0 = 0x00000000
*****************************************
SCF: Error Timeout:  (propagating from src/common/Utils.cpp, function workerThread(), line 116)
SCF: Error Timeout: Worker thread ViCsiHw frameStart failed (in src/common/Utils.cpp, function workerThread(), line 133)
SCF: Error Timeout: ISP port 0 timed out! (in src/services/capture/NvIspHw.cpp, function waitIspFrameEnd(), line 492)
SCF: Error InvalidState: Something went wrong with waiting on Isp frame end (in src/services/capture/NvIspHw.cpp, function waitIspFrameEnd(), line 550)
SCF: Error InvalidState:  (propagating from src/common/Utils.cpp, function workerThread(), line 116)
SCF: Error InvalidState: Worker thread IspHw frameComplete failed (in src/common/Utils.cpp, function workerThread(), line 133)
SCF: Error Timeout: ISP Stats timed out! (in src/services/capture/NvIspHw.cpp, function waitIspStatsFinished(), line 608)
Error: waitIspStatsFinished Something went wrong with waiting on stats
SCF: Error Timeout:  (propagating from src/services/capture/CaptureServiceDeviceIsp.cpp, function waitCompletion(), line 423)
SCF: Error Timeout:  (propagating from src/services/capture/CaptureServiceDevice.cpp, function pause(), line 951)
SCF: Error Timeout: During capture abort, syncpoint wait timeout waiting for current frame to finish (in src/services/capture/CaptureServiceDevice.cpp, function handleCancelSourceRequests(), line 1034)
PowerServiceCore:handleRequests: timePassed = 3105
SCF: Error Timeout:  (propagating from src/services/capture/CaptureServiceEvent.cpp, function wait(), line 59)

pipeline:

...
GST_ARGUS: Running with following settings:
   Camera index = 2
   Camera mode  = 0
   Output Stream W = 4056 H = 3040
   seconds to Run    = 0
   Frame Rate = 59.999999
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
nvbuf_utils: dmabuf_fd -1 mapped entry NOT found
nvbuf_utils: Can not get HW buffer from FD... Exiting...
CONSUMER: ERROR OCCURRED
ERROR: from element /GstPipeline:pipeline0/GstNvArgusCameraSrc:nvarguscamerasrc0: CANCELLED
Additional debug info:
Argus Error Status
EOS on shutdown enabled -- waiting for EOS after Error

hello NucleoIris,

do you see the retry mechanism being triggered to wait for another frame start?

Anyway, let’s dig into where the failure begins.
Could you please collect the nvargus-daemon logs ($ sudo journalctl -b -u nvargus-daemon) and attach them here as a single text file for reference?
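For reference, the same collection step can be scripted. This is a minimal sketch, assuming Python 3 on the Jetson and sudo rights; the helper names and the output file name journalctl-nvargus.txt are hypothetical. It only wraps the journalctl command quoted above (-b: current boot, -u: the nvargus-daemon unit) and adds --no-pager so the output is plain text.

```python
import subprocess
from pathlib import Path


def journal_cmd(unit="nvargus-daemon"):
    """Build the journalctl command line used in this thread.

    -b         : messages from the current boot only
    -u UNIT    : only messages from the given systemd unit
    --no-pager : print everything instead of opening a pager
    """
    return ["journalctl", "-b", "-u", unit, "--no-pager"]


def save_journal(out_path="journalctl-nvargus.txt", unit="nvargus-daemon"):
    """Run journalctl under sudo and write the whole log to one text file.

    stdout=PIPE / universal_newlines are used instead of capture_output / text
    so the script also runs on Python 3.6 (Ubuntu 18.04, used by JetPack 4).
    """
    result = subprocess.run(
        ["sudo"] + journal_cmd(unit),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    path = Path(out_path)
    path.write_text(result.stdout)
    return path
```

Calling save_journal() right after reproducing the failure leaves a single file that can be attached to the thread.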

Hello JerryChang,

I don’t see the retry mechanism being triggered. After the pipeline crashed, I left it running for 10 seconds and then terminated it. I didn’t notice any re-trigger attempt.

The only thing that looked like re-triggering was in dmesg: after the nvargus crash, the camera was set up again and crashed again. But this also happened with previous versions.

Here is the output of sudo journalctl -b -u nvargus-daemon that you asked for:
journalctl-nvargus.txt (7.4 KB)

hello NucleoIris,

I’ve extended the timeout values and added some debug logs.
please try again with the attached binary update for testing: Topic301820_Aug12.zip (2.7 MB)

Hello JerryChang,

Still a similar result. Is the file location /usr/lib/aarch64-linux-gnu/tegra/libnvscf.so correct?

Here is the log with libnvscf.so from Topic301820_Aug12.zip:
journalctl-nvargus_12_08.txt (7.3 KB)

hello NucleoIris,

please give it another try, since we don’t have a slave-mode camera for testing locally: Topic301820_Aug13.7z (1.9 MB)

Hello JerryChang,

Sure, thanks for sticking with this problem. I understand that it is hard to debug.

journalctl-nvargus_13_08.txt (8.6 KB)

thanks for the prompt test results. Let’s test with Topic301820_Aug13_2.7z (1.9 MB) to see whether anything changes.

Hello JerryChang,
here it is:
journalctl-nvargus_13_08_2.txt (6.9 KB)

hello NucleoIris,

it looks like something changed with Topic301820_Aug13_2.7z.
Let me recap the failures.

Aug 01 14:35:25 localhost nvargus-daemon[6325]: NvViCsiHw::waitCsiFrameStart: NvRmSyncWait()++ Line(627)
Aug 01 14:35:34 localhost nvargus-daemon[6325]: SCF: Error Timeout:  (propagating from src/services/capture/CaptureServiceEvent.cpp, function wait(), line 59)
Aug 01 14:35:34 localhost nvargus-daemon[6325]: Error: Camera HwEvents wait, this may indicate a hardware timeout occured,abort current/incoming cc

As you can see, it waits about 9 seconds before reaching the timeout.
There’s nothing the camera software stack can do about a hardware timeout.
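The ~9 s figure can be read directly off the journal timestamps of the two entries quoted above. A small sketch, assuming the default journalctl short output format (no year, one-second resolution); the helper names are made up for illustration.

```python
from datetime import datetime


def journal_time(line, year=2000):
    """Parse the 'Mon DD HH:MM:SS' prefix (first 15 chars) of a journal line.

    The default journalctl output omits the year, so a fixed placeholder
    year is prepended; only the difference between two lines matters here.
    """
    return datetime.strptime("{} {}".format(year, line[:15]), "%Y %b %d %H:%M:%S")


def wait_seconds(start_line, end_line):
    """Seconds elapsed between two journal lines (1 s resolution)."""
    return (journal_time(end_line) - journal_time(start_line)).total_seconds()


start = ("Aug 01 14:35:25 localhost nvargus-daemon[6325]: "
         "NvViCsiHw::waitCsiFrameStart: NvRmSyncWait()++ Line(627)")
end = ("Aug 01 14:35:34 localhost nvargus-daemon[6325]: "
       "SCF: Error Timeout:  (propagating from "
       "src/services/capture/CaptureServiceEvent.cpp, function wait(), line 59)")

print(wait_seconds(start, end))  # 9.0
```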
Please also note that there’s an Argus FIFO (the 1st and 2nd frames are dropped), so you need to trigger at least 3 pulses to get a frame to user space.
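The pulse requirement is simple counting. A sketch, assuming each trigger pulse yields exactly one sensor frame and that the drop count is the fixed value of 2 stated above; the constant and function names are illustrative and are not part of the Argus API.

```python
# Initial frames discarded by the Argus FIFO, as stated in the post
# (1st and 2nd frames dropped). Treated here as a fixed constant.
ARGUS_DROPPED_FRAMES = 2


def frames_delivered(pulses, dropped=ARGUS_DROPPED_FRAMES):
    """Frames reaching user space after `pulses` trigger pulses,
    assuming one sensor frame per pulse."""
    return max(0, pulses - dropped)


def min_pulses_for_first_frame(dropped=ARGUS_DROPPED_FRAMES):
    """Smallest pulse count that gets one frame past the FIFO."""
    return dropped + 1


for n in range(1, 5):
    print(n, frames_delivered(n))
# 1 0
# 2 0
# 3 1
# 4 2
```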

Hello JerryChang,

With Topic301820_Aug13_2.7z, nvargus waited 9 seconds for something that never arrived, so a HW timeout occurred. Is that correct?

dmesg also shows that it was waiting for 9 seconds:

...
[ 1935.387830] imx477 30-0010: IMX477:: start streaming 1
[ 1935.423602] imx477 30-0010: imx477_set_frame_rate: val: 60000000, frame_length: 3102
[ 1935.442133] imx477 30-0010: imx477_set_frame_rate: val: 30000001, frame_length: 6205
[ 1944.698710] fence timeout on [ffffffc0f5951780] after 9000ms
[ 1944.698777] name=[nvhost_sync:4], current value=1 waiting value=2
[ 1944.698826] ---- mlocks ----

[ 1944.698904] ---- syncpts ----
[ 1944.698961] fence timeout on [ffffffc0f5951600] after 9000ms
...

Also, what exactly was nvargus waiting for: the start of the frame, the end of the frame, the next frame…?

I double-checked the trigger signal, and we are triggering the camera continuously with 30 Hz pulses. So in those ~9 s we send ~270 triggers.
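Spelled out, the trigger count during the timeout window is far above the 3-pulse minimum mentioned earlier, which suggests FIFO priming alone is unlikely to explain the timeout. A sketch, assuming a steady 30 Hz trigger and the 9000 ms fence timeout from dmesg; the names are illustrative.

```python
def pulses_in_window(rate_hz, window_s):
    """Whole trigger pulses emitted during a window at a fixed rate."""
    return int(rate_hz * window_s)


TRIGGER_HZ = 30.0  # trigger rate reported in the post
TIMEOUT_S = 9.0    # fence timeout seen in dmesg ("after 9000ms")
MIN_PULSES = 3     # 2 frames dropped by the Argus FIFO + 1 delivered

sent = pulses_in_window(TRIGGER_HZ, TIMEOUT_S)
print(sent, sent >= MIN_PULSES)  # 270 True
```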

hello NucleoIris,

is it possible to probe the MIPI signaling to confirm that there are valid frame packets on the CSI channel?

Hello JerryChang,
Unfortunately, it would be very hard for us to measure the MIPI frames: physical access to the bus is difficult, and decoding would also be hard because of the speed.

hello NucleoIris,

something just occurred to me: have you tried updating the device tree property discontinuous_clk = "yes"?
Please see also Property-Value Pairs.

Hello JerryChang,
Sorry for the delay. I checked the device tree and all the parameters are present. I also tried both discontinuous_clk = "yes" and discontinuous_clk = "no", with no improvement. It reports the same error.

You may also test with the set_mode_delay_ms property, which configures the hardware delay for creating a capture request.
The unit of set_mode_delay_ms is milliseconds. Please use Topic301820_Aug13_2.7z as the base for verification.
See also Device Properties.
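Since the device tree was edited several times in this thread, it may help to confirm which values the running kernel actually loaded. A minimal sketch, assuming the standard Linux /proc/device-tree view of the live device tree; it simply searches for property files named discontinuous_clk and set_mode_delay_ms, because the exact IMX477 mode node path on this board is not given here. The string-versus-u32 decoding is a heuristic, and the helper names are hypothetical.

```python
import os


def decode_prop(raw):
    """Decode a property file read from /proc/device-tree (heuristic).

    String properties are NUL-terminated text; a 4-byte value that is not
    printable text is treated as a big-endian u32 cell; anything else is
    returned as hex.
    """
    text = raw.rstrip(b"\x00")
    if text and all(32 <= b < 127 for b in text):
        return text.decode("ascii")
    if len(raw) == 4:
        return int.from_bytes(raw, "big")
    return raw.hex()


def find_props(root="/proc/device-tree",
               names=("discontinuous_clk", "set_mode_delay_ms")):
    """Yield (path, decoded value) for every property file with a given name."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name in names:
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    yield path, decode_prop(f.read())
```

Running "for path, value in find_props(): print(path, value)" on the Jetson lists every matching property, so a mode that still carries an old value stands out.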