Run this in one shell while running a GStreamer pipeline with nvoverlaysink in another. Since a lot of output is produced, press SCROLL-LOCK to examine the messages - after about ten seconds the live image on the overlay stops, likely because the console output buffer fills up and nvcamera-daemon is put into a not-running state. Pressing SCROLL-LOCK again to release the blocking lets the process run again, but it keeps repeating the last frames. This happens with high likelihood; in only about 1 in 10 cases have I observed the stream resume from the live camera source.
The normal use case does not involve verbose messages, but we had observed the repeated-frames issue before we even knew that a userspace daemon handles the processing. The bottom line: high application load that causes a scheduling imbalance will render the camera non-working. Sometimes just moving windows around the Ubuntu desktop UI was enough to break camera operation.
A recovery method needs to be built into nvcamera-daemon, because performing real-time tasks from regular userspace is not possible on the TX2.
This was tested on a single 12MP/30fps camera from Leopard Imaging.
No, it was not resolved. And you should verify it yourselves.
On my TX2, I put back the kernel/dtb so that the OV5693 is detected and works through qv4l2, but nvcamera-daemon crashes when I attempt to run nvgstcapture:
I added strace, which ends as follows:
Thread 1 getting next capture
Thread 1 is waiting
Thread 2 getting next capture
Thread 2 is waiting
Thread 3 getting next capture
Thread 3 is waiting
Thread 4 getting next capture
Thread 4 is waiting
Thread 5 getting next capture
Thread 5 is waiting
Thread 6 getting next capture
Thread 6 is waiting
Thread 7 getting next capture
Thread 7 is waiting
Thread 8 getting next capture
Thread 8 is waiting
Thread 9 getting next capture
Thread 9 is waiting
Thread 10 getting next capture
Thread 10 is waiting
Thread 11 getting next capture
Thread 11 is waiting
Thread 12 getting next capture
Thread 12 is waiting
Starting services...
Worker thread IspHw statsComplete start
Worker thread IspHw frameComplete start
Worker thread CaptureScheduler checkFramePending start
Worker thread CaptureScheduler frameStart start
Worker thread V4L2CaptureScheduler checkCaptureComplete start
Worker thread V4L2CaptureScheduler issueCaptures start
<unfinished ...>
+++ killed by SIGSEGV (core dumped) +++
./verbose-nvcamd.sh: line 8: 3027 Segmentation fault (core dumped) strace /usr/sbin/nvcamera-daemon
With the crash point being in… wait for it… your amazing CUDA!
And you know what - you are the ones who need to solve it.
We have no source code access to nvcamera-daemon, so we are unable to trace the issue further.
But let's focus on the original issue, since it is a major one affecting the reliability of an end product.
ShaneCCC - the combination of enabling the logs and pressing scroll-lock ensures that the process is no longer scheduled as running; it blocks on a write() call.
Maybe you can disturb the scheduler in other ways, either with a high system load or by sending signals - I do not know. The above sequence is how I can reproduce the issue.
Clearly, your process expects real-time privileges that it does not have. Anything that prevents it from performing the ISP handling tasks in time will break camera operation.
Another slightly related issue: when there are signal integrity problems and the MIPI data is corrupt, a missing SOF or EOF causes strange timeouts in the camera daemon, out-of-memory messages, piles of logs in dmesg, and halted operation of ALL sensors. The process often crashes, and when it does not, it is again unable to recover from that state. It is anything but robust.
ShaneCCC - yes, the above guide is for an out-of-the-box TX1/TX2 with the OV5693 reference camera.
This likely applies to all current uses of SCF with any sensor (direct MIPI-to-ISP processing), since you do not yet support images sourced from memory.
@ShaneCCC - I have compiled Argus as Leopard Imaging suggests for use with their IMX377 camera. I am getting many lockups, and it seems the imaging subsystem is unable to recover from any error. Otherwise argus_camera would be a nice demo tool.
The cameras crash randomly when playing with the settings, or when I enable multisession; video recording does not work; after some settings we get 1 fps and never get back to 30, after others we get only 20 fps. Sometimes the argus_camera GUI goes black and white and the application stops responding. Switching inputs causes errors like:
Executing Argus Sample Application (argus_camera)
Argus Version: 0.96.2 (multi-process)
(Argus) Error EndOfFile: Unexpected error in reading socket (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 212)
(Argus) Error EndOfFile: Receiving thread terminated with error (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadWrapper(), line 315)
And dmesg fills with a lot of seemingly unhelpful data dumps (attached).
It seems this Argus tool uses the same SCF approach as nvcamera-daemon, except that it runs in the process itself (or in some so-far-unidentified place - did you hide your black box inside another black box??). I am unable to provide any helpful messages to debug this other than: IT JUST DOES NOT WORK. A few seconds of a single camera working is not enough for us to develop anything on your Tegra platform.
After certain crashes, we are unable to get an image in the Argus window - so yes, the bugginess of the platform is now double-confirmed. argus.dmesg.txt (111 KB)
@ShaneCCC - Leopard is unable to solve your closed-source SCF and Argus issues; as a scaling partner of NVIDIA, they turned out to be completely unable to help with this task.
The only thing Leopard can be blamed for is that their kernel driver calls take too long, due to the muxed I2C architecture of the hardware when using 3 sensors, plus Sony's insanely long sensor initialization blobs. There is practically no way around this - communication with the hardware is sometimes just slow.
As I have pointed out, the ISP handling code in SCF expects things to happen immediately, which is not how real-world systems work. The process is unable to recover when its syscall blocks (scroll-lock on printf), and becomes totally unstable when V4L2 calls take too long. If it does not crash from the 250 ms call to initialize a new sensor, it crashes from the intensive calls made when adjusting AE gain/exposure times on multiple sensors.
We know why the camera system is crashing.
We have a reproduction guide for stock devkit hardware.
Yet NVIDIA is unable to fix this issue?
We have spent many hours figuring out what is wrong with your black-box solution, without you providing any ISP documentation or SCF code. We pointed to the cause as precisely as we could, but you still blame Leopard; Leopard can do nothing about the issue, and so it is never going to be fixed.
@danieel
As I said in an earlier comment, we can't reproduce your problem with the reference sensor board, so we don't know what could be wrong with Argus and SCF.
I will have QA try to reproduce it again.