Argus errors with high cpu load and subsequent issues

hello kes25c,

  1. May I know whether the error reports in comment #20 were captured with the pre-built libraries from comment #12 (devtalk1065378_Oct29_prebuilts.tar.gz) already applied?

  2. Does this failure only reproduce when switching between SDR and WDR modes?
    Could you please also confirm the status when streaming all cameras in the same sensor mode? Also, how many cameras are you working with?
    Thanks

I’ve hit this same problem on a Nano with a single Raspberry Pi (v2) camera attached, with the system otherwise doing very little:

Sep 22 06:04:00 maverick-nano nvargus-daemon[4535]: SCF: Error InvalidState: Session has suffered a critical failure (in src/api/Session.cpp, function capture(), line 667)
Sep 22 06:04:00 maverick-nano nvargus-daemon[4535]: (Argus) Error InvalidState:  (propagating from src/api/ScfCaptureThread.cpp, function run(), line 109)

This created 15 GB+ of syslog until the disk filled. Seriously guys, if you’re not going to fix the underlying problem, at least fix the syslog spamming so it doesn’t take the entire system down with it. This is a ridiculous bug to have around for years without a simple fix.
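For anyone who wants to catch the flood before the disk fills, here is a minimal sketch that counts repeated nvargus-daemon messages in syslog-style lines like the two above. The function names and the threshold are my own assumptions, not anything shipped with Argus, and the regex only targets the classic "Mon DD HH:MM:SS host proc[pid]: msg" layout shown in this thread.

```python
import re
from collections import Counter

# Matches classic syslog lines such as:
#   "Sep 22 06:04:00 maverick-nano nvargus-daemon[4535]: SCF: Error ..."
SYSLOG_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"(?P<proc>[^\s\[:]+)(?:\[\d+\])?:\s(?P<msg>.*)$"
)


def count_messages(lines, process="nvargus-daemon"):
    """Count how often each distinct message from `process` appears."""
    counts = Counter()
    for line in lines:
        match = SYSLOG_LINE.match(line.rstrip("\n"))
        if match and match.group("proc") == process:
            counts[match.group("msg")] += 1
    return counts


def is_flooding(counts, threshold=1000):
    """Return True if any single message repeats at least `threshold` times.

    The default threshold is an arbitrary illustrative value.
    """
    return any(n >= threshold for n in counts.values())
```

On a stock Ubuntu-based image this could be fed with `open("/var/log/syslog", errors="replace")` (that path is an assumption about the logging setup), and a cron job or watchdog could stop the gstreamer pipeline when `is_flooding` returns True.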

Rebooted the Nano, and it starts spamming again immediately after restart:

top - 08:31:25 up 22 min,  1 user,  load average: 2.95, 3.04, 2.43
Tasks: 225 total,   2 running, 223 sleeping,   0 stopped,   0 zombie
%Cpu(s): 23.2 us, 46.5 sy,  0.0 ni, 20.5 id,  9.4 wa,  0.3 hi,  0.2 si,  0.0 st
KiB Mem :  4059356 total,   508512 free,   828656 used,  2722188 buff/cache
KiB Swap:  2029664 total,  2029664 free,        0 used.  2921924 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 3386 syslog    20   0  359388   7136   3104 S 133.3  0.2  28:45.98 rsyslogd
 1906 root      19  -1   80012  25940  25144 R  99.3  0.6  21:45.12 systemd-journal
 4543 root      20   0  9.805g 153676  33120 S  39.9  3.8   7:57.23 nvargus-daemon
 1854 root      -2   0       0      0      0 S   2.6  0.0   0:10.21 mmcqd/0
 6385 root      20   0       0      0      0 D   2.0  0.0   0:00.74 kworker/u8:3
 4546 mav       20   0   30324  22312   8776 S   1.0  0.5   0:11.85 python3
 5299 mav       20   0  892936  44380  18744 S   0.7  1.1   0:08.68 mavros_node
 6995 root      20   0    9180   3652   2904 R   0.7  0.1   0:00.11 top
 4683 mav       20   0  258504  30708   8224 S   0.3  0.8   0:06.30 python3
 5276 mav       20   0  258320  29868   8508 S   0.3  0.7   0:06.41 python3

System is quiet except for all the logging activity. It runs a process doing simple video streaming through gstreamer. Perhaps the nvargus-daemon is self-triggering stress conditions with the insane amount of logging?

This insane logging activity only occurs when gstreamer is active. The system is running JetPack 4.4 with the latest kernel 4.9.140-tegra.

@kes25c were you able to resolve these issues? I am on JetPack 4.3 and seeing the same error messages when streaming multiple cameras using gstreamer.

Thanks,
Sanjay

The issue was never fixed AFAIK. However, the combination of the 32.2 release (JetPack 4.2.1 IIRC) and bumping the thread priority for the Argus threads way up (we use nice -15) basically eliminated this particular problem for us. We run six cameras with pretty heavy system load for several hours a day, and haven’t hit it in a while. We are using the Argus library directly, not going through gstreamer.
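The priority bump can be sketched roughly as below. This is only an illustration in Python using the standard `os.setpriority` call, not the code we actually run; the helper names are made up for the example, and a negative nice value normally needs root or CAP_SYS_NICE, so the helper reports failure instead of raising.

```python
import os


def current_niceness(pid=0):
    """Return the nice value of `pid` (0 means the calling process)."""
    return os.getpriority(os.PRIO_PROCESS, pid)


def raise_priority(pid=0, niceness=-15):
    """Try to set the nice value of `pid`; return True on success.

    Lowering the nice value (raising priority) usually requires root
    or CAP_SYS_NICE, so a PermissionError is reported as False.
    """
    try:
        os.setpriority(os.PRIO_PROCESS, pid, niceness)
    except PermissionError:
        return False
    return True
```

Calling `raise_priority()` early in the capture process, before the camera threads are created, lets new threads inherit the value on Linux. Whether that is enough on a given release is something to verify under load.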

Any update or plan for fixing this bug? At the very least, if something crashes internally, can this service be restarted without rebooting the whole system?
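For reference, restarting only the daemon through systemd would look something like the sketch below. It assumes the unit is named nvargus-daemon, matching the process name in the logs above, and that the caller may use sudo; the helper names are hypothetical. It is not confirmed in this thread that a restart clears the InvalidState error, and client pipelines would probably need to be restarted as well.

```python
import subprocess

# Assumed unit name, taken from the process name in the log lines.
SERVICE = "nvargus-daemon"


def restart_command(service=SERVICE, use_sudo=True):
    """Build the systemctl command line used to restart `service`."""
    cmd = ["systemctl", "restart", service]
    return ["sudo"] + cmd if use_sudo else cmd


def restart_service(service=SERVICE, use_sudo=True, timeout=30):
    """Run the restart and return True if systemctl exited with status 0."""
    result = subprocess.run(
        restart_command(service, use_sudo),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode == 0
```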

@kes25c @SanjayD @sunxishan I added a repo with a description of all the fixes for the camera work. It includes information on how to run the Python OpenCV example and avoid the error (Argus) Error InvalidState: (propagating from src/api/ScfCaptureThread.cpp, function run(), line 109),
and a fix for the system rebooting (bluedroid_pm module).
Hope it helps.

https://github.com/stanislavkuskov/jetson_gmsl_camera_streamer
