SCHED_FIFO thread hanging Argus/MMApi with Jetpack 3.1

Hello,

In my application, which ran successfully on Jetpack 3.0, I create several threads and set them to use SCHED_FIFO. I then use the MMApi/Argus to capture and encode video from the onboard camera on a TX2 developer kit. The same application would not run once I updated to Jetpack 3.1, and I finally narrowed it down to the usage of SCHED_FIFO.

The problem is that calling ICaptureSession::createOutputStream hangs, and while it is hung a large number of error messages are printed to the console. Once the logging stops, the device eventually reboots.

Once I had narrowed it down, I modified the 10_camera_recording MMApi sample to set the main thread to SCHED_FIFO, which lets me reproduce the problem with the sample app. Below is a patch that you should be able to apply to the sample if you want to reproduce it. Note that you have to run the program as root for the SCHED_FIFO setting to take effect.

--- //main_original.cpp	Fri Dec  1 11:36:28 2017
+++ //main_hang.cpp	Fri Dec  1 11:30:11 2017
@@ -44,0 +45,3 @@
+#include <sys/time.h>
+#include <sys/resource.h>
+
@@ -549,0 +553,7 @@ int main(int argc, char *argv[])
+    // WARNING This code will cause problems!
+    printf( "Setting SCHED_FIFO\n" );
+    struct sched_param scheduleParams;
+    scheduleParams.sched_priority = 90;
+    pthread_setschedparam( pthread_self(), SCHED_FIFO, &scheduleParams );

Now run the sample:

sudo ./camera_recording -d 5

Here is a short snippet of the resulting logs. Note that the above patch added the printout “Setting SCHED_FIFO” which appears at the beginning of the log.

nvidia@tegra-ubuntu:~/workdir$ sudo ./camera_recording -d 5
Set governor to performance before enabling profiler
Setting SCHED_FIFO
PCLHW_DTParser
PCLHW_DTParser
LoadOverridesFile: looking for override file [/Calib/camera_override.isp] 1/16
LoadOverridesFile: looking for override file [/data/nvcam/settings/camera_overrides.isp] 2/16
LoadOverridesFile: looking for override file [/opt/nvidia/nvcam/settings/camera_overrides.isp] 3/16
LoadOverridesFile: looking for override file [/var/nvidia/nvcam/settings/camera_overrides.isp] 4/16
LoadOverridesFile: looking for override file [/data/nvcam/camera_overrides.isp] 5/16
LoadOverridesFile: looking for override file [/data/nvcam/settings/e3326_front_P5V27C.isp] 6/16
LoadOverridesFile: looking for override file [/opt/nvidia/nvcam/settings/e3326_front_P5V27C.isp] 7/16
LoadOverridesFile: looking for override file [/var/nvidia/nvcam/settings/e3326_front_P5V27C.isp] 8/16
---- imager: No override file found. ----
PRODUCER: Creating output stream
[  837.334986] host1x 13e10000.host1x: AoltmStage: syncpoint id 47 (17000000.gp10b_493) stuck waiting 2, timeout=-1
[  837.345153] ---- syncpts ----
[  837.348133] id 4 (disp_d) min 121 max 121 refs 1 (previous client : )
[  837.354566] id 5 (disp_e) min 1 max 1 refs 1 (previous client : )
[  837.360654] id 7 (vblank1) min 50073 max 0 refs 1 (previous client : )
[  837.367186] id 18 (17000000.gp10b_507) min 6866 max 6866 refs 1 (previous client : )
[  837.374920] id 19 (17000000.gp10b_506) min 22 max 22 refs 1 (previous client : )
[  837.382308] id 20 (17000000.gp10b_505) min 650 max 650 refs 1 (previous client : 17000000.gp10b_505)
[  837.391441] id 33 (17000000.gp10b_500) min 4 max 4 refs 1 (previous client : 17000000.gp10b_500)
[  837.400215] id 35 (15700000.vi_0) min 14 max 14 refs 2 (previous client : 15700000.vi_0)
[  837.408298] id 39 (150c0000.nvcsi_0) min 20 max 20 refs 2 (previous client : 150c0000.nvcsi_0)
[  837.416906] id 46 (17000000.gp10b_494) min 2 max 4 refs 1 (previous client : 17000000.gp10b_494)
[  837.425679] id 47 (17000000.gp10b_493) min 0 max 2 refs 1 (previous client : 17000000.gp10b_493)
[  837.434452] id 48 (17000000.gp10b_492) min 8 max 8 refs 1 (previous client : 17000000.gp10b_492)
[  837.443737]
[  842.957174] gk20a 17000000.gp10b: gk20a_channel_timeout_handler: Job on channel 493 timed out
[  842.965702] ---- mlocks ----
[  842.968630]
[  842.970129] ---- syncpts ----
[  842.973110] id 4 (disp_d) min 121 max 122 refs 1 (previous client : )
[  842.979559] id 5 (disp_e) min 1 max 1 refs 1 (previous client : )
[  842.985651] id 7 (vblank1) min 50411 max 0 refs 1 (previous client : )
[  842.992205] id 18 (17000000.gp10b_507) min 6866 max 6872 refs 1 (previous client : )
[  842.999942] id 19 (17000000.gp10b_506) min 22 max 22 refs 1 (previous client : )
[  843.442982] host1x 13e10000.host1x: AoltmStage: syncpoint id 47 (17000000.gp10b_493) stuck waiting 2, timeout=-1
[  843.453141] ---- syncpts ----
[  843.957088] id 20 (17000000.gp10b_505) min 650 max 656 refs 1 (previous client : 17000000.gp10b_505)
[  843.966227] id 33 (17000000.gp10b_500) min 4 max 4 refs 1 (previous client : 17000000.gp10b_500)
[  843.975054] id 35 (15700000.vi_0) min 14 max 14 refs 2 (previous client : 15700000.vi_0)
[  843.983143] id 39 (150c0000.nvcsi_0) min 20 max 20 refs 2 (previous client : 150c0000.nvcsi_0)
[  843.991768] id 46 (17000000.gp10b_494) min 2 max 4 refs 1 (previous client : 17000000.gp10b_494)
[  844.000551] id 47 (17000000.gp10b_493) min 0 max 2 refs 1 (previous client : 17000000.gp10b_493)
[  844.957620] id 48 (17000000.gp10b_492) min 8 max 8 refs 1 (previous client : 17000000.gp10b_492)
[  844.966912]
[  844.968419] ---- channels ----
[  844.971482]
[  844.971482] channel 1 - 15820000.se
[  844.971482]
[  844.977923] NvHost basic channel registers:
[  844.982119] CMDFIFO_STAT_0:  00002040
[  844.985781] CMDFIFO_RDATA_0: 57081332
[  844.989456] CMDP_OFFSET_0:   00000000
[  844.993129] CMDP_CLASS_0:    00000000
[  844.996791] CHANNELSTAT_0:   00000000
[  845.000457] The CDMA sync queue is empty.
[  845.004467]
[  845.005960]
[  845.005960] channel 2 - 15830000.se
[  845.005960]
[  845.006990] id 4 (disp_d) min 121 max 122 refs 1 (previous client : )
[  845.006993] id 5 (disp_e) min 1 max 1 refs 1 (previous client : )
[  845.006996] id 7 (vblank1) min 50532 max 0 refs 1 (previous client : )
[  845.007009] id 18 (17000000.gp10b_507) min 6866 max 6872 refs 1 (previous client : )
[  845.007012] id 19 (17000000.gp10b_506) min 22 max 22 refs 1 (previous client : )
[  845.007015] id 20 (17000000.gp10b_505) min 650 max 656 refs 1 (previous client : 17000000.gp10b_505)
[  845.007030] id 33 (17000000.gp10b_500) min 4 max 4 refs 1 (previous client : 17000000.gp10b_500)
[  845.007033] id 35 (15700000.vi_0) min 14 max 14 refs 2 (previous client : 15700000.vi_0)
[  845.007039] id 39 (150c0000.nvcsi_0) min 20 max 20 refs 2 (previous client : 150c0000.nvcsi_0)
[  845.007047] id 46 (17000000.gp10b_494) min 2 max 4 refs 1 (previous client : 17000000.gp10b_494)
[  845.007050] id 47 (17000000.gp10b_493) min 0 max 2 refs 1 (previous client : 17000000.gp10b_493)
[  845.007052] id 48 (17000000.gp10b_492) min 8 max 8 refs 1 (previous client : 17000000.gp10b_492)
[  845.007561]
[  845.982117] NvHost basic channel registers:
[  845.986322] CMDFIFO_STAT_0:  00002040
[  845.989996] CMDFIFO_RDATA_0: 111182f3
[  845.993676] CMDP_OFFSET_0:   00000000
[  845.994726] dhd_bus_rxctl: resumed on timeout, INT status=0x20C00040
[  845.994960] dhd_bus_rxctl: rxcnt_timeout=1, rxlen=0
[  845.994979] dhd_check_hang: Event HANG send up due to  re=1 te=0 e=-110 s=2
[  845.994986] dhd_check_hang: Event HANG send up due to  re=1 te=0 e=-110 s=2
[  845.994994] dhd_prot_ioctl : bus is down. we have nothing to do
[  845.997350] dhd_prot_ioctl : bus is down. we have nothing to do
[  845.997354] CFGP2P-ERROR) wl_cfgp2p_bss_isup : 'cfg bss -C 0' failed: -1
[  845.997355] CFGP2P-ERROR) wl_cfgp2p_bss_isup : NOTE: this ioctl error is normal when the BSS has not been created yet.
[  845.997362] dhd_prot_ioctl : bus is down. we have nothing to do
[  845.997378] dhd_prot_ioctl : bus is down. we have nothing to do
[  845.997381] CFG80211-ERROR) wl_dongle_down : WLC_DOWN error (-1)
[  846.972672] CMDP_CLASS_0:    00000000
[  846.976353] CHANNELSTAT_0:   00000000
[  846.980030] wl_android_wifi_off in
[  846.982980] The CDMA sync queue is empty.
[  846.982981]
[  846.982985]
[  846.982985] channel 3 - 15840000.se
[  846.982985]
[  846.982986] NvHost basic channel registers:
[  846.982988] CMDFIFO_STAT_0:  00002040
[  846.982990] CMDFIFO_RDATA_0: 8c812402
[  846.982993] CMDP_OFFSET_0:   00000000
[  846.982995] CMDP_CLASS_0:    00000000
[  846.982996] CHANNELSTAT_0:   00000000
[  846.982998] The CDMA sync queue is empty.
... many more pages of logs follow until the board reboots ...

If I can provide anything else to help debug this, please let me know.

Thanks,

Chris Richardson

@cjwti
I have no problem building the binary on the Ubuntu host and copying it to the Tegra.
I’m not sure whether this is related to the Ubuntu version. My host is Ubuntu 14.04 with JetPack 3.0, but JetPack 3.1 is on 16.04.
Could you try building it on your host, whether it runs 14.04 or 16.04?

Hi Shane,

Thanks for the response and information. I am using an Ubuntu 16.04 host for this, as I don’t have yet another PC to set up, and don’t want to partition the drive further.

Anyway, thanks to your suggestion I tried building the sample (with the modifications) on the TX2 board itself, and was not able to reproduce the issue. I then tried the sample built on the host again, and once again couldn’t reproduce the issue. Finally I tried my original software with SCHED_FIFO, with the same result: everything is working.

I spent two days debugging this last week and I have no idea what caused the problem. Anyway, thank you for your time; I’m going to mark your answer as accepted. I’ll update this thread if I discover any more information.

Thanks,

Chris Richardson

Hello,

As an update to this issue, I reproduced it again today after installing the JetPack 3.2 developer preview, flashing the board, etc. Once I saw the issue happening again, I was a little more careful to notice how I fixed it.

It turns out that the issue goes away once I change the power/performance mode with nvpmodel/jetson_clocks.sh. I’m not sure whether it is nvpmodel or jetson_clocks.sh that makes the issue go away, since I ran both before I re-tested. I think you should be able to reproduce the issue with my original steps above if you freshly install JetPack 3.1/3.2 on a TX2 board. After that, to make the issue go away:

sudo nvpmodel -m 0
sudo ./jetson_clocks.sh

Thanks,

Chris Richardson