We use the Argus library to develop an imaging application with an ar0144 camera on TX2. It worked well before.
But recently we changed the scheduling policy (SCHED_OTHER -> SCHED_FIFO) and raised the priority of our application (PR -81). Since then, the application sometimes cannot get images and the CPU load is about 100%. This happens in roughly 1 out of 20 runs.
We did some debugging: 3 threads are occupying the CPU, and some other threads are sitting in calls to pthread_mutex_lock().
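For anyone who wants to collect the same kind of per-thread picture, here is a minimal sketch that reads the state letter of every thread from procfs. It assumes a Linux /proc layout; the helper names parse_stat_state and thread_states are made up for this post and are not part of argus_camera_agent.

```cpp
#include <dirent.h>
#include <fstream>
#include <map>
#include <string>

// Hypothetical helper: extract the one-letter state (R, S, D, ...) from a
// /proc/<pid>/task/<tid>/stat line. The comm field is wrapped in parentheses
// and may itself contain spaces or ')', so we search from the last ')'.
char parse_stat_state(const std::string& stat_line) {
    const std::size_t close = stat_line.rfind(')');
    if (close == std::string::npos || close + 2 >= stat_line.size()) {
        return '?';  // line does not look like a stat record
    }
    return stat_line[close + 2];  // skip ") " and take the state letter
}

// Hypothetical helper: map thread id -> state for every thread of a process.
// Returns an empty map if /proc/<pid>/task cannot be opened.
std::map<int, char> thread_states(int pid) {
    std::map<int, char> states;
    const std::string task_dir = "/proc/" + std::to_string(pid) + "/task";
    DIR* dir = opendir(task_dir.c_str());
    if (dir == nullptr) {
        return states;
    }
    while (dirent* entry = readdir(dir)) {
        const std::string name = entry->d_name;
        if (name.empty() || name[0] == '.') {
            continue;  // skip "." and ".."
        }
        std::ifstream stat_file(task_dir + "/" + name + "/stat");
        std::string line;
        if (std::getline(stat_file, line)) {
            states[std::stoi(name)] = parse_stat_state(line);
        }
    }
    closedir(dir);
    return states;
}
```

Calling thread_states with the pid of the stuck process and counting the entries in state R gives a rough view of which threads are spinning; it does not show which mutex the other threads are waiting on.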
Is this related to the scheduling policy?
Is there a known solution?
Our system:
- NVIDIA Jetson TX2
- Jetpack 4.3 [L4T 32.3.1]
- NV Power Mode: MAXN - Type: 0
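For reference, a minimal sketch of the kind of scheduling change described above. This is not the code from argus_camera_agent.cpp; the helper names are hypothetical, and the value 80 is only inferred from the PR of -81, since on Linux top shows real-time tasks with PR = -1 - rt_priority.

```cpp
#include <cerrno>
#include <sched.h>

// Hypothetical helper (not taken from argus_camera_agent.cpp): switch the
// calling thread to SCHED_FIFO at the given real-time priority.
// Returns 0 on success, or an errno value such as EPERM when the caller
// lacks the privilege to use real-time policies.
int set_fifo_priority(int rt_priority) {
    const int lo = sched_get_priority_min(SCHED_FIFO);
    const int hi = sched_get_priority_max(SCHED_FIFO);
    if (rt_priority < lo || rt_priority > hi) {
        return EINVAL;  // outside the range the kernel accepts for SCHED_FIFO
    }
    sched_param param{};
    param.sched_priority = rt_priority;
    if (sched_setscheduler(0, SCHED_FIFO, &param) != 0) {
        return errno;
    }
    return 0;
}

// PR value that top displays for a real-time priority in 1..98
// (top prints "rt" instead of a number for priority 99).
int top_pr_for_rt_priority(int rt_priority) {
    return -1 - rt_priority;
}
```

Under that assumption, set_fifo_priority(80) run as root would show up in top as PR -81, which matches the value we see.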
Hello ShaneCCC,
Here is some demo code and tools.
argus_camera_agent.zip (1000.9 KB)
Unzip it; the reproduction steps are in readme.md.
We use stress_argus.sh to stress argus_camera_agent.
When argus_camera_agent is started together with cache_fresh.sh, argus_camera_agent locks up with high CPU load.
But:
- if cache_fresh.sh is not started in stress_argus.sh, it works well;
or
- if the process priority setting in argus_camera_agent.cpp (line 372) is commented out, it works well.
Hello ShaneCCC,
Is there an update on this question?
We are trying to reproduce it on the reference sensor board now.
I built it on r32.5 and ran it, but got the error below.
root@nvidia-desktop:/home/nvidia/sched_fifo/build# ./argus_camera_agent
No protocol specified
nvbuf_utils: Could not get EGL display connection
Segmentation fault (core dumped)
sudo ./argus_camera_agent [camera_idx]
In our system, the camera is the /dev/video1 node, so camera_idx=1:
sudo ./argus_camera_agent 1
A more detailed description is in readme.md and stress_argus.sh.
We have a new update on this issue. It seems that the application gets stuck only when the SCHED_FIFO priority is set and the cache_fresh.sh script is running.
I got the error below from stress_argus.sh:
nvidia@nvidia-desktop:~/sched_fifo$ sudo bash stress_argus.sh
kernel.sched_rt_runtime_us = -1
2 20210406222028
taskset: failed to set pid 19141's affinity: Invalid argument
3 20210406222033
taskset: failed to set pid 19144's affinity: Invalid argument
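A side note on the first line of that output: kernel.sched_rt_runtime_us = -1 means real-time throttling is disabled, so a spinning SCHED_FIFO thread is allowed to use the CPU without any limit. Below is a small sketch that computes the share of each period available to real-time tasks. The helper names are hypothetical, and the fallback values 950000 and 1000000 in the usage note are the mainline Linux defaults, which may differ on a given L4T image.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: read one integer from a procfs file,
// returning `fallback` if the file is missing or unparsable.
long read_long(const std::string& path, long fallback) {
    std::ifstream in(path);
    long value = fallback;
    if (!(in >> value)) {
        return fallback;
    }
    return value;
}

// Fraction of each scheduling period that real-time tasks may consume.
// A negative runtime (-1) means throttling is off, i.e. a share of 1.0.
double rt_cpu_share(long runtime_us, long period_us) {
    if (runtime_us < 0) {
        return 1.0;
    }
    if (period_us <= 0) {
        return 0.0;  // invalid period, treat as no real-time budget
    }
    return static_cast<double>(runtime_us) / static_cast<double>(period_us);
}
```

For example, rt_cpu_share(read_long("/proc/sys/kernel/sched_rt_runtime_us", 950000), read_long("/proc/sys/kernel/sched_rt_period_us", 1000000)) returns 1.0 after the stress script has applied its setting.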
Thanks for testing.
Are cpu1 and cpu2 offline on your board?
Can you modify line 12 of stress_argus.sh to
taskset -c 0 nohup ./build/argus_camera_agent 1 60 > ./stress_argus_log/$dd.log
and retry?
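To check which CPUs are online before pinning, here is a rough sketch that parses the kernel's CPU list format (for example "0,3-5"). As far as I know, taskset reports "Invalid argument" when none of the requested CPUs is usable. The helper names are hypothetical; the sysfs path is the standard Linux one.

```cpp
#include <fstream>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: parse a kernel CPU list such as "0,3-5" into a set.
// Assumes the well-formed format the kernel writes to sysfs.
std::set<int> parse_cpu_list(const std::string& text) {
    std::set<int> cpus;
    std::stringstream stream(text);
    std::string token;
    while (std::getline(stream, token, ',')) {
        const std::size_t first = token.find_first_of("0123456789");
        if (first == std::string::npos) {
            continue;  // empty token or trailing newline only
        }
        const std::size_t dash = token.find('-', first);
        const int lo = std::stoi(token.substr(first));
        const int hi = (dash == std::string::npos)
                           ? lo
                           : std::stoi(token.substr(dash + 1));
        for (int cpu = lo; cpu <= hi; ++cpu) {
            cpus.insert(cpu);
        }
    }
    return cpus;
}

// Hypothetical helper: CPUs currently online according to sysfs.
// Returns an empty set if the file cannot be read.
std::set<int> online_cpus() {
    std::ifstream in("/sys/devices/system/cpu/online");
    std::string line;
    std::getline(in, line);
    return parse_cpu_list(line);
}
```

If online_cpus().count(1) is 0, pinning to cpu1 cannot work on that board in its current power mode, and pinning to CPU 0 as in the command above is the safer test.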
I just ran the stress test on r32.5.1/TX2; it ran 515 loops without a problem.
Dear ShaneCCC,
Could you make sure argus_camera_agent's PR setting has taken effect?
top -d 1 -n 1000 | grep argus
If argus_camera_agent's PR shows a value such as -81, the priority setting was successful.
Yes, it is -81
nvidia@nvidia-desktop:~$ top -d 1 -n 1000 | grep argus
19653 root -81 0 16.032g 61916 25468 D 19.6 0.8 0:00.22 argus_camera_ag
19653 root -81 0 16.511g 78968 29516 S 6.8 1.0 0:00.29 argus_camera_ag
With the stress script, the problem almost always occurs within 10 loops.
I don't know what differs between our test environments.
I will run more tests to check. Do you have any suggestions?
Thank you.
Hi ShaneCCC:
I have run the same test on 32.5.1, and the lockup still appears, the same as on 32.3.1.
- Jetpack 4.5.1 [L4T 32.5.1]
I don’t have any debug ideas.
Do you have any other test suggestions?
Thanks.
Did you verify with the reference sensor ov5693?
No, we don't have an ov5693 on hand.
I am testing with the ar0144 and ar0234 sensors.
I think if you bought the TX2 devkit, an ov5693 should be mounted on it by default?